
ClickHouse - Data Warehouse

Hosted ClickHouse service for storing and processing analytical data from both Pixlr and Designs.ai.

Overview

We use a hosted ClickHouse service to avoid infrastructure management overhead while maintaining high performance for analytical queries.

Why Hosted?

  • Limited Headcount: No need to manage ClickHouse infrastructure
  • Auto-scaling: Automatically scales up and down based on usage
  • Cost Optimization: Can be shut down when unused to save costs
  • Managed Service: Updates, backups, and monitoring are handled by the provider

Data Sources

Raw Data Tables

  • Pixlr Data: Users, sessions, images, subscriptions, events
  • Designs.ai Data: Users, projects, subscriptions, usage logs, billing
  • Ingestion: Loaded via Airbyte (self-hosted on AWS)

Processed Data Tables

  • Intermediate Tables: Created by the SQLMesh CI/CD pipeline
  • Aggregated Tables: Pre-computed metrics for fast queries
  • Data Models: Business logic and transformations
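To make the raw-versus-aggregated split concrete, here is a minimal Python sketch of the kind of rollup an aggregated table might hold: daily active users per product. The row layout, product labels, and sample events are hypothetical. In our stack this logic would live in a SQLMesh model executed inside ClickHouse (roughly `SELECT product, toDate(event_time) AS day, uniqExact(user_id) ... GROUP BY product, day`), not in Python; the sketch only shows what gets pre-computed.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical raw event rows as they might land from ingestion:
# (product, user_id, event timestamp).
RAW_EVENTS = [
    ("pixlr", 1, datetime(2024, 5, 1, 9, 30)),
    ("pixlr", 1, datetime(2024, 5, 1, 17, 5)),
    ("pixlr", 2, datetime(2024, 5, 1, 12, 0)),
    ("designs_ai", 7, datetime(2024, 5, 1, 8, 45)),
    ("pixlr", 2, datetime(2024, 5, 2, 10, 15)),
]


def daily_active_users(events) -> list[tuple[str, date, int]]:
    """Roll raw events up to one row per (product, day) with a distinct-user count."""
    users_by_key = defaultdict(set)
    for product, user_id, ts in events:
        # Repeat events from the same user on the same day count once.
        users_by_key[(product, ts.date())].add(user_id)
    # Sort by product, then day, for stable output.
    return sorted(
        (product, day, len(users)) for (product, day), users in users_by_key.items()
    )


for row in daily_active_users(RAW_EVENTS):
    print(row)
```

Five raw events collapse into three aggregated rows; dashboards reading the aggregated table scan those few rows instead of every event, which is where the "fast queries" benefit comes from.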

Integration Points

Pixlr Admin Area

  • Current: The existing admin area connects directly to ClickHouse
  • Method: ClickHouse SDK or HTTP API
  • Purpose: Real-time analytics and reporting
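As a rough illustration of the HTTP API route, the sketch below builds (without sending) a request against ClickHouse's HTTP interface using only the Python standard library. It assumes HTTPS on port 8443 (the conventional ClickHouse HTTPS port) and passes credentials in the `X-ClickHouse-User` / `X-ClickHouse-Key` headers and the target database in the `database` URL parameter, all of which the HTTP interface supports. The host, user, password, database, and `pixlr_events` table are hypothetical placeholders, not our actual configuration. The official SDKs (for example `clickhouse-connect` for Python) wrap this same interface.

```python
import json
import urllib.parse
import urllib.request


def build_clickhouse_request(host, user, password, sql, database="default", port=8443):
    """Build (but do not send) a POST request for ClickHouse's HTTP interface."""
    # Target database goes in the URL; the SQL itself travels in the POST body.
    query_string = urllib.parse.urlencode({"database": database})
    url = f"https://{host}:{port}/?{query_string}"
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        headers={
            # Credentials via ClickHouse's HTTP auth headers.
            "X-ClickHouse-User": user,
            "X-ClickHouse-Key": password,
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


def run_query(req, timeout=30):
    """Send the request and parse JSONEachRow output (one JSON object per line)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return [json.loads(line) for line in resp if line.strip()]


# Hypothetical host, credentials, database, and table names.
req = build_clickhouse_request(
    host="example.clickhouse.cloud",
    user="admin_area_reader",
    password="change-me",
    sql="SELECT count() AS events FROM pixlr_events FORMAT JSONEachRow",
    database="analytics",
)
# rows = run_query(req)  # requires a reachable ClickHouse endpoint
```

POST is used so long analytical queries are not constrained by URL length; ClickHouse also accepts read-only queries over GET with a `query` URL parameter. `FORMAT JSONEachRow` returns one JSON object per line, which `run_query` parses row by row.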

Designs.ai Admin Area

  • Status: Planned for future implementation
  • Method: ClickHouse SDK or HTTP API
  • Purpose: Analytics capabilities similar to Pixlr's

BI Team (Caleb)

  • Tools: Various BI tools connect to ClickHouse
  • Access: Read-only access for reporting and analysis
  • Purpose: Business intelligence and data analysis
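This page does not say how the BI team's read-only access is configured. One common approach in ClickHouse is a dedicated user with the `readonly` setting and `SELECT` grants only; the Python sketch below generates that SQL. The user name `bi_reader`, the `analytics` database, and the choice of `sha256_password` are hypothetical, and the statements should be checked against the ClickHouse version we run before use.

```python
import re

# Allow only plain identifiers so names cannot smuggle extra SQL.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def readonly_user_ddl(user: str, password: str, databases: list[str]) -> list[str]:
    """Return ClickHouse SQL creating a read-only user with SELECT on the given databases."""
    for name in [user, *databases]:
        if not IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Escape backslashes and single quotes for a ClickHouse string literal.
    escaped = password.replace("\\", "\\\\").replace("'", "\\'")
    statements = [
        # readonly = 1 blocks writes and setting changes; readonly = 2 would
        # still block writes but let BI tools adjust session settings.
        f"CREATE USER IF NOT EXISTS {user} "
        f"IDENTIFIED WITH sha256_password BY '{escaped}' "
        f"SETTINGS readonly = 1"
    ]
    statements += [f"GRANT SELECT ON {db}.* TO {user}" for db in databases]
    return statements


for stmt in readonly_user_ddl("bi_reader", "change-me", ["analytics"]):
    print(stmt + ";")
```

Some BI tools send `SET` statements when they connect; if a tool fails under `readonly = 1`, `readonly = 2` is the usual compromise.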

Key Benefits

  • No Infrastructure Management: Focus on data, not servers
  • Automatic Scaling: Handles varying workloads efficiently
  • Cost-Effective: Pay only for what you use
  • High Performance: Optimized for analytical queries
  • Reliability: Managed service with high availability

Next Steps

  • SQLMesh - CI/CD data transformation
  • Airbyte - Self-hosted data ingestion