Skip to main content

SQLMesh - Data Transformation

Self-hosted SQLMesh running in CI/CD to create intermediate tables and data models from raw ClickHouse data.

Overview

SQLMesh runs in our CI/CD pipeline to transform raw data into business-ready analytics tables. This approach ensures data transformations are version-controlled and automatically deployed.

Why Self-Hosted?

  • Full Control: Complete control over data transformation logic
  • Cost Effective: No per-transformation pricing
  • Integration: Seamless integration with our existing CI/CD pipeline
  • Customization: Ability to create custom data models and business logic

CI/CD Integration

GitHub Actions Workflow

name: SQLMesh Data Transformation

on:
push:
branches: [main]
paths: ['analytics/models/**']
schedule:
- cron: '0 2 * * *' # Daily at 2 AM

jobs:
transform:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Install SQLMesh
run: pip install sqlmesh
- name: Configure ClickHouse Connection
run: |
sqlmesh config set default_connection.clickhouse.host ${{ secrets.CLICKHOUSE_HOST }}
sqlmesh config set default_connection.clickhouse.port ${{ secrets.CLICKHOUSE_PORT }}
sqlmesh config set default_connection.clickhouse.user ${{ secrets.CLICKHOUSE_USER }}
sqlmesh config set default_connection.clickhouse.password ${{ secrets.CLICKHOUSE_PASSWORD }}
- name: Plan Changes
run: sqlmesh plan --auto-apply
- name: Run Tests
run: sqlmesh test

Data Models

User Analytics

  • Daily User Activity: Active users, new users, engagement metrics
  • User Cohorts: Retention analysis and cohort tracking
  • User Segmentation: User behavior and demographic analysis

Revenue Analytics

  • Daily Revenue Metrics: Revenue trends, subscription metrics
  • Monthly Recurring Revenue (MRR): MRR calculations and growth
  • Customer Lifetime Value (CLV): Customer value analysis

Product Usage

  • Feature Usage: Feature adoption and usage patterns
  • Product Performance: Product-specific metrics and KPIs
  • User Journey: User flow and conversion analysis

Key Benefits

  • Version Control: All data transformations are version-controlled
  • Automated Deployment: Changes are automatically deployed via CI/CD
  • Data Quality: Built-in testing and validation
  • Incremental Processing: Efficient processing of large datasets
  • Team Collaboration: Dev team and BI team work together on data models

Integration Points

ClickHouse

  • Source: Raw data from Airbyte ingestion
  • Destination: Intermediate tables for analytics
  • Connection: Direct connection from CI/CD to ClickHouse

Admin Areas

  • Pixlr Admin: Uses processed data for real-time analytics
  • Designs.ai Admin: Will use processed data when implemented

BI Team

  • Caleb's Tools: Access to processed data for business intelligence
  • Reporting: Pre-computed metrics for faster reporting

Next Steps