SQLMesh - Data Transformation
Self-hosted SQLMesh running in CI/CD to create intermediate tables and data models from raw ClickHouse data.
Overview
SQLMesh runs in our CI/CD pipeline to transform raw data into business-ready analytics tables. This approach ensures data transformations are version-controlled and automatically deployed.
Why Self-Hosted?
- Full Control: Complete control over data transformation logic
- Cost Effective: No per-transformation pricing
- Integration: Seamless integration with our existing CI/CD pipeline
- Customization: Ability to create custom data models and business logic
CI/CD Integration
GitHub Actions Workflow
name: SQLMesh Data Transformation
on:
push:
branches: [main]
paths: ['analytics/models/**']
schedule:
- cron: '0 2 * * *' # Daily at 2 AM
jobs:
transform:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: Install SQLMesh
run: pip install sqlmesh
- name: Configure ClickHouse Connection
run: |
sqlmesh config set default_connection.clickhouse.host ${{ secrets.CLICKHOUSE_HOST }}
sqlmesh config set default_connection.clickhouse.port ${{ secrets.CLICKHOUSE_PORT }}
sqlmesh config set default_connection.clickhouse.user ${{ secrets.CLICKHOUSE_USER }}
sqlmesh config set default_connection.clickhouse.password ${{ secrets.CLICKHOUSE_PASSWORD }}
- name: Plan Changes
run: sqlmesh plan --auto-apply
- name: Run Tests
run: sqlmesh test
Data Models
User Analytics
- Daily User Activity: Active users, new users, engagement metrics
- User Cohorts: Retention analysis and cohort tracking
- User Segmentation: User behavior and demographic analysis
Revenue Analytics
- Daily Revenue Metrics: Revenue trends, subscription metrics
- Monthly Recurring Revenue (MRR): MRR calculations and growth
- Customer Lifetime Value (CLV): Customer value analysis
Product Usage
- Feature Usage: Feature adoption and usage patterns
- Product Performance: Product-specific metrics and KPIs
- User Journey: User flow and conversion analysis
Key Benefits
- Version Control: All data transformations are version-controlled
- Automated Deployment: Changes are automatically deployed via CI/CD
- Data Quality: Built-in testing and validation
- Incremental Processing: Efficient processing of large datasets
- Team Collaboration: Dev team and BI team work together on data models
Integration Points
ClickHouse
- Source: Raw data from Airbyte ingestion
- Destination: Intermediate tables for analytics
- Connection: Direct connection from CI/CD to ClickHouse
Admin Areas
- Pixlr Admin: Uses processed data for real-time analytics
- Designs.ai Admin: Will use processed data when implemented
BI Team
- Caleb's Tools: Access to processed data for business intelligence
- Reporting: Pre-computed metrics for faster reporting
Next Steps
- ClickHouse - Hosted data warehouse
- Airbyte - Self-hosted data ingestion