EC2 with abctl (Current Setup)

This document outlines the setup and automated scheduling for the Airbyte EC2 instance, including the ABCTL management tool.

EC2 Instance Setup

1. Launch EC2 Instance

  • OS: Amazon Linux 2023
  • Instance Type: t3.xlarge
  • Storage: 60GB GP3 EBS volume
  • Security Group:
    • Allow SSH (port 22)
    • Allow all outbound traffic

2. Connect to Instance

ssh -i /path/to/your-key.pem ec2-user@your-instance-ip

Install Kubernetes and Tools

1. Update System

sudo yum update -y

2. Install Docker

sudo yum install -y docker
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker "$USER"
# Log out and back in (or run "newgrp docker") so the new group membership takes effect

3. Install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

4. Install k3s (Lightweight Kubernetes)

curl -sfL https://get.k3s.io | sh -
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
export KUBECONFIG=~/.kube/config
echo "export KUBECONFIG=~/.kube/config" >> ~/.bashrc

Install Airbyte with ABCTL

1. Install ABCTL

curl -LsfS https://get.airbyte.com | bash -
source ~/.bashrc

2. Create Airbyte Namespace

kubectl create namespace airbyte

3. Install Airbyte

abctl local install

4. Verify Installation

kubectl get pods -n airbyte
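
A single `kubectl get pods` can catch the pods while they are still starting. As a minimal sketch, the helper below blocks until every Deployment in the namespace reports Available. The function name `wait_for_airbyte` and the 600s default timeout are illustrative assumptions, not part of the original setup; the default namespace matches the one used above.

```shell
# Wait until every Deployment in the namespace reports the Available condition.
# args: [namespace] [timeout]  (defaults are illustrative assumptions)
wait_for_airbyte() {
  kubectl wait --for=condition=Available deployment --all \
    -n "${1:-airbyte}" --timeout="${2:-600s}"
}

# Example usage:
#   wait_for_airbyte airbyte 900s && kubectl get pods -n airbyte
```

Waiting on Deployments rather than on pods avoids timing out on pods that have finished and will never become Ready.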

Overview

Our current Airbyte deployment uses an EC2 instance managed by ABCTL, with automated scheduling to optimize costs while ensuring data synchronization.

Key Features

  • Automated Scheduling: Instance runs only when needed
  • Cost Optimization: Reduced operational costs
  • Simplified Management: Using ABCTL for deployment and management

EC2 Setup

Instance Requirements

  • Instance Type: t3.xlarge (minimum)
  • Storage: Minimum 60GB EBS volume
  • Ports: 8000 (Airbyte UI) and 8001 (Airbyte API) open in security group

VPC Peering Configuration

For secure connectivity between Airbyte and MongoDB in a different VPC/account:

  1. Create VPC Peering Connection

    • Navigate to VPC > Peering Connections > Create Peering Connection
    • Configure peering between Airbyte VPC and MongoDB VPC
    • Accept the peering request in the MongoDB account
  2. Update Route Tables

    • Add routes in both VPC route tables to allow traffic between peered VPCs
    • Example route for Airbyte VPC:
      • Destination: MongoDB VPC CIDR
      • Target: Peering Connection ID
  3. Security Group Rules

    • Update MongoDB security group to allow inbound traffic from Airbyte VPC CIDR
    • Configure Airbyte security group to allow outbound to MongoDB VPC
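
The console steps above could also be scripted with the AWS CLI. The following is a sketch only: the function names are hypothetical, every ID and CIDR is passed in as an argument, and it assumes both VPCs are in the same region (a cross-region peer would also need `--peer-region`).

```shell
# Request peering from the Airbyte VPC to the MongoDB VPC in another account.
# args: airbyte_vpc_id mongodb_vpc_id mongodb_account_id
request_peering() {
  aws ec2 create-vpc-peering-connection \
    --vpc-id "$1" --peer-vpc-id "$2" --peer-owner-id "$3"
}

# Accept the request. Run this with credentials for the MongoDB account.
# args: peering_connection_id
accept_peering() {
  aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$1"
}

# Send traffic for the peer VPC's CIDR through the peering connection.
# Call once per route table, in both VPCs.
# args: route_table_id peer_cidr peering_connection_id
add_peering_route() {
  aws ec2 create-route --route-table-id "$1" \
    --destination-cidr-block "$2" --vpc-peering-connection-id "$3"
}

# Allow MongoDB traffic (TCP 27017) from the Airbyte VPC CIDR.
# args: mongodb_security_group_id airbyte_vpc_cidr
allow_mongodb_ingress() {
  aws ec2 authorize-security-group-ingress --group-id "$1" \
    --protocol tcp --port 27017 --cidr "$2"
}
```

Note that VPC peering requires the two VPC CIDR ranges not to overlap.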

Network Verification

# Test connectivity to MongoDB from EC2
nc -zv <mongodb-private-ip> 27017

# Check route tables
aws ec2 describe-route-tables --route-table-ids <my-route-table-id>
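
Because the instance is started fresh every day, a one-shot `nc` check may run before networking is fully up. A small retry wrapper, sketched below, makes the check more tolerant. The helper names `retry` and `wait_for_mongodb`, and the choice of 5 attempts with 3-second pauses, are assumptions for illustration.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Succeeds once the MongoDB port accepts TCP connections.
# args: host [port]
wait_for_mongodb() {
  retry 5 3 nc -z -w 3 "$1" "${2:-27017}"
}

# Example usage:
#   wait_for_mongodb <mongodb-private-ip> || echo "MongoDB unreachable" >&2
```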

Automated Scheduling

We've implemented an automated scheduling system that:

  • Starts the EC2 instance daily at 8:00 AM
  • Stops the instance daily at 9:00 AM
  • Keeps the instance running only during the one-hour daily sync window

IAM Role Configuration

A dedicated IAM role, AirbyteECSchedulerRole, has the following permissions:

  • ec2:StartInstances
  • ec2:StopInstances
  • ec2:DescribeInstances
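
For reference, the role needs two policy documents: a trust policy that lets EventBridge Scheduler assume it, and a permissions policy carrying the actions listed above. The sketch below writes both as JSON files. The function names and file paths are hypothetical, and scoping Start/Stop to a single instance ARN is a suggested tightening rather than something the original setup is known to do.

```shell
# Write the trust policy that lets EventBridge Scheduler assume the role.
# args: output_file
write_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "scheduler.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

# Write the permissions policy. Start/Stop are limited to one instance ARN;
# ec2:DescribeInstances does not support resource-level scoping, so it uses "*".
# args: output_file instance_arn
write_permissions_policy() {
  cat > "$1" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:StartInstances", "ec2:StopInstances"],
      "Resource": "$2"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    }
  ]
}
EOF
}
```

The resulting files can then be passed to `aws iam create-role --assume-role-policy-document file://...` and `aws iam put-role-policy --policy-document file://...`.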

Schedule Configuration

1. Start Schedule

  • Name: start-airbyte-instance
  • Schedule: cron(0 8 * * ? *) (8:00 AM daily in the schedule's time zone; UTC if none is set)
  • Target: EC2 StartInstances API
  • Instance ID: i-0b5cfa05229d92dcf (Airbyte EC2)
  • Retry Policy: 3 attempts with exponential backoff

2. Stop Schedule

  • Name: stop-airbyte-instance
  • Schedule: cron(0 9 * * ? *) (9:00 AM daily in the schedule's time zone; UTC if none is set)
  • Target: EC2 StopInstances API
  • Instance ID: i-0b5cfa05229d92dcf (Airbyte EC2)
  • Additional Actions:
    • Gracefully shut down Airbyte services before stopping the instance
    • Verify all data syncs are complete
    • Send notification on successful shutdown
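
One way the two schedules above could be created from the AWS CLI is sketched below, using EventBridge Scheduler's universal target for the EC2 API. The function name `create_instance_schedule` is hypothetical, the role ARN must be supplied by the caller, and the flexible time window is assumed to be off. The retry count of 3 mirrors the start schedule's retry policy.

```shell
# Create one EventBridge Scheduler schedule that calls an EC2 API action on a cron.
# args: schedule_name cron_expr ec2_action instance_id role_arn [timezone]
create_instance_schedule() {
  # Build the target JSON; Input is itself a JSON document encoded as a string.
  target=$(printf '{"Arn":"arn:aws:scheduler:::aws-sdk:ec2:%s","RoleArn":"%s","Input":"{\\"InstanceIds\\":[\\"%s\\"]}","RetryPolicy":{"MaximumRetryAttempts":3}}' "$3" "$5" "$4")
  aws scheduler create-schedule \
    --name "$1" \
    --schedule-expression "$2" \
    --schedule-expression-timezone "${6:-UTC}" \
    --flexible-time-window '{"Mode":"OFF"}' \
    --target "$target"
}

# Example usage (ROLE_ARN is the ARN of the scheduler role):
#   create_instance_schedule start-airbyte-instance 'cron(0 8 * * ? *)' \
#     startInstances i-0b5cfa05229d92dcf "$ROLE_ARN"
#   create_instance_schedule stop-airbyte-instance 'cron(0 9 * * ? *)' \
#     stopInstances i-0b5cfa05229d92dcf "$ROLE_ARN"
```

Passing an explicit time zone as the sixth argument keeps the 8:00 AM and 9:00 AM times from being read as UTC by default.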

Verification & Monitoring

Manual Verification

  1. Go to Amazon EventBridge > Schedules
  2. Check both schedules are in ENABLED state
  3. Verify next trigger times are correct
  4. Check execution history for any failures
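
The state check in step 2 can also be done from the command line. The sketch below assumes the two schedule names given earlier and an AWS CLI configured for the right account and region; the function names are illustrative.

```shell
# Print the state (ENABLED or DISABLED) of one schedule.
# args: schedule_name
schedule_state() {
  aws scheduler get-schedule --name "$1" --query 'State' --output text
}

# Report both schedules; returns non-zero if either is not ENABLED.
check_schedules() {
  status=0
  for name in start-airbyte-instance stop-airbyte-instance; do
    state=$(schedule_state "$name")
    echo "$name: $state"
    [ "$state" = "ENABLED" ] || status=1
  done
  return "$status"
}

# Example usage:
#   check_schedules || echo "At least one schedule is not enabled" >&2
```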

CloudWatch Alarms

Set up CloudWatch alarms for:

  • Instance running outside scheduled hours
  • Failed schedule executions
  • High CPU/Memory usage during sync windows
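
As an illustration of the CPU alarm, the sketch below uses `aws cloudwatch put-metric-alarm`. The alarm name, the 90% threshold, the two 5-minute evaluation periods, and the SNS topic argument are all assumptions chosen for the example, not values from the existing setup.

```shell
# Alarm when average CPU stays above 90% for two consecutive 5-minute periods.
# Missing data is treated as OK because the instance is stopped most of the day.
# args: instance_id sns_topic_arn
create_cpu_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "airbyte-high-cpu" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 90 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$2"
}
```

EC2 does not publish memory metrics by default, so a memory alarm would additionally need the CloudWatch agent installed on the instance.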

Logging

  • All schedule executions are logged to CloudTrail
  • Instance state changes are logged in CloudWatch Logs
  • Airbyte service logs are available in Kubernetes pods

Cost Optimization

Instance Scheduling

  • Runtime: 1 hour per day (8:00 AM - 9:00 AM)
  • Monthly Cost Estimate: ~$10 (t3.xlarge in us-east-1)
  • Savings: ~92% compared to 24/7 operation (about $125/month at on-demand rates, including the EBS volume)
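
The estimate can be reproduced with a little arithmetic. The sketch below assumes a 30-day month and us-east-1 on-demand list prices of $0.1664/hour for t3.xlarge and $0.08 per GB-month for gp3. These rates are assumptions that may be out of date, so check current pricing before relying on the result.

```shell
# Estimate monthly cost in USD for a given number of running hours per day.
# Compute is billed only while the instance runs; the EBS volume is billed all month.
# args: hours_per_day
estimate_monthly_cost() {
  awk -v hours="$1" 'BEGIN {
    hourly = 0.1664     # assumed t3.xlarge Linux on-demand rate, USD per hour
    gb_month = 0.08     # assumed gp3 storage rate, USD per GB-month
    volume_gb = 60      # volume size from the instance setup
    days = 30           # assumed month length
    printf "%.2f\n", hours * days * hourly + volume_gb * gb_month
  }'
}

# Example usage:
#   estimate_monthly_cost 1     # scheduled: one hour per day
#   estimate_monthly_cost 24    # always on
```

Under these assumptions, `estimate_monthly_cost 1` prints 9.79 and `estimate_monthly_cost 24` prints 124.61, a saving of about 92%.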

Cost Monitoring

  • Use AWS Cost Explorer to track actual spend
  • Set up budget alerts for unexpected charges
  • Review and adjust schedule as needed
  • Check the /aws/events/ec2-scheduler CloudWatch log group for schedule execution records

EC2 Instance State

  • Check instance state history in EC2 Console
  • Verify instance starts/stops at scheduled times

Troubleshooting

Common Issues

  1. Instance Not Starting/Stopping

    • Verify IAM role permissions
    • Check CloudWatch logs for errors
    • Ensure instance ID is correct
  2. Schedule Not Triggering

    • Verify EventBridge Scheduler status
    • Check for resource conflicts
    • Verify timezone settings