EC2 with abctl (Current Setup)
This document outlines the setup and automated scheduling for the Airbyte EC2 instance, including the ABCTL management tool.
EC2 Instance Setup
1. Launch EC2 Instance
- OS: Amazon Linux 2023
- Instance Type: t3.xlarge
- Storage: 60GB GP3 EBS volume
- Security Group:
  - Allow SSH (port 22)
  - Allow all outbound traffic
2. Connect to Instance
ssh -i /path/to/your-key.pem ec2-user@your-instance-ip
Install Kubernetes and Tools
1. Update System
sudo yum update -y
2. Install Docker
sudo yum install -y docker
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER
3. Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
4. Install k3s (Lightweight Kubernetes)
curl -sfL https://get.k3s.io | sh -
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
export KUBECONFIG=~/.kube/config
echo "export KUBECONFIG=~/.kube/config" >> ~/.bashrc
Install Airbyte with ABCTL
1. Install ABCTL
curl -s https://raw.githubusercontent.com/airbytehq/airbyte-platform/main/install.sh | bash /dev/stdin --install-abctl
source ~/.bashrc
2. Create Airbyte Namespace
kubectl create namespace airbyte
3. Install Airbyte
abctl local install
4. Verify Installation
kubectl get pods -n airbyte
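A pod listing is only a snapshot, so it can help to block until the workloads are actually available. The following is a minimal sketch, assuming the `airbyte` namespace used above; the function name and the 10-minute timeout are illustrative choices, not part of abctl. It waits on Deployments rather than pods because one-off pods that finish with a `Completed` status never report `Ready`.

```shell
#!/usr/bin/env bash
# Wait for every Deployment in the namespace to become Available,
# then print the pod list. Name and defaults are illustrative.
wait_for_airbyte() {
  local namespace="${1:-airbyte}"
  local timeout="${2:-600s}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$namespace" --timeout="$timeout" &&
    kubectl get pods --namespace "$namespace"
}
```

Run `wait_for_airbyte` with no arguments for the defaults, or pass a different namespace and timeout, for example `wait_for_airbyte airbyte 900s`. StatefulSets, if the install creates any, are not covered by this check.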
Overview
Our current Airbyte deployment runs on a single EC2 instance. Airbyte itself is installed and managed with ABCTL, and the instance is started and stopped on a schedule to reduce cost while still completing the daily data sync.
Key Features
- Automated Scheduling: Instance runs only during the scheduled daily window
- Cost Optimization: Compute is billed only while the instance is running
- Simplified Management: Using ABCTL for deployment and management
EC2 Setup
Instance Requirements
- Instance Type: t3.xlarge (minimum)
- Storage: Minimum 60GB EBS volume
- Ports: 8000 (Airbyte UI) and 8001 (Airbyte API) open in security group
VPC Peering Configuration
For secure connectivity between Airbyte and MongoDB in a different VPC/account:
- Create VPC Peering Connection
  - Navigate to VPC > Peering Connections > Create Peering Connection
  - Configure peering between Airbyte VPC and MongoDB VPC
  - Accept the peering request in the MongoDB account
- Update Route Tables
  - Add routes in both VPC route tables to allow traffic between peered VPCs
  - Example route for Airbyte VPC:
    - Destination: MongoDB VPC CIDR
    - Target: Peering Connection ID
- Security Group Rules
  - Update MongoDB security group to allow inbound traffic from Airbyte VPC CIDR
  - Configure Airbyte security group to allow outbound to MongoDB VPC
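The console steps above can also be sketched with the AWS CLI. This is an illustration under assumptions, not the procedure that was actually run: the function names are hypothetical, every ID and CIDR is supplied by the caller, and port 27017 is taken from the connectivity test in this document. For a cross-region peering, `create-vpc-peering-connection` additionally needs `--peer-region`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the three peering steps.

# Step 1a: request peering (run with Airbyte account credentials).
request_peering() {
  local airbyte_vpc_id="$1"
  local mongodb_vpc_id="$2"
  local mongodb_account_id="$3"
  aws ec2 create-vpc-peering-connection \
    --vpc-id "$airbyte_vpc_id" \
    --peer-vpc-id "$mongodb_vpc_id" \
    --peer-owner-id "$mongodb_account_id"
}

# Step 1b: accept the request (run with MongoDB account credentials).
accept_peering() {
  aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$1"
}

# Step 2: route traffic for the peer VPC's CIDR through the peering
# connection. Call once per route table, on both sides.
add_peering_route() {
  local route_table_id="$1"
  local peer_cidr="$2"
  local peering_id="$3"
  aws ec2 create-route \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$peer_cidr" \
    --vpc-peering-connection-id "$peering_id"
}

# Step 3: allow inbound MongoDB traffic from the Airbyte VPC CIDR.
allow_mongodb_ingress() {
  local mongodb_sg_id="$1"
  local airbyte_cidr="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$mongodb_sg_id" \
    --protocol tcp --port 27017 \
    --cidr "$airbyte_cidr"
}
```

Keeping each step in its own function makes it easy to run the accept step under a different credential profile than the request step.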
Network Verification
# Test connectivity to MongoDB from EC2
nc -zv <mongodb-private-ip> 27017
# Check route tables
aws ec2 describe-route-tables --route-table-ids <my-route-table-id>
Automated Scheduling
We've implemented an automated scheduling system that:
- Starts the EC2 instance daily at 8:00 AM
- Stops the instance daily at 9:00 AM
- Ensures the instance runs only during this one-hour sync window (times are evaluated in the schedule's configured time zone)
IAM Role Configuration
We created a dedicated IAM role, AirbyteECSchedulerRole, with the following permissions:
- ec2:StartInstances
- ec2:StopInstances
- ec2:DescribeInstances
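For reference, a role with these permissions could be created roughly as follows. This is a sketch, not a record of how the role was made: the helper names and the inline policy name `ec2-start-stop` are hypothetical, and the trust policy assumes the role is assumed by EventBridge Scheduler (`scheduler.amazonaws.com`). `Resource` is `*` because `ec2:DescribeInstances` does not support resource-level restrictions; the start and stop actions could be narrowed to the instance ARN.

```shell
#!/usr/bin/env bash
# Trust policy: lets EventBridge Scheduler assume the role (assumption).
scheduler_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "scheduler.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Permissions policy: the three EC2 actions listed in this document.
scheduler_permissions_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Create the role and attach the permissions as an inline policy.
create_scheduler_role() {
  local role_name="${1:-AirbyteECSchedulerRole}"
  aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document "$(scheduler_trust_policy)"
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name ec2-start-stop \
    --policy-document "$(scheduler_permissions_policy)"
}
```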
Schedule Configuration
1. Start Schedule
- Name: start-airbyte-instance
- Schedule: cron(0 8 * * ? *) (8:00 AM)
- Target: EC2 StartInstances API
- Instance ID: i-0b5cfa05229d92dcf (Airbyte EC2)
- Retry Policy: 3 attempts with exponential backoff
2. Stop Schedule
- Name: stop-airbyte-instance
- Schedule: cron(0 9 * * ? *) (9:00 AM)
- Target: EC2 StopInstances API
- Instance ID: i-0b5cfa05229d92dcf (Airbyte EC2)
- Additional Actions:
  - Gracefully shut down Airbyte services before stopping the instance
  - Verify all data syncs are complete
  - Send notification on successful shutdown
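As an illustration, the two schedules could be defined with the AWS CLI along these lines. This is a sketch under assumptions: the helper names are hypothetical, the role ARN must be supplied by the caller, and the `UTC` default time zone is a guess, since this document does not state which zone the 8:00 AM and 9:00 AM times refer to. The target uses EventBridge Scheduler's universal target ARN pattern, `arn:aws:scheduler:::aws-sdk:ec2:<action>`, and caps retries at 3 to match the retry policy above. The "Additional Actions" listed for the stop schedule are not covered here, because a plain StopInstances target only stops the instance.

```shell
#!/usr/bin/env bash
# Build the --target JSON for an EC2 universal target.
# action is startInstances or stopInstances.
build_ec2_target() {
  local action="$1"
  local role_arn="$2"
  local instance_id="$3"
  # Input must be a JSON string, so its inner quotes are escaped.
  printf '{"Arn":"arn:aws:scheduler:::aws-sdk:ec2:%s","RoleArn":"%s","Input":"{\\"InstanceIds\\":[\\"%s\\"]}","RetryPolicy":{"MaximumRetryAttempts":3}}\n' \
    "$action" "$role_arn" "$instance_id"
}

# Create one schedule. The time zone default is an assumption.
create_instance_schedule() {
  local name="$1"
  local cron_expr="$2"
  local action="$3"
  local role_arn="$4"
  local instance_id="$5"
  local timezone="${6:-UTC}"
  aws scheduler create-schedule \
    --name "$name" \
    --schedule-expression "$cron_expr" \
    --schedule-expression-timezone "$timezone" \
    --flexible-time-window '{"Mode":"OFF"}' \
    --target "$(build_ec2_target "$action" "$role_arn" "$instance_id")"
}
```

A call for the start schedule would look like `create_instance_schedule start-airbyte-instance 'cron(0 8 * * ? *)' startInstances <role-arn> i-0b5cfa05229d92dcf`, with the stop schedule using `cron(0 9 * * ? *)` and `stopInstances`.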
Verification & Monitoring
Manual Verification
- Go to Amazon EventBridge > Schedules
- Check both schedules are in ENABLED state
- Verify next trigger times are correct
- Check execution history for any failures
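The state check can also be scripted. The sketch below assumes both schedules live in the default schedule group and uses the schedule names from this document; the function names are hypothetical. It covers only the ENABLED check, not trigger times or execution history.

```shell
#!/usr/bin/env bash
# Print the State field (ENABLED or DISABLED) of one schedule.
schedule_state() {
  aws scheduler get-schedule --name "$1" --query 'State' --output text
}

# Report both schedules; return non-zero if either is not ENABLED.
check_airbyte_schedules() {
  local name
  local state
  local status=0
  for name in start-airbyte-instance stop-airbyte-instance; do
    state="$(schedule_state "$name")"
    echo "$name: $state"
    [ "$state" = "ENABLED" ] || status=1
  done
  return "$status"
}
```

Because the function returns a non-zero status on failure, it can be used directly in a cron job or CI step, for example `check_airbyte_schedules || echo "schedule disabled"`.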
CloudWatch Alarms
Set up CloudWatch alarms for:
- Instance running outside scheduled hours
- Failed schedule executions
- High CPU/Memory usage during sync windows
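As one example, a high-CPU alarm might be created as sketched below. The alarm name, the 80% threshold, the 5-minute period, and the SNS topic are all assumptions for illustration; none of them come from the existing setup. Memory usage is not among the metrics EC2 publishes by default, so a memory alarm would additionally require the CloudWatch agent on the instance.

```shell
#!/usr/bin/env bash
# Alarm when average CPU stays above the threshold for two
# consecutive 5-minute periods. All values are illustrative.
create_cpu_alarm() {
  local instance_id="$1"
  local sns_topic_arn="$2"
  local threshold="${3:-80}"
  aws cloudwatch put-metric-alarm \
    --alarm-name airbyte-high-cpu \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}
```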
Logging
- All schedule executions are logged to CloudTrail
- Instance state changes are logged in CloudWatch Logs
- Airbyte service logs are available in Kubernetes pods
Cost Optimization
Instance Scheduling
- Runtime: 1 hour per day (8:00 AM - 9:00 AM)
- Monthly Cost Estimate: ~$10 (t3.xlarge in us-east-1)
- Savings: ~85% compared to 24/7 operation
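The estimate can be reproduced with a short calculation. The rates here are assumptions that should be checked against current AWS pricing: $0.1664 per hour for an on-demand Linux t3.xlarge in us-east-1, $0.08 per GB-month for gp3 storage, 30 run-hours per month for the schedule, and 730 hours per month for 24/7 operation. EBS storage is billed even while the instance is stopped, so it appears in both totals.

```shell
#!/usr/bin/env bash
# Monthly cost = compute hours * hourly rate + storage GB * GB-month rate.
monthly_cost() {
  local hours="$1"
  local hourly_rate="$2"
  local storage_gb="$3"
  local gb_month_rate="$4"
  awk -v h="$hours" -v r="$hourly_rate" -v g="$storage_gb" -v s="$gb_month_rate" \
    'BEGIN { printf "%.2f\n", h * r + g * s }'
}

# Percentage saved by the scheduled cost relative to the baseline cost.
savings_percent() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (1 - a / b) * 100 }'
}

scheduled="$(monthly_cost 30 0.1664 60 0.08)"    # 9.79
always_on="$(monthly_cost 730 0.1664 60 0.08)"   # 126.27
echo "scheduled: \$$scheduled/month, always-on: \$$always_on/month"
echo "savings: $(savings_percent "$scheduled" "$always_on")%"
```

Under these assumed rates the scheduled total lands close to the ~$10 estimate, while the computed savings come out above the ~85% quoted, so that figure presumably reflects costs or a baseline not modeled in this sketch.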
Cost Monitoring
- Use AWS Cost Explorer to track actual spend
- Set up budget alerts for unexpected charges
- Review and adjust schedule as needed
- Look for the /aws/events/ec2-scheduler log group
EC2 Instance State
- Check instance state history in EC2 Console
- Verify instance starts/stops at scheduled times
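The current state can also be checked from the command line and compared with what the schedule implies. This is a rough sketch: the function names are hypothetical, the hour is read from the local clock of whichever machine runs the check (which may not match the schedule's time zone), and transitional states such as `pending` or `stopping` around 8:00 and 9:00 will be reported as mismatches.

```shell
#!/usr/bin/env bash
# Current state name of the instance (running, stopped, pending, ...).
instance_state() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

# State implied by the 8:00-9:00 window for a given hour (0-23).
expected_state() {
  case "$1" in
    8|08) echo running ;;
    *)    echo stopped ;;
  esac
}

# Compare actual and expected state; non-zero exit on mismatch.
check_instance() {
  local instance_id="$1"
  local hour="${2:-$(date +%H)}"
  local actual
  local expected
  actual="$(instance_state "$instance_id")"
  expected="$(expected_state "$hour")"
  echo "expected=$expected actual=$actual"
  [ "$actual" = "$expected" ]
}
```

For example, `check_instance i-0b5cfa05229d92dcf` compares against the current hour, and an explicit hour can be passed as a second argument when checking a different time zone.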
Troubleshooting
Common Issues
- Instance Not Starting/Stopping
  - Verify IAM role permissions
  - Check CloudWatch logs for errors
  - Ensure instance ID is correct
- Schedule Not Triggering
  - Verify EventBridge Scheduler status
  - Check for resource conflicts
  - Verify timezone settings