
Zero-Downtime Deployment

This guide covers blue-green deployment for production environments where downtime is not acceptable.
Looking for a simple deployment? See the main README for the standard Docker Compose deployment.

Overview

Blue-green deployment runs two versions of stateless services simultaneously:
  1. Blue and Green environments alternate as active/standby
  2. New version starts alongside the current one
  3. Traffic switches only after health checks pass
  4. Old version is drained and removed
Stateful services (database, proxy) are shared between environments and are not affected by deployments.

Requirements

  • Docker and Docker Compose
  • At least 12 GB RAM, ideally 16 GB (both environments run side by side during a deployment)
  • Services must respond to health checks within 5 minutes (the default HEALTH_CHECK_TIMEOUT)

Quick Start

# Deploy specific version (zero-downtime)
./scripts/deploy.sh deploy v1.0.0

# Deploy latest version
./scripts/deploy.sh deploy latest

# Check current status
./scripts/deploy.sh status

# Rollback to previous version
./scripts/deploy.sh rollback

# Clean up inactive containers
./scripts/deploy.sh cleanup

Commands

deploy.sh

Command           Description
deploy <version>  Pull and deploy specified version with zero downtime
status            Show current deployment status and health
rollback          Revert to the previous version
cleanup           Remove inactive containers
reset             Remove ALL blue-green containers and return to normal mode
help              Show usage information

Configuration

Set these environment variables to customize a deployment:
Variable               Default  Description
HEALTH_CHECK_TIMEOUT   300s     Max time to wait for health checks
HEALTH_CHECK_INTERVAL  3s       Interval between health checks
DRAIN_TIMEOUT          30s      Time to drain old containers before removal
Example with a longer health-check timeout (10 minutes, given in seconds):
HEALTH_CHECK_TIMEOUT=600 ./scripts/deploy.sh deploy v1.0.0
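
Overrides like these are commonly picked up with shell parameter expansion. The sketch below is not taken from deploy.sh; it only illustrates how the documented defaults could be applied, assuming the values are whole seconds. The max_attempts variable is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: use the documented defaults unless the caller
# exported an override. Assumes every value is a whole number of seconds.
HEALTH_CHECK_TIMEOUT="${HEALTH_CHECK_TIMEOUT:-300}"
HEALTH_CHECK_INTERVAL="${HEALTH_CHECK_INTERVAL:-3}"
DRAIN_TIMEOUT="${DRAIN_TIMEOUT:-30}"

# Upper bound on health-check polls before giving up (integer division).
max_attempts=$(( HEALTH_CHECK_TIMEOUT / HEALTH_CHECK_INTERVAL ))

echo "timeout=${HEALTH_CHECK_TIMEOUT}s interval=${HEALTH_CHECK_INTERVAL}s drain=${DRAIN_TIMEOUT}s attempts=${max_attempts}"
```

With no overrides set, the defaults allow roughly 100 polls before the deployment is treated as failed.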

How It Works

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Caddy Proxy                          │
│              (Routes traffic to healthy backend)             │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┴─────────────────┐
        ▼                                   ▼
┌───────────────────┐               ┌───────────────────┐
│   Blue Services   │               │  Green Services   │
│    (Active)       │               │   (Standby)       │
├───────────────────┤               ├───────────────────┤
│ platform-blue     │               │ platform-green    │
│ rag-blue          │               │ rag-green         │
│ crawler-blue      │               │ crawler-green     │
│ operator-blue     │               │ operator-green    │
└───────────────────┘               └───────────────────┘
        │                                   │
        └─────────────────┬─────────────────┘
                          ▼
              ┌───────────────────┐
              │  Shared Services  │
              │   (Stateful)      │
              ├───────────────────┤
              │ db (TimescaleDB)  │
              │ proxy (Caddy)     │
              └───────────────────┘

Deployment Flow

  1. Pull images - Download specified version from GHCR
  2. Detect current state - Determine which color (blue/green) is currently active
  3. Start new version - Run the new containers alongside existing ones
  4. Health checks - Wait for all new services to report healthy
  5. Traffic switch - Caddy automatically routes to healthy backends
  6. Drain old version - Wait for in-flight requests to complete
  7. Cleanup - Remove old containers
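
Step 4 of the flow above is essentially a polling loop bounded by a timeout. The following is a minimal sketch of that idea, not the actual logic in deploy.sh. The function names wait_for_healthy and docker_health are hypothetical; the docker inspect call is the same one shown under Troubleshooting.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll until every named container reports "healthy"
# or a shared deadline passes.
#   $1 = command that prints a health status for a container name
#   $2 = timeout in seconds, $3 = poll interval in seconds
#   remaining arguments = container names
wait_for_healthy() {
  local check_cmd="$1" timeout="$2" interval="$3"
  shift 3
  local deadline name
  # One deadline covers all services, not one timeout per service.
  deadline=$(( $(date +%s) + timeout ))
  for name in "$@"; do
    until [ "$("$check_cmd" "$name")" = "healthy" ]; do
      if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "timed out waiting for $name" >&2
        return 1
      fi
      sleep "$interval"
    done
  done
}

# Probe that reads the Docker health status of one container.
docker_health() {
  docker inspect --format='{{.State.Health.Status}}' "$1" 2>/dev/null
}

# Example (not executed here):
#   wait_for_healthy docker_health 300 3 tale-platform-green
```

Passing the probe as an argument keeps the loop testable without a running Docker daemon.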

Files

File               Purpose
compose.yml        Base configuration for all services
compose.blue.yml   Blue environment overlay (container names, network aliases)
compose.green.yml  Green environment overlay
scripts/deploy.sh  Deployment automation script
.deployment-color  Tracks current active deployment (auto-generated)
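
To make the relationship between these files concrete, here is a rough sketch of how a script might pick the standby color and combine the base file with an overlay. It assumes .deployment-color holds a single word (blue or green) and that a first deployment starts with blue; neither detail is confirmed by this guide. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the standby color and build compose arguments.
COLOR_FILE="${COLOR_FILE:-.deployment-color}"

current_color() {
  # Prints nothing if no blue-green deployment has been recorded yet.
  [ -f "$COLOR_FILE" ] && cat "$COLOR_FILE"
  return 0
}

next_color() {
  case "$1" in
    blue) echo green ;;
    *)    echo blue ;;   # green, or no previous deployment (assumed)
  esac
}

compose_args() {
  # Base file first, then the overlay; later -f files override earlier ones.
  echo "-f compose.yml -f compose.$1.yml"
}

# Example (not executed here):
#   docker compose $(compose_args "$(next_color "$(current_color)")") up -d
```

Docker Compose merges multiple -f files in order, which is what lets one base configuration serve both colors.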

Database Migrations

Database changes require special handling since the database is shared between blue and green environments. Both old and new versions of the application must work with the database during the transition period.

Guidelines

  1. Backward-compatible migrations only - New code must work with the old schema, and old code must work with the new schema
  2. Expand-contract pattern - Split breaking changes into multiple deployments:
    • Deploy 1 (Expand): Add new columns/tables, keep old ones
    • Deploy 2 (Migrate): Application uses new structure
    • Deploy 3 (Contract): Remove old columns/tables after confirming success

Examples

Change         Safe Approach
Add column     Add as nullable or with default value
Remove column  First deploy code that stops using it, then remove
Rename column  Add new column → migrate data → update code → remove old
Change type    Add new column with new type → migrate → update code → remove old
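
As an illustration of the rename row above, the sketch below emits the SQL for each phase of an expand-contract rename. The table users and the columns name and full_name are invented for the example, and the statements are plain PostgreSQL (which TimescaleDB builds on). The function rename_phase_sql and the DATABASE_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: SQL for renaming users.name to users.full_name,
# split across deployments. All identifiers are illustrative only.
rename_phase_sql() {
  case "$1" in
    expand)
      # Safe while the old version is live: the new column is nullable.
      echo "ALTER TABLE users ADD COLUMN full_name text;"
      # Backfill; rerun later to catch rows inserted by the old version.
      echo "UPDATE users SET full_name = name WHERE full_name IS NULL;"
      ;;
    contract)
      # Only after no running version reads or writes the old column.
      echo "ALTER TABLE users DROP COLUMN name;"
      ;;
    *)
      echo "usage: rename_phase_sql expand|contract" >&2
      return 1
      ;;
  esac
}

# Example (not executed here):
#   rename_phase_sql expand | psql "$DATABASE_URL"
```

The backfill only fills empty values, so updates that the old version makes to existing rows are not picked up; keeping both columns in sync during the transition needs application-level dual writes or a trigger, which this sketch leaves out.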

Migration Workflow

# 1. Run migrations BEFORE deploying new code
bun run db:migrate  # or your migration command

# 2. Deploy new version
./scripts/deploy.sh deploy v1.0.0

# 3. Verify everything works
./scripts/deploy.sh status

# 4. (Later) Run cleanup migrations if needed
bun run db:migrate:cleanup

Non-Backward-Compatible Changes

If a migration cannot be made backward-compatible:
  1. Schedule a maintenance window
  2. Use standard deployment with downtime
  3. Or coordinate a multi-phase rollout over several deployments

Rollback

If a deployment fails or you need to revert:
./scripts/deploy.sh rollback
Rollback restarts the containers from the previous deployment, so it works best immediately after a failed deployment, while those containers still exist. If cleanup has already removed them, deploy the version you want explicitly:
./scripts/deploy.sh deploy v1.0.0
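
Before relying on rollback, it can help to confirm that containers of the previous color are still present. This sketch assumes container names end in -blue or -green, a convention inferred from the tale-platform-blue name used in this guide; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: succeed if any container name on stdin ends with
# the given color suffix.
#   $1 = color to look for (blue or green)
previous_containers_exist() {
  grep -q -- "-$1\$"
}

# Example (not executed here); -a includes stopped containers:
#   if docker ps -a --format '{{.Names}}' | previous_containers_exist green; then
#     ./scripts/deploy.sh rollback
#   else
#     echo "no green containers left; redeploy a specific version instead"
#   fi
```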

Troubleshooting

Health checks timing out

# Increase the timeout (the default is 300 seconds)
HEALTH_CHECK_TIMEOUT=600 ./scripts/deploy.sh deploy v1.0.0

Deployment failed

# Clean up failed deployment
./scripts/deploy.sh cleanup

# Try again
./scripts/deploy.sh deploy v1.0.0

Check service health

# View all container status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Check specific service
docker inspect --format='{{.State.Health.Status}}' tale-platform-blue
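
To check every service of one color in a single pass, the inspect command above can be wrapped in a loop. The service list comes from the architecture diagram; the tale- prefix for the other three services is an assumption extrapolated from tale-platform-blue. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list the expected container names for one color.
#   $1 = color (blue or green)
container_names() {
  local svc
  for svc in platform rag crawler operator; do
    echo "tale-$svc-$1"   # naming pattern is assumed, see lead-in
  done
}

# Print "name<TAB>status" for every container of the given color.
report_health() {
  local c
  for c in $(container_names "$1"); do
    printf '%s\t%s\n' "$c" \
      "$(docker inspect --format='{{.State.Health.Status}}' "$c" 2>/dev/null || echo missing)"
  done
}

# Example (not executed here):
#   report_health blue
```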

Memory issues during deployment

Blue-green deployment temporarily runs two copies of every stateless service. If you run out of memory:
  1. Increase available RAM to at least 12 GB (ideally 16 GB)
  2. Or use standard deployment instead (with brief downtime)
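
A quick pre-flight check can flag undersized hosts before a deployment starts. The sketch below is Linux-only (it reads /proc/meminfo) and uses the 12 GB threshold from this guide; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report total RAM in whole GiB (rounded down).
#   $1 = optional path to a meminfo-style file (defaults to /proc/meminfo)
mem_total_gb() {
  # MemTotal is reported in kB.
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Suggest a deployment mode based on the 12 GB guideline.
#   $1 = total RAM in GiB
choose_strategy() {
  if [ "$1" -ge 12 ]; then
    echo blue-green
  else
    echo standard
  fi
}

# Example (not executed here):
#   choose_strategy "$(mem_total_gb)"
```

The kernel reports slightly less than the installed amount, so a nominal 16 GB host typically shows up as 15 here.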

When to Use

Scenario                               Recommendation
Development / Testing                  Standard deployment
Low traffic, scheduled maintenance OK  Standard deployment
Production, zero-downtime required     Blue-green deployment
High availability requirements         Blue-green deployment
Limited server resources (<12 GB RAM)  Standard deployment