Infrastructure Orchestration Protocol

This skill ensures services are managed through proper orchestration scripts, preventing dependency issues and maintaining correct startup/shutdown sequences.

Core Principle

NEVER start/stop individual services when orchestration exists

•MUST search for orchestration scripts: start.sh, launch.sh, stop.sh, docker-compose.yml
•MUST use orchestration for ALL service operations
•MUST follow sequence: Stop ALL → Change → Start ALL → Verify
•MUST test complete lifecycle

Orchestration Discovery

Step 1: Find Orchestration Scripts

bash

# Search for orchestration files
fd -t f "(start|launch|stop|restart|run|up|down)\.(sh|bash|py)"
fd "docker-compose.*\.ya?ml"
fd "Makefile"
fd "(package|composer|Gemfile|requirements)"

# Check common locations
ls scripts/ | rg "(start|stop|launch)"
ls bin/ | rg "(start|stop|launch)"
ls . | rg "docker-compose"

# Check for process managers
rg "supervisor|systemd|pm2|forever" --type yaml --type json

Step 2: Understand Dependencies

python

def analyze_service_dependencies():
    """Map out service dependency graph."""

    dependencies = {}

    # Parse docker-compose.yml
    if Path("docker-compose.yml").exists():
        with open("docker-compose.yml") as f:
            compose = yaml.load(f)
            for service, config in compose.get('services', {}).items():
                dependencies[service] = config.get('depends_on', [])

    # Parse start scripts
    start_scripts = find_files("start*.sh")
    for script in start_scripts:
        deps = extract_service_order(script)
        dependencies.update(deps)

    return create_dependency_graph(dependencies)

Service Management Patterns

Docker Compose Orchestration

bash

# ❌ WRONG - Starting individual containers
docker run -d postgres
docker run -d redis
docker run -d app

# ✅ CORRECT - Using orchestration
docker-compose up -d

# Full lifecycle management
docker-compose down  # Stop all
# Make changes
docker-compose up -d  # Start all
docker-compose ps  # Verify

Script-Based Orchestration

bash

# ❌ WRONG - Manual service starts
systemctl start postgresql
systemctl start redis
systemctl start nginx
npm start

# ✅ CORRECT - Using orchestration script
./scripts/start-all.sh

# Typical orchestration script structure
#!/bin/bash
# start-all.sh
echo "Starting infrastructure..."

# Start in dependency order
systemctl start postgresql
wait_for_service postgresql 5432

systemctl start redis
wait_for_service redis 6379

systemctl start elasticsearch
wait_for_service elasticsearch 9200

# Start application
npm start &
wait_for_service app 3000

echo "All services started successfully"

Kubernetes Orchestration

bash

# ❌ WRONG - Individual deployments
kubectl apply -f postgres-deployment.yaml
kubectl apply -f redis-deployment.yaml
kubectl apply -f app-deployment.yaml

# ✅ CORRECT - Using orchestration
kubectl apply -f k8s/  # Apply all manifests
# OR
helm install myapp ./chart

# Proper lifecycle
kubectl delete -f k8s/  # Stop all
# Make changes
kubectl apply -f k8s/  # Start all
kubectl get pods  # Verify

Service Lifecycle Management

Complete Shutdown Sequence

python

def shutdown_services_safely():
    """Shutdown services in reverse dependency order."""

    # Get dependency graph
    deps = get_service_dependencies()
    shutdown_order = topological_sort_reverse(deps)

    for service in shutdown_order:
        print(f"Stopping {service}...")

        # Graceful shutdown
        send_sigterm(service)
        if not wait_for_shutdown(service, timeout=30):
            print(f"Force stopping {service}")
            send_sigkill(service)

        # Verify stopped
        assert not is_running(service), f"{service} still running!"

    print("All services stopped")

Complete Startup Sequence

python

def start_services_safely():
    """Start services in dependency order with health checks."""

    # Get dependency graph
    deps = get_service_dependencies()
    startup_order = topological_sort(deps)

    started = []

    for service in startup_order:
        print(f"Starting {service}...")

        try:
            start_service(service)
            wait_for_healthy(service)
            started.append(service)
            print(f"✅ {service} is healthy")

        except Exception as e:
            print(f"❌ Failed to start {service}: {e}")
            # Rollback
            for s in reversed(started):
                stop_service(s)
            raise

    print("All services started successfully")

Health Check Patterns

python

def wait_for_healthy(service, timeout=60):
    """Wait for service to become healthy."""

    health_checks = {
        'postgres': lambda: check_postgres_connection(),
        'redis': lambda: check_redis_ping(),
        'elasticsearch': lambda: check_elastic_cluster(),
        'app': lambda: check_http_endpoint('/health'),
        'rabbitmq': lambda: check_amqp_connection(),
        'mongodb': lambda: check_mongo_connection()
    }

    check = health_checks.get(service)
    if not check:
        # Generic TCP check
        return wait_for_port(get_service_port(service))

    start = time.time()
    while time.time() - start < timeout:
        try:
            if check():
                return True
        except:
            pass
        time.sleep(1)

    raise TimeoutError(f"{service} not healthy after {timeout}s")

Configuration Management

Environment-Specific Orchestration

bash

# Development
./scripts/dev/start.sh

# Staging
docker-compose -f docker-compose.yml -f docker-compose.staging.yml up

# Production
kubectl apply -k overlays/production/

Secret Management

python

def load_secrets_before_start():
    """Load secrets from vault before starting services."""

    # ❌ WRONG - Hardcoded secrets
    os.environ['DB_PASSWORD'] = 'hardcoded_password'

    # ✅ CORRECT - Load from secret manager
    secrets = load_from_vault([
        'db/password',
        'redis/password',
        'api/keys/external'
    ])

    for key, value in secrets.items():
        os.environ[key] = value

    # Now safe to start services
    run_orchestration_script()

Common Orchestration Files

docker-compose.yml

yaml

version: '3.8'
services:
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  app:
    build: .
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    command: ./start.sh

Makefile

makefile

.PHONY: start stop restart status

start:
	@echo "Starting all services..."
	@docker-compose up -d
	@./scripts/wait-for-healthy.sh
	@echo "All services ready"

stop:
	@echo "Stopping all services..."
	@docker-compose down
	@echo "All services stopped"

restart: stop start

status:
	@docker-compose ps
	@./scripts/health-check.sh

Integration with BMAD

When working with BMAD workflows:

•Check workflow-manifest.csv for orchestration workflows
•Use task-manifest.csv for service task sequences
•Maintain consistency with existing orchestration patterns

Scripts

Orchestration Finder

See scripts/find_orchestration.py - Discovers orchestration scripts

Dependency Analyzer

See scripts/analyze_dependencies.py - Maps service dependencies

Critical Reminders

•Never Skip Orchestration: Individual commands break dependencies
•Test Full Lifecycle: Always test stop → start → verify
•Health Checks Required: Don't assume services are ready immediately
•Rollback on Failure: If any service fails, stop all and investigate