Design end-to-end ML workflows covering experiment tracking, feature engineering and storage, model training pipelines, model serving and deployment, A/B testing for models, and monitoring for data and model drift. Produces a workflow architecture, tool selection rationale, and operational runbook.

Inputs

Process

Step 1: Define the ML Problem Clearly

Document the problem statement, target variable, evaluation metric, and success threshold.
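One way to keep these Step 1 decisions explicit and checkable is to record them as a small typed structure. The sketch below assumes a single scalar evaluation metric. The class name, its fields, and the churn values in the usage lines are all hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemDefinition:
    """Hypothetical record of Step 1 decisions; field names are illustrative."""

    statement: str
    target_variable: str
    evaluation_metric: str
    baseline_score: float
    success_threshold: float
    higher_is_better: bool = True  # False for error metrics such as RMSE

    def __post_init__(self) -> None:
        # A success threshold that does not improve on the baseline is not a useful bar.
        if not self._better(self.success_threshold, self.baseline_score):
            raise ValueError("success_threshold must improve on baseline_score")

    def _better(self, a: float, b: float) -> bool:
        return a > b if self.higher_is_better else a < b

    def is_success(self, candidate_score: float) -> bool:
        """True if a candidate model's score meets the success threshold."""
        if self.higher_is_better:
            return candidate_score >= self.success_threshold
        return candidate_score <= self.success_threshold


# Hypothetical example values, for illustration only.
churn = ProblemDefinition(
    statement="Predict which subscribers cancel within 30 days",
    target_variable="churned_30d",
    evaluation_metric="ROC AUC",
    baseline_score=0.70,
    success_threshold=0.78,
)
print(churn.is_success(0.80))  # True
print(churn.is_success(0.75))  # False
```

Rejecting a threshold that fails to beat the baseline at construction time surfaces a poorly specified problem before any training work begins.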

Step 2: Design the Feature Engineering Pipeline

Step 3: Design Experiment Tracking

Step 4: Design the Training Pipeline

Step 5: Design Model Serving

For real-time serving, specify: latency SLA (p50/p99), throughput (requests/second), scaling strategy (auto-scale triggers), and fallback behavior (what happens if the model is unavailable?).
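The fallback behavior and latency SLA can be expressed together in one request wrapper. The sketch below assumes the model and the fallback (for example, a rule-based score or a cached prediction) are plain Python callables. `ServingWrapper` and `percentile` are hypothetical names. The wrapper falls back only when the model raises; it does not enforce a timeout, which a production server would typically add. Percentiles use the nearest-rank method.

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]) of the given samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


class ServingWrapper:
    """Hypothetical wrapper: calls the model, falls back on error, records latency."""

    def __init__(self, model: Callable[[dict], float], fallback: Callable[[dict], float]):
        self.model = model
        self.fallback = fallback
        self.latencies_ms: List[float] = []
        self.fallback_count = 0

    def predict(self, features: dict) -> float:
        start = time.perf_counter()
        try:
            return self.model(features)
        except Exception:
            # Model unavailable or erroring: serve the fallback instead of failing the request.
            self.fallback_count += 1
            return self.fallback(features)
        finally:
            # Latency is recorded for every request, including fallback responses.
            self.latencies_ms.append((time.perf_counter() - start) * 1000)

    def meets_sla(self, p50_ms: float, p99_ms: float) -> bool:
        """Check observed latencies against p50/p99 targets in milliseconds."""
        return (percentile(self.latencies_ms, 50) <= p50_ms
                and percentile(self.latencies_ms, 99) <= p99_ms)


wrapper = ServingWrapper(model=lambda f: 0.9, fallback=lambda f: 0.0)
wrapper.predict({"tenure_months": 12})
print(wrapper.meets_sla(p50_ms=50, p99_ms=200))
```

The `fallback_count` counter is worth exporting as an operational metric: a rising fallback rate often reveals model unavailability before the latency SLA does.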

Step 6: Design A/B Testing for Models

Step 7: Design Monitoring and Drift Detection

Define the retraining policy: scheduled (e.g., weekly or monthly), triggered (e.g., when drift is detected), or continuous (online learning).
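The triggered option needs a concrete drift statistic. The sketch below uses the Population Stability Index (PSI), one common choice for data drift on a single numeric feature, as an illustrative stand-in rather than a prescribed method. Bins are taken from quantiles of the reference (training) data. The 0.25 trigger threshold is a widely used rule of thumb, treated here as an assumption to tune per feature. `psi` and `should_retrain` are hypothetical helpers. PSI compares input distributions only, so concept drift still requires labeled outcomes.

```python
import bisect
import math
from datetime import datetime, timedelta
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior bin edges (n_bins - 1 of them) at quantiles of the reference data."""
    ordered = sorted(reference)
    return [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]


def bin_proportions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index of `current` against `reference`."""
    edges = quantile_edges(reference, n_bins)
    expected = bin_proportions(reference, edges)
    actual = bin_proportions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def should_retrain(policy: str, last_trained: datetime, now: datetime,
                   drift_score: float, schedule: timedelta = timedelta(days=7),
                   drift_threshold: float = 0.25) -> bool:
    """Hypothetical batch-retraining decision for the three policies."""
    if policy == "scheduled":
        return now - last_trained >= schedule
    if policy == "triggered":
        return drift_score > drift_threshold
    if policy == "continuous":
        # Online learning updates the model incrementally; no batch retrain is scheduled.
        return False
    raise ValueError(f"unknown policy: {policy}")
```

In practice, `psi` would run per feature on a recent window of serving inputs, and the maximum (or a count of features above threshold) would feed `drift_score`.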

Output Format


# ML Workflow: [Project/Model Name]

## Problem Definition

| Aspect | Detail |
|--------|--------|
| Problem type | ... |
| Target variable | ... |
| Business metric | ... |
| Evaluation metric | ... |
| Baseline performance | ... |
| Success threshold | ... |

## Feature Engineering

| Feature | Source | Transformation | Type | Leakage Risk |
|---------|--------|---------------|------|-------------|
| ...     | ...    | ...           | ...  | Low/Med/High |

**Feature store:** [Yes/No — tool choice and rationale]

## Experiment Tracking

| Aspect | Choice | Rationale |
|--------|--------|-----------|
| Tool | ... | ... |
| What's tracked | ... | ... |
| Organization | ... | ... |

## Training Pipeline

| Stage | Tool/Method | Notes |
|-------|------------|-------|
| Data split | ... | ... |
| Training | ... | ... |
| Tuning | ... | ... |
| Validation | ... | ... |
| Registry | ... | ... |

## Model Serving

| Aspect | Detail |
|--------|--------|
| Serving mode | Batch / Real-time / Streaming / Edge |
| Latency SLA | ... |
| Throughput | ... |
| Scaling | ... |
| Fallback | ... |

## A/B Testing

| Aspect | Detail |
|--------|--------|
| Traffic split | ... |
| Primary metric | ... |
| Guardrail metrics | ... |
| Min duration | ... |
| Rollback criteria | ... |

## Monitoring and Drift

| Monitor | Tool | Threshold | Action |
|---------|------|-----------|--------|
| Data drift | ... | ... | ... |
| Model drift | ... | ... | ... |
| Concept drift | ... | ... | ... |
| Operational | ... | ... | ... |

**Retraining policy:** [Scheduled / Triggered / Continuous — details]
