Systematic Debugging
Core principle: Find root cause before attempting fixes. Symptom fixes are failure.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
- •Read Error Messages Carefully
  - •Read stack traces completely
  - •Note line numbers, file paths, error codes
  - •Don't skip warnings
- •Reproduce Consistently
  - •What are the exact steps?
  - •If not reproducible → gather more data, don't guess
- •Check Recent Changes
  - •Git diff, recent commits
  - •New dependencies, config changes
  - •Environmental differences
- •Gather Evidence in Multi-Component Systems
  WHEN the system has multiple components (CI → build → signing, API → service → database):
  Add diagnostic instrumentation before proposing fixes:
  ```
  For EACH component boundary:
    - Log what data enters/exits component
    - Verify environment/config propagation
    - Check state at each layer

  Run once to gather evidence → analyze → identify failing component
  ```
  Example:
  ```bash
  # Layer 1: Workflow
  echo "=== Secrets available: ==="
  # Report only SET/UNSET so the secret's value is never printed
  identity_status="${IDENTITY:+SET}"
  echo "IDENTITY: ${identity_status:-UNSET}"

  # Layer 2: Build script (check presence without echoing the value)
  env | grep -q '^IDENTITY=' && echo "IDENTITY in environment" || echo "IDENTITY not in environment"

  # Layer 3: Signing
  security find-identity -v
  ```
- •Trace Data Flow
  See references/root-cause-tracing.md for the backward tracing technique.
  Quick version: Where does the bad value originate? Trace up the call chain until you find the source. Fix at the source.
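For the "Reproduce Consistently" step above, it can help to measure how often a failure occurs before deciding whether it is reproducible. The following is a minimal sketch, not a prescribed tool: `failure_rate` is a hypothetical helper name, and the command you pass to it is whatever triggers the bug in your project.

```bash
# failure_rate RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times and reports how many runs failed.
# (Hypothetical helper for illustration; not part of the original skill.)
failure_rate() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Count any non-zero exit status as a failure; discard command output.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Usage (the script name is an assumption):
#   failure_rate 20 ./run-tests.sh
```

A result such as "20/20" suggests a deterministic bug, while a partial rate suggests timing or environment dependence, which is a cue to gather more data rather than guess.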
Phase 2: Pattern Analysis
- •Find Working Examples - Similar working code in codebase
- •Compare Against References - Read reference implementations COMPLETELY, don't skim
- •Identify Differences - List every difference, don't assume "that can't matter"
- •Understand Dependencies - Components, config, environment, assumptions
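One way to "list every difference" between a working example and the broken one is a plain unified diff. This is a sketch under assumptions: `list_differences` is a hypothetical helper, and the two files are stand-ins for whatever you are comparing (config files, environment dumps, reference implementation vs. your code).

```bash
# list_differences WORKING_FILE BROKEN_FILE
# Prints a unified diff of the two files. diff exits 1 when the files
# differ, which is expected here; only exit codes above 1 are real errors.
# (Hypothetical helper for illustration.)
list_differences() {
    status=0
    diff -u "$1" "$2" || status=$?
    if [ "$status" -gt 1 ]; then
        echo "diff failed (exit $status)" >&2
        return "$status"
    fi
    return 0
}

# Usage (file names are assumptions):
#   list_differences working.conf broken.conf
```

Review every line of the output, including the ones that look irrelevant.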
Phase 3: Hypothesis and Testing
- •Form Single Hypothesis - "I think X is root cause because Y" - be specific
- •Test Minimally - SMALLEST possible change, one variable at a time
- •Verify - Worked → Phase 4. Didn't work → form NEW hypothesis, don't stack fixes
- •When You Don't Know - Say so. Don't pretend.
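"One variable at a time" can be made concrete by running the same command twice, with exactly one environment variable changed in the second run. The sketch below assumes the suspected cause can be expressed as an environment variable; `compare_one_variable` and the variable names in the usage line are hypothetical.

```bash
# compare_one_variable NAME VALUE COMMAND [ARGS...]
# Runs COMMAND once unchanged and once with NAME=VALUE set, then prints
# both exit statuses so the effect of that single change is visible.
# (Hypothetical helper for illustration.)
compare_one_variable() {
    var_name=$1
    var_value=$2
    shift 2
    baseline=0
    "$@" >/dev/null 2>&1 || baseline=$?
    variant=0
    env "$var_name=$var_value" "$@" >/dev/null 2>&1 || variant=$?
    echo "baseline exit=$baseline, with $var_name=$var_value exit=$variant"
}

# Usage (variable and script names are assumptions):
#   compare_one_variable TZ UTC ./run-tests.sh
```

If both runs exit the same way, the hypothesis is not supported: form a new one rather than adding a second change on top.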
Phase 4: Implementation
- •Create Failing Test Case
  - •Use the test-driven-development skill
  - •MUST have before fixing
- •Implement Single Fix
  - •ONE change at a time
  - •No "while I'm here" improvements
- •Verify Fix
  - •Test passes? Other tests still pass? Issue resolved?
- •If Fix Doesn't Work
  - •Count attempts
  - •If < 3: Return to Phase 1 with new information
  - •If ≥ 3: Escalate (below)
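The "Verify Fix" questions above can be checked in a fixed order: first the new failing test, then the rest of the suite. This is a minimal sketch; `verify_fix` is a hypothetical helper, and both arguments are command strings for your own project's test runner.

```bash
# verify_fix REGRESSION_TEST_CMD FULL_SUITE_CMD
# Each argument is a command string executed with `sh -c`.
# Stops at the first failure and says which check failed.
# (Hypothetical helper for illustration.)
verify_fix() {
    if ! sh -c "$1"; then
        echo "FAIL: regression test still fails"
        return 1
    fi
    if ! sh -c "$2"; then
        echo "FAIL: fix broke other tests"
        return 1
    fi
    echo "OK: regression test and full suite pass"
}

# Usage (commands are assumptions):
#   verify_fix "./run-tests.sh test_issue" "./run-tests.sh"
```

A passing result covers only the first two questions; whether the original issue is resolved still needs to be confirmed against the reproduction steps from Phase 1.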
Escalation: 3+ Failed Fixes
Pattern indicating architectural problem:
- •Each fix reveals new problems elsewhere
- •Fixes require massive refactoring
- •Shared state/coupling keeps surfacing
Action: STOP. Question fundamentals:
- •Is this pattern fundamentally sound?
- •Are we continuing through inertia?
- •Refactor architecture vs. continue fixing symptoms?
Discuss with your human partner before attempting more fixes. This points to a wrong architecture, not a failed hypothesis.
Red Flags → STOP and Return to Phase 1
If you catch yourself thinking:
- •"Quick fix for now, investigate later"
- •"Just try changing X"
- •"I'll skip the test"
- •"It's probably X"
- •"Pattern says X but I'll adapt it differently"
- •Proposing solutions before tracing data flow
- •"One more fix" after 2+ failures
Human Signals You're Off Track
- •"Is that not happening?" → You assumed without verifying
- •"Will it show us...?" → You should have added evidence gathering
- •"Stop guessing" → You're proposing fixes without understanding
- •"Ultrathink this" → Question fundamentals
- •Frustrated "We're stuck?" → Your approach isn't working
Response: Return to Phase 1.
Supporting Techniques
Reference files in references/:
- •root-cause-tracing.md - Trace bugs backward through call stack
- •defense-in-depth.md - Add validation at multiple layers after finding root cause
- •condition-based-waiting.md - Replace arbitrary timeouts with condition polling
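As an illustration of condition polling in place of a fixed `sleep`, the sketch below retries a check once per second until it succeeds or a deadline passes. It is not taken from condition-based-waiting.md; `wait_for` is a hypothetical helper and the one-second interval is an arbitrary choice.

```bash
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls COMMAND about once per second until it succeeds.
# Returns 0 as soon as the condition holds, 1 if the timeout is reached.
# (Hypothetical helper for illustration.)
wait_for() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage (the file path is an assumption):
#   wait_for 30 test -f /tmp/server.ready
```

Compared with a fixed sleep, the wait ends as soon as the condition is true, and a timeout produces an explicit error instead of a silent race.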
Related skills:
- •test-driven-development - Creating failing test case (Phase 4)
- •verification-before-completion - Verify fix before claiming success