# Analyze Report

Interpret benchmark and test results. Turn raw data into actionable findings.
## When to Use

- After running benchmarks — interpret results, flag anomalies
- After `ctest` — analyze failures, identify patterns
- Before spec updates — verify performance claims with measured data
- Periodic — track performance trends across runs
## When NOT to Use

- Running benchmarks — run them first, then use this skill to analyze
- Spec quality review — use `/spec-review` instead
- Test gap analysis — use `/test-plan` instead
## Inputs

- No argument: find and analyze all reports in the project (benchmark JSON, test output, coverage reports)
- File path: analyze a specific report file
- `--compare A B`: compare two report files (e.g., before/after optimization)
## Workflow

### Phase 1: Discovery

- Read `CLAUDE.md` for project structure — find the benchmark output directory and test report directory
- Glob for benchmark results: `**/*_bench*.json`, `**/benchmark_*.json`
- Glob for test reports: `**/test-report*`, `**/coverage*`
- Check for hardware baseline data (if the project has `hw-baseline` or similar)
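The discovery step can be sketched with the standard library, using the glob patterns listed above (the `discover_reports` helper and its `root` argument are illustrative, not part of the skill):

```python
from pathlib import Path

# Glob patterns from the discovery phase
BENCH_PATTERNS = ["**/*_bench*.json", "**/benchmark_*.json"]
TEST_PATTERNS = ["**/test-report*", "**/coverage*"]

def discover_reports(root: str) -> dict:
    """Collect benchmark and test report files under a project root."""
    base = Path(root)
    found = {"benchmarks": [], "tests": []}
    for pattern in BENCH_PATTERNS:
        found["benchmarks"].extend(sorted(base.glob(pattern)))
    for pattern in TEST_PATTERNS:
        found["tests"].extend(sorted(base.glob(pattern)))
    return found
```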
### Phase 2: Parse Raw Data

For each benchmark result:

- Extract scenario names, iterations, timing data (mean, median, P50/P99, min/max)
- Identify the unit (ns, us, ms, ops/sec)
- Group by benchmark suite
For each test report:

- Extract pass/fail counts, failure messages
- Extract coverage data if available
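A minimal parsing sketch, assuming a Google-Benchmark-style JSON layout (a `benchmarks` array with `name`, `real_time`, `time_unit`, and `iterations` fields — adjust the field names to the project's actual format):

```python
import json

def parse_benchmark_json(path: str) -> dict:
    """Group timing entries by suite (suite = name up to the first '/')."""
    with open(path) as f:
        data = json.load(f)
    suites: dict[str, list[dict]] = {}
    for entry in data.get("benchmarks", []):
        name = entry["name"]                 # e.g. "DeltaRing/read/4"
        suite = name.split("/")[0]
        suites.setdefault(suite, []).append({
            "scenario": name,
            "time": entry["real_time"],
            "unit": entry.get("time_unit", "ns"),
            "iterations": entry.get("iterations", 0),
        })
    return suites
```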
### Phase 3: Compute Derived Metrics

**Scaling analysis**: Compare scenarios that differ by one parameter (e.g., 1 reader vs 4 readers).

```
Scaling ratio = time(N) / time(1)
Ideal:     ratio ≈ 1.0  (independent)
Degraded:  ratio > 2.0  (contention)
```
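The ratio above can be computed and labeled like this (the 1.2 cutoff for "independent" is an assumed middle band; the thresholds above only fix ≈ 1.0 and > 2.0):

```python
def assess_scaling(time_1: float, time_n: float) -> tuple[float, str]:
    """Classify N-way scaling against the single-participant time."""
    ratio = time_n / time_1
    if ratio <= 1.2:            # near-ideal band (cutoff is an assumption)
        return ratio, "independent"
    if ratio <= 2.0:            # between the stated thresholds
        return ratio, "moderate"
    return ratio, "contention"  # degraded: ratio > 2.0
```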
**Contention indicators**: Retry rates, CAS failures, cache miss ratios.

**Regression detection** (when comparing two runs):

```
Regression threshold:  > 10% slower on same hardware
Improvement threshold: > 10% faster
Noise band:            within 10%
```
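The thresholds reduce to a single relative delta (lower time = faster; `classify_change` and its `noise` parameter are illustrative names for the 10% band):

```python
def classify_change(before: float, after: float, noise: float = 0.10) -> str:
    """Classify a before/after timing pair using the 10% thresholds."""
    delta = (after - before) / before   # positive = slower
    if delta > noise:
        return "REGRESSION"
    if delta < -noise:
        return "IMPROVEMENT"
    return "NOISE"
```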
### Phase 4: Cross-Reference with Specs

For each performance claim in the specs:

- Find the corresponding benchmark result
- Compare the measured value against the spec target
- Classify: MEETS | EXCEEDS | MISSES | NO_DATA
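One way to sketch the classification for timing targets (lower measured = better; the ±10% tolerance separating MEETS from EXCEEDS/MISSES is an assumption, not something the workflow fixes):

```python
from typing import Optional

def classify_vs_spec(target_ns: float, measured_ns: Optional[float],
                     tolerance: float = 0.10) -> str:
    """Compare a measured timing against a spec target."""
    if measured_ns is None:
        return "NO_DATA"                  # never estimate a missing number
    if measured_ns <= target_ns * (1 - tolerance):
        return "EXCEEDS"
    if measured_ns <= target_ns * (1 + tolerance):
        return "MEETS"
    return "MISSES"
```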
### Phase 5: Report

```markdown
# Benchmark Analysis Report

**Date**: YYYY-MM-DD HH:MM
**Hardware**: {from baseline if available}

## Summary

| Metric | Value |
|--------|-------|
| Benchmarks analyzed | N |
| Spec claims verified | N / M |
| Anomalies found | N |

## Findings

### Performance vs Spec Targets

| Component | Spec Target | Measured | Status |
|-----------|-------------|----------|--------|
| Reactor loop | ~1.6ns (design est.) | P50=4.7ns | MEETS (within order) |
| SeqLock read | ~50ns | — | NO_DATA |

### Scaling Analysis

| Benchmark | 1→N Scaling | Ratio | Assessment |
|-----------|-------------|-------|------------|
| DeltaRing 1→4 readers | 2.3ns→845ns | 367x | Expected (SPMC contention) |

### Anomalies

1. **{benchmark}:{scenario}** — {description}
   Data: {numbers}
   Possible cause: {analysis}
   Action: {recommendation}

### Trends (if comparing runs)

| Benchmark | Before | After | Change |
|-----------|--------|-------|--------|
| ... | ... | ... | +/-% |
```
## Principles

- **Data first, interpretation second**: Present raw numbers before drawing conclusions.
- **Hardware context matters**: The same benchmark on different CPUs means different things. Always note the hardware.
- **Noise awareness**: Single-digit percent differences are noise, not signal. Only flag changes > 10%.
- **No spec numbers from thin air**: If a spec claims X ns but no benchmark measures it, report NO_DATA; don't estimate.
- **Actionable output**: Every anomaly should have a "what to do next" recommendation.