Experiment Design
Help the researcher design rigorous experiments or computational studies. Good experiments are hypothesis-driven, reproducible, and have clear success criteria before they are run.
Step 1: Read Context
Read .papermill.md (Read tool) for:
- Thesis: The claim the experiments should support or test.
- Existing experiments: Any previously registered experiments.
- Format and tools: What languages/tools are in the repo (R, Python, C++, etc.).
If .papermill.md does not exist, ask the user what claim the experiments should test. Experiment design can proceed without the state file — suggest running /papermill:init afterward to register experiments persistently.
Scan the repository for existing code (Glob tool) in research/, code/, scripts/, experiments/, or analysis/ directories.
Step 2: Identify What Needs Testing
Ask the user: "What specific claim or aspect of your thesis do these experiments need to support?"
Different contribution types need different experimental approaches:
| Contribution | Experimental approach |
|---|---|
| Theorem/proof | Numerical validation of theoretical predictions |
| Algorithm | Runtime/accuracy benchmarks against baselines |
| Statistical method | Monte Carlo simulations with known ground truth |
| Empirical finding | Controlled experiments with statistical tests |
| Framework/model | Case studies demonstrating applicability |
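For instance, the "Algorithm" row often reduces to a small runtime benchmark. In the Python sketch below, proposed_sort and baseline_sort are hypothetical placeholders for the user's actual implementations, not real functions:

```python
# Sketch of a runtime benchmark against a baseline. `proposed_sort` and
# `baseline_sort` are hypothetical placeholders for the user's code.
import random
import time

def time_call(fn, data, repeats=10):
    """Best-of-N wall-clock time for calling fn on a fresh copy of data."""
    best = float("inf")
    for _ in range(repeats):
        copy = list(data)
        start = time.perf_counter()
        fn(copy)
        best = min(best, time.perf_counter() - start)
    return best

data = [random.random() for _ in range(100_000)]
# print(time_call(proposed_sort, data), time_call(baseline_sort, data))
```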
Step 3: Design the Experiment
For each experiment, specify:
Hypothesis
State the expected outcome in falsifiable terms. "We expect X to be Y under conditions Z" -- not "we want to show our method works."
Variables
- Independent variables: What you manipulate (parameters, dataset size, algorithm choice).
- Dependent variables: What you measure (accuracy, runtime, error rate).
- Control variables: What you hold constant (hardware, random seeds, data preprocessing).
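One way to make the split explicit is to record the independent and control variables in a single configuration object. The Python sketch below is illustrative only; every field name and value is an assumption, not a papermill convention:

```python
# Sketch: declare independent and control variables up front.
# All field names and values are illustrative, not a papermill convention.
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    # Independent variables: swept across runs
    sample_size: int
    method: str                 # e.g. "proposed" or "baseline"
    # Control variables: held constant for every run
    seed: int = 42
    n_replications: int = 100
    # Dependent variables (accuracy, runtime, ...) are measured, not set,
    # so they belong in the results record rather than the config.

configs = [
    ExperimentConfig(sample_size=n, method=m)
    for n in (100, 1_000, 10_000)
    for m in ("proposed", "baseline")
]
```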
Methodology
- Data generation or collection procedure
- Algorithm/method configuration
- Number of replications or samples
- Statistical tests to apply (t-test, bootstrap CI, etc.; see the sketch after this list)
- Baseline comparisons
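As an example of the statistical-test item, a percentile bootstrap confidence interval for the method-vs-baseline difference across replications could look like this Python sketch (the seed, sample counts, and function name are all illustrative):

```python
# Sketch: percentile bootstrap CI for the mean method-vs-baseline difference
# across replications. The seed and all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_ci(diffs, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean of diffs."""
    diffs = np.asarray(diffs)
    means = np.array([
        rng.choice(diffs, size=len(diffs), replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# diffs[i] = score of method minus score of baseline on replication i
# lo, hi = bootstrap_ci(diffs)
# The hypothesis is supported if the whole interval lies above zero.
```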
Success Criteria
Before running the experiment, define what result would count as support for the hypothesis. This prevents post-hoc rationalization.
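One option is to write the criteria down as data and check them mechanically after the run. The thresholds and result keys in this sketch are hypothetical:

```python
# Sketch: pre-registered success criteria checked after the run.
# Thresholds and result keys are hypothetical.
SUCCESS_CRITERIA = {
    "min_accuracy_gain": 0.02,   # method beats baseline by >= 2 points
    "max_runtime_ratio": 1.5,    # at most 1.5x slower than baseline
}

def evaluate(results):
    return {
        "accuracy": results["method_acc"] - results["baseline_acc"]
        >= SUCCESS_CRITERIA["min_accuracy_gain"],
        "runtime": results["method_time"] / results["baseline_time"]
        <= SUCCESS_CRITERIA["max_runtime_ratio"],
    }
```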
Reproducibility
- Random seed strategy
- Hardware/software environment
- Data availability
- Script that runs the full experiment end-to-end
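A minimal end-to-end driver might look like the Python sketch below, assuming run_once stands in for the actual experiment and that writing a results.json file is acceptable:

```python
# Sketch of an end-to-end driver: fixed seeds, environment record, results on
# disk. `run_once` is a placeholder for the actual experiment.
import json
import platform
import sys

import numpy as np

def run_once(seed: int) -> dict:
    rng = np.random.default_rng(seed)
    # ... generate data, fit method and baseline, compute metrics ...
    return {"seed": seed, "metric": float(rng.normal())}  # placeholder metric

def main(n_replications: int = 100, base_seed: int = 2024):
    results = [run_once(base_seed + i) for i in range(n_replications)]
    record = {
        "python": sys.version,
        "platform": platform.platform(),
        "numpy": np.__version__,
        "base_seed": base_seed,
        "results": results,
    }
    with open("results.json", "w") as f:
        json.dump(record, f, indent=2)

if __name__ == "__main__":
    main()
```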
Step 4: Address Common Pitfalls
Check for and warn about:
- Cherry-picking: Are you testing one configuration or sweeping parameters fairly?
- Multiple comparisons: If running many tests, apply a correction such as Bonferroni or FDR (see the sketch after this list).
- Overfitting to test data: Is there a held-out validation set?
- Computational budget: Is this feasible given available hardware and time?
- Missing baselines: Every method needs a comparison to something. Even "no method" is a baseline.
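For the multiple-comparisons item, the collected p-values can be adjusted in a few lines. This sketch assumes statsmodels is installed and uses made-up p-values:

```python
# Sketch: adjust p-values for multiple comparisons. Assumes statsmodels is
# installed; the p-values below are made up.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.012, 0.03, 0.04, 0.20]   # one per comparison
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
# Use method="bonferroni" for the more conservative Bonferroni correction.
```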
Step 5: Register the Experiment
If .papermill.md exists, update it (Edit tool) by adding to the experiments list. If it does not exist, skip registration and suggest running /papermill:init to persist the experiment.
```yaml
experiments:
  - name: "descriptive-name"
    type: "simulation | benchmark | case-study | ablation"
    hypothesis: "Expected outcome in one sentence"
    status: "planned | running | completed | failed"
    script: "path/to/script.R"
    last_run: null
```
Append a timestamped note documenting the experiment design.
Step 6: Suggest Next Steps
Based on the experiment type, suggest the most relevant next step:
- If this is a Monte Carlo study → "Use /papermill:simulation for detailed simulation design; it covers sample sizes, convergence diagnostics, and result presentation."
- If the experiment involves proving a theoretical prediction → "Consider /papermill:proof to verify the theory before running experiments."
- If results will need statistical analysis → "Implement the script, run it, then use /papermill:review once the results are written up."
- For all experiments → "Start with a small pilot run to debug before the full experiment."