# /review-implementation

Reviews an existing arXiv paper implementation against the original paper to verify correctness, completeness, and adherence to the paper's methodology.
## Usage

`/review-implementation <arxiv_url_or_id> [options]`
## Arguments

- `<arxiv_url_or_id>`: arXiv paper URL (e.g., https://arxiv.org/abs/2301.00001) or ID (e.g., 2301.00001)
## Options

- `--propose-fixes`: Include concrete code proposals to fix identified issues in the review report
- `--detailed`: Generate a more detailed review with section-by-section analysis
## Output

The review report is saved as `REVIEW.md` in the `paper_impl/{arxiv_id}/` directory.
## Workflow

When this skill is invoked, follow these phases in order:
### Phase 1: Gather Resources

1. **Locate the implementation directory**

   ```bash
   cd /Users/shibuiyusuke/tmp/paper2code
   uv run python -c "
   from src import extract_arxiv_id, fetch_paper_metadata
   from pathlib import Path
   paper = fetch_paper_metadata(extract_arxiv_id('<arxiv_url_or_id>'))
   impl_dir = Path('./paper_impl') / paper.clean_id
   if not impl_dir.exists():
       print(f'ERROR: Implementation not found at {impl_dir}')
       exit(1)
   print(f'Implementation directory: {impl_dir}')
   print(f'Paper: {paper.title}')
   "
   ```

2. **Read the paper PDF**
   - Use the Read tool to read the PDF at `paper_impl/{clean_id}.pdf`
   - If the PDF is not found, check the `paper_impl/{clean_id}/` directory
   - Focus on: Abstract, Method/Approach, Algorithm boxes, and the Appendix for implementation details

3. **Read all implementation source files**
   - Read all Python files under `paper_impl/{arxiv_id}/src/`
   - Read all test files under `paper_impl/{arxiv_id}/tests/`
   - Read `README.md` and `requirements.txt`
### Phase 2: Paper Analysis

Extract the key implementation requirements from the paper:
1. **Core Algorithm/Model**
   - Main mathematical formulations (equations)
   - Algorithm pseudocode (if provided)
   - Architecture diagrams and components
   - Key hyperparameters and their values

2. **Implementation Details**
   - Initialization methods
   - Normalization techniques
   - Activation functions
   - Training procedures (loss functions, optimizers)
   - Data preprocessing steps

3. **Evaluation Methodology**
   - Datasets used
   - Metrics reported
   - Baseline comparisons
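The extracted requirements can be collected into a simple record before the review begins. A minimal sketch (every field name and value below is an illustrative placeholder, not taken from any real paper):

```python
# Sketch of a Phase 2 requirements record; all values are placeholders
# standing in for what would be extracted from the paper under review.
requirements = {
    "equations": ["Eq. (3): attention score", "Eq. (7): total loss"],
    "hyperparameters": {"learning_rate": 1e-3, "batch_size": 128},
    "datasets": ["CIFAR-10"],
    "metrics": ["top-1 accuracy"],
    "baselines": ["ResNet-18"],
}

# Phase 3 can then walk these entries and check each one against the code.
for section in ("equations", "hyperparameters", "datasets", "metrics"):
    assert section in requirements, f"missing extraction: {section}"
```

Keeping the extraction in one structure makes the later completeness table in Phase 4 straightforward to fill in.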
### Phase 3: Implementation Review

Compare the implementation against the paper systematically:

#### 3.1 Structural Review

- File organization: Does the project structure make sense?
- Module separation: Are components properly modularized?
- Dependencies: Are all required libraries included?
#### 3.2 Algorithm Correctness

For each key algorithm/model component:

1. **Equation Mapping**
   - Does the code implement the equations correctly?
   - Are variable names consistent with paper notation?
   - Are all terms and operations accounted for?

2. **Control Flow**
   - Does the algorithm flow match the pseudocode?
   - Are edge cases handled correctly?
   - Is the order of operations correct?

3. **Numerical Considerations**
   - Is numerical stability addressed?
   - Are appropriate epsilon values used?
   - Is gradient flow preserved?
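As a concrete illustration of the numerical checks above, a minimal standard-library sketch (the function name and epsilon value are illustrative, not from any specific paper):

```python
import math

def safe_log(p: float, eps: float = 1e-12) -> float:
    # Pattern to look for wherever the paper's loss takes the log of a
    # model probability: without the epsilon, p == 0 raises a math
    # domain error (or yields -inf in array libraries) and poisons
    # training.
    return math.log(p + eps)

# A probability of exactly zero stays finite with the guard in place.
assert math.isfinite(safe_log(0.0))
# For well-behaved inputs the guard perturbs the result only negligibly.
assert abs(safe_log(0.5) - math.log(0.5)) < 1e-9
```

During review, flag places where the implementation omits such a guard that the paper (or basic stability) requires, and also places where an epsilon silently changes a quantity the paper defines exactly.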
#### 3.3 Completeness Review

- Are all major components from the paper implemented?
- Are optional features or ablations mentioned in the paper included?
- Are hyperparameters configurable and set to paper defaults?
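The hyperparameter check in particular lends itself to a mechanical comparison. A sketch (both dictionaries are illustrative stand-ins: one for values extracted in Phase 2, one for the implementation's configured defaults):

```python
# Illustrative placeholder values, not from a real paper.
paper_defaults = {"learning_rate": 1e-3, "dropout": 0.1, "epochs": 100}
impl_defaults = {"learning_rate": 1e-3, "dropout": 0.1, "epochs": 100,
                 "seed": 42}  # extra implementation-only keys are fine

# Collect every paper default the implementation fails to reproduce.
mismatches = {
    name: (paper_value, impl_defaults.get(name))
    for name, paper_value in paper_defaults.items()
    if impl_defaults.get(name) != paper_value
}
assert mismatches == {}  # all paper defaults reproduced
```

Any non-empty `mismatches` entry belongs in the Phase 4 completeness table with a note on whether the deviation is intentional.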
#### 3.4 Code Quality Review

- Type hints and docstrings
- Comments referencing paper sections/equations
- Test coverage
- Error handling
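A docstring style worth looking for (and recommending) ties each function back to the paper. A toy sketch in which the function, equation, and section numbers are purely illustrative:

```python
import math

def attention_logit(q: float, k: float, d_k: int) -> float:
    """Scaled dot-product score for a single query/key pair.

    Implements score = q * k / sqrt(d_k); cf. Eq. (1), Section 3.2
    of the paper under review (reference numbers are illustrative).
    """
    return q * k / math.sqrt(d_k)

assert abs(attention_logit(2.0, 3.0, 4) - 3.0) < 1e-9
```

Such references make the equation-mapping step in 3.2 nearly mechanical for future reviewers.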
### Phase 4: Generate Review Report

Create `REVIEW.md` in the implementation directory with the following structure:
````markdown
# Implementation Review: {Paper Title}

**Paper**: [{arxiv_id}]({arxiv_url})
**Review Date**: {current_date}
**Implementation Path**: `paper_impl/{arxiv_id}/`

## Summary

{Brief overall assessment: Excellent/Good/Needs Improvement/Major Issues}

{2-3 sentence summary of the implementation quality}

## Detailed Review

### 1. Algorithm Correctness

#### 1.1 {Component Name}

**Paper Reference**: Section X.Y, Equation (Z)
**Status**: {Correct/Partial/Incorrect/Missing}

**Analysis**:
{Detailed analysis of how the implementation matches or differs from the paper}

{If --propose-fixes is set and status is not Correct:}

**Proposed Fix**:
```python
# Proposed code changes
```

#### 1.2 {Next Component}
...

### 2. Completeness

| Component | Paper Section | Implemented | Notes |
|---|---|---|---|
| {Component 1} | Section X | Yes/No/Partial | {notes} |
| {Component 2} | Section Y | Yes/No/Partial | {notes} |
| ... | | | |

### 3. Code Quality

#### 3.1 Documentation
- README completeness
- Docstrings present
- Paper references in comments

#### 3.2 Testing
- Unit tests present
- Shape tests
- Gradient tests
- Integration tests

#### 3.3 Best Practices
- Type hints
- Error handling
- Numerical stability

### 4. Discrepancies and Recommendations

#### Critical Issues
{List of issues that affect correctness}

{If --propose-fixes is set:} **Fix**:
```python
# Code to fix the issue
```

#### Minor Issues
{List of style/quality issues}

#### Recommendations
{Suggestions for improvement}

## Verification Checklist

- [ ] Core algorithm matches paper equations
- [ ] Hyperparameters match paper defaults
- [ ] Training procedure aligns with paper
- [ ] Output shapes are correct
- [ ] Numerical stability is handled

## Conclusion

{Final assessment and summary of key findings}
````
### Phase 5: Fix Proposals (if --propose-fixes)
When the `--propose-fixes` option is specified:
1. For each identified issue, provide:
   - **Location**: File path and line numbers
   - **Problem**: Clear description of the issue
   - **Paper Reference**: Relevant section/equation
   - **Proposed Code**: Complete code snippet to fix the issue
2. Format fix proposals clearly:

   ````markdown
   #### Fix for: {Issue Title}

   **File**: `src/(unknown).py`
   **Lines**: {start}-{end}

   **Current Code**:
   ```python
   # Current problematic code
   ```

   **Proposed Code**:
   ```python
   # Fixed code with comments explaining changes
   ```

   **Rationale**: {Why this fix is correct according to the paper}
   ````
3. For complex fixes spanning multiple files, provide a step-by-step guide.

## Important Guidelines

### When Reading the Paper

- **Focus on implementation details**: Look for Appendix sections with hyperparameters
- **Note ambiguities**: Papers often omit details - note what's missing
- **Check figures carefully**: Architecture diagrams often reveal details not in text

### When Reviewing Code

- **Be thorough but fair**: Not every deviation is a bug
- **Consider practical adaptations**: Some changes may be valid simplifications
- **Check test results**: If tests pass, the implementation likely works

### Review Criteria

Rate each component using:

- **Correct**: Faithfully implements the paper
- **Partial**: Implements the concept but with minor deviations
- **Incorrect**: Contains errors that affect functionality
- **Missing**: Not implemented when it should be

### Common Issues to Look For

1. **Matrix operations**: Transposition errors, wrong axis for reduction
2. **Normalization**: Wrong layer norm placement, missing normalization
3. **Initialization**: Default PyTorch init vs paper-specified init
4. **Activation functions**: Wrong activation or placement
5. **Loss computation**: Missing terms, wrong reduction
6. **Hyperparameters**: Different from paper defaults

## Example Session
User: `/review-implementation https://arxiv.org/abs/2511.07800 --propose-fixes`

Claude: I'll review the implementation of "From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory".

[Phase 1: Locating implementation and reading paper...]
[Phase 2: Extracting paper requirements...]
[Phase 3: Reviewing implementation...]
[Phase 4: Generating review report...]
[Phase 5: Adding fix proposals...]

Review complete! Saved to `paper_impl/2511_07800v1/REVIEW.md`
Summary:

- Overall: Good
- 3 components fully correct
- 1 component with minor issues
- 2 fix proposals included