# /review-implementation

Reviews an existing arXiv paper implementation against the original paper to verify correctness, completeness, and adherence to the paper's methodology.
## Usage

`/review-implementation <arxiv_url_or_id> [options]`
## Arguments

- `<arxiv_url_or_id>`: arXiv paper URL (e.g., https://arxiv.org/abs/2301.00001) or ID (e.g., 2301.00001)
## Options

- `--propose-fixes`: Include concrete code proposals to fix identified issues in the review report
- `--detailed`: Generate a more detailed review with section-by-section analysis
## Output

The review report is saved as `REVIEW.md` in the `paper_impl/{arxiv_id}/` directory.
## Workflow

When this skill is invoked, follow these phases in order:
### Phase 1: Gather Resources

1. **Locate the implementation directory**

   ```bash
   cd /Users/shibuiyusuke/tmp/paper2code
   uv run python -c "
   from src import extract_arxiv_id, fetch_paper_metadata
   from pathlib import Path
   paper = fetch_paper_metadata(extract_arxiv_id('<arxiv_url_or_id>'))
   impl_dir = Path('./paper_impl') / paper.clean_id
   if not impl_dir.exists():
       print(f'ERROR: Implementation not found at {impl_dir}')
       exit(1)
   print(f'Implementation directory: {impl_dir}')
   print(f'Paper: {paper.title}')
   "
   ```

2. **Read the paper PDF**
   - Use the Read tool to read the PDF at `paper_impl/{clean_id}.pdf`
   - If the PDF is not found, check the `paper_impl/{clean_id}/` directory
   - Focus on: Abstract, Method/Approach, Algorithm boxes, and the Appendix for implementation details

3. **Read all implementation source files**
   - Read all Python files under `paper_impl/{arxiv_id}/src/`
   - Read all test files under `paper_impl/{arxiv_id}/tests/`
   - Read `README.md` and `requirements.txt`
### Phase 2: Paper Analysis

Extract the key implementation requirements from the paper:
1. **Core Algorithm/Model**
   - Main mathematical formulations (equations)
   - Algorithm pseudocode (if provided)
   - Architecture diagrams and components
   - Key hyperparameters and their values

2. **Implementation Details**
   - Initialization methods
   - Normalization techniques
   - Activation functions
   - Training procedures (loss functions, optimizers)
   - Data preprocessing steps

3. **Evaluation Methodology**
   - Datasets used
   - Metrics reported
   - Baseline comparisons
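The extracted requirements can be collected into a simple record before the review begins. A minimal sketch (every field name and value below is an illustrative placeholder, not taken from any real paper):

```python
# Sketch of a Phase 2 requirements record; all values are placeholders
# standing in for what would be extracted from the paper under review.
requirements = {
    "equations": ["Eq. (3): attention score", "Eq. (7): total loss"],
    "hyperparameters": {"learning_rate": 1e-3, "batch_size": 128},
    "datasets": ["CIFAR-10"],
    "metrics": ["top-1 accuracy"],
    "baselines": ["ResNet-18"],
}

# Phase 3 can then walk these entries and check each one against the code.
for section in ("equations", "hyperparameters", "datasets", "metrics"):
    assert section in requirements, f"missing extraction: {section}"
```

Keeping the extraction in one structure makes the later completeness table in Phase 4 straightforward to fill in.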
### Phase 3: Implementation Review

Compare the implementation against the paper systematically:

#### 3.1 Structural Review

- File organization: Does the project structure make sense?
- Module separation: Are components properly modularized?
- Dependencies: Are all required libraries included?
#### 3.2 Algorithm Correctness

For each key algorithm/model component:

1. **Equation Mapping**
   - Does the code implement the equations correctly?
   - Are variable names consistent with paper notation?
   - Are all terms and operations accounted for?

2. **Control Flow**
   - Does the algorithm flow match the pseudocode?
   - Are edge cases handled correctly?
   - Is the order of operations correct?

3. **Numerical Considerations**
   - Is numerical stability addressed?
   - Are appropriate epsilon values used?
   - Is gradient flow preserved?
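As a concrete illustration of the numerical checks above, a minimal standard-library sketch (the function name and epsilon value are illustrative, not from any specific paper):

```python
import math

def safe_log(p: float, eps: float = 1e-12) -> float:
    # Pattern to look for wherever the paper's loss takes the log of a
    # model probability: without the epsilon, p == 0 raises a math
    # domain error (or yields -inf in array libraries) and poisons
    # training.
    return math.log(p + eps)

# A probability of exactly zero stays finite with the guard in place.
assert math.isfinite(safe_log(0.0))
# For well-behaved inputs the guard perturbs the result only negligibly.
assert abs(safe_log(0.5) - math.log(0.5)) < 1e-9
```

During review, flag places where the implementation omits such a guard that the paper (or basic stability) requires, and also places where an epsilon silently changes a quantity the paper defines exactly.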
#### 3.3 Completeness Review

- Are all major components from the paper implemented?
- Are optional features or ablations mentioned in the paper included?
- Are hyperparameters configurable and set to paper defaults?
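The hyperparameter check in particular lends itself to a mechanical comparison. A sketch (both dictionaries are illustrative stand-ins: one for values extracted in Phase 2, one for the implementation's configured defaults):

```python
# Illustrative placeholder values, not from a real paper.
paper_defaults = {"learning_rate": 1e-3, "dropout": 0.1, "epochs": 100}
impl_defaults = {"learning_rate": 1e-3, "dropout": 0.1, "epochs": 100,
                 "seed": 42}  # extra implementation-only keys are fine

# Collect every paper default the implementation fails to reproduce.
mismatches = {
    name: (paper_value, impl_defaults.get(name))
    for name, paper_value in paper_defaults.items()
    if impl_defaults.get(name) != paper_value
}
assert mismatches == {}  # all paper defaults reproduced
```

Any non-empty `mismatches` entry belongs in the Phase 4 completeness table with a note on whether the deviation is intentional.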
#### 3.4 Code Quality Review

- Type hints and docstrings
- Comments referencing paper sections/equations
- Test coverage
- Error handling
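A docstring style worth looking for (and recommending) ties each function back to the paper. A toy sketch in which the function, equation, and section numbers are purely illustrative:

```python
import math

def attention_logit(q: float, k: float, d_k: int) -> float:
    """Scaled dot-product score for a single query/key pair.

    Implements score = q * k / sqrt(d_k); cf. Eq. (1), Section 3.2
    of the paper under review (reference numbers are illustrative).
    """
    return q * k / math.sqrt(d_k)

assert abs(attention_logit(2.0, 3.0, 4) - 3.0) < 1e-9
```

Such references make the equation-mapping step in 3.2 nearly mechanical for future reviewers.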
### Phase 4: Generate Review Report

Create `REVIEW.md` in the implementation directory with the following structure:
````markdown
# Implementation Review: {Paper Title}

**Paper**: [{arxiv_id}]({arxiv_url})
**Review Date**: {current_date}
**Implementation Path**: `paper_impl/{arxiv_id}/`

## Summary

{Brief overall assessment: Excellent/Good/Needs Improvement/Major Issues}

{2-3 sentence summary of the implementation quality}

## Detailed Review

### 1. Algorithm Correctness

#### 1.1 {Component Name}

**Paper Reference**: Section X.Y, Equation (Z)
**Status**: {Correct/Partial/Incorrect/Missing}

**Analysis**:
{Detailed analysis of how the implementation matches or differs from the paper}

{If --propose-fixes is set and status is not Correct:}

**Proposed Fix**:
```python
# Proposed code changes
```

#### 1.2 {Next Component}
...

### 2. Completeness

| Component | Paper Section | Implemented | Notes |
|---|---|---|---|
| {Component 1} | Section X | Yes/No/Partial | {notes} |
| {Component 2} | Section Y | Yes/No/Partial | {notes} |
| ... | | | |

### 3. Code Quality

#### 3.1 Documentation
- README completeness
- Docstrings present
- Paper references in comments

#### 3.2 Testing
- Unit tests present
- Shape tests
- Gradient tests
- Integration tests

#### 3.3 Best Practices
- Type hints
- Error handling
- Numerical stability

### 4. Discrepancies and Recommendations

#### Critical Issues
{List of issues that affect correctness}

{If --propose-fixes is set:} **Fix**:
```python
# Code to fix the issue
```

#### Minor Issues
{List of style/quality issues}

#### Recommendations
{Suggestions for improvement}

## Verification Checklist

- [ ] Core algorithm matches paper equations
- [ ] Hyperparameters match paper defaults
- [ ] Training procedure aligns with paper
- [ ] Output shapes are correct
- [ ] Numerical stability is handled

## Conclusion

{Final assessment and summary of key findings}
````
### Phase 5: Fix Proposals (if --propose-fixes)
When the `--propose-fixes` option is specified:
1. For each identified issue, provide:
   - **Location**: File path and line numbers
   - **Problem**: Clear description of the issue
   - **Paper Reference**: Relevant section/equation
   - **Proposed Code**: Complete code snippet to fix the issue
2. Format fix proposals clearly:

   ````markdown
   #### Fix for: {Issue Title}

   **File**: `src/(unknown).py`
   **Lines**: {start}-{end}

   **Current Code**:
   ```python
   # Current problematic code
   ```

   **Proposed Code**:
   ```python
   # Fixed code with comments explaining changes
   ```

   **Rationale**: {Why this fix is correct according to the paper}
   ````
3. For complex fixes spanning multiple files, provide a step-by-step guide.

## Important Guidelines

### When Reading the Paper

- **Focus on implementation details**: Look for Appendix sections with hyperparameters
- **Note ambiguities**: Papers often omit details - note what's missing
- **Check figures carefully**: Architecture diagrams often reveal details not in text

### When Reviewing Code

- **Be thorough but fair**: Not every deviation is a bug
- **Consider practical adaptations**: Some changes may be valid simplifications
- **Check test results**: If tests pass, the implementation likely works

### Review Criteria

Rate each component using:

- **Correct**: Faithfully implements the paper
- **Partial**: Implements the concept but with minor deviations
- **Incorrect**: Contains errors that affect functionality
- **Missing**: Not implemented when it should be

### Common Issues to Look For

1. **Matrix operations**: Transposition errors, wrong axis for reduction
2. **Normalization**: Wrong layer norm placement, missing normalization
3. **Initialization**: Default PyTorch init vs paper-specified init
4. **Activation functions**: Wrong activation or placement
5. **Loss computation**: Missing terms, wrong reduction
6. **Hyperparameters**: Different from paper defaults

## Example Session
User: `/review-implementation https://arxiv.org/abs/2511.07800 --propose-fixes`

Claude: I'll review the implementation of "From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory".

[Phase 1: Locating implementation and reading paper...]
[Phase 2: Extracting paper requirements...]
[Phase 3: Reviewing implementation...]
[Phase 4: Generating review report...]
[Phase 5: Adding fix proposals...]

Review complete! Saved to `paper_impl/2511_07800v1/REVIEW.md`
Summary:

- Overall: Good
- 3 components fully correct
- 1 component with minor issues
- 2 fix proposals included