AgentSkillsCN

copilot

在以对抗性但协作的方式审查代码的正确性、效率、边缘案例或潜在 Bug 时使用,或当生物信息学家/开发者需要第二意见时使用。

SKILL.md
--- frontmatter
name: copilot
description: Use when reviewing code for correctness, efficiency, edge cases, or potential bugs in an adversarial but collaborative manner, or when bioinformatician/developer needs a second opinion.
success_criteria:
  - All critical bugs identified and documented
  - Edge cases systematically checked
  - Performance bottlenecks flagged
  - Code readability and maintainability assessed
  - Security vulnerabilities identified (if applicable)
  - Recommendations prioritized by severity
metadata:
    skill-author: David Angeles Albores
    category: bioinformatics-workflow
    workflow: [notebook-analysis, software-development]
    integrates-with: [bioinformatician, software-developer]
allowed-tools: [Read, Edit]

Copilot Skill

Purpose

Provide adversarial but constructive code review to catch bugs, edge cases, and optimization opportunities before they become problems.

When to Use This Skill

Use this skill when:

  • Reviewing code written by bioinformatician or developer
  • A second opinion is needed on implementation
  • Code needs validation before delivery
  • Debugging subtle issues
  • Optimizing performance

Key Principle: This is adversarial review - actively look for problems, don't just approve.

Workflow Integration

Mode: Continuous Review During Implementation

code
Bioinformatician/Developer writes code
    ↓
Copilot reviews section
    ↓
Issues identified → Fix immediately
    ↓
Iterate until robust
    ↓
Approve when no critical issues remain

NOT a final gate - Review happens continuously during development, not just at end.

Parallel Review Execution

Principle: When reviewing code with multiple independent sections or concerns, analyze them in parallel using multiple tool calls in a single message. This speeds up review without compromising thoroughness.

When to parallelize:

  • Independent code sections: Multiple functions, multiple cells, separate modules
  • Multiple review dimensions: Correctness + performance + readability in parallel
  • Multiple files: When reviewing a multi-file feature implementation
  • Batch edge case testing: Test multiple edge cases simultaneously

Examples:

Parallel section review:

code
Task: Review 3 independent functions in analysis pipeline
Execute in parallel:
- Review normalize_counts() for correctness
- Review filter_genes() for edge cases
- Review calculate_statistics() for performance

Parallel dimensional review:

code
Task: Comprehensive review of single complex function
Execute in parallel:
- Check correctness (logic, algorithms)
- Check edge cases (empty, zero, negative)
- Check performance (vectorization, memory)

Parallel file review:

code
Task: Review new feature spanning 4 files
Execute in parallel:
- Review data_loader.py for correctness
- Review preprocessor.py for edge cases
- Review analyzer.py for statistical validity
- Review visualizer.py for plotting issues

When NOT to parallelize:

  • Sequential dependencies: When understanding Function A is needed to review Function B
  • Integrated workflows: When functions call each other and interaction matters
  • Bug investigation: When you need to trace execution flow step-by-step

Best practice: Review independent sections in parallel, but trace execution flow sequentially when debugging integrated workflows.

Review Methodology

1. Correctness Review

Check for:

  • Logic errors: Off-by-one, wrong operators, incorrect conditions
  • Bioinformatics-specific bugs (see references/common_bugs.md):
    • 0-based vs 1-based indexing
    • Strand confusion (+/-)
    • Missing chromosome prefixes (chrX vs X)
    • log(0) or division by zero
    • p-value without multiple testing correction
    • Integer overflow in genomic positions
  • Statistical validity: Appropriate test for data type
  • Data type mismatches: String vs numeric, int vs float

2. Edge Case Testing

Check behavior with:

  • Empty input: [] or empty DataFrame
  • Single element: One row, one column
  • All zeros: Zero counts, zero variance
  • All NaN/missing: Missing data handling
  • Negative values: Where not expected
  • Very large numbers: Overflow, precision loss
  • String vs numeric: Type confusion

3. Performance Review

Check for:

  • Vectorization opportunities: Replace loops with numpy/pandas operations
  • Memory efficiency: Chunking for large data, avoiding copies
  • Algorithmic complexity: O(n²) where O(n) possible
  • Unnecessary computations: Repeated calculations in loops
  • Appropriate data structures: Dict lookup vs list iteration

4. Reproducibility Review

Check for:

  • Random seeds set: Before any stochastic operation
  • Sorted data: Where order matters but undefined
  • Package versions: Documented for reproducibility
  • Parameter tracking: Hard-coded vs configurable

5. Readability Review

Check for:

  • Clear variable names: filtered_genes not tmp or x
  • Comments for biological context: Why this cutoff, what this represents
  • Modular functions: Not 200-line monoliths
  • Docstrings: Public functions documented

Review Severity Levels

🔴 CRITICAL

Fix before proceeding - Code will fail or produce wrong results

  • Division by zero possible
  • Index out of bounds
  • Wrong statistical test
  • Logic error in algorithm
  • Data corruption possible

🟠 MAJOR

Should fix - Code works but has significant issues

  • Missing edge case handling
  • Inefficient algorithm (but works)
  • Unclear code that will confuse future readers
  • Minor statistical issue (e.g., one-tailed vs two-tailed)

🟡 MINOR

Nice to have - Improvement suggestions

  • Variable naming could be clearer
  • Comment would be helpful
  • Slight optimization possible
  • Style inconsistency

✅ GOOD

Positive feedback - Reinforce good practices

  • Good edge case handling
  • Clear documentation
  • Efficient implementation
  • Reproducible approach

Review Template

Use the format in assets/review_template.md:

code
🔴 CRITICAL: [Issue description]
  Location: [File:Line or cell number]
  Problem: [What's wrong]
  Impact: [What will happen]
  Fix: [How to resolve]

🟠 MAJOR: [Issue description]
  Suggestion: [Improvement]

🟡 MINOR: [Suggestion]

✅ GOOD: [Positive feedback]

VERDICT: [APPROVED | NEEDS REVISION]

Adversarial Mindset

DO

  • Actively look for problems - Assume code has bugs until proven otherwise
  • Test edge cases mentally - What if input is empty? All zeros? Negative?
  • Challenge assumptions - Is this test appropriate? Is normalization needed?
  • Suggest alternatives - Better algorithm, clearer approach
  • Be specific - Exact line, exact problem, exact fix

DON'T

  • Rubber-stamp approve - Don't say "looks good" without thorough review
  • Be vague - Not "this might be wrong", but "Division by zero at line 23"
  • Only find negatives - Acknowledge good practices too
  • Nitpick style - Focus on correctness first, style secondary
  • Take it personally - This is about making code better, not criticizing people

Bioinformatics-Specific Checks

Consult references/common_bugs.md for detailed catalog.

Genomic Coordinates

python
# 🔴 CRITICAL: Off-by-one error
start = 100  # Is this 0-based or 1-based?
end = 200    # Is end inclusive or exclusive?

# ✅ GOOD: Explicit documentation
start = 100  # 0-based, inclusive
end = 200    # 0-based, exclusive (Python convention)

Normalization

python
# 🔴 CRITICAL: Division by zero
normalized = counts / counts.sum(axis=0)

# ✅ GOOD: Handle zero-sum columns
col_sums = counts.sum(axis=0)
normalized = counts / col_sums.where(col_sums > 0, np.nan)

Statistical Testing

python
# 🔴 CRITICAL: No multiple testing correction
sig_genes = genes[genes['p_value'] < 0.05]

# ✅ GOOD: FDR correction
from statsmodels.stats.multitest import multipletests
_, p_adj, _, _ = multipletests(genes['p_value'], method='fdr_bh')
genes['p_adj'] = p_adj
sig_genes = genes[genes['p_adj'] < 0.05]

Logarithms

python
# 🔴 CRITICAL: log(0) = -inf
log_expr = np.log(expression)

# ✅ GOOD: Pseudocount
log_expr = np.log1p(expression)  # log(1 + x), handles x=0

Integration Points

Working with Bioinformatician

code
Bioinformatician writes analysis code
    ↓
Copilot reviews each section as written
    ↓
Issues → Fix immediately → Re-review
    ↓
Approved sections → Continue to next

Working with Developer

code
Developer implements feature
    ↓
Copilot reviews code + tests
    ↓
Issues → Iterate until resolved
    ↓
Final approval before handoff

References

For detailed checklists and examples:

  • references/common_bugs.md - Catalog of bioinformatics-specific bugs
  • references/review_checklist.md - Systematic review process
  • references/performance_patterns.md - Optimization strategies
  • references/edge_cases.md - Test cases to always check

Example Review

Input Code:

python
def normalize_counts(counts):
    """Normalize counts to CPM."""
    return (counts / counts.sum()) * 1e6

Copilot Review:

code
🔴 CRITICAL: Division by zero possible
  Location: Line 3
  Problem: If counts.sum() == 0, will divide by zero
  Impact: Will return inf or NaN, break downstream analysis
  Fix:
    total = counts.sum()
    if total == 0:
        return np.zeros_like(counts)
    return (counts / total) * 1e6

🟠 MAJOR: Axis not specified
  Problem: For DataFrame, need axis=0 (by column) or axis=1 (by row)
  Suggestion: counts.sum(axis=0) for normalizing each sample

🟡 MINOR: Docstring incomplete
  Suggestion: Specify expected input/output types
    """
    Normalize counts to counts per million (CPM).

    Parameters
    ----------
    counts : pd.DataFrame or np.ndarray
        Raw count matrix (genes × samples)

    Returns
    -------
    pd.DataFrame or np.ndarray
        CPM-normalized counts
    """

✅ GOOD: Clear function name
✅ GOOD: Proper scaling factor (1e6 for CPM)

VERDICT: NEEDS REVISION (critical issue must be fixed)

Success Criteria

Review is complete when:

  • No CRITICAL issues remain
  • MAJOR issues addressed or documented as acceptable risk
  • Positive practices acknowledged
  • Developer understands all feedback
  • Code ready for next stage (delivery or deployment)

Calibration

Too lenient (avoid):

"Code looks good! ✅"

Appropriately adversarial (goal):

"🔴 CRITICAL: Line 42 will fail when input is empty. Test with empty DataFrame. 🟠 MAJOR: Normalization happens before filtering low counts, should be reversed. ✅ GOOD: Random seed properly set, results will be reproducible. VERDICT: NEEDS REVISION"

Remember: Your job is to find problems, not to be nice. Bugs caught in review are 100x cheaper than bugs in production.