Knowledge Capture - Documentation Sync from Conversations
Automatically updates project documentation based on session discoveries and learnings.
Purpose
Problem: Valuable knowledge gets lost in conversations
- Debugging sessions reveal undocumented gotchas
- Architecture decisions made but not written down
- Solutions found but not added to troubleshooting guides
- Performance discoveries not captured
Solution: knowledge-capture analyzes your session and updates docs
Quick Commands
knowledge-capture sync                                # Update docs from current session
knowledge-capture discover                            # Show what could be documented (dry-run)
knowledge-capture --target=docs/TROUBLESHOOTING.md    # Update specific file
How It Works
1. Session Analysis
Scans conversation for:
- Bugs found → docs/troubleshooting/
- Architecture decisions → docs/architecture/
- Performance insights → docs/performance/
- Setup steps → README.md, SETUP.md
- Gotchas & warnings → Relevant guide sections
- API changes → API documentation
2. Smart Categorization
Determines where to update:
Discovery: "Spark jobs fail with OOM when partition count < 1000"
→ Target: docs/troubleshooting/SPARK_ISSUES.md
→ Section: ## OutOfMemory Errors
→ Content: Add as known issue with solution

Decision: "Using Iceberg instead of Parquet for better schema evolution"
→ Target: docs/architecture/DATA_STORAGE.md
→ Section: ## Storage Format Choice
→ Content: Add rationale and trade-offs

Gotcha: "AWS credentials must be refreshed every 12 hours"
→ Target: SETUP.md
→ Section: ## AWS Configuration
→ Content: Add warning with solution
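The routing above can be sketched as a small keyword classifier. This is only an illustration: the `ROUTES` table, the `CUES` keywords, and the `categorize` function are hypothetical names, and the tool's actual categorization rules are not described in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table based on the categories listed above.
ROUTES = {
    "bug": "docs/troubleshooting/",
    "decision": "docs/architecture/",
    "performance": "docs/performance/",
    "setup": "SETUP.md",
}

# Illustrative keyword cues (assumed); the first matching entry wins.
CUES = [
    ("decision", ("instead of", "we chose", "decided")),
    ("bug", ("fail", "error", "crash", "oom")),
    ("performance", ("faster", "slower", "latency", "throughput")),
    ("setup", ("must be", "configure", "install", "credentials")),
]


@dataclass
class Route:
    kind: str
    target: str


def categorize(discovery: str) -> Optional[Route]:
    """Return the first matching route, or None when no cue matches."""
    text = discovery.lower()
    for kind, keywords in CUES:
        if any(keyword in text for keyword in keywords):
            return Route(kind, ROUTES[kind])
    return None


route = categorize("AWS credentials must be refreshed every 12 hours")
print(route.target if route else "nothing to document")
```

A real implementation would likely need more than substring matching, but the shape is the same: classify the finding, then look up its target.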
3. Documentation Update
Creates structured updates:
## [Issue Type] - [Brief Title]
**Discovered**: 2025-11-20
**Context**: [Where this came from]
**Problem**: [What went wrong / what we needed to know]
**Solution**: [How we fixed it / what to do]
**Example**: ```[code/command]```
**Related**: [Links to code, commits, issues]
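As a rough sketch, the template above could be filled in from a small record type. The `Entry` class and `render` function are hypothetical and only mirror the template's fields; they are not part of the tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """Fields mirror the template above; the class itself is hypothetical."""
    issue_type: str
    title: str
    context: str
    problem: str
    solution: str
    example: str = ""
    related: list[str] = field(default_factory=list)
    discovered: date = field(default_factory=date.today)


def render(entry: Entry) -> str:
    """Render one entry as Markdown, skipping empty optional fields."""
    lines = [
        f"## {entry.issue_type} - {entry.title}",
        f"**Discovered**: {entry.discovered.isoformat()}",
        f"**Context**: {entry.context}",
        f"**Problem**: {entry.problem}",
        f"**Solution**: {entry.solution}",
    ]
    if entry.example:
        lines.append(f"**Example**: `{entry.example}`")
    if entry.related:
        lines.append("**Related**: " + ", ".join(entry.related))
    return "\n".join(lines) + "\n"


print(render(Entry(
    issue_type="Troubleshooting",
    title="Spark OOM with low partition counts",
    context="Debugging session",
    problem="Executors run out of memory on large datasets",
    solution="Raise spark.sql.shuffle.partitions",
)))
```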
Commands
knowledge-capture sync
Interactive mode - recommends updates
Analyzing session...

Found 3 documentation opportunities:

1. Troubleshooting: Spark OOM with low partition counts
   Target: docs/troubleshooting/SPARK_ISSUES.md
   Section: ## OutOfMemory Errors

2. Architecture: Why we chose Iceberg over Parquet
   Target: docs/architecture/DATA_STORAGE.md
   Section: ## Storage Format

3. Setup gotcha: AWS credential refresh requirement
   Target: SETUP.md
   Section: ## AWS Configuration

Apply all? [y/N]: y

✅ Updated docs/troubleshooting/SPARK_ISSUES.md
✅ Updated docs/architecture/DATA_STORAGE.md
✅ Updated SETUP.md

Run "git diff" to review changes before committing.
knowledge-capture discover
Dry-run - shows what could be documented without making changes
Use when:
- You want to see findings before updating
- You are not sure whether the updates are valuable
- You are just exploring what was learned
knowledge-capture --target=<file>
Update specific documentation file
# Only update troubleshooting guide
knowledge-capture --target=docs/troubleshooting/SPARK_ISSUES.md

# Only update README
knowledge-capture --target=README.md
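For readers who want to script around these commands, here is a minimal `argparse` sketch of the command surface described in this section. It is not the tool's implementation. It assumes that a bare `--target` behaves like `sync` restricted to one file, which this document does not state explicitly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented commands: sync, discover, and --target=<file>."""
    parser = argparse.ArgumentParser(prog="knowledge-capture")
    # Assumption: with no subcommand, fall back to "sync".
    parser.add_argument("command", nargs="?", choices=["sync", "discover"], default="sync")
    parser.add_argument("--target", metavar="FILE", default=None,
                        help="only update this documentation file")
    return parser


args = build_parser().parse_args(["--target=README.md"])
print(args.command, args.target)
```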
Document Types Supported
1. Troubleshooting Guides
Pattern detected:
- Error messages in conversation
- "fixed by..." statements
- Root cause analysis
Update format:
### [Error Type]: [Brief Description]
**Symptoms**: What you'll see
**Cause**: Why it happens
**Solution**: How to fix
**Prevention**: How to avoid
**Example**:
[error log snippet]
[fix command]
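The troubleshooting cues listed above (error messages and "fixed by..." statements) could be picked out with simple regular expressions. The patterns and the `find_troubleshooting_cues` function below are assumptions made for illustration; the tool's real detection logic is not specified here.

```python
import re

# Illustrative patterns only (assumed).
ERROR_RE = re.compile(r"\b[A-Z]\w*(?:Error|Exception)\b")  # e.g. OutOfMemoryError
FIX_RE = re.compile(r"\bfixed by\s+(.+)", re.IGNORECASE)   # text after "fixed by"


def find_troubleshooting_cues(messages: list[str]) -> dict[str, list[str]]:
    """Collect error names and 'fixed by ...' statements from messages."""
    errors: list[str] = []
    fixes: list[str] = []
    for message in messages:
        errors.extend(ERROR_RE.findall(message))
        match = FIX_RE.search(message)
        if match:
            fixes.append(match.group(1).strip())
    return {"errors": errors, "fixes": fixes}


cues = find_troubleshooting_cues([
    "Executors crash with OutOfMemoryError",
    "Fixed by setting spark.sql.shuffle.partitions=2000",
])
print(cues)
```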
2. Architecture Documentation
Pattern detected:
- "Why we chose X over Y"
- Technology selection discussions
- Design trade-offs
Update format:
## [Component/Decision]
**Decision**: What we chose
**Alternatives**: What we considered
**Rationale**: Why this choice
**Trade-offs**: Pros/cons
**Date**: When decided
3. Setup/Configuration
Pattern detected:
- "Had to configure X"
- "Missing prerequisite Y"
- Installation steps
Update format:
## [Step/Component]
[Step-by-step instructions]
**Gotcha**: [Common issue]
**Solution**: [How to fix]
4. Performance Guides
Pattern detected:
- "Optimized by..."
- Performance measurements
- Benchmarking results
Update format:
## [Optimization]
**Before**: [baseline metrics]
**After**: [improved metrics]
**Change**: [what we did]
**Impact**: [improvement %]
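The **Impact** field can be derived from the **Before** and **After** values. A minimal sketch, assuming a lower-is-better metric such as runtime or memory (the function name is illustrative):

```python
def improvement_percent(before: float, after: float) -> float:
    """Percent reduction relative to the baseline (lower is better)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# A job that went from 200 s to 150 s improved by 25%.
print(f"{improvement_percent(200, 150):.1f}%")
```

For higher-is-better metrics such as throughput, the sign would need to be flipped.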
5. API Documentation
Pattern detected:
- New endpoints discussed
- API behavior clarified
- Request/response examples
Update format:
### `[METHOD] /path/to/endpoint`
**Purpose**: What it does
**Request**: [schema/example]
**Response**: [schema/example]
**Example**: [code]
Smart Features
Duplicate Detection
Checks existing docs before adding:
- "This OOM error already documented"
- "Similar architecture decision exists"
- Merges into an existing section instead of creating a new one
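One simple way to approximate this check is to compare a proposed title against the headings already in the target file. The sketch below uses `difflib.SequenceMatcher` from the standard library. The function name and the 0.6 threshold are assumptions, and the tool may use a different similarity method.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Match level-2 and level-3 Markdown headings.
HEADING_RE = re.compile(r"^#{2,3}[ \t]+(.*)$", re.MULTILINE)


def find_similar_heading(doc_text: str, new_title: str,
                         threshold: float = 0.6) -> Optional[str]:
    """Return the closest existing heading at or above the threshold, else None."""
    best_title, best_score = None, threshold
    for heading in HEADING_RE.findall(doc_text):
        candidate = heading.strip()
        score = SequenceMatcher(None, candidate.lower(), new_title.lower()).ratio()
        if score >= best_score:
            best_title, best_score = candidate, score
    return best_title


existing = "## OutOfMemory Errors\nDetails...\n## Slow Shuffles\nDetails...\n"
print(find_similar_heading(existing, "OutOfMemory Error"))
```

When a match is found, the new finding would be merged into that section. Otherwise a new section is created.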
Context Preservation
Includes:
- Session date
- Related commit SHAs
- File paths mentioned
- Commands used
- Links to relevant code
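Some of this context can be pulled from the session text with heuristics. The following sketch extracts commit-like hashes and documentation-style file paths. The regular expressions, the `SessionContext` class, and the sample SHA are illustrative assumptions, not the tool's behavior.

```python
import re
from dataclasses import dataclass, field

# Heuristic: 7-40 hex characters containing at least one digit and one letter.
SHA_RE = re.compile(r"\b(?=[0-9a-f]*[0-9])(?=[0-9a-f]*[a-f])[0-9a-f]{7,40}\b")
# Heuristic: path-like tokens ending in a few common extensions (assumed set).
PATH_RE = re.compile(r"\b[\w./-]+\.(?:md|py|json|yaml|yml)\b")


@dataclass
class SessionContext:
    session_date: str
    commit_shas: list[str] = field(default_factory=list)
    file_paths: list[str] = field(default_factory=list)


def extract_context(session_text: str, session_date: str) -> SessionContext:
    """Collect commit SHAs and file paths mentioned in a session."""
    return SessionContext(
        session_date=session_date,
        commit_shas=SHA_RE.findall(session_text),
        file_paths=PATH_RE.findall(session_text),
    )


ctx = extract_context(
    "Fixed in commit 3f2a9c1, see docs/troubleshooting/SPARK_ISSUES.md and SETUP.md.",
    "2025-11-20",
)
print(ctx)
```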
Minimal Noise
Won't document:
- Trivial findings
- One-off issues
- Already well-documented topics
- Conversational noise
Will document:
- Non-obvious solutions
- Gotchas that wasted time
- Important architecture decisions
- Performance insights
- Setup issues that others will hit
Integration with TaskFlow
Works independently OR with TaskFlow:
# Standalone usage
knowledge-capture sync

# With TaskFlow (updates both)
taskflow capture          # Updates BACKLOG with task context
knowledge-capture sync    # Updates docs with learnings

# Or combined
taskflow capture --with-docs
→ Updates BACKLOG.md with task progress
→ Updates project docs with learnings
File Structure
Expects/creates:
docs/
├── troubleshooting/
│   ├── SPARK_ISSUES.md
│   ├── AWS_ISSUES.md
│   └── API_ISSUES.md
├── architecture/
│   ├── DATA_STORAGE.md
│   ├── API_DESIGN.md
│   └── DEPLOYMENT.md
├── performance/
│   └── OPTIMIZATIONS.md
└── INDEX.md
README.md
SETUP.md
Auto-creates missing files with templates.
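A minimal sketch of what auto-creation might look like, using `pathlib`. The `TEMPLATE` text, the `EXPECTED_FILES` subset, and the `ensure_files` function are hypothetical. The actual templates the tool writes are not shown in this document.

```python
from pathlib import Path

# Placeholder template (assumed); the real template content is not documented.
TEMPLATE = "# {title}\n\n_No entries yet._\n"

# A subset of the layout shown above.
EXPECTED_FILES = [
    "docs/troubleshooting/SPARK_ISSUES.md",
    "docs/architecture/DATA_STORAGE.md",
    "docs/performance/OPTIMIZATIONS.md",
    "docs/INDEX.md",
]


def ensure_files(root: Path, relative_paths: list[str] = EXPECTED_FILES) -> list[Path]:
    """Create any missing files from the template; leave existing files untouched."""
    created = []
    for rel in relative_paths:
        path = root / rel
        if path.exists():
            continue
        path.parent.mkdir(parents=True, exist_ok=True)
        title = path.stem.replace("_", " ").title()
        path.write_text(TEMPLATE.format(title=title), encoding="utf-8")
        created.append(path)
    return created
```

Calling `ensure_files(Path("."))` twice is safe: the second call finds every file present and returns an empty list.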
Configuration
Optional .knowledge-capture.json:
{
  "targets": {
    "troubleshooting": "docs/troubleshooting/",
    "architecture": "docs/architecture/",
    "setup": "SETUP.md",
    "readme": "README.md"
  },
  "autoCommit": false,
  "requireApproval": true,
  "minImportance": "medium"
}
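The sketch below shows one way such a file could be loaded and applied. It assumes the values in the example above double as defaults and that `minImportance` uses a low/medium/high scale. Neither is confirmed by this document, and `load_config` and `meets_threshold` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed defaults, copied from the example configuration above.
DEFAULTS = {
    "targets": {
        "troubleshooting": "docs/troubleshooting/",
        "architecture": "docs/architecture/",
        "setup": "SETUP.md",
        "readme": "README.md",
    },
    "autoCommit": False,
    "requireApproval": True,
    "minImportance": "medium",
}

# Assumed importance scale, lowest to highest.
IMPORTANCE_ORDER = ["low", "medium", "high"]


def load_config(path: Path = Path(".knowledge-capture.json")) -> dict:
    """Merge the optional config file over the defaults."""
    config = {**DEFAULTS, "targets": dict(DEFAULTS["targets"])}
    if path.exists():
        user = json.loads(path.read_text(encoding="utf-8"))
        config["targets"].update(user.pop("targets", {}))
        config.update(user)
    return config


def meets_threshold(importance: str, config: dict) -> bool:
    """True when a finding is at least as important as minImportance."""
    return (IMPORTANCE_ORDER.index(importance)
            >= IMPORTANCE_ORDER.index(config["minImportance"]))


config = load_config()
print(meets_threshold("high", config), meets_threshold("low", config))
```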
Example Session
Conversation:
User: "Help debug this Spark OOM error"
[... debugging ...]
Assistant: "Found the issue - partition count was too low.
Setting spark.sql.shuffle.partitions=2000 fixed it.
OOMs happen when < 1000 partitions with large datasets."
knowledge-capture sync detects:
📚 Documentation Opportunity Detected
Type: Troubleshooting
Title: Spark OOM with Low Partition Counts
Target: docs/troubleshooting/SPARK_ISSUES.md
Proposed Update:
─────────────────────────────────────────
### OutOfMemory: Insufficient Partition Count
**Symptoms**:
- Spark executors crash with OutOfMemoryError
- Jobs succeed with small data, fail with production data
**Cause**:
Too few shuffle partitions (<1000) for large datasets
**Solution**:
```python
spark.conf.set("spark.sql.shuffle.partitions", "2000")
```
**Prevention**:
Rule of thumb: 1 partition per 128MB of data

**Session**: 2025-11-20
**Related**: [commit abc123]
─────────────────────────────────────────
Add this to docs/troubleshooting/SPARK_ISSUES.md? [Y/n]:
## Benefits

- ✅ **Knowledge preserved** - Nothing gets lost
- ✅ **Zero manual work** - Automatic extraction
- ✅ **Team benefit** - Others learn from your debugging
- ✅ **Better docs** - Always up-to-date
- ✅ **Searchable** - Findings easy to find later

## When to Use

**After sessions involving:**
- Debugging complex issues
- Making architecture decisions
- Finding performance bottlenecks
- Discovering setup gotchas
- Clarifying API behavior
- Learning new parts of the codebase

**Don't use for:**
- Routine tasks
- Sessions with no new learnings
- Trivial changes
- Conversations with no technical insights

---

*Automatically documents your discoveries so they're never lost.*