Chainsaw - Semantic Code Search and Knowledge Graph
Description
Chainsaw is a GraphRAG-powered code search tool that indexes codebases into a searchable knowledge graph with vector embeddings. It enables semantic search (find code by meaning) and graph queries (explore relationships between functions, types, and packages).
Key capabilities:
- Semantic search using natural language queries
- Graph queries with Cypher-like syntax
- Multi-hop relationship traversal
- Aggregation queries (COUNT, GROUP BY, ORDER BY)
- Context-aware filtering (scoped to current directory)
- Background indexing daemon
When to Use
Use Chainsaw when you need to:
- Find code by meaning, not keywords - Search for "error handling patterns" instead of grepping for specific function names
- Explore code relationships - Find what calls what, what implements what, and how packages depend on each other
- Analyze codebase structure - Count most-called functions, find fan-out, visualize dependencies
- Navigate unfamiliar codebases - Discover relevant code without knowing exact names or locations
- Answer architectural questions - "What implements this interface?", "What are the call chains?"
Do NOT use when:
- You know the exact file/function name (use grep/find instead)
- You need to modify code (Chainsaw is read-only)
- You are working with non-code files (images, binaries, etc.)
Prerequisites
    # Install Ollama (for embeddings)
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull embedding model
    ollama pull nomic-embed-text

    # Initialize database (requires the chainsaw binary; see Installation)
    chainsaw init

    # Start daemon
    chainsaw daemon start
Installation
    git clone https://github.com/wouteroostervld/chainsaw.git
    cd chainsaw
    make install
Basic Usage
1. Index Your Codebase
    # Index current directory
    cd /path/to/your/project
    chainsaw index .

    # Check status
    chainsaw status
The daemon processes files in the background, extracting:
- Code chunks with semantic embeddings
- Entity relations (function calls, imports, type definitions)
- Knowledge graph structure
2. Semantic Search
Find code by meaning, not just keywords:
    # Find error handling code
    chainsaw search "error handling patterns"

    # Find database queries
    chainsaw search "SQL database connection pooling"

    # Find authentication logic
    chainsaw search "user authentication and session management"

    # Find specific patterns
    chainsaw search "HTTP middleware with error handling"
Context-aware: Results are automatically filtered to the current directory and subdirectories.
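The scoping rule can be pictured as a path-prefix check applied to each result. The sketch below is only an illustration of that idea, not Chainsaw's implementation; the `file` key on each result and both function names are assumptions made for the example.

```python
from pathlib import PurePosixPath


def in_scope(file_path: str, cwd: str) -> bool:
    """Return True if file_path is cwd itself or lies somewhere beneath it."""
    path = PurePosixPath(file_path)
    root = PurePosixPath(cwd)
    # Compare whole path components so /repo/apiv2 is not treated as inside /repo/api.
    return path == root or root in path.parents


def filter_to_cwd(results, cwd):
    """Keep only results whose (hypothetical) 'file' field is under cwd."""
    return [r for r in results if in_scope(r["file"], cwd)]
```

Comparing path components rather than raw string prefixes is the design point here: a plain `startswith` check would wrongly include sibling directories that share a name prefix.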
3. Graph Queries
Query code relationships using Cypher-like syntax:
    # Find function calls
    chainsaw graph query "MATCH (f:FUNCTION)-[:calls]->(t) RETURN f.name, t.name"

    # Find interface implementations
    chainsaw graph query "MATCH (i:INTERFACE)<-[:implements]-(s) RETURN i.name, s.name"

    # Multi-hop: Find call chains (up to 3 levels)
    chainsaw graph query "MATCH (a:FUNCTION)-[:calls*1..3]->(b) RETURN a.name, b.name"

    # Aggregation: Most-called functions
    chainsaw graph query "
      MATCH (a)-[:calls]->(b)
      RETURN b.name, COUNT(a) AS callers
      GROUP BY b.name
      ORDER BY callers DESC
      LIMIT 10
    "

    # Get code snippets with file paths
    chainsaw graph query "
      MATCH (f:FUNCTION)-[:calls]->(t)
      RETURN f.name, t.name, t.snippet, t.file, t.lines
    "
Advanced Usage
Entity Types
- `FUNCTION` - Functions and top-level functions
- `METHOD` - Methods on structs/classes
- `TYPE` - Type definitions
- `INTERFACE` - Interface definitions
- `STRUCT` - Struct/class definitions
- `PACKAGE` - Package/module imports
- `VARIABLE` - Variables
- `CONSTANT` - Constants
Relation Types
- `calls` - Function/method invocations
- `uses` - References and usage
- `imports` - Package imports
- `implements` - Interface implementations
- `extends` - Type extensions/embedding
- `defines` - Definitions
- `has_field` - Struct field relationships
Multi-Hop Queries
Use *min..max syntax for recursive traversal:
    # Exactly 2 hops
    chainsaw graph query "MATCH (a)-[:calls*2]->(b) RETURN a.name, b.name"

    # 1 to 3 hops
    chainsaw graph query "MATCH (a)-[:calls*1..3]->(b) RETURN a.name, b.name"

    # Unlimited hops (use carefully!)
    chainsaw graph query "MATCH (a)-[:calls*]->(b) RETURN a.name, b.name"
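To make the hop bounds concrete, here is a small sketch of what a bounded traversal computes on a toy call graph. It is not how Chainsaw evaluates queries; the `reachable` function and the edge-list input are hypothetical, and unlike a real graph query it returns distinct nodes rather than one row per path.

```python
def reachable(edges, start, min_hops=1, max_hops=3):
    """Nodes reachable from start via between min_hops and max_hops 'calls' edges."""
    adjacency = {}
    for caller, callee in edges:
        adjacency.setdefault(caller, set()).add(callee)

    found = set()
    frontier = {start}
    for depth in range(1, max_hops + 1):
        # Advance every node in the frontier by exactly one hop.
        frontier = {callee for node in frontier for callee in adjacency.get(node, ())}
        if not frontier:
            break
        if depth >= min_hops:
            found |= frontier
    return found
```

The loop bound is what keeps the traversal finite on graphs with cycles. Dropping the upper bound, as `*` does, removes that limit, which is why unbounded traversals deserve care.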
Aggregation Queries
    # Count by entity type
    chainsaw graph query "
      MATCH (n)
      RETURN n.entity_type, COUNT(n) AS count
      GROUP BY n.entity_type
      ORDER BY count DESC
    "

    # Find fan-out (functions that call many others)
    chainsaw graph query "
      MATCH (a:FUNCTION)-[:calls]->(b)
      RETURN a.name, COUNT(b) AS targets
      GROUP BY a.name
      ORDER BY targets DESC
      LIMIT 10
    "
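For readers less familiar with grouped aggregation, the fan-out query corresponds roughly to counting edges per caller and sorting by that count. The sketch below shows the same computation on an in-memory edge list; `top_fan_out` and the sample data shape are illustrative assumptions, not part of Chainsaw.

```python
from collections import Counter


def top_fan_out(edges, limit=10):
    """Count outgoing 'calls' edges per caller and return the largest counts first."""
    # One matched (caller, callee) row contributes one to the caller's count,
    # mirroring COUNT(b) ... GROUP BY a.name.
    counts = Counter(caller for caller, _callee in edges)
    # most_common sorts by count descending and truncates, like ORDER BY ... DESC LIMIT.
    return counts.most_common(limit)
```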
Output Formats
Both search and graph queries support YAML and JSON:
    chainsaw search "query" --format json
    chainsaw graph query "..." --format json | jq '.results[].f_name'
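When `jq` is not available, the JSON output can be post-processed in a few lines of Python. This sketch assumes the shape implied by the `jq` filter above: a top-level `results` array whose rows use keys such as `f_name`. The `column` helper is hypothetical, and the real output may carry additional fields.

```python
import json


def column(raw_json: str, key: str):
    """Extract one column from query output shaped like {"results": [{...}, ...]}."""
    payload = json.loads(raw_json)
    # Skip rows that lack the key instead of failing on partial results.
    return [row[key] for row in payload.get("results", []) if key in row]
```

A typical use would be to capture the command's standard output into a string and pass it to `column(output, "f_name")`.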
Configuration
Configuration file: ~/.chainsaw/config.yaml
    version: "2.0"
    active_profile: "default"
    profiles:
      default:
        # Directories to watch
        include:
          - ~/Projects/myproject
        # Directories to skip
        exclude:
          - node_modules
          - .git
          - vendor
        # File patterns
        whitelist:
          - "**/*.go"
          - "**/*.py"
          - "**/*.js"
          - "**/*.ts"
        # Embedding settings
        embedding_model: "nomic-embed-text"
        chunk_size: 512
        overlap: 64
        # Graph extraction
        graph_driver:
          model: "qwen2.5:3b"
          batch_size: 100
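The `chunk_size` and `overlap` settings suggest a sliding-window split, where each chunk repeats the tail of the previous one so that context is not lost at boundaries. The sketch below illustrates that idea only. It assumes both values are measured in tokens and that chunks are fixed-size windows; Chainsaw's actual chunker may use different units or split on syntax boundaries.

```python
def chunk(tokens, chunk_size=512, overlap=64):
    """Split tokens into windows of chunk_size that share `overlap` items with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults above, each window starts 448 tokens after the previous one, so a larger `overlap` means more chunks and more embedding work for the same file.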
Using Cloud LLMs for Graph Extraction
Keep embeddings local and use a cloud LLM for graph extraction:
    profiles:
      default:
        embedding_model: "nomic-embed-text"  # Local
        # Cloud LLM for graph extraction
        llm_provider: "openai"
        llm_base_url: "https://openrouter.ai/api/v1"
        llm_api_key: "${OPENROUTER_API_KEY}"
        graph_driver:
          model: "anthropic/claude-3.5-haiku"
          batch_size: 100
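The `${OPENROUTER_API_KEY}` placeholder implies that the key is read from the environment, although this document does not spell out the substitution rules. Under that assumption, a quick pre-flight check can catch an unset variable before the daemon starts. The `missing_env_vars` helper below is a hypothetical sketch, not a Chainsaw feature.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OPENROUTER_API_KEY}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return the sorted names of placeholders that are unset or empty in env."""
    env = os.environ if env is None else env
    return sorted({name for name in PLACEHOLDER.findall(config_text) if not env.get(name)})
```

Running it over the text of `~/.chainsaw/config.yaml` and getting an empty list back means every placeholder has a value.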
Tips and Best Practices
Search Tips
- Use natural language - "error handling" works better than "try catch"
- Be specific - "HTTP middleware with authentication" is better than just "middleware"
- Check context - Search is scoped to the current directory
- Increase limit - Default is 10 results; use `--limit 20` for more
Graph Query Tips
- Start simple - Begin with single-hop queries, then add complexity
- Use LIMIT - Always limit results to avoid overwhelming output
- Return snippets - Use `t.snippet` to see actual code in results
- Return file paths - Use `t.file` and `t.lines` to locate code
- Watch depth - Multi-hop queries can explode; start with `*1..2`
Performance Tips
- Index incrementally - The daemon watches for changes automatically
- Exclude noise - Add `node_modules`, `vendor`, `.git` to the exclude list
- Use whitelist - Specify file patterns to index only relevant files
- Monitor status - Run `chainsaw status` to check indexing progress
- Check daemon logs - `journalctl --user -u chainsawd -f`
Troubleshooting
    # Check daemon status
    systemctl --user status chainsawd

    # View daemon logs
    journalctl --user -u chainsawd -f

    # Verify database
    chainsaw status

    # Check Ollama
    ollama list
    ollama pull nomic-embed-text

    # Rebuild binary
    cd ~/Projects/chainsaw
    make install
    hash -r  # Clear shell cache
Common Patterns
Finding Code Entry Points
    chainsaw search "main function application entry"
    chainsaw graph query "MATCH (m:FUNCTION) WHERE m.name = 'main' RETURN m.snippet, m.file"
Exploring Dependencies
    # What does this package import?
    chainsaw graph query "MATCH (p:PACKAGE)-[:imports]->(t) WHERE p.name = 'api' RETURN t.name"

    # What imports this package?
    chainsaw graph query "MATCH (p)-[:imports]->(t:PACKAGE) WHERE t.name = 'utils' RETURN p.name"
Finding Implementations
    # What implements this interface?
    chainsaw graph query "MATCH (i:INTERFACE)<-[:implements]-(s) WHERE i.name = 'Handler' RETURN s.name, s.snippet"
Analyzing Call Graphs
    # Direct calls
    chainsaw graph query "MATCH (a:FUNCTION)-[:calls]->(b) WHERE a.name = 'processRequest' RETURN b.name"

    # Call chains
    chainsaw graph query "MATCH (a:FUNCTION)-[:calls*1..3]->(b) WHERE a.name = 'main' RETURN b.name, b.file"
Integration with AI Agents
Chainsaw's YAML output is designed for AI agent consumption:
    # Search for relevant code
    CONTEXT=$(chainsaw search "database schema migrations" --format yaml)

    # Query relationships
    GRAPH=$(chainsaw graph query "MATCH (m:FUNCTION)-[:calls]->(d) WHERE m.name =~ '.*migrate.*' RETURN m.name, d.name" --format yaml)

    # Feed both the search results and the call graph to the LLM
    printf '%s\n\n%s\n' "$CONTEXT" "$GRAPH" | llm "Explain these database migration patterns"
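An agent written in Python can drive the same commands through `subprocess`. The sketch below only builds the argument lists, using the subcommands and the `--format` and `--limit` flags that appear in this document; the helper names are hypothetical.

```python
def search_argv(query, fmt="yaml", limit=None):
    """Argument list for `chainsaw search`, with an optional result limit."""
    argv = ["chainsaw", "search", query, "--format", fmt]
    if limit is not None:
        argv += ["--limit", str(limit)]
    return argv


def graph_argv(cypher, fmt="yaml"):
    """Argument list for `chainsaw graph query`."""
    return ["chainsaw", "graph", "query", cypher, "--format", fmt]
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True, check=True)` skips the shell entirely, so the quotes inside a query need no extra escaping.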
Example Workflows
Understanding a New Codebase
    # 1. Index the codebase
    cd /path/to/project
    chainsaw index .

    # 2. Find entry points
    chainsaw search "main application entry point"

    # 3. Explore main dependencies
    chainsaw graph query "MATCH (m:FUNCTION)-[:calls]->(t) WHERE m.name = 'main' RETURN t.name"

    # 4. Find key interfaces
    chainsaw graph query "MATCH (i:INTERFACE) RETURN i.name, i.snippet LIMIT 10"

    # 5. See what implements them
    chainsaw graph query "MATCH (i:INTERFACE)<-[:implements]-(s) RETURN i.name, s.name"
Refactoring Analysis
    # Find all callers of a function
    chainsaw graph query "MATCH (a)-[:calls]->(t:FUNCTION) WHERE t.name = 'oldFunction' RETURN a.name, a.file, a.lines"

    # Find transitive dependencies
    chainsaw graph query "MATCH (a:FUNCTION)-[:calls*]->(b) WHERE a.name = 'criticalFunction' RETURN b.name"

    # Find most-coupled components
    chainsaw graph query "
      MATCH (a:FUNCTION)-[:calls]->(b)
      RETURN a.name, COUNT(b) AS dependencies
      GROUP BY a.name
      ORDER BY dependencies DESC
      LIMIT 20
    "
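When the same refactoring query is run for many functions, it helps to generate the query text instead of editing it by hand. The sketch below fills a function name into the callers query shown above. Because this document does not describe how the query language escapes quotes, the hypothetical `callers_query` helper simply rejects names that are not plain identifiers.

```python
import re

# Letters, digits, underscores and dots only, so a name can never close the quoted string.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def callers_query(function_name):
    """Build the 'find all callers' query for one function name."""
    if not _IDENTIFIER.match(function_name):
        raise ValueError(f"not a plain identifier: {function_name!r}")
    return (
        "MATCH (a)-[:calls]->(t:FUNCTION) "
        f"WHERE t.name = '{function_name}' "
        "RETURN a.name, a.file, a.lines"
    )
```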
Limitations
- Read-only - Cannot modify code
- No cross-project queries - Search is scoped to the current directory tree
- Language-agnostic extraction - Graph quality depends on the LLM model
- Index lag - New changes take a few seconds to index
- No version control - Searches current state only, not history
Resources
- Manual: See `MANUAL.md` for complete reference
- Developer Guide: See `DEVELOPER.md` for architecture
- Examples: Check `examples/` directory
- Issues: https://github.com/wouteroostervld/chainsaw/issues
Version
Compatible with Chainsaw v0.1.0+ (Schema version 2.4.0)