RAG Retrieval Skill

Query your local document knowledge base using semantic search and get AI-powered answers.

Overview

This skill runs RAG (Retrieval-Augmented Generation) queries against your locally indexed documents. It retrieves the most relevant documents with semantic search, then passes them to Claude Haiku to generate an answer.

Usage

code
/skill rag-retrieval "How to configure the API?"

Features

  • Semantic Search: Uses vector similarity to find relevant documents
  • Hybrid Retrieval: Combines vector search with keyword matching for better accuracy
  • Context-Aware Answers: Uses claude-haiku-4-5-20251001 to generate responses
  • Citation Support: Shows sources for generated answers
  • Performance Monitoring: Tracks query latency and accuracy
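As an illustration of the semantic search step, here is a minimal sketch of ranking documents by vector similarity. It assumes cosine similarity over in-memory embeddings; the skill's actual index format and similarity metric are not documented here, and the names `cosine_similarity`, `retrieve`, and the toy `index` are hypothetical. The `top_k` and `threshold` parameters mirror the documented `--top-k` and `--threshold` defaults.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=5, threshold=0.7):
    """Return up to top_k (doc_id, score) pairs with score >= threshold."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:top_k]


# Toy two-dimensional "embeddings" (hypothetical data, for illustration only).
index = {
    "auth.md": [1.0, 0.0],
    "errors.md": [0.0, 1.0],
    "api.md": [1.0, 1.0],
}
results = retrieve([1.0, 0.0], index, top_k=5, threshold=0.7)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In this toy index, errors.md is orthogonal to the query and is dropped by the threshold, which is the same effect that lowering `--threshold` is meant to counteract when no results come back.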

Arguments

  • query (required): Your question or search query
  • --top-k (optional): Number of documents to retrieve (default: 5)
  • --threshold (optional): Minimum similarity score (default: 0.7)
  • --mode (optional): Search mode - "hybrid", "vector", or "keyword" (default: "hybrid")
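The arguments above could be modeled with Python's standard `argparse` module as follows. This is a sketch of the documented interface only, not the skill's actual parser; the `build_parser` function name is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(prog="rag-retrieval")
    parser.add_argument("query", help="Your question or search query")
    parser.add_argument("--top-k", type=int, default=5,
                        help="Number of documents to retrieve")
    parser.add_argument("--threshold", type=float, default=0.7,
                        help="Minimum similarity score")
    parser.add_argument("--mode", choices=["hybrid", "vector", "keyword"],
                        default="hybrid", help="Search mode")
    return parser


# argparse exposes --top-k as the attribute top_k.
args = build_parser().parse_args(["How to handle errors?", "--top-k", "10"])
print(args.query, args.top_k, args.threshold, args.mode)
```

Restricting `--mode` with `choices` makes a misspelled mode fail at parse time rather than silently falling back to a default.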

Examples

Basic Query

code
/skill rag-retrieval "What is the authentication process?"

Retrieve More Context

code
/skill rag-retrieval "How to handle errors?" --top-k 10

Vector-Only Search

code
/skill rag-retrieval "API rate limits" --mode vector

Configuration

The skill uses the following configuration from config/default.yaml:

  • retrieval.top_k: Default number of documents to retrieve
  • retrieval.hybrid_ratio: Weight given to vector search relative to keyword search (0.7 = 70% vector, 30% keyword)
  • claude.model: LLM model for response generation
  • claude.max_tokens: Maximum response length
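To make the effect of retrieval.hybrid_ratio concrete, the sketch below blends the two scores as a weighted sum. This is one plausible reading of "0.7 = 70% vector" and not necessarily how the skill combines results (rank-based fusion is another common choice). It assumes both scores are already normalized to the range 0 to 1, and the function names and sample scores are hypothetical.

```python
def hybrid_score(vector_score, keyword_score, hybrid_ratio=0.7):
    """Weighted sum: hybrid_ratio for vector, the remainder for keyword."""
    if not 0.0 <= hybrid_ratio <= 1.0:
        raise ValueError("hybrid_ratio must be between 0 and 1")
    return hybrid_ratio * vector_score + (1.0 - hybrid_ratio) * keyword_score


def rank_hybrid(vector_scores, keyword_scores, hybrid_ratio=0.7):
    """Blend per-document scores; a document missing from one side scores 0 there."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    blended = {
        doc_id: hybrid_score(vector_scores.get(doc_id, 0.0),
                             keyword_scores.get(doc_id, 0.0),
                             hybrid_ratio)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized scores from the two retrievers.
vector_scores = {"auth.md": 0.9, "api.md": 0.6}
keyword_scores = {"api.md": 1.0, "errors.md": 0.5}
ranking = rank_hybrid(vector_scores, keyword_scores)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.2f}")
```

With these sample scores, api.md overtakes auth.md because an exact keyword match compensates for its weaker vector score; setting the ratio to 1.0 would reduce this to vector-only ranking.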

Performance

Typical latencies:

  • Vector search: <100ms
  • End-to-end response: <5 seconds
  • Indexing: ~0.5s per 100 documents
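To compare your own setup against these figures, a small timing helper along the following lines can wrap each stage. It is a generic sketch using only the standard library; the `timed` helper and the `vector_search` label are hypothetical and not part of the skill's monitoring code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("vector_search", timings):
    # Stand-in workload; replace with the real retrieval call.
    sum(i * i for i in range(10_000))

for label, elapsed_ms in timings.items():
    print(f"{label}: {elapsed_ms:.1f} ms")
```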

Requirements

  • Indexed documents in data/vectors/
  • Valid Anthropic API key in the ANTHROPIC_API_KEY environment variable
  • At least 2GB RAM for vector operations
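The first two requirements can be checked before running a query. The sketch below is a hypothetical preflight helper, not part of the skill. It only verifies that data/vectors/ exists and is non-empty and that ANTHROPIC_API_KEY is set; it does not validate the key against the API or check available memory.

```python
import os
from pathlib import Path


def preflight(vector_dir="data/vectors", env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    path = Path(vector_dir)
    # The index directory must exist and contain at least one entry.
    if not path.is_dir() or not any(path.iterdir()):
        problems.append(f"no indexed documents found in {vector_dir}/")

    # Only checks presence; a set-but-invalid key still fails at request time.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    return problems


for problem in preflight():
    print("WARNING:", problem)
```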

Troubleshooting

No Results Found

  • Ensure documents are indexed: python scripts/index.py --input data/documents
  • Lower the similarity threshold: --threshold 0.5
  • Try keyword mode if vector search returns nothing: --mode keyword

Slow Responses

  • Reduce the --top-k value (or retrieval.top_k in the configuration) for faster retrieval
  • Check if vector index is optimized for your document count
  • Monitor memory usage with python -m src.monitoring.tcp_server

API Errors

  • Verify that the ANTHROPIC_API_KEY environment variable is set and valid
  • Check API rate limits and quota
  • Review logs in logs/rag_cli.log