Microservices Architecture

Name: Grantflow Architecture
Author: grantflow-ai

Services Overview

Backend API (services/backend)

•Framework: Litestar async with msgspec serialization
•Auth: Firebase JWT with organization claims
•URL pattern: /organizations/{id}/projects/{id}/applications/{id} (nested resources; each {id} is that resource's own identifier)
•WebSockets: /api/socket/grant-applications (real-time notifications)
•Webhooks: email notifications, grant matching, entity cleanup
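
A minimal sketch of how the nested URL pattern and the organization claim could work together. It uses only the standard library rather than Litestar itself. The regex, the function names, and the "organization_id" claim key are assumptions for illustration, not the backend's actual code.

```python
import re

# Hypothetical regex mirroring the nested route pattern; group names are illustrative.
ROUTE = re.compile(
    r"^/organizations/(?P<organization_id>[^/]+)"
    r"/projects/(?P<project_id>[^/]+)"
    r"/applications/(?P<application_id>[^/]+)$"
)


def parse_application_path(path: str) -> dict[str, str]:
    """Split a nested application path into its three identifiers."""
    match = ROUTE.match(path)
    if match is None:
        raise ValueError(f"not an application path: {path!r}")
    return match.groupdict()


def is_authorized(claims: dict, path: str) -> bool:
    """Allow access only when the token's organization matches the path."""
    ids = parse_application_path(path)
    # Assumption: the decoded JWT carries an "organization_id" custom claim.
    return claims.get("organization_id") == ids["organization_id"]
```

For example, is_authorized({"organization_id": "org-1"}, "/organizations/org-1/projects/p1/applications/a1") returns True, while the same claims against an /organizations/org-2/... path return False.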

Document Indexing (services/indexer)

•Pipeline: PDF/DOC/HTML → kreuzberg extraction → chunks → embeddings
•Entity/keyword extraction with scientific configuration
•Token optimization: 35% token reduction through intelligent chunking
•Embeddings: stored in pgvector (384 dimensions, HNSW index)
•Trigger: Pub/Sub file-indexing topic
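
The chunking step of the pipeline can be sketched as a fixed-size word window with overlap. This is a deliberately plain stand-in: the indexer's "intelligent chunking" is not described here, and the function name and default sizes below are assumptions.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk so context is not cut at a hard boundary."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

With size=4 and overlap=1, the text "a b c d e f g h i j" yields "a b c d", "d e f g", "g h i j". Each chunk would then be embedded and stored as one pgvector row.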

Web Crawler (services/crawler)

•Technology: Playwright for JavaScript-heavy sites
•Strategy: recursive link extraction (max depth 2)
•Conversion: HTML → markdown (trafilatura)
•Deduplication: RagUrl table with URL tracking
•Rate limiting: respects robots.txt and adds delays between requests
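
The crawl strategy (recursive link extraction, max depth 2, URL deduplication) can be sketched as a breadth-first traversal. The get_links callable is a hypothetical stand-in for the Playwright page fetch, and the in-memory set stands in for the RagUrl table. The robots.txt check and request delays are left out.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> list[str]:
    """Visit pages breadth-first, following links at most `max_depth` hops
    from the seed and never visiting the same URL twice."""
    seen = {seed}
    visited = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not extract links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

A page that is three hops from the seed is never fetched, and a page linked from several places is fetched once.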

RAG Service (services/rag)

•Multi-stage pipelines: 3 stages for grant applications
•Pipeline stages: BLUEPRINT_PREP → WORKPLAN_GENERATION → SECTION_SYNTHESIS
•LLMs: Gemini 2.5 Flash (primary), Claude Sonnet (specialized tasks)
•Enrichment: Wikidata SPARQL for scientific context
•Evaluation: AI-based quality assessment with rubrics
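
The three pipeline stages can be sketched as an ordered enum with one handler per stage. Only the stage names come from this document. The handler signature, the shape of the state dict, and the "completed" key are assumptions for illustration.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    BLUEPRINT_PREP = 1
    WORKPLAN_GENERATION = 2
    SECTION_SYNTHESIS = 3


# Hypothetical handler type: takes the accumulated state, returns the new state.
Handler = Callable[[dict], dict]


def run_pipeline(handlers: dict[Stage, Handler], state: dict) -> dict:
    """Run every stage in definition order, threading state through each."""
    state = dict(state)  # avoid mutating the caller's dict
    for stage in Stage:  # Enum members iterate in definition order
        state = handlers[stage](state)
        state.setdefault("completed", []).append(stage.name)
    return state
```

Recording completed stages in the state makes it possible to resume or report progress after a failure, though whether the RAG service does so is not stated here.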

Grant Scraper (services/scraper)

•Sources: NIH Reporter, grants.gov
•Technology: Playwright for page downloads
•Extraction: eligibility, funding amounts, deadlines
•Notifications: Discord webhooks
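
A Discord webhook notification is a JSON POST with a "content" field. The sketch below only builds the request and does not send it. The function name, message wording, and parameters are assumptions, not the scraper's actual format.

```python
import json
import urllib.request


def build_discord_request(
    webhook_url: str, title: str, deadline: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    # Hypothetical message layout; Discord reads the text from "content".
    payload = {"content": f"New grant: {title} (deadline: {deadline})"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would deliver the message. The webhook URL itself should come from configuration, not source code.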

Real-time Collaboration

•CRDT Server (crdt/): Hocuspocus WebSocket with Y.js
•Editor (editor/): TipTap collaborative package
•Conflict resolution: CRDT for concurrent edits
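
To illustrate why a CRDT needs no central conflict handling, here is a minimal last-writer-wins map in Python. It is not the algorithm Y.js uses (Y.js implements a sequence CRDT for text); it only shows the property that matters: replicas can merge in any order and converge to the same state. The entry layout is an assumption.

```python
def merge(a: dict, b: dict) -> dict:
    """Merge two replicas of a last-writer-wins map.

    Each value is a tuple (timestamp, replica_id, data). The entry with the
    larger (timestamp, replica_id) pair wins, so ties on timestamp are broken
    deterministically by replica id.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged
```

Because merge is commutative and idempotent, merge(a, b) equals merge(b, a), and merging the same update twice changes nothing.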

Message Flow

User Upload → GCS → Pub/Sub (file-indexing)
              ↓
  Indexer: Extract → Chunk → Embed → PostgreSQL/pgvector
              ↓
  RAG: Process (rag-processing topic) → frontend-notifications
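
The flow above can be simulated with a small in-memory bus standing in for Pub/Sub. The topic names come from the diagram. The Bus class, the message fields, and the assumption that the indexer itself publishes to rag-processing are illustrative only.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """Tiny synchronous stand-in for a Pub/Sub client."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # topics in the order they were published

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        self.log.append(topic)
        for handler in self.subscribers[topic]:
            handler(message)


def wire(bus: Bus) -> None:
    """Connect hypothetical indexer and RAG handlers to their topics."""
    # Indexer: consumes file-indexing, then hands off to rag-processing.
    bus.subscribe(
        "file-indexing",
        lambda m: bus.publish("rag-processing", {**m, "indexed": True}),
    )
    # RAG: consumes rag-processing, then notifies the frontend.
    bus.subscribe(
        "rag-processing",
        lambda m: bus.publish("frontend-notifications", {**m, "status": "done"}),
    )
```

Publishing {"file": "proposal.pdf"} to file-indexing after calling wire(bus) produces one message on each topic, in the order file-indexing, rag-processing, frontend-notifications.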