Yearn Docs: hosted retrieval
Inputs
- •
BASE(optional): The docs site origin to query (e.g. a Vercel preview). Defaults tohttps://docs.yearn.fi.
Example:
BASE=https://docs.yearn.fi
Tip: if you use a bash one-liner like BASE=... python3 ... --base "$BASE" ..., $BASE expands before the assignment takes effect. Prefer either --base 'https://...' directly, or export BASE=... first, or omit --base and rely on the default.
One-approval workflow (recommended)
To avoid repeated network approvals in restricted environments, use the bundled helper. It auto-checks for updates (conditional request) and only downloads docs.jsonl when it has changed, so you typically need only one approval per invocation:
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" search "yCHAD multisig signers"
If you want absolutely no network calls, force offline mode (no network at all):
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" --offline search "veYFI gauge"
To fetch a page by route:
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" get /developers/security/multisig
To clear cached files for that BASE:
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" cleanup
To see whether your cached data is stale (age/etag/manifest info):
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" status
To check the remote for updates and refresh the cache if needed (will require network approval in restricted environments):
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" --check-updates status
To skip update checks and use cached data (if present):
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" --no-auto-update search "multisig signers"
To force a full re-download (even if unchanged):
python3 skills/yearn-docs-site-query/scripts/yearn_docs_query.py --base "$BASE" --refresh status
Endpoints (relative to BASE)
- •
GET /llms.txt— human/agent pointer file. - •
GET /ai/manifest.json— machine-readable pointers (paths, counts, origin). - •
GET /ai/docs.jsonl— plaintext corpus (JSON Lines: 1 JSON object per doc page). - •
GET /ai/raw/...— optional raw.md/.mdxsource mirror (if present in corpus records).
Retrieval workflow
- •Fetch
GET {BASE}/ai/manifest.jsonandGET {BASE}/llms.txt. - •Download
GET {BASE}/ai/docs.jsonland parse it as JSONL (stream line-by-line). - •Rank candidate records for the user query:
- •Tokenize the query and score matches.
- •Boost matches in
titleandheadings[].textover matches intext.
- •Select the top records (e.g. 3–8), then select the best sections:
- •Split
textby headings and keep the most relevant chunks.
- •Split
- •Answer using only the selected chunks (don’t hallucinate fields not present in the docs).
Record shape (docs.jsonl)
Expect at least:
- •
title: page title - •
headings: array of{ level, id?, text } - •
text: extracted plaintext (may include fenced code blocks) - •
route: the path on the site host (e.g./developers/addresses) - •
url: canonical URL for the page (often onhttps://docs.yearn.fi/...) - •
source.htmlPath: where it came from in the static build
Optional/newer:
- •
canonicalUrl - •
source.rawPath: path under/ai/raw/...for exact.md/.mdx
Citing and linking
- •Prefer citing the canonical URL in the record:
- •Use
canonicalUrlif present, elseurl.
- •Use
- •If
urlpoints to the wrong origin for the environment you’re using, citeBASE + route.
Exact-source fallback (for precise quotes/formatting)
When you need exact formatting, tables, or the original MDX:
- •If
source.rawPathexists:GET {BASE}{source.rawPath}and quote from that. - •Otherwise fetch the rendered page:
GET {BASE}{route}and extract the relevant section from HTML.
Caching
- •Cache
GET {BASE}/ai/docs.jsonlper session. - •If you can, revalidate with
ETag/If-None-Matchon subsequent runs.