2026 · Production-style RAG · Measured retrieval

SEC RAG Analyst: Hybrid Retrieval over 10-K Filings

A production-style RAG assistant over SEC 10-K filings. Section-aware chunking feeds hybrid BM25-plus-dense retrieval, RRF fusion, and a cross-encoder rerank. The assistant returns grounded answers with inline citations, and a labeled eval measures retrieval quality.

System architecture

10-K (SEC EDGAR) → Chunk (section-aware) → BM25 (lexical) + Dense (BGE + FAISS) → RRF + rerank (cross-encoder) → Claude (cited answer)

Build spec

Source: SEC EDGAR 10-K, ticker → CIK
Retrieval: BM25 + BGE dense in FAISS, RRF-fused
Rerank: BAAI/bge-reranker-base cross-encoder
Generation: Claude with inline [n] citations
Eval: Section-recall@k + reranker ablation
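The lexical leg of the retriever is BM25. The project's own implementation is not shown on this page and may well use a library such as rank_bm25, so the sketch below is a from-scratch Okapi BM25 that makes the scoring visible. The whitespace tokenizer and the k1 = 1.5, b = 0.75 defaults are assumptions, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace split (assumption; a real pipeline would
    # likely also strip punctuation).
    return text.lower().split()


class BM25:
    """Okapi BM25 over a fixed list of chunk texts."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) plus document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 5) -> list[int]:
        scores = [self.score(query, i) for i in range(len(self.docs))]
        # sorted() is stable, so equal scores keep document order.
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```

Exact term matching is what lets BM25 catch tickers, line-item names, and defined terms that a dense embedding can blur together.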

Problem

10-K filings are a hard case for RAG: they run past 100 pages and are full of dense tables, repetitive boilerplate, and cross-references. On such documents, a naive pipeline that embeds every chunk and takes the top-k by cosine similarity tends to retrieve passages that look plausible but are wrong. In financial analysis, answers without provenance or measurement cannot be trusted.

Approach

An ingestion step resolves a ticker to its CIK, fetches the latest 10-K from SEC EDGAR, strips the HTML, and segments the text into Item sections. Section-aware chunking then slides word windows within each section, so no chunk straddles an Item boundary. A hybrid index combines BM25 lexical retrieval with BGE-small dense embeddings stored in FAISS, fuses the two rankings with Reciprocal Rank Fusion (RRF), and reranks the fused candidates with a BGE cross-encoder. Claude generates grounded answers with inline citations; when no API key is present, an extractive fallback answers instead. The system is served through a FastAPI /ask endpoint and a Streamlit app, and an eval measures section-recall@k and the reranker's lift.
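The ticker-to-CIK-to-10-K resolution can be sketched as pure functions over SEC's public JSON. This assumes the ticker map at https://www.sec.gov/files/company_tickers.json and the submissions index at data.sec.gov, whose `filings.recent` block holds parallel lists, newest first. The HTTP fetch is left out (SEC asks automated clients to send a descriptive User-Agent), and the function names are hypothetical rather than the project's own.

```python
def find_cik(ticker: str, company_tickers: dict) -> str:
    """Return the 10-digit, zero-padded CIK for a ticker.

    Assumes company_tickers.json is shaped like
    {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "..."}, ...}.
    """
    wanted = ticker.upper()
    for entry in company_tickers.values():
        if entry["ticker"].upper() == wanted:
            return str(entry["cik_str"]).zfill(10)
    raise KeyError(f"unknown ticker: {ticker}")


def submissions_url(cik10: str) -> str:
    # Filing index for one company, keyed by the zero-padded CIK.
    return f"https://data.sec.gov/submissions/CIK{cik10}.json"


def latest_10k_url(cik10: str, submissions: dict) -> str:
    """Build the primary-document URL of the most recent 10-K.

    Exact match on "10-K" skips amendments filed as "10-K/A".
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["accessionNumber"], recent["primaryDocument"])
    for form, accession, doc in rows:
        if form == "10-K":
            # Archive paths use the unpadded CIK and the accession number without dashes.
            return (
                "https://www.sec.gov/Archives/edgar/data/"
                f"{int(cik10)}/{accession.replace('-', '')}/{doc}"
            )
    raise LookupError("no 10-K among recent filings")
```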

Impact

It does the production-grade RAG work that tutorials skip (section-aware chunking, hybrid retrieval, reranking, and citations) and then measures it. A no-rerank ablation quantifies the cross-encoder's section-recall@k lift to show that it earns its added latency, and the whole pipeline runs offline end to end on a synthetic filing with the extractive fallback, needing no network access or API keys.
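One way to picture the no-rerank ablation is as a flag on the rerank stage. In this sketch the cross-encoder is abstracted as a callable that scores (query, passage) pairs; with sentence-transformers, `CrossEncoder("BAAI/bge-reranker-base").predict` has that shape, though how the project actually wires it is an assumption. The function name, `top_n`, and the `no_rerank` flag are hypothetical.

```python
from typing import Callable, Sequence

# A pair scorer maps [(query, passage), ...] to relevance scores, higher is better.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: PairScorer,
           top_n: int = 5, no_rerank: bool = False) -> list[str]:
    """Reorder RRF-fused candidates with a cross-encoder-style scorer.

    no_rerank=True keeps the fused order: the ablation arm of the eval.
    """
    if no_rerank:
        return candidates[:top_n]
    # The cross-encoder reads query and passage jointly, one forward pass per pair,
    # which is where the added latency comes from.
    scores = score_pairs([(query, c) for c in candidates])
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order[:top_n]]
```

Because both arms share the same fused candidate list, any recall difference between them is attributable to the reranker alone.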

Decisions & tradeoffs

RRF fusion over score normalization

Reciprocal Rank Fusion merges BM25 and dense results by rank, avoiding fragile cross-scorer normalization. The hybrid merge stays robust when lexical and dense scores live on incomparable scales.
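A minimal RRF sketch: each document earns 1 / (k + rank) from every ranking it appears in, and only ranks are used, so BM25 and cosine scores never need a common scale. The constant k = 60 is the value commonly used since the original RRF paper; the project's actual setting is an assumption here, as is the function name.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk ids: score(d) = sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, `rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])` ranks "b" first because it sits near the top of both lists, even though it leads neither.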

Extractive fallback when no API key

When no Anthropic API key is configured, the system returns the top passages extractively instead of failing. As a result, the whole pipeline and its eval run offline and key-free, which keeps them reproducible.
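A minimal sketch of the key check and the extractive path. It assumes the key is read from the ANTHROPIC_API_KEY environment variable (the one the Anthropic SDK reads by default) and abstracts the Claude call as a hypothetical `generate(question, context)` callable. Numbering passages [1], [2], ... lets both paths emit the same inline citation markers.

```python
import os
from typing import Callable, Optional

# Hypothetical LLM hook: (question, numbered_context) -> cited answer.
Generator = Callable[[str, str], str]


def numbered_context(passages: list[str]) -> str:
    # Number passages so the model can cite them inline as [n].
    return "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))


def answer(question: str, passages: list[str],
           generate: Optional[Generator] = None, max_passages: int = 3) -> str:
    top = passages[:max_passages]
    if os.environ.get("ANTHROPIC_API_KEY") and generate is not None:
        return generate(question, numbered_context(top))
    # Extractive fallback: return the top passages verbatim, each tagged
    # with the citation marker the generated answer would have used.
    return "\n".join(f"{p} [{i}]" for i, p in enumerate(top, start=1))
```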

Section-aware chunking

Chunks are sliding word windows confined to a single Item section rather than fixed-size splits that ignore document structure, so a chunk never straddles a section boundary. Retrieved context stays coherent in a document type full of cross-references and boilerplate.
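A sketch of section-aware chunking: split the plain text at Item headings, then slide overlapping word windows inside each section only. The heading regex, the 200-word window, and the 150-word stride are illustrative assumptions, and the function names are hypothetical. Real 10-K text needs a more forgiving heading parser, since the table of contents repeats every Item heading.

```python
import re

# Matches headings such as "Item 1.", "Item 1A.", "ITEM 7." at line start (assumption).
ITEM_RE = re.compile(r"^\s*item\s+(\d+[a-z]?)\.", re.IGNORECASE | re.MULTILINE)


def split_items(text: str) -> list[tuple[str, str]]:
    """Split filing text into (item_id, section_text); text before Item 1 is dropped."""
    matches = list(ITEM_RE.finditer(text))
    sections = []
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        sections.append((m.group(1).upper(), text[m.start():end]))
    return sections


def chunk_sections(text: str, window: int = 200, stride: int = 150) -> list[dict]:
    """Overlapping word windows that never cross an Item boundary."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    chunks = []
    for item, body in split_items(text):
        words = body.split()
        start = 0
        while True:
            chunks.append({"item": item, "text": " ".join(words[start:start + window])})
            if start + window >= len(words):
                break  # last window of this section reached its end
            start += stride
    return chunks
```

Tagging each chunk with its Item id also gives the eval a label to check retrieved chunks against.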

System notes

  • Four-stage retrieval (BM25, dense, RRF fusion, cross-encoder rerank) where each stage's contribution is independently measurable
  • A no-rerank ablation flag measures the cross-encoder's section-recall@k lift, showing that it earns its added latency
  • Section-aware chunking never crosses an Item boundary, avoiding cross-reference contamination
  • A fully offline path with a synthetic 10-K and extractive fallback runs end to end with no network and no key
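The page does not spell out section-recall@k; one plausible reading, assumed in this sketch, is the fraction of labeled questions whose gold Item appears among the sections of the top-k retrieved chunks. Running the same labeled set with and without reranking then gives the reranker's lift as a simple difference. All function names here are hypothetical.

```python
Run = tuple[list[str], str]  # (Item ids of retrieved chunks in rank order, gold Item id)


def section_hit(retrieved_items: list[str], gold_item: str, k: int) -> bool:
    # A hit if any of the top-k retrieved chunks comes from the gold Item.
    return gold_item in retrieved_items[:k]


def section_recall_at_k(runs: list[Run], k: int) -> float:
    """Mean hit rate over one (retrieved, gold) pair per labeled question."""
    if not runs:
        return 0.0
    return sum(section_hit(r, g, k) for r, g in runs) / len(runs)


def rerank_lift(with_rerank: list[Run], without_rerank: list[Run], k: int) -> float:
    # Positive lift means the cross-encoder surfaced gold sections earlier.
    return section_recall_at_k(with_rerank, k) - section_recall_at_k(without_rerank, k)
```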

Stack

BM25 · FAISS · BGE · Cross-encoder · Claude · FastAPI
