Emon Sarker

The gap between a RAG demo and a production RAG system is roughly the same as the gap between a prototype airplane and one you'd actually fly in. Both have wings. Only one won't kill you.

We have built RAG systems that serve thousands of daily users in a B2B context. Here are the lessons we learned that aren't in the tutorials.

Lesson 1: Retrieval Quality Is Everything

The LLM is only as good as what you feed it. We spent 60% of our engineering time on retrieval — chunking strategies, embedding model selection, re-ranking — and 10% on the generation prompt. This ratio felt wrong at first. It was exactly right.

Chunking Strategy

Naive chunking (splitting every 512 tokens regardless of content) cuts sentences and sections in half and produces garbage retrieval. Instead, use:

Semantic chunking: Split at paragraph or section boundaries
Overlap: 10-15% overlap between chunks prevents context loss at boundaries
Metadata enrichment: Attach section headers, document title, and page numbers to each chunk
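A minimal Python sketch of the three chunking rules above. It assumes each document has already been parsed into (section header, page number, body text) tuples, and it approximates tokens with whitespace-separated words. The names `Chunk`, `semantic_chunks`, `max_words` and `overlap_ratio` are hypothetical, and the 12% default is simply a value inside the 10-15% range. A real pipeline would count tokens with the embedding model's own tokenizer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def _make_chunk(words: list[str], title: str, header: str, page: int) -> Chunk:
    # Metadata enrichment: every chunk carries its document title, section and page.
    return Chunk(text=" ".join(words),
                 metadata={"title": title, "section": header, "page": page})


def semantic_chunks(sections: list[tuple[str, int, str]], doc_title: str,
                    max_words: int = 120, overlap_ratio: float = 0.12) -> list[Chunk]:
    """Split (header, page, body) sections into paragraph-aligned, overlapping chunks.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    chunks: list[Chunk] = []
    for header, page, body in sections:
        # Semantic boundaries: paragraphs are separated by blank lines.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
        current: list[str] = []
        for para in paragraphs:
            words = para.split()
            # Close the chunk before a paragraph that would push it past the budget.
            if current and len(current) + len(words) > max_words:
                chunks.append(_make_chunk(current, doc_title, header, page))
                # Overlap: seed the next chunk with the tail of the previous one.
                keep = round(len(current) * overlap_ratio)
                current = current[-keep:] if keep else []
            current.extend(words)
        if current:
            chunks.append(_make_chunk(current, doc_title, header, page))
    return chunks
```

Chunks never cross a section boundary, and paragraphs are never split, so a single paragraph longer than `max_words` becomes one oversized chunk; splitting those at sentence boundaries would be a natural refinement.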

Lesson 2: Validate Everything

We implemented a three-layer validation system:

Retrieval scoring: Drop chunks below a relevance threshold before they reach the LLM
Grounded generation: Constrain the LLM to cite its sources explicitly
Post-generation verification: A second pass checks every claim against source documents

This reduced our hallucination rate from 12% to 0.4%.
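The three layers can be sketched in Python as follows. The names, the 0.5 score threshold and the `[n]` citation format are illustrative assumptions. The article does not say how its second-pass verifier works; the lexical-overlap check here is only a crude placeholder for it, and a production system would more likely ask an LLM or an entailment model whether each cited source supports the claim.

```python
import re
from dataclasses import dataclass


@dataclass
class Retrieved:
    text: str
    score: float  # relevance score from the retriever or re-ranker, assumed in [0, 1]


def filter_by_score(hits: list[Retrieved], threshold: float = 0.5) -> list[Retrieved]:
    """Layer 1: drop low-relevance chunks before they reach the LLM."""
    return [h for h in hits if h.score >= threshold]


def grounded_prompt(question: str, hits: list[Retrieved]) -> str:
    """Layer 2: number the sources and require a [n] citation on every sentence."""
    sources = "\n".join(f"[{i}] {h.text}" for i, h in enumerate(hits, start=1))
    return (
        "Answer using ONLY the sources below. End every sentence with the [n] "
        "citation(s) that support it, placed before the period. If the sources "
        "do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def _words(text: str) -> set[str]:
    # Crude content-word filter: lowercase alphanumerics longer than 3 characters.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def unsupported_sentences(answer: str, hits: list[Retrieved],
                          min_overlap: float = 0.6) -> list[str]:
    """Layer 3 (placeholder heuristic): flag sentences that cite no valid source
    or share too few content words with the sources they cite."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        valid = [n for n in cited if 1 <= n <= len(hits)]
        if not valid:
            flagged.append(sentence)  # no citation, or a citation to a missing source
            continue
        claim = _words(re.sub(r"\[\d+\]", "", sentence))
        support = set().union(*(_words(hits[n - 1].text) for n in valid))
        if claim and len(claim & support) / len(claim) < min_overlap:
            flagged.append(sentence)
    return flagged
```

In use, the filtered hits feed `grounded_prompt`, the model's answer goes through `unsupported_sentences`, and any flagged sentence can be dropped or trigger a regeneration.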

Lesson 3: Latency Is a Feature

Users will tolerate 3-5 seconds for a high-quality response. They will not tolerate 30 seconds, even if the response is perfect. We optimized aggressively:

Parallel retrieval across multiple indices
Streaming responses (show partial output immediately)
Caching frequent queries
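A sketch of the three optimizations using Python's `asyncio`. `search_index` and `fake_llm_stream` are hypothetical stand-ins that sleep to simulate network and generation latency. The index names ("docs", "tickets", "faq") and the LRU cache keyed on normalized query text are assumptions for illustration, not the setup described above.

```python
import asyncio
from collections import OrderedDict


async def search_index(name: str, query: str, delay: float = 0.05) -> list[str]:
    """Hypothetical index client; the sleep simulates one network round trip."""
    await asyncio.sleep(delay)
    return [f"{name}:{query}"]


class LRUCache:
    """Bounded cache for results of frequent queries, keyed on normalized text."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._data = OrderedDict()

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str):
        key = self._key(query)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, query: str, value) -> None:
        key = self._key(query)
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


async def retrieve(query: str, cache: LRUCache) -> list[str]:
    cached = cache.get(query)
    if cached is not None:
        return cached
    # Parallel retrieval: all indices are queried concurrently, so the wait is
    # roughly the slowest index rather than the sum of all three.
    batches = await asyncio.gather(
        search_index("docs", query),
        search_index("tickets", query),
        search_index("faq", query),
    )
    merged = [hit for batch in batches for hit in batch]
    cache.put(query, merged)
    return merged


async def fake_llm_stream(text: str):
    """Hypothetical streaming client: yields one word at a time as it is 'generated'."""
    for word in text.split():
        await asyncio.sleep(0.01)  # simulated per-token generation time
        yield word + " "


async def main() -> None:
    cache = LRUCache()
    print(await retrieve("What is the refund policy?", cache))
    # Streaming: print each piece as soon as it arrives instead of waiting for the end.
    async for piece in fake_llm_stream("Refunds are processed within fourteen days."):
        print(piece, end="", flush=True)
    print()


if __name__ == "__main__":
    asyncio.run(main())
```

With three simulated indices at 50 ms each, `asyncio.gather` keeps retrieval near 50 ms instead of 150 ms, and a repeated query skips retrieval entirely.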

Lesson 4: Evaluation Is Hard

How do you measure RAG quality at scale? We built a custom evaluation framework:

Faithfulness: Does the response contain only information from the retrieved documents?
Relevance: Does the response actually answer the question?
Completeness: Did it miss important information that was available in the retrieved documents?

Automated evaluation using a judge LLM gets you 80% of the way. Human evaluation covers the remaining 20%.
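One way to wire the three criteria into a judge-LLM harness, sketched in Python. The `judge` argument is a hypothetical callable that sends a prompt to whatever model you use and returns its text. The 1-5 scale, the JSON reply format and the rule that routes low-scoring or unparseable cases to human reviewers are assumptions chosen to illustrate the automated/human split, not details from the article.

```python
import json
from dataclasses import dataclass
from statistics import mean
from typing import Callable

CRITERIA = ("faithfulness", "relevance", "completeness")

# Hypothetical judge prompt; doubled braces survive str.format as literal braces.
JUDGE_PROMPT = """You are grading a RAG answer. Score each criterion from 1 to 5.
faithfulness: does the answer contain only information from the retrieved documents?
relevance: does the answer actually address the question?
completeness: does it include the important information available in the documents?
Reply with JSON only, e.g. {{"faithfulness": 5, "relevance": 4, "completeness": 3}}.

Question: {question}
Retrieved documents:
{context}
Answer: {answer}"""


@dataclass
class EvalCase:
    question: str
    context: str
    answer: str


def evaluate(cases: list[EvalCase], judge: Callable[[str], str], review_below: int = 3):
    """Score every case with the judge; queue low-scoring or unparseable cases for humans."""
    scores, needs_human = [], []
    for case in cases:
        raw = judge(JUDGE_PROMPT.format(question=case.question,
                                        context=case.context, answer=case.answer))
        try:
            parsed = json.loads(raw)
            s = {c: int(parsed[c]) for c in CRITERIA}
        except (ValueError, KeyError, TypeError):
            needs_human.append(case)  # judge output unusable: send to a human
            continue
        scores.append(s)
        if min(s.values()) < review_below:
            needs_human.append(case)  # any weak criterion gets a human second look
    summary = {c: mean(s[c] for s in scores) for c in CRITERIA} if scores else {}
    return summary, needs_human
```

The summary gives per-criterion averages for tracking over time, while the returned queue is the slice of cases that human reviewers would look at.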

The Bottom Line