RAG in Production: What Nobody Tells You About Chunking

Every RAG tutorial makes it look easy: chunk your documents, embed them, retrieve the top-k results, stuff them into a prompt. Ship it.

We've built RAG systems for healthcare, legal, and enterprise clients. Here's what actually happens when you move past the demo.

The chunking problem nobody warns you about

Fixed-size chunking works great on blog posts. It falls apart the moment your documents have:

Tables that span multiple pages
Numbered lists where context carries across items
Headers that define scope for everything below them
Multi-column layouts (common in PDFs)
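
To make the failure concrete, here is a minimal sketch of naive fixed-size chunking applied to a tiny, made-up document (a heading that scopes a two-row table). The document text, the `fixed_size_chunks` helper, and the chunk size of 60 are all illustrative assumptions, not taken from any real system.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical document: the heading defines the scope of the table below it.
doc = (
    "Dosage limits (adults only)\n"
    "Drug | Max daily dose\n"
    "A    | 40 mg\n"
    "B    | 10 mg\n"
)

chunks = fixed_size_chunks(doc, size=60)
for i, chunk in enumerate(chunks):
    print(f"--- chunk {i} ---")
    print(chunk)
```

The cut lands in the middle of a table row, and the last chunk carries the row for drug B with neither the column header nor the "adults only" heading. A retriever that returns that chunk alone hands the model a number with no scope.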

We've tried recursive character splitting, sentence-based splitting, semantic chunking, and document-structure-aware chunking. The answer is always: it depends on your documents.

The best chunking strategy is the one that preserves the context your users actually search for.

What we actually do now

After shipping 6+ RAG systems, here's our approach:

Start with document analysis. Look at 50 real documents before writing a single line of chunking code.
Hybrid chunking. Use document structure when available (headings, sections), fall back to semantic boundaries.
Overlap with context. Include the parent heading in every chunk. A chunk without its heading is a paragraph without meaning.
Test with real queries. Skip the clean "what is X?" questions and test with the messy, ambiguous ones your actual users ask.
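
The hybrid chunking and parent-heading steps above can be sketched roughly as follows. This is a simplified illustration under loud assumptions: headings are Markdown-style `#` lines, blank-line paragraph breaks stand in for real semantic boundaries, and only the nearest heading is attached rather than the full heading path. The `chunk_with_headings` function and its `max_chars` parameter are hypothetical names.

```python
import re

# Assumption: structure is marked with Markdown-style headings.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_with_headings(text: str, max_chars: int = 500) -> list[str]:
    """Split on headings, then on paragraph breaks, prefixing each chunk
    with the heading it sits under."""
    chunks: list[str] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        # Fallback split: paragraph breaks are a crude stand-in for
        # semantic boundaries. A single oversized paragraph is kept whole.
        piece = ""
        for para in re.split(r"\n\s*\n", body):
            if piece and len(piece) + len(para) + 2 > max_chars:
                chunks.append(f"{heading}\n{piece}" if heading else piece)
                piece = para
            else:
                piece = f"{piece}\n\n{para}" if piece else para
        chunks.append(f"{heading}\n{piece}" if heading else piece)

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(1).strip()
        else:
            buffer.append(line)
    flush()
    return chunks


sample = "# Refunds\n\nPara one.\n\nPara two.\n\n# Shipping\n\nShips in 3 days.\n"
sample_chunks = chunk_with_headings(sample, max_chars=12)
```

With the deliberately tiny `max_chars=12`, the "Refunds" section splits into two chunks and both keep the "Refunds" heading, which is the point of the exercise. Real documents will need a parser that matches their actual structure (PDF layout, HTML, DOCX styles) instead of a regex.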

The retrieval layer matters more than you think

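Everyone obsesses over chunking. The retrieval layer is where most RAG systems actually fail. Hybrid search (BM25 + semantic) with reranking consistently beats pure vector search in our benchmarks. Keyword matching catches exact terms such as codes, names, and identifiers that embeddings tend to blur, while embeddings catch paraphrases that keyword matching misses.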
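
One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The post doesn't say which fusion method was used, so treat this as an assumed stand-in: a minimal sketch where each ranked list contributes 1 / (k + rank) to a document's score. The document IDs and the two hit lists are made up, and the reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword retriever and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only looks at ranks, so it sidesteps the problem of BM25 scores and cosine similarities living on different scales. Documents that both retrievers agree on rise to the top, and the fused list is what you would pass to a reranker.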

The evaluation pipeline

You need three metrics: retrieval precision (did we get the right chunks?), answer faithfulness (is every claim in the answer supported by the retrieved chunks?), and answer relevance (does it actually answer the question?). Without all three, you're flying blind.
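
Of the three, retrieval precision is the easiest to compute directly, provided you have hand-labeled relevant chunks for each test query. Here's a minimal sketch; the `precision_at_k` name, the chunk IDs, and the labels are hypothetical. Faithfulness and relevance usually need a human or model judge and are not shown.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that are labeled relevant.

    Divides by the number of chunks actually returned (at most k), so a
    retriever that returns fewer than k results is not penalized for it.
    """
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    return hits / len(top_k)


# Hypothetical labels for one test query.
retrieved_ids = ["c1", "c7", "c3", "c9"]
relevant_ids = {"c1", "c3"}

score = precision_at_k(retrieved_ids, relevant_ids, k=4)
print(score)
```

Averaging this over a set of real user queries gives a number you can track as you change chunking or retrieval settings.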

The bottom line

RAG is not a weekend project. It's a system with at least four stages that can fail (ingestion, chunking, retrieval, generation), and each one needs its own evaluation pipeline. Budget twice the time you think you need. And start with the documents, not the code.