Chunking is where most RAG applications silently fail. The retrieval works, the generation works, but the chunks themselves are poorly constructed — too small, too large, split at the wrong boundaries, or missing critical context.
This guide covers practical chunking strategies with concrete guidance on sizes, overlap, boundaries, and metadata preservation.
Quick answer
Split documents at natural boundaries (paragraphs, sections, headings), use chunks of 500-1000 tokens with 100-200 token overlap, preserve metadata (source, section heading, page number) with each chunk, and test retrieval quality with real questions.
This guide is for you if:
- You are building or improving a RAG application.
- Your RAG system retrieves irrelevant or incomplete chunks.
- You need to process different document types (reports, code, FAQs) effectively.
Why chunking matters
A RAG system can only generate good answers if it retrieves good chunks. If chunks are too small, they lack context. If they are too large, the embedding is too diluted to match specific queries. If they split mid-thought, the model gets incomplete information.
Chunking is the foundation of RAG quality. Improve chunking before investing in better embeddings or reranking.
Chunk size guidelines
For most document types, 500-1000 tokens per chunk works well. This is roughly 2-4 paragraphs — enough to contain a complete idea but specific enough to match relevant queries.
| Document Type | Recommended Chunk Size | Overlap | Split Boundary |
|---|---|---|---|
| Technical docs | 800-1000 tokens | 200 tokens | Section headings |
| FAQs | 200-400 tokens | 50 tokens | Question-answer pairs |
| Legal/contracts | 500-800 tokens | 150 tokens | Clause boundaries |
| Code files | 300-500 tokens | 100 tokens | Function/class boundaries |
| Meeting notes | 400-600 tokens | 100 tokens | Topic changes |
Overlap between chunks
Overlap ensures that ideas split across chunk boundaries are not lost. Each chunk shares some text with the previous and next chunks.
Typical overlap is 10-20% of chunk size. Too little overlap risks losing boundary context. Too much overlap wastes storage and can introduce duplicate retrieval results.
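To make the mechanics concrete, here is a minimal sketch of fixed-window overlap on a pre-tokenized document. The tokenizer itself is assumed — any tokenizer that yields a list of tokens works:

```python
def sliding_window_chunks(tokens: list[str], size: int = 800, overlap: int = 150) -> list[list[str]]:
    """Split a token list into windows of `size` tokens,
    where consecutive windows share `overlap` tokens."""
    step = size - overlap  # how far each new window advances
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

# With size=4 and overlap=1, consecutive chunks share one token:
chunks = sliding_window_chunks(list("abcdefghij"), size=4, overlap=1)
# chunks[0][-1] == chunks[1][0] == "d"
```

Note that the window advances by `size - overlap` tokens each step, which is why a 20% overlap inflates storage by roughly 25%: every token near a boundary is stored twice.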
Splitting at natural boundaries
The worst chunking strategy is splitting at a fixed character count regardless of content. The best is splitting at natural document boundaries: headings, paragraphs, code blocks, or semantic shifts.
Use a hierarchical approach: try to split at section headings first, then at paragraph boundaries, then at sentence boundaries as a last resort.
# Simple recursive chunking. count_tokens and get_last_n_tokens below use
# whitespace splitting as a stand-in; in practice, use the tokenizer that
# matches your embedding model.
def count_tokens(text: str) -> int:
    return len(text.split())

def get_last_n_tokens(text: str, n: int) -> str:
    return " ".join(text.split()[-n:])

def chunk_document(text: str, max_tokens: int = 800, overlap: int = 150) -> list[str]:
    # Try splitting at section headings first
    sections = text.split("\n## ")
    chunks = []
    for section in sections:
        if count_tokens(section) <= max_tokens:
            chunks.append(section)
            continue
        # Split oversized sections at paragraph boundaries
        paragraphs = section.split("\n\n")
        current = ""
        for para in paragraphs:
            if current and count_tokens(current + " " + para) > max_tokens:
                chunks.append(current)
                # Seed the next chunk with overlap from the previous one
                current = get_last_n_tokens(current, overlap) + "\n\n" + para
            else:
                current = current + "\n\n" + para if current else para
        if current:
            chunks.append(current)
    return chunks

Preserving metadata
Every chunk should carry metadata: which document it came from, what section, what page, and any other context that helps the retrieval system and the user understand the source.
This metadata is also useful in the prompt — you can tell the model 'This information comes from the Employee Handbook, Section 3.2' which helps the model frame its answer correctly.
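As a sketch, each chunk can carry its metadata as a small record and render it into prompt context. The field names here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    text: str
    source: str               # originating document
    section: str              # nearest section heading
    page: Optional[int] = None

    def to_prompt_context(self) -> str:
        """Render the chunk with its provenance for the model's prompt."""
        location = f"{self.source}, {self.section}"
        if self.page is not None:
            location += f", page {self.page}"
        return f"[Source: {location}]\n{self.text}"

chunk = Chunk(
    text="Employees accrue 1.5 vacation days per month.",
    source="Employee Handbook",
    section="Section 3.2",
    page=14,
)
# chunk.to_prompt_context() begins with:
# [Source: Employee Handbook, Section 3.2, page 14]
```

Storing metadata alongside the text (rather than reconstructing it at query time) also lets you filter retrieval by source or section before the vector search runs.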
Testing chunking quality
Create a test set of 10-20 questions where you know which document section contains the answer. Run retrieval with your chunks and check: does the right chunk appear in the top 3 results? If not, your chunking needs adjustment.
Common fixes: increase chunk size if context is lost, decrease if chunks are too generic, adjust boundaries if chunks split important information.
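The test loop itself is only a few lines. A sketch, assuming `retrieve` is your retrieval function returning ranked chunk IDs — a toy keyword matcher stands in for it here:

```python
def top_k_accuracy(test_cases, retrieve, k=3):
    """Fraction of questions whose known-correct chunk appears in the top-k results.

    test_cases: list of (question, expected_chunk_id) pairs.
    retrieve:   maps a question to a ranked list of chunk ids.
    """
    hits = sum(1 for q, expected in test_cases if expected in retrieve(q)[:k])
    return hits / len(test_cases)

# Toy stand-in retriever: rank chunks by keyword overlap with the question.
chunks = {
    "vacation-policy": "Employees accrue 1.5 vacation days per month.",
    "expense-policy": "Expenses over $50 require manager approval.",
}

def keyword_retrieve(question: str) -> list[str]:
    words = set(question.lower().split())
    return sorted(chunks, key=lambda cid: -len(words & set(chunks[cid].lower().split())))

tests = [
    ("How many vacation days do employees get?", "vacation-policy"),
    ("Do expenses over $50 need approval?", "expense-policy"),
]
print(top_k_accuracy(tests, keyword_retrieve, k=1))  # 1.0
```

Swap `keyword_retrieve` for your real embedding-based retriever; the scoring loop stays the same, which makes it easy to re-run after every chunking change.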
Worked example: chunking a product documentation site
A product documentation site has 150 pages of varying length. You split at heading boundaries, use 800-token chunks with 150-token overlap, and include the page title and section heading as metadata. Testing with 15 real customer questions shows 80% retrieval accuracy — up from 55% with the fixed-size chunking used before.
Common mistakes
- Using fixed character-count splitting without regard for content boundaries.
- Not testing retrieval quality after changing chunking strategy.
- Forgetting to include metadata — chunks without context are harder to use.
When to use something else
For improving retrieval after chunking with reranking, see RAG with reranking. For the full RAG pipeline, see building a RAG app.
How to apply this in a real AI project
Chunking becomes much more useful once it is tied to the rest of the workflow around it. In real work, the result depends on model selection, prompt design, tool integration, evaluation, and the operational reality of shipping AI features — not only on getting chunk boundaries right.
The biggest win rarely comes from one clever move in isolation. It comes from making the surrounding process easier to review, easier to repeat, and easier to hand over when another person inherits the codebase later.
- Test with realistic inputs before shipping, not just the examples that inspired the idea.
- Keep the human review step visible so the workflow stays trustworthy as it scales.
- Measure what matters for your use case instead of relying on general benchmarks.
How to extend the workflow after this guide
Once the core technique works, the next leverage usually comes from standardising it: name your inputs clearly, keep one review checklist, and pair this page with the neighbouring guides so the process becomes repeatable rather than person-dependent.
The follow-on guides below are the most natural next steps:
- How to Build a RAG App With Your Own Documents — the full pipeline this chunking step feeds into.
- How to Improve RAG Answers With Reranking — improving retrieval quality once chunking is solid.
- How to Use Local AI on Your Own Files — running the same pipeline on local files.
Related guides on this site
These guides cover the full RAG pipeline, retrieval improvement, and file processing.
- How to Build a RAG App With Your Own Documents
- How to Improve RAG Answers With Reranking
- How to Use Local AI on Your Own Files
- How to Evaluate AI Outputs in Real Apps
Want to use AI tools more effectively?
My courses cover practical AI workflows, from spreadsheet automation to app development, with real projects and honest tool comparisons.
Browse AI courses