Chunking is where most RAG applications silently fail. The retrieval works, the generation works, but the chunks themselves are poorly constructed — too small, too large, split at the wrong boundaries, or missing critical context.
This guide covers practical chunking strategies with concrete guidance on sizes, overlap, boundaries, and metadata preservation.
Quick answer
Split documents at natural boundaries (paragraphs, sections, headings), use chunks of 500-1000 tokens with 100-200 token overlap, preserve metadata (source, section heading, page number) with each chunk, and test retrieval quality with real questions.
This guide is for you if:
- You are building or improving a RAG application.
- Your RAG system retrieves irrelevant or incomplete chunks.
- You need to process different document types (reports, code, FAQs) effectively.
Why chunking matters
A RAG system can only generate good answers if it retrieves good chunks. If chunks are too small, they lack context. If they are too large, the embedding is too diluted to match specific queries. If they split mid-thought, the model gets incomplete information.
Chunking is the foundation of RAG quality. Improve chunking before investing in better embeddings or reranking.
Chunk size guidelines
For most document types, 500-1000 tokens per chunk works well. This is roughly 2-4 paragraphs — enough to contain a complete idea but specific enough to match relevant queries.
| Document Type | Recommended Chunk Size | Overlap | Split Boundary |
|---|---|---|---|
| Technical docs | 800-1000 tokens | 200 tokens | Section headings |
| FAQs | 200-400 tokens | 50 tokens | Question-answer pairs |
| Legal/contracts | 500-800 tokens | 150 tokens | Clause boundaries |
| Code files | 300-500 tokens | 100 tokens | Function/class boundaries |
| Meeting notes | 400-600 tokens | 100 tokens | Topic changes |
Overlap between chunks
Overlap ensures that ideas split across chunk boundaries are not lost. Each chunk shares some text with the previous and next chunks.
Typical overlap is 10-20% of chunk size. Too little overlap risks losing boundary context. Too much overlap wastes storage and can introduce duplicate retrieval results.
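To make the mechanics concrete, here is a minimal sketch of fixed-window overlap on a pre-tokenized document. The tokenizer itself is assumed — any tokenizer that yields a list of tokens works:

```python
def sliding_window_chunks(tokens: list[str], size: int = 800, overlap: int = 150) -> list[list[str]]:
    """Split a token list into windows of `size` tokens,
    where consecutive windows share `overlap` tokens."""
    step = size - overlap  # how far each new window advances
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

# With size=4 and overlap=1, consecutive chunks share one token:
chunks = sliding_window_chunks(list("abcdefghij"), size=4, overlap=1)
# chunks[0][-1] == chunks[1][0] == "d"
```

Note that the window advances by `size - overlap` tokens each step, which is why a 20% overlap inflates storage by roughly 25%: every token near a boundary is stored twice.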
Splitting at natural boundaries
The worst chunking strategy is splitting at a fixed character count regardless of content. The best is splitting at natural document boundaries: headings, paragraphs, code blocks, or semantic shifts.
Use a hierarchical approach: try to split at section headings first, then at paragraph boundaries, then at sentence boundaries as a last resort.
# Simple recursive chunking. count_tokens and get_last_n_tokens below use
# whitespace splitting as a stand-in; in practice, use the tokenizer that
# matches your embedding model.
def count_tokens(text: str) -> int:
    return len(text.split())

def get_last_n_tokens(text: str, n: int) -> str:
    return " ".join(text.split()[-n:])

def chunk_document(text: str, max_tokens: int = 800, overlap: int = 150) -> list[str]:
    # Try splitting at section headings first
    sections = text.split("\n## ")
    chunks = []
    for section in sections:
        if count_tokens(section) <= max_tokens:
            chunks.append(section)
            continue
        # Split oversized sections at paragraph boundaries
        paragraphs = section.split("\n\n")
        current = ""
        for para in paragraphs:
            if current and count_tokens(current + " " + para) > max_tokens:
                chunks.append(current)
                # Seed the next chunk with overlap from the previous one
                current = get_last_n_tokens(current, overlap) + "\n\n" + para
            else:
                current = current + "\n\n" + para if current else para
        if current:
            chunks.append(current)
    return chunks

Preserving metadata
Every chunk should carry metadata: which document it came from, what section, what page, and any other context that helps the retrieval system and the user understand the source.
This metadata is also useful in the prompt — you can tell the model 'This information comes from the Employee Handbook, Section 3.2' which helps the model frame its answer correctly.
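As a sketch, each chunk can carry its metadata as a small record and render it into prompt context. The field names here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    text: str
    source: str               # originating document
    section: str              # nearest section heading
    page: Optional[int] = None

    def to_prompt_context(self) -> str:
        """Render the chunk with its provenance for the model's prompt."""
        location = f"{self.source}, {self.section}"
        if self.page is not None:
            location += f", page {self.page}"
        return f"[Source: {location}]\n{self.text}"

chunk = Chunk(
    text="Employees accrue 1.5 vacation days per month.",
    source="Employee Handbook",
    section="Section 3.2",
    page=14,
)
# chunk.to_prompt_context() begins with:
# [Source: Employee Handbook, Section 3.2, page 14]
```

Storing metadata alongside the text (rather than reconstructing it at query time) also lets you filter retrieval by source or section before the vector search runs.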
Testing chunking quality
Create a test set of 10-20 questions where you know which document section contains the answer. Run retrieval with your chunks and check: does the right chunk appear in the top 3 results? If not, your chunking needs adjustment.
Common fixes: increase chunk size if context is lost, decrease if chunks are too generic, adjust boundaries if chunks split important information.
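The test loop itself is only a few lines. A sketch, assuming `retrieve` is your retrieval function returning ranked chunk IDs — a toy keyword matcher stands in for it here:

```python
def top_k_accuracy(test_cases, retrieve, k=3):
    """Fraction of questions whose known-correct chunk appears in the top-k results.

    test_cases: list of (question, expected_chunk_id) pairs.
    retrieve:   maps a question to a ranked list of chunk ids.
    """
    hits = sum(1 for q, expected in test_cases if expected in retrieve(q)[:k])
    return hits / len(test_cases)

# Toy stand-in retriever: rank chunks by keyword overlap with the question.
chunks = {
    "vacation-policy": "Employees accrue 1.5 vacation days per month.",
    "expense-policy": "Expenses over $50 require manager approval.",
}

def keyword_retrieve(question: str) -> list[str]:
    words = set(question.lower().split())
    return sorted(chunks, key=lambda cid: -len(words & set(chunks[cid].lower().split())))

tests = [
    ("How many vacation days do employees get?", "vacation-policy"),
    ("Do expenses over $50 need approval?", "expense-policy"),
]
print(top_k_accuracy(tests, keyword_retrieve, k=1))  # 1.0
```

Swap `keyword_retrieve` for your real embedding-based retriever; the scoring loop stays the same, which makes it easy to re-run after every chunking change.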
Worked example: chunking a product documentation site
A product documentation site has 150 pages of varying length. You split at heading boundaries, use 800-token chunks with 150-token overlap, and include the page title and section heading as metadata. Testing with 15 real customer questions shows 80% retrieval accuracy — up from 55% with the fixed-size chunking used before.
Common mistakes
- Using fixed character-count splitting without regard for content boundaries.
- Not testing retrieval quality after changing chunking strategy.
- Forgetting to include metadata — chunks without context are harder to use.
When to use something else
For improving retrieval after chunking with reranking, see RAG with reranking. For the full RAG pipeline, see building a RAG app.
How to apply this in a real AI project
Chunking becomes much more useful once it is tied to the rest of the workflow around it. In real work, the result depends on model selection, prompt design, tool integration, evaluation, and the operational reality of shipping AI features — not only on getting chunk boundaries right.
The biggest win rarely comes from one clever move in isolation. It comes from making the surrounding process easier to review, easier to repeat, and easier to hand over when another person inherits the codebase later.
- Test with realistic inputs before shipping, not just the examples that inspired the idea.
- Keep the human review step visible so the workflow stays trustworthy as it scales.
- Measure what matters for your use case instead of relying on general benchmarks.
How to extend the workflow after this guide
Once the core technique works, the next leverage usually comes from standardising it: name your inputs clearly, keep one review checklist, and pair this page with the neighbouring guides so the process becomes repeatable rather than person-dependent.
The follow-on guides below are the most natural next steps:
- How to Build a RAG App With Your Own Documents — the full pipeline this chunking step feeds into.
- How to Improve RAG Answers With Reranking — improving retrieval quality once chunking is solid.
- How to Use Local AI on Your Own Files — running the same pipeline on local files.
Related guides on this site
These guides cover the full RAG pipeline, retrieval improvement, and file processing.
- How to Build a RAG App With Your Own Documents
- How to Improve RAG Answers With Reranking
- How to Use Local AI on Your Own Files
- How to Evaluate AI Outputs in Real Apps
Want to use AI tools more effectively?
My courses cover practical AI workflows, from spreadsheet automation to app development, with real projects and honest tool comparisons.
Browse AI courses