How to Chunk Documents for Better RAG Results

Coding Liquids blog cover featuring Sagnik Bhattacharya for chunking documents for better RAG results.

Chunking is where most RAG applications silently fail. The retrieval works, the generation works, but the chunks themselves are poorly constructed — too small, too large, split at wrong boundaries, or missing critical context.

This guide covers practical chunking strategies with concrete guidance on sizes, overlap, boundaries, and metadata preservation.


Quick answer

Split documents at natural boundaries (paragraphs, sections, headings), use chunks of 500-1000 tokens with 100-200 token overlap, preserve metadata (source, section heading, page number) with each chunk, and test retrieval quality with real questions.

This guide is for you if:

  • You are building or improving a RAG application.
  • Your RAG system retrieves irrelevant or incomplete chunks.
  • You need to process different document types (reports, code, FAQs) effectively.

Why chunking matters

A RAG system can only generate good answers if it retrieves good chunks. If chunks are too small, they lack context. If they are too large, the embedding is too diluted to match specific queries. If they split mid-thought, the model gets incomplete information.

Chunking is the foundation of RAG quality. Improve chunking before investing in better embeddings or reranking.


Chunk size guidelines

For most document types, 500-1000 tokens per chunk works well. This is roughly 2-4 paragraphs — enough to contain a complete idea but specific enough to match relevant queries.

| Document Type   | Recommended Chunk Size | Overlap    | Split Boundary            |
|-----------------|------------------------|------------|---------------------------|
| Technical docs  | 800-1000 tokens        | 200 tokens | Section headings          |
| FAQs            | 200-400 tokens         | 50 tokens  | Question-answer pairs     |
| Legal/contracts | 500-800 tokens         | 150 tokens | Clause boundaries         |
| Code files      | 300-500 tokens         | 100 tokens | Function/class boundaries |
| Meeting notes   | 400-600 tokens         | 100 tokens | Topic changes             |

Overlap between chunks

Overlap ensures that ideas split across chunk boundaries are not lost. Each chunk shares some text with the previous and next chunks.

Typical overlap is 10-20% of chunk size. Too little overlap risks losing boundary context. Too much overlap wastes storage and can introduce duplicate retrieval results.
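As a rough sketch of how overlap works, here is a fixed-size splitter with 15% overlap. The function name `sliding_window_chunks` is illustrative, and tokens are approximated as whitespace-separated words; a real system would use the embedding model's tokenizer.

```python
def sliding_window_chunks(text: str, chunk_size: int = 800,
                          overlap_frac: float = 0.15) -> list[str]:
    """Split text into fixed-size word windows, each sharing
    overlap_frac of its length with the previous window."""
    words = text.split()
    overlap = int(chunk_size * overlap_frac)
    step = chunk_size - overlap  # advance by less than a full chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks
```

With `chunk_size=800` and `overlap_frac=0.15`, each chunk repeats the last 120 words of the previous one, so an idea that straddles a boundary appears intact in at least one chunk.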

Splitting at natural boundaries

The worst chunking strategy is splitting at a fixed character count regardless of content. The best is splitting at natural document boundaries: headings, paragraphs, code blocks, or semantic shifts.

Use a hierarchical approach: try to split at section headings first, then at paragraph boundaries, then at sentence boundaries as a last resort.

# Simple recursive chunking (tokens approximated as whitespace-separated
# words; swap in a real tokenizer for production use)
import re

def count_tokens(text: str) -> int:
    return len(text.split())

def get_last_n_tokens(text: str, n: int) -> str:
    return " ".join(text.split()[-n:])

def chunk_document(text: str, max_tokens: int = 800, overlap: int = 150) -> list[str]:
    # Split at section headings first; the lookahead keeps "## " with its section
    sections = re.split(r"\n(?=## )", text)
    chunks = []
    for section in sections:
        if count_tokens(section) <= max_tokens:
            chunks.append(section)
            continue
        # Split oversized sections at paragraph boundaries
        current = ""
        for para in section.split("\n\n"):
            if current and count_tokens(current) + count_tokens(para) > max_tokens:
                chunks.append(current)
                # Carry overlap from the previous chunk so boundary context survives
                current = get_last_n_tokens(current, overlap) + "\n\n" + para
            else:
                current = para if not current else current + "\n\n" + para
        if current:
            chunks.append(current)
    return chunks

Preserving metadata

Every chunk should carry metadata: which document it came from, what section, what page, and any other context that helps the retrieval system and the user understand the source.

This metadata is also useful in the prompt — you can tell the model 'This information comes from the Employee Handbook, Section 3.2' which helps the model frame its answer correctly.

Testing chunking quality

Create a test set of 10-20 questions where you know which document section contains the answer. Run retrieval with your chunks and check: does the right chunk appear in the top 3 results? If not, your chunking needs adjustment.

Common fixes: increase chunk size if context is lost, decrease if chunks are too generic, adjust boundaries if chunks split important information.
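The evaluation itself can be a few lines. In this sketch, `retrieve` stands in for your actual vector search and is assumed to return chunk identifiers ranked by relevance:

```python
def top3_hit_rate(test_set: list[tuple[str, str]], retrieve) -> float:
    """test_set pairs each question with the id of the chunk that
    contains its answer; retrieve maps a question to ranked chunk ids."""
    hits = sum(1 for question, expected in test_set
               if expected in retrieve(question)[:3])
    return hits / len(test_set)
```

Re-run the same test set after each chunking change so you compare strategies on identical questions.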

Worked example: chunking a product documentation site

A product documentation site has 150 pages of varying length. You split at heading boundaries, use 800-token chunks with 150-token overlap, and include the page title and section heading as metadata. Testing with 15 real customer questions shows 80% retrieval accuracy — up from 55% with the fixed-size chunking used before.

Common mistakes

  • Using fixed character-count splitting without regard for content boundaries.
  • Not testing retrieval quality after changing chunking strategy.
  • Forgetting to include metadata — chunks without context are harder to use.

When to use something else

To improve retrieval quality after chunking, see the guide on RAG with reranking. For the full pipeline end to end, see building a RAG app.

How to apply this in a real AI project

Chunking becomes much more useful once it is tied to the rest of the workflow around it. In real work, results depend on model selection, prompt design, tool integration, evaluation, and the operational reality of shipping AI features, not only on following one local tip correctly.

That is why the biggest win rarely comes from one clever move in isolation. It comes from making the surrounding process easier to review, easier to repeat, and easier to hand over when another person inherits the workbook or codebase later.

  • Test with realistic inputs before shipping, not just the examples that inspired the idea.
  • Keep the human review step visible so the workflow stays trustworthy as it scales.
  • Measure what matters for your use case instead of relying on general benchmarks.

How to extend the workflow after this guide

Once the core technique works, the next leverage usually comes from standardising it. That might mean naming inputs more clearly, keeping one review checklist, or pairing this page with neighbouring guides so the process becomes repeatable rather than person-dependent.

The follow-on guides below are the most natural next steps from this guide. They help move the reader from one useful page into a stronger connected system.

Related guides on this site

These guides cover the full RAG pipeline, retrieval improvement, and file processing.
