Late chunking

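A chunking pattern in which the full document is first encoded as one long sequence by a long-context embedding model, and chunk boundaries are then applied to the resulting token-level embeddings (each span is pooled into its own chunk vector), rather than splitting the text into chunks before embedding.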

What this means in practice

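Preserves long-range context in each chunk's embedding (for example, a chunk that mentions only "its population" can still reflect the city named in an earlier chunk) while still returning chunk-sized results to the generator.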
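The pattern can be sketched in Python as follows. The sketch assumes the document is already tokenized and that chunk boundaries are given as token-index spans. `toy_contextual_encoder` is a hypothetical stand-in for a long-context model's token-level outputs (a real model mixes context through attention). The function names are illustrative, not any library's API, and mean pooling is one common choice of per-span pooling.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Sequence[str]], List[Vector]]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk(tokens: Sequence[str], spans: Sequence[Tuple[int, int]],
               encode_tokens: Encoder) -> List[Vector]:
    """Late chunking: encode the whole document once, then pool each span."""
    token_vectors = encode_tokens(tokens)  # every token vector sees full context
    return [mean_pool(token_vectors[start:end]) for start, end in spans]


def naive_chunk(tokens: Sequence[str], spans: Sequence[Tuple[int, int]],
                encode_tokens: Encoder) -> List[Vector]:
    """Conventional baseline: split first, then encode each chunk in isolation."""
    return [mean_pool(encode_tokens(tokens[start:end])) for start, end in spans]


def toy_contextual_encoder(tokens: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for a long-context transformer.

    Dimension 0 is a per-token feature (token length); dimension 1 is shared
    context: how many times "Berlin" occurs anywhere in the input sequence.
    """
    berlin = float(sum(1 for t in tokens if t == "Berlin"))
    return [[float(len(t)), berlin] for t in tokens]


tokens = "Berlin is the capital of Germany . Its population exceeds 3.5 million .".split()
spans = [(0, 7), (7, 13)]  # token-index chunk boundaries, half-open [start, end)

late = late_chunk(tokens, spans, toy_contextual_encoder)
naive = naive_chunk(tokens, spans, toy_contextual_encoder)
# Only late chunking keeps the "Berlin" context in the second chunk's vector.
print(late[1][1], naive[1][1])  # 1.0 0.0
```

Both functions return one vector per chunk, so retrieval still operates on chunk-sized units; the only difference is whether the encoder saw the whole document or just the chunk. In practice, character-level boundaries would typically be mapped to token indices using the tokenizer's offsets, and the document must fit within the model's context window.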

Synonyms

retrieval-time chunking, late-interaction chunking

See also

  • Chunking — The process of dividing documents into units — typically fixed-token windows or paragraph-level segments — suitable for embedding and retrieval.
  • Semantic chunking — A chunking strategy that respects semantic boundaries — sentence, paragraph, or topic-shift — rather than fixed token windows.
  • Embedding model — A model that maps text, images, or multimodal content to dense vector representations used for retrieval, clustering, and similarity search.