Chunking

The process of dividing documents into units — typically fixed-token windows or paragraph-level segments — suitable for embedding and retrieval.

What this means in practice

Chunk size, overlap, and boundary strategy directly affect retrieval quality: chunks that are too small lose the surrounding context, while chunks that are too large dilute the semantic signal of the embedding.
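
The three knobs named above (size, overlap, boundary strategy) can be illustrated with a minimal Python sketch. The function names `chunk_tokens` and `chunk_paragraphs` are hypothetical, and whitespace splitting stands in for a real tokenizer, which is an assumption made only to keep the example self-contained.

```python
def chunk_tokens(tokens, size, overlap=0):
    """Split a token list into fixed-size windows sharing `overlap` tokens.

    Hypothetical helper: `tokens` is any list; a real pipeline would use
    the embedding model's own tokenizer instead of whitespace splitting.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def chunk_paragraphs(text):
    """Split on blank lines: a simple paragraph-level boundary strategy."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Whitespace split is a stand-in for tokenization (assumption).
tokens = "the quick brown fox jumps over the lazy dog".split()
windows = chunk_tokens(tokens, size=4, overlap=1)
paragraphs = chunk_paragraphs("First paragraph.\n\nSecond paragraph.")
```

With `size=4` and `overlap=1`, each window repeats the last token of the previous one, so text that straddles a boundary still appears intact in at least one chunk. Raising `overlap` buys more boundary context at the cost of more chunks to embed and store.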

Synonyms

document chunking, text chunking

See also

  • Semantic chunking — A chunking strategy that respects semantic boundaries (sentence, paragraph, or topic shift) rather than fixed token windows.
  • Late chunking — A chunking pattern in which the full document is embedded as a long sequence and chunk boundaries are applied at retrieval time against the embedded representation, rather than chunking before embedding.
  • Embedding model — A model that maps text, images, or multimodal content to dense vector representations used for retrieval, clustering, and similarity search.
