Semantic chunking

A chunking strategy that respects semantic boundaries — sentence, paragraph, or topic-shift — rather than fixed token windows.

What this means in practice

Produces more coherent, self-contained chunks that tend to be retrieved more reliably on question-answering tasks, at the cost of additional preprocessing and variable chunk size.
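
As a rough illustration of boundary-aware splitting, the sketch below keeps each paragraph whole when it fits a word budget and falls back to sentence boundaries when it does not. The function names (`split_sentences`, `semantic_chunks`), the `max_words` parameter, and the regex-based sentence splitter are all assumptions made for this example. It does not attempt topic-shift detection, which would typically need an embedding model.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def semantic_chunks(text: str, max_words: int = 50) -> list[str]:
    # Paragraph boundaries are taken to be one or more blank lines.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    for paragraph in paragraphs:
        if len(paragraph.split()) <= max_words:
            # The paragraph fits the budget, so it stays whole.
            chunks.append(paragraph)
            continue
        # Oversized paragraph: pack whole sentences greedily up to the budget.
        # A single sentence longer than the budget is kept intact, not cut.
        current: list[str] = []
        count = 0
        for sentence in split_sentences(paragraph):
            n = len(sentence.split())
            if current and count + n > max_words:
                chunks.append(" ".join(current))
                current, count = [], 0
            current.append(sentence)
            count += n
        if current:
            chunks.append(" ".join(current))
    return chunks


doc = (
    "Cats sleep a lot. They nap all day.\n\n"
    "Dogs need walks. Dogs like to play. Dogs bark at strangers."
)
chunks = semantic_chunks(doc, max_words=8)
for chunk in chunks:
    print(repr(chunk))
```

With a budget of 8 words, the first paragraph is kept as one chunk and the second is split between sentences, so chunk sizes vary but no sentence is cut in half. That is the trade-off described above: more preprocessing and uneven chunk sizes in exchange for chunks that read as complete units.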

Synonyms

boundary-aware chunking, paragraph-level chunking

See also

  • Chunking — The process of dividing documents into units — typically fixed-token windows or paragraph-level segments — suitable for embedding and retrieval.
  • Late chunking — A chunking pattern in which the full document is embedded as a long sequence and chunk boundaries are applied at retrieval time against the embedded representation — rather than chunking before embedding.
  • Embedding model — A model that maps text, images, or multimodal content to dense vector representations used for retrieval, clustering, and similarity search.