

Semantic caching

A caching strategy in which cache hits are determined by semantic similarity to prior queries rather than exact-string match — typically implemented by embedding the query and performing nearest-neighbour search over a cache of past query-response pairs.
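The lookup path in the definition can be sketched in Python. This is a minimal, self-contained illustration under stated assumptions, not a reference implementation. `toy_embed` is a hypothetical stand-in for an embedding model: it builds a normalised bag-of-words vector, not a learned dense embedding. The nearest-neighbour search is a brute-force scan rather than a vector store index. The `SemanticCache` class and its 0.8 default threshold are illustrative choices.

```python
import math
from collections import Counter
from typing import Optional


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model.

    Returns an L2-normalised bag-of-words vector (word -> weight). A real
    semantic cache would call a dense embedding model here instead.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class SemanticCache:
    """Hypothetical minimal semantic cache over (query embedding, response) pairs."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries: list[tuple[dict[str, float], str]] = []

    def put(self, query: str, response: str) -> None:
        self.entries.append((toy_embed(query), response))

    def get(self, query: str) -> Optional[str]:
        """Return the nearest cached response if it clears the threshold, else None."""
        q = toy_embed(query)
        best_score, best_response = -1.0, None
        # Brute-force nearest-neighbour scan; a vector store would index this instead.
        for vector, response in self.entries:
            score = cosine(q, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of France", "Paris")
print(cache.get("What is the capital city of France"))  # prints Paris
print(cache.get("how tall is the eiffel tower"))        # prints None
```

Under this toy embedding, the paraphrase scores about 0.93 and is served from the cache, while the unrelated question misses. The query "what is the capital of Germany" scores about 0.83, clears the 0.8 threshold, and wrongly returns "Paris". That is a false hit. Raising the threshold to 0.9 rejects it while still serving the paraphrase.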

What this means in practice

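Semantic caching trades correctness risk for cost and latency savings. A false hit occurs when a new query is similar enough to a cached one to clear the threshold but actually needs a different answer, and the cached response is returned anyway. Using it safely requires calibrated similarity thresholds and an invalidation policy that evicts stale responses.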
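The two operational requirements, threshold calibration and invalidation, can be sketched as small helpers. Both are illustrative assumptions, not a prescribed method. `calibrate_threshold` (hypothetical) picks the lowest similarity threshold whose false-hit rate on a set of human-labelled query pairs stays within a budget. Here the false-hit rate is the share of served hits whose pair was judged not equivalent. `is_fresh` (hypothetical) applies a simple time-to-live (TTL) rule. TTL is only one possible invalidation policy; evicting entries when the underlying source data changes is another.

```python
import time
from typing import Optional, Sequence


def calibrate_threshold(
    labelled: Sequence[tuple[float, bool]], max_false_hit_rate: float = 0.0
) -> Optional[float]:
    """Return the lowest threshold whose false-hit rate stays within budget.

    `labelled` holds (similarity, is_equivalent) pairs for query pairs that a
    reviewer has judged, e.g. a held-out evaluation set (hypothetical input).
    """
    # Lower thresholds serve more hits, so scan candidate thresholds upwards.
    for t in sorted({score for score, _ in labelled}):
        served = [ok for score, ok in labelled if score >= t]  # never empty: t is a score
        false_hits = served.count(False)
        if false_hits / len(served) <= max_false_hit_rate:
            return t
    return None  # no threshold meets the budget


def is_fresh(created_at: float, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """Time-based invalidation: an entry older than its TTL must not be served."""
    now = time.time() if now is None else now
    return now - created_at < ttl_seconds
```

For example, with labels `[(0.93, True), (0.88, True), (0.83, False), (0.75, False)]`, a zero false-hit budget yields a threshold of 0.88. A budget of 0.4 admits the non-equivalent 0.83 pair and yields 0.83. A lower threshold serves more hits in exchange for more false hits.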

Synonyms

semantic cache, similarity-based caching

See also

  • Prompt caching — An inference optimisation that caches the attention key-value state for a prompt prefix so that subsequent requests sharing the same prefix skip re-processing.
  • Vector store — A governed index of embeddings — numeric vector representations of text, image, or multimodal content — that supports similarity search used by retrieval-augmented generation.
  • Embedding model — A model that maps text, images, or multimodal content to dense vector representations used for retrieval, clustering, and similarity search.