
Prompt caching

An inference optimisation that caches the attention key-value (KV) state for a prompt prefix so that subsequent requests sharing the same prefix skip recomputing it.
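A minimal sketch of the mechanism, under stated assumptions: `PrefixCache`, `compute_kv`, and `generate` are hypothetical names, not any engine's API, and the stub forward pass merely counts tokens so the cache-hit path is visible.

```python
import hashlib

def compute_kv(tokens, past_kv=None):
    # Stub for the transformer prefill: a real engine would return
    # attention key/value tensors; here we just track processed tokens.
    print(f"forward pass over {len(tokens)} tokens")
    return (past_kv or []) + list(tokens)

class PrefixCache:
    def __init__(self):
        self._store = {}  # hash of prefix tokens -> cached "KV state"

    def _key(self, tokens):
        return hashlib.sha256(repr(list(tokens)).encode()).hexdigest()

    def get(self, tokens):
        return self._store.get(self._key(tokens))

    def put(self, tokens, kv):
        self._store[self._key(tokens)] = kv

def generate(cache, prefix, suffix):
    kv = cache.get(prefix)
    if kv is None:                          # miss: pay full prefill cost
        kv = compute_kv(prefix)
        cache.put(prefix, kv)
    return compute_kv(suffix, past_kv=kv)   # hit: only the suffix is new

cache = PrefixCache()
system_prompt = ["You", "are", "a", "helpful", "assistant."]
generate(cache, system_prompt, ["What", "is", "2+2?"])   # full prefill
generate(cache, system_prompt, ["Summarise", "this."])   # prefix skipped
```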

What this means in practice

Reduces both latency and cost for repeated context (system prompts, long documents, few-shot examples); cache hit rate is therefore a first-order metric when architecting for cost.
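To see why hit rate is first-order, here is illustrative back-of-envelope arithmetic; the per-token price, the 90% discount on cached tokens, and the traffic shape are all assumptions, not any provider's actual pricing.

```python
PRICE_PER_INPUT_TOKEN = 3e-6   # $/uncached input token (assumed)
CACHED_TOKEN_DISCOUNT = 0.9    # cached tokens billed at 10% (assumed)

def monthly_input_cost(requests, prefix_tokens, suffix_tokens, hit_rate):
    """Expected input-token spend given a prefix-cache hit rate."""
    cached_tokens = requests * hit_rate * prefix_tokens
    uncached_tokens = requests * (prefix_tokens + suffix_tokens) - cached_tokens
    return (uncached_tokens
            + cached_tokens * (1 - CACHED_TOKEN_DISCOUNT)) * PRICE_PER_INPUT_TOKEN

# 1M requests/month sharing an 8,000-token prefix, 200-token user suffix:
for hit_rate in (0.0, 0.5, 0.95):
    print(f"hit rate {hit_rate:.0%}: "
          f"${monthly_input_cost(1_000_000, 8_000, 200, hit_rate):,.0f}")
```

Under these assumed numbers, moving the hit rate from 0% to 95% cuts the monthly input bill from roughly $24,600 to $4,080, which is why the metric deserves first-order attention.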

Synonyms

prefix caching, KV-cache reuse

See also

  • Semantic caching — A caching strategy in which cache hits are determined by semantic similarity to prior queries rather than exact-string match — typically implemented by embedding the query and performing nearest-neighbour search over a cache of past query-response pairs; a minimal sketch follows this list.
  • Continuous batching — An inference-server technique — popularised by vLLM and Text Generation Inference — that dynamically groups concurrent requests at the token-generation level to raise GPU utilisation.
  • Model routing — A pattern that routes each request to the cheapest model capable of handling it, escalating to more powerful models only when necessary — typically via a small classifier, confidence-based escalation, or response evaluation.
  • Per-task cost — An agent SLI capturing the full compute and API cost of a single task end-to-end — including all loop iterations, tool calls, memory reads and writes.
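
For contrast with prompt caching's exact-prefix matching, here is a toy semantic-cache sketch. The names are hypothetical, `embed` is a bag-of-characters stand-in for a real embedding model, the threshold is arbitrary, and a production system would use an approximate-nearest-neighbour index rather than a linear scan.

```python
import math

def embed(text):
    # Toy bag-of-characters embedding, purely for illustration.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, response) pairs

    def lookup(self, query):
        q = embed(query)
        best, best_sim = None, -1.0
        for vec, response in self.entries:       # nearest-neighbour scan
            sim = sum(a * b for a, b in zip(q, vec))  # cosine (unit vectors)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def store(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.9)
cache.store("What is the capital of France?", "Paris")
print(cache.lookup("what is the capital of france"))  # hit despite wording change
```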