Prompt caching
An inference optimisation that caches the attention key-value state for a prompt prefix so that subsequent requests sharing the same prefix skip re-processing.
What this means in practice
Reduces both latency and cost on repeated context (system prompts, long documents, few-shot examples); cache hit rate is a first-order cost-architecture metric.
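The mechanism can be pictured as a store keyed by the shared prompt prefix. The sketch below is illustrative only: the class name, the hashing scheme, and the token-count stand-in for the real attention KV state are all invented for this example, not any particular serving stack's API.

```python
import hashlib

class PrefixCache:
    """Toy model of prefix caching: the cached "KV state" is simulated
    as a token count, keyed by a hash of the shared prompt prefix."""

    def __init__(self):
        self._store = {}  # prefix hash -> simulated KV state

    @staticmethod
    def _key(prefix_tokens):
        return hashlib.sha256(" ".join(prefix_tokens).encode()).hexdigest()

    def process(self, prefix_tokens, suffix_tokens):
        """Return how many tokens actually needed processing."""
        key = self._key(prefix_tokens)
        if key in self._store:
            # Cache hit: the prefix's KV state is reused,
            # so only the suffix is processed.
            return len(suffix_tokens)
        # Cache miss: process everything, then store the prefix state.
        self._store[key] = len(prefix_tokens)
        return len(prefix_tokens) + len(suffix_tokens)

system_prompt = ["You", "are", "a", "helpful", "assistant"] * 100  # 500 tokens
cache = PrefixCache()
first = cache.process(system_prompt, ["What", "is", "caching?"])   # miss: 503
second = cache.process(system_prompt, ["Summarise", "this", "doc"])  # hit: 3
```

The second request skips re-processing the 500-token system prompt entirely, which is why a long, stable prefix (system prompt, shared documents) is the main lever for raising hit rate.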
Synonyms
prefix caching, KV-cache reuse
See also
- Semantic caching — A caching strategy in which cache hits are determined by semantic similarity to prior queries rather than exact-string match — typically implemented by embedding the query and performing nearest-neighbour search over a cache of past query-response pairs.
- Continuous batching — An inference-server technique — popularised by vLLM and Text Generation Inference — that dynamically groups concurrent requests at the token-generation level to raise GPU utilisation.
- Model routing — A pattern that routes each request to the cheapest model capable of handling it, escalating to more powerful models only when necessary — typically via a small classifier, confidence-based escalation, or response evaluation.
- Per-task cost — An agent SLI capturing the full compute and API cost of a single task end-to-end — including all loop iterations, tool calls, memory reads and writes.
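The semantic-caching entry above contrasts similarity-based hits with the exact-prefix match used by prompt caching. A minimal sketch of that idea, using a toy bag-of-words "embedding" and brute-force cosine search in place of a real embedding model and vector index (all names here are invented for illustration):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Cache hits are decided by similarity to past queries,
    not by exact-string match."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # (query embedding, cached response)

    def lookup(self, query):
        vec = embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best and cosine(vec, best[0]) >= self.threshold:
            return best[1]  # near-duplicate query: reuse the response
        return None  # miss: caller must run the model

    def store(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.store("what is prompt caching", "Prompt caching reuses KV state.")
hit = cache.lookup("What is prompt caching?")   # paraphrase -> cache hit
miss = cache.lookup("explain model routing")    # unrelated  -> None
```

Note the trade-off the threshold encodes: set it too low and semantically different queries get a stale answer; too high and it degenerates toward exact matching.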