TTFT (time-to-first-token)
The latency from request submission to the first streamed output token.
What this means in practice
TTFT is the user-perceived responsiveness metric for streaming LLM applications. It is distinct from total generation latency: downstream UX depends on how long the user waits for any output to appear, not on when the full response completes.
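A minimal sketch of how TTFT can be measured for a streaming generator. The names `measure_ttft` and `fake_stream` are illustrative, not part of any framework or SDK; the fake stream stands in for a real model's prefill and decode phases.

```python
import time

def measure_ttft(token_stream):
    """Measure time-to-first-token for a streaming token generator.

    Returns (ttft_seconds, tokens): TTFT is the elapsed time from the
    call until the first token is yielded; tokens collects the output.
    """
    start = time.monotonic()
    ttft = None
    tokens = []
    for token in token_stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token observed
        tokens.append(token)
    return ttft, tokens

# Hypothetical model stream: a prefill delay before the first token,
# then a steady per-token decode latency.
def fake_stream(prefill_s=0.05, decode_s=0.01, n_tokens=5):
    time.sleep(prefill_s)  # stands in for prompt processing (prefill)
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(decode_s)  # stands in for per-token decoding

ttft, tokens = measure_ttft(fake_stream())
```

Note that TTFT here is dominated by the prefill delay, while total generation latency also includes the per-token decode time; that gap is exactly why the two metrics are tracked separately.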
Synonyms
time to first token, first-token latency
See also
- Serving pattern — The architectural shape of the inference path — managed API, cloud-platform hosted, self-hosted online, self-hosted batch, or edge.
- Continuous batching — An inference-server technique — popularised by vLLM and Text Generation Inference — that dynamically groups concurrent requests at the token-generation level to raise GPU utilisation.
- Prompt caching — An inference optimisation that caches the attention key-value state for a prompt prefix so that subsequent requests sharing the same prefix skip re-processing.
- SLI/SLO for AI — Service-level indicators and objectives for AI systems — including evaluation score, per-task cost, and goal-achievement rate alongside classical availability/latency.