Quantization (AI cost)
Representation of model weights (and sometimes activations) at lower numerical precision — INT8, INT4, or mixed-precision — to reduce memory footprint and accelerate inference.
What this means in practice
Techniques include post-training quantization (GPTQ, AWQ) and quantization-aware training; the trade-off is a small quality degradation in exchange for a cost reduction that is often 2-4x.
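To make the mechanics concrete, the sketch below is an illustration only, not part of the COMPEL framework: it shows symmetric per-tensor INT8 quantization of a single weight matrix using NumPy. The function names quantize_int8 and dequantize are hypothetical, and production methods such as GPTQ and AWQ use calibration data and more elaborate per-channel or per-group schemes.

```python
# Minimal sketch of symmetric per-tensor INT8 post-training quantization.
# Illustrative only; real toolchains (GPTQ, AWQ, quantization-aware training)
# are considerably more sophisticated.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map FP32 weights onto the INT8 range [-127, 127] with a single scale."""
    scale = np.max(np.abs(weights)) / 127.0  # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights for use at inference time."""
    return q.astype(np.float32) * scale

# Example: a random matrix standing in for one layer's weights.
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("bytes fp32:", w.nbytes)   # ~67 MB
print("bytes int8:", q.nbytes)   # ~17 MB, i.e. a 4x memory reduction
print("max abs error:", np.max(np.abs(w - w_hat)))
```

The same arithmetic drives cost at model scale: storing each weight in 1 byte (INT8) instead of 2 (FP16) roughly halves memory footprint, and INT4 roughly quarters it, which is where the typical 2-4x savings come from.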
Synonyms
model quantization, weight quantization, INT8 / INT4 quantization
See also
- Distillation — The training of a smaller "student" model to imitate a larger "teacher" model's behaviour — typically on a shared dataset of prompts and teacher outputs.
- PEFT (parameter-efficient fine-tuning) — A family of fine-tuning techniques — most prominently LoRA, QLoRA, and adapters — that update only a small fraction of model parameters while freezing the rest.
- Serving pattern — The architectural shape of the inference path — managed API, cloud-platform hosted, self-hosted online, self-hosted batch, or edge.