Content safety classifier
A model or rule system that detects output falling into policy-violating categories, such as violence, self-harm, CSAM, targeted harassment, and dangerous instructions.
What this means in practice
A content safety classifier forms the output layer of a guardrail architecture. The term is technology-neutral: implementations span managed APIs, open-weight classifiers, and rule engines.
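As a rough illustration of the rule-engine option, the sketch below shows a classifier sitting at the output layer: it labels model output with policy categories and blocks anything flagged. The names `RULES`, `Verdict`, `classify_output`, and `output_filter`, as well as the patterns and the refusal string, are hypothetical and chosen only for this example. They are not part of COMPEL or of any particular library, and a production classifier would likely be a trained model or a managed API rather than a few regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical category patterns, for illustration only. A real policy
# would cover far more cases and usually pair rules with a trained model.
RULES = {
    "violence": re.compile(r"\b(kill|stab|shoot)\b", re.IGNORECASE),
    "dangerous_instructions": re.compile(
        r"\bhow to (build|make) a bomb\b", re.IGNORECASE
    ),
}


@dataclass
class Verdict:
    allowed: bool
    categories: list[str] = field(default_factory=list)


def classify_output(text: str) -> Verdict:
    """Return the policy categories whose patterns match the model output."""
    hits = [name for name, pattern in RULES.items() if pattern.search(text)]
    return Verdict(allowed=not hits, categories=hits)


def output_filter(model_output: str,
                  refusal: str = "[blocked by content policy]") -> str:
    """Pass the output through unchanged unless a category is flagged."""
    verdict = classify_output(model_output)
    return model_output if verdict.allowed else refusal
```

Because callers depend only on `classify_output` returning a `Verdict`, the regular expressions could be swapped for an open-weight classifier or a managed API call without changing `output_filter`, which is one way to read the "technology-neutral" point above.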
Synonyms
safety classifier, policy classifier, moderation classifier
See also
- Guardrail — A control placed between the user or environment and an LLM that blocks, rewrites, or classifies content at one of four architectural layers: input filter, policy filter, output filter, or tool-call validator.
- Jailbreak — A user-crafted prompt pattern that bypasses a model's safety training to elicit restricted behavior.
- Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.