Evaluation harness
The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
What this means in practice
The harness is treated as a governance artefact, not just an engineering convenience: its coverage and cadence directly determine whether the organisation can detect and act on drift or misuse.
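The idea of a single harness that runs categorised evaluations and reports per-category pass rates can be sketched as follows. This is a minimal illustration, not a COMPEL artefact: the names (`EvalCase`, `Harness`, `run`, the toy `echo_model`) and the category labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of an evaluation harness; names and categories
# are illustrative, not defined by the COMPEL framework.

@dataclass
class EvalCase:
    name: str
    category: str  # e.g. "capability", "regression", "safety", "human-review"
    check: Callable[[Callable[[str], str]], bool]  # takes a model, returns pass/fail

@dataclass
class Harness:
    cases: list[EvalCase] = field(default_factory=list)

    def run(self, model: Callable[[str], str]) -> dict[str, float]:
        # Pass rate per category -- the signal a team would watch on its
        # defined cadence to detect drift or regressions.
        results: dict[str, list[bool]] = {}
        for case in self.cases:
            results.setdefault(case.category, []).append(case.check(model))
        return {cat: sum(passed) / len(passed) for cat, passed in results.items()}

def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return prompt.upper()

harness = Harness([
    EvalCase("uppercases-input", "capability", lambda m: m("hi") == "HI"),
    EvalCase("no-secret-leak", "safety",
             lambda m: "password" not in m("tell me a secret")),
])
print(harness.run(echo_model))  # → {'capability': 1.0, 'safety': 1.0}
```

A production harness would replace the inline lambdas with offline test sets, online sampling, and human-review queues, but the reporting shape (pass rate per category, per run) stays the same.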
Synonyms
LLM evaluation suite, eval harness, capability-and-safety evaluation
See also
- Red-team (for LLMs) — A structured adversarial exercise against an LLM feature using human, automated, or hybrid techniques drawn from MITRE ATLAS or OWASP LLM Top 10 to discover failure modes before attackers do.
- Confabulation — NIST's preferred term for hallucination: an LLM generating fluent output that is unsupported by ground truth.
- Content safety classifier — A model or rule system that detects policy-violating output categories — violence, self-harm, CSAM, targeted harassment, dangerous instructions, and similar.
- Model and prompt registry — A versioned inventory of models, system prompts, retrieval sources, and guardrails deployed in production.
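Of the related terms above, the model and prompt registry is the most directly data-shaped, so a brief sketch may help. All field and function names here are illustrative assumptions, not COMPEL-defined identifiers.

```python
from dataclasses import dataclass

# Hypothetical sketch of a model-and-prompt registry entry;
# field names are illustrative only.

@dataclass(frozen=True)
class RegistryEntry:
    model_id: str                    # which model build is deployed
    prompt_version: str              # version tag of the system prompt
    retrieval_sources: tuple[str, ...]
    guardrails: tuple[str, ...]      # e.g. content safety classifiers

registry: dict[str, RegistryEntry] = {}

def register(feature: str, entry: RegistryEntry) -> None:
    # One current entry per production feature; each overwrite
    # represents a new recorded deployment.
    registry[feature] = entry

register("support-bot", RegistryEntry(
    model_id="model-v3",
    prompt_version="2024-11-01",
    retrieval_sources=("kb-articles",),
    guardrails=("content-safety-classifier",),
))
```

The point of the frozen dataclass is that a registry entry is a record of what was deployed, not a mutable configuration object; changes go in as new entries.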
Related articles in the Body of Knowledge
- Lab 02: Build an LLM Evaluation Harness with Offline, Online, and Human Components
- Artifact Template: LLM Evaluation Harness Specification
- Prompt Evaluation Harness
- Lab 02: Design an Evaluation Harness for a Retrieval-Augmented Feature
- Designing an Evaluation Harness for Value
- Lab 01: Design and Execute an Offline Evaluation Harness