The COMPEL Glossary Graph visualizes relationships between framework terminology, showing how concepts interconnect across domains, stages, and pillars. Term nodes cluster by pillar affiliation while cross-references reveal semantic dependencies — for example, how risk appetite connects to control effectiveness, model governance, and assurance requirements. This network representation helps practitioners navigate the framework vocabulary and understand that COMPEL terminology forms a coherent conceptual system rather than isolated definitions.
LLM-as-judge
An evaluation technique in which a large language model scores the outputs of another LLM on quality dimensions such as helpfulness, correctness, and safety, so that evaluation can scale beyond human-rater capacity.
What this means in practice
Strengths: scalability and consistency across large volumes of outputs. Weaknesses: judge-model biases, notably a preference for longer answers (verbosity preference) and self-preference when the judge and the candidate share an architecture.
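As a rough illustration of the technique, the following Python sketch builds a grading prompt, passes it to a judge model, and parses per-dimension scores. Everything here is an assumption made for illustration rather than part of COMPEL: the 1 to 5 scale, the JSON reply format, and the names `build_prompt`, `parse_scores`, `judge`, and `call_judge`. The judge is injected as a plain callable so that any model client could be plugged in; `stub_judge` is a hypothetical stand-in that returns a fixed reply.

```python
import json
import re
from statistics import mean
from typing import Callable, Dict

# Quality dimensions named in the definition; the 1-5 scale is an assumption.
DIMENSIONS = ("helpfulness", "correctness", "safety")

PROMPT_TEMPLATE = """You are grading an assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer from 1 (poor) to 5 (excellent) on each of: {dims}.
Reply with a JSON object only, for example {{"helpfulness": 3, "correctness": 3, "safety": 3}}."""


def build_prompt(question: str, answer: str) -> str:
    """Fill the (hypothetical) rubric template for one candidate answer."""
    return PROMPT_TEMPLATE.format(
        question=question, answer=answer, dims=", ".join(DIMENSIONS)
    )


def parse_scores(reply: str) -> Dict[str, int]:
    """Extract the JSON object from the judge's reply and validate each score."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("judge reply contains no JSON object")
    raw = json.loads(match.group(0))
    scores = {}
    for dim in DIMENSIONS:
        value = int(raw[dim])
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} score {value} is outside 1-5")
        scores[dim] = value
    return scores


def judge(
    question: str,
    answer: str,
    call_judge: Callable[[str], str],
    samples: int = 1,
) -> Dict[str, float]:
    """Query the judge `samples` times and average the scores per dimension."""
    prompt = build_prompt(question, answer)
    runs = [parse_scores(call_judge(prompt)) for _ in range(samples)]
    return {dim: mean(run[dim] for run in runs) for dim in DIMENSIONS}


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real judge-model call; returns a fixed reply."""
    return 'Scores: {"helpfulness": 4, "correctness": 5, "safety": 5}'


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4", stub_judge, samples=2))
```

Averaging over several samples only smooths run-to-run noise; it does not remove the systematic biases listed above. A team concerned about self-preference might additionally pass a `call_judge` backed by a model from a different family than the candidate.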
Synonyms
model-graded evaluation, LLM judge, judge model
See also
- Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
- Benchmark contamination — The presence of benchmark test data in foundation-model training corpora — whether through web crawling or deliberate inclusion — inflating reported benchmark scores and breaking the comparability of benchmark results across models.
- Red-team experiment — An adversarial experiment designed to probe failure modes rather than validate desired behavior — structured, hypothesis-driven exploration of safety bypass, goal mis-specification, jailbreak, and harm.