COMPEL™ Body of Knowledge v2.5

Evaluation harness

The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
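As a rough illustration of this definition, the sketch below models a harness as a set of evaluation suites, each with a kind and a cadence, and works out which suites are due to run. Every name here (`EvalSuite`, `due_suites`, the suite names, and the cadences) is hypothetical and chosen for illustration; none of it is prescribed by COMPEL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The four evaluation kinds named in the definition.
KINDS = ("capability", "regression", "safety", "human_review")


@dataclass
class EvalSuite:
    """One evaluation suite (hypothetical structure)."""
    name: str
    kind: str
    cadence: timedelta                   # how often the suite must run
    last_run: Optional[datetime] = None  # None means it has never run

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown evaluation kind: {self.kind}")

    def is_due(self, now: datetime) -> bool:
        # A suite is due if it has never run or its cadence has elapsed.
        return self.last_run is None or now - self.last_run >= self.cadence


def due_suites(suites, now):
    """Return the names of suites whose cadence has elapsed."""
    return [s.name for s in suites if s.is_due(now)]


# Illustrative data only: suite names, cadences, and timestamps are assumptions.
now = datetime(2026, 1, 15, 9, 0)
suites = [
    EvalSuite("qa-accuracy", "capability", timedelta(days=7),
              last_run=datetime(2026, 1, 1, 9, 0)),
    EvalSuite("golden-set", "regression", timedelta(days=1),
              last_run=datetime(2026, 1, 15, 6, 0)),
    EvalSuite("jailbreak-probes", "safety", timedelta(days=1),
              last_run=datetime(2026, 1, 13, 9, 0)),
    EvalSuite("expert-sample-review", "human_review", timedelta(days=30)),
]

print(due_suites(suites, now))
# ['qa-accuracy', 'jailbreak-probes', 'expert-sample-review']
```

A real harness would also execute the suites and persist results; the point of the sketch is only that each evaluation kind carries an explicit, checkable cadence.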

What this means in practice

An evaluation harness should be treated as a governance artefact, not just an engineering convenience: its coverage and cadence directly determine whether the organisation can detect and act on drift or misuse.
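One way to make the governance role concrete is to report on the harness itself. The sketch below, written under stated assumptions, flags evaluation kinds that have no results (a coverage gap) and kinds whose pass rate has fallen below an approved baseline (possible drift). The `RunResult` structure, the `governance_findings` function, the 0.05 threshold, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass

# Evaluation kinds the harness is expected to cover.
REQUIRED_KINDS = {"capability", "regression", "safety", "human_review"}


@dataclass
class RunResult:
    """Latest result for one evaluation kind (hypothetical structure)."""
    kind: str
    score: float     # pass rate of the latest run, in [0, 1]
    baseline: float  # pass rate recorded when the feature was approved


def governance_findings(results, max_drop=0.05):
    """List coverage gaps and baseline drops larger than max_drop."""
    findings = []
    covered = {r.kind for r in results}
    for kind in sorted(REQUIRED_KINDS - covered):
        findings.append(f"coverage gap: no {kind} evaluation")
    for r in results:
        drop = r.baseline - r.score
        if drop > max_drop:
            findings.append(f"drift: {r.kind} pass rate fell by {drop:.2f}")
    return findings


# Illustrative data only.
results = [
    RunResult("capability", score=0.91, baseline=0.92),
    RunResult("regression", score=0.80, baseline=0.95),
    RunResult("safety", score=0.99, baseline=0.99),
]

for finding in governance_findings(results):
    print(finding)
# coverage gap: no human_review evaluation
# drift: regression pass rate fell by 0.15
```

The threshold and the notion of a single pass rate per kind are simplifications; an organisation would set these according to its own risk appetite.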

Synonyms

LLM evaluation suite, eval harness, capability-and-safety evaluation

See also

Red-team (for LLMs) — A structured adversarial exercise against an LLM feature using human, automated, or hybrid techniques drawn from MITRE ATLAS or OWASP LLM Top 10 to discover failure modes before attackers do.
Confabulation — NIST's preferred term for hallucination: an LLM generating fluent output that is unsupported by ground truth.
Content safety classifier — A model or rule system that detects policy-violating output categories — violence, self-harm, CSAM, targeted harassment, dangerous instructions, and similar.
Model and prompt registry — A versioned inventory of models, system prompts, retrieval sources, and guardrails deployed in production.

Cite this article

Author: FlowRidge Team
Publisher: FlowRidge
First Published: 2026
Work: COMPEL AI Transformation Body of Knowledge

Academic (APA)

FlowRidge Team. (2026). Evaluation harness — COMPEL Glossary. COMPEL AI Transformation Body of Knowledge. FlowRidge. Retrieved from https://www.compelframework.org/glossary/evaluation-harness

BibTeX

@misc{compel-evaluation-harness-2026,
  author = {{FlowRidge Team}},
  title = {Evaluation harness — COMPEL Glossary},
  howpublished = {COMPEL AI Transformation Body of Knowledge},
  publisher = {FlowRidge},
  year = {2026},
  url = {https://www.compelframework.org/glossary/evaluation-harness},
  note = {Governed by the COMPEL Framework License Agreement}
}

Plain text

FlowRidge Team. Evaluation harness — COMPEL Glossary. COMPEL AI Transformation Body of Knowledge. FlowRidge, 2026. https://www.compelframework.org/glossary/evaluation-harness

This content is part of the COMPEL AI Transformation Body of Knowledge, governed by the COMPEL Framework License Agreement. See /license for terms.