Skip to main content

COMPEL Glossary / prompt-evaluation-harness

Prompt evaluation harness

The infrastructure that runs capability, regression, safety, and human-review evaluations on prompts — distinct from a general LLM evaluation harness by scope: prompt-evaluation tests the prompt while holding the model fixed, catching prompt-level drift (e.g., after a system-prompt edit) without attributing it to the model.

What this means in practice

Governance artefact for prompt lifecycle management.

Synonyms

prompt eval suite , prompt test battery

See also

  • Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
  • Prompt template — A versioned, parameterized representation of a prompt — with placeholders for user input, retrieved context, and dynamic state — enabling reuse, testing, and audit.
  • Red-team experiment — An adversarial experiment designed to probe failure modes rather than validate desired behavior — structured, hypothesis-driven exploration of safety bypass, goal mis-specification, jailbreak, and harm.

Related articles in the Body of Knowledge