Skip to main content

COMPEL Glossary / red-team-experiment

Red-team experiment

An adversarial experiment designed to probe failure modes rather than validate desired behavior — structured, hypothesis-driven exploration of safety bypass, goal mis-specification, jailbreak, and harm.

What this means in practice

Distinct from `Red-team (for LLMs)` by framing: that is a general LLM-security practice; a red-team experiment is a specific structured experimentation artefact with a declared hypothesis and success criterion.

Synonyms

adversarial experiment , red-team campaign

See also

  • Red-team (for LLMs) — A structured adversarial exercise against an LLM feature using human, automated, or hybrid techniques drawn from MITRE ATLAS or OWASP LLM Top 10 to discover failure modes before attackers do.
  • Evaluation harness — The infrastructure that runs capability, regression, safety, and human-review evaluations on an LLM feature on a defined cadence.
  • LLM-as-judge — An evaluation technique using a large language model to score outputs from another LLM on quality dimensions — helpfulness, correctness, safety — scaling evaluation beyond human-rater capacity.
  • Benchmark contamination — The presence of benchmark test data in foundation-model training corpora — whether through web crawling or deliberate inclusion — inflating reported benchmark scores and breaking the comparability of benchmark results across models.

Related articles in the Body of Knowledge