COMPEL Glossary / guardrail

Guardrail

A control placed between the user or environment and an LLM that blocks, rewrites, or classifies content at one of four architectural layers: input filter, policy filter, output filter, or tool-call validator.

What this means in practice

Guardrails are taught as a layered architecture rather than as a single product: each of the four layers intercepts content at a different point between the user or environment and the LLM, and real deployments typically combine several.
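The four layers can be sketched as a simple pipeline. This is a minimal illustration, not a reference implementation; every function name, policy rule, and allowlist below is a hypothetical example introduced for the sketch.

```python
# Minimal sketch of a four-layer guardrail pipeline.
# All rules and names here are illustrative assumptions.

def input_filter(prompt: str) -> str:
    # Layer 1: block or sanitize hostile input before it reaches the LLM.
    if "ignore previous instructions" in prompt.lower():
        raise ValueError("blocked: prompt-injection pattern")
    return prompt

def policy_filter(prompt: str) -> str:
    # Layer 2: enforce use-case policy (hypothetical topic list).
    banned_topics = {"weapons"}
    if any(topic in prompt.lower() for topic in banned_topics):
        raise ValueError("blocked: policy violation")
    return prompt

def output_filter(response: str) -> str:
    # Layer 3: classify or rewrite model output before it reaches the user.
    return response.replace("SECRET", "[redacted]")

def tool_call_validator(tool: str, args: dict) -> None:
    # Layer 4: validate tool calls against an allowlist before execution.
    allowlist = {"search", "calculator"}
    if tool not in allowlist:
        raise PermissionError(f"blocked tool call: {tool}")

def guarded_generate(prompt: str, llm) -> str:
    # Chain the prompt-side layers, call the model, then filter output.
    clean = policy_filter(input_filter(prompt))
    return output_filter(llm(clean))
```

In practice each layer would be a classifier model or rule engine rather than a string match, but the control flow (block, rewrite, or pass at each boundary) is the same.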

Synonyms

LLM guardrails, AI safety layer

See also

  • Content safety classifier — A model or rule system that detects policy-violating output categories — violence, self-harm, CSAM, targeted harassment, dangerous instructions, and similar.
  • Excessive agency — A failure mode in which an LLM has been wired into tools and permissions whose blast radius exceeds what its supervision and validation logic can safely bound.
  • Red-team (for LLMs) — A structured adversarial exercise against an LLM feature using human, automated, or hybrid techniques drawn from MITRE ATLAS or OWASP LLM Top 10 to discover failure modes before attackers do.

Related articles in the Body of Knowledge