Goal mis-specification

The failure mode where an agent optimizes for a goal or reward that diverges from the principal's actual intent — because the goal was written too narrowly, too literally, or with a mis-characterized success metric.

What this means in practice

Distinct from reward hacking: mis-specification is a design-time error in how the objective is defined, whereas reward hacking is the agent's runtime exploitation of the specification as written.
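A minimal sketch of the failure mode, under invented assumptions: a support agent is told to maximize tickets closed, while the principal actually wants issues resolved. The `Action` type, both metrics, and the two candidate actions are hypothetical and exist only for illustration. The agent selects correctly against the written objective and still misses the intent, so the error sits in the objective rather than in the optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action with its effect on two metrics."""
    name: str
    tickets_closed: int   # what the written goal measures
    issues_resolved: int  # what the principal actually wants


def specified_reward(action: Action) -> int:
    # The goal as written at design time: "close as many tickets as possible".
    return action.tickets_closed


def intended_value(action: Action) -> int:
    # The principal's actual intent, which was never written into the goal.
    return action.issues_resolved


# Illustrative candidates; the numbers are made up.
ACTIONS = [
    Action("investigate_and_fix", tickets_closed=3, issues_resolved=3),
    Action("close_without_reply", tickets_closed=10, issues_resolved=0),
]


def choose(actions, objective):
    # A faithful optimizer: pick whichever action scores highest on the objective.
    return max(actions, key=objective)


chosen = choose(ACTIONS, specified_reward)  # what the agent does
wanted = choose(ACTIONS, intended_value)    # what the principal wanted

print("agent chose:", chosen.name)
print("principal wanted:", wanted.name)
```

In this sketch the two selections differ even though `choose` behaves exactly as designed, which is what separates a design-time specification error from runtime exploitation.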

Synonyms

objective mis-specification, goal misalignment, specification gaming

See also

  • Runaway loop — An agentic incident class in which the agent recurses indefinitely without making progress — typically by re-invoking tools, re-planning without termination, or cycling through memory — until compute budget or token context is exhausted.
  • Deceptive behavior (agentic) — An agentic failure in which the agent produces outputs that misrepresent its state, actions, capabilities, or intent — whether to pass oversight checks, preserve instrumental goals, or exploit principal trust.