AITL M9.3-Art01 v1.0 Reviewed 2026-04-06 Open Access

OWASP Top 10 for Agentic AI: Mitigation Playbook


Article 1 of 2

COMPEL Body of Knowledge — Agentic Governance Series (Cluster C) Flagship Security Playbook


Why an agentic-specific playbook {#why}

The risks that matter for an agentic AI system are different from the risks that matter for a chat or completion system. A chatbot that hallucinates causes misinformation; an agent that hallucinates causes actions. The moment an AI system is given tools, memory, and autonomy, the attack surface becomes kinetic: the system can move money, delete data, post communications, spawn sub-processes, and touch external systems.

The OWASP GenAI Security Project publishes two complementary catalogues — the OWASP Top 10 for LLM Applications (chat/completion surfaces) and the OWASP Top 10 for Agentic AI Applications (agent/tool/memory surfaces). This playbook focuses on the agentic list and pairs each risk with a production-ready mitigation, mapped to the four layers an agent architecture exposes: prompt, tool, memory, orchestrator.

The playbook is architected so every control also lines up to the dual-compliance baseline (ISO 42001 + NIST AI RMF) that enterprise AI programs already operate.

The ten risks and their mitigations {#top-10}

A1 — Excessive agency

What it is. An agent holds capabilities (tool scopes, API keys, permissions) that exceed what the user task requires. A helpdesk agent is issued an admin credential; a scheduling agent has write access to billing.

Blast radius. Irreversible actions on systems of record, financial loss, regulatory breach.

Mitigations (layered):

  • Tool layer. Scope tokens per task. Use short-lived (≤15 minute) STS-style credentials. Forbid wildcard (*) scopes.
  • Orchestrator. Enforce a capability manifest per agent role; deny tool calls outside the manifest.
  • Prompt. Include a refusal clause for out-of-scope requests; pair with system-prompt attestation (signed system prompts).
  • HITL. Require a human approval gate for any irreversible action.
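The orchestrator-layer manifest check can be sketched as a deny-by-default lookup. This is a minimal illustration, not OWASP-prescribed code; the role names, tool names, and `check_tool_call` helper are all hypothetical:

```python
# Hypothetical capability manifests: each agent role lists the only tools
# it may invoke. An unknown role resolves to an empty manifest.
MANIFESTS = {
    "helpdesk": {"ticket.read", "ticket.comment"},
    "scheduler": {"calendar.read", "calendar.write"},
}

def check_tool_call(role: str, tool: str) -> bool:
    """Allow a tool call only if the role's manifest lists it explicitly."""
    allowed = MANIFESTS.get(role, set())  # unknown role -> deny everything
    return tool in allowed                # exact match only, no wildcards
```

Every `False` result should also emit a denied-tool-call event for the alerting described under Detection.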

Detection. Alert on every denied-tool-call event and every capability-expansion request.

A2 — Tool misuse and confused deputy

What it is. An agent is tricked into using a legitimate tool for an illegitimate purpose — an email tool used to exfiltrate data, a search tool used to enumerate accounts.

Mitigations.

  • Tool layer. Parameter schemas enforced server-side; reject calls with unexpected fields.
  • Orchestrator. Tool-call policies (e.g., recipient allowlist on email send, row-limit on database queries).
  • Prompt. Explicitly bind user intent to the tool call; require the agent to summarize the intended action before invocation.
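A server-side schema check of the kind described above might look like the following sketch. The field names, allowlist entries, and `validate_email_call` helper are illustrative assumptions:

```python
# Illustrative server-side validation for an email-send tool: reject calls
# with fields outside the declared schema, then enforce a recipient allowlist.
EMAIL_SEND_SCHEMA = {"to", "subject", "body"}
RECIPIENT_ALLOWLIST = {"support@example.com", "ops@example.com"}

def validate_email_call(params: dict) -> tuple[bool, str]:
    unexpected = set(params) - EMAIL_SEND_SCHEMA
    if unexpected:  # unknown fields are rejected outright
        return False, f"unexpected fields: {sorted(unexpected)}"
    if params.get("to") not in RECIPIENT_ALLOWLIST:
        return False, "recipient not on allowlist"
    return True, "ok"
```

Because the check runs server-side, a manipulated agent cannot bypass it by rewriting its own prompt.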

Detection. Anomaly detection on tool-use volume, novel parameter patterns, and cross-tenant data access.

A3 — Memory poisoning

What it is. An attacker plants malicious content into the agent’s long-term memory (vector store, summary buffer, episodic log) that later steers behavior.

Mitigations.

  • Memory layer. Provenance metadata on every write (source, user, trust score). Content-integrity hashes.
  • Orchestrator. Trust-weighted retrieval — weight memories by provenance score.
  • Tool layer. Sandbox untrusted content (emails, user uploads) before ingestion.
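Provenance tagging and trust-weighted retrieval can be combined in a few lines. The trust scores and the multiplicative weighting below are illustrative assumptions; a production system would use calibrated scores and real embedding relevance:

```python
import hashlib

def write_memory(store: list, text: str, source: str, trust: float) -> None:
    """Append a memory entry with provenance metadata and an integrity hash."""
    store.append({
        "text": text,
        "source": source,
        "trust": trust,  # 0.0 (untrusted inbound) .. 1.0 (system of record)
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
    })

def retrieve(store: list, relevance: dict) -> list:
    """Rank memories by relevance weighted by provenance trust."""
    scored = [(relevance.get(m["text"], 0.0) * m["trust"], m) for m in store]
    return [m["text"] for s, m in sorted(scored, key=lambda x: -x[0]) if s > 0]
```

Note how a highly relevant but low-trust entry (e.g. planted via an inbound email) ranks below a moderately relevant entry from a trusted source.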

Detection. Embedding drift monitoring, red-team injection tests on ingestion paths.

A4 — Goal hijacking

What it is. A manipulated input causes the agent to abandon its original goal and pursue an attacker-specified one. The agent keeps executing, but toward the wrong destination.

Mitigations.

  • Orchestrator. Pin the goal at session start; verify goal invariance on each step.
  • Prompt. System-goal attestation; require the agent to restate the original goal before high-impact tool calls.
  • HITL. Escalate when the agent requests a plan revision in the middle of execution.
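Goal pinning can be sketched as follows. Here a token-overlap (Jaccard) score stands in for the goal-embedding distance a production orchestrator would use; the threshold and helper names are assumptions:

```python
# Pin the goal at session start; verify each restatement against it.
def pin_goal(session: dict, goal: str) -> None:
    session["pinned_goal"] = set(goal.lower().split())

def goal_invariant(session: dict, restated: str, threshold: float = 0.5) -> bool:
    """Return True if the restated goal still overlaps the pinned goal."""
    pinned = session["pinned_goal"]
    now = set(restated.lower().split())
    jaccard = len(pinned & now) / len(pinned | now)
    return jaccard >= threshold  # below threshold -> halt and escalate to HITL
```

Calling `goal_invariant` before every high-impact tool call implements the restatement check from the prompt-layer mitigation.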

Detection. Plan-divergence scoring; goal-embedding distance thresholds.

A5 — Cross-agent collusion and cascading failures

What it is. In multi-agent systems, one compromised or misbehaving agent corrupts its peers through shared memory, broker messages, or synchronized tool calls.

Mitigations.

  • Orchestrator. Isolation boundaries per agent (separate memory namespaces); peer-message sanitization.
  • Memory layer. Read-only shared context where possible; writable memory partitioned per agent.
  • Tool layer. Rate-limit inter-agent messages; block loops.
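A sliding-window rate limiter on each sender-receiver pair is one way to implement the tool-layer control; the limits and class name below are illustrative, not prescribed values:

```python
import time
from collections import deque

class PeerRateLimiter:
    """Cap messages per (sender, receiver) pair within a sliding window."""

    def __init__(self, max_msgs: int = 20, window_s: float = 60.0):
        self.max_msgs, self.window_s = max_msgs, window_s
        self.sent = {}  # (sender, receiver) -> deque of send timestamps

    def allow(self, sender: str, receiver: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.sent.setdefault((sender, receiver), deque())
        while q and now - q[0] > self.window_s:  # evict stale timestamps
            q.popleft()
        if len(q) >= self.max_msgs:              # over budget: block message
            return False
        q.append(now)
        return True
```

Blocked pairs are also a useful input to the communication-graph analysis described under Detection, since loops show up as pairs that hit the cap repeatedly.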

Detection. Graph analysis of agent-to-agent communication; dead-man switches on message-volume spikes.

A6 — Autonomous code execution

What it is. An agent with code-execution tools (sandbox, REPL, shell) is coaxed into running attacker-supplied code.

Mitigations.

  • Tool layer. Execute in a per-invocation microVM with no persistent state, no network egress, minimal filesystem.
  • Orchestrator. Pre-execution static analysis and capability assertions.
  • HITL. Approval gate for any code execution that produces side effects outside the sandbox.
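The pre-execution static-analysis step can be sketched with Python's `ast` module: reject agent-generated code that imports anything outside an allowlist. This is one heuristic layer only, not a substitute for the microVM; the allowlist is an assumption:

```python
import ast

ALLOWED_MODULES = {"math", "json", "statistics"}

def static_check(source: str) -> bool:
    """Reject code that fails to parse or imports a non-allowlisted module."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(a.name.split(".")[0] not in ALLOWED_MODULES for a in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_MODULES:
                return False
    return True
```

Static checks are bypassable (e.g. via `__import__` strings), which is exactly why the tool-layer sandbox remains the primary control.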

Detection. Sandbox-break signals, egress attempts, syscall profiles outside a learned baseline.

A7 — Resource exhaustion

What it is. An agent is forced into a loop, recursive planning explosion, or runaway tool-call chain — driving costs, evicting other workloads, or causing denial of service.

Mitigations.

  • Orchestrator. Step budget, token budget, wall-clock budget per session; hard kill when exceeded.
  • Tool layer. Per-tool and per-session rate limits.
  • Prompt. Require the planner to produce a finite step plan; reject plans with no termination condition.
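The orchestrator budgets can be enforced with a small accounting object that hard-kills the session when any limit is breached. The limit values and class name are illustrative:

```python
import time

class SessionBudget:
    """Track step, token, and wall-clock budgets; raise on any overrun."""

    def __init__(self, max_steps=50, max_tokens=100_000, max_seconds=300.0):
        self.max_steps, self.max_tokens, self.max_seconds = max_steps, max_tokens, max_seconds
        self.steps = self.tokens = 0
        self.start = time.monotonic()

    def charge(self, tokens: int) -> None:
        """Call once per agent step, before executing it."""
        self.steps += 1
        self.tokens += tokens
        if (self.steps > self.max_steps
                or self.tokens > self.max_tokens
                or time.monotonic() - self.start > self.max_seconds):
            raise RuntimeError("budget exceeded: hard kill")  # emit overrun alert
```

Raising rather than returning a flag makes the kill non-optional: no agent loop can silently swallow an overrun.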

Detection. Budget-overrun alerts; percentile-tail monitoring on session duration and cost.

A8 — Data and output provenance failure

What it is. The agent’s outputs and intermediate artifacts cannot be traced back to their inputs, tools, and decisions. When something goes wrong, there is no forensic trail.

Mitigations.

  • Orchestrator. Emit a structured trace (plan, thought, tool-call, tool-result, memory-read, memory-write) for every step. Sign the trace.
  • Memory layer. Append-only, immutable log.
  • Compliance overlay. Map traces to ISO 42001 A.6.2.8 (records) and NIST AI RMF MEASURE 3.
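Trace signing can be sketched as an HMAC chain: each step's signature covers both the step and the previous signature, so any gap or edit breaks verification. Key management is simplified here, and the helper names are assumptions:

```python
import hashlib
import hmac
import json

def sign_step(key: bytes, prev_sig: str, step: dict) -> str:
    """Sign one trace step, chaining it to the previous signature."""
    payload = prev_sig + json.dumps(step, sort_keys=True)  # canonical encoding
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()

def verify_chain(key: bytes, steps: list, sigs: list) -> bool:
    """Recompute the chain; any tampered or missing step fails verification."""
    prev = ""
    for step, sig in zip(steps, sigs):
        if not hmac.compare_digest(sig, sign_step(key, prev, step)):
            return False
        prev = sig
    return True
```

The chaining is what makes the "alert on gaps" detection possible: removing a step invalidates every signature after it.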

Detection. Integrity checks on traces; alert on gaps.

A9 — Prompt and context injection through tools

What it is. A tool returns content (web page, email body, PDF, spreadsheet cell) containing hidden instructions that the agent treats as commands.

Mitigations.

  • Tool layer. Mark all tool outputs as untrusted; strip or escape instruction-like patterns before insertion.
  • Orchestrator. Separate the tool-output channel from the instruction channel; never concatenate.
  • Prompt. Explicit system instruction that tool content is data, not commands.
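Keeping tool output in a separate data channel might look like the sketch below: the output is wrapped as an opaque payload rather than concatenated into the instruction stream, with instruction-like patterns flagged for review. The regexes are illustrative heuristics only:

```python
import re

# Heuristic patterns that look like injected instructions (illustrative).
SUSPICIOUS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def wrap_tool_output(text: str) -> dict:
    """Wrap tool output as data, never as instructions, and flag suspects."""
    flags = [p.pattern for p in SUSPICIOUS if p.search(text)]
    return {"channel": "data", "content": text, "suspicious": flags}
```

The structural separation (the `channel` field) carries most of the weight; the pattern flags only feed detection, since any regex list is trivially evaded.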

Detection. Canary instructions in known-safe tool responses; alert if the agent follows them.

A10 — Supply-chain compromise

What it is. The agent’s models, embeddings, tools, or MCP servers are sourced from compromised or untrusted vendors — introducing backdoors, data-exfiltration paths, or trojaned behaviors.

Mitigations.

  • Tool layer. Signed tool manifests; verify publisher signatures.
  • MCP integration. Connect only to MCP servers on an allowlist with TLS pinning and attested builds.
  • Orchestrator. Vendor risk assessments per tool and per model, refreshed quarterly.
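Manifest verification can be approximated with digest pinning: record an approved hash per tool and refuse to load anything that differs. A real deployment would verify publisher signatures rather than a plain hash; the tool name and manifest bytes below are made up:

```python
import hashlib

# Digests of approved tool manifests (illustrative allowlist).
PINNED = {
    "mail-tool": hashlib.sha256(b'{"name":"mail-tool","v":"1.2"}').hexdigest(),
}

def load_tool(name: str, manifest_bytes: bytes) -> bool:
    """Load a tool only if it is allowlisted and its manifest digest matches."""
    expected = PINNED.get(name)
    if expected is None:
        return False  # tool not on the allowlist at all
    return hashlib.sha256(manifest_bytes).hexdigest() == expected
```

The same pattern extends to MCP servers: pin the attested build digest alongside the TLS pin, and fail closed on any mismatch.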

Detection. SBOM monitoring for the agent stack; behavioral regression tests per model update.

Layered mitigation matrix {#layers}

| Layer | Primary controls |
| --- | --- |
| Prompt | System-prompt attestation · goal pinning · refusal clauses · tool-content quarantine |
| Tool | Scope minimization · parameter schemas · rate limits · sandboxing · signed manifests · provenance metadata |
| Memory | Provenance tags · integrity hashes · trust-weighted retrieval · append-only audit log · partitioning |
| Orchestrator | Capability manifests · step budgets · plan-divergence detection · isolation between agents · structured tracing |
| HITL | Risk-tiered approval gates · two-person approvals for irreversible actions · escalation tree |

Every OWASP agentic risk requires controls across at least two layers. Single-layer defenses fail.

Mapping to COMPEL stages {#compel-mapping}

| COMPEL stage | Agentic control focus |
| --- | --- |
| Calibrate | Inventory existing agents · classify by autonomy tier · identify top-risk agents |
| Organize | Assign agent owners · define approval thresholds · stand up an agent-incident response team |
| Model | Design capability manifests · author system prompts and attestation · scope tools per role |
| Produce | Deploy with sandboxed tools · structured tracing on day one · canary controls in staging |
| Evaluate | Run the OWASP agentic test suite · penetration test prompt injection and goal hijacking |
| Learn | Post-incident reviews · playbook updates · SBOM refresh cycle |

Evidence artifacts {#evidence}

  • Agent registry with autonomy tier per agent
  • Capability manifest per agent role
  • Tool-scope policy and allowlists
  • Agent system prompt, signed and versioned
  • Structured trace schema and sample traces
  • Kill-switch runbook and drill records
  • HITL approval log
  • Memory-poisoning test results
  • Red-team report per release
  • Third-party (tool/model/MCP) risk assessments and SBOM

Metrics {#metrics}

  • Mean time to kill — seconds from kill-switch trigger to agent halt. Target <60s.
  • Denied-call rate — percentage of agent tool calls denied by capability manifest. Expect 2–8%; spikes warrant investigation.
  • HITL approval latency — median time from approval request to decision.
  • Red-team prompt-injection success rate — percentage of injected prompts that change behavior. Target <1%.
  • Budget-overrun rate — percentage of sessions that exhaust step/token/wall-clock budgets.
  • SBOM freshness — days since last supply-chain verification. Target <30.

Risks if skipped {#risks}

Organizations that deploy agents without this playbook face:

  • Financial incidents from tool misuse and excessive agency
  • Data exfiltration through memory poisoning and prompt injection
  • Regulatory breach because traces are insufficient for audit
  • Board-level reputational damage when an incident becomes public
  • Loss of the right to deploy agents, as regulators increasingly require documented controls (EU AI Act Annex III, ISO 42001 Annex A.6)

How to cite

COMPEL FlowRidge Team. (2026). “OWASP Top 10 for Agentic AI: Mitigation Playbook.” COMPEL Framework by FlowRidge. https://www.compelframework.org/articles/seo-c1-owasp-top-10-agentic-ai-mitigation-playbook/

Frequently Asked Questions

How is the OWASP Top 10 for Agentic AI different from the LLM Top 10?
The LLM Top 10 targets chat and completion surfaces. The Agentic Top 10 targets systems that take actions — invoking tools, writing to memory, spawning sub-agents, and executing multi-step plans. The failure modes are kinetic, not just informational.
What is the single highest-impact mitigation for agentic AI?
Constrain tool scope and enforce least-privilege at the tool boundary. Most catastrophic agent incidents trace to an agent holding a tool credential more powerful than the task required.
Do we need a human-in-the-loop for every agentic action?
No. Set HITL thresholds based on blast radius — reversible, low-cost actions run autonomously, while irreversible actions (financial transfers, external communications, data deletion) require a human approval gate.
How do we detect memory poisoning before it causes damage?
Use content provenance on every memory write (who, when, source) and validate retrievals against integrity hashes. Flag any memory entry whose embedding drifts more than two standard deviations from its historical cluster.
Which OWASP controls map to ISO 42001 and NIST AI RMF?
Every agentic control maps to NIST AI RMF MANAGE 2 (mitigation) and MEASURE 2 (trustworthy characteristics), and to ISO 42001 A.6.2.6 (system operation), A.6.2.8 (incidents), and A.9.2 (monitoring). The OWASP catalog provides the implementation detail those standards omit.
What belongs in an agentic incident runbook?
A kill-switch procedure, a memory-quarantine procedure, a tool-revocation procedure, an escalation tree, and a forensic-capture checklist for agent traces, tool-call logs, and memory snapshots.