AITE M1.2-Art06 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Tool-Call Authorization and Validation



COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Article 6 of 40


Thesis. Between the model’s decision to call a tool and the world changing, two control layers sit. The first — authorization — asks “may this call occur.” The second — validation — asks “did this call occur safely.” In the publicly documented agentic failures involving unauthorized actions, one or both of these layers was absent or weak. The architect’s job is to make both layers first-class platform services and to make them so cheap to apply that no tool is authored without them. This article specifies both stacks at implementation depth, then maps a sample ten-tool surface through the stack to show the coverage pattern.

The sandwich model

Every tool invocation, in production-grade agentic systems, is a sandwich: authorization on top, the tool handler in the middle, validation on the bottom. The sandwich is not optional for Class 4+ tools (Article 5); for Class 1–3 tools, the sandwich is simpler but still present.

The two stacks run in the runtime — not in the tool handler, not in the agent prompt. Implementing them in the runtime is what makes them consistent across every tool and debuggable through a single surface.
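A minimal sketch of the runtime-level sandwich makes the placement concrete. All names here (`ToolRuntime`, `AuthzDenied`, the hook signatures) are illustrative, not any framework’s real API; the point is that the hooks wrap every tool uniformly, outside the handler.

```python
class AuthzDenied(Exception):
    """Raised by the runtime when the authorization pre-hook denies a call."""


class ToolRuntime:
    """Hypothetical runtime: authorization before the handler, validation after."""

    def __init__(self, authorize, validate):
        self.authorize = authorize   # pre-hook: returns {"allow": bool, "reason": str}
        self.validate = validate     # post-hook: raises ValueError on failure

    def invoke(self, tool, args, ctx):
        decision = self.authorize(tool, args, ctx)
        if not decision["allow"]:
            raise AuthzDenied(decision["reason"])
        result = tool["handler"](**args)          # the tool itself, in the middle
        self.validate(tool, args, result, ctx)    # validation on the bottom
        return result


# Toy tool and hooks to exercise the sandwich.
refund_tool = {
    "name": "issue_refund",
    "handler": lambda order_id, amount: {"order_id": order_id, "refunded": amount},
}

def authorize(tool, args, ctx):
    allowed = "billing:refund:write" in ctx["scopes"]
    return {"allow": allowed, "reason": "" if allowed else "missing scope"}

def validate(tool, args, result, ctx):
    if result["refunded"] != args["amount"]:
        raise ValueError("side-effect mismatch")

runtime = ToolRuntime(authorize, validate)
ok = runtime.invoke(refund_tool, {"order_id": "12345", "amount": 50},
                    {"scopes": {"billing:refund:write"}})
```

Because both hooks live in `invoke`, a tool author cannot ship a handler that bypasses them — which is exactly the property the article attributes to runtime placement.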

The authorization stack — six layers

Layer 1 — Identity resolution

The agent is not a principal; it acts on behalf of one. The runtime resolves the acting identity at call-construction time from the session context: the calling user, the tenant, the agent identity (which agent configuration is running), and the invocation chain (which upstream agent delegated this, if multi-agent). The identity resolution populates the authorization request with verified attributes. No value in the authorization request comes from the model’s prompt — prompts can be hijacked; resolved identity attributes cannot.

Layer 2 — Scope check

Does the resolved identity have the OAuth-style scope required for this tool? billing:refund:write scope is different from billing:refund:read; every tool declares a required scope in its registry entry (Article 5), and the runtime validates that the resolved identity has it. In OpenAI Agents SDK, scope checks are typically implemented as input_guardrails; in LangGraph, as pre-handler middleware; in CrewAI, as a role-permission matrix. The architect centralizes the scope evaluation rather than duplicating it per tool.

Layer 3 — Tenant boundary check

In multi-tenant agentic platforms, every tool argument referencing an external resource (order_id, customer_id, account_id) is validated against the acting tenant. The check is “is this resource owned by the tenant the agent is acting for.” This check is the single most common point at which tenant leakage is either stopped or shipped; architects who skip it ship production incidents. Cedar expresses this as a tenant-boundary policy; OPA expresses it as a Rego rule; custom implementations use a tenant-scoped data-access layer. The mechanism is less important than the fact that every tool passes through it.
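A custom-implementation sketch of the boundary check, assuming a toy ownership index in place of a real tenant-scoped data-access layer (`OWNERSHIP` and `resource_keys` are illustrative):

```python
# Toy ownership index: resource id -> owning tenant. A real system would
# resolve ownership through a tenant-scoped data-access layer.
OWNERSHIP = {
    "order-1": "tenant-a",
    "order-2": "tenant-b",
}


def check_tenant_boundary(args, acting_tenant, resource_keys=("order_id",)):
    """Every resource-referencing argument must belong to the acting tenant."""
    for key in resource_keys:
        if key in args and OWNERSHIP.get(args[key]) != acting_tenant:
            return False
    return True
```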

Layer 4 — Rate limit and quota

Sensitive-action rate limits (per user, per tenant, per agent session, per tool-class) run at the authorization layer. “No more than 25 outbound emails per agent session per hour” is enforced by the rate limiter before the send_email handler executes. Rate limits embedded in the tool handler are inconsistent and auditable only per tool; centralized rate limits are consistent and auditable centrally.
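A centralized limiter keyed by (session, tool) can enforce rules like the 25-emails-per-hour example before any handler runs. This is a toy sliding-window sketch, not a production limiter (which would typically live in Redis or similar shared state):

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Illustrative sliding-window limiter keyed by (session_id, tool_name)."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window = window_s
        self.calls = defaultdict(deque)  # (session, tool) -> call timestamps

    def allow(self, session_id, tool_name, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[(session_id, tool_name)]
        while q and now - q[0] > self.window:   # drop calls outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False                        # deny before the handler executes
        q.append(now)
        return True
```

Because the limiter is one object at the authorization layer, the same counters are queried for audit — the per-tool inconsistency the paragraph warns about never arises.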

Layer 5 — Data classification compatibility

Every tool’s input schema declares the data classes it accepts (public, internal, confidential, PII, PHI, financial). The resolved identity carries a clearance. The authorization layer asserts the tool’s inputs are at or below the clearance. For Class 4+ tools, it also asserts outputs flowing back into the agent context won’t elevate the context’s effective classification in a way that downstream calls can’t handle. Microsoft’s Purview sensitivity labels are one industry reference for this pattern; Google’s sensitivity labels for Vertex AI are another.
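The “at or below the clearance” assertion is a lattice comparison. The ranks below are one plausible ordering, not a standard; a real deployment would take them from its own classification policy:

```python
# Illustrative classification ranks; higher means more sensitive.
RANK = {"public": 0, "internal": 1, "confidential": 2,
        "pii": 3, "phi": 4, "financial": 4}


def classification_ok(input_classes, clearance):
    """Allow the call only if every input class is at or below the clearance."""
    return all(RANK[c] <= RANK[clearance] for c in input_classes)
```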

Layer 6 — Sensitive-action approval

The policy engine (Article 22) may return require_approval for specific inputs — a refund over $500, an email to an external address the customer has never contacted, a code execution request with a privileged syscall. The runtime routes these to the HITL queue (Article 10) rather than executing immediately. The approver’s verdict (with identity, timestamp, reason) is recorded in the audit log and consumed as the runtime’s authorization decision.
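A sketch of the routing logic, with a hypothetical inline policy standing in for the Article 22 policy engine and a list standing in for the HITL queue:

```python
def policy_decision(tool_name, args):
    """Toy policy: refunds over $500 require human approval (threshold is illustrative)."""
    if tool_name == "issue_refund" and args.get("amount", 0) > 500:
        return "require_approval"
    return "allow"


hitl_queue = []  # stand-in for the Article 10 HITL queue


def route(tool_name, args):
    decision = policy_decision(tool_name, args)
    if decision == "require_approval":
        # Queue for a human; the approver's verdict later becomes the
        # runtime's authorization decision and is written to the audit log.
        hitl_queue.append({"tool": tool_name, "args": args})
        return "queued"
    return "execute" if decision == "allow" else "denied"
```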

The validation stack — four layers

Layer 1 — Structured output validation

Every tool’s output is validated against its declared output schema before being passed back to the agent loop. The validator catches malformed output, missing required fields, wrong types, and values outside declared ranges. When validation fails, the runtime decides between rolling back the side effect (if possible), reverting to a safe output, or escalating to a human. OpenAI’s structured outputs feature and Anthropic’s tool-use schema validation make this near-automatic for Class 1–3 tools; Class 4+ tools need architect-owned validators because the output often encodes the side effect itself.
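A minimal stand-in for the output validator — a real system would use JSON Schema or the SDK-native structured-output checks the paragraph names, but the shape of the decision is the same: collect violations, then let the runtime choose rollback, safe output, or escalation. `SCHEMA` and `validate_output` are illustrative names:

```python
# Toy declared output schema: field name -> required Python type(s).
SCHEMA = {"refund_id": str, "amount": (int, float), "status": str}


def validate_output(output, schema=SCHEMA):
    """Return a list of violations; empty list means the output passed."""
    errors = []
    for field, typ in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], typ):
            errors.append(f"wrong type: {field}")
    return errors
```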

Layer 2 — Side-effect diff

For any side-effecting tool (Class 4+), the validator compares system state before and after the call to verify the declared effect occurred (and only the declared effect). “The refund tool was called for $500 on order 12345; the ledger now shows a $500 refund credit on order 12345 and no other changes to any other account.” The diff logic lives in the tool handler’s post-hook and returns a validation record the audit log captures. Side-effect diffs catch silent failures (the tool’s API returned 200 but never applied the change), over-reach bugs (the tool modified more than intended), and tampering (the ledger changed in a way the tool did not return).
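The refund example can be sketched as a diff over ledger snapshots. The ledger-as-dict model and the helper names are assumptions for illustration; the property checked — exactly the declared change, nowhere else — is the one the paragraph describes:

```python
def ledger_diff(before, after):
    """All accounts whose balance changed, as {account: (old, new)}."""
    keys = set(before) | set(after)
    return {k: (before.get(k, 0), after.get(k, 0))
            for k in keys if before.get(k, 0) != after.get(k, 0)}


def verify_refund(before, after, order_id, amount):
    """True only if the declared refund is the sole change in the ledger."""
    expected = {order_id: (before.get(order_id, 0),
                           before.get(order_id, 0) - amount)}
    return ledger_diff(before, after) == expected
```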

Layer 3 — Tenant-consistency post-check

For multi-tenant systems, the post-call validator re-checks that the changed state is tenant-consistent. “The change recorded is against the acting tenant’s data; no other tenant’s data was touched.” This is belt-and-suspenders against tenant-boundary bugs in the tool handler: even if Layer 3 of the authorization stack had a gap, Layer 3 of the validation stack catches the resulting anomaly before the next tool call.

Layer 4 — Sanitizer for outputs flowing back into context

Output from Class 3 tools (external information) is sanitized before entering the agent context. Indirect prompt injection (Article 14) arrives via tool output; a sanitizer that strips suspected instruction patterns, wraps suspicious content in safety tags, or quarantines it for human review is the architectural control. This is not a content filter; it is a structural boundary that limits what fraction of tool output is treated as “trusted instruction” versus “data to summarize.” OWASP’s agentic top-10 draft identifies this boundary as a core defense.
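A structural sketch of that boundary: all external tool output is wrapped as untrusted data, and content matching instruction-like patterns is additionally flagged for quarantine. The patterns below are illustrative examples, not a complete injection signature set (no pattern list is):

```python
import re

# Illustrative instruction-pattern heuristics; a real sanitizer would use a
# maintained detection layer, not three regexes.
SUSPECT = re.compile(
    r"(ignore (all )?previous instructions|you are now|system prompt)",
    re.IGNORECASE,
)


def sanitize(tool_output):
    """Wrap external output as data, never instruction; flag suspect content."""
    if SUSPECT.search(tool_output):
        return ("<untrusted_data reason='suspected-injection'>"
                + tool_output + "</untrusted_data>")
    return "<untrusted_data>" + tool_output + "</untrusted_data>"
```

Note the structural point: even clean output stays inside the untrusted wrapper, so the agent loop never treats any fraction of it as trusted instruction.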

Audit log requirements

Every authorization decision and every validation decision writes to an append-only audit log. The required fields: timestamp, agent_session_id, acting_user, tenant_id, tool_name, tool_version, input_hash, authz_decision, authz_reasons, output_hash, validation_decision, validation_reasons, policy_version, trace_id. The log is queryable and retention-policy-governed. For EU AI Act Article 12 logging obligations (touched in Article 23) and financial-services audit requirements, the log is the artifact that proves the stack was present and operational.
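The required fields translate directly into a record builder. This sketch hashes inputs and outputs rather than storing them raw — one common way (an assumption here, not a mandate of the text) to keep PII out of the log while preserving tamper evidence:

```python
import hashlib
import json
import time


def _digest(obj):
    """Stable SHA-256 over a canonical JSON encoding."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def audit_record(session_id, user, tenant, tool, version, tool_input,
                 authz, output, validation, policy_version, trace_id):
    """One append-only audit entry with the fields the runtime must capture."""
    return {
        "timestamp": time.time(),
        "agent_session_id": session_id,
        "acting_user": user,
        "tenant_id": tenant,
        "tool_name": tool,
        "tool_version": version,
        "input_hash": _digest(tool_input),
        "authz_decision": authz[0],
        "authz_reasons": authz[1],
        "output_hash": _digest(output),
        "validation_decision": validation[0],
        "validation_reasons": validation[1],
        "policy_version": policy_version,
        "trace_id": trace_id,
    }
```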

Mapping a ten-tool surface through the stack

A representative SaaS customer-operations agent has: search_knowledge_base (Class 1), get_order_history (Class 1), calculate_refund_amount (Class 2), web_search_policy (Class 3), create_support_ticket (Class 4), update_customer_record (Class 4), issue_refund (Class 5), send_email_confirmation (Class 5), escalate_to_manager (Class 5), run_diagnostic_sql (Class 6).

Authorization coverage: Class 1 tools pass through identity + tenant check only (no scope gate needed, no rate limit, no sensitive-action check). Class 2 tools (pure compute) pass through identity only. Class 3 tools add a rate limit (prevent runaway web searches). Class 4 tools add scope check and policy engine hook. Class 5 tools add all six layers and default to HITL approval above configurable thresholds. Class 6 tools require sandbox attestation (Article 21) in addition to all six layers.

Validation coverage: Class 1–2 tools use structured output validation only. Class 3 adds the sanitizer layer. Class 4 adds side-effect diff. Class 5 adds side-effect diff plus tenant-consistency. Class 6 adds the full validation stack plus sandbox result attestation.
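The two coverage paragraphs can be encoded as data the runtime enforces and the auditor reads. The layer labels and per-class sets below are my interpretive encoding of the prose above (e.g. “policy” for the Class 4 policy-engine hook), not an official matrix:

```python
# Required authorization layers per tool class (interpretive encoding).
AUTHZ = {
    1: {"identity", "tenant"},
    2: {"identity"},
    3: {"identity", "tenant", "rate_limit"},
    4: {"identity", "tenant", "rate_limit", "scope", "policy"},
    5: {"identity", "scope", "tenant", "rate_limit", "classification", "approval"},
    6: {"identity", "scope", "tenant", "rate_limit", "classification",
        "approval", "sandbox_attestation"},
}

# Required validation layers per tool class.
VALIDATION = {
    1: {"schema"},
    2: {"schema"},
    3: {"schema", "sanitizer"},
    4: {"schema", "side_effect_diff"},
    5: {"schema", "side_effect_diff", "tenant_consistency"},
    6: {"schema", "side_effect_diff", "tenant_consistency",
        "sanitizer", "sandbox_attestation"},
}


def coverage_gaps(tool_class, implemented_authz, implemented_validation):
    """Return the missing layers for a tool, for the auditor's checklist."""
    return (AUTHZ[tool_class] - set(implemented_authz),
            VALIDATION[tool_class] - set(implemented_validation))
```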

The architect’s spec tables the ten tools by class and walks each through the six authorization + four validation layers as a checklist — “this tool has this control, implemented here.” The table is the auditor’s artifact; the runtime is the enforcement.

Framework parity — where these layers live

  • LangGraph — authorization as pre-node middleware (interrupt before tool node), validation as post-node middleware; tool registry integration via custom ToolNode subclass.
  • CrewAI — role-based scope via Agent config; custom callback hooks for pre/post validation; rate limits via callback side-effects into Redis.
  • AutoGen — GroupChatManager custom select_speaker hooks for authorization gates; post-reply hooks for validation.
  • OpenAI Agents SDK — native input_guardrails and output_guardrails are the canonical home for authz/validation; tracing captures decisions.
  • Semantic Kernel — function filters (IFunctionFilter) run pre/post for authorization and validation.
  • LlamaIndex Agents — custom CallbackManager handlers at tool boundary for authz/validation.

The architect picks the framework-native hook and wires it to the same central authorization service and validation service at the platform layer. The framework surface changes; the stack does not.

Real-world anchor — Chevrolet of Watsonville ($1 Tahoe, December 2023)

The Chevrolet dealer chatbot accepted a commercial-commitment tool call (draft-response) without an authorization gate on commercial terms and without a validation gate on the semantics of what was being committed. A six-layer authorization stack with a sensitive-action rule (“drafts containing monetary commitments require HITL approval”) and a validation stack with a semantic post-check (“does this response create a binding commercial offer”) would have caught the $1 Tahoe exchange before publication. Public reporting (Business Insider, December 2023) and ensuing industry commentary identify the architectural gap in exactly these terms.

Real-world anchor — Moffatt v. Air Canada (2024 BCCRT 149)

The British Columbia Civil Resolution Tribunal’s February 2024 judgment held Air Canada liable for a bereavement-fare policy its chatbot invented. The “tool” in architectural terms was the response-draft with policy citations; there was no validation layer asserting that cited policies existed in the canonical source or that the draft’s promises were consistent with authoritative policy. A validation stack with a policy-claim verifier — cross-checking any cited rule against the policy registry — would have caught the fabrication before response. Source: Moffatt v. Air Canada, 2024 BCCRT 149 (public judgment).

Closing

Authorization before, validation after, both centralized, both audited, all the time. The sandwich is the most important single control in an agentic system and the easiest to specify poorly. Article 7 now takes up the memory architecture that the tools read from and write into.

Learning outcomes check

  • Explain the six-layer authorization stack and the four-layer validation stack with their sequencing.
  • Classify eight tool calls against both stacks and identify which layers apply to each.
  • Evaluate an agentic design for coverage gaps — find the missing layer(s) for each tool class.
  • Design an authorization policy for a given tool surface with explicit per-class coverage.

Cross-reference map

  • Core Stream: EATE-Level-2/M2.3-Art9-Authorization-Controls-for-Agent-Actions.md.
  • Sibling credential: AITM-AAG Article 5 (governance-facing tool oversight).
  • Forward reference: Articles 10 (HITL), 22 (policy engines), 26 (audit log), 27 (security architecture).