COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Article 5 of 40
Thesis. The moment an agent gains a tool, it stops being a chatbot and starts being a system that changes the world. Every tool added to an agent’s surface is a new edge of the organization’s blast radius. The architect who treats tools as JSON-schema afterthoughts — a name, a description, a parameter list copied from an OpenAPI spec — builds the next Chevrolet $1 Tahoe incident. The architect who treats tools as first-class governed artifacts, each with a schema, a registry entry, a risk class, a test battery, and a retirement plan, builds an agentic platform that survives a security review. This article teaches tool-surface design at that second depth.
What a tool actually is
LangGraph ToolNode, CrewAI @tool, AutoGen register_function, OpenAI Agents SDK @function_tool, Semantic Kernel KernelFunction, LlamaIndex FunctionTool: the shape is the same in every framework. A tool has five parts: a name the model uses; a natural-language description that shapes when the model chooses it; an input schema (usually JSON Schema) the model must satisfy; an implementation the runtime executes; and an output the runtime returns to the model. That minimal five-part shape is what the Model Context Protocol (MCP, Anthropic 2024) standardizes across frameworks, and it is the unit the architect designs, versions, and retires.
The tool’s description is load-bearing in a way that most engineers initially underestimate. The model picks which tool to call largely from the description; ambiguous descriptions produce tool-selection errors that look like model hallucinations but are actually specification bugs. Writing tool descriptions is a technical-writing discipline: one sentence of purpose, one sentence of when-to-use (and when-not-to-use), bounded scope, concrete parameter semantics, and a note on side-effect permanence.
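The five-part shape and the description discipline can be sketched in a few lines. This is a minimal illustration, not any framework's API: the Tool class, the _lookup_refund stub, and the schema contents are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str                     # identifier the model uses
    description: str              # shapes when the model picks the tool
    input_schema: dict[str, Any]  # JSON Schema the arguments must satisfy
    handler: Callable[..., dict]  # implementation the runtime executes

    def call(self, **kwargs: Any) -> dict:
        # The return value is the fifth part: the output handed back to the model.
        return self.handler(**kwargs)


def _lookup_refund(order_id: str) -> dict:
    # Hypothetical stub; a real handler would query the order system.
    return {"order_id": order_id, "status": "pending"}


get_refund_status = Tool(
    name="get_refund_status",
    description=(
        "Look up the current status of a refund for one order. "
        "Use when the customer asks where a refund stands; "
        "do not use it to issue or change a refund. "
        "Read-only: no side effects."
    ),
    input_schema={
        "type": "object",
        "properties": {"order_id": {"type": "string", "minLength": 1}},
        "required": ["order_id"],
        "additionalProperties": False,
    },
    handler=_lookup_refund,
)
```

The description follows the pattern from the paragraph above: purpose, when to use and when not to, and a note on side effects.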
The six tool categories and their risk classes
Not every tool is equally dangerous. The architect classifies every tool on a risk axis before authorizing it into an agent’s surface. Six categories recur across production agentic systems.
Class 1 — Read-only retrieval. Fetch from a knowledge base, search a doc corpus, run a read-only SQL. The blast radius is information disclosure (tenant leakage, PII exposure) rather than side-effect damage.
Class 2 — Computational utility. Calculator, date math, timezone conversion, unit conversion. Blast radius is bounded by the computation itself; the primary risk is that the model trusts a buggy computation as ground truth.
Class 3 — External information. Web search, maps, stock price, weather. Blast radius expands because the tool pulls untrusted data into the agent’s context — indirect prompt injection (Article 14) travels on this surface.
Class 4 — Internal write. Create a ticket, update a CRM, write to a datastore. Blast radius is the integrity of internal systems; bugs persist.
Class 5 — External action. Send email, post to Slack, trigger a webhook, run a payment. Blast radius is organizational — reputational, financial, legal. Every famous agentic incident involved a Class 5 call.
Class 6 — Privileged/code-execution. Run arbitrary code, shell commands, SQL with write access, deploy infrastructure. Blast radius is the system itself; containment is the sandbox (Article 21).
Every architect-authored tool registry entry carries this risk class. Every policy rule (Article 22) uses it. Every evaluation harness (Article 17) includes tests proportioned to it. Class assignment is not advisory; it is the scaffold for the rest of the controls.
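One way to make the class assignment machine-readable is an enum plus a lookup of required controls. The sketch below is illustrative: the control names are assumptions that loosely mirror the safe defaults discussed in this article, not a normative mapping.

```python
from enum import IntEnum


class RiskClass(IntEnum):
    READ_ONLY = 1
    COMPUTE = 2
    EXTERNAL_INFO = 3
    INTERNAL_WRITE = 4
    EXTERNAL_ACTION = 5
    PRIVILEGED = 6


def required_controls(risk: RiskClass) -> set[str]:
    # Baseline controls applied to every tool regardless of class.
    controls = {"tenant_scoping", "output_size_cap"}
    if risk == RiskClass.EXTERNAL_INFO:
        controls.add("treat_output_as_untrusted")
    if risk >= RiskClass.INTERNAL_WRITE:
        controls.add("dry_run")
    if risk in (RiskClass.INTERNAL_WRITE, RiskClass.EXTERNAL_ACTION):
        controls.add("idempotency_key")
    if risk >= RiskClass.EXTERNAL_ACTION:
        controls.add("dangerous_flag")
    if risk == RiskClass.PRIVILEGED:
        controls.add("sandbox")
    return controls
```

Because the enum is ordered, policy rules can express thresholds such as "Class 4 and above" as a simple comparison.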
Schema design discipline
A well-designed tool schema is the single most effective defense against an entire category of agentic bugs. The architect applies four schema disciplines.
Parameter constraints at the schema boundary. JSON Schema supports minimum, maximum, minLength, maxLength, pattern, enum, format. Every one of those constraints that the architect omits becomes a constraint the runtime must enforce elsewhere, or one it will silently fail to enforce. A refund tool that declares amount: {type: "number"} without minimum: 0 and maximum: 10000 will sooner or later be asked for a negative refund or a million-dollar one. The architect declares the bounds in the schema, and a runtime that validates arguments against that schema rejects out-of-bounds values before the tool handler ever sees them.
Enums not strings when the space is closed. If the valid values for status are pending, active, suspended, the schema uses enum: ["pending","active","suspended"], not type: "string". Enumerated domains eliminate typo-driven errors and make the model’s tool-argument selection more robust. Where a provider or framework supports constrained decoding, enums can be enforced at the decoding layer; where it does not, the runtime validates them on receipt.
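The bounds and enum disciplines can be illustrated with a small hand-rolled validator. This is a sketch that covers only a subset of JSON Schema (required, number type, minimum, maximum, enum) and treats unknown parameters as errors; a production system would more likely use a full JSON Schema validator. The schema and the validate_args helper are hypothetical.

```python
from typing import Any

# Hypothetical schema combining the two examples from the text.
TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "amount": {"type": "number", "minimum": 0, "maximum": 10000},
        "status": {"enum": ["pending", "active", "suspended"]},
    },
    "required": ["amount", "status"],
}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call may proceed."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"{name}: missing")
    for name, value in args.items():
        rules = schema["properties"].get(name)
        if rules is None:
            # Equivalent in spirit to additionalProperties: false.
            errors.append(f"{name}: unexpected parameter")
            continue
        if rules.get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: not a number")
                continue
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: above maximum {rules['maximum']}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: not one of {rules['enum']}")
    return errors
```

The runtime would call validate_args before dispatching to the handler and return the violations to the model as a structured error, so the handler never sees a negative or oversized amount.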
Structured outputs the model can reason about. A tool that returns free-text ("Refund of $500 issued to customer 12345") is harder for the model to reason about than one that returns structured JSON ({status: "success", refund_id: "ref_abc", amount: 500.00, customer_id: 12345, note: "Refund of $500 issued to customer 12345"}). Structured outputs with both machine fields and a human-readable note field are the production shape. Every major framework supports structured tool outputs; the architect makes them mandatory.
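A structured result with machine fields plus a human-readable note might look like the following sketch. The RefundResult type and build_refund_result helper are assumptions for illustration; the field names follow the example in the paragraph above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RefundResult:
    status: str
    refund_id: str
    amount: float
    customer_id: int
    note: str  # human-readable summary alongside the machine fields


def build_refund_result(refund_id: str, amount: float, customer_id: int) -> str:
    result = RefundResult(
        status="success",
        refund_id=refund_id,
        amount=amount,
        customer_id=customer_id,
        note=f"Refund of ${amount:.2f} issued to customer {customer_id}",
    )
    # Serialize to JSON so the model receives named fields, not prose.
    return json.dumps(asdict(result))
```

Downstream steps can then branch on status or reuse refund_id without parsing free text.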
No secrets in parameters; server-side lookup instead. The tool schema must not include API keys, credentials, or tenant identifiers as parameters the model populates. Those values come from the runtime context, not from the model. The schema for send_email(to, subject, body) is correct; the schema for send_email(api_key, tenant_id, to, subject, body) is wrong, because the model should never be reasoning about credentials.
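A sketch of server-side lookup: the model supplies only to, subject, and body, while credentials and tenant identity arrive through a runtime context object. RuntimeContext, dispatch_send_email, and the stubbed send_email are hypothetical names, and no real mail API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeContext:
    tenant_id: str  # injected by the runtime, never by the model
    api_key: str    # injected by the runtime, never by the model


MODEL_VISIBLE_PARAMS = {"to", "subject", "body"}


def send_email(ctx: RuntimeContext, to: str, subject: str, body: str) -> dict:
    # Hypothetical stub: a real implementation would use ctx.api_key to
    # authenticate with the mail provider. The key is never echoed back.
    return {"status": "queued", "tenant_id": ctx.tenant_id, "to": to}


def dispatch_send_email(ctx: RuntimeContext, model_args: dict) -> dict:
    # Reject anything outside the schema, including smuggled credentials.
    extra = set(model_args) - MODEL_VISIBLE_PARAMS
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    return send_email(ctx, **model_args)
```

The returned structure deliberately omits the API key, so the secret never enters the model's context on the way back either.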
The tool registry as governance artifact
The tool registry is the single source of truth for every tool in the platform. The architect’s registry spec — independent of storage (Postgres, Git, JSON-in-S3) — defines the minimum field set: tool_id, tool_name, version, description, input_schema, output_schema, risk_class, owner_team, implementation_reference, authorization_policy_ref, evaluation_suite_ref, audit_fields, deprecation_status, retirement_date.
Registry entries are versioned and immutable once approved. A tool does not get modified in place; a new version supersedes the old. The registry is the object the policy engine consults on every call, the object the evaluation harness iterates over, the object the audit pack (Article 26) summarizes for regulators. A system without a registry has no answer to the auditor’s question “how many tools does your agent have and what can each do?”
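The minimum field set and the versioned, immutable-once-approved rule can be sketched as a frozen dataclass plus a small in-memory registry. The storage layer, the field types, and the ToolRegistry class are assumptions; only the field names come from the spec above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RegistryEntry:
    tool_id: str
    tool_name: str
    version: int
    description: str
    input_schema: dict
    output_schema: dict
    risk_class: int
    owner_team: str
    implementation_reference: str
    authorization_policy_ref: str
    evaluation_suite_ref: str
    audit_fields: tuple
    deprecation_status: str = "active"
    retirement_date: Optional[str] = None


class ToolRegistry:
    def __init__(self) -> None:
        self._entries: dict[tuple[str, int], RegistryEntry] = {}

    def approve(self, entry: RegistryEntry) -> None:
        key = (entry.tool_id, entry.version)
        if key in self._entries:
            # No in-place modification: a change means a new version.
            raise ValueError("approved versions are immutable; publish a new version")
        self._entries[key] = entry

    def latest(self, tool_id: str) -> RegistryEntry:
        versions = [e for (tid, _), e in self._entries.items() if tid == tool_id]
        if not versions:
            raise KeyError(tool_id)
        return max(versions, key=lambda e: e.version)

    def count(self) -> int:
        # Answers "how many tools does your agent have?"
        return len({tid for tid, _ in self._entries})
```

A description change is then published with replace(entry, version=entry.version + 1, description="...") followed by approve, leaving the earlier version intact for audit.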
Frameworks do not give the architect this registry by default. LangGraph has its @tool decorator; CrewAI has its @tool; AutoGen has register_function; OpenAI Agents SDK has @function_tool. Each decorator registers the tool with the framework — not with the governance registry. The architect builds the registry at the platform layer and wires the framework-level registration to create registry entries as a side effect. The Model Context Protocol (MCP) makes this work across frameworks by standardizing the interoperable tool definition; an MCP-backed registry survives a framework change without re-registering every tool.
Framework parity: the same tool, six ways
A canonical get_refund_status(order_id, customer_id) tool looks nearly identical across frameworks.
- LangGraph: @tool def get_refund_status(order_id: str, customer_id: str) -> RefundStatus: with Pydantic model for output.
- CrewAI: @tool("get_refund_status") def get_refund_status(order_id: str, customer_id: str): ... with description arg and structured return.
- AutoGen: agent.register_function(function=get_refund_status, name="get_refund_status", description="...") with JSON schema.
- OpenAI Agents SDK: @function_tool(description_override="...") def get_refund_status(order_id: str, customer_id: str) -> RefundStatus:.
- Semantic Kernel: [KernelFunction, Description("...")] public async Task<RefundStatus> GetRefundStatus(string orderId, string customerId) in C#.
- LlamaIndex Agents: FunctionTool.from_defaults(fn=get_refund_status, name="get_refund_status", description="...").
The architect’s registry layer wraps each framework call so the tool is registered in both places simultaneously. MCP-bound tools register once and are consumable by any MCP-aware framework.
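The dual registration described above can be sketched as a wrapper decorator that records a governance entry and then hands the function to whatever framework decorator is in use. Everything here is hypothetical: governed_tool, GOVERNANCE_REGISTRY, and the framework_register parameter stand in for platform code, and the default identity function stands in for a real framework decorator.

```python
from typing import Callable

# Hypothetical in-memory stand-in for the platform's governance registry.
GOVERNANCE_REGISTRY: dict[str, dict] = {}


def governed_tool(
    *,
    risk_class: int,
    owner_team: str,
    framework_register: Callable[[Callable], Callable] = lambda fn: fn,
) -> Callable[[Callable], Callable]:
    """Register a function in the governance registry, then with the framework."""

    def decorator(fn: Callable) -> Callable:
        GOVERNANCE_REGISTRY[fn.__name__] = {
            "description": (fn.__doc__ or "").strip(),
            "risk_class": risk_class,
            "owner_team": owner_team,
        }
        # Framework-level registration happens as part of the same step.
        return framework_register(fn)

    return decorator


@governed_tool(risk_class=1, owner_team="payments")
def get_refund_status(order_id: str, customer_id: str) -> dict:
    """Return the refund status for one order."""
    return {"order_id": order_id, "customer_id": customer_id, "status": "pending"}
```

In a real platform the framework_register argument would be the framework's own decorator or registration call, so a tool cannot reach the agent without also appearing in the registry.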
Safe-default patterns
A set of safe defaults, applied uniformly across the tool surface, prevents most design-time bugs.
Deny-by-default tenant scoping. Every tool accepts a tenant_id from runtime context and rejects any parameter that would cross tenant boundaries. The model never sees the tenant ID; the runtime injects it; the tool validates it.
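A minimal sketch of deny-by-default scoping, assuming a hypothetical in-memory ORDERS table and a get_order tool. The tenant ID arrives from runtime context as the first argument; the model supplies only order_id.

```python
# Hypothetical data store keyed by order ID.
ORDERS = {
    "ord_1": {"tenant_id": "t_a", "total": 40},
    "ord_2": {"tenant_id": "t_b", "total": 75},
}


class TenantViolation(Exception):
    pass


def get_order(ctx_tenant_id: str, order_id: str) -> dict:
    order = ORDERS.get(order_id)
    # Missing and cross-tenant records raise the same error, so the
    # response does not reveal whether another tenant's order exists.
    if order is None or order["tenant_id"] != ctx_tenant_id:
        raise TenantViolation("order not found")
    # The tenant ID is not echoed back into the model's context.
    return {"order_id": order_id, "total": order["total"]}
```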
Dry-run mode by default for Class 4+ tools. Every side-effecting tool supports a dry_run: true mode that returns the action it would have taken, without taking it. Dry-run is used in evaluations, in canaries, and on the first-time-you-see-this-input path.
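A dry-run default might be sketched as follows. The create_ticket tool and the TICKETS list are hypothetical; the point is that the side effect only happens when the caller explicitly opts out of dry-run.

```python
# Hypothetical stand-in for a ticketing system.
TICKETS: list[str] = []


def create_ticket(title: str, *, dry_run: bool = True) -> dict:
    planned = {"action": "create_ticket", "title": title}
    if dry_run:
        # Report what would happen without touching the backing system.
        return {"status": "dry_run", "would_do": planned}
    TICKETS.append(title)
    return {"status": "success", "ticket_number": len(TICKETS)}
```

Making dry_run keyword-only with a default of True means a caller that forgets the flag gets the safe behavior.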
Idempotency keys on Class 4 and Class 5 tools. Every side-effecting call accepts an idempotency key; repeat calls with the same key return the same result without repeating the side effect. The runtime generates the key; the model does not.
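Idempotency can be sketched with a result cache keyed by a runtime-generated key. The send_notification tool, the SENT log, and the in-memory cache are assumptions; a real deployment would persist the key-to-result mapping.

```python
import uuid

_RESULTS: dict[str, dict] = {}      # idempotency key -> first result
SENT: list[tuple[str, str]] = []    # hypothetical record of side effects


def new_idempotency_key() -> str:
    # Generated by the runtime, never supplied by the model.
    return uuid.uuid4().hex


def send_notification(to: str, subject: str, *, idempotency_key: str) -> dict:
    if idempotency_key in _RESULTS:
        # Replay: return the stored result without repeating the side effect.
        return _RESULTS[idempotency_key]
    SENT.append((to, subject))
    result = {"status": "sent", "message_number": len(SENT)}
    _RESULTS[idempotency_key] = result
    return result
```

If the agent loop retries after a timeout, the second call with the same key returns the original result and the recipient receives one message, not two.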
Explicit “dangerous” flag. Class 5 and Class 6 tools declare dangerous: true in the registry. The runtime routes these through additional controls — policy engine with stricter rules, HITL gates for first-use-per-customer patterns, rate limits per agent session.
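The routing that the dangerous flag triggers can be sketched as a small decision function. The decision labels, the per-session limit of 5, and the entry dictionary layout are illustrative assumptions, not a prescribed policy.

```python
def route_call(
    entry: dict,
    session_calls: int,
    first_use_for_customer: bool,
    max_calls_per_session: int = 5,
) -> str:
    """Decide how the runtime handles a call, based on the registry entry."""
    if not entry.get("dangerous", False):
        return "allow"
    if session_calls >= max_calls_per_session:
        return "deny_rate_limited"
    if first_use_for_customer:
        # Human-in-the-loop gate for first-use-per-customer patterns.
        return "require_human_approval"
    return "allow_with_strict_policy"
```

The function reads the flag from the registry entry rather than from the tool implementation, which keeps the decision in the governance layer.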
Output size caps. Every tool declares a maximum output size. Oversized outputs are truncated and the truncation is visible in the returned structure. This prevents context-window bloat and memory-exhaustion DoS.
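A sketch of an output cap with visible truncation, assuming the tool returns a list of JSON-serializable rows and the cap is measured in characters of serialized JSON. The cap_output helper and its field names are hypothetical.

```python
import json


def cap_output(rows: list, max_chars: int) -> dict:
    """Keep as many leading rows as fit within max_chars of serialized JSON."""
    kept = []
    size = 2  # the enclosing brackets
    for row in rows:
        # Conservative estimate: row length plus a ", " separator.
        row_size = len(json.dumps(row)) + 2
        if size + row_size > max_chars:
            break
        kept.append(row)
        size += row_size
    return {
        "rows": kept,
        "truncated": len(kept) < len(rows),  # truncation is visible to the model
        "total_rows": len(rows),
        "returned_rows": len(kept),
    }
```

Because truncated and total_rows are part of the returned structure, the model can tell that it saw a partial result and ask for a narrower query instead of reasoning over silently missing data.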
Real-world anchor — Chevrolet of Watsonville and the $1 Tahoe
In December 2023, a user engaged a customer-service chatbot on Chevrolet of Watsonville’s website and convinced it to “agree” to sell a 2024 Chevy Tahoe for $1, and to commit “no takesies backsies” (Business Insider, December 2023). The incident went viral. Public commentary on the incident settled on two causes. First, the tool (the response-drafting interface) accepted the model’s output without a policy check on commercial terms. Second, the system had no registry-level risk class declaring commercial-commitment actions as Class 5 and therefore subject to additional controls. A registry-driven architecture with the refund/discount/commitment tool correctly classified would have routed that exchange through a gate long before the bot produced anything that read as a binding offer. No framework choice fixes this gap; only tool-registry discipline does.
Real-world anchor — Anthropic’s tool use guidance and MCP
Anthropic’s tool use documentation (public, 2024) emphasizes description quality, schema strictness, and the enum-over-string pattern as first-order defenses. The release of the Model Context Protocol standardizes tool definition across models and frameworks, so an architect can define a tool once and expose it to Claude, GPT, or Gemini agents without re-authoring. The architect should treat MCP as the interoperability target for the tool registry, even when the current deployment runs on a single model provider, because the lock-in cost of framework-specific tools is non-trivial and the exit path without MCP is expensive.
Real-world anchor — OpenAI function calling and structured outputs
OpenAI’s function-calling and structured-outputs features (public docs, 2024) made schema-strict tool calls reliable at production scale. Structured outputs, in particular, collapsed an entire class of parsing bugs — the “model returned almost-valid JSON” category — into a non-issue. The architect designs tools assuming strict schema enforcement is available and degrades gracefully on models that don’t support it.
Closing
The tool surface is the agent’s edge to the world. Design it with a six-class risk taxonomy, a strict schema, a first-class registry, a framework-agnostic interoperability layer (MCP), and a set of safe defaults applied uniformly. Article 6 now takes up the controls that sit on top of this foundation — the authorization stack that runs before each call and the validation stack that runs after.
Learning outcomes check
- Explain tool schema elements (description, input schema, structured output, risk class, registry metadata).
- Classify six tool categories by risk class and name the required controls per class.
- Evaluate a tool schema for validation completeness (bounds, enums, tenant scoping, dry-run, idempotency, output caps).
- Design a tool registry spec for a given platform with the minimum field set and MCP-backed interoperability.
Cross-reference map
- Core Stream: EATF-Level-1/M1.4-Art11-Agentic-AI-Architecture-Patterns-and-the-Autonomy-Spectrum.md; EATE-Level-2/M2.3-Art8-Tool-Use-Design-and-Authorization.md.
- Sibling credential: AITM-AAG Article 4 (governance-facing tool oversight).
- Forward reference: Articles 6 (authorization), 14 (indirect injection), 17 (evaluation harness), 21 (sandboxing), 22 (policy engines), 26 (audit pack).