This article walks through the incremental architecture for agentic use cases, classifies use cases into three levels of agent depth, and draws the boundary between AITE-SAT and AITE-ATS scope.
What changes when a system becomes agentic
A non-agentic system executes a fixed workflow: receive request, retrieve context, call model, return response, log. The architect specifies each step and its order. An agentic system defers some of that specification to the model’s runtime judgement. The agent decides which tool to call, how to chain tools, whether to retry, and when to stop. That deferral is the source of agentic systems’ flexibility and the source of their new failure modes.
Incremental components an agentic architecture adds to the Article 1 reference architecture:
Agent runtime. The loop that takes the model’s output, interprets it (did the model ask to call a tool, ask a follow-up, finalise), invokes the requested action, and feeds the result back to the model. ReAct, function-calling, LangGraph, CrewAI, AutoGen, OpenAI’s Agents SDK, LlamaIndex Agents, and Anthropic’s tool-use conventions are the common implementations.1
Tool registry. A catalogue of tools the agent can call, each with a spec (name, parameters, return type, authorisation, side-effect classification). Discussed in Article 7 for single-agent tool use; the registry concern scales with agent complexity.
State store. Where the agent keeps its intermediate reasoning, scratch memory, and inter-step context. Can be as simple as the model’s context window for short tasks; as elaborate as a dedicated memory database for long-running agents.
Safety kill-switch. The single control that halts all agent execution. Discussed in Article 20; its design is mandatory, not optional, for agentic systems.
Recovery path. What the agent does when a tool fails, when it loses confidence, or when it detects a policy violation. Non-agentic systems fail forward or fail visibly; agentic systems must fail predictably.
Observability extensions. The trace schema includes tool-call sequences, iteration counts, budget consumption, and decision rationales. Discussed in Article 13.
Agent depth — classifying use cases
Agent depth is a spectrum, not a binary. A useful classification has three levels.
Level 1 — Bounded assist
The agent has a small tool set (one to three tools), a single task per invocation, and a low iteration cap (two to five steps). The user initiates the interaction and sees the result within a normal interaction window. Examples: a customer-support assistant that can look up order status and escalate to a human, a research assistant that can query a small set of internal sources.
Architectural load: modest. Most of Article 7 applies directly. Observability needs the tool-call trace. The kill-switch is present but rarely exercised.
Level 2 — Constrained autonomy
The agent has a wider tool set (five to ten), may run a longer workflow (up to tens of steps), can produce intermediate artefacts, and may be asked to complete a multi-step task without user intervention for each step. Examples: a coding assistant that makes code changes, runs tests, and drafts a pull-request description; a back-office agent that triages tickets, pulls relevant context, and routes.
Architectural load: higher. State management becomes significant. Cost and latency become harder to predict. The recovery path needs more design. The kill-switch is likely to be exercised; the observability needs to support debugging of multi-step workflows.
Level 3 — Multi-agent, open-ended
Multiple agents collaborate. Tasks are open-ended. Tool sets are broad and may include tools that take real-world actions (sending email, changing system state, initiating transactions). Iteration counts can be high. Examples: Devin-class autonomous software engineers; multi-agent research teams working on long-horizon analysis tasks.2
Architectural load: very high. This is where AITE-SAT ends and AITE-ATS begins. The architectural disciplines required — multi-agent orchestration, principal-agent delegation patterns, agent-to-agent contract design, safety boundaries for compound autonomy, regulatory mapping for agentic AI specifically — exceed the scope of a general solutions-architect curriculum. AITE-SAT holders encountering Level 3 use cases should either bring in an AITE-ATS-credentialed architect or pursue that credential.
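The three levels can be reduced to a rough classification helper. This only encodes the headline thresholds from the text (tool counts, step caps, multi-agent collaboration, real-world actions); real classification remains a judgement call, and the function name and signature are illustrative.

```python
# Hedged sketch: map a use case onto the three agent-depth levels using
# the rough thresholds in the text. Boundary cases need human judgement.
def agent_depth(tool_count: int, max_steps: int,
                multi_agent: bool, real_world_actions: bool) -> int:
    if multi_agent or real_world_actions:
        return 3                        # open-ended; AITE-ATS scope
    if tool_count <= 3 and max_steps <= 5:
        return 1                        # bounded assist
    if tool_count <= 10:
        return 2                        # constrained autonomy
    return 3                            # broad tool set: treat as Level 3
```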
The agency-boundary ADR
Every agentic use case warrants a dedicated ADR (Article 23) on the agency boundary. What does the agent have authority to do, and what must it defer. The ADR addresses:
Tool permission matrix. Which tools the agent can call autonomously; which require user confirmation; which are absolutely out of scope. A read-only calendar lookup is usually autonomous; sending an email or changing a record usually requires confirmation; wiring money or signing a contract is usually out of scope.
Iteration and cost cap. The maximum iterations per task and the maximum cost per task. Agents that do not terminate or consume unbounded resources are a common failure mode.
Escalation triggers. Conditions under which the agent must route to a human: low confidence, a pattern of tool-call failures, a detected policy violation, approach to the cost cap, or an explicit user request.
Scope boundary. The domains the agent engages with and those it refuses.
Reviewability. Whether every action taken by the agent must be reviewable after the fact and by whom.
This ADR is usually longer and more contested than most. It deserves the extra time.
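The tool permission matrix from the ADR can be enforced mechanically at the runtime. A minimal sketch, assuming the three permission classes and the example tools from the text; the enforcement API itself (`check_tool_call`, the registry dict) is hypothetical.

```python
# Sketch of tool-permission-matrix enforcement. Default-deny: a tool not
# in the registry is treated as out of scope. Names are illustrative.
from enum import Enum

class Permission(Enum):
    AUTONOMOUS = "autonomous"    # agent may call without asking
    CONFIRM = "confirm"          # requires a live user confirmation
    FORBIDDEN = "forbidden"      # never callable; runtime-enforced

PERMISSIONS = {
    "calendar_lookup": Permission.AUTONOMOUS,
    "send_email":      Permission.CONFIRM,
    "wire_money":      Permission.FORBIDDEN,
}

def check_tool_call(tool: str, user_confirmed: bool = False) -> bool:
    """Return True if the call may proceed now; False if confirmation is
    still needed; raise if the tool is absolutely out of scope."""
    perm = PERMISSIONS.get(tool, Permission.FORBIDDEN)   # default-deny
    if perm is Permission.FORBIDDEN:
        raise PermissionError(f"tool {tool!r} is out of scope")
    if perm is Permission.CONFIRM and not user_confirmed:
        return False     # caller must obtain confirmation, then retry
    return True
```

The default-deny fallback matters: an unreviewed tool that slips into the agent's prompt should fail closed, not open.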
Safety boundaries
Agentic systems compound risk because the agent’s actions compound across iterations. A small misstep in one iteration can cascade into a larger problem in subsequent iterations. Safety boundaries the architect specifies:
Hard tool limits. Tools that cannot be called regardless of agent state (for example, tools that irreversibly move money or change system state in production without a human signoff). The tool registry flags these and the runtime enforces the block.
Soft tool limits with confirmation. Tools that can be called but require explicit user or reviewer confirmation at the moment of use. Confirmation is a live UX moment, not a blanket earlier consent.
Cost ceilings. Per-task and per-user cost ceilings enforced by the platform. An agent that detects it is approaching the ceiling must ask before exceeding it rather than silently abandoning the task.
Iteration ceilings. A hard cap on iterations prevents infinite loops. Configurable; typically set low for first deployments and loosened as the team gains experience.
Prompt-injection defence. Agents that read external content (web pages, documents, emails) are exposed to prompt injection attacks at every read. Article 14’s defences apply and are strengthened: output validators at every step, context-segregation between user instruction and external content, provenance tagging of all inputs.
Kill-switch drill cadence. Quarterly kill-switch drills are mandatory for Level 2 and above. The first incident is not the time to discover that the kill-switch logic has bitrotted.
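The cost and iteration ceilings above, including the ask-before-exceeding behaviour, can be sketched as a per-task budget object. The thresholds and the approval hook are illustrative assumptions, not a specific platform's API.

```python
# Hedged sketch of per-task budget enforcement: a hard iteration ceiling
# and a cost ceiling with an approval check as the ceiling approaches.
class BudgetExceeded(Exception):
    pass

class TaskBudget:
    def __init__(self, max_iterations: int = 10, max_cost: float = 1.00,
                 warn_fraction: float = 0.8):
        self.max_iterations = max_iterations
        self.max_cost = max_cost
        self.warn_at = warn_fraction * max_cost   # ask before exceeding
        self.iterations = 0
        self.cost = 0.0

    def charge(self, step_cost: float, ask) -> None:
        """Record one iteration. `ask` is a callable that requests live
        user approval and returns True/False."""
        self.iterations += 1
        self.cost += step_cost
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration ceiling reached")   # hard stop
        if self.warn_at <= self.cost < self.max_cost:
            if not ask("approaching cost ceiling; continue?"):
                raise BudgetExceeded("user declined to continue")
        if self.cost >= self.max_cost:
            raise BudgetExceeded("cost ceiling reached")
```

Keeping the raise inside `charge` rather than in the agent's own logic matters: the agent cannot talk itself past a ceiling it never sees enforced.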
The OWASP LLM Top 10 and the emerging OWASP agentic AI materials should be read alongside this section.3 The AITE-ATS curriculum goes deeper into agent-specific safety patterns.
Observability for agentic systems
Agent observability extends the Article 13 trace schema with agentic signals:
- Tool-call count per task. Distribution tracked; outliers inspected.
- Iteration count per task. Distribution tracked; infinite-loop detection.
- Per-task cost and latency. Per-task not per-model-call, because agent tasks are the user-relevant unit.
- Tool-call success rate. Per tool; failing tools point to integration health.
- Escalation rate. Per-escalation-reason; trending over time.
- Completion rate. Tasks that complete successfully versus those that fail or get abandoned.
- Decision-rationale capture. The agent’s own reasoning steps stored for debugging and training eval sets.
The agent’s trace tree must be interpretable; a trace viewer that can scrub through an agent task the way a debugger steps through a call stack is the standard today. Langfuse, Arize, Weights & Biases, and LangSmith all offer agentic trace visualisation; building your own on OpenTelemetry is viable for teams with strong observability discipline.4
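The agentic signals listed above can be captured as a per-task record and aggregated. A minimal sketch: the field names are illustrative, and a production system would emit these as span attributes through its tracing backend rather than hold them in memory.

```python
# Sketch of per-task agentic trace signals and a simple aggregation.
# Per-task, not per-model-call: the task is the user-relevant unit.
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskTrace:
    task_id: str
    tool_calls: int
    iterations: int
    cost_usd: float
    latency_s: float
    escalated: bool
    completed: bool

def summarise(traces: list[TaskTrace]) -> dict:
    """Aggregate the distributions the dashboards would track."""
    n = len(traces)
    return {
        "mean_iterations": mean(t.iterations for t in traces),
        "mean_cost_usd":   mean(t.cost_usd for t in traces),
        "escalation_rate": sum(t.escalated for t in traces) / n,
        "completion_rate": sum(t.completed for t in traces) / n,
    }
```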
Worked example — Replit Agent
Replit’s public Agent launch positioned the feature as an autonomous coding assistant capable of building, running, and iterating on applications within Replit’s environment.5 Architecturally this is a bounded agentic system at the higher end of Level 2: the agent has tools for writing code, running shells, executing tests, reading logs, and making Git changes; the environment is the sandbox within Replit.
The architectural decisions the AITE-SAT architect would study:
- Agency boundary: the agent’s tools are confined to a sandboxed environment; agent actions do not propagate outside unless explicitly checkpointed into the user’s project.
- Recovery path: failed tool calls route back to the agent with error context; repeated failures trigger escalation or abort.
- Kill-switch: the user can stop the agent mid-task.
- Cost and iteration controls: visible budgets and iteration limits.
- Observability: step-by-step trace visible to the user.
The questions that would concern a reviewer before scope expansion: what if the agent is allowed to deploy from the sandbox to production? What if agent actions in one user’s sandbox can leak into another’s? These concerns escalate to AITE-ATS scope as the use case does.
Worked example — Devin
Cognition AI’s Devin, launched publicly in 2024, demonstrated an autonomous software engineering agent capable of long-horizon task completion.6 This is a Level 3 system. The architectural disciplines needed — ongoing state management, principal-agent trust design, robust rollback of agent actions at scale, regulatory posture for agent-initiated actions — put it squarely in AITE-ATS territory.
AITE-SAT holders reading about Devin take lessons in the architectural patterns that generalise (state management, cost control, kill-switch design) and recognise the disciplines that do not (multi-agent coordination, long-horizon memory design, agent-to-agent contract protocols) as AITE-ATS content.
Worked example — Anthropic Computer Use
Anthropic’s Computer Use, released in late 2024, allows Claude models to operate a computer directly — moving a mouse, clicking, typing, taking screenshots.7 The architectural innovation is screen-understanding as a tool and action-taking as a tool. The AITE-SAT implication: computer-use tools are a new class of tool with distinctive failure modes (screen-reading hallucinations, wrong-window actions) that need specific guardrails.
A team adopting Computer Use in production would add: visual-diff validation of the screen state the agent claims to have seen, action-replay logs, restricted-scope sandboxes for the first deployments, and graduated agency expansion as confidence in the behaviour accumulates.
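One of those guardrails, the action-replay log, can be sketched simply: record each action with digests of the before and after screenshots so a reviewer can reconstruct what the agent saw and detect wrong-window actions after the fact. This is entirely illustrative; it is not Anthropic's API, and a real implementation would store the screenshots themselves, not just hashes.

```python
# Hedged sketch of an action-replay log for computer-use tools. Each
# entry pairs the action with content hashes of the screen before and
# after, supporting after-the-fact review of agent behaviour.
import hashlib
import json
import time

def screen_hash(png_bytes: bytes) -> str:
    """Short content digest of a screenshot for tamper-evident replay."""
    return hashlib.sha256(png_bytes).hexdigest()[:16]

class ActionLog:
    def __init__(self):
        self.entries = []

    def record(self, action: str, before_png: bytes, after_png: bytes) -> None:
        self.entries.append({
            "ts": time.time(),
            "action": action,
            "screen_before": screen_hash(before_png),
            "screen_after": screen_hash(after_png),
        })

    def dump(self) -> str:
        """Serialise for archival alongside the task trace."""
        return json.dumps(self.entries)
```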
Where AITE-SAT ends and AITE-ATS begins
A practical threshold: if the use case fits in Level 1 (bounded assist) the AITE-SAT architect owns it. At Level 2, AITE-SAT can lead if the architect has read ahead into the agentic specialisation materials; consultation with an AITE-ATS architect is wise. At Level 3, AITE-SAT contributes the general solutions-architecture input and AITE-ATS owns the agentic architecture.
The credential separation is deliberate. A solutions architect who attempts a Level 3 agentic system without the AITE-ATS discipline is likely to under-specify the multi-agent coordination, the safety boundaries, or the regulatory posture. Recognising the boundary and partnering effectively is part of the AITE-SAT job description.
Governance integration
The EU AI Act does not yet have a dedicated agentic-AI chapter but Articles 14 (human oversight), 15 (robustness), and 26 (deployer obligations) apply with force.8 Agentic systems also interact with GPAI obligations (Articles 50-56) when the underlying model is general-purpose. The ISO/IEC standards catalogue is still catching up to agentic AI specifically; the existing ISO/IEC 42001 clauses on system lifecycle and risk management apply.
NIST AI RMF’s profile-based approach makes it natural to define an agentic profile — a subset of controls emphasised for agentic systems. A profile would emphasise MAP 3.2 (explanation), MEASURE 2.3 (human-AI configurations), MANAGE 2.3 (risk response), and GOVERN 1.7 (risk tolerance) relative to the baseline.
Anti-patterns
- “Agentic” as a marketing wrapper on a non-agentic system. Using function-calling once per request does not make a system agentic. The architect should label systems accurately so that review discipline matches actual risk.
- No iteration cap. Agents that can loop forever will, eventually. Caps are mandatory.
- No cost cap. A runaway agent can burn a month’s budget in an hour. Caps must be enforced at the platform level.
- Kill-switch that requires a code deployment. The kill-switch must be runtime-configurable; deploying code during an incident is the wrong response.
- Tool registry that is not reviewed. Tools that can take irreversible action must be reviewed before they enter the registry. A casually added email-send tool is a live-fire incident waiting to happen.
- Agent debugging without trace visualisation. Without a trace viewer, debugging a 30-step agent failure from raw logs is impractical; the team stops debugging and the issues persist.
Summary
Agentic use cases add an agent runtime, a tool registry, a state store, a safety kill-switch, a recovery path, and observability extensions to the reference architecture. Agent depth classifies the incremental architectural load: Level 1 (bounded assist), Level 2 (constrained autonomy), Level 3 (multi-agent open-ended). AITE-SAT covers Levels 1 and 2 with consultation; Level 3 belongs to AITE-ATS. The agency-boundary ADR is the architect’s most consequential artefact for any agentic use case.
Key terms
- Agentic system
- Agent runtime
- Tool registry (agent)
- Agency-boundary ADR
- Agent depth levels
Learning outcomes
After this article the learner can: explain how an agentic architecture extends the Article 1 reference architecture; classify a use case into one of the three agent-depth levels; evaluate an agent design for safety completeness; and design a boundary document separating agentic scope from non-agentic scope.
Footnotes
1. Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models” (2022); LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex Agents documentation.
2. Cognition AI, Devin launch announcement and architecture overview (2024).
3. OWASP Top 10 for LLM Applications and OWASP agentic AI working group materials.
4. Langfuse, Arize, Weights & Biases, LangSmith agentic trace visualisation documentation.
5. Replit, “Introducing Replit Agent” (2024).
6. Cognition AI, “Introducing Devin, the first AI software engineer” (2024).
7. Anthropic, “Introducing Computer Use” (October 2024).
8. Regulation (EU) 2024/1689 (AI Act), Articles 14, 15, 26; Articles 50-56 (GPAI).