COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Article 4 of 40
Thesis. The loop is the beating heart of the agent. Inside whichever runtime the architect chose in Article 3, the loop decides how the model’s reasoning interleaves with tool calls, how the agent responds to failure, and when the agent stops. Four loop patterns have established themselves; each has a published academic lineage, a canonical failure mode, and at least two framework implementations. An architect who cannot name the loop pattern in their own system is architecting by accident.
The four loop patterns
ReAct — interleaved reasoning and acting
ReAct (Yao et al., ICLR 2023, https://arxiv.org/abs/2210.03629) interleaves reason (a natural-language thought) and act (a tool call). Each iteration has the form: Thought → Action → Observation → Thought → Action → Observation. The thoughts are the agent’s working memory in plain language; the actions are its effect on the world.
Thought: I need the current temperature in Paris to answer the question.
Action: get_weather(city="Paris")
Observation: 18C, partly cloudy
Thought: I have enough to answer.
Final: It is currently 18C in Paris with partly cloudy skies.
ReAct’s strength is transparency — every step is inspectable in plain language, which makes observability (Article 15) and incident analysis (Article 25) trivial. ReAct’s failure mode is infinite looping when no termination condition fires; the architect’s fix is a hard max_steps cap plus escalation (Article 9). LangGraph, CrewAI, AutoGen, and OpenAI Agents SDK all implement ReAct natively or as the default behaviour when tools are provided.
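The cap-plus-escalation fix can be sketched in plain Python. The `reason` stub and `TOOLS` registry below are illustrative placeholders, not any framework's API; a real system would call a model where `reason` is.

```python
# Minimal ReAct-style loop with a hard max_steps cap. The "model" and
# tool registry are stand-ins so the control flow is runnable end to end.

TOOLS = {"get_weather": lambda city: "18C, partly cloudy"}

def reason(history):
    # Placeholder model: calls the tool once, then answers.
    if any(kind == "observation" for kind, _ in history):
        return ("final", "It is currently 18C in Paris.")
    return ("action", ("get_weather", {"city": "Paris"}))

def react_loop(goal, max_steps=5):
    history = [("thought", goal)]
    for _ in range(max_steps):
        kind, payload = reason(history)
        if kind == "final":                      # termination condition fired
            return payload, history
        tool, args = payload
        observation = TOOLS[tool](**args)
        history += [("action", payload), ("observation", observation)]
    # Termination never fired: escalate instead of looping forever.
    raise TimeoutError("max_steps reached - escalate to a human")

answer, trace = react_loop("What is the weather in Paris?")
```

The `raise` on cap exhaustion is the architectural point: the loop ends in an explicit escalation path, never in silence.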
Plan-and-Execute — planner plus executor
Plan-and-Execute splits the loop in two: a planner produces a full plan up front (a list of steps with dependencies), then an executor walks the plan step by step, re-planning on failure where needed. The pattern is documented in practice rather than in a single paper: the BabyAGI reference implementation (2023), LangChain’s Plan-and-Execute agent, and LlamaIndex’s planner agents all implement it.
PLANNER phase:
  input: goal
  output: [step1, step2, step3, step4]
EXECUTOR phase:
  for step in plan:
    execute(step)
    if failure: re-plan from here
Plan-and-Execute’s strength is efficiency on long-horizon tasks — the planner commits to a trajectory and the executor runs it without re-deliberating at every step, saving tokens. Its failure mode is stale plans when the plan was constructed on out-of-date context; the fix is re-plan triggers tied to observation signals. Devin (Cognition AI, 2024) uses a Plan-and-Execute variant for long-horizon software tasks; Microsoft AutoGen supports the pattern via group-chat planner patterns.
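The re-plan-from-here behaviour can be made concrete. The `plan` and `execute` bodies below are illustrative placeholders (a fixed step list and a scripted transient failure), assumed for the sake of a runnable sketch.

```python
# Plan-and-Execute sketch: the planner commits to a step list up front;
# the executor walks it and re-plans from the failed step on error.

def plan(goal, start=0):
    # Placeholder planner: a fixed four-step plan for any goal.
    return [f"step{i}" for i in range(start + 1, 5)]

def execute(step, attempt):
    # Placeholder executor: step3 fails once, then succeeds after re-plan.
    if step == "step3" and attempt == 0:
        raise RuntimeError("transient failure")
    return f"done:{step}"

def plan_and_execute(goal, max_replans=3):
    steps, done, replans, i = plan(goal), [], 0, 0
    while i < len(steps):
        try:
            done.append(execute(steps[i], replans))
            i += 1
        except RuntimeError:
            replans += 1
            if replans > max_replans:
                raise
            steps = steps[:i] + plan(goal, start=i)  # re-plan from here
    return done

results = plan_and_execute("process batch")
```

Note the `max_replans` budget: an unbounded re-plan trigger just moves the infinite-loop failure mode from the executor to the planner.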
Reflexion — self-critique and revision
Reflexion (Shinn et al., NeurIPS 2023, https://arxiv.org/abs/2303.11366) adds a self-critique step: after the model produces an action, a critic model (often the same model in a different role) reviews the action, spots errors, and produces a revised plan or output. The loop form: Act → Critique → Revise → Act.
Act: generate code for the function
Critique: does this handle the null case? does it pass the tests?
Revise: add null-check; fix edge case
Act: resubmit revised code
Reflexion’s strength is accuracy on tasks with verifiable outcomes — tests pass or fail, code compiles or does not. Its failure mode is looped self-flagellation where the critic never accepts any revision; the fix is a critic-budget (max critique rounds) plus a no-progress detector. Reflexion is heavily used in code agents (Replit, Devin, Cursor agent modes) and in research agents that cite sources.
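The critic-budget fix looks like this in sketch form. The `generate`, `critique`, and `revise` stubs stand in for model calls; the null-check scenario mirrors the trace above and is assumed for illustration.

```python
# Reflexion sketch: act, critique, revise, with a critic budget so the
# loop cannot self-flagellate forever.

def generate(task):
    return "def f(x): return x.upper()"          # forgets the null case

def critique(code):
    # Placeholder critic: accepts once the null case is handled.
    return None if "is None" in code else "missing null check"

def revise(code, feedback):
    return "def f(x):\n    if x is None: return ''\n    return x.upper()"

def reflexion(task, max_rounds=3):
    code = generate(task)
    for round_ in range(max_rounds):
        feedback = critique(code)
        if feedback is None:                     # critic accepts
            return code, round_
        code = revise(code, feedback)
    return code, max_rounds                      # accept-on-timeout policy

code, rounds = reflexion("uppercase a string")
```

The accept-on-timeout return is the budget in action: after `max_rounds`, the best revision ships (or escalates) rather than cycling.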
Explicit state-graph — named states and transitions
Explicit state-graph (the runtime pattern from Article 3, used here as a loop pattern) models the agent loop as a directed graph where each node is a named handler and edges are transitions. Instead of implicit control flow embedded in prompts, the state machine is code.
graph:
  START → plan
  plan → act
  act → validate
  validate → human_gate (on risk) | act (on retry) | END (on success)
  human_gate → act (on approve) | END (on reject)
The state-graph loop’s strength is gate placement — any node can be a human-approval gate (Article 10), a policy check (Article 22), a memory-write point (Article 7). Its failure mode is graph sprawl as edge cases accumulate; the fix is periodic graph refactoring. LangGraph canonicalises this pattern; AWS Step Functions with Bedrock implements it cloud-natively.
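A state machine as code can be as small as a dict of handlers, each returning the name of the next node. This is a hand-rolled sketch of the idea, not LangGraph's API; the handler bodies are placeholders.

```python
# Explicit state-graph sketch: nodes are named handlers, edges are the
# node names each handler returns. Node names mirror the graph above.

def make_graph(risky=False):
    return {
        "plan":       lambda ctx: "act",
        "act":        lambda ctx: "validate",
        "validate":   lambda ctx: "human_gate" if risky else "END",
        "human_gate": lambda ctx: "act" if ctx.get("approved") else "END",
    }

def run(graph, ctx, start="plan", max_transitions=20):
    node, path = start, []
    while node != "END":
        if len(path) >= max_transitions:
            raise RuntimeError("possible cycle - refactor the graph")
        path.append(node)
        node = graph[node](ctx)   # edge selection is explicit code
    return path

path = run(make_graph(risky=False), {})
```

Because every transition passes through `run`, a gate, policy check, or memory write is one line at that choke point rather than a prompt change.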
Diagram 1 — Side-by-side loop iteration
ReAct:         Thought → Action → Observation ─┐
                 ▲                             │
                 └──────────── loop ───────────┘

Plan-Execute:  PLANNER ── plan ──▶ EXECUTOR ─┐
                                 ▲           │
                                 └─ per step ┘

Reflexion:     Act → Critique → Revise ─┐
                ▲                       │
                └───────── loop ────────┘

State-Graph:   plan → act → validate → human_gate
The four patterns compose — an agent can run ReAct inside an executor node, a Reflexion critique inside a state-graph validate node. A senior architect picks the top-level pattern and mixes inner patterns deliberately.
Diagram 2 — Bridge from problem characteristics to loop choice
Problem characteristics → Loop pattern
───────────────────────── ───────────────────
short horizon, exploratory reasoning → ReAct
long horizon, repetitive steps → Plan-and-Execute
verifiable correctness criterion → Reflexion
regulated, gate-heavy, audited → State-graph
interactive conversation → ReAct (conversational)
multi-agent coordination → State-graph (+ actor)
ambiguous spec, iterative refinement → Reflexion inside ReAct
Worked decision table
- Customer service chatbot answering with tool assistance → ReAct. Conversational, short-horizon, tools are lookups. Prompt-injection incidents like the Chevrolet-of-Watsonville chatbot show why even this simple pattern needs hard termination and output guardrails.
- Document pipeline that reads 10,000 forms, extracts fields, posts to ERP → Plan-and-Execute. Long-horizon, repetitive, plan per batch.
- Code agent that must make tests pass → Reflexion inside ReAct. The critic reviews proposed code; the ReAct outer loop handles runtime failure.
- Mortgage-underwriting agent with HITL gates at KYC and at final decision → State-graph. Gates are first-class nodes.
- Research agent citing sources → Reflexion inside ReAct. The critic validates each citation.
- Multi-agent coordination across three agents → State-graph orchestrator with ReAct inside each agent.
Failure-mode catalog
| Pattern | Characteristic failure | Architectural fix |
|---|---|---|
| ReAct | infinite loop | max_steps cap + escalation (Article 9) |
| ReAct | non-progress (same thought repeated) | no-progress detector + kill-switch |
| Plan-and-Execute | stale plan | re-plan on observation-signal trigger |
| Plan-and-Execute | planner hallucinates steps | plan validator before executor starts |
| Reflexion | looped self-critique | critic-budget + accept-on-timeout policy |
| Reflexion | critic capture | rotate critic model; adversarial critic |
| State-graph | graph sprawl | periodic refactor; max-node warning |
| State-graph | dead paths | completeness test on every deploy |
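The no-progress detector the catalog prescribes for ReAct can be a few lines. The sliding-window repeat heuristic below is an assumption, one reasonable definition of "same thought repeated", not a framework API.

```python
# No-progress detector: flags when the same step content recurs within
# a sliding window, signalling the loop to kill-switch or escalate.
from collections import deque

class NoProgressDetector:
    def __init__(self, window=4):
        self.recent = deque(maxlen=window)   # last N step contents

    def observe(self, step):
        stuck = step in self.recent          # repeat within the window?
        self.recent.append(step)
        return stuck

detector = NoProgressDetector(window=3)
signals = [detector.observe(s) for s in
           ["search Paris", "read result", "search Paris", "answer"]]
```

In production the detector would hash normalised step content; exact string equality is used here only to keep the sketch short.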
Loop pattern selection drives three other decisions
Once the loop pattern is chosen, three other decisions follow with near-mechanical predictability:
- Observability schema. ReAct emits thoughts as spans; Plan-and-Execute emits plan + executions; Reflexion emits act + critique pairs; state-graph emits transitions. Article 15 covers the schema per pattern.
- Replay strategy. State-graph is trivial to replay; ReAct requires full history capture; Reflexion needs critic history as well; Plan-and-Execute needs plan + execution history. Article 15 again.
- Evaluation structure. ReAct tests step-quality; Plan-and-Execute tests plan-quality; Reflexion tests revision-effectiveness; state-graph tests gate-correctness. Article 17.
The loop choice is upstream of half the credential’s architectural decisions.
Real-world anchors
Yao et al. — ReAct in the academic literature
The ReAct paper (Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao; ICLR 2023) established the interleaved reasoning+acting pattern as the default for tool-using language models. The paper’s HotpotQA and AlfWorld benchmarks remain reference points for any evaluation of reasoning-driven agents. The paper’s finding — that interleaving reasoning with action outperforms either pure planning or pure acting — is what made ReAct the default loop shape in virtually every agent framework that followed. The architect’s reading list for AITE-ATS starts with this paper. Source: https://arxiv.org/abs/2210.03629.
Devin’s planning approach (Cognition AI, 2024)
Cognition AI’s public posts on Devin’s architecture describe a planning layer that decomposes software-engineering tickets into plans with hundreds of steps, executed by a ReAct-style executor with Reflexion-style critique on test results. The 2024 launch post and subsequent engineering posts are specific enough to teach from: the planner is invoked per ticket, the executor runs step by step with sandbox isolation (Article 21), and the Reflexion critic reviews proposed code against the test suite before the next iteration. The 13.86% SWE-bench result at launch is not the lesson; the lesson is that three loop patterns composed together produced a production-viable long-horizon agent. Source: https://www.cognition.ai/.
Voyager — lineage for long-horizon learning agents
Wang et al.’s Voyager paper (TMLR 2024, https://arxiv.org/abs/2305.16291) — an open-ended embodied agent for Minecraft with a skill library — is cited as lineage, not as a recommended pattern for enterprise agents. Voyager’s contribution for the architect is the skill library concept: the agent accumulates reusable skills as it progresses. Most enterprise agents should not implement open-ended skill acquisition (the L5 boundary), but the pattern of externalising learned behaviours to an inspectable library is one the architect can adopt without L5 risk.
Closing
Four loop patterns; one primary pattern per agent; inner patterns composable. The loop-pattern decision is recorded on the autonomy statement artefact alongside the autonomy-level classification. Article 5 now takes up the tool surface — the bridge between the loop and the world.
Learning outcomes check
- Explain four loop patterns with iteration-level diagrams.
- Classify five use cases by best-fit loop pattern.
- Evaluate a loop design for its characteristic failure mode and propose the architectural fix.
- Design a loop-pattern decision table for a given portfolio.
Cross-reference map
- Core Stream: EATF-Level-1/M1.4-Art11-Agentic-AI-Architecture-Patterns-and-the-Autonomy-Spectrum.md
- Sibling credential: AITM-AAG Article 2 (governance-facing loop patterns).
- Forward reference: Articles 15 (observability), 17 (evaluation), 20 (platform).