AITE M1.2-Art04 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Agent Loop Patterns — ReAct, Plan-and-Execute, Reflexion, State-Graph


9 min read · Article 4 of 53 · Produce

COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert Article 4 of 40


Thesis. The loop is the beating heart of the agent. Inside whichever runtime the architect chose in Article 3, the loop decides how the model’s reasoning interleaves with tool calls, how the agent responds to failure, and when the agent stops. Four loop patterns have established themselves; each has a published academic lineage, a canonical failure mode, and at least two framework implementations. An architect who cannot name the loop pattern in their own system is architecting by accident.

The four loop patterns

ReAct — interleaved reasoning and acting

ReAct (Yao et al., ICLR 2023, https://arxiv.org/abs/2210.03629) interleaves reason (a natural-language thought) and act (a tool call). Each iteration has the form: Thought → Action → Observation → Thought → Action → Observation. The thoughts are the agent’s working memory in plain language; the actions are its effect on the world.

Thought: I need the current temperature in Paris to answer the question.
Action:  get_weather(city="Paris")
Observation: 18C, partly cloudy
Thought: I have enough to answer.
Final:   It is currently 18C in Paris with partly cloudy skies.

ReAct’s strength is transparency — every step is inspectable in plain language, which makes observability (Article 15) and incident analysis (Article 25) trivial. ReAct’s failure mode is infinite looping when no termination condition fires; the architect’s fix is a hard max_steps cap plus escalation (Article 9). LangGraph, CrewAI, AutoGen, and OpenAI Agents SDK all implement ReAct natively or as the default behaviour when tools are provided.
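The ReAct iteration plus the hard max_steps cap can be sketched in a few lines. This is a minimal skeleton, not any framework's API: `call_model` and `run_tool` are hypothetical stand-ins for a real LLM client and tool dispatcher.

```python
# Minimal ReAct loop skeleton with a hard max_steps cap.
# `call_model` and `run_tool` are hypothetical placeholders.

def call_model(history):
    # Placeholder: a real implementation would call an LLM and parse its
    # reply into a thought plus either an action or a final answer.
    return {"thought": "I have enough to answer.",
            "final": "It is currently 18C in Paris."}

def run_tool(action):
    # Placeholder tool dispatcher.
    return "18C, partly cloudy"

def react_loop(question, max_steps=8):
    history = [("question", question)]
    for _ in range(max_steps):
        step = call_model(history)
        history.append(("thought", step["thought"]))
        if "final" in step:                      # termination condition fired
            return step["final"]
        observation = run_tool(step["action"])   # act, then observe
        history.append(("action", step["action"]))
        history.append(("observation", observation))
    # No termination inside the budget: escalate instead of looping forever.
    raise RuntimeError("max_steps exceeded; escalating to a human")
```

The cap turns the pattern's characteristic failure (infinite looping) into an explicit, observable escalation event.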

Plan-and-Execute — planner plus executor

Plan-and-Execute splits the loop in two: a planner produces a full plan up front (list of steps with dependencies), then an executor walks the plan step by step, possibly re-planning on failure. The pattern's lineage is practical rather than a single academic paper: the BabyAGI reference implementation (2023), LangChain’s Plan-and-Execute agent, and LlamaIndex’s planner agents all document it.

PLANNER phase:
  input: goal
  output: [step1, step2, step3, step4]
EXECUTOR phase:
  for step in plan:
    execute(step)
    if failure: re-plan from here

Plan-and-Execute’s strength is efficiency on long-horizon tasks — the planner commits to a trajectory and the executor runs it without re-deliberating at every step, saving tokens. Its failure mode is stale plans when the plan was constructed on out-of-date context; the fix is re-plan triggers tied to observation signals. Devin (Cognition AI, 2024) uses a Plan-and-Execute variant for long-horizon software tasks; Microsoft AutoGen supports the pattern via group-chat planner patterns.
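The planner/executor split with a re-plan path can be sketched as follows. This is a hedged illustration, not a framework API: `plan` and `execute` are hypothetical stand-ins for model calls, and the "validate" step fails once here purely to show the re-plan trigger firing.

```python
# Planner/executor split with a re-plan-on-failure path and a re-plan budget.
# `plan` and `execute` are placeholder stand-ins for model calls.

failed_once = set()

def plan(goal, completed=()):
    # Placeholder planner: return the steps still needed for the goal.
    steps = ["fetch", "extract", "validate", "post"]
    return [s for s in steps if s not in completed]

def execute(step):
    # Placeholder executor: fail "validate" on its first attempt only.
    if step == "validate" and step not in failed_once:
        failed_once.add(step)
        return False
    return True

def run(goal, max_replans=3):
    completed, replans = [], 0
    steps = plan(goal)
    while steps:
        step = steps.pop(0)
        if execute(step):
            completed.append(step)
        else:
            replans += 1
            if replans > max_replans:
                raise RuntimeError("re-plan budget exhausted; escalate")
            steps = plan(goal, completed)  # re-plan from here
    return completed
```

The re-plan budget bounds the stale-plan failure mode the same way max_steps bounds ReAct's loop.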

Reflexion — self-critique and revision

Reflexion (Shinn et al., NeurIPS 2023, https://arxiv.org/abs/2303.11366) adds a self-critique step: after the model produces an action, a critic model (often the same model in a different role) reviews the action, spots errors, and produces a revised plan or output. The loop form: Act → Critique → Revise → Act.

Act:      generate code for the function
Critique: does this handle the null case? does it pass the tests?
Revise:   add null-check; fix edge case
Act:      resubmit revised code

Reflexion’s strength is accuracy on tasks with verifiable outcomes — tests pass or fail, code compiles or does not. Its failure mode is looped self-flagellation where the critic never accepts any revision; the fix is a critic-budget (max critique rounds) plus a no-progress detector. Reflexion is heavily used in code agents (Replit, Devin, Cursor agent modes) and in research agents that cite sources.
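The Act → Critique → Revise loop with a critic budget and accept-on-timeout policy can be sketched like this. All three functions are hypothetical model calls; the critic here checks only for a null-check, standing in for a real test-suite or compiler signal.

```python
# Reflexion-style loop with a critic budget and accept-on-timeout.
# `act`, `critique`, and `revise` are placeholder model calls.

def act(task):
    return "def add(a, b): return a + b"

def critique(candidate):
    # Placeholder critic: accept only when a null-check is present.
    return None if "is None" in candidate else "missing null-check"

def revise(candidate, feedback):
    # Placeholder reviser: address the critic's feedback.
    return ("def add(a, b):\n"
            "    if a is None or b is None:\n"
            "        raise ValueError('null input')\n"
            "    return a + b")

def reflexion(task, critic_budget=3):
    candidate = act(task)
    for _ in range(critic_budget):
        feedback = critique(candidate)
        if feedback is None:        # critic accepts: done
            return candidate
        candidate = revise(candidate, feedback)
    return candidate                # budget spent: accept best effort
```

The accept-on-timeout return is what prevents the never-satisfied critic from stalling the agent indefinitely.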

Explicit state-graph — named states and transitions

Explicit state-graph (the runtime pattern from Article 3, used here as a loop pattern) models the agent loop as a directed graph where each node is a named handler and edges are transitions. Instead of implicit control flow embedded in prompts, the state machine is code.

graph:
  START → plan
  plan  → act
  act   → validate
  validate → human_gate (on risk) | act (on retry) | END (on success)
  human_gate → act (on approve) | END (on reject)

The state-graph loop’s strength is gate placement — any node can be a human-approval gate (Article 10), a policy check (Article 22), a memory-write point (Article 7). Its failure mode is graph sprawl as edge cases accumulate; the fix is periodic graph refactoring. LangGraph canonicalises this pattern; AWS Step Functions with Bedrock implements it cloud-natively.
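The graph above can be expressed directly as a transition table in code, which is the point of the pattern: the state machine is inspectable data, not prompt text. This is a generic sketch, not LangGraph's API; handler bodies are placeholders, and validate fails once here to exercise the retry edge.

```python
# The state graph as an explicit transition table. Nodes are named
# handlers; edges are keyed by the outcome each handler returns.

def plan_node(ctx):
    return "ok"

def act_node(ctx):
    ctx["acts"] = ctx.get("acts", 0) + 1
    return "ok"

def validate_node(ctx):
    # Placeholder: succeed only on the second attempt (retry edge fires).
    return "success" if ctx["acts"] >= 2 else "retry"

def human_gate_node(ctx):
    return "approve"

GRAPH = {
    "plan":       {"ok": "act"},
    "act":        {"ok": "validate"},
    "validate":   {"success": "END", "retry": "act", "risk": "human_gate"},
    "human_gate": {"approve": "act", "reject": "END"},
}
HANDLERS = {"plan": plan_node, "act": act_node,
            "validate": validate_node, "human_gate": human_gate_node}

def run_graph(start="plan", max_transitions=20):
    ctx, node, path = {}, start, []
    for _ in range(max_transitions):
        path.append(node)
        node = GRAPH[node][HANDLERS[node](ctx)]
        if node == "END":
            return path
    raise RuntimeError("transition budget exceeded")
```

Because every transition passes through the table, any node can be swapped for a human-approval gate or a policy check without touching the others.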

Diagram 1 — Side-by-side loop iteration

   ReAct              Plan-Execute          Reflexion           State-Graph
   ─────              ────────────          ─────────           ───────────
   Thought            ┌─────────┐           Act                 ┌──────┐
     │                │ PLANNER │             │                 │ plan │
     ▼                └────┬────┘             ▼                 └──┬───┘
   Action                  plan             Critique               ▼
     │                ┌────▼────┐             │                 ┌──────┐
     ▼                │EXECUTOR │             ▼                 │ act  │
   Observation        └────┬────┘           Revise              └──┬───┘
     │                     │                  │                    ▼
     └─── loop ───┘        └─ per step ─┘     └─── loop ───┘    ┌──────────┐
                                                                │ validate │
                                                                └──┬───────┘
                                                                   ▼
                                                                ┌────────────┐
                                                                │ human_gate │
                                                                └────────────┘

The four patterns compose — an agent can run ReAct inside an executor node, a Reflexion critique inside a state-graph validate node. A senior architect picks the top-level pattern and mixes inner patterns deliberately.
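The composition point can be sketched as nesting: a Plan-and-Execute outer loop whose executor runs a small ReAct loop per step. All names and the planner output below are illustrative placeholders, not a recommended implementation.

```python
# Composition sketch: Plan-and-Execute on the outside, ReAct inside
# the executor. Planner output and inner-loop body are placeholders.

def react_step(step, max_steps=4):
    # Inner ReAct loop: reason/act until the step completes or the
    # per-step budget is spent.
    for _ in range(max_steps):
        step_complete = True        # placeholder thought/action/observation
        if step_complete:
            return f"{step}: done"
    raise RuntimeError(f"inner loop budget exceeded on {step}")

def plan_and_execute(goal):
    steps = ["gather", "draft", "review"]   # placeholder planner output
    return [react_step(s) for s in steps]   # each step runs ReAct inside
```

Note that each level carries its own budget: the outer loop bounds the plan length, the inner loop bounds deliberation per step.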

Diagram 2 — Bridge from problem characteristics to loop choice

   Problem characteristics                 →      Loop pattern
   ─────────────────────────                      ───────────────────
   short horizon, exploratory reasoning    →      ReAct
   long horizon, repetitive steps          →      Plan-and-Execute
   verifiable correctness criterion        →      Reflexion
   regulated, gate-heavy, audited          →      State-graph
   interactive conversation                →      ReAct (conversational)
   multi-agent coordination                →      State-graph (+ actor)
   ambiguous spec, iterative refinement    →      Reflexion inside ReAct

Worked decision table

  • Customer service chatbot answering with tool assistance → ReAct. Conversational, short-horizon, tools are lookups. A Chevrolet-of-Watsonville-style prompt-injection incident demonstrates why a hard termination condition is required.
  • Document pipeline that reads 10,000 forms, extracts fields, posts to ERP → Plan-and-Execute. Long-horizon, repetitive, plan per batch.
  • Code agent that must make tests pass → Reflexion inside ReAct. The critic reviews proposed code; the ReAct outer loop handles runtime failure.
  • Mortgage-underwriting agent with HITL gates at KYC and at final decision → State-graph. Gates are first-class nodes.
  • Research agent citing sources → Reflexion inside ReAct. The critic validates each citation.
  • Multi-agent coordination across three agents → State-graph orchestrator with ReAct inside each agent.

Failure-mode catalog

Pattern          | Characteristic failure               | Architectural fix
-----------------|--------------------------------------|------------------------------------------
ReAct            | infinite loop                        | max_steps cap + escalation (Article 9)
ReAct            | non-progress (same thought repeated) | no-progress detector + kill-switch
Plan-and-Execute | stale plan                           | re-plan on observation-signal trigger
Plan-and-Execute | planner hallucinates steps           | plan validator before executor starts
Reflexion        | looped self-critique                 | critic-budget + accept-on-timeout policy
Reflexion        | critic capture                       | rotate critic model; adversarial critic
State-graph      | graph sprawl                         | periodic refactor; max-node warning
State-graph      | dead paths                           | completeness test on every deploy
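The no-progress detector in the ReAct rows above can be as small as a repeated-thought check; the window size of three is an illustrative assumption, not a standard.

```python
def no_progress(thoughts, window=3):
    # True when the last `window` thoughts are identical, i.e. the agent
    # is repeating itself without advancing. A caller would then trip
    # the kill-switch or escalate to a human.
    return len(thoughts) >= window and len(set(thoughts[-window:])) == 1
```

Run against the agent's thought history on every iteration, this check pairs naturally with the max_steps cap: one catches loops that spin, the other catches loops that stall.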

Loop pattern selection drives three other decisions

Once the loop pattern is chosen, three other decisions follow with near-mechanical predictability:

  1. Observability schema. ReAct emits thoughts as spans; Plan-and-Execute emits plan + executions; Reflexion emits act + critique pairs; state-graph emits transitions. Article 15 covers the schema per pattern.
  2. Replay strategy. State-graph is trivial to replay; ReAct requires full history capture; Reflexion needs critic history as well; Plan-and-Execute needs plan + execution history. Article 15 again.
  3. Evaluation structure. ReAct tests step-quality; Plan-and-Execute tests plan-quality; Reflexion tests revision-effectiveness; state-graph tests gate-correctness. Article 17.

The loop choice is upstream of half the credential’s architectural decisions.

Real-world anchors

Yao et al. — ReAct in the academic literature

The ReAct paper (Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao; ICLR 2023) established the interleaved reasoning+acting pattern as the default for tool-using language models. The paper’s HotpotQA and AlfWorld benchmarks remain reference points for any evaluation of reasoning-driven agents. The paper’s finding — that interleaving reasoning with action outperforms either pure planning or pure acting — is what made ReAct the default loop shape in virtually every agent framework that followed. The architect’s reading list for AITE-ATS starts with this paper. Source: https://arxiv.org/abs/2210.03629.

Devin’s planning approach (Cognition AI, 2024)

Cognition AI’s public posts on Devin’s architecture describe a planning layer that decomposes software-engineering tickets into plans with hundreds of steps, executed by a ReAct-style executor with Reflexion-style critique on test results. The 2024 launch post and subsequent engineering posts are specific enough to teach from: the planner is invoked per ticket, the executor runs step by step with sandbox isolation (Article 21), and the Reflexion critic reviews proposed code against the test suite before the next iteration. The 13.86% SWE-bench result at launch is not the lesson; the lesson is that three loop patterns composed together produced a production-viable long-horizon agent. Source: https://www.cognition.ai/.

Voyager — lineage for long-horizon learning agents

Wang et al.’s Voyager paper (TMLR 2024, https://arxiv.org/abs/2305.16291) — an open-ended embodied agent for Minecraft with a skill library — is cited as lineage, not as a recommended pattern for enterprise agents. Voyager’s contribution for the architect is the skill library concept: the agent accumulates reusable skills as it progresses. Most enterprise agents should not implement open-ended skill acquisition (the L5 boundary), but the pattern of externalising learned behaviours to an inspectable library is one the architect can adopt without L5 risk.

Closing

Four loop patterns; one primary pattern per agent; inner patterns composable. The loop-pattern decision is recorded on the autonomy statement artefact alongside the autonomy-level classification. Article 5 now takes up the tool surface — the bridge between the loop and the world.

Learning outcomes check

  • Explain four loop patterns with iteration-level diagrams.
  • Classify five use cases by best-fit loop pattern.
  • Evaluate a loop design for its characteristic failure mode and propose the architectural fix.
  • Design a loop-pattern decision table for a given portfolio.

Cross-reference map

  • Core Stream: EATF-Level-1/M1.4-Art11-Agentic-AI-Architecture-Patterns-and-the-Autonomy-Spectrum.md.
  • Sibling credential: AITM-AAG Article 2 (governance-facing loop patterns).
  • Forward reference: Articles 15 (observability), 17 (evaluation), 20 (platform).