AITE M1.2-Art13 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Agentic RAG and Dynamic Knowledge Access

Transformation Design & Program Architecture — Advanced depth — COMPEL Body of Knowledge.

9 min read · Article 13 of 53

COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert · Article 13 of 40


Thesis. Static RAG is a well-understood pattern: query the vector store, attach the top-k passages to the prompt, generate the answer. Agentic RAG replaces that single-shot retrieval with an agent that decides when to retrieve, what to retrieve, whether to retrieve again based on what it found, and when to stop. The architectural delta sounds incremental. It is not. Every architectural property of static RAG — determinism, latency bound, cost cap, injection surface — is different in the agentic version. This article walks through four agentic-RAG patterns, names their unique risks, and gives the architect a spec discipline for treating agentic RAG as the composed system it actually is.

From static RAG to agentic RAG

Static RAG is a two-step pipeline: retrieve (deterministic or near-deterministic vector search), then generate (single LLM call with retrieved context). The architect’s levers are embedding model, chunking strategy, top-k, and re-ranking. Failure modes are predictable: missed retrieval, irrelevant retrieval, hallucination despite retrieval.

Agentic RAG replaces the pipeline with an agent loop. The model emits retrieval queries as tool calls. It evaluates the returned passages. It may issue follow-up queries, re-rank, merge, reject, or decide it has enough. Retrieval is under the agent’s control, not the pipeline’s.
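The difference can be made concrete with a minimal sketch. Everything here is a stand-in, not any framework's API: `model_step` plays the LLM's decision role, `search` is a toy substring retriever, and the stopping rule is deliberately trivial. The point is structural — retrieval sits inside a loop the model controls, bounded only by a step budget.

```python
def search(query, corpus):
    """Toy retriever: return passages containing the query term."""
    return [p for p in corpus if query.lower() in p.lower()]

def model_step(question, passages):
    """Stand-in for the LLM's decision: retrieve again, or stop.
    Here we stop as soon as any passage mentions the question's key term."""
    key = question.split()[0].lower()
    if any(key in p.lower() for p in passages):
        return ("answer", passages)
    return ("retrieve", question.split()[0])

def agentic_rag(question, corpus, max_steps=3):
    passages = []
    for _ in range(max_steps):       # the loop, not the pipeline, owns retrieval
        action, payload = model_step(question, passages)
        if action == "answer":
            return payload
        passages += search(payload, corpus)
    return passages                  # budget exhausted: return what we have
```

In static RAG the equivalent code is two straight-line calls; here, latency, cost, and determinism all depend on how many iterations the model chooses to run.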

The four agentic-RAG patterns

Pattern 1 — Planner-driven retrieval

The planner agent decomposes the question into sub-queries; the executor retrieves for each; a synthesizer merges. Works well for complex questions that decompose cleanly into independent sub-questions. Falls apart when sub-questions are interdependent.

Typical implementation: LlamaIndex SubQuestionQueryEngine, LangGraph multi-node graph, OpenAI Agents SDK agent with a retrieval tool and a planner system prompt.
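A framework-free sketch of the same planner/executor/synthesizer split, with a toy planner (splitting on " and ") standing in for LLM-driven decomposition — an assumption for illustration, not how any of the listed engines actually plan:

```python
def plan(question):
    """Toy planner: split a compound question into sub-queries."""
    return [q.strip() for q in question.split(" and ")]

def retrieve(query, corpus):
    return [p for p in corpus if query.lower() in p.lower()]

def synthesize(results):
    """Merge per-sub-query results, deduplicated, order-preserving."""
    seen, merged = set(), []
    for passages in results:
        for p in passages:
            if p not in seen:
                seen.add(p)
                merged.append(p)
    return merged

def planner_rag(question, corpus):
    sub_queries = plan(question)                          # planner
    results = [retrieve(q, corpus) for q in sub_queries]  # executor, per sub-query
    return synthesize(results)                            # synthesizer
```

Note the structural assumption baked into this pattern: each `retrieve` call sees only its own sub-query. That independence is exactly what breaks when sub-questions depend on each other's answers.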

Pattern 2 — Multi-hop retrieval

The agent issues a query, reads results, forms a follow-up query informed by what it just learned, reads those results, and so on. Each hop depends on the previous. Works for questions where the answer path is not known in advance (research agents, complex legal or medical research).

Multi-hop is expensive — tokens compound across hops — and benefits from hop budgets. It is also the most susceptible to goal drift: each hop changes the agent’s focus, and the final answer may have drifted from the original question.
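The hop-budget discipline can be sketched as follows. The `next_query` function is a stand-in for the LLM forming a follow-up from what it just read (here it follows a literal "see: X" pointer — an illustrative convention, not a real corpus format); the `seen` set is a loop guard against a hijacked or degenerate query chain:

```python
def next_query(hits):
    """Stand-in for the LLM forming a follow-up from what it just read.
    Here: follow a 'see: X' pointer if one appears in a hit."""
    for p in hits:
        if "see: " in p:
            return p.split("see: ")[1]
    return None

def multi_hop(question, corpus, hop_budget=3):
    query, trail, seen = question, [], set()
    for _ in range(hop_budget):              # hop budget bounds token compounding
        seen.add(query)
        hits = [p for p in corpus if query.lower() in p.lower()]
        trail.append((query, hits))
        follow_up = next_query(hits)
        if follow_up is None or follow_up in seen:   # done, or looping: stop
            break
        query = follow_up
    return trail
```

The returned `trail` is the trajectory — exactly the artifact the evaluation section below says must be captured, and the thing that drifts when each hop subtly reshapes the agent's focus.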

Pattern 3 — Self-correcting retrieval

The agent retrieves, attempts an answer, critiques the answer against the retrieved passages, identifies gaps, retrieves to fill gaps, re-answers. The loop is Reflexion-style (Article 4). Works when verifiable correctness is available (the answer references specific facts that can be checked against passages).
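A minimal sketch of the critique-and-re-retrieve loop, under a strong simplifying assumption: the "critique" is a literal coverage check against a list of required facts, standing in for an LLM judging its draft answer against the passages:

```python
def self_correcting_rag(question, required_facts, corpus, max_rounds=3):
    """Retrieve, critique for gaps, retrieve to fill gaps, repeat."""
    passages = [p for p in corpus if question.lower() in p.lower()]
    for _ in range(max_rounds):
        # Critique: which required facts do the passages still fail to cover?
        gaps = [f for f in required_facts
                if not any(f.lower() in p.lower() for p in passages)]
        if not gaps:
            break                       # answer is fully grounded: stop
        for gap in gaps:                # gap-filling retrieval
            passages += [p for p in corpus if gap.lower() in p.lower()]
    return passages
```

The structure mirrors the prose: the loop only makes sense when the critique step is checkable — when "does the answer cover fact F?" has a verifiable yes/no, the loop converges; when it does not, the loop just burns rounds.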

Pattern 4 — Adaptive retrieval

The agent decides at each step whether to retrieve at all, based on confidence in its current answer. High-confidence questions skip retrieval; low-confidence questions retrieve first. Cost-effective for mixed-difficulty workloads.
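The confidence gate can be sketched in a few lines. The `confidence` function here is a deliberately crude stand-in (a cache of previously answered questions); a real system would use the model's own calibrated self-assessment or a learned router:

```python
def confidence(question, answer_cache):
    """Stand-in for self-assessed confidence: high if the question
    matches something the agent has answered before."""
    return 0.9 if question in answer_cache else 0.2

def adaptive_rag(question, corpus, answer_cache, threshold=0.5):
    if confidence(question, answer_cache) >= threshold:
        return ("direct", [])            # high confidence: skip retrieval
    hits = [p for p in corpus if question.lower() in p.lower()]
    return ("retrieved", hits)           # low confidence: retrieve first
```

The economics follow directly: on a mixed workload, every query routed through the "direct" branch pays zero retrieval and zero extra context tokens.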

The injection surface expansion

In static RAG, indirect prompt injection through retrieved content is a known risk; mitigations include sanitization, source allow-lists, and classification. In agentic RAG the injection surface is larger.

Surface 1 — Passage-level injection. A document in the corpus contains embedded instructions. Same as static RAG.

Surface 2 — Query-chain hijack. An injection in hop 1’s retrieved content shapes hop 2’s query. The attacker doesn’t just inject the final answer; they redirect the agent’s research trajectory.

Surface 3 — Tool-call injection via retrieval. The agent has other tools besides retrieve. An injection in retrieved content can say “call send_email with attacker-controlled arguments.” Static RAG has no tool-call surface; agentic RAG does.

Surface 4 — Memory contamination. If the agentic RAG system writes learned facts to long-term memory (Article 7 Pattern C), an injected fact can enter memory and affect future sessions.

Architectural mitigations, layered:

  • Source trust tagging. Every passage enters the agent’s context tagged with its source. The model is instructed to prefer higher-trust sources; lower-trust sources are treated as data, not as instructions.
  • Sanitizer on retrieval output. Before passages enter the context, a sanitizer strips or flags instruction patterns. This is imperfect but it raises the attack cost.
  • Tool-call policy that doesn’t trust retrieval-suggested calls. If the agent’s reasoning includes “the document says to send an email,” the policy engine (Article 22) re-evaluates that action against its own rules, regardless of what the document said.
  • Hop-budget enforcement. Caps the blast-radius of a query-chain hijack.
  • Memory write policy. Retrieval-derived facts cannot write directly to trusted memory; they enter at a lower classification (Article 7).
  • Domain allow-lists. Retrieval is restricted to approved corpora. Web search, if supported, has domain allow-lists or deny-lists.
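Two of these layers — trust tagging and the tool-call policy — compose naturally, as this sketch shows. The trusted-source set, the tag shape, and the policy rules are all assumptions for illustration; a real policy engine (Article 22) would evaluate far richer context:

```python
TRUSTED_SOURCES = {"internal-wiki"}   # assumption: allow-list of approved corpora

def tag(passage, source):
    """Attach provenance and trust level before the passage enters context."""
    trust = "high" if source in TRUSTED_SOURCES else "low"
    return {"text": passage, "source": source, "trust": trust}

def allow_tool_call(call, provenance_trust):
    """Policy stand-in: a tool call whose justification traces only to
    low-trust retrieved content is denied, whatever the document said."""
    sensitive = {"send_email", "write_memory"}
    if call["tool"] in sensitive and provenance_trust == "low":
        return False
    return True
```

The design choice worth noting: the policy keys on the *provenance* of the suggestion, not its content — which is what defeats Surface 3, where the injected text itself looks like a perfectly reasonable instruction.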

Retrieval freshness and the agent’s mental model

Agentic RAG implicitly promises that the agent has current information. When the corpus is stale, the agent still synthesizes as though it were current. The architect addresses this with:

Freshness metadata. Every passage carries a last_updated field; the agent’s prompt guides it to weigh recency appropriately.

Staleness detection. Periodic checks (human or automated) that the corpus has been updated within its SLA; alerts fire when the SLA is breached.

Explicit expiration. Passages older than a threshold are filtered from retrieval unless the query specifically requests archival content.
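The expiration filter is a one-function mechanism, assuming each passage carries the `last_updated` field described above (the field name and threshold are from this article's spec, not any framework's schema):

```python
from datetime import date, timedelta

def fresh_only(passages, max_age_days=180, today=None, archival=False):
    """Drop passages older than the freshness threshold, unless the
    query explicitly requested archival content."""
    if archival:
        return passages
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    return [p for p in passages if p["last_updated"] >= cutoff]
```

Placed between the retriever and the agent's context window, this turns the freshness SLA from a prompt-level hope into a hard filter.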

Agentic RAG cost economics

A static RAG query costs roughly (embedding) + (retrieval, cheap) + (one LLM generation). An agentic RAG query costs embedding + multiple retrievals + multiple LLM calls + synthesizer. A three-hop agentic RAG query can easily cost 10× a static RAG query for the same answer. The architect decides when agentic is worth the markup.

The economic heuristic: agentic RAG is worth it when the question is either (a) decomposable into sub-questions where sub-retrieval gains precision, or (b) needs iterative refinement because the right answer depends on what was found. If the question is straightforwardly “lookup X,” static RAG is cheaper and usually better.
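A back-of-envelope cost model makes the multiplier visible. The prices below are illustrative placeholders, not any vendor's rates; the key modeling assumption is that earlier hops' passages stay in the prompt, so each hop's LLM call pays for a growing context:

```python
def query_cost(hops, tokens_per_hop, price_per_1k=0.01, retrieval_cost=0.0002):
    """Illustrative cost: each hop pays one retrieval plus one LLM call
    whose context includes all prior hops' tokens."""
    llm_tokens = sum(tokens_per_hop * (hop + 1) for hop in range(hops))
    return llm_tokens / 1000 * price_per_1k + hops * retrieval_cost

static_cost  = query_cost(hops=1, tokens_per_hop=2000)
agentic_cost = query_cost(hops=3, tokens_per_hop=2000)
# Context growth, not hop count alone, drives the multiple: three hops
# cost roughly 6x one hop here, and re-ranking or synthesis adds more.
```

Under these placeholder numbers the three-hop query costs about six times the static one; with longer passages, re-ranking calls, and a synthesizer pass, the 10× figure in the text is easily reached.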

Evaluation differences

Static RAG is evaluated on retrieval metrics (precision, recall, MRR) and generation metrics (faithfulness, relevance). Agentic RAG evaluation adds trajectory metrics: hop count, hop efficiency, query-quality evolution, termination correctness (did the agent stop at the right hop?), cost per answer. Evaluation harnesses for agentic RAG (Article 17) must capture the full trajectory, not just the final answer.
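The trajectory metrics named above reduce to simple aggregates over the hop records, once the harness captures them. The record shape here (`useful_hits` / `total_hits` per hop) is an assumed schema for illustration:

```python
def trajectory_metrics(trajectory, hop_budget):
    """Aggregate trajectory-level metrics from per-hop records of the
    form {'query': ..., 'useful_hits': int, 'total_hits': int}."""
    hops = len(trajectory)
    useful = sum(h["useful_hits"] for h in trajectory)
    total = sum(h["total_hits"] for h in trajectory)
    return {
        "hop_count": hops,
        "hop_efficiency": useful / total if total else 0.0,  # signal per passage
        "terminated_within_budget": hops <= hop_budget,
    }
```

None of these can be computed from the final answer alone — which is the whole argument for trajectory-capturing harnesses.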

Framework parity — agentic RAG across frameworks

  • LangGraph — a directed graph with retrieval as a tool node; explicit hop structure; checkpointing for debug. Strong fit for self-correcting patterns.
  • CrewAI — a researcher agent with a retrieval tool and task-level prompts guiding iteration; sequential or hierarchical process.
  • AutoGen — retrieval as function-calling capability inside a group chat; multi-agent pattern with a researcher + critic is native.
  • OpenAI Agents SDK — retrieval tools with structured schemas; handoffs between retrieval and synthesis agents; guardrails on retrieval output.
  • Semantic Kernel — retrieval as a plugin; planner composes retrieval + synthesis; process framework expresses multi-hop explicitly.
  • LlamaIndex Agents — richest native retrieval ecosystem; SubQuestionQueryEngine, MultiStepQueryEngine, SelfCorrectingQueryEngine all ship out of the box; agents plug on top.

LlamaIndex Agents is the framework with the richest native agentic-RAG primitives; LangGraph and OpenAI Agents SDK provide the cleanest safety layers on top of custom retrieval.

Spec discipline for agentic RAG

The architect’s agentic-RAG design document names, for each agent:

  • Corpus identity and versioning
  • Embedding model + version + re-embedding plan (Article 24)
  • Chunking strategy
  • Retriever type (vector, hybrid, lexical) and top-k
  • Re-ranker (if any) with its model identity
  • Hop-budget per task class
  • Freshness SLA and staleness-detection mechanism
  • Sanitizer chain applied to retrieved content
  • Source-trust tagging scheme
  • Policy-engine rules for retrieval-suggested tool calls
  • Evaluation plan including trajectory metrics (Article 17)
  • Cost expectations per task class

Without this spec, retrieval behavior drifts; with it, the agent’s knowledge access is auditable and economically bounded.
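One way to keep the spec from drifting into prose is to make it a typed record, so a design review can machine-check that no field was left unspecified. The field names below are this author's illustrative mapping of the twelve bullets, not a standardized schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgenticRagSpec:
    """Twelve-field agentic-RAG spec, one attribute per bullet above."""
    corpus_id: str                 # corpus identity and versioning
    embedding_model: str           # model + version + re-embedding plan
    chunking_strategy: str
    retriever: str                 # vector, hybrid, or lexical; includes top-k
    re_ranker: str                 # model identity, or "none"
    hop_budget_by_task: dict       # task class -> max hops
    freshness_sla_days: int        # plus staleness-detection mechanism
    sanitizer_chain: tuple         # ordered sanitizers on retrieved content
    trust_tagging_scheme: str
    tool_call_policy: str          # policy-engine rules for retrieval-suggested calls
    eval_plan: str                 # including trajectory metrics
    cost_ceiling_usd: float        # cost expectation per task class
```

Instantiating the record forces every field to be stated; a frozen dataclass also gives the spec the immutability an audit trail wants.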

Real-world anchor — Perplexity AI public architecture

Perplexity AI is the most visible consumer-scale agentic-RAG deployment as of 2024–2025. Its architecture — a search agent that issues queries, retrieves, synthesizes, and cites — exemplifies the multi-hop pattern at production scale. Perplexity’s public posts and engineering discussions have documented the cost-per-query trade-offs, the freshness SLAs against web content, and the challenge of keeping citations correctly bound to specific claims. The architect’s reading list includes Perplexity’s blog entries on hop efficiency and citation integrity. Source: perplexity.ai blog.

Real-world anchor — LlamaIndex agentic-RAG tutorials

LlamaIndex’s documentation (docs.llamaindex.ai) provides the most comprehensive public tutorials for agentic-RAG patterns. The SubQuestionQueryEngine, MultiStepQueryEngine, and self-correcting variants come with worked examples that map directly to the four patterns in this article. Architects should read the documentation as a complement to the architectural framing here — the examples demonstrate the frameworks’ abstractions against specific datasets. Source: docs.llamaindex.ai.

Real-world anchor — LangGraph self-correcting RAG

LangChain’s LangGraph examples for self-correcting RAG (Reflexion-style retrieval) are the canonical state-graph implementation. They demonstrate how to place the critique-and-re-retrieve loop as explicit graph nodes, with clean replay and inspection. The pattern scales to regulated production workloads where the trajectory must be auditable. Source: langchain.com blog and python.langchain.com/docs.

Closing

Four patterns, four injection surfaces, one evaluation pattern change, an order-of-magnitude cost shift. Agentic RAG is not RAG plus an agent; it is a composed system whose properties the architect specifies explicitly. Article 14 takes up the indirect-injection attacks that this chapter has repeatedly referenced, with the attention they deserve.

Learning outcomes check

  • Explain agentic-RAG vs static-RAG deltas across determinism, cost, and injection surface.
  • Classify four agentic-RAG patterns (planner-driven, multi-hop, self-correcting, adaptive) with fit-for-purpose criteria.
  • Evaluate an agentic-RAG design for retrieval-driven injection, hop budget, freshness SLA, and policy-engine coverage.
  • Design an agentic-RAG spec for a given corpus with the twelve-field spec discipline.

Cross-reference map

  • Core Stream: EATE-Level-2/M2.3-Art10-Retrieval-Augmented-Generation-in-Regulated-Workloads.md.
  • Sibling credential: AITM-CMD Article 6 (content-management angle).
  • Forward reference: Articles 14 (indirect injection), 17 (evaluation), 24 (re-embedding in lifecycle), 28 (data architecture for retrieval).