AITE M1.2-Art30 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations


Financial Services Agentic Patterns — Transformation Design & Program Architecture — Advanced depth — COMPEL Body of Knowledge.

9 min read Article 30 of 53

The architect in FS inherits an expectation: every model has an owner, an independent validator, a documented intended use, a monitoring plan, and a retirement procedure. Agentic systems multiply these expectations — now the “model” is a composite of LLM + prompt + tools + memory — and the architect must decompose the composite and apply the controls to each component.

The regulatory backbone

US Federal Reserve SR 11-7 (2011; issued in parallel as OCC 2011-12 and reaffirmed by the 2021 interagency statement). Defines model risk management expectations: effective challenge, independent validation, ongoing monitoring, documentation, governance. The doctrine predates LLMs, but its effective-challenge requirement applies directly: someone other than the model developer must be able to evaluate the model’s fitness for purpose.

Bank of England PRA SS1/23 (2023). Extends model risk management doctrine to UK banks with explicit coverage of AI/ML. SS1/23 adds focus on reproducibility, data lineage, and model-change controls that map cleanly to agentic systems.

EU AI Act Annex III.5(b). Classifies as high-risk any AI system used for “evaluating the creditworthiness of natural persons or establishing their credit score.” Credit-underwriting agents — whether they decide or materially inform a decision — fall inside this classification.

Market-abuse regulation (MAR in the EU; SEC rules in the US). Constrains agents operating in or near trading: algorithmic-trading registration, surveillance, suspicious-trading reporting obligations.

Consumer Duty (UK FCA, 2023). Requires firms to deliver “good outcomes” for retail customers. Agents interacting with retail customers must be evaluable against consumer-outcome metrics — not just technical metrics.

Five sector-specific architecture patterns

Pattern 1 — Synchronous HITL on transactional actions

Any agent that authorizes a financial action — approves a refund, issues a payment, executes a trade, waives a fee — runs with synchronous human-in-the-loop (HITL) above a threshold.

Threshold design:

  • Below threshold: auto-approval with audit logging.
  • At threshold: HITL confirmation before execution.
  • Above threshold: HITL with independent secondary approval.

Thresholds are set by the product + risk team, documented in the ADR (Article 2), and enforced by the policy engine (Article 22). Thresholds are multi-dimensional: monetary (dollar cap per action, per day, per customer), reputational (a refund on a regulated complaint), and systemic (an action that would affect multiple accounts).
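The three routing bands above can be sketched as a small policy function. This is a minimal illustration, not the article's policy engine: the class and field names are invented here, and real thresholds would come from the ADR and be enforced at the authorization layer.

```python
from dataclasses import dataclass
from enum import Enum

class Route(Enum):
    AUTO_APPROVE = "auto_approve"  # below threshold: execute with audit logging
    HITL = "hitl"                  # at threshold: human confirms before execution
    HITL_DUAL = "hitl_dual"        # above threshold: independent secondary approval

@dataclass(frozen=True)
class ThresholdPolicy:
    """Illustrative monetary thresholds, in minor units (cents)."""
    hitl_at: int     # e.g. 10_000_00 -> $10,000
    dual_above: int  # e.g. 50_000_00 -> $50,000

    def route(self, amount: int) -> Route:
        if amount > self.dual_above:
            return Route.HITL_DUAL
        if amount >= self.hitl_at:
            return Route.HITL
        return Route.AUTO_APPROVE

policy = ThresholdPolicy(hitl_at=10_000_00, dual_above=50_000_00)
```

In practice the reputational and systemic dimensions would be additional inputs to `route`, each able to escalate the band independently of the monetary amount.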

Pattern 2 — Deterministic fallback for outages

Financial-services systems must fail in controlled, auditable ways. When the agent cannot reach the model, cannot reach its tools, or fails its safety classifiers, the system reverts to a deterministic fallback (a rules path or human handoff) rather than degrading silently.

Architect’s deliverable: the fallback mode is documented, tested on a schedule, and surfaced to the customer with the Article 50 disclosure.
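A minimal sketch of the fallback routing, assuming the agent path signals unavailability with a dedicated exception; the names here are illustrative, and a production version would also emit the audit log entry and customer disclosure described above.

```python
from typing import Callable

class AgentUnavailable(Exception):
    """Raised when the model, a tool, or a safety classifier is unreachable or failing."""

def with_fallback(agent_path: Callable[[str], str],
                  deterministic_path: Callable[[str], str]) -> Callable[[str], str]:
    """Route requests to the agent; on agent-path failure, fail over to the
    deterministic path (rules engine or human handoff) instead of degrading silently."""
    def handle(request: str) -> str:
        try:
            return agent_path(request)
        except AgentUnavailable:
            return deterministic_path(request)
    return handle

# Usage sketch: an agent whose model endpoint is down fails over cleanly.
def agent(req: str) -> str:
    raise AgentUnavailable("model endpoint down")

def rules(req: str) -> str:
    return f"queued for human review: {req}"

handler = with_fallback(agent, rules)
```

Keeping the deterministic path behind the same interface as the agent path is what makes the scheduled fallback tests cheap to run.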

Pattern 3 — Enhanced audit trails

FS audit trails are a pre-existing regulatory expectation. For agentic systems:

  • Every tool call that moves money or changes a record is logged with: agent version, prompt version, model version, inputs, outputs, authorization decision, HITL outcome (if applicable).
  • Audit logs carry a reconstructable lineage (Article 28) sufficient to answer “why did this action happen” months later.
  • Retention typically 7+ years (jurisdiction-specific; Sarbanes-Oxley, MiFID II).

The architect ensures the audit trail design is reviewed by internal audit at design time, not discovered at audit time.

Pattern 4 — Independent model risk validation

SR 11-7 requires independent validation separate from development. For agentic systems the validation scope extends to:

  • Model version and training regime.
  • Prompt versions (system + task prompts).
  • Tool schemas and authorization policies.
  • Memory-write policies and retention.
  • Evaluation battery and results.
  • Monitoring plan.

Validators apply “effective challenge” — they probe the agent’s assumptions, test edge cases, and recommend changes before the agent enters production use. In practice, validators often ask for an adversarial evaluation battery (Article 17) they can run independently.
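The six-item scope above lends itself to a machine-checkable completeness gate on the validation submission. A minimal sketch with illustrative field names (not a standard schema):

```python
# Scope items mirroring the validation list above; keys are illustrative.
REQUIRED_SCOPE = (
    "model_version",
    "prompt_versions",
    "tool_schemas",
    "authorization_policies",
    "memory_write_policy",
    "evaluation_battery",
    "monitoring_plan",
)

def validation_gaps(manifest: dict) -> list[str]:
    """Return scope items missing or empty in the validation submission.
    A non-empty result blocks promotion to production."""
    return [key for key in REQUIRED_SCOPE if not manifest.get(key)]
```

Running this gate in CI means an incomplete submission is caught before the validator's time is spent, not during effective challenge.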

Pattern 5 — Boundary between “decision” and “support”

Regulated credit, insurance, and market-abuse contexts care whether the agent made a decision or informed one. The architect draws the line at design time:

  • Agent decides, human accepts by default: high-risk under EU AI Act Annex III; full conformity assessment; Article 22 GDPR rights to explanation and to contest; SR 11-7 validation.
  • Agent informs a human decision, human decides: lower regulatory burden if the human genuinely decides; requires the HITL design in Pattern 1 to be real, not rubber-stamp.
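The decide/inform boundary can be recorded as an explicit design-time classification with its obligation set, so the line drawn in the ADR is also visible in code. The obligation labels below are a simplified paraphrase of the two bullets above, not a complete regulatory mapping:

```python
from enum import Enum

class AgentRole(Enum):
    DECIDES = "agent_decides"  # human accepts by default
    INFORMS = "agent_informs"  # human genuinely decides

# Simplified obligation sets drawn from the Pattern 5 bullets; illustrative only.
OBLIGATIONS: dict[AgentRole, set[str]] = {
    AgentRole.DECIDES: {
        "annex_iii_conformity_assessment",
        "gdpr_art22_explanation_and_contest",
        "sr11_7_validation",
    },
    AgentRole.INFORMS: {
        "real_hitl_design",  # Pattern 1 HITL must be real, not rubber-stamp
    },
}

def obligations_for(role: AgentRole) -> set[str]:
    return set(OBLIGATIONS[role])
```

Making the classification explicit also gives reviewers a single place to challenge whether "agent informs" is genuinely true of the deployed workflow.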

Three FS use cases, classified

Use case A — Internal back-office reconciliation agent.

  • Regulatory tier: not Annex III; some model-risk expectation if outputs feed financial reporting.
  • Architecture: agent reads ledgers, identifies discrepancies, proposes journal entries; HITL on journal entry posting; audit trail; no customer-facing output.

Use case B — Wealth-management client assistant (Morgan Stanley-style; public disclosures 2024).

  • Regulatory tier: not Annex III unless providing credit-scoring-adjacent advice; subject to suitability and best-execution rules.
  • Architecture: agent assists advisors with research synthesis and meeting prep; advisor remains the decision-maker; Article 50 disclosure to clients if AI-generated content shared; outputs reviewed by advisor; audit trail of research-derived recommendations.

Use case C — Consumer-credit decisioning agent.

  • Regulatory tier: EU AI Act Annex III.5(b) high-risk; SR 11-7 / PRA SS1/23 apply in scope jurisdictions; ECOA adverse-action notice rules in US; GDPR Article 22 in EU.
  • Architecture: agent produces decision or recommendation; synchronous HITL above threshold on adverse decisions; fairness evaluation battery; Article 14 evidence pack; Article 22 contest path; explanations provided with each adverse decision.

Operational specifics

Market-abuse surveillance. Agents operating near trading surfaces must log trading-relevant context; surveillance analysts must be able to reconstruct agent-generated messages that could constitute market manipulation. The agent’s logs feed the surveillance queue.

Best-execution evidence. Wealth-management agents that inform trading or research decisions generate artifacts that may be discoverable in suitability reviews; retention policy must anticipate this.

Conduct risk. UK Consumer Duty and similar regimes require firms to demonstrate good outcomes for retail customers. The agent’s evaluation battery (Article 17) must include consumer-outcome proxies (resolution rate, complaint rate, comprehension score on explanations), not just technical metrics.
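The consumer-outcome proxies named above reduce to simple aggregates over interaction records. A sketch, assuming each record carries `resolved`, `complaint`, and `comprehension_score` fields (invented names for illustration):

```python
def consumer_outcome_metrics(interactions: list[dict]) -> dict:
    """Aggregate consumer-outcome proxies for the evaluation battery.
    Each interaction is assumed to carry 'resolved' (bool), 'complaint' (bool),
    and 'comprehension_score' (0..1) fields."""
    n = len(interactions)
    if n == 0:
        return {"resolution_rate": 0.0, "complaint_rate": 0.0, "mean_comprehension": 0.0}
    return {
        "resolution_rate": sum(i["resolved"] for i in interactions) / n,
        "complaint_rate": sum(i["complaint"] for i in interactions) / n,
        "mean_comprehension": sum(i["comprehension_score"] for i in interactions) / n,
    }
```

These sit alongside, not instead of, the technical metrics: a high resolution rate with a rising complaint rate is exactly the divergence Consumer Duty evidence is meant to surface.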

Operational resilience. EBA guidelines on ICT and security risk management (2019, refreshed 2024) and DORA (EU, effective January 2025) bring financial-services operational resilience expectations to AI systems. The architect coordinates with the operational-resilience program to register the agent in the critical-or-important function inventory where applicable.

Framework selection in FS contexts

Financial-services architects often face additional selection constraints when picking an agentic framework:

  • Supplier concentration risk. FS regulators (notably the PRA in the UK, and increasingly the EU under DORA) examine concentration of ICT services. Consolidating the framework, runtime, model provider, and observability vendor under a single hyperscaler can trigger concentration-risk review. Multi-vendor architectures help but add complexity.
  • Data-residency. EU FS entities routinely require EU-region processing; architects verify the framework and model providers honour residency.
  • Auditability depth. Frameworks with clean trace semantics (LangGraph’s state-graph) and deterministic replay help meet SR 11-7 and DORA audit expectations.
  • Deterministic fallback support. FS patterns rely on deterministic fallback (Pattern 2); frameworks that cleanly separate the agentic path from a deterministic path simplify the design.
  • MCP ecosystem maturity. FS data systems are often SOR-heavy; MCP servers for core-banking or policy-admin systems are emerging but not universal.

The architect documents the selection rationale in ADRs (Article 36) and revisits the decision annually as the landscape matures.

Real-world anchors

Morgan Stanley wealth-management assistant (public disclosures 2023–2024). Morgan Stanley publicly described a GPT-4-based knowledge assistant for financial advisors. The pattern — advisor as the decision-maker, assistant as research accelerator, curated knowledge base, extensive internal evaluation — demonstrates Pattern 5 (support, not decide) executed well.

Klarna customer-service deployment (2024). Klarna publicly reported an agentic customer-service deployment handling volume equivalent to hundreds of FTEs. Financial-services relevant features include the HITL escalation for complex queries and the monitoring of resolution quality.

US Federal Reserve SR 11-7 guidance. The enduring template for model risk management; the architect quotes SR 11-7 language in ADRs to connect the agentic design to supervisor expectations.

Anti-patterns to reject

  • “Our existing model-risk framework covers it.” It covers part; the agentic-specific components (prompt, tools, memory) need named scope extensions.
  • “HITL is too slow for our volume.” HITL above threshold is not a throughput concern; mis-sized thresholds are.
  • “Fallback is documented but not tested.” Untested fallback is not resilience.
  • “The agent decides; validation is the validator’s problem.” Validation is a joint activity; the architect partners with validators from Calibrate onward.

Learning outcomes

  • Explain financial-services agentic architecture constraints — SR 11-7, PRA SS1/23, EU AI Act Annex III.5(b), Consumer Duty, market-abuse rules.
  • Classify three FS use cases by regulatory tier and map each to architecture patterns.
  • Evaluate an FS agentic design for SR 11-7 fit, Annex III completeness, and fallback adequacy.
  • Design an FS agentic plan including HITL thresholds, audit trail, deterministic fallback, and validation partnership.

Further reading

  • Core Stream anchors: EATE-Level-3/M3.4-Art14-EU-AI-Act-Article-6-High-Risk-Classification-Deep-Dive.md.
  • AITE-ATS siblings: Article 6 (authorization), Article 10 (HITL), Article 22 (policy), Article 23 (EU AI Act), Article 25 (incidents).
  • Primary sources: US Federal Reserve SR 11-7 “Guidance on Model Risk Management” (April 2011); Bank of England PRA SS1/23 “Model risk management principles for banks” (May 2023); Morgan Stanley wealth-management assistant public coverage; Klarna customer-service deployment report (2024); DORA (Regulation (EU) 2022/2554).