COMPEL Specialization — AITE-ATS: Agentic AI Systems Architect Expert — Article 40 of 40
Thesis. The 39 articles preceding this one each explained a building block. This capstone assembles them into a complete exemplar — a worked package for a fictional but realistic multi-agent back-office system at an EU-regulated insurer. The artefacts produced here are what the architect submits at each gate review. The learner’s capstone-assessment task is to produce an analogous package for a different use case.
The fictional system is ClaimsAgent at Acme Insurance, an EU insurer. It is a multi-agent system that intakes first-notice-of-loss claims, triages them, routes them, produces adjustor-support recommendations, and maintains audit trails for regulators. Under the EU AI Act it is classified high-risk (Annex III.5, access to essential services). The design is realistic, but the company is fictional.
Package contents
The capstone package contains seven artefacts, each produced at a specific gate stage.
- Reference architecture.
- Autonomy statement.
- Architecture Decision Records (ADRs).
- Kill-switch design.
- Evaluation plan.
- Operating runbook.
- EU AI Act Article 14 evidence pack.
Artefact 1 — Reference architecture
System overview. ClaimsAgent is a hierarchical multi-agent system (Article 29) with four specialised agents coordinated by a supervisor agent:
- Intake agent. Extracts claim facts from customer submissions (text, images, structured forms). L2 autonomy — outputs require human confirmation before advancing.
- Triage agent. Classifies claim severity, routes to appropriate processing track. L2 — recommendation reviewed by adjustor.
- Evidence agent. Requests supporting documents, tracks receipt, surfaces gaps. L2 — generates requests that adjustor confirms.
- Recommendation agent. Produces adjustor-support recommendation (settle / investigate / decline rationale), with citations to policy terms and precedent cases. L2 — adjustor decides.
The supervisor agent coordinates the four specialists, maintains task state, and handles exceptions. No agent has L3+ autonomy; every consequential decision is an adjustor’s decision, informed by agents.
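The coordination loop can be sketched in a few lines of plain Python. Names, state fields, and the `PIPELINE` ordering are illustrative, not the production implementation (which uses LangGraph); the point is the L2 gate — no stage advance without adjustor confirmation.

```python
from dataclasses import dataclass, field

# Stage order mirrors the four specialist agents plus a terminal state.
PIPELINE = ["intake", "triage", "evidence", "recommendation", "done"]

@dataclass
class ClaimState:
    claim_id: str
    stage: str = "intake"
    facts: dict = field(default_factory=dict)
    confirmed: bool = False  # adjustor confirmation gate (L2 autonomy)

def supervisor_step(state: ClaimState) -> ClaimState:
    """Advance to the next stage only after adjustor confirmation."""
    if state.stage == "done" or not state.confirmed:
        return state  # L2: no consequential advance without a human decision
    nxt = PIPELINE[PIPELINE.index(state.stage) + 1]
    return ClaimState(state.claim_id, stage=nxt, facts=dict(state.facts))

state = ClaimState("CLM-001")
state = supervisor_step(state)   # unconfirmed: stays at intake
state.confirmed = True
state = supervisor_step(state)   # confirmed: advances to triage
print(state.stage)               # "triage"
```

Note that each advance resets `confirmed` to `False`, so every stage requires a fresh adjustor decision.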
Runtime. LangGraph supervisor-and-workers pattern (Article 3). Self-hosted on Kubernetes in EU regions (data-residency).
Models. Primary: Anthropic Claude 3.5 Sonnet. Fallback: OpenAI GPT-4 class. Both accessed via direct API with EU data-residency terms. Model version pinned per agent; change-controlled per Article 24.
Tool set. 14 tools across the four agents, all registered in the tool registry (Article 26), all exposed via MCP where external. Notable tools: claim-system lookup, policy-database retrieval, customer-record retrieval (RLS-scoped), evidence-upload accept, payment-system proposal (no direct payment authority — the agent proposes, an adjustor releases).
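One structural guarantee of the registry — no tool may hold direct payment authority — can be enforced at registration time. A minimal sketch with a hypothetical in-memory registry (tool names and the `commits_funds` flag are invented for illustration):

```python
# Hypothetical in-memory tool registry enforcing "propose, never release".
REGISTRY: dict[str, dict] = {}

def register_tool(name: str, *, agent: str, mcp: bool, commits_funds: bool = False) -> None:
    if commits_funds:
        # Architectural invariant: agents propose payments, adjustors release them.
        raise ValueError(f"{name}: tools may not hold direct payment authority")
    REGISTRY[name] = {"agent": agent, "mcp": mcp}

register_tool("policy_db_lookup", agent="recommendation", mcp=True)
register_tool("payment_proposal", agent="recommendation", mcp=True)  # proposal only
try:
    register_tool("payment_release", agent="recommendation", mcp=True, commits_funds=True)
except ValueError as err:
    print(err)  # registration refused
```

Making the invariant a registration-time error, rather than a runtime policy, means a payment-authority tool cannot even exist in the registry for an agent to discover.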
Memory. Per-claim short-term context. Long-term semantic memory of policy-interpretation precedent (curated by underwriting). No cross-tenant memory; row-level security on all stores.
Safety layer. Four-plane defense (Article 27). Input classifiers for injection and for policy-sensitive language; tool plane with OPA policy engine and schema validation; memory plane with provenance-tagged writes and quarterly anomaly review; egress plane with PII-redaction classifier and Article 50 disclosure on customer-facing output.
Observability. OpenTelemetry end-to-end; traces correlated by claim ID; prompt/tool/memory/policy-engine decisions logged; 10-year retention per insurance-industry norms.
Kill-switch. Per-session (adjustor-triggered); per-agent-class (ops-triggered); platform-level circuit breaker. See Artefact 4.
Artefact 2 — Autonomy statement
Intended use. Produce adjustor-support recommendations and process claim intake; improve speed and consistency while preserving adjustor decision authority.
Autonomy level. L2 across all four specialist agents. No agent is authorized to make a binding decision that affects a customer’s coverage or payment without adjustor confirmation.
Boundaries.
- ClaimsAgent does not decide claim outcomes.
- ClaimsAgent does not communicate binding determinations to customers.
- ClaimsAgent does not access customer PII outside claim-relevant scope.
- ClaimsAgent does not draft fraud-investigation referrals (separate regulated process).
Escalation paths. Low-confidence recommendations automatically route to senior adjustor. Suspected fraud routes out of the system entirely. Regulatory-complaint indications trigger legal-review workflow.
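The escalation paths reduce to a small routing function. A sketch (the 0.7 confidence threshold is an assumed illustration, not a value from the spec):

```python
def escalation_route(confidence: float, fraud_suspected: bool, complaint_indicated: bool) -> str:
    """Route a recommendation per the autonomy statement's escalation paths."""
    if fraud_suspected:
        return "fraud-process"      # routed out of ClaimsAgent entirely
    if complaint_indicated:
        return "legal-review"
    if confidence < 0.7:            # assumed low-confidence threshold
        return "senior-adjustor"
    return "adjustor"
```

Fraud takes precedence over everything else because the separate regulated process must not be contaminated by agentic output.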
Expansion considerations (Learn-stage decisions). Candidate expansion: raising autonomy from L2 to L3 on low-complexity claims below €2,000, after 12 months of demonstrated L2 performance within target and contingent on a fairness evaluation demonstrating no protected-class disparity in outcomes.
Artefact 3 — Architecture Decision Records (excerpted)
ADR-001 — Choose LangGraph over CrewAI for runtime. Context: four-agent hierarchical system with strong state-graph semantics. Decision: LangGraph. Rationale: state-graph modelling aligns with claims workflow; checkpointing support matters for long-running claims; team has more LangGraph experience; CrewAI considered but declined for weaker state-machine semantics. Consequences: training investment in LangGraph; tighter coupling to LangGraph community pace.
ADR-003 — Require MCP exposure for new tools. Context: need portability and governance discipline. Decision: all new tools expose MCP manifests; non-MCP wrappers permitted only for existing internal systems (grandfathered). Rationale: reduces exit cost; aligns with 2025+ industry direction. Consequences: initial integration effort higher; long-term portability stronger.
ADR-004 — Synchronous HITL on all customer-impacting actions. Context: Annex III high-risk; customer-facing representation matters. Decision: every customer-impacting action (settlement offer, coverage decision, binding communication) is gated by adjustor confirmation. No automation of customer communication. Rationale: Article 14 oversight; customer fairness; insurance-regulator expectations. Consequences: throughput constraint manageable because adjustor capacity is not the bottleneck; latency is acceptable.
ADR-009 — pgvector self-hosted for memory. Context: need tenant isolation via row-level security; cost at scale; data-residency in EU. Decision: pgvector on self-hosted PostgreSQL cluster. Rationale: RLS supports tenant isolation; self-hosted in EU region; cost scales predictably. Alternatives (Pinecone, Weaviate-commercial) declined on residency + cost curves.
Artefact 4 — Kill-switch design
Per-session kill-switch. Adjustor-triggered via UI. Immediate halt of current claim’s agent processing; session state preserved for forensics. Invocation logged. Tested in drill quarterly.
Per-agent-class kill-switch. Ops-triggered via internal console. Halts all sessions using the specified agent class (e.g., “halt all triage-agent sessions”). Takes effect within 10 seconds. Affected customers receive the standard temporary-unavailability notice; the workflow routes to fully manual adjustor handling.
Platform-level circuit breaker. Coupled to deadman signals — if the observability pipeline stops reporting for >60 seconds, the circuit breaker opens and all agentic processing halts until human ops clears it. Designed to catch silent failure.
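The deadman coupling can be sketched as a breaker that trips when heartbeats from the observability pipeline stop arriving. The 60-second timeout and the ops-clear requirement come from the design above; the class itself is illustrative:

```python
import time

class DeadmanBreaker:
    """Opens when no heartbeat arrives within the timeout; only ops can clear it."""

    def __init__(self, timeout_s: float = 60.0):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()
        self.is_open = False

    def heartbeat(self) -> None:
        """Called by the observability pipeline on every successful report."""
        self.last_beat = time.monotonic()

    def allow_processing(self) -> bool:
        """Checked before any agentic step; trips on pipeline silence."""
        if time.monotonic() - self.last_beat > self.timeout_s:
            self.is_open = True  # silent failure detected: halt everything
        return not self.is_open

    def ops_clear(self) -> None:
        """Human ops acknowledges and resets after investigation."""
        self.is_open = False
        self.heartbeat()
```

The breaker is deliberately sticky: once open it stays open even if heartbeats resume, because silent failure of the observability pipeline is itself the incident to investigate.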
Drill schedule. All three kill-switches are exercised quarterly. The architect participates; drill reports capture findings.
Artefact 5 — Evaluation plan (condensed)
Golden tasks. 200 curated claims across severity levels with known correct outcomes. Pass criterion: the agent recommendation agrees with the expert adjustor on ≥85% of severity classifications and on ≥80% of settle/investigate/decline recommendations.
Adversarial battery. OWASP Agentic Top 10 tailored to insurance: injection via claim narratives, policy-question hijack, PII extraction attempts, tool-misuse via crafted customer inputs, memory-poison attempts via repeated claims. Pass criterion: ≥99% detection across the battery.
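Both pass criteria are simple rate gates. A sketch of the gate computation, with threshold values taken from the plan above:

```python
def agreement_rate(predicted: list, expert: list) -> float:
    """Fraction of cases where the agent matches the expert adjustor."""
    return sum(p == e for p, e in zip(predicted, expert)) / len(expert)

def golden_gate(severity_rate: float, outcome_rate: float) -> bool:
    """Golden-task pass criterion: both agreement thresholds must hold."""
    return severity_rate >= 0.85 and outcome_rate >= 0.80

def adversarial_gate(detected: int, attempted: int) -> bool:
    """Adversarial-battery pass criterion: >=99% detection."""
    return detected / attempted >= 0.99

# Toy sample; the real plan runs 200 curated claims and a full attack battery.
sev = agreement_rate(["high", "low", "high", "med"], ["high", "low", "high", "high"])
```

Keeping the gates as pure functions over recorded results makes the pass/fail decision reproducible from the evaluation logs alone.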
Simulation environment. Sandbox with synthetic claim population representative of production mix; adjustor-simulator for HITL-interaction testing.
Calibration evaluation. Confidence-vs-accuracy curve required within target; systematic over-confidence triggers retraining.
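The confidence-vs-accuracy curve is the standard reliability-diagram computation: bucket recommendations by stated confidence and compare mean confidence against observed accuracy per bucket. A sketch (the 10-bin count is an assumption):

```python
def reliability_bins(confidences, outcomes, n_bins: int = 10):
    """Per-bin (mean confidence, observed accuracy) pairs. Systematic
    over-confidence shows as mean confidence exceeding accuracy across bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    report = []
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            report.append((round(mean_conf, 2), round(accuracy, 2)))
    return report
```

For example, a bucket reporting mean confidence 0.95 but accuracy 0.75 is exactly the over-confidence signal that triggers retraining under the plan.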
Fairness evaluation. Outcome distribution across protected attributes (age, gender, postcode proxy for socio-economic) within statistical bounds. Independent validator reviews quarterly.
Cost and latency. Cost p95 ≤ €0.35 per claim; latency p95 ≤ 18 seconds.
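The p95 gates can be computed with the nearest-rank percentile over a window of per-claim samples (window management omitted; the €0.35 and 18-second thresholds come from the plan):

```python
import math

def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    ranked = sorted(samples)
    return ranked[math.ceil(0.95 * len(ranked)) - 1]

def slo_gate(costs_eur: list[float], latencies_s: list[float]) -> bool:
    """Evaluation-plan gate on cost and latency SLOs."""
    return p95(costs_eur) <= 0.35 and p95(latencies_s) <= 18.0
```

Nearest-rank is a deliberate choice over interpolation: it always returns an actually observed value, which is easier to trace back to a specific claim in the logs.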
Artefact 6 — Operating runbook (extract)
Incident class — goal hijack via claim narrative. Detect: output classifier flags off-topic or policy-violating content; anomaly on semantic distance between claim topic and agent response. Contain: kill-switch the session; invalidate session memory; block the source claim ID from re-ingestion until reviewed. Remediate: patch input-validation pattern; add regression test; broadcast to pattern library. Post-mortem: template-based within five business days; architect reviews.
Operational tempo. 24×5 ops coverage (business hours in primary EU regions); on-call for weekends; escalation to architect on P1/P2.
Change-management integration. All model, prompt, tool, and policy changes pass through the change-management workflow; emergency-change exception available with post-hoc review within 48 hours.
Artefact 7 — EU AI Act Article 14 evidence pack (extract)
Design evidence. Reference architecture (Artefact 1); autonomy statement (Artefact 2); ADRs (Artefact 3). Covers “measures enabling the provider to effectively implement human oversight.”
Documentation evidence. Adjustor operator manual (40 pages); customer-facing disclosures (Article 50 layer); overseer training curriculum and materials; ADR rationale.
Monitoring evidence. Observability dashboards showing adjustor-override rate, confidence-calibration curves, fairness metrics, incident counts. Monthly review schedule.
Training evidence. Records of adjustor training completion by named adjustor; records of refresher training annually; records of overseer-simulator outcomes in training.
Change-management evidence. Version histories for agents, prompts, tools, models; promotion decisions; validation sign-offs; retirement plans.
Cross-reference map
- Reference architecture ← Articles 1, 3, 11, 20, 27, 28, 29.
- Autonomy statement ← Articles 2, 10.
- ADRs ← Articles 3, 5, 11, 22, 26, 39.
- Kill-switch design ← Articles 9, 21.
- Evaluation plan ← Articles 15, 17, 18.
- Operating runbook ← Articles 24, 25, 35.
- Article 14 evidence pack ← Articles 23, 26, 36, 37.
Capstone assessment task (for the learner)
The learner’s task is to produce an analogous seven-artefact package for a use case of their choosing. The assessment evaluates:
- Completeness. All seven artefacts present.
- Correctness. Artefacts consistent with each other (the evaluation plan covers the autonomy statement’s envelope; the kill-switch design covers the reference architecture’s risk surfaces).
- Regulatory alignment. Article 14 evidence pack plausibly meets the regulator’s bar; other applicable frameworks referenced.
- Fit to use case. The package is specific to the chosen use case, not generic.
- Tradeoff clarity. ADRs show the alternative-rejected rationale.
Real-world anchors
EU AI Act Annex III.5 — access to essential services. Regulatory text establishing the high-risk classification that drives the ClaimsAgent design choices. Annex III.5 explicitly covers insurance access for natural persons.
ICO AI Auditing Framework public guidance. UK Information Commissioner’s guidance for auditing AI systems — informs the evidence-pack structure and the data-protection angle.
Public insurance-sector AI deployment disclosures. Lemonade public engineering content on claims processing AI; Zurich and Allianz AI-adoption press and disclosures as benchmark for industry pace.
Final remarks
The architect’s job is not to know every answer. It is to structure the problem so the organisation can answer it: defensibly after an incident, in conversation with regulators, and in service of customers. The seven-artefact package is the structural commitment. Articles 1–39 provided the building blocks; this capstone shows how they stack. Congratulations to the AITE-ATS learner completing the full course — you are now the senior agentic voice your organisation needs at every stage gate.
Learning outcomes
- Explain the seven-artefact capstone package and the source articles feeding each deliverable.
- Classify seven capstone deliverables by the source article set and by the gate stage of primary commitment.
- Evaluate the ClaimsAgent exemplar for EU AI Act Article 14 alignment, Article 50 disclosure, Article 73 reporting hooks, and internal control adequacy.
- Design the learner’s own capstone on a comparable use case with all seven artefacts, demonstrating composition of the previous 39 articles’ material.
Further reading
- Core Stream anchors: EATE-Level-3/M3.3-Art11-Enterprise-Agentic-AI-Platform-Strategy-and-Multi-Agent-Orchestration.md; EATE-Level-3/M3.4-Art14-EU-AI-Act-Article-6-High-Risk-Classification-Deep-Dive.md.
- AITE-ATS siblings: all 39 preceding articles compose into this capstone.
- Primary sources: EU AI Act Annex III.5 and Articles 14 / 50 / 73; ICO AI Auditing Framework; public insurance-sector AI deployment disclosures (Lemonade, Zurich, Allianz).