AITE M1.1-Art51 v1.0 Reviewed 2026-04-06 Open Access
M1.1 Foundations of AI Transformation
AITF · Foundations

Lab 01: Design a RAG Reference Architecture for a Regulated Internal Knowledge Assistant



AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Lab Notebook 1 of 5


Scenario

You are the solution architect assigned to PolicyPilot, an internal knowledge assistant for underwriters and claims specialists at a composite European insurer headquartered in Dublin, with operating branches in Madrid and Frankfurt. The assistant must answer natural-language questions over a corpus of roughly 180,000 documents: product wordings, reinsurance treaties, claims guidelines, regulatory circulars from EIOPA, the Central Bank of Ireland, and BaFin, and internal underwriting memos. Answers must cite the source paragraph; hallucinated citations are a release blocker. The product is scoped as a decision-support feature (not a decision-making system), so under the EU AI Act it is not presumed high-risk, but the parent system supporting underwriting decisions is Annex III high-risk, and PolicyPilot must be engineered so its evidence pack can fold into that system’s conformity assessment. Data residency is EU-only; personal data appears in claims memos and must be handled under GDPR Article 9 (special categories, given the medical context in life and health claims).

Business targets at steady state are 1,200 internal daily active users, median question-to-answer latency under 6 seconds, answer-acceptance rate (a thumbs-up rate with a calibrated definition) of 70% or higher on a held-out underwriter golden set, and per-user marginal cost under 8 cents per session. Build-versus-buy is open; the architect’s recommendation must be defensible against both a managed-API path and a self-hosted open-weight path.

Your deliverable is a complete architecture package submitted to the Model Risk Committee.

Part 1: Reference architecture diagram and narrative (60 minutes)

Produce a reference architecture that shows, at a minimum, the ingress tier, the retrieval tier, the generation tier, the evaluation path, the observability path, and the governance boundary. At each tier, annotate:

  • The component’s responsibility in one sentence.
  • The failure mode the component is the primary defense against.
  • Whether the component is stateful, and if so where its state lives.
  • The authentication or authorization step present at each inter-tier boundary.

The diagram must be technology-neutral: name capabilities (dense retriever, sparse retriever, reranker, policy engine, telemetry bus, audit log) rather than vendors, and provide a sidebar that lists at least two viable implementations per capability drawn from different stack families. At least one implementation per capability must work on a self-hosted open-weight path (for example, Llama 3 or Mistral via vLLM, pgvector or Qdrant, bge-reranker) and at least one must work on a managed cloud-API path (for example, Bedrock, Azure AI Foundry, or Vertex AI with Pinecone or Weaviate).
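As a starting point, the sidebar could be seeded from a capability map and checked for multi-stack parity mechanically. The sketch below uses only the example implementations named in this brief; the pairings are illustrative, not endorsements:

```python
# Illustrative capability map for the multi-stack sidebar.
# Each capability names one self-hosted open-weight option and one
# managed cloud-API option, per the lab's parity requirement.
CAPABILITY_MAP = {
    "generator": {
        "open_weight": "Llama 3 or Mistral via vLLM",
        "managed": "Bedrock / Azure AI Foundry / Vertex AI",
    },
    "dense_retriever": {
        "open_weight": "pgvector or Qdrant",
        "managed": "Pinecone or Weaviate",
    },
    "reranker": {
        "open_weight": "bge-reranker",
        "managed": "managed reranking endpoint (platform-specific)",
    },
}

def missing_parity(capability_map: dict) -> list[str]:
    """Return capabilities lacking either an open-weight or a managed option."""
    return [
        cap for cap, impls in capability_map.items()
        if not (impls.get("open_weight") and impls.get("managed"))
    ]
```

A register like this makes the “multi-stack parity” review criterion from the final section testable rather than a matter of diagram inspection.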

Write a 400-to-500-word narrative that walks a reader through the request lifecycle from the underwriter’s browser to the cited answer. Call out explicitly where a personal identifier could be reflected into a log or prompt, and what redaction occurs before that point.
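For the redaction call-out, a minimal pre-logging pass might look like the sketch below. The patterns are assumptions for illustration (the policy-number format is invented); a real deployment would pair regexes with an NER-based PII detector and the insurer’s actual identifier schemes:

```python
import re

# Hypothetical patterns; the POL- policy-number format is assumed,
# not taken from any real system.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "POLICY_NO": re.compile(r"\bPOL-\d{8}\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the text
    reaches a log line or a prompt template."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

The narrative should state exactly which hop applies this pass, so that everything downstream of it (telemetry bus, audit log, prompt) sees placeholders only.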

Expected artifact: PolicyPilot-Reference-Architecture.md with a single system diagram and the narrative.

Part 2: Data-contract register (40 minutes)

Your index must be trustworthy. Produce a data-contract register listing every source system that feeds the index, with, for each source:

| Field | What to record |
| --- | --- |
| Source ID and owner | Team name, accountable individual |
| Refresh cadence | Event-driven, hourly, daily, weekly, or bounded-staleness |
| Sensitivity class | Public, internal, confidential, restricted (GDPR-special) |
| Residency constraint | EU-only, country-specific, none |
| Ingestion SLA | Time from source update to index availability |
| Retention rule | Delete-from-index rules for source-deleted or retracted documents |
| Access filter | Tenant, business-unit, or jurisdiction scoping applied at retrieval |
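One way to keep the register machine-checkable rather than a static document is to model each row as a record. A minimal sketch, with the allowed values taken from the field descriptions above (the class and field names are illustrative):

```python
from dataclasses import dataclass

# Allowed values per the register's field descriptions.
CADENCES = {"event-driven", "hourly", "daily", "weekly", "bounded-staleness"}
SENSITIVITY = {"public", "internal", "confidential", "restricted"}

@dataclass
class ContractEntry:
    source_id: str
    owner_team: str
    accountable_individual: str
    refresh_cadence: str         # one of CADENCES
    sensitivity_class: str       # one of SENSITIVITY
    residency_constraint: str    # e.g. "EU-only"
    ingestion_sla_hours: float   # source update -> index availability
    retention_rule: str
    access_filter: str           # tenant / business-unit / jurisdiction scope

    def validate(self) -> list[str]:
        """Return a list of contract violations (empty means valid)."""
        errors = []
        if self.refresh_cadence not in CADENCES:
            errors.append(f"bad cadence: {self.refresh_cadence}")
        if self.sensitivity_class not in SENSITIVITY:
            errors.append(f"bad sensitivity: {self.sensitivity_class}")
        return errors
```

A CI job that runs `validate()` over the register catches drift (a new source with no owner, an unrecognised cadence) before it reaches the Model Risk Committee.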

Include a one-paragraph section on how a document retraction (for example, a withdrawn EIOPA guideline) propagates from the source repository to the live index within the stated SLA, and how a retrieval that surfaced the retracted paragraph before the propagation completed is detected and remediated.
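The retraction path can be prototyped against an in-memory stand-in before the real index is chosen. The sketch below is an assumption-laden toy (the real system would delete from the vector/keyword index and replay the serving log from the telemetry bus), but it shows the two obligations in the paragraph: purge the retracted chunks, and surface the retrievals that cited them before propagation completed:

```python
import time

class ToyIndex:
    """In-memory stand-in for the retrieval index plus serving log."""
    def __init__(self):
        self.chunks = {}       # chunk_id -> source document id
        self.served_log = []   # (timestamp, chunk_id) cited in an answer

    def retrieve(self, chunk_id: str):
        self.served_log.append((time.time(), chunk_id))
        return self.chunks.get(chunk_id)

def retract(index: ToyIndex, doc_id: str) -> list[tuple]:
    """Delete every chunk of a retracted document, then return the
    serving-log entries that cited it so those answers can be flagged
    for remediation."""
    retracted = {cid for cid, d in index.chunks.items() if d == doc_id}
    for cid in retracted:
        del index.chunks[cid]
    return [entry for entry in index.served_log if entry[1] in retracted]
```

The returned entries are the input to the remediation step: each maps back to an answer that cited a now-withdrawn paragraph within the SLA window.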

Expected artifact: PolicyPilot-Data-Contract-Register.md with the source table and the retraction paragraph.

Part 3: Evaluation plan aligned to the production path (30 minutes)

Produce a three-layer evaluation plan: offline, online, and human review. Specify:

  • Offline. The golden set composition (size, domain coverage, refresh cadence, ownership), the faithfulness and citation-validity checks, the retrieval-quality metrics (hit-rate-at-K, reciprocal rank), and how the set defends against leakage into the training of any fine-tuned component.
  • Online. The canary ramp protocol, the guardrails (latency, refusal rate, unsafe-content rate, cost per session), and the rollback trigger on each guardrail.
  • Human review. The sampling rate for human rating, the rubric dimensions (grounding, completeness, style), the rater calibration cadence, and how disagreements with an LLM-as-judge pipeline are reconciled.
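The two retrieval-quality metrics named in the offline layer reduce to a few lines each. A minimal sketch, where each query pairs a ranked list of retrieved chunk IDs with the set of relevant IDs from the golden set:

```python
def hit_rate_at_k(results, k: int) -> float:
    """Fraction of queries with at least one relevant chunk in the top K."""
    hits = sum(
        1 for ranked, relevant in results
        if any(cid in relevant for cid in ranked[:k])
    )
    return hits / len(results)

def mean_reciprocal_rank(results) -> float:
    """Average of 1/rank of the first relevant chunk (0 when none retrieved)."""
    total = 0.0
    for ranked, relevant in results:
        for rank, cid in enumerate(ranked, start=1):
            if cid in relevant:
                total += 1.0 / rank
                break
    return total / len(results)
```

Pinning these definitions in the plan itself avoids the common review failure where "hit rate" means top-1 to one team and top-K to another.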

Name a tracking platform for each layer (for example, Langfuse, Arize, Humanloop, MLflow, or a build-your-own logging stack) with a one-sentence rationale, and name two alternatives. Naming a platform is a realism exercise, not an endorsement.

Expected artifact: PolicyPilot-Evaluation-Plan.md with the three layers and guardrail table.

Part 4: Architecture decision record for the generator choice (20 minutes)

Produce a two-page ADR for the generator decision. Use the standard ADR structure (context, decision, consequences, alternatives, revisit trigger). The decision must include:

  • The chosen stack family (managed API, cloud platform, or self-hosted open-weight) with the primary and secondary candidate models.
  • The grounding and refusal behaviour expected of the generator, and how the chosen model has been verified to exhibit them.
  • The data-egress posture: what, if anything, leaves the EU boundary, and the contractual and technical controls in place.
  • The revisit trigger: the condition under which the committee reopens the decision (a material change in model pricing, a shift in regulatory guidance, or a sustained evaluation gap).

The ADR must be defensible. A reviewer should be able to read it and follow the reasoning without having participated in the discussion.

Expected artifact: ADR-001-Generator-Choice.md.

Final deliverable and what good looks like

Package the four artifacts into PolicyPilot-Architecture-Package.md with a one-page executive summary stating the target operating envelope, the residual architecture risks, and the go-to-build recommendation with conditions.

A reviewer will look for: completeness across all four parts; multi-stack parity (at least one open-weight and one managed-API implementation named per capability); an explicit grounding and citation-validity test; a concrete retraction-propagation SLA; and an ADR that takes a position and names the revisit trigger. Vague architecture (“an LLM generates the answer”) and single-vendor architecture (“PolicyPilot uses vendor X”) both fail review.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.