Capstone: A Complete Reference Architecture Package

FlowRidge

This article walks the complete architecture package Sovran Capital’s AITE-SAT architect would produce, organised as the capstone deliverable against which the reader’s own capstone work will be assessed.

Scenario overview

Sovran Capital is a financial-services holding with roughly 12,000 employees across 14 EU member states. The talent-acquisition team receives roughly 200,000 CV applications per year across all open roles. The business case for AI assistance: recruiters spend disproportionate time on initial screening at the expense of meaningful candidate engagement later in the process. The hypothesis: AI-assisted initial screening can reduce recruiter time on first-pass review by 60% while keeping candidate-outcome fairness stable or improved.

The system’s output is a summarised assessment of each CV against the role specification, a highlighted list of relevant qualifications, and a ranking within the role’s candidate pool. The system does not reject candidates. Every candidate retained in the pipeline is reviewed by a human recruiter; the system’s role is first-pass triage support.

Regulatory classification

The use case is a high-risk AI system per Annex III.4 (employment, workers management and access to self-employment: AI systems intended to be used for recruitment or selection).¹ This means Articles 9 through 15, plus Articles 16 (obligations of providers), 26 (obligations of deployers), and 72–73 (post-market monitoring and serious incident reporting) apply in full.¹ GDPR applies throughout — Article 22 on automated decision-making, Articles 5, 6, and 9 on processing lawful basis.

Reference architecture

The five-plane architecture from Article 1, instantiated:

Client plane. Recruiter workstation with a CV-review UI. Integrated with Sovran’s ATS (a commercial SaaS) through a browser extension and an internal proxy. The UI displays the CV, the AI summary, the highlighted qualifications, and the ranking with its confidence indicator. The UI carries affordances for the recruiter to accept, adjust, or reject the AI’s view, and an escalation path to the hiring manager.

Orchestration plane. An internally-built thin orchestrator (not a third-party framework) handling: CV ingestion, layout parsing, PII detection and redaction for logging, retrieval against the role’s job description and competency library, model invocation, response validation, and ranking aggregation. Orchestration runs in an EU region.

Model plane. Dual-provider setup: primary is an EU-resident managed closed-weight API (Anthropic Claude or Mistral Large, behind a provider abstraction layer); fallback is a self-hosted open-weight model (Llama 3 or Mistral 7B) for cases where the primary is unavailable or where residency overrides require fully-internal processing. Both providers are used through a single abstraction so the orchestrator code does not know which model is serving.

Knowledge plane. Three corpora: the job descriptions and role-requirement library (actively maintained), the competency taxonomy (organisation standard), and the historical CV-screening rubrics (updated quarterly). All three in EU-resident pgvector on Postgres. No CV content is retained in the retrieval corpus; retrieval is only against role and rubric material. The candidate’s CV exists in the flow only for the duration of the screening; it is deleted from the orchestrator after the recruiter signs the review.

Observability plane. OpenTelemetry tracing; AI-observability on a Langfuse self-hosted EU deployment; SLO dashboards; cost and eval dashboards; refusal and escalation monitoring. Retention set to match EU AI Act Article 12 obligations (at least six months for the logs, extended for specific records of concern).

ADR corpus

The capstone’s ADR corpus covers the decisions made explicit in the reference architecture.

ADR-001 — Model family: Primary EU-resident managed closed-weight API; fallback self-hosted open-weight. Alternatives considered: single-provider managed, pure self-hosted, non-EU managed with data processing agreement. Decision driver: EU residency requirement; capability match; fallback continuity; exit cost.

ADR-002 — Orchestration framework: Internally-built thin orchestrator; not LangChain or LlamaIndex. Alternatives: LangChain (rich but heavy; integration testing burden), LlamaIndex (retrieval-focused fit but broader API surface than needed). Decision driver: orchestrator scope is narrow; test coverage of the critical path is higher with a small internal codebase; no framework upgrade risk. Revisit trigger: if scope expands beyond current reasonable bounds.

ADR-003 — Retrieval strategy: RAG over role descriptions and competency library only; no candidate data in retrieval corpus. Alternatives: CV-to-CV similarity (rejected for data governance and fairness concerns), no retrieval (rejected because role-specific criteria are essential). Decision driver: candidate data minimisation; prevent cross-candidate information leakage.

ADR-004 — Vector store: pgvector on Postgres; not a managed vector DB. Alternatives: Qdrant self-hosted, Weaviate, Pinecone managed. Decision driver: corpora are small to mid-sized; existing Postgres is already EU-resident and well-operated; operational familiarity. Revisit at 50M vectors or if query pressure degrades.

ADR-005 — Evaluation contract: Composite of summary accuracy, qualification-highlighting precision and recall, ranking fairness across protected classes, and end-to-end recruiter-satisfaction measured quarterly. Targets: summary accuracy ≥95% on gold set; qualification-highlight F1 ≥0.85; fairness (parity of false-negative rate across protected classes within 5 percentage points); quarterly recruiter-satisfaction ≥4/5. Action on regression: halt rollouts, full eval review.

ADR-006 — Fine-tuning boundary: No fine-tuning in v1. Revisit if managed-API prompting cannot achieve targets after 90 days of production experience. If fine-tuning, proceed on the open-weight fallback model and keep the primary unchanged.

ADR-007 — Human oversight: Mandatory recruiter review on every candidate; the AI never rejects. Appeal path for candidates who request it. Recruiters’ overrides are logged and feed eval-set expansion.

ADR-008 — Fallback plan: Primary provider unavailable -> automatic switch to self-hosted. Both unavailable -> recruiters proceed with traditional review; batches clear on next availability window. Cost-ceiling approach -> reduced-context mode (shorter retrieval, more compressed prompt); explicit UI indication.

ADR-009 — Logging and retention: Full prompt/response capture retained six months for Article 12 obligations. Extended retention (two years) for any candidate who appeals or for any incident. PII minimisation in logs (candidate name redacted except in the audit record itself).

ADR-010 — Bias-testing cadence: Quarterly fairness evaluation on held-out historical data with rebalanced test sets. Triggers: new role-type introduction; model version upgrade; corpus refresh.

Data governance document

Per Article 10 of the EU AI Act, the data governance document specifies:

Training data. No model fine-tuning in v1; the primary-provider model is used as-is. When fine-tuning is revisited (see ADR-006), training data will be synthetic exemplars generated from job descriptions, never real candidate CVs.

Retrieval corpora. Job descriptions (created by hiring managers; consent and authorship clear); competency taxonomy (organisation IP); historical screening rubrics (internal, reviewed for bias before inclusion).

Processing data. Candidate CVs processed transiently; deleted from orchestrator after recruiter signs the review. Retention is in the ATS (outside the AI system’s scope); GDPR retention policies apply there.

Provenance and consent. Candidates are notified in the application process that AI-assisted screening may be used and that a human reviews every application. Consent is not used as legal basis (GDPR Article 6(1)(b) — contract-necessary is appropriate). Candidates have the right to object and to request human-only review.

Bias assessment. A fairness test set is curated from historical applications labelled with outcome and protected-class attributes (processed under GDPR Article 9 safeguards for sensitive processing). Pre-launch assessment confirmed no disparate-impact pattern exceeding the 5-percentage-point threshold set in ADR-005. Quarterly reassessment scheduled.

Data quality. Role descriptions are reviewed by hiring managers quarterly; competency taxonomy updates annually; screening rubrics updated quarterly with legal and HR review.

Evaluation plan

Offline evaluation:

A 2,000-CV gold set with human-authored summaries, qualification highlights, and rankings, stratified by role type and protected class.
Summary quality measured via LLM-as-judge with a human-calibrated rubric plus human review on a rotating 10% sample.
Qualification-highlight measured as precision and recall against the gold set.
Ranking fairness measured as false-negative-rate parity across protected classes.
Weekly run during active development; monthly run in steady state.

Online evaluation:

Production-traffic sampling: 5% of live CVs double-reviewed by a recruiter and by the system independently; divergence flagged and reviewed monthly.
Recruiter-override rate tracked; unexpectedly high override rates trigger review.
Candidate appeal rate monitored; baseline established in first quarter.

Human review:

Quarterly fairness review with HR, legal, and external advisory input.
Monthly operational review with the product and data-science teams.

SLO and SLI target sheet

Drawn from Article 20:

SLI	Target	Window
Availability (parsable response)	99.5%	30 days
End-to-end latency p95	<6s	7 days
Summary-quality eval score	≥baseline	7 days
Qualification F1	≥0.85	30 days
Fairness (FNR parity)	<5pp	30 days
Cost per CV	<€0.08	30 days
Recruiter-override rate	<20%	7 days
Refusal rate	1–5%	7 days

Error-budget burn triggers: freeze rollouts; full eval review; cost audit if the cost SLO burns.

Threat model and security

Per Article 14 of the AITE-SAT curriculum:

OWASP LLM Top 10 mapping reviewed; LLM01 (prompt injection) and LLM08 (embedding weakness) are the highest-weighted concerns given candidates can submit free-text that the system ingests.
Candidate CVs are the primary prompt-injection vector. Mitigation: strict context segregation in the prompt (system instruction is never concatenated with candidate content); output validators; model instructed to disregard any instructions found within CV content; red-team suite tests injection attempts before each release.
Retrieval corpus poisoning is low-risk because corpora are internal and changes go through review, but the process includes a pre-ingestion classifier for policy-violating content.
Supply-chain security: model providers are contractually bound to notify Sovran of material changes; model versions are pinned in production.

Penetration testing scheduled bi-annually; external red-team contracted for annual review.

Incident runbook

Per Article 20, covering the five AI-specific incident classes plus classical classes. Drills are quarterly; the kill-switch (a single UI toggle plus a configurable feature-flag) is tested every quarter.

Escalation: on severity-1 incidents (safety bypass, widespread confabulation, prompt-injection exploit), the architect and the Data Protection Officer are paged. Article 73 serious-incident reporting is triggered where the incident meets the Act’s thresholds; the legal and compliance teams own the filing.

Cost model

Layer 1 (per-CV): retrieval €0.003 + model call €0.05 + observability €0.001 + overhead €0.001 = €0.055. Below ADR-005’s €0.08 ceiling; 30% headroom.

Layer 2 (annual run-rate): 200,000 CVs × €0.055 = €11,000 annual direct cost. Fixed costs (self-hosted fallback GPU capacity, dashboards, observability platform) approximately €60,000 annually. Total: roughly €71,000 for the AI system direct cost, against an expected recruiter-time saving significantly in excess of that. The business case is clear.

Layer 3 (portfolio): the system is one of seven AI workloads in Sovran’s portfolio; its monthly contribution is less than 5% of the portfolio total.

Deployment topology and residency

Per Article 18: full EU residency. Primary compute in Frankfurt; disaster-recovery in Dublin; EU-resident managed API with a Data Processing Agreement that restricts data to EU regions. No data, logs, or prompts leave EU jurisdiction.

Conformity assessment evidence pack

Per Article 22, the evidence pack for Sovran’s internal-control conformity assessment (Annex VI applies because the Annex III.4 category permits internal assessment):

Article 9 evidence: Sovran’s risk register including model-specific risk entries; threat model document; post-market monitoring plan.
Article 10 evidence: Data governance document above; bias-assessment reports; dataset cards for the three corpora.
Article 11 evidence: This capstone package itself is the technical documentation. Cross-linked to ADR corpus, threat model, eval plan, release manifest archive.
Article 12 evidence: Logging architecture spec; retention policy; sample log export.
Article 13 evidence: Recruiter UI specification; candidate-facing notification language; deployer documentation.
Article 14 evidence: Recruiter review workflow specification; kill-switch specification; override logging.
Article 15 evidence: Eval harness specification; accuracy declaration; robustness test results; penetration test report.

The evidence pack is reproducible: every document cited links to its source, and a quarterly refresh process updates every document to match production state.

Handoff artefacts

Per Article 34, the system is handed off to a named lead architect who is not the author of this package. The handoff artefact set is complete: architecture runway catalogue entry for the CV-screening service, the full ADR corpus, the system card, the five runbooks, the decision log, the evidence pack, the office-hours archive for sessions held during build, the onboarding document (five pages), and the roadmap for the next two quarters.

Post-launch operation

First quarter post-launch: weekly operational reviews; daily eval-score monitoring; first quarterly fairness review at month three; recruiter-feedback sessions twice a month.

Second quarter: cadence normalises to weekly operational, monthly business, quarterly architecture. The system card, evidence pack, and ADR corpus are refreshed at the quarterly architecture review.

Retirement review: scheduled at the 24-month mark by default. Triggers for earlier review covered in Article 30.

What the learner’s capstone should include

The learner’s own capstone — the final artefact for AITE-SAT certification — should include:

A use-case description (similar scope to Sovran).
Reference architecture diagram with five planes annotated.
ADR corpus of at least 10 ADRs covering the decisions that matter.
Data governance document appropriate to the use case.
Evaluation plan with offline and online components.
SLO and SLI target sheet.
Threat model.
Incident runbook outline.
Cost model at all three layers.
Deployment topology with residency.
Conformity-assessment evidence pack mapping (for high-risk use cases) or equivalent risk-based evidence (for lower-risk).
Handoff artefact set.

Total artefact length is typically 40 to 80 pages. The assessment is not for completeness against a checklist but for architectural coherence: the decisions fit together, the evidence is reproducible, the risks are explicit, and a reader unfamiliar with the system can follow the reasoning.

Summary

This capstone walks an end-to-end reference architecture package for a realistic high-risk EU AI system, showing how the thirty-four preceding articles compose into a single defensible deliverable. The learner’s own capstone emulates this package against a use case in their current work, under the review discipline of the AITE-SAT assessors. The goal is an architect who is ready to lead an enterprise AI deployment from Calibrate through retirement with the governance, cost, and quality discipline the market now demands and regulation now requires.

Key terms

Capstone (AITE-SAT)
Conformity assessment evidence pack
High-risk AI system (Annex III)
Reference architecture package
Handoff artefact set

Learning outcomes

After this article the learner can: explain the capstone artefact set; classify six deliverables by source article; evaluate the exemplar for alignment with EU AI Act obligations; design their own capstone on a comparable use case.