This article walks through what the AITE-SAT architect contributes to Model and Produce reviews, the artefact set expected at each gate, and the contingency planning that keeps launches recoverable.
Model stage — the architect’s inputs
Model is where the build happens: prompts are designed, the retrieval index is built, the evaluation harness is populated, the integrations are wired, the guardrails are configured. The architect at Model is less a gatekeeper than a steward. The decisions made at Calibrate and Organise are being operationalised and new decisions are being taken daily. The architect’s job is to keep the decisions consistent, keep the ADR corpus current, and watch for decisions that warrant promotion to the ADR layer.
Reference architecture maintenance
The reference architecture drafted at Organise meets reality during Model. Components that sounded reasonable on paper turn out to be wrong for specific workloads. The architect updates the diagram as reality forces changes and writes an ADR for every material deviation. The phrase “we changed X because of Y” is either an ADR or a missing ADR; there is no third category.
Eval harness readiness
The evaluation harness (Article 11) must be running before the Produce stage is entered. The architect verifies: the eval set is populated, the metrics are implemented, the thresholds are set, the reporting is working. An eval harness that does not exist by Model-exit will not exist by Produce-exit either; the habit must form during Model.
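The verification the architect performs can be sketched as a gate check: every agreed metric must be measured and must clear its threshold, or the finding blocks Model-exit. The metric names and thresholds below are illustrative, not part of AITE-SAT.

```python
# Hypothetical Model-exit eval-gate check. Metrics and thresholds
# are assumptions for illustration; a real harness would load them
# from the agreed eval configuration.

THRESHOLDS = {
    "answer_accuracy": 0.90,   # minimum acceptable
    "groundedness": 0.95,      # minimum acceptable
    "refusal_rate": 0.02,      # maximum acceptable (lower is better)
}
LOWER_IS_BETTER = {"refusal_rate"}

def eval_gate(measured: dict) -> list:
    """Return a list of blocking findings; an empty list means the gate passes."""
    findings = []
    for metric, threshold in THRESHOLDS.items():
        if metric not in measured:
            findings.append(f"{metric}: not measured")
        elif metric in LOWER_IS_BETTER and measured[metric] > threshold:
            findings.append(f"{metric}: {measured[metric]} above max {threshold}")
        elif metric not in LOWER_IS_BETTER and measured[metric] < threshold:
            findings.append(f"{metric}: {measured[metric]} below min {threshold}")
    return findings
```

Note that "not measured" is itself a blocking finding: a metric without a number is indistinguishable from a failing one at a gate review.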
Security and privacy gate
The threat model (Article 14) is exercised against the actual build. Penetration testing and red-teaming happen during Model, with enough runway to remediate findings before Produce. The architect’s read on the security gate is whether the residual risk is acceptable for the use case’s risk classification.
Integration verification
Every legacy integration (Article 25) is tested end-to-end under realistic load patterns. Circuit breakers, timeouts, and idempotency keys are verified. The architect’s integration note at Model-exit confirms that the boundary contracts hold.
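The circuit-breaker behaviour being verified can be sketched in a few lines; the failure threshold and reset window below are assumed values, and a production breaker (and the timeout and idempotency-key checks alongside it) would come from a hardened library rather than hand-rolled code.

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: opens after `max_failures`
    consecutive failures and fast-fails calls until `reset_after`
    seconds have passed, then allows one trial (half-open) call.
    Thresholds are illustrative."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: fast-failing")
            self.opened_at = None  # half-open: permit one trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the count
        return result
```

The end-to-end test the architect signs off is precisely that the open state is reached under realistic load and that downstream callers see the fast failure rather than a hung connection.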
Cost model validation
The cost model v1 from Organise meets actual per-query economics. The model is updated to v2 with measured numbers. If the v2 cost model is materially different from v1, the architect triggers a cost review before Produce.
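The v1-versus-v2 comparison can be made mechanical. The sketch below assumes a simple token-priced cost model and a 25% deviation tolerance; both the model shape and the threshold are illustrative choices, not AITE-SAT prescriptions.

```python
def cost_per_query(tokens_in: int, tokens_out: int,
                   price_in: float, price_out: float,
                   overhead: float = 0.0) -> float:
    """Blended per-query cost: token charges plus a fixed per-query
    overhead (retrieval, guardrails, logging). Illustrative model."""
    return tokens_in * price_in + tokens_out * price_out + overhead

def needs_cost_review(v1_cost: float, v2_cost: float,
                      tolerance: float = 0.25) -> bool:
    """Trigger a cost review before Produce when the measured (v2) cost
    deviates from the Organise-stage estimate (v1) by more than
    `tolerance` -- 25% here, an assumed threshold."""
    return abs(v2_cost - v1_cost) / v1_cost > tolerance
```

The value of encoding the trigger is the same as for rollback criteria: the review fires on a pre-agreed rule, not on whether anyone happens to notice the bill.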
Produce stage — the architect’s inputs
Produce is the operational run-up to launch: deployment configuration, residency validation, environment promotion (Article 19), registry snapshots (Article 21), SLO instrumentation (Article 20), incident runbook activation. The architect is more visible here because the gate questions are tightly architectural.
Deployment plan
The deployment topology from Article 18 is now concrete: regions, zones, residency boundaries, failover, disaster recovery. The architect confirms the deployment plan matches the residency requirements and the SLO targets.
Canary and rollout
The release is rarely a big bang. The architect specifies the canary sample (traffic percentage, duration, SLO checks), the ramp schedule, and the rollback conditions. Canary failures must trigger automatic or manual rollback without debate; the architect makes sure the decision rules are written down.
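"Written down" can mean as executable data rather than prose. The sketch below shows one hypothetical shape for a canary specification whose rollback conditions are checked mechanically; the field names and numbers are illustrative.

```python
# Hypothetical canary specification: traffic slice, duration, and the
# rollback conditions as data, not tribal knowledge. Any breach means
# roll back, without debate. All values are illustrative.
CANARY = {
    "traffic_percent": 5,
    "duration_minutes": 60,
    "rollback_if": {
        "error_rate_max": 0.01,
        "p95_latency_ms_max": 2500,
        "eval_pass_rate_min": 0.90,
    },
}

def should_roll_back(observed: dict) -> list:
    """Return the breached rollback conditions; non-empty means roll back."""
    rules = CANARY["rollback_if"]
    breaches = []
    if observed["error_rate"] > rules["error_rate_max"]:
        breaches.append("error_rate")
    if observed["p95_latency_ms"] > rules["p95_latency_ms_max"]:
        breaches.append("p95_latency_ms")
    if observed["eval_pass_rate"] < rules["eval_pass_rate_min"]:
        breaches.append("eval_pass_rate")
    return breaches
```

Whether the rollback itself is automatic or a paged human pressing a button matters less than the fact that the triggering rule was agreed before the canary started.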
Incident runbook drill
The incident runbook from Article 20 is drilled before launch. A quarterly kill-switch test is scheduled. The architect does not run the drills personally but confirms they are happening.
Registry snapshot
At Produce-exit the registry (Article 21) captures the release state: model version, prompt version, index version, code commit, eval set version, policy version. This snapshot is the basis for rollback and for evidence pack assembly.
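A snapshot with those six fields can be modelled as a small immutable record; the class and field names below are illustrative, and a real registry would persist and sign such records rather than keep them in memory.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class ReleaseSnapshot:
    """One immutable record of everything needed to reproduce, audit,
    or roll back to a release. Fields mirror the registry contents
    described above; names are illustrative."""
    model_version: str
    prompt_version: str
    index_version: str
    code_commit: str
    eval_set_version: str
    policy_version: str

    def fingerprint(self) -> str:
        """Stable short hash over all fields, usable as a release identifier."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]
```

The point of the fingerprint is that rollback and evidence-pack assembly can both name the same release state unambiguously: change any one version and the identifier changes.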
Evidence pack
For high-risk systems the evidence pack for the EU AI Act conformity assessment (Article 22) is assembled at Produce-exit. The architect verifies completeness against Articles 9–15. Missing evidence is blocking.
Deliverables per gate
The table below lists the artefacts an AITE-SAT architect puts in front of a Model-exit or Produce-exit review.
| Artefact | Model-exit | Produce-exit |
|---|---|---|
| Reference architecture (detailed, current) | Yes | Yes (snapshot) |
| ADR corpus | Yes (covering all decisions to date) | Yes (with Produce-stage additions) |
| Threat model (exercised) | Yes | Updated with remediation status |
| Eval harness (running, dashboarded) | Yes | Production-traffic eval online |
| Evaluation report with measured metrics | Yes (against Model-stage eval set) | Yes (with online-eval baseline) |
| Integration test results | Yes | Updated with production-traffic patterns |
| Cost model v2 | Yes (measured) | Yes (actualised against early traffic) |
| Security test results (pen test, red team) | Yes | Closed or explicitly accepted |
| Deployment plan | Draft | Final |
| Canary and rollout plan | No | Yes |
| Incident runbook | Draft | Final, drilled |
| Registry snapshot | N/A | Yes |
| EU AI Act evidence pack (if high-risk) | In progress | Complete |
| Contingency plan | Draft | Final |
The contingency plan
An AI system will have bad days. The contingency plan answers what the team does when those days come. A production-ready contingency plan covers:
Rollback path. Exactly which command rolls the system back to a known-good state. The command is tested during Model stage. Rollback time target is stated (typically under 15 minutes for critical systems).
Graceful degradation. What the system does when its AI component is unavailable. For a customer-support assistant, fall back to a scripted FAQ plus human-agent routing. For a RAG search tool, fall back to keyword search. Never fall back to nothing.
Kill-switch. The single control (Article 20) that removes the AI feature from the user experience entirely. Who can trigger it, how fast, for how long.
Incident escalation. Who gets paged, when, and for what incident classes (Article 20). The architect’s name is typically on the escalation for severity-1 AI incidents regardless of role.
Customer communication. If the feature is externally visible, what the status page says and how customers are notified of degraded service. Draft language is reviewed by communications, legal, and the architect before launch.
Regulatory notification. For high-risk systems, the plan includes regulatory notification paths per the EU AI Act’s market-surveillance obligations (Articles 26, 73) and any sector-specific requirements.1
The contingency plan is tested before launch. A plan that has never been exercised is not a plan.
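The degradation and kill-switch rules above compose into a single decision chain, which is worth writing as code so it can be exercised before launch. The sketch below uses the customer-support example from the plan; function names and the response shape are illustrative.

```python
def answer(query, ai_backend, faq_backend, kill_switch_on: bool) -> dict:
    """Degradation chain for a support assistant (illustrative).
    The kill-switch removes the AI path entirely; an AI failure falls
    back to the scripted FAQ, and a FAQ miss routes to a human agent.
    The chain never falls back to nothing."""
    if not kill_switch_on:
        try:
            return {"source": "ai", "text": ai_backend(query)}
        except Exception:
            pass  # degrade rather than surface the failure to the user
    hit = faq_backend(query)  # scripted FAQ lookup; None on miss
    if hit is not None:
        return {"source": "faq", "text": hit}
    return {"source": "human", "text": "Routing you to an agent."}
```

Exercising this chain with the AI backend deliberately failing, and with the kill-switch thrown, is exactly the pre-launch test the plan demands.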
Surviving the review without compromising governance
The hardest part of the Produce-exit review is resisting pressure to ship when the evidence is not ready. The pressure is real — revenue forecasts depend on launch dates, sponsors have made commitments, teams are tired. The architect’s discipline is to separate the question “can we ship” from the question “should we ship under these conditions.”
Practical rules that help:
Pre-commit to the go / no-go criteria. Before entering Produce stage, the team agrees in writing what has to be true at Produce-exit. If those conditions are not met, the launch slips. Writing them down in advance removes the ambiguity that compromise tends to thrive in.
Separate ship-readiness from perfection. Not every open item blocks launch. The architect distinguishes blocking items (must be true before launch) from backlog items (acceptable to ship with; must be remediated on schedule). Communicating that distinction clearly avoids two failure modes: shipping with hidden blockers and delaying indefinitely for items that should not block.
Document exceptions. If the team ships with a known deviation from best practice (for example, an evaluation coverage gap in one slice), the exception is documented with a remediation date and an accountable owner. Exceptions that are written down get fixed; exceptions that live in shared memory do not.
Request a paper trail from reviewers. If a gatekeeper asks the architect to ship despite a concern, the architect requests that the decision be recorded in the gate minutes with the reviewer's name. This is not confrontational; it is the same discipline that protects everybody's work in the post-mortem.
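Two of these rules reduce to simple data shapes: the pre-committed go/no-go list, and the documented exception with an owner and a remediation date. The sketch below shows one hypothetical encoding; the criteria text and validation rules are illustrative.

```python
from datetime import date

# Illustrative pre-committed go/no-go criteria, agreed in writing
# before the Produce stage is entered.
GO_NO_GO = [
    "Eval harness green against agreed thresholds",
    "Pen-test findings closed or formally accepted",
    "Rollback command tested; target under 15 minutes",
    "Incident runbook drilled",
]

def record_exception(description: str, owner: str, remediate_by: date) -> dict:
    """A shipped deviation is only acceptable with an accountable owner
    and a future remediation date; anything else is rejected."""
    if not owner or remediate_by <= date.today():
        raise ValueError("exception needs an accountable owner "
                         "and a future remediation date")
    return {"description": description, "owner": owner,
            "remediate_by": remediate_by.isoformat(), "status": "open"}
```

The validation captures the article's claim directly: an exception without an owner and a date is not an exception, it is shared memory.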
Worked example — a Bank of England regulated model rollout
The Bank of England’s Prudential Regulation Authority supervisory statement SS1/23 on model risk management sets out expectations for financial institutions deploying models, including AI.2 A UK bank launching an AI-assisted credit-decisioning support tool at Produce-exit would expect its architect to have assembled:
- Reference architecture linked to ADRs for every material decision.
- A model-risk tiering that matches the SS1/23 framework.
- An eval harness and results showing the model meets the stated accuracy and fairness targets.
- Ongoing monitoring plan with drift detection and re-eval cadence.
- Deployment plan including data residency and segregation of duties in the access model.
- Incident runbook covering classical and AI-specific incident classes.
- Evidence pack for internal credit risk and second-line model-risk functions.
SS1/23 is not the EU AI Act but it is a directionally similar regulatory posture. Architects in regulated financial services routinely face both.
Worked example — OpenAI 20 March 2023 post-mortem
The public OpenAI post-mortem of the March 2023 Redis-library bug that exposed chat titles between users is a valuable training example.3 The incident happened at a scale that mature engineering organisations aspire to; the post-mortem was thorough and public. The architectural lessons a Produce-exit reviewer would draw:
- Dependencies on shared data-layer libraries (Redis client) were a risk that a typical AI reference architecture does not highlight.
- Cache invalidation and session isolation in an AI service carry the same risks as in any data-intensive service, amplified by the personal nature of prompts and responses.
- The contingency plan — taking ChatGPT offline while the bug was root-caused — was the right move even at significant product cost.
- Public communication and detailed post-mortems are themselves an operational asset.
The architect who has read this post-mortem and others like it is better prepared to design and review an incident runbook than one who has not. Reviewing public post-mortems is time well spent.
Anti-patterns
- Produce-exit reviews where artefacts are assembled during the meeting. The meeting is too late to assemble evidence. Artefacts should be linked into the readout days in advance.
- Canary plans with no rollback criteria. Starting a canary without stating what triggers rollback is common and pointless. Rollback conditions are part of the plan.
- Incident runbooks that have never been tested. The first incident is the wrong time to discover that the runbook does not work. Drill before launch.
- Evidence packs that cite artefacts that do not exist. Reviewers do check. An evidence pack citing a missing document is worse than an honest gap.
- Shipping under pressure without a documented exception. The exception is not embarrassing; the pattern of shipping under pressure without exceptions is.
Summary
At Model and Produce the AITE-SAT architect turns Calibrate and Organise commitments into operational reality. The artefact set expected is larger than at Organise and the review pressure is higher. The architect’s discipline is pre-commitment to go / no-go criteria, separation of blocking from non-blocking items, documentation of exceptions, and a contingency plan that has actually been exercised. The evidence pack assembled at Produce-exit is the same one that answers a notified body, a market-surveillance authority, or an internal risk committee.
Key terms
- Model stage (COMPEL)
- Produce stage (COMPEL)
- Contingency plan
- Canary rollout
- Go / no-go criteria
Learning outcomes
After this article the learner can: explain Model and Produce gate artefacts; classify six architecture deliverables by stage; evaluate a Produce-exit architecture package for completeness; design a contingency plan for a given rollout.