AITE M1.1-Art71 v1.0 Reviewed 2026-04-06 Open Access

Artifact Template: AI Solution Architecture Design Document



AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Artifact Template


How to use this template

This template is the primary design document the solution architect produces for any production AI feature. It is authored before detailed implementation begins, reviewed by the architecture review board at a named gate, and then carried forward as a living document for the life of the feature — amended when the architecture changes, re-reviewed at each substantive amendment, and retired only when the feature is retired.

The template has ten sections. All sections are required. A section whose content is “not applicable to this feature” is completed with one paragraph stating why. Empty sections are rejected by review. Content length is guidance, not a constraint; a well-scoped SaaS feature may produce a six-page document, a high-risk regulated feature often produces fifteen pages.

Copy this template, rename it with the feature’s name, and fill in each section. Leave the frontmatter intact.


Architecture Design Document — [Feature Name]

1. Context and scope

Two to four paragraphs stating what the feature is, who it serves, the business problem it solves, and the boundary of what it is and is not. Describe the user class (internal expert, business user, consumer, third-party partner), the regulatory environment (EU AI Act risk tier, ISO 42001 scope, HIPAA, PCI, MiFID, GDPR, other), and the operating-envelope constraints (data residency, latency ceiling, cost ceiling, availability target). Name the decision-making posture: does the feature decide, recommend, draft, or retrieve? A feature that drafts is a fundamentally different architecture from a feature that decides; be explicit.

2. Functional requirements

A numbered list of concrete capabilities the feature must deliver at launch. Each requirement is observable in production (can be demonstrated in an acceptance test) and is either required-for-launch or explicitly phased. A requirement like “the assistant answers questions” is not observable; “the assistant returns an answer and a source citation for 95 percent of in-domain questions on the golden set” is. Name the evaluation instrument for each requirement.
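To make the "observable in production" bar concrete, the requirement quoted above can be expressed directly as an acceptance test. This is a minimal sketch, not a mandated harness; `citation_coverage`, the result-dict shape, and the requirement ID "FR-1" are all illustrative assumptions.

```python
# Hypothetical acceptance check for a requirement like FR-1: "the assistant
# returns an answer and a source citation for 95 percent of in-domain
# questions on the golden set". All names here are illustrative.

def citation_coverage(results):
    """Fraction of responses carrying both an answer and at least one citation."""
    covered = sum(1 for r in results if r.get("answer") and r.get("citations"))
    return covered / len(results)

def check_fr1(results, threshold=0.95):
    """FR-1 passes only when coverage meets the stated, quantified target."""
    return citation_coverage(results) >= threshold

# Toy golden-set run: 19 of 20 responses carry a citation (coverage 0.95).
results = ([{"answer": "…", "citations": ["doc-7"]}] * 19
           + [{"answer": "…", "citations": []}])
print(check_fr1(results))
```

The point of the sketch is the shape, not the numbers: each functional requirement names its evaluation instrument, and the instrument is a test that can run against production-like output.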

3. Non-functional requirements

Concrete targets, not aspirations. At a minimum:

Category | Target
Availability | [e.g., 99.5% monthly, with maintenance window stated]
Latency | [e.g., p50 1.2s, p99 6.0s]
Cost ceiling | [e.g., $0.08 per session, $12,000 per month across the envelope]
Throughput | [e.g., 180 requests per second at peak]
Data residency | [EU-only, country-specific, none]
Accessibility | [WCAG 2.2 AA or higher]
Language coverage | [enumerated]
Regulatory evidence | [named evidence artifacts, retention durations]

If a target is unknown at design time, name the signal that will produce the target and the date by which it will be produced.
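One way to keep this rule enforceable is to hold the targets as data and have review tooling reject entries that are neither quantified nor dated. A minimal sketch, assuming nothing about the firm's actual tooling; the field names are not a mandated schema.

```python
from datetime import date

# Illustrative record of Section 3 targets. An unknown target is acceptable
# only if it names the signal that will produce it and a due date.
nfr_targets = {
    "availability": {"target": "99.5% monthly",
                     "maintenance_window": "stated in runbook"},
    "latency":      {"target": "p50 1.2s, p99 6.0s"},
    "cost_ceiling": {"target": "$0.08/session, $12,000/month"},
    # Unknown at design time: signal and date instead of a number.
    "throughput":   {"target": None,
                     "signal": "staging load test",
                     "due": date(2026, 6, 1)},
}

def incomplete(targets):
    """Entries with no target and no (signal, due) pair; these block review."""
    return [name for name, row in targets.items()
            if row.get("target") is None
            and not (row.get("signal") and row.get("due"))]

print(incomplete(nfr_targets))  # every row is either quantified or dated
```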

4. Reference architecture

A diagram (inline or linked) plus narrative. The diagram shows, at minimum, the ingress tier, the policy and authentication boundary, the retrieval tier, the generation tier, the tool-use boundary (if applicable), the evaluation and telemetry path, and the state stores. Each component is annotated with its responsibility, its failure mode, and whether it is stateful.

The narrative walks a reader through a representative request from the user’s interface to the response. It calls out every place where personal, sensitive, or regulated data is written, transformed, or logged, and the redaction or access-control step at that place.

The diagram is technology-neutral. Components are named by capability (dense retriever, reranker, policy engine, telemetry bus) with a sidebar listing at least two viable implementations per capability drawn from distinct stack families. A build-it-yourself open-weight path and a managed-cloud path must both be representable.
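The sidebar requirement can be checked mechanically. The sketch below assumes the two stack families are labeled "self-hosted" and "managed"; the capability names and product examples are illustrative, not recommendations.

```python
# Hypothetical Section 4 capability sidebar: each capability lists at least
# two viable implementations from distinct stack families.
sidebar = {
    "dense retriever": [("self-hosted", "open-weight embedding model + FAISS"),
                        ("managed", "cloud vector-search service")],
    "reranker":        [("self-hosted", "open-weight cross-encoder"),
                        ("managed", "managed rerank API")],
    "policy engine":   [("self-hosted", "Open Policy Agent"),
                        ("managed", "cloud policy service")],
    "telemetry bus":   [("self-hosted", "Kafka"),
                        ("managed", "cloud event stream")],
}

def violations(sb):
    """Capabilities missing either the self-hosted or the managed path."""
    return [cap for cap, impls in sb.items()
            if {family for family, _ in impls} != {"self-hosted", "managed"}]

print(violations(sidebar))  # empty: both paths are representable everywhere
```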

5. Data architecture

Two components:

The corpus and data contracts. A list of every source system that contributes to the feature (retrieval corpus, context features, evaluation data, telemetry lake). For each source: owner, refresh cadence, sensitivity class, residency constraint, retention rule, and access-control scoping rule. A one-paragraph description of how retractions and source deletions propagate to the live feature within a stated SLA.
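A data contract of this shape lends itself to a typed record that review tooling can query, for instance to find contracts whose retraction-propagation SLA is missing or too slow. This is a sketch under assumed field names, not a prescribed schema; the 72-hour gate is an example threshold.

```python
from dataclasses import dataclass
from datetime import timedelta

# Illustrative Section 5 data-contract record; all field names are assumptions.
@dataclass
class DataContract:
    source: str
    owner: str
    refresh_cadence: str
    sensitivity: str           # e.g. "public", "internal", "personal", "regulated"
    residency: str
    retention: str
    access_scope: str
    retraction_sla: timedelta  # deadline for a source deletion to leave the live feature

contracts = [
    DataContract("policy-manuals", "knowledge-ops", "daily", "internal",
                 "EU-only", "7y", "all-employees", timedelta(hours=24)),
]

# A hypothetical review gate: flag contracts whose retraction SLA exceeds 72h.
slow = [c.source for c in contracts if c.retraction_sla > timedelta(hours=72)]
print(slow)
```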

The state stores. A list of every store the feature maintains (vector index, cache, session store, feedback store, audit log). For each: the data class, the isolation model (single-tenant, multi-tenant with logical isolation, multi-tenant with physical isolation), the retention policy, the backup and recovery posture, and the access controls.

6. Model and prompt architecture

Model inventory. Every model the feature calls, direct or indirect. For each: the provider, the version or checkpoint, the contractual data-handling posture (whether inputs and outputs are used for training, retention at the provider, residency at the provider), the fallback path, the version-pinning strategy, and the revisit trigger for replacement.
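An inventory row carries enough structure that a check like "no model floats on latest" can run in CI. The sketch below is illustrative; the key names, the `pinned:` convention, and the provider fields are assumptions, not a required format.

```python
# Hypothetical Section 6 model-inventory rows.
model_inventory = [
    {
        "name": "primary-generator",
        "provider": "example-provider",
        "version": "pinned: 2024-11-20 checkpoint",  # never a floating "latest"
        "trains_on_inputs": False,
        "provider_retention": "30 days",
        "provider_residency": "EU",
        "fallback": "secondary-generator",
        "revisit_trigger": "deprecation notice or quarterly eval regression",
    },
]

def unpinned(inventory):
    """Entries without a pinned version; a review gate would block these."""
    return [m["name"] for m in inventory if "pinned:" not in m["version"]]

print(unpinned(model_inventory))
```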

Prompt and grounding architecture. The prompt-assembly rules, the retrieval-to-prompt boundary, the grounding rules (how the generator is instructed to attribute statements to sources), the refusal rules, and the style rules. Where the prompt is versioned and where the version is logged. Where the prompt is a template and where it is free composition.
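One common way to satisfy "where the prompt is versioned and where the version is logged" is to content-hash the template so every edit yields a new, loggable version identifier. A minimal sketch under that assumption; the template text and trace-record shape are illustrative.

```python
import hashlib

# Illustrative prompt template with grounding and refusal rules inlined.
PROMPT_TEMPLATE = (
    "You are a grounded assistant. Answer ONLY from the sources below and "
    "attribute each claim with [source-id]. If the sources do not cover the "
    "question, refuse.\n\nSources:\n{sources}\n\nQuestion: {question}"
)

# Content hash as version: any edit to the template produces a new version.
PROMPT_VERSION = hashlib.sha256(PROMPT_TEMPLATE.encode()).hexdigest()[:12]

def assemble(question, passages):
    """Assemble the prompt and the trace metadata logged alongside the request."""
    sources = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = PROMPT_TEMPLATE.format(sources=sources, question=question)
    # The trace carries the version, not the possibly sensitive prompt body.
    return prompt, {"prompt_version": PROMPT_VERSION}
```

Logging the hash rather than the rendered prompt keeps the audit trail useful without copying user or corpus text into telemetry.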

Fine-tuning and adaptation. If any component is fine-tuned, the data, the protocol, the evaluation, the ownership, and the revisit trigger. If nothing is fine-tuned, state so explicitly — “no model weights are modified by this feature” is a defensible and auditable statement.

7. Evaluation and observability architecture

The three-layer evaluation plan (offline, online, human) with the detail expected by Lab 2:

  • Offline: golden set composition, anti-leakage discipline, scoring approach (deterministic plus LLM-as-judge where applicable), calibration protocol, CI integration.
  • Online: canary ramp protocol, guardrail set with breach windows and actions, post-deployment monitoring, sampling rate for online rubric scoring.
  • Human: review console, sampling rate, rater calibration, disagreement adjudication rule.
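The offline layer's CI integration can be as small as a gate that scores the candidate on the golden set with a deterministic scorer and blocks promotion on regression against the recorded baseline. The scorer, threshold, and regression budget below are assumptions for the sketch, not prescribed values.

```python
# Illustrative CI gate for the offline evaluation layer of Section 7.

def exact_match(pred, gold):
    """Deterministic scorer; an LLM-as-judge would slot in beside it."""
    return pred.strip().lower() == gold.strip().lower()

def offline_gate(predictions, golden, baseline_score, max_regression=0.02):
    """Score the candidate on the golden set; fail on regression past budget."""
    score = sum(exact_match(p, g) for p, g in zip(predictions, golden)) / len(golden)
    return {"score": score, "passed": score >= baseline_score - max_regression}

golden = ["paris", "42", "blue"]
result = offline_gate(["Paris", "42", "green"], golden, baseline_score=0.66)
print(result)  # 2 of 3 exact matches; within the regression budget
```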

The observability architecture: trace schema, metrics set with SLO targets, log hygiene rules (what is verbatim, what is redacted, retention class), and the on-call runbook for at least three named failure scenarios.
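A log-hygiene rule of the "what is verbatim, what is redacted" kind can be sketched as a redaction step applied before a trace record reaches the telemetry bus. The regex below catches only email addresses and is illustrative; it is not a complete PII detector, and the record fields are assumptions.

```python
import re

# Illustrative redaction rule: request IDs and latencies pass verbatim,
# user text is redacted before storage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    return EMAIL.sub("[REDACTED-EMAIL]", text)

def trace_record(request_id, user_text, latency_ms):
    return {
        "request_id": request_id,        # verbatim
        "latency_ms": latency_ms,        # verbatim
        "user_text": redact(user_text),  # redacted before it is written
    }

rec = trace_record("req-7", "contact me at jo@example.com", 840)
print(rec["user_text"])
```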

8. Security architecture

Threat model summary. The OWASP LLM Top 10 categories (2025) mapped to the feature’s architecture: for each category, which component is the defender, what invariant is defended, what detection signal confirms a breach attempt, and what response is triggered. Specifically name the policy decision point, the redaction pipeline, the authentication and authorization boundaries, the tool-use authorization (if applicable), and the secret management. Name the invariant the system is engineered to preserve.

If the feature is agentic (has tool-use), name the agent-runtime invariant in one sentence in the style of Lab 3: the property that must hold at every step of every run, enforced by the runtime and testable in CI.
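"Enforced by the runtime and testable in CI" can look like the sketch below. The invariant chosen here, "no tool call executes outside the caller's authorization scope," is an example invariant, and the roles and tool names are invented for illustration.

```python
# Illustrative CI test for an agent-runtime invariant: the property must
# hold at every step of every run.
ALLOWED_TOOLS = {
    "analyst": {"search", "summarize"},
    "admin":   {"search", "summarize", "delete_record"},
}

def authorized(role, tool):
    return tool in ALLOWED_TOOLS.get(role, set())

def check_run(role, tool_calls):
    """True only if every step of the run respects the caller's scope."""
    return all(authorized(role, t) for t in tool_calls)

# CI-style assertions over recorded (or synthesized) runs:
assert check_run("analyst", ["search", "summarize"])
assert not check_run("analyst", ["search", "delete_record"])
```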

9. Operational architecture

How the feature runs in production day-to-day. Covers:

  • Deployment topology: regions, availability zones, traffic routing, failover behaviour.
  • Release path: environments, promotion gates, canary protocol (crosswalked to §7), rollback mechanics and time-to-rollback SLO.
  • Change management: who can change what, the review path for a prompt change versus a model-version change versus a retrieval-index change, the feature-flag posture, the kill-switch topology.
  • Incident response: the on-call rotation, the severity classification, the communication path, the post-incident-review cadence.
  • Cost management: the cost-attribution model, the budget alerts, the per-tenant cost controls, the cost-anomaly response.
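The canary and kill-switch mechanics above reduce to a guardrail with a breach window and a named action. A minimal sketch, assuming a p99-latency guardrail with an example threshold of 6.0 s over a three-sample window; the metric, numbers, and action names are all illustrative.

```python
# Illustrative Section 9 guardrail: breach if every sample in the trailing
# window exceeds the threshold.

def guardrail_breached(samples, threshold, window):
    recent = samples[-window:]
    return len(recent) == window and all(s > threshold for s in recent)

def canary_step(p99_samples):
    """Decide the next canary action from the latest guardrail reading."""
    if guardrail_breached(p99_samples, threshold=6.0, window=3):
        return "rollback"  # kill-switch path; time-to-rollback SLO applies
    return "continue-ramp"

print(canary_step([5.1, 6.4, 6.8, 7.2]))  # three breaching samples in a row
```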

10. Regulatory and evidence architecture

The regulatory mapping. For every relevant regulation or standard, name the clause(s), the evidence artifacts the feature produces, the retention duration for each artifact, the location of the retention store, and the access controls on the retention store. For EU AI Act high-risk features, explicitly map Articles 9 (risk management), 10 (data governance), 11 (technical documentation), 12 (record-keeping), 13 (transparency), 14 (human oversight), and 15 (accuracy, robustness, cybersecurity) to specific components and records.

For non-regulated features, state so: “This feature is not in scope for EU AI Act Article 6 or Annex III; the firm’s internal model-risk policy applies and is mapped in the appendix.” An explicit non-applicability statement is a defensible governance position; silence is not.

Review and sign-off

Role | Name | Decision | Date | Notes
Solution architect (author) | […] | Authored | YYYY-MM-DD |
Peer architect reviewer | […] | Approved / requested changes | YYYY-MM-DD |
Security reviewer | […] | Approved / requested changes | YYYY-MM-DD |
Privacy reviewer | […] | Approved / requested changes | YYYY-MM-DD |
Governance reviewer | […] | Approved / requested changes / held | YYYY-MM-DD |
Architecture review board | […] | Approved / requested changes / held | YYYY-MM-DD |

Implementation is not authorized to begin until all required rows show "Approved" and the document is committed to the architecture register.

Amendments

The document is amended when the architecture changes. Each amendment is a numbered entry with the date, the author, the summary of the change, the sections affected, and the re-approvals obtained. A material amendment (a change to the model inventory, a change to the regulatory scope, a change to the data-residency posture, a change to the tool-use boundary) requires a re-review by the architecture review board; a non-material amendment (a new observability metric, a change to the canary ramp schedule) may be self-approved by the architect and the peer reviewer.

Notes on use

When to use this template. Every production AI feature, regardless of risk tier. The template scales to the feature: a small recommendation feature produces a compact document, a high-risk regulated feature produces a longer one.

When a lighter template is acceptable. A pure internal experiment that will not reach production can use a reduced version that covers sections 1, 2, 4, and 7 only. The full template is required before any production traffic is routed.

Common errors in first-time use. Vague scope (“the feature uses an LLM”); non-quantified non-functional requirements (“fast and cheap”); technology-specific reference architecture without the capability-level abstraction; missing retraction-propagation SLA; missing invariant statement for agentic features; missing regulatory mapping for in-scope features. Peer reviewers catch these; architecture review boards treat them as blocking.

What follows. The architecture design document is the parent record. It is cross-referenced by: the evaluation harness spec (Template 2), the RAG data contracts (Template 3), the gateway policy (Template 4), and the agentic runtime SLO sheet (Template 5). Each of those companion artifacts is authored in parallel or downstream of this one.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.