AITE-SAT: AI Solutions Architect Expert — Body of Knowledge Article 1 of 35
An AI solutions architect joins a project and is asked a hundred questions before the first design review. Which model. Which vector store. Which orchestrator. Which cloud. Which cache. Which evaluation tool. Which logging format. Which authentication scheme. Which rate-limit policy. The architect who tries to answer them in the order asked builds an incoherent system. The architect who answers them in the order of the reference architecture builds a coherent one. This article establishes the reference architecture that AITE-SAT teaches: the five-plane model, the responsibilities of each plane, and the way the rest of the credential refines it.
Why a reference architecture
An enterprise AI system is not a model. It is a set of cooperating components that together turn a user request into a governed response. The model is one of those components. The others are as consequential as the model and are far more often the source of production incidents. Separating the system into planes makes the architect’s decisions local: a change to the retrieval strategy is a change to the knowledge plane and does not force a change to the orchestration plane; a change to the observability tool is a change to the observability plane and does not force a change to the model plane. Without a reference architecture, every decision is global, every change is a rewrite, and every incident becomes a cross-team investigation.
The reference architecture in this article is technology-neutral by construction. It works whether the model is a managed API from Anthropic, OpenAI, or Google, or an open-weight model such as Meta’s Llama 3 or Mistral served on a self-hosted vLLM cluster. It works whether the orchestrator is LangChain, LlamaIndex, DSPy, Semantic Kernel, LangGraph, or a bespoke library written in the organization’s preferred language. It works whether the vector store is Pinecone, Weaviate, Qdrant, pgvector, Milvus, or OpenSearch k-NN. It works whether observability flows through Arize, Langfuse, MLflow, or Weights & Biases. The five planes are invariant; the tools in each plane change over a two-year horizon, and a good architecture lets them change without the planes having to change with them.
The five planes
The first plane is the client. It is where the user enters the system and where the system’s answer is rendered. A client plane includes the user interface, the session layer, the input-validation and moderation front door, and the authentication and authorization gate. In an agentic or API-only system the client plane is the calling application rather than a human interface; the responsibilities are the same. The client plane is responsible for: identifying the user, recording the request, rejecting clearly malformed input before any LLM cost is incurred, rendering the response with the disclosures the system promised, and logging the end-to-end latency the user actually experienced.
The second plane is the orchestration. It is where the system decides what to do with a request that has passed the client gate. The orchestration plane assembles the prompt, selects the model, calls the retrieval layer, manages tool calls, applies safety filters, and returns the final response to the client. It is the plane most often confused with the model because modern LLM application frameworks such as LangChain and LlamaIndex are associated with it; but the orchestrator is distinct from the model it calls. A system with three models and one orchestrator has a single orchestration plane and three model-plane instances.
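The orchestration plane's decision loop can be sketched with the model, retriever, and safety filter passed in as opaque callables, which keeps the plane boundaries visible. All names here are illustrative assumptions; the point the sketch makes is that the orchestrator selects, retrieves, assembles, and delegates, but never runs inference itself.

```python
def orchestrate(request: dict, models: dict, retriever, guardrail) -> dict:
    """Orchestration plane: decide what to do with a request that has
    passed the client gate. Model selection, retrieval, prompt assembly,
    and safety filtering live here; inference does not."""
    # Select the model for the request (depends on request shape, not model internals).
    model = models["large"] if request.get("complex") else models["small"]
    # Call the knowledge plane for supporting passages.
    passages = retriever(request["question"])
    # Assemble the final prompt from template plus retrieved context.
    prompt = (
        "Answer using only the context below.\n\nContext:\n"
        + "\n".join(p["text"] for p in passages)
        + "\n\nQuestion: "
        + request["question"]
    )
    # Delegate generation to the model plane.
    answer = model(prompt)
    # Apply the output-side safety filter before returning to the client.
    return {"answer": guardrail(answer), "sources": [p["source"] for p in passages]}
```

Because the model, retriever, and guardrail are injected, swapping any of them is a local change to its own plane, which is exactly the locality argument made earlier.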
The third plane is the model. It is where generation, embedding, reranking, and classification inference happens. In a closed-weight architecture the model plane is a managed API call to Anthropic Claude, OpenAI GPT-class, Google Gemini, Cohere Command, or Mistral hosted behind their respective managed endpoints. In an open-weight architecture the model plane is a self-hosted inference service running Llama 3, Qwen 2, DeepSeek, or Mistral weights on vLLM, Text Generation Inference, TensorRT-LLM, or a comparable serving stack. In a cloud-platform architecture the model plane sits behind AWS Bedrock, Azure AI Foundry, or Google Vertex AI, which add a platform-level abstraction on top of one or more underlying models. The architect picks which of these the system uses, and Article 2 develops the selection framework.
The fourth plane is knowledge. It is where the system stores and retrieves facts that were not in the model’s training data. The knowledge plane contains the source-of-truth stores (document management systems, databases, APIs to internal services), the extraction and chunking pipelines that prepare those sources for retrieval, the embeddings and index structures that make retrieval fast, the query-time retrieval logic that pulls the right passages for each request, and the lineage and licensing metadata that records where each retrieved item came from. RAG architectures live almost entirely in the knowledge plane; Articles 4 through 6 develop it.
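The knowledge-plane pipeline (chunk, embed, index, retrieve, with lineage attached to every chunk) can be sketched end to end. The embedding here is a deliberate stand-in, a bag-of-words counter instead of a real embedding model, so the sketch stays self-contained; the function names are hypothetical.

```python
from collections import Counter
import math


def chunk(text: str, source: str, size: int = 200) -> list[dict]:
    """Extraction/chunking step: split a source document into fixed-size
    word chunks, each carrying lineage metadata back to its source."""
    words = text.split()
    return [{"text": " ".join(words[i:i + size]), "source": source, "offset": i}
            for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Stand-in embedding: term counts. A real knowledge plane would call
    # an embedding model here and store dense vectors in a vector index.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, index: list[dict], k: int = 3) -> list[dict]:
    """Query-time retrieval: rank indexed chunks against the query and
    return the top-k, lineage metadata included."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]
```

Note that lineage (`source`, `offset`) travels with every chunk from ingestion through retrieval; that is the property the licensing and provenance metadata mentioned above depends on.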
The fifth plane is observability. It is where traces, prompt and response captures, evaluation signals, cost data, and drift signals are recorded and analyzed. The observability plane is the one most often deferred in a first release and most often regretted in the first incident. It is also the plane that the EU AI Act’s record-keeping obligation in Article 12 turns into a legal requirement for high-risk systems.1 Article 13 of this credential develops it.
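A minimal sketch of the per-request record the observability plane captures, with hypothetical field names; a real deployment would follow a tracing standard such as OpenTelemetry rather than a bespoke dataclass, but the information content is what matters for record-keeping.

```python
from dataclasses import dataclass, field, asdict
import json
import time
import uuid


@dataclass
class Trace:
    """One observability-plane record per request: prompt/response capture,
    latency, token counts, and cost. This is the raw material for
    evaluation, drift detection, and record-keeping obligations."""
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    user_id: str = ""
    model: str = ""
    prompt: str = ""
    response: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # Serialize for the trace store; one line per request.
        return json.dumps(asdict(self))
```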
[DIAGRAM: ConcentricRingsDiagram — aite-sat-article-1-five-plane-rings — Concentric rings with “User” at the outermost ring, then “Client plane”, “Orchestration plane”, “Knowledge plane”, “Model plane” at the core, with the “Observability plane” crosscutting all rings via radial arrows showing telemetry flow outward.]
What belongs where
The discipline of the reference architecture is not that the planes exist but that the architect refuses to let responsibilities leak across them. A client plane that adds retrieval logic becomes unmaintainable; a model plane that embeds business rules becomes unportable; an orchestration plane that holds durable knowledge becomes a data hazard. The table below is the assignment rule that AITE-SAT teaches and that the rest of the credential’s articles keep referring to.
| Responsibility | Belongs to plane | Why |
|---|---|---|
| Authenticate the user | Client | Must run before any LLM cost |
| Moderate obvious abuse | Client | Cheapest place to reject |
| Select the model for the request | Orchestration | Decision depends on request shape, not model internals |
| Assemble the final prompt | Orchestration | Assembly rules change more often than model capabilities |
| Retrieve supporting passages | Knowledge | Retrieval owns the corpus lifecycle |
| Generate the response | Model | Only plane that runs inference |
| Capture the trace | Observability | Cross-plane concern, cannot belong to any single plane |
| Persist the session transcript | Client or Observability | Client for short term, Observability for long term |
| Enforce per-tenant rate limit | Client | Must run before any downstream cost |
| Decide whether to call a tool | Orchestration | Uses model output but binds to governance policy |
| Execute the tool | Orchestration (with plane-local authorization) | Keeps tool-invocation outside the model |
| Record the cost | Observability | Single source of truth for finance reporting |
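The table's assignment rule can be encoded as a lookup, which turns "find the leak" into a mechanical check. The responsibility identifiers below are illustrative shorthand for the table rows above, and the `leaks` helper is a hypothetical name.

```python
# Assignment rule: each responsibility belongs to exactly one plane.
PLANE_OF = {
    "authenticate_user": "client",
    "moderate_abuse": "client",
    "enforce_rate_limit": "client",
    "select_model": "orchestration",
    "assemble_prompt": "orchestration",
    "decide_tool_call": "orchestration",
    "execute_tool": "orchestration",
    "retrieve_passages": "knowledge",
    "generate_response": "model",
    "capture_trace": "observability",
    "record_cost": "observability",
}


def leaks(component_plane: str, responsibilities: list[str]) -> list[str]:
    """Return the responsibilities a component has taken on that belong
    to a different plane (the leaks the architect is looking for)."""
    return [r for r in responsibilities
            if PLANE_OF.get(r) not in (component_plane, None)]
```

Running this check against a component inventory during an architecture review surfaces exactly the cross-plane leaks discussed below.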
Three worked examples
Microsoft Copilot for Microsoft 365 is a large-scale production example of the five planes. Microsoft’s public architecture documentation describes the client plane as the Microsoft 365 application itself (Word, Excel, Teams); the orchestration plane as the “Copilot orchestrator” that grounds the request against Microsoft Graph and calls the underlying language models; the model plane as OpenAI models hosted on Azure OpenAI; the knowledge plane as Microsoft Graph over the user’s tenant content; and the observability plane as Microsoft’s service telemetry stack.2 The architecture document is unusually explicit about plane separation because separation is what lets Microsoft ship the same Copilot across many applications: the client plane varies per application but the other four are shared.
GitHub Copilot is a smaller example where the client plane is a code editor plug-in, the orchestration plane includes a context-gathering stage that reads open files and nearby repository content, the model plane is a fine-tuned model running on Azure, and the knowledge plane is the code editor’s local context itself rather than a separate retrieval system. The observability plane includes telemetry that GitHub publishes in blog posts describing cost and acceptance rates.3 The same five planes appear, with the knowledge plane collapsed into client-provided context; the architect who learns to see this recognizes that the knowledge plane is invariant even when it is not a separate retrieval database.
Morgan Stanley’s wealth-management assistant, delivered with OpenAI, is a third example. Morgan Stanley’s press material describes a system that indexes approximately 100,000 pieces of internal research and delivers answers grounded in that corpus to financial advisors.4 The client plane is the advisor workstation; the orchestration plane is Morgan Stanley’s internal application; the model plane is GPT-class models accessed through OpenAI’s enterprise offering; the knowledge plane is the research corpus with its vector index and retrieval logic; the observability plane records both operational telemetry and the evaluation traces that Morgan Stanley’s compliance teams need. The architecture is Microsoft Copilot’s five planes instantiated with different tools; the planes are the same.
Where the architect spends time
In a healthy project the architect spends the first sprint confirming that the five planes exist as independent components. In an unhealthy project, planes are missing or conflated. A “quick proof of concept” that calls a managed API from a client-side JavaScript bundle and writes the user’s question into the browser console as its only log has no orchestration plane, no knowledge plane, and no observability plane; it has a client plane that is doing the work of four. Such a system cannot be governed, cannot be scaled, and cannot be evaluated. The first job of the architect is to refuse to let that system reach users.
The second job is to draw the plane diagram for any existing system and find the responsibility that has leaked. In a system that has been running for months, the leaks are usually small and local: a piece of retrieval logic that lives in the orchestration plane because someone was under deadline, a piece of business logic encoded in a system prompt because the orchestration team’s backlog was full, a piece of observability telemetry that is missing because the logging format changed and nobody updated the dashboard. Each leak is a maintainability cost that compounds; finding them early is the architect’s return on investment.
[DIAGRAM: HubSpokeDiagram — aite-sat-article-1-orchestration-hub — Orchestration plane at the hub, with four spokes to Client, Model, Knowledge, and Observability planes. Each spoke is labeled with the primary interface contract (e.g., orchestration-to-model spoke labeled “prompt + tools in, tokens out”).]
Regulatory alignment
The reference architecture is not only an engineering convenience; it is also the structure that makes regulatory obligations tractable. ISO/IEC 42001:2023 Clause 8.1 requires operational controls for the organization’s AI management system and implicitly assumes that those controls can be located on specific components; a plane diagram is the object on which Clause 8.1 controls are placed.5 NIST AI RMF MAP 2.2 requires categorization of the AI system and its components, and the five-plane model satisfies the categorization requirement for a wide class of LLM applications.6 EU AI Act Article 11 requires technical documentation for high-risk systems; Annex IV’s itemized list maps straightforwardly onto five sub-documents, one per plane, each recording the evidence that plane produces to support the Annex IV claims.7 Architects whose first artifact is the five-plane diagram have already produced the skeleton of the Annex IV document.
The OWASP Top 10 for Large Language Model Applications similarly maps to the planes. LLM01 prompt injection is a client-plane and orchestration-plane defense; LLM03 supply chain is a model-plane and knowledge-plane concern; LLM05 improper output handling is a client-plane and orchestration-plane defense; LLM08 vector and embedding weaknesses are a knowledge-plane concern; LLM10 unbounded consumption is a client-plane and observability-plane concern.8 An architect who can name the OWASP items against the plane they defend has a complete, non-duplicating coverage map.
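That coverage map can be written down directly, so a security review can check it mechanically. The item names follow the OWASP 2025 list cited above; the `uncovered` helper and the shape of the controls inventory are illustrative assumptions.

```python
# OWASP Top 10 for LLM Applications (2025 items cited in the text),
# mapped to the planes that defend against each.
OWASP_LLM_DEFENSE_PLANES = {
    "LLM01 Prompt Injection": {"client", "orchestration"},
    "LLM03 Supply Chain": {"model", "knowledge"},
    "LLM05 Improper Output Handling": {"client", "orchestration"},
    "LLM08 Vector and Embedding Weaknesses": {"knowledge"},
    "LLM10 Unbounded Consumption": {"client", "observability"},
}


def uncovered(controls_by_plane: dict[str, set[str]]) -> list[str]:
    """List the OWASP items for which no defending plane yet has a control
    in the inventory (plane name -> set of implemented controls)."""
    return [item for item, planes in OWASP_LLM_DEFENSE_PLANES.items()
            if not any(controls_by_plane.get(p) for p in planes)]
```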
What the rest of the credential builds
Every subsequent article in this half of the credential refines one or more planes. Article 2 addresses model-plane selection. Article 3 addresses orchestration-plane prompt architecture. Articles 4 through 6 develop the knowledge plane through RAG patterns, chunking and embedding, and vector stores. Article 7 addresses orchestration-plane tool calling and agent loops. Articles 8 and 9 address model-plane serving and inference-cost architecture. Article 10 addresses the fine-tuning decision that sits across orchestration and model planes. Articles 11 and 12 develop the observability plane’s evaluation architecture. Article 13 completes the observability plane with traces and dashboards. Article 14 addresses security as a defense-in-depth discipline across all five planes. Articles 15 and 16 develop data pipelines and multi-tenancy in the knowledge plane, and Article 17 closes this half of the credential with the non-functional requirements — latency, cost, scalability — that the architect has to meet whatever planes they build.
Summary
An enterprise AI system is five planes. The client plane owns the user boundary. The orchestration plane owns the decision of what to do with a request. The model plane owns inference. The knowledge plane owns the facts the model does not carry. The observability plane owns the telemetry that makes the other four governable. The first discipline of the AITE-SAT holder is refusing to let responsibilities leak across planes. The second is drawing the plane diagram for any system that lacks one and using it to locate every subsequent decision, every security control, and every regulatory obligation. Microsoft Copilot, GitHub Copilot, and Morgan Stanley’s wealth-management assistant are three public examples; the planes in each are the same. The planes will still be the same when the tools in each have all been replaced.
Further reading in the Core Stream: The AI Technology Landscape, AI Integration Patterns for the Enterprise, and Enterprise AI Platform Strategy.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.
Footnotes
1. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 (EU AI Act), Article 12 (record-keeping). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — accessed 2026-04-19.
2. Microsoft Copilot for Microsoft 365 architecture overview. Microsoft Learn. https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-architecture — accessed 2026-04-19.
3. Research: Quantifying GitHub Copilot’s impact on developer productivity and happiness. The GitHub Blog, September 2022, and subsequent cost-telemetry posts through 2024. https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/ — accessed 2026-04-19.
4. Morgan Stanley Wealth Management deploys OpenAI-powered AI @ Morgan Stanley Assistant. Morgan Stanley press release, September 2023. https://www.morganstanley.com/press-releases/key-milestone-in-innovation-journey-with-openai — accessed 2026-04-19.
5. ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system, Clause 8.1. International Organization for Standardization. https://www.iso.org/standard/81230.html — accessed 2026-04-19.
6. Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, MAP function, Subcategory 2.2. National Institute of Standards and Technology, January 2023. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf — accessed 2026-04-19.
7. Regulation (EU) 2024/1689, Article 11 and Annex IV. Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — accessed 2026-04-19.
8. OWASP Top 10 for Large Language Model Applications, version 2025. OWASP GenAI Security Project. https://genai.owasp.org/llm-top-10/ — accessed 2026-04-19.