AITE M1.1-Art03 v1.0 Reviewed 2026-04-06 Open Access

Prompt Architecture: Templates, Versioning, Injection Defense



AITE-SAT: AI Solutions Architect Expert — Body of Knowledge Article 3 of 35


A production AI system rarely has a single prompt. A mature one has dozens, sometimes hundreds: system prompts for each role, tool-call schemas for each integration, few-shot templates for each capability, safety prompts for each content class, evaluation prompts for each judge. An architect who treats prompts as source code treats that fleet as a first-class asset — versioned, reviewed, tested, and deployed through controlled stages. An architect who treats prompts as text blobs has no way to answer a regulator who asks why a specific production response was generated. This article distinguishes prompt architecture from prompt engineering and teaches the discipline required to run many prompts at once without losing track of any of them.

Engineering and architecture are different jobs

Prompt engineering is the craft of writing one prompt well for one task. It is author-facing work: selecting instructions, choosing examples, experimenting with wording, measuring task performance. Prompt engineering is what the AITM-PEW Associate credential teaches in depth, and every AITE-SAT holder must respect the craft because bad prompts defeat good architecture. But prompt engineering alone does not solve the production problem.

Prompt architecture is the set of system-level decisions that make a fleet of prompts governable. It is architect-facing work: deciding how prompts are stored, versioned, composed, validated, and cached; deciding what defenses sit between user input and the final prompt; deciding how prompt changes are promoted from development to production and how they are rolled back. Prompt architecture answers questions that prompt engineering cannot: how does the system guarantee that a user cannot inject instructions that override the system prompt; how does the system know which version of which prompt produced which response; how does the system upgrade a prompt without breaking tool-call schemas that depend on it. A team that lacks prompt architecture ships a system that works in demo and fails in audit.

The prompt request path

An architect reasons about prompts as a pipeline. A single user-facing request passes through a sequence of stages before it reaches the model, and through a symmetric sequence on the way back. Each stage is a place where the architect decides what safety, composition, or validation logic belongs.

Input sanitization. The raw user input is inspected for malformed structure, obvious policy violations, and injection markers. Sanitization does not mean stripping every prompt-injection payload — adversaries will find new ones — but it does mean rejecting trivially malicious inputs before any LLM cost is incurred. OWASP LLM01 identifies prompt injection as the first threat class; a sanitization stage that blocks known control-character payloads, inputs that exceed the length cap, and obvious jailbreak patterns is the first line of defense.[1]
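A minimal sanitization stage can be sketched in Python; the length cap, control-character class, and jailbreak patterns below are illustrative placeholders, not a complete denylist:

```python
import re

MAX_INPUT_CHARS = 8_000  # hypothetical per-request cap

# Control characters (excluding tab and newline) plus a few well-known
# jailbreak phrasings; real deployments maintain a larger, updated list.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
KNOWN_JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now unrestricted", re.IGNORECASE),
]

def sanitize_input(raw: str) -> str:
    """Reject trivially malicious input before any LLM cost is incurred."""
    if len(raw) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds length cap")
    if CONTROL_CHARS.search(raw):
        raise ValueError("control characters in input")
    for pattern in KNOWN_JAILBREAK_PATTERNS:
        if pattern.search(raw):
            raise ValueError("known jailbreak pattern")
    return raw
```

Rejection here is cheap: no registry lookup, no template assembly, and no model call happens for inputs that fail this gate.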

Template assembly. The sanitized input is merged with a prompt template fetched from the prompt registry. A prompt template is a structured document with named slots for system instructions, few-shot examples, retrieved context, tool schemas, conversation history, and user input. The assembly stage is where the template is resolved with the current request’s values. Templates carry version identifiers; the assembly stage records which template version was used for each request.
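Template assembly reduces to slot substitution plus version capture. In this sketch the template record, slot names, and version string are hypothetical:

```python
import string

# Hypothetical registry record: a versioned template with named slots.
TEMPLATE = {
    "id": "support-answer",
    "version": "3.2.0",
    "body": string.Template(
        "SYSTEM:\n$system\n\nCONTEXT:\n$context\n\nUSER INPUT (untrusted):\n$user_input"
    ),
}

def assemble_prompt(template: dict, *, system: str, context: str, user_input: str) -> dict:
    """Resolve the template's slots and record which version was used."""
    composed = template["body"].substitute(
        system=system, context=context, user_input=user_input
    )
    return {
        "prompt": composed,
        "template_id": template["id"],
        "template_version": template["version"],  # logged per request
    }
```

Note that the returned record carries the template version alongside the composed prompt, so the per-request version capture the registry section describes falls out of the assembly stage for free.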

Pre-call filter. The assembled prompt passes through a content-policy filter before the model call. This stage detects context that should never reach the model — leaked secrets from the retrieval layer, PII that the policy redacts, profane or disallowed instructions accidentally included in the system prompt. A pre-call filter is not the same as input sanitization: sanitization runs on user input, the pre-call filter runs on the full composed prompt.
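A pre-call filter over the full composed prompt might look like the following sketch; the two regexes stand in for the dedicated secret and PII scanners a production system would use:

```python
import re

# Hypothetical patterns; production filters use purpose-built scanners.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),    # API-key-shaped strings
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-SSN-shaped strings
]

def pre_call_filter(composed_prompt: str) -> str:
    """Redact content that must never reach the model.

    Runs on the full composed prompt, after template assembly — not on the
    raw user input, which input sanitization has already handled.
    """
    filtered = composed_prompt
    for pattern in SECRET_PATTERNS:
        filtered = pattern.sub("[REDACTED]", filtered)
    return filtered
```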

Model call. The model generates a response. The architect treats this as an opaque but logged step; tracing captures token counts, latency, and the full prompt and response pair (subject to the observability plane’s retention policy).

Post-call filter. The model’s response passes through another content-policy filter before it reaches downstream consumers. The post-call filter catches unsafe content the model produced despite the guardrails in the prompt, PII the model generated or regurgitated, and policy violations that the application’s business context would otherwise accept silently.

Structured-output validator. For any response that must conform to a schema — JSON for an API, a specific enum for a decision, a tool-call shape — the validator rejects responses that do not match and either retries with corrective instructions or returns an error. This stage implements OWASP LLM05 (improper output handling) as a structural defense rather than a behavioral hope.[2]
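A validator for an enum-constrained decision response can be sketched as follows; the field names and allowed values are illustrative, and the retry-with-corrective-instructions loop is left to the caller:

```python
import json

ALLOWED_DECISIONS = {"approve", "deny", "escalate"}

def validate_decision(raw_response: str) -> dict:
    """Reject model responses that do not match the expected schema.

    The caller either retries with corrective instructions or returns an
    error; this sketch just raises on mismatch.
    """
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}")
    if not isinstance(parsed, dict) or set(parsed) != {"decision", "reason"}:
        raise ValueError("unexpected fields in response")
    if parsed["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"decision {parsed['decision']!r} not in allowed enum")
    return parsed
```

The structural point is that no response reaches a downstream consumer unless it parses and matches the enum — the defense does not depend on the model behaving.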

Response finalization. The filtered, validated response is returned to the orchestration plane for routing back to the client, with the full trace emitted to the observability plane.

[DIAGRAM: StageGateFlow — aite-sat-article-3-prompt-path — Horizontal stage-gate flow: user input → input sanitization → template assembly → pre-call filter → model → post-call filter → structured-output validator → response. Each stage shows the artifact it produces (sanitized input, composed prompt, filtered prompt, model response, filtered response, validated response) and the log record it emits.]
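The stage-gate flow above can be sketched as a generic stage chain in which each stage enriches the request and appends a trace record for the observability plane; the two toy stages here stand in for the real ones:

```python
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], request: dict) -> dict:
    """Pass the request through each stage in order, recording a trace."""
    for stage in stages:
        request = stage(request)
        request.setdefault("trace", []).append(stage.__name__)
    return request

# Toy stages standing in for the real sanitization and assembly stages.
def sanitize(req: dict) -> dict:
    req["input"] = req["input"].strip()
    return req

def assemble(req: dict) -> dict:
    req["prompt"] = f"SYSTEM: be helpful\nUSER: {req['input']}"
    return req
```

The trace list mirrors the log record each stage emits in the diagram, so a failed request shows exactly which gate it last passed.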

The prompt registry

The prompt registry is the database of every prompt the system uses, with full version history. It is the architect’s single source of truth for the question “what prompt produced this response?” A registry contains: prompt identifier, version number, author, approver, creation and modification timestamps, the prompt template itself, tool schemas the prompt binds to, safety annotations, evaluation results, intended model or model family, and deployment status (draft, staged, production, retired).

The registry is not a file system. Storing prompts in a Git repository is a helpful starting point for engineering teams but fails three architecture requirements. First, it binds prompt changes to application deploys; a prompt-only fix cannot ship without a full application release. Second, it makes runtime version selection hard; rolling back one prompt without rolling back unrelated code requires partial reverts that are error-prone. Third, it does not record the per-request version actually used; a Git commit tells you what was available, not what ran. A prompt registry is a separate service (or a separate schema within the orchestration plane’s database) that the application consults at runtime and that records the version per request.

LangChain’s LangSmith service, Humanloop’s platform, and Langfuse’s open-source stack each offer prompt registry features; an architect building on raw open-source can implement the same shape with a versioned table in PostgreSQL or a document store with strict write-once semantics.[3] The choice of tool is a build-versus-buy question (Article 26); the requirement of a prompt registry is not.
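The versioned-table shape can be sketched with an in-memory SQLite table (the same DDL shape works in PostgreSQL); the column names and status enum follow the registry fields described above but are illustrative:

```python
import sqlite3

# Minimal write-once prompt registry: a (prompt_id, version) primary key
# means a version, once written, is never overwritten — only superseded.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE prompt_registry (
        prompt_id TEXT NOT NULL,
        version   TEXT NOT NULL,
        body      TEXT NOT NULL,
        status    TEXT NOT NULL
                  CHECK (status IN ('draft', 'staged', 'production', 'retired')),
        PRIMARY KEY (prompt_id, version)
    )
""")
conn.execute(
    "INSERT INTO prompt_registry VALUES (?, ?, ?, ?)",
    ("support-answer", "3.2.0", "You are a support agent...", "production"),
)

def production_version(conn, prompt_id: str) -> str:
    """Runtime lookup: which version of this prompt is live right now?"""
    row = conn.execute(
        "SELECT version FROM prompt_registry"
        " WHERE prompt_id = ? AND status = 'production'",
        (prompt_id,),
    ).fetchone()
    return row[0]
```

The primary-key constraint is what makes the table write-once: an attempt to rewrite an existing version fails at the database layer rather than relying on application discipline.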

Prompt patterns the architect must recognize

Five prompt patterns appear repeatedly. Each has a different architectural footprint.

| Pattern | What it does | Architectural footprint |
| --- | --- | --- |
| Role prompt | System-level instruction that frames the model as a specific persona or assistant | Stored once, imported by many task prompts; rarely changes |
| Few-shot template | Task-specific instruction plus labeled examples | Stored per task; examples rotate with evaluation |
| Chain-of-thought prompt | Instructs the model to think step-by-step before answering | Increases token cost; must be budgeted |
| Schema-constrained prompt | Specifies an exact output format (JSON, enum, function-call) | Requires structured-output validator |
| Self-consistency / ensemble prompt | Runs the same prompt N times and aggregates | Multiplies cost; used for high-stakes decisions |

The architect’s concern is less the wording than the footprint: a schema-constrained prompt needs a validator stage; a self-consistency ensemble needs a budget cap; a chain-of-thought prompt increases output tokens and must be monitored against the per-query cost target from Article 17.
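A self-consistency ensemble with a hard budget cap can be sketched as majority voting over repeated calls; `call_model` is a stand-in for the real model client:

```python
from collections import Counter

def self_consistency(call_model, prompt: str, n: int, max_calls: int = 5) -> str:
    """Run the same prompt N times and take the majority answer.

    The max_calls cap is the architectural footprint: the ensemble cannot
    multiply cost beyond the budget, whatever the caller requests.
    """
    n = min(n, max_calls)
    answers = [call_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```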

Injection defense is an architecture problem

Prompt injection — OWASP LLM01 — is not a problem a single stage can solve. Adversaries will find new phrasings that bypass any specific sanitizer. Defense is layered. The client plane rejects malformed or oversized input. The orchestration plane assembles templates in a way that separates trusted from untrusted content (system instructions first, retrieved context marked, user input clearly delimited). The knowledge plane authenticates the source of retrieved passages so that poisoned documents can be traced (OWASP LLM08).[4] The orchestration plane runs a pre-call filter that rejects prompts whose composed shape looks adversarial. The model plane is treated as untrusted — the architect assumes the model will sometimes obey injected instructions. The orchestration plane runs a post-call filter that rejects responses that look like successful injection (a response that changes role, that outputs disallowed content, that exceeds the authorized tool scope). The observability plane logs every layer so that a successful injection can be traced to the defense that failed.
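The post-call layer of that defense might be sketched as follows; the out-of-role and disallowed-content patterns are crude placeholders for the classifiers a real deployment would run:

```python
import re

# Hypothetical out-of-role signals for a customer-facing assistant; real
# deployments use trained classifiers, not regexes alone.
OUT_OF_ROLE = [
    re.compile(r"as an? (unrestricted|jailbroken) (ai|model)", re.IGNORECASE),
    re.compile(r"\bmy real instructions are\b", re.IGNORECASE),
]
DISALLOWED = re.compile(r"\blegally binding\b", re.IGNORECASE)  # placeholder

def post_call_filter(response: str) -> str:
    """Reject model responses that look like a successful injection."""
    for pattern in OUT_OF_ROLE:
        if pattern.search(response):
            raise ValueError("out-of-role response blocked")
    if DISALLOWED.search(response):
        raise ValueError("disallowed commitment blocked")
    return response
```

Because this layer runs on the response, it catches injections that slipped past every upstream gate — the last filter before the user sees anything.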

Two real cases illustrate what happens when layers are missing.

DPD UK chatbot, January 2024. The delivery company DPD’s public chatbot was induced by a customer to swear and disparage the company; the exchange went viral after the BBC reported it.[5] The architectural failure is in the input-sanitization and pre-call filter stages. The chatbot had no layered defense; a single system prompt was expected to hold against adversarial user input, and it did not. After the incident DPD disabled the AI function; an architect with a pre-call filter and a post-call filter would have detected the out-of-role behavior before the responses reached the user.

Chevrolet of Watsonville $1 Tahoe, December 2023. A customer interacting with the dealership’s chatbot induced the bot to agree to sell a 2024 Chevy Tahoe for one dollar and to frame the offer as legally binding.[6] The architectural failure is in the structured-output validator stage and in tool-call authorization. The chatbot had been configured to engage freely without boundaries on the kinds of commitments it could make; a structured-output validator that constrained responses to a narrow set of transactional claim types, and a tool-call authorization layer that would have refused to commit to a price on behalf of the business, would have prevented the exchange.

Neither failure is a prompt-engineering failure alone. Both are prompt-architecture failures.

Versioning and rollback

A prompt that ships to production must be rollbackable. The registry records every prior version; the orchestration plane can target a specific version per request; the canary infrastructure (Article 19) routes a small percentage of traffic to the candidate version before it replaces the incumbent. Rollback is a one-field change: flipping the deployment-status flag back from “production” to “staged” and promoting the prior version.

Rollback requires backward-compatible tool schemas. If a new prompt version depends on a new tool schema, the tool implementation must accept both schemas during the transition, or the rollback will break unrelated code paths. This is the same discipline that protects database schema migrations: new code must work with the old schema before the old code can be retired. An architect who forgets this ships a prompt change that looks rollbackable until the moment the rollback is needed, and then discovers that the schema change was an unrecognized dependency.
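The both-schemas discipline can be sketched with a hypothetical pricing tool that accepts the old flat call schema and the new nested one during the transition; the field names are illustrative:

```python
def lookup_price(args: dict) -> dict:
    """Tool implementation that honors both call schemas during a
    prompt-version transition, so rollback stays safe.

    Old schema (prompt v1): {"sku": "..."}
    New schema (prompt v2): {"product": {"sku": "...", "region": "..."}}
    """
    if "product" in args:  # new schema from the candidate prompt version
        sku = args["product"]["sku"]
        region = args["product"].get("region", "default")
    else:                  # old schema from the incumbent prompt version
        sku = args["sku"]
        region = "default"
    return {"sku": sku, "region": region, "price_usd": 49_999}
```

Only after the old prompt version is retired — and can no longer be rolled back to — may the old-schema branch be deleted, exactly as with a database schema migration.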

[DIAGRAM: BridgeDiagram — aite-sat-article-3-engineering-architecture-bridge — A horizontal bridge from “Prompt engineering concerns” (left) listing wording, examples, chain-of-thought, task metrics, and single-prompt evaluation, crossing through a middle “shared boundary” (prompt evaluation harness, version control, feedback from production), to “Prompt architecture concerns” (right) listing template registry, version history, injection defense layers, tool schema contracts, and fleet-wide rollback strategy.]

Regulatory alignment

Prompt architecture is the evidence base for EU AI Act Article 12 (record-keeping) and Article 14 (human oversight).[7] A high-risk AI system must keep logs that allow after-the-fact analysis of its operation; a prompt registry with per-request version capture is the natural implementation of that obligation. Human oversight is enabled by the ability to inspect the exact prompt that ran for a given request; a registry that records composed prompts (subject to privacy redaction) gives the oversight function something to oversee. ISO/IEC 42001:2023 Clause 8.3 requires lifecycle management for AI systems; prompts are lifecycle artifacts whose changes are lifecycle events.[8] NIST AI RMF MANAGE 1.3 requires resource management for AI risks; the prompt registry is one of the resources that has to be managed.[9]

Prompt architecture across stacks

The same prompt architecture works on closed-weight managed APIs (Anthropic Claude via the Messages API, OpenAI GPT-class via chat completions, Google Gemini via Vertex) as on open-weight self-hosted models (Llama 3 or Mistral served through vLLM). The orchestration layer that manages the registry, assembles templates, and runs filters is independent of which model receives the composed prompt. An architect who builds the registry and the filter chain into the orchestration plane, rather than into model-specific adapters, gets portability for free: changing the underlying model is an adapter change, not a registry change.

Orchestration frameworks help or hurt. LangChain’s prompt templates and LangGraph’s state-graph primitives cover assembly and version-aware routing; LlamaIndex’s prompt classes cover template management for retrieval-heavy applications; DSPy’s signature abstraction pushes prompt structure into code that can be tested; Semantic Kernel’s plan handlers provide similar shape in the .NET ecosystem; Haystack’s prompt-node abstractions serve Python teams that prefer an older pattern. Each framework is a tool in the orchestration plane; none is the architecture. An architect who picks one framework and cannot articulate the same responsibilities in another has confused the tool with the job.

Summary

Prompts are interfaces. They need contracts, versioning, and input validation. Prompt architecture is the production discipline that sits above prompt engineering: a request path with sanitization, template assembly, pre-call filter, model call, post-call filter, and structured-output validator; a prompt registry that records every version; a defense-in-depth stance on prompt injection that does not rely on any one stage; a rollback strategy that treats prompts as versioned artifacts. The DPD and Chevrolet cases show what happens when the architecture layers are missing. The regulatory record-keeping expectations of the EU AI Act, ISO 42001, and NIST AI RMF are satisfied by the same prompt registry that serves the engineering team. The architecture is independent of the model and of the framework; changing either is an adapter change, not an architecture change.

Further reading in the Core Stream: Generative AI and Large Language Models, Safety Boundaries and Containment for Autonomous AI, and Grounding, Retrieval, and Factual Integrity for AI Agents.



© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.

Footnotes

  1. OWASP Top 10 for Large Language Model Applications, version 2025, LLM01 Prompt Injection. OWASP GenAI Security Project. https://genai.owasp.org/llm-top-10/ — accessed 2026-04-19.

  2. OWASP Top 10 for Large Language Model Applications, version 2025, LLM05 Improper Output Handling. OWASP GenAI Security Project. https://genai.owasp.org/llm-top-10/ — accessed 2026-04-19.

  3. Langfuse Prompt Management documentation. https://langfuse.com/docs/prompts — accessed 2026-04-19. LangSmith Prompt Hub documentation. https://docs.smith.langchain.com/prompt_engineering — accessed 2026-04-19. Humanloop Prompt Management documentation. https://humanloop.com/docs — accessed 2026-04-19.

  4. OWASP Top 10 for Large Language Model Applications, version 2025, LLM08 Vector and Embedding Weaknesses. OWASP GenAI Security Project. https://genai.owasp.org/llm-top-10/ — accessed 2026-04-19.

  5. “DPD parcel chatbot swears at customer.” BBC News, 19 January 2024. https://www.bbc.co.uk/news/technology-68025677 — accessed 2026-04-19.

  6. “A Chevy dealer added ChatGPT to help sales. A prankster got it to offer a Tahoe for $1.” Business Insider, 18 December 2023. https://www.businessinsider.com/chevy-dealership-chatgpt-chevy-tahoe-1-dollar-2023-12 — accessed 2026-04-19.

  7. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 (EU AI Act), Articles 12 and 14. Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — accessed 2026-04-19.

  8. ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system, Clause 8.3. International Organization for Standardization. https://www.iso.org/standard/81230.html — accessed 2026-04-19.

  9. Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, MANAGE function, Subcategory 1.3. National Institute of Standards and Technology, January 2023. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf — accessed 2026-04-19.