Security Architecture for AI Applications

FlowRidge

AITE-SAT: AI Solutions Architect Expert — Body of Knowledge Article 14 of 35

An AI application is a new kind of attack surface attached to a classical application stack. The classical attack surface is unchanged — injection, broken authentication, misconfiguration, insecure deserialization — but the AI layer adds a category of attack the application security team has not seen before: a natural-language instruction that, when embedded in data the model reads, can redirect the model’s behavior to something the user did not intend and the architect did not authorize. Prompt injection is the foundational new threat. Data exfiltration through retrieval, model-output poisoning, excessive agency through tool use, and model-supply-chain attacks follow from it. This article gives the AITE-SAT learner a defense-in-depth architecture anchored in the OWASP Top 10 for LLM Applications and the MITRE ATLAS adversarial-AI threat taxonomy, with concrete controls at each layer and a reference threat model the architect can adapt.

The two reference taxonomies

OWASP Top 10 for LLM Applications (2025 edition) enumerates the ten highest-impact risks in LLM-augmented applications: prompt injection (direct and indirect), sensitive information disclosure, supply-chain vulnerabilities, data and model poisoning, improper output handling, excessive agency, system-prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.¹ The list is shorter than application-security frameworks because it focuses on the AI-specific overlay; it does not replace OWASP’s classic Top 10, it extends it.

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the MITRE-maintained knowledge base of adversary tactics and techniques observed against AI systems.² It is modelled on MITRE ATT&CK and uses the same tactic-technique structure — Reconnaissance, Resource Development, Initial Access, ML Model Access, Execution, Persistence, Defense Evasion, Credential Access, Discovery, Collection, Exfiltration, Impact. ATLAS provides the adversary-perspective framing that OWASP’s list does not, and the two together give the architect both the defensive and offensive views.

The architect keeps both the OWASP risks and the ATLAS techniques visible when designing controls so that each risk has an identified adversary capability, and each adversary capability has an identified defensive control.

The five-layer defense model

AI defense-in-depth layers the architecture from outside to inside.

Layer 1 — Input defense

The input layer receives user content, tool outputs, retrieved documents, and any other data that will become part of the prompt. Threats: direct prompt injection from the user, indirect prompt injection from retrieved or tool-provided content, malicious payloads in uploaded files, adversarial text that bypasses safety filters.

Controls: input validation (length limits, character-class constraints, schema validation for structured inputs), source-tagging so the model prompt distinguishes user content from retrieved content and tool content, content scanning (known-injection pattern libraries, NeMo Guardrails-style classifiers, Rebuff and similar open-source injection detectors).³ The architect cannot block all prompt injections at this layer because the attack surface is natural language, but raising the cost and flagging the attempts produces a deterrent and a detection signal.

Layer 2 — Retrieval defense

The retrieval layer fetches content from a corpus and supplies it to the model. Threats: indirect prompt injection in retrieved content, corpus-poisoning attacks where attacker-controlled content enters the corpus through legitimate ingestion channels, data exfiltration where the user induces the model to return corpus content the user is not authorized to see.

Controls: corpus curation and source authority (only authoritative content enters the corpus; user-generated content is tagged distinctly; external web-scraped content is held to a higher scrutiny bar), tenant and role-based metadata filtering at retrieval time (Article 6 developed the pattern), content-hash integrity checks so corpus tampering is detectable, and retrieval-output sanitization that strips instruction-like content from retrieved passages before they enter the prompt.

Layer 3 — Model defense

The model layer is the inference call itself. Threats: model jailbreaks that bypass the provider’s safety training, model-prompt leakage where the system prompt is recovered by the user, hallucinated content presented with false authority, over-reliance on a model that was updated without the team’s notice.

Controls: provider selection that includes a provider-security review (provider safety evaluations, incident history, responsible-disclosure posture), model-version pinning so upgrades are deliberate rather than silent, defense-in-prompt (restating safety constraints at the end of the prompt, clearly delimiting user content, using explicit refusal instructions for out-of-scope queries), and output-logit monitoring for anomalous patterns when self-hosted deployments permit it.

Layer 4 — Tool defense

The tool layer is where the model’s outputs become actions. Article 7 developed this at depth. Threats: excessive agency (OWASP LLM08), tool-call injection from compromised inputs, blast-radius failures when a tool has more capability than the use case requires.

Controls: minimum-agency schema design, pre-execution authorization checks per tool call, post-execution validation of tool outputs, human-in-the-loop gates for high-risk tools, and blast-radius containment through sandbox-like environments (staging targets, draft modes, reversibility requirements).

Layer 5 — Output defense

The output layer is the model’s response on its way to the user. Threats: unsafe content returned to the user (toxicity, bias, misinformation, regulated-advice without disclaimers), PII leakage from content the model should not have disclosed, prompt-injection attempts that reach the user and cause downstream harm in systems the output is piped into (log-injection, spreadsheet-injection, browser-executed JavaScript).

Controls: output safety classification (toxicity and bias classifiers, regulated-content classifiers), PII redaction on the output side, output escaping for downstream contexts (HTML encoding for browser rendering, spreadsheet-neutralization for CSV exports), and response-signing or watermarking where provenance matters downstream.

[DIAGRAM: ConcentricRingsDiagram — aite-sat-article-14-defense-rings — Concentric rings with the model at the core. Innermost ring: “Model defense (version pinning, prompt hardening, provider selection)”. Next ring outward: “Tool defense (minimum agency, pre/post validation, sandboxing)”. Next: “Retrieval defense (curation, metadata filter, sanitization)”. Next: “Input defense (validation, source tagging, injection scan)”. Next: “Output defense (safety classification, PII redaction, downstream escaping)”. Outermost ring: “Network and perimeter (classical application security)”. Each ring labelled with three representative controls and the OWASP LLM Top 10 items it addresses.]

The threat model template

Every AI system gets a threat model, even if short. The template the AITE-SAT learner produces has six sections:

System boundary. What components are in scope (user interface, orchestration, retrieval, model, tools, data stores) and what components are treated as trusted dependencies (managed model provider, cloud infrastructure).

Data flows. The paths data takes through the system, tagged by sensitivity class (public, internal, confidential, regulated).

Trust boundaries. Where data crosses from one trust zone to another (user to application, application to retrieval corpus, application to model provider, model output to downstream systems).

Threats by OWASP LLM Top 10 item. Each of the ten items is considered against the system; applicable threats are described; not-applicable items are marked so with a reason.

Controls by defense layer. Each threat is mapped to the layer’s control that addresses it.

Residual risk and monitoring plan. Risks that cannot be fully mitigated are accepted with named mitigations, monitoring signals, and escalation plans.

A threat model that fits on three or four pages is reviewable in an architecture review; a model that runs twenty pages is too big to maintain and signals that the system is trying to do too much with one design.

[DIAGRAM: MatrixDiagram — aite-sat-article-14-owasp-layer-heatmap — A matrix with OWASP LLM Top 10 items on the vertical axis (LLM01 Prompt Injection through LLM10 Unbounded Consumption) and the five defense layers on the horizontal axis (Input, Retrieval, Model, Tool, Output). Cells colour-coded by coverage strength — primary layer that addresses the threat (dark green), supporting layer (light green), incidental coverage (yellow), no coverage (gray). Annotations in each cell name the specific control at the intersection.]

Security evaluation and red teaming

Security controls are evaluated the same way quality controls are — against a named adversarial suite on a cadence. The safety suite from Article 11 includes an injection-attempt subset. The team augments this with periodic red-team exercises: a named adversary profile (external user, insider, compromised tool provider) tries to exploit the system within a scoped time-box. Internal red teams find what automated suites miss; external red teams find what internal red teams are not motivated to find. AI Village’s DEFCON red-team exercises and the MITRE ATLAS technique catalog give the public references for how to structure these exercises.⁴

Supply-chain risk

A new class of supply-chain risk applies to AI systems. The model itself is a supply-chain component — a compromised model weight (backdoored during training by an attacker with sufficient training-data access) or a maliciously modified quantization of a public model can introduce targeted misbehavior invisible to ordinary testing. Embedding models, reranker models, classifier models, and tokenizers are all supply-chain components. Orchestration frameworks, vector-store clients, and observability SDKs are classical software supply-chain components.

The architect treats the model-supply-chain as a governed dependency: model weights are downloaded only from authenticated, attested sources; checksums are verified; vendor security posture is part of the model-selection decision; self-hosted open-weight models are scanned for known indicators the way container images are. The Hugging Face model-provenance efforts (signed artifacts, vendor verification) and the broader OpenSSF work on supply-chain integrity are the reference material for the state of the art.⁵

Two real-world examples

Samsung source-code disclosure, April 2023. A widely reported incident where Samsung employees pasted proprietary source code and other confidential material into a public chatbot for assistance.⁶ The code passed out of Samsung’s trust boundary into the provider’s training and retention pipeline. Samsung subsequently restricted employee use of public generative-AI tools. The architectural point for the AITE-SAT learner is that the input-layer defense includes preventing the wrong data from leaving the trust boundary at all. A sanctioned enterprise AI path with clear data-handling guarantees reduces the incentive for employees to use unsanctioned paths; a blocking policy without a sanctioned alternative pushes the problem underground rather than solving it.

NYC MyCity chatbot, March 2024. The Markup reported that New York City’s chatbot for small-business guidance returned legally incorrect answers on topics including landlord-tenant law and employment regulations.⁷ The errors were not security breaches; they were output-layer failures where the model confidently produced advice that the city’s legal framework did not actually say. The architectural point is that output-layer defense is not only about blocking obviously unsafe content — it is about bounding the categories of claims the model is authorized to make. A regulated-advice classifier at the output boundary flagging legal advice for mandatory disclaimer insertion or for retrieval-grounded answers only would have reduced the incident surface.

Prompt injection: the foundational new threat

Prompt injection deserves its own section because it is the threat that most distinguishes AI security from classical application security. The core mechanism is simple: content that arrives in the prompt — from the user, from a retrieved document, from a tool’s output, from an email the agent was asked to process — contains instructions that the model treats as authoritative because the model has no general-purpose way to distinguish instructions-to-follow from data-to-process. An attacker who controls any content that reaches the prompt can attempt to redirect the model’s behavior.

Direct prompt injection originates in the user’s own input. It is easy to observe because the user’s content is visible at request time, and input-layer defenses raise the cost meaningfully. Indirect prompt injection originates in content the user did not author — a retrieved document that contains adversarial text, an email forwarded to an agent that contains instructions to exfiltrate data, a webpage scraped by a browsing tool. Indirect injection is the harder class because the malicious content enters the prompt through a legitimate path and may be encountered for the first time during the user’s interaction. Greshake et al.’s 2023 paper and the ongoing research surrounding it are the reference material for the pattern and its defenses.⁸

The architect cannot eliminate prompt injection; they raise its cost, detect its attempts, bound its blast radius, and recover from its successes. Input-layer scanning, source-tagging of retrieved content, tool-output sanitization, minimum-agency tool design, and output-layer safety classification each reduce the attack surface. None of them is complete alone; the defense-in-depth composition is the point.

Regulatory alignment

EU AI Act Article 15 requires accuracy, robustness, and cybersecurity commensurate with the intended purpose and risk level of the high-risk system.⁹ Article 14 requires human oversight proportionate to the risk. Article 9 requires a risk-management system across the lifecycle, which includes identified AI-specific risks from OWASP and ATLAS. ISO/IEC 42001 Clauses A.8 and A.9 speak directly to operational controls and monitoring; the five-layer defense maps to those controls. NIST AI Risk Management Framework (AI RMF) Govern, Map, Measure, Manage functions provide the US-side counterpart; the architecture produced under either framework satisfies the other with minor mapping adjustments.

Summary

AI security is application security plus an AI-specific overlay. The OWASP Top 10 for LLM Applications and MITRE ATLAS are the two reference taxonomies; together they cover the defender and adversary views. Defense-in-depth across five layers — input, retrieval, model, tool, output — maps each OWASP item to the layer that primarily addresses it. The threat-model template fits on three to four pages and names system boundary, data flows, trust boundaries, threats, controls, and residual risk. Security evaluation runs alongside quality evaluation, and red-team exercises find what automated suites miss. Supply-chain risk is a first-class concern because models, tokenizers, and orchestration frameworks are all supply-chain components. Samsung’s source-code disclosure and NYC’s MyCity chatbot illustrate input-side and output-side failures respectively. Regulatory alignment with EU AI Act Articles 9, 14, 15 and ISO/IEC 42001 Annex A controls depends on the architect producing a documented, exercised defense architecture.

Further reading in the Core Stream: AI Security and Threat Modeling and Responsible AI by Design.

OWASP Top 10 for LLM Applications, 2025 edition. https://owasp.org/www-project-top-10-for-large-language-model-applications/ — accessed 2026-04-20. ↩
MITRE ATLAS (Adversarial Threat Landscape for AI Systems). https://atlas.mitre.org/ — accessed 2026-04-20. ↩
NVIDIA NeMo Guardrails. https://github.com/NVIDIA/NeMo-Guardrails — accessed 2026-04-20. Rebuff prompt-injection detector. https://github.com/protectai/rebuff — accessed 2026-04-20. ↩
AI Village DEFCON red-team exercises. https://aivillage.org/ — accessed 2026-04-20. MITRE ATLAS technique catalog. https://atlas.mitre.org/techniques — accessed 2026-04-20. ↩
Hugging Face model provenance and safety. https://huggingface.co/docs/hub/security — accessed 2026-04-20. OpenSSF supply-chain integrity. https://openssf.org/ — accessed 2026-04-20. ↩
Samsung source-code ChatGPT incident, widely reported April 2023. Bloomberg, “Samsung Bans Staff’s AI Use After Spotting ChatGPT Data Leak,” May 2023. https://www.bloomberg.com/news/articles/2023-05-02/samsung-bans-chatgpt-and-other-generative-ai-use-by-staff-after-leak — accessed 2026-04-20. ↩
Colin Lecher, “NYC’s AI Chatbot Tells Businesses to Break the Law,” The Markup, March 2024. https://themarkup.org/news/2024/03/29/nycs-ai-chatbot-tells-businesses-to-break-the-law — accessed 2026-04-20. ↩
Kai Greshake et al., “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” AISec 2023 (arXiv 2302.12173). https://arxiv.org/abs/2302.12173 — accessed 2026-04-20. ↩
Regulation (EU) 2024/1689, Articles 9, 14, and 15. Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj — accessed 2026-04-20. ISO/IEC 42001:2023, Annex A. https://www.iso.org/standard/81230.html — accessed 2026-04-20. NIST AI Risk Management Framework. https://www.nist.gov/itl/ai-risk-management-framework — accessed 2026-04-20. ↩