The LLM Risk Surface

FlowRidge

LLM Risk Surface — Six Concentric Layers

Figure 283. The model sits at the core; five surrounding layers each carry their own failure modes. The outermost context layer multiplies risk across all inner rings.

AITB-LAG: LLM Risk & Governance Specialist — Body of Knowledge Article 1 of 6

Classical application security grew up around a tidy mental model: code is the untrusted actor, data is the asset, users are the attackers, and the perimeter is the firewall. Classical model risk management grew up around a different tidy mental model: inputs are tabular, outputs are scalar, drift is measurable, and the model behaves like a function. Large language models break both models. They accept free text that is simultaneously code, data, and instruction. They emit outputs that are simultaneously answers, decisions, and new instructions for downstream systems. They retrieve context that can itself carry adversarial payloads. They invoke tools that can take real-world action. A governance practitioner who inherits those systems and tries to apply the old surface maps will miss most of the ways they fail. This article defines what a practitioner must look at instead.

Why the old surface maps miss

The Open Web Application Security Project’s Top 10 for Large Language Model Applications, 2025 revision, opens with a blunt observation: LLM applications produce failure modes that do not map neatly onto the classical OWASP Top 10 for web applications¹. The National Institute of Standards and Technology makes a comparable point in its Generative AI Profile of the AI Risk Management Framework, NIST AI 600-1. Of the twelve risk categories the Profile catalogs as either unique or amplified by generative systems, more than half have no direct analogue in pre-generative risk taxonomies². Confabulation, information-integrity failure, prompt-driven value-chain contamination, and human-AI configuration errors are not extensions of SQL injection or of model drift. They are new things.

The consequence for a governance practitioner is concrete. A risk register built from the classical OWASP list or from an existing model-risk policy will enumerate password strength, session hijacking, feature drift, and backtest accuracy, then stop. The register will have nothing to say about a retrieval store that has been quietly poisoned with adversarial content, or about a tool-using assistant that follows an instruction hidden inside a meeting invite, or about a chatbot that makes a binding commitment to a customer on behalf of the organization. These are not edge cases; they are the failure modes that have produced the most publicly damaging LLM incidents of the past three years.

A new mental model is required. The one this credential uses is the LLM risk surface: six interacting layers at which governance must apply controls, each with its own failure modes and its own evidence.

The six layers

The layers wrap the model in concentric rings. The innermost ring is the model itself. The outer rings are the surfaces the model touches when the feature operates. The outermost boundary is the organizational and user context in which the whole thing runs.

Input layer. Everything the model reads before it generates. The most obvious input is the user’s message. Less obvious inputs include the system prompt that frames the model’s role, any few-shot examples embedded in that prompt, conversation history, and (critically) content retrieved from other sources and injected into the prompt window. The input layer is where prompt injection lives, in both its direct form (the user types an attack) and its indirect form (the attack arrives in a document or tool output that the model later reads).

Model layer. The weights and the training that produced them. The practitioner rarely owns this layer but is always accountable for it. Questions at this layer include: which provider trained the model, on what data, with what safety training, under what licensing, and with what disclosed limitations. When the model memorizes and regurgitates training data, that is a model-layer failure. When the model refuses an otherwise reasonable request because of over-tuned safety training, that is also a model-layer issue.

Output layer. What the model produces and how that output is handled. The output can be text shown to a user, structured data parsed by downstream code, a function-call payload, or a stream consumed by another model. The output layer is where confabulation becomes visible, where system prompt leakage happens when the model is coaxed to reveal its own instructions, and where content-safety failures manifest.

Retrieval layer. Any system that fetches content for the model to consume. Vector databases, search indexes, document stores, web fetchers, email APIs, and knowledge graphs all sit at this layer. The retrieval layer is where a benign-looking LLM feature silently acquires an attack surface the size of the entire source corpus. A retrieval-augmented generation (RAG) pipeline that indexes internal wikis now inherits every risk those wikis carry, including the possibility that someone has planted injection content inside them.

Tool layer. Any function the model can invoke. Email send, database update, calendar write, code execution, file read, web browse, and internal API call are all tool-layer actions. The tool layer is where excessive agency becomes catastrophic. A model that can answer questions is a different risk profile from a model that can send emails on the user’s behalf. A model that can send emails is a different risk profile from one that can also schedule meetings and approve expense reports.

Data layer. The data that flows into the feature at training time, at retrieval time, and at inference time, plus the data that flows out. Training data memorization, sensitive data exposure in prompts, prompt and completion logs that contain personal information, and downstream storage of model outputs all sit here. The Italian data-protection authority’s 2023 investigation of OpenAI found lawful-basis, transparency, and minors-protection issues at exactly this layer; the 2024 sanction decision quantified it at fifteen million euros³.

The six layers are wrapped by the outer context: the organization deploying the feature, the users interacting with it, the regulators supervising it, and the adversaries probing it. A full risk-surface brief treats each of the inner six explicitly and considers the context as a multiplier.

What the surface looks like in three real deployments

The surface is easier to understand when traced across contrasting deployments.

A customer-service chatbot. All six layers are present. A customer message populates the input layer. The model is a commercial general-purpose model reached over an API. Outputs reach the customer directly. Retrieval pulls from the organization’s policy knowledge base. Tools may include ticket creation or refund authorization. Data flows include personally identifiable information in both directions. The British Columbia Civil Resolution Tribunal decision in Moffatt v. Air Canada showed what happens when the output layer confabulates a policy and the organization has not built compensating controls. The tribunal rejected the argument that the chatbot was a separate legal entity; liability attached to the deployer⁴.

A read-only internal research copilot. Input layer active, model layer active, output layer active, retrieval layer heavily loaded against internal documents, tool layer minimal (read-only), data layer sensitive because the retrieval corpus contains proprietary information. The Samsung source-code disclosure in April 2023 illustrated the failure mode here: engineers pasted confidential code into a public-facing model, making the data layer the primary exposure and the organizational boundary the primary control⁵. The model itself did nothing adversarial; the surface map simply placed private data on the wrong side of the user-context ring.

An agentic assistant with calendar and email tools. All six layers are active and the tool layer is dominant. This is the configuration in which indirect prompt injection stops being theoretical. An email landing in the user’s inbox can carry instructions that the assistant reads when summarizing the day. If the assistant has send-email or forward-email tools, those instructions can become actions. This is the failure class that MITRE ATLAS catalogs under indirect injection via untrusted content sources⁶.

Notice that the surface does not depend on the vendor. The same six layers describe a feature built on a closed-weight managed API (an OpenAI, Anthropic, or Google deployment), on an open-weight self-hosted stack (a Llama, Mistral, or Qwen deployment), or on a hybrid (managed API plus self-hosted retrieval plus self-built tooling). What changes between stacks is which party owns which layer and which evidence the practitioner can obtain. Neutrality of the surface is the point.

The risk-surface brief

A practitioner’s first artifact for any LLM feature is a one-page risk-surface brief. It lists the six layers on the left and, for each, records: what inhabits the layer in this feature, who owns it, what the known failure modes are, what controls are in place or proposed, and what evidence demonstrates those controls. The brief is deliberately short. It is meant to fit on a single page so that product managers, engineers, security reviewers, privacy officers, and auditors can hold a shared mental model in a single conversation.

The brief also records what is absent. A chatbot without tools still has a tool-layer row: empty, intentionally. An internal feature without external retrieval still has a retrieval-layer row: empty, intentionally. Recording emptiness is the discipline that keeps teams from unconsciously adding surface between cycles. The day the tool-layer row changes from empty to “can send email”, the brief must be re-reviewed.

The risk-surface brief is the practitioner’s entry point into every other article in this credential. Article 2 will unpack what can go wrong at the input layer and how to mitigate it. Article 3 will unpack the output layer. Article 4 will cover the layered guardrail architecture that wraps the model. Article 5 will cover how to evaluate whether the controls hold. Article 6 will cover the regulatory obligations that apply to the organization as a whole, obligations that the brief must keep current as the feature and the law both evolve.

Summary

The LLM risk surface is six interacting layers (input, model, output, retrieval, tool, and data) wrapped by user and organizational context. The surface is deliberately vendor-neutral: it applies equally to a managed-API deployment and to a self-hosted open-weight one. A practitioner who can name what inhabits each layer, who owns it, and what controls it is ready to receive the rest of this credential. Everything that follows maps back to this surface, and every incident studied by regulators and courts over the past three years can be localized on it.

Further reading in the Core Stream: AI Risk Identification and Classification, Generative AI and Large Language Models, and The AI Governance Imperative.

OWASP Top 10 for Large Language Model Applications, 2025. OWASP Foundation. https://genai.owasp.org/llm-top-10/ — accessed 2026-04-19. ↩
Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 2024. National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf — accessed 2026-04-19. ↩
Provvedimento del 30 dicembre 2024 (n. 755) — sanction against OpenAI following Provvedimento 112/2023 investigation. Garante per la protezione dei dati personali. https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/10085455 — accessed 2026-04-19. ↩
Moffatt v. Air Canada, 2024 BCCRT 149. British Columbia Civil Resolution Tribunal. https://decisions.civilresolutionbc.ca/crt/sc/en/item/525448/index.do — accessed 2026-04-19. ↩
Siladitya Ray. Samsung Bans ChatGPT Among Employees After Sensitive Code Leak. Forbes, 2 May 2023. https://www.forbes.com/sites/siladityaray/2023/05/02/samsung-bans-chatgpt-and-other-chatbots-for-employees-after-sensitive-code-leak/ — accessed 2026-04-19. ↩
MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems). MITRE Corporation. https://atlas.mitre.org/ — accessed 2026-04-19. ↩

Why the old surface maps miss

The six layers

What the surface looks like in three real deployments

The risk-surface brief

Summary

Footnotes