AITM M1.2-Art03 v1.0 Reviewed 2026-04-06 Open Access
M1.2 The COMPEL Six-Stage Lifecycle
AITF · Foundations

Output Structuring and Constrained Decoding

Transformation Design & Program Architecture — Applied depth — COMPEL Body of Knowledge.

12 min read · Article 3 of 14

AITM-PEW: Prompt Engineering Associate — Body of Knowledge Article 3 of 10


The most common production failure of a language-model feature is not a wrong answer. It is a right answer expressed in a shape the surrounding code cannot read. A prompt that returns "please find below the requested information", followed by the information, followed by a cheerful sign-off, cannot be parsed by a downstream component that expects a JSON object. The prompt works when a human reads it, and does not work when software reads it. Every practitioner who ships a feature that crosses a process boundary runs into this problem; this article covers the techniques that solve it and the failure modes that survive those techniques.

Why structured output is a governance concern

Output structuring is not only a developer-ergonomics concern. It is a governance concern because every integration between a language model and a downstream system is a place where misparsing can produce harm. A model that returns a dollar figure as a number might produce a valid authorization; the same model returning "approximately $37.50 (before tax)" produces a parsing exception that, depending on the surrounding code, might silently default to zero, or fail open, or drop the request. A customer-support assistant that classifies tickets into a set of categories must return exactly one of those categories spelled exactly as declared; a case-insensitive near-match in a downstream router is a data-quality liability waiting to surface. The NIST AI RMF Generative AI Profile places integration-surface hazards under information integrity risks and calls out brittle parsing as a specific failure mode[1].

The techniques in this article span a continuum from soft to hard. Soft techniques ask the model politely to produce a shape; hard techniques prevent the model from producing anything else at decoding time. Every technique has a cost and a failure mode. No technique removes the need for a fallback path.

JSON mode

JSON mode is the lightest of the structured-output techniques and the one most widely supported. The prompt declares that the response must be valid JSON and, optionally, supplies a schema the JSON must conform to. OpenAI documents a response_format parameter accepting json_object and json_schema values[2]; Google’s Gemini exposes a response_mime_type of application/json together with a response_schema[3]; Mistral’s managed API exposes a similar response_format field[4]. Self-hosted providers routinely expose the same capability through libraries like llama.cpp and vLLM.

JSON mode is sufficient for many production integrations. A prompt saying respond with a JSON object conforming to this schema, followed by the schema, and followed by the task instruction, produces a parseable object the majority of the time. The majority of the time is the precise reason JSON mode needs a fallback. The OpenAI Structured Outputs release of August 2024[5] tightened this by introducing schemas with strict conformance guarantees; the guarantee is real but narrow, and depends on the model and the parameter settings. For older models or mixed-provider deployments, conformance is statistical, not guaranteed.
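Even with JSON mode enabled, the application still parses defensively. A minimal sketch of that contract, assuming the raw response arrives as a string and using illustrative field names; returning None lets the caller route to a retry or fallback path:

```python
import json
from typing import Any, Optional

# Illustrative declared shape: field name -> expected Python type.
REQUIRED_FIELDS = {"ticket_category": str, "confidence": float}

def parse_json_mode_response(raw: str) -> Optional[dict]:
    """Parse a JSON-mode response; return None so the caller can take the fallback path."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output: route to retry or fallback
    if not isinstance(payload, dict):
        return None  # valid JSON, wrong top-level type
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload or not isinstance(payload[field], expected_type):
            return None  # missing required field or type mismatch
    return payload
```

The helper deliberately refuses to guess: anything short of the declared shape is treated as a failure, which keeps the fallback path honest.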

Function-call schemas

Function calling reframes structured output as a tool invocation. Instead of asking the model to return data in a shape, the prompt presents a function the model can call, and the model produces a function-call payload whose arguments conform to the function’s parameter schema. OpenAI introduced function calling in June 2023[6]; Anthropic’s equivalent is documented under tool use[7]; Gemini exposes function declarations[8]. The pattern has converged across providers because it fits naturally with the agentic workflows described in Article 5 and Article 6.

A practitioner gets structured output almost for free by framing the task as a function call even when there is no real external function. A classify_ticket function with a category parameter and a confidence parameter, called by the model, returns a structured payload the surrounding code parses without prose. The function need not be executed; its schema is the vehicle for the structure.
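A minimal sketch of such a declaration, in the OpenAI-style tools shape; the function name, categories, and bounds are illustrative, and the function is never executed, since its parameter schema is what carries the structure:

```python
# Illustrative tool declaration: the schema, not the function body, is the point.
CLASSIFY_TICKET_TOOL = {
    "type": "function",
    "function": {
        "name": "classify_ticket",
        "description": "Classify a support ticket into exactly one category.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": ["billing", "technical", "account", "other"],
                    "description": "The single best-fitting category.",
                },
                "confidence": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 1,
                    "description": "Model-reported confidence in the category.",
                },
            },
            "required": ["category", "confidence"],
        },
    },
}
```

The same declaration works whether or not a real classify_ticket implementation ever exists; the surrounding code simply reads the arguments from the model's function-call payload.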

Function-call payloads carry the same caveats as JSON mode. A model may hallucinate an argument value, skip a required field, or produce a type mismatch (a number when a string is declared). Strict schema enforcement reduces these errors but does not eliminate them. The production integration checks every payload against the declared schema before using it.

Grammar-constrained decoding

Grammar-constrained decoding operates at the decoding layer. Rather than asking the model to respect a shape, the decoder enforces it by restricting token choice at each step to tokens that keep the output within a declared grammar. The technique is used widely in open-source self-hosted stacks. Outlines and Guidance are two established open-source libraries for grammar-constrained generation[9][10]; the llama.cpp runtime exposes GBNF grammars natively. Managed APIs are beginning to expose equivalent capabilities under the structured-output umbrella.

Grammar constraints deliver the strongest guarantees. A grammar that admits only valid JSON conforming to a specific schema cannot produce a syntax error, because the decoder will refuse to emit a closing brace in the wrong place. The cost is complexity: writing a grammar for a rich schema is more effort than declaring a JSON schema, and the grammar must be updated whenever the schema changes. The technique is most attractive when the integration is high-stakes and the schema is stable.
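The mechanism can be illustrated with a toy prefix filter; real libraries compile a grammar into an automaton over the full token vocabulary, but the masking idea is the same. Everything here is illustrative:

```python
# Toy illustration of decoder-side masking: at each step, only tokens that keep
# the partial output a prefix of some allowed string survive. Real libraries
# (Outlines, Guidance, llama.cpp GBNF) play the same role with a compiled
# grammar automaton instead of a literal list of allowed strings.
ALLOWED = ['"billing"', '"technical"', '"account"']

def allowed_next_tokens(partial: str, vocab: list) -> list:
    """Return vocab tokens that keep partial+token a prefix of an allowed string."""
    return [
        tok for tok in vocab
        if any(a.startswith(partial + tok) for a in ALLOWED)
    ]
```

Because disallowed tokens never survive the mask, the decoder cannot emit output outside the declared set; this is why grammar constraints eliminate syntax errors rather than merely discouraging them.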

Choosing among the techniques

A practical decision rule: start with JSON mode using a schema, because it is cheap and well-supported. Move to a function-call schema when the task naturally reads as a function invocation or when the feature will later become agentic. Move to grammar-constrained decoding when the integration cannot tolerate a parsing failure and the deployment uses a stack where grammars are available. Use free-text prompting and parse with a validator only when the integration is tolerant and you have observability to catch regressions.

[DIAGRAM: StageGateFlow — aitm-pew-article-3-output-flow — Flow: model response -> schema validator -> repair attempt (one retry with validation error) -> fallback (default value or human review queue) -> downstream consumer.]

Across all four options, the output must be validated by the application before use. Validation checks two things: structure (is the response parseable into the declared type) and content (do the field values satisfy the business rules). A ticket_category field must be one of the declared enum values, even if the grammar says it is a string. A dollar_amount must be non-negative, even if the schema says it is a number. The model is a fluent generator of plausible shapes; business correctness is the application’s job.
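A minimal sketch of the content layer, assuming the two business rules named above and an illustrative category enum; structural validation is presumed to have already succeeded:

```python
# Illustrative enum; in practice this comes from the same source as the schema.
TICKET_CATEGORIES = {"billing", "technical", "account", "other"}

def validate_payload(payload: dict) -> list:
    """Check business rules the schema alone cannot express; return violations."""
    errors = []
    if payload.get("ticket_category") not in TICKET_CATEGORIES:
        errors.append("ticket_category not in declared enum")
    amount = payload.get("dollar_amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("dollar_amount must be a non-negative number")
    return errors
```

An empty list means the payload may proceed; a non-empty list is logged and routed to the fallback path, never silently corrected.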

Failure modes

Four failure modes are characteristic of structured output and each needs a named handling path.

Malformed output. The response is syntactically invalid JSON or a function call with missing arguments. The application’s first recourse is a single automated retry with the parsing error quoted back to the model. The OpenAI guide documents this repair pattern[2]. If the retry fails, the request enters a fallback path (a default value, a human review queue, or a user-facing graceful degradation).
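The repair pattern can be sketched as follows, with the model call injected as a plain callable so the retry logic stays testable; the prompt wording and names are illustrative:

```python
import json

def get_structured(call_model, prompt: str):
    """One automated repair retry: quote the parse error back, then fall back."""
    raw = call_model(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        repair_prompt = (
            f"{prompt}\n\nYour previous reply was not valid JSON "
            f"({err}). Reply again with only the corrected JSON object."
        )
        raw = call_model(repair_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return None  # enter the fallback path: default, queue, or degradation
```

Exactly one retry is the point: unbounded retry loops hide the failure rate that the dashboard needs to surface.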

Hallucinated fields. The response is syntactically valid but contains a field the schema does not declare, or omits a required field. The validator catches the omission; the surplus field is typically ignored, but a suspicious validator logs it, because a pattern of extraneous fields suggests the prompt is under-specified or the schema has drifted.

Type mismatch. The response is valid JSON but a number is rendered as a string or an enum value is rendered with unexpected capitalization. Strict schemas eliminate most of this; loose schemas need defensive coercion in the application layer.
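A sketch of defensive coercion for these two cases, with an illustrative enum; returning None hands the request to the fallback path rather than guessing:

```python
# Illustrative enum of declared category values, all lower-case.
CATEGORIES = {"billing", "technical", "account"}

def coerce_category(value):
    """Fold unexpected capitalisation back onto the declared enum, else None."""
    if isinstance(value, str) and value.strip().lower() in CATEGORIES:
        return value.strip().lower()
    return None

def coerce_number(value):
    """Accept a number rendered as a string; return None rather than guess."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None
```

Coercions like these are deliberately narrow: a near-match on capitalisation is recoverable, but a value outside the enum is not, and pretending otherwise reintroduces the data-quality liability described earlier.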

Schema drift. The schema evolves, the prompt changes, and the model’s behaviour tracks the change on new requests while old cached responses remain in the wrong shape. Schema drift is a lifecycle concern; Article 9 develops the controls that detect and coordinate schema changes across prompt, model, and downstream consumer.

[DIAGRAM: Scoreboard — aitm-pew-article-3-structured-output-dashboard — Dashboard tiles: malformed JSON rate (target 0%), field-missing rate, type-mismatch rate, schema-drift signal; trend sparklines; red-yellow-green thresholds.]

Two real examples

OpenAI Structured Outputs. The August 2024 release of Structured Outputs documented formal JSON Schema support in the API with a strict mode guaranteeing conformance for supported schemas[5]. The documentation also candidly describes the limits: not every keyword in JSON Schema is supported, and the feature’s guarantees are scoped to specific models. For practitioners, the release was a meaningful hardening, but it did not eliminate the need for validation in application code.

Outlines and Guidance in self-hosted stacks. The open-source Outlines library provides regex, JSON-schema, and grammar-constrained generation across many self-hosted runtimes[9]; the Guidance library offers a similar capability with different ergonomics[10]. A team running a Llama or Mistral deployment behind vLLM or llama.cpp uses these libraries to get parity with the managed-API structured-output guarantees. The decision between Outlines and Guidance is typically driven by the ergonomics the team prefers, not by a capability gap; both are in production at scale.

Schema design as a craft

A schema is not a trivial artefact. A bad schema produces unreliable outputs even on a well-aligned model; a good schema makes the model’s job easier. Several disciplines compose a good schema.

Fields are named for what they mean in the business, not for what the model finds convenient to produce. A ticket_category is clearer than a cat; a customer_sentiment is clearer than a sent. Clear names reduce hallucination because the model uses the field name as a guide to the field’s content.

Fields are typed strictly. A number type with minimum and maximum bounds is better than a string that holds a number. An enum with explicit allowed values is better than a string that is expected to match a set of values. A date with an explicit format is better than a free-text date.

Required fields are minimised. Every required field is another place where the model can fail by omission. A schema with twenty required fields will fail more often than a schema with six required fields and fourteen optional ones, even when the underlying model is identical. A practitioner makes a field required only when the downstream consumer cannot proceed without it.

Descriptions accompany every field. The field description is not documentation for humans; it is documentation for the model. A description that reads the three-letter ISO currency code (USD, EUR, GBP) guides the model more reliably than a description that reads the currency. This is the schema-authoring equivalent of few-shot prompting: the description teaches the model what the field is for.

Nested structures are flattened where reasonable. Deeply nested output schemas produce more errors than flat ones. A three-level nested object may be expressive but will often degrade output reliability; a flat structure of twelve typed fields is usually more robust. The exception is when the nesting reflects genuine domain structure that the model needs to preserve, such as a list of line items where each has its own fields.

The schema is versioned. Article 9 develops versioning in depth; for structured output, the schema version travels with the prompt version, because changing a schema changes the feature’s output contract and downstream consumers must be notified.
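A hypothetical schema pulling these disciplines together: business-meaningful names, strict types with bounds, model-facing descriptions, a flat structure, and a minimal required list. All field names and values are illustrative:

```python
# Hypothetical schema illustrating the disciplines above; only the two fields
# the downstream consumer cannot proceed without are marked required.
REFUND_SCHEMA = {
    "type": "object",
    "properties": {
        "ticket_category": {
            "type": "string",
            "enum": ["billing", "technical", "account", "other"],
            "description": "Exactly one of the declared support categories.",
        },
        "dollar_amount": {
            "type": "number",
            "minimum": 0,
            "description": "Refund amount as a plain number, e.g. 37.50.",
        },
        "currency_code": {
            "type": "string",
            "description": "The three-letter ISO currency code (USD, EUR, GBP).",
        },
        "customer_note": {
            "type": "string",
            "description": "Optional free-text note; omit when there is nothing to add.",
        },
    },
    "required": ["ticket_category", "dollar_amount"],
}
```

Note how each description is written for the model, not for a human reader: the currency_code description teaches by example, exactly as a few-shot prompt would.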

Designing the fallback

The fallback path is part of the prompt design, not an afterthought. A practitioner writes down, for each prompt, what happens when the output fails parsing. Three categories of fallback exist. Default-value fallback substitutes a safe placeholder and logs the failure; it is suitable for low-stakes features where the default is obviously benign. Human-review fallback routes the response to a queue where a human resolves the case; it is suitable for moderate-stakes features and for every feature during its early-production period. User-facing degradation tells the user the system could not answer and offers a retry or a human channel; it is suitable for features that touch external users and where silent substitution would be inappropriate.
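The selection rule can be written down directly, which keeps the fallback decision reviewable rather than implicit; the labels and ordering here are an illustrative reading of the guidance above:

```python
from enum import Enum

class Fallback(Enum):
    DEFAULT_VALUE = "default_value"        # safe placeholder, logged
    HUMAN_REVIEW = "human_review"          # queue for a human to resolve
    USER_DEGRADATION = "user_degradation"  # tell the user, offer retry or human channel

def choose_fallback(stakes: str, early_production: bool, external_users: bool) -> Fallback:
    """Encode the fallback selection rule; inputs are illustrative labels."""
    if early_production:
        return Fallback.HUMAN_REVIEW       # every feature during early production
    if external_users:
        return Fallback.USER_DEGRADATION   # no silent substitution for end users
    if stakes == "low":
        return Fallback.DEFAULT_VALUE      # obviously benign default, logged
    return Fallback.HUMAN_REVIEW           # moderate stakes and everything else
```

Writing the rule as code makes it part of the prompt's registry entry, so a reviewer can see what happens on failure without reading the error-handling paths of the whole service.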

Every fallback path is observable. Failure rates are metrics on the feature’s dashboard, not lines in a log file no one reads. A structured-output failure rate rising from 0.5% to 2% is a change the on-call owner should see.
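A minimal sketch of the metric itself: a rolling window over recent requests, suitable for feeding a dashboard tile; the window size is illustrative:

```python
from collections import deque

class FailureRateWindow:
    """Rolling structured-output failure rate over the last N requests."""

    def __init__(self, window: int = 1000):
        self.outcomes = deque(maxlen=window)  # True = parsed and validated

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return 1 - sum(self.outcomes) / len(self.outcomes)
```

Paired with a threshold alert, a counter like this is what turns the 0.5% to 2% drift described above into a page rather than a log line no one reads.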

Multi-step output structuring

A feature whose output serves several downstream consumers may need more than one output shape from a single prompt. A practitioner has two principal choices. The first is a single structured payload with fields for each consumer, with each consumer selecting the fields relevant to it; this works when the shapes compose cleanly. The second is a pipeline of prompts, each producing the shape its consumer expects, with the orchestration layer mediating between them; this works when the shapes do not compose and the latency cost of multiple prompts is acceptable.

The choice is rarely obvious from the first design iteration. A practitioner starts with the simpler option (single payload), runs the harness, and refactors if the single-payload option produces unreliable outputs or forces the prompt into an unnatural shape. The refactor is not a failure; it is a normal step in the lifecycle, and the registry entry records the version transition so that downstream consumers can coordinate.

Summary

Output structuring turns a language model into a component that other software can consume. JSON mode is the light option; function-call schemas are the agentic-ready option; grammar-constrained decoding is the hardest guarantee. Every option needs a validator and a fallback, because every option has a failure mode. Article 4 develops the retrieval-augmented pattern, where structured output is paired with grounded content and citation requirements, and where structure is the mechanism by which citations become machine-readable.

Further reading in the Core Stream: Tool Use and Function Calling in Autonomous AI Systems.



© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.

Footnotes

  1. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 2024. National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf — accessed 2026-04-19.

  2. Structured outputs. OpenAI Platform documentation. https://platform.openai.com/docs/guides/structured-outputs — accessed 2026-04-19.

  3. Generate structured output. Google Gemini API documentation. https://ai.google.dev/gemini-api/docs/structured-output — accessed 2026-04-19.

  4. JSON mode. Mistral AI documentation. https://docs.mistral.ai/capabilities/json_mode/ — accessed 2026-04-19.

  5. Introducing Structured Outputs in the API. OpenAI, 6 August 2024. https://openai.com/index/introducing-structured-outputs-in-the-api/ — accessed 2026-04-19.

  6. Function calling and other API updates. OpenAI, 13 June 2023. https://openai.com/index/function-calling-and-other-api-updates/ — accessed 2026-04-19.

  7. Tool use (function calling). Anthropic documentation. https://docs.anthropic.com/en/docs/build-with-claude/tool-use — accessed 2026-04-19.

  8. Function calling. Google Gemini API documentation. https://ai.google.dev/gemini-api/docs/function-calling — accessed 2026-04-19.

  9. Outlines: structured generation. Open-source project. https://github.com/outlines-dev/outlines — accessed 2026-04-19.

  10. Guidance: a language for controlling large language models. Open-source project. https://github.com/guidance-ai/guidance — accessed 2026-04-19.