Artifact Template: LLM Gateway Policy

FlowRidge

AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Artifact Template

How to use this template

This template is the governance artifact for a centralized LLM gateway (the pattern described in Lab 4). Organizations that route all internal applications’ model calls through a single gateway use this policy record as the source of truth for what is allowed, what is redacted, and what is limited. The policy is enforced at runtime by the gateway’s policy engine (Open Policy Agent, Cedar, an internal rule engine, or equivalent); the template is the human-readable record that the engine’s rules implement.

The template has four primary sections (allow-list matrix, redaction pipeline, rate-limiting and cost-attribution, change-management workflow). All sections are required. Empty sections are rejected by review.

Organizations without a centralized gateway still benefit from the template as a checklist for embedded policy in application code, but the template is intended for a gateway-centralized posture where the policy is enforced in one place and audited from one place.

LLM Gateway Policy — [Gateway Instance Name]

1. Identification and ownership

Field	Value
Gateway instance name	[e.g., “GateKeep — Primary”]
Deployment scope	[production, staging, regional, global]
Gateway owner	[single accountable individual; owning team]
Policy-engine platform	[Open Policy Agent / Cedar / internal rule engine / other]
Policy-repository location	[version-controlled path]
Security reviewer	[name, role]
Governance reviewer	[name, role]
Effective date	YYYY-MM-DD
Policy version	[e.g., 1.4.2]

2. Allow-list matrix

The allow-list is a matrix of (calling application × data class × client or matter context × jurisdiction) → permitted providers and configurations. It is expressed as rules in the policy-engine language; this section captures the rule set in human-readable form.

2.1 Dimension definitions

Dimension	Values	Source of truth
Calling application	[enumerated application IDs]	[service catalog]
Data class	[public / internal / confidential / restricted]	[data classification service]
Client or matter context	[attached matter ID with “no-third-party” flag; or “none”]	[matter management system]
Jurisdiction	[EU, UK, US-named-state, APAC-country]	[tenant attribute]
Provider channel	[managed API, cloud platform, self-hosted]	[provider registry]
Model tier	[frontier, general, small-fast]	[provider registry]

2.2 Rules (human-readable)

Write each rule in the form: “For [application] on [data class] with [context] in [jurisdiction], the allowed configurations are […] and the default is […]. Override to […] is allowed under [condition].”

Required rules at minimum (organizations add more as their application inventory requires):

Default permit rule. For internal applications on internal-class data with no client-matter context, in approved jurisdictions, the default configuration is [named configuration]; override to [alternate] is allowed under [condition]. The rule applies unless a more specific rule overrides it.
Restricted-class rule. Any request carrying restricted-class data must pass through the redaction pipeline, and unredacted restricted-class requests are denied. After redaction, restricted-class requests may proceed only via [listed provider channels].
No-third-party rule. Any request attached to a matter flagged “no-third-party” must route to the self-hosted path only. If the self-hosted path is unhealthy, the request is denied with a typed error, not silently routed to a managed API.
Jurisdiction residency rule. Requests originating in [jurisdiction] must route only to providers whose residency posture for [data class] is compatible with that jurisdiction's requirements. Write one rule per jurisdiction and data-class pair.
Deny-by-default rule. Any request that does not match a permit rule is denied.
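The rules above are enforced in the policy-engine language (for example Rego for Open Policy Agent, or Cedar policies). As a language-neutral illustration only, the following Python sketch evaluates them from most to least specific, with deny-by-default last. The application IDs, channel names, and residency table are hypothetical stand-ins for the bracketed placeholders, and the self-hosted health signal is assumed to come from the gateway's own health checks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Request:
    application: str
    data_class: str                  # public / internal / confidential / restricted
    jurisdiction: str                # e.g. "EU", "UK"
    matter_id: Optional[str] = None  # attached client or matter, if any
    no_third_party: bool = False     # matter carries the "no-third-party" flag
    redacted: bool = False           # set after the redaction pipeline has run

@dataclass(frozen=True)
class Decision:
    allowed: bool
    rule_id: str
    channels: tuple = ()

# Hypothetical values standing in for the bracketed placeholders in section 2.2.
INTERNAL_APPS = {"app-search", "app-drafting"}
RESTRICTED_CHANNELS = {"self-hosted", "cloud-platform"}
# Jurisdiction residency rule: one entry per (jurisdiction, data class) pair.
RESIDENCY = {
    ("EU", "internal"): {"managed-api", "cloud-platform", "self-hosted"},
    ("EU", "restricted"): {"self-hosted"},
    ("UK", "internal"): {"managed-api", "self-hosted"},
}

def evaluate(req: Request, self_hosted_healthy: bool) -> Decision:
    """Apply the section 2.2 rules from most to least specific."""
    # Restricted-class rule, part 1: unredacted restricted data is always denied.
    if req.data_class == "restricted" and not req.redacted:
        return Decision(False, "restricted-class")
    # No-third-party rule: self-hosted only, failing closed when it is unhealthy.
    if req.no_third_party:
        if self_hosted_healthy:
            return Decision(True, "no-third-party", ("self-hosted",))
        return Decision(False, "no-third-party")
    # Jurisdiction residency rule: no entry means no compatible provider.
    compatible = RESIDENCY.get((req.jurisdiction, req.data_class))
    if compatible is None:
        return Decision(False, "jurisdiction-residency")
    # Restricted-class rule, part 2: redacted requests use listed channels only.
    if req.data_class == "restricted":
        allowed = compatible & RESTRICTED_CHANNELS
        return Decision(bool(allowed), "restricted-class", tuple(sorted(allowed)))
    # Default permit rule: internal apps, internal-class data, no matter context.
    if (req.application in INTERNAL_APPS and req.data_class == "internal"
            and req.matter_id is None):
        return Decision(True, "default-permit", tuple(sorted(compatible)))
    # Deny-by-default rule.
    return Decision(False, "deny-by-default")
```

Ordering matters in this sketch: the unredacted-restricted check runs first so that no later rule can permit restricted data that skipped redaction, and a no-third-party request is never re-routed to a managed API when the self-hosted path is down.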

2.3 Deny response contract

Field	Value
Deny code	[typed code returned to calling application]
Rule ID disclosure	[which denying rule’s ID is surfaced to the caller]
Other-rule leakage protection	[the policy does not enumerate alternative rules in the deny response]
Audit log entry	[what fields are written to the denial audit stream]
Caller escalation path	[how a legitimately denied request is reviewed]
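A sketch of the deny response contract, assuming a JSON body and an in-memory list standing in for the denial audit stream. The deny code and escalation path are hypothetical placeholders. The point of the sketch is the split: the caller sees only the denying rule's ID, while the full evaluation trace is written to the audit record.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical values for the bracketed fields in the deny response contract.
DENY_CODE = "POLICY_DENIED"
ESCALATION_PATH = "gateway-policy-review queue"

@dataclass(frozen=True)
class DenyResponse:
    code: str        # typed code the calling application can branch on
    rule_id: str     # the single rule that denied the request
    escalation: str  # how to request review of a legitimate denial

def deny(request_id: str, application: str, rule_id: str,
         evaluated_rules: list, audit_sink: list) -> dict:
    """Return the caller-facing body and append one record to the audit stream."""
    response = DenyResponse(DENY_CODE, rule_id, ESCALATION_PATH)
    audit_sink.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "application": application,
        "deny_code": DENY_CODE,
        "rule_id": rule_id,
        # The full evaluation trace stays in the audit stream only, so the caller
        # cannot enumerate alternative rules from the deny response.
        "evaluated_rules": evaluated_rules,
    }))
    return asdict(response)
```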

3. Redaction pipeline

3.1 Detection taxonomy

List at least eight classes. For each class, state what it detects, which implementation family it uses (NER model, regex, deny-list, hybrid), its false-positive and false-negative targets, and the evaluation set used to measure them.

Class	Examples	Detection family	FP target	FN target
Personal identifiers	Names, emails, phone numbers	NER + regex	[e.g., ≤ 2%]	[e.g., ≤ 1%]
Sensitive identifiers	Bank accounts, medical codes	Regex + deny-list	[…]	[…]
Client-matter identifiers	Matter numbers, opposing parties	Deny-list	[…]	[…]
Secret patterns	API keys, access tokens	Regex	[…]	[…]
Location data	Addresses, coordinates	NER	[…]	[…]
Special-category (GDPR Art. 9)	Health, biometric, other	Hybrid	[…]	[…]
Commercial-sensitive	Deal codes, transaction IDs	Deny-list	[…]	[…]
Domain-specific	[feature-specific patterns]	[implementation]	[…]	[…]
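FP and FN targets are only reviewable if the measurement is defined. A minimal sketch, assuming exact span matching on (document ID, start, end, class) against a labeled evaluation set, with FP measured as the share of flagged spans that are wrong and FN as the share of labeled spans that were missed. Other definitions (partial-overlap credit, token-level scoring) are equally defensible; the policy should state which one it uses.

```python
# A span is (document id, start offset, end offset, class); exact matching assumed.

def per_class_rates(predicted: set, gold: set) -> dict:
    """Map each class to (FP share, FN rate) on a labeled evaluation set."""
    rates = {}
    for cls in sorted({s[3] for s in predicted | gold}):
        p = {s for s in predicted if s[3] == cls}
        g = {s for s in gold if s[3] == cls}
        fp = len(p - g)  # flagged, but not a labeled entity
        fn = len(g - p)  # labeled entity the detector missed
        rates[cls] = (fp / len(p) if p else 0.0, fn / len(g) if g else 0.0)
    return rates

def failing_classes(rates: dict, targets: dict) -> list:
    """Classes whose measured FP or FN exceeds the (FP, FN) target in the policy."""
    return [cls for cls, (fp, fn) in rates.items()
            if fp > targets[cls][0] or fn > targets[cls][1]]
```

With targets of ≤ 2% FP and ≤ 1% FN, a detector that flags three person-name spans of which one is spurious (FP share of one third) would be reported as failing for that class.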

3.2 Replacement policy

Field	Value
Placeholder scheme	[typed placeholders like `[PERSON_NAME]`, `[EMAIL]`]
Stable-surrogate scheme	[within a request, the same value always maps to the same surrogate, so the model can still tell that two mentions refer to the same entity]
Retention of original-to-surrogate mapping	[held at the gateway; de-redaction before return to caller; retention duration]
Low-confidence fail-mode	[if detector confidence is below the named threshold, the request is denied rather than passed]
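A minimal sketch of the replacement policy, assuming detections (span, class, confidence) have already been produced by the section 3.1 detectors and do not overlap. Surrogates are numbered per class, so a repeated value maps to the same surrogate within the request, and the surrogate-to-original mapping stays at the gateway. The 0.8 threshold is a hypothetical value for the named threshold.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    start: int
    end: int
    cls: str           # placeholder class, e.g. "PERSON_NAME", "EMAIL"
    confidence: float  # detector confidence in [0, 1]

class LowConfidenceDenied(Exception):
    """Raised instead of forwarding a request with an uncertain detection."""

def redact(text: str, detections: list, threshold: float = 0.8):
    """Replace detected spans with stable typed surrogates.

    Returns (redacted_text, mapping), where mapping is surrogate -> original
    and is retained at the gateway for de-redaction.
    """
    # Low-confidence fail-mode: deny rather than pass text that may hold PII.
    if any(d.confidence < threshold for d in detections):
        raise LowConfidenceDenied("detection below confidence threshold")
    mapping, assigned, counts, parts, pos = {}, {}, {}, [], 0
    for d in sorted(detections, key=lambda d: d.start):  # assumes no overlaps
        key = (d.cls, text[d.start:d.end])
        if key not in assigned:  # first occurrence: mint a new surrogate
            counts[d.cls] = counts.get(d.cls, 0) + 1
            assigned[key] = f"[{d.cls}_{counts[d.cls]}]"
            mapping[assigned[key]] = key[1]
        parts += [text[pos:d.start], assigned[key]]
        pos = d.end
    parts.append(text[pos:])
    return "".join(parts), mapping

def de_redact(text: str, mapping: dict) -> str:
    """Restore originals in the model output before it is returned to the caller."""
    for surrogate, original in mapping.items():
        text = text.replace(surrogate, original)
    return text
```

For example, "Ana Silva emailed ana@example.com; Ana Silva will follow up." becomes "[PERSON_NAME_1] emailed [EMAIL_1]; [PERSON_NAME_1] will follow up." before it leaves the gateway. The closing bracket in each surrogate keeps "[PERSON_NAME_1]" from matching inside "[PERSON_NAME_10]" during de-redaction.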

3.3 Output-scan policy

Field	Value
Output classes scanned	[same taxonomy applied to generated output]
Action on output match	[redact before return, or flag and deliver with warning, or deny and log]
Scanner performance budget	[target p99 added latency from output scan]
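A sketch of the three output actions, assuming the scan runs on the model's output before de-redaction, so request surrogates such as "[EMAIL_1]" are not flagged but new matches generated by the model are. The two patterns are illustrative only (a simple email pattern and the AWS access-key-ID shape); a real scanner would reuse the full section 3.1 taxonomy, and the match would also be logged on deny.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Illustrative patterns only; a real scanner applies the full section 3.1 taxonomy.
OUTPUT_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

@dataclass
class ScanResult:
    text: Optional[str]  # None when the output is withheld
    denied: bool = False
    warnings: list = field(default_factory=list)

def scan_output(output: str, action: str) -> ScanResult:
    """Apply the configured action ("redact", "flag", or "deny") to matches."""
    hits = [cls for cls, pat in OUTPUT_PATTERNS.items() if pat.search(output)]
    if not hits:
        return ScanResult(output)
    if action == "redact":
        for cls in hits:
            output = OUTPUT_PATTERNS[cls].sub(f"[{cls}]", output)
        return ScanResult(output)
    if action == "flag":
        return ScanResult(output, warnings=[f"output contains {c}" for c in hits])
    if action == "deny":
        return ScanResult(None, denied=True)  # logging of the match not shown
    raise ValueError(f"unknown output action: {action!r}")
```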

4. Rate-limiting and cost-attribution

4.1 Rate limits

Scope	Algorithm	Limit	Action on breach
Per-calling-application	[token bucket / leaky bucket / concurrency]	[requests per second]	[HTTP 429 with a Retry-After header]
Per-tenant	[…]	[…]	[…]
Per-user	[…]	[…]	[…]
Per-provider-channel	[…]	[…]	[…] (protects against provider outages and cost spikes)
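For the token-bucket option, a minimal sketch with one bucket per (scope, key), so each tenant and user draws on separate counters. A request is admitted only if every applicable bucket has capacity, and nothing is deducted on a 429, so a breach in one scope does not drain another. The clock is passed in explicitly to keep the example deterministic, and the limits below are hypothetical. Passing a request's token count as `cost` turns the same bucket into a tokens-per-second limit.

```python
import math

class TokenBucket:
    """Refills at `rate` units per second up to `capacity` (the burst size)."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.updated = capacity, None

    def refill(self, now: float) -> None:
        if self.updated is not None:
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

class GatewayLimiter:
    """One bucket per (scope, key), e.g. ("tenant", "acme") or ("user", "u1")."""

    def __init__(self, limits: dict):
        self.limits = limits  # scope -> (rate per second, burst capacity)
        self.buckets = {}

    def check(self, keys: dict, now: float, cost: float = 1.0):
        """Return (200, None) if admitted, else (429, Retry-After in whole seconds)."""
        buckets = []
        for scope, key in keys.items():
            rate, capacity = self.limits[scope]
            bucket = self.buckets.setdefault((scope, key), TokenBucket(rate, capacity))
            bucket.refill(now)
            buckets.append(bucket)
        short = [b for b in buckets if b.tokens < cost]
        if short:
            wait = max((cost - b.tokens) / b.rate for b in short)
            return 429, math.ceil(wait)
        for b in buckets:  # deduct only once every scope has admitted the request
            b.tokens -= cost
        return 200, None
```

With a tenant limit of 1 request per second and a burst of 2, a tenant can send two requests at once and then receives 429 with Retry-After: 1 until the bucket refills, while other tenants are unaffected.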

4.2 Cost attribution

Field	Value
Cost tag dimensions	[tenant, cost center, application, user]
Cost computation	[computed per request as input tokens × input price + output tokens × output price, or by the platform’s own billing formula]
Reconciliation cadence	[e.g., weekly against provider invoice]
Tolerance before investigation	[e.g., > 3% discrepancy or > $X absolute]
Budget alert levels	[percent of monthly budget at which alert fires; who receives]
Cost-cap enforcement	[hard cap at gateway layer; behaviour when exceeded — graceful degrade to cheaper model or deny]
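A sketch of the cost fields, assuming prices quoted in USD per million tokens (the prices are hypothetical) and `Decimal` arithmetic to avoid float rounding in billing records. The reconciliation check measures the 3% gap against the invoice and uses 500 as a hypothetical stand-in for the absolute threshold; the alert levels and the degrade tier are likewise illustrative.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
# Hypothetical (input, output) prices in USD per million tokens, per model tier.
PRICES = {"general": (Decimal("0.50"), Decimal("1.50"))}

def request_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """At-request cost: input tokens * input price + output tokens * output price."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / MILLION

def needs_investigation(attributed: Decimal, invoiced: Decimal,
                        pct: Decimal = Decimal("0.03"),
                        absolute: Decimal = Decimal("500")) -> bool:
    """Reconciliation: flag if the gap exceeds pct of the invoice or the absolute limit."""
    gap = abs(attributed - invoiced)
    return gap > invoiced * pct or gap > absolute

def budget_alerts(spend: Decimal, budget: Decimal, levels=(50, 80, 100)) -> list:
    """Alert levels (percent of monthly budget) that spend has reached."""
    return [lvl for lvl in levels if spend * 100 >= budget * lvl]

def cap_action(spend: Decimal, cap: Decimal, degrade_tier=None) -> str:
    """Hard cap: allow below the cap; at or above it, degrade if configured, else deny."""
    if spend < cap:
        return "allow"
    return f"degrade:{degrade_tier}" if degrade_tier else "deny"
```

At these illustrative prices, a request with 12,000 input tokens and 800 output tokens costs 12,000 × 0.50 / 10⁶ + 800 × 1.50 / 10⁶ = 0.0072 USD.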

4.3 Multi-tenancy

Field	Value
Tenant isolation in policy store	[…]
Tenant isolation in rate-limit store	[per-tenant counters]
Tenant isolation in log stream	[per-tenant log streams or tagged records]
Tenant isolation in cost stream	[per-tenant billing records]
Per-tenant-outage containment	[assertion: a single tenant’s outage does not degrade other tenants]

5. Change-management workflow

Field	Value
Change authors	[who may propose a policy change]
Review path	[security review, governance review, architecture review — named reviewers and SLA]
Rollout modes	[immediate (emergency), feature-flag ramp, percentage ramp]
Emergency change protocol	[who can bypass the review path, under what condition, with what after-the-fact review]
Rollback protocol	[time-to-rollback SLO; who holds rollback decision authority]
Change log retention	[duration, location, access]
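For the percentage-ramp rollout mode, one common approach (an assumption here, not something the template prescribes) is deterministic hash bucketing. Each subject, such as a tenant or calling application, is hashed together with the policy version into one of 10,000 buckets and receives the new version when its bucket falls below the ramp percentage. Raising the percentage only adds subjects, and setting it to 0 is an immediate rollback.

```python
import hashlib

def in_ramp(policy_version: str, subject: str, percent: float) -> bool:
    """True if `subject` (e.g. a tenant ID) should get `policy_version` at `percent`."""
    digest = hashlib.sha256(f"{policy_version}:{subject}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # stable bucket in 0..9999
    return bucket < percent * 100  # percent in [0, 100]; 0 means fully rolled back
```

Including the version in the hash means each new policy version ramps a different subset of subjects first, so the same tenants are not always the early adopters.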

6. Review and amendments

Role	Name	Decision	Date
Gateway owner	[…]	Authored	YYYY-MM-DD
Security reviewer	[…]	Approved	YYYY-MM-DD
Governance reviewer	[…]	Approved	YYYY-MM-DD
Architecture reviewer	[…]	Approved	YYYY-MM-DD

Maintain an amendment log recording the date, author, sections affected, and re-approvals obtained for each amendment. Emergency amendments are logged at the time of commit and reviewed within 10 business days.
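The 10-business-day window can be computed mechanically so the amendment log records a due date alongside each emergency amendment. A minimal sketch, assuming business days are Monday to Friday, counting starts on the day after the commit, and public holidays are supplied by the organization (none are built in).

```python
from datetime import date, timedelta

def review_due(commit_day: date, business_days: int = 10,
               holidays: frozenset = frozenset()) -> date:
    """Date by which an emergency amendment committed on `commit_day` is reviewed."""
    day, remaining = commit_day, business_days
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:  # Mon=0 .. Fri=4
            remaining -= 1
    return day
```

An amendment committed on Friday 2024-03-01 is due for review by Friday 2024-03-15; if 2024-03-11 is a holiday, the due date moves to Monday 2024-03-18.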

Notes on use

When to use this template. Any centralized LLM gateway. Organizations without a centralized gateway still benefit from the template as a structural checklist.

Common errors in first-time use. Deny-by-default rule missing; redaction FP/FN targets unquantified; cost-attribution reconciliation cadence absent; change-management rollback SLO not stated; no emergency-change protocol. Reviewers treat these as blocking.

What follows. The policy record is cited from Template 1 §8 (security architecture). It is re-reviewed quarterly at minimum and whenever a new calling application, a new provider, a new data class, or a new jurisdiction enters scope.