AITE M1.1-Art74 v1.0 Reviewed 2026-04-06 Open Access
M1.1 Foundations of AI Transformation

Artifact Template: LLM Gateway Policy

Artifact Template: LLM Gateway Policy — AI Strategy & Vision — Advanced depth — COMPEL Body of Knowledge.

8 min read Article 74 of 48

AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Artifact Template


How to use this template

This template is the governance artifact for a centralized LLM gateway (the pattern described in Lab 4). Organizations that route all internal applications’ model calls through a single gateway use this policy record as the source of truth for what is allowed, what is redacted, and what is limited. The policy is enforced at runtime by the gateway’s policy engine (Open Policy Agent, Cedar, an internal rule engine, or equivalent); the template is the human-readable record that the engine’s rules implement.

The template has four primary sections (allow-list matrix, redaction pipeline, rate-limiting and cost-attribution, change-management workflow). All sections are required. Empty sections are rejected by review.

Organizations without a centralized gateway still benefit from the template as a checklist for embedded policy in application code, but the template is intended for a gateway-centralized posture where the policy is enforced in one place and audited from one place.


LLM Gateway Policy — [Gateway Instance Name]

1. Identification and ownership

Field | Value
Gateway instance name | [e.g., “GateKeep - Primary”]
Deployment scope | [production, staging, regional, global]
Gateway owner | [single accountable individual, team]
Policy-engine platform | [Open Policy Agent / Cedar / internal rule engine / other]
Policy-repository location | [version-controlled path]
Security reviewer | [name, role]
Governance reviewer | [name, role]
Effective date | YYYY-MM-DD
Policy version | [e.g., 1.4.2]

2. Allow-list matrix

The allow-list is a matrix of (calling application × data class × client or matter context × jurisdiction) → permitted providers and configurations. It is expressed as rules in the policy-engine language; this section captures the rule set in human-readable form.

2.1 Dimension definitions

Dimension | Values | Source of truth
Calling application | [enumerated application IDs] | [service catalog]
Data class | [public / internal / confidential / restricted] | [data classification service]
Client or matter context | [attached matter ID with “no-third-party” flag; or “none”] | [matter management system]
Jurisdiction | [EU, UK, US-named-state, APAC-country] | [tenant attribute]
Provider channel | [managed API, cloud platform, self-hosted] | [provider registry]
Model tier | [frontier, general, small-fast] | [provider registry]

2.2 Rules (human-readable)

Write each rule in the form: “For [application] on [data class] with [context] in [jurisdiction], the allowed configurations are […] and the default is […]. Override to […] is allowed under [condition].”

Required rules at minimum (organizations add more as their application inventory requires):

  1. Default permit rule. For internal applications on internal-class data with no client-matter context, in approved jurisdictions, the default configuration is [named configuration], with override to [alternate]. This rule applies unless a more specific rule overrides it.
  2. Restricted-class rule. Any request carrying restricted-class data must pass the redaction pipeline. Unredacted restricted-class requests are denied. Requests on restricted data after redaction may proceed only via [listed provider channels].
  3. No-third-party rule. Any request attached to a matter flagged “no-third-party” must route to the self-hosted path only. If the self-hosted path is unhealthy, the request is denied with a typed error, not silently routed to a managed API.
  4. Jurisdiction residency rule. Requests originating in [jurisdiction] must route to providers whose residency posture for [data class] is compatible. Write one rule per jurisdiction and data-class pair.
  5. Deny-by-default rule. Any request that does not match a permit rule is denied.
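
The five minimum rules can be sketched as a small evaluation function. This is a minimal illustration in Python, not the syntax of any particular policy engine. The names `Request`, `Decision`, `evaluate`, the rule IDs R1 to R5, the deny codes, the application ID, and the channel names are all hypothetical. The precedence order (redaction check first, then the no-third-party rule) is an assumption, and rule 4 is collapsed into a single approved-jurisdiction set for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    application: str
    data_class: str        # public / internal / confidential / restricted
    no_third_party: bool   # flag taken from the matter management system
    jurisdiction: str
    redacted: bool = False


@dataclass(frozen=True)
class Decision:
    allowed: bool
    channel: Optional[str] = None
    rule_id: str = ""
    deny_code: Optional[str] = None


# Hypothetical values; a real gateway would load these from the policy repository.
INTERNAL_APPLICATIONS = {"contract-review"}
APPROVED_JURISDICTIONS = {"EU", "UK"}
DEFAULT_CHANNEL = "managed-api"
RESTRICTED_CHANNEL = "cloud-platform"


def evaluate(req: Request, self_hosted_healthy: bool) -> Decision:
    # Rule 2 (first half): unredacted restricted-class data is always denied.
    if req.data_class == "restricted" and not req.redacted:
        return Decision(False, rule_id="R2", deny_code="UNREDACTED_RESTRICTED")
    # Rule 3: no-third-party matters use the self-hosted path or fail with a
    # typed error; they are never silently rerouted to a managed API.
    if req.no_third_party:
        if self_hosted_healthy:
            return Decision(True, channel="self-hosted", rule_id="R3")
        return Decision(False, rule_id="R3", deny_code="SELF_HOSTED_UNAVAILABLE")
    # Rule 2 (second half): redacted restricted data uses listed channels only.
    if req.data_class == "restricted":
        return Decision(True, channel=RESTRICTED_CHANNEL, rule_id="R2")
    # Rule 1 (with rule 4 simplified to one approved-jurisdiction set).
    if (req.application in INTERNAL_APPLICATIONS
            and req.data_class == "internal"
            and req.jurisdiction in APPROVED_JURISDICTIONS):
        return Decision(True, channel=DEFAULT_CHANNEL, rule_id="R1")
    # Rule 5: anything that matched no permit rule is denied.
    return Decision(False, rule_id="R5", deny_code="NO_MATCHING_RULE")
```

Placing the deny-by-default return last means that a newly added data class or application is denied until someone writes a permit rule for it.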

2.3 Deny response contract

Field | Value
Deny code | [typed code returned to calling application]
Rule ID disclosure | [which denying rule’s ID is surfaced to the caller]
Other-rule leakage protection | [the policy does not enumerate alternative rules in the deny response]
Audit log entry | [what fields are written to the denial audit stream]
Caller escalation path | [how a legitimately denied request is reviewed]
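
One way to keep the caller-facing response and the audit record separate is to build them as two distinct objects. The sketch below is illustrative only: the function name `build_deny`, the field names, and the escalation value `policy-review-queue` are assumptions, not part of the template.

```python
from datetime import datetime, timezone


def build_deny(rule_id, deny_code, request_id, application, evaluated_rules):
    """Return (caller_response, audit_entry) for a denied request."""
    # The caller sees only the typed code and the single denying rule ID.
    caller_response = {
        "error": deny_code,
        "rule_id": rule_id,
        "request_id": request_id,
        "escalation": "policy-review-queue",  # hypothetical escalation path
    }
    # The audit stream may hold more detail, including which rules were
    # evaluated; that list is never returned to the caller.
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "application": application,
        "deny_code": deny_code,
        "rule_id": rule_id,
        "evaluated_rules": list(evaluated_rules),
    }
    return caller_response, audit_entry
```

Keeping the list of evaluated rules out of the caller response is what the leakage-protection field asks for: a caller cannot probe the policy by reading deny messages.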

3. Redaction pipeline

3.1 Detection taxonomy

Define at least eight classes. For each class, record what it detects, which implementation family it uses (NER model, regex, deny-list, hybrid), what the false-positive and false-negative targets are, and which evaluation set is used to measure them.

Class | Examples | Detection family | FP target | FN target
Personal identifiers | Names, emails, phone numbers | NER + regex | [e.g., ≤ 2%] | [e.g., ≤ 1%]
Sensitive identifiers | Bank accounts, medical codes | Regex + deny-list | […] | […]
Client-matter identifiers | Matter numbers, opposing parties | Deny-list | […] | […]
Secret patterns | API keys, access tokens | Regex | […] | […]
Location data | Addresses, coordinates | NER | […] | […]
Special-category (GDPR Art. 9) | Health, biometric, other | Hybrid | […] | […]
Commercial-sensitive | Deal codes, transaction IDs | Deny-list | […] | […]
Domain-specific | [feature-specific patterns] | [implementation] | […] | […]
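
Quantified FP and FN targets only mean something if they are measured the same way each time. The sketch below shows one possible measurement over a labelled evaluation set. It assumes that the FP rate is the share of non-sensitive items wrongly flagged and the FN rate is the share of sensitive items missed; the template does not fix these definitions, and the function names are hypothetical.

```python
def detector_error_rates(labels, predictions):
    """Compute (fp_rate, fn_rate) from parallel lists of booleans.

    labels[i] is True when item i really is sensitive;
    predictions[i] is True when the detector flagged it.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    # Assumed definitions: FP rate over true negatives, FN rate over true positives.
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = fn / (fn + tp) if (fn + tp) else 0.0
    return fp_rate, fn_rate


def meets_targets(fp_rate, fn_rate, fp_target, fn_target):
    """True when both measured rates are at or below their targets."""
    return fp_rate <= fp_target and fn_rate <= fn_target
```

For example, a detector that misses 1 of 100 sensitive items and wrongly flags 2 of 100 clean items would sit exactly at the example targets of 2% and 1% shown in the first table row.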

3.2 Replacement policy

Field | Value
Placeholder scheme | [typed placeholders like [PERSON_NAME], [EMAIL]]
Stable-surrogate scheme | [within a request, the same name maps to the same surrogate so downstream reasoning works]
Retention of original-to-surrogate mapping | [held at the gateway; de-redaction before return to caller; retention duration]
Low-confidence fail-mode | [if detector confidence is below the named threshold, the request is denied rather than passed]
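
The stable-surrogate and mapping-retention rows can be illustrated with a single detector. The sketch below handles email addresses only, using a deliberately simple regular expression; the numbered placeholder format `[EMAIL_n]` and the function names `redact` and `restore` are assumptions made for this example. A production pipeline would combine several detectors and apply the low-confidence fail-mode, which a plain regex cannot express.

```python
import re

# Simplified pattern for illustration; it is not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text):
    """Replace each email with a typed surrogate that is stable within the request.

    Returns the redacted text and the surrogate-to-original mapping, which
    stays at the gateway and is never sent to the provider.
    """
    mapping = {}  # surrogate -> original
    seen = {}     # original -> surrogate

    def replace(match):
        original = match.group(0)
        if original not in seen:
            surrogate = f"[EMAIL_{len(seen) + 1}]"
            seen[original] = surrogate
            mapping[surrogate] = original
        return seen[original]

    return EMAIL_RE.sub(replace, text), mapping


def restore(text, mapping):
    """De-redact model output before it is returned to the caller."""
    for surrogate, original in mapping.items():
        text = text.replace(surrogate, original)
    return text
```

Because the same address always maps to the same surrogate inside one request, the model can still tell that two mentions refer to the same party without seeing the address itself.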

3.3 Output-scan policy

Field | Value
Output classes scanned | [same taxonomy applied to generated output]
Action on output match | [redact before return, or flag and deliver with warning, or deny and log]
Scanner performance budget | [target p99 added latency from output scan]

4. Rate-limiting and cost-attribution

4.1 Rate limits

Scope | Algorithm | Limit | Action on breach
Per-calling-application | [token bucket / leaky bucket / concurrency] | [requests per second] | [429 with retry-after]
Per-tenant | […] | […] | […]
Per-user | […] | […] | […]
Per-provider-channel | […] | […] | […] (protects against provider outages and cost spikes)
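
Of the algorithms named in the table, the token bucket is the simplest to sketch. The class below is a minimal single-process illustration; the class name, parameter names, and the injectable `clock` are assumptions, and a real gateway would typically keep the counters in a shared store, one bucket per scope. The second return value is the number of seconds a caller could be told to wait in a retry-after header.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Not enough tokens: report how long until the shortfall refills.
        return False, (cost - self.tokens) / self.rate
```

A breach would then map to the table's "429 with retry-after" action: the gateway returns status 429 and uses the second value for the header.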

4.2 Cost attribution

Field | Value
Cost tag dimensions | [tenant, cost center, application, user]
Cost computation | [at-request computation using input-tokens × input-price + output-tokens × output-price, or platform-specific]
Reconciliation cadence | [e.g., weekly against provider invoice]
Tolerance before investigation | [e.g., > 3% discrepancy or > $X absolute]
Budget alert levels | [percent of monthly budget at which alert fires; who receives]
Cost-cap enforcement | [hard cap at gateway layer; behaviour when exceeded: graceful degrade to cheaper model, or deny]
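
The cost computation and the reconciliation tolerance can be sketched as two small functions. The example assumes per-token prices expressed as `Decimal` values and treats the tolerance as "percentage discrepancy or absolute discrepancy, whichever trips first", following the example in the table. The function names and the prices used in the usage note are hypothetical, not any provider's actual pricing.

```python
from decimal import Decimal


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request: input-tokens x input-price + output-tokens x output-price.

    Prices are per token and passed as Decimal to avoid float rounding drift.
    """
    return Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price


def needs_investigation(gateway_total, invoice_total, pct_tolerance, abs_tolerance):
    """True when the gateway's total and the provider invoice differ by more
    than the relative tolerance or more than the absolute tolerance."""
    diff = abs(gateway_total - invoice_total)
    if invoice_total == 0:
        return diff > 0
    return diff / invoice_total > pct_tolerance or diff > abs_tolerance
```

With hypothetical prices of 0.000003 per input token and 0.000015 per output token, a request with 1,000 input tokens and 500 output tokens costs 0.003 + 0.0075 = 0.0105 in the billing currency.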

4.3 Multi-tenancy

Field | Value
Tenant isolation in policy store | […]
Tenant isolation in rate-limit store | [per-tenant counters]
Tenant isolation in log stream | [per-tenant log streams or tagged records]
Tenant isolation in cost stream | [per-tenant billing records]
Per-tenant-outage containment | [assertion: a single tenant’s outage does not degrade other tenants]

5. Change-management workflow

Field | Value
Change authors | [who may propose a policy change]
Review path | [security review, governance review, architecture review; named reviewers and SLA]
Rollout modes | [immediate (emergency), feature-flag ramp, percentage ramp]
Emergency change protocol | [who can bypass the review path, under what condition, with what after-the-fact review]
Rollback protocol | [time-to-rollback SLO, decision-authority]
Change log retention | [duration, location, access]
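
A percentage ramp is usually made deterministic so that a given calling application stays on the same side of the rollout as the percentage grows. The sketch below shows one common approach, hashing a stable key into a bucket; the function name `in_rollout`, the key format, and the choice of SHA-256 with 10,000 buckets are assumptions for illustration, not something the template prescribes.

```python
import hashlib


def in_rollout(policy_version, application_id, percent):
    """Decide whether an application receives the new policy version.

    The decision is deterministic per (version, application) pair, and an
    application included at a lower percentage stays included at a higher one.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    key = f"{policy_version}:{application_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000  # 0..9999
    return bucket < percent * 100
```

Rolling back under this scheme means setting the percentage to 0, which returns every application to the previous version on its next request.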

6. Review and amendments

Role | Name | Decision | Date
Gateway owner | […] | Authored | YYYY-MM-DD
Security reviewer | […] | Approved | YYYY-MM-DD
Governance reviewer | […] | Approved | YYYY-MM-DD
Architecture reviewer | […] | Approved | YYYY-MM-DD

Maintain an amendment log that records the date, author, sections affected, and re-approvals obtained for each change. Emergency amendments are logged at the time of commit and reviewed within 10 business days.


Notes on use

When to use this template. Any centralized LLM gateway. Organizations without a centralized gateway still benefit from the template as a structural checklist.

Common errors in first-time use. Deny-by-default rule missing; redaction FP/FN targets unquantified; cost-attribution reconciliation cadence absent; change-management rollback SLO not stated; no emergency-change protocol. Reviewers treat these as blocking.

What follows. The policy record is cited from Template 1 §8 (security architecture). It is re-reviewed quarterly at minimum and whenever a new calling application, a new provider, a new data class, or a new jurisdiction enters scope.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.