AITE M1.1-Art73 v1.0 Reviewed 2026-04-06 Open Access
M1.1 Foundations of AI Transformation
AITF · Foundations

Artifact Template: RAG Data Contract



AITE-SAT: AI Solution Architecture Expert — Body of Knowledge Artifact Template


How to use this template

This template is completed once per source system that contributes documents to a retrieval-augmented generation (RAG) corpus. A RAG feature that retrieves from five source systems has five completed data contracts, each signed by the source-system owner and the RAG feature’s evaluation and governance owners. The contracts collectively form the data register for the feature and are cross-referenced from Template 1 §5.

The contract is a governance artifact, not a technical specification. It answers: who owns the source, how it changes, what the ingestion SLA is, what retraction means, what the retention rule is, who can see what, and what happens when something goes wrong. Implementation decisions (which embedding model, which index backend, which chunking strategy) are captured in the architecture design document, not here.


RAG Data Contract — [Source Name]

1. Identification and ownership

| Field | Value |
| --- | --- |
| Source system name | [e.g., “Product Documentation Repository - Cloud Team”] |
| Source system owner | [name, role, team] |
| Source system operator | [team responsible for day-to-day running] |
| Consuming RAG feature(s) | [the feature(s) this contract governs] |
| Contract author | [solution architect] |
| Data-governance reviewer | [name, role] |
| Effective date | YYYY-MM-DD |
| Next review date | YYYY-MM-DD |

2. Source scope

One to three paragraphs describing what the source is, what documents it contains, the document format(s), the approximate volume (document count, byte count), the approximate rate of change (documents added, modified, removed per time unit), and the business role the source plays for the feature. Be explicit about what is in scope and what is not: a product-documentation repository may include articles, release notes, and FAQs but exclude marketing collateral, and the contract names both the inclusion and the exclusion.

3. Data classification

| Field | Value |
| --- | --- |
| Sensitivity class | [public / internal / confidential / restricted] |
| Personal data present? | [yes / no; if yes, categories per GDPR Article 4 or equivalent] |
| Special categories (GDPR Article 9, or equivalent) | [yes / no; if yes, which] |
| Commercial-sensitive markers | [client-matter identifiers, deal codes, transaction IDs, other] |
| Regulated data | [financial record under MiFID, medical under HIPAA, etc.] |
| Residency constraint | [EU-only, country-specific, none] |
| Inherited-from classification | [the source system’s overall class, if different from per-document class] |

If the source contains mixed-class documents (most are internal but a minority carry restricted content), the contract states the strictest class that applies and the detection protocol that flags restricted-class documents at ingestion.
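The strictest-class rule and the ingestion-time flag can be sketched as follows. This is a minimal illustration, assuming the four sensitivity classes in the table are ordered from least to most strict. The names `Sensitivity`, `strictest_class`, and `flag_restricted`, and the document IDs, are hypothetical and not part of the template.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Assumed ordering: a higher value means a stricter class."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def strictest_class(doc_classes):
    """Return the strictest class found among per-document classes."""
    return max(doc_classes)


def flag_restricted(docs):
    """Return IDs of documents an ingestion step should flag as restricted."""
    return [doc_id for doc_id, cls in docs.items() if cls == Sensitivity.RESTRICTED]


# Hypothetical mixed-class source: mostly internal, one restricted document.
docs = {
    "kb-001": Sensitivity.INTERNAL,
    "kb-002": Sensitivity.RESTRICTED,
    "kb-003": Sensitivity.INTERNAL,
}
contract_class = strictest_class(docs.values())  # class the contract states
flagged = flag_restricted(docs)                  # documents flagged at ingestion
```

In a real pipeline the per-document class would come from the detection protocol the contract names; the dictionary above only stands in for that output.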

4. Ingestion and refresh

| Field | Value |
| --- | --- |
| Ingestion mode | [event-driven (webhook, change feed), scheduled pull, manual upload] |
| Refresh cadence | [e.g., every 15 minutes; or “within 4 hours of source change”] |
| Ingestion SLA | [target latency from source-change commit to index availability] |
| Ingestion SLA violation response | [alert destination, on-call owner, remediation time target] |
| Change-feed source | [API, message bus, file drop, other; with URL or identifier] |
| Back-pressure protocol | [what happens if the ingestion pipeline cannot keep up: queue, drop oldest, pause upstream] |
| Idempotency key | [the field or fields that dedupe re-ingested documents] |
| Schema version | [current version; how a schema change is rolled out] |
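To make the idempotency key and the ingestion SLA concrete, here is a minimal sketch. It assumes the key is built from a source ID, a document ID, and a content hash, and that the SLA is measured from source commit time to index time. The `Ingestor` class, its method names, and the 4-hour SLA value are illustrative assumptions, not something the template prescribes.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def idempotency_key(source_id: str, doc_id: str, content: bytes) -> str:
    """Build a dedupe key from source, document ID, and a content hash (assumed fields)."""
    return f"{source_id}:{doc_id}:{hashlib.sha256(content).hexdigest()}"


class Ingestor:
    """Toy in-memory ingestion step; a real pipeline would persist keys and index."""

    def __init__(self, sla: timedelta):
        self.sla = sla
        self.seen_keys = set()
        self.index = {}

    def ingest(self, source_id, doc_id, content, committed_at, indexed_at):
        key = idempotency_key(source_id, doc_id, content)
        if key in self.seen_keys:
            return "duplicate"  # re-delivered change: skip without re-indexing
        self.seen_keys.add(key)
        self.index[doc_id] = content
        latency = indexed_at - committed_at
        return "sla_violation" if latency > self.sla else "ok"


t0 = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
ingestor = Ingestor(sla=timedelta(hours=4))
first = ingestor.ingest("docs-repo", "a1", b"v1", t0, t0 + timedelta(minutes=10))
replay = ingestor.ingest("docs-repo", "a1", b"v1", t0, t0 + timedelta(minutes=20))
late = ingestor.ingest("docs-repo", "a1", b"v2", t0, t0 + timedelta(hours=5))
```

Including the content hash in the key means a modified document is treated as new work while a replayed event is dropped; a contract could equally choose a source-side version number if the source provides one.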

5. Retraction and deletion

The retraction section is the most common gap in practice. It is required and must be specific.

| Field | Value |
| --- | --- |
| Retraction signal | [how the source signals a document has been retracted: a tombstone, an update with a retracted flag, a deletion event] |
| Retraction SLA | [target latency from source retraction to index removal] |
| Soft-delete vs hard-delete | [whether the index retains a tombstone or fully removes] |
| Cached-retrieval contamination | [how retrieval pipelines that already loaded the retracted document are drained: cache invalidation, TTL, explicit purge] |
| User-facing response on retrieved-and-retracted | [what the feature returns if a retrieved passage is discovered to be retracted between retrieval and response composition] |
| Hard-delete retention record | [what evidence is retained after hard deletion: the retention store, the duration, the access controls] |
| GDPR erasure (Article 17) handling | [if personal data, the process for honoring a lawful erasure request that reaches the source or the RAG feature directly] |

A paragraph describing a specific worked example of a retraction flowing end-to-end — source marks document retracted at T0; change feed carries the retraction at T0 + δ1; index removes at T0 + δ2; cached retrieval TTLs expire by T0 + δ3; end-to-end p99 latency from T0 to full propagation is X. The worked example is the teaching artifact the on-call engineer reads when a retraction incident occurs.
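The timeline in the worked example can be checked mechanically. The sketch below assumes each retraction is recorded as four timestamps in seconds on a shared clock, and uses a nearest-rank percentile for the p99 figure. `RetractionTrace`, `p99`, and all numeric values are hypothetical placeholders for the values a real contract would state.

```python
from dataclasses import dataclass


@dataclass
class RetractionTrace:
    """Timestamps in seconds for one retraction, measured on a shared clock."""
    t0: float                # source marks the document retracted
    feed_at: float           # change feed carries the retraction (T0 + d1)
    index_removed_at: float  # index removes the document (T0 + d2)
    cache_expired_at: float  # last cached retrieval expires (T0 + d3)

    def end_to_end(self) -> float:
        """Latency from T0 until the slowest stage has completed."""
        last = max(self.feed_at, self.index_removed_at, self.cache_expired_at)
        return last - self.t0


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    if not values:
        raise ValueError("p99 needs at least one observation")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


# Placeholder numbers: feed after 5 s, index removal after 60 s, cache TTL 360 s.
trace = RetractionTrace(t0=0.0, feed_at=5.0, index_removed_at=60.0, cache_expired_at=360.0)
propagation = trace.end_to_end()

# Placeholder history of 100 end-to-end latencies, 1 s through 100 s.
observed = [float(n) for n in range(1, 101)]
observed_p99 = p99(observed)
```

In this placeholder trace the cache TTL dominates the end-to-end figure, which is why the contract asks for the cache-draining mechanism separately from the index-removal SLA.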

6. Access control and tenancy

| Field | Value |
| --- | --- |
| Access model | [open within tenant; role-scoped; attribute-scoped; matter-scoped] |
| Attribute sources | [identity provider, group membership, matter system, other] |
| Enforcement layer | [where the access filter is applied: at retrieval query, at result post-filter, at prompt assembly] |
| Tenant isolation | [if multi-tenant, the isolation model: single index with tenant filter, per-tenant index, physical separation] |
| Cross-tenant leakage test | [the CI or production test that asserts isolation; how breach is detected] |
| Principle of least privilege | [how a user’s access to the RAG feature reflects their access to the source system: same access, reduced access, escalated access with justification] |
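As one illustration of the enforcement-layer and leakage-test fields, the sketch below applies the access filter at retrieval query time against a single shared index with a tenant filter, then runs a canary-based leakage check. The `Chunk` type, the substring matching that stands in for vector search, and the canary string are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    tenant: str
    allowed_groups: frozenset
    text: str


def retrieve(index, query, tenant, user_groups):
    """Filter by tenant and group before matching, so blocked chunks are never scored."""
    hits = []
    for chunk in index:
        if chunk.tenant != tenant:
            continue  # tenant isolation: single index with tenant filter
        if not (chunk.allowed_groups & user_groups):
            continue  # role-scoped access: user shares no group with the chunk
        if query.lower() in chunk.text.lower():  # stand-in for similarity search
            hits.append(chunk)
    return hits


def cross_tenant_leaks(index, canary_query, tenant, user_groups):
    """Return doc IDs from other tenants that a query in `tenant` can see."""
    results = retrieve(index, canary_query, tenant, user_groups)
    return [c.doc_id for c in results if c.tenant != tenant]


index = [
    Chunk("a-1", "tenant-a", frozenset({"support"}), "Reset procedure for widgets"),
    Chunk("b-1", "tenant-b", frozenset({"support"}), "CANARY-B reset procedure"),
    Chunk("a-2", "tenant-a", frozenset({"legal"}), "Reset clause in contract"),
]
visible = [c.doc_id for c in retrieve(index, "reset", "tenant-a", {"support"})]
leaks = cross_tenant_leaks(index, "CANARY-B", "tenant-a", {"support"})
```

A CI job could plant a canary document in one tenant and fail the build whenever `cross_tenant_leaks` returns a non-empty list for a user in another tenant.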

7. Quality signals

| Field | Value |
| --- | --- |
| Freshness metric | [percentage of documents with age under the staleness threshold] |
| Coverage metric | [percentage of queries on the golden set that find at least one relevant document] |
| Citation-validity metric | [percentage of generated citations that correctly reference the source passage] |
| Retrieval-only failure budget | [tolerated rate of retrieval-empty or retrieval-low-confidence outcomes] |
| Feedback loop | [how user feedback flags source quality issues and reaches the source owner] |
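The three percentage metrics above reduce to simple ratios. The sketch below shows one way to compute them, assuming document ages are available in hours, golden-set results are recorded as lists of relevant document IDs per query, and a citation counts as valid when the cited passage is among the passages actually retrieved for that answer. The function names, the validity rule, and the sample data are assumptions for illustration.

```python
def freshness_pct(doc_ages_hours, staleness_threshold_hours):
    """Percentage of documents whose age is under the staleness threshold."""
    fresh = sum(1 for age in doc_ages_hours if age < staleness_threshold_hours)
    return 100.0 * fresh / len(doc_ages_hours)


def coverage_pct(relevant_hits_by_query):
    """Percentage of golden-set queries that found at least one relevant document."""
    covered = sum(1 for hits in relevant_hits_by_query.values() if hits)
    return 100.0 * covered / len(relevant_hits_by_query)


def citation_validity_pct(citations):
    """Percentage of citations whose cited passage was in the retrieved set.

    Each item is (cited_passage_id, retrieved_passage_ids). This validity rule
    is an assumption; a contract may define a stricter check.
    """
    valid = sum(1 for cited, retrieved in citations if cited in retrieved)
    return 100.0 * valid / len(citations)


# Placeholder inputs.
freshness = freshness_pct([1, 5, 30, 100], staleness_threshold_hours=24)
coverage = coverage_pct({"q1": ["d1"], "q2": [], "q3": ["d2", "d3"], "q4": ["d9"]})
validity = citation_validity_pct([
    ("p1", {"p1", "p2"}),
    ("p3", {"p3"}),
    ("p9", {"p4", "p5"}),  # cites a passage that was never retrieved
    ("p6", {"p6", "p7"}),
    ("p8", {"p8"}),
])
```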

8. Incident and change response

| Scenario | Response | Owner |
| --- | --- | --- |
| Ingestion pipeline outage | [fail-back posture, user-facing behaviour] | [on-call team] |
| Retraction-propagation breach | [detection, communication, remediation] | [data-governance owner] |
| Access-control breach (wrong user sees wrong document) | [detection, communication, remediation, regulatory notification triggers] | [security owner] |
| Sensitive-class leak into public-class stream | [detection, purge, audit record] | [data-governance owner] |
| Schema-change regression | [rollback, re-index, communication] | [source operator] |

9. Signatures

| Role | Name | Decision | Date |
| --- | --- | --- | --- |
| Source-system owner | […] | Approved | YYYY-MM-DD |
| RAG feature evaluation owner | […] | Approved | YYYY-MM-DD |
| Data-governance reviewer | […] | Approved | YYYY-MM-DD |
| Solution architect | […] | Authored | YYYY-MM-DD |

10. Amendments

Each amendment: date, author, change summary, sections affected, re-approvals obtained. A material amendment (change of classification, change of retraction SLA, change of access model) requires re-approval by the data-governance reviewer.


Notes on use

When to use this template. Complete it for every source system that contributes documents to a production RAG feature. Sources under pilot may use a provisional contract; the final contract is required before production traffic.

Common errors in first-time use. Missing retraction SLA; “not applicable” on access control without justification; coarse classification applied uniformly across mixed-class sources; no worked retraction example; no cross-tenant-leakage test. Data-governance reviewers treat these as blocking.

What follows. The RAG data contract is cited from Template 1 §5 and from the feature’s regulatory-evidence appendix. It is re-reviewed annually at minimum, or whenever the source system’s ownership, residency, classification, or ingestion posture changes.


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.