AITE M1.1-Art21 v1.0 Reviewed 2026-04-06 Open Access
M1.1 Foundations of AI Transformation
AITF · Foundations

Model, Prompt, and Index Registries

Article 21 of 48

This article teaches the architect the minimum schema for each registry, how the three relate, and how the promotion pipeline from Article 19 writes to them as state transitions.

Why three registries

A fine-tuned model, a prompt template, and a retrieval index are three different artifact types with different lifecycles, different ownership patterns, and different rates of change.

  • Models are expensive to produce and change less frequently. A managed-model version changes quarterly or semi-annually; a fine-tuned model changes as often as the underlying data or the business need.
  • Prompts are cheap to produce and change frequently — often weekly or more. A prompt change is a material change to system behavior and deserves the same governance as a code change but in practice rarely gets it.
  • Retrieval indexes change on two cadences: continuous (individual documents added, updated, deprecated) and discrete (re-indexing, embedding-model change, chunking-strategy change). Both need to be tracked.

Collapsing these into one registry misses the lifecycle differences. Running them as entirely separate systems fails when a production incident needs to correlate “which model + which prompt + which index was running at 14:32?” The architect’s answer: three registries, one lineage graph across them, one identifier — the release manifest — that pins specific versions from each for a given deployment.

Model registry

Minimum schema

A model registry entry should carry, at minimum:

  • model_id (stable identifier, e.g., customer-support-assistant-v3)
  • version (e.g., 3.2.1)
  • base_model (gpt-4o-2024-08-06, claude-3-5-sonnet-20241022, llama-3-1-70b-instruct, etc.) with provider
  • fine_tune_parent (if derivative)
  • training_data_refs (if fine-tuned — pointers to dataset versions; link to data governance artifacts)
  • evaluation_report_ref (link to the eval report that qualified this version)
  • safety_report_ref (red-team and safety eval results)
  • card_ref (model card per Google’s Model Cards for Model Reporting pattern and ISO/IEC 23053)1
  • status (draft / staging / canary / production / deprecated / retired)
  • created_by, created_at, approved_by, approved_at
  • license (for base models: OpenAI commercial, Anthropic commercial, Llama 3 Community, Apache 2.0, etc.)
  • residency (permitted jurisdictions for this version)

The registry API supports: register, promote, deprecate, query-by-status, query-by-base, diff-between-versions. MLflow, Vertex AI Model Registry, W&B Models, Hugging Face Model Hub, and a handful of commercial tools implement this pattern; the architect usually picks one and wraps it in a thin abstraction so registry migration is possible later.

Lineage

Model lineage traces: data → training → artifact → deployment. EU AI Act Article 10 (data governance) and Article 11 (technical documentation) require this lineage in some form for high-risk systems, and ISO/IEC 42001 Clause 8.3 expects AI system lifecycle documentation.2 The registry is where that lineage lives as operational data, not as an occasional artifact.

Prompt registry

Why prompts need a registry

A system prompt, a user-message template, and a tool-description text are production code. Changing any of them changes system behavior. In many teams, prompts live in a config file, a JSON blob, or worse, inline in source code — none of which gives the governance needed to trace behavior back to a specific template version at a specific time.

The Italian Garante’s ChatGPT rulings and the EU AI Act Article 11 technical documentation requirement both imply that the operator of a production AI system must be able to state, for any user interaction, what instruction set the model was running.3 Only a prompt registry can answer that question reliably.

Minimum schema

  • prompt_id (e.g., cse-system-prompt)
  • version (semver or monotonic)
  • template_body (the actual text, with variable placeholders)
  • variables_schema (the expected variables and their types)
  • tool_schemas_refs (the tool descriptions that accompany this prompt)
  • owner (usually a named person or team)
  • status (draft, staging, canary, production, deprecated)
  • created_by, created_at, approved_by, approved_at
  • eval_ref (the eval report that qualified this prompt)
  • injection_test_ref (the injection/safety test results)
  • supersedes (link to the prior version)
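A registry entry built on this schema can also enforce it at render time. The sketch below is hypothetical (it uses `$`-style placeholders via Python's `string.Template` purely for illustration); the point is that `variables_schema` lets the registry reject a malformed call before a bad prompt ships.

```python
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class PromptEntry:
    prompt_id: str
    version: int              # monotonic, as the schema allows
    template_body: str        # $-style placeholders in this sketch
    variables_schema: dict    # variable name -> expected Python type
    status: str = "draft"

    def render(self, **variables) -> str:
        # Validate against variables_schema before substitution, so a bad
        # caller fails loudly instead of shipping a malformed prompt.
        for name, expected in self.variables_schema.items():
            if name not in variables:
                raise KeyError(f"missing variable: {name}")
            if not isinstance(variables[name], expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return Template(self.template_body).substitute(**variables)
```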

Several OSS and commercial prompt-registry tools exist — Langfuse, PromptLayer, Humanloop, LangSmith — and some architects build registries internally on top of Git or a standard document store. The architect’s concern is less the tool than the discipline: every production prompt is registered, every change is a versioned event, and the promotion pipeline treats prompt changes like code.

Prompt provenance in the trace

Each AI trace (Article 13) should record the prompt version in use when the request was served. Without this, post-incident forensics is impossible. Langfuse, LangSmith, and Weights & Biases Weave all support prompt-version propagation in traces.
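The propagation mechanism can be sketched generically. Real tools (Langfuse, LangSmith, Weave) handle this for you; the hypothetical `serve_with_prompt` below only shows the shape — pin the prompt version for the duration of one request, and stamp it on every span that request produces.

```python
import contextvars
from dataclasses import dataclass, field

# Holds the (prompt_id, version) pinned for the current request.
_active_prompt = contextvars.ContextVar("active_prompt", default=None)

@dataclass
class TraceSpan:
    name: str
    attributes: dict = field(default_factory=dict)

def serve_with_prompt(prompt_id: str, version: int, handler):
    """Pin the prompt version for the duration of one request."""
    token = _active_prompt.set((prompt_id, version))
    try:
        span = TraceSpan(name="llm.request")
        pid, pver = _active_prompt.get()
        # Every span carries the exact template version that produced the call.
        span.attributes["prompt.id"] = pid
        span.attributes["prompt.version"] = pver
        span.attributes["response"] = handler()
        return span
    finally:
        _active_prompt.reset(token)
```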

Retrieval-index registry

The index as a versioned artifact

A retrieval index is a function from query to ranked passages. It has a corpus version (which documents at which revisions are included), an embedding-model version (which embedding model was used to compute the vectors), a chunking-strategy version, and an index-algorithm configuration (HNSW parameters, for example). Any of these can change, and each change can silently shift retrieval behavior.

The architect registers the index as a composite artifact:

  • index_id
  • version
  • corpus_manifest_ref (which documents, at which revisions)
  • embedding_model_ref (text-embedding-3-small, bge-large-en-v1.5, cohere-embed-multilingual-v3, or the in-house version)
  • chunking_strategy_ref (fixed-512-50-overlap, sentence-window, semantic-markdown)
  • store_backend (pinecone, weaviate, qdrant, pgvector, milvus, opensearch)
  • algo_params (HNSW M and ef, IVF nlist, etc.)
  • eval_ref (retrieval-eval results: recall@k, MRR, citation accuracy)
  • freshness_metadata (per-document timestamps)
  • status, owner, approval metadata
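One way to make the composite nature of the index concrete is to derive its version from the components, content-address style. The sketch below is an assumption, not a standard practice of any particular tool: hash the corpus manifest, embedding model, chunking strategy, and algorithm parameters together, so that any change to any component produces a new version and a silent in-place mutation has no registry identity.

```python
import hashlib
import json

def index_version(corpus_manifest: dict, embedding_model: str,
                  chunking_strategy: str, algo_params: dict) -> str:
    """Derive a deterministic version for a retrieval index from everything
    that can shift retrieval behavior."""
    payload = json.dumps(
        {
            "corpus": corpus_manifest,
            "embedding_model": embedding_model,
            "chunking": chunking_strategy,
            "algo": algo_params,
        },
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]
```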

Why this matters

The Harvey AI public blog on legal retrieval and the Perplexity passage-chunking post both describe index changes that materially moved quality.4 An index change that bypasses the registry is indistinguishable from a production incident — downstream quality changes, and the team cannot identify why. With the registry, the change is a recorded event with evaluations attached; rollback is a registry operation rather than a re-indexing run.

Release manifest: how the three registries compose

A release is a pinning of specific versions from each registry. The release manifest is a small document:

release: customer-support-assistant@2026-03-15
environments: [production]
model: customer-support-assistant-v3.2.1
prompt: cse-system-prompt@v12
index: cse-knowledge@2026-03-15
guardrail_policy: cse-guardrails@v5
tool_schemas: [escalation-tool@v3, order-lookup@v2]

The manifest is what the promotion pipeline promotes. Rollback is rolling back the manifest, not individual artifacts. The manifest is the primary evidence artifact for EU AI Act Article 12 record-keeping and for most regulators asking “what was running when event X happened?”
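"Rollback is rolling back the manifest" can be sketched as an environment that keeps a history of pinned manifests. The names here (`ReleaseManifest`, `Environment`) are hypothetical; the design point is that rollback never touches the artifacts themselves, only which versions are pinned.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReleaseManifest:
    release: str
    model: str
    prompt: str
    index: str

class Environment:
    """Holds the history of manifests deployed to one environment; rollback
    is re-pinning the previous manifest, not rebuilding any artifact."""
    def __init__(self, name: str):
        self.name = name
        self._history: list[ReleaseManifest] = []

    def promote(self, manifest: ReleaseManifest) -> None:
        self._history.append(manifest)

    def rollback(self) -> ReleaseManifest:
        if len(self._history) < 2:
            raise RuntimeError("nothing to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def active(self) -> ReleaseManifest:
        return self._history[-1]
```

Because the history is append-only apart from rollback, the environment can always answer which manifest was active, which is exactly the Article 12 record-keeping question.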

Access, ownership, and approval

Each registry has roles. Typical breakdown:

  • Model registry: ML lead registers; AI/ML architect approves for staging; security plus product lead approve for canary; architecture council approves for production.
  • Prompt registry: Product/prompt engineer registers; AI lead approves for staging; product plus safety reviewer for canary and production.
  • Index registry: Data engineer registers; retrieval lead approves for staging; content/legal review for canary and production (because corpus inclusion touches copyright, licensing, residency).

The specific role names vary by organization; what matters is that no single role can unilaterally push a change to production. That separation of duties is the registry's governance function.
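The "no single role can unilaterally push" rule is mechanically checkable at promotion time. A minimal sketch, with hypothetical role names borrowed from the model-registry breakdown above: every required role must have signed off, and no single person may hold more than one of those sign-offs.

```python
# Hypothetical required roles for a production promotion of a model version.
REQUIRED_FOR_PRODUCTION = {"security", "product_lead", "architecture_council"}

def may_promote_to_production(approvals: dict[str, str]) -> bool:
    """approvals maps role -> approver name. Promotion requires every required
    role to have signed off, and no one person to cover two roles."""
    if not REQUIRED_FOR_PRODUCTION <= approvals.keys():
        return False  # a required sign-off is missing
    approvers = [approvals[r] for r in REQUIRED_FOR_PRODUCTION]
    return len(set(approvers)) == len(approvers)  # distinct people
```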

Integration with observability

Registries and observability (Article 13) share a spine: the trace must carry the IDs of the active model, prompt, index, and tool-schema versions. Langfuse, Weights & Biases Weave, Arize, and OpenTelemetry’s GenAI semantic conventions (2024 draft) all support this pattern.5 When a production incident starts, the on-call goes from the trace to the registry entries in one click, reads the recent changes, and forms the first hypothesis.
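The "what was running at 14:32?" lookup from the opening section follows directly from this spine. A sketch under assumed names: an append-only log of (effective-from, manifest) pairs per system, queried by binary search.

```python
from bisect import bisect_right
from datetime import datetime

class DeploymentLog:
    """Append-only log of (effective_from, manifest) pairs for one system;
    answers 'what was running at 14:32?' with a binary search."""
    def __init__(self):
        self._times: list[datetime] = []
        self._manifests: list[str] = []

    def record(self, effective_from: datetime, manifest: str) -> None:
        # Entries must arrive in chronological order.
        assert not self._times or effective_from > self._times[-1]
        self._times.append(effective_from)
        self._manifests.append(manifest)

    def active_at(self, when: datetime) -> str:
        i = bisect_right(self._times, when)
        if i == 0:
            raise LookupError("no deployment before that time")
        return self._manifests[i - 1]
```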

Governance and compliance mapping

  • EU AI Act Article 11 (technical documentation) and Article 12 (record-keeping): registries are the source of truth for what was in production and when.2
  • ISO/IEC 42001 Clause 8.3 (lifecycle) and Clause 9.1 (monitoring): lifecycle states in the registry satisfy 8.3; the monitoring attached to each production version satisfies 9.1.6
  • NIST AI RMF MAP 2.2 (categorization) and MEASURE 2.7 (safety testing): the registry carries the categorization tags and the safety-test references.7
  • ISO/IEC 23894 Annex A (risk sources): the registry links each model version to the risk assessment that accompanied its approval.8

Anti-patterns

  • “Prompts are just config.” Treating production prompts as unprivileged configuration invariably produces silent regressions: behavior changes with no versioned event to trace them to.
  • Model registry that stores artifacts but not lineage. MLflow or W&B pointing at a model binary is a start, but without data and eval lineage, the registry fails regulator tests.
  • Index that changes in place. Re-indexing the live production index, rather than building a new version and promoting it, removes the rollback path.
  • One registry per team. When each team runs its own prompt registry with its own schema, correlation and audit become impossible. The architect pushes for one registry per artifact type at the platform level.

Summary

Three registries — model, prompt, index — with one release-manifest overlay give the architect the provenance graph needed to operate and govern AI at scale. The registries are not optional. They are the operational spine of everything in Articles 19 (promotion), 20 (SLO/incident), 22 (regulation), and 23 (ADRs and documentation). A team without them may ship AI, but it cannot govern it.

Key terms

  • Model registry
  • Prompt registry
  • Retrieval-index registry
  • Release manifest
  • Lineage

Learning outcomes

After this article the learner can: explain the three registries and their minimum fields; classify AI artifacts by the correct registry; evaluate a registry design for lineage gaps; design a registry spec for a given platform.

Footnotes

  1. Mitchell et al., “Model Cards for Model Reporting,” FAT* 2019; ISO/IEC 23053:2022.

  2. Regulation (EU) 2024/1689 (AI Act), Articles 10, 11, 12.

  3. Italian Garante per la Protezione dei Dati Personali, ChatGPT decisions (March 2023 provisional measure; December 2024 fine).

  4. Harvey AI engineering blog; Perplexity AI engineering blog (2024).

  5. OpenTelemetry GenAI semantic conventions (draft, 2024).

  6. ISO/IEC 42001:2023, Clauses 8.3 and 9.1.

  7. NIST AI 100-1, MAP 2.2 and MEASURE 2.7.

  8. ISO/IEC 23894:2023, Annex A.