This article defines the AI platform scope, classifies capability candidates between platform and product, walks the maturity curve from MVP through shared to mature, and anchors each stage in a public engineering precedent.
Why platforms matter — the third-use-case test
The test for whether the platform is earning its keep is simple. When the third business unit asks for an AI feature, how long does it take to stand up a reasonable first version? If the answer is measured in weeks rather than quarters, the platform is doing its job. If the answer is still quarters, one of two things is true: either the platform is too thin (not enough shared capability) or it is too opinionated (product teams cannot fit their problem into the platform’s mould).
Uber’s publicly documented Michelangelo platform, launched around 2015 for classical ML, made the third-use-case test explicit.[1] Michelangelo’s thesis was that most ML workflow steps — feature management, training, deployment, monitoring — were identical across use cases and could be productised. The result was hundreds of ML models in production within a few years, a number that would have been impossible had every team stood up its own pipeline.
Netflix’s Metaflow, open-sourced in 2019, took a similar stance for its data-science workflows.[2] Metaflow did not try to solve every step; it standardised the flow construct, the versioning, and the compute abstraction, and left modelling to the scientist. The platform’s scope choice — what it solves and what it deliberately leaves to product teams — is the architect’s most consequential platform decision.
For generative AI, the canonical public precedent is Spotify’s 2023 engineering description of its AI platform layer for recommendations and content understanding.[3] The lessons generalise even though each company’s stack differs.
Platform versus product — the capability split
The architect’s first job on a platform team is to classify candidate capabilities into platform versus product. The usual test: if more than one product team needs it and the correct implementation is not obvious, it belongs on the platform. If only one team needs it today, or if the implementation is genuinely product-specific, it stays in the product.
A reasonable default split for GenAI applications in early 2026:
| Capability | Platform or product | Rationale |
|---|---|---|
| Model access (closed-weight APIs, self-hosted open weights) | Platform | Quota, billing, key management, routing — none of this is product-specific. |
| Prompt storage and versioning | Platform | The registry (Article 21) is shared. |
| Orchestration framework (LangChain, LlamaIndex, LangGraph, Haystack, DSPy, custom) | Platform picks one or two | Cross-team consistency beats team-by-team choice in all but niche cases. |
| Retrieval service (vector store, reranker, chunk pipeline) | Platform | Retrieval (Articles 4, 5, 6) is hard to do well and easy to do badly. |
| Evaluation harness | Platform | Offline eval infrastructure (Article 11) is shared; eval datasets per product. |
| Observability and tracing | Platform | One tracing standard across teams. |
| Guardrail service (PII redaction, refusal, policy enforcement) | Platform | Safety is not a per-product concern. |
| Domain corpora | Product (with platform ingest) | The platform provides the ingest pipeline; the team owns the corpus. |
| Prompt design and per-task eval sets | Product | These are product intellectual property. |
| UX affordances | Product | The user experience is where product teams differentiate. |
The split is not immutable. A capability that starts in product sometimes graduates to platform once a second team needs it and a defensible abstraction exists. A capability that starts on the platform occasionally retreats to product when the platform abstraction fights too many use cases.
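The platform-versus-product test above can be sketched as a small decision function. This is an illustrative model only: the names (`Capability`, `classify`) and the boolean inputs are assumptions introduced here, not part of any real framework.

```python
# Hypothetical sketch of the platform-vs-product classification test.
from dataclasses import dataclass

@dataclass
class Capability:
    name: str
    teams_needing_it: int          # how many product teams need it today
    implementation_obvious: bool   # is the correct implementation obvious?
    product_specific: bool         # is the implementation genuinely product-specific?

def classify(c: Capability) -> str:
    """Shared need plus a non-obvious, non-product-specific implementation -> platform."""
    if c.teams_needing_it > 1 and not c.implementation_obvious and not c.product_specific:
        return "platform"
    return "product"

# Retrieval is needed by several teams and is hard to do well -> platform.
retrieval = Capability("retrieval service", teams_needing_it=3,
                       implementation_obvious=False, product_specific=False)
# A single team's UX affordances stay in the product.
ux = Capability("UX affordances", teams_needing_it=1,
                implementation_obvious=True, product_specific=True)
```

The function also encodes the graduation path: re-running `classify` after a second team appears is exactly the moment a product capability moves to the platform.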
The maturity curve — MVP, shared, mature
Platforms evolve through recognisable stages. The architect who plans the maturity curve avoids the two common traps: over-investing in platform before product demand justifies it, and under-investing once the demand is undeniable.
MVP platform. The first product ships with a minimum shared layer: one managed model API integrated with identity, an observability-and-tracing hookup, a thin orchestration wrapper, and a policy for where prompts live. Most of the work is still in product. This stage lasts from first prototype through the first productionised use case, typically three to six months.
Shared platform. The second and third product teams come on. The platform gains a retrieval service, a shared eval harness, a guardrails service, and a prompt and model registry. The platform team is small (two to five engineers) but distinct from product teams. This stage is where most of the classification decisions above get made and hard-coded.
Mature platform. Five or more product teams consume the platform. The platform has its own roadmap, service-level agreements with product teams, office-hours and on-call rotations, a cost-allocation model (Article 33), and a platform-API versioning discipline. At this stage the platform has earned the right to enforce opinions: teams that want to bypass it have to justify the exception and absorb the costs.
Platform services — scope detail
Model access
Managed closed-weight API routing (Anthropic Claude, OpenAI, Gemini), open-weight self-hosted endpoints (Llama 3, Mistral, Qwen served via vLLM or Text Generation Inference), and cloud-platform catalogues (Bedrock, Azure AI Foundry, Vertex) all pass through a single platform-owned gateway. The gateway handles authentication, per-tenant quota, cost telemetry, content filtering, and routing policy. Product teams call one internal endpoint and the platform decides where the request goes.
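A minimal sketch of the gateway's routing-and-telemetry core, assuming a routing policy keyed on tenant and task with a wildcard fallback. The class, the route keys, and the backend endpoint names are all hypothetical.

```python
# Illustrative platform-owned model gateway: one internal endpoint in,
# routing policy decides which backend serves the request.
from dataclasses import dataclass, field

@dataclass
class Gateway:
    # (tenant, task) -> backend endpoint; "*" is a wildcard task
    routes: dict = field(default_factory=dict)
    usage: dict = field(default_factory=dict)   # per-tenant call counts for cost telemetry

    def route(self, tenant: str, task: str) -> str:
        backend = self.routes.get((tenant, task)) or self.routes.get((tenant, "*"))
        if backend is None:
            # no route means no quota/policy entry for this tenant
            raise PermissionError(f"no route configured for {tenant}/{task}")
        self.usage[tenant] = self.usage.get(tenant, 0) + 1
        return backend

gw = Gateway(routes={
    ("search-team", "rerank"): "self-hosted/vllm-llama3",
    ("search-team", "*"): "managed/claude",
})
```

The point of the sketch is the shape, not the mechanics: product teams never see the backend names, so the platform can re-route, re-quota, or swap vendors without a product change.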
Retrieval
A shared vector-store offering — typically Qdrant, Weaviate, or pgvector depending on organisation — with a managed chunk pipeline and a hybrid-retrieval configuration. Product teams bring their corpus; the platform runs ingestion, embedding, indexing, and querying. Advanced teams are allowed to opt out but the opt-out path is documented.
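One way to picture the hybrid-retrieval configuration is as a weighted combination of a dense (vector) score and a sparse (keyword) score per chunk. The weighting scheme and toy scores below are assumptions for illustration, not the API of Qdrant, Weaviate, or pgvector.

```python
# Illustrative hybrid-retrieval scoring: convex combination of dense and
# sparse relevance, with alpha controlling the balance.
def hybrid_score(dense: float, sparse: float, alpha: float = 0.7) -> float:
    return alpha * dense + (1 - alpha) * sparse

def rank(chunks: list, alpha: float = 0.7) -> list:
    """Order candidate chunks by combined score, best first."""
    return sorted(chunks,
                  key=lambda c: hybrid_score(c["dense"], c["sparse"], alpha),
                  reverse=True)

chunks = [
    {"id": "a", "dense": 0.9, "sparse": 0.10},   # semantically close
    {"id": "b", "dense": 0.5, "sparse": 0.95},   # exact keyword hit
]
```

Exposing `alpha` as platform configuration (rather than per-team code) is one concrete way the documented opt-out path stays the exception rather than the rule.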
Evaluation
A shared harness runs offline suites against registered models and prompts (Article 11). Product teams supply their eval sets and their thresholds. The platform provides the runners, the dashboards, and the promotion gate.
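The promotion gate described above reduces to a simple contract: the product team supplies metrics and thresholds, the platform enforces them. A hedged sketch, with illustrative metric names:

```python
# Illustrative promotion gate: a candidate promotes only if every metric
# registered by the product team meets that team's threshold.
def promotion_gate(results: dict, thresholds: dict) -> bool:
    return all(results.get(metric, 0.0) >= floor
               for metric, floor in thresholds.items())

# Product team supplies the thresholds; the platform runs the suite.
thresholds = {"faithfulness": 0.90, "answer_relevance": 0.85}
candidate = {"faithfulness": 0.93, "answer_relevance": 0.88}
```

Note the `results.get(metric, 0.0)` default: a metric the suite failed to produce counts as a failure, which keeps the gate conservative by construction.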
Observability
One tracing standard (OpenTelemetry-based with AI-specific attributes) and one AI-observability tool (Langfuse, Arize, Weights & Biases, MLflow, Humanloop, or a build-your-own stack). Product teams inherit the instrumentation; breaking away requires a justification.
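An inherited tracing standard usually amounts to a mandated attribute set on every model-call span. The sketch below follows the spirit of the OpenTelemetry GenAI semantic conventions for the `gen_ai.*` keys, but the `platform.*` keys and the function itself are assumptions introduced here.

```python
# Illustrative mandatory attribute set for a model-call span under the
# platform tracing standard.
def model_call_span_attributes(model: str, prompt_id: str,
                               input_tokens: int, output_tokens: int,
                               cost_usd: float) -> dict:
    return {
        "gen_ai.request.model": model,              # which backend served the call
        "platform.prompt_id": prompt_id,            # hypothetical registry key (Article 21)
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "platform.cost_usd": cost_usd,              # feeds cost telemetry (Article 33)
    }
```

Because every team emits the same keys, cost dashboards and eval traces can join on `platform.prompt_id` without per-team glue code.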
Guardrails
PII redaction on ingress and egress, policy-based refusal for restricted topics, prompt-injection filters, and output validators. The guardrail is one service called by all product traffic.
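A minimal sketch of the ingress/egress PII redactor, one of the guardrail checks named above. The two patterns are illustrative and nowhere near complete; a production service would carry a much larger, maintained pattern set plus model-based detection.

```python
# Illustrative PII redaction pass applied to all product traffic.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN shape only
}

def redact(text: str) -> str:
    """Replace each matched PII span with a labelled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

Running the same `redact` on both ingress and egress is what makes the guardrail a single shared service rather than a per-product concern.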
Registry
The model, prompt, and index registry of Article 21 is a platform service. Registry mutations flow through the platform change-management process (Article 19).
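The change-management coupling can be made concrete: every registry mutation is append-only and must reference an approved change. The class and the ticket convention below are hypothetical illustrations of that discipline, not a real registry API.

```python
# Illustrative append-only registry whose mutations require a change ticket.
from dataclasses import dataclass

@dataclass(frozen=True)
class RegistryEntry:
    name: str
    version: int
    change_ticket: str   # every mutation references an approved change

class Registry:
    def __init__(self):
        self._entries = {}   # name -> list of versions, oldest first

    def register(self, name: str, change_ticket: str) -> RegistryEntry:
        if not change_ticket:
            raise ValueError("registry mutations require a change ticket")
        versions = self._entries.setdefault(name, [])
        entry = RegistryEntry(name, len(versions) + 1, change_ticket)
        versions.append(entry)   # append-only: old versions are never rewritten
        return entry

    def latest(self, name: str) -> RegistryEntry:
        return self._entries[name][-1]
```

The append-only history is what later makes the registry usable as compliance evidence: the record-keeping obligation is satisfied by construction.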
Governance integration
The platform is where EU AI Act Articles 11 (technical documentation), 12 (record-keeping), and 15 (robustness and cybersecurity) obligations concentrate.[4] A mature platform turns these obligations from per-product engineering efforts into shared-service outputs. The platform’s evidence pack — its architecture documentation, its SLOs, its incident log, its change log — covers many obligations once rather than per product.
ISO/IEC 42001 clauses 8.1 (operational planning and control) and 8.2 (system lifecycle) map cleanly to the platform’s scope boundary and release discipline. NIST AI RMF MAP 5.1 (evaluating the AI system in the context of the deployment environment) is materially easier when the deployment environment is a well-documented platform.
Org model and funding
The platform team is typically funded centrally rather than by product charge-back in early stages. The reason is political rather than economic: charging back too early creates an incentive for product teams to avoid the platform, which breaks the network effect that platforms depend on. Once the platform is clearly earning its keep (around the mature stage), chargeback becomes viable and appropriate. FinOps (Article 33) covers the cost model in depth.
RACI clarity matters. The platform team is responsible for platform services; product teams are responsible for product logic and domain corpora. The platform-lead architect is accountable for the platform architecture; the product-lead architects are accountable for their product architectures and consulted on platform evolution. Governance, risk, and compliance functions are consulted on both.
Anti-patterns
- The premature platform. Building a platform before there are two product teams with overlapping needs is speculative. The first product should ship and then the platform team extracts the commonalities.
- The everything-platform. Trying to solve every decision on the platform side strips product teams of legitimate differentiation. The split is a design problem, not a maximalist exercise.
- The vendor-native platform. A platform that is a thin wrapper over one vendor’s managed service is not a platform, it is lock-in. The platform should abstract the vendor away behind a stable internal API.
- The platform with no product consumers. Visible signs: the team builds things, blogs about them, nobody uses them. Fix: co-design new capabilities with a named product team that will consume them at launch.
- The platform with too many opinions. A platform that refuses to let any team use Weaviate when the whole stack is Qdrant is reasonable. A platform that refuses to let any team use LlamaIndex when the whole stack is LangChain is defensible. A platform that refuses to let any team write their own orchestration when the need is legitimate is counterproductive.
Platform runway roadmap — a worked template
A three-quarter runway roadmap for a shared-stage platform:
- Q1: Centralise model access gateway. Ship eval harness v1. Stand up tracing standard. First product team (pilot) migrates.
- Q2: Retrieval service v1 with pgvector or Qdrant. Prompt registry v1. Guardrails service v1 (PII + refusal). Second product team onboards.
- Q3: Index registry with promotion pipeline. Cost telemetry and dashboards. Third product team onboards. Platform SLAs published. Platform on-call rotation established.
The roadmap is not prescriptive; it is an artefact to orient the team and to show product leaders what they can count on by when. Public evidence from Uber, Netflix, and Spotify suggests the pace: each took roughly 18 to 30 months from first platform service to mature-stage platform.[1][2][3]
Summary
The AI platform exists so that the third use case costs less than the first. The architect’s platform job is to classify shared-versus-product, plan the maturity curve, run the scope discipline, and produce an evidence pack that supports compliance at the platform level rather than per product. Public precedents — Michelangelo, Metaflow, Spotify AI platform — illustrate the scope choices that have aged well and the patterns to avoid.
Key terms
- AI platform
- Architecture runway
- Platform-product split
- Maturity curve (platform)
- Platform SLA
Learning outcomes
After this article the learner can: explain the AI platform / product-team split; classify eight candidate components by platform or product; evaluate a platform design for cohesion and scope integrity; design a platform runway roadmap for a given organisation.
Further reading
Footnotes
1. Uber Engineering, “Meet Michelangelo: Uber’s Machine Learning Platform” (2017 blog post and subsequent retrospectives).
2. Netflix Technology Blog, “Open-Sourcing Metaflow” (2019) and Metaflow documentation.
3. Spotify Engineering, “Scaling the Machine Learning Platform” and GenAI platform posts (2022–2023).
4. Regulation (EU) 2024/1689 (AI Act), Articles 11, 12, 15.