Classical cloud operating models — centralized, federated, hub-and-spoke — translate to agentic systems, but they need adjustment. Agentic systems are unusual in that domain expertise (what makes a good clinical-decision-support agent) is deeply product-specific, while safety discipline (how to sandbox tool calls) is deeply platform-specific. The operating model has to let both flourish.
Three operating-model archetypes
Archetype 1 — Centralized
A single central team (the COE) builds and operates all agents. Product teams submit requirements; the COE delivers.
When it fits: small organisations; early-maturity programmes where patterns are not yet known; strongly regulated sectors that require uniform control.
Strengths: consistency, concentrated expertise, uniform safety discipline.
Weaknesses: bottleneck; product teams feel distanced from the solution; doesn’t scale past ~5–10 active agents.
Archetype 2 — COE + federated (hub-and-spoke)
A COE (the hub) provides platform services, patterns, templates, and review; federated product teams (the spokes) build agents on the platform.
When it fits: most organisations with 3+ product teams; the workhorse pattern.
Strengths: scales; preserves safety discipline centrally while letting domain teams move; the COE earns its keep by making product teams faster.
Weaknesses: requires investment in platform and templates; risks “hub envy” (spokes resent hub review); requires explicit decision-rights.
Archetype 3 — Fully federated
Product teams own the full stack — their own platform, their own policies, their own safety discipline. A small central function provides guidelines.
When it fits: very large organisations; business units with distinct risk profiles; holding companies.
Strengths: maximum team autonomy; no central bottleneck.
Weaknesses: inconsistent safety discipline; duplicated platforms; the hardest operating model to audit; most vulnerable to incidents the central team cannot reach.
The COE + federated pattern in detail
Because most organisations land on COE + federated, the architect benefits from a detailed view.
COE responsibilities
- Platform. Runtime, registries, policy engine, sandbox service, observability stack, evaluation harness, kill-switch controller (Article 20 platform capabilities).
- Patterns and templates. Reference architectures for common agent types; ADR templates; evaluation-plan templates; runbook templates.
- Review. Participation in Calibrate, Organize, Model, Produce, Evaluate, Learn gate reviews — Articles 36–38.
- Incident response. Central SRE function or shared on-call for platform incidents; coordination with product-team SRE for product-specific incidents (Article 25).
- Learning curation. Post-mortems and learnings broadcast across product teams; anti-pattern library curated.
Product-team responsibilities
- Use case ownership. The product the agent serves.
- Agent-specific development. Task prompts, use-case-specific tools, domain-specific evaluation, product UI.
- Product operations. Product-level on-call for their agent’s specific incidents.
- Compliance partnership. Product-specific compliance work (DPIA sections, Annex III evidence specific to the use case).
Shared responsibilities (RACI: Responsible, Accountable, Consulted, Informed)
The RACI is where operating models succeed or fail: a decision with no single Accountable party stalls, and one with too many mandatory consultations crawls.
Decision rights — five representative responsibilities
The architect maps each significant decision to its RACI. Five representative decisions:
Decision A — Pick the agentic framework.
- A (Accountable): COE architect.
- R (Responsible): COE architect + representative product architect.
- C (Consulted): Security, platform, product teams.
- I (Informed): All other product teams.
- Rationale: framework affects all future agents; platform owns it.
Decision B — Add a new tool to the registry.
- A: Tool owner (system owner).
- R: Product team adding the tool + COE tool registrar.
- C: Security, data governance, compliance (if regulated data).
- I: Other product teams that might use it.
Decision C — Expand agent autonomy from L2 to L3.
- A: Product lead.
- R: Product architect + COE architect (joint).
- C: Security, compliance, legal, SRE, HITL reviewer of the current design.
- I: Business sponsor, users (via disclosure update).
Decision D — Retire an agent version.
- A: Product lead.
- R: Product team + COE promotion engineer.
- C: Users (deprecation notice), compliance (evidence archive).
- I: Platform (to update registries).
Decision E — Ship an urgent fix for a production incident.
- A: Incident commander (product SRE or COE SRE depending on scope).
- R: Engineer executing the fix.
- C: COE architect (if fix changes architecture), security (if fix has security implications).
- I: Everyone affected.
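The five decisions above can be encoded as data so the RACI stays machine-checkable rather than living only in a slide deck. A minimal sketch (the dictionary structure and role strings are illustrative, not prescribed by this article; the invariant checked is the standard RACI rule of exactly one Accountable and at least one Responsible per decision):

```python
# RACI encoding for the five representative decisions (A/R/C/I per decision).
# Role names mirror the text above; the structure itself is an assumption.
RACI = {
    "pick_agentic_framework": {
        "A": ["COE architect"],
        "R": ["COE architect", "representative product architect"],
        "C": ["security", "platform", "product teams"],
        "I": ["all other product teams"],
    },
    "add_tool_to_registry": {
        "A": ["tool owner"],
        "R": ["product team adding the tool", "COE tool registrar"],
        "C": ["security", "data governance", "compliance"],
        "I": ["other product teams"],
    },
    "expand_autonomy_L2_to_L3": {
        "A": ["product lead"],
        "R": ["product architect", "COE architect"],
        "C": ["security", "compliance", "legal", "SRE", "HITL reviewer"],
        "I": ["business sponsor", "users"],
    },
    "retire_agent_version": {
        "A": ["product lead"],
        "R": ["product team", "COE promotion engineer"],
        "C": ["users", "compliance"],
        "I": ["platform"],
    },
    "ship_urgent_incident_fix": {
        "A": ["incident commander"],
        "R": ["engineer executing the fix"],
        "C": ["COE architect", "security"],
        "I": ["everyone affected"],
    },
}

def validate(raci: dict) -> list[str]:
    """Return violations of the one-A / at-least-one-R invariant."""
    problems = []
    for decision, roles in raci.items():
        if len(roles.get("A", [])) != 1:
            problems.append(f"{decision}: needs exactly one Accountable")
        if not roles.get("R"):
            problems.append(f"{decision}: needs at least one Responsible")
    return problems
```

Running `validate` in a CI job whenever the RACI file changes keeps "who decides" from drifting silently as new decisions are added.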
The architect’s role in the operating model
In a COE + federated model there are typically two architect roles:
COE architect (often the most senior AITE-ATS holder). Owns the reference architecture, the pattern library, gate-review participation, and the platform-evolution roadmap.
Product-embedded architect (often an AITE-ATS holder or aspirant). Owns the specific agent’s architecture, its ADRs, its runbooks, its evaluation plan; partners with the COE architect at gate reviews.
The two architects are allies, not rivals: the COE architect protects the platform; the embedded architect protects the product. Healthy operating models make that division of labour explicit and celebrate both.
Common failure modes of operating models
Failure 1 — Phantom COE. A COE exists on the org chart but has no decision rights, no platform budget, and no review authority. Product teams bypass it. Fix: define platform services with budget; give COE gate-review veto on designated decisions.
Failure 2 — Bottleneck COE. Everything routes through the COE; product teams wait; the COE becomes the blocker. Fix: templates and self-service for common patterns; COE focuses review on high-stakes decisions; platform services reduce coupling.
Failure 3 — Federal anarchy. Product teams go their own way; multiple incompatible platforms emerge; safety discipline is uneven. Fix: define a minimum-viable standard all teams must meet; run platform-share-of-voice analysis to detect fragmentation.
Failure 4 — Compliance-owned by default. Compliance becomes the de-facto safety architect because no technical authority is named. Fix: AITE-ATS holder names themselves the technical authority; compliance partners rather than directs.
Failure 5 — SRE surprise. Product teams build agents without SRE partnership; at production launch, SRE inherits the unknown. Fix: SRE in the Organize gate; pre-production SRE-readiness review mandatory.
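The "minimum-viable standard" fix for federal anarchy (Failure 3) can be expressed as a capability checklist every team's agent must pass before launch. A sketch, with an illustrative set of required capabilities drawn from the COE platform services named earlier (the capability names and the check itself are assumptions, not a prescribed standard):

```python
# Minimum-viable standard: capabilities every agent must demonstrate
# before launch. The capability names are illustrative.
REQUIRED = {"sandboxed_tools", "kill_switch", "observability", "eval_harness"}

def conformance_gaps(declared: set[str]) -> set[str]:
    """Return the capabilities an agent is missing relative to the standard."""
    return REQUIRED - declared

# Hypothetical declarations from two federated teams:
team_a = {"sandboxed_tools", "kill_switch", "observability", "eval_harness"}
team_b = {"sandboxed_tools", "observability"}
# team_a passes; team_b is missing kill_switch and eval_harness.
```

A check this small is the point: the standard must be cheap enough that federated teams adopt it without routing everything through the COE.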
How the architect introduces or improves an operating model
If the organisation has none, the architect’s first deliverables are:
- Operating-model one-pager. Names the archetype being adopted, the rationale, the primary teams, the initial platform services.
- RACI for five representative decisions. Concrete decisions, not abstractions. The team can reason from these to new cases.
- Gate-review contracts. What each gate requires, who attends, what decisions are made (Articles 36–38).
- Initial pattern library seed. Three or four reference patterns (chat agent, research agent, back-office agent) for product teams to start from.
- Platform-services roadmap. What the COE will deliver in the next quarter to earn its role as hub.
The architect’s follow-up work over time is to raise the ratio of agents the platform serves to total agents, and to reduce gate-review friction without eroding safety discipline.
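That follow-up metric, the ratio of platform-served agents to all active agents, can be computed from a simple agent inventory. A sketch, assuming each inventory record carries hypothetical `status` and `on_platform` fields (not defined by this article):

```python
# Platform coverage: share of active agents running on the COE platform.
# The 'status' and 'on_platform' inventory fields are assumptions.
def platform_coverage(inventory: list[dict]) -> float:
    """Ratio of platform-served agents to all active agents."""
    active = [a for a in inventory if a.get("status") == "active"]
    if not active:
        return 0.0
    served = sum(1 for a in active if a.get("on_platform"))
    return served / len(active)

agents = [
    {"name": "claims-triage", "status": "active", "on_platform": True},
    {"name": "research-bot", "status": "active", "on_platform": False},
    {"name": "legacy-faq", "status": "retired", "on_platform": False},
]
# One of the two active agents is platform-served, so coverage is 0.5.
```

Tracked quarter over quarter, a rising coverage ratio is evidence the COE is earning its hub role; a flat one is an early sign of federal anarchy.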
Real-world references
ING Bank public engineering blog on AI operating model. ING has written publicly about the transition from fully decentralized AI/ML projects toward a COE + federated model with shared platform services. The lessons highlight the importance of preserving product-team autonomy.
ThoughtWorks Technology Radar entries on AI operating models. ThoughtWorks’ radar periodically covers AI operating-model patterns; their “opinions on AI platforms” shift over time and are a useful calibrating reference.
McKinsey State of AI reports (periodic). McKinsey publishes operating-model findings from enterprise surveys. Use with care — survey data is self-reported — but the directional trends are informative.
AWS Well-Architected Framework Machine Learning Lens and the Generative AI Lens. AWS publishes architectural-pattern documents informed by customer operating models; the Generative AI Lens (2024) addresses operating-model questions directly.
Anti-patterns to reject
- “The architect reports to compliance.” The architect is a technical authority; compliance is a peer, not a superior.
- “We don’t need a COE; each team decides.” Without a COE, the platform never emerges and every team rebuilds the same things.
- “The COE owns all agents.” Centralisation at scale becomes a bottleneck; federation is inevitable past a point.
- “SRE joins after production.” SRE is a design partner from Organize onward.
- “The operating model is the org chart.” Org charts encode reporting lines; operating models encode decision rights. They differ.
Learning outcomes
- Explain three operating-model archetypes (centralized, COE + federated, fully federated) and their fit conditions.
- Classify five representative responsibilities by owner in a COE + federated operating model.
- Evaluate an operating model for bottlenecks, phantom-COE signs, and SRE engagement timing.
- Design a RACI and operating-model one-pager for a given organisation including decision rights, gate contracts, and initial platform-services roadmap.
Further reading
- Core Stream anchors: EATF-Level-1/M1.2-Art15-The-COMPEL-Operating-Model-Roles-and-Decision-Rights.md; EATF-Level-1/M1.6-Art04-The-AI-Center-of-Excellence.md
- AITE-ATS siblings: Article 20 (platform), Article 25 (incident response), Articles 36–38 (stage-gate reviews).
- Primary sources: ING Bank engineering blog AI posts; ThoughtWorks Technology Radar; McKinsey State of AI reports (most recent); AWS Well-Architected Generative AI Lens.