Classical cloud operating models — centralized, federated, hub-and-spoke — translate to agentic systems, but they need adjustment. Agentic systems are unusual in that domain expertise (what makes a good clinical-decision-support agent) is deeply product-specific, while safety discipline (how to sandbox tool calls) is deeply platform-specific. The operating model has to let both flourish.
Three operating-model archetypes
Archetype 1 — Centralized
A single central team (the COE) builds and operates all agents. Product teams submit requirements; the COE delivers.
When it fits: small organisations; early-maturity programmes where patterns are not yet known; strongly regulated sectors that require uniform control.
Strengths: consistency, concentrated expertise, uniform safety discipline.
Weaknesses: bottleneck; product teams feel distanced from the solution; doesn’t scale past ~5–10 active agents.
Archetype 2 — COE + federated (hub-and-spoke)
A COE (the hub) provides platform services, patterns, templates, and review; federated product teams (the spokes) build agents on the platform.
When it fits: most organisations with 3+ product teams; the workhorse pattern.
Strengths: scales; preserves safety discipline centrally while letting domain teams move; the COE earns its keep by making product teams faster.
Weaknesses: requires investment in platform and templates; risks “hub envy” (spokes resent hub review); requires explicit decision-rights.
Archetype 3 — Fully federated
Product teams own the full stack — their own platform, their own policies, their own safety discipline. A small central function provides guidelines.
When it fits: very large organisations; business units with distinct risk profiles; holding companies.
Strengths: maximum team autonomy; no central bottleneck.
Weaknesses: inconsistent safety discipline; duplicated platforms; the hardest operating model to audit; most vulnerable to incidents the central team cannot reach.
The COE + federated pattern in detail
Because most organisations land on COE + federated, the architect benefits from a detailed view.
COE responsibilities
- Platform. Runtime, registries, policy engine, sandbox service, observability stack, evaluation harness, kill-switch controller (Article 20 platform capabilities).
- Patterns and templates. Reference architectures for common agent types; ADR templates; evaluation-plan templates; runbook templates.
- Review. Participation in Calibrate, Organize, Model, Produce, Evaluate, Learn gate reviews — Articles 36–38.
- Incident response. Central SRE function or shared on-call for platform incidents; coordination with product-team SRE for product-specific incidents (Article 25).
- Learning curation. Post-mortems and learnings broadcast across product teams; anti-pattern library curated.
Product-team responsibilities
- Use case ownership. The product the agent serves.
- Agent-specific development. Task prompts, use-case-specific tools, domain-specific evaluation, product UI.
- Product operations. Product-level on-call for their agent’s specific incidents.
- Compliance partnership. Product-specific compliance work (DPIA sections, Annex III evidence specific to the use case).
Shared responsibilities (RACI: Responsible, Accountable, Consulted, Informed)
The RACI is where operating models succeed or fail: a decision with no single Accountable party stalls, and one with too many mandatory consultations crawls.
Decision rights — five representative responsibilities
The architect maps each significant decision to its RACI. Five representative decisions:
Decision A — Pick the agentic framework.
- A (Accountable): COE architect.
- R (Responsible): COE architect + representative product architect.
- C (Consulted): Security, platform, product teams.
- I (Informed): All other product teams.
- Rationale: framework affects all future agents; platform owns it.
Decision B — Add a new tool to the registry.
- A: Tool owner (system owner).
- R: Product team adding the tool + COE tool registrar.
- C: Security, data governance, compliance (if regulated data).
- I: Other product teams that might use it.
Decision C — Expand agent autonomy from L2 to L3.
- A: Product lead.
- R: Product architect + COE architect (joint).
- C: Security, compliance, legal, SRE, HITL reviewer of the current design.
- I: Business sponsor, users (via disclosure update).
Decision D — Retire an agent version.
- A: Product lead.
- R: Product team + COE promotion engineer.
- C: Users (deprecation notice), compliance (evidence archive).
- I: Platform (to update registries).
Decision E — Ship an urgent fix for a production incident.
- A: Incident commander (product SRE or COE SRE depending on scope).
- R: Engineer executing the fix.
- C: COE architect (if fix changes architecture), security (if fix has security implications).
- I: Everyone affected.
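The five decisions above can be encoded as data so the RACI stays machine-checkable rather than living only in a slide deck. A minimal sketch (the dictionary structure and role strings are illustrative, not prescribed by this article; the invariant checked is the standard RACI rule of exactly one Accountable and at least one Responsible per decision):

```python
# RACI encoding for the five representative decisions (A/R/C/I per decision).
# Role names mirror the text above; the structure itself is an assumption.
RACI = {
    "pick_agentic_framework": {
        "A": ["COE architect"],
        "R": ["COE architect", "representative product architect"],
        "C": ["security", "platform", "product teams"],
        "I": ["all other product teams"],
    },
    "add_tool_to_registry": {
        "A": ["tool owner"],
        "R": ["product team adding the tool", "COE tool registrar"],
        "C": ["security", "data governance", "compliance"],
        "I": ["other product teams"],
    },
    "expand_autonomy_L2_to_L3": {
        "A": ["product lead"],
        "R": ["product architect", "COE architect"],
        "C": ["security", "compliance", "legal", "SRE", "HITL reviewer"],
        "I": ["business sponsor", "users"],
    },
    "retire_agent_version": {
        "A": ["product lead"],
        "R": ["product team", "COE promotion engineer"],
        "C": ["users", "compliance"],
        "I": ["platform"],
    },
    "ship_urgent_incident_fix": {
        "A": ["incident commander"],
        "R": ["engineer executing the fix"],
        "C": ["COE architect", "security"],
        "I": ["everyone affected"],
    },
}

def validate(raci: dict) -> list[str]:
    """Return violations of the one-A / at-least-one-R invariant."""
    problems = []
    for decision, roles in raci.items():
        if len(roles.get("A", [])) != 1:
            problems.append(f"{decision}: needs exactly one Accountable")
        if not roles.get("R"):
            problems.append(f"{decision}: needs at least one Responsible")
    return problems
```

Running `validate` in a CI job whenever the RACI file changes keeps "who decides" from drifting silently as new decisions are added.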
The architect’s role in the operating model
In a COE + federated model there are typically two architect roles:
COE architect (often the most senior AITE-ATS holder). Owns the reference architecture, the pattern library, gate-review participation, and the platform-evolution roadmap.
Product-embedded architect (often an AITE-ATS holder or aspirant). Owns the specific agent’s architecture, its ADRs, its runbooks, its evaluation plan; partners with the COE architect at gate reviews.
The two architects are allies, not rivals: the COE architect protects the platform; the embedded architect protects the product. Healthy operating models make that division of labour explicit and celebrate both.
Common failure modes of operating models
Failure 1 — Phantom COE. A COE exists on the org chart but has no decision rights, no platform budget, and no review authority. Product teams bypass it. Fix: define platform services with budget; give COE gate-review veto on designated decisions.
Failure 2 — Bottleneck COE. Everything routes through the COE; product teams wait; the COE becomes the blocker. Fix: templates and self-service for common patterns; COE focuses review on high-stakes decisions; platform services reduce coupling.
Failure 3 — Federal anarchy. Product teams go their own way; multiple incompatible platforms emerge; safety discipline is uneven. Fix: define a minimum-viable standard all teams must meet; run platform-share-of-voice analysis to detect fragmentation.
Failure 4 — Compliance-owned by default. Compliance becomes the de-facto safety architect because no technical authority is named. Fix: AITE-ATS holder names themselves the technical authority; compliance partners rather than directs.
Failure 5 — SRE surprise. Product teams build agents without SRE partnership; at production launch, SRE inherits the unknown. Fix: SRE in the Organize gate; pre-production SRE-readiness review mandatory.
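The "minimum-viable standard" fix for federal anarchy (Failure 3) can be expressed as a capability checklist every team's agent must pass before launch. A sketch, with an illustrative set of required capabilities drawn from the COE platform services named earlier (the capability names and the check itself are assumptions, not a prescribed standard):

```python
# Minimum-viable standard: capabilities every agent must demonstrate
# before launch. The capability names are illustrative.
REQUIRED = {"sandboxed_tools", "kill_switch", "observability", "eval_harness"}

def conformance_gaps(declared: set[str]) -> set[str]:
    """Return the capabilities an agent is missing relative to the standard."""
    return REQUIRED - declared

# Hypothetical declarations from two federated teams:
team_a = {"sandboxed_tools", "kill_switch", "observability", "eval_harness"}
team_b = {"sandboxed_tools", "observability"}
# team_a passes; team_b is missing kill_switch and eval_harness.
```

A check this small is the point: the standard must be cheap enough that federated teams adopt it without routing everything through the COE.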
How the architect introduces or improves an operating model
If the organisation has none, the architect’s first deliverables are:
- Operating-model one-pager. Names the archetype being adopted, the rationale, the primary teams, the initial platform services.
- RACI for five representative decisions. Concrete decisions, not abstractions. The team can reason from these to new cases.
- Gate-review contracts. What each gate requires, who attends, what decisions are made (Articles 36–38).
- Initial pattern library seed. Three or four reference patterns (chat agent, research agent, back-office agent) for product teams to start from.
- Platform-services roadmap. What the COE will deliver in the next quarter to earn its role as hub.
The architect’s follow-up work over time is to raise the ratio of agents the platform serves to total agents, and to reduce gate-review friction without eroding safety discipline.
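That follow-up metric, the ratio of platform-served agents to all active agents, can be computed from a simple agent inventory. A sketch, assuming each inventory record carries hypothetical `status` and `on_platform` fields (not defined by this article):

```python
# Platform coverage: share of active agents running on the COE platform.
# The 'status' and 'on_platform' inventory fields are assumptions.
def platform_coverage(inventory: list[dict]) -> float:
    """Ratio of platform-served agents to all active agents."""
    active = [a for a in inventory if a.get("status") == "active"]
    if not active:
        return 0.0
    served = sum(1 for a in active if a.get("on_platform"))
    return served / len(active)

agents = [
    {"name": "claims-triage", "status": "active", "on_platform": True},
    {"name": "research-bot", "status": "active", "on_platform": False},
    {"name": "legacy-faq", "status": "retired", "on_platform": False},
]
# One of the two active agents is platform-served, so coverage is 0.5.
```

Tracked quarter over quarter, a rising coverage ratio is evidence the COE is earning its hub role; a flat one is an early sign of federal anarchy.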
Real-world references
ING Bank public engineering blog on AI operating model. ING has written publicly about the transition from fully decentralized AI/ML projects toward a COE + federated model with shared platform services. The lessons highlight the importance of preserving product-team autonomy.
ThoughtWorks Technology Radar entries on AI operating models. ThoughtWorks’ radar periodically covers AI operating-model patterns; their “opinions on AI platforms” shift over time and are a useful calibrating reference.
McKinsey State of AI reports (periodic). McKinsey publishes operating-model findings from enterprise surveys. Use with care — survey data is self-reported — but the directional trends are informative.
AWS Well-Architected Framework Machine Learning Lens and the Generative AI Lens. AWS publishes architectural-pattern documents informed by customer operating models; the Generative AI Lens (2024) addresses operating-model questions directly.
Anti-patterns to reject
- “The architect reports to compliance.” The architect is a technical authority; compliance is a peer, not a superior.
- “We don’t need a COE; each team decides.” Without a COE, the platform never emerges and every team rebuilds the same things.
- “The COE owns all agents.” Centralisation at scale becomes a bottleneck; federation is inevitable past a point.
- “SRE joins after production.” SRE is a design partner from Organize onward.
- “The operating model is the org chart.” Org charts encode reporting lines; operating models encode decision rights. They differ.
Learning outcomes
- Explain three operating-model archetypes (centralized, COE + federated, fully federated) and their fit conditions.
- Classify five representative responsibilities by owner in a COE + federated operating model.
- Evaluate an operating model for bottlenecks, phantom-COE signs, and SRE engagement timing.
- Design a RACI and operating-model one-pager for a given organisation including decision rights, gate contracts, and initial platform-services roadmap.
Further reading
- Core Stream anchors: EATF-Level-1/M1.2-Art15-The-COMPEL-Operating-Model-Roles-and-Decision-Rights.md; EATF-Level-1/M1.6-Art04-The-AI-Center-of-Excellence.md
- AITE-ATS siblings: Article 20 (platform), Article 25 (incident response), Articles 36–38 (stage-gate reviews).
- Primary sources: ING Bank engineering blog AI posts; ThoughtWorks Technology Radar; McKinsey State of AI reports (most recent); AWS Well-Architected Generative AI Lens.