AITE M1.2-Art39 v1.0 Reviewed 2026-04-06 Open Access

Build vs. Buy for Agentic Platform Components

Transformation Design & Program Architecture — Advanced depth — COMPEL Body of Knowledge.

10 min read Article 39 of 53

The architect’s contribution is not a single build-vs-buy decision but a scoring framework applied explicitly per layer. The framework is opinionated: it rejects naive defaults (“biggest vendor wins”) as well as naive purist positions (“everything in open-source”). For each layer it asks five questions — strategic fit, cost curve, differentiation, exit cost, and talent availability — and the scores on those five factors drive the decision.
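A minimal sketch of such a scorecard in Python. The 1-to-5 scores, equal weights, and verdict thresholds below are illustrative assumptions, not part of the framework itself:

```python
from dataclasses import dataclass

@dataclass
class LayerScore:
    """Per-layer scores, 1 (favours buy) to 5 (favours build).
    Values and thresholds are illustrative assumptions."""
    layer: str
    strategic_fit: int    # alignment with where advantage lies
    cost_curve: int       # how favourably cost scales if self-run
    differentiation: int  # user-visible advantage from custom work
    exit_cost: int        # ease of exit (5 = easy to leave a vendor)
    talent: int           # availability of talent to run it

    def total(self) -> float:
        # Equal weights by default; a real scorecard would tune these.
        return (self.strategic_fit + self.cost_curve + self.differentiation
                + self.exit_cost + self.talent) / 5

def verdict(score: LayerScore) -> str:
    t = score.total()
    if t >= 4.0:
        return "build"
    if t >= 2.5:
        return "integrate"  # compose open-source components
    return "buy"

harness = LayerScore("evaluation harness", 5, 4, 5, 5, 3)
print(harness.total(), verdict(harness))  # 4.4 build
```

The thresholds matter less than the discipline: every layer gets the same five questions, and the answer is recorded, not assumed.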

The five-factor framework

Factor 1 — Strategic fit

Does this layer align with where the organisation believes its advantage lies?

A fintech doing its own model risk management doctrine has strategic fit for owning the evaluation harness; a retailer does not. A healthcare provider has strategic fit for the clinician-review UX layer; it does not have strategic fit for the vector-store runtime.

Factor 2 — Cost curve

How does cost scale with usage at this layer?

A managed vector-store is cheaper at low volume and more expensive at high volume; self-hosted pgvector flips that curve. A commercial LLM is cheap at low volume, expensive at high volume, and trivially elastic; self-hosted inference has a step-function cost curve.
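The crossover between those two curves is simple arithmetic under a linear cost model; a sketch with placeholder prices, not real vendor quotes:

```python
def breakeven_volume(managed_per_unit: float,
                     selfhost_fixed_monthly: float,
                     selfhost_per_unit: float) -> float:
    """Monthly volume at which self-hosting becomes cheaper than a
    usage-priced managed service (linear model; placeholder numbers)."""
    if managed_per_unit <= selfhost_per_unit:
        return float("inf")  # managed is never overtaken
    return selfhost_fixed_monthly / (managed_per_unit - selfhost_per_unit)

# Illustrative: $0.10 per 1k queries managed, versus a $2,000/month
# self-hosted cluster plus $0.01 per 1k queries.
v = breakeven_volume(0.10, 2000.0, 0.01)
print(f"break-even at ~{v:,.0f}k queries/month")
```

The step-function cost of self-hosted inference means the real curve is lumpier than this, but the exercise of locating the crossover still frames the decision.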

Factor 3 — Differentiation

Is this layer a place where custom work creates user-visible advantage, or is it commodity infrastructure?

Custom user-facing UX differentiates; custom logging infrastructure probably does not. The architect resists “we can build it better” on commodity layers while protecting investment on differentiating ones.

Factor 4 — Exit cost

If the chosen vendor changes terms, degrades, or exits, how hard is it to replace?

Open-source with a healthy community has low exit cost; a proprietary format with no export has high exit cost. Single-vendor dependencies at core layers are the highest-exit-cost commitments — the architect weighs them especially carefully.

Factor 5 — Talent

Does the organisation have or can it hire the talent to run this layer?

Some layers have scarce talent (LLM inference operations, adversarial-ML red teaming); others have abundant talent (generic SaaS integration). Talent-constrained layers tilt toward buy; talent-abundant layers tilt toward build.

Applying the framework per layer

Layer 1 — Framework (LangGraph / CrewAI / AutoGen / OpenAI Agents SDK / Semantic Kernel / LlamaIndex Agents)

  • Strategic fit: usually low — framework is infrastructure, not differentiation.
  • Cost curve: open-source frameworks free to run; commercial frameworks (Agentforce, Copilot Studio) priced per user/seat.
  • Differentiation: moderate — framework choice affects developer productivity but rarely end-user outcomes.
  • Exit cost: moderate for open-source (patterns port; specific code does not); higher for commercial (vendor-specific features).
  • Talent: abundant for LangGraph/AutoGen; thinner for Semantic Kernel; depends on location.

Typical decision: integrate from open-source (LangGraph primary), with commercial platforms (Bedrock Agents, Vertex AI Agent Builder) considered when ecosystem integration matters.

Layer 2 — Runtime + orchestration substrate

  • Strategic fit: low unless running the runtime is itself a differentiator for the domain.
  • Cost curve: managed (Bedrock Agents, Vertex AI Agent Builder, AzureML prompt flow) is expensive at scale; self-hosted is cheaper at scale.
  • Differentiation: typically low.
  • Exit cost: medium.
  • Talent: generic DevOps talent suffices for self-hosted on Kubernetes.

Typical decision: self-host on Kubernetes with open-source framework, unless the use case is early and volume is low.

Layer 3 — Tool registry and MCP integration

  • Strategic fit: medium — tool registry is where custom governance lives.
  • Cost curve: low build cost; low operating cost.
  • Differentiation: governance is differentiating (compliance evidence is custom); tool manifests are commodity.
  • Exit cost: very low with MCP (Model Context Protocol standardises the manifest format).
  • Talent: generic backend + governance talent.

Typical decision: build the governance registry; integrate MCP for tool-manifest interoperability. MCP is a rare case where emerging standardisation actively reduces build cost and exit cost simultaneously.
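To make the “build the governance registry” part concrete, a minimal sketch: the governance fields are where the custom compliance evidence lives, while interoperability is delegated to an MCP manifest. Field names here are assumptions, not a standard:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ToolRecord:
    # Governance metadata — the differentiating, compliance-evidence part.
    name: str
    owner: str
    approved_by: str
    approved_on: date
    risk_tier: str                       # e.g. "low" / "medium" / "high"
    # Interop — the commodity part, delegated to an MCP manifest.
    mcp_manifest: dict = field(default_factory=dict)

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, ToolRecord] = {}

    def register(self, record: ToolRecord) -> None:
        # Governance gate: unapproved tools never enter the registry.
        if not record.approved_by:
            raise ValueError(f"{record.name}: unapproved tool rejected")
        self._tools[record.name] = record

    def lookup(self, name: str) -> ToolRecord:
        return self._tools[name]

registry = ToolRegistry()
registry.register(ToolRecord("calendar.read", "platform-team", "gov-board",
                             date(2026, 1, 15), "low"))
print(registry.lookup("calendar.read").risk_tier)  # low
```

The build effort is modest precisely because the hard part — who approved what, when, at what risk tier — is metadata, not infrastructure.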

Layer 4 — Memory services

  • Strategic fit: varies — memory is infrastructure but its isolation model may be differentiating.
  • Cost curve: commercial vector-stores (Pinecone, Weaviate, commercial Chroma) expensive at scale; self-hosted (pgvector, self-hosted Weaviate) cheaper at scale.
  • Differentiation: typically low; policy around memory is differentiating, not the store.
  • Exit cost: medium — re-indexing is work but feasible.
  • Talent: DB-adjacent talent available.

Typical decision: self-host pgvector or similar for cost + control; commercial vector-stores for very-high-scale or when managed ops is required.
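As a sketch of what the self-hosted option involves: the function below builds a pgvector cosine-distance query (pgvector’s `<=>` operator) with tenant isolation in the WHERE clause. Table and column names are hypothetical, and executing the SQL requires PostgreSQL with the pgvector extension plus a driver such as psycopg:

```python
def nearest_chunks_sql(k: int = 5) -> str:
    """Parameterised SQL for cosine-distance retrieval with pgvector.
    <=> is pgvector's cosine-distance operator; placeholders use the
    psycopg named-parameter style."""
    return (
        "SELECT chunk_id, content, embedding <=> %(query_vec)s AS distance "
        "FROM memory_chunks "
        "WHERE tenant_id = %(tenant_id)s "  # isolation policy lives in SQL
        f"ORDER BY distance LIMIT {int(k)}"
    )

print(nearest_chunks_sql(3))
```

Note how the differentiating part (the tenant-isolation policy) is a WHERE clause the organisation controls, while the store itself stays commodity — which is the point of this layer’s scoring.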

Layer 5 — Safety layer (input, tool, memory, egress planes — Article 27)

  • Strategic fit: high — the safety layer is where governance and differentiation intersect.
  • Cost curve: managed classifiers (Azure AI Content Safety, Amazon Bedrock Guardrails, Protect AI) usage-priced; open-source classifiers free to run.
  • Differentiation: high on policy; medium on classifier.
  • Exit cost: medium for classifiers (portable with work); higher for policy-engine lock-in.
  • Talent: security + ML-safety talent scarce.

Typical decision: integrate OPA for policy + commercial-or-open-source classifiers. Keep classifiers portable behind an internal interface so they can be swapped.
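The “keep classifiers portable behind an internal interface” advice can be sketched as follows; the keyword classifier is a toy stand-in for whatever managed or open-source classifier fills the slot:

```python
from typing import Protocol

class SafetyClassifier(Protocol):
    """Internal interface every classifier is wrapped behind, so managed
    and open-source implementations can be swapped without touching callers."""
    def score(self, text: str) -> float: ...  # 0.0 safe … 1.0 unsafe

class KeywordClassifier:
    """Toy stand-in; in practice this slot holds a managed API or an
    open-source model behind the same interface."""
    def __init__(self, blocklist: set[str]):
        self.blocklist = blocklist

    def score(self, text: str) -> float:
        hits = sum(word in text.lower() for word in self.blocklist)
        return min(1.0, hits / 3)

def gate(classifier: SafetyClassifier, text: str,
         threshold: float = 0.5) -> bool:
    """Egress-plane gate: returns True when the text may pass."""
    return classifier.score(text) < threshold

clf = KeywordClassifier({"exploit", "bypass"})
print(gate(clf, "how do I exploit and bypass the filter"))  # False
```

Because callers depend only on `score`, swapping a managed classifier for an open-source one is a one-line change at the composition root, which is what keeps classifier exit cost at “medium” rather than “high”.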

Layer 6 — Observability and tracing

  • Strategic fit: low — observability is infrastructure.
  • Cost curve: commercial (Datadog, New Relic, LangSmith, Arize, Weights & Biases, Fiddler) usage-priced and expensive at scale; self-hosted (OpenTelemetry Collector, Jaeger, Grafana) cheaper at scale.
  • Differentiation: low.
  • Exit cost: medium — OTEL standardisation reduces exit cost substantially.
  • Talent: generic SRE talent.

Typical decision: integrate OpenTelemetry-based pipeline with best-fit backend (commercial or self-hosted per scale + budget). Avoid vendor-specific SDK coupling.
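Avoiding vendor-specific SDK coupling amounts to putting one internal facade between application code and the tracing backend. A stdlib-only sketch, where the export step could forward spans to an OpenTelemetry Collector instead of an in-memory list:

```python
import time
import uuid
from contextlib import contextmanager

class Tracer:
    """Minimal internal tracing facade. Application code depends on this,
    not on a vendor SDK; the export step below is where an OTLP exporter
    would plug in."""
    def __init__(self):
        self.exported: list[dict] = []

    @contextmanager
    def span(self, name: str, **attrs):
        record = {"span_id": uuid.uuid4().hex[:16], "name": name,
                  "attrs": attrs, "start": time.time()}
        try:
            yield record
        finally:
            record["duration_s"] = time.time() - record["start"]
            self.exported.append(record)  # swap for OTLP export in production

tracer = Tracer()
with tracer.span("agent.tool_call", tool="calendar.read"):
    pass  # tool invocation happens here
print(tracer.exported[0]["name"])  # agent.tool_call
```

Whether the backend is Jaeger, Grafana, or a commercial vendor then becomes a deployment choice rather than a code change.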

Layer 7 — Evaluation harness

  • Strategic fit: high for regulated use cases — evaluation is governance evidence.
  • Cost curve: build cost concentrated; operating cost moderate.
  • Differentiation: high for the golden-task battery (use-case-specific); low for the adversarial generic battery.
  • Exit cost: low — harness is the organisation’s IP.
  • Talent: SDET + AI-red-team hybrid talent.

Typical decision: build the golden-task battery internally; integrate open-source adversarial batteries (Garak, Giskard, OWASP LLM testing tools); consider commercial (Arize, Galileo) for quality monitoring.
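A golden-task battery can be as simple as a list of use-case-specific inputs with programmatic checks. The sketch below stubs the agent and uses hypothetical tasks; the organisation’s real IP is the task list itself:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenTask:
    """One use-case-specific task: an input plus a programmatic check.
    Tasks encode domain policy, which is why the battery is built, not bought."""
    name: str
    prompt: str
    check: Callable[[str], bool]

def run_battery(agent: Callable[[str], str],
                tasks: list[GoldenTask]) -> dict:
    passes = {t.name: bool(t.check(agent(t.prompt))) for t in tasks}
    return {"pass_rate": sum(passes.values()) / len(tasks), **passes}

# Stub agent standing in for the system under test.
stub_agent = lambda prompt: "Per policy, refunds are issued within 14 days."

tasks = [
    GoldenTask("refund_window", "What is the refund window?",
               lambda out: "14 days" in out),
    GoldenTask("cites_policy", "Why is my order late?",
               lambda out: "policy" in out.lower()),
]
report = run_battery(stub_agent, tasks)
print(report["pass_rate"])  # 1.0
```

Adversarial coverage is then layered on top by pointing the same runner at open-source batteries rather than hand-written checks.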

Seven worked build-vs-buy decisions

Using the five-factor framework, here are seven decisions with illustrative scores.

  1. Build: golden-task evaluation battery. High strategic fit + high differentiation + low exit cost. Verdict: build.
  2. Buy: primary LLM provider API. Low strategic fit + commodity + low exit cost with a secondary provider. Verdict: buy (with fallback second provider).
  3. Integrate (open-source): LangGraph + OpenTelemetry + OPA + pgvector platform composition. Medium strategic fit + strong communities + low exit cost. Verdict: integrate.
  4. Buy: Salesforce Agentforce for CRM-native agents. Where the CRM is Salesforce and the agent is CRM-adjacent, buy the native offering; the strategic fit lies in integration depth. Verdict: buy.
  5. Watch-and-wait: agent-to-agent networking products. Early market; standards (Google’s A2A proposal; emerging IETF-style drafts) still forming. Verdict: watch until standards settle.
  6. Buy with caveat: managed MCP-server ecosystem for common SaaS. Low-strategic-fit integrations (generic calendar, generic email) best purchased; but retain option to replace per MCP portability. Verdict: buy with portability.
  7. Build: governance registries for agent/prompt/tool/memory (Article 26). High differentiation in governance; low exit cost; moderate build effort. Verdict: build on open-source substrate.

The “build-your-own” open-source reference

For teams with the budget and talent to prioritise sovereignty, an open-source reference stack looks like this:

  • Runtime: Kubernetes + LangGraph or Semantic Kernel.
  • Model serving: vLLM or SGLang; or commercial API with two providers.
  • Vector store: pgvector on PostgreSQL.
  • Policy engine: Open Policy Agent.
  • Observability: OpenTelemetry Collector → Grafana + Jaeger.
  • Evaluation: custom golden-task battery + Garak / Giskard / OWASP LLM testing.
  • Classifiers: mix of open-source (Presidio for PII; spaCy-based for topics) and use-case-specific fine-tunes.
  • Tool interop: MCP servers where available; custom for internal systems.

This reference is the “build + integrate” pole. Most organisations end up somewhere between this pole and an all-buy stack.

Model Context Protocol — the standardisation bet

MCP (published by Anthropic, 2024, and increasingly adopted across providers) is the most significant reduction in exit cost the agentic ecosystem has seen. MCP lets tools expose manifests once and be consumed by any MCP-speaking agent regardless of framework. The architect should:

  • Favour MCP-exposed tools over vendor-specific wrappers where possible.
  • Require new internal tools to expose MCP manifests alongside any vendor-specific integration.
  • Prepare for a future in which more managed services and SaaS products expose MCP endpoints.

MCP does not eliminate lock-in elsewhere in the stack but at the tool layer it is a meaningful win.
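As a concrete anchor for “expose manifests once”: an MCP tool definition is a name, a description, and a JSON-Schema input contract. The dict below follows the shape of the tools/list result in the MCP specification; the tool itself is hypothetical:

```python
calendar_read_tool = {
    "name": "calendar_read",
    "description": "Read events from the internal calendar service.",
    "inputSchema": {                     # JSON Schema, per the MCP spec
        "type": "object",
        "properties": {
            "start": {"type": "string", "format": "date-time"},
            "end":   {"type": "string", "format": "date-time"},
        },
        "required": ["start", "end"],
    },
}

def looks_like_mcp_tool(tool: dict) -> bool:
    """Shallow shape check; a real MCP host validates the full JSON Schema."""
    return (isinstance(tool.get("name"), str)
            and isinstance(tool.get("inputSchema"), dict)
            and tool["inputSchema"].get("type") == "object")

print(looks_like_mcp_tool(calendar_read_tool))  # True
```

Because the contract is plain JSON Schema rather than a vendor SDK type, any MCP-speaking agent can consume it — which is exactly the exit-cost reduction the text describes.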

Scoreboard for a sample decision
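As an illustration, here is a scoreboard for one sample decision — Layer 4, memory services — reusing the five factors. All numbers are illustrative assumptions scored 1–5 from the self-run point of view, not recommendations:

```python
FACTORS = ("strategic_fit", "cost_curve", "differentiation",
           "exit_cost", "talent")

# Layer 4 (memory services): illustrative values only.
options = {
    "self-hosted pgvector": {"strategic_fit": 2, "cost_curve": 4,
                             "differentiation": 2, "exit_cost": 4, "talent": 4},
    "managed vector-store": {"strategic_fit": 2, "cost_curve": 2,
                             "differentiation": 2, "exit_cost": 3, "talent": 5},
}

scoreboard = {name: sum(scores[f] for f in FACTORS) / len(FACTORS)
              for name, scores in options.items()}
winner = max(scoreboard, key=scoreboard.get)
for name, total in scoreboard.items():
    print(f"{name}: {total:.1f}")
print("leading option:", winner)
```

Consistent with the Layer 4 analysis above: the self-hosted option leads on cost curve and exit cost, the managed option leads only on talent, and the totals favour self-hosting — unless a weighting pass deliberately raises the talent weight.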

Anti-patterns to reject

  • “We’ll pick one vendor for simplicity.” Simplicity today; lock-in tomorrow.
  • “All open-source.” Ignores talent + operating cost; open-source without talent to run it is a different lock-in.
  • “We’ll re-decide after six months.” Decisions made without a re-visit cadence become unacknowledged commitments. Schedule re-reviews.
  • “MCP is a demo; we’ll integrate later.” MCP adoption is growing fast; early integration is cheaper than late.
  • “The vendor promised a migration path.” Promises without contractual commitment are not migration paths.

Learning outcomes

  • Explain the five-factor framework (strategic fit, cost curve, differentiation, exit cost, talent) and apply it per agentic stack layer.
  • Classify seven representative decisions as build / integrate / buy / watch-and-wait using the framework.
  • Evaluate a buy decision for lock-in risk, portability cost, and exit-cost adequacy.
  • Design a build-vs-buy scorecard for a given agentic platform including per-layer options, scoring, and a re-visit cadence.

Further reading

  • Core Stream anchors: EATF-Level-1/M1.4-Art10-Technology-Decision-Framework-for-Transformation-Leaders.md.
  • AITE-ATS siblings: Article 3 (runtimes), Article 11 (frameworks), Article 20 (platform), Article 26 (registries), Article 35 (operating model).
  • Primary sources: Salesforce Agentforce public materials; Klarna customer-service deployment report (2024); Anthropic Model Context Protocol specification; open-source references (LangGraph, OpenTelemetry, OPA, pgvector, Garak, Giskard).