This article gives the five factors that govern every AI build-versus-buy-versus-integrate decision, walks them through six common decisions, and illustrates the output with a scoreboard an architect can use in review.
The five factors
The factors below are strong predictors of which decision wins in practice. None is individually decisive; the architect weighs all five and usually finds a clear direction.
1. Strategic fit
Is this component core to the organisation’s competitive position, or is it infrastructure? Core components belong in-house because the organisation needs control and because the decisions that shape the component must track the organisation’s strategy. Infrastructure components are commodities, and the rational stance is to outsource them.
The test is blunt: if a competitor copied this component, would it meaningfully reduce our advantage? For most organisations the foundation model is infrastructure; the retrieval corpus is strategic. For a small number — frontier AI labs, specialised search companies — the model itself is strategic.
2. Cost curve
What does total cost look like across three horizons: a single use case today, five use cases in eighteen months, fifty use cases in three years? Managed vendors often look cheap at the first use case but expensive at scale. Self-hosted open source is the inverse. The architect models both curves and picks based on an honest forecast of adoption.
Vector stores illustrate the pattern vividly. A managed Pinecone or Weaviate Cloud deployment costs little at ten thousand vectors and grows predictably. A self-hosted Qdrant or Milvus deployment costs nearly nothing at ten thousand vectors but accumulates ops and storage costs that scale with the team running it. Crossover usually occurs somewhere in the tens to hundreds of millions of vectors — the architect computes the crossover for their specific workload.
3. Differentiation
Does this component differentiate the product in the market? Differentiation arguments are easily overused; reviewers should challenge them. A retrieval ranking method tuned to proprietary data might be differentiating; a generic orchestration framework rarely is. When differentiation is weak, buy or integrate. When differentiation is strong and testable, build.
4. Exit cost
What does it cost to undo this decision? Closed-weight API dependencies have low exit cost if the prompt abstraction is clean (the architect can re-point to another provider), higher if the prompts depend on provider-specific features. Self-hosted open-weight deployments have high exit cost on the operational-investment axis but low exit cost on the vendor-switching axis. A fine-tuned proprietary derivative of an open-weight model has moderate exit cost because retraining against another base model is doable but expensive.
The Klarna deployment on OpenAI is an example where the public architecture appears to keep exit cost bounded: prompts are behind an abstraction layer, which means swapping to Anthropic or another provider is a tractable engineering project rather than a multi-year migration.1
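An abstraction layer of this kind can be sketched as a narrow interface with per-vendor adapters. The class and method names below are hypothetical illustrations, not Klarna’s actual design or any vendor SDK; real adapters would call the vendor client where these stubs return canned text:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    provider: str

class ChatProvider(Protocol):
    """The platform boundary: application code depends only on this."""
    def complete(self, system: str, user: str) -> Completion: ...

class OpenAIAdapter:
    def complete(self, system: str, user: str) -> Completion:
        # A real adapter would call the vendor SDK here; stubbed for the sketch.
        return Completion(text=f"[openai] {user}", provider="openai")

class AnthropicAdapter:
    def complete(self, system: str, user: str) -> Completion:
        return Completion(text=f"[anthropic] {user}", provider="anthropic")

def answer(provider: ChatProvider, question: str) -> Completion:
    # Application code never names a vendor, so swapping providers is a
    # configuration change rather than a rewrite.
    return provider.complete(system="You are a support assistant.", user=question)

print(answer(AnthropicAdapter(), "Where is my refund?").provider)
```

The exit-cost payoff is visible in the last line: re-pointing the assistant to a different provider changes which adapter is constructed, and nothing else.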
5. Talent
Does the organisation have, or credibly plan to hire, the engineers needed to operate the component to enterprise standards? Self-hosting vLLM-served Llama 3 at scale requires a small team of ML-infrastructure engineers. If the organisation does not have that team and cannot hire it in the planning window, self-hosting is not a real option no matter how attractive the cost model looks on paper.
Six common decisions
Foundation model
For most organisations: buy. The capability of a frontier model (Anthropic Claude, OpenAI GPT-4 class, Google Gemini) exceeds what an enterprise could plausibly match from its own pre-training. Exit cost is low if prompts are abstracted. Talent cost of self-pre-training is prohibitive.
Exceptions: organisations with genuinely proprietary high-value corpora and established ML-research teams — Bloomberg’s BloombergGPT and Meta’s internal models are the canonical examples.2 For these, build can be rational.
Hybrid: integrate open-weight models (Llama 3, Mistral, Qwen, DeepSeek) for workloads where managed APIs are too expensive, slow, or residency-incompatible. Many teams run this dual pattern — managed API for general workloads, open-weight for specific residency or cost corners.
Orchestration framework
Integrate or build-thin. LangChain, LlamaIndex, Haystack, DSPy, LangGraph, Semantic Kernel, and AutoGen compete as integratable open-source frameworks. Managed orchestration products exist but are less common. Most teams integrate one (or occasionally two) of these and wrap them in thin organisation-specific layers.
The build-it-all option is growing more attractive as orchestration patterns stabilise. A team that fully understands its integration pattern may prefer to ship its own narrow orchestration rather than inherit a framework’s broad API surface. The deciding factor is the team’s maturity, not framework allegiance.
Vector store
Buy for early stage (Pinecone, Weaviate Cloud, Elastic, MongoDB Atlas Vector, the managed cloud variants). Integrate for scale (pgvector on existing Postgres, Qdrant self-hosted, Milvus, Chroma). Build is rarely justified because the math of approximate nearest-neighbour search is well understood; outsourcing the implementation to a specialist is usually right.
A pattern that works: pgvector for everything that fits under a few tens of millions of vectors (leveraging existing Postgres operations); managed or self-hosted specialist store above that scale.
Observability
Buy or integrate. Managed options (Langfuse, Arize, Weights & Biases, Humanloop, Helicone) ship fast and keep up with the moving observability frontier. Self-hosted options (MLflow, Langfuse OSS, a custom stack on OpenTelemetry) are viable for organisations with existing observability discipline. Build from scratch is rarely defensible.
Guardrails
Integrate. NVIDIA NeMo Guardrails, Guardrails AI, Llama Guard, Rebuff, and build-your-own on top of policy engines are all in play. The rails problem is mostly solved for the common cases; the organisation’s work is the policy content, not the enforcement engine. Build becomes defensible when the policies are genuinely domain-specific and cannot be expressed in a generic engine.
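The split between policy content and enforcement engine can be illustrated with a minimal sketch. The rule names and regex patterns below are invented examples, not any guardrails product’s API; the point is that the loop is a commodity while the policy table is the organisation’s real work:

```python
import re

# Policy content as data: the organisation curates this table.
# Rule names and patterns are illustrative, not a real product's rules.
POLICIES = [
    ("no_card_numbers", re.compile(r"\b\d{13,19}\b")),
    ("no_internal_hosts", re.compile(r"\b\w+\.internal\.example\.com\b")),
]

def check(text: str) -> list[str]:
    """Generic enforcement engine: return the names of violated policies.
    This loop is the commodity part; POLICIES is the domain-specific part."""
    return [name for name, pattern in POLICIES if pattern.search(text)]

print(check("Ship to db01.internal.example.com, card 4111111111111111"))
```

When the checks outgrow what a pattern table can express — multi-turn context, tool-call inspection — that is the point at which a dedicated rails engine, or a domain-specific build, earns its place.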
Identity
Buy. Every organisation of any scale already has an identity provider (Okta, Microsoft Entra ID, Ping, Auth0, AWS IAM Identity Center). AI systems integrate into it rather than rolling their own. The question is never “build identity” but “which integration pattern” (SAML, OIDC, SCIM, service-principal, workload identity federation).
The scoreboard template
A scoreboard makes the decision defensible and reviewable. Each factor is scored 1 to 5 and weighted, and the weighted total for each option (build, buy, integrate) produces a recommendation. A sample filled-in scoreboard for a vector-store decision:
| Factor | Weight | Build | Buy (managed) | Integrate (self-hosted OSS) |
|---|---|---|---|---|
| Strategic fit | 0.15 | 2 | 4 | 4 |
| Cost curve (3-year) | 0.25 | 3 | 3 | 4 |
| Differentiation | 0.15 | 2 | 3 | 3 |
| Exit cost | 0.15 | 4 | 3 | 4 |
| Talent | 0.15 | 2 | 5 | 3 |
| Operational maturity | 0.15 | 2 | 5 | 3 |
| Weighted total | 1.00 | 2.55 | 3.75 | 3.55 |
The scoreboard is not a final answer but a documented argument. Two architects with the same inputs should arrive at similar scores; when they disagree, the disagreement is legible and discussable.
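The weighted totals can be recomputed mechanically, which keeps the arithmetic out of dispute during review. The sketch below hard-codes the scores from the sample table (the build column works out to 2.55):

```python
# Recomputes the sample vector-store scoreboard; weights and scores
# are taken directly from the table above.

WEIGHTS = {
    "strategic_fit": 0.15, "cost_curve": 0.25, "differentiation": 0.15,
    "exit_cost": 0.15, "talent": 0.15, "operational_maturity": 0.15,
}

SCORES = {
    "build":     {"strategic_fit": 2, "cost_curve": 3, "differentiation": 2,
                  "exit_cost": 4, "talent": 2, "operational_maturity": 2},
    "buy":       {"strategic_fit": 4, "cost_curve": 3, "differentiation": 3,
                  "exit_cost": 3, "talent": 5, "operational_maturity": 5},
    "integrate": {"strategic_fit": 4, "cost_curve": 4, "differentiation": 3,
                  "exit_cost": 4, "talent": 3, "operational_maturity": 3},
}

def weighted_total(option: str) -> float:
    """Sum of weight x score across all factors, rounded for the table."""
    return round(sum(WEIGHTS[f] * s for f, s in SCORES[option].items()), 2)

for option in SCORES:
    print(option, weighted_total(option))
```

Keeping the scores in data rather than a spreadsheet cell also means the review record can diff two architects’ inputs directly, which is exactly the legible disagreement the scoreboard is for.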
Worked example — Klarna and the OpenAI deployment
Klarna’s public disclosures from 2024 describe scaling an OpenAI-based assistant to replace the workload of hundreds of customer-service agents.1 The implicit buy decision on the model family was rational: the capability needed, the speed to deliver, and the scale required all pointed to a managed frontier model. The strategic fit of the foundation model for Klarna is infrastructure; the strategic fit of the customer-service domain data and the orchestration around it is core.
The Klarna announcement, followed by later qualified repositioning of AI versus human agents in 2024–2025, also illustrates the exit-cost test. An architecture that hard-wires a single model provider has larger exit cost than one that abstracts the provider behind a platform boundary. The public signals suggest Klarna’s architecture took the abstracted approach, giving them room to adjust the mix as the market evolves.
Worked example — Shopify and the build-integrate mix
Shopify’s public engineering disclosures show a mixed decision pattern.3 Shopify integrates frontier models (managed APIs) for generative tasks, builds internally on top of them with proprietary prompts and tool sets for the Sidekick merchant assistant, and has publicly discussed building and open-sourcing tooling where the internal team is ahead of the market.
The pattern is instructive: the decision is per-layer and per-capability rather than organisation-wide. Saying “we buy everything” or “we build everything” is almost always wrong at scale; the right answer is a portfolio with explicit rationale per component.
When to revisit
Build-buy-integrate decisions expire. The architect schedules a review at a sensible cadence — annually for platform-level choices, more often if the market is moving fast. Triggers that force an unscheduled review:
- A significant pricing change from a managed vendor.
- An open-source capability crossing the production-ready threshold.
- A regulatory change that alters residency or certification posture (Article 18).
- An evaluation gap that the current path cannot close.
- A strategic shift that repositions the component from infrastructure to core.
Revisiting is not defection; it is competence. Teams that cling to the first decision for sentimental reasons absorb the costs of a market that has moved on.
Anti-patterns
- The architecture-by-vendor-sales-cycle. A framework decision driven by who most recently pitched the team is not an architecture. Run the scoreboard; document the argument.
- The “we could build it ourselves” fantasy. Often said by the team that would build it. Ask: if an outside investor had to fund this component for eighteen months, would they? If no, integrate or buy.
- The “we can’t trust any vendor” reflex. Often said by teams with poor abstraction discipline. Fix the abstraction; most vendors are adequately trustworthy.
- No exit plan. Every buy decision should be accompanied by an exit plan that costs the real migration. If the migration is impossible, the dependency is strategic and deserves more scrutiny.
- The “best-of-breed everywhere” sprawl. A stack with seven orchestrators, four vector stores, and three observability tools is unmanageable. Platforms (Article 24) exist partly to resist this sprawl.
Governance integration
For EU AI Act evidence purposes, the decision record (Article 23) captures the scoreboard. Article 25 of the Act (obligations of providers and deployers) is easier to answer when the architect can demonstrate that build-versus-buy decisions followed a documented process. ISO/IEC 42001 clause 6.1.4 (addressing risks and opportunities) directly covers component-selection risk assessments. NIST AI RMF GOVERN 6.1 and 6.2 (third-party risk management) map to the buy and integrate decisions specifically.
Summary
Every AI-stack layer deserves a deliberate build-versus-buy-versus-integrate decision based on strategic fit, cost curve, differentiation, exit cost, and talent. The scoreboard format makes the argument legible. Most organisations buy the foundation model, integrate the orchestration and vector store, and buy the identity; the interesting decisions are usually on observability and guardrails where the market is moving quickly. Decisions expire; the architect schedules reviews and revisits when the inputs change.
Key terms
- Build-versus-buy-versus-integrate
- Exit cost
- Cost curve (AI component)
- Strategic fit classification
- Component maturity
Learning outcomes
After this article the learner can: explain the five-factor framework; classify six common AI decisions by framework output; evaluate a buy decision for hidden lock-in; design a build-versus-buy scorecard for an in-scope component.