Data residency — the requirement that data (and in many readings of the EU AI Act recitals, the inference that processes it) remains within a specific jurisdiction — is now a first-class architectural input, not a downstream compliance concern. This article gives the architect a defensible topology decision process, a mapping from topology to regulatory regimes, and three reference patterns the learner can adapt.
Why topology is an architecture decision, not an ops decision
A common anti-pattern is treating topology as a deployment detail chosen by the platform team after the architecture is designed. This fails when the choice of model, orchestration, or retrieval store is incompatible with the residency requirements of the data the system will handle. The EU AI Act Article 10 obligations on data governance for high-risk systems, ISO/IEC 42001 Clause 8.3 on the AI system lifecycle, and the Court of Justice of the European Union’s Schrems II judgment on international transfers all push residency decisions upstream into architecture.1
The architect makes the topology decision at the same moment as the model-selection decision (Article 2). The model selected must be available in the topology required; if it is not, either the model choice changes or the topology changes. Teams that defer this question discover mid-build that their chosen closed-weight managed API has no European-only endpoint, or that their chosen open-weight model cannot be quantized to fit the hardware in their sovereign-cloud tenancy.
The four topologies
Edge topology
Edge means inference runs on or near the device: mobile phone, laptop, embedded device, factory-floor compute node, in-store kiosk. Small quantized models (Llama 3 8B at 4-bit, Phi-3 Mini, Mistral 7B at 4-bit, Gemma 2B) make edge deployment viable for constrained tasks. Apple Intelligence is the most visible consumer deployment of a hybrid edge-cloud topology as of 2024.2
Edge strengths: minimum latency (tens of milliseconds rather than hundreds), highest residency assurance (data never leaves the device), lowest cloud cost, offline capability. Edge weaknesses: capability ceiling (the largest frontier models will not fit), fleet-management overhead (every device must be patched), eval overhead (evals must run across a heterogeneous fleet), limited orchestration sophistication. Edge topologies also anchor the autonomy-spectrum discussion in Article 32 — an edge agent is the most constrained form of agent.
Regional cloud topology
Regional cloud means the model runs in a hyperscaler region (AWS us-east-1, Azure West Europe, GCP europe-west4) with the full capability ceiling of managed AI services. Amazon Bedrock, Azure AI Foundry, and Google Vertex AI all expose frontier-class models via regional endpoints; all three hyperscalers publish regional availability matrices that change month to month.3
Regional cloud strengths: capability ceiling (every major closed-weight model; broad open-weight via Bedrock, Vertex Model Garden, Azure AI Foundry); operational maturity (autoscaling, SLAs, observability integrations); economics at scale. Regional cloud weaknesses: residency complexity (a “European region” may still involve administrative data flows to the hyperscaler’s US headquarters under Schrems II scrutiny); lock-in if the architect does not isolate the model call via the abstraction layer in Article 26.
Sovereign cloud topology
Sovereign clouds are purpose-built to satisfy national or supranational data-sovereignty regimes. Examples include the AWS European Sovereign Cloud (first region in Brandenburg, announced 2023 with first workloads 2025), Microsoft Azure EU Data Boundary, Google Cloud Sovereign Controls with T-Systems and local partners, the French S3NS joint venture (Thales + Google), Oracle Sovereign Cloud, and the Gaia-X federated reference framework.4 Public-sector-focused stacks such as AWS GovCloud (US), Azure Government, and Google Cloud for Government serve the same architectural role for US federal data.
Sovereign cloud strengths: residency assurance backed by contractual and operational guarantees (operated by local-entity staff, data stored and processed entirely within jurisdiction); recognition by specific regulators (BaFin, ACPR, AMF; US FedRAMP; UK G-Cloud). Sovereign cloud weaknesses: capability lag (the newest frontier models arrive months to years after regional cloud); cost premium (typically 20–40% above equivalent regional cloud); feature-parity gaps (not every service available in the commercial region is available in the sovereign region).
On-premise topology
On-premise means the organization runs its own hardware in its own data centre or in a colocation facility contracted exclusively to the organization. NVIDIA DGX SuperPOD, Dell PowerEdge + NVIDIA H100, and AMD MI300-based systems anchor the modern on-premise AI deployment pattern. For LLM inference, vLLM or TGI running on H100 or H200 is the reference self-hosted stack.
On-premise strengths: maximum residency assurance; full control over data flows; amortized cost can beat cloud at very high utilization; compatibility with air-gapped environments (defense, certain health, certain banking). On-premise weaknesses: capital intensity; the hardest operational discipline (patching, scaling, failure recovery); capability ceiling bounded by open-weight release cadence; FinOps architecture is harder because there is no pay-per-use visibility by default.
Mapping topology to regulatory regimes
EU AI Act and data residency
The EU AI Act does not explicitly require EU residency of inference. It does require (Article 10) that training, validation, and testing datasets for high-risk systems meet data governance expectations that, in practice, are easier to demonstrate when the data stays under EU jurisdiction; (Article 11) technical documentation that maps data flows end to end; (Article 12) record-keeping that will be requested by notified bodies.5 The architect designing a high-risk system for the EU market typically chooses a sovereign cloud or regional cloud with EU Data Boundary guarantees.
GDPR — specifically Schrems II and the EDPB’s subsequent supplementary measures recommendations — affects residency more directly. If personal data passes through a US-headquartered processor, the Standard Contractual Clauses plus “supplementary measures” (often including encryption with customer-held keys) are the minimum.6 Sovereign cloud stacks offering locally-held-key encryption and jurisdiction-capped access are the design response.
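One way to make the Schrems II gate explicit at design review is a small policy check. The sketch below is illustrative, not a legal test: the field names and jurisdiction labels are assumptions, and a real assessment involves counsel, not code.

```python
from dataclasses import dataclass

@dataclass
class TransferPlan:
    processor_hq: str          # jurisdiction of the processor's headquarters
    scc_in_place: bool         # Standard Contractual Clauses signed
    customer_held_keys: bool   # encryption keys held only by the EU customer
    access_capped_to: str      # jurisdiction whose staff can reach plaintext

def transfer_permitted(plan: TransferPlan) -> bool:
    """Schrems II-style gate: personal data may pass through a
    US-headquartered processor only with SCCs plus supplementary
    measures (here: customer-held keys and EU-capped access)."""
    if plan.processor_hq == "EU":
        return True
    return (plan.scc_in_place
            and plan.customer_held_keys
            and plan.access_capped_to == "EU")

# SCCs alone are not enough once the processor is US-headquartered.
assert not transfer_permitted(TransferPlan("US", True, False, "US"))
assert transfer_permitted(TransferPlan("US", True, True, "EU"))
```

Encoding the gate this way forces the team to record, per transfer path, who holds the keys and whose staff can reach plaintext — exactly the facts the EDPB supplementary-measures analysis asks for.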
US regimes: FedRAMP, HIPAA, ITAR
US federal data requires FedRAMP authorisation for the cloud environment hosting it. The required baseline (Moderate or High) depends on data sensitivity; the architect confirms that the chosen model and orchestration services appear in the authorised catalog before design lock. HIPAA-regulated workloads use Business Associate Agreements, which most hyperscaler AI services now support, but BAA coverage is service-by-service (GPT-4 via Azure OpenAI has HIPAA coverage; plain ChatGPT does not). ITAR requires US-person-only access and typically pushes workloads to GovCloud topologies.
UK: G-Cloud and DSIT patterns
UK public-sector AI work runs on G-Cloud authorised services. The UK government’s AI Playbook (2024) and the DSIT AI safety guidance push toward traceability, which in topology terms favours designs where prompt, retrieval, and response capture are retained in UK-hosted observability stores.7
Sector regimes: GxP, PCI, MAS TRM, OSFI B-13
Life-sciences GxP workflows require computer-system validation of the AI system, which is easier in a topology the organization controls; many pharma AI platforms anchor on sovereign cloud or on-premise for this reason. PCI DSS governs payment-card data, which typically must not pass to a model context without tokenisation. Singapore MAS Technology Risk Management (TRM) and Canada OSFI B-13 both expect residency controls on financial-institution AI.
Three reference topologies
Reference A — EU-high-risk HR screening assistant
Per the capstone case in Article 35. Topology: sovereign cloud (EU Data Boundary on a major hyperscaler plus locally-held-key encryption) with on-premise retrieval index for sensitive candidate data. Model: a frontier managed model available in EU-sovereign mode (Claude via AWS European Sovereign Cloud when available; GPT-4 via Azure EU Data Boundary; Mistral Large via the EU-based provider). Orchestration in the same region. Observability in an EU-resident store (Langfuse self-hosted or Azure Application Insights EU).
Reference B — Global consumer co-pilot with regional serving
A SaaS vendor serving consumers globally. Topology: regional cloud in three geographies (us-east, eu-west, ap-northeast) with geo-routing at the orchestration plane. Data residency metadata is attached to each request at ingress, and the orchestrator routes to the closest compliant region. Retrieval corpora per region are kept in the region. The architect’s artefact is the geo-routing policy and a residency test harness that simulates cross-region leak attempts.
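The geo-routing policy at the heart of Reference B can be sketched as a small lookup at the orchestration plane. The region names and the residency labels below are illustrative, not a vendor catalog.

```python
# Which residency classes each serving region may accept. In practice this
# table is generated from the residency-control plan, not hand-written.
REGION_POLICY = {
    "us-east":      {"US", "GLOBAL"},
    "eu-west":      {"EU", "GLOBAL"},
    "ap-northeast": {"APAC", "GLOBAL"},
}

def route(request: dict) -> str:
    """Pick the lowest-latency region that satisfies the residency
    metadata attached to the request at ingress."""
    residency = request["residency"]      # attached at ingress
    nearest = request["nearest_regions"]  # ordered by latency
    for region in nearest:
        if residency in REGION_POLICY[region]:
            return region
    raise RuntimeError(f"no compliant region for residency={residency}")

# An EU request whose lowest-latency region is us-east must not go there.
assert route({"residency": "EU",
              "nearest_regions": ["us-east", "eu-west"]}) == "eu-west"
```

The failure branch matters as much as the happy path: a request with no compliant region must be refused, never silently served from the nearest region.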
Reference C — Air-gapped on-premise for defense or classified work
Topology: on-premise cluster running open-weight models (Llama 3 70B quantized, Mixtral 8x22B quantized) via vLLM with no outbound internet access. Retrieval store is pgvector on the same cluster. The architect’s artefact here includes a supply-chain-security plan for model weights (signed checksums, air-gapped import process), a compute-capacity plan, and a fallback plan for when frontier open-weight capability is insufficient.
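The checksum step of the air-gapped weight-import process can be sketched as follows. Signature verification of the checksum manifest itself (for example with GPG) is assumed to happen in a separate, earlier step, outside this sketch.

```python
import hashlib
from pathlib import Path

def verify_weights(path: Path, expected_sha256: str) -> None:
    """Verify a model-weight file against its published checksum before
    it is imported across the air gap. Streams in 1 MiB chunks so
    multi-gigabyte weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {path.name}: refusing import")
```

A mismatch refuses the import outright; there is no override path, because the supply-chain-security plan treats an unverifiable artifact as untrusted by definition.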
The residency decision procedure
The architect follows a four-step procedure at design start:
1. Classify the data. What data categories enter the system (personal data, special-category personal data, commercial confidential, regulated financial, regulated health, classified)? Which jurisdictions originate them?
2. Determine obligations. What residency, cross-border, key-custody, and audit-access obligations attach to each data category?
3. Filter candidate topologies. Eliminate any topology that fails an obligation (an on-premise-only requirement rules out managed-API model choices; a US-only regulatory regime may rule out EU sovereign cloud).
4. Trade off the remaining candidates against capability, latency, cost, operational maturity, and exit cost.
The output of the procedure is an ADR (Article 23) with an explicit residency-control plan — the document that will be cited in the EU AI Act Article 11 technical documentation or the equivalent sector regulator’s filing.
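The first three steps of the procedure can be sketched as a filter pipeline. The data categories, obligations, and per-topology control sets below are illustrative stand-ins; a real table comes from the legal analysis in step 2, not from code.

```python
# Controls each topology can actually enforce (illustrative).
TOPOLOGY_CONTROLS = {
    "edge":            {"on_device", "offline"},
    "regional_cloud":  {"regional_pinning"},
    "sovereign_cloud": {"regional_pinning", "local_keys",
                        "jurisdiction_capped_access"},
    "on_premise":      {"on_device", "local_keys", "air_gap",
                        "jurisdiction_capped_access"},
}

def required_controls(data_categories: set[str]) -> set[str]:
    """Steps 1-2: classification of the workload's data yields the set
    of controls its obligations demand (illustrative mapping)."""
    obligations = {
        "personal":                  {"regional_pinning"},
        "special_category_personal": {"local_keys",
                                      "jurisdiction_capped_access"},
        "classified":                {"air_gap"},
    }
    needed: set[str] = set()
    for category in data_categories:
        needed |= obligations.get(category, set())
    return needed

def candidate_topologies(data_categories: set[str]) -> list[str]:
    """Step 3: keep only topologies that satisfy every obligation.
    Step 4, the trade-off among survivors, stays a human decision."""
    needed = required_controls(data_categories)
    return [t for t, controls in TOPOLOGY_CONTROLS.items()
            if needed <= controls]

# Classified data leaves exactly one candidate standing.
assert candidate_topologies({"classified"}) == ["on_premise"]
```

The value of writing the filter down is not automation but traceability: the ADR can cite the exact obligation that eliminated each topology.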
Cross-topology patterns: hybrid and fallback
Few real systems are single-topology. Common hybrids include:
- Edge-first with cloud fallback. The device handles simple requests locally; requests beyond the edge model’s capability cascade to a regional or sovereign cloud. Apple Intelligence is the canonical consumer example; industrial inspection systems follow the same pattern.
- Sovereign-primary with regional-cloud overflow. The primary workload runs in sovereign cloud; burst capacity overflows to an adjacent regional cloud within the same vendor’s EU Data Boundary. Overflow requests must be classified as not-sensitive before they are allowed to overflow.
- Regional-primary with on-premise for high-sensitivity cohorts. A bank serves most retail customers from regional cloud but routes high-net-worth private-banking requests to an on-premise instance. The routing rule is a governance artefact.
Each hybrid carries a residency test: the architect must prove, in test, that data cannot cross the boundary the hybrid defines. The test harness is part of the architecture package.
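A minimal residency test harness for a hybrid looks like the sketch below: drive synthetic sensitive requests through the router and fail if any trace entry lands outside the allowed regions. The stub router and in-memory trace log are stand-ins for the real orchestrator and observability store.

```python
def run_leak_tests(handle, trace_log, sensitive_requests, allowed_regions):
    """Replay synthetic sensitive requests and fail if any resulting
    trace entry falls outside the allowed regions."""
    for request in sensitive_requests:
        handle(request)
    crossed = [e for e in trace_log if e["region"] not in allowed_regions]
    if crossed:
        raise AssertionError(f"residency boundary crossed: {crossed}")

# Stub sovereign-primary router: only requests classified not-sensitive
# may overflow to the adjacent regional cloud.
trace_log: list[dict] = []
def handle(request: dict) -> None:
    region = "eu-sovereign" if request["sensitive"] else "eu-west-overflow"
    trace_log.append({"request": request["id"], "region": region})

# Sensitive traffic must stay in the sovereign region.
run_leak_tests(handle, trace_log,
               sensitive_requests=[{"id": 1, "sensitive": True}],
               allowed_regions={"eu-sovereign"})
```

In the real harness the trace log is pulled from the observability store after the replay, so the test also proves the traces themselves were captured in-region.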
Anti-patterns the architect rejects
- Assuming a region name guarantees residency. A workload deployed in an “EU” region may still involve administrative data flows to the US-headquartered hyperscaler. The architect reads the hyperscaler’s data-residency documentation end to end, including the sub-processor list.
- Residency-by-label. Tagging data “EU-resident” without an enforced boundary is not residency. An architect demands a technical control — VPC endpoints, private networking, locally-held keys — not a label.
- Retrofitting residency after launch. Residency gaps discovered post-launch are expensive to fix (corpus re-indexing, model re-selection, observability store migration). The architect surfaces the residency decision at the Calibrate stage gate (Article 28).
- Ignoring observability residency. Prompt/response logs and eval datasets inherit the residency obligations of the data they contain. An EU-resident workload whose observability stack sits in a US SaaS is non-compliant.
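The observability-residency gap can be caught mechanically at configuration-review time. The allow-list hostnames below are placeholders, not real endpoints; the point is that residency is enforced by a check, not by a label.

```python
from urllib.parse import urlparse

# Allow-list of observability sinks verified as EU-resident.
# Hostnames here are illustrative placeholders.
EU_RESIDENT_SINKS = {"langfuse.internal.example.eu", "logs.eu.example.com"}

def check_observability_residency(sink_url: str) -> None:
    """Reject any prompt/response log sink not on the EU-resident
    allow-list: logs inherit the workload's residency obligations."""
    host = urlparse(sink_url).hostname
    if host not in EU_RESIDENT_SINKS:
        raise ValueError(
            f"observability sink {host} is not on the EU-resident "
            "allow-list; logs inherit the workload's residency obligations")

check_observability_residency("https://logs.eu.example.com/ingest")  # passes
```

Wired into CI against the deployment manifest, the check turns the anti-pattern into a build failure instead of a post-launch finding.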
Summary
Topology is a first-order architectural decision that is made jointly with model selection and data pipeline design, not afterward. The four topologies — edge, regional cloud, sovereign cloud, on-premise — cover the vast majority of enterprise cases; most real systems hybridize among them. The architect’s deliverable is a residency-control plan backed by an ADR and a test harness, traceable to the data classification and the regulatory regime that governs the workload.
Key terms
- Edge
- Data residency
- Sovereign cloud
- Schrems II
- EU Data Boundary
Learning outcomes
After this article the learner can: explain four deployment topologies and their residency implications; classify five workloads by required topology; evaluate a deployment for residency risk; design a residency-control plan for a given workload.
Footnotes
1. Court of Justice of the European Union, Case C-311/18 (Schrems II), 16 July 2020.
2. Apple, “Introducing Apple Intelligence,” WWDC 2024 developer documentation.
3. AWS Bedrock regions; Azure OpenAI regions; Google Vertex Model Garden regions (as of 2026-04).
4. AWS European Sovereign Cloud announcement (October 2023); Microsoft EU Data Boundary documentation; Google Cloud Sovereign Controls documentation; S3NS (Thales × Google Cloud) joint-venture public materials; Gaia-X framework documentation.
5. Regulation (EU) 2024/1689 (AI Act), Articles 10, 11, 12.
6. European Data Protection Board, Recommendations 01/2020 on measures that supplement transfer tools.
7. UK DSIT, AI Playbook for the UK Government (2024).