Hybrid Cloud Integration Strategy
A year and a half into operating hybrid cloud estates, the patterns that work and the patterns that fail have separated cleanly. A practitioner view of cross-cloud integration, the aggregation layer, and the operating disciplines hybrid cloud genuinely requires.
Hybrid cloud has shifted meaning over the last few years. In 2020 it usually meant on-premises plus one cloud, with workloads gradually migrating in one direction. By 2023 it more often means workloads distributed across multiple clouds with an on-premises aggregation point, often for sovereignty or operational reasons that aren't going away.
This piece is the practitioner view from the more recent shape — cross-cloud, with on-prem in a smaller but persistent role. It complements rather than replaces the earlier hybrid integration piece; the architectures that work look different now than they did in 2022.
The three configurations that recur
Stripping away vendor positioning, hybrid cloud estates in 2023 settle into three configurations:
Configuration A — Cloud-primary with on-prem footprint. Most workloads in one cloud. A residual on-prem footprint serves specific needs — legacy systems that won't migrate, sovereignty data, partner-facing integrations that need network adjacency. Integration across the boundary is asynchronous and bounded.
Configuration B — Multi-cloud with shared workloads. Workloads distributed across two or more clouds, typically for resilience (active-active), regulatory (different regions per jurisdiction), or workload fit (AI/ML on one cloud, transactional on another). Integration across clouds is a primary architectural concern.
Configuration C — On-prem-primary with cloud burst. Most workloads on-premises, with specific cloud usage — typically AI/ML, analytics, or burst capacity for peak loads. The on-prem estate is the system of record; cloud is supplementary.
The strategy decisions, network architecture, integration platform choices, and operating model differ across the three. Conflating them produces designs that fit none of them.
The aggregation layer question
Multi-cloud estates need an aggregation point — somewhere that can see across the clouds, reconcile state, drive consolidated reporting, and serve as the system of record where the regulator requires one. The placement of this aggregation layer is the most consequential single architectural decision in multi-cloud estates.
The credible options:
On-premises aggregation. A dedicated on-prem layer aggregates from the clouds. Common where data sovereignty pins the system of record on-prem. Operationally heavier; provides the strongest sovereignty story.
Cloud-resident aggregation in a "neutral" cloud. A third cloud (not the workload clouds) hosts the aggregation. Common in regulated estates where the workload clouds carry the operational risk and the aggregation cloud carries the audit obligation. Cleaner operationally than on-prem.
One workload cloud serves as the aggregator. The cloud with the most analytical capability or the strongest sovereignty story handles aggregation alongside its workload role. Cheapest; produces stronger vendor lock-in.
Distributed aggregation via streaming. Event streams replicated across clouds; each cloud has a complete view; no single aggregation point. Cleanest in theory; operationally complex in practice (replication topology, consistency model, schema governance).
Most regulated enterprises end up at on-premises or third-cloud aggregation for sovereignty reasons. Younger or smaller estates often pick the workload-cloud aggregation for simplicity. The streaming-distributed pattern is the right call for specific high-throughput use cases (real-time analytics, fraud detection) and is operationally heavy for everything else.
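Wherever the aggregation layer lives, its core job is the same: take a snapshot from each cloud, produce one consolidated view, and surface disagreements instead of hiding them. The following is a minimal sketch of that reconciliation step, not a reference implementation. The snapshot shape, the version-number convention, and the names `consolidate`, `cloud_a`, and `cloud_b` are all assumptions made for illustration.

```python
from collections import defaultdict


def consolidate(snapshots):
    """Merge per-cloud snapshots into one view and flag disagreements.

    snapshots: {cloud_name: {record_id: (version, value)}}  (hypothetical shape)
    Returns (view, conflicts):
      view      maps record_id -> (version, value, source_cloud), highest version wins
      conflicts lists record ids where the highest version carries different
                values in different clouds
    """
    seen = defaultdict(list)
    for cloud, records in snapshots.items():
        for record_id, (version, value) in records.items():
            seen[record_id].append((version, value, cloud))

    view, conflicts = {}, []
    for record_id, entries in seen.items():
        top_version = max(version for version, _, _ in entries)
        top = [entry for entry in entries if entry[0] == top_version]
        if len({value for _, value, _ in top}) > 1:
            conflicts.append(record_id)
        # Deterministic tie-break by cloud name so repeated runs agree.
        view[record_id] = min(top, key=lambda entry: entry[2])
    return view, sorted(conflicts)


if __name__ == "__main__":
    snapshots = {
        "cloud_a": {"order-1": (2, "shipped"), "order-2": (1, "open")},
        "cloud_b": {"order-1": (1, "open"), "order-2": (1, "cancelled")},
    }
    view, conflicts = consolidate(snapshots)
    print(view)
    print(conflicts)
```

The point of returning conflicts separately is that a system of record should make cross-cloud disagreement visible to operators and auditors, whichever placement option is chosen.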
Network architecture decisions
Hybrid cloud integration is bottlenecked by network design more often than by application design. The decisions that bind:
Connectivity model. Site-to-site VPN, dedicated interconnect (Azure ExpressRoute, AWS Direct Connect, Google Cloud Interconnect), partner-supplied private connectivity, or the public internet with strong identity and encryption. Each has different latency, throughput, cost, and operational characteristics.
IP space coordination. Cross-cloud workloads need to communicate. IP space conflicts (overlapping CIDRs across clouds, on-prem subnets that collide with cloud subnets) become migration blockers. Plan IP space across the entire estate before workloads start landing.
DNS resolution. Names need to resolve consistently across the estate. Split-horizon DNS, private DNS zones per cloud, on-prem authoritative servers — the model varies but the discipline doesn't. Inconsistent DNS produces production incidents that nobody can debug.
Service discovery. Beyond DNS — how does a workload in one cloud find services in another? Service mesh extension across clouds, federated service catalogues, or explicit endpoint configuration. The choice has long-term operational consequences.
Egress cost. Cross-cloud traffic incurs egress fees at the source cloud. For high-volume integration, this becomes a meaningful budget line. Architectural choices that reduce cross-cloud traffic (caching at the receiving cloud, locality-aware routing) repay their cost in egress savings.
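The egress trade-off is simple arithmetic, sketched below under stated assumptions. The per-GB price is a placeholder rather than any vendor's published rate, and `cache_hit_ratio` is a hypothetical parameter for the share of requests served from a cache at the receiving cloud.

```python
def monthly_egress_cost(gb_per_month, price_per_gb, cache_hit_ratio=0.0):
    """Estimate the monthly egress charge at the source cloud.

    gb_per_month:    data volume the receiving side requests per month
    price_per_gb:    placeholder egress price, not a quoted vendor rate
    cache_hit_ratio: fraction (0..1) served from a receiving-side cache,
                     which therefore never crosses the cloud boundary
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return gb_per_month * (1.0 - cache_hit_ratio) * price_per_gb


if __name__ == "__main__":
    volume_gb = 50_000       # hypothetical monthly cross-cloud volume
    placeholder_price = 0.08  # hypothetical price per GB
    uncached = monthly_egress_cost(volume_gb, placeholder_price)
    cached = monthly_egress_cost(volume_gb, placeholder_price, cache_hit_ratio=0.5)
    print(f"uncached: {uncached:.2f}, with cache: {cached:.2f}")
```

Comparing the two figures against the running cost of the cache gives a first-order answer to whether the architectural change pays for itself.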
The estates that handle hybrid cloud well have invested in the network layer. The estates that struggle have usually deferred the network design and discovered the costs during workload migration.
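One inexpensive way to make the up-front IP planning concrete is to keep the estate-wide address plan in a single place and check it for overlaps before any workload lands. The sketch below uses Python's standard `ipaddress` module. The allocation names and CIDR ranges are illustrative assumptions, not a recommended plan.

```python
import ipaddress
from itertools import combinations


def find_overlaps(allocations):
    """Return name pairs whose CIDR ranges overlap.

    allocations: {name: cidr_string}, e.g. one entry per cloud VPC/VNet
    or on-prem subnet (names here are hypothetical).
    """
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()
    }
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(
            sorted(networks.items()), 2
        )
        # Only compare like with like: IPv4 against IPv4, IPv6 against IPv6.
        if net_a.version == net_b.version and net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    plan = {
        "on-prem": "10.0.0.0/16",
        "cloud-a": "10.0.128.0/20",  # falls inside the on-prem range
        "cloud-b": "10.1.0.0/16",
    }
    print(find_overlaps(plan))
```

Run as part of a pipeline that gates network changes, a check like this turns an IP conflict from a migration blocker into a failed review.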
Identity and access across clouds
Each cloud has its own identity system. Workloads operating across clouds need consistent identity — the same actor authenticating in one cloud needs to be the same authenticated actor in another.
The patterns that work:
Federated identity from a single provider. An identity provider (Azure AD / Entra ID, Okta, Auth0, Ping) federates to each cloud's identity system. Users and service identities map to consistent claims across clouds. The federation provider is the source of truth.
Workload identity federation. OIDC workload identity federation lets a workload in one cloud authenticate to another cloud using the source cloud's identity tokens. Replaces stored credentials with short-lived federated tokens. Available across major clouds; the operational pattern is well-documented.
Service account discipline. Service-to-service authentication across clouds uses scoped service accounts with rotated credentials, ideally federated rather than stored. The pattern in each cloud is similar; the cross-cloud federation makes it tractable.
The estates that get identity right operate across clouds smoothly. The estates that have separate identity stacks per cloud accumulate access drift, credential lifecycle problems, and audit gaps.
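To illustrate what workload identity federation replaces stored credentials with, the sketch below shows the kind of claim check a receiving cloud applies before exchanging a source-cloud token for short-lived access. It is a simplification under stated assumptions: signature verification against the issuer's published keys is deliberately omitted, and `TrustRule`, `is_trusted`, and the example issuer, audience, and subject values are hypothetical. Only the claim names (`iss`, `aud`, `sub`, `exp`) are the standard JWT registered claims.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRule:
    """What the receiving cloud is configured to accept (hypothetical shape)."""
    issuer: str          # the source cloud's token issuer
    audience: str        # the audience the token must be minted for
    subject_prefix: str  # which workloads in the source cloud are allowed


def is_trusted(claims, rule, now=None):
    """Check already-verified token claims against a trust rule.

    Assumes the token signature has been verified elsewhere.
    """
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # The "aud" claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    expiry = claims.get("exp")
    return (
        claims.get("iss") == rule.issuer
        and rule.audience in audiences
        and str(claims.get("sub", "")).startswith(rule.subject_prefix)
        and isinstance(expiry, (int, float))
        and expiry > now
    )


if __name__ == "__main__":
    rule = TrustRule(
        issuer="https://issuer.source-cloud.example",
        audience="target-cloud-federation",
        subject_prefix="workload:payments/",
    )
    claims = {
        "iss": "https://issuer.source-cloud.example",
        "aud": "target-cloud-federation",
        "sub": "workload:payments/settlement-job",
        "exp": time.time() + 300,
    }
    print(is_trusted(claims, rule))
```

Because the token is short-lived and bound to issuer, audience, and subject, there is no long-lived secret to rotate or leak, which is the operational gain over stored credentials.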
Integration platform placement
The integration platform in hybrid cloud estates is usually one of:
- A cloud-agnostic iPaaS deployed on-prem or in the aggregation cloud, integrating with all workload clouds
- A per-cloud integration footprint (Azure Integration Services, Amazon EventBridge, Google Cloud Workflows) plus cross-cloud connectors
- Hybrid: cloud-native integration within each cloud, iPaaS across cloud boundaries
The third pattern is the one we see working in most regulated enterprise estates. Cloud-native integration handles cloud-internal flows efficiently; the iPaaS bridges across cloud boundaries with consistent governance and observability. The boundary is explicit.
The pure cloud-native pattern (option two) works for younger estates with little on-prem footprint and tight cloud-vendor coupling. The cloud-agnostic-only pattern (option one) becomes operationally expensive when cloud-internal workloads route unnecessarily through the iPaaS.
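Keeping the boundary explicit can be as simple as a routing rule that every flow definition is checked against. The sketch below is one hypothetical way to express it: flows that stay inside a single cloud use that cloud's native integration, and anything crossing a boundary goes through the iPaaS. The cloud names, the flow tuple shape, and the function names are assumptions for illustration.

```python
CLOUDS = {"cloud-a", "cloud-b"}  # hypothetical workload clouds


def integration_layer(source, target):
    """Return the layer that should carry a flow under the hybrid pattern."""
    if source == target and source in CLOUDS:
        return "cloud-native"
    # Cross-cloud and cloud-to-on-prem flows cross a boundary.
    return "ipaas"


def misrouted(flows):
    """List flows whose declared layer disagrees with the routing rule.

    flows: iterable of (name, source, target, declared_layer)
    """
    return [
        name
        for name, source, target, declared in flows
        if declared != integration_layer(source, target)
    ]


if __name__ == "__main__":
    flows = [
        ("orders-to-billing", "cloud-a", "cloud-a", "cloud-native"),
        ("billing-to-ledger", "cloud-a", "on-prem", "ipaas"),
        ("audit-fanout", "cloud-b", "cloud-b", "ipaas"),  # unnecessary hop
    ]
    print(misrouted(flows))
```

A check like this catches the expensive case described above, where cloud-internal traffic is routed through the iPaaS without needing to be.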
Data residency that actually holds up
The most common compliance failure in hybrid cloud estates is data residency that's claimed but not enforced architecturally. Residency requirements ("data X must stay in geography Y") are typically declared in policy documents and assumed to be enforced by network or platform configuration. In practice, residency violations occur through observability tools forwarding payloads to globally aggregated SaaS, through backup replication to non-resident storage, and through development tooling that copies data across environments.
The disciplines that produce residency that actually holds:
- Data classification with explicit residency labels. Each data type has a documented residency requirement.
- Architecture diagrams that show every place data flows. Including observability tools, backup systems, development workflows.
- Periodic auditing that residency holds. Not a one-time check; a recurring discipline.
- Tooling that enforces residency. Cloud-native sovereign-region controls, observability tools with regional deployment, backup configuration that respects residency.
The estates that have passed residency audits have done this work explicitly. The estates that have failed have usually had residency claims in policy without architectural enforcement.
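The recurring audit can be partly mechanised once classification labels and a data-flow inventory exist. The sketch below checks every recorded flow, including observability, backup, and development tooling, against the residency label for its data class. The label and flow shapes, the region names, and the function name are hypothetical; a real estate would feed this from its own inventory.

```python
def residency_violations(residency_labels, flows):
    """Return flows that land outside their allowed regions.

    residency_labels: {data_class: set_of_allowed_regions}
    flows: iterable of (data_class, destination_system, destination_region)
    Unlabelled data classes are reported too, because an unclassified flow
    cannot be shown to comply.
    """
    violations = []
    for data_class, system, region in flows:
        allowed = residency_labels.get(data_class)
        if allowed is None:
            violations.append((data_class, system, region, "unclassified"))
        elif region not in allowed:
            violations.append(
                (data_class, system, region, "outside allowed regions")
            )
    return violations


if __name__ == "__main__":
    labels = {"patient-records": {"eu-west"}, "telemetry": {"eu-west", "us-east"}}
    flows = [
        ("patient-records", "primary-db", "eu-west"),
        ("patient-records", "log-saas", "us-east"),      # observability leak
        ("patient-records", "backup-bucket", "eu-west"),
        ("dev-snapshots", "staging-db", "us-east"),      # never classified
    ]
    for violation in residency_violations(labels, flows):
        print(violation)
```

The check is only as good as the flow inventory behind it, which is why the architecture diagrams and the periodic audit remain necessary alongside any tooling.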
Operating model implications
Hybrid cloud is operationally heavier than single-cloud. The team has to operate across cloud-vendor consoles, learn each cloud's failure modes, manage capacity in each, handle cost discussions per cloud, and coordinate incidents that span clouds.
The patterns that scale:
- Common tooling across clouds where possible (Terraform, GitHub Actions or GitLab CI, consistent observability stack) reduces cognitive load.
- Per-cloud specialists with cross-cloud collaboration — engineers develop depth in one cloud and broad familiarity with others, with structured collaboration on cross-cloud workloads.
- Centralised governance for the cross-cloud surface — cost reporting, identity federation, network architecture, integration strategy all sit at a central level.
- Per-team workload responsibility — teams own their workloads' operation regardless of which cloud they run on.
The estates that haven't invested in the operating model discover that hybrid cloud costs roughly 1.5x as much to operate as single-cloud, sometimes more. The estates that have invested land closer to 1.2x: the additional cost is real but manageable.
What we recommend
For an estate evaluating hybrid cloud:
- Decide the configuration explicitly (A, B, or C). The strategy depends on which.
- Make the aggregation layer decision deliberately. This is the most consequential single choice.
- Invest in network architecture before workloads. IP space, DNS, service discovery.
- Establish federated identity across clouds before workload sprawl.
- Plan for the operating model — common tooling, specialist organisation, centralised governance.
For an existing hybrid cloud estate with operational pain:
- Audit the network architecture. Are IP conflicts, DNS inconsistencies, or egress costs producing friction?
- Audit the identity story. Is there federated identity, or per-cloud identity stacks with manual reconciliation?
- Audit the aggregation layer. Is the cross-cloud view consolidated, or reconstructed during incidents?
- Audit residency enforcement. Is residency claimed but unenforced architecturally?
Hybrid cloud is the operational reality for most large enterprises in 2023 and for the foreseeable future. The architectural disciplines that make it work are more demanding than single-cloud disciplines but well-understood. The estates that invest in them end up with hybrid estates that compound value. The estates that don't invest end up with hybrid estates that compound operational cost.
RELATED READING
Related pieces
Hybrid Enterprise Integration Strategy
Hybrid integration has accumulated more meaning than the architects who coined the term intended. A decision framework for what workloads belong on-premises, what belongs in the cloud, and where the boundary between them should live.
Cloud Integration Architecture
Cloud integration services have matured into platforms that compete with traditional iPaaS. A decision framework for what belongs on cloud-native integration versus what belongs on a dedicated integration platform — and how to architect the boundary.
Building Scalable Cloud Architectures
Cloud-native scalability is rarely a single architectural decision — it's a layered commitment to autoscaling, asynchronous patterns, data partitioning, caching, and edge delivery. A practitioner view of which patterns belong at which layer.