The Enterprise AI Stack — A Reference Architecture
Most enterprise AI teams are assembling the same stack from the same parts. This piece sets out a clean reference architecture for the layers that compose an AI-augmented enterprise platform, and the design decisions at each layer.
A pattern from the last twelve months of enterprise AI work: most teams are building the same architecture from the same components, with the same trade-offs at each layer. The stack has stabilised. Knowing what the layers are makes new initiatives faster to scope and easier to compare.
This is a reference architecture for the AI layer of an enterprise platform — what each layer does, what components fit there, and what design choice has to be made at each step. It is not prescriptive about products; it is descriptive about shape.
The layers
From the user-facing surface down to the foundation:
- Experience layer — the interface where humans or upstream systems pose questions
- Orchestration layer — agents, planners, workflow engines that coordinate work
- Intelligence layer — LLMs, multimodal models, classification models
- Knowledge layer — vector indices, structured data, document stores
- Integration layer — connectors to enterprise systems
- Governance layer — across all of the above
Each layer has its own design problems. Skipping a layer or merging it into another tends to produce systems that are hard to evolve.
1. Experience layer
The interface. In production we see four common shapes:
- Conversational — a chat UI on a web or mobile surface
- Embedded — an AI capability inside an existing application (a "draft this" button, an "explain this" panel)
- Headless — an API consumed by upstream systems, returning structured data
- Batch — a scheduled job that processes a workload and writes outputs
The design choice is which of these to expose. Most enterprise initiatives end up with two or three, often with the same intelligence pipeline behind different experience surfaces. Designing that pipeline to be shape-agnostic (the same orchestration whether the caller is a chat session, an embedded button, or a batch job) pays off over an eighteen-month horizon.
What matters at this layer:
- Latency expectations differ by shape. Conversational expects sub-second first token; batch can take minutes. The orchestration layer has to know which it is serving.
- Authentication and identity propagate from here. The user's identity, their permissions, their organisational context have to be available downstream.
- Telemetry starts here. Every interaction is logged with a session identifier that lets the entire downstream pipeline be reconstructed.
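As a minimal sketch of what the experience layer might hand downstream, the three points above can be carried in a single request context. The class and field names are hypothetical, and the latency budgets are illustrative assumptions: only "sub-second" for conversational and "minutes" for batch come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class Shape(Enum):
    CONVERSATIONAL = "conversational"
    EMBEDDED = "embedded"
    HEADLESS = "headless"
    BATCH = "batch"


# Illustrative latency budgets in seconds; real values are workload-specific.
LATENCY_BUDGET_S = {
    Shape.CONVERSATIONAL: 1.0,   # sub-second first token
    Shape.EMBEDDED: 3.0,         # assumption
    Shape.HEADLESS: 10.0,        # assumption
    Shape.BATCH: 600.0,          # minutes are acceptable
}


@dataclass(frozen=True)
class RequestContext:
    """Identity, shape, and a session identifier, created once at the edge."""
    user_id: str
    permissions: frozenset[str]
    org_unit: str
    shape: Shape
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def latency_budget_s(self) -> float:
        return LATENCY_BUDGET_S[self.shape]


ctx = RequestContext(
    user_id="u-123",
    permissions=frozenset({"hr:read"}),
    org_unit="finance",
    shape=Shape.CONVERSATIONAL,
)
```

Passing one immutable object through every layer is one way to keep identity and the session identifier from being dropped between hops.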
2. Orchestration layer
The layer that decides what to do. In production this is one of:
- Single-call — the request goes to the LLM once, the response goes back
- Retrieval-augmented call — retrieval, then LLM
- Function-calling loop — the LLM may call functions, see results, continue
- Multi-step planner — the LLM produces a plan, executes it as a workflow
- Multi-agent — multiple specialised agents coordinated by a supervisor
The design choice is which orchestration pattern fits the workload. Most enterprise workloads are single-call or retrieval-augmented; a smaller set genuinely needs function calling; a still smaller set needs planners; multi-agent is largely experimental in production.
What matters at this layer:
- Boundedness. Loops have step limits, cost limits, time limits.
- State. Long-running interactions need conversation memory; how that memory is stored, summarised, and retrieved is a design decision.
- Fallbacks. When the LLM fails to produce usable output, what happens? The options are to retry, escalate, fall back to a deterministic path, or return a graceful error.
- Observability. Every orchestration step logged with the inputs, the decision, the outputs.
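The boundedness, fallback, and observability points above can be sketched as a loop wrapper. This is an illustration under stated assumptions rather than any framework's API: `Limits`, `StepResult`, and `run_bounded` are hypothetical names, the default limits are arbitrary, and `step` stands in for one LLM or function-calling turn.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Limits:
    max_steps: int = 5
    max_cost_usd: float = 0.50
    max_seconds: float = 30.0


@dataclass
class StepResult:
    done: bool        # True when the step produced a final answer
    output: str
    cost_usd: float


def run_bounded(step: Callable[[int], StepResult], limits: Limits,
                fallback: Callable[[str], str]) -> tuple[str, list[dict]]:
    """Run `step` until it finishes or a step, cost, or time limit trips."""
    trace: list[dict] = []
    spent = 0.0
    start = time.monotonic()
    for i in range(limits.max_steps):
        if time.monotonic() - start > limits.max_seconds:
            return fallback("time limit"), trace
        result = step(i)
        spent += result.cost_usd
        # Log every step with its output and cost so the run can be reconstructed.
        trace.append({"step": i, "output": result.output, "cost_usd": result.cost_usd})
        if result.done:
            return result.output, trace
        if spent > limits.max_cost_usd:
            return fallback("cost limit"), trace
    return fallback("step limit"), trace


def fake_step(i: int) -> StepResult:
    # Stand-in for a model call: finishes on the third step.
    return StepResult(done=(i == 2), output=f"step {i}", cost_usd=0.01)


answer, trace = run_bounded(fake_step, Limits(), lambda reason: f"fallback: {reason}")
```

The fallback receives the reason a limit tripped, so the caller can choose between a retry, a deterministic path, or a graceful error.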
Frameworks at this layer — LangChain, LlamaIndex, Semantic Kernel, custom orchestration — have specific opinions about each of these. Picking a framework before knowing the workload shape often leads to fighting the framework.
3. Intelligence layer
The models themselves. In production this is rarely a single model:
- A primary generation model — GPT-4, Claude 2.1, a hosted equivalent
- A cheap classifier or router — GPT-3.5 or a small open model, deciding intent or routing
- An embedding model — text-embedding-ada-002, or an open alternative
- A reranker — a cross-encoder for retrieval reordering
- Optional specialist models — for code, for vision, for transcription
The design choice is the routing policy. A naive system sends everything to the most capable, most expensive model. A production system routes by complexity — easy queries to a small model, hard queries to a large one — and uses specialist models where they outperform general-purpose ones.
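A routing policy of this kind might look like the following sketch. The model names are placeholders, the threshold is arbitrary, and `heuristic_complexity` is a deliberately crude stand-in for the cheap classifier or router model described above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def heuristic_complexity(query: str) -> float:
    """Placeholder scorer in [0, 1]; a real system would call a small classifier."""
    words = len(query.split())
    multi_part = query.count("?") > 1 or " and " in query.lower()
    return min(1.0, words / 50 + (0.4 if multi_part else 0.0))


def route(query: str, task: str = "general",
          score: Callable[[str], float] = heuristic_complexity,
          threshold: float = 0.5) -> Route:
    # Specialist models take precedence where the task type is known.
    specialists = {"code": "code-model", "vision": "vision-model"}
    if task in specialists:
        return Route(specialists[task], f"specialist for {task}")
    s = score(query)
    if s >= threshold:
        return Route("large-model", f"complexity {s:.2f} >= {threshold}")
    return Route("small-model", f"complexity {s:.2f} < {threshold}")


easy = route("What is our leave policy?")
hard = route("Compare the 2022 and 2023 travel policies and list what changed?")
```

Returning the reason alongside the model keeps routing decisions visible in logs, which makes it easier to tune the threshold later.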
What matters at this layer:
- Version pinning. Hosted models can change behaviour without notice when a provider updates what sits behind a model name. Pin model versions in code; treat upgrades as migrations.
- Provider redundancy. With a single provider, one outage takes the whole system down. Where the workload justifies it, multi-provider routing matters.
- Cost monitoring at the call level. Tokens in, tokens out, model identifier, dollar cost. Without this, cost surprises are inevitable.
- Hosting model. Hosted APIs, dedicated capacity, self-hosted open models, on-premises deployment of commercial weights. Each has different cost, latency, compliance, and operational profiles.
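Call-level cost monitoring and version pinning can share one lookup table, as in this sketch. The model identifiers and prices are placeholders rather than any provider's actual price list, and `CallRecord` is a hypothetical name.

```python
from dataclasses import dataclass

# Pinned, dated model identifiers mapped to USD per 1,000 tokens (input, output).
# Identifiers and prices are placeholders for illustration only.
PRICE_PER_1K = {
    "large-model-2024-01-01": (0.0100, 0.0300),
    "small-model-2024-01-01": (0.0005, 0.0015),
}


@dataclass(frozen=True)
class CallRecord:
    model: str
    tokens_in: int
    tokens_out: int

    @property
    def cost_usd(self) -> float:
        # A model that is not in the pinned table raises KeyError, so an
        # unpinned or unreviewed model cannot be used silently.
        price_in, price_out = PRICE_PER_1K[self.model]
        return self.tokens_in / 1000 * price_in + self.tokens_out / 1000 * price_out


def total_by_model(records: list[CallRecord]) -> dict[str, float]:
    """Aggregate dollar cost per pinned model identifier."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r.model] = totals.get(r.model, 0.0) + r.cost_usd
    return totals


records = [
    CallRecord("large-model-2024-01-01", tokens_in=1000, tokens_out=1000),
    CallRecord("small-model-2024-01-01", tokens_in=2000, tokens_out=1000),
]
totals = total_by_model(records)
```

Because the price table is keyed by the pinned identifier, moving to a new model version means adding a row, which is a reviewable change in the spirit of treating upgrades as migrations.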
4. Knowledge layer
The data the intelligence layer reaches for. In production this is:
- Vector indices — for similarity search over unstructured content
- Structured stores — databases the system queries directly
- Document stores — the source of truth for documents
- Caches — embedding caches, retrieval caches, response caches
- Metadata stores — about the content (lineage, classification, access)
The design choice is the partitioning. A single global index for the whole enterprise is rarely what you want: different content has different access controls, update cadences, and appropriate retrieval strategies. But a fragmented landscape of indices, one per project, means a query against one index misses relevant content held in another.
What matters at this layer:
- Access control propagation. The user querying the system has permissions; those permissions have to constrain what the retrieval returns.
- Refresh and lifecycle. Documents change. The index has to reflect changes. Stale indices undermine trust in the system.
- Hybrid search. Pure vector search consistently underperforms hybrid (lexical + vector) for enterprise queries. The store has to support both.
- Audit and lineage. When the system cites a chunk, you have to be able to trace that chunk back to the source document and its version.
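To make the hybrid search, access control, and lineage points concrete, here is a small sketch. It assumes reciprocal rank fusion as the way to combine lexical and vector rankings, which is one common choice and not something the architecture mandates. `Chunk`, `rrf_fuse`, and `retrieve` are hypothetical names, and the two input rankings stand in for results from a real lexical engine and a real vector index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc: str        # lineage: which document the chunk came from
    doc_version: str       # lineage: which version of that document
    allowed_groups: frozenset[str]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def retrieve(lexical: list[str], vector: list[str], index: dict[str, Chunk],
             user_groups: frozenset[str], top_n: int = 3) -> list[Chunk]:
    fused = rrf_fuse([lexical, vector])
    # Keep only chunks the caller is permitted to see.
    permitted = [index[c] for c in fused if index[c].allowed_groups & user_groups]
    return permitted[:top_n]


index = {
    "c1": Chunk("c1", "hr-handbook.pdf", "v3", frozenset({"hr"})),
    "c2": Chunk("c2", "travel-policy.docx", "v7", frozenset({"all"})),
    "c3": Chunk("c3", "budget-2024.xlsx", "v1", frozenset({"finance"})),
}
results = retrieve(["c1", "c2", "c3"], ["c2", "c3", "c1"], index,
                   user_groups=frozenset({"all", "finance"}))
```

The sketch filters after fusion for brevity. In a real store the permission filter would more likely be pushed into the query itself, so that restricted chunks never leave the index.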
5. Integration layer
The connectors to enterprise systems. This is where AI initiatives meet existing platform reality.
- Read-side integration — pulling content into the knowledge layer
- Write-side integration — taking action in enterprise systems (creating tickets, updating records, sending notifications)
- Identity integration — knowing who the user is across systems
- Observability integration — logs and metrics into the enterprise SIEM and dashboards
The design choice is alignment with existing platform conventions. Enterprises with mature integration platforms — webMethods, MuleSoft, Kafka — should be exposing AI capabilities through those platforms, not building parallel integration stacks.
What matters at this layer:
- Reuse existing connectors. The work to build an SAP connector for the AI initiative is the same work that has already been done somewhere in the estate. Find it.
- Same governance, same security review. AI initiatives that bypass integration governance create exposure that bites later.
- Versioning. APIs change. The integration layer has to handle change without breaking the AI workload.
6. Governance layer
Cuts across everything else. The properties this layer ensures:
- Auditability — every action is logged with enough detail to reconstruct
- Access control — users see only what they are permitted to see; tools take only actions they are permitted to take
- Content policy — what the system can produce; what it must refuse
- Cost control — budgets at the user, workload, and tenant level; alerts and circuit breakers
- Model policy — which models can be used for which workloads; approval flow for new models
- Data residency and classification — what data can leave the boundary; what models can process what classification of data
This is the layer that determines whether the AI initiative is allowed to ship into a regulated environment. Retrofitting it is expensive; building it in from day one is much cheaper.
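Of the properties listed above, cost control is the easiest to show in a few lines. The following is a minimal sketch of budgets at user, workload, and tenant level with an alert threshold and a circuit breaker. All names and limits are hypothetical, and a production version would need persistence and concurrency control.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a charge would push any scope past its limit."""


@dataclass
class Budget:
    limit_usd: float
    alert_ratio: float = 0.8          # alert once 80% of the limit is used
    spent_usd: float = 0.0
    alerts: list[str] = field(default_factory=list)


def charge_all(budgets: dict[str, Budget], amount_usd: float) -> None:
    """Charge every scope, or none of them if any scope would be breached."""
    for scope, b in budgets.items():
        if b.spent_usd + amount_usd > b.limit_usd:
            # Circuit breaker: refuse the call before any budget is touched.
            raise BudgetExceeded(f"{scope} budget of {b.limit_usd:.2f} USD would be exceeded")
    for scope, b in budgets.items():
        b.spent_usd += amount_usd
        if b.spent_usd >= b.alert_ratio * b.limit_usd:
            b.alerts.append(f"{scope}: {b.spent_usd:.2f} of {b.limit_usd:.2f} USD used")


budgets = {
    "user": Budget(limit_usd=1.0),
    "workload": Budget(limit_usd=10.0),
    "tenant": Budget(limit_usd=100.0),
}
charge_all(budgets, 0.9)            # allowed; the user scope crosses its alert threshold
try:
    charge_all(budgets, 0.2)        # would take the user scope past 1.00 USD
    blocked = False
except BudgetExceeded:
    blocked = True
```

Checking all scopes before charging any of them keeps the three budgets consistent when a call is refused.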
How the layers compose
A typical request flow:
- User query arrives at the experience layer with identity context
- Orchestration layer decides the workload shape — retrieval-augmented, function-calling, single-call
- Knowledge layer is queried for relevant context (with access controls applied)
- Integration layer fetches any structured data needed
- Intelligence layer produces a response (with prompt assembly mediated by the orchestration layer)
- Output is validated against governance policy
- Response returned to experience layer
- Full trace logged
Every step traverses the governance layer.
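The flow above can be sketched as a single function that calls each layer in turn and records a trace entry per step. Every callable here is a stub supplied by the caller; the names, the prompt format, and the fixed retrieval-augmented pattern are assumptions made for illustration.

```python
from typing import Callable

Trace = list[dict]


def handle(query: str, user_groups: set[str], *,
           retrieve: Callable[[str, set[str]], list[str]],
           fetch_structured: Callable[[str], dict],
           generate: Callable[[str], str],
           policy_ok: Callable[[str], bool]) -> tuple[str, Trace]:
    trace: Trace = []

    def log(step: str, **data) -> None:
        trace.append({"step": step, **data})

    log("experience", query=query, groups=sorted(user_groups))
    pattern = "retrieval-augmented"   # fixed here; a real orchestrator would decide
    log("orchestration", pattern=pattern)
    context = retrieve(query, user_groups)        # access controls applied inside
    log("knowledge", chunks=context)
    data = fetch_structured(query)
    log("integration", data=data)
    prompt = f"Context: {context}\nData: {data}\nQuestion: {query}"
    answer = generate(prompt)
    log("intelligence", answer=answer)
    if policy_ok(answer):
        log("governance", verdict="allowed")
    else:
        answer = "This request cannot be answered under current policy."
        log("governance", verdict="blocked")
    log("response", answer=answer)
    return answer, trace


answer, trace = handle(
    "What is the travel policy?",
    {"staff"},
    retrieve=lambda q, groups: ["policy-doc#3"],
    fetch_structured=lambda q: {"region": "EU"},
    generate=lambda prompt: "Economy class for flights under six hours.",
    policy_ok=lambda text: "confidential" not in text.lower(),
)
```

The sketch shows governance only as an output check. In the architecture described here it also constrains retrieval, tool use, and model choice at the earlier steps.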
What we recommend
For an enterprise standing up an AI capability:
- Design at the layer level first. Pick patterns at each layer before picking products.
- Reuse existing integration platforms. The AI initiative is a new workload on the existing platform, not a parallel stack.
- Treat the intelligence layer as composed of multiple models, not a single model. Routing and specialisation are first-class concerns.
- Build governance from day one. The cost of bolting it on later is high.
- Build observability before scale. The system has to be debuggable, auditable, and improvable from launch.
The reference architecture is not exotic. It is enterprise integration architecture with an intelligence layer added. The teams that ship are the ones that recognise this; the teams that struggle are the ones that treat AI as a new architecture rather than a new workload.