AI & Enterprise AI · 23 January 2024 · 9 min read

The Enterprise AI Stack — A Reference Architecture

Most enterprise AI teams are assembling the same stack from the same parts. A clean reference architecture for the layers that compose an AI-augmented enterprise platform — and the design decisions at each layer.

By Intellectual AI Engineering Practice · Collective byline

A pattern from the last twelve months of enterprise AI work: most teams are building the same architecture from the same components, with the same trade-offs at each layer. The stack has stabilised. Knowing what the layers are makes new initiatives faster to scope and easier to compare.

This is a reference architecture for the AI layer of an enterprise platform — what each layer does, what components fit there, and what design choice has to be made at each step. It is not prescriptive about products; it is descriptive about shape.

The layers

From the user-facing surface down to the foundation:

Experience layer — the interface where humans or upstream systems pose questions
Orchestration layer — agents, planners, workflow engines that coordinate work
Intelligence layer — LLMs, multimodal models, classification models
Knowledge layer — vector indices, structured data, document stores
Integration layer — connectors to enterprise systems
Governance layer — across all of the above

Each layer has its own design problems. Skipping a layer or merging it into another tends to produce systems that are hard to evolve.

1. Experience layer

The interface. In production we see four common shapes:

Conversational — a chat UI on a web or mobile surface
Embedded — an AI capability inside an existing application (a "draft this" button, an "explain this" panel)
Headless — an API consumed by upstream systems, returning structured data
Batch — a scheduled job that processes a workload and writes outputs

The design choice is which of these to expose. Most enterprise initiatives end up with two or three, often with the same intelligence pipeline behind different experience surfaces. Keeping the layers below the experience layer shape-agnostic, so that the same orchestration runs whether the caller is a chat session, an embedded button, or a batch job, pays off over an eighteen-month horizon.

What matters at this layer:

Latency expectations differ by shape. Conversational expects sub-second first token; batch can take minutes. The orchestration layer has to know which it is serving.
Authentication and identity propagate from here. The user's identity, their permissions, their organisational context have to be available downstream.
Telemetry starts here. Every interaction is logged with a session identifier so that the entire downstream pipeline can be reconstructed.
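The three concerns above can travel together in one request context that every downstream layer receives. The following is a minimal sketch, not a prescribed design: the `RequestContext` and `Shape` names, the field names, and all budget values other than the sub-second conversational target are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Shape(Enum):
    CONVERSATIONAL = "conversational"
    EMBEDDED = "embedded"
    HEADLESS = "headless"
    BATCH = "batch"


@dataclass(frozen=True)
class RequestContext:
    """Caller details that downstream layers need (hypothetical structure)."""
    user_id: str
    permissions: frozenset[str]
    org_unit: str
    shape: Shape
    # One identifier per interaction, so the full pipeline can be reconstructed.
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def first_response_budget_s(self) -> float:
        # Illustrative budgets only; real values depend on the workload.
        budgets = {
            Shape.CONVERSATIONAL: 1.0,   # sub-second first token
            Shape.EMBEDDED: 3.0,         # assumed value
            Shape.HEADLESS: 10.0,        # assumed value
            Shape.BATCH: 600.0,          # batch can take minutes
        }
        return budgets[self.shape]


ctx = RequestContext("u-123", frozenset({"hr:read"}), "finance", Shape.BATCH)
```

Making the context immutable means no downstream layer can quietly widen the caller's permissions after the experience layer has set them.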

2. Orchestration layer

The layer that decides what to do. In production this is one of:

Single-call — the request goes to the LLM once, the response goes back
Retrieval-augmented call — retrieval, then LLM
Function-calling loop — the LLM may call functions, see results, continue
Multi-step planner — the LLM produces a plan, executes it as a workflow
Multi-agent — multiple specialised agents coordinated by a supervisor

The design choice is which orchestration pattern fits the workload. Most enterprise workloads are single-call or retrieval-augmented; a smaller set genuinely needs function calling; a still smaller set needs planners; multi-agent is largely experimental in production.

What matters at this layer:

Boundedness. Loops have step limits, cost limits, time limits.
State. Long-running interactions need conversation memory; how that memory is stored, summarised, and retrieved is a design decision.
Fallbacks. When the LLM fails to produce usable output, what happens? The options are to retry, escalate, fall back to a deterministic path, or return a graceful error.
Observability. Every orchestration step is logged with its inputs, the decision taken, and its outputs.
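Boundedness, fallbacks, and observability can be illustrated with a small loop runner. This is a sketch under stated assumptions: `Limits`, `StepResult`, and `run_bounded` are hypothetical names, the default limits are placeholders, and the `step` callable stands in for whatever LLM or tool call a real orchestrator makes.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Limits:
    max_steps: int = 5
    max_cost_usd: float = 0.50
    max_seconds: float = 30.0


@dataclass
class StepResult:
    output: str
    cost_usd: float
    done: bool


def run_bounded(
    step: Callable[[int], StepResult],
    limits: Limits,
    fallback: str = "Sorry, I could not complete that request.",
) -> tuple[str, list[dict]]:
    """Run `step` until it reports done or a step, cost, or time limit is hit."""
    trace: list[dict] = []
    spent = 0.0
    started = time.monotonic()
    for i in range(limits.max_steps):
        if time.monotonic() - started > limits.max_seconds:
            trace.append({"step": i, "stopped": "time limit"})
            return fallback, trace
        result = step(i)
        spent += result.cost_usd
        # Every step is recorded, so the run can be reconstructed later.
        trace.append({"step": i, "output": result.output, "cost_usd": result.cost_usd})
        if result.done:
            return result.output, trace
        if spent >= limits.max_cost_usd:
            trace.append({"step": i, "stopped": "cost limit"})
            return fallback, trace
    trace.append({"step": limits.max_steps, "stopped": "step limit"})
    return fallback, trace
```

The fallback here is a graceful error; a retry or a deterministic path would slot into the same return points.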

Frameworks at this layer — LangChain, LlamaIndex, Semantic Kernel, custom orchestration — have specific opinions about each of these. Picking a framework before knowing the workload shape often leads to fighting the framework.

3. Intelligence layer

The models themselves. In production this is rarely a single model:

A primary generation model — GPT-4, Claude 2.1, a hosted equivalent
A cheap classifier or router — GPT-3.5 or a small open model, deciding intent or routing
An embedding model — text-embedding-ada-002, or an open alternative
A reranker — a cross-encoder for retrieval reordering
Optional specialist models — for code, for vision, for transcription

The design choice is the routing policy. A naive system sends everything to the most capable, most expensive model. A production system routes by complexity — easy queries to a small model, hard queries to a large one — and uses specialist models where they outperform general-purpose ones.
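A routing policy of this kind might look like the sketch below. The model identifiers are hypothetical placeholders, and the complexity heuristic is deliberately crude; a production router would more plausibly use the cheap classifier model listed above.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, for illustration only.
SMALL_MODEL = "small-model-v1"
LARGE_MODEL = "large-model-v1"
CODE_MODEL = "code-model-v1"

HARD_WORDS = {"why", "compare", "explain", "analyse", "analyze"}


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def estimate_complexity(query: str) -> float:
    """Crude score in [0, 1] based on length and a few reasoning keywords."""
    words = [w.strip(".,?!:;").lower() for w in query.split()]
    score = min(len(words) / 100, 1.0)
    if any(w in HARD_WORDS for w in words):
        score = max(score, 0.6)
    return score


def route(query: str, threshold: float = 0.5) -> Route:
    # Specialist first: send code-like input to the code model.
    if any(token in query for token in ("def ", "class ", "import ")):
        return Route(CODE_MODEL, "looks like code")
    if estimate_complexity(query) >= threshold:
        return Route(LARGE_MODEL, "complex query")
    return Route(SMALL_MODEL, "simple query")


print(route("What is our leave policy?").model)
print(route("Compare the two vendor contracts and explain the risk").model)
```

Returning the reason alongside the model keeps routing decisions visible in the trace, which helps when tuning the threshold.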

What matters at this layer:

Version pinning. Hosted models update without notice. Pin model versions in code; treat upgrades as migrations.
Provider redundancy. With a single provider, one outage takes the whole system down. Where the workload justifies it, multi-provider routing matters.
Cost monitoring at the call level. Tokens in, tokens out, model identifier, dollar cost. Without this, cost surprises are inevitable.
Hosting model. Hosted APIs, dedicated capacity, self-hosted open models, on-premises deployment of commercial weights. Each has different cost, latency, compliance, and operational profiles.
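Version pinning and call-level cost monitoring can be sketched together. In the example below, the model identifiers only illustrate the dated-snapshot naming style, and the prices are placeholders rather than verified price data; `CallRecord` and `PRICE_PER_1K` are hypothetical names.

```python
from dataclasses import dataclass

# Pin dated snapshots rather than floating aliases; changing one is a migration.
PINNED_MODELS = {
    "generation": "gpt-4-0613",
    "router": "gpt-3.5-turbo-1106",
    "embedding": "text-embedding-ada-002",
}

# Placeholder prices in USD per 1,000 tokens as (input, output).
# Assumed values: check the provider's current price list before relying on them.
PRICE_PER_1K = {
    "gpt-4-0613": (0.03, 0.06),
    "gpt-3.5-turbo-1106": (0.001, 0.002),
    "text-embedding-ada-002": (0.0001, 0.0),
}


@dataclass(frozen=True)
class CallRecord:
    """One model call: tokens in, tokens out, model identifier, dollar cost."""
    model: str
    tokens_in: int
    tokens_out: int

    @property
    def cost_usd(self) -> float:
        price_in, price_out = PRICE_PER_1K[self.model]
        return self.tokens_in / 1000 * price_in + self.tokens_out / 1000 * price_out


record = CallRecord(PINNED_MODELS["generation"], tokens_in=1000, tokens_out=500)
print(f"{record.model}: ${record.cost_usd:.4f}")
```

Emitting one such record per call gives the raw data for the per-user and per-workload budgets that the governance layer enforces.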

4. Knowledge layer

The data the intelligence layer reaches for. In production this is:

Vector indices — for similarity search over unstructured content
Structured stores — databases the system queries directly
Document stores — the source of truth for documents
Caches — embedding caches, retrieval caches, response caches
Metadata stores — about the content (lineage, classification, access)

The design choice is the partitioning. A single global index for the whole enterprise is rarely what you want: different content has different access controls, different update cadences, and different appropriate retrieval strategies. But a fragmented landscape of indices, one per project, means queries miss relevant content that lives in an index they never search.

What matters at this layer:

Access control propagation. The user querying the system has permissions; those permissions have to constrain what the retrieval returns.
Refresh and lifecycle. Documents change. The index has to reflect changes. Stale indices undermine trust in the system.
Hybrid search. Pure vector search consistently underperforms hybrid (lexical + vector) for enterprise queries. The store has to support both.
Audit and lineage. When the system cites a chunk, you have to be able to trace that chunk back to the source document and its version.
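Access control propagation, hybrid search, and lineage can be shown in one small sketch. It assumes the lexical and vector rankings have already been produced by the store, and it combines them with reciprocal rank fusion, which is one common fusion method and not necessarily the one a given product uses. `Chunk`, `rrf_fuse`, and `hybrid_search` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc: str        # lineage: which document the chunk came from
    doc_version: str       # lineage: which version of that document
    allowed_groups: frozenset[str]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def hybrid_search(lexical: list[str], vector: list[str],
                  chunks: dict[str, Chunk], user_groups: frozenset[str],
                  top_n: int = 3) -> list[Chunk]:
    def visible(ranking: list[str]) -> list[str]:
        # Drop anything the user may not see before fusing the rankings.
        return [c for c in ranking if chunks[c].allowed_groups & user_groups]

    fused = rrf_fuse([visible(lexical), visible(vector)])
    return [chunks[c] for c in fused[:top_n]]


chunks = {
    "a": Chunk("a", "handbook.pdf", "v3", frozenset({"hr"})),
    "b": Chunk("b", "budget.xlsx", "v7", frozenset({"finance"})),
    "c": Chunk("c", "policy.docx", "v2", frozenset({"hr", "finance"})),
}
results = hybrid_search(["a", "b", "c"], ["c", "a"], chunks, frozenset({"finance"}))
print([(r.chunk_id, r.source_doc, r.doc_version) for r in results])
```

In a real deployment the permission filter would more likely be pushed into the store query itself, so that restricted chunks never leave the index.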

5. Integration layer

The connectors to enterprise systems. This is where AI initiatives meet existing platform reality.

Read-side integration — pulling content into the knowledge layer
Write-side integration — taking action in enterprise systems (creating tickets, updating records, sending notifications)
Identity integration — knowing who the user is across systems
Observability integration — logs and metrics into the enterprise SIEM and dashboards

The design choice is alignment with existing platform conventions. Enterprises with mature integration platforms — webMethods, MuleSoft, Kafka — should be exposing AI capabilities through those platforms, not building parallel integration stacks.

What matters at this layer:

Reuse existing connectors. The SAP connector the AI initiative needs has very likely already been built somewhere in the estate. Find it.
Same governance, same security review. AI initiatives that bypass integration governance create exposure that bites later.
Versioning. APIs change. The integration layer has to handle change without breaking the AI workload.

6. Governance layer

Cuts across everything else. The properties this layer ensures:

Auditability — every action is logged with enough detail to reconstruct
Access control — users see only what they are permitted to see; tools take only actions they are permitted to take
Content policy — what the system can produce; what it must refuse
Cost control — budgets at the user, workload, and tenant level; alerts and circuit breakers
Model policy — which models can be used for which workloads; approval flow for new models
Data residency and classification — what data can leave the boundary; what models can process what classification of data

This is the layer that determines whether the AI initiative is allowed to ship into a regulated environment. Retrofitting it is expensive; building it in from day one is much cheaper.
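Two of these properties, model policy by data classification and cost control with a circuit breaker, can be sketched as a single pre-call check. The classification labels, model names, and budget figures below are hypothetical; a real policy would come from the organisation's own governance process.

```python
from dataclasses import dataclass

# Hypothetical policy: which models may process which classification of data.
ALLOWED_MODELS = {
    "public": {"hosted-large", "hosted-small", "self-hosted-open"},
    "internal": {"hosted-large", "self-hosted-open"},
    "restricted": {"self-hosted-open"},
}


class PolicyViolation(Exception):
    """Raised when a call would break model policy or exceed its budget."""


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, amount_usd: float) -> None:
        # Circuit breaker: refuse the call rather than overspend.
        if self.spent_usd + amount_usd > self.limit_usd:
            raise PolicyViolation("budget exceeded")
        self.spent_usd += amount_usd


def authorise(model: str, classification: str, budget: Budget,
              est_cost_usd: float) -> None:
    """Check model policy first, then reserve the estimated cost."""
    if model not in ALLOWED_MODELS.get(classification, set()):
        raise PolicyViolation(f"{model} not approved for {classification} data")
    budget.charge(est_cost_usd)


tenant_budget = Budget(limit_usd=1.0)
authorise("hosted-large", "internal", tenant_budget, est_cost_usd=0.4)
```

An unknown classification matches no models, so the check fails closed rather than open.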

How the layers compose

A typical request flow:

User query arrives at the experience layer with identity context
Orchestration layer decides the orchestration pattern: single-call, retrieval-augmented, or function-calling
Knowledge layer is queried for relevant context (with access controls applied)
Integration layer fetches any structured data needed
Intelligence layer produces a response (with prompt assembly mediated by the orchestration layer)
Output is validated against governance policy
Response returned to experience layer
Full trace logged

Every step traverses the governance layer.
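The flow above can be sketched as one function that takes each layer as a callable. This is an illustration of the composition only: `handle_request` and its parameters are hypothetical, the orchestration pattern is fixed to retrieval-augmented instead of being chosen, and governance appears only as output validation to keep the sketch short.

```python
from typing import Callable


def handle_request(
    query: str,
    user_groups: set[str],
    retrieve: Callable[[str, set[str]], list[str]],
    fetch_structured: Callable[[str], dict],
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
) -> tuple[str, list[str]]:
    """Run one request through the layers and return the answer and its trace."""
    trace = ["experience: query received with identity context"]
    pattern = "retrieval-augmented"  # fixed here; a real orchestrator would choose
    trace.append(f"orchestration: pattern={pattern}")
    context = retrieve(query, user_groups)  # access controls applied inside
    trace.append(f"knowledge: {len(context)} chunks")
    record = fetch_structured(query)
    trace.append(f"integration: {len(record)} fields")
    # Prompt assembly is mediated by the orchestration layer.
    prompt = "\n".join(["Context:", *context, f"Data: {record}", f"Question: {query}"])
    answer = generate(prompt)
    trace.append("intelligence: response generated")
    if validate(answer):
        trace.append("governance: output allowed")
    else:
        answer = "This request cannot be answered under current policy."
        trace.append("governance: output blocked")
    trace.append("experience: response returned")
    return answer, trace
```

Passing the layers in as callables keeps the pipeline independent of any particular product, which matches the architecture's emphasis on shape over tooling.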

What we recommend

For an enterprise standing up an AI capability:

Design at the layer level first. Pick patterns at each layer before picking products.
Reuse existing integration platforms. The AI initiative is a new workload on the existing platform, not a parallel stack.
Treat the intelligence layer as composed of multiple models, not a single model. Routing and specialisation are first-class concerns.
Build governance from day one. The cost of bolting it on later is high.
Build observability before scale. The system has to be debuggable, auditable, and improvable from launch.

The reference architecture is not exotic. It is enterprise integration architecture with an intelligence layer added. The teams that ship are the ones that recognise this; the teams that struggle are the ones that treat AI as a new architecture rather than a new workload.
