AI & Enterprise AI · 4 February 2025 · 7 min read

Agent Infrastructure Catches Up — The Production Stack in 2025

Agent infrastructure was the gap a year ago. In 2025 the stack has matured enough that production deployment is a reasonable expectation, not a research bet.

By Intellectual AI Engineering Practice · Collective byline

A year ago, the infrastructure for production agent systems was the gap. The capability was demonstrable; the operational and architectural pieces around it were not. In 2025 the picture has changed. Agent infrastructure has matured enough that production deployment is a reasonable expectation, not a research bet. The teams shipping agent systems share a recognisable stack.

This piece is a practitioner snapshot of the agent infrastructure stack in early 2025 — what components matter, which products are stabilising, and where the residual gaps remain.

The components of an agent stack

A working production agent system in 2025 has the following pieces:

Orchestration framework

The runtime that coordinates model calls, tool calls, and state. Choices in early 2025:

LangGraph — graph-based, controllable, common in production
OpenAI Assistants API and Agents SDK — hosted, simpler, less flexible
Anthropic Claude with MCP — increasingly common as MCP gains adoption
Custom orchestration — what many serious production teams build

The frameworks have stabilised enough that the choice is now about fit rather than capability.

Tool catalogue and registry

The set of functions the agent can call. The infrastructure to manage them:

Registry with descriptions, schemas, permissions
Versioning and deprecation
Approval workflows for new tools
Audit logging of tool invocations

This was the gap a year ago. Tools were added casually; governance was retrofitted. In 2025 the discipline is more common.
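A minimal sketch of what such a registry might hold, assuming an in-memory store. The names ToolSpec, ToolRegistry, and the approved and deprecated flags are hypothetical and not taken from any particular framework; permission enforcement is deliberately left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    version: str
    description: str
    schema: dict[str, type]       # argument name -> expected Python type
    permissions: frozenset[str]   # recorded as metadata; not enforced here
    handler: Callable[..., Any]
    approved: bool = False        # would be set by an approval workflow
    deprecated: bool = False


@dataclass
class ToolRegistry:
    tools: dict[tuple[str, str], ToolSpec] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, spec: ToolSpec) -> None:
        # Tools are keyed by (name, version) so versions can coexist.
        self.tools[(spec.name, spec.version)] = spec

    def invoke(self, name: str, version: str, user: str, **args: Any) -> Any:
        spec = self.tools[(name, version)]
        if not spec.approved or spec.deprecated:
            raise PermissionError(f"{name}@{version} is not callable")
        # Validate arguments against the declared schema before running.
        for arg, expected in spec.schema.items():
            if not isinstance(args.get(arg), expected):
                raise TypeError(f"{arg} must be {expected.__name__}")
        result = spec.handler(**args)
        # Record every successful invocation for later audit.
        self.audit_log.append(
            {"tool": name, "version": version, "user": user, "args": args}
        )
        return result
```

A production registry would persist both the catalogue and the audit log; the point here is only that approval, deprecation, schema validation, and logging sit in one place rather than in each tool.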

Permission and identity propagation

The user's identity travels with every tool call. Permission enforcement happens at the tool execution layer. This is now a standard pattern; deployments that skip it fail security review.
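One way to picture enforcement at the execution layer is the sketch below. The Identity type, the execute_tool function, and the permission strings are assumptions made for illustration; a real deployment would derive the identity from its own authentication system.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Identity:
    user_id: str
    permissions: frozenset[str]


def execute_tool(
    identity: Identity,
    required_permission: str,
    handler: Callable[..., Any],
    **args: Any,
) -> Any:
    """Run a tool only if the calling user holds the required permission."""
    if required_permission not in identity.permissions:
        raise PermissionError(f"{identity.user_id} lacks {required_permission}")
    # The handler receives the identity so downstream systems can apply
    # their own access rules to the real user, not to a shared account.
    return handler(identity, **args)


def read_invoice(identity: Identity, invoice_id: str) -> str:
    # Hypothetical tool body standing in for a real backend call.
    return f"invoice {invoice_id} read as {identity.user_id}"
```

The check lives outside the model's control: whatever the agent asks for, the call is refused unless the user behind the session is allowed to make it.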

State management

Agent state — conversation history, intermediate results, current goal — needs to be managed. In 2025 the patterns are:

Per-session state in a fast store (Redis or equivalent)
Persisted state for long-running interactions
Summarisation of long histories to fit context windows
Explicit state contracts between agent steps
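As a rough illustration of an explicit state contract combined with history summarisation, consider the sketch below. AgentState, compact, and the placeholder summarise function are hypothetical; in practice the summary would come from a model call and the state would live in a store such as Redis.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    summary: str = ""
    history: list[str] = field(default_factory=list)


def summarise(messages: list[str]) -> str:
    # Placeholder: a real system would call a model here.
    return f"[{len(messages)} earlier messages summarised]"


def compact(state: AgentState, max_messages: int) -> AgentState:
    """Fold the oldest messages into the summary, keeping the newest verbatim."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    if len(state.history) <= max_messages:
        return state
    old = state.history[:-max_messages]
    recent = state.history[-max_messages:]
    # Append the new summary to any existing one rather than replacing it.
    summary = " ".join(part for part in (state.summary, summarise(old)) if part)
    return AgentState(goal=state.goal, summary=summary, history=recent)
```

Because every step receives and returns the same typed object, what each step may rely on is stated in code rather than implied by prompt text.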

Cost and resource controls

Circuit breakers at multiple levels:

Per-call cost caps
Per-session budgets
Per-user budgets
Workload-level budgets
Anomaly detection on cost rate

These are now table stakes. Production agents without them produce expensive incidents.
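The layered caps above could be sketched as follows. The class names and the idea of authorising an estimated cost before each model call are assumptions for illustration; workload-level budgets and anomaly detection are omitted to keep the example short.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a call would breach a cap; the caller should stop."""


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0


@dataclass
class CostGuard:
    per_call_cap_usd: float
    session: Budget
    user: Budget

    def authorise(self, estimated_cost_usd: float) -> None:
        """Check every level first, then commit the spend to all of them."""
        if estimated_cost_usd > self.per_call_cap_usd:
            raise BudgetExceeded("per-call cap")
        for name, budget in (("session", self.session), ("user", self.user)):
            if budget.spent_usd + estimated_cost_usd > budget.limit_usd:
                raise BudgetExceeded(f"{name} budget")
        # Nothing is recorded unless all checks pass.
        self.session.spent_usd += estimated_cost_usd
        self.user.spent_usd += estimated_cost_usd
```

Checking all levels before committing means a refused call leaves every counter unchanged, which keeps the budgets consistent after a trip.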

Observability

Full traces capturing:

Every model call (prompt, response, model version, cost)
Every tool call (arguments, result, latency, errors)
State transitions
Errors and recoveries
Human checkpoints

The observability tooling — LangSmith, Langfuse, Phoenix, custom — has matured. The traces are useful.
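For teams going the custom route, a trace can be as simple as a list of typed spans. The sketch below is a hypothetical structure and does not reflect the data model of LangSmith, Langfuse, or Phoenix.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Span:
    kind: str                     # e.g. "model_call", "tool_call", "human"
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    latency_s: float = 0.0
    error: Optional[str] = None


@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def span(self, kind: str, name: str, **attributes: Any) -> Iterator[Span]:
        record = Span(kind, name, dict(attributes))
        start = time.perf_counter()
        try:
            yield record          # the caller can attach results to the span
        except Exception as exc:
            record.error = repr(exc)
            raise                 # record the failure, never swallow it
        finally:
            record.latency_s = time.perf_counter() - start
            self.spans.append(record)
```

A tool call would then be wrapped as `with trace.span("tool_call", "lookup", arguments=args) as s:` with the result stored on `s.attributes` inside the block.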

Evaluation

Curated test sets exercising the agent's behaviour. Regression testing on every change. This is the discipline that distinguishes production agent systems from extended pilots.
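A regression gate over a curated test set might look like the sketch below. EvalCase, run_suite, and gate are hypothetical names, and the agent is modelled as a plain function from prompt to output, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]   # returns True if the output is acceptable


def run_suite(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    return passed / len(cases)


def gate(pass_rate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Allow a change only if quality has not dropped below the baseline."""
    return pass_rate >= baseline - tolerance
```

Running this on every prompt, model, or tool change, and blocking the release when gate returns False, is what turns the test set into regression testing.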

Human-in-the-loop interface

For agents that propose consequential actions, the human approval surface:

Clear presentation of what the agent did and why
Easy approval or rejection
Edit-then-approve patterns where appropriate
Audit trail of human decisions
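The approval surface above can be reduced to a small data model. The sketch below assumes hypothetical Proposal and ApprovalQueue types and covers approve, reject, and edit-then-approve, with every decision written to an audit list.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Proposal:
    action: str
    arguments: dict[str, Any]
    rationale: str                # what the agent did and why, for the reviewer
    status: str = "pending"


@dataclass
class ApprovalQueue:
    audit: list[dict[str, Any]] = field(default_factory=list)

    def decide(
        self,
        proposal: Proposal,
        reviewer: str,
        approve: bool,
        edited_arguments: Optional[dict[str, Any]] = None,
    ) -> Proposal:
        if proposal.status != "pending":
            raise ValueError("proposal has already been decided")
        if approve and edited_arguments is not None:
            # Edit-then-approve: the reviewer's version replaces the agent's.
            proposal.arguments = edited_arguments
            decision = "edited_and_approved"
        elif approve:
            decision = "approved"
        else:
            decision = "rejected"
        proposal.status = decision
        self.audit.append(
            {"action": proposal.action, "reviewer": reviewer, "decision": decision}
        )
        return proposal
```

Only proposals whose status ends in "approved" would be passed on for execution; everything else stops at the queue.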

Error and escalation handling

What happens when the agent can't proceed:

Clear "I cannot complete this" outputs
Routing to humans with sufficient context
Recovery from partial states
Graceful degradation
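A sketch of how these behaviours might fit together: steps run in order, and on failure the agent returns an explicit outcome carrying the context a human needs. Outcome and run_steps are hypothetical names, and real recovery from partial state would also need compensation logic that is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Outcome:
    status: str                   # "completed" or "escalated"
    result: Any = None
    reason: str = ""
    context: dict[str, Any] = field(default_factory=dict)


def run_steps(steps: list[tuple[str, Callable[[], Any]]]) -> Outcome:
    """Run named steps in order; on failure, escalate with what succeeded."""
    completed: dict[str, Any] = {}
    for name, step in steps:
        try:
            completed[name] = step()
        except Exception as exc:
            # Stop at the first failure and hand over the partial state.
            return Outcome(
                status="escalated",
                reason=f"I cannot complete this: step '{name}' failed ({exc})",
                context={"completed_steps": completed, "failed_step": name},
            )
    return Outcome(status="completed", result=completed)
```

The escalated outcome tells the human which steps already took effect, so they can resume or undo without reconstructing the session from logs.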

The stabilising patterns

Across the production agent deployments we are seeing in 2025:

Supervisor-worker is the dominant shape

Open-ended free-form agent conversations are still uncommon in production. Supervisor-worker patterns with deterministic hand-offs dominate. The supervisor is closer to a workflow engine than a free planner.
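In code, a supervisor of this kind can be little more than a routing table. The sketch below is an illustration under simple assumptions: workers are plain functions, and classify stands in for whatever produces the routing label, whether a model call or a rule.

```python
from typing import Callable

Worker = Callable[[str], str]


def make_supervisor(
    routes: dict[str, Worker],
    classify: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a supervisor that hands each request to exactly one worker."""

    def supervise(request: str) -> str:
        label = classify(request)
        worker = routes.get(label)
        if worker is None:
            # No free-form planning: unknown labels are escalated, not guessed.
            return f"escalate: no worker for '{label}'"
        return worker(request)

    return supervise
```

The hand-off is deterministic because the set of workers is fixed in advance and the supervisor can only choose among them.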

MCP is gaining traction

Model Context Protocol adoption is broadening across internal, vendor, and community MCP servers. The standardisation is reducing the integration burden.

Specialist agents over generalist agents

A few well-bounded agents handle specific workloads. Generalist agents that can do anything well are still aspirational. Production deployments are narrower.

Hybrid agentic-deterministic flows

Agents are steps in larger workflows, not the whole workflow. Deterministic code handles the parts where rules are stable; agents handle the parts that benefit from reasoning. The orchestration sits in a workflow engine.
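A small sketch of the shape, using a refund flow as a made-up example. The thresholds, the process_refund function, and the assess callable (standing in for the agent step) are all assumptions for illustration.

```python
from typing import Callable


def process_refund(amount: float, note: str, assess: Callable[[str], str]) -> str:
    # Deterministic rules: stable policy is expressed directly in code.
    if amount <= 0:
        return "rejected: invalid amount"
    if amount <= 20:
        return "approved: under auto-approval threshold"

    # Agent step: judgement on free text, where rules are hard to write.
    verdict = assess(note)

    # Deterministic validation of the agent's output before acting on it.
    if verdict not in {"approve", "deny"}:
        return "escalated: unrecognised agent verdict"
    outcome = "approved" if verdict == "approve" else "rejected"
    return f"{outcome}: agent assessment"
```

The agent is consulted only for the cases the rules cannot settle, and its answer is checked against a closed set of values before the workflow continues.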

Heavy use of evaluation

The teams that ship reliably are the ones with evaluation. Without it, drift is invisible.

The residual gaps

Even in 2025, some gaps remain:

Long-running stateful agents

Agents that operate over hours or days, maintaining state, surviving restarts — the infrastructure is immature. Most production agents are session-scoped.

Multi-agent collectives

Despite framework support, multi-agent designs in production are rare. The bounded supervisor-worker shape dominates; emergent multi-agent behaviour is still mostly research.

Cross-organisation agents

Agents that operate across organisational boundaries — your agent interacting with another organisation's agent — are not yet common. The protocols, trust models, and governance are immature.

Agent-to-agent authentication

Beyond OAuth-style patterns, the conventions for agent identity, agent permissions, and inter-agent trust are still forming.

What we keep seeing

Patterns in 2025 agent deployments:

The discipline distinguishes shipped from stalled. Teams with strong evaluation, observability, and human-in-the-loop discipline ship. Teams without these stay in extended pilot.

Tool catalogue governance matters more than the agent design. A well-governed tool catalogue with carefully scoped functions enables agent systems that are useful and safe. A casually grown catalogue produces exposure.

Cost surprises are less common. The discipline around budgets and circuit breakers has spread. Cost is a managed concern, not a surprise.

The integration with existing enterprise workflows is the work. The agent capability is the smaller part; the integration with existing systems, processes, and people is the bulk of effort.

MCP adoption is reducing custom integration. Where MCP servers exist for needed systems, the integration burden drops materially.

What we recommend

For enterprise teams building production agents in 2025:

Pick orchestration based on production fit, not demo aesthetics. LangGraph, custom, or hosted — match to your operating model.
Govern the tool catalogue from day one. The retrofit is harder than the discipline.
Propagate identity and enforce permissions at execution. The agent is not the authorisation layer.
Build evaluation as a primary discipline. Without it, you cannot improve or even maintain quality.
Apply human-in-the-loop on consequential actions. The autonomy aspiration is still ahead of the production reality.
Use MCP where it fits. The standardisation is paying off.
Plan the cost discipline. Agents make more model calls than chat workloads; budgets and circuit breakers matter.

Agent infrastructure in 2025 is mature enough for production. The teams that respect the disciplines — bounded scope, strong tool governance, identity propagation, evaluation, observability — ship useful systems. The teams that chase the autonomous-agent aspiration without the discipline produce systems that demo well and fail in operation. The capability is real; the discipline determines whether it ships.
