Enterprise AI in 2024 — What We Learned
A year-end practitioner reflection on what changed in enterprise AI in 2024, what stayed the same, and what to take into 2025.
A year ago, enterprise AI was a category of vivid possibility and limited production. The first wave of LLM proofs of concept had shown what was possible; few had shipped into operations. Twelve months later the picture has shifted. More has shipped. More is operating reliably. The patterns of what works and what doesn't are clearer than they were.
This reflection is drawn from a year of delivery work across enterprise and government engagements.
What changed in 2024
The model landscape diversified
A year ago, enterprise LLM workloads were largely "GPT-4, sometimes Claude." Through 2024:
- Claude 3 (March) and Claude 3.5 Sonnet (June) became serious enterprise options
- Llama 3 (April) and Llama 3.1 (July) closed the open-weights gap
- Gemini 1.5 made million-token context windows commercially available
- Mistral and others produced competitive open alternatives
- GPT-4o (May) brought natively multimodal input across text, vision, and audio
The default of "one frontier model for everything" no longer holds. Model routing — different models for different workloads — became a standard architectural pattern.
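A minimal sketch of what routing can look like, assuming a small in-code registry. The model names, prices, and context limits here are hypothetical placeholders chosen for illustration, not real model identifiers or quoted prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical identifier, not a real model ID
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    max_context_tokens: int
    supports_vision: bool


# Hypothetical registry; a real one would normally live in configuration.
REGISTRY = [
    ModelOption("small-fast", 0.0005, 16_000, False),
    ModelOption("mid-general", 0.003, 128_000, True),
    ModelOption("large-long-context", 0.01, 1_000_000, True),
]


def route(prompt_tokens: int, needs_vision: bool = False) -> ModelOption:
    """Return the cheapest registered model that satisfies the request."""
    candidates = [
        m for m in REGISTRY
        if m.max_context_tokens >= prompt_tokens
        and (m.supports_vision or not needs_vision)
    ]
    if not candidates:
        raise ValueError("no registered model satisfies this request")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

The design choice worth noting is that the router filters on hard requirements first and only then optimises for cost, so a cheaper model is never picked for a workload it cannot serve.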
Cost discipline emerged
A year ago, most teams discovered LLM cost through the bill. Through 2024, cost monitoring, prompt optimisation, model routing, and caching became standard practice. Bills are more predictable; the surprise has receded.
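As an illustration of two of these practices together, the sketch below wraps a model call with an exact-match cache and a running token count. The `call_model` callable and its `(text, tokens_used)` return shape are assumptions made for the example, not any provider's API.

```python
import hashlib


class CachedClient:
    """Wraps a model call with an exact-match cache and a token counter."""

    def __init__(self, call_model):
        # Hypothetical callable: (model, prompt) -> (text, tokens_used).
        self._call_model = call_model
        self._cache = {}
        self.tokens_spent = 0
        self.cache_hits = 0

    def complete(self, model: str, prompt: str) -> str:
        # Key on both model and prompt so different models never share answers.
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self._cache:
            self.cache_hits += 1
            return self._cache[key]
        text, tokens_used = self._call_model(model, prompt)
        self.tokens_spent += tokens_used
        self._cache[key] = text
        return text
```

Exact-match caching only helps when identical prompts recur, which is common for templated workloads and rare for free-form chat, so the hit rate is worth measuring before relying on it.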
Evaluation discipline started to mature
A year ago, evaluation was missing in most production deployments. Through 2024, more teams built evaluation harnesses, curated eval sets, and ran regression tests. The maturity is still uneven, but the discipline is no longer optional in serious deployments.
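A small harness along these lines might look like the following sketch. The eval cases, the substring-match scoring, and the tolerance value are all assumptions for illustration; real harnesses typically use richer scoring than substring checks.

```python
from typing import Callable, List, Tuple

# Hypothetical curated eval set: (input, expected substring) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("What is the refund window?", "30 days"),
    ("Which team owns invoicing?", "finance"),
]


def pass_rate(system: Callable[[str], str],
              eval_set: List[Tuple[str, str]] = EVAL_SET) -> float:
    """Fraction of cases whose output contains the expected text."""
    passed = sum(
        1 for question, expected in eval_set
        if expected.lower() in system(question).lower()
    )
    return passed / len(eval_set)


def check_regression(candidate: Callable[[str], str],
                     baseline_rate: float,
                     tolerance: float = 0.02) -> bool:
    """True if the candidate is no worse than the baseline, within tolerance."""
    return pass_rate(candidate) >= baseline_rate - tolerance
```

Running `check_regression` on every prompt or model change turns the eval set into a gate rather than a report.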
Function calling stabilised
Function calling moved from a feature to a default architecture pattern. Most production LLM workloads we see now involve function calls. The governance patterns around function catalogues have stabilised.
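One way to picture a governed function catalogue is an allow-list that every model-requested call passes through. In this sketch the tool, the catalogue fields, and the JSON call shape are hypothetical and do not follow any specific provider's schema.

```python
import json


def get_order_status(order_id: str) -> dict:
    # Hypothetical read-only tool used for illustration.
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical catalogue: the callable, its required arguments, and
# whether a call needs human approval before it runs.
CATALOGUE = {
    "get_order_status": {
        "fn": get_order_status,
        "required": {"order_id"},
        "needs_approval": False,
    },
}


def dispatch(tool_call_json: str, audit_log: list) -> dict:
    """Validate a model-requested call against the catalogue, then run it."""
    call = json.loads(tool_call_json)
    name = call["name"]
    args = call.get("arguments", {})
    entry = CATALOGUE.get(name)
    if entry is None:
        raise PermissionError(f"function not in catalogue: {name}")
    missing = entry["required"] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    if entry["needs_approval"]:
        raise PermissionError(f"{name} requires human approval")
    audit_log.append({"name": name, "arguments": args})
    return entry["fn"](**args)
```

The model never reaches a function directly; anything outside the catalogue is refused, and every executed call leaves an audit record.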
Long context arrived
Million-token context windows changed some architectures meaningfully. Document analysis simplified. Conversation memory improved. The trade-off with cost became a real design decision.
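The cost trade-off is easy to make concrete with back-of-envelope arithmetic. The price, token counts, and query volume below are hypothetical figures chosen only to show the shape of the comparison.

```python
def monthly_input_cost(tokens_per_query: int,
                       queries_per_day: int,
                       price_per_million_tokens: float,
                       days: int = 30) -> float:
    """Input-token cost for a month of queries at a flat per-token price."""
    total_tokens = tokens_per_query * queries_per_day * days
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical figures: a flat input price and 1,000 queries per day.
PRICE = 3.00  # assumed price per million input tokens

# Option A: send the whole document set in context on every query.
full_context = monthly_input_cost(800_000, 1_000, PRICE)

# Option B: retrieve a few relevant passages and send only those.
retrieved = monthly_input_cost(8_000, 1_000, PRICE)

cost_ratio = full_context / retrieved
```

Under these assumed numbers the two options differ by the same factor as their prompt sizes, which is why long context tends to suit low-volume, high-value analysis better than high-volume query traffic.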
Self-hosting became viable
Through 2024, self-hosting open models moved from research exercise to credible enterprise option for specific workload categories — data residency, sustained high volume, latency-critical work.
Multimodal entered production
Vision capabilities in foundation models stopped being demos. Document processing for complex layouts, field-operations workflows with photographic evidence, mixed-content workloads — production deployments started shipping.
Agentic patterns clarified
A year ago, "agents" meant everything from a single function-calling loop to multi-agent collectives. Through 2024 the picture clarified: bounded supervisor-worker patterns work in production; open-ended autonomy doesn't yet.
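To make "bounded" concrete, here is a minimal sketch of a supervisor executing a fixed plan with named workers and a hard step budget. The plan format, worker names, and budget are assumptions for illustration; in a real system the plan would come from a model and the workers would wrap model or tool calls.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


def run_supervised(plan: List[Tuple[str, str]],
                   workers: Dict[str, Worker],
                   max_steps: int = 5) -> List[str]:
    """Execute (worker name, task) pairs in order, within a step budget."""
    if len(plan) > max_steps:
        # The bound is enforced before any work starts.
        raise RuntimeError("plan exceeds step budget; escalate to a human")
    results = []
    for worker_name, task in plan:
        if worker_name not in workers:
            raise KeyError(f"unknown worker: {worker_name}")
        results.append(workers[worker_name](task))
    return results
```

The two bounds are the closed set of workers and the step budget: the supervisor can compose known specialists, but it cannot invent new ones or loop indefinitely.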
Computer use crossed a threshold
The late-year launch of Anthropic's Computer Use and similar capabilities from other providers signalled that AI controlling the screen is now real enough to take seriously, though production patterns remain immature.
What stayed the same
The integration discipline still determines success
The teams that ship AI workloads reliably are the teams that respect enterprise integration discipline: identity propagation, observability, audit, and governance. The novelty is in the model layer; the production discipline is the same as it was before AI.
The data and knowledge layer is still the bottleneck
The model is interesting, but the knowledge base, the semantic layer, and the document corpus are what determine quality. Teams that invest here win; teams that try to substitute model capability for data quality don't.
Human-in-the-loop remains the production posture
The aspiration of autonomous AI taking consequential actions did not materialise in 2024. Production systems retain human checkpoints. The disciplines of supervised flow, escalation paths, and audit trails matter more than the autonomous-agent demos.
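A human checkpoint can be as simple as a gate between a proposed action and its execution. The sketch below assumes a hypothetical `approver` hook standing in for a real review queue; the class and field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProposedAction:
    description: str
    execute: Callable[[], str]
    consequential: bool  # True if the action changes state that matters


@dataclass
class Checkpoint:
    # Hypothetical human-review hook; returns True when a reviewer approves.
    approver: Callable[[ProposedAction], bool]
    audit_trail: List[str] = field(default_factory=list)

    def run(self, action: ProposedAction) -> str:
        """Run low-risk actions directly; gate consequential ones on approval."""
        if action.consequential and not self.approver(action):
            self.audit_trail.append(f"REJECTED: {action.description}")
            return "escalated"
        self.audit_trail.append(f"EXECUTED: {action.description}")
        return action.execute()
```

Every path through the gate writes to the audit trail, so the record covers rejections as well as executions.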
Governance is still the unblock
AI initiatives that engage governance early ship. AI initiatives that defer governance stall at the production gate. The shape of this hasn't changed.
Change management is still neglected
The technology gets the attention; the change in how people work gets less. Deployments that succeed include change management as a primary work stream. Deployments that don't, fail at user adoption.
What we got wrong a year ago
A few predictions that didn't pan out:
"Autonomous agents will be transformative by mid-2024"
Agent capability improved, but the autonomous-agent transformation didn't happen. Bounded, supervised agent patterns are useful and shipping; the autonomous vision remains a future prospect.
"Fine-tuning will be the differentiator"
Fine-tuning is useful for specific narrow tasks at high volume. It is not the differentiator most teams thought it would be. Retrieval-augmented generation with strong evaluation discipline outperforms fine-tuning for most enterprise workloads.
"Open models will replace commercial APIs"
Open models closed the gap and became viable for specific workloads. They didn't replace commercial APIs for most use cases. The realistic pattern is a mix — commercial APIs for most workloads, open models for the cases where their specific advantages apply.
"Multi-agent collectives will solve complex problems"
Multi-agent designs work when the division of labour is structural (different specialist roles). They don't work when it is emergent (agents negotiating freely). The pattern is narrower than the hype suggested.
What's worth taking into 2025
Stay disciplined about scope
The space of possible AI workloads is larger than the space of valuable workloads. Be specific about what each initiative is doing and what value it generates.
Invest in the data layer
The knowledge base, the semantic layer, the document corpus — these are the foundation. Investment here compounds; investment in chasing model capability decays.
Build evaluation as a primary discipline
Without evaluation, you don't know whether the system is getting better. With it, every prompt, model, or retrieval change can be measured, so improvement becomes deliberate rather than accidental.
Embrace model routing
The right model for each workload is rarely the same as the right model for every workload. Routing, model registry, and cost-aware selection are now table stakes.
Plan for ongoing model evolution
Foundation models will keep updating. Pin versions; treat upgrades as migrations; build the regression discipline.
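Treating an upgrade as a migration can be sketched as a pin that only moves when a candidate passes evaluation. The version strings, the `evaluate` callable, and the score threshold are hypothetical; the function is an illustration of the idea, not a prescribed process.

```python
from typing import Callable, Dict


def upgrade_pin(pins: Dict[str, str],
                workload: str,
                candidate_id: str,
                evaluate: Callable[[str], float],
                min_score: float) -> Dict[str, str]:
    """Return new pins, adopting the candidate only if it passes evaluation."""
    if candidate_id.endswith("latest"):
        raise ValueError("pin an explicit version, not a floating alias")
    if evaluate(candidate_id) < min_score:
        return dict(pins)  # keep the current pin; the migration is rejected
    updated = dict(pins)
    updated[workload] = candidate_id
    return updated
```

Returning a new mapping rather than mutating the old one keeps the previous pins available for rollback.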
Treat governance as enabling, not blocking
Governance done well accelerates deployment; governance done badly blocks it. Engage early; align on what's required; build the controls into the architecture.
Maintain human-in-the-loop posture
The autonomy aspirations will continue. The production reality is supervised AI. Design accordingly.
Don't chase frontier capability without a workload case
The next generation of frontier models will be impressive. The question isn't whether the capability is impressive; the question is whether the workload justifies the cost.
Expect the unexpected
A year ago, none of us were planning for the specific shape of capabilities that arrived in 2024. The capability evolution will continue surprising us. Build adaptable architectures; commit lightly to specific capabilities; stay close to what users actually need.
A few predictions for 2025
Conservative ones, since 2024 taught us that prediction is hard:
- Agentic patterns will broaden but autonomy will not. More workloads will use agent-style orchestration; full autonomy on consequential actions will remain experimental.
- Multimodal will become more deeply integrated. Document understanding, mixed-content workflows, and vision-augmented operations will become standard rather than novel.
- Specialised models will proliferate. Fine-tuned models for narrow tasks, hosted as part of enterprise platforms, will become a real category.
- Governance frameworks will codify. Regulator expectations will become more specific; institutions with strong AI audit posture will benefit.
- Integration with existing enterprise systems will deepen. AI will arrive as a feature of every major enterprise platform, not as a separate stack.
These are not bold predictions. They are the trajectories that 2024 set up. The bold predictions tend to be the ones that don't pan out.
What we recommend for 2025
For enterprise teams planning AI work for 2025:
- Pick the workloads where AI clearly helps and ship those reliably. Resist the broader-is-better pull.
- Invest in the foundations — data layer, evaluation, governance, observability. The foundations compound.
- Maintain a balanced model strategy. Commercial APIs, open models, specialised models — each in their right place.
- Plan for ongoing model evolution. The capabilities will keep changing.
- Engage governance and change management as primary work streams. The technology is necessary but not sufficient.
- Build for the next decade, not for the next demo. The teams that take a long view ship sustainable AI capability.
Enterprise AI in 2024 was the year the patterns clarified. Production deployments stopped being remarkable and started being routine. The discipline that produces successful deployments is now visible enough to follow. The teams that follow it will deliver real value in 2025. The teams that chase the next capability without the discipline will produce expensive demos and few shipped systems.
What we learned in 2024 is mostly what we already knew about enterprise software, applied to a new technology layer. The new technology is the model. The discipline is the same. The teams that recognise this ship; the teams that don't, don't.