AI & Enterprise AI · 17 December 2024 · 8 min read

Enterprise AI in 2024 — What We Learned

A year-end practitioner reflection on what changed in enterprise AI in 2024, what stayed the same, and what to take into 2025.

By Intellectual AI Engineering Practice · Collective byline

A year ago, enterprise AI was a category of vivid possibility and limited production. The first wave of LLM proofs of concept had shown what was possible; few had shipped into operations. Twelve months later the picture has shifted. More has shipped. More is operating reliably. The patterns of what works and what doesn't are clearer than they were.

This is a year-end practitioner reflection — what changed, what stayed the same, what to take into 2025 — drawn from a year of delivery work across enterprise and government engagements.

What changed in 2024

The model landscape diversified

A year ago, enterprise LLM workloads were largely "GPT-4, sometimes Claude." Through 2024:

Claude 3 (March) and Claude 3.5 (June) became serious enterprise options
Llama 3 (April) and Llama 3.1 (July) closed the open-weights gap
Gemini 1.5 made million-token context windows commercially available
Mistral and others produced competitive open alternatives
GPT-4o (May) brought natively multimodal capability across text, vision, and audio in a single model

The default of "one frontier model for everything" no longer holds. Model routing — different models for different workloads — became a standard architectural pattern.
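As a minimal sketch of what workload-based routing can look like: the workload categories, model identifiers, and context limits below are illustrative placeholders, not a description of any particular deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str             # placeholder identifier, not a real model name
    max_input_tokens: int  # assumed context limit for illustration


# Hypothetical routing table: workload category -> model choice.
ROUTES = {
    "classification": Route("small-fast-model", 8_000),
    "document_analysis": Route("long-context-model", 1_000_000),
    "reasoning": Route("frontier-model", 128_000),
}
DEFAULT_ROUTE = ROUTES["reasoning"]


def route(workload: str, input_tokens: int) -> Route:
    """Pick the configured route; escalate to the long-context model if the input does not fit."""
    chosen = ROUTES.get(workload, DEFAULT_ROUTE)
    if input_tokens > chosen.max_input_tokens:
        long_ctx = ROUTES["document_analysis"]
        if input_tokens > long_ctx.max_input_tokens:
            raise ValueError("input exceeds every configured context window")
        return long_ctx
    return chosen
```

In practice the table would probably live in configuration rather than code, so that routing changes go through the same review as any other production change.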

Cost discipline emerged

A year ago, most teams discovered LLM cost through the bill. Through 2024, cost monitoring, prompt optimisation, model routing, and caching became standard practice. Bills are more predictable; the surprise has receded.
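One way to picture caching and cost monitoring working together is a thin wrapper around the model call. This is a sketch under stated assumptions: `call_model` is a hypothetical callable, and the flat per-character price is a simplification (real providers bill per token, at rates that differ by model).

```python
import hashlib
from typing import Callable


class CachedClient:
    """Wraps a model call with an exact-match response cache and a running cost estimate."""

    def __init__(self, call_model: Callable[[str], str], price_per_1k_chars: float):
        self._call_model = call_model     # hypothetical callable: prompt -> completion
        self._price = price_per_1k_chars  # assumed flat price, for illustration only
        self._cache = {}                  # prompt hash -> cached completion
        self.hits = 0
        self.misses = 0
        self.estimated_cost = 0.0

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            # Cache hit: no model call, no added cost.
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        response = self._call_model(prompt)
        self.estimated_cost += (len(prompt) + len(response)) / 1000 * self._price
        self._cache[key] = response
        return response
```

Exposing `hits`, `misses`, and `estimated_cost` as metrics is what lets cost show up on a dashboard rather than on the bill.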

Evaluation discipline started to mature

A year ago, evaluation was missing in most production deployments. Through 2024, more teams built evaluation harnesses, curated eval sets, and ran regression tests. The maturity is still uneven, but the discipline is no longer optional in serious deployments.
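A very small evaluation harness can be sketched as follows. The eval cases and the substring-match scoring are assumptions chosen for brevity; real harnesses typically use richer scoring (rubrics, graders, human review).

```python
from typing import Callable

# Hypothetical curated eval set: (input, substring expected in the answer).
EVAL_SET = [
    ("What is the refund window?", "30 days"),
    ("Which team owns invoicing?", "finance"),
]


def pass_rate(system: Callable[[str], str], eval_set=EVAL_SET) -> float:
    """Fraction of eval cases whose output contains the expected substring (case-insensitive)."""
    passed = sum(
        1 for question, expected in eval_set
        if expected.lower() in system(question).lower()
    )
    return passed / len(eval_set)


def check_regression(candidate: Callable[[str], str],
                     baseline_rate: float,
                     tolerance: float = 0.0) -> bool:
    """Return True when the candidate is no worse than the recorded baseline, within tolerance."""
    return pass_rate(candidate) >= baseline_rate - tolerance
```

Running `check_regression` in the deployment pipeline, against a baseline recorded from the current production system, is one way to turn the eval set into a regression test.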

Function calling stabilised

Function calling moved from a feature to a default architecture pattern. Most production LLM workloads we see now involve function calls. The governance patterns around function catalogues have stabilised.
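A governed function catalogue can be as simple as an allow-list with argument checks and an audit record. The sketch below is illustrative: the class, its method names, and the example functions are hypothetical, not any provider's API.

```python
from typing import Any, Callable, Iterable


class FunctionCatalogue:
    """Allow-list of functions a model may call, with argument-name checks and an audit log."""

    def __init__(self) -> None:
        self._entries = {}   # name -> (function, allowed parameter names)
        self.audit_log = []  # one record per dispatched call

    def register(self, name: str, fn: Callable[..., Any], params: Iterable[str]) -> None:
        self._entries[name] = (fn, frozenset(params))

    def dispatch(self, name: str, arguments: dict) -> Any:
        # Reject anything the model asks for that was never registered.
        if name not in self._entries:
            raise PermissionError(f"function {name!r} is not in the catalogue")
        fn, params = self._entries[name]
        unexpected = set(arguments) - params
        if unexpected:
            raise ValueError(f"unexpected arguments for {name!r}: {sorted(unexpected)}")
        self.audit_log.append({"function": name, "arguments": dict(arguments)})
        return fn(**arguments)
```

The point of the pattern is that the model proposes a call by name, and the catalogue, not the model, decides whether it runs.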

Long context arrived

Million-token context windows changed some architectures meaningfully. Document analysis simplified. Conversation memory improved. The trade-off with cost became a real design decision.

Self-hosting became viable

Through 2024, self-hosting open models moved from research exercise to credible enterprise option for specific workload categories — data residency, sustained high volume, latency-critical work.

Multimodal entered production

Vision capabilities in foundation models stopped being demos. Document processing for complex layouts, field-operations workflows with photographic evidence, mixed-content workloads — production deployments started shipping.

Agentic patterns clarified

A year ago, "agents" meant everything from a single function-calling loop to multi-agent collectives. Through 2024 the picture clarified: bounded supervisor-worker patterns work in production; open-ended autonomy doesn't yet.
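To make "bounded" concrete, here is a minimal sketch of a supervisor that runs a fixed plan over a known set of workers with a hard step budget. It assumes the plan is supplied up front (in a real system a model would usually propose it), and the worker names are hypothetical.

```python
from typing import Callable

Worker = Callable[[str], str]


def run_supervisor(plan: list, workers: dict, max_steps: int = 5) -> list:
    """Run (worker_name, task) steps in order, skipping unknown workers and stopping at max_steps."""
    results = []
    for step, (worker_name, task) in enumerate(plan):
        if step >= max_steps:
            # Hard bound: the supervisor never runs past its step budget.
            results.append("stopped: step budget exhausted")
            break
        worker = workers.get(worker_name)
        if worker is None:
            # Only registered specialist workers may act.
            results.append(f"skipped: no worker named {worker_name!r}")
            continue
        results.append(worker(task))
    return results
```

The two bounds, a closed set of workers and a step budget, are what separate this from open-ended autonomy.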

Computer use crossed a threshold

The late-year launch of Anthropic's Computer Use and similar capabilities from other providers signalled that AI controlling the screen is now real enough to take seriously, though production patterns remain immature.

What stayed the same

The integration discipline still determines success

The teams that ship AI workloads reliably are the teams that respect enterprise integration discipline. Identity propagation, observability, audit, governance. The novelty is in the model layer; the production discipline is the same as it was before AI.

The data and knowledge layer is still the bottleneck

The model is interesting; the knowledge base, the semantic layer, and the document corpus are what determine quality. Teams that invest here win; teams that try to substitute model capability for data quality don't.

Human-in-the-loop remains the production posture

The aspiration of autonomous AI taking consequential actions did not materialise in 2024. Production systems retain human checkpoints. The disciplines of supervised flow, escalation paths, audit trails matter more than the autonomous-agent demos.
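A human checkpoint with an escalation path and audit trail can be sketched in a few lines. The `consequential` flag and the class names are assumptions for illustration; how an action is classified as consequential is the hard part and is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    description: str
    consequential: bool  # assumed to be set by upstream policy, not by the model


@dataclass
class ReviewGate:
    """Executes low-risk actions directly and queues consequential ones for a human decision."""
    pending: list = field(default_factory=list)
    audit_trail: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        if action.consequential:
            self.pending.append(action)
            self.audit_trail.append(("escalated", action.description))
            return "pending_review"
        self.audit_trail.append(("auto_executed", action.description))
        return "executed"

    def decide(self, action: ProposedAction, approve: bool, reviewer: str) -> str:
        """Record the human decision and remove the action from the pending queue."""
        self.pending.remove(action)
        outcome = "approved" if approve else "rejected"
        self.audit_trail.append((outcome, action.description, reviewer))
        return outcome
```

Every path through the gate writes to the audit trail, including the ones where no human was involved.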

Governance is still the unblock

AI initiatives that engage governance early ship. AI initiatives that defer governance stall at the production gate. The shape of this hasn't changed.

Change management is still neglected

The technology gets the attention; the change in how people work gets less. Deployments that succeed include change management as a primary work stream. Deployments that don't, fail at user adoption.

What we got wrong a year ago

A few predictions that didn't pan out:

"Autonomous agents will be transformative by mid-2024"

Agent capability improved; the autonomous-agent transformation didn't happen. Bounded, supervised agent patterns are useful and shipping; the autonomous vision remains a future prospect.

"Fine-tuning will be the differentiator"

Fine-tuning is useful for specific narrow tasks at high volume. It is not the differentiator most teams thought it would be. Retrieval-augmented generation with strong evaluation discipline outperforms fine-tuning for most enterprise workloads.

"Open models will replace commercial APIs"

Open models closed the gap and became viable for specific workloads. They didn't replace commercial APIs for most use cases. The realistic pattern is a mix — commercial APIs for most workloads, open models for the cases where their specific advantages apply.

"Multi-agent collectives will solve complex problems"

Multi-agent designs work when the multi-agentness is structural (different specialist roles). They don't work when the multi-agentness is emergent (agents negotiating freely). The pattern is narrower than the hype suggested.

What's worth taking into 2025

Stay disciplined about scope

The space of possible AI workloads is larger than the space of valuable workloads. Be specific about what each initiative is doing and what value it generates.

Invest in the data layer

The knowledge base, the semantic layer, the document corpus — these are the foundation. Investment here compounds; investment in chasing model capability decays.

Build evaluation as a primary discipline

Without evaluation, you don't know whether the system is getting better. With it, each change can be measured against a known baseline, so improvement is deliberate rather than assumed.

Embrace model routing

The right model for each workload is rarely the same as the right model for every workload. Routing, model registry, and cost-aware selection are now table stakes.

Plan for ongoing model evolution

Foundation models will keep updating. Pin versions; treat upgrades as migrations; build the regression discipline.
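Version pinning and upgrade-as-migration can be enforced with simple checks. In this sketch the model identifiers, the floating-alias suffixes, and the promotion threshold are all hypothetical; naming conventions differ by provider.

```python
# Hypothetical pinned configuration: identifiers are placeholders, not real model names.
PINNED_MODELS = {
    "summarisation": "vendor-model-2024-06-01",
    "extraction": "vendor-model-latest",
}

# Assumed naming convention for floating aliases.
FLOATING_SUFFIXES = ("-latest", ":latest")


def find_unpinned(models: dict) -> list:
    """Return workloads whose model identifier is a floating alias rather than a fixed version."""
    return sorted(w for w, m in models.items() if m.endswith(FLOATING_SUFFIXES))


def can_promote(baseline_pass_rate: float,
                candidate_pass_rate: float,
                max_drop: float = 0.01) -> bool:
    """Promote a new model version only if the eval pass rate drops by no more than max_drop."""
    return candidate_pass_rate >= baseline_pass_rate - max_drop
```

A check like `find_unpinned` fits naturally in continuous integration, so a floating alias is caught before it reaches production.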

Treat governance as enabling, not blocking

Governance done well accelerates deployment; governance done badly blocks it. Engage early; align on what's required; build the controls into the architecture.

Maintain human-in-the-loop posture

The autonomy aspirations will continue. The production reality is supervised AI. Design accordingly.

Don't chase frontier capability without a workload case

The next generation of frontier models will be impressive. The question isn't whether the capability is impressive; the question is whether the workload justifies the cost.

Expect the unexpected

A year ago, none of us were planning for the specific shape of capabilities that arrived in 2024. The capability evolution will continue surprising us. Build adaptable architectures; commit lightly to specific capabilities; stay close to what users actually need.

A few predictions for 2025

Conservative ones, since 2024 taught us that prediction is hard:

Agentic patterns will broaden but autonomy will not. More workloads will use agent-style orchestration; full autonomy on consequential actions will remain experimental.
Multimodal will become more deeply integrated. Document understanding, mixed-content workflows, vision-augmented operations will become standard rather than novel.
Specialised models will proliferate. Fine-tuned models for narrow tasks, hosted as part of enterprise platforms, will become a real category.
Governance frameworks will codify. Regulator expectations will become more specific; institutions with strong AI audit posture will benefit.
Integration with existing enterprise systems will deepen. AI will become a feature of every major enterprise platform rather than a separate stack.

These are not bold predictions. They are the trajectories that 2024 set up. The bold predictions tend to be the ones that don't pan out.

What we recommend for 2025

For enterprise teams planning AI work for 2025:

Pick the workloads where AI clearly helps and ship those reliably. Resist the broader-is-better pull.
Invest in the foundations — data layer, evaluation, governance, observability. The foundations compound.
Maintain a balanced model strategy. Commercial APIs, open models, specialised models: each in its right place.
Plan for ongoing model evolution. The capabilities will keep changing.
Engage governance and change management as primary work streams. The technology is necessary but not sufficient.
Build for the next decade, not for the next demo. The teams that take a long view ship sustainable AI capability.

For enterprise AI, 2024 was the year the patterns clarified. Production deployments stopped being remarkable and started being routine. The discipline that produces successful deployments is now visible enough to follow. The teams that follow it will deliver real value in 2025. The teams that chase the next capability without the discipline will produce expensive demos and few shipped systems.

What we have learned in 2024 is mostly what we already knew about enterprise software — applied to a new technology layer. The new technology is the model. The discipline is the same. The teams that recognise this ship; the teams that don't, don't.
