AI & Enterprise AI · 15 April 2025 · 7 min read

Forecasting Enterprise AI Costs — Methods That Hold Up

Annual budgeting for AI workloads is hard. Costs have multiple drivers, usage patterns shift, and the technology keeps moving. This is a practitioner view of forecasting methods that produce useful estimates instead of theatre.

By Intellectual AI Engineering Practice · Collective byline

By 2025, AI workloads are a meaningful line in enterprise IT budgets, and forecasting them is harder than forecasting conventional IT. Costs depend on usage patterns, model choice, provider pricing changes, and the success or failure of capabilities that don't yet exist. Most of the enterprise finance teams we work with have asked us, in one form or another, how to forecast them.

This piece is a practitioner view of how to forecast enterprise AI costs in 2025 — what to model, what to ignore, and what level of certainty is realistic.

What makes AI cost forecasting different

A few properties that make AI costs harder to predict than conventional IT:

Usage-driven, not capacity-driven

A traditional infrastructure cost is mostly capacity — you buy or rent capacity, you use it or you don't. AI inference cost is usage-driven — every call has a cost; cost scales with usage; usage depends on user behaviour.

Multiple price curves

Provider pricing changes over time. Multi-provider deployments combine different price curves. Self-hosted inference has a different cost shape from hosted APIs, and within hosted, batch pricing differs from real-time.

Latency-cost trade-offs

Some workloads pay more for lower latency (dedicated capacity, frontier models). Others can use cheaper batch options. The mix matters.
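To see why the mix matters, here is a minimal Python sketch that computes a blended cost per call from a traffic mix. The `TrafficSlice` structure, the slice names, and every price are hypothetical placeholders, not real provider rates; substitute figures from your own invoices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficSlice:
    """One slice of a workload's traffic (hypothetical structure for illustration)."""
    name: str
    share: float          # fraction of calls routed to this slice
    cost_per_call: float  # assumed average cost per call in this slice


def blended_cost_per_call(slices: list[TrafficSlice]) -> float:
    """Share-weighted average cost per call across the whole traffic mix."""
    total_share = sum(s.share for s in slices)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1, got {total_share}")
    return sum(s.share * s.cost_per_call for s in slices)


# Illustrative numbers only, not real provider prices.
realtime_heavy = [
    TrafficSlice("frontier, real-time", 0.6, 0.020),
    TrafficSlice("small model, real-time", 0.3, 0.002),
    TrafficSlice("batch", 0.1, 0.005),
]
batch_heavy = [
    TrafficSlice("frontier, real-time", 0.2, 0.020),
    TrafficSlice("small model, real-time", 0.3, 0.002),
    TrafficSlice("batch", 0.5, 0.005),
]

print(f"real-time heavy: {blended_cost_per_call(realtime_heavy):.4f} per call")
print(f"batch heavy:     {blended_cost_per_call(batch_heavy):.4f} per call")
```

With these placeholder prices, moving latency-tolerant traffic from the frontier real-time slice to batch takes the blended cost from 0.0131 to 0.0071 per call, roughly half, with no change in call volume.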

Adoption uncertainty

How widely a new AI capability is adopted determines its cost. Conservative adoption assumptions produce conservative forecasts; aggressive adoption assumptions can produce forecasts 5x higher.

Capability evolution

The capabilities you'll be using in 18 months may not exist now. Forecasts that don't allow for new line items will come in low.

The forecasting components

A working AI cost forecast separates several components:

Existing workload costs

For workloads already in production:

Current cost (from invoices and platform metrics)
Expected usage growth (from user adoption curves)
Expected pricing changes (from provider relationships)
Expected optimisation savings (from planned work)

This is the most predictable component. For existing workloads, forecasting to within 20% over a year is achievable.
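The four inputs above combine multiplicatively. A minimal Python sketch follows, assuming compound monthly usage growth and a single step change each for pricing and for optimisation; the function name and its parameters are hypothetical, and a real workload may need more than one step of each.

```python
def project_existing_workload(
    current_monthly_cost: float,
    monthly_usage_growth: float,   # e.g. 0.02 = usage grows 2% per month, compounding
    price_change: float,           # e.g. -0.10 = unit prices 10% lower once it applies
    price_change_month: int,       # first month (1-based) the new price applies
    optimisation_saving: float,    # e.g. 0.15 = 15% lower cost once planned work lands
    optimisation_month: int,       # first month (1-based) the saving applies
    months: int = 12,
) -> list[float]:
    """Project monthly cost for an existing workload over the forecast horizon."""
    projection = []
    for month in range(1, months + 1):
        usage_factor = (1 + monthly_usage_growth) ** month
        price_factor = (1 + price_change) if month >= price_change_month else 1.0
        saving_factor = (1 - optimisation_saving) if month >= optimisation_month else 1.0
        projection.append(current_monthly_cost * usage_factor * price_factor * saving_factor)
    return projection


# Illustrative inputs: 20k/month today, 2% monthly growth,
# a 10% price cut from month 7, a 15% optimisation saving from month 4.
monthly = project_existing_workload(20_000, 0.02, -0.10, 7, 0.15, 4)
print(f"projected annual cost: {sum(monthly):,.0f}")
```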

New workload pipeline

For workloads not yet in production:

Workloads with funded plans
Expected go-live dates
Estimated unit economics
Expected usage at maturity

This is harder to predict. Some planned workloads ship; some slip; some are cancelled; some change scope. Allow significant variance.
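One way to allow for that variance is to weight each planned workload by a judgement of how likely it is to ship, ramp its usage after go-live, and test the effect of slippage. The sketch below assumes a linear ramp to maturity and a single ship probability per workload; `PlannedWorkload`, its fields, and the example workload names are hypothetical, and the probabilities are inputs you would agree with the delivery teams, not something the code can estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedWorkload:
    """A workload in the pipeline (hypothetical structure for illustration)."""
    name: str
    go_live_month: int               # planned go-live, 1-12 within the forecast year
    monthly_cost_at_maturity: float  # from the estimated unit economics
    ramp_months: int                 # months from go-live to mature usage (>= 1)
    ship_probability: float          # judgement: chance it ships at all this year


def pipeline_cost(
    workloads: list[PlannedWorkload],
    slip_months: int = 0,            # shift every go-live later to test slippage
    assume_all_ship: bool = False,   # True gives the "everything ships" upper case
    months: int = 12,
) -> float:
    """In-year cost of the pipeline, probability-weighted by default."""
    total = 0.0
    for w in workloads:
        weight = 1.0 if assume_all_ship else w.ship_probability
        start = w.go_live_month + slip_months
        for month in range(start, months + 1):
            months_live = month - start + 1
            ramp = min(1.0, months_live / w.ramp_months)  # linear ramp to maturity
            total += weight * w.monthly_cost_at_maturity * ramp
    return total


pipeline = [
    PlannedWorkload("claims triage assistant", 4, 30_000, 3, 0.8),
    PlannedWorkload("contract review agent", 9, 50_000, 4, 0.5),
]
print(f"expected:          {pipeline_cost(pipeline):,.0f}")
print(f"all ship on time:  {pipeline_cost(pipeline, assume_all_ship=True):,.0f}")
print(f"expected, 3m slip: {pipeline_cost(pipeline, slip_months=3):,.0f}")
```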

Capability evolution

For capabilities that may emerge:

Reasoning models becoming more valuable
Multimodal use cases expanding
Agent workloads scaling
Cost-curve shifts

This is genuinely uncertain. Model it as a range rather than a number.

Platform and infrastructure costs

For the shared infrastructure:

Vector store hosting
Self-hosted inference if applicable
Observability and evaluation infrastructure
Governance and audit infrastructure

These behave more like conventional infrastructure costs: capacity-driven, within ranges.

Vendor and licensing

For commercial AI tools:

Existing licence costs
Expected renewals
New vendor adoptions

Most enterprises will see vendor sprawl grow. Plan for it.

The methods that work

A few methods we have seen produce useful forecasts:

Bottom-up workload modelling

Start from each workload. Estimate users, calls per user, tokens per call, cost per token. Aggregate.

This works for established workloads with stable patterns. It's less reliable for new workloads where the inputs are guesses.
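The bottom-up arithmetic is simple enough to show directly. This Python sketch multiplies users, calls per user, tokens per call, and price per token, then aggregates across workloads. It assumes separate input and output token prices quoted per million tokens, which is a common way hosted models are priced; the `WorkloadEstimate` structure, the workload names, and every number are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadEstimate:
    """Bottom-up inputs for one workload (hypothetical structure for illustration)."""
    name: str
    users: int
    calls_per_user_per_month: float
    input_tokens_per_call: float
    output_tokens_per_call: float
    input_price_per_million: float   # assumed price per 1M input tokens
    output_price_per_million: float  # assumed price per 1M output tokens

    def monthly_cost(self) -> float:
        calls = self.users * self.calls_per_user_per_month
        cost_per_call = (
            self.input_tokens_per_call * self.input_price_per_million
            + self.output_tokens_per_call * self.output_price_per_million
        ) / 1_000_000
        return calls * cost_per_call


def bottom_up_annual(workloads: list[WorkloadEstimate]) -> dict[str, float]:
    """Annual cost per workload; sum the values for the aggregate."""
    return {w.name: 12 * w.monthly_cost() for w in workloads}


estimates = [
    WorkloadEstimate("internal search", 5_000, 60, 3_000, 400, 1.0, 4.0),
    WorkloadEstimate("support drafting", 400, 900, 2_000, 600, 3.0, 15.0),
]
by_workload = bottom_up_annual(estimates)
for name, cost in by_workload.items():
    print(f"{name:<18} {cost:>12,.0f}")
print(f"{'total':<18} {sum(by_workload.values()):>12,.0f}")
```

The weakness described above shows up directly here: for a new workload, every field in `WorkloadEstimate` is a guess, and the errors multiply through the product.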

Top-down envelope

Set a target spend (as a percentage of IT budget, or as an absolute number) and allocate within it. Enforce discipline: don't let line items grow beyond the envelope.

This works for environments with strong cost discipline. It produces forecasts that are predictable but may be wrong about where the spending lands.
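A top-down envelope needs an allocation rule. The Python sketch below uses one simple rule, scaling every request down proportionally when the total exceeds the envelope; the function name, the IT budget figure, and the 4% share are hypothetical, and many organisations would allocate by priority instead.

```python
def allocate_envelope(envelope: float, requests: dict[str, float]) -> dict[str, float]:
    """Fit line-item requests inside a fixed envelope by proportional scaling."""
    requested = sum(requests.values())
    if requested <= envelope:
        return dict(requests)  # everything fits; nothing to cut
    scale = envelope / requested
    return {item: amount * scale for item, amount in requests.items()}


it_budget = 50_000_000
envelope = 0.04 * it_budget  # hypothetical: AI capped at 4% of the IT budget

requests = {
    "existing workloads": 1_200_000,
    "new workload pipeline": 900_000,
    "platform": 300_000,
    "vendors and licences": 100_000,
}
for item, amount in allocate_envelope(envelope, requests).items():
    print(f"{item:<24} {amount:>12,.0f}")
```

Here the requests total 2.5M against a 2.0M envelope, so every line item is cut by 20%. Proportional scaling keeps the total predictable, which is the strength of the method; it says nothing about whether the cuts land in the right places, which is its weakness.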

Three-scenario forecasting

Build three forecasts: conservative, expected, aggressive. Each makes explicit assumptions about adoption, pricing, and capability evolution. The range is the forecast; the point estimate is the centre.

This is the method we recommend for most enterprises. The range communicates the uncertainty honestly; the centre is a working assumption.
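A three-scenario forecast can be as small as three sets of explicit assumptions applied to the same baseline. In this Python sketch each scenario states an adoption multiplier, a unit-price change, and an allowance for capabilities not yet in production; the `Scenario` structure and all values are illustrative assumptions, and in practice each scenario would be driven by a fuller model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Explicit assumptions for one scenario (hypothetical structure)."""
    name: str
    adoption_multiplier: float   # usage relative to today's baseline
    price_change: float          # e.g. -0.10 = unit prices 10% lower
    capability_allowance: float  # budget for capabilities not yet in production


def scenario_cost(baseline_annual_cost: float, s: Scenario) -> float:
    usage_cost = baseline_annual_cost * s.adoption_multiplier * (1 + s.price_change)
    return usage_cost + s.capability_allowance


baseline = 1_000_000  # illustrative: current annualised AI spend
scenarios = [
    Scenario("conservative", 1.2, -0.10, 50_000),
    Scenario("expected", 1.8, 0.00, 150_000),
    Scenario("aggressive", 3.0, 0.00, 400_000),  # no price decline assumed
]
costs = {s.name: scenario_cost(baseline, s) for s in scenarios}
print(f"range: {costs['conservative']:,.0f} to {costs['aggressive']:,.0f}")
print(f"centre (working assumption): {costs['expected']:,.0f}")
```

With these inputs the range runs from 1,130,000 to 3,400,000 with a centre of 1,950,000. Writing the assumptions as named fields is the point: anyone reading the forecast can see, and challenge, what each scenario believes.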

Driver-based modelling

Identify the key drivers — number of active users, calls per user per day, average tokens per call, ratio of model tiers. Build the forecast as a function of these drivers. Update as drivers update.

This works when the drivers are measurable and have history. It can be the engine behind the three-scenario approach.
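A driver-based model is the same arithmetic with the drivers held as named, updatable inputs. This Python sketch uses the four drivers listed above; the tier names and the per-million-token prices in `PRICE_PER_MILLION_TOKENS` are hypothetical blended figures, not any provider's rates.

```python
from dataclasses import dataclass, replace

# Hypothetical blended prices per 1M tokens for each model tier.
PRICE_PER_MILLION_TOKENS = {"small": 0.5, "mid": 3.0, "frontier": 15.0}


@dataclass(frozen=True)
class Drivers:
    active_users: int
    calls_per_user_per_day: float
    avg_tokens_per_call: float
    tier_mix: dict[str, float]  # share of calls per model tier; shares sum to 1


def annual_cost(d: Drivers, prices: dict[str, float] = PRICE_PER_MILLION_TOKENS,
                days: int = 365) -> float:
    calls = d.active_users * d.calls_per_user_per_day * days
    tokens = calls * d.avg_tokens_per_call
    price_per_million = sum(share * prices[tier] for tier, share in d.tier_mix.items())
    return tokens * price_per_million / 1_000_000


plan = Drivers(2_000, 8, 1_500, {"small": 0.6, "mid": 0.3, "frontier": 0.1})
# Quarterly update: measured usage is higher and more traffic now goes to frontier.
latest = replace(plan, calls_per_user_per_day=11,
                 tier_mix={"small": 0.5, "mid": 0.3, "frontier": 0.2})
print(f"plan:   {annual_cost(plan):,.0f}")
print(f"latest: {annual_cost(latest):,.0f}")
```

Because the forecast is a function of the drivers, updating it means re-measuring the drivers rather than rebuilding the model, and the same function can sit behind each of the three scenarios with different driver values.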

Reforecast quarterly

A forecast built in Q1 will be wrong by Q3. Update it quarterly. The cumulative result over the year is more accurate than a single point estimate.

This is the discipline that distinguishes useful forecasting from theatre.
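Mechanically, a quarterly reforecast replaces elapsed quarters with actuals and re-projects only what is left. A minimal Python sketch follows, assuming the remaining quarters are projected from the latest actual at a constant quarterly growth rate; that rule is deliberately simple, and the function name and figures are hypothetical.

```python
def reforecast_year(actual_quarters: list[float], quarterly_growth: float) -> float:
    """Full-year estimate: actuals to date plus a projection of the remaining quarters."""
    if not 1 <= len(actual_quarters) <= 4:
        raise ValueError("expected between 1 and 4 quarters of actuals")
    total = sum(actual_quarters)
    run_rate = actual_quarters[-1]  # project forward from the latest quarter
    remaining = 4 - len(actual_quarters)
    for step in range(1, remaining + 1):
        total += run_rate * (1 + quarterly_growth) ** step
    return total


# Illustrative actuals against a budget that assumed 250k per quarter.
print(f"after Q1: {reforecast_year([260_000], 0.05):,.0f}")
print(f"after Q2: {reforecast_year([260_000, 310_000], 0.08):,.0f}")
print(f"after Q3: {reforecast_year([260_000, 310_000, 345_000], 0.08):,.0f}")
```

Each run anchors more of the year in actuals, which is why the sequence of quarterly forecasts converges where a single Q1 estimate cannot.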

What to monitor

Operationally, what to watch:

Cost per active user. Should be roughly constant if the architecture is stable.
Cost per call, by workload. Should be falling as caching, routing, and optimisation work compounds.
Cost per workload. Should reflect the workload's intended scope.
Provider mix. Drift toward one provider is a contracting risk.
Model mix. Drift toward more expensive models without quality justification is a cost risk.
Cache hit rates. Falling cache hit rates indicate something has changed.
Outlier users or workloads. Should be investigated.

Each of these has thresholds; threshold breaches trigger investigation.
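A threshold check over unit metrics can be very small. This Python sketch compares current values with a baseline and flags any metric whose relative change exceeds its limit. The metric names, baselines, and limits are hypothetical, and treating movement in either direction as a breach is a simplification: a falling cache hit rate or a rising cost per call matters more than the reverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_relative_change: float  # e.g. 0.15 = investigate beyond +/-15% of baseline


def breaches(baseline: dict[str, float], current: dict[str, float],
             thresholds: list[Threshold]) -> list[str]:
    """Return a description of each metric that moved beyond its threshold."""
    flagged = []
    for t in thresholds:
        base = baseline[t.metric]
        change = (current[t.metric] - base) / base
        if abs(change) > t.max_relative_change:
            flagged.append(f"{t.metric}: {change:+.1%} vs baseline "
                           f"(limit {t.max_relative_change:.0%})")
    return flagged


baseline = {"cost_per_active_user": 4.00, "cost_per_call": 0.012, "cache_hit_rate": 0.60}
current = {"cost_per_active_user": 5.00, "cost_per_call": 0.011, "cache_hit_rate": 0.58}
limits = [
    Threshold("cost_per_active_user", 0.15),
    Threshold("cost_per_call", 0.20),
    Threshold("cache_hit_rate", 0.10),
]
for line in breaches(baseline, current, limits):
    print("investigate:", line)
```

In this example only cost per active user breaches its limit (+25.0% against 15%), so that is the one line raised for investigation.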

What's hard to forecast

A few things that resist forecasting:

Provider pricing changes

Through 2024, prices fell consistently. In 2025 the trajectory is less clear: some providers may raise prices while others lower them. Don't assume continued declines.

Sudden capability changes

A new capability arrives that changes what workloads make sense. Reasoning models in 2024 were a discontinuity; the next one may be too.

Adoption surprises

One workload exceeds adoption expectations by 10x; another underperforms. Both happen. The aggregate may be predictable; the specifics aren't.

Regulatory changes

Compliance requirements may add cost or constrain choices. The EU AI Act is one example; others will follow.

Internal organisational changes

Restructures, leadership changes, and strategic pivots all affect AI workloads. Plans assume continuity; reality rarely provides it.

What we keep seeing

Patterns in enterprise AI cost forecasting:

Forecasts get more accurate with discipline. The first forecast is rough; the second incorporates lessons from the first; by the third year, the forecasting capability is reliable.

Provider price reductions are partially captured. Teams negotiate; teams optimise. They capture some of the reduction; some of it stays with the providers as margin.

Adoption is the largest single variance source. Workloads that ship and gain rapid adoption blow forecasts. Workloads that ship and underperform produce under-utilised capacity.

The platform layer drifts up. As capabilities expand, the platform's costs grow. Forecasters tend to underestimate this drift.

Vendor costs accumulate. Multiple AI vendors, each with growing usage. Audits surface what aggregate forecasts missed.

What we recommend

For enterprise teams forecasting AI costs in 2025:

Forecast bottom-up for established workloads; use a top-down envelope for the unknowns.
Build three scenarios. Communicate the range honestly.
Identify the key drivers; build models that update as drivers update.
Reforecast quarterly. Annual forecasts are inadequate for AI's pace.
Monitor unit economics, not just totals. Drift surfaces in unit metrics first.
Allow for capability evolution. Reserve some budget for things that don't exist yet.
Audit vendor sprawl. Aggregate from individual contracts, not from the IT department's perspective alone.

Enterprise AI cost forecasting in 2025 is harder than conventional IT forecasting, but it can be done reasonably well. The teams that approach it as a discipline, with methods, monitoring, and quarterly reforecasting, produce useful estimates that improve over time. The teams that produce annual point estimates and never update them produce theatre, which discredits the FinOps practice and damages the AI initiative.

