Forecasting Enterprise AI Costs — Methods That Hold Up
Annual budgeting for AI workloads is hard. The costs have multiple drivers, the usage patterns change, the technology moves. A practitioner view of forecasting methods that produce useful estimates instead of theatre.
By 2025, AI workloads are a meaningful line in enterprise IT budgets. Forecasting them is harder than forecasting conventional IT: costs depend on usage patterns, on model choice, on provider pricing changes, and on the success or failure of capabilities that don't yet exist. Most of the enterprise finance teams we work with have asked us, in one form or another, how to forecast these costs.
This piece is a practitioner view of how to forecast enterprise AI costs in 2025 — what to model, what to ignore, and what level of certainty is realistic.
What makes AI cost forecasting different
A few properties that make AI costs harder to predict than conventional IT:
Usage-driven, not capacity-driven
A traditional infrastructure cost is mostly capacity — you buy or rent capacity, you use it or you don't. AI inference cost is usage-driven — every call has a cost; cost scales with usage; usage depends on user behaviour.
Multiple price curves
Provider pricing changes. Multi-provider deployments combine different curves. Self-hosted inference has a different cost shape from hosted. Within hosted, batch is priced differently from real-time.
Latency-cost trade-offs
Some workloads pay more for lower latency (dedicated capacity, frontier models). Others can use cheaper batch options. The mix matters.
Adoption uncertainty
How widely a new AI capability will be adopted determines the cost. Conservative adoption assumptions produce conservative forecasts; forecasts built on aggressive adoption assumptions can be 5x higher.
Capability evolution
The capabilities you'll be using in 18 months may not exist now. Forecasts that leave no room for new line items will miss.
The forecasting components
A working AI cost forecast separates several components:
Existing workload costs
For workloads already in production:
- Current cost (from invoices and platform metrics)
- Expected usage growth (from user adoption curves)
- Expected pricing changes (from provider relationships)
- Expected optimisation savings (from planned work)
This is the most predictable component. For existing workloads, forecasting to within 20% over a year is achievable.
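The four inputs above can be combined into a single projection. The sketch below is a minimal illustration, assuming the factors compound multiplicatively; the function name and every figure in the example are hypothetical, and the 20% band simply reuses the accuracy figure quoted above.

```python
def project_existing_workload(
    current_annual_cost: float,
    usage_growth: float,
    price_change: float,
    optimisation_savings: float,
) -> float:
    """Project next year's cost for a workload already in production.

    All rates are fractions: 0.40 means +40%, -0.10 means -10%.
    Assumption: the three effects compound multiplicatively.
    """
    return (
        current_annual_cost
        * (1 + usage_growth)
        * (1 + price_change)
        * (1 - optimisation_savings)
    )


# Hypothetical workload: 120k last year, 40% more usage expected,
# a 10% price cut expected, 15% savings from planned optimisation work.
projected = project_existing_workload(
    120_000, usage_growth=0.40, price_change=-0.10, optimisation_savings=0.15
)

# Report the point estimate together with a +/-20% planning band.
low, high = projected * 0.8, projected * 1.2
print(f"projected: {projected:,.0f} (band {low:,.0f} to {high:,.0f})")
```

Keeping the inputs separate makes it easy to see which assumption moved when the projection is revisited.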
New workload pipeline
For workloads not yet in production:
- Workloads with funded plans
- Expected go-live dates
- Estimated unit economics
- Expected usage at maturity
This is harder to predict. Some planned workloads ship; some slip; some are cancelled; some change scope. Allow significant variance.
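One way to make that variance explicit is to cost each planned workload under an on-time, a slipped, and a cancelled outcome. The sketch below is a simplification under stated assumptions: the class and function names are hypothetical, the workload is charged at its maturity run-rate from go-live (which ignores ramp-up and so overstates the first months), and the three-month slip is an arbitrary example.

```python
from dataclasses import dataclass


@dataclass
class PlannedWorkload:
    name: str
    go_live_month: int               # 1-12 within the budget year
    monthly_cost_at_maturity: float  # from the estimated unit economics


def in_year_cost(workload: PlannedWorkload, slip_months: int = 0) -> float:
    """Cost landing inside the budget year if go-live slips by slip_months."""
    live_from = workload.go_live_month + slip_months
    months_live = max(0, 13 - live_from)  # go-live in month 12 gives 1 month
    return months_live * workload.monthly_cost_at_maturity


# Hypothetical funded workload going live in April at 10k per month.
assistant = PlannedWorkload("support-assistant", 4, 10_000)

outcomes = {
    "on_time": in_year_cost(assistant),
    "slips_3_months": in_year_cost(assistant, slip_months=3),
    "cancelled": 0.0,
}
for label, cost in outcomes.items():
    print(f"{label}: {cost:,.0f}")
```

Presenting the spread between these outcomes, rather than only the on-time figure, is what "allow significant variance" looks like in a spreadsheet.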
Capability evolution
For capabilities that may emerge:
- Reasoning models becoming more valuable
- Multimodal use cases expanding
- Agent workloads scaling
- Cost-curve shifts
This is genuinely uncertain. Model it as a range rather than a number.
Platform and infrastructure costs
For the shared infrastructure:
- Vector store hosting
- Self-hosted inference if applicable
- Observability and evaluation infrastructure
- Governance and audit infrastructure
These behave more like conventional infrastructure costs: capacity-driven within ranges.
Vendor and licensing
For commercial AI tools:
- Existing licence costs
- Expected renewals
- New vendor adoptions
Most enterprises will see vendor sprawl grow. Plan for it.
The methods that work
A few methods we have seen produce useful forecasts:
Bottom-up workload modelling
Start from each workload. Estimate users, calls per user, tokens per call, cost per token. Aggregate.
This works for established workloads with stable patterns. It's less reliable for new workloads where the inputs are guesses.
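The arithmetic is a straight multiplication per workload followed by a sum. The sketch below assumes a single blended price per million tokens and a fixed number of active days per year; the workload names and all figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    users: int
    calls_per_user_per_day: float
    tokens_per_call: float
    price_per_million_tokens: float  # blended input/output price (assumption)
    active_days_per_year: int


def annual_cost(w: Workload) -> float:
    """Users x calls x days x tokens, priced per million tokens."""
    calls = w.users * w.calls_per_user_per_day * w.active_days_per_year
    tokens = calls * w.tokens_per_call
    return tokens / 1_000_000 * w.price_per_million_tokens


# Hypothetical workloads with stable, measured usage patterns.
workloads = [
    Workload("internal-search", 2_000, 12, 1_500, 4.0, 220),
    Workload("document-review", 50, 200, 3_000, 10.0, 250),
]

per_workload = {w.name: annual_cost(w) for w in workloads}
total = sum(per_workload.values())

for name, cost in per_workload.items():
    print(f"{name}: {cost:,.0f}")
print(f"total: {total:,.0f}")
```

For a new workload every field in the record is a guess, which is why the same arithmetic is less reliable there.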
Top-down envelope
Set a target spend (as a percentage of IT budget, or as an absolute number) and allocate within. Force discipline; don't let line items grow beyond the envelope.
This works for environments with strong cost discipline. The total is predictable, but the forecast may be wrong about where within the envelope the spending lands.
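A simple way to enforce an envelope is to scale requests down when their sum exceeds it. The sketch below uses pro-rata scaling, which is only one possible allocation rule and is an assumption here; the function name, line items, and amounts are hypothetical.

```python
def fit_to_envelope(requests: dict[str, float], envelope: float) -> dict[str, float]:
    """Scale line-item requests pro rata so their sum does not exceed the envelope."""
    total = sum(requests.values())
    if total <= envelope:
        return dict(requests)  # everything fits; leave requests unchanged
    scale = envelope / total
    return {name: amount * scale for name, amount in requests.items()}


# Hypothetical requests totalling 625k against a 500k envelope.
requests = {
    "customer-assistant": 300_000,
    "internal-search": 200_000,
    "experiments": 125_000,
}
allocation = fit_to_envelope(requests, envelope=500_000)

for name, amount in allocation.items():
    print(f"{name}: {amount:,.0f}")
```

In practice the cut is rarely uniform; a priority-ordered allocation would replace the single scale factor.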
Three-scenario forecasting
Build three forecasts: conservative, expected, aggressive. Each makes explicit assumptions about adoption, pricing, and capability evolution. The range is the forecast; the point estimate is the centre.
This is the method we recommend for most enterprises. The range communicates the uncertainty honestly; the centre is a working assumption.
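A minimal sketch of the three-scenario structure follows. The scenario parameters (an adoption multiplier, a price change, and a reserve for new capabilities) are one hypothetical way to encode the assumptions named above, and every number is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    adoption_multiplier: float   # usage relative to today's baseline
    price_change: float          # fraction, e.g. -0.10 for a 10% price cut
    new_capability_spend: float  # reserve for line items that don't exist yet


def forecast(baseline_annual_cost: float, s: Scenario) -> float:
    """Apply one scenario's explicit assumptions to the current baseline."""
    scaled = baseline_annual_cost * s.adoption_multiplier * (1 + s.price_change)
    return scaled + s.new_capability_spend


# Hypothetical assumptions; each scenario states its own.
scenarios = {
    "conservative": Scenario(1.1, -0.20, 0),
    "expected": Scenario(1.5, -0.10, 50_000),
    "aggressive": Scenario(2.5, 0.0, 150_000),
}

baseline = 400_000
results = {name: forecast(baseline, s) for name, s in scenarios.items()}

# The range is the forecast; the expected case is the working assumption.
print(f"range: {results['conservative']:,.0f} to {results['aggressive']:,.0f}")
print(f"centre: {results['expected']:,.0f}")
```

Writing the assumptions down as data makes it clear what would have to be true for each end of the range to occur.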
Driver-based modelling
Identify the key drivers — number of active users, calls per user per day, average tokens per call, ratio of model tiers. Build the forecast as a function of these drivers. Update as drivers update.
This works when the drivers are measurable and have history. It can be the engine behind the three-scenario approach.
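The sketch below expresses monthly cost as a function of the drivers listed above, including the model tier ratio. The tier names, prices per million tokens, and driver values are hypothetical; the point is that changing one driver produces a new forecast without rebuilding the model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Drivers:
    active_users: int
    calls_per_user_per_day: float
    avg_tokens_per_call: float
    tier_mix: dict    # share of traffic per model tier, summing to 1.0
    tier_price: dict  # price per million tokens per tier (assumed values)
    days_per_month: int = 30


def monthly_cost(d: Drivers) -> float:
    """Monthly cost as a pure function of the measured drivers."""
    tokens = (
        d.active_users
        * d.calls_per_user_per_day
        * d.days_per_month
        * d.avg_tokens_per_call
    )
    blended_price = sum(share * d.tier_price[tier] for tier, share in d.tier_mix.items())
    return tokens / 1_000_000 * blended_price


current = Drivers(
    active_users=1_000,
    calls_per_user_per_day=10,
    avg_tokens_per_call=2_000,
    tier_mix={"small": 0.8, "large": 0.2},
    tier_price={"small": 1.0, "large": 10.0},
)
baseline_cost = monthly_cost(current)

# A driver update: traffic drifts toward the more expensive tier.
drifted = replace(current, tier_mix={"small": 0.6, "large": 0.4})
drifted_cost = monthly_cost(drifted)

print(f"baseline: {baseline_cost:,.0f}  after mix drift: {drifted_cost:,.0f}")
```

Feeding conservative, expected, and aggressive driver values into the same function is one way to use this as the engine behind three scenarios.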
Reforecast quarterly
A forecast built in Q1 will be wrong by Q3. Update it quarterly. A forecast that is corrected each quarter tracks actual spend over the year more closely than a single annual point estimate.
This is the discipline that distinguishes useful forecasting from theatre.
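A quarterly reforecast can be as simple as replacing elapsed quarters with actuals and adjusting the remainder. The sketch below assumes the remaining quarters will miss by the same ratio observed so far, which is a crude rule chosen for illustration; the function name and figures are hypothetical.

```python
def reforecast_full_year(original_quarters: list[float], actuals: list[float]) -> float:
    """Full-year estimate: actuals to date plus the variance-adjusted remainder.

    Assumption: remaining quarters deviate from plan by the same ratio
    as the quarters already observed.
    """
    if len(actuals) > len(original_quarters):
        raise ValueError("more actuals than forecast quarters")
    if not actuals:
        return sum(original_quarters)

    elapsed = len(actuals)
    forecast_to_date = sum(original_quarters[:elapsed])
    variance_ratio = sum(actuals) / forecast_to_date
    remaining = sum(original_quarters[elapsed:]) * variance_ratio
    return sum(actuals) + remaining


# Hypothetical plan built in Q1, with two quarters of actuals running hot.
plan = [100_000, 120_000, 140_000, 160_000]
actuals = [115_000, 150_000]

original_total = reforecast_full_year(plan, [])
updated_total = reforecast_full_year(plan, actuals)

print(f"original: {original_total:,.0f}  reforecast: {updated_total:,.0f}")
```

A driver-based model would replace the single variance ratio with updated driver values, but the quarterly cadence is the same.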
What to monitor
Operationally, what to watch:
- Cost per active user. Should be roughly constant if the architecture is stable.
- Cost per call, by workload. Should be falling as caching, routing, and optimisation work compounds.
- Cost per workload. Should reflect the workload's intended scope.
- Provider mix. Drift toward one provider is a contracting risk.
- Model mix. Drift toward more expensive models without quality justification is a cost risk.
- Cache hit rates. Falling cache hit rates indicate something has changed.
- Outlier users or workloads. Should be investigated.
Each of these has thresholds; threshold breaches trigger investigation.
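A threshold check over these metrics can be sketched as follows. The metric names, threshold values, and observed figures are all hypothetical; real thresholds would come from each workload's own history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    metric: str
    minimum: Optional[float] = None  # breach if the value falls below this
    maximum: Optional[float] = None  # breach if the value rises above this


def find_breaches(observed: dict, thresholds: list) -> list:
    """Return (metric, value, reason) for every metric outside its band."""
    breaches = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            continue  # metric not reported this period
        if t.minimum is not None and value < t.minimum:
            breaches.append((t.metric, value, f"below {t.minimum}"))
        elif t.maximum is not None and value > t.maximum:
            breaches.append((t.metric, value, f"above {t.maximum}"))
    return breaches


# Hypothetical thresholds and one period's observations.
thresholds = [
    Threshold("cost_per_active_user", maximum=4.00),
    Threshold("cache_hit_rate", minimum=0.50),
    Threshold("top_provider_share", maximum=0.80),
]
observed = {
    "cost_per_active_user": 3.10,
    "cache_hit_rate": 0.42,
    "top_provider_share": 0.85,
}

breaches = find_breaches(observed, thresholds)
for metric, value, reason in breaches:
    print(f"investigate {metric}: {value} is {reason}")
```

A breach here only opens an investigation; it says nothing yet about the cause.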
What's hard to forecast
A few things that resist forecasting:
Provider pricing changes
Through 2024 prices fell consistently. In 2025 the trajectory is less clear. Some providers may raise prices; some may lower. Don't assume continued declines.
Sudden capability changes
A new capability arrives that changes what workloads make sense. Reasoning models in 2024 were a discontinuity; the next one may be too.
Adoption surprises
One workload exceeds adoption expectations by 10x; another underperforms. Both happen. The aggregate may be predictable; the specifics aren't.
Regulatory changes
Compliance requirements may add cost or constrain choices. The EU AI Act is one example; others will follow.
Internal organisational changes
Restructures, leadership changes, and strategic pivots all affect AI workloads. Plans assume continuity; reality rarely provides it.
What we keep seeing
Patterns in enterprise AI cost forecasting:
Forecasts get more accurate with discipline. The first forecast is rough; the second incorporates lessons from the first; by the third year, the forecasting capability is reliable.
Provider price reductions are partially captured. Teams negotiate; teams optimise. They capture some of the reduction; some of it stays with the providers as margin.
Adoption is the largest single source of variance. Workloads that ship and gain rapid adoption blow through their forecasts. Workloads that ship and underperform leave capacity under-utilised.
The platform layer drifts up. As capabilities expand, the platform's costs grow. Forecasters tend to underestimate this drift.
Vendor costs accumulate. Multiple AI vendors, each with growing usage. Audits surface what aggregate forecasts missed.
What we recommend
For enterprise teams forecasting AI costs in 2025:
- Forecast bottom-up for established workloads. Use a top-down envelope for the unknowns.
- Build three scenarios. Communicate the range honestly.
- Identify the key drivers; build models that update as drivers update.
- Reforecast quarterly. Annual forecasts are inadequate for AI's pace.
- Monitor unit economics, not just totals. Drift surfaces in unit metrics first.
- Allow for capability evolution. Reserve some budget for things that don't exist yet.
- Audit vendor sprawl. Aggregate from individual contracts, not from the IT department's perspective alone.
Enterprise AI cost forecasting in 2025 is harder than conventional IT forecasting, but it can still be done reasonably well. The teams that approach it as a discipline, with methods, monitoring, and quarterly reforecasting, produce useful estimates that improve over time. The teams that produce annual point estimates and never update them produce theatre, which discredits the FinOps practice and damages the AI initiative.