AI & Enterprise AI · 9 January 2024 · 9 min read

LLM Integration Patterns for Enterprise Applications

Most LLM proofs of concept work in a notebook and break in production. The patterns that survive deployment are not exotic — they're the ones built on enterprise integration discipline most teams already have.

By Intellectual AI Engineering Practice · Collective byline

A year on from the first wave of enterprise LLM proofs of concept, the picture is clearer. The PoCs that worked in a notebook tend not to survive contact with production. The deployments that ship and stay shipped look like enterprise integration work, with an LLM as one of several services in the data path.

This piece is a practitioner view of the integration patterns that actually carry weight in production — what they are, where they fit, and how they fail when treated as anything other than the integration problem they are.

The shape of the problem

A useful enterprise LLM workload almost never reduces to "send a prompt, render a completion." The minimal production path looks more like this:

A user or upstream system poses a question or initiates an action.
The system gathers the context the LLM needs — documents, structured data from line-of-business systems, recent user activity, policy text.
The system constructs a prompt that bundles the question with the gathered context and any instructions, system rules, or constraints.
The LLM call goes out, often through a managed endpoint (for example OpenAI, Azure OpenAI, or Amazon Bedrock).
The response comes back as text. Sometimes that text is the answer. Sometimes it is a structured plan that needs to be parsed, validated, and acted on.
The system records the interaction for audit, observability, and improvement.

Every one of those steps is an integration step. The LLM is one of many components. The discipline that produces good enterprise integration produces good enterprise LLM systems.
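As a rough illustration of that path, here is a minimal Python sketch. The names `handle`, `gather`, `call_llm`, and `Interaction` are hypothetical, the LLM is passed in as a plain callable rather than any specific vendor SDK, and response parsing is left out for brevity.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Interaction:
    question: str
    context: list[str]
    prompt: str
    response: str
    elapsed_s: float


def handle(
    question: str,
    gather: Callable[[str], list[str]],
    call_llm: Callable[[str], str],
    audit_log: list[dict],
) -> Interaction:
    # Gather the context the model needs (documents, records, policy text).
    context = gather(question)
    # Bundle instructions, context, and the question into one prompt.
    prompt = "\n\n".join(
        ["Answer using only the context provided.", *context, f"Question: {question}"]
    )
    # Call the model and time the call for observability.
    started = time.monotonic()
    response = call_llm(prompt)
    record = Interaction(question, context, prompt, response, time.monotonic() - started)
    # Record the full interaction for audit and later improvement.
    audit_log.append(asdict(record))
    return record
```

Everything except the single `call_llm` line is ordinary integration code, which is the point of the section.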

Pattern 1 — Retrieval-augmented prompt construction

This is the workhorse pattern of the current generation. A query comes in; the system fetches relevant context from a knowledge base; that context is injected into the prompt; the LLM responds grounded in the retrieved material.

The integration considerations are familiar to anyone who has built an enterprise search or recommendation system:

The retrieval source has to be enterprise-grade. Document stores need to support access controls, lineage, lifecycle, deduplication. A vector index alone is not a knowledge base; it is one of several indices over a knowledge base.
The context window is a shared resource. Every component competing for tokens — system rules, retrieved chunks, conversation history, output formatting hints — needs an allocation policy. Teams that don't think about this end up with prompts that quietly truncate critical instructions.
Latency is composed of three calls minimum. Embedding the query, hitting the vector store, calling the LLM. Each has its own variance. End-to-end tail latency is driven by the tail of each call rather than by its average, so budget against per-call p99s, not means.
Caching matters at every layer. Embeddings can be cached by canonical query form. Retrieved contexts can be cached by query embedding. LLM responses can be cached for deterministic prompts. The cache strategy often has more impact on cost than model choice does.
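To make the allocation-policy point concrete, here is a minimal sketch of one possible policy. It is an assumption, not a recommendation: system rules are never truncated, retrieved chunks get a fixed share of what remains, and conversation history gets the rest, newest turns first. `count_tokens` is a crude whitespace stand-in for the model's real tokenizer, and all function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Stand-in only: a real system would use the target model's tokenizer.
    return len(text.split())


def take_within(items: list[str], budget: int) -> tuple[list[str], int]:
    # Keep items in order until the next one would exceed the budget.
    kept, used = [], 0
    for item in items:
        cost = count_tokens(item)
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept, used


def build_context(
    system_rules: str,
    chunks: list[str],
    history: list[str],
    window: int,
    output_reserve: int,
    chunk_share: float = 0.7,
) -> dict:
    # System rules are never truncated; fail loudly if they cannot fit.
    available = window - output_reserve - count_tokens(system_rules)
    if available < 0:
        raise ValueError("system rules and output reserve exceed the context window")
    # Retrieved chunks (assumed ranked best first) get a fixed share.
    kept_chunks, used = take_within(chunks, int(available * chunk_share))
    # History gets whatever is left, filled from the newest turn backwards.
    kept_history, _ = take_within(list(reversed(history)), available - used)
    kept_history.reverse()
    return {"system": system_rules, "chunks": kept_chunks, "history": kept_history}
```

The useful property is that truncation becomes an explicit decision with a defined loser, rather than something the endpoint does silently to the end of the prompt.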

This is the pattern most teams reach for first. It is also the pattern most teams overestimate the simplicity of. A working retrieval-augmented system in production is integration engineering work, not prompt engineering.
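The embedding layer of the caching described above might be sketched as follows. The canonicalisation rule (lowercase, collapsed whitespace) and the names `canonical`, `cache_key`, and `EmbeddingCache` are assumptions for illustration; the cache is an in-memory dict where a real deployment would likely use a shared store.

```python
import hashlib
import re
from typing import Callable


def canonical(query: str) -> str:
    # Assumed canonical form: trimmed, lowercased, whitespace collapsed.
    return re.sub(r"\s+", " ", query.strip().lower())


def cache_key(model_version: str, query: str) -> str:
    # Include the model version so a model change never serves stale vectors.
    raw = f"{model_version}\x00{canonical(query)}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class EmbeddingCache:
    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed
        self._store: dict[str, list[float]] = {}
        self.misses = 0

    def get(self, model_version: str, query: str) -> list[float]:
        key = cache_key(model_version, query)
        if key not in self._store:
            # Only a cache miss pays for an embedding call.
            self.misses += 1
            self._store[key] = self._embed(canonical(query))
        return self._store[key]
```

Keying on the model version ties the cache to the version-pinning discipline: an upgrade invalidates the cache by construction instead of by someone remembering to flush it.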

Pattern 2 — Function calling and structured output

The function-calling capability that matured through 2023 changed what enterprise LLMs are useful for. The model no longer has to generate natural language and hope the downstream parser succeeds; it can generate a structured call that the system invokes deterministically.

The pattern in enterprise terms:

The system advertises a set of available functions to the LLM, with their parameter schemas.
The LLM, given a user request, decides which function to call and produces structured arguments.
The application validates the arguments, executes the call, and either returns the result to the LLM (for a follow-up step) or to the user.
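The advertise, validate, execute steps above can be sketched without reference to any vendor's function-calling API. In this illustration the catalogue, the call format (`{"name": ..., "arguments": {...}}`), and the `lookup_order` function in the usage note are all hypothetical, and the schema is reduced to a name-to-type mapping.

```python
from typing import Any, Callable


class FunctionCatalogue:
    def __init__(self) -> None:
        self._functions: dict[str, tuple[dict[str, type], Callable[..., Any]]] = {}

    def expose(self, name: str, params: dict[str, type], handler: Callable[..., Any]) -> None:
        # Registering a function is the act that grants the model a capability.
        self._functions[name] = (params, handler)

    def dispatch(self, call: dict[str, Any]) -> Any:
        name = call.get("name")
        if name not in self._functions:
            raise ValueError(f"function not in catalogue: {name!r}")
        params, handler = self._functions[name]
        args = call.get("arguments")
        # Reject missing or unexpected arguments before anything executes.
        if not isinstance(args, dict) or set(args) != set(params):
            raise ValueError(f"arguments for {name} must be exactly {sorted(params)}")
        for key, expected in params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return handler(**args)
```

A caller would register with something like `catalogue.expose("lookup_order", {"order_id": str}, handler)` and pass the model's parsed output to `dispatch`. The model only ever proposes a call; whether it runs is decided by deterministic code.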

This makes LLMs viable for workflows that previously required either deterministic code or human routing — "draft a permit application from this email", "look up the customer's order history and find the relevant returns policy", "schedule the meeting based on this thread."

What teams underestimate:

The function catalogue needs governance. Every function the LLM can call is an action the LLM can take. The same approval discipline that gates API publication should gate function exposure. Casual additions of functions create casual exposure of capability.
Schema design is part of prompt design. A function with vague parameter names produces vague calls. A function whose schema explains its preconditions and side effects produces sharper calls. Schema authoring is now a prompt-engineering surface.
Idempotency and side effects matter more. When the LLM is the orchestrator, the cost of a duplicate call or a misordered sequence is borne by the downstream system. Functions exposed to LLMs should be idempotent or guarded with explicit confirmation flows.
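One common way to guard against duplicate calls is an idempotency key: the caller derives a stable key for the intended action, and a repeat with the same key returns the stored result instead of executing again. The sketch below is an in-memory illustration with hypothetical names; a production version would need a durable, shared store and a policy for failed calls.

```python
from typing import Any, Callable


class IdempotentExecutor:
    def __init__(self) -> None:
        self._completed: dict[str, Any] = {}

    def run(self, key: str, action: Callable[[], Any]) -> Any:
        # A repeated key returns the first result without re-running the action.
        if key in self._completed:
            return self._completed[key]
        result = action()
        # Only successful calls are recorded; an exception leaves the key unused.
        self._completed[key] = result
        return result
```

If the model retries or repeats a step, the downstream system sees one execution per key rather than one per LLM call.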

Pattern 3 — Agentic loops with tool use

The agentic pattern — where the LLM produces a plan, executes it through tool calls, observes results, and iterates — has moved from research demos to early enterprise deployments through 2023. The reality is more constrained than the marketing suggests.

In production, the loops we see working are:

Bounded. A maximum step count, a maximum cost budget, a maximum wall-clock time. Open-ended agents that "keep going until the task is done" are not deployable; bounded agents that produce a clear "I could not complete this, escalating" output are.
Observable. Every step the agent takes is logged with the LLM call, the chosen tool, the arguments, and the result. Without this, debugging is impossible and audit trails do not exist.
Human-in-the-loop on irreversible actions. Agents can read freely. They cannot write, send, transact, or modify external state without a human checkpoint. This is not a temporary limit; it is the operating posture for the foreseeable future.
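A minimal sketch of a loop with those three properties follows. The planner is a hypothetical callable that returns either a tool call or a final answer, along with a self-reported cost; the limits, tool names, and `approve` callback are illustrative assumptions rather than any framework's API.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Limits:
    max_steps: int = 5
    max_cost: float = 1.0
    max_seconds: float = 30.0


def escalate(reason: str) -> dict[str, str]:
    return {"status": "escalated", "reason": reason}


def run_agent(
    plan: Callable[[list[dict]], dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    read_only: set[str],
    approve: Callable[[str, dict], bool],
    limits: Limits,
    trace: list[dict],
) -> dict[str, Any]:
    started, spent = time.monotonic(), 0.0
    for step in range(limits.max_steps):
        # Bounded: check wall-clock and cost budgets on every iteration.
        if time.monotonic() - started > limits.max_seconds:
            return escalate("time budget exceeded")
        decision = plan(trace)
        spent += decision.get("cost", 0.0)
        if spent > limits.max_cost:
            return escalate("cost budget exceeded")
        if "final" in decision:
            return {"status": "done", "answer": decision["final"]}
        tool, args = decision["tool"], decision.get("args", {})
        # Observable: every attempted step is recorded, including refusals.
        entry: dict[str, Any] = {"step": step, "tool": tool, "args": args}
        trace.append(entry)
        if tool not in tools:
            entry["result"] = "unknown tool"
            return escalate(f"unknown tool: {tool}")
        # Human checkpoint: anything not marked read-only needs approval.
        if tool not in read_only and not approve(tool, args):
            entry["result"] = "rejected at human checkpoint"
            return escalate(f"{tool} not approved")
        entry["result"] = tools[tool](**args)
    return escalate("step limit reached")
```

Every exit path returns either a result or an explicit escalation, so the caller never has to infer whether the agent finished.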

The integration considerations here look like workflow engine considerations: compensating transactions, error handling, replay-on-failure, and partial-state recovery, the same patterns that distributed transaction frameworks have spent decades on. The right model for agentic systems is closer to BPM than to chatbots.

Pattern 4 — Document-grounded synthesis

A narrower but very common pattern: given a set of documents, produce a derived artifact — a summary, a comparison, a structured extraction, a translation. Document-grounded synthesis is where intelligent document processing meets generative AI.

The integration considerations:

Document ingestion has to be production-grade. Parsers for PDFs, scanned documents, structured forms, tables, mixed-language content. The quality of the LLM output is bounded by the quality of the extraction; a brilliant model reading garbled text produces garbled answers.
Output schema enforcement. When the target is structured (a JSON extraction, a CSV row, a regulatory submission field), the LLM output needs schema validation. Function calling with response schemas, or schema-validated parse-with-retry loops, are the production-viable approaches.
Version pinning matters. When the same prompt against the same document produces a slightly different answer next month because the model was silently updated, audit chains break. Pin model versions and treat upgrades as planned migrations.
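A schema-validated parse-with-retry loop can be sketched with the standard library alone. The field names (`party`, `effective_date`, `amount`), the prompt wording, and the `call_llm` callable are hypothetical; a real system would more likely validate against a full JSON Schema than against this simple type map.

```python
import json
from typing import Any, Callable

# Hypothetical target schema: field name -> accepted Python type(s).
REQUIRED: dict[str, Any] = {"party": str, "effective_date": str, "amount": (int, float)}


def validate(payload: Any) -> list[str]:
    if not isinstance(payload, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for key, expected in REQUIRED.items():
        if key not in payload:
            errors.append(f"missing field: {key}")
        elif not isinstance(payload[key], expected):
            errors.append(f"wrong type for field: {key}")
    return errors


def extract(call_llm: Callable[[str], str], document: str, max_attempts: int = 3) -> dict:
    base = f"Extract party, effective_date and amount as a JSON object.\n\n{document}"
    prompt = base
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            payload = json.loads(raw)
            errors = validate(payload)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        if not errors:
            return payload
        # Feed the specific validation failures back into the next attempt.
        prompt = f"{base}\n\nYour previous reply was rejected: {'; '.join(errors)}. Reply with JSON only."
    raise ValueError(f"no valid extraction after {max_attempts} attempts: {errors}")
```

The retry count is bounded, and the failure mode is an exception the surrounding workflow can route to a human rather than a malformed record written downstream.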

This pattern is where the early enterprise wins are clustering — intelligent document processing for compliance, contract analysis, regulatory submission drafting, KYC document review. It is also where the integration discipline is most decisive: the LLM is the smallest part of a system that is mostly about document workflows.

Pattern 5 — Hybrid deterministic-LLM pipelines

For tasks where the LLM is one step in a larger workflow, the integration pattern is the hybrid pipeline. Deterministic code handles the parts where rules are stable; the LLM handles the parts where natural-language understanding or generation is needed; the orchestration sits in a workflow engine.

In practice this looks like:

Pre-LLM normalisation. Deterministic code cleans, normalises, and routes the incoming payload. The LLM never sees a raw, malformed input.
LLM as a step, not the whole. The LLM call produces a specific output — a classification, a translation, an extraction, a draft — that the next deterministic step consumes.
Post-LLM validation. Deterministic code validates the LLM output against business rules. If validation fails, the system retries with a corrected prompt, escalates to a human, or falls back to a deterministic path.
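The three stages above might look like this for a simple classification step. The label set, the prompt, and the function names are hypothetical, and the fallback here is a hand-off to human review; a deterministic fallback path would slot into the same place.

```python
from typing import Callable, Optional

# Hypothetical label set enforced by the post-LLM business rule.
ALLOWED = {"billing", "returns", "other"}


def normalise(raw: str, max_chars: int = 2000) -> str:
    # Pre-LLM normalisation: collapse whitespace and cap the length.
    return " ".join(raw.split())[:max_chars]


def classify(raw: str, call_llm: Callable[[str], str], max_attempts: int = 2) -> dict[str, Optional[str]]:
    text = normalise(raw)
    if not text:
        # Deterministic rule: empty input never reaches the model.
        return {"label": "other", "source": "rule"}
    prompt = f"Classify the message as one of {sorted(ALLOWED)}. Reply with the label only.\n\n{text}"
    for _ in range(max_attempts):
        label = call_llm(prompt).strip().lower()
        # Post-LLM validation: accept only labels the next step can consume.
        if label in ALLOWED:
            return {"label": label, "source": "llm"}
    # Validation kept failing: escalate instead of guessing.
    return {"label": None, "source": "human_review"}
```

The `source` field records which path produced the result, which keeps the deterministic, LLM, and human routes distinguishable in downstream logs.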

This pattern is much closer to enterprise integration as practitioners have always done it. The LLM is a service on the bus. The orchestration is BPMN or an equivalent. The discipline that produces good integration produces good LLM-augmented integration.

What we recommend for teams starting now

For an enterprise team beginning serious LLM integration work in 2024:

Treat the LLM as a service in your existing integration architecture, not as a new architecture. The patterns above are integration patterns; your existing platform conventions apply.
Pin model versions. Treat model upgrades as planned migrations with regression test suites. The cost of an unexpected change in model behaviour in production is significant.
Build observability before scale. Every LLM call, every retrieval, every tool invocation should be logged with sufficient detail to reconstruct what happened and why. Without this, post-incident analysis is impossible.
Cost-aware design from day one. Tokens are not free. Caching, prompt minimisation, model routing (small models for easy queries, large models only where needed) are first-order concerns, not optimisations.
Human-in-the-loop posture by default for any system that takes irreversible action. Pure-autonomy agents are a research direction, not a production direction.
Apply your existing governance. API governance, change management, security review, data classification — none of these stop applying because the underlying technology is novel. The novelty is in the model; the production discipline is the same.
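Two of the recommendations above, version pinning and model routing, combine naturally in configuration. The sketch below is illustrative only: the model identifiers are made-up placeholders, and the routing heuristic (query length plus a `needs_tools` flag) is an assumption standing in for whatever signal a team finds reliable.

```python
# Hypothetical pinned identifiers; real ones come from the provider's catalogue.
PINNED_MODELS = {
    "small": "example-small@2024-01-15",
    "large": "example-large@2024-01-15",
}


def choose_tier(query: str, needs_tools: bool = False, long_query_words: int = 60) -> str:
    # Assumed heuristic: tool use or long queries go to the larger model.
    if needs_tools or len(query.split()) > long_query_words:
        return "large"
    return "small"


def resolve_model(query: str, needs_tools: bool = False) -> str:
    # Callers only ever see a pinned version, never a floating alias.
    return PINNED_MODELS[choose_tier(query, needs_tools)]
```

Because every call resolves through `PINNED_MODELS`, a model upgrade becomes a reviewed one-line change that can be gated on the regression suite.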

The LLM is the most interesting piece of the system, but it is not the differentiator. The differentiator is the integration discipline around it.
