AI & Enterprise AI30 July 20248 min read

LLM Security — Threats, Mitigations, and What Enterprise Teams Should Actually Do

The LLM security landscape in mid-2024 has more named threats than mature mitigations. A practitioner view of which threats deserve attention and which technical and operational controls actually reduce risk.

ByIntellectual AI Engineering Practice· Collective byline

The LLM security conversation has matured. OWASP published its LLM Top 10 last year; this year's update consolidates a clearer picture. The landscape now has more named threats than mature mitigations. The teams shipping production LLM systems are picking specific risks to address with specific controls, accepting some residual risk, and monitoring.

This piece is a practitioner view of the LLM security landscape in mid-2024 — which threats deserve attention in enterprise contexts, what controls actually reduce risk, and how to think about the residual risk that remains.

The threat landscape, simplified

A practical grouping of LLM security threats:

Direct attacks on the model

Prompt injection — the user input contains instructions that override the system's intent
Jailbreaking — techniques to elicit content the model would normally refuse
Model extraction — using the model's responses to reverse-engineer its weights or training data

Indirect attacks via content

Indirect prompt injection — content the model processes (web pages, documents) contains instructions targeted at the model
Data poisoning — training data or retrieval data is manipulated to influence the model's outputs

Application-level attacks

Permission escalation — using the model to access data or actions the user shouldn't have
Data exfiltration — using the model to extract sensitive data from the system
Tool abuse — using function calls in ways the system didn't anticipate

Resource attacks

Cost amplification — inputs that cause expensive operations
Denial of service — inputs that cause the system to hang or consume excessive resources

Operational threats

Supply chain — compromised models, libraries, or dependencies
Insider misuse — privileged users misusing the system
Audit failure — incidents that cannot be reconstructed because logging was inadequate

Not all of these matter equally for every workload. The risk profile depends on what the system does, what data it touches, and who can interact with it.

Mitigations that actually work

Input filtering

The first layer of defence. Before content reaches the model:

PII detection and redaction for inputs that include personal data
Prompt injection pattern detection using either heuristics or a small classifier
Length limits to prevent context exhaustion attacks
Encoding detection for inputs that try to smuggle instructions via encoded text

Input filtering is the cheapest layer and the highest-leverage. It catches the easy attacks before they reach the model.

Output filtering

After the model produces a response, before it returns to the user or downstream system:

PII detection and redaction in outputs
Content policy enforcement — does the output violate organisational policy
Schema enforcement for structured outputs
Citation validation — does cited content actually exist
Toxicity and bias screening for high-stakes outputs

Output filtering catches model failures that input filtering missed.

System prompt hardening

The system prompt is the model's persistent instructions. Hardening patterns:

Explicit refusal posture — what the model must not do, stated clearly
Role anchoring — the model is reminded of its role periodically
Boundary enforcement — what topics are out of scope
Injection resistance — instructions about how to handle inputs that look like instructions

System prompts don't make the model robust to all attacks but they raise the bar substantially.

Permission propagation

The user's identity travels with every function call. The function execution layer enforces permissions. Without this, the LLM can act as a privilege bypass.

Function design

Functions are designed with security in mind:

Minimum-privilege scope per function
Idempotency where possible
Argument validation as deterministic code
Audit logging of every call

Output validation for structured workloads

For workloads where the LLM produces structured output that downstream systems consume:

Schema validation
Business rule validation
Allowlist checks for entity references
Sanity bounds on numeric values

Validation catches model outputs that look right but aren't.

Sandboxing where the model executes code

If the model has the capability to execute code (e.g., code interpreter tools), the execution is sandboxed:

Restricted filesystem
No network access (or restricted)
Resource limits
Time limits
Output capture and validation

Rate limiting and circuit breakers

Per-user and per-workload limits prevent both abuse and runaway cost. Sudden cost spikes or unusual usage patterns trigger automatic shutoffs.

Audit logging at adequate depth

Every input, every output, every function call, every user identity, every model version — logged with consistent identifiers. Without this, post-incident analysis is impossible.

Red team testing

Periodic adversarial testing to find failure modes before users or attackers do. Findings inform the other layers.

Threats with weaker mitigations

Some threats remain hard to mitigate completely:

Indirect prompt injection

The state of the art for defending against indirect prompt injection is immature. The mitigations that exist:

Treat external content as data, not instructions, in the prompt assembly
Isolate processing of untrusted content from processing of user instructions
Output filtering catches some downstream effects

But these are partial. The threat surface is real and the residual risk is meaningful for workloads that process untrusted content.

Sophisticated jailbreaking

Capable attackers can find prompt patterns that bypass any specific defence. The defence is layered — system prompt, input filtering, output filtering, monitoring — rather than any single barrier.

Supply chain

The model, the framework, the dependencies — all potential supply chain risk. Mitigations follow conventional software supply chain practice: vendor due diligence, provenance verification, behaviour monitoring.

Data poisoning of retrieval corpus

If the retrieval corpus can be modified by users (e.g., a community knowledge base), the corpus can be poisoned. Mitigations are corpus governance and content curation, not technical filtering.

How to think about residual risk

Some risk remains after mitigation. The question is how to think about it.

A working framework:

Identify the worst-case incident for each threat category
Estimate the probability of that incident given the mitigations
Estimate the impact if it happens
Decide whether the residual risk is acceptable
Plan the incident response for the case where it happens anyway

This is conventional risk management applied to LLM-specific threats. The shape is familiar; the inputs are new.

What the OWASP LLM Top 10 captures

For reference, the OWASP LLM Top 10 (as of late 2023 and updated through 2024) lists:

Prompt Injection
Insecure Output Handling
Training Data Poisoning
Model Denial of Service
Supply Chain Vulnerabilities
Sensitive Information Disclosure
Insecure Plugin Design
Excessive Agency
Overreliance
Model Theft

The list is useful as a checklist for threat modelling. The relative importance of each item depends on the workload.

What we keep seeing

Recurring patterns in enterprise LLM security engagements:

Input and output filtering catch most issues. Teams that invest in these two layers prevent the bulk of incidents. The exotic threats matter less than the basic ones.

Permission propagation gaps are the most common application-level vulnerability. Across our engagements, we keep finding cases where the model's calls execute with broader permissions than the user has.

Indirect prompt injection is underestimated. Teams test for direct injection and miss the indirect surface. This will be the source of significant incidents over the next several years.

Audit gaps surface during incidents. A team responds to an incident and realises the trace doesn't reconstruct what happened. Audit retrofit is expensive; building it from the start is much cheaper.

Red team findings are valuable. Every red team engagement we run finds at least one issue worth fixing. The investment pays back.

What we recommend

For enterprise teams operating LLM systems in 2024:

Threat-model explicitly. Generic security thinking misses LLM-specific patterns.
Layer the defences. No single barrier is sufficient; the combination is.
Invest in input and output filtering first. Highest leverage.
Audit permission propagation carefully. It is the most common application-level vulnerability.
Treat indirect prompt injection as a primary concern in workloads that process untrusted content.
Build audit-grade logging from day one. Retrofit is painful.
Red team periodically. The findings sharpen the defences.
Accept residual risk explicitly. Plan incident response for the cases where mitigation fails.

LLM security in 2024 is not a solved problem. It is a managed problem. The teams that manage it deliberately ship systems that operate safely within their risk envelope. The teams that ship without disciplined threat modelling discover issues through incidents, which is much more expensive.

More from the field.

Service practices the article draws on, related programmes, and other pieces on adjacent topics.

Service practices

Service

AI & Intelligent Automation

/services/ai-solutions →

Service

enterprise-architecture

/services/enterprise-architecture →

Related pieces

14 May 20248 min read

Red Teaming Enterprise AI Systems — A Practitioner Playbook

Most enterprise AI systems are deployed without serious adversarial testing. The teams that ship with confidence are the ones that have tried to break their own system before users or attackers do.

6 August 20247 min read

AI Systems and Enterprise Identity — Where Most Deployments Cut Corners

Authentication and authorisation are conventional enterprise architecture topics. In AI systems they tend to be deferred, abbreviated, or wired up wrongly. A practitioner view of the patterns that actually hold up.

9 August 20228 min read

API Security Architecture

API security is a layered problem. The architecture that holds up treats the gateway, the transport, the authentication, the authorisation, the input handling, and the audit posture as separate concerns — each defended independently.

Industry

Government & Public Sector

Regulatory platforms, citizen services, and federal-grade integration.

Discuss this work

Bring an enterprise programme.

If anything in this piece resonates with what you're building, talk to us. Senior practitioners engage directly on architecture and delivery.

Contact Intellectual →

← Newer post

AI Systems and Enterprise Identity — Where Most Deployments Cut Corners

Older post →

From AI Pilot to Production — The Playbook That Bridges the Gap

Work with the practitioners

Bring an enterprise programme.

Architecture audit, new delivery, modernisation, or in-flight rescue — Intellectual engages directly on enterprise programmes with senior practitioners.

Contact Intellectual →Read more insights