LLM Security — Threats, Mitigations, and What Enterprise Teams Should Actually Do
The LLM security landscape in mid-2024 has more named threats than mature mitigations. A practitioner view of which threats deserve attention and which technical and operational controls actually reduce risk.
The LLM security conversation has matured. OWASP published its LLM Top 10 last year; this year's update consolidates a clearer picture. The landscape now has more named threats than mature mitigations. The teams shipping production LLM systems are picking specific risks to address with specific controls, accepting some residual risk, and monitoring.
This piece is a practitioner view of the LLM security landscape in mid-2024 — which threats deserve attention in enterprise contexts, what controls actually reduce risk, and how to think about the residual risk that remains.
The threat landscape, simplified
A practical grouping of LLM security threats:
Direct attacks on the model
- Prompt injection — the user input contains instructions that override the system's intent
- Jailbreaking — techniques to elicit content the model would normally refuse
- Model extraction — using the model's responses to reverse-engineer its weights or training data
Indirect attacks via content
- Indirect prompt injection — content the model processes (web pages, documents) contains instructions targeted at the model
- Data poisoning — training data or retrieval data is manipulated to influence the model's outputs
Application-level attacks
- Permission escalation — using the model to access data or actions the user shouldn't have
- Data exfiltration — using the model to extract sensitive data from the system
- Tool abuse — using function calls in ways the system didn't anticipate
Resource attacks
- Cost amplification — inputs that cause expensive operations
- Denial of service — inputs that cause the system to hang or consume excessive resources
Operational threats
- Supply chain — compromised models, libraries, or dependencies
- Insider misuse — privileged users misusing the system
- Audit failure — incidents that cannot be reconstructed because logging was inadequate
Not all of these matter equally for every workload. The risk profile depends on what the system does, what data it touches, and who can interact with it.
Mitigations that actually work
Input filtering
The first layer of defence. Before content reaches the model:
- PII detection and redaction for inputs that include personal data
- Prompt injection pattern detection using either heuristics or a small classifier
- Length limits to prevent context exhaustion attacks
- Encoding detection for inputs that try to smuggle instructions via encoded text
Input filtering is the cheapest layer and the highest-leverage. It catches the easy attacks before they reach the model.
Output filtering
After the model produces a response, before it returns to the user or downstream system:
- PII detection and redaction in outputs
- Content policy enforcement — does the output violate organisational policy
- Schema enforcement for structured outputs
- Citation validation — does cited content actually exist
- Toxicity and bias screening for high-stakes outputs
Output filtering catches model failures that input filtering missed.
System prompt hardening
The system prompt is the model's persistent instructions. Hardening patterns:
- Explicit refusal posture — what the model must not do, stated clearly
- Role anchoring — the model is reminded of its role periodically
- Boundary enforcement — what topics are out of scope
- Injection resistance — instructions about how to handle inputs that look like instructions
System prompts don't make the model robust to all attacks but they raise the bar substantially.
Permission propagation
The user's identity travels with every function call. The function execution layer enforces permissions. Without this, the LLM can act as a privilege bypass.
Function design
Functions are designed with security in mind:
- Minimum-privilege scope per function
- Idempotency where possible
- Argument validation as deterministic code
- Audit logging of every call
Output validation for structured workloads
For workloads where the LLM produces structured output that downstream systems consume:
- Schema validation
- Business rule validation
- Allowlist checks for entity references
- Sanity bounds on numeric values
Validation catches model outputs that look right but aren't.
Sandboxing where the model executes code
If the model has the capability to execute code (e.g., code interpreter tools), the execution is sandboxed:
- Restricted filesystem
- No network access (or restricted)
- Resource limits
- Time limits
- Output capture and validation
Rate limiting and circuit breakers
Per-user and per-workload limits prevent both abuse and runaway cost. Sudden cost spikes or unusual usage patterns trigger automatic shutoffs.
Audit logging at adequate depth
Every input, every output, every function call, every user identity, every model version — logged with consistent identifiers. Without this, post-incident analysis is impossible.
Red team testing
Periodic adversarial testing to find failure modes before users or attackers do. Findings inform the other layers.
Threats with weaker mitigations
Some threats remain hard to mitigate completely:
Indirect prompt injection
The state of the art for defending against indirect prompt injection is immature. The mitigations that exist:
- Treat external content as data, not instructions, in the prompt assembly
- Isolate processing of untrusted content from processing of user instructions
- Output filtering catches some downstream effects
But these are partial. The threat surface is real and the residual risk is meaningful for workloads that process untrusted content.
Sophisticated jailbreaking
Capable attackers can find prompt patterns that bypass any specific defence. The defence is layered — system prompt, input filtering, output filtering, monitoring — rather than any single barrier.
Supply chain
The model, the framework, the dependencies — all potential supply chain risk. Mitigations follow conventional software supply chain practice: vendor due diligence, provenance verification, behaviour monitoring.
Data poisoning of retrieval corpus
If the retrieval corpus can be modified by users (e.g., a community knowledge base), the corpus can be poisoned. Mitigations are corpus governance and content curation, not technical filtering.
How to think about residual risk
Some risk remains after mitigation. The question is how to think about it.
A working framework:
- Identify the worst-case incident for each threat category
- Estimate the probability of that incident given the mitigations
- Estimate the impact if it happens
- Decide whether the residual risk is acceptable
- Plan the incident response for the case where it happens anyway
This is conventional risk management applied to LLM-specific threats. The shape is familiar; the inputs are new.
What the OWASP LLM Top 10 captures
For reference, the OWASP LLM Top 10 (as of late 2023 and updated through 2024) lists:
- Prompt Injection
- Insecure Output Handling
- Training Data Poisoning
- Model Denial of Service
- Supply Chain Vulnerabilities
- Sensitive Information Disclosure
- Insecure Plugin Design
- Excessive Agency
- Overreliance
- Model Theft
The list is useful as a checklist for threat modelling. The relative importance of each item depends on the workload.
What we keep seeing
Recurring patterns in enterprise LLM security engagements:
Input and output filtering catch most issues. Teams that invest in these two layers prevent the bulk of incidents. The exotic threats matter less than the basic ones.
Permission propagation gaps are the most common application-level vulnerability. Across our engagements, we keep finding cases where the model's calls execute with broader permissions than the user has.
Indirect prompt injection is underestimated. Teams test for direct injection and miss the indirect surface. This will be the source of significant incidents over the next several years.
Audit gaps surface during incidents. A team responds to an incident and realises the trace doesn't reconstruct what happened. Audit retrofit is expensive; building it from the start is much cheaper.
Red team findings are valuable. Every red team engagement we run finds at least one issue worth fixing. The investment pays back.
What we recommend
For enterprise teams operating LLM systems in 2024:
- Threat-model explicitly. Generic security thinking misses LLM-specific patterns.
- Layer the defences. No single barrier is sufficient; the combination is.
- Invest in input and output filtering first. Highest leverage.
- Audit permission propagation carefully. It is the most common application-level vulnerability.
- Treat indirect prompt injection as a primary concern in workloads that process untrusted content.
- Build audit-grade logging from day one. Retrofit is painful.
- Red team periodically. The findings sharpen the defences.
- Accept residual risk explicitly. Plan incident response for the cases where mitigation fails.
LLM security in 2024 is not a solved problem. It is a managed problem. The teams that manage it deliberately ship systems that operate safely within their risk envelope. The teams that ship without disciplined threat modelling discover issues through incidents, which is much more expensive.
RELATED READING
More from the field.
Service practices the article draws on, related programmes, and other pieces on adjacent topics.
Service practices
Related pieces
Red Teaming Enterprise AI Systems — A Practitioner Playbook
Most enterprise AI systems are deployed without serious adversarial testing. The teams that ship with confidence are the ones that have tried to break their own system before users or attackers do.
AI Systems and Enterprise Identity — Where Most Deployments Cut Corners
Authentication and authorisation are conventional enterprise architecture topics. In AI systems they tend to be deferred, abbreviated, or wired up wrongly. A practitioner view of the patterns that actually hold up.
API Security Architecture
API security is a layered problem. The architecture that holds up treats the gateway, the transport, the authentication, the authorisation, the input handling, and the audit posture as separate concerns — each defended independently.
Discuss this work
Bring an enterprise programme.
If anything in this piece resonates with what you're building, talk to us. Senior practitioners engage directly on architecture and delivery.
Work with the practitioners
Bring an enterprise programme.
Architecture audit, new delivery, modernisation, or in-flight rescue — Intellectual engages directly on enterprise programmes with senior practitioners.