Computer Use and Browser Agents — Where the Threshold Sits
Between Anthropic's Computer Use and browser-control demos from OpenAI and others, the pattern of an AI agent controlling the screen crossed a threshold in late 2024. What is actually production-ready is much narrower than the demos suggest.
Anthropic's Computer Use feature, released in public beta last week, the browser-control demos from several other providers, and the older Adept lineage together mark a shift: the pattern of AI agents controlling the screen crossed a credibility threshold in late 2024. Demos work, and some real workloads are starting to ship. The enterprise picture is more constrained than the demos suggest, and the constraints are worth understanding before committing.
This piece is a practitioner view of where computer-use agents are actually production-viable in enterprise contexts, where they aren't, and how the category relates to existing RPA and automation patterns.
What computer use actually is
In current usage, computer use is the capability for an AI agent to:
- View screenshots of a screen
- Interpret what is on the screen
- Decide on next actions (click, type, scroll, navigate)
- Take those actions through a controlled interface
- Iterate until the task is complete or fails
The implementations vary in detail. Anthropic's Computer Use is a beta API capability in which Claude requests screenshots and issues mouse and keyboard actions against an environment the developer supplies, typically a container or virtual machine. Other providers have similar patterns with their own variations. Specialised products (Adept, Browserbase, and others) provide the surrounding infrastructure.
The capability is real. The reliability for production workloads is variable.
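The observe, decide, act, iterate loop listed above can be sketched in a few lines. This is a minimal illustration, not any provider's API: `run_agent`, `Action`, and the three callables (`take_screenshot`, `decide`, `execute`) are hypothetical names, and in a real deployment `decide` would wrap a model call while `execute` would drive a sandboxed browser or VM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    # Hypothetical action shape: "click", "type", "scroll", "navigate",
    # plus two terminal kinds, "done" and "fail".
    kind: str
    detail: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, list], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> tuple[str, list]:
    """Loop until the agent reports done/fail or the step budget runs out."""
    history: list = []
    for _ in range(max_steps):
        screen = take_screenshot()        # view the screen
        action = decide(screen, history)  # interpret it and pick the next action
        history.append(action)
        if action.kind in ("done", "fail"):
            return action.kind, history   # the agent declared a terminal state
        execute(action)                   # act through the controlled interface
    return "step_limit", history          # never loop forever


if __name__ == "__main__":
    # Stubbed run: a scripted "model" that clicks once and then finishes.
    script = iter([Action("click", "Submit button"), Action("done")])
    status, steps = run_agent(
        take_screenshot=lambda: b"",
        decide=lambda screen, history: next(script),
        execute=lambda action: None,
    )
    print(status, len(steps))
```

The explicit step budget is the one design choice worth keeping from this sketch: an agent that cannot finish should stop and report, not keep acting.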
Where computer use overlaps with RPA
Enterprise teams that have RPA programmes will see the obvious overlap. Both patterns:
- Drive UI surfaces of existing applications
- Replace human operators of those surfaces
- Handle tasks that don't have native APIs
The differences:
- Brittleness profile. Classical RPA is rule-based; small UI changes break it. AI-driven computer use is more resilient to UI variation but introduces non-determinism.
- Programming model. RPA bots are scripted. Computer use is instructed in natural language with feedback loops.
- Failure modes. RPA fails deterministically (the script crashes). Computer use can fail subtly (the agent does the wrong thing while reporting success).
- Reliability ceiling. Classical RPA, when the rules hold, is highly reliable. Computer use is less reliable per attempt but can recover from situations RPA can't.
The two patterns will coexist. For high-volume, stable, deterministic workflows, RPA remains the right answer. For lower-volume, variable, exception-heavy workflows, computer use is becoming a real alternative.
Where computer use is becoming production-viable
The use cases where deployment is starting to make sense:
Form filling across diverse sites
A user provides information once; the agent fills the same information into several different web forms — visa applications, supplier registrations, regulatory filings. Each form has a different layout; the agent adapts.
The reliability is acceptable for non-critical use cases. For critical filings, human review of the agent's actions remains necessary.
Account onboarding across multiple systems
Setting up a new employee across HR, IT, payroll, and access management systems. Each step has a screen flow; the agent navigates each. Faster than scripted RPA to set up; less brittle to UI changes.
Cross-system data lookups
A research analyst needs information from several internal systems with no integration. The agent navigates each, extracts the relevant data, and assembles a synthesis, which meaningfully reduces the manual effort.
Browser-based testing assistance
Generating UI test scripts, exercising flows for QA. The agent's reliability matters less because the human is reviewing the captured behaviour, not depending on it.
Where it isn't yet viable
The use cases where deployment is premature:
High-volume transaction processing
Computer use is slower per transaction than API integration or scripted RPA. For high-volume work, the latency and cost don't match the alternatives.
Critical financial operations
The risk of the agent taking unintended action is real. For consequential financial workflows, the human-in-the-loop checkpoint is required, which removes most of the velocity advantage.
Operations with audit requirements
Computer use produces less structured audit trails than API integration. For regulated workloads, the compliance posture often requires API-level auditability.
Workloads where APIs exist
If the underlying system has an API, use it. Computer use is the fallback for systems without integration options; it's not preferable when integration is available.
High-stakes information retrieval
The agent might misread a screen and confidently report wrong information. For high-stakes information needs (medical, legal, financial), this risk is unacceptable.
The deployment patterns
Where computer use is shipping, the patterns are:
Constrained workflows
The agent's task is specific: "go to system X, look up Y, return Z." Not "do whatever needs doing." Bounded scope is the predictor of success.
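One way to make bounded scope enforceable, rather than only a prompt instruction, is to check every proposed action against an explicit scope object before it runs. The sketch below assumes a browser-based agent; `TaskScope`, `in_scope`, and the host name are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class TaskScope:
    goal: str                   # the specific task, stated once
    allowed_hosts: frozenset    # sites the agent may touch
    allowed_actions: frozenset  # action kinds the agent may take


def in_scope(scope: TaskScope, action_kind: str, url: str) -> bool:
    """Return True only if both the action kind and the target host are allowed."""
    host = urlparse(url).hostname or ""
    return action_kind in scope.allowed_actions and host in scope.allowed_hosts


if __name__ == "__main__":
    scope = TaskScope(
        goal="Go to the order system, look up an order, return its status",
        allowed_hosts=frozenset({"orders.example.internal"}),  # hypothetical host
        allowed_actions=frozenset({"click", "type", "scroll"}),
    )
    print(in_scope(scope, "click", "https://orders.example.internal/search"))
    print(in_scope(scope, "click", "https://elsewhere.example.com/"))
```

An action that falls outside the scope is a signal to stop and escalate, not something to retry.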
Sandboxed execution
The agent runs in a virtual machine, browser sandbox, or other isolated environment. The blast radius of mistakes is contained.
Step-level human review
Critical actions are confirmed by a human before execution. Routine actions proceed; consequential ones pause.
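A rough sketch of that pause-for-consequential-actions gate follows. The set of consequential action kinds and the `approve` callback are assumptions for illustration; in practice the approval step would be a review queue or a confirmation UI, not a function argument.

```python
from typing import Callable

# Hypothetical classification: which action kinds count as consequential.
CONSEQUENTIAL = {"submit", "pay", "delete", "send"}


def gated_execute(
    action_kind: str,
    execute: Callable[[str], None],
    approve: Callable[[str], bool],
) -> str:
    """Run routine actions directly; pause consequential ones for a human decision."""
    if action_kind in CONSEQUENTIAL and not approve(action_kind):
        return "rejected"  # the human declined, so nothing is executed
    execute(action_kind)
    return "executed"


if __name__ == "__main__":
    performed = []
    print(gated_execute("scroll", performed.append, approve=lambda kind: False))
    print(gated_execute("submit", performed.append, approve=lambda kind: False))
    print(performed)
```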
Robust failure handling
The agent has clear "I cannot complete this" outputs. When it gets stuck, it escalates rather than guesses.
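One possible shape for an explicit "I cannot complete this" output is a result type in which failure is a first-class status rather than an empty or guessed value. `Status`, `TaskResult`, and `handle` below are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    COMPLETED = "completed"
    CANNOT_COMPLETE = "cannot_complete"


@dataclass
class TaskResult:
    status: Status
    value: Optional[str] = None   # the answer, present only on success
    reason: Optional[str] = None  # why the agent stopped, present on failure


def handle(result: TaskResult, escalate: Callable[[str], None]) -> Optional[str]:
    """Return the value on success; otherwise hand the case to a human."""
    if result.status is Status.COMPLETED and result.value is not None:
        return result.value
    # A "completed" result with no value is treated as a failure, not a success.
    escalate(result.reason or "no reason given")
    return None


if __name__ == "__main__":
    stuck = TaskResult(Status.CANNOT_COMPLETE, reason="login page shows a CAPTCHA")
    handle(stuck, escalate=lambda reason: print("escalated:", reason))
```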
Detailed observability
Every screen the agent sees, every action it takes, every reasoning step — all logged. Without this, debugging is impossible and audit is incomplete.
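A minimal sketch of step-level logging, assuming one JSON record per step written to an append-only log. The field names are illustrative, not a standard. The screenshot is referenced by its SHA-256 hash on the assumption that the image itself is stored separately.

```python
import hashlib
import io
import json
import time
from typing import TextIO


def log_step(log: TextIO, step: int, screenshot: bytes, reasoning: str, action: dict) -> dict:
    """Append one JSON line describing what the agent saw, thought, and did."""
    record = {
        "step": step,
        "ts": time.time(),
        # Hash of the screenshot bytes; ties the record to the stored image.
        "screenshot_sha256": hashlib.sha256(screenshot).hexdigest(),
        "reasoning": reasoning,
        "action": action,
    }
    log.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a log file
    log_step(buffer, 1, b"fake-png-bytes", "The search box is visible.",
             {"kind": "click", "target": "search box"})
    print(buffer.getvalue(), end="")
```

One line per step keeps the trail easy to replay during debugging and straightforward to hand to an auditor.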
The cost picture
Computer use is more expensive per task than other patterns. The agent makes many model calls per task (one for each screen interpretation), often with images involved. A task that an API integration handles in one call may take dozens of computer-use calls.
The economics:
- High-volume tasks: prefer API integration or scripted RPA.
- Low-volume tasks where building an integration is expensive: computer use can compete; the cost per task is high, but no integration has to be built.
- Tasks where the alternative is human time: computer use is cheap compared to a human operator, even at its high per-task cost.
The cost case is workload-specific.
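The trade-off above comes down to simple arithmetic, sketched here. Every number in the example is a placeholder chosen for illustration, not a measured price, and the function names are hypothetical; substitute figures from your own workload.

```python
from typing import Optional


def cost_per_task(calls_per_task: int, cost_per_call: float) -> float:
    """Per-task cost when every step of the task is a separate model call."""
    return calls_per_task * cost_per_call


def break_even_volume(
    integration_build_cost: float,
    agent_cost_per_task: float,
    api_cost_per_task: float,
) -> Optional[float]:
    """Number of tasks above which building the integration becomes cheaper.

    Returns None when the agent is not more expensive per task, in which
    case the integration never pays for itself on cost alone.
    """
    saving = agent_cost_per_task - api_cost_per_task
    if saving <= 0:
        return None
    return integration_build_cost / saving


if __name__ == "__main__":
    # Placeholder inputs: 40 calls per task at 0.25 currency units per call.
    agent_cost = cost_per_task(calls_per_task=40, cost_per_call=0.25)
    print(agent_cost)
    print(break_even_volume(1000.0, agent_cost_per_task=0.5, api_cost_per_task=0.25))
```

Below the break-even volume the agent's high per-task cost can still be the cheaper option; above it, the integration wins.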
What we keep seeing
Recurring patterns in early computer-use deployments:
The capability gets attention; the reliability gets the work. Demos show what's possible. Production requires the reliability engineering that the demos skip.
RPA teams are the natural adopters. Teams that have run RPA programmes have the operating muscle to adopt computer use. They understand the failure modes, the audit needs, the human-review patterns.
Visual UI changes still break things. Computer use is less brittle than RPA, but it is not robust. A site redesign can break a computer-use workflow until the workflow is adapted to the new layout.
Audit is the unsolved problem. Producing audit trails that match what regulated workflows require is harder for computer use than for API integration. Innovative auditing patterns are emerging but immature.
The supervised flow is the production pattern. Pure autonomy is research; supervised use is production.
What we recommend
For enterprise teams considering computer use in late 2024:
- Prefer API integration where it exists. Computer use is the fallback for systems without integration options.
- Consider classical RPA for high-volume, stable, deterministic workflows. Computer use is for the exception cases.
- Bound the scope. "Do specific task X with input Y" works; open-ended tasking does not.
- Sandbox the execution. The blast radius of agent mistakes matters.
- Include human review for consequential actions. The current reliability does not yet support autonomy.
- Build the observability and audit layer deliberately. The compliance gap is real.
- Match the workload to the technique. Computer use is one tool; not every workflow benefits from it.
Computer use and browser agents are a category that will keep evolving. The capability ceiling in late 2024 is real; the reliability ceiling is lower. The teams that find the workloads where computer use genuinely helps — and respect the constraints — capture meaningful value. The teams that chase the demo aesthetic ship workflows that fail in production.