Drive a real browser. Get the answer — or generate the tests.
brow-use is a plugin for Claude Code that drives a real browser on your behalf. Tell it in plain English what you want done, and it carries out the task and returns the result. Or point it at an app and it generates Playwright Page Objects, reusable workflows, and plain-language feature docs from what it sees.
Two problems, one mechanism
1 — Getting things done in a browser
You need a list out of an internal admin tool that has no export. You need to file the same form fifty times. You need a weekly report from an app with no API. The work is mechanical and one-off; building real automation feels disproportionate, so you do it by hand.
LLM assistants can write code that might do it, but they can't see the application. They guess at selectors and hallucinate fields, and the gap between "looks plausible" and "actually fetched the right data" is where the time goes.
2 — Writing browser tests
Writing browser tests is mostly mechanical too. You open the app, inspect the DOM, copy selectors, write a Page Object, write a workflow that uses it, write the test that uses that. Each layer is straightforward in isolation but tedious to assemble — and the result is brittle: a refactor renames a CSS class and a hundred selectors break.
What brow-use does
brow-use closes both gaps with the same mechanism. It gives Claude Code a set of MCP tools that drive a real browser, read the live accessibility tree at every step, and record everything. The agent reasons about what to do next using the same accessibility tree your users (and screen readers) see — so it sees what's actually on the page, picks stable semantic selectors, and recovers from things a static script wouldn't (redirects, modals, conditional UI).
The same browser session and the same agent loop can either satisfy a one-off intent (and hand you the result) or generate code you keep around for tests later.
What you can do with it
Four capabilities, all powered by the same browser-driving agent. Each is independently usable; together they compose.
Treat browser tests like any other software code. Get typed Page Object classes, reusable workflow functions, and a layered structure ready to call from your test suite — all derived from real, recorded interactions with the live app, not guessed from screenshots.
Describe the data you need from a website ("export the active subscriber list as CSV"); receive it in the format you asked for. Works behind authentication without sharing credentials with the model — the agent operates in your already-logged-in browser session.
Describe the outcome ("file this expense report", "register these fifty entries"); the agent carries it out and hands you a forensic record of what it did. Works behind authentication without you exposing credentials. Destructive actions are refused by policy.
Generate end-user documentation of an entire web application: one feature page per cluster of related screens, with embedded screenshots, a navigation map between pages, and zero developer jargon. The pages are written for human readers and also serve as grounding context for later natural-language tasks against the same app.
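As a rough illustration of the layered structure mentioned in the first capability, the sketch below shows a Page Object and a workflow function built on it. The `LoginPage` class, the `signIn` function, and the field labels are hypothetical and are not actual brow-use output. `PageLike` and `LocatorLike` are small stand-ins for the parts of Playwright's `Page` and `Locator` that the sketch uses, so it runs without Playwright installed; in a real project you would use Playwright's own types.

```typescript
// Stand-ins for the subset of Playwright's Page/Locator used below (assumption:
// a real project would use the Page and Locator types from Playwright instead).
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  getByLabel(text: string): LocatorLike;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

// Hypothetical Page Object: one class per page, selectors expressed by
// accessible label and role rather than by CSS class.
class LoginPage {
  private readonly emailField: LocatorLike;
  private readonly passwordField: LocatorLike;
  private readonly submitButton: LocatorLike;

  constructor(page: PageLike) {
    this.emailField = page.getByLabel('Email');
    this.passwordField = page.getByLabel('Password');
    this.submitButton = page.getByRole('button', { name: 'Sign in' });
  }

  async login(email: string, password: string): Promise<void> {
    await this.emailField.fill(email);
    await this.passwordField.fill(password);
    await this.submitButton.click();
  }
}

// Hypothetical workflow function: a parameterised flow that tests can import.
async function signIn(
  page: PageLike,
  credentials: { email: string; password: string },
): Promise<void> {
  await new LoginPage(page).login(credentials.email, credentials.password);
}
```

A test would then call `signIn(page, { email, password })` and never touch a selector directly, so a renamed CSS class does not ripple through the suite.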
Why it's different
The capabilities above each look obvious in isolation. The hard part, and where brow-use actually innovates, is combining three previously separate pieces (the Chrome extension, the agent loop, and Playwright's plain-text browser artifacts) into one system and pushing the orchestration complexity into the agent loop. That combination is what makes the product possible.
The Chrome extension
A Manifest V3 extension lets the agent operate inside your real, logged-in Chrome — using your existing session, cookies, and profile. That's what makes natural-language tasks behind authentication possible without you handing credentials to the model or scripting a login flow per app.
The agent loop
Claude Code's reasoning loop sits at the centre and absorbs the messy parts: redirects, intermediate loading states, conditional UI, retry-on-stale, deciding which selector strategy to use. Each tick perceives the live page, picks the next step, acts, and observes the result. The complexity that would otherwise live in your code lives in the loop instead.
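The perceive, pick, act, observe cycle described above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not brow-use's implementation: `BrowserDriver`, `DecideFn`, `AgentAction`, and `runAgentLoop` are hypothetical names, and the `decide` callback stands in for the model's reasoning.

```typescript
// Hypothetical action vocabulary; a real agent would have a richer set.
type AgentAction =
  | { kind: 'click'; target: string }
  | { kind: 'fill'; target: string; value: string }
  | { kind: 'done'; result: string };

// Hypothetical browser interface: read the page as text, perform one step.
interface BrowserDriver {
  snapshot(): Promise<string>;
  perform(action: AgentAction): Promise<void>;
}

// Stand-in for the reasoning step: given the goal, the current page, and
// what has been done so far, choose the next action.
type DecideFn = (
  goal: string,
  snapshot: string,
  history: AgentAction[],
) => Promise<AgentAction>;

async function runAgentLoop(
  goal: string,
  driver: BrowserDriver,
  decide: DecideFn,
  maxTicks = 20,
): Promise<string> {
  const history: AgentAction[] = [];
  for (let tick = 0; tick < maxTicks; tick++) {
    const snapshot = await driver.snapshot();             // perceive the live page
    const action = await decide(goal, snapshot, history); // pick the next step
    if (action.kind === 'done') {
      return action.result;                               // goal satisfied
    }
    await driver.perform(action);                         // act
    history.push(action);   // the effect is observed in the next tick's snapshot
  }
  throw new Error(`Goal not reached within ${maxTicks} ticks`);
}
```

Because every tick starts from a fresh snapshot, a redirect or an unexpected modal simply shows up as a different page on the next tick, and the decision step can respond to it instead of failing the way a fixed script would.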
Browser plain-text artifacts
Underneath, Playwright produces the plain-text view of the browser the agent reasons over: an accessibility tree per page, accessible selectors (getByRole, getByLabel) for any code it generates, and a full Playwright trace per run. The same artifacts that ground the agent's decisions also make every run auditable, replayable, and re-usable as context for future runs.
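To give a feel for what a plain-text view of a page can look like, the sketch below renders a small accessibility tree as indented lines of role and accessible name. The `AriaNode` shape, the `renderAriaTree` function, and the output format are assumptions made for illustration; they are not the exact format Playwright or brow-use emits.

```typescript
// Hypothetical, simplified accessibility-tree node.
interface AriaNode {
  role: string;
  name?: string;
  children?: AriaNode[];
}

// Render a node and its descendants as indented text, one line per node.
function renderAriaTree(node: AriaNode, depth = 0): string[] {
  const label = node.name ? `${node.role} "${node.name}"` : node.role;
  const lines = [`${'  '.repeat(depth)}- ${label}`];
  for (const child of node.children ?? []) {
    lines.push(...renderAriaTree(child, depth + 1));
  }
  return lines;
}

const signInForm: AriaNode = {
  role: 'form',
  name: 'Sign in',
  children: [
    { role: 'textbox', name: 'Email' },
    { role: 'button', name: 'Sign in' },
  ],
};

console.log(renderAriaTree(signInForm).join('\n'));
// - form "Sign in"
//   - textbox "Email"
//   - button "Sign in"
```

Each rendered line carries a role and an accessible name, which are the same two pieces of information a `getByRole` selector needs. That is why selectors chosen from this view tend to survive markup and styling changes.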
The synthesis
Each piece on its own is a familiar tool. Putting them together yourself would otherwise take weeks of glue code, hand-written selectors, custom login automation, and manual inspection of every screen of every app you want to automate. brow-use does that integration once, generically, inside a single agent loop, so any web application gets all four capabilities above with no per-app engineering investment.
What you walk away with
| Outcome | What you receive |
|---|---|
| Extracted data | Structured data from the website in the format you asked for: CSV, JSON, markdown, or plain text. |
| Plain-language summary | A short narrative of which pages were visited and how the result was obtained — written for a non-developer reader. |
| Page Object classes | Typed TypeScript Page Object Model files, one per page reached, with stable accessible selectors. |
| Workflow functions | Reusable parameterised async TypeScript functions for recorded user flows, importable from any test. |
| Feature documentation | One feature page per cluster of related screens, plus a page-transitions index — all in plain English with embedded screenshots. |
| Forensic record of every run | A Playwright trace (DOM snapshots, screencast, network), a per-step aria-tree log, and a run database — replayable and auditable after the fact. |
Get started
brow-use installs as a plugin into Claude Code. Once installed, you register the application you want to work with, let the agent get familiar with it once, and then ask it for whatever you need — data, an action, documentation, or generated code.
The user guide walks through each capability end to end, with the exact commands. If you'd rather see how the pieces fit together first, head to the architecture page.