Using brow-use
Driving brow-use from your own agent (a skill) instead of typing slash commands? See Agent integration for the MCP tool reference and a skill template.
Before you start
1 — Install the plugin
brow-use ships as a Claude Code plugin. The MCP server runs as a child process of Claude Code; the Chrome extension is optional and only required for session mode. See the developer guide for setup steps.
2 — Pick a mode
Default (Playwright) launches a fresh Chromium window each time. Use it for public pages, demos, and anything where login state isn't needed.
Session mode routes commands through the brow-use Chrome extension into your real, logged-in Chrome tab. Use it for anything behind authentication.
/bu:use-managed-browser - switch to default Playwright mode.
/bu:use-session - switch to session mode (will list tabs and ask which one to pin).
3 — Confirm URL and mode
Every browser-driving command (/bu:explore, /bu:explore-guided,
/bu:run-instruction, /bu:investigate) asks for the target URL
if you haven't supplied one, then confirms the URL and mode before it starts. The mode is
persisted in .brow-use/config.json, so it survives MCP server restarts.
Optionally provide a short description of the app; /bu:explore and
/bu:explore-guided use it to bias exploration.
/bu:health - quick check that the MCP server, extension, and tab are wired up.
/bu:setup-project - scaffold a Playwright TypeScript project in the current directory.
How the commands fit together
Commands are organised into four layers. Each layer consumes what the layer below produces. Browse the diagram first, then read the reference for any command you want to use.
Layer 1
Drive the browser. Capture raw artifacts: traces and aria logs.
/bu:explore - autonomous breadth-first walk
/bu:explore-guided - one-off intent, ad-hoc
/bu:investigate - run a small action and report what you observe
Layer 2
Deterministic shell scripts. No browser, no LLM. Same trace in → same artifacts out.
make extract SESSION=<id> - trace → aria log + screenshots
npm run viewer:ingest - run viewer database build
Layer 3
Turn captured data into human-readable docs and machine-readable Page Objects.
/bu:document - feature docs + page-transitions index
/bu:generate-page-objects - typed POMs from aria log
Layer 4
Use the knowledge artifacts. Either execute a plain-English intent live, or generate a reusable workflow function from a goal description.
/bu:run-instruction - execute an intent live, optionally grounded
/bu:generate-workflow-function - generate a Playwright workflow fn (no POM)
Shortcut: for a one-off task you don't want to ground in earlier docs,
just run /bu:explore-guided directly. It bypasses Layers 2–4 entirely and
leaves a trace behind.
Command reference
Layer 1 /bu:explore
Drives the agent through your app breadth-first. Captures one aria-tree entry per page, detects loops via perceptual hashing of the screen plus an aria-tree hash, and records the whole run as a single Playwright trace.
Asks for three budgets up front (with sensible defaults): maxSteps,
maxLoopHits, phashThreshold. The exploration policy enforces
"every top-level module before deepening any branch" — so the trace covers the surface
area of the app, not just one module.
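The loop check and the three budgets can be pictured roughly as follows. This is a minimal sketch, not brow-use's implementation: the Fingerprint shape, the hex encoding of the perceptual hash, and the rule that a page counts as a revisit only when the aria-tree hash matches and the perceptual hashes differ by at most phashThreshold bits are all assumptions made for illustration.

```typescript
// Hypothetical shapes; only the three budget names come from the guide.
interface Budgets {
  maxSteps: number;
  maxLoopHits: number;
  phashThreshold: number; // assumed: max differing bits still treated as the same screen
}

interface Fingerprint {
  phash: string; // perceptual hash of the screenshot, hex-encoded (assumed)
  ariaHash: string; // hash of the aria tree (assumed)
}

// Count the differing bits between two equal-length hex strings.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (diff > 0) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// A page is a revisit when an earlier page has the same aria hash
// and a perceptual hash within the threshold.
function isRevisit(seen: Fingerprint[], current: Fingerprint, budgets: Budgets): boolean {
  return seen.some(
    (f) =>
      f.ariaHash === current.ariaHash &&
      hammingDistance(f.phash, current.phash) <= budgets.phashThreshold,
  );
}

// Stop once either the step budget or the loop-hit budget is spent.
function budgetExhausted(step: number, loopHits: number, budgets: Budgets): boolean {
  return step >= budgets.maxSteps || loopHits >= budgets.maxLoopHits;
}
```

Under these assumptions a larger phashThreshold tolerates more visual noise (spinners, timestamps) before two screens are treated as different, at the cost of occasionally merging pages that really are distinct.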
Outputs: trace zip in output/trace/, run row in
.brow-use/runs.json, reasoning log in output/reasoning/<id>.jsonl.
The aria-tree log and per-step screenshots are produced by the next step.
Then run: make extract SESSION=<id> to extract the aria
log and screenshots from the trace. Extraction is intentionally separate so it can be re-run
whenever the trace format or extraction heuristics change, without re-driving the browser.
Layer 1 /bu:explore-guided
One-off intent execution with a recording. No knowledge stack, no docs reference — useful
for ad-hoc tasks where you just want the agent to do the thing and leave behind a
trace. Unlike /bu:run-instruction, it never asks about grounding and goes
straight to execution.
Layer 1 /bu:investigate
Take a small action in the browser and answer a question about it. The command asks for two
things — what to run (the action sequence) and how to help with the
investigation (what to observe or explain) — and decides the technique on its own:
aria diffs, screenshots, DOM snapshots, interactive-surface enumeration, or page
fingerprints, used in whichever combination answers the question. Unlike
/bu:explore-guided and /bu:run-instruction, this command does not
read any prior run's docs, page objects, or aria logs — the investigation is purely live.
Refuses destructive "what to run" inputs at the parsing stage. Has a hard step budget (40
browser actions). Records a trace and a run row in .brow-use/runs.json like the
other browser-driving commands.
Outputs: output/investigation/<id>/findings.md with the
narrative answer plus any saved screenshots, and the trace zip in output/trace/.
Layer 2 make extract SESSION=<id>
Post-processes a trace zip into the per-step aria log and screenshots that Layer 3 commands consume. No browser, no LLM — pure shell + TypeScript, so the same trace always produces the same output. Re-runnable any time the extraction heuristics change.
Outputs: output/exploration/<id>.jsonl and output/exploration/<id>/page-*.jpg.
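If you want to inspect the aria log yourself, a small parser is enough. The sketch below assumes the .jsonl extension means JSON Lines (one JSON object per line); the field names in the sample are invented, since this guide does not document the entry schema.

```typescript
// Parse JSON Lines text: one JSON object per non-empty line.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line, index) => {
      const value: unknown = JSON.parse(line);
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        throw new Error(`entry ${index + 1} is not a JSON object`);
      }
      return value as Record<string, unknown>;
    });
}

// Hypothetical two-entry log; the real field names may differ.
const sampleLog = '{"step":1,"url":"/home"}\n{"step":2,"url":"/items"}\n';
const logEntries = parseJsonl(sampleLog);
```

In practice you would read the text from output/exploration/<id>.jsonl before passing it to the parser.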
Layer 3 /bu:document
Read-only over an explore run. Groups pages into features (by URL prefix and aria summary),
embeds screenshots from the trace, and writes one feature doc per cluster plus a single
page-transitions.md index.
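The URL-prefix half of that grouping can be sketched as below. This is an illustration under assumptions, not the command's actual clustering: it uses only the first path segment as the feature key, ignores the aria summary entirely, and the PageEntry shape and the "root" fallback name are hypothetical.

```typescript
// Hypothetical entry shape; only the URL is used here.
interface PageEntry {
  url: string;
}

// Use the first path segment as the feature key,
// e.g. "https://host/items/42/edit" and "/items" both map to "items".
function featureKey(url: string): string {
  const withoutOrigin = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]+/i, "");
  const path = withoutOrigin.split(/[?#]/)[0] ?? "";
  const first = path.split("/").filter((segment) => segment.length > 0)[0];
  return first ?? "root";
}

// Bucket pages by feature key, preserving the order in which pages were seen.
function groupByFeature(pages: PageEntry[]): Map<string, PageEntry[]> {
  const groups = new Map<string, PageEntry[]>();
  for (const page of pages) {
    const key = featureKey(page.url);
    const bucket = groups.get(key) ?? [];
    bucket.push(page);
    groups.set(key, bucket);
  }
  return groups;
}
```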
Navigation edges come from read_observed_edges, which correlates the
trace's action sidecar with the aria log to produce ground-truth transitions — every click
and navigate the agent actually performed. Phrases like "select Save" appear identically
in the feature narrative and in the page-transitions table, so the docs stay internally
consistent.
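Conceptually, the correlation pairs each recorded action with the page before and after it. The sketch below is a guess at that idea rather than the real read_observed_edges: the record shapes are hypothetical, and it assumes aria entry N describes the page before action N and entry N+1 the page after it.

```typescript
// Hypothetical shapes for the action sidecar and the aria log.
interface ActionRecord {
  step: number;
  kind: "click" | "navigate";
  label: string;
}

interface AriaEntry {
  step: number;
  url: string;
}

interface Edge {
  from: string;
  action: string;
  to: string;
}

// Build one edge per action whose before and after pages are both in the log.
function observedEdges(actions: ActionRecord[], pages: AriaEntry[]): Edge[] {
  const urlByStep = new Map<number, string>();
  for (const page of pages) {
    urlByStep.set(page.step, page.url);
  }
  const edges: Edge[] = [];
  for (const action of actions) {
    const from = urlByStep.get(action.step);
    const to = urlByStep.get(action.step + 1);
    if (from === undefined || to === undefined) {
      continue; // no ground truth for this action, so emit nothing
    }
    edges.push({ from, action: `${action.kind} ${action.label}`, to });
  }
  return edges;
}
```

Because edges are only emitted for actions that appear in both sources, nothing in the result is inferred: an action with a missing before or after page is dropped rather than guessed.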
Outputs: output/docs/<id>/README.md,
page-transitions.md, and one <feature>.md per cluster.
Layer 3 /bu:generate-page-objects
Generates Playwright Page Object Model classes from an explore run's aria log. No browser needed — works entirely from captured data.
Two passes: first, an in-memory name map with one entry per unique URL (class name,
file name, elements, navigation edges). Second, generation — for every page either creates
a new file via write_page_object or merges into an existing matching one.
Navigation edges from read_observed_edges drive typed return types: a method
that clicks Save on EditPage returns DetailsPage when the
sidecar saw that exact transition.
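A typed return of that kind might look like the sketch below. The class bodies are illustrative, not actual generator output. To keep the example runnable without dependencies, PageLike and LocatorLike are minimal stand-ins for the parts of Playwright's Page and Locator that it uses; a generated file would presumably import Page from @playwright/test instead.

```typescript
// Stand-ins for Playwright's Locator and Page (assumption, see above).
interface LocatorLike {
  click(): Promise<void>;
}

interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

class DetailsPage {
  readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }
}

class EditPage {
  readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // The observed edge "click Save on EditPage leads to DetailsPage"
  // is encoded in the return type, so callers can chain page objects.
  async save(): Promise<DetailsPage> {
    await this.page.getByRole("button", { name: "Save" }).click();
    return new DetailsPage(this.page);
  }
}
```

With this shape, a test can write `const details = await editPage.save()` and the compiler knows which page object comes next.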
Outputs: TypeScript files in output/page/, one per page reached.
Idempotent — re-running merges new locators into existing classes without overwriting work.
Layer 4 /bu:run-instruction
Carries out a plain-English intent ("Export every Excavating Machine as CSV") against the
current app. On launch it reads .brow-use/runs.json and, if earlier explore
runs exist, offers to ground in one — loading docs, aria log, POMs, and workflows as a
knowledge stack. If you skip grounding or no explore runs are present, it executes the
intent directly from the live page.
Output format defaults to markdown but accepts csv, json, or
txt. Refuses destructive intents at the parsing stage. Has a hard step budget
(50 browser actions) so it can't wander away from the goal.
Outputs: output/results/<id>/result.<ext> with the
structured data, plus how.md describing in plain language which pages were
visited and how the data was collected.
Layer 4 /bu:generate-workflow-function
Generates a reusable Playwright async function from a plain-English goal
("log in then search Subjects by name"). The generated function calls Playwright's
getByRole / getByLabel / page.click APIs directly —
no Page Object Model imports — so the workflow file is self-contained and depends only on
@playwright/test.
Picks the best grounding available: feature docs from /bu:document when they
exist (semantic mapping from goal phrases to specific page transitions), falling back to
the aria log + observed-edges from a raw /bu:explore or
/bu:explore-guided run. No browser is launched — this is code generation from
captured data.
Outputs: a single TypeScript file at
output/workflow/<function-name>-workflow.ts, exporting one
async function whose first parameter is page: Page.
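For the example goal above, a generated file might resemble the following sketch. Everything specific in it is hypothetical: the function name, the extra parameters, and all labels and button names are invented. PageLike and LocatorLike are stand-ins that keep the sketch runnable without dependencies; a real workflow file would type the first parameter as Page from @playwright/test.

```typescript
// Stand-ins for Playwright's Locator and Page (assumption, see above).
interface LocatorLike {
  click(): Promise<void>;
  fill(value: string): Promise<void>;
}

interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByLabel(text: string): LocatorLike;
}

// Hypothetical workflow for "log in then search Subjects by name".
// No Page Object imports: locators are called on the page directly.
export async function loginThenSearchSubjects(
  page: PageLike,
  username: string,
  password: string,
  subjectName: string,
): Promise<void> {
  // Log in (labels and button name are made up for illustration).
  await page.getByLabel("Username").fill(username);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: "Log in" }).click();

  // Open the Subjects module and search by name.
  await page.getByRole("link", { name: "Subjects" }).click();
  await page.getByLabel("Name").fill(subjectName);
  await page.getByRole("button", { name: "Search" }).click();
}
```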
Mode commands
Patterns
Re-running over the same app
Run /bu:explore once and reuse it. Every Layer 3 and Layer 4 command asks for a
sessionId up front, so you can document, generate POMs, and run intents against the same
explore output many times. When the app changes, run /bu:explore again to
capture a new baseline.
Auditing a run
Every recorded run lands in .brow-use/runs.json. The viewer
(npm run viewer:dev) stitches the trace, sidecar, reasoning log, and aria
log into a navigable timeline. Use it when something looked odd and you need to understand
why the agent took a particular path.
Idempotent merges
Every write_* tool that writes user-facing artifacts is idempotent. Re-running
a Layer 3 command over a Layer 1 run will overwrite that run's outputs cleanly without
touching unrelated files; Page Objects merge new locators in without dropping existing
ones.
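The merge rule for Page Objects can be illustrated with a small sketch. It is not the actual write_page_object logic: the LocatorMap shape (property name mapped to a locator expression) is hypothetical, and the sketch assumes that on a name clash the existing locator wins.

```typescript
// Hypothetical representation: property name -> locator expression.
type LocatorMap = Record<string, string>;

// Add incoming locators that are not present yet; never overwrite or drop
// existing ones. Spreading `existing` last makes it win on name clashes.
function mergeLocators(existing: LocatorMap, incoming: LocatorMap): LocatorMap {
  return { ...incoming, ...existing };
}

const existingLocators: LocatorMap = { saveButton: "getByRole('button', { name: 'Save' })" };
const incomingLocators: LocatorMap = {
  saveButton: "getByText('Save')",
  cancelButton: "getByRole('button', { name: 'Cancel' })",
};

const mergedOnce = mergeLocators(existingLocators, incomingLocators);
// Merging the same input again changes nothing, which is what makes re-runs safe.
const mergedTwice = mergeLocators(mergedOnce, incomingLocators);
```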