Day-to-day workflow
Every command, what it does, and the order to run them in.
Before you start
1 — Install the plugin
brow-use ships as a Claude Code plugin. The MCP server runs as a child process of Claude Code; the Chrome extension is optional and only required for session mode. See the developer guide for setup steps.
2 — Pick a mode
Default (Playwright) launches a fresh Chromium window each time. Use it for public pages, demos, and anything where login state isn't needed.
Session mode routes commands through the brow-use Chrome extension into your real, logged-in Chrome tab. Use it for anything behind authentication.
/bu:use-managed-browser - switch to default Playwright mode.
/bu:use-session - switch to session mode (will list tabs and ask which one to pin).
3 — Register an app
Most commands start by reading .brow-use/apps.json for the currently active
app — its URL, name, and functional description. Run /bu:apps once to register
it; the description is used as exploration bias and as context for everything that follows.
/bu:apps - conversational catalogue: list, create, delete, set current.
/bu:health - quick check that the MCP server, extension, and tab are wired up.
/bu:setup-project - scaffold a Playwright TypeScript project in the current directory.
How the commands fit together
Commands are organised into four layers. Each layer consumes what the layer below produces. Browse the diagram first, then read the reference for any command you want to use.
Layer 1. Drive the browser. Capture raw artifacts: traces and aria logs.
/bu:explore - autonomous breadth-first walk
/bu:explore-guided - one-off intent, ad hoc
Layer 2. Deterministic shell scripts. No browser, no LLM. Same trace in → same artifacts out.
make extract SESSION=<id> - trace → aria log + screenshots
npm run viewer:ingest - run viewer database build
Layer 3. Turn captured data into human-readable docs and machine-readable Page Objects.
/bu:document - feature docs + page-transitions index
/bu:generate-page-objects - typed POMs from aria log
Layer 4. Use the knowledge artifacts. Either execute a plain-English intent live, or generate a reusable workflow function from a goal description.
/bu:run-instruction - execute an intent live, optionally grounded
/bu:generate-workflow-function - generate a Playwright workflow fn (no POM)
Shortcut: for a one-off task you don't want to ground in earlier docs,
just run /bu:explore-guided directly. It bypasses Layers 2–4 entirely and
leaves a trace behind.
Command reference
Layer 1 /bu:explore
Drives the agent through your app breadth-first. Captures one aria-tree entry per page, detects loops via perceptual hashing of the screen plus an aria-tree hash, and records the whole run as a single Playwright trace.
Asks for three budgets up front (with sensible defaults): maxSteps,
maxLoopHits, phashThreshold. The exploration policy enforces
"every top-level module before deepening any branch" — so the trace covers the surface
area of the app, not just one module.
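The loop budget can be pictured with a small sketch. It assumes that a page counts as a revisit when its aria-tree hash matches an earlier page and the two perceptual hashes differ by at most phashThreshold bits; how brow-use actually combines the two signals is not stated here. Apart from maxLoopHits and phashThreshold, every name below is hypothetical.

```typescript
// Hypothetical fingerprint of one visited page.
interface PageFingerprint {
  phash: string;    // perceptual hash of the screenshot, as a hex string
  ariaHash: string; // hash of the serialized aria tree
}

// Counts the differing bits between two hex strings of equal length.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error("hashes must have the same length");
  }
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (x > 0) {
      bits += x & 1;
      x >>= 1;
    }
  }
  return bits;
}

class LoopDetector {
  private seen: PageFingerprint[] = [];
  private loopHits = 0;

  constructor(
    private readonly maxLoopHits: number,
    private readonly phashThreshold: number,
  ) {}

  // Records a page and returns true once the loop budget is exhausted.
  observe(page: PageFingerprint): boolean {
    const revisit = this.seen.some(
      (p) =>
        p.ariaHash === page.ariaHash &&
        hammingDistance(p.phash, page.phash) <= this.phashThreshold,
    );
    if (revisit) {
      this.loopHits += 1;
    } else {
      this.seen.push(page);
    }
    return this.loopHits >= this.maxLoopHits;
  }
}
```

In this sketch a larger phashThreshold tolerates more visual noise (spinners, timestamps) before two screens are treated as different pages, at the cost of occasionally merging pages that only look alike.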
Outputs: trace zip in output/trace/, run row in
.brow-use/runs.json, reasoning log in output/reasoning/<id>.jsonl.
The aria-tree log and per-step screenshots are produced by the next step.
Then run: make extract SESSION=<id> to extract the aria
log and screenshots from the trace. Extraction is intentionally separate so it can be re-run
whenever the trace format or extraction heuristics change, without re-driving the browser.
Layer 1 /bu:explore-guided
One-off intent execution with a recording. No knowledge stack, no docs reference — useful
for ad-hoc tasks where you just want the agent to do the thing and leave behind a
trace. Unlike /bu:run-instruction, it never asks about grounding and goes
straight to execution.
Layer 2 make extract SESSION=<id>
Post-processes a trace zip into the per-step aria log and screenshots that Layer 3 commands consume. No browser, no LLM — pure shell + TypeScript, so the same trace always produces the same output. Re-runnable any time the extraction heuristics change.
Outputs: output/exploration/<id>.jsonl and output/exploration/<id>/page-*.jpg.
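Because the aria log is a .jsonl file, each line can be parsed independently. The sketch below shows one way to read such a file's text into records; the record fields in the sample (step, url) are placeholders and not the documented log schema.

```typescript
// Parses JSON Lines text: one standalone JSON document per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Placeholder sample; real entries come from output/exploration/<id>.jsonl.
const sampleLog = '{"step":1,"url":"/login"}\n{"step":2,"url":"/subjects"}\n';
const sampleRecords = parseJsonl(sampleLog);
console.log(sampleRecords.length); // 2
```

Keeping the parser free of file I/O means the same function works on text read from disk or held in memory.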
Layer 3 /bu:document
Read-only over an explore run. Groups pages into features (by URL prefix and aria summary),
embeds screenshots from the trace, and writes one feature doc per cluster plus a single
page-transitions.md index.
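The URL-prefix half of that grouping can be sketched as follows. This is an illustration only: it assumes the first path segment is the feature key and ignores the aria summary, which the real command also takes into account. The function names are hypothetical.

```typescript
// Returns the first path segment of a URL, or "root" when there is none.
function featureKey(url: string): string {
  // Strip scheme and host, then drop any query string or fragment.
  const withoutOrigin = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, "");
  const path = withoutOrigin.split(/[?#]/)[0] ?? "";
  const first = path.split("/").filter((segment) => segment.length > 0)[0];
  return first ?? "root";
}

// Buckets page URLs by feature key, preserving visit order inside each bucket.
function groupByFeature(urls: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const url of urls) {
    const key = featureKey(url);
    const bucket = groups[key] ?? [];
    bucket.push(url);
    groups[key] = bucket;
  }
  return groups;
}
```

With this rule, /machines and /machines/42/edit would land in the same cluster and so in the same feature doc.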
Navigation edges come from read_observed_edges, which correlates the
trace's action sidecar with the aria log to produce ground-truth transitions — every click
and navigate the agent actually performed. Phrases like "select Save" appear identically
in the feature narrative and in the page-transitions table, so the docs stay internally
consistent.
Outputs: output/docs/<id>/README.md,
page-transitions.md, and one <feature>.md per cluster.
Layer 3 /bu:generate-page-objects
Generates Playwright Page Object Model classes from an explore run's aria log. No browser needed — works entirely from captured data.
Generation runs in two passes. The first builds an in-memory name map with one entry per
unique URL (class name, file name, elements, navigation edges). The second generates code:
for every page it either creates a new file via write_page_object or merges into an existing matching one.
Navigation edges from read_observed_edges drive typed return types: a method
that clicks Save on EditPage returns DetailsPage when the
sidecar saw that exact transition.
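A typed return of that kind might look like the sketch below. PageLike is a stand-in for the small slice of Playwright's Page used here, so the example runs without a browser; a generated file would presumably import Page from @playwright/test instead. The class bodies and the "button" role are assumptions, not the generator's actual output.

```typescript
// Stand-in for the part of Playwright's Page API this sketch needs.
interface PageLike {
  getByRole(role: string, options?: { name?: string }): { click(): Promise<void> };
}

class DetailsPage {
  constructor(readonly page: PageLike) {}
}

class EditPage {
  constructor(readonly page: PageLike) {}

  // The observed edge "click Save on EditPage, land on DetailsPage"
  // is encoded in the return type.
  async save(): Promise<DetailsPage> {
    await this.page.getByRole("button", { name: "Save" }).click();
    return new DetailsPage(this.page);
  }
}
```

Encoding the destination in the return type lets the compiler check chained calls, so a test cannot call an EditPage method on the object it gets back from save().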
Outputs: TypeScript files in output/page/, one per page reached.
Idempotent — re-running merges new locators into existing classes without overwriting work.
Layer 4 /bu:run-instruction
Carries out a plain-English intent ("Export every Excavating Machine as CSV") against the
current app. On launch it reads .brow-use/runs.json and, if earlier explore
runs exist, offers to ground in one — loading docs, aria log, POMs, and workflows as a
knowledge stack. If you skip grounding or no explore runs are present, it executes the
intent directly from the live page.
Output format defaults to markdown but accepts csv, json, or
txt. Refuses destructive intents at the parsing stage. Has a hard step budget
(50 browser actions) so it can't wander away from the goal.
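A hard step budget is simple to picture. The sketch below is a hypothetical counter, not the tool's implementation: it assumes the budget is checked before each browser action and that exceeding it aborts the run with an error.

```typescript
class StepBudget {
  private used = 0;

  // Default of 50 matches the limit described above.
  constructor(private readonly limit: number = 50) {}

  // Call before each browser action; throws once the budget is spent.
  spend(): void {
    if (this.used >= this.limit) {
      throw new Error(`step budget of ${this.limit} browser actions exhausted`);
    }
    this.used += 1;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}
```

A runner would call spend() ahead of every click, fill, or navigation, so a run that drifts from its goal stops at a predictable point instead of continuing indefinitely.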
Outputs: output/results/<id>/result.<ext> with the
structured data, plus how.md describing in plain language which pages were
visited and how the data was collected.
Layer 4 /bu:generate-workflow-function
Generates a reusable Playwright async function from a plain-English goal
("log in then search Subjects by name"). The generated function calls Playwright's
getByRole / getByLabel / page.click APIs directly —
no Page Object Model imports — so the workflow file is self-contained and depends only on
@playwright/test.
Picks the best grounding available: feature docs from /bu:document when they
exist (semantic mapping from goal phrases to specific page transitions), falling back to
the aria log + observed-edges from a raw /bu:explore or
/bu:explore-guided run. No browser is launched — this is code generation from
captured data.
Outputs: a single TypeScript file at
output/workflow/<function-name>-workflow.ts, exporting one
async function whose first parameter is page: Page.
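For the goal "log in then search Subjects by name", the generated function might resemble the sketch below. The labels and accessible names ("Username", "Log in", "Search Subjects", and so on) are invented for illustration, and the local Page and LocatorLike interfaces are stand-ins so the example runs on its own; a generated file would presumably import Page from @playwright/test and export the function.

```typescript
// Stand-ins for the parts of Playwright's Locator and Page used below.
interface LocatorLike {
  click(): Promise<void>;
  fill(value: string): Promise<void>;
}
interface Page {
  getByLabel(text: string): LocatorLike;
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

// Hypothetical workflow: logs in, then searches Subjects by name.
async function loginThenSearchSubjects(
  page: Page,
  credentials: { username: string; password: string },
  subjectName: string,
): Promise<void> {
  await page.getByLabel("Username").fill(credentials.username);
  await page.getByLabel("Password").fill(credentials.password);
  await page.getByRole("button", { name: "Log in" }).click();
  await page.getByRole("searchbox", { name: "Search Subjects" }).fill(subjectName);
  await page.getByRole("button", { name: "Search" }).click();
}
```

Because the function talks to locators directly rather than through Page Objects, it can be copied into any Playwright test suite without bringing other generated files along.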
App management
/bu:apps manages everything in .brow-use/apps.json: list,
create, delete, set current. The "current app" is the one all subsequent commands operate
against — its URL is the entry point, its description is the exploration bias.
For the file shape and full conversational behaviour, see App management.
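Mode commands
/bu:use-managed-browser switches to the default Playwright mode; /bu:use-session switches to session mode. See step 2, Pick a mode, for when to use each.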
Patterns
Re-running over the same app
Run /bu:explore once and reuse it. Every Layer 3 and Layer 4 command takes a
sessionId as its first input, so you can document, generate POMs, and run intents against
the same explore output many times. When the app changes, run /bu:explore again to
capture a new baseline.
Auditing a run
Every recorded run lands in .brow-use/runs.json. The viewer
(npm run viewer:dev) stitches the trace, sidecar, reasoning log, and aria
log into a navigable timeline. Use it when something looked odd and you need to understand
why the agent took a particular path.
Idempotent merges
Every write_* tool that writes user-facing artifacts is idempotent. Re-running
a Layer 3 command over a Layer 1 run will overwrite that run's outputs cleanly without
touching unrelated files; Page Objects merge new locators in without dropping existing
ones.
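The Page Object merge rule can be illustrated with a small sketch. It assumes locators are keyed by name and that an existing entry always wins over an incoming one; the Locators type and mergeLocators function are hypothetical, not part of the write_* tools.

```typescript
// Hypothetical shape: locator name mapped to its selector expression.
type Locators = Record<string, string>;

// Adds locators that are new; never overwrites or drops an existing entry.
function mergeLocators(existing: Locators, incoming: Locators): Locators {
  const merged: Locators = { ...existing };
  for (const key of Object.keys(incoming)) {
    if (!Object.prototype.hasOwnProperty.call(merged, key)) {
      merged[key] = incoming[key] as string;
    }
  }
  return merged;
}
```

Applying the same incoming set twice yields the same result as applying it once, which is the property that makes re-running a command safe.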