User guide

Day-to-day workflow

Every command, what it does, and the order to run them in.


Before you start

1 — Install the plugin

brow-use ships as a Claude Code plugin. The MCP server runs as a child process of Claude Code; the Chrome extension is optional and only required for session mode. See the developer guide for setup steps.

2 — Pick a mode

Default (Playwright) launches a fresh Chromium window each time. Use it for public pages, demos, and anything where login state isn't needed.

Session mode routes commands through the brow-use Chrome extension into your real, logged-in Chrome tab. Use it for anything behind authentication.

3 — Register an app

Most commands start by reading .brow-use/apps.json for the currently active app — its URL, name, and functional description. Run /bu:apps once to register it; the description is used as exploration bias and as context for everything that follows.
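
As a rough illustration of what "reading the currently active app" involves, the sketch below resolves the active entry from an in-memory registry. The field names (current, apps, name, url, description) and the helper resolveCurrentApp are assumptions made for this example only; the actual file shape is covered in App management.

```typescript
// Hypothetical shape for .brow-use/apps.json; all field names are assumptions.
interface AppEntry {
  name: string;
  url: string; // entry point for commands
  description: string; // used as exploration bias
}

interface AppRegistry {
  current: string | null; // name of the active app (assumed field)
  apps: AppEntry[];
}

// Return the active app, or throw with a hint to run /bu:apps first.
function resolveCurrentApp(registry: AppRegistry): AppEntry {
  const app = registry.apps.find((a) => a.name === registry.current);
  if (!app) {
    throw new Error("No current app set; run /bu:apps to register one.");
  }
  return app;
}

// Example registry with a single, made-up app.
const exampleRegistry: AppRegistry = {
  current: "demo",
  apps: [
    { name: "demo", url: "https://example.com", description: "Demo storefront" },
  ],
};
```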


How the commands fit together

Commands are organised into four layers. Each layer consumes what the previous layer produces. Browse the diagram first, then read the reference for any command you want to use.

Layer 1 Live capture

Drive the browser. Capture raw artifacts: traces and aria logs.

/bu:explore - autonomous breadth-first walk
/bu:explore-guided - one-off intent, ad hoc

Output: trace zip

Layer 2 Post-processing

Deterministic shell scripts. No browser, no LLM. Same trace in → same artifacts out.

make extract SESSION=<id> - trace → aria log + screenshots
npm run viewer:ingest - build the viewer database

Output: aria log + screenshots

Layer 3 Knowledge generation

Turn captured data into human-readable docs and machine-readable Page Objects.

/bu:document - feature docs + page-transitions index
/bu:generate-page-objects - typed POMs from aria log

Output: docs + Page Objects + workflows

Layer 4 Grounded execution

Use the knowledge artifacts. Either execute a plain-English intent live, or generate a reusable workflow function from a goal description.

/bu:run-instruction - execute an intent live, optionally grounded
/bu:generate-workflow-function - generate a Playwright workflow function (no POM)

Shortcut: for a one-off task you don't want to ground in earlier docs, just run /bu:explore-guided directly. It bypasses Layers 2–4 entirely and leaves a trace behind.


Command reference

Layer 1   /bu:explore

Drives the agent through your app breadth-first. Captures one aria-tree entry per page, detects loops via perceptual hashing of the screen plus an aria-tree hash, and records the whole run as a single Playwright trace.

Asks for three budgets up front (with sensible defaults): maxSteps, maxLoopHits, phashThreshold. The exploration policy enforces "every top-level module before deepening any branch" — so the trace covers the surface area of the app, not just one module.
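
To make the three budgets concrete, here is a minimal sketch of how a loop check and a stop condition could use them. It is not the plugin's implementation: the hex-string hash encoding, the rule that a loop hit needs both a matching aria hash and a near-identical screen, and the helper names are all assumptions.

```typescript
// Budgets named in the guide; no default values are implied here.
interface Budgets {
  maxSteps: number;
  maxLoopHits: number;
  phashThreshold: number; // max Hamming distance for "same screen"
}

interface Visit {
  phash: string; // perceptual hash as a hex string (assumed encoding)
  ariaHash: string; // hash of the aria tree (assumed encoding)
}

// Count differing bits between two hex strings of equal length.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (diff > 0) {
      bits += diff & 1;
      diff >>= 1;
    }
  }
  return bits;
}

// Assumed rule: a revisit needs a matching aria hash AND a near-identical screen.
function isLoopHit(current: Visit, seen: Visit[], budgets: Budgets): boolean {
  return seen.some(
    (v) =>
      v.ariaHash === current.ariaHash &&
      hammingDistance(v.phash, current.phash) <= budgets.phashThreshold,
  );
}

// Stop once either the step budget or the loop-hit budget is used up.
function shouldStop(steps: number, loopHits: number, budgets: Budgets): boolean {
  return steps >= budgets.maxSteps || loopHits >= budgets.maxLoopHits;
}
```

A lower phashThreshold treats fewer screens as repeats, so the walk tolerates small visual changes less and spends more of maxSteps before maxLoopHits is reached.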

Outputs: trace zip in output/trace/, run row in .brow-use/runs.json, reasoning log in output/reasoning/<id>.jsonl. The aria-tree log and per-step screenshots are produced by the next step.

Then run: make extract SESSION=<id> to extract the aria log and screenshots from the trace. Extraction is intentionally separate so it can be re-run whenever the trace format or extraction heuristics change, without re-driving the browser.

Layer 1   /bu:explore-guided

One-off intent execution with a recording. No knowledge stack, no docs reference — useful for ad-hoc tasks where you just want the agent to do the thing and leave behind a trace. Unlike /bu:run-instruction, it never asks about grounding and goes straight to execution.

Layer 2   make extract SESSION=<id>

Post-processes a trace zip into the per-step aria log and screenshots that Layer 3 commands consume. No browser, no LLM — pure shell + TypeScript, so the same trace always produces the same output. Re-runnable any time the extraction heuristics change.

Outputs: output/exploration/<id>.jsonl and output/exploration/<id>/page-*.jpg.
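
If you want to inspect the aria log yourself, a JSONL file is one JSON object per line. The sketch below parses such text into typed entries; the entry fields (step, url, aria) are assumptions for illustration, and the real log may carry different or additional fields.

```typescript
// Hypothetical per-step entry; the real aria log fields may differ.
interface AriaLogEntry {
  step: number;
  url: string;
  aria: string;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseAriaLog(jsonl: string): AriaLogEntry[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as AriaLogEntry);
}
```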

Layer 3   /bu:document

Read-only over an explore run. Groups pages into features (by URL prefix and aria summary), embeds screenshots from the trace, and writes one feature doc per cluster plus a single page-transitions.md index.
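
Grouping by URL prefix can be pictured with the sketch below, which buckets page URLs by their first path segment. This is an assumed heuristic for illustration only; it ignores the aria-summary signal that the command also uses, and featureKey and groupByFeature are hypothetical names.

```typescript
// Feature key = first path segment of the URL (assumed heuristic).
function featureKey(url: string): string {
  // Strip the scheme and host, then drop any query string or fragment.
  const afterHost = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/?#]*/i, "");
  const pathname = afterHost.split(/[?#]/)[0] ?? "";
  const firstSegment = pathname.split("/").find((s) => s.length > 0);
  return firstSegment ?? "root";
}

// Bucket URLs by feature key, preserving first-seen order within each bucket.
function groupByFeature(urls: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const url of urls) {
    const key = featureKey(url);
    const bucket = groups.get(key) ?? [];
    bucket.push(url);
    groups.set(key, bucket);
  }
  return groups;
}
```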

Navigation edges come from read_observed_edges, which correlates the trace's action sidecar with the aria log to produce ground-truth transitions — every click and navigate the agent actually performed. Phrases like "select Save" appear identically in the feature narrative and in the page-transitions table, so the docs stay internally consistent.
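
The correlation step can be sketched as a join on step number. The data shapes and the assumption that snapshot n is the page before action n and snapshot n + 1 the page after it are illustrative; read_observed_edges itself may work differently.

```typescript
// Hypothetical shapes for the action sidecar and the aria log.
interface SidecarAction {
  step: number;
  kind: "click" | "navigate";
  target: string; // e.g. the accessible name of the clicked element
}

interface PageSnapshot {
  step: number;
  url: string;
}

interface ObservedEdge {
  from: string;
  to: string;
  action: string;
}

// Join actions with the page seen before (step) and after (step + 1) each one.
function correlateEdges(
  actions: SidecarAction[],
  snapshots: PageSnapshot[],
): ObservedEdge[] {
  const urlByStep = new Map<number, string>();
  for (const s of snapshots) urlByStep.set(s.step, s.url);

  const edges: ObservedEdge[] = [];
  for (const a of actions) {
    const from = urlByStep.get(a.step);
    const to = urlByStep.get(a.step + 1);
    // Keep only actions that actually moved to a different page.
    if (from !== undefined && to !== undefined && from !== to) {
      edges.push({ from, to, action: `${a.kind} ${a.target}` });
    }
  }
  return edges;
}
```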

Outputs: output/docs/<id>/README.md, page-transitions.md, and one <feature>.md per cluster.

Layer 3   /bu:generate-page-objects

Generates Playwright Page Object Model classes from an explore run's aria log. No browser needed — works entirely from captured data.

Two passes: first, an in-memory name map with one entry per unique URL (class name, file name, elements, navigation edges). Second, generation — for every page either creates a new file via write_page_object or merges into an existing matching one. Navigation edges from read_observed_edges drive typed return types: a method that clicks Save on EditPage returns DetailsPage when the sidecar saw that exact transition.
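
A typed return type of this kind might look like the sketch below. To stay self-contained it declares a small PageLike stand-in for the parts of Playwright's Page it uses; a generated file would presumably import Page from @playwright/test instead. The class bodies are illustrative, not actual generator output.

```typescript
// Minimal stand-in for the parts of Playwright's Page used in this sketch.
interface LocatorLike {
  click(): Promise<void>;
}

interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

class DetailsPage {
  readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }
}

class EditPage {
  readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // The return type encodes the observed transition EditPage -> DetailsPage.
  async save(): Promise<DetailsPage> {
    await this.page.getByRole("button", { name: "Save" }).click();
    return new DetailsPage(this.page);
  }
}
```

Because save() returns DetailsPage, a caller that chains further steps gets compile-time checking that those steps exist on the page the click leads to.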

Outputs: TypeScript files in output/page/, one per page reached. Idempotent — re-running merges new locators into existing classes without overwriting work.

Layer 4   /bu:run-instruction

Carries out a plain-English intent ("Export every Excavating Machine as CSV") against the current app. On launch it reads .brow-use/runs.json and, if earlier explore runs exist, offers to ground in one — loading docs, aria log, POMs, and workflows as a knowledge stack. If you skip grounding or no explore runs are present, it executes the intent directly from the live page.

Output format defaults to markdown but accepts csv, json, or txt. Refuses destructive intents at the parsing stage. Has a hard step budget (50 browser actions) so it can't wander away from the goal.

Outputs: output/results/<id>/result.<ext> with the structured data, plus how.md describing in plain language which pages were visited and how the data was collected.

Layer 4   /bu:generate-workflow-function

Generates a reusable Playwright async function from a plain-English goal ("log in then search Subjects by name"). The generated function calls Playwright's getByRole / getByLabel / page.click APIs directly — no Page Object Model imports — so the workflow file is self-contained and depends only on @playwright/test.

Picks the best grounding available: feature docs from /bu:document when they exist (semantic mapping from goal phrases to specific page transitions), falling back to the aria log + observed-edges from a raw /bu:explore or /bu:explore-guided run. No browser is launched — this is code generation from captured data.

Outputs: a single TypeScript file at output/workflow/<function-name>-workflow.ts, exporting one async function whose first parameter is page: Page.
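
For the example goal "log in then search Subjects by name", a generated workflow might resemble the sketch below. The labels, role names, and parameters are invented for illustration, and PageLike is a local stand-in so the sketch runs without dependencies; a real generated file would type the first parameter as Page from @playwright/test.

```typescript
// Stand-in for the Playwright Page methods this sketch calls.
interface LocatorLike {
  click(): Promise<void>;
  fill(value: string): Promise<void>;
}

interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByLabel(text: string): LocatorLike;
}

// Hypothetical workflow: log in, then search Subjects by name.
export async function loginAndSearchSubjects(
  page: PageLike,
  username: string,
  password: string,
  subjectName: string,
): Promise<void> {
  // Log in with the supplied credentials (labels are assumed).
  await page.getByLabel("Username").fill(username);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: "Log in" }).click();

  // Search Subjects by name (role and accessible names are assumed).
  await page.getByRole("searchbox", { name: "Search subjects" }).fill(subjectName);
  await page.getByRole("button", { name: "Search" }).click();
}
```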

App management

/bu:apps manages everything in .brow-use/apps.json: list, create, delete, set current. The "current app" is the one all subsequent commands operate against — its URL is the entry point, its description is the exploration bias.

For the file shape and full conversational behaviour, see App management.


Mode commands

/bu:use-managed-browser - Switch to Mode 1: a fresh Chromium launched by Playwright. No login state. Default.
/bu:use-session - Switch to Mode 2: automate your real Chrome via the brow-use extension. Lists tabs and asks you to pin one.
/bu:health - Diagnose the MCP server, the extension, and the active tab. Each issue comes with a one-line remedy.
/bu:setup-project - Scaffold a Playwright TypeScript project in the current directory: adds missing config, dependencies, and output folders.

Patterns

Re-running over the same app

Run /bu:explore once and reuse it. Every Layer 3 and Layer 4 command takes a sessionId as its first input, so you can document, generate POMs, and run intents against the same explore output many times. When the app changes, run /bu:explore again to capture a new baseline.

Auditing a run

Every recorded run lands in .brow-use/runs.json. The viewer (npm run viewer:dev) stitches the trace, sidecar, reasoning log, and aria log into a navigable timeline. Use it when something looked odd and you need to understand why the agent took a particular path.

Idempotent merges

Every write_* tool that writes user-facing artifacts is idempotent. Re-running a Layer 3 command over a Layer 1 run will overwrite that run's outputs cleanly without touching unrelated files; Page Objects merge new locators in without dropping existing ones.
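
The merge behaviour for Page Objects can be pictured as a map merge in which existing entries win. The sketch below is an assumption about the semantics, not the write_page_object implementation; mergeLocators and the locator-as-string representation are hypothetical.

```typescript
// Locators keyed by element name; values are locator expressions as strings.
type LocatorMap = Record<string, string>;

// Add newly generated locators, but never overwrite or drop an existing one.
function mergeLocators(existing: LocatorMap, generated: LocatorMap): LocatorMap {
  // Spread order matters: entries from `existing` override `generated`.
  return { ...generated, ...existing };
}
```

Applying the same generated set twice yields the same result as applying it once, which is what makes re-runs safe.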