Driving brow-use from your own agent
brow-use exposes two interfaces. The slash commands
(/bu:explore, /bu:do, …) are for humans. Underneath, every
command is a thin wrapper around a set of MCP tools that any Claude Code
agent can call directly — including agents in your own project, driven by a skill.
This page documents that surface and shows how to wire it into a skill.
Commands vs MCP tools
| Interface | Triggered by | Use when |
|---|---|---|
| Slash commands /bu:* | A human typing in Claude Code | Interactive use: exploration, ad-hoc data fetches, generating tests by hand. |
| MCP tools mcp__bu__* | Any Claude Code agent (autonomous) | You want an agent in your own project to drive brow-use without a human in the loop, e.g. a skill that fetches a daily report, or a workflow that runs as part of a larger task. |
Installation reminder. Both interfaces require the plugin to be
enabled in the consuming project. The MCP server is registered in .mcp.json
and the plugin (with its slash commands and injected CLAUDE.md instructions)
is enabled in .claude/settings.json. See the
user guide for setup.
MCP tool reference
Every tool below is exposed by the bu MCP server. The full input
schema lives in tool/<name>.ts in the brow-use repository — the
summaries here cover what each tool is for and the main inputs/outputs.
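Several entries in this reference take a urlPrefix that keeps the agent inside a given part of the app. The sketch below illustrates one way such a scope check could behave. It is not brow-use's implementation: the `InteractiveElement` shape, the function names, and the rule that "in scope" means same origin plus a matching path prefix are all assumptions made for illustration.

```typescript
// Hypothetical element shape; the real schema lives in tool/<name>.ts.
interface InteractiveElement {
  role: string;
  name: string;
  url?: string; // absent for buttons that have no link target
}

// Assumption: in scope = same origin and a path that starts with the prefix path.
function inScope(target: string, urlPrefix: string): boolean {
  let t: URL;
  let p: URL;
  try {
    t = new URL(target);
    p = new URL(urlPrefix);
  } catch {
    return false; // unparseable URLs are treated as out of scope
  }
  return t.origin === p.origin && t.pathname.startsWith(p.pathname);
}

// Out-of-scope links are dropped; elements without a URL are kept.
function filterByScope(
  elements: InteractiveElement[],
  urlPrefix: string,
): InteractiveElement[] {
  return elements.filter(
    (el) => el.url === undefined || inScope(el.url, urlPrefix),
  );
}
```

Note that a plain prefix test also accepts sibling paths such as /administrator for a prefix of /admin. A stricter check would compare whole path segments.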
- […] domcontentloaded by default and returns the resulting title and final URL. Inputs: url, optional waitUntil (domcontentloaded / load / networkidle; use networkidle for SPAs that keep rendering after load), optional urlPrefix (scope backstop: an out-of-scope target is rejected without loading, returning rejected: true).
- […] selector. Discover selectors via get_accessibility_tree or enumerate_interactive_elements first. If the selector no longer matches, the tool heals it against the current accessibility tree, retries once, and reports the substitution in the result (healed from: …). Update your page object when you see it.
- […] selector, text. Selector healing applies as for click.
- […] localStorage, and sessionStorage for the current context. Use between independent runs in Playwright mode.
- […] topLevelOnly, rolesFilter, includeDestructive, urlPrefix (scope filter: links whose target path is outside the prefix are stripped server-side; buttons without a URL are kept).
- […] ariaHash), SHA-1 of the same tree with text/values stripped (structuralHash, the page skeleton), plus a 64-bit perceptual image hash. Pair with compare_fingerprint.
- […] ariaHash (a true loop, aria-identical); same structuralHash AND same URL template, with id-like path segments collapsed to :id (a repeat of one page archetype, same-template, used to sample one list/detail page and skip the rest); then Hamming distance on the perceptual hash (phash-close). Inputs: candidate, known (each may carry structuralHash and url), phashThreshold.
- […] output/trace/<name>/, so a mid-run crash loses at most the last in-flight action. Input: name (the session id).
- […] output/trace/<name>/. Each chunk is an independently viewable Playwright trace. Input: name.
- […] output/page/<name>.ts. Inputs: name, content, and optional provenance sessionId + sources ([{ stepId, url, tab? }]) written to a sibling <name>.meta.json so the viewer can link the class back to its screenshots, aria trees, and tab panels.
- […] output/workflow/<name>.ts.
- […] output/test/<name>.spec.ts.
- […] output/docs/<sessionId>/<name>.md.
- […] README.md for the docs session: TOC, app metadata, generation footer. Inputs: sessionId, appName, entries, …
- […] output/exploration/<sessionId>.jsonl.
- […] compare_fingerprint reason same-template) to output/exploration/<sessionId>-skipped.jsonl. Keeps the run auditable and lets the viewer show a sampled page's skipped siblings. Inputs: sessionId, entries, append.
- […] output/exploration/<sessionId>/<name>.png. Returns a Markdown embed snippet you can drop into a generated doc.
- […] output/results/<sessionId>/result.<ext>. Handles CSV escaping, JSON indenting, and Markdown table alignment. Inputs: format (markdown / csv / json / txt), records, columns.
- […] .brow-use/runs.json (the database of all brow-use invocations).
- […] playwright (fresh Chromium) or crx (real Chrome via the extension).
- […] tabId.
- […] {ok, issues} with a one-line remedy per issue, plus per-tool timing stats. Input: optional heal. When true, a hung extension connection is dropped so the extension reconnects, then the check reruns and reports healed.
- […] output/reasoning/<sessionId>.jsonl. Use sparingly: this is an audit trail, not narrative.
Using brow-use from a skill
A skill is a Markdown file that tells a Claude Code agent when and how
to do something. Skills live inside the consuming project — not inside brow-use — and
can call any tool that's available to the agent, including mcp__bu__*.
Where the skill file lives
Inside the project that wants to drive brow-use (e.g., your application repo), create:
.claude/skills/<skill-name>/SKILL.md
Claude Code discovers skills under .claude/skills/ automatically. The
directory name is the skill's slug.
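If you scaffold skills from a script, the path convention and the front matter used by the template can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: both function names are hypothetical, and the slug rule (lowercase letters, digits, hyphens) is an assumption, not something this page specifies.

```typescript
// Assumption: slugs are lowercase letters, digits, and single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Builds the project-relative location of a skill file for a given slug.
function skillFilePath(slug: string): string {
  if (!SLUG_PATTERN.test(slug)) {
    throw new Error(`invalid skill slug: ${slug}`);
  }
  return `.claude/skills/${slug}/SKILL.md`;
}

// Renders the YAML front matter block that opens a SKILL.md file.
function renderSkillHeader(slug: string, description: string): string {
  // Collapse the description to one line, then quote it as a JSON string,
  // which is also a valid YAML double-quoted scalar.
  const oneLine = description.replace(/\s+/g, " ").trim();
  return [
    "---",
    `name: ${slug}`,
    `description: ${JSON.stringify(oneLine)}`,
    "---",
    "",
  ].join("\n");
}
```

Quoting the description avoids YAML parse errors when the sentence contains a colon followed by a space.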
Skill template
Replace the bracketed placeholders. Comments are for your reference; you can delete them.
---
name: [skill-slug]
description: [One sentence. What does this skill do, and when should the
agent activate it? Be specific — Claude reads this to decide whether to
invoke the skill in a given turn.]
---
# [Human-readable skill title]
[Optional: one paragraph of background — what app are we driving, why,
what does the user expect as output.]
## When to use
[Concrete triggers. E.g.: "When the user asks for the weekly active-user
count, or anything phrased as 'pull X from the admin tool'."]
## Steps
1. **Precondition: confirm URL and mode.** Read `.brow-use/config.json` with
the Read tool.
- If `currentMode` is null (and your skill drives the browser), tell the
user to run `/bu:use-managed-browser` or `/bu:use-session` and stop.
- Get the target URL: use the one the user supplied, otherwise ask for it.
- Confirm with the user: "I'll run against **{url}** in **{currentMode}**
mode. Continue or change mode?" Honour the answer before proceeding.
2. **(If switching mode programmatically)** Call `mcp__bu__set_mode` with
`crx` (user's logged-in Chrome) or `playwright` (fresh Chromium). The mode
is persisted to `.brow-use/config.json` automatically.
3. **Start tracing** (if you want an audit record): `mcp__bu__start_trace` with a `name` (the session id).
4. **Drive the browser.** Loop:
- `mcp__bu__navigate` to the entry URL.
- `mcp__bu__get_accessibility_tree` to perceive the page.
- Pick the next action from the tree; call `mcp__bu__click` or
`mcp__bu__type` with the selector you chose.
- Repeat until the goal page is reached.
5. **Persist outputs.** Depending on the goal:
- **Data extraction** → `mcp__bu__write_result` with the records
and the format the user asked for (csv / json / markdown / txt).
- **Generated code** → `mcp__bu__write_page_object`,
`mcp__bu__write_workflow`, or `mcp__bu__write_test`.
- **Documentation** → `mcp__bu__save_screenshot` per page, then
`mcp__bu__write_feature_doc` and `mcp__bu__write_docs_index`.
6. **Stop tracing** (if started): `mcp__bu__stop_trace` with the same `name`.
7. **Record the run**: `mcp__bu__record_run` so the run appears in
`.brow-use/runs.json`.
## Notes for the agent
- Always perceive before acting — prefer `get_accessibility_tree` to
`snapshot`. Screenshots are expensive and not needed for navigation.
- Use accessible selectors (role + name) from the ARIA tree, not raw CSS,
whenever both are available.
- If the page looks identical after an action, fingerprint it with
`mcp__bu__page_fingerprint` and check it against earlier fingerprints with
`mcp__bu__compare_fingerprint` to detect loops.
- Destructive actions are filtered out of
`enumerate_interactive_elements` by default. Don't override unless the
user has explicitly asked for a destructive operation.
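The loop check in the agent notes relies on the comparison order given in the tool reference: identical ariaHash first, then matching structuralHash plus URL template, then Hamming distance on the perceptual hash. The sketch below illustrates that order only. It is not the tool's code: the `Fingerprint` shape, the hex encoding of the perceptual hash, the definition of an "id-like" segment, and the inclusive threshold are all assumptions.

```typescript
interface Fingerprint {
  ariaHash: string;
  structuralHash?: string;
  url?: string;
  phash: string; // assumption: 64-bit perceptual hash as 16 hex characters
}

type MatchReason = "aria-identical" | "same-template" | "phash-close" | null;

// Assumption: a segment is id-like if it is all digits, a UUID, or 16+ hex chars.
const ID_LIKE =
  /^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{16,})$/i;

// Collapses id-like path segments to ":id" so sibling pages share one template.
function urlTemplate(url: string): string {
  const u = new URL(url);
  const path = u.pathname
    .split("/")
    .map((seg) => (ID_LIKE.test(seg) ? ":id" : seg))
    .join("/");
  return u.origin + path;
}

// Counts differing bits between two equal-length hex strings.
function hamming(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let bits = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    while (x > 0) {
      bits += x & 1;
      x >>= 1;
    }
  }
  return bits;
}

function compareFingerprints(
  candidate: Fingerprint,
  known: Fingerprint,
  phashThreshold: number,
): MatchReason {
  if (candidate.ariaHash === known.ariaHash) return "aria-identical";
  if (
    candidate.structuralHash !== undefined &&
    candidate.structuralHash === known.structuralHash &&
    candidate.url !== undefined &&
    known.url !== undefined &&
    urlTemplate(candidate.url) === urlTemplate(known.url)
  ) {
    return "same-template";
  }
  // Assumption: a distance equal to the threshold still counts as close.
  return hamming(candidate.phash, known.phash) <= phashThreshold
    ? "phash-close"
    : null;
}
```

Checking the cheapest and most exact signal first means the perceptual hash only decides cases where the accessibility tree has changed.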
What context the agent already has
When the plugin is enabled in the consuming project, brow-use injects its own
CLAUDE.md into every session. This instructs the agent to read the current
app context before acting — meaning your skill can assume that context is already
established and focus on the task-specific steps.
Note on mode state. currentMode lives on disk in
.brow-use/config.json. Read the file directly with the Read tool.
The mode persists across MCP server restarts because mcp__bu__set_mode
writes it to the same file.
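For a helper script that runs outside the agent, the mode precondition could be checked along the following lines. This is a sketch, not part of brow-use: the `checkMode` function and the result shape are hypothetical, and the only field assumed in the config file is `currentMode`, holding `crx`, `playwright`, or null as described on this page.

```typescript
type Mode = "crx" | "playwright";

// Only the field this page documents; the real file may hold more.
interface BrowUseConfig {
  currentMode: Mode | null;
}

type Precondition =
  | { ok: true; mode: Mode }
  | { ok: false; message: string };

// Takes the raw text of .brow-use/config.json and decides whether to proceed.
function checkMode(configJson: string): Precondition {
  let parsed: unknown;
  try {
    parsed = JSON.parse(configJson);
  } catch {
    return { ok: false, message: "config.json is not valid JSON." };
  }
  const mode = (parsed as Partial<BrowUseConfig> | null)?.currentMode;
  if (mode === "crx" || mode === "playwright") {
    return { ok: true, mode };
  }
  return {
    ok: false,
    message:
      "No browser mode is set. Run /bu:use-managed-browser or /bu:use-session first.",
  };
}
```

Returning a message instead of throwing lets the caller relay the remedy to the user and stop, which matches the first step of the skill template.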
Accessing output artifacts
Everything brow-use generates lands on disk in predictable locations. An agent consuming those outputs uses one of two channels: the Read tool for files on disk, or a dedicated mcp__bu__* read tool where one exists.
| Artifact | Location on disk | How to read it |
|---|---|---|
| Extracted result (data) | output/results/<sessionId>/result.<ext> | Read tool |
| Page Object class | output/page/<name>.ts | Read, or mcp__bu__read_pom_summary for a structured summary |
| Workflow function | output/workflow/<name>.ts | Read |
| Generated test | output/test/<name>.spec.ts | Read |
| Feature documentation | output/docs/<sessionId>/<name>.md | Read |
| Docs index | output/docs/<sessionId>/README.md | Read |
| Per-step screenshots | output/exploration/<sessionId>/<name>.png | Read (returns an image block) |
| Exploration log | output/exploration/<sessionId>.jsonl | Read |
| Navigation graph | (derived from trace + ARIA snapshots) | mcp__bu__read_observed_edges |
| Playwright trace | output/trace/<name>.zip | Replayable with npx playwright show-trace; not normally read by the agent |
| Run database | .brow-use/runs.json | Read |
| Reasoning audit | output/reasoning/<sessionId>.jsonl | Read |
The same naming is used regardless of whether brow-use was driven by a slash command or
by your own skill — so any tooling you build around output/ works in both
cases.
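Tooling built around output/ can encode the table above as a small path helper. The sketch below covers the file-backed artifacts whose locations are listed there. The `Artifact` type and `artifactPath` function are hypothetical names for illustration, and the trace and navigation graph are left out because they are not read as single files by the agent.

```typescript
type Artifact =
  | { kind: "result"; sessionId: string; ext: string }
  | { kind: "page"; name: string }
  | { kind: "workflow"; name: string }
  | { kind: "test"; name: string }
  | { kind: "doc"; sessionId: string; name: string }
  | { kind: "docsIndex"; sessionId: string }
  | { kind: "screenshot"; sessionId: string; name: string }
  | { kind: "explorationLog"; sessionId: string }
  | { kind: "reasoning"; sessionId: string };

// Maps an artifact description to its project-relative path.
function artifactPath(a: Artifact): string {
  switch (a.kind) {
    case "result":
      return `output/results/${a.sessionId}/result.${a.ext}`;
    case "page":
      return `output/page/${a.name}.ts`;
    case "workflow":
      return `output/workflow/${a.name}.ts`;
    case "test":
      return `output/test/${a.name}.spec.ts`;
    case "doc":
      return `output/docs/${a.sessionId}/${a.name}.md`;
    case "docsIndex":
      return `output/docs/${a.sessionId}/README.md`;
    case "screenshot":
      return `output/exploration/${a.sessionId}/${a.name}.png`;
    case "explorationLog":
      return `output/exploration/${a.sessionId}.jsonl`;
    case "reasoning":
      return `output/reasoning/${a.sessionId}.jsonl`;
  }
}
```

Modelling the artifacts as a discriminated union lets the compiler flag a missing case if a new artifact kind is added later.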
Next steps
To set up the plugin in a consuming project, see the
user guide. To understand how the MCP server fits into the
overall system, see architecture. To extend the tool
surface itself (adding new mcp__bu__* tools), see the
developer guide.