Agent integration

Driving brow-use from your own agent

brow-use exposes two interfaces. The slash commands (/bu:explore, /bu:do, …) are for humans. Underneath, every command is a thin wrapper around a set of MCP tools that any Claude Code agent can call directly — including agents in your own project, driven by a skill. This page documents that surface and shows how to wire it into a skill.

Commands vs MCP tools

Interface	Triggered by	Use when
Slash commands `/bu:*`	A human typing in Claude Code	Interactive use — exploration, ad-hoc data fetches, generating tests by hand.
MCP tools `mcp__bu__*`	Any Claude Code agent (autonomous)	You want an agent in your own project to drive brow-use without a human in the loop — e.g., a skill that fetches a daily report, or a workflow that runs as part of a larger task.

Installation reminder. Both interfaces require the plugin to be enabled in the consuming project. The MCP server is registered in .mcp.json and the plugin (with its slash commands and injected CLAUDE.md instructions) is enabled in .claude/settings.json. See the user guide for setup.

MCP tool reference

Every tool below is exposed by the bu MCP server. The full input schema lives in tool/<name>.ts in the brow-use repository — the summaries here cover what each tool is for and the main inputs/outputs.

Navigation & interaction

mcp__bu__navigate

Navigate the active browser to a URL. Waits for domcontentloaded by default and returns the resulting title and final URL. Inputs: url, optional waitUntil (domcontentloaded / load / networkidle — use networkidle for SPAs that keep rendering after load), optional urlPrefix (scope backstop — an out-of-scope target is rejected without loading, returning rejected: true).
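The urlPrefix backstop amounts to a scope check on the resolved target before anything loads. The sketch below shows one way such a check could work. `isInScope` is a hypothetical helper. Requiring the same origin and matching whole path segments (so that `/admin` does not admit `/administrator`) is an assumption about the intended semantics, not a description of brow-use's code.

```typescript
// Hypothetical sketch of a urlPrefix scope check; brow-use's real rule may differ.
// Assumption: same origin required, and the prefix path must match whole segments.
function withTrailingSlash(path: string): string {
  return path.endsWith("/") ? path : path + "/";
}

function isInScope(target: string, urlPrefix: string, base?: string): boolean {
  const t = new URL(target, base); // resolves relative targets against `base` when given
  const p = new URL(urlPrefix);
  if (t.origin !== p.origin) return false;
  return withTrailingSlash(t.pathname).startsWith(withTrailingSlash(p.pathname));
}
```

Under a rule like this, a navigate call whose target fails the check would come back with rejected: true instead of loading the page.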

mcp__bu__click

Click an element on the current page. Input: CSS or accessible selector. Discover selectors via get_accessibility_tree or enumerate_interactive_elements first. If the selector no longer matches, the tool heals it against the current accessibility tree, retries once, and reports the substitution in the result (healed from: …) — update your page object when you see it.
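Heal-and-retry reduces to a single fallback around the action. In this sketch, `click` and `heal` are hypothetical stand-ins. `click` performs the action. `heal` searches the current accessibility tree for a replacement selector. The `healedFrom` field only mirrors the "healed from: …" note the real tool reports. Its exact result shape is an assumption.

```typescript
// Hypothetical stand-ins: `click` throws when the selector matches nothing;
// `heal` returns a replacement selector from the current tree, or null.
type ClickFn = (selector: string) => Promise<void>;
type HealFn = (staleSelector: string) => Promise<string | null>;

interface ClickResult {
  selector: string;    // the selector that finally worked
  healedFrom?: string; // set only when a substitution happened
}

async function clickWithHealing(selector: string, click: ClickFn, heal: HealFn): Promise<ClickResult> {
  try {
    await click(selector);
    return { selector };
  } catch (err) {
    const healed = await heal(selector);
    if (healed === null) throw err; // nothing to substitute: surface the original failure
    await click(healed);            // retry exactly once; a second failure propagates
    return { selector: healed, healedFrom: selector };
  }
}
```

A result carrying `healedFrom` is the cue to update the page object, as described above.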

mcp__bu__type

Type text into an input. Clears existing content first. Inputs: selector, text. Selector healing applies as for click.

mcp__bu__clear_session

Wipe cookies, localStorage, and sessionStorage for the current context. Use between independent runs in Playwright mode.

Perception

mcp__bu__snapshot

Return a PNG screenshot of the current page as a base64 image block. Use sparingly — the accessibility tree is cheaper and usually sufficient.

mcp__bu__get_accessibility_tree

Return the page's ARIA snapshot as plain text. This is the primary way an agent perceives the page and finds stable selectors.

mcp__bu__enumerate_interactive_elements

List interactive elements (links, buttons, form fields) with selectors and metadata. Destructive actions are filtered out by default. Inputs: topLevelOnly, rolesFilter, includeDestructive, urlPrefix (scope filter: links whose target path is outside the prefix are stripped server-side; buttons without a URL are kept).
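The server-side urlPrefix filter can be sketched as follows. The element shape and the `filterByPrefix` name are hypothetical, as is requiring the same origin. The sketch implements the stated rule: links pointing outside the prefix are removed, and elements with no URL, such as plain buttons, pass through.

```typescript
// Hypothetical element shape; the real output of enumerate_interactive_elements may differ.
interface InteractiveElement {
  role: string;     // e.g. "link", "button", "textbox"
  selector: string;
  href?: string;    // absent for elements that carry no URL
}

function withTrailingSlash(path: string): string {
  return path.endsWith("/") ? path : path + "/";
}

// Drop links whose target falls outside urlPrefix; keep elements with no URL.
function filterByPrefix(elements: InteractiveElement[], urlPrefix: string, pageUrl: string): InteractiveElement[] {
  const prefix = new URL(urlPrefix);
  return elements.filter((el) => {
    if (!el.href) return true;
    const target = new URL(el.href, pageUrl); // resolve relative hrefs against the page
    return target.origin === prefix.origin &&
      withTrailingSlash(target.pathname).startsWith(withTrailingSlash(prefix.pathname));
  });
}
```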

mcp__bu__page_fingerprint

Compute a combined fingerprint for loop and template detection — SHA-1 of the normalized ARIA tree (ariaHash), SHA-1 of the same tree with text/values stripped (structuralHash, the page skeleton), plus a 64-bit perceptual image hash. Pair with compare_fingerprint.
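The two ARIA-based hashes can be sketched with Node's `crypto` module. Both normalization rules are assumptions for illustration. For `ariaHash`, trailing whitespace and blank lines are removed. For `structuralHash`, quoted names and `: text` values are also blanked. This page does not document brow-use's actual normalization, and the sketch leaves out the perceptual image hash.

```typescript
import { createHash } from "node:crypto";

const sha1 = (s: string): string => createHash("sha1").update(s).digest("hex");

// Assumed normalization: strip trailing whitespace and drop blank lines (indentation is kept).
function normalizeAria(tree: string): string {
  return tree
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => line !== "")
    .join("\n");
}

// Assumed stripping: blank quoted names and drop ": value" text, keeping roles,
// attributes such as [level=1], and nesting, i.e. the page skeleton.
function stripText(tree: string): string {
  return tree
    .split("\n")
    .map((line) => line.replace(/"[^"]*"/g, '""').replace(/:\s.*$/, ":"))
    .join("\n");
}

function fingerprint(ariaTree: string): { ariaHash: string; structuralHash: string } {
  const normalized = normalizeAria(ariaTree);
  return { ariaHash: sha1(normalized), structuralHash: sha1(stripText(normalized)) };
}
```

Two detail pages for different records then differ in `ariaHash` but share a `structuralHash`, which is what lets compare_fingerprint treat them as one archetype.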

mcp__bu__compare_fingerprint

Check whether a candidate fingerprint matches any in a known set. Checks run in priority order: (1) an exact ariaHash match, reported as aria-identical (a true loop); (2) the same structuralHash and the same URL template, with id-like path segments collapsed to :id, reported as same-template (a repeat of one page archetype, used to sample one list/detail page and skip the rest); (3) a perceptual-hash Hamming distance within phashThreshold, reported as phash-close. Inputs: candidate, known (each fingerprint may carry structuralHash and url), phashThreshold.
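The priority order can be sketched as a three-stage lookup. Several details are hypothetical. The perceptual hash is taken to be a 16-character hex string (64 bits). "Id-like" is taken to mean all digits, a UUID, or a hex run of 16 or more characters. The function and type names are illustrative, not brow-use's own.

```typescript
// Hypothetical fingerprint shape; phash assumed to be 16 hex chars (64 bits).
interface Fingerprint {
  ariaHash: string;
  structuralHash?: string;
  phash: string;
  url?: string;
}

type MatchReason = "aria-identical" | "same-template" | "phash-close";

// Assumed "id-like" rule: all digits, a UUID, or a hex run of 16+ characters.
const ID_SEGMENT = /^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|[0-9a-f]{16,})$/i;

// Collapse id-like path segments to :id, e.g. /orders/1001 -> /orders/:id.
function urlTemplate(url: string): string {
  const u = new URL(url);
  const path = u.pathname.split("/").map((seg) => (ID_SEGMENT.test(seg) ? ":id" : seg)).join("/");
  return u.origin + path;
}

// Count differing bits between two equal-length hex strings.
function hammingHex(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash length mismatch");
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a[i], 16) ^ parseInt(b[i], 16);
    while (x !== 0) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

function compareFingerprint(
  candidate: Fingerprint,
  known: Fingerprint[],
  phashThreshold: number,
): { reason: MatchReason; match: Fingerprint } | null {
  // 1. Identical ARIA tree: a true loop.
  for (const k of known) {
    if (k.ariaHash === candidate.ariaHash) return { reason: "aria-identical", match: k };
  }
  // 2. Same skeleton on the same URL template: a repeat of a sampled archetype.
  if (candidate.structuralHash && candidate.url) {
    const template = urlTemplate(candidate.url);
    for (const k of known) {
      if (k.structuralHash === candidate.structuralHash && k.url && urlTemplate(k.url) === template) {
        return { reason: "same-template", match: k };
      }
    }
  }
  // 3. Visually close screenshots.
  for (const k of known) {
    if (hammingHex(k.phash, candidate.phash) <= phashThreshold) return { reason: "phash-close", match: k };
  }
  return null;
}
```

The lookup returns at the first stage that matches. As a result, an exact loop is never reported as a mere template repeat.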

Recording

mcp__bu__start_trace

Begin a Playwright trace (screenshots, DOM snapshots, network, sources). Call this before any actions you want auditable. The trace is flushed to disk after every action as chunk zips under output/trace/<name>/, so a mid-run crash loses at most the last in-flight action. Input: name (the session id).

mcp__bu__stop_trace

End trace recording, flushing the final chunk. Returns the chunk directory output/trace/<name>/. Each chunk is an independently viewable Playwright trace. Input: name.

Artifact generation

mcp__bu__write_page_object

Write a Playwright Page Object Model TypeScript class to output/page/<name>.ts. Inputs: name, content, and optional provenance sessionId + sources ([{ stepId, url, tab? }]) written to a sibling <name>.meta.json so the viewer can link the class back to its screenshots, aria trees, and tab panels.

mcp__bu__write_workflow

Write a reusable Playwright workflow function module to output/workflow/<name>.ts.

mcp__bu__write_test

Write a Playwright test file to output/test/<name>.spec.ts.

mcp__bu__write_feature_doc

Write a plain-English Markdown guide for one feature/cluster of screens to output/docs/<sessionId>/<name>.md.

mcp__bu__write_docs_index

Generate a README.md for the docs session — TOC, app metadata, generation footer. Inputs: sessionId, appName, entries, …

mcp__bu__write_exploration_log

Append a JSONL audit log of visited pages (one record per page with fingerprints, URL, title, ARIA summary) to output/exploration/<sessionId>.jsonl.

mcp__bu__write_skipped_log

Append a JSONL log of pages deliberately skipped as repeats of an already-sampled archetype (compare_fingerprint reason same-template) to output/exploration/<sessionId>-skipped.jsonl. Keeps the run auditable and lets the viewer show a sampled page's skipped siblings. Inputs: sessionId, entries, append.
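Both exploration logs are JSON Lines files: one JSON object per line. Appending a batch therefore never rewrites earlier records. The sketch below shows appending and reading back. The `SkippedEntry` fields and the helper names are hypothetical; brow-use defines the real record layout.

```typescript
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical record for a page skipped as a same-template repeat.
interface SkippedEntry {
  url: string;
  sampledUrl: string; // the archetype page that was explored instead
  reason: "same-template";
}

// Append records as JSON Lines, creating the parent directory if needed.
function appendJsonl(file: string, entries: object[]): void {
  mkdirSync(dirname(file), { recursive: true });
  appendFileSync(file, entries.map((e) => JSON.stringify(e) + "\n").join(""), "utf8");
}

// Read every non-blank line back as a parsed record.
function readJsonl<T>(file: string): T[] {
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as T);
}
```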

mcp__bu__save_screenshot

Capture and save a screenshot to output/exploration/<sessionId>/<name>.png. Returns a Markdown embed snippet you can drop into a generated doc.

Result & metadata

mcp__bu__write_result

Format and save extracted data to output/results/<sessionId>/result.<ext>. Handles CSV escaping, JSON indenting, and Markdown table alignment. Inputs: format (markdown / csv / json / txt), records, columns.
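"CSV escaping" can be illustrated with the RFC 4180 rules. A field is quoted when it contains a comma, a double quote, or a line break, and embedded quotes are doubled. `toCsv` is a hypothetical helper. Whether write_result applies exactly these rules, including CRLF row endings, is an assumption.

```typescript
// Quote a field if it contains a comma, quote, CR or LF; double embedded quotes.
function csvField(value: unknown): string {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row from `columns`, then one row per record in column order.
function toCsv(records: Record<string, unknown>[], columns: string[]): string {
  const header = columns.map(csvField).join(",");
  const rows = records.map((r) => columns.map((c) => csvField(r[c])).join(","));
  return [header, ...rows].join("\r\n") + "\r\n";
}
```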

mcp__bu__read_pom_summary

Parse a previously generated Page Object file and return its class name, locators, methods, URL hints, and imports — without loading the full source.

mcp__bu__read_observed_edges

Build the navigation graph from an exploration run by correlating trace actions with ARIA snapshots. Returns an edge list with confidence levels.

mcp__bu__record_run

Append completed run metadata to .brow-use/runs.json (the database of all brow-use invocations).

Session management

mcp__bu__set_mode

Switch execution mode. Input: playwright (fresh Chromium) or crx (real Chrome via the extension).

mcp__bu__list_tabs

List currently open Chrome tabs. Session (crx) mode only.

mcp__bu__select_tab

Pin automation to a specific Chrome tab. Input: tabId (as returned by list_tabs).

mcp__bu__health_check

Verify MCP server, extension, and browser health. Returns a structured {ok, issues} result with a one-line remedy per issue, plus per-tool timing stats. Input: optional heal; when true, a hung extension connection is dropped so the extension reconnects, then the check reruns and reports healed.
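A skill can branch on the health_check result before driving the browser. The sketch assumes only the documented {ok, issues} shape. The per-issue fields (`component`, `message`, `remedy`) and the `nextStep` policy are hypothetical.

```typescript
// Assumed shapes; only `ok` and `issues` are documented.
interface HealthIssue {
  component: string; // hypothetical, e.g. "extension"
  message: string;
  remedy: string;    // the one-line fix reported for this issue
}

interface HealthResult {
  ok: boolean;
  issues: HealthIssue[];
}

type NextStep = "continue" | "retry-with-heal" | "stop";

// Policy sketch: try heal once, then stop and surface the remedies to the user.
function nextStep(result: HealthResult, healAttempted: boolean): NextStep {
  if (result.ok) return "continue";
  return healAttempted ? "stop" : "retry-with-heal";
}

function formatRemedies(result: HealthResult): string {
  return result.issues.map((i) => `${i.component}: ${i.message} (fix: ${i.remedy})`).join("\n");
}
```

Here "retry-with-heal" stands for calling health_check again with heal set to true.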

Debug

mcp__bu__log_reasoning

Append a one-line reasoning entry to output/reasoning/<sessionId>.jsonl. Use sparingly — this is an audit trail, not narrative.

Using brow-use from a skill

A skill is a Markdown file that tells a Claude Code agent when and how to do something. Skills live inside the consuming project — not inside brow-use — and can call any tool that's available to the agent, including mcp__bu__*.

Where the skill file lives

Inside the project that wants to drive brow-use (e.g., your application repo), create:

.claude/skills/<skill-name>/SKILL.md

Claude Code discovers skills under .claude/skills/ automatically. The directory name is the skill's slug.

Skill template

Replace the bracketed placeholders. Comments are for your reference; you can delete them.

---
name: [skill-slug]
description: [One sentence. What does this skill do, and when should the
  agent activate it? Be specific — Claude reads this to decide whether to
  invoke the skill in a given turn.]
---

# [Human-readable skill title]

[Optional: one paragraph of background — what app are we driving, why,
what does the user expect as output.]

## When to use

[Concrete triggers. E.g.: "When the user asks for the weekly active-user
count, or anything phrased as 'pull X from the admin tool'."]

## Steps

1. **Precondition: confirm URL and mode.** Read `.brow-use/config.json` with
   the Read tool.
   - If `currentMode` is null (and your skill drives the browser), tell the
     user to run `/bu:use-managed-browser` or `/bu:use-session` and stop.
   - Get the target URL: use it if supplied, otherwise ask the user.
   - Confirm with the user: "I'll run against **{url}** in **{currentMode}**
     mode. Continue or change mode?" Honour the answer before proceeding.

2. **(If switching mode programmatically)** Call `mcp__bu__set_mode` with
   `crx` (user's logged-in Chrome) or `playwright` (fresh Chromium). The mode
   is persisted to `.brow-use/config.json` automatically.

3. **Start tracing** (if you want an audit record): `mcp__bu__start_trace`
   with a `name` (the session id).

4. **Drive the browser.** Call `mcp__bu__navigate` with the entry URL,
   then loop:
   - `mcp__bu__get_accessibility_tree` to perceive the page.
   - Pick the next action from the tree; call `mcp__bu__click` or
     `mcp__bu__type` with the selector you chose.
   - Repeat until the goal page is reached.

5. **Persist outputs.** Depending on the goal:
   - **Data extraction** → `mcp__bu__write_result` with the records
     and the format the user asked for (csv / json / markdown / txt).
   - **Generated code** → `mcp__bu__write_page_object`,
     `mcp__bu__write_workflow`, or `mcp__bu__write_test`.
   - **Documentation** → `mcp__bu__save_screenshot` per page, then
     `mcp__bu__write_feature_doc` and `mcp__bu__write_docs_index`.

6. **Stop tracing** (if started): `mcp__bu__stop_trace` with the same name
   you passed to `mcp__bu__start_trace`.

7. **Record the run**: `mcp__bu__record_run` so the run appears in
   `.brow-use/runs.json`.

## Notes for the agent

- Always perceive before acting — prefer `get_accessibility_tree` to
  `snapshot`. Screenshots are expensive and not needed for navigation.
- Use accessible selectors (role + name) from the ARIA tree, not raw CSS,
  whenever both are available.
- If the page looks identical after an action, use
  `mcp__bu__page_fingerprint` + `mcp__bu__compare_fingerprint` to detect loops.
- Destructive actions are filtered out of
  `enumerate_interactive_elements` by default. Don't override unless the
  user has explicitly asked for a destructive operation.
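The skill file above is Markdown, but its step 4 loop can also be written as code. Written that way, the shape is explicit: navigate once, then alternate perception and action until a decision function reports the goal reached. The pieces are hypothetical. `callTool` stands in for however your agent invokes an MCP tool, and `decide` stands in for the agent's reasoning. The `selector` argument for click is assumed by analogy with type; check the real schemas in tool/<name>.ts.

```typescript
// Hypothetical transport: however your agent calls an MCP tool by name.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical next action chosen from the ARIA tree.
interface Action {
  tool: "mcp__bu__click" | "mcp__bu__type";
  args: Record<string, unknown>; // e.g. { selector } or { selector, text }
}

// Hypothetical decision function: returns null once the goal page is reached.
type Decide = (ariaTree: string) => Action | null;

// Navigate once, then perceive and act until done. Returns the number of actions taken.
async function drive(callTool: CallTool, decide: Decide, entryUrl: string, maxSteps = 20): Promise<number> {
  await callTool("mcp__bu__navigate", { url: entryUrl });
  for (let step = 0; step < maxSteps; step++) {
    // Perceive before acting: the ARIA tree, not a screenshot.
    const tree = String(await callTool("mcp__bu__get_accessibility_tree", {}));
    const action = decide(tree);
    if (action === null) return step;
    await callTool(action.tool, action.args);
  }
  throw new Error(`goal not reached within ${maxSteps} steps`);
}
```

The `maxSteps` cap is an addition of this sketch, a cheap guard that complements fingerprint-based loop detection.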

What context the agent already has

When the plugin is enabled in the consuming project, brow-use injects its own CLAUDE.md into every session. This instructs the agent to read the current app context before acting — meaning your skill can assume that context is already established and focus on the task-specific steps.

Note on mode state. currentMode lives on disk in .brow-use/config.json. Read the file directly with the Read tool. The mode persists across MCP server restarts because mcp__bu__set_mode writes it to the same file.
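Tooling that runs outside an agent, such as a CI helper, can apply the same precondition by parsing the file itself. The sketch makes two assumptions: `currentMode` is a top-level field holding `playwright`, `crx`, or null, and an unreadable or missing file means no mode has been chosen. The helper names are hypothetical.

```typescript
import { readFileSync } from "node:fs";

type Mode = "playwright" | "crx";

// Parse config text; assumes a top-level `currentMode` field.
function parseMode(raw: string): Mode | null {
  const config = JSON.parse(raw) as { currentMode?: Mode | null };
  return config.currentMode ?? null;
}

// Read the persisted mode; any read error is treated as "no mode chosen" (assumption).
function readCurrentMode(configPath = ".brow-use/config.json"): Mode | null {
  let raw: string;
  try {
    raw = readFileSync(configPath, "utf8");
  } catch {
    return null;
  }
  return parseMode(raw);
}
```

A null result corresponds to the case where the skill template tells the user to run /bu:use-managed-browser or /bu:use-session.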

Accessing output artifacts

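Everything brow-use generates lands on disk in predictable locations. An agent consuming those outputs uses one of two channels: the Read tool for the files themselves, or a dedicated mcp__bu__read_* tool when it needs a structured view.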

Artifact	Location on disk	How to read it
Extracted result (data)	`output/results/<sessionId>/result.<ext>`	`Read` tool
Page Object class	`output/page/<name>.ts`	`Read`, or `mcp__bu__read_pom_summary` for a structured summary
Workflow function	`output/workflow/<name>.ts`	`Read`
Generated test	`output/test/<name>.spec.ts`	`Read`
Feature documentation	`output/docs/<sessionId>/<name>.md`	`Read`
Docs index	`output/docs/<sessionId>/README.md`	`Read`
Per-step screenshots	`output/exploration/<sessionId>/<name>.png`	`Read` (returns an image block)
Exploration log	`output/exploration/<sessionId>.jsonl`	`Read`
Skipped-page log	`output/exploration/<sessionId>-skipped.jsonl`	`Read`
Navigation graph	(derived from trace + ARIA snapshots)	`mcp__bu__read_observed_edges`
Playwright trace	`output/trace/<name>/` (one chunk zip per action)	Each chunk replays with `npx playwright show-trace <chunk>.zip`; not normally read by the agent
Run database	`.brow-use/runs.json`	`Read`
Reasoning audit	`output/reasoning/<sessionId>.jsonl`	`Read`
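Because the layout is fixed, tooling around output/ can compute paths instead of hard-coding them. In this sketch the `artifactPaths` object is hypothetical. The layout follows the locations documented on this page. The paths are assumed to be relative to the consuming project's root, and the result extension is left to the caller because this page does not list the extension for each format.

```typescript
import { posix } from "node:path";

// posix.join keeps forward slashes on every platform.
const p = posix.join;

// Hypothetical helpers mirroring the documented artifact locations.
const artifactPaths = {
  result: (sessionId: string, ext: string) => p("output", "results", sessionId, `result.${ext}`),
  pageObject: (name: string) => p("output", "page", `${name}.ts`),
  pageObjectMeta: (name: string) => p("output", "page", `${name}.meta.json`),
  workflow: (name: string) => p("output", "workflow", `${name}.ts`),
  test: (name: string) => p("output", "test", `${name}.spec.ts`),
  featureDoc: (sessionId: string, name: string) => p("output", "docs", sessionId, `${name}.md`),
  docsIndex: (sessionId: string) => p("output", "docs", sessionId, "README.md"),
  screenshot: (sessionId: string, name: string) => p("output", "exploration", sessionId, `${name}.png`),
  explorationLog: (sessionId: string) => p("output", "exploration", `${sessionId}.jsonl`),
  skippedLog: (sessionId: string) => p("output", "exploration", `${sessionId}-skipped.jsonl`),
  traceDir: (name: string) => p("output", "trace", name),
  reasoning: (sessionId: string) => p("output", "reasoning", `${sessionId}.jsonl`),
  runs: () => p(".brow-use", "runs.json"),
};
```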

The same naming is used regardless of whether brow-use was driven by a slash command or by your own skill — so any tooling you build around output/ works in both cases.

Next steps

To set up the plugin in a consuming project, see the user guide. To understand how the MCP server fits into the overall system, see architecture. To extend the tool surface itself (adding new mcp__bu__* tools), see the developer guide.