Browser Mode¶

Browser mode manages browser sessions through direct CDP (Chrome DevTools Protocol) WebSocket connections. The governance model is identical to CLI session mode: typed commands, per-interaction Cedar gating, scope enforcement, state-aware policies, output schema validation, evidence capture. The transport is a browser engine instead of a PTY.

`[tool.browser]` Configuration¶

The [browser] section controls browser lifecycle and connection behavior.

Connection Settings¶

Field	Type	Default	Description
`engine`	string	`"cdp"`	`"cdp-direct"` (recommended) or `"playwright"` (optional convenience layer)
`connect`	string	`"launch"`	`"launch"` spawns a new headless browser. `"live"` attaches to a running Chrome session.
`headless`	boolean	`true`	Run headless. Ignored when `connect = "live"`.

Timeouts and Limits¶

Field	Type	Default	Description
`startup_timeout_seconds`	integer	`60`	Max time to wait for browser ready
`session_timeout_seconds`	integer	`1800`	Max total session lifetime
`idle_timeout_seconds`	integer	`300`	Max time between interactions
`max_interactions`	integer	`100`	Maximum command/response rounds

Content Extraction¶

Field	Type	Default	Description
`extract_mode`	string	`"accessibility_tree"`	Default content extraction: `"accessibility_tree"`, `"html"`, or `"text"`

The accessibility tree is the recommended default. It returns a compact semantic representation of the page (buttons, links, headings, form fields) instead of raw HTML. This is dramatically more token-efficient for LLM consumption.

When raw HTML is needed (for scraping or debugging), use the extract_html command with a CSS selector to scope the output.

Example¶

[browser]
engine = "cdp-direct"
connect = "launch"
headless = true
extract_mode = "accessibility_tree"
startup_timeout_seconds = 10
session_timeout_seconds = 600
idle_timeout_seconds = 120
max_interactions = 200

`[browser.scope]` -- URL Scope Enforcement¶

The [browser.scope] section defines which domains the browser can visit. This is the browser equivalent of target scope checking on CLI tools like nmap.

Field	Type	Default	Description
`allowed_domains`	array	`[]`	Glob patterns for permitted navigation domains
`blocked_domains`	array	`[]`	Glob patterns for explicitly blocked domains
`allow_external`	boolean	`false`	If `false`, browser cannot leave `allowed_domains`. If `true`, allows unlisted domains but still blocks `blocked_domains`.

[browser.scope]
allowed_domains = ["*.example.com", "docs.example.com"]
blocked_domains = ["*.evil.com", "admin.*"]
allow_external = false

Enforcement Levels¶

Scope checking operates at three levels:

Navigation commands. The navigate command's URL is validated against [browser.scope] before the CDP command fires.
Redirect interception. The BrowserExecutor intercepts HTTP redirects and validates each hop. A page at allowed.example.com that redirects to evil.com is blocked mid-redirect.
Link click validation. When the agent uses click on a link, the executor resolves the href and validates before allowing the navigation.

The allow_external = false setting is the strictest mode: the browser cannot leave the declared domain set under any circumstances.

`[browser.commands.*]` -- Browser Commands¶

Each entry in [browser.commands] declares a permitted browser operation. Each command becomes a separate MCP tool visible to the LLM during the session.

Command Fields¶

Field	Type	Default	Description
`description`	string	--	Human-readable description shown in MCP tool listing
`risk_tier`	string	`"low"`	`"low"`, `"medium"`, `"high"`, or `"critical"`
`human_approval`	boolean	`false`	Whether this command requires human approval
`args.*`	table	--	Typed arguments for the command

Action Types¶

`navigate` -- Navigate to a URL¶

[browser.commands.navigate]
description = "Navigate a tab to a URL"
risk_tier = "medium"
args.url = { type = "url", schemes = ["https"], scope_check = true }

URLs are scope-checked against [browser.scope] before the CDP command fires.

`click` -- Click an element¶

[browser.commands.click]
description = "Click an element by CSS selector"
risk_tier = "low"
args.selector = { type = "string", pattern = "^[a-zA-Z0-9_.#\\[\\]=\"' >:()-]+$" }

`type_text` -- Type into an element¶

[browser.commands.type_text]
description = "Type text into the focused element"
risk_tier = "low"
args.text = { type = "string", sanitize = ["injection"] }

`scroll` -- Scroll the page¶

[browser.commands.scroll]
description = "Scroll the page"
risk_tier = "low"
args.direction = { type = "enum", allowed = ["up", "down"] }
args.amount = { type = "integer", min = 1, max = 10, default = 3 }

`wait_for` -- Wait for an element¶

[browser.commands.wait_for]
description = "Wait for an element to appear"
risk_tier = "low"
args.selector = { type = "string" }
args.timeout_ms = { type = "integer", min = 100, max = 30000, default = 5000 }

`extract` / `snapshot` -- Extract page content¶

[browser.commands.snapshot]
description = "Get accessibility tree snapshot of the current page"
risk_tier = "low"
args.selector = { type = "string", required = false, description = "Optional CSS selector to scope the snapshot" }

[browser.commands.extract_html]
description = "Get raw HTML scoped to a CSS selector"
risk_tier = "low"
args.selector = { type = "string", required = true }

`screenshot` -- Capture evidence¶

[browser.commands.screenshot]
description = "Capture page screenshot for evidence"
risk_tier = "low"

`submit_form` -- Submit a form (high risk)¶

[browser.commands.submit_form]
description = "Submit a form"
risk_tier = "high"
human_approval = true
args.selector = { type = "string" }

`execute_js` -- Evaluate JavaScript (high risk)¶

[browser.commands.execute_js]
description = "Evaluate JavaScript in page context"
risk_tier = "high"
human_approval = true
args.expression = { type = "string" }

Commands Reference Table¶

Command	Description	Risk	Arguments
`navigate`	Navigate a tab to a URL	medium	`url` (url, scope-checked)
`click`	Click an element by CSS selector	low	`selector` (string)
`type_text`	Type text into the focused element	low	`text` (string, injection-safe)
`submit_form`	Submit a form	high	`selector` (string)
`snapshot`	Get accessibility tree snapshot	low	`selector` (optional string)
`extract_html`	Get raw HTML scoped to a CSS selector	low	`selector` (string)
`screenshot`	Capture page screenshot for evidence	low	--
`execute_js`	Evaluate JavaScript in page context	high	`expression` (string)
`wait_for`	Wait for an element to appear	low	`selector`, `timeout_ms`
`go_back`	Navigate back in browser history	low	--
`list_tabs`	List open browser tabs	low	--
`network_timing`	Get network resource timing data	low	--

Interaction Tracking¶

The BrowserExecutor tracks session state after each interaction.

`[browser.state]` Fields¶

[browser.state]
fields = ["url", "title", "domain", "has_forms", "is_authenticated", "page_loaded", "tab_count"]

Field	Source	Use
`url`	Current page URL	Domain-based policies
`title`	Page title	Context for agent reasoning
`domain`	Extracted from URL	Scope enforcement
`has_forms`	DOM inspection	Form submission gating
`is_authenticated`	Session cookies, user profile elements	Elevated caution on authenticated pages
`page_loaded`	Page load event + network idle	Ready detection
`tab_count`	Chrome tab enumeration	Tab management policies

Cedar policies can reference these fields for authorization decisions.

Evidence Collection¶

Every browser interaction produces a structured evidence envelope. When [tool.evidence].screenshots = true, the BrowserExecutor also captures a page screenshot at each interaction for the audit trail.

The evidence envelope includes:

Current URL, title, and domain after the command
Content (accessibility tree, HTML, or command result)
Page state for Cedar policy context
Screenshot file path (if configured)
Interaction count
SHA-256 hash of the output

Per-Interaction ORGA Gating¶

Each browser command goes through the full Observe-Reason-Gate-Act cycle:

Iteration 1:
  Observe:  {url: "https://app.example.com/login", title: "Login", has_forms: true}
  Reason:   LLM proposes browser_session.type_text(selector="#email", text="user@co.com")
  Gate:     Cedar: is type_text allowed? ToolClad: text is injection-safe
  Act:      BrowserExecutor sends CDP command, waits for page stable

Iteration 2:
  Observe:  {url: "https://app.example.com/login", title: "Login"}
  Reason:   LLM proposes browser_session.submit_form(selector="#login-form")
  Gate:     Cedar: submit_form requires human_approval -> PENDING
  Act:      (blocked until human approves)

Iteration 3:
  Observe:  {url: "https://app.example.com/dashboard", is_authenticated: true}
  Reason:   LLM proposes browser_session.navigate(url="https://evil.com/exfil")
  Gate:     ToolClad: URL scope check -> DENY (domain not in allowed_domains)
  Act:      (blocked, denial fed back to agent)

Live Browser Attachment¶

When connect = "live", the BrowserExecutor attaches to the user's running Chrome session via its debug port (http://localhost:9222/json). This enables:

Interaction with tabs the user already has open, including logged-in sessions
Agent operates in the user's authenticated context
The executor does not own the browser lifecycle

[tool]
name = "chrome_live"
mode = "browser"
description = "Governed access to live Chrome session"
risk_tier = "medium"

[tool.cedar]
resource = "Web::BrowserSession"
action = "browse"

[browser]
engine = "cdp-direct"
connect = "live"
headless = false
idle_timeout_seconds = 1200

[browser.scope]
allowed_domains = ["github.com", "*.internal.corp.com"]
blocked_domains = ["*.evil.com"]
allow_external = false

Cedar Policy Examples¶

permit (
    principal,
    action == Web::Action::"browse",
    resource
)
when {
    resource.command == "navigate" &&
    resource.url_domain in Web::DomainSet::"allowed"
};

Block form submission on authenticated pages without approval¶

forbid (
    principal,
    action == Web::Action::"browse",
    resource
)
when {
    resource.command == "submit_form" &&
    resource.page_state.is_authenticated == true
}
unless {
    resource.human_approved == true
};

Block all JavaScript execution unless approved¶

forbid (
    principal,
    action == Web::Action::"browse",
    resource
)
when {
    resource.command == "execute_js"
}
unless {
    resource.human_approved == true
};

Rate limit form submissions per session¶

forbid (
    principal,
    action == Web::Action::"browse",
    resource
)
when {
    resource.command == "submit_form" &&
    resource.session_submit_count >= 5
};

Use Cases¶

Use Case	Connect Mode	Commands Used	Governance Value
Web application testing	`launch`	navigate, click, type_text, submit_form, snapshot	Scope-locked to test environment. Form submission gated.
Competitive intelligence	`launch`	navigate, snapshot, screenshot	Read-only. No form submission, no JS execution.
Form filling / workflow automation	`launch`	navigate, type_text, submit_form, snapshot	Human approval on submission. Scope-locked to target app.
Web scraping	`launch`	navigate, snapshot, click	Read-only. Rate-limited. Domain-locked.
Developer assistant (live browser)	`live`	list_tabs, snapshot, extract_html, screenshot	Read-only access to user's logged-in sessions.
Internal tool automation	`live`	navigate, click, type_text, submit_form, snapshot	Agent in user's authenticated context. Scope-locked to internal domains.
Debugging / monitoring	`live`	snapshot, network_timing, screenshot	Read-only inspection of live page state and performance.

Out of Scope¶

The following are explicitly excluded from browser mode:

Coordinate-based clicking (clickxy, pixel-coordinate targeting): ToolClad browser commands operate on CSS selectors, which are semantic and validatable. Coordinate-based clicking depends on viewport size and zoom level, making it non-deterministic and ungovernable.
Raw CDP passthrough: Exposing raw Chrome DevTools Protocol methods to the agent bypasses all ToolClad validation. Browser commands are the governed surface; raw CDP is the implementation detail the BrowserExecutor uses internally.
Non-headless GUI automation: Browser mode supports both headless and live browser attachment, but interaction is always through CSS selectors and typed commands via CDP. GUI-level automation (mouse movement, screen capture-based element detection) is out of scope.

Browser Mode¶

[tool.browser] Configuration¶