browsemake/browser-cli

Browser CLI

br is a command-line browser automation tool that any capable LLM agent, such as ChatGPT, Claude Code, or Gemini CLI, can use.

https://www.npmjs.com/package/@browsemake/browser-cli

Why Browser CLI?

  • Just works: simple browser automation with no coding required; leave the rest of the workflow to a capable LLM agent
  • AI first: designed for LLM agents, with a readable page view in place of raw HTML and error hints
  • Secure: can run locally, and no credentials are passed to the LLM
  • Robust: the browser persists progress across sessions and tracks action history for replay

Install

npm install -g @browsemake/browser-cli

Usage

Give an instruction to an AI agent (Gemini CLI / Claude Code / ChatGPT):

> You have browser automation tool 'br', use it to go to amazon to buy me a basketball

Or run the commands yourself from the command line:

br start
br goto https://github.com/

Demos

  • Grocery shopping: go to Amazon and buy a basketball
  • Navigate to a GitHub repo
  • Print an invoice
  • Download a bank account statement
  • Search for job postings

Features

  • Browser actions: a comprehensive set of actions for browser automation (navigation, click, etc.)
  • LLM-friendly output: command output written for LLMs, with error-correction hints
  • Daemon mode: an always-on daemon that lives across multiple LLM sessions
  • Structured web page view: an accessibility tree view that is easier for an LLM to interpret than HTML
  • Secret management: keeps passwords isolated from the LLM
  • History tracking: records actions for replay and scripting

Commands

Start the daemon

br start

If starting the daemon fails (for example due to missing Playwright browsers), the CLI prints the error output so you can diagnose the issue.
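For scripted use, the startup check can be made explicit. The following is a minimal sketch, assuming `br start` exits with a non-zero status when the daemon fails to start (this README does not state the exit code). The function name `start_browser` and the log file name are hypothetical.

```shell
# Sketch: start the daemon and keep its output for diagnosis.
# Assumption: `br start` returns a non-zero exit status on failure.
start_browser() {
  log="${1:-br-start.log}"   # hypothetical log file name
  # Redirect to a file so the output survives for later inspection.
  if br start >"$log" 2>&1; then
    return 0
  fi
  echo "br start failed; output saved to $log:" >&2
  cat "$log" >&2
  return 1
}
```

Calling `start_browser || exit 1` at the top of a script stops it before any browser command runs against a missing daemon.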

Navigate to a URL

br goto https://example.com

Click an element

br click "button.submit"

Commands that accept a CSS selector (like click, fill, scrollIntoView, type) can also accept a numeric ID. These IDs are displayed in the output of br view-tree and allow for direct interaction with elements identified in the tree.
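As an illustration of the numeric-ID form, the sketch below wraps `br click` with a check that the argument is all digits. The helper `click_by_id` is hypothetical, and the ID 12 in the usage note is a placeholder; real IDs come from the output of `br view-tree`.

```shell
# Sketch: click an element by the numeric ID shown in `br view-tree`.
click_by_id() {
  id="$1"
  # Reject empty or non-numeric input before calling br.
  case "$id" in
    ''|*[!0-9]*)
      echo "not a numeric ID: $id" >&2
      return 2
      ;;
  esac
  br click "$id"
}
```

A typical sequence would be `br view-tree` to list elements, then `click_by_id 12` with whichever ID the tree printed for the target element.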

Scroll element into view

br scrollIntoView "#footer"

Scroll to a percentage of the page

br scrollTo 50

Fill an input field

br fill "input[name='q']" "search text"

Fill an input field with a secret

MY_SECRET="top-secret" br fill-secret "input[name='password']" MY_SECRET

When retrieving page HTML with br view-html, any text provided via fill-secret is masked to avoid exposing secrets.
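The point of this pattern is that the command line carries only the variable name, never the secret value. A minimal sketch of a wrapper follows; `fill_password` and `SITE_PASSWORD` are hypothetical names, and the wrapper assumes the secret was set by some means outside the LLM's view.

```shell
# Sketch: fill a password field without putting the secret on the command line.
# SITE_PASSWORD is a hypothetical variable name.
fill_password() {
  selector="$1"
  # Refuse to run if the secret is missing, so an empty value is never submitted.
  if [ -z "${SITE_PASSWORD:-}" ]; then
    echo "SITE_PASSWORD is not set" >&2
    return 1
  fi
  # Pass the value through the environment; br receives only the variable name.
  SITE_PASSWORD="$SITE_PASSWORD" br fill-secret "$selector" SITE_PASSWORD
}
```

Usage: `fill_password "input[name='password']"`.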

Type text into an input

br type "input[name='q']" "search text"

Press a key

br press Enter
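The navigation, fill, and key-press commands above compose into small scripts. The sketch below chains them into a search flow and stops at the first failing step. It assumes each `br` command returns a non-zero exit status on failure, which this README does not confirm; `search_site` is a hypothetical helper name.

```shell
# Sketch: open a page, fill a search box, and submit with Enter.
# Assumption: br commands exit non-zero on failure, so `&&` stops the chain.
search_site() {
  url="$1"
  selector="$2"
  query="$3"
  br goto "$url" &&
    br fill "$selector" "$query" &&
    br press Enter
}
```

Usage: `search_site https://example.com "input[name='q']" "search text"`.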

Scroll to the next/previous chunk

br nextChunk
br prevChunk

View page HTML

br view-html

View action history

br history

Clear action history

br clear-history

Capture a screenshot

br screenshot

View accessibility and DOM tree

br view-tree

Outputs a hierarchical tree combining accessibility roles with DOM element information. It also builds an ID-to-XPath map for quick element lookup.
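On a large page the tree can be long, so filtering it for a keyword may help locate an element and its ID. A minimal sketch, assuming `br view-tree` writes the tree to standard output as plain text (the exact line format is not documented here); `find_in_tree` is a hypothetical helper.

```shell
# Sketch: show only the tree lines that mention a keyword.
# Assumption: `br view-tree` prints the tree as text on stdout.
find_in_tree() {
  # -i: ignore case, -F: treat the keyword as a literal string
  br view-tree | grep -i -F -- "$1"
}
```

For example, `find_in_tree submit` would print the lines containing "submit" in any letter case, and return a non-zero status if nothing matches.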

List open tabs

br tabs

Switch to a tab by index

br switch-tab 1

Stop the daemon

br stop

The daemon runs a headless Chromium browser and exposes a small HTTP API. The CLI communicates with it to perform actions like navigation and clicking elements.
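Because the daemon outlives any single command, a one-off script may want to start it, run its steps, and always stop it afterwards. The sketch below shows one way to do that using only `br start` and `br stop`; `with_browser` is a hypothetical helper, and it assumes `br start` exits non-zero on failure.

```shell
# Sketch: run a command with the daemon started, then stop the daemon.
with_browser() {
  br start || return 1
  "$@"              # run the caller's command while the daemon is up
  status=$?
  br stop           # stop the daemon whether or not the command succeeded
  return "$status"  # report the command's own exit status
}
```

Usage: `with_browser br goto https://example.com`. For agent workflows that span multiple sessions, leaving the daemon running is the intended mode, so this wrapper fits short standalone scripts rather than long-lived sessions.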