# Browser Use
Page Control · Automation Agent · Headless Driver
Give your AI agent eyes to see and hands to manipulate real web pages! Re-engineered on top of Playwright, it supports multi-page headless rendering with model-driven click, type, and drag actions.
OpenClaw Team
## Quick Install
Run the following command in your terminal to install:
```shell
npx clawhub install browser-use
```
## Stats Overview
| Stars | Total Calls | Active Users | Stable Version |
|---|---|---|---|
| 1.2k | 6.84M | 8,400 | v3.1.2 |
## How It Works
Unlike crawlers that can only pull the raw DOM, browser-use is a virtual interactor for frontend environments, re-architected from the ground up for multimodal LLMs:
- Pixel-level Interaction Mapping: The model can not only read specific tag content (e.g., "read this tweet for me"), but also emit coordinates and behavior trees to perform realistic page operations (e.g., "click the buy button in the top right and enter this promo code").
- Dual-track Visual Snapshots & Page Summaries: Beyond text, it captures a screenshot after every page reload and sends it back to the LLM via the API. Paired with vision-capable models such as GPT-4o or Claude 3.5 Sonnet, this delivers near-human-level page perception.
- Fingerprint Anti-detection & Bypass: The underlying architecture strips traditional automation-framework signatures and uses stealth driver plugins to maximize bypass rates against Cloudflare and CAPTCHA interceptors.
- Persistent Session Context: Cookie state is persisted across runs, so the AI can continuously manage internal dashboards, run audits, or even batch-post content with your login session, with no need to re-authenticate each time.
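The coordinate-and-action output described above can be pictured as a small schema that is then mapped onto Playwright's real `Mouse.click(x, y, button=...)` call. The `ClickAction` shape below is a hypothetical illustration, not browser-use's actual action format:

```python
from dataclasses import dataclass

@dataclass
class ClickAction:
    """One model-emitted page action (hypothetical schema, for illustration only)."""
    x: int              # viewport x coordinate of the click target
    y: int              # viewport y coordinate of the click target
    button: str = "left"

def to_playwright_call(action: ClickAction) -> tuple:
    """Map the action onto Playwright's Mouse.click(x, y, button=...) API."""
    return ("mouse.click", {"x": action.x, "y": action.y, "button": action.button})

# e.g. the model decides to click a buy button near the top-right corner
call = to_playwright_call(ClickAction(x=1180, y=64))
```

Keeping the model's output in a declarative shape like this makes each step easy to log and replay before it touches the page.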
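Session persistence can be sketched around Playwright's real `storage_state` format, which stores cookies with an `expires` field (`-1` for session-scoped cookies). The filter below is an illustrative helper for pruning a saved state before reuse, not browser-use's actual implementation:

```python
def live_cookies(state: dict, now: float) -> list:
    """Keep session cookies (expires == -1) and cookies that have not yet expired."""
    return [
        c for c in state.get("cookies", [])
        if c.get("expires", -1) == -1 or c["expires"] > now
    ]

# Shape matches what Playwright's context.storage_state() returns.
state = {"cookies": [
    {"name": "sid", "value": "abc", "expires": -1},      # session-scoped, kept
    {"name": "old", "value": "xyz", "expires": 1000.0},  # already expired, dropped
]}
kept = live_cookies(state, now=2000.0)
```

A pruned state can then be fed back via `browser.new_context(storage_state=...)` so the agent resumes with a clean, still-valid login session.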
## Typical Use Cases
### Scenario 1: Multi-node Flash Sales & Ticket Monitoring
As an advanced user, you can create orchestration flows: tell the Agent to monitor a ticket website for availability. Once tickets appear, browser-use immediately moves page focus to click "Add to Cart," fills in the shipping address, and submits the order. Because every step is a real client-side operation, success rates are very high.
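At its core, this monitor-then-act flow is a poll loop. A minimal sketch, where the stock check and order placement are stand-ins for hypothetical page probes the Agent would perform:

```python
from typing import Callable, Optional

def monitor_until(check: Callable[[], bool],
                  act: Callable[[], str],
                  max_polls: int) -> Optional[str]:
    """Poll until `check` reports availability, then run `act` exactly once."""
    for _ in range(max_polls):
        if check():
            return act()
    return None  # gave up: nothing appeared within max_polls attempts

# Fake page probe that finds stock on the third poll (illustration only).
polls = iter([False, False, True])
result = monitor_until(lambda: next(polls), lambda: "order-submitted", max_polls=10)
```

In a real flow, `check` would be a page query and `act` the add-to-cart-and-checkout sequence; running `act` only once keeps the loop from double-ordering.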
### Scenario 2: Deep-Web Closed-System Data Mining
Unlike public wiki data that can be collected with Tavily, many valuable datasets (such as competitor backend dashboards) sit behind multi-layer authentication inside SPAs. The AI can use this tool to auto-fill credentials, step through dashboard tabs one by one, and capture complex charts as images, building an exclusive automated data asset.
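Stepping through dashboard tabs and capturing each one can be described as a flat step plan the Agent executes in order. A sketch, assuming hypothetical `click`/`screenshot` step names rather than browser-use's real internal vocabulary:

```python
def tab_capture_plan(tabs: list) -> list:
    """Expand a list of tab names into alternating click/screenshot steps."""
    plan = []
    for tab in tabs:
        plan.append({"action": "click", "target": tab})
        plan.append({"action": "screenshot", "name": f"{tab}.png"})
    return plan

plan = tab_capture_plan(["revenue", "traffic"])
```

Generating the whole plan up front lets the Agent retry or resume a partially completed capture without re-deriving which tabs remain.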
## Command Reference
After installation, you can let the AI call these autonomously via conversation, or trigger operations manually from the CLI:
Execute a seamless operation: open a specific website, perform a natural-language intent, and archive on completion:
```shell
clawhub execute browser-use url="https://x.com" \
  intent="search for 'OpenClaw' returning the top 3 posts" \
  --screenshot-on-finish
```
Use vision mode and save each step as a debug trace:
```shell
clawhub execute browser-use intent="Login to GitHub and star myclaw repo" \
  --enable-vision=true --trace-dir="./traces"
```
## Requirements & Authentication
- System Dependencies: The first run after `npx clawhub install browser-use` downloads roughly 150 MB of core Chromium binaries to your device, so make sure you have a stable network connection and enough disk space.
- High Multimodal Cost: With Vision screenshot mode enabled, every AI reasoning step consumes a significant number of LLM API tokens. Make sure your LLM account balance is sufficient.
© 2026 OpenClaw. All rights reserved.
