🌐 Agent Browser

Core Infrastructure · Automation · Web Agent

A high-speed headless browser automation CLI built on Rust with Node.js fallback, enabling your OpenClaw agent to navigate, click, type, and capture page snapshots via structured commands at blazing speed.

OpenClaw Team

🚀 Quick Install

Run the following command in your terminal to install:

npx clawhub install agent-browser

📊 Stats Overview

⭐ Stars	☁️ Total Downloads	👥 Active Users	🎯 Stable Version
892	128k	3,450	v2.4.1

🎛️ Core Workflow

This extension skill breaks down the barrier between AI and the terminal, granting it the ability to interact visually and structurally with modern dynamic web environments (DOM/Canvas):

🌐 Blazing-fast Web Navigation: Receives URL commands and loads fully rendered pages in seconds via the built-in Rust engine or Node.js layer (navigate <url>).
📸 Visual Snapshot Capture: Automatically takes high-resolution screenshots of target nodes or full pages (snapshot), seamlessly feeding into multimodal LLM visual understanding pipelines.
🖱️ Deep DOM Interaction: Converts natural language intents into precise structured click and form input commands — no need for developers to manually write complex CSS selectors.
⚡ Dynamic Script Injection: With secure sandbox isolation, AI can directly execute custom JavaScript within the current page lifecycle context (evaluate) to extract deep-level data.

🧭 Typical Use Cases

🤖 Scenario 1: Immersive Testing & QA

Let AI play the role of a QA end-user, automatically finding input fields, navigating complex OAuth login flows, and performing DOM assertion checks on pages.

🔍 Scenario 2: Breaking Through Knowledge Barriers

No longer limited to static text API endpoints. When AI encounters knowledge gaps with newer frameworks during coding, it can directly drive the browser to official docs or StackOverflow to read the latest code snippets.

🕸️ Scenario 3: Dynamic Data Scraping

For SPAs with strict anti-scraping measures or heavy React/Vue client-side hydration rendering — achieve "what you see is what you get" powerful extraction.

👁️ Scenario 4: Multimodal Visual UI Auditing

Leveraging page snapshot capabilities, visual models can directly compare subtle UI component-level differences before and after deployment, replacing tedious manual review processes.

🛡️ Runtime Prerequisites

📦 Global Dependency: This skill requires the driver to be globally installed on the host machine. Please run: npm install -g agent-browser.
⚙️ Native Kernel & Fallback: It's strongly recommended to have native Chromium or equivalent WebKit dependencies available. If missing, the CLI will attempt to launch a lightweight Node.js compatible fallback.

🔗 View Source on GitHub

⭐ Stars

☁️ Total Downloads

👥 Active Users

🎯 Stable Version

892

128k

3,450

v2.4.1

🎛️ Core Workflow

This extension skill breaks down the barrier between AI and the terminal, granting it the ability to interact visually and structurally with modern dynamic web environments (DOM/Canvas):

🌐 Blazing-fast Web Navigation: Receives URL commands and loads fully rendered pages in seconds via the built-in Rust engine or Node.js layer (navigate <url>).

📸 Visual Snapshot Capture: Automatically takes high-resolution screenshots of target nodes or full pages (snapshot), seamlessly feeding into multimodal LLM visual understanding pipelines.

🖱️ Deep DOM Interaction: Converts natural language intents into precise structured click and form input commands — no need for developers to manually write complex CSS selectors.

⚡ Dynamic Script Injection: With secure sandbox isolation, AI can directly execute custom JavaScript within the current page lifecycle context (evaluate) to extract deep-level data.