Web Browser Bot

Updated June 2026
The web browser bot is a real headless Chromium that agents drive interactively: they navigate, click, type, scroll, read, and decide what to do next from what they see. It runs on Playwright with stealth hardening, persistent fingerprint profiles, proxy support, and full session, tab, and frame control.

Most of the web has no API, and a meaningful slice of the web that matters to a business actively resists automation. The browser bot is the platform's answer to both. Agents get the same instrument a person uses, an actual browser, with its controls exposed as commands. The research pipeline uses it to work through sources, agents use it for any job that sits behind a login or a form, and you can direct it conversationally through the master agent.

Sessions: The Working Unit

Work happens in sessions. An agent opens a session, receives a session id, and issues commands against it. The browser stays alive between commands, cookies and login state persist, and multiple sessions run side by side without touching each other. Sessions can be listed, closed individually, or closed all at once. Long-lived sessions are bounded: a sticky session caps at 25 minutes, which keeps abandoned browsers from accumulating. Each session can also hold multiple tabs, with commands to open, list, switch, and close them. This lets an agent compare two pages the way you would, side by side rather than back and forth.
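The session rules above (isolated sessions, per-session tabs, a hard lifetime cap) can be modeled as a small registry. This is a minimal sketch, not the bot's implementation. `SessionRegistry`, `Session`, and every method name are hypothetical. Measuring the 25-minute cap from when the session opened, rather than from its last activity, is also an assumption. The tabs here are plain URL strings standing in for real browser pages.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The 25-minute cap from the text; measuring it from session open is an assumption.
STICKY_SESSION_CAP_SECONDS = 25 * 60


@dataclass
class Session:
    session_id: str
    opened_at: float
    tabs: List[str] = field(default_factory=lambda: ["about:blank"])
    active_tab: int = 0


class SessionRegistry:
    """Hypothetical registry: isolated sessions, each with its own tabs, reaped after a fixed cap."""

    def __init__(self, cap_seconds: float = STICKY_SESSION_CAP_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cap_seconds = cap_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.sessions: Dict[str, Session] = {}

    def reap(self) -> List[str]:
        """Close every session that has reached the cap and return the closed ids."""
        now = self.clock()
        expired = [sid for sid, s in self.sessions.items()
                   if now - s.opened_at >= self.cap_seconds]
        for sid in expired:
            del self.sessions[sid]
        return expired

    def open(self) -> str:
        self.reap()
        sid = uuid.uuid4().hex
        self.sessions[sid] = Session(sid, self.clock())
        return sid

    def get(self, sid: str) -> Session:
        self.reap()
        if sid not in self.sessions:
            raise KeyError(f"unknown or expired session: {sid}")
        return self.sessions[sid]

    def list_sessions(self) -> List[str]:
        self.reap()
        return list(self.sessions)

    def close(self, sid: str) -> None:
        self.sessions.pop(sid, None)

    def close_all(self) -> None:
        self.sessions.clear()

    # Tab commands act on a single session and never touch other sessions.
    def open_tab(self, sid: str, url: str) -> int:
        s = self.get(sid)
        s.tabs.append(url)
        s.active_tab = len(s.tabs) - 1
        return s.active_tab

    def switch_tab(self, sid: str, index: int) -> None:
        s = self.get(sid)
        if not 0 <= index < len(s.tabs):
            raise IndexError(f"no tab {index}")
        s.active_tab = index

    def close_tab(self, sid: str, index: int) -> None:
        s = self.get(sid)
        if not 0 <= index < len(s.tabs):
            raise IndexError(f"no tab {index}")
        if len(s.tabs) == 1:
            raise ValueError("cannot close a session's last tab")
        del s.tabs[index]
        # Keep the active index pointing at a tab that still exists.
        if s.active_tab > index or s.active_tab >= len(s.tabs):
            s.active_tab -= 1
```

Expiry is checked lazily on every access, so an abandoned session disappears the next time anything touches the registry. A production service would more likely also run a periodic reaper.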

Fingerprint Profiles

Headless browsers announce themselves in dozens of small ways, and sites that resist automation read those signals. The bot ships with stealth hardening on top of Playwright and adds persistent fingerprint profiles. A profile carries a viewport drawn from common desktop resolutions, a current Chrome user agent, a believable GPU identity, and a plausible timezone, and the browser launches with the automation tells disabled. Profiles persist on disk, so an identity that logged into a site last week presents the same fingerprint this week, which is exactly how a real returning visitor looks. Profiles are created, inspected, updated, saved, and listed by command, and an agent can maintain separate profiles for separate jobs.
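The key property is that a profile is drawn once and then reused verbatim. The sketch below shows that persistence pattern with one JSON file per profile. The field names, value pools, and file layout are all assumptions. `context_options()` maps the profile onto real Playwright `browser.new_context()` keyword arguments (`viewport`, `user_agent`, `timezone_id`). The GPU fields have no such option, so in practice they would need a script-level override, which this sketch does not attempt.

```python
import json
import random
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List

# Illustrative pools only; the bot's real value sets are not documented here.
COMMON_VIEWPORTS = [(1920, 1080), (1536, 864), (1440, 900), (1366, 768)]
TIMEZONES = ["America/New_York", "America/Chicago", "America/Los_Angeles", "Europe/London"]


@dataclass
class FingerprintProfile:
    name: str
    viewport_width: int
    viewport_height: int
    user_agent: str
    webgl_vendor: str
    webgl_renderer: str
    timezone_id: str


def create_profile(name: str, user_agent: str, rng: random.Random) -> FingerprintProfile:
    """Draw one coherent identity; once saved it is reused, never re-rolled per launch."""
    width, height = rng.choice(COMMON_VIEWPORTS)
    return FingerprintProfile(
        name=name,
        viewport_width=width,
        viewport_height=height,
        user_agent=user_agent,
        webgl_vendor="Google Inc. (Intel)",
        webgl_renderer="ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0, D3D11)",
        timezone_id=rng.choice(TIMEZONES),
    )


def save_profile(profile: FingerprintProfile, directory: Path) -> Path:
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{profile.name}.json"
    path.write_text(json.dumps(asdict(profile), indent=2))
    return path


def load_profile(name: str, directory: Path) -> FingerprintProfile:
    return FingerprintProfile(**json.loads((directory / f"{name}.json").read_text()))


def list_profiles(directory: Path) -> List[str]:
    return sorted(p.stem for p in directory.glob("*.json"))


def context_options(profile: FingerprintProfile) -> dict:
    """Keyword arguments for Playwright's browser.new_context().

    The WebGL fields have no new_context() option and are not applied here.
    """
    return {
        "viewport": {"width": profile.viewport_width, "height": profile.viewport_height},
        "user_agent": profile.user_agent,
        "timezone_id": profile.timezone_id,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Example user agent string only; a real profile would track current Chrome.
        ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
        saved = create_profile("acme-portal", ua, random.Random())
        save_profile(saved, Path(tmp))
        # A later run reloads the identical identity instead of generating a new one.
        assert load_profile("acme-portal", Path(tmp)) == saved
        print(list_profiles(Path(tmp)), context_options(saved))
```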

Interacting with Pages

The command set covers everything a hand on a mouse does. Navigation: goto, back, forward, refresh. Element interaction: click, type, press for keyboard keys, select for dropdowns, and hover. Movement: scroll, wait for elements or conditions, and sleep for explicit pacing. Precision work is covered too: raw mouse clicks by coordinate for stubborn interfaces, and frame commands that list iframes and click inside them, since iframes are where embedded widgets and payment forms live. Interaction is paced like a person rather than fired like a script. This reads naturally to sites and avoids racing ahead of pages that load in stages.
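"Paced like a person" can be illustrated with per-keystroke timing. This is a hypothetical sketch, not the bot's pacing logic. Each key gets a jittered delay, with an extra pause after spaces and punctuation. The distribution and every parameter value are assumptions. Playwright's own `keyboard.type()` accepts a single fixed `delay` in milliseconds. This sketch instead varies the gap per character and takes the key-sending function as a parameter, so it can be tested without a browser.

```python
import random
import time
from typing import Callable, List, Optional


def keystroke_delays(text: str, rng: random.Random,
                     base_ms: float = 120.0, spread_ms: float = 30.0,
                     floor_ms: float = 30.0, boundary_pause_ms: float = 250.0) -> List[float]:
    """Per-keystroke delays in ms: Gaussian around base_ms (never below floor_ms),
    plus a random extra pause after spaces and punctuation."""
    delays = []
    for ch in text:
        delay = max(floor_ms, rng.gauss(base_ms, spread_ms))
        if ch in " .,;:!?":
            delay += rng.uniform(0.0, boundary_pause_ms)
        delays.append(delay)
    return delays


def paced_type(text: str,
               send_key: Callable[[str], None],
               sleep: Callable[[float], None] = time.sleep,
               rng: Optional[random.Random] = None) -> float:
    """Send text one character at a time with human-like gaps; return the total delay in ms."""
    rng = rng or random.Random()
    total_ms = 0.0
    for ch, delay_ms in zip(text, keystroke_delays(text, rng)):
        send_key(ch)
        sleep(delay_ms / 1000.0)  # time.sleep takes seconds
        total_ms += delay_ms
    return total_ms
```

Against a live Playwright page, `send_key` could be `page.keyboard.type`, which types the single character it is given.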

Reading and Seeing

Acting is half the loop; the other half is perceiving. Three reading commands return the page at different depths: a cleaned reading of the visible content for quick judgment, a full extraction when everything matters, and raw text when structure is noise. Screenshots capture the actual rendered page, either the full page or just the viewport, and are saved where agents can inspect them. They are the agent's eyes for visual judgment calls and your evidence when you want to see exactly what the bot saw. For anything the standard commands do not cover, execute_js runs arbitrary JavaScript in the page and returns the result. It is the escape hatch that makes the tool complete.
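The difference between a cleaned reading and raw text can be shown on a static HTML string. This sketch uses the stdlib `html.parser.HTMLParser` and is only an approximation. The bot reads the live rendered page, while this example parses markup. The tag sets and the function names `cleaned_reading` and `raw_text` are hypothetical. For the live-page equivalents in Playwright terms, `page.inner_text("body")` returns rendered text and `page.evaluate(...)` runs JavaScript, which is the role execute_js plays.

```python
from html.parser import HTMLParser
from typing import List

SKIP_TAGS = {"script", "style", "noscript", "template"}
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "br", "section", "article",
              "header", "footer", "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleTextParser(HTMLParser):
    """Collects text a reader would see, dropping script/style content and
    marking block boundaries with newlines."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def _visible(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def raw_text(html: str) -> str:
    """All visible words on one line, structure discarded."""
    return " ".join(_visible(html).split())


def cleaned_reading(html: str) -> List[str]:
    """One entry per non-empty block, whitespace collapsed."""
    lines = (" ".join(line.split()) for line in _visible(html).split("\n"))
    return [line for line in lines if line]
```

The cleaned reading keeps block boundaries, which is what makes it quick to judge. The raw text flattens everything into a single run of words.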

Proxies

Provider configurations give sessions their network identity. Route a session through a proxy provider and its traffic originates where you need it to. Profiles and providers compose naturally, giving a consistent fingerprint on a consistent route. Providers are configured once in the tool's providers directory and selected per session.
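"Configured once, selected per session" suggests a loader plus a per-session lookup. The sketch below assumes one JSON file per provider in the providers directory, named after the provider. That file format and the function names are hypothetical. The output dict uses the keys Playwright's `proxy` option accepts on `launch()` and `new_context()`: `server`, `username`, `password`, `bypass`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Keys accepted by Playwright's proxy option; anything else in a provider file is ignored.
PROXY_KEYS = ("server", "username", "password", "bypass")


def load_providers(directory: Path) -> Dict[str, dict]:
    """Read every <name>.json in the providers directory, keyed by file stem."""
    providers: Dict[str, dict] = {}
    for path in sorted(directory.glob("*.json")):
        config = json.loads(path.read_text())
        if "server" not in config:
            raise ValueError(f"{path.name}: a provider needs a 'server' entry")
        providers[path.stem] = config
    return providers


def proxy_for_session(providers: Dict[str, dict], name: Optional[str]) -> Optional[dict]:
    """Build the proxy dict for one session, or None for a direct connection."""
    if name is None:
        return None
    if name not in providers:
        raise KeyError(f"unknown proxy provider: {name}")
    config = providers[name]
    return {key: config[key] for key in PROXY_KEYS if key in config}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "residential-us.json").write_text(
            json.dumps({"server": "http://proxy.example:8000", "username": "u", "password": "p"}))
        providers = load_providers(Path(tmp))
        print(proxy_for_session(providers, "residential-us"))
```

Pinning each fingerprint profile to the same provider name every time it is used is what keeps the identity and the route consistent across visits.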

How Agents Use It

The bot is a standard tool in the tool layer, so every agent can call it, and the interactive loop is the point: issue a command, read the result, decide, repeat. That loop is what separates it from fetch-and-parse scraping. An agent using the browser bot can handle a login wall, a cookie banner, a multi-step form, pagination, and an unexpected layout change in one session, because at every step it sees where it is and chooses what to do next. Agents learn site-specific procedures as they go and save them to the memory bank as learned procedures, so the second visit to a tricky site goes faster than the first.
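The issue, read, decide, repeat loop can be reduced to a function that takes three callables: one to observe the page, one to decide the next command, and one to execute it. This is a minimal, hypothetical sketch. The `(command, argument)` tuple shape, the step budget, and the in-process `memory_bank` dict are assumptions. The dict stands in for the platform's memory bank, whose real interface is not described here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Command = Tuple[str, str]  # (command, argument), e.g. ("click", "#accept-cookies")

# Stand-in for the platform's memory bank: site -> learned procedure.
memory_bank: Dict[str, List[Command]] = {}


def run_loop(observe: Callable[[], str],
             act: Callable[[Command], None],
             decide: Callable[[str], Optional[Command]],
             max_steps: int = 25) -> List[Command]:
    """Observe, decide, act, repeat until decide() returns None (goal reached)
    or the step budget runs out. Returns the commands that were executed."""
    executed: List[Command] = []
    for _ in range(max_steps):
        command = decide(observe())  # each decision is based on the current page
        if command is None:
            break
        act(command)
        executed.append(command)
    return executed


def remember_procedure(site: str, steps: List[Command]) -> None:
    """Save the steps that worked so a later visit can start from them."""
    memory_bank[site] = list(steps)


def recall_procedure(site: str) -> List[Command]:
    return list(memory_bank.get(site, []))
```

Because `decide` sees a fresh observation before every command, an unexpected page such as a surprise cookie banner is handled by a new decision rather than by a script failing midway. The step budget keeps a confused agent from looping forever.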

For high-volume structured collection from the major platforms, the BrightData integration is the complementary instrument: APIs where APIs work, the browser where hands are needed. Together they cover the spectrum from bulk data to bespoke interaction.

Key Takeaway

A real Chromium under agent control: stealth fingerprint profiles that persist, sessions with tabs and frames, human-paced interaction, three depths of page reading plus screenshots, JS execution, and proxy routing. If a person can do it in a browser, an agent can do it here.