Browser Service

The browser service is an optional background process that keeps one warm browser running. While it is up, inspect, the --browser and --screenshot paths, and the automatic escalation from HTTP to browser all route through it, instead of each launching its own browser and re-solving Cloudflare.

Cold-starting a browser for a Cloudflare-protected site is expensive (launch + solve the challenge, roughly 10-15s each time). The service pays that cost once and keeps the browser warm, so repeated inspects — and several agents inspecting different sites at once — are far faster.

When to run it

Start it when a session will do many browser inspects:

Repeated inspect --screenshot across a site’s sections and sample pages.
Cloudflare/JS sites where every inspect would otherwise cold-start a browser.
Parallel processing — several agents each inspecting a different site.

For a one-off inspect you don’t need it. When the service is not running, inspect cold-starts its own browser exactly as before — output (page.html, page.png, the transport report) is identical either way. It is a pure speed-up.

Commands

start

./scrapai browser start

Launches the background browser and waits until it answers pings.

./scrapai browser start --pool 5 --proxy-type auto

--pool (default: 5)

Max concurrent lanes, one lane per site (see Parallel crawling with lanes below).

--proxy-type (default: auto)

Proxy for the service: auto, none, or any proxy name configured in .env.

On a headless server the browser runs under Xvfb automatically — no windows, no xvfb-run needed. If a display is required but Xvfb is missing, start tells you to install it: sudo apt-get install -y xvfb.

If a service is already running, start reports its pid and does nothing.

status

./scrapai browser status

Prints Running (pid ..., port ...). or Not running.

stop

./scrapai browser stop

Gracefully shuts the service down and drops its state file.

restart

./scrapai browser restart

Stops the service and starts it again with its previous --proxy-type and --pool settings. Pass either flag to override just that value:

./scrapai browser restart --pool 10

shot

Screenshot a URL through the running service, reusing the warm browser.

./scrapai browser shot https://example.com --project myproj --screens 2

url (required)

The page to capture (positional argument).

--project (default: "default")

Project name. It determines where the screenshot is saved.

--screens (default: 2)

Number of screen-heights to capture. 0 captures the full page.

The image is written to <DATA_DIR>/<project>/<domain>/analysis/page.png, where <domain> is the URL host with www. stripped and dots replaced by underscores.
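The path rule above can be sketched in Python. This is an illustration of the stated convention, not scrapai's own code: the function names domain_slug and screenshot_path are hypothetical, and how the tool treats ports or other subdomains is not documented here, so the sketch only strips a leading www. and replaces dots.

```python
from pathlib import Path
from urllib.parse import urlsplit


def domain_slug(url: str) -> str:
    """Return the URL host with a leading 'www.' stripped and dots turned into underscores."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.replace(".", "_")


def screenshot_path(data_dir: str, project: str, url: str) -> Path:
    """Build <DATA_DIR>/<project>/<domain>/analysis/page.png for a URL."""
    return Path(data_dir) / project / domain_slug(url) / "analysis" / "page.png"


print(screenshot_path("data", "myproj", "https://www.example.com/news").as_posix())
# prints: data/myproj/example_com/analysis/page.png
```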

shot requires a running service. If none is up it exits with: No browser service running. Start it: ./scrapai browser start

State file

The service records its pid and port in a per-user state file so any scrapai process — browser start and every inspect caller — can find the one running service:

~/.scrapai/browser_service.json

The path is anchored on your home directory (not $TMPDIR) on purpose: $TMPDIR differs per shell/sandbox on macOS, which would make a service started in one terminal invisible from another. stop removes this file; stale files are handled gracefully.
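As a rough sketch of how a caller might discover the service, the snippet below reads the state file and treats a missing file, unreadable JSON, or a dead pid as "not running". The JSON key names pid and port are assumptions (the text says both values are recorded, not how they are named), read_service_state is a hypothetical helper, and the liveness check is POSIX-only.

```python
import json
import os
from pathlib import Path
from typing import Optional

STATE_FILE = Path.home() / ".scrapai" / "browser_service.json"


def pid_alive(pid: int) -> bool:
    """POSIX-only check: signal 0 tests for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_service_state(path: Path = STATE_FILE) -> Optional[dict]:
    """Return {'pid': ..., 'port': ...} for a live service, otherwise None."""
    try:
        state = json.loads(path.read_text())
    except (OSError, ValueError):
        return None  # no file, unreadable file, or invalid JSON
    if not isinstance(state, dict):
        return None
    # Assumed key names; the real file layout may differ.
    pid, port = state.get("pid"), state.get("port")
    if not isinstance(pid, int) or not isinstance(port, int) or pid <= 0:
        return None
    if not pid_alive(pid):
        return None  # stale file, for example after a crash
    return {"pid": pid, "port": port}
```

A caller would fall back to cold-starting its own browser whenever this returns None.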

How it works

One browser, one window. The service launches a single browser. Each site gets its own tab in that one window.
One tab per site (domain-sticky). A site reuses its tab and its already-solved Cloudflare session, so the second inspect of a site skips the challenge and is much faster. Different sites get different tabs and solve Cloudflare concurrently without interfering.
LRU eviction. When more than --pool sites are in play, the least-recently-used tab is closed.

Memory: one shared browser for, say, 5 sites uses roughly half of what 5 separate browsers would (one browser baseline instead of five).

Parallel crawling with lanes

Under the hood the service runs a lane pool over the one shared browser. A “lane” is an isolated browser context + page that solves Cloudflare on its own. The pool maps each domain to a lane:

Domain-sticky. The same domain reuses its lane (and its solved CF session); different domains get different lanes and run in parallel.
LRU eviction. At most --pool lanes exist at once (default 5). When the number of domains exceeds the cap, the least-recently-used lane is closed.
Per-domain navigation locks. A lane has a single page, so two requests for the same domain are serialized on that lane while different domains proceed concurrently.
Sessioned lanes. A lane is tied to the session it was opened with. If the same domain is later requested with a different session (or none), the lane is torn down and reopened — a logged-in lane is never reused unlogged, or vice versa.
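The four rules above can be modelled with a small in-memory pool. This is a minimal sketch of the described behaviour, not the service's implementation: Lane, LanePool, and acquire are hypothetical names, a lane here is only a record rather than a real browser context, and closing a lane is represented by appending its id to a list.

```python
import threading
from collections import OrderedDict
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Lane:
    lane_id: int
    domain: str
    session: Optional[str]
    # One page per lane, so same-domain navigations take this lock in turn.
    nav_lock: threading.Lock = field(default_factory=threading.Lock)


class LanePool:
    def __init__(self, pool_size: int = 5) -> None:
        self.pool_size = pool_size
        self._lanes: "OrderedDict[str, Lane]" = OrderedDict()
        self._ids = count(1)
        self._guard = threading.Lock()  # protects the domain-to-lane mapping
        self.closed: list = []  # ids of lanes closed so far (illustration only)

    def acquire(self, domain: str, session: Optional[str] = None) -> Lane:
        with self._guard:
            lane = self._lanes.get(domain)
            if lane is not None and lane.session != session:
                # Sessioned lanes: a session change tears the lane down.
                self.closed.append(lane.lane_id)
                del self._lanes[domain]
                lane = None
            if lane is None:
                lane = Lane(next(self._ids), domain, session)
                self._lanes[domain] = lane
            self._lanes.move_to_end(domain)  # mark as most recently used
            while len(self._lanes) > self.pool_size:
                # LRU eviction: drop the least-recently-used lane.
                _, evicted = self._lanes.popitem(last=False)
                self.closed.append(evicted.lane_id)
            return lane
```

A caller would hold lane.nav_lock for the duration of a navigation, which serializes requests for one domain while other domains proceed. The sketch ignores the case where an evicted lane still has a navigation in flight.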

Before processing multiple sites in parallel, start the service once:

./scrapai browser start

Each agent’s inspect then shares the one browser (one lane per site) instead of launching its own. Run ./scrapai browser stop when the batch is done.
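For scripted batches, the start and stop steps can be wrapped so that the service is stopped even if a job fails. The context manager below is a sketch that assumes it runs from the directory containing ./scrapai; browser_service is a hypothetical helper, and the run parameter exists only so the commands can be swapped out in tests.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def browser_service(pool: Optional[int] = None,
                    run: Callable = subprocess.run) -> Iterator[None]:
    """Start the browser service, yield to the batch, then stop the service."""
    start = ["./scrapai", "browser", "start"]
    if pool is not None:
        start += ["--pool", str(pool)]
    run(start, check=True)
    try:
        yield
    finally:
        # Runs on success and on error, so the warm browser is not left behind.
        run(["./scrapai", "browser", "stop"], check=True)
```

Note that start does nothing when a service is already running, so this wrapper would stop a service that another session started. Skip the wrapper if the service is shared.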

Because all lanes share one browser session, a site that needs to switch proxies mid-solve (only when Cloudflare blocks and a proxy chain is configured) can disturb other lanes. With direct connections — the normal case — this does not happen.

Related Guides

Cloudflare Bypass

Handle Cloudflare-protected sites with browser verification and cookie caching

Proxy Escalation

Combine the service with smart proxy usage

