The browser service is an optional background process that keeps one warm browser running. While it is up, inspect routes through it automatically (including the --browser and --screenshot paths and the automatic escalation from HTTP to browser) instead of each call launching its own browser and re-solving Cloudflare.
Cold-starting a browser for a Cloudflare-protected site is expensive: launching it and solving the challenge takes roughly 10-15 s every time. The service pays that cost once and keeps the browser warm, so repeated inspects, and several agents inspecting different sites at once, run far faster.

When to run it

Start it when a session will do many browser inspects:
  • Repeated inspect --screenshot across a site’s sections and sample pages.
  • Cloudflare/JS sites where every inspect would otherwise cold-start a browser.
  • Parallel processing — several agents each inspecting a different site.
For a one-off inspect you don’t need it. When the service is not running, inspect cold-starts its own browser exactly as before, and the output (page.html, page.png, the transport report) is identical either way. The service is purely a speed-up.

Commands

./scrapai browser start

start

Launches the background browser and waits until it answers pings.
./scrapai browser start --pool 5 --proxy-type auto
--pool (default: 5): maximum number of concurrent lanes, one lane per site (see Parallel crawling with lanes below).
--proxy-type (default: auto): proxy for the service; auto, none, or any proxy name configured in .env.
On a headless server the browser runs under Xvfb automatically: no visible windows, and no need to wrap the command in xvfb-run. If a display is required but Xvfb is missing, start tells you to install it: sudo apt-get install -y xvfb.
If a service is already running, start reports its pid and does nothing.

status

./scrapai browser status
Prints Running (pid ..., port ...). when the service is up, or Not running. otherwise.

stop

./scrapai browser stop
Gracefully shuts the service down and drops its state file.

restart

./scrapai browser restart
Stops the service and starts it again with its previous --proxy-type and --pool settings. Pass either flag to override just that value:
./scrapai browser restart --pool 10
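The override rule for restart can be pictured as a merge: start from the previous settings and replace only the values whose flags were passed. This is a minimal sketch; restart_settings and the dict keys pool and proxy_type are hypothetical names, not scrapai's internals.

```python
from typing import Optional


def restart_settings(previous: dict, pool: Optional[int] = None,
                     proxy_type: Optional[str] = None) -> dict:
    """Settings a restart would use: previous values, with any flag
    passed on the command line overriding just that value (hypothetical)."""
    settings = dict(previous)  # copy so the remembered settings stay untouched
    if pool is not None:
        settings["pool"] = pool
    if proxy_type is not None:
        settings["proxy_type"] = proxy_type
    return settings


prev = {"pool": 5, "proxy_type": "auto"}
print(restart_settings(prev, pool=10))  # {'pool': 10, 'proxy_type': 'auto'}
```

Here restart --pool 10 keeps the earlier proxy type and changes only the pool size.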

shot

Screenshot a URL through the running service, reusing the warm browser.
./scrapai browser shot https://example.com --project myproj --screens 2
url (required, positional): the page to capture.
--project (default: default): project name; determines where the screenshot is saved.
--screens (default: 2): number of screen-heights to capture; 0 captures the full page.
The image is written to <DATA_DIR>/<project>/<domain>/analysis/page.png, where <domain> is the URL host with www. stripped and dots replaced by underscores.
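The path rule and the --screens semantics can be sketched as below. This assumes "www." is stripped only as a leading prefix and that any port is dropped from the host (urlsplit's hostname does that and also lower-cases it). screenshot_path and capture_height are hypothetical helpers, and capping the capture at the page height is an assumption, not documented behavior.

```python
from pathlib import Path
from urllib.parse import urlsplit


def screenshot_path(data_dir: str, project: str, url: str) -> Path:
    """<DATA_DIR>/<project>/<domain>/analysis/page.png for a given URL."""
    host = urlsplit(url).hostname or ""  # lower-cased host with any port removed
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: only a leading "www." is stripped
    domain = host.replace(".", "_")  # dots become underscores
    return Path(data_dir) / project / domain / "analysis" / "page.png"


def capture_height(screens: int, viewport_height: int, page_height: int) -> int:
    """Pixels captured for --screens N; 0 means the full page."""
    if screens < 0:
        raise ValueError("--screens must be 0 or a positive integer")
    if screens == 0:
        return page_height
    # Assumption: never capture past the bottom of the page.
    return min(screens * viewport_height, page_height)


print(screenshot_path("data", "myproj", "https://www.example.com/about"))
# data/myproj/example_com/analysis/page.png (on POSIX)
```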
shot requires a running service. If none is up it exits with: No browser service running. Start it: ./scrapai browser start

State file

The service records its pid and port in a per-user state file so any scrapai process — browser start and every inspect caller — can find the one running service:
~/.scrapai/browser_service.json
The path is anchored on your home directory (not $TMPDIR) on purpose: $TMPDIR differs per shell/sandbox on macOS, which would make a service started in one terminal invisible from another. stop removes this file; stale files are handled gracefully.
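A caller that wants to find the running service could read the state file roughly like this. It is a sketch under assumptions: the JSON keys pid and port, the helper names, and the use of signal 0 to detect a stale file are hypothetical rather than scrapai's actual code. os.kill(pid, 0) is only a safe liveness probe on POSIX systems (on Windows it would terminate the process).

```python
import json
import os
from pathlib import Path
from typing import Optional

# Documented location of the per-user state file.
STATE_FILE = Path.home() / ".scrapai" / "browser_service.json"


def pid_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks that a process exists without signalling it."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def find_service(state_file: Path = STATE_FILE) -> Optional[dict]:
    """Return {"pid": ..., "port": ...} for a live service, else None."""
    try:
        state = json.loads(state_file.read_text())
        pid, port = int(state["pid"]), int(state["port"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing, unreadable, or malformed file: treat as not running
    if not pid_alive(pid):
        return None  # stale file left by a process that has exited
    return {"pid": pid, "port": port}
```

When find_service() returns None, a caller would fall back to cold-starting its own browser, matching the behavior described under When to run it.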

How it works

  • One browser, one window. The service launches a single browser. Each site gets its own tab in that one window.
  • One tab per site (domain-sticky). A site reuses its tab and its already-solved Cloudflare session, so the second inspect of a site skips the challenge and is much faster. Different sites get different tabs and solve Cloudflare concurrently without interfering.
  • LRU eviction. When more than --pool sites are in play, the least-recently-used tab is closed.
Memory: one shared browser for, say, 5 sites uses roughly half of what 5 separate browsers would (one browser baseline instead of five).

Parallel crawling with lanes

Under the hood the service runs a lane pool over the one shared browser. A “lane” is an isolated browser context + page that solves Cloudflare on its own. The pool maps each domain to a lane:
  • Domain-sticky. The same domain reuses its lane (and its solved CF session); different domains get different lanes and run in parallel.
  • LRU eviction. At most --pool lanes exist at once (default 5). When the number of domains exceeds the cap, the least-recently-used lane is closed.
  • Per-domain navigation locks. A lane has a single page, so two requests for the same domain are serialized on that lane while different domains proceed concurrently.
  • Sessioned lanes. A lane is tied to the session it was opened with. If the same domain is later requested with a different session (or none), the lane is torn down and reopened — a logged-in lane is never reused unlogged, or vice versa.
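The four rules above fit together roughly as follows. This is a minimal asyncio sketch that models a lane as a plain object. Lane, LanePool, acquire, and fetch are hypothetical names, and where the real service would open and close browser contexts, this code only flips a closed flag.

```python
import asyncio
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lane:
    """Stand-in for one isolated browser context + page (hypothetical)."""
    domain: str
    session: Optional[str]
    closed: bool = False


class LanePool:
    """Domain-sticky LRU pool of lanes over one shared browser (a sketch)."""

    def __init__(self, max_lanes: int = 5) -> None:
        if max_lanes < 1:
            raise ValueError("max_lanes must be at least 1")
        self.max_lanes = max_lanes
        # Iteration order is LRU order: least recently used first.
        self.lanes: "OrderedDict[str, Lane]" = OrderedDict()
        # Per-domain navigation locks (never pruned in this sketch).
        self.locks: Dict[str, asyncio.Lock] = {}

    def _close(self, lane: Lane) -> None:
        lane.closed = True  # a real pool would close the browser context here

    def acquire(self, domain: str, session: Optional[str] = None) -> Lane:
        lane = self.lanes.get(domain)
        if lane is not None and lane.session != session:
            # Sessioned lanes: never reuse a lane opened under another session.
            self._close(self.lanes.pop(domain))
            lane = None
        if lane is None:
            lane = Lane(domain, session)  # new lane; it must solve Cloudflare itself
            self.lanes[domain] = lane
            while len(self.lanes) > self.max_lanes:
                _, oldest = self.lanes.popitem(last=False)  # evict the LRU lane
                self._close(oldest)
        self.lanes.move_to_end(domain)  # mark as most recently used
        return lane

    async def fetch(self, domain: str, session: Optional[str] = None) -> Lane:
        lock = self.locks.setdefault(domain, asyncio.Lock())
        async with lock:  # same domain: serialized; other domains run concurrently
            lane = self.acquire(domain, session)
            await asyncio.sleep(0)  # placeholder for navigating the lane's page
            return lane
```

An OrderedDict keeps the LRU bookkeeping O(1): move_to_end marks a domain as most recently used and popitem(last=False) evicts the oldest. Locks are keyed by domain rather than stored on the lane, so a lane reopened for a new session is still serialized with other requests for that domain. The sketch does not stop a lane from being evicted while another domain is still using it, which a real pool would have to handle.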
Before processing multiple sites in parallel, start the service once:
./scrapai browser start
Each agent’s inspect then shares the one browser (one lane per site) instead of launching its own. Run ./scrapai browser stop when the batch is done.
Because all lanes share one browser session, a site that needs to switch proxies mid-solve (only when Cloudflare blocks and a proxy chain is configured) can disturb other lanes. With direct connections — the normal case — this does not happen.
