Only enable Cloudflare bypass when the site explicitly requires it. Always test WITHOUT --browser first.
Detection Indicators
Your site needs Cloudflare bypass if you see:
“Checking your browser” or “Just a moment” messages
403/503 HTTP errors with Cloudflare branding
Challenge pages before content loads
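The indicators above can be turned into a rough pre-check before reaching for --browser. The following is a minimal sketch, not the tool's own detection logic: the function name and the exact marker strings are assumptions made for illustration, and a normal page that happens to contain one of the phrases would be a false positive.

```python
# Heuristic sketch only; looks_like_cloudflare_challenge is a hypothetical helper,
# not part of scrapai.
CHALLENGE_MARKERS = ("checking your browser", "just a moment")


def looks_like_cloudflare_challenge(status, headers, body):
    """Return True if a response resembles a Cloudflare challenge or block page."""
    text = body.lower()
    # Normalize header names so "Server" and "server" are treated the same.
    normalized = {k.lower(): v for k, v in headers.items()}
    server = normalized.get("server", "").lower()

    has_marker = any(marker in text for marker in CHALLENGE_MARKERS)
    branded_error = status in (403, 503) and (
        "cloudflare" in server or "cloudflare" in text
    )
    return has_marker or branded_error
```

If this returns False for a plain HTTP fetch of the page, the default (non-browser) mode is probably sufficient.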
VPS/Cloud Server IP Reputation Issue: If running on AWS, DigitalOcean, Hetzner, or any other cloud provider, Cloudflare may block your server’s IP even with browser bypass enabled. Cloud/datacenter IPs are often flagged as high-risk.
Solution: Combine --browser with residential proxies:
./scrapai crawl spider --project proj --proxy-type residential --browser
See Proxy Configuration for details.
Display Requirements
Cloudflare bypass requires a visible browser (not headless). Cloudflare detects and blocks headless browsers.
Platform support:
Windows: Uses native display automatically ✓
macOS: Uses native display automatically ✓
Linux desktop: Uses native display automatically ✓
Linux servers (VPS without GUI): Auto-detects missing display and uses Xvfb (virtual display) ✓
Installing Xvfb on Linux servers:
sudo apt-get install xvfb
The crawler automatically detects your environment and uses Xvfb when no display is available on Linux.
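To make the environment detection concrete, here is a minimal sketch of how such a check could work. It is an assumption about the general approach (Linux plus an empty DISPLAY variable means a virtual display is needed), not the crawler's actual code, and both function names are hypothetical.

```python
import os
import shutil
import sys


def needs_virtual_display(platform=None, env=None):
    """Return True on Linux when no X display is advertised via DISPLAY."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if not platform.startswith("linux"):
        # Windows and macOS use their native display.
        return False
    return not env.get("DISPLAY")


def xvfb_available():
    """Return True if the Xvfb binary can be found on PATH."""
    return shutil.which("Xvfb") is not None
```

On a server where needs_virtual_display() is True and xvfb_available() is False, installing Xvfb as shown above is the first thing to try.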
Inspector Usage
Start with default HTTP (fast)
Works for most sites: ./scrapai inspect https://example.com --project proj
Try browser mode if JS-rendered
For JavaScript-heavy sites: ./scrapai inspect https://example.com --project proj --browser
Use Cloudflare bypass only when blocked
For Cloudflare-protected sites: ./scrapai inspect https://example.com --project proj --browser
Strategies
Hybrid Mode (Recommended)
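The browser performs Cloudflare verification once every 10 minutes; requests in between use fast HTTP with the cached cookies. This is 20-100x faster than browser-only mode.
Do NOT set CONCURRENT_REQUESTS in hybrid mode. Leaving it unset uses the Scrapy default of 16, which gives the best performance.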
{
  "settings": {
    "CLOUDFLARE_ENABLED": true,
    "CLOUDFLARE_STRATEGY": "hybrid",
    "CLOUDFLARE_COOKIE_REFRESH_THRESHOLD": 600,
    "CF_MAX_RETRIES": 5,
    "CF_RETRY_INTERVAL": 1,
    "CF_POST_DELAY": 5
  }
}
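The hybrid idea (verify in the browser, then reuse cookies until they are older than CLOUDFLARE_COOKIE_REFRESH_THRESHOLD) can be pictured with a small cache object. This is a sketch under stated assumptions: the CookieCache class and its methods are hypothetical and only illustrate the freshness check, not how the crawler stores cookies internally.

```python
import time


class CookieCache:
    """Hypothetical holder for Cloudflare cookies with an age-based refresh rule."""

    def __init__(self, refresh_threshold=600, clock=time.monotonic):
        self.refresh_threshold = refresh_threshold  # seconds, as in the config above
        self._clock = clock
        self._cookies = None
        self._verified_at = None

    def needs_browser(self):
        """True when there are no cookies yet or they have reached the threshold age."""
        if self._cookies is None:
            return True
        return self._clock() - self._verified_at >= self.refresh_threshold

    def store(self, cookies):
        """Record cookies obtained from a successful browser verification."""
        self._cookies = dict(cookies)
        self._verified_at = self._clock()

    def cookies(self):
        """Return a copy of the cached cookies for use in plain HTTP requests."""
        return dict(self._cookies or {})
```

With a threshold of 600, only the first request and one request roughly every 10 minutes pay the cost of a browser session; everything else goes over plain HTTP.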
Browser-Only Mode (Legacy)
Much slower: the browser is used for every request. Requires CONCURRENT_REQUESTS: 1 to prevent browser conflicts.
{
  "settings": {
    "CLOUDFLARE_ENABLED": true,
    "CLOUDFLARE_STRATEGY": "browser_only",
    "CONCURRENT_REQUESTS": 1
  }
}
Settings Reference
| Setting | Default | Description |
| --- | --- | --- |
| CLOUDFLARE_ENABLED | false | Enable CF bypass |
| CLOUDFLARE_STRATEGY | "hybrid" | "hybrid" or "browser_only" |
| CLOUDFLARE_COOKIE_REFRESH_THRESHOLD | 600 | Seconds before cookie refresh |
| CF_MAX_RETRIES | 5 | Max verification attempts |
| CF_RETRY_INTERVAL | 1 | Seconds between retries |
| CF_POST_DELAY | 5 | Seconds after successful verification |
| CF_WAIT_SELECTOR | - | CSS selector to wait for before extracting |
| CF_WAIT_TIMEOUT | 10 | Max seconds to wait for selector |
| CF_PAGE_TIMEOUT | 120000 | Page navigation timeout (ms) |
| CONCURRENT_REQUESTS | - | Must be 1 for browser-only mode |
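The constraints between these settings are easy to get wrong, so a small sanity check before a crawl can help. The sketch below encodes only the rules stated on this page; check_cloudflare_settings is a hypothetical helper, not something the tool ships.

```python
VALID_STRATEGIES = {"hybrid", "browser_only"}


def check_cloudflare_settings(settings):
    """Return a list of human-readable problems found in a spider's settings dict."""
    problems = []
    if not settings.get("CLOUDFLARE_ENABLED", False):
        return problems  # bypass is off, nothing to validate

    strategy = settings.get("CLOUDFLARE_STRATEGY", "hybrid")
    if strategy not in VALID_STRATEGIES:
        problems.append(f"unknown CLOUDFLARE_STRATEGY: {strategy!r}")
    elif strategy == "browser_only" and settings.get("CONCURRENT_REQUESTS") != 1:
        problems.append("browser_only requires CONCURRENT_REQUESTS: 1")
    elif strategy == "hybrid" and "CONCURRENT_REQUESTS" in settings:
        problems.append("hybrid mode: leave CONCURRENT_REQUESTS unset")
    return problems
```

An empty list means the settings are consistent with the rules above; it does not guarantee that the bypass will succeed against a given site.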
Complete Spider Example
{
  "name": "mysite",
  "allowed_domains": ["example.com"],
  "start_urls": ["https://www.example.com/articles"],
  "rules": [
    {
      "allow": ["/article/[^/]+$"],
      "callback": "parse_article",
      "follow": false,
      "priority": 100
    },
    {
      "allow": ["/articles/"],
      "callback": null,
      "follow": true,
      "priority": 50
    }
  ],
  "settings": {
    "CLOUDFLARE_ENABLED": true,
    "CLOUDFLARE_STRATEGY": "hybrid",
    "CLOUDFLARE_COOKIE_REFRESH_THRESHOLD": 600,
    "CF_MAX_RETRIES": 5,
    "CF_RETRY_INTERVAL": 1,
    "CF_POST_DELAY": 5,
    "CF_WAIT_SELECTOR": "h1.title-med-1",
    "DOWNLOAD_DELAY": 2
  }
}
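To see which URLs each rule in the example would pick up, the allow patterns can be tried against sample URLs. This sketch assumes the patterns are applied as regular-expression searches against the full URL, which is how Scrapy's link extractors treat allow patterns; treating higher priority as "checked first" is an assumption made for illustration, and match_rule is a hypothetical helper.

```python
import re

# The two rules from the spider example above.
RULES = [
    {"allow": [r"/article/[^/]+$"], "callback": "parse_article", "follow": False, "priority": 100},
    {"allow": [r"/articles/"], "callback": None, "follow": True, "priority": 50},
]


def match_rule(url, rules=RULES):
    """Return the first rule (highest priority first) whose pattern matches the URL."""
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if any(re.search(pattern, url) for pattern in rule["allow"]):
            return rule
    return None
```

Note that the article pattern ends with $, so https://www.example.com/article/my-post matches but the same URL with a trailing slash does not. Checking a handful of real URLs from the target site this way is a cheap way to catch such mismatches before crawling.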
Timeouts & Hang Prevention
Browser operation timeout: 300 seconds (5 minutes) per operation to prevent infinite hangs.
Typical operation times:
CF verification: 10-60 seconds
Page load: 5-30 seconds
Cookie refresh: 10-30 seconds
If you consistently hit the 300s timeout, investigate:
Network connectivity issues
Site blocking your IP/region
Browser/Chrome subprocess problems
System resource constraints (CPU/memory)
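The per-operation timeout described above follows a common pattern: wrap each browser step in a deadline so that a stuck step raises an error instead of hanging the crawl. The sketch below uses asyncio.wait_for to show the pattern; run_with_timeout and BROWSER_OP_TIMEOUT are hypothetical names, and the crawler's real implementation may differ.

```python
import asyncio

BROWSER_OP_TIMEOUT = 300  # seconds, matching the limit described above


async def run_with_timeout(coro, timeout=BROWSER_OP_TIMEOUT, label="browser operation"):
    """Await a coroutine, raising RuntimeError if it runs longer than the timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        raise RuntimeError(f"{label} exceeded {timeout}s") from None
```

Because typical operations finish in under a minute, hitting this limit repeatedly points at one of the causes listed above and not at a timeout that is too short.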
Troubleshooting
Crawl Hangs at “Getting/refreshing CF cookies”
Symptoms: Browser opens but never navigates.
Solutions:
Update to latest version
Ensure you’re on latest version with timeout fix
Verify browser opens
Check browser actually opens (not headless failing)
Check display (Linux servers)
Make sure Xvfb is installed: sudo apt-get install xvfb
Test with inspector
Test with --browser flag on inspector first: ./scrapai inspect https://example.com --project proj --browser
Check system resources
Verify CPU, memory, and disk space availability
Works on One Machine But Not Another
Debugging steps:
Test inspector on both machines
./scrapai inspect https://example.com --project proj --browser
Check Chrome installation
Verify display (Linux)
echo $DISPLAY # Should show :99 with Xvfb
Review logs for errors
Check logs for specific error messages
Try different strategy
Switch between hybrid and browser_only modes
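For the Chrome installation step, a quick way to compare two machines is to look for a Chrome or Chromium binary on PATH. This is a sketch: the candidate names are common Linux binary names and an assumption about what to look for, since this page does not say which browser binary the crawler launches, and find_chrome is a hypothetical helper.

```python
import shutil

# Common binary names; which one the crawler actually uses is not stated here.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome(candidates=CHROME_CANDIDATES, which=shutil.which):
    """Return the path of the first Chrome-like binary found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

Running print(find_chrome()) on both machines shows whether one of them is missing a browser or resolving a different binary.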
Diagnosing via Logs
Hybrid mode indicators:
Cached N cookies (cf_clearance: ...)
// Cookies working properly
Browser-only mode indicators:
Cloudflare verified successfully
Opened persistent browser
Closed browser
// Normal lifecycle
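When a log is long, counting these indicator messages is quicker than reading it line by line. The sketch below matches only the message fragments quoted above; the surrounding log format (timestamps, levels) is not assumed, and summarize_cf_log is a hypothetical helper.

```python
import re

COOKIE_LINE = re.compile(r"Cached (\d+) cookies")
LIFECYCLE_MARKERS = (
    "Cloudflare verified successfully",
    "Opened persistent browser",
    "Closed browser",
)


def summarize_cf_log(lines):
    """Count Cloudflare-related indicator messages in an iterable of log lines."""
    summary = {"cookie_refreshes": 0, "last_cookie_count": None}
    summary.update({marker: 0 for marker in LIFECYCLE_MARKERS})
    for line in lines:
        match = COOKIE_LINE.search(line)
        if match:
            summary["cookie_refreshes"] += 1
            summary["last_cookie_count"] = int(match.group(1))
        for marker in LIFECYCLE_MARKERS:
            if marker in line:
                summary[marker] += 1
    return summary
```

In hybrid mode a healthy run should show at least one cookie refresh; in browser-only mode the open and close counts should match once the crawl has finished.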
Title Contamination
If extracted titles show wrong text, set CF_WAIT_SELECTOR to the main title element to capture HTML before related content loads.
{
  "settings": {
    "CF_WAIT_SELECTOR": "h1.article-title"
  }
}
Proxy Escalation: Combine with smart proxy usage
Checkpoint Resume: Pause and resume long crawls