The setup commands initialize your ScrapAI environment, install dependencies, and verify everything is working correctly.
setup
Install virtual environment, dependencies, and initialize database.
Syntax
./scrapai setup [--skip-deps]
Options
- --skip-deps: Skip dependency installation (useful when re-running setup after manual changes)
What It Does
- Creates virtual environment at .venv/ (skips if it exists)
- Installs Python dependencies from requirements.txt
- Installs Playwright Chromium for browser automation
- Creates .env file from .env.example (if missing)
- Tests data directory permissions by writing a test file
- Runs database migrations via Alembic
- Configures Claude Code permissions (if using Claude Code agent)
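The non-install steps above (.env bootstrap and the data-directory permission test) can be sketched in plain shell. This is a hedged approximation of the behavior the setup output reports, not the actual implementation; the probe filename is illustrative:

```shell
# Sketch of the .env bootstrap and data-directory permission check,
# mirroring the steps listed above (probe filename is illustrative).
set -eu
cd "$(mktemp -d)"                     # stand-in for a fresh checkout
printf 'DATA_DIR=./data\n' > .env.example

# Create .env from the example only if it is missing
[ -f .env ] || cp .env.example .env

# Test write permission by creating, then removing, a probe file
mkdir -p ./data
touch ./data/.write-probe && rm -f ./data/.write-probe
echo "data directory writable: ./data"
```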
Output
$ ./scrapai setup
🚀 Setting up ScrapAI environment...
📦 Creating virtual environment...
✅ Virtual environment created
📋 Installing requirements...
✅ Requirements installed
🌐 Installing Playwright Chromium browser...
✅ Playwright Chromium installed
📝 Creating .env from .env.example...
✅ .env file created (using SQLite by default)
📁 Checking data directory permissions...
✅ Have permission to write to data directory: ./data
🗄️ Initializing database...
✅ Database initialized with migrations
🔧 Configuring Claude Code permissions...
✅ Claude Code permissions configured
🎉 ScrapAI setup complete!
📝 You can now:
• List spiders: ./scrapai spiders list --project <name>
• Import spiders: ./scrapai spiders import <file> --project <name>
• Run crawls: ./scrapai crawl <spider_name> --project <name>
Linux
Playwright Chromium requires system dependencies. If the browser fails to launch, install them:
sudo .venv/bin/python -m playwright install-deps chromium
This command requires sudo as it installs system packages (libglib, libnss3, etc.).
Windows
Use scrapai or scrapai.bat instead of ./scrapai:
scrapai setup
Skip Dependencies
If you’ve already installed dependencies manually or made changes:
./scrapai setup --skip-deps
This runs migrations and permission checks without reinstalling packages.
verify
Verify environment setup without installing anything.
Syntax
./scrapai verify
What It Checks
- Virtual environment exists at .venv/
- Core dependencies installed (scrapy, sqlalchemy, alembic)
- Database initialized (checks current Alembic revision)
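The three checks above can be approximated with ordinary shell commands. This is a sketch of the idea only; verify's real implementation may differ:

```shell
# Approximation of verify's three checks (the real command may differ).
verify_env() {
    dir="$1"
    # 1. Virtual environment exists at .venv/
    [ -d "$dir/.venv" ] || { echo "virtual environment not found"; return 1; }
    # 2. Core dependencies are importable
    "$dir/.venv/bin/python" -c 'import scrapy, sqlalchemy, alembic' \
        2>/dev/null || { echo "core dependencies missing"; return 1; }
    # 3. Alembic prints a current revision only for an initialized database
    (cd "$dir" && .venv/bin/alembic current 2>/dev/null | grep -q .) \
        || { echo "database not initialized"; return 1; }
    echo "environment is ready"
}

result=$(verify_env "$(mktemp -d)") || true   # fresh dir: first check fails
echo "$result"
```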
Output
Success
$ ./scrapai verify
🔍 Verifying ScrapAI environment...
✅ Virtual environment exists
✅ Core dependencies installed
✅ Database initialized
🎉 Environment is ready!
📝 You can now:
• List spiders: ./scrapai spiders list --project <name>
• Import spiders: ./scrapai spiders import <file> --project <name>
• Run crawls: ./scrapai crawl <spider_name> --project <name>
Missing Setup
$ ./scrapai verify
🔍 Verifying ScrapAI environment...
❌ Virtual environment not found
Run: ./scrapai setup
⚠️ Environment setup incomplete
Run: ./scrapai setup
Use Cases
- After cloning repository: Verify setup before starting work
- CI/CD pipelines: Check environment before running tests
- Troubleshooting: Diagnose setup issues without modifying anything
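For the CI/CD use case, the exit status is what matters. The self-contained sketch below stubs ./scrapai so it runs anywhere; it assumes the real command exits nonzero on an incomplete environment, as the "Missing Setup" output suggests:

```shell
# CI gate sketch; the stub stands in for a repo where setup never ran.
cd "$(mktemp -d)"
printf '#!/bin/sh\necho "environment setup incomplete"\nexit 1\n' > scrapai
chmod +x scrapai

if ./scrapai verify; then
    echo "gate: environment OK, run tests"
else
    echo "gate: run ./scrapai setup first"
fi
```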
Claude Code Permissions
The setup command configures Claude Code agent permissions in .claude/settings.local.json.
Allow List
Commands and operations the agent can perform:
[
"Read",
"Write",
"Edit",
"Update",
"Glob",
"Grep",
"Bash(./scrapai:*)",
"Bash(source:*)",
"Bash(sqlite3:*)",
"Bash(psql:*)",
"Bash(xvfb-run:*)"
]
Deny List
Commands and operations the agent cannot perform:
[
"Edit(scrapai)",
"Update(scrapai)",
"Edit(.claude/*)",
"Update(.claude/*)",
"Write(**/*.py)",
"Edit(**/*.py)",
"Update(**/*.py)",
"MultiEdit(**/*.py)",
"Write(.env)",
"Write(secrets/**)",
"Write(config/**/*.key)",
"Write(**/*password*)",
"Write(**/*secret*)",
"WebFetch",
"WebSearch",
"Bash(rm:*)"
]
These permissions ensure the agent writes config (JSON), not code (Python). This is a core security principle of ScrapAI’s agent safety model.
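A quick way to confirm the guardrails are in place is to parse the settings file and assert on the deny list. The permissions.allow/permissions.deny layout below is an assumption about the file's shape; the sample file here is built locally so the check is self-contained:

```shell
# Build a sample settings file, then assert the critical deny rules exist.
# (JSON shape is assumed; adjust keys to match your actual file.)
cd "$(mktemp -d)"
mkdir -p .claude
cat > .claude/settings.local.json <<'EOF'
{
  "permissions": {
    "allow": ["Read", "Glob", "Grep", "Bash(./scrapai:*)"],
    "deny":  ["Write(**/*.py)", "Edit(**/*.py)", "Bash(rm:*)"]
  }
}
EOF

python3 - <<'EOF'
import json
perms = json.load(open(".claude/settings.local.json"))["permissions"]
# Core of the safety model: no writing Python, no rm
assert "Write(**/*.py)" in perms["deny"]
assert "Bash(rm:*)" in perms["deny"]
print("guardrails present")
EOF
```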
Environment Variables
The .env file created by setup:
# Data directory (default: ./data)
DATA_DIR=./data
# Database (default: SQLite)
DATABASE_URL=sqlite:///scrapai.db
# For PostgreSQL:
# DATABASE_URL=postgresql://user:password@localhost:5432/scrapai
# Proxy settings (optional)
DATACENTER_PROXY_USERNAME=
DATACENTER_PROXY_PASSWORD=
DATACENTER_PROXY_HOST=
DATACENTER_PROXY_PORT=
RESIDENTIAL_PROXY_USERNAME=
RESIDENTIAL_PROXY_PASSWORD=
RESIDENTIAL_PROXY_HOST=
RESIDENTIAL_PROXY_PORT=
# S3 storage (optional, for Airflow)
S3_ENDPOINT=
S3_BUCKET=
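Tools that read .env load it for you; to get the same values into an interactive shell, allexport sourcing works because the file is plain KEY=value lines (the two-line file below is a stand-in for your real .env):

```shell
# Export every variable from a KEY=value .env into the current shell.
cd "$(mktemp -d)"
printf 'DATA_DIR=./data\nDATABASE_URL=sqlite:///scrapai.db\n' > .env

set -a        # auto-export all assignments made while sourcing
. ./.env
set +a

echo "using database: $DATABASE_URL"
```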
Troubleshooting
Permission Denied (Linux/macOS)
Make the script executable:
chmod +x scrapai
Python Version
Requires Python 3.9 or higher:
python --version # or python3 --version
Virtual Environment Issues
Delete and recreate:
rm -rf .venv
./scrapai setup
Database Migration Errors
Check DATABASE_URL in .env and ensure database is accessible:
# For SQLite (default)
ls -la scrapai.db
# For PostgreSQL
psql $DATABASE_URL -c "SELECT 1"
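If neither CLI is at hand, Python's built-in sqlite3 module can probe the default database file; the path below assumes the default DATABASE_URL=sqlite:///scrapai.db:

```shell
# Probe the default SQLite database without the sqlite3 CLI.
python3 - <<'EOF'
import os, sqlite3

path = "scrapai.db"   # default path from DATABASE_URL=sqlite:///scrapai.db
if os.path.exists(path):
    sqlite3.connect(path).execute("SELECT 1")
    print("database reachable")
else:
    print("scrapai.db not found; run ./scrapai setup")
EOF
```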
Next Steps