Overview
ScrapAI stores crawl metadata, spider configurations, and queue data in a relational database.
Database Options
SQLite (Default)
Best for: Development, small to medium projects (< 1M items), single-user usage.
Pros: Zero configuration, file-based, built into Python's standard library.
Cons: Limited concurrency, single writer, not suitable for distributed systems.
PostgreSQL
Best for: Production, large projects (1M+ items), multi-user environments, distributed crawling.
Pros: Excellent concurrency, scales to billions of rows, advanced indexing.
Requires: PostgreSQL 12+ installed and running.
SQLite Configuration
Default Setup
SQLite is configured by default. Run ./scrapai setup to create scrapai.db in the project root.
Custom Path
Update .env to use a different location:
DATABASE_URL=sqlite:///scrapai.db # Relative
DATABASE_URL=sqlite:////absolute/path/to/scrapai.db # Absolute
Use three slashes /// for relative paths and four slashes //// for absolute paths.
Optimization
ScrapAI automatically applies optimized SQLite settings (WAL mode, 64MB cache) in core/db.py.
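The PRAGMA names below are standard SQLite; this is a minimal sketch of the kind of settings core/db.py applies, not ScrapAI's exact code, and the values may differ:

```python
import sqlite3

def apply_sqlite_pragmas(conn: sqlite3.Connection) -> None:
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is interpreted as KiB: -65536 = 64 MB.
    conn.execute("PRAGMA cache_size=-65536")

conn = sqlite3.connect("scrapai.db")
apply_sqlite_pragmas(conn)
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
conn.close()
```

WAL mode is persistent: once set, it stays in effect for every later connection to the same file.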
PostgreSQL Configuration
Installation
macOS
# Install PostgreSQL
brew install postgresql@15
# Start service
brew services start postgresql@15
Ubuntu/Debian
# Install PostgreSQL
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib
# Start service
sudo systemctl start postgresql
sudo systemctl enable postgresql
Docker
# Run PostgreSQL container
docker run -d \
  --name scrapai-postgres \
  -e POSTGRES_PASSWORD=yourpassword \
  -e POSTGRES_DB=scrapai \
  -p 5432:5432 \
  postgres:15
Create Database
Create the database with createdb scrapai, or with a custom user:
CREATE DATABASE scrapai;
CREATE USER scrapai_user WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE scrapai TO scrapai_user;
On PostgreSQL 15+, also grant the user access to the public schema (run while connected to the scrapai database): GRANT ALL ON SCHEMA public TO scrapai_user;
Update .env with your connection string:
DATABASE_URL=postgresql://user:password@localhost:5432/scrapai
Format: postgresql://[user]:[password]@[host]:[port]/[database]
Add ?sslmode=require for SSL connections.
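The pieces of the connection string can be checked programmatically with Python's standard library; the credentials below are placeholders:

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder credentials; substitute your own values.
url = "postgresql://scrapai_user:secure_password@localhost:5432/scrapai?sslmode=require"

parts = urlsplit(url)
print(parts.username)              # scrapai_user
print(parts.hostname, parts.port)  # localhost 5432
print(parts.path.lstrip("/"))      # scrapai (the database name)
print(parse_qs(parts.query).get("sslmode", ["disable"])[0])  # require
```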
Run Migrations
Initialize the database schema:
./scrapai db migrate
Migrating from SQLite to PostgreSQL
Backup your data first: cp scrapai.db scrapai.db.backup
Steps
- Install and configure PostgreSQL
- Update .env with the PostgreSQL URL
- Run migrations: ./scrapai db migrate
- Transfer data: ./scrapai db transfer sqlite:///scrapai.db
- Verify: ./scrapai verify
For large databases, use --skip-items to transfer only configs and metadata.
The transfer command reads from the source URL (argument) and writes to the current DATABASE_URL in .env.
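Conceptually, a transfer reads every row from the source and bulk-inserts it into the target. ScrapAI's implementation goes through SQLAlchemy, but the pattern can be sketched with two SQLite handles (copy_table is a hypothetical helper, not part of the CLI):

```python
import sqlite3

def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Read all rows from table in src and bulk-insert them into dst."""
    rows = src.execute(f"SELECT * FROM {table}").fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        dst.commit()
    return len(rows)

# Demo with two throwaway in-memory databases.
src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE spiders (id INTEGER, name TEXT)")
src.executemany("INSERT INTO spiders VALUES (?, ?)", [(1, "news"), (2, "shop")])
print(copy_table(src, dst, "spiders"))  # 2
```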
Database Maintenance
Backup
SQLite:
cp scrapai.db scrapai.db.backup
cp scrapai.db scrapai.db.$(date +%Y%m%d_%H%M%S)
PostgreSQL:
pg_dump scrapai > scrapai_backup.sql
pg_dump scrapai | gzip > scrapai_backup.sql.gz
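For SQLite, cp is only safe while nothing is writing; if a crawl may be running, Python's standard-library binding to SQLite's online backup API takes a consistent snapshot instead. A sketch:

```python
import sqlite3

# Copy a consistent snapshot of a (possibly live) SQLite database.
src = sqlite3.connect("scrapai.db")
dst = sqlite3.connect("scrapai.db.backup")
with dst:
    src.backup(dst)  # stdlib online-backup API (Python 3.7+)
src.close()
dst.close()
```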
Restore
SQLite:
cp scrapai.db.backup scrapai.db
PostgreSQL:
psql scrapai < scrapai_backup.sql
Optimize
SQLite:
sqlite3 scrapai.db "VACUUM; ANALYZE;"
PostgreSQL:
psql scrapai -c "VACUUM ANALYZE;"
Database Schema
ScrapAI uses the SQLAlchemy ORM with Alembic migrations. The schema is defined in core/models.py.
Key Tables: spiders, projects, crawls, items, queue, analysis
View Schema
sqlite3 scrapai.db ".schema"
psql scrapai -c "\d"
Performance Tuning
SQLite
ScrapAI automatically applies optimal settings. For extreme performance, increase cache size in core/db.py.
PostgreSQL
Edit postgresql.conf to tune memory settings:
shared_buffers = 256MB # 25% of RAM
effective_cache_size = 1GB # 50-75% of RAM
work_mem = 16MB
random_page_cost = 1.1 # For SSD
Restart: sudo systemctl restart postgresql
Troubleshooting
Connection Failed
SQLite: Check that the file exists and is readable and writable. Recreate with rm scrapai.db* && ./scrapai setup.
PostgreSQL: Test connection with psql -U user -d scrapai. Check service status and verify DATABASE_URL.
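Before debugging credentials, it can help to confirm the server is listening at all. A standard-library sketch (port_open is an illustrative helper, not a ScrapAI command):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 5432))
```

True only means something accepts connections on 5432; authentication and the database name still have to be checked with psql.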
Database Locked (SQLite)
lsof scrapai.db # Find processes
pkill -f scrapai # Kill stuck processes
rm scrapai.db-wal scrapai.db-shm # Clear WAL files
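Deleting the -wal/-shm files discards any writes not yet merged into the main file. When the database itself is healthy, a checkpoint folds the WAL back in without data loss; a sketch, assuming no other process holds the database:

```python
import sqlite3

conn = sqlite3.connect("scrapai.db")
# TRUNCATE mode: flush the WAL into the main database file,
# then reset the WAL to zero bytes.
busy, wal_frames, checkpointed = conn.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)"
).fetchone()
print(busy)  # 0 means the checkpoint was not blocked by another connection
conn.close()
```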
Migration Failed
Check version with ./scrapai db version. Reset with rm scrapai.db && ./scrapai setup or run alembic upgrade head.
Transfer Failed
Verify source is readable and target is accessible. Try with --skip-items.
Security
PostgreSQL Security
- Use strong passwords
- Restrict network access in pg_hba.conf
- Enable SSL: add ?sslmode=require to DATABASE_URL
- Set up automated backups