Database commands provide direct access to the ScrapAI database for migrations, queries, statistics, and transferring data between databases.
db migrate
Run Alembic database migrations.
Syntax
./scrapai db migrate
What It Does
Runs all pending Alembic migrations to bring the database schema up to date with the latest version.
Output
$ ./scrapai db migrate
🔄 Running database migrations...
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> a1b2c3d4e5f6, add callbacks_config
INFO [alembic.runtime.migration] Running upgrade a1b2c3d4e5f6 -> b2c3d4e5f6g7, add queue table
✅ Migrations completed successfully!
Use Cases
- After git pull: Apply new schema changes
- After ScrapAI upgrade: Update to new database structure
- Initial setup: Run automatically by ./scrapai setup
db current
Show current migration revision.
Syntax
./scrapai db current
Output
$ ./scrapai db current
b2c3d4e5f6g7 (head)
Shows the current Alembic revision ID. (head) indicates you’re on the latest version.
db stats
Show database statistics.
Syntax
./scrapai db stats
Output
$ ./scrapai db stats
📊 Database Statistics
Spiders: 23
Scraped Items: 45,891
Projects: 5
Queue Items: 127
• Pending: 89
• Processing: 3
• Completed: 31
• Failed: 4
Use Cases
- Monitor database growth
- Check queue status at a glance
- Verify data after imports/migrations
db tables
List all tables with row counts.
Syntax
./scrapai db tables
Output
$ ./scrapai db tables
📋 Database Tables
alembic_version 1 rows
crawl_queue 127 rows
scraped_items 45,891 rows
spider_rules 67 rows
spider_settings 152 rows
spiders 23 rows
Table Validation
Only ScrapAI tables are displayed. System tables and non-ScrapAI tables are skipped:
sqlite_sequence (skipped: not a ScrapAI table)
my_custom_table (skipped: not a ScrapAI table)
Table validation prevents SQL injection by using a whitelist of known ScrapAI table names.
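A whitelist check of this kind is simple to sketch. The following is illustrative, not ScrapAI's actual code; the table set mirrors the tables listed by db tables, and the function name is hypothetical:

```python
# Hypothetical sketch of whitelist-based table validation.
# Only names in this set may be interpolated into statements
# like "SELECT COUNT(*) FROM <table>".
ALLOWED_TABLES = {
    "alembic_version", "crawl_queue", "scraped_items",
    "spider_rules", "spider_settings", "spiders",
}

def validate_table(name: str) -> str:
    if name not in ALLOWED_TABLES:
        raise ValueError(f"Invalid table name: '{name}'")
    return name
```

Because the name is checked against a fixed set rather than escaped, there is no string that can smuggle SQL through this path.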
db inspect
Show schema for a specific table.
Syntax
./scrapai db inspect <table>
Arguments
<table> — Table name. Must be one of: spiders, scraped_items, crawl_queue, spider_rules, spider_settings, alembic_version.
Examples
./scrapai db inspect spiders
./scrapai db inspect scraped_items
./scrapai db inspect crawl_queue
Output (SQLite)
$ ./scrapai db inspect spiders
🔍 Table: spiders
Column Type Nullable Default
----------------------------------------------------------------------
id INTEGER NO
name VARCHAR(255) NO
allowed_domains JSON YES
start_urls JSON YES
source_url VARCHAR(2048) YES
active BOOLEAN NO 1
project VARCHAR(255) YES
callbacks_config JSON YES
created_at TIMESTAMP YES CURRENT_TIMESTAMP
updated_at TIMESTAMP YES CURRENT_TIMESTAMP
Total rows: 23
Output (PostgreSQL)
PostgreSQL provides more detailed type information:
$ ./scrapai db inspect spiders
🔍 Table: spiders
Column Type Nullable Default
----------------------------------------------------------------------
id integer NO nextval('spiders_id_seq')
name character varying(255) NO
allowed_domains jsonb YES
start_urls jsonb YES
source_url character varying(2048) YES
active boolean NO true
project character varying(255) YES
callbacks_config jsonb YES
created_at timestamp YES now()
updated_at timestamp YES now()
Total rows: 23
Invalid Table
$ ./scrapai db inspect my_table
❌ Invalid table name: 'my_table'
Valid tables: alembic_version, crawl_queue, scraped_items, spider_rules, spider_settings, spiders
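On SQLite, the column listing that db inspect produces can be reproduced with a PRAGMA table_info query. A self-contained sketch, using an illustrative in-memory database rather than ScrapAI's own connection:

```python
import sqlite3

# Illustrative in-memory database with a cut-down spiders table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE spiders (
        id INTEGER NOT NULL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        active BOOLEAN NOT NULL DEFAULT 1
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
for cid, name, ctype, notnull, default, pk in conn.execute(
    "PRAGMA table_info(spiders)"
):
    nullable = "NO" if notnull else "YES"
    print(f"{name:<12}{ctype:<15}{nullable:<10}{default or ''}")
```

PostgreSQL exposes the same information through the information_schema.columns view, which is why its output carries richer type detail.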
db query
Execute SQL queries against the database.
Syntax
./scrapai db query "<sql>" [options]
Arguments
<sql> — SQL query to execute. Only SELECT, UPDATE, and DELETE are allowed.
Options
--format <fmt> — Output format: table (default), json, or csv.
--yes — Skip the confirmation prompt for UPDATE/DELETE.
SELECT Queries
# Table format (default)
./scrapai db query "SELECT * FROM spiders LIMIT 5"
# JSON format
./scrapai db query "SELECT * FROM spiders LIMIT 5" --format json
# CSV format
./scrapai db query "SELECT * FROM spiders LIMIT 5" --format csv
$ ./scrapai db query "SELECT name, project, active FROM spiders LIMIT 3"
name | project | active
------------------------------------
bbc_co_uk | news | True
cnn_com | news | True
reuters_com | news | True
(3 rows)
$ ./scrapai db query "SELECT * FROM spiders LIMIT 2" --format json
[
{
"id": 1,
"name": "bbc_co_uk",
"allowed_domains": ["bbc.co.uk"],
"start_urls": ["https://www.bbc.co.uk/news"],
"source_url": "https://bbc.co.uk",
"active": true,
"project": "news",
"created_at": "2026-02-28T14:30:00"
},
{
"id": 2,
"name": "cnn_com",
"allowed_domains": ["cnn.com"],
"start_urls": ["https://www.cnn.com"],
"source_url": "https://cnn.com",
"active": true,
"project": "news",
"created_at": "2026-02-27T09:15:00"
}
]
$ ./scrapai db query "SELECT name, project FROM spiders LIMIT 3" --format csv
name,project
bbc_co_uk,news
cnn_com,news
reuters_com,news
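All three formats can be rendered from the same column/row data. A minimal sketch of such a formatting step (the function name is illustrative, not ScrapAI's actual code):

```python
import csv
import io
import json

def format_rows(columns, rows, fmt="table"):
    """Render query results as table, json, or csv text."""
    if fmt == "json":
        return json.dumps([dict(zip(columns, r)) for r in rows], indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(columns)
        writer.writerows(rows)
        return buf.getvalue()
    # Default: pipe-separated table like the CLI output above.
    header = " | ".join(columns)
    lines = [header, "-" * len(header)]
    lines += [" | ".join(str(v) for v in r) for r in rows]
    lines.append(f"({len(rows)} rows)")
    return "\n".join(lines)
```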
UPDATE Queries
Update rows with confirmation:
./scrapai db query "UPDATE spider_settings SET value='3' WHERE key='DOWNLOAD_DELAY'"
Output
⚠️ This will UPDATE 8 row(s). Continue? [y/N] y
✅ UPDATE complete — 8 row(s) affected
Skip Confirmation
./scrapai db query "UPDATE spiders SET active=false WHERE project='archive'" --yes
✅ UPDATE complete — 12 row(s) affected
DELETE Queries
Delete rows with confirmation:
./scrapai db query "DELETE FROM scraped_items WHERE spider_id = 5"
Output
⚠️ This will DELETE 1247 row(s). Continue? [y/N] y
✅ DELETE complete — 1247 row(s) affected
DELETE operations cannot be undone. Always verify the row count before confirming.
Blocked Operations
INSERT, DROP, ALTER, and TRUNCATE are blocked for safety:
$ ./scrapai db query "DROP TABLE spiders"
❌ Only SELECT, UPDATE, and DELETE queries are allowed
INSERT, DROP, ALTER, and TRUNCATE are blocked for safety
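A statement gate like this can be approximated by checking the leading keyword. This is a deliberately naive sketch (a production gate would also handle comments, CTEs, and stacked statements); the names are hypothetical:

```python
# Hypothetical sketch: only SELECT, UPDATE, and DELETE pass the gate.
ALLOWED_STATEMENTS = ("SELECT", "UPDATE", "DELETE")

def check_statement(sql: str) -> str:
    keyword = sql.lstrip().split(None, 1)[0].upper()
    if keyword not in ALLOWED_STATEMENTS:
        raise ValueError("Only SELECT, UPDATE, and DELETE queries are allowed")
    return keyword
```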
Complex Queries
Joins, aggregations, and subqueries are supported:
# Count items per spider
./scrapai db query "
SELECT s.name, COUNT(si.id) as item_count
FROM spiders s
LEFT JOIN scraped_items si ON s.id = si.spider_id
GROUP BY s.name
ORDER BY item_count DESC
"
# Find spiders with no items
./scrapai db query "
SELECT name FROM spiders
WHERE id NOT IN (SELECT DISTINCT spider_id FROM scraped_items)
"
# Queue statistics by project
./scrapai db query "
SELECT project_name, status, COUNT(*) as count
FROM crawl_queue
GROUP BY project_name, status
ORDER BY project_name, status
"
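The items-per-spider join above can be tried end to end against a throwaway in-memory SQLite database. The schema here is a cut-down stand-in for the real tables:

```python
import sqlite3

# Minimal in-memory stand-in for the spiders/scraped_items tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spiders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scraped_items (id INTEGER PRIMARY KEY, spider_id INTEGER);
    INSERT INTO spiders VALUES (1, 'bbc_co_uk'), (2, 'cnn_com');
    INSERT INTO scraped_items (spider_id) VALUES (1), (1), (2);
""")

rows = conn.execute("""
    SELECT s.name, COUNT(si.id) AS item_count
    FROM spiders s
    LEFT JOIN scraped_items si ON s.id = si.spider_id
    GROUP BY s.name
    ORDER BY item_count DESC
""").fetchall()
# → [('bbc_co_uk', 2), ('cnn_com', 1)]
```

The LEFT JOIN matters: spiders with zero items still appear, with an item_count of 0, which an INNER JOIN would silently drop.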
db transfer
Transfer data from another database into the current one.
Syntax
./scrapai db transfer <source_url> [--skip-items]
Arguments
<source_url> — Source database connection string (the old database to copy FROM).
Options
--skip-items — Skip the scraped_items table (transfer only spiders and queue).
Workflow
- Update DATABASE_URL in .env to point at your new database
- Run migrations on the new database: ./scrapai db migrate
- Transfer data from the old database: ./scrapai db transfer <old_db_url>
Examples
SQLite to PostgreSQL
# 1. Update .env
DATABASE_URL=postgresql://user:pass@localhost:5432/scrapai
# 2. Run migrations
./scrapai db migrate
# 3. Transfer from SQLite
./scrapai db transfer sqlite:///scrapai.db
PostgreSQL to PostgreSQL
# 1. Update .env to new database
DATABASE_URL=postgresql://user:pass@new-host:5432/scrapai
# 2. Run migrations
./scrapai db migrate
# 3. Transfer from old database
./scrapai db transfer postgresql://user:pass@old-host:5432/scrapai
Skip Scraped Items
For large databases, skip scraped items (fastest transfer):
./scrapai db transfer sqlite:///scrapai.db --skip-items
Output
$ ./scrapai db transfer sqlite:///old_scrapai.db
📦 Source (old): sqlite:///old_scrapai.db
📦 Target (current): postgresql://localhost:5432/scrapai
🕷️ Transferring 23 spiders...
✅ 23 spiders (with rules and settings)
📰 Transferring 45,891 scraped items...
... 10,000/45,891
... 20,000/45,891
... 30,000/45,891
... 40,000/45,891
✅ 45,891 items
📋 Transferring 127 queue items...
✅ 127 queue items
🎉 Transfer complete! Your new database is ready.
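The chunked progress output suggests items are copied in batches rather than row by row. A sketch of that loop, with hypothetical fetch/insert helpers injected so it stays self-contained:

```python
def transfer_items(fetch_chunk, insert_chunk, total, chunk_size=10_000):
    """Copy `total` rows in chunks, printing progress as it goes.

    fetch_chunk(offset, limit) returns up to `limit` rows starting at
    `offset`; insert_chunk(rows) writes them to the target. Both are
    hypothetical helpers, not ScrapAI's actual API.
    """
    copied = 0
    while copied < total:
        rows = fetch_chunk(copied, chunk_size)
        if not rows:
            break
        insert_chunk(rows)
        copied += len(rows)
        if copied < total:
            print(f"  ... {copied:,}/{total:,}")
    return copied
```

Batching keeps memory bounded for large tables and lets the progress lines above appear every chunk_size rows.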
What Gets Transferred
- Spiders: All spider configurations
- Spider Rules: URL matching rules
- Spider Settings: Custom settings
- Scraped Items: All scraped data (unless --skip-items)
- Queue: All queue items with status
ID Remapping
Foreign key relationships are preserved:
- New spider IDs are assigned in target database
- Scraped items and settings are updated with new spider IDs
- No data loss or relationship corruption
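The remapping can be pictured as a dictionary from old spider IDs to the IDs the target database assigns on insert, then a rewrite pass over dependent rows. A sketch with hypothetical names:

```python
def remap_spiders(old_spiders, insert_spider):
    """Insert spiders into the target DB, recording old ID -> new ID.

    insert_spider(spider) is a hypothetical helper that returns the
    newly assigned ID in the target database.
    """
    id_map = {}
    for spider in old_spiders:
        id_map[spider["id"]] = insert_spider(spider)
    return id_map

def remap_items(items, id_map):
    """Rewrite each item's spider_id so relationships are preserved."""
    return [{**item, "spider_id": id_map[item["spider_id"]]} for item in items]
```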
Error Handling
$ ./scrapai db transfer postgresql://localhost:5432/scrapai
❌ Source is the same as current database.
Update DATABASE_URL in .env to your new database first.
$ ./scrapai db transfer postgresql://bad-host:5432/db
❌ Transfer failed: connection to server at "bad-host" failed
All operations are transactional: if the transfer fails, the target database is rolled back.
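The "source is the same as current database" guard can be sketched as a URL comparison. This simplification ignores driver suffixes and credentials; the function name is illustrative:

```python
from urllib.parse import urlsplit

def is_same_database(source_url: str, current_url: str) -> bool:
    """Compare scheme, host:port, and database path of two URLs.

    A simplified sketch: real code would also normalize driver
    variants (e.g. postgresql vs postgresql+psycopg2) and defaults.
    """
    a, b = urlsplit(source_url), urlsplit(current_url)
    return (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
```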
Safety Features
SQL Injection Prevention
- Table names: Validated against whitelist
- Query parameters: All queries use SQLAlchemy parameterized bindings
- Blocked operations: INSERT, DROP, ALTER, TRUNCATE not allowed via db query
Confirmation Prompts
Destructive operations require confirmation:
- UPDATE queries show affected row count
- DELETE queries show affected row count
- Use the --yes flag to skip prompts (automation)
Read-Only by Default
SELECT queries execute immediately. Write operations (UPDATE/DELETE) require explicit confirmation.
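Putting the pieces together, the decision of whether to run a statement might look like the sketch below. The confirm callable is injected so the logic is testable; all names are hypothetical:

```python
def should_execute(statement: str, affected: int, yes: bool, confirm) -> bool:
    """Decide whether a statement runs.

    SELECT runs immediately; UPDATE/DELETE run only with --yes or an
    interactive confirmation. confirm(prompt) is a hypothetical
    callable returning True/False (e.g. wrapping input()).
    """
    keyword = statement.lstrip().split(None, 1)[0].upper()
    if keyword == "SELECT":
        return True
    if yes:
        return True
    return confirm(f"This will {keyword} {affected} row(s). Continue? [y/N] ")
```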
PostgreSQL vs SQLite
PostgreSQL Benefits
- Atomic queue operations: FOR UPDATE SKIP LOCKED for race-free parallel processing
- Better performance: Faster for large datasets (100k+ items)
- Concurrent writes: Multiple workers can write simultaneously
- JSON queries: Advanced JSONB operators for querying spider configs
SQLite Benefits
- Zero configuration: No server setup required
- Single file: Easy to backup and copy
- Sufficient for most use cases: Works well for fewer than 100k items
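The atomic-claim pattern behind FOR UPDATE SKIP LOCKED can be sketched as a dialect-dependent statement builder. This is a simplification (real code would run the statement inside a transaction, and the column names here assume the crawl_queue schema above):

```python
def claim_sql(dialect: str) -> str:
    """Build a statement that claims one pending queue row."""
    if dialect == "postgresql":
        # Lock one pending row, skipping rows other workers already hold,
        # so parallel workers never claim the same item.
        return (
            "UPDATE crawl_queue SET status = 'processing' "
            "WHERE id = (SELECT id FROM crawl_queue WHERE status = 'pending' "
            "ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1) RETURNING id"
        )
    # SQLite has no row-level locks; the database-wide write lock
    # serializes workers instead.
    return (
        "UPDATE crawl_queue SET status = 'processing' "
        "WHERE id = (SELECT id FROM crawl_queue WHERE status = 'pending' "
        "ORDER BY id LIMIT 1) RETURNING id"
    )
```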
Switching Databases
# In .env
DATABASE_URL=postgresql://user:password@localhost:5432/scrapai
# Run migrations
./scrapai db migrate
# Transfer from SQLite
./scrapai db transfer sqlite:///scrapai.db
Example Queries
Spider Analytics
# Spiders by item count
./scrapai db query "
SELECT s.name, s.project, COUNT(si.id) as items
FROM spiders s
LEFT JOIN scraped_items si ON s.id = si.spider_id
GROUP BY s.name, s.project
ORDER BY items DESC
" --format table
# Recent spiders
./scrapai db query "
SELECT name, project, created_at
FROM spiders
ORDER BY created_at DESC
LIMIT 10
"
Queue Analytics
# Queue status summary
./scrapai db query "
SELECT status, COUNT(*) as count
FROM crawl_queue
GROUP BY status
"
# Failed items with errors
./scrapai db query "
SELECT website_url, error_message, updated_at
FROM crawl_queue
WHERE status='failed'
ORDER BY updated_at DESC
" --format json
Scraped Items
# Items by date
./scrapai db query "
SELECT DATE(scraped_at) as date, COUNT(*) as count
FROM scraped_items
GROUP BY DATE(scraped_at)
ORDER BY date DESC
LIMIT 7
"
# Search content
./scrapai db query "
SELECT title, url
FROM scraped_items
WHERE content LIKE '%climate change%'
LIMIT 10
"
Troubleshooting
Migration Failed
Check:
- Database connection: DATABASE_URL in .env
- Database permissions: Write access required
- Disk space: Ensure sufficient space
Solution:
# Test connection (PostgreSQL)
psql $DATABASE_URL -c "SELECT 1"
# Test connection (SQLite)
ls -la scrapai.db
Query Timeout
For very large queries:
# Add LIMIT clause
./scrapai db query "SELECT * FROM scraped_items LIMIT 1000"
# Or use export command
./scrapai export myspider --project myproject --format csv
Transfer Failed
If transfer is interrupted:
# Target database is rolled back (safe)
# Fix the issue and retry
./scrapai db transfer <source_url>
Next Steps