The queue system enables batch processing of multiple websites with status tracking, priority ordering, and atomic claim operations. It is well suited for processing hundreds of URLs with AI agents or parallel workers.

queue add

Add a single website to the queue.

Syntax

./scrapai queue add <url> --project <name> [options]

Arguments

url
string
required
Website URL to add to queue.

Options

--project
string
default:"default"
Project name.
--message, -m
string
Custom instruction for processing this URL (e.g., “Focus on product pricing”).
--priority
integer
default:"5"
Priority level (higher = processed sooner). Range: 1-10.

Examples

# Simple add
./scrapai queue add https://example.com --project myproject

# With custom instructions
./scrapai queue add https://techsite.com --project tech \
  --message "Extract author and publication date"

# High priority
./scrapai queue add https://urgent.com --project news --priority 10

Output

$ ./scrapai queue add https://bbc.co.uk --project news --priority 8
 Added to queue (ID: 42)
   URL: https://bbc.co.uk
   Project: news
   Priority: 8

Duplicate Handling

$ ./scrapai queue add https://bbc.co.uk --project news
⚠️  URL already exists in queue
 ID: 42
   Status: pending
   URL: https://bbc.co.uk
   Skipping duplicate...
Duplicates are detected by the (project_name, website_url) combination.

queue bulk

Bulk add URLs from CSV or JSON file.

Syntax

./scrapai queue bulk <file> --project <name> [options]

Arguments

file
string
required
Path to CSV or JSON file containing URLs.

Options

--project
string
default:"default"
Project name for all URLs.
--priority
integer
default:"5"
Default priority for all URLs (can be overridden per-row).

CSV Format

Requires a url column. Optional: custom_instruction, priority.
url,custom_instruction,priority
https://bbc.co.uk,Focus on UK news,8
https://cnn.com,Focus on breaking news,9
https://reuters.com,,7

JSON Format

Array of objects with url field. Optional: custom_instruction, priority.
[
  {
    "url": "https://bbc.co.uk",
    "custom_instruction": "Focus on UK news",
    "priority": 8
  },
  {
    "url": "https://cnn.com",
    "custom_instruction": "Focus on breaking news",
    "priority": 9
  },
  {
    "url": "https://reuters.com",
    "priority": 7
  }
]
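
Input files in either format can be generated from a script. A minimal Python sketch (the file names and rows below are examples, not part of the CLI):

```python
import csv
import json

def write_queue_json(rows, path):
    """Write queue rows to a JSON file in the bulk-import format.

    Each row needs a "url"; "custom_instruction" and "priority" are optional.
    """
    with open(path, "w") as fh:
        json.dump(rows, fh, indent=2)

def write_queue_csv(rows, path):
    """Write the same rows as a CSV with the expected header."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["url", "custom_instruction", "priority"])
        writer.writeheader()
        writer.writerows(rows)

rows = [
    {"url": "https://bbc.co.uk", "custom_instruction": "Focus on UK news", "priority": 8},
    {"url": "https://reuters.com", "custom_instruction": "", "priority": 7},
]
write_queue_json(rows, "urls.json")
write_queue_csv(rows, "websites.csv")
```

Either output file can then be passed to `./scrapai queue bulk`.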

Examples

# Bulk add from CSV
./scrapai queue bulk websites.csv --project news

# Bulk add from JSON
./scrapai queue bulk urls.json --project ecommerce --priority 6

Output

$ ./scrapai queue bulk news_sites.csv --project news
 Bulk add complete:
   Added: 47
   Skipped (duplicates/invalid): 3
   Project: news
   Format: CSV
Use templates/queue-template.csv as a starting point for your CSV files.

queue list

List queue items with filtering.

Syntax

./scrapai queue list --project <name> [options]

Options

--project
string
default:"default"
Project name.
--status
string
Filter by status: pending, processing, completed, failed.
--limit
integer
default:"5"
Maximum items to show.
--all
flag
Show all items including completed and failed (default shows only pending/processing).
--count
flag
Show only the count, no details.

Examples

# Show pending/processing items (default)
./scrapai queue list --project news

# Show all items
./scrapai queue list --project news --all

# Show only failed items
./scrapai queue list --project news --status failed

# Count pending items
./scrapai queue list --project news --status pending --count

Output

$ ./scrapai queue list --project news
📋 Queue for project 'news':

 [42] https://bbc.co.uk
   Status: pending | Priority: 8
   Instructions: Focus on UK news

🔄 [43] https://cnn.com
   Status: processing | Priority: 9
   Processing by: user@hostname (since 2026-02-28 15:30)

 [44] https://reuters.com
   Status: pending | Priority: 7

Status Icons

  • Pending: Waiting to be processed
  • 🔄 Processing: Currently being worked on
  • Completed: Successfully processed
  • Failed: Processing failed

queue next

Atomically claim the next pending item.

Syntax

./scrapai queue next --project <name>

Options

--project
string
default:"default"
Project name.

Behavior

  1. Finds highest priority pending item
  2. Atomically updates status to processing
  3. Sets processing_by to username@hostname
  4. Sets locked_at to current timestamp
PostgreSQL: uses FOR UPDATE SKIP LOCKED (race-safe).
SQLite: uses a conditional UPDATE (race-prone under high concurrency).
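
The SQLite strategy can be illustrated with a small sketch. This is an illustration against an abridged crawl_queue table, not scrapai's actual code: the `AND status = 'pending'` guard on the UPDATE means that if another worker claims the row between the SELECT and the UPDATE, the update becomes a no-op instead of a double claim.

```python
import sqlite3
import datetime

def claim_next(conn, project, worker):
    """Claim the highest-priority pending item with a conditional UPDATE."""
    row = conn.execute(
        "SELECT id FROM crawl_queue WHERE project_name = ? AND status = 'pending' "
        "ORDER BY priority DESC LIMIT 1",
        (project,),
    ).fetchone()
    if row is None:
        return None  # empty queue
    cur = conn.execute(
        "UPDATE crawl_queue SET status = 'processing', processing_by = ?, locked_at = ? "
        "WHERE id = ? AND status = 'pending'",  # guard: no-op if someone else claimed it
        (worker, datetime.datetime.now(datetime.timezone.utc).isoformat(), row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None  # lost the race -> None

# Demo against an abridged in-memory crawl_queue
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crawl_queue (id INTEGER PRIMARY KEY, project_name TEXT, "
    "website_url TEXT, status TEXT DEFAULT 'pending', priority INTEGER DEFAULT 5, "
    "processing_by TEXT, locked_at TEXT)"
)
conn.execute("INSERT INTO crawl_queue (project_name, website_url, priority) "
             "VALUES ('news', 'https://bbc.co.uk', 8)")
conn.execute("INSERT INTO crawl_queue (project_name, website_url, priority) "
             "VALUES ('news', 'https://cnn.com', 9)")
claimed = claim_next(conn, "news", "user@hostname")  # claims the priority-9 row
```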

Example

./scrapai queue next --project news

Output

Item Claimed

🔄 Claimed item from queue:
   ID: 42
   URL: https://bbc.co.uk
   Instructions: Focus on UK news
   Priority: 8
   Locked by: user@hostname

Empty Queue

📬 No pending items in queue for project 'news'

Use in Agents

Typical agent workflow:
# Claim next item
./scrapai queue next --project news
# ID: 42, URL: https://bbc.co.uk

# Process the website
# (AI agent analyzes site, generates spider, tests, imports)

# Mark as completed
./scrapai queue complete 42

# Or mark as failed
./scrapai queue fail 42 --message "Site requires authentication"
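
The claim/process/acknowledge cycle above generalizes to any worker. A Python sketch with stub callables; a real worker would shell out to `./scrapai queue next`, `queue complete`, and `queue fail` instead of using the stubs shown here:

```python
def worker_loop(claim, process, complete, fail):
    """Drain a queue: claim items until none remain, acknowledging each result.

    claim() returns (item_id, url) or None when the queue is empty;
    complete/fail mirror `queue complete` and `queue fail`.
    """
    results = []
    while (item := claim()) is not None:
        item_id, url = item
        try:
            process(url)
            complete(item_id)
            results.append((item_id, "completed"))
        except Exception as exc:
            fail(item_id, str(exc))
            results.append((item_id, "failed"))
    return results

# Stub drivers for illustration only
queue = [(42, "https://bbc.co.uk"), (43, "https://login-required.example")]
acks = []

def process(url):
    # Simulated processing step; one site fails on purpose
    if "login-required" in url:
        raise RuntimeError("Site requires login")

results = worker_loop(
    claim=lambda: queue.pop(0) if queue else None,
    process=process,
    complete=lambda i: acks.append(("complete", i)),
    fail=lambda i, msg: acks.append(("fail", i, msg)),
)
```

The try/except boundary ensures every claimed item is acknowledged one way or the other, so no item stays locked in `processing` after the worker moves on.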

queue complete

Mark an item as successfully completed.

Syntax

./scrapai queue complete <id>

Arguments

id
integer
required
Queue item ID.

Example

./scrapai queue complete 42

Output

 Item 42 marked as completed
   URL: https://bbc.co.uk

queue fail

Mark an item as failed with error message.

Syntax

./scrapai queue fail <id> [--message <error>]

Arguments

id
integer
required
Queue item ID.

Options

--message, -m
string
Error message explaining why processing failed.

Example

./scrapai queue fail 42 --message "Site requires login"

Output

 Item 42 marked as failed
   URL: https://bbc.co.uk
   Error: Site requires login

queue retry

Reset a failed item to pending status.

Syntax

./scrapai queue retry <id>

Arguments

id
integer
required
Queue item ID.

Example

./scrapai queue retry 42

Output

🔄 Item 42 reset to pending (retry count: 1)
   URL: https://bbc.co.uk
Retry count is tracked but not enforced. You can retry items indefinitely.
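
Since the CLI does not enforce a limit, a wrapper script can cap retries itself. A sketch; the `MAX_RETRIES` value and the `(id, retry_count)` input shape are assumptions of this example, e.g. parsed from `queue list --status failed`:

```python
MAX_RETRIES = 3  # chosen by the wrapper; the CLI itself enforces no limit

def should_retry(retry_count, max_retries=MAX_RETRIES):
    """Decide whether a failed item is worth another `queue retry`."""
    return retry_count < max_retries

def retry_failed(items, retry_cmd):
    """Re-queue failed items whose retry_count is under the cap.

    `items` is a list of (id, retry_count) pairs; retry_cmd would shell
    out to `./scrapai queue retry <id>` in a real script.
    """
    retried = []
    for item_id, retry_count in items:
        if should_retry(retry_count):
            retry_cmd(item_id)
            retried.append(item_id)
    return retried

retried = retry_failed([(42, 1), (43, 3), (44, 0)], retry_cmd=lambda i: None)
```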

queue remove

Remove an item from the queue.

Syntax

./scrapai queue remove <id>

Arguments

id
integer
required
Queue item ID.

Example

./scrapai queue remove 42

Output

🗑️  Item 42 removed from queue
   URL: https://bbc.co.uk

queue cleanup

Bulk remove completed or failed items.

Syntax

./scrapai queue cleanup --project <name> [options]

Options

--project
string
default:"default"
Project name.
--completed
flag
Remove all completed items.
--failed
flag
Remove all failed items.
--all
flag
Remove all completed and failed items.
--force
flag
Skip confirmation prompt.

Examples

# Clean up completed items
./scrapai queue cleanup --project news --completed

# Clean up failed items
./scrapai queue cleanup --project news --failed

# Clean up everything (completed + failed)
./scrapai queue cleanup --project news --all --force

Output

$ ./scrapai queue cleanup --project news --all
🗑️  Found 47 items to remove:
 [42] https://bbc.co.uk
 [43] https://cnn.com
 [44] https://broken-site.com
 [45] https://reuters.com
   ... and 43 more

Remove 47 items? (y/N): y
 Removed 47 items from queue

Database Schema

crawl_queue Table

CREATE TABLE crawl_queue (
    id INTEGER PRIMARY KEY,
    project_name VARCHAR(255) NOT NULL,
    website_url VARCHAR(2048) NOT NULL,
    custom_instruction TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    priority INTEGER NOT NULL DEFAULT 5,
    processing_by VARCHAR(255),
    locked_at TIMESTAMP,
    error_message TEXT,
    retry_count INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    completed_at TIMESTAMP,
    UNIQUE(project_name, website_url)
);
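
The `UNIQUE(project_name, website_url)` constraint is what drives duplicate detection. A SQLite sketch showing the effect; `INSERT OR IGNORE` is one way to implement the skip, not necessarily what scrapai does internally:

```python
import sqlite3

# Abridged crawl_queue table with the uniqueness constraint from the schema above
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE crawl_queue (
    id INTEGER PRIMARY KEY,
    project_name TEXT NOT NULL,
    website_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    priority INTEGER NOT NULL DEFAULT 5,
    UNIQUE(project_name, website_url)
)""")

# INSERT OR IGNORE mirrors the "skipping duplicate" behavior of `queue add`
conn.execute("INSERT OR IGNORE INTO crawl_queue (project_name, website_url) "
             "VALUES ('news', 'https://bbc.co.uk')")
conn.execute("INSERT OR IGNORE INTO crawl_queue (project_name, website_url) "
             "VALUES ('news', 'https://bbc.co.uk')")  # duplicate pair: ignored
conn.execute("INSERT OR IGNORE INTO crawl_queue (project_name, website_url) "
             "VALUES ('tech', 'https://bbc.co.uk')")  # different project: allowed
count = conn.execute("SELECT COUNT(*) FROM crawl_queue").fetchone()[0]
```

Note that the same URL can appear in two different projects; only the (project, URL) pair must be unique.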

Status Values

  • pending: Waiting to be processed
  • processing: Currently being worked on (locked)
  • completed: Successfully processed
  • failed: Processing failed with error

Parallel Processing

Manual Loop

while true; do
  output=$(./scrapai queue next --project news)
  echo "$output" | grep -q "ID:" || break   # empty queue: stop
  # Parse the ID/URL from $output, process the site,
  # then run `queue complete <id>` or `queue fail <id>`
done

Parallel Workers

Run multiple workers in parallel:
# Terminal 1
while true; do
  ./scrapai queue next --project news | grep -q "ID:" || break
  # process the claimed item, then mark it complete/failed
done

# Terminal 2
while true; do
  ./scrapai queue next --project news | grep -q "ID:" || break
  # process the claimed item, then mark it complete/failed
done

# Terminal 3
while true; do
  ./scrapai queue next --project news | grep -q "ID:" || break
  # process the claimed item, then mark it complete/failed
done
PostgreSQL is recommended for parallel processing: SQLite lacks FOR UPDATE SKIP LOCKED and may hit race conditions under high concurrency.

Next Steps