Spider management commands handle spider configurations stored as JSON in the database. Spiders are imported from JSON files and can be updated by re-importing a file with the same name and project.

spiders list

List all spiders in the database.

Syntax

./scrapai spiders list [--project <name>]

Options

--project
string
Filter by project name. If omitted, shows spiders from all projects.

Examples

# List all spiders across all projects
./scrapai spiders list

# List spiders in a specific project
./scrapai spiders list --project news

Output

$ ./scrapai spiders list --project news
📋 Available Spiders (DB) - Project: news:
 bbc_co_uk [news] (Active: True) - Created: 2026-02-28 14:30, Updated: 2026-02-28 15:45
    Source: https://bbc.co.uk
 cnn_com [news] (Active: True) - Created: 2026-02-27 09:15, Updated: 2026-02-27 09:15
    Source: https://cnn.com
 reuters_com [news] (Active: True) - Created: 2026-02-26 16:20, Updated: 2026-02-28 11:30
    Source: https://reuters.com

Fields Displayed

  • Name: Spider identifier (used in crawl and show commands)
  • Project: Project tag in brackets
  • Active: Whether spider is enabled (currently always True)
  • Created: Initial import timestamp
  • Updated: Last modification timestamp
  • Source: Original website URL (if specified in config)

spiders import

Import or update a spider from a JSON configuration file.

Syntax

./scrapai spiders import <file> --project <name> [--skip-validation]

Arguments

file
string
required
Path to JSON spider configuration file. Use - to read from stdin.

Options

--project
string
default:"default"
Project name to associate with this spider.
--skip-validation
flag
Skip Pydantic schema validation (not recommended). Use only for backward compatibility.

Examples

# Import spider from file
./scrapai spiders import bbc_spider.json --project news

# Import from stdin (useful in pipelines)
cat spider.json | ./scrapai spiders import - --project news

# Skip validation (backward compatibility)
./scrapai spiders import old_spider.json --project legacy --skip-validation

Spider Configuration Format

{
  "name": "bbc_co_uk",
  "allowed_domains": ["bbc.co.uk"],
  "start_urls": ["https://www.bbc.co.uk/news"],
  "source_url": "https://bbc.co.uk",
  "rules": [
    {
      "allow": ["/news/articles/[^/]+$"],
      "callback": "parse_article",
      "follow": false,
      "priority": 10
    },
    {
      "allow": ["/news/?$"],
      "follow": true,
      "priority": 5
    }
  ],
  "settings": {
    "EXTRACTOR_ORDER": ["newspaper", "trafilatura"],
    "DOWNLOAD_DELAY": 2,
    "CONCURRENT_REQUESTS": 8
  },
  "callbacks": {
    "parse_article": {
      "extract": {
        "title": {"css": "h1.article-headline::text"},
        "author": {"css": "span.author-name::text"},
        "content": {"css": "div.article-body", "get": "all_text"}
      }
    }
  }
}
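Because spider configs are plain JSON, they can also be assembled programmatically and piped to `./scrapai spiders import -`. The sketch below builds a minimal config for a hypothetical `cnn_com` spider; the field names follow the Configuration Fields list, but the specific spider and patterns are illustrative.

```python
import json

# Assemble a minimal spider config in code. Required fields per the docs:
# name, allowed_domains, start_urls. The rule pattern here is hypothetical.
config = {
    "name": "cnn_com",
    "allowed_domains": ["cnn.com"],
    "start_urls": ["https://cnn.com"],
    "rules": [
        {"allow": ["/[0-9]{4}/.+$"], "callback": "parse_article",
         "follow": False, "priority": 10}
    ],
}
payload = json.dumps(config, indent=2)
# The payload could then be fed to the stdin form of the import command, e.g.:
# subprocess.run(["./scrapai", "spiders", "import", "-", "--project", "news"],
#                input=payload, text=True)
```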

Configuration Fields

name
string
required
Spider name (letters, numbers, hyphens, underscores only). Must be unique per project.
allowed_domains
array
required
List of domains this spider can crawl. URLs outside these domains are filtered.
start_urls
array
required
Initial URLs to crawl. Must be valid HTTP/HTTPS URLs.
source_url
string
Original website URL (for documentation purposes).
rules
array
URL pattern matching rules. Each rule defines which URLs to follow and how to process them.
settings
object
Spider-specific settings that override defaults.
callbacks
object
Custom extraction callbacks with CSS/XPath selectors for non-article content.
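To see how the `rules` array drives crawling, here is a hypothetical sketch of priority-ordered rule matching: rules are tried from highest `priority` down, and the first rule whose `allow` regex matches a URL decides the `callback` and `follow` behavior. The function name and matching details are illustrative; ScrapAI's internal logic may differ.

```python
import re

# Rules mirror the BBC example config above.
RULES = [
    {"allow": [r"/news/articles/[^/]+$"], "callback": "parse_article",
     "follow": False, "priority": 10},
    {"allow": [r"/news/?$"], "follow": True, "priority": 5},
]

def match_rule(url, rules=RULES):
    """Return the highest-priority rule whose allow pattern matches `url`."""
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        if any(re.search(pattern, url) for pattern in rule["allow"]):
            return rule
    return None

article = match_rule("https://www.bbc.co.uk/news/articles/abc123")
section = match_rule("https://www.bbc.co.uk/news/")
```

Here `article` resolves to the `parse_article` rule, while `section` hits the lower-priority follow-only rule; a URL matching neither pattern returns `None` and is not followed.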

Validation

All spider configs are validated through Pydantic schemas before import:
  • Spider names: ^[a-zA-Z0-9_-]+$ pattern
  • URLs: HTTP/HTTPS only, no private IPs (127.0.0.1, 10.x, 172.16.x, 192.168.x), max 2048 chars
  • Callback names: Whitelisted names only, reserved names blocked
  • Settings: Bounded values (concurrency 1-32, delays 0-60s)
  • Extractor order: Valid extractor names only
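The name and URL rules above can be sketched as follows. This is an illustrative re-implementation of the listed checks, not ScrapAI's actual Pydantic validators, which may differ in detail.

```python
import re
from urllib.parse import urlparse

NAME_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
# Private/loopback prefixes named in the validation rules above.
PRIVATE_HOST_PREFIXES = ("127.", "10.", "172.16.", "192.168.")

def validate_spider_name(name):
    """Letters, digits, hyphens, and underscores only."""
    return bool(NAME_RE.match(name))

def validate_start_url(url):
    """Return a list of rule violations for a start URL (empty list = valid)."""
    errors = []
    if len(url) > 2048:
        errors.append("URL exceeds 2048 characters")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        errors.append("URL scheme must be http or https")
    if (parsed.hostname or "").startswith(PRIVATE_HOST_PREFIXES):
        errors.append("private IP addresses are not allowed")
    return errors
```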

Output

Successful Import

$ ./scrapai spiders import bbc_spider.json --project news
✅ Spider 'bbc_co_uk' imported successfully!
   Project: news
   Domains: bbc.co.uk
   Start URLs: 1
   Rules: 2
   Callbacks: 1 (parse_article)

Update Existing Spider

$ ./scrapai spiders import bbc_spider.json --project news
⚠️  Spider 'bbc_co_uk' already exists. Updating...
✅ Spider 'bbc_co_uk' imported successfully!
   Project: news
   Domains: bbc.co.uk
   Start URLs: 1
   Rules: 2
   Callbacks: 1 (parse_article)
Re-importing a spider replaces its configuration entirely. All rules and settings are deleted and recreated.

Validation Failure

$ ./scrapai spiders import bad_spider.json --project news
❌ Spider configuration validation failed:
 name: string does not match pattern "^[a-zA-Z0-9_-]+$"
 start_urls -> 0: URL scheme must be http or https
 settings -> CONCURRENT_REQUESTS: value must be between 1 and 32

💡 Use --skip-validation to bypass validation (not recommended)

spiders delete

Delete a spider and all its associated data.

Syntax

./scrapai spiders delete <name> [--project <name>] [--force]

Arguments

name
string
required
Spider name to delete.

Options

--project
string
Project name. If specified, only deletes spider from that project.
--force, -f
flag
Skip confirmation prompt.

Examples

# Delete spider with confirmation
./scrapai spiders delete bbc_co_uk --project news

# Delete without confirmation
./scrapai spiders delete old_spider --project archive --force

Output

With Confirmation

$ ./scrapai spiders delete bbc_co_uk --project news
Are you sure you want to delete spider 'bbc_co_uk' in project 'news'? (y/N): y
🗑️  Spider 'bbc_co_uk' in project 'news' deleted!

Force Delete

$ ./scrapai spiders delete bbc_co_uk --project news --force
🗑️  Spider 'bbc_co_uk' in project 'news' deleted!
Deleting a spider removes:
  • Spider configuration
  • All URL matching rules
  • All custom settings
  • All scraped items associated with this spider
This operation cannot be undone.
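One way the cascade above can work at the database level is with `ON DELETE CASCADE` foreign keys: a single `DELETE` on the spider row then removes its rules and settings automatically. The schema below is a minimal sketch modeled on the Database Storage section; ScrapAI's real schema and deletion logic may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
    CREATE TABLE spiders (id INTEGER PRIMARY KEY, name TEXT, project TEXT);
    CREATE TABLE spider_rules (
        spider_id INTEGER REFERENCES spiders(id) ON DELETE CASCADE,
        callback TEXT, priority INTEGER);
    CREATE TABLE spider_settings (
        spider_id INTEGER REFERENCES spiders(id) ON DELETE CASCADE,
        key TEXT, value TEXT, type TEXT);
    INSERT INTO spiders VALUES (1, 'bbc_co_uk', 'news');
    INSERT INTO spider_rules VALUES (1, 'parse_article', 10);
    INSERT INTO spider_settings VALUES (1, 'DOWNLOAD_DELAY', '2', 'int');
""")

# Deleting the spider row cascades to its rules and settings.
conn.execute("DELETE FROM spiders WHERE name = ? AND project = ?",
             ("bbc_co_uk", "news"))
remaining_rules = conn.execute("SELECT COUNT(*) FROM spider_rules").fetchone()[0]
remaining_settings = conn.execute("SELECT COUNT(*) FROM spider_settings").fetchone()[0]
```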

Database Storage

Spiders are stored across multiple tables:

spiders Table

  • id: Primary key (auto-increment)
  • name: Spider name (unique per project)
  • project: Project name
  • allowed_domains: JSON array
  • start_urls: JSON array
  • source_url: Original website URL
  • active: Boolean (currently always true)
  • callbacks_config: JSON object with callback definitions
  • created_at: Timestamp
  • updated_at: Timestamp

spider_rules Table

  • spider_id: Foreign key to spiders
  • allow_patterns: JSON array of URL patterns to allow
  • deny_patterns: JSON array of URL patterns to deny
  • restrict_xpaths: JSON array of XPath restrictions
  • restrict_css: JSON array of CSS restrictions
  • callback: Callback function name
  • follow: Boolean (whether to follow links)
  • priority: Integer (higher = processed first)

spider_settings Table

  • spider_id: Foreign key to spiders
  • key: Setting name
  • value: Setting value (as string)
  • type: Value type (str, int, bool, json)
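Since setting values are stored as strings with a `type` tag, loading a spider requires decoding each row back to a Python value. A minimal sketch of that decoding, assuming the four types listed above (the actual ScrapAI implementation may handle this differently):

```python
import json

def decode_setting(value, type_):
    """Restore a spider_settings row value based on its `type` column."""
    if type_ == "int":
        return int(value)
    if type_ == "bool":
        return value.lower() in ("true", "1", "yes")
    if type_ == "json":
        return json.loads(value)
    return value  # "str" and anything unrecognized stay as-is

# Example rows as they might be stored for the BBC spider config above.
rows = [
    ("DOWNLOAD_DELAY", "2", "int"),
    ("CONCURRENT_REQUESTS", "8", "int"),
    ("EXTRACTOR_ORDER", '["newspaper", "trafilatura"]', "json"),
]
settings = {key: decode_setting(value, type_) for key, value, type_ in rows}
```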

Working with Templates

ScrapAI includes example spider configs in templates/:

# Import example spider
./scrapai spiders import templates/bbc_spider.json --project examples

# View all templates
ls -la templates/*.json

Templates cover various site types:
  • News sites (BBC, Reuters)
  • E-commerce (product listings)
  • Forums (discussion threads)
  • Cloudflare-protected sites

Next Steps