Spider management commands handle JSON configurations stored in the database.
## spiders list

List all spiders in the database.

### Syntax

```bash
./scrapai spiders list [--project <name>]
```

### Options

- `--project <name>`: Filter by project name. If omitted, shows spiders from all projects.

### Examples

```bash
# List all spiders across all projects
./scrapai spiders list

# List spiders in a specific project
./scrapai spiders list --project news
```
### Output

```text
$ ./scrapai spiders list --project news
📋 Available Spiders (DB) - Project: news:
  • bbc_co_uk [news] (Active: True) - Created: 2026-02-28 14:30, Updated: 2026-02-28 15:45
    Source: https://bbc.co.uk
  • cnn_com [news] (Active: True) - Created: 2026-02-27 09:15, Updated: 2026-02-27 09:15
    Source: https://cnn.com
  • reuters_com [news] (Active: True) - Created: 2026-02-26 16:20, Updated: 2026-02-28 11:30
    Source: https://reuters.com
```
## spiders import

Import or update a spider from a JSON configuration file.

### Syntax

```bash
./scrapai spiders import <file> --project <name> [--skip-validation]
```

### Arguments

- `<file>`: Path to the JSON spider configuration file. Use `-` to read from stdin.

### Options

- `--project <name>`: Project name to associate with this spider.
- `--skip-validation`: Skip Pydantic schema validation (not recommended). Use only for backward compatibility.

### Examples

```bash
# Import spider from file
./scrapai spiders import bbc_spider.json --project news

# Import from stdin (useful in pipelines)
cat spider.json | ./scrapai spiders import - --project news

# Skip validation (backward compatibility)
./scrapai spiders import old_spider.json --project legacy --skip-validation
```
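The stdin form is convenient when configurations are generated programmatically. A minimal sketch of a generator script whose output could be piped into `spiders import -`; the field names follow the example configuration below, while the concrete values and the `build_spider_config` helper are illustrative, not part of the tool:

```python
import json

def build_spider_config(name, domain, start_urls):
    """Assemble a minimal spider configuration dict.

    Field names mirror the JSON schema documented on this page;
    the single crawl rule here is an illustrative placeholder.
    """
    return {
        "name": name,
        "allowed_domains": [domain],
        "start_urls": start_urls,
        "source_url": f"https://{domain}",
        "rules": [
            {"allow": ["/news/?$"], "follow": True, "priority": 5},
        ],
    }

config = build_spider_config("cnn_com", "cnn.com", ["https://cnn.com/news"])
# Emit JSON on stdout, e.g.: python gen_spider.py | ./scrapai spiders import - --project news
print(json.dumps(config, indent=2))
```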
### Example Configuration

```json
{
  "name": "bbc_co_uk",
  "allowed_domains": ["bbc.co.uk"],
  "start_urls": ["https://www.bbc.co.uk/news"],
  "source_url": "https://bbc.co.uk",
  "rules": [
    {
      "allow": ["/news/articles/[^/]+$"],
      "callback": "parse_article",
      "follow": false,
      "priority": 10
    },
    {
      "allow": ["/news/?$"],
      "follow": true,
      "priority": 5
    }
  ],
  "settings": {
    "EXTRACTOR_ORDER": ["newspaper", "trafilatura"],
    "DOWNLOAD_DELAY": 2,
    "CONCURRENT_REQUESTS": 8
  },
  "callbacks": {
    "parse_article": {
      "extract": {
        "title": {"css": "h1.article-headline::text"},
        "author": {"css": "span.author-name::text"},
        "content": {"css": "div.article-body", "get": "all_text"}
      }
    }
  }
}
```
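The `allow` entries in each rule are regular expressions matched against URLs. A quick way to sanity-check which rule a given URL would hit, assuming standard Python `re.search` semantics and that higher `priority` values take precedence (as the example configuration suggests); `first_matching_rule` is a local helper for illustration, not part of the tool:

```python
import re

# The two rules from the example configuration above
rules = [
    {"allow": r"/news/articles/[^/]+$", "callback": "parse_article", "priority": 10},
    {"allow": r"/news/?$", "follow": True, "priority": 5},
]

def first_matching_rule(url, rules):
    """Return the highest-priority rule whose pattern matches the URL, or None."""
    for rule in sorted(rules, key=lambda r: -r["priority"]):
        if re.search(rule["allow"], url):
            return rule
    return None

# Article URLs hit the priority-10 rule; section pages hit the priority-5 rule;
# unrelated paths match nothing.
article = first_matching_rule("https://www.bbc.co.uk/news/articles/abc123", rules)
section = first_matching_rule("https://www.bbc.co.uk/news", rules)
```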
### Configuration Fields

- `name`: Spider name (letters, numbers, hyphens, underscores only). Must be unique per project.
- `allowed_domains`: List of domains this spider can crawl. URLs outside these domains are filtered.
- `start_urls`: Initial URLs to crawl. Must be valid HTTP/HTTPS URLs.
- `source_url`: Original website URL (for documentation purposes).
- `rules`: URL pattern matching rules. Each rule defines which URLs to follow and how to process them.
- `settings`: Spider-specific settings that override defaults.
- `callbacks`: Custom extraction callbacks with CSS/XPath selectors for non-article content.
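These constraints can be checked locally before importing. A hedged sketch that mirrors the checks implied by the validation error messages shown later on this page (name pattern, URL scheme, `CONCURRENT_REQUESTS` range); the real Pydantic schema may enforce more, and `validate_config` is a local helper, not part of the tool:

```python
import re
from urllib.parse import urlparse

NAME_RE = re.compile(r"^[a-zA-Z0-9_-]+$")

def validate_config(config):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    # Name: letters, numbers, hyphens, underscores only
    if not NAME_RE.match(config.get("name", "")):
        errors.append('name: string does not match pattern "^[a-zA-Z0-9_-]+$"')
    # Start URLs must use http or https
    for i, url in enumerate(config.get("start_urls", [])):
        if urlparse(url).scheme not in ("http", "https"):
            errors.append(f"start_urls -> {i}: URL scheme must be http or https")
    # CONCURRENT_REQUESTS must fall in the documented 1..32 range
    cr = config.get("settings", {}).get("CONCURRENT_REQUESTS")
    if cr is not None and not (1 <= cr <= 32):
        errors.append("settings -> CONCURRENT_REQUESTS: value must be between 1 and 32")
    return errors
```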
### Output

#### Successful Import

```text
$ ./scrapai spiders import bbc_spider.json --project news
✅ Spider 'bbc_co_uk' imported successfully!
   Project: news
   Domains: bbc.co.uk
   Start URLs: 1
   Rules: 2
   Callbacks: 1 (parse_article)
```

#### Update Existing Spider

```text
$ ./scrapai spiders import bbc_spider.json --project news
⚠️ Spider 'bbc_co_uk' already exists. Updating...
✅ Spider 'bbc_co_uk' imported successfully!
   Project: news
   Domains: bbc.co.uk
   Start URLs: 1
   Rules: 2
   Callbacks: 1 (parse_article)
```

Re-importing a spider replaces its configuration entirely. All rules and settings are deleted and recreated.
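Because re-import replaces rather than merges, any top-level field omitted from the new file is dropped. In plain Python dict terms (illustrative fragments only, not real stored state):

```python
# Stored configuration before re-import (illustrative fragment)
old = {
    "rules": [{"allow": ["/news/?$"], "priority": 5}],
    "settings": {"DOWNLOAD_DELAY": 2},
}

# New file being imported, which omits "settings"
new = {
    "rules": [{"allow": ["/blog/?$"], "priority": 5}],
}

# Re-import behaves like full replacement: the old settings are gone...
stored_after_reimport = new
# ...NOT like a dict merge, which would have kept them:
merged_instead = {**old, **new}
```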
#### Validation Failure

```text
$ ./scrapai spiders import bad_spider.json --project news
❌ Spider configuration validation failed:
  • name: string does not match pattern "^[a-zA-Z0-9_-]+$"
  • start_urls -> 0: URL scheme must be http or https
  • settings -> CONCURRENT_REQUESTS: value must be between 1 and 32

💡 Use --skip-validation to bypass validation (not recommended)
```
## spiders delete

Delete a spider and all its associated data.

### Syntax

```bash
./scrapai spiders delete <name> [--project <name>] [--force]
```

### Arguments

- `<name>`: Name of the spider to delete.

### Options

- `--project <name>`: Project name. If specified, only deletes the spider from that project.
- `--force`: Skip the confirmation prompt.

### Examples

```bash
# Delete spider with confirmation
./scrapai spiders delete bbc_co_uk --project news

# Delete without confirmation
./scrapai spiders delete old_spider --project archive --force
```
### Output

#### With Confirmation

```text
$ ./scrapai spiders delete bbc_co_uk --project news
Are you sure you want to delete spider 'bbc_co_uk' in project 'news'? (y/N): y
🗑️ Spider 'bbc_co_uk' in project 'news' deleted!
```

#### Force Delete

```text
$ ./scrapai spiders delete bbc_co_uk --project news --force
🗑️ Spider 'bbc_co_uk' in project 'news' deleted!
```

**Warning:** Deleting a spider removes all configuration, rules, settings, and all scraped items associated with the spider. This operation cannot be undone.
## Next Steps

- **Run Crawls**: Start crawling with your imported spiders
- **View Data**: Inspect and export scraped items