ScrapAI is designed to work with AI coding agents that read workflow instructions, analyze websites, and produce validated JSON configs through the CLI. The agent becomes your scraping assistant—you describe what you want in plain English, and it handles the technical work.

How It Works

You (plain English) → AI Agent → JSON config → Database → Scrapy crawl
                       (once)                               (forever)
Agent generates JSON configs, not Python code. See Security-First Design for why this matters.
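To make this concrete, a generated config might look roughly like the sketch below. The field names (`spider_name`, `url_patterns`, `extraction`, and so on) are illustrative assumptions, not ScrapAI's actual schema:

```json
{
  "spider_name": "bbc_com",
  "start_urls": ["https://bbc.com/news"],
  "url_patterns": ["/news/articles/*"],
  "extraction": {
    "title": "h1",
    "body": "article p"
  }
}
```

Because the agent's output is declarative data rather than executable Python, every config can be validated and stored without running agent-authored code.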

What the Agent Does

1. Analyze site: inspects the homepage, identifies sections and URL patterns.
2. Write extraction rules: configures extractors or CSS selectors for the content.
3. Test and save: runs a test crawl, verifies quality, stores the config to the database.
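The "test and save" step amounts to parsing the agent's JSON and checking it before it reaches the database. A minimal sketch in plain Python, where the required keys are assumptions rather than ScrapAI's real schema:

```python
import json

# Hypothetical required fields; ScrapAI's actual schema may differ.
REQUIRED_KEYS = {"spider_name", "start_urls", "extraction"}

def validate_config(raw: str) -> dict:
    """Parse an agent-generated config and check it has the expected fields."""
    config = json.loads(raw)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    if not config["start_urls"]:
        raise ValueError("start_urls must not be empty")
    return config

example = '{"spider_name": "bbc_com", "start_urls": ["https://bbc.com"], "extraction": {"title": "h1"}}'
print(validate_config(example)["spider_name"])  # → bbc_com
```

Validating the config up front means a bad selector or typo fails loudly at import time, before any crawl is scheduled.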

Supported Agents

Claude Code is what we use and test with. The complete workflow instructions fit in ~5k tokens, and ./scrapai setup configures permission rules that block the agent from modifying framework code.
You: "Add https://bbc.com to my news project"
Agent: [Analyzes site, generates rules, tests extraction, deploys spider]

You: "Here's a CSV with 200 websites, add them all to the queue"
Agent: [Queues them, processes in parallel batches]
Claude Code enforces permission rules at the tool level, blocking all Python file modifications (Write/Edit/Update/MultiEdit(**/*.py)), access to sensitive files, web access, and destructive shell commands. It is the only agent with guaranteed enforcement, via .claude/settings.local.json.
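As an illustration, a deny list along these lines could appear in .claude/settings.local.json. The exact entries are assumptions for this sketch, not a verbatim copy of what ./scrapai setup generates:

```json
{
  "permissions": {
    "deny": [
      "Write(**/*.py)",
      "Edit(**/*.py)",
      "MultiEdit(**/*.py)",
      "WebFetch"
    ]
  }
}
```

Because Claude Code evaluates these rules before executing a tool call, the blocks hold even if the agent's instructions are ignored or misread.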

Other Coding Agents

ScrapAI should work with any agent that can read instructions and run shell commands, including OpenCode, Cursor, Windsurf, and Antigravity. An AGENTS.md file is included for these agents.
These agents lack Claude Code’s permission enforcement, so review changes carefully. They receive instructions but cannot enforce tool-level blocks.

Claws

ScrapAI works with any Claw that can read instructions and execute shell commands. We have tested it with NanoClaw for autonomous operation via Telegram; more rigorous testing is in progress with other Claws such as PicoClaw, IronClaw, and Nanobot.

Example Workflow

You: "Add https://techcrunch.com to my news project"

Agent: Analyzes site → Identifies URL patterns → Tests extractors →
       Creates config → Test crawl (5/5 articles) → Imports to database

Spider 'techcrunch_com' ready: ./scrapai crawl techcrunch_com --project news
The agent writes analysis notes in sections.md during the process. Review and correct assumptions as needed.

Next Steps

Claude Code Setup

Complete guide for setting up Claude Code with ScrapAI

4-Phase Workflow

Understand the analysis → rules → import → test workflow