ScrapAI is designed to work with AI coding agents that read workflow instructions, analyze websites, and produce validated JSON configs through the CLI. The agent becomes your scraping assistant—you describe what you want in plain English, and it handles the technical work.

How It Works

Instead of writing Python spider files manually, an AI agent generates a JSON config and stores it in a database. A single generic spider (DatabaseSpider) loads any config at runtime.
You (plain English) → AI Agent → JSON config → Database → Scrapy crawl
                       (once)                               (forever)
Why JSON configs instead of AI-generated Python? An agent that writes and executes Python has the same power as an unsupervised developer. If it hallucinates, gets prompt-injected by a malicious page, or loses context, it can do real damage. An agent that writes JSON configs produces data, not code. That data goes through strict validation (Pydantic schemas, SSRF checks, reserved name blocking) before it reaches the database. The worst case is a bad config that extracts wrong fields, caught in the test crawl and trivially fixable.
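The validation gate described above can be illustrated with a minimal stand-in sketch. The real framework uses Pydantic schemas; the reserved-name list and error messages here are hypothetical, chosen only to mirror the checks the docs describe (unknown keys rejected, spider names restricted, reserved callback names blocked):

```python
import re

ALLOWED_KEYS = {"name", "allowed_domains", "start_urls", "rules", "settings"}
RESERVED_NAMES = {"parse", "start_requests", "closed"}  # hypothetical reserved list
NAME_RE = re.compile(r"^[a-zA-Z0-9_-]+$")

def validate_config(config: dict) -> list[str]:
    """Return a list of validation errors (an empty list means the config passes)."""
    errors = []
    extra = set(config) - ALLOWED_KEYS
    if extra:
        # mirrors Pydantic's extra="forbid" behavior
        errors.append(f"unknown keys: {sorted(extra)}")
    name = config.get("name", "")
    if not NAME_RE.match(name):
        errors.append(f"invalid spider name: {name!r}")
    for rule in config.get("rules", []):
        if rule.get("callback") in RESERVED_NAMES:
            errors.append(f"reserved callback name: {rule['callback']!r}")
    return errors
```

A bad config produces errors instead of executing anything, which is the point: the worst-case failure mode is a rejected row, not a running program.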

What the Agent Does

The agent replicates what expert Python web scraping engineers do:
1. Inspect the website: opens the homepage and analyzes the page structure.
2. Identify sections: discovers content categories (blog, news, reports, etc.) and maps how the site is organized.
3. Write URL patterns: creates rules that match specific sections (e.g., /blog/* for blog posts).
4. Analyze content pages: opens sample articles and examines the HTML structure to identify title, content, author, and date.
5. Write extraction rules: creates CSS selectors or configures generic extractors (newspaper, trafilatura).
6. Test and verify: runs test crawls on sample pages to verify extraction quality.
7. Save to database: stores the complete spider configuration for reuse.
Next time you need to scrape the same website? Just use the existing spider from the database. No rebuilding, no rewriting.
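The URL-pattern step above boils down to ordered regexes where the first match decides what happens to a page. A simplified sketch (Scrapy's actual link-following logic lives in its LinkExtractor; the `action` labels here are illustrative):

```python
import re

# Ordered rules: first matching pattern wins, unmatched paths are skipped.
rules = [
    {"allow": r"/news/articles/[^/]+$", "action": "extract"},
    {"allow": r"/news/?$", "action": "follow"},
]

def classify(url_path: str) -> str:
    """Return the action of the first rule whose pattern matches, else 'skip'."""
    for rule in rules:
        if re.search(rule["allow"], url_path):
            return rule["action"]
    return "skip"
```

So `/news/articles/some-slug` is extracted as an article, `/news` is crawled for links, and `/sport/football` is ignored entirely.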

Database-First Spider Management

The problem: most web scraping is one-off scripts that get rewritten every time you need the same data.
ScrapAI's solution: write the spider once, save it to a database, reuse it forever.
Here's what an AI-generated spider config looks like:
{
  "name": "bbc_co_uk",
  "allowed_domains": ["bbc.co.uk"],
  "start_urls": ["https://www.bbc.co.uk/news"],
  "rules": [
    {
      "allow": ["/news/articles/[^/]+$"],
      "callback": "parse_article",
      "follow": false
    },
    {
      "allow": ["/news/?$"],
      "follow": true
    }
  ],
  "settings": {
    "EXTRACTOR_ORDER": ["newspaper", "trafilatura"],
    "DOWNLOAD_DELAY": 2
  }
}
Adding a new website means adding a new row. Spiders are rows in a database, not Python files on disk.

Supported Agents

Claude Code is what we use and test with. The complete workflow instructions fit in ~5k tokens, and ./scrapai setup configures permission rules that block the agent from modifying framework code.
claude
You: "Add https://bbc.com to my news project"
Agent: [Analyzes site, generates rules, tests extraction, deploys spider]

You: "Here's a CSV with 200 websites, add them all to the queue"
Agent: [Queues them, processes in parallel batches]
Claude Code enforces permission rules at the tool level, blocking all Python file modifications (Write/Edit/Update/MultiEdit(**/*.py)), sensitive files, web access, and destructive shell commands. This is the only agent with guaranteed enforcement via .claude/settings.local.json.

Other Coding Agents

ScrapAI should work with any coding agent that can read instructions and run shell commands, including OpenCode, Cursor, Windsurf, and Antigravity. An AGENTS.md file is included for these agents.
These agents lack Claude Code’s permission enforcement, so review changes carefully. They receive instructions but cannot enforce tool-level blocks.

Claws

ScrapAI works with any Claw that can read instructions and execute shell commands. We tested with NanoClaw for autonomous operation via Telegram. More rigorous testing is in progress with other Claws like PicoClaw, IronClaw, and Nanobot.

Agent Safety

When you pair an AI agent with a scraping framework, the agent can potentially modify code, run arbitrary commands, and interact with untrusted web content. ScrapAI’s approach: the agent writes config, not code.
1. Permission rules (Claude Code only): block all Python file modifications (Write/Edit/Update/MultiEdit(**/*.py)), sensitive files (.env, secrets/**), web access (WebFetch, WebSearch), and destructive commands (Bash(rm:*)) at the tool level.
2. CLI-only interaction: the agent interacts only through a defined CLI (./scrapai inspect, ./scrapai spiders import, etc.).
3. Strict validation: JSON configs are validated through Pydantic before import; malformed configs, SSRF URLs, and injection attempts fail validation.
4. Deterministic execution: at runtime, Scrapy executes deterministically with no AI in the loop.

What’s Validated

All input is validated through Pydantic schemas before it touches the database or the crawler:
  • Spider configs: strict schema validation (extra="forbid"), spider names restricted to ^[a-zA-Z0-9_-]+$, callback names validated with reserved names blocked
  • URLs: HTTP/HTTPS only, private IP and localhost blocking (127.0.0.1, 10.x, 172.16.x, 192.168.x, 169.254.x), 2048-char limit
  • Settings: whitelisted extractor names, bounded concurrency (1-32), bounded delays (0-60s)
  • SQL: all queries through SQLAlchemy ORM with parameterized bindings; db query validates table names against a whitelist; UPDATE/DELETE require row count confirmation
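The URL checks above can be sketched in a few lines of stdlib Python. This is a simplified illustration of the described policy, not the framework's actual validator; real SSRF protection must also resolve hostnames and re-check the resulting addresses, which is noted but omitted here:

```python
import ipaddress
from urllib.parse import urlparse

MAX_URL_LEN = 2048

def is_safe_url(url: str) -> bool:
    """Reject non-HTTP(S) schemes, oversized URLs, and private/loopback/link-local hosts."""
    if len(url) > MAX_URL_LEN:
        return False
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        return True  # hostname, not an IP literal; real code would resolve and re-check
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

This blocks the classic SSRF targets: loopback (127.0.0.1), RFC 1918 ranges (10.x, 172.16.x, 192.168.x), and link-local metadata endpoints (169.254.x).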

Example Interactions

Single Site Analysis

You: "Add https://techcrunch.com to my news project"

Agent: I'll analyze TechCrunch and create a spider for your news project.

Phase 1: Analyzing site structure...
- Inspected homepage
- Identified sections: articles, startups, AI news
- Documented URL patterns in sections.md

Phase 2: Testing extractors...
- Generic extractors work well (newspaper + trafilatura)
- Articles have clean semantic HTML

Phase 3: Creating spider config...
- Created test_spider.json with 5 sample URLs
- Created final_spider.json with full configuration

Phase 4: Testing and importing...
- Test crawl completed: 5/5 articles extracted successfully
- Titles, content, authors, and dates all present
- Imported final spider to database

Spider 'techcrunch_com' is ready for production use.
Run: ./scrapai crawl techcrunch_com --project news

Batch Processing

You: "Here's a CSV with 50 news websites. Add them all to the queue and process them."

Agent: I'll bulk-add all 50 websites to the queue and process them in parallel batches.

Added 50 websites to queue.

Processing batch 1 (5 websites)...
[Processes 5 sites in parallel, each through Phase 1-4]

Batch 1 complete: 4 succeeded, 1 failed (Cloudflare challenge)

Processing batch 2 (5 websites)...
[Continues until all batches complete]

Final results: 47 spiders created, 3 failed (retry needed)

You’re Always in the Loop

The agent doesn’t just run off and do things. During site analysis, it writes detailed notes in sections.md: what URL patterns it found, what sections the site has, what extraction strategy it chose and why. Plain language, easy to read. You can review at any point, correct the agent’s assumptions, and bring your expertise into the process.

Next Steps