# ScrapAI

## Docs

- [Apache Airflow Integration](https://docs.scrapai.dev/advanced/airflow-scheduling.md): Production scheduling and monitoring with Apache Airflow
- [Migrating Existing Scrapers](https://docs.scrapai.dev/advanced/migration.md): Convert Scrapy, BeautifulSoup, and other scrapers to ScrapAI
- [Parallel Crawling](https://docs.scrapai.dev/advanced/parallel-crawling.md): Run multiple spiders simultaneously with GNU parallel
- [Security](https://docs.scrapai.dev/advanced/security.md): Security validation, SSRF protection, and agent safety
- [Claude Code Integration](https://docs.scrapai.dev/agents/claude-code.md): Complete guide for using Claude Code with ScrapAI, including setup, permissions, and workflow
- [AI Agents Overview](https://docs.scrapai.dev/agents/overview.md): Use AI agents to automatically analyze websites and generate production-ready scrapers
- [4-Phase Workflow](https://docs.scrapai.dev/agents/workflow.md): Complete workflow documentation for building scrapers with AI agents
- [Callbacks](https://docs.scrapai.dev/api/callbacks.md): Custom field extraction for structured data
- [Custom CSS Extractors](https://docs.scrapai.dev/api/custom-extractors.md): Site-specific extraction using CSS selectors
- [Extractors Overview](https://docs.scrapai.dev/api/extractors-overview.md): Content extraction strategies and fallback order
- [Newspaper4k Extractor](https://docs.scrapai.dev/api/newspaper.md): General-purpose article extraction for news and blogs
- [Playwright Extractor](https://docs.scrapai.dev/api/playwright.md): Browser rendering for JavaScript-heavy sites
- [Spider Rules](https://docs.scrapai.dev/api/rules.md): URL matching and routing configuration
- [Spider Settings](https://docs.scrapai.dev/api/settings.md): Configuration options for spider behavior
- [Spider JSON Schema](https://docs.scrapai.dev/api/spider-schema.md): Complete spider configuration schema reference
- [Trafilatura Extractor](https://docs.scrapai.dev/api/trafilatura.md): Lightweight, high-accuracy content extraction
- [Crawl Commands](https://docs.scrapai.dev/cli/crawl.md): Run spiders in test and production mode with checkpoint support
- [Data Commands](https://docs.scrapai.dev/cli/data.md): View and export scraped items from the database
- [Database Commands](https://docs.scrapai.dev/cli/database.md): Migrations, queries, statistics, and data transfer
- [Inspect Command](https://docs.scrapai.dev/cli/inspect.md): Analyze websites to help create scrapers
- [CLI Overview](https://docs.scrapai.dev/cli/overview.md): Complete command-line interface reference for ScrapAI
- [Projects Commands](https://docs.scrapai.dev/cli/projects.md): List and manage project organization
- [Queue Management](https://docs.scrapai.dev/cli/queue.md): Batch processing with database-backed queue and atomic locking
- [Setup & Verification](https://docs.scrapai.dev/cli/setup.md): Installation, environment setup, and verification commands
- [Spider Management](https://docs.scrapai.dev/cli/spiders.md): Import, list, and delete spider configurations
- [ScrapAI vs Other Tools](https://docs.scrapai.dev/comparison.md): A practical comparison of ScrapAI, Scrapling, and crawl4ai for anyone choosing a scraping tool
- [Architecture](https://docs.scrapai.dev/concepts/architecture.md): System components and data flow in the ScrapAI CLI
- [Database-First Spider Management](https://docs.scrapai.dev/concepts/database-first.md): Why spiders live in the database, not in Python files
- [How It Works](https://docs.scrapai.dev/concepts/how-it-works.md): AI-once, deterministic-forever approach to web scraping
- [Security-First Design](https://docs.scrapai.dev/concepts/security-first.md): Why ScrapAI uses config-only architecture instead of AI-generated code
- [Database Configuration](https://docs.scrapai.dev/configuration/database.md): Configure a SQLite or PostgreSQL database for the ScrapAI CLI
- [Environment Setup](https://docs.scrapai.dev/configuration/environment.md): Configure ScrapAI CLI environment variables and the .env file
- [Proxy Configuration](https://docs.scrapai.dev/configuration/proxies.md): Configure smart proxy rotation in the ScrapAI CLI to avoid blocking
- [S3 Storage Configuration](https://docs.scrapai.dev/configuration/s3-storage.md): Configure S3-compatible object storage for automatic crawl data backup
- [For Developers](https://docs.scrapai.dev/developers.md): ScrapAI is built for developers who want control without repetition
- [Spider Examples](https://docs.scrapai.dev/examples.md): Real-world examples of what the agent produces for different site types
- [AI-Assisted Maintenance](https://docs.scrapai.dev/guides/ai-assisted-maintenance.md): Automated testing to detect broken spiders, and agent-assisted fixes
- [Checkpoint Pause/Resume](https://docs.scrapai.dev/guides/checkpoint-resume.md): Pause long-running crawls and resume them later without losing progress
- [Cloudflare Bypass](https://docs.scrapai.dev/guides/cloudflare-bypass.md): Handle Cloudflare-protected sites with browser verification and cookie caching
- [Custom Callbacks & Field Extraction](https://docs.scrapai.dev/guides/custom-callbacks.md): Extract structured data from products, jobs, listings, forums, and more
- [Data Processors](https://docs.scrapai.dev/guides/data-processors.md): Transform extracted values with 8 processors for cleaning, casting, and formatting
- [Content Extractors](https://docs.scrapai.dev/guides/extractors.md): Configure extraction strategies for different content types and rendering methods
- [Incremental Crawling (DeltaFetch)](https://docs.scrapai.dev/guides/incremental-crawling.md): Skip unchanged pages on subsequent crawls to save time and resources
- [Smart Proxy Escalation](https://docs.scrapai.dev/guides/proxy-escalation.md): Minimize costs with intelligent proxy usage and expert-in-the-loop escalation
- [Queue Processing](https://docs.scrapai.dev/guides/queue-processing.md): Batch process multiple websites with priority management and status tracking
- [Installation](https://docs.scrapai.dev/installation.md): Install the ScrapAI CLI on Linux, macOS, or Windows
- [Introduction](https://docs.scrapai.dev/introduction.md): ScrapAI is a CLI where you describe what you want to scrape in plain English, an AI agent builds the scraper, and Scrapy runs it.
- [Quick Start](https://docs.scrapai.dev/quickstart.md): Build your first scraper in 5 minutes with the ScrapAI CLI
- [Roadmap](https://docs.scrapai.dev/roadmap.md): The future of ScrapAI, from database-first scraping to a shared spider marketplace