# ScrapAI

## Docs

- [Apache Airflow Integration](https://docs.scrapai.dev/advanced/airflow-scheduling.md): Production scheduling and monitoring with Apache Airflow
- [Migrating Existing Scrapers](https://docs.scrapai.dev/advanced/migration.md): Convert Scrapy, BeautifulSoup, and other scrapers to ScrapAI
- [Parallel Crawling](https://docs.scrapai.dev/advanced/parallel-crawling.md): Run multiple spiders simultaneously with GNU parallel
- [Security](https://docs.scrapai.dev/advanced/security.md): Security validation, SSRF protection, and agent safety
- [Claude Code Integration](https://docs.scrapai.dev/agents/claude-code.md): Complete guide for using Claude Code with ScrapAI, including setup, permissions, and workflow
- [AI Agents Overview](https://docs.scrapai.dev/agents/overview.md): Use AI agents to automatically analyze websites and generate production-ready scrapers
- [4-Phase Workflow](https://docs.scrapai.dev/agents/workflow.md): Complete workflow documentation for building scrapers with AI agents
- [Callbacks](https://docs.scrapai.dev/api/callbacks.md): Custom field extraction for structured data
- [Custom CSS Extractors](https://docs.scrapai.dev/api/custom-extractors.md): Site-specific extraction using CSS selectors
- [Extractors Overview](https://docs.scrapai.dev/api/extractors-overview.md): Content extraction strategies and fallback order
- [Newspaper4k Extractor](https://docs.scrapai.dev/api/newspaper.md): General-purpose article extraction for news and blogs
- [Playwright Extractor](https://docs.scrapai.dev/api/playwright.md): Browser rendering for JavaScript-heavy sites
- [Spider Rules](https://docs.scrapai.dev/api/rules.md): URL matching and routing configuration
- [Spider Settings](https://docs.scrapai.dev/api/settings.md): Configuration options for spider behavior
- [Spider JSON Schema](https://docs.scrapai.dev/api/spider-schema.md): Complete spider configuration schema reference
- [Trafilatura Extractor](https://docs.scrapai.dev/api/trafilatura.md): Lightweight, high-accuracy content extraction
- [Crawl Commands](https://docs.scrapai.dev/cli/crawl.md): Run spiders in test and production mode with checkpoint support
- [Data Commands](https://docs.scrapai.dev/cli/data.md): View and export scraped items from database
- [Database Commands](https://docs.scrapai.dev/cli/database.md): Migrations, queries, statistics, and data transfer
- [Inspect Command](https://docs.scrapai.dev/cli/inspect.md): Analyze websites to help create scrapers
- [CLI Overview](https://docs.scrapai.dev/cli/overview.md): Complete command-line interface reference for ScrapAI
- [Projects Commands](https://docs.scrapai.dev/cli/projects.md): List and manage project organization
- [Queue Management](https://docs.scrapai.dev/cli/queue.md): Batch processing with database-backed queue and atomic locking
- [Setup & Verification](https://docs.scrapai.dev/cli/setup.md): Installation, environment setup, and verification commands
- [Spider Management](https://docs.scrapai.dev/cli/spiders.md): Import, list, and delete spider configurations
- [ScrapAI vs Other Tools](https://docs.scrapai.dev/comparison.md): A practical comparison of ScrapAI, Scrapling, and crawl4ai for anyone choosing a scraping tool
- [Architecture](https://docs.scrapai.dev/concepts/architecture.md): System components and data flow in ScrapAI CLI
- [Database-First Spider Management](https://docs.scrapai.dev/concepts/database-first.md): Why spiders live in the database, not in Python files
- [How It Works](https://docs.scrapai.dev/concepts/how-it-works.md): AI-once, deterministic-forever approach to web scraping
- [Security-First Design](https://docs.scrapai.dev/concepts/security-first.md): Why ScrapAI uses config-only architecture instead of AI-generated code
- [Database Configuration](https://docs.scrapai.dev/configuration/database.md): Configure SQLite or PostgreSQL database for ScrapAI CLI
- [Environment Setup](https://docs.scrapai.dev/configuration/environment.md): Configure ScrapAI CLI environment variables and .env file
- [Proxy Configuration](https://docs.scrapai.dev/configuration/proxies.md): Configure smart proxy rotation for ScrapAI CLI to avoid blocking
- [S3 Storage Configuration](https://docs.scrapai.dev/configuration/s3-storage.md): Configure S3-compatible object storage for automatic crawl data backup
- [For Developers](https://docs.scrapai.dev/developers.md): ScrapAI is built for developers who want control without repetition
- [Spider Examples](https://docs.scrapai.dev/examples.md): Real-world examples of what the agent produces for different site types
- [AI-Assisted Maintenance](https://docs.scrapai.dev/guides/ai-assisted-maintenance.md): Automated testing to detect broken spiders and AI-assisted fixing with agent support
- [Checkpoint Pause/Resume](https://docs.scrapai.dev/guides/checkpoint-resume.md): Pause long-running crawls and resume them later without losing progress
- [Cloudflare Bypass](https://docs.scrapai.dev/guides/cloudflare-bypass.md): Handle Cloudflare-protected sites with browser verification and cookie caching
- [Custom Callbacks & Field Extraction](https://docs.scrapai.dev/guides/custom-callbacks.md): Extract structured data from products, jobs, listings, forums, and more
- [Data Processors](https://docs.scrapai.dev/guides/data-processors.md): Transform extracted values with 8 powerful processors for cleaning, casting, and formatting
- [Content Extractors](https://docs.scrapai.dev/guides/extractors.md): Configure extraction strategies for different content types and rendering methods
- [Incremental Crawling (DeltaFetch)](https://docs.scrapai.dev/guides/incremental-crawling.md): Skip unchanged pages on subsequent crawls to save time and resources
- [Smart Proxy Escalation](https://docs.scrapai.dev/guides/proxy-escalation.md): Minimize costs with intelligent proxy usage and expert-in-the-loop escalation
- [Queue Processing](https://docs.scrapai.dev/guides/queue-processing.md): Batch process multiple websites with priority management and status tracking
- [Installation](https://docs.scrapai.dev/installation.md): Install ScrapAI CLI on Linux, macOS, or Windows
- [Introduction](https://docs.scrapai.dev/introduction.md): ScrapAI is a CLI where you describe what you want to scrape in plain English, an AI agent builds the scraper, and Scrapy runs it.
- [Quick Start](https://docs.scrapai.dev/quickstart.md): Build your first scraper in 5 minutes with ScrapAI CLI
- [Roadmap](https://docs.scrapai.dev/roadmap.md): The future of ScrapAI - from database-first scraping to a shared spider marketplace