## Overview
ScrapAI CLI reads configuration from environment variables stored in a `.env` file in the project root. The file is created automatically by `./scrapai setup`, or it can be created manually from `.env.example`.
The `.env` file is gitignored by default. Never commit credentials to version control.

## Core Environment Variables
### Data Directory
Directory where all scraped data, analysis results, and artifacts are stored.
### Database Configuration
Database connection string. Supports SQLite (the default) and PostgreSQL. See Database Configuration for details.
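The connection string format depends on the backend. A sketch with a hypothetical variable name (`DATABASE_URL`) and hypothetical paths and credentials — check `.env.example` for the exact key your version uses:

```shell
# SQLite (default): a file path packed into a URL
DATABASE_URL=sqlite:///data/scrapai.db

# PostgreSQL: standard connection URL (user:password@host:port/dbname)
DATABASE_URL=postgresql://scrapai:secret@localhost:5432/scrapai
```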
### Logging
Logging verbosity: `debug`, `info`, `warning`, or `error`.

Directory for log files.
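A minimal sketch using hypothetical variable names (`LOG_LEVEL`, `LOG_DIR`); the authoritative names are in `.env.example`:

```shell
LOG_LEVEL=info   # one of: debug, info, warning, error
LOG_DIR=./logs   # directory for log files
```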
## Optional Services
### Proxy Configuration
Username for datacenter proxy authentication. See Proxy Configuration for complete setup.
Password for datacenter proxy authentication.
Datacenter proxy server hostname.
Datacenter proxy server port.
Username for residential proxy authentication.
Password for residential proxy authentication.
Residential proxy server hostname.
Residential proxy server port.
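The eight proxy variables above might look like the following sketch — the variable names and values here are hypothetical placeholders, not the project's actual keys:

```shell
# Datacenter proxy
DATACENTER_PROXY_USERNAME=user
DATACENTER_PROXY_PASSWORD=secret
DATACENTER_PROXY_HOST=proxy.example.com
DATACENTER_PROXY_PORT=8080

# Residential proxy
RESIDENTIAL_PROXY_USERNAME=user
RESIDENTIAL_PROXY_PASSWORD=secret
RESIDENTIAL_PROXY_HOST=res-proxy.example.com
RESIDENTIAL_PROXY_PORT=8080
```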
### S3 Storage Configuration
S3-compatible storage access key. See S3 Storage Configuration for complete setup.
S3-compatible storage secret key.
S3-compatible storage endpoint URL.
S3 bucket name for storing crawl results.
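A sketch of the four S3 variables, using hypothetical names and placeholder values (confirm the real keys in `.env.example`):

```shell
S3_ACCESS_KEY=your-access-key
S3_SECRET_KEY=your-secret-key
S3_ENDPOINT_URL=https://s3.example.com
S3_BUCKET=scrapai-crawls
```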
### Airflow Configuration
Airflow web UI admin username (for `docker-compose.airflow.yml`).
Airflow web UI admin password.
User ID for Airflow processes in Docker.
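A sketch of the Airflow variables with hypothetical names (the real keys live in `.env.example`); `50000` is the UID Airflow's official Docker images conventionally run as:

```shell
AIRFLOW_ADMIN_USERNAME=admin
AIRFLOW_ADMIN_PASSWORD=change-me
AIRFLOW_UID=50000   # user ID for Airflow processes in Docker
```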
## Environment File Example
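The original example file was not preserved in this copy of the page. The following is a hedged reconstruction: every variable name and value is a hypothetical placeholder, and `.env.example` in the repository remains the authoritative template.

```shell
# Core
DATA_DIR=./data
DATABASE_URL=sqlite:///data/scrapai.db
LOG_LEVEL=info
LOG_DIR=./logs

# Optional services -- leave commented out if unused
# DATACENTER_PROXY_HOST=proxy.example.com
# S3_BUCKET=scrapai-crawls
# AIRFLOW_ADMIN_USERNAME=admin
```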
## Loading Configuration
Environment variables are automatically loaded from `.env` when you run any ScrapAI command. Loading order: `.env` file → system environment variables → defaults.
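To see which variables the `.env` file will contribute, you can strip its comments and blank lines with standard tools (assumes a POSIX shell, run from the project root):

```shell
# List the variables .env defines, hiding comments and blank lines
grep -vE '^[[:space:]]*(#|$)' .env
```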
## Validation
Verify your configuration after any change to `.env`.

## Security Best Practices
- Use `.env.example` as a template with placeholder values
- Restrict permissions: `chmod 600 .env`
- Rotate credentials regularly
- Use different credentials per environment (`.env`, `.env.production`, `.env.test`)
## Troubleshooting
**Changes not taking effect:**

- Verify `.env` exists in the project root
- Check syntax (no spaces around `=`)
- Restart running processes
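The `=` syntax rule above can be illustrated as follows (the variable name is a hypothetical example):

```shell
# Correct: no spaces around '='
LOG_LEVEL=info

# Incorrect: the spaces below would break parsing
# LOG_LEVEL = info
```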