Overview
ScrapAI CLI uses environment variables for configuration. All settings are stored in a `.env` file in the project root directory.
Initial Setup
The `.env` file is created during setup:

- Copy the example file: `cp .env.example .env`
- Edit `.env` with your preferred settings
- Restart any running processes to apply changes
The `.env` file is gitignored by default. Never commit credentials to version control.

Core Environment Variables
Data Directory
Directory where all scraped data, analysis, and artifacts are stored. All crawl results, spider configurations, and project data are organized under this directory.
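The original example was lost from this page, and the actual variable name is not preserved; a typical entry might look like this (key name is illustrative; check `.env.example` for the real one):

```shell
# hypothetical key name -- see .env.example for the authoritative name
DATA_DIR=./data
```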
Database Configuration
Database connection string. Supports SQLite (the default) and PostgreSQL. See Database Configuration for details.
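The SQLite and PostgreSQL examples were lost from this page; a sketch of both forms, with an illustrative key name and placeholder paths/credentials:

```shell
# hypothetical key name -- see .env.example for the authoritative name
# SQLite (default):
DATABASE_URL=sqlite:///./data/scrapai.db
# PostgreSQL:
# DATABASE_URL=postgresql://user:password@localhost:5432/scrapai
```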
Logging
Logging verbosity level. Options:

- `debug`: detailed debugging information
- `info`: general informational messages (recommended)
- `warning`: warning messages only
- `error`: error messages only
Directory for log files.
Optional Services
Proxy Configuration
- Username for datacenter proxy authentication. See Proxy Configuration for complete setup.
- Password for datacenter proxy authentication.
- Datacenter proxy server hostname. Example: `dc.yourproxy.com`
- Datacenter proxy server port. Example: `10000` (rotating IPs)
- Username for residential proxy authentication.
- Password for residential proxy authentication.
- Residential proxy server hostname. Example: `residential.yourproxy.com`
- Residential proxy server port. Example: `7000` (rotating residential IPs)

S3 Storage Configuration
- S3-compatible storage access key. See S3 Storage Configuration for complete setup.
- S3-compatible storage secret key.
- S3-compatible storage endpoint URL. Example: `https://fsn1.your-objectstorage.com`
- S3 bucket name for storing crawl results. Example: `scrapai-crawls`

Airflow Configuration
- Airflow web UI admin username (for `docker-compose.airflow.yml`).
- Airflow web UI admin password.
- User ID for Airflow processes in Docker.
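The Airflow examples were lost from this page; illustrative entries (key names are assumptions; `50000` is the UID conventionally used by Airflow's Docker images):

```shell
# hypothetical key names -- see .env.example for the authoritative names
AIRFLOW_ADMIN_USERNAME=admin
AIRFLOW_ADMIN_PASSWORD=change-me
AIRFLOW_UID=50000
```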
Environment File Example
Complete `.env` file with all available options:
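The complete example file was lost from this page. The sketch below reassembles one from the sections above; every key name is illustrative and the values are placeholders, so treat `.env.example` in the repository as the authoritative list:

```shell
# Hypothetical sketch -- key names are illustrative; see .env.example.

# Core
DATA_DIR=./data
DATABASE_URL=sqlite:///./data/scrapai.db
LOG_LEVEL=info
LOG_DIR=./logs

# Datacenter proxy (optional)
DC_PROXY_USERNAME=user
DC_PROXY_PASSWORD=secret
DC_PROXY_HOST=dc.yourproxy.com
DC_PROXY_PORT=10000

# Residential proxy (optional)
RES_PROXY_USERNAME=user
RES_PROXY_PASSWORD=secret
RES_PROXY_HOST=residential.yourproxy.com
RES_PROXY_PORT=7000

# S3-compatible storage (optional)
S3_ACCESS_KEY=key
S3_SECRET_KEY=secret
S3_ENDPOINT_URL=https://fsn1.your-objectstorage.com
S3_BUCKET=scrapai-crawls

# Airflow (optional)
AIRFLOW_ADMIN_USERNAME=admin
AIRFLOW_ADMIN_PASSWORD=change-me
AIRFLOW_UID=50000
```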
Loading Configuration
Environment variables are loaded automatically from `.env` when you run any ScrapAI command. Values are resolved from:

- `.env` file in project root
- System environment variables (override `.env`)
- Default values (if not set)
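That fallback chain can be sketched in the shell; this is only an illustration of the loading order, not the CLI's actual implementation, and the variable name is hypothetical:

```shell
# load .env if present, then fall back to a default for anything still unset
[ -f .env ] && set -a && . ./.env && set +a
: "${LOG_LEVEL:=info}"   # default applies only when nothing else set it
echo "LOG_LEVEL=$LOG_LEVEL"
```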
Validation
Verify your configuration is loaded correctly. The check covers:

- Python environment
- Database connectivity
- Required directories
- Optional service configuration
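The validation command itself is missing from this page; until you locate it, a manual spot-check of some of the same items can be sketched in the shell (variable names illustrative):

```shell
# manual spot-check -- key names are hypothetical
: "${DATA_DIR:=./data}"
[ -d "$DATA_DIR" ]     || echo "missing required directory: $DATA_DIR"
[ -n "$DATABASE_URL" ] || echo "DATABASE_URL is not set"
[ -n "$LOG_DIR" ]      || echo "LOG_DIR is not set"
```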
Security Best Practices
- Use `.env.example` as a template
  - Commit `.env.example` with placeholder values
  - Keep actual credentials in `.env` only
- Restrict file permissions
- Rotate credentials regularly
  - Change database passwords
  - Regenerate API keys
  - Update proxy credentials
- Use different credentials per environment
  - Development: `.env`
  - Production: `.env.production`
  - Testing: `.env.test`
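For the file-permissions point above, the usual restriction on Unix-like systems is owner read/write only:

```shell
# restrict .env to the owner; create it first if it does not exist yet
touch .env
chmod 600 .env
```

Mode `600` keeps other local users on the machine from reading stored credentials.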
Troubleshooting
Changes not taking effect
- Verify the `.env` file exists in the project root
- Check file syntax (no spaces around `=`)
- Restart any running processes
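The first two checks can be run directly from the project root; this is a sketch, and the grep pattern only approximates the no-spaces rule:

```shell
# confirm the file exists
ls -la .env 2>/dev/null || echo ".env not found in $(pwd)"
# flag assignments with spaces around '='
grep -nE '^[^#=]*[[:space:]]=|=[[:space:]]' .env 2>/dev/null || true
```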
Configuration not found
If ScrapAI can’t find your `.env` file:
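The original remedy steps appear to be truncated here. A common first check, assuming the CLI reads `.env` from the current working directory, is to confirm you are running commands from the project root:

```shell
pwd
ls -la .env || echo "no .env here -- cd to the project root or recreate it from .env.example"
```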