Ethical scraping
Defaults that don't get you sued — or banned at the network level.
Honor robots.txt
Enabled by default. Each host's robots.txt is fetched once and cached for 24 hours. To override for a specific host, ship a per-target manifest in your code that explicitly opts in to the override (and document why).
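A minimal sketch of what that cache can look like in Python, using the standard library's urllib.robotparser; RobotsCache, CACHE_TTL, and the shape of the OVERRIDES manifest are illustrative assumptions, not this scraper's actual API.

```python
# Illustrative sketch: per-host robots.txt cache with a 24-hour TTL.
# RobotsCache and OVERRIDES are hypothetical names, not the real API.
import time
import urllib.robotparser
from urllib.parse import urlsplit

CACHE_TTL = 24 * 60 * 60  # 24 hours, matching the default above

# Per-target manifest: host -> documented reason for overriding robots.txt.
OVERRIDES = {
    # "status.example.com": "our own infra; robots.txt blocks bots by mistake",
}

class RobotsCache:
    def __init__(self):
        self._parsers = {}  # host -> (fetched_at, RobotFileParser)

    def allowed(self, url: str, user_agent: str) -> bool:
        host = urlsplit(url).netloc
        if host in OVERRIDES:  # explicit, documented opt-in to the override
            return True
        entry = self._parsers.get(host)
        if entry is None or time.monotonic() - entry[0] > CACHE_TTL:
            parser = urllib.robotparser.RobotFileParser()
            parser.set_url(f"https://{host}/robots.txt")
            parser.read()  # one network fetch per host per 24-hour window
            entry = (time.monotonic(), parser)
            self._parsers[host] = entry
        return entry[1].can_fetch(user_agent, url)
```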
Per-host rate limiting
Default: 2 concurrent requests plus a 500 ms minimum delay per host. Honor Crawl-Delay when present. If a site returns 429, back off exponentially before retrying.
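A minimal asyncio sketch of that policy, assuming aiohttp as the HTTP client; fetch_polite and _HostState are illustrative names, not this scraper's actual API.

```python
# Illustrative sketch: 2 concurrent requests per host, a 500 ms floor
# between requests, and exponential backoff on HTTP 429. Assumes aiohttp;
# fetch_polite and _HostState are hypothetical names.
import asyncio
from urllib.parse import urlsplit

import aiohttp

class _HostState:
    def __init__(self):
        self.semaphore = asyncio.Semaphore(2)  # at most 2 in flight per host
        self.lock = asyncio.Lock()
        self.next_allowed = 0.0                # earliest start of next request
        self.min_delay = 0.5                   # raise this if Crawl-Delay is larger

_hosts: dict[str, _HostState] = {}

async def fetch_polite(session: aiohttp.ClientSession, url: str,
                       max_retries: int = 5) -> bytes:
    state = _hosts.setdefault(urlsplit(url).netloc, _HostState())
    async with state.semaphore:
        for attempt in range(max_retries):
            # Reserve the next slot so requests stay >= min_delay apart.
            async with state.lock:
                now = asyncio.get_running_loop().time()
                wait = max(0.0, state.next_allowed - now)
                state.next_allowed = max(now, state.next_allowed) + state.min_delay
            if wait > 0:
                await asyncio.sleep(wait)
            async with session.get(url) as resp:
                if resp.status == 429:
                    await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
                    continue
                resp.raise_for_status()
                return await resp.read()
    raise RuntimeError(f"gave up on {url} after {max_retries} 429 responses")
```

When robots.txt carries a Crawl-Delay, the standard library's RobotFileParser.crawl_delay(user_agent) returns it, and that value can be used to raise min_delay for the host.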
No PII without lawful basis
Don't scrape personal data without GDPR- or CCPA-grade legal cover. The scraper has no opinion on what you crawl, but a lawyer should before you scrape phone numbers or biometrics.
No paywall bypass
Anti-bot bypass is for public content. Bypassing authentication, paywalls, or terms-protected zones is out of scope and not what we're optimizing for.
Audited proxies only
Use providers that publish ethical-sourcing audits, such as Bright Data, Decodo, IPRoyal, or Oxylabs. Cheap residential pools often run on malware botnets; in many jurisdictions, using one can itself be a criminal offense.