The legal landscape for web scraping has shifted twice in the last 18 months — once when the Ninth Circuit re-affirmed hiQ in 2025, and again when the EU AI Act's training-data transparency requirements took effect.
What didn't change
Public data is still public. The CFAA does not criminalize scraping pages that don't require authentication; the Supreme Court's Van Buren decision (2021) narrowed "exceeds authorized access" to bypassing an access gate, and hiQ applied the same logic to public pages. Honoring robots.txt remains industry best practice, but it is not a legal obligation in the US.
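Best practice is easy to implement: Python's standard library ships a robots.txt parser. A minimal sketch, using an inline robots.txt string for illustration (in practice you would fetch `https://<host>/robots.txt` before crawling; the `my-scraper` agent name is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; real crawlers fetch this per host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url: str, agent: str = "my-scraper") -> bool:
    """Return True if robots.txt permits this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

print(allowed("https://example.com/public/page"))   # True
print(allowed("https://example.com/private/data"))  # False
print(parser.crawl_delay("my-scraper"))             # 2
```

The `Crawl-delay` value is advisory, but respecting it is the cheapest way to stay on a site operator's good side.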
What did change
GDPR has been wielded against scrapers far more aggressively since the Clearview AI fines. If you collect personal data of EU residents — phone numbers, email addresses, social profiles — you need a documented lawful basis under Article 6 and a way for data subjects to exercise their Article 17 erasure rights. Most scrapers fail this test.
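The simplest way to pass it is to avoid persisting personal data in the first place. A minimal redaction sketch; the regexes below are illustrative, not exhaustive, and real PII detection needs far more care (names, addresses, national IDs) plus human review:

```python
import re

# Illustrative patterns only; production PII detection needs a
# dedicated library and review, not two regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Replace email addresses and phone-like strings before storage."""
    text = EMAIL_RE.sub("[email removed]", text)
    text = PHONE_RE.sub("[phone removed]", text)
    return text

print(scrub("Contact jane.doe@example.com or +44 20 7946 0958 today."))
```

Scrubbing at ingestion time also shrinks your erasure-request surface: you can't be ordered to delete data you never stored.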
The EU AI Act's Article 53 requires providers of general-purpose AI models to publish a "sufficiently detailed summary" of the content used for training. If your scraper feeds a model deployed in the EU, your sourcing has to be auditable end-to-end.
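Auditable sourcing means recording provenance at fetch time, not reconstructing it later. A sketch of a per-fetch audit record; the exact fields a compliant summary will require are still being clarified, so the schema below (URL, retrieval time, content hash, licensing note) is an assumed reasonable minimum, not a legal checklist:

```python
import hashlib
import json
import time

def provenance_record(url: str, body: bytes, license_note: str) -> dict:
    """Build one audit record for a fetched document.

    Field names here are illustrative. A content hash lets you later
    prove which bytes you trained on; the license note records your
    basis for using them.
    """
    return {
        "url": url,
        "fetched_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "sha256": hashlib.sha256(body).hexdigest(),
        "license": license_note,
    }

record = provenance_record(
    "https://example.com/article", b"<html>...</html>", "CC-BY-4.0"
)
print(json.dumps(record))  # append one line per fetch to a JSONL manifest
```

An append-only JSONL manifest written alongside the crawl is cheap insurance: it survives pipeline refactors and answers "where did this document come from?" without re-crawling.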
Defaults that ship the right way
- Honor robots.txt by default; require explicit per-host opt-out
- Per-host rate limiting in the standard library, not optional
- No scraping behind auth or paywall — refuse the request
- Use only proxy providers with audited ethical sourcing
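The first two defaults can be sketched in a few lines. `PoliteFetcher` and its parameters are hypothetical names, not a real library; the point is the shape: throttling is built in, and ignoring robots.txt requires a deliberate per-host call, never a global flag:

```python
import time
from collections import defaultdict
from urllib.parse import urlparse

class PoliteFetcher:
    """Sketch of safe-by-default scraper plumbing (illustrative names):
    per-host rate limiting, plus a robots.txt opt-out that must be
    requested explicitly for each host."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval      # seconds between hits per host
        self.last_hit = defaultdict(float)    # host -> last request time
        self.robots_opt_out = set()           # hosts where robots.txt is ignored

    def ignore_robots(self, host: str) -> None:
        """Explicit, per-host opt-out -- deliberately awkward to use broadly."""
        self.robots_opt_out.add(host)

    def throttle(self, url: str) -> float:
        """Sleep until this host's minimum interval has passed.

        Returns how long we waited, for logging.
        """
        host = urlparse(url).netloc
        wait = self.last_hit[host] + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self.last_hit[host] = time.monotonic()
        return max(wait, 0.0)
```

Keeping the limiter keyed on `netloc` means two URLs on the same host share a budget while different hosts proceed in parallel, which matches how site operators actually experience load.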