Tier routing: the mental model that cuts scraping costs by 80%

The default mistake when building a scraper is to start with a headless browser. It works on day one, looks smart, and turns out to be a budget bomb at scale.

A headless Chrome session is roughly 60× more expensive than a plain HTTP request, in both CPU and proxy bandwidth. A browser-rendered page pulls ~5MB through the proxy instead of ~80KB, so at $3.50/GB for residential IPs a 1M-page crawl costs about $17,500 instead of $280.
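The bandwidth arithmetic can be checked in a few lines. This is a minimal sketch that roughly reproduces the figures above; the page sizes and proxy rate come from the text, while the decimal gigabyte (10^9 bytes) and the function name crawl_cost are assumptions made for illustration.

```python
PRICE_PER_GB = 3.50             # USD per GB, residential proxy rate from the text
BYTES_PER_GB = 1_000_000_000    # assumption: decimal gigabytes
HTTP_PAGE_BYTES = 80_000        # ~80KB per plain HTTP fetch
BROWSER_PAGE_BYTES = 5_000_000  # ~5MB per browser-rendered page


def crawl_cost(pages: int, bytes_per_page: int) -> float:
    """Proxy bandwidth cost in USD for fetching `pages` pages."""
    return pages * bytes_per_page / BYTES_PER_GB * PRICE_PER_GB


http_cost = crawl_cost(1_000_000, HTTP_PAGE_BYTES)
browser_cost = crawl_cost(1_000_000, BROWSER_PAGE_BYTES)
ratio = BROWSER_PAGE_BYTES / HTTP_PAGE_BYTES

print(f"HTTP:    ${http_cost:,.0f}")     # HTTP:    $280
print(f"Browser: ${browser_cost:,.0f}")  # Browser: $17,500
print(f"Ratio:   {ratio}x")              # Ratio:   62.5x
```

This covers proxy bandwidth only; CPU cost for the browser tier would come on top.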

The route

Scrape's router starts every URL at the cheapest tier and only escalates when the response is actually blocked. In practice, ~80% of pages clear at Tier 0 (HTTP). Another ~15% pass Tier 1 (browser). Tier 2 (CAPTCHA) and Tier 3 (managed unblock) handle the long tail — usually under 5% combined.
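The escalation loop itself is small. The sketch below is not Scrape's actual implementation: the Response type, the is_blocked heuristic (status codes and marker strings), and the stub fetchers are all hypothetical stand-ins chosen so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Response:
    status: int
    body: str


# Hypothetical block heuristic; real detection rules will differ.
BLOCK_STATUSES = {403, 429, 503}
CHALLENGE_MARKERS = ("captcha", "just a moment")


def is_blocked(resp: Response) -> bool:
    """Treat block-like status codes or challenge text as a blocked response."""
    if resp.status in BLOCK_STATUSES:
        return True
    text = resp.body.lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)


Fetcher = Callable[[str], Response]


def route(url: str, tiers: Sequence[Fetcher]) -> tuple[int, Response]:
    """Try tiers cheapest-first; return (tier index, response) for the first
    tier whose response is not blocked."""
    for tier, fetch in enumerate(tiers):
        resp = fetch(url)
        if not is_blocked(resp):
            return tier, resp
    raise RuntimeError(f"all {len(tiers)} tiers blocked for {url}")


# Stub fetchers standing in for Tier 0 (HTTP) and Tier 1 (browser).
def http_stub(url: str) -> Response:
    if "/product/" in url:
        return Response(403, "")
    return Response(200, "<html>ok</html>")


def browser_stub(url: str) -> Response:
    return Response(200, "<html>rendered</html>")


tiers = [http_stub, browser_stub]
blog_tier, _ = route("https://example.com/blog/1", tiers)
product_tier, _ = route("https://example.com/product/9", tiers)
```

Here the blog URL clears at tier 0 and only the product URL pays for the browser tier, which is the whole point of routing per URL rather than per domain.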

Why this works

Most pages aren't actually protected. A Cloudflare badge in the corner doesn't mean every URL on that domain serves a challenge. Usually only product detail pages, login flows, and pricing endpoints get the heavy treatment. Static catalogs, blog posts, and category pages typically return clean HTML to a plain HTTP client, as long as its TLS handshake looks like real Chrome's.

Tuning the router

Cap max_tier per job to control your worst-case spend. If you set max_tier=0 the scraper will fail rather than escalate; sometimes that's exactly what you want for a cheap survey crawl.
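To see what a cap buys you, it helps to bound the worst case: every page fails every tier up to max_tier, and each failed attempt still costs bandwidth. The sketch below makes that assumption explicit and uses only the per-page sizes quoted earlier for Tier 0 and Tier 1; Tier 2 and Tier 3 are left out because the article gives no cost figures for them. The function name worst_case_spend is hypothetical.

```python
PRICE_PER_GB = 3.50           # USD per GB, from the article
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes

# Bytes transferred per attempt at each tier (only tiers 0 and 1 are known).
TIER_BYTES = {0: 80_000, 1: 5_000_000}


def worst_case_spend(pages: int, max_tier: int) -> float:
    """Upper bound on proxy spend in USD if every page is attempted at every
    tier from 0 through max_tier (assumes failed attempts are still billed)."""
    if max_tier not in TIER_BYTES:
        raise ValueError(f"no cost figure for tier {max_tier}")
    bytes_per_page = sum(TIER_BYTES[t] for t in range(max_tier + 1))
    return pages * bytes_per_page / BYTES_PER_GB * PRICE_PER_GB


cap_http = worst_case_spend(1_000_000, max_tier=0)
cap_browser = worst_case_spend(1_000_000, max_tier=1)

print(f"max_tier=0: ${cap_http:,.0f}")     # max_tier=0: $280
print(f"max_tier=1: ${cap_browser:,.0f}")  # max_tier=1: $17,780
```

With max_tier=0 a 1M-page survey crawl can never cost more than the plain HTTP bill, at the price of some pages failing outright.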
