Scaling

Concurrency tuning, queue partitioning, cost controls.

Concurrency tuning

Two knobs matter: CRAWL_MAX_CONCURRENCY (total in-flight requests across all hosts) and CRAWL_PER_HOST_CONCURRENCY (cap on in-flight requests to any one domain). Start at 16 total and 2 per host; raise the per-host cap only when target sites tolerate it.
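One way to enforce both caps together is a pair of semaphores, one global and one per host. The sketch below is a minimal asyncio illustration, not the crawler's actual implementation: TwoLevelLimiter, fake_fetch, and demo are hypothetical names, and reading the two settings from environment variables is an assumption.

```python
import asyncio
import os
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: the two knobs are environment variables; defaults follow the
# suggested starting point of 16 total and 2 per host.
MAX_CONCURRENCY = int(os.environ.get("CRAWL_MAX_CONCURRENCY", "16"))
PER_HOST_CONCURRENCY = int(os.environ.get("CRAWL_PER_HOST_CONCURRENCY", "2"))


class TwoLevelLimiter:
    """Hypothetical helper: a global cap plus a per-host cap."""

    def __init__(self, total, per_host):
        # Create the limiter inside a running event loop (see demo below).
        self._total = asyncio.Semaphore(total)
        self._per_host = per_host
        self._hosts = {}  # host -> asyncio.Semaphore

    def _host_semaphore(self, url):
        host = urlsplit(url).hostname or ""
        if host not in self._hosts:
            self._hosts[host] = asyncio.Semaphore(self._per_host)
        return self._hosts[host]

    async def run(self, url, fetch):
        # Acquire the per-host slot first so requests queued behind a busy
        # host do not occupy global slots while they wait.
        async with self._host_semaphore(url):
            async with self._total:
                return await fetch(url)


async def demo():
    """Run 12 fake fetches across 3 hosts and report peak concurrency."""
    limiter = TwoLevelLimiter(total=4, per_host=2)
    active, peak = defaultdict(int), defaultdict(int)

    async def fake_fetch(url):
        # Stand-in for a real HTTP request; "*" tracks the global count.
        keys = (urlsplit(url).hostname, "*")
        for key in keys:
            active[key] += 1
            peak[key] = max(peak[key], active[key])
        await asyncio.sleep(0.01)
        for key in keys:
            active[key] -= 1
        return url

    urls = [f"https://{h}.example/{i}" for h in "abc" for i in range(4)]
    await asyncio.gather(*(limiter.run(u, fake_fetch) for u in urls))
    return dict(peak)
```

Calling asyncio.run(demo()) returns the peak in-flight count per host and overall (under the "*" key). In a real worker you would construct the limiter as TwoLevelLimiter(MAX_CONCURRENCY, PER_HOST_CONCURRENCY) and pass your own fetch coroutine.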

Queue partitioning

Redis Streams support consumer groups. Partition by host hash so the same domain always hits the same worker; this keeps the per-host limiter accurate and keeps HTTP/2 connections warm.
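Within a single consumer group, Redis hands each entry to whichever consumer reads next, so host affinity has to come from how entries are routed. One possible approach, sketched below, is one stream per partition with the partition chosen by a stable hash of the host. The partition count, the crawl:queue key prefix, and the function names are all assumptions for illustration.

```python
import hashlib
from urllib.parse import urlsplit

NUM_PARTITIONS = 8  # assumption: one stream (and one worker) per partition


def partition_for(url, num_partitions=NUM_PARTITIONS):
    """Map a URL's host to a stable partition index."""
    # urlsplit lowercases the hostname, so EXAMPLE.com and example.com agree.
    host = urlsplit(url).hostname or ""
    # hashlib rather than built-in hash(): str hashes are randomized per
    # process, which would send one host to different partitions per worker.
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def stream_key(url, prefix="crawl:queue"):
    """Name of the hypothetical Redis stream that should receive this URL."""
    return f"{prefix}:{partition_for(url)}"
```

A producer would then add each URL to stream_key(url), and each worker would read only the stream for its own partition. Changing NUM_PARTITIONS remaps hosts to different workers, so pick it with some headroom.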

Cost controls

Cap max_tier per job to avoid surprise CAPTCHA spend.
Set a per-host min_delay so bursts of requests do not run into rate-limit walls.
Use per-site selectors instead of an LLM for known sites; roughly 100× cheaper.
Enable prompt caching on Claude; cached input tokens cost up to 90% less.
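To illustrate the min_delay control from the list above, here is a minimal sketch of a per-host spacing gate. It is not the project's implementation: MinDelayGate and reserve are hypothetical names, and the clock is injectable only so the behavior can be checked without sleeping.

```python
import time
from urllib.parse import urlsplit


class MinDelayGate:
    """Hypothetical helper: enforce a minimum gap between requests to one host."""

    def __init__(self, min_delay, clock=time.monotonic):
        self.min_delay = min_delay
        self._clock = clock
        self._next_allowed = {}  # host -> earliest time the next request may start

    def reserve(self, url):
        """Reserve the next slot for this URL's host; return seconds to wait first."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        start = max(now, self._next_allowed.get(host, now))
        # Each reservation pushes the host's next slot out by min_delay,
        # so a burst of URLs for one host is spread out instead of stacked.
        self._next_allowed[host] = start + self.min_delay
        return start - now
```

A worker would call reserve(url), sleep for the returned number of seconds, and then fetch. Requests to different hosts do not delay each other.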

Measured benchmarks

Numbers come from the production-shape soak harness run against a 15-target mixed-vendor URL list (baselines, Cloudflare, DataDome, Akamai, Reddit interstitial). Single host, 6 concurrent workers. Reproduce with scripts/soak.py. URLs blocked by robots.txt are excluded from the success-rate denominator.

Metric	8.6-min run	55-min run
Fetches	75	450
Bandwidth	13.2 MB	70.8 MB
Success rate	85%	89%
Tier 0 p50 / p95	1.9 s / 2.8 s	2.0 s / 3.0 s
Tier 1 p50 / p95	91 s / 102 s	75 s / 116 s
RSS peak / median / end	3.75 / 2.96 / 2.79 GB	4.35 / 3.20 / 1.60 GB
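The success rate leaves robots.txt-blocked URLs out of the denominator. A minimal sketch of that calculation follows; the function name and the example counts are made up for illustration and are not taken from the soak runs.

```python
def success_rate(succeeded, attempted, robots_blocked):
    """Success rate with robots.txt-blocked URLs removed from the denominator."""
    eligible = attempted - robots_blocked
    if eligible <= 0:
        raise ValueError("no eligible fetches to compute a rate from")
    return succeeded / eligible


# Illustrative counts only: 22 attempts, 2 blocked by robots.txt, 17 successes.
example = success_rate(succeeded=17, attempted=22, robots_blocked=2)
```

With these made-up counts the rate is 17 / 20, whereas dividing by all 22 attempts would understate it.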

What this tells you about scaling:

No memory leak. The 55-minute run ended at 1.60 GB — well below the 3.20 GB median during the run — meaning the browser pool actively releases instances when concurrency drops. The 8.6-minute run plateaued at 2.79 GB because it never had enough idle time to recover.
Tier 0 latency stays flat. p50 and p95 moved less than 8% across a 6× longer run. p99 did grow (3.1 s → 10.6 s) — under sustained load you see more rare slow proxy hops; budget for that in your timeouts.
Tier 1 p50 got faster (91 s → 75 s, about 18%) because the browser pool keeps instances warm longer in a sustained run. Tier 1 p95 grew (102 s → 116 s) because a longer run cycles through more cold starts.
Per-Camoufox memory. ~700–900 MB resident per warm browser. The pool sizes itself to max_concurrency / 4; cap concurrency to bound total RAM.
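The memory figures above allow a rough RAM budget. The sketch below works through the arithmetic under stated assumptions: the pool size is rounded up with a minimum of one browser (the source only says max_concurrency / 4), 1 GB is treated as 1000 MB, and memory used outside the browsers is ignored. All function names are hypothetical.

```python
import math

# Per warm browser, from the benchmark notes above.
BROWSER_MB_LOW, BROWSER_MB_HIGH = 700, 900


def pool_size(max_concurrency):
    """Browsers in the pool. Assumption: rounds up, never below one."""
    return max(1, math.ceil(max_concurrency / 4))


def browser_ram_gb(max_concurrency):
    """(low, high) estimate of browser-resident RAM in GB."""
    n = pool_size(max_concurrency)
    return (n * BROWSER_MB_LOW / 1000, n * BROWSER_MB_HIGH / 1000)


def max_concurrency_for(ram_budget_gb):
    """Largest max_concurrency whose worst-case browser RAM fits the budget."""
    browsers = int(ram_budget_gb * 1000 // BROWSER_MB_HIGH)
    return browsers * 4
```

For the suggested starting value of 16, this gives a pool of 4 browsers and roughly 2.8 to 3.6 GB for the browsers alone. Going the other way, a 4 GB browser budget supports a max_concurrency of at most 16 under these assumptions.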

When to add Tier 3

If your target list contains sites with behavioral scoring (PerimeterX, advanced Akamai, Kasada), the free FlareSolverr Tier 3 will usually fail against them. Switch to UNBLOCK_PROVIDER=brightdata or UNBLOCK_PROVIDER=scrapfly; both run real browser farms purpose-built for this. Cost is per success only: Bright Data does not bill failed requests.
