NEWStratum 02 — CAPTCHA injection is live

/ ISSUE 047 · APRIL 2026

The web,
excavated.

Strip the surface. Read the strata. Extract the signal. Production-grade scraping infrastructure that escalates only when blocked.

Begin excavation Read the manual

· No credit card · Self-hostable · Apache-2.0 licensed

filed under · hero.bash

# Stratum 0 — Surface
$ scrape crawl https://target.example/p \
    --max-tier 2 --llm --schema product.yaml

╭─ field report ─────────────────────────────────────╮
│ 1,243 / 1,248 succeeded               ↑ 99.6%       │
├─ stratum 0  HTTP             1,012 pages   $1.01   │
├─ stratum 1  Browser            218 pages   $1.09   │
├─ stratum 2  + CAPTCHA           13 pages   $0.26   │
╰─ filed: results.json                       $2.36 ──╯

FIELD CREW · TEAMS DIGGING WITH SCRAPE

ACMEVECTORCATALOGHELIXNORTHBOUNDLESSFJORDLUMENSTRATAORBITACMEVECTORCATALOGHELIXNORTHBOUNDLESSFJORDLUMENSTRATAORBIT

§ 01 · THE FOUR STRATA

Cheap first.
Deep when forced.

Every URL starts at Stratum 00 — plain HTTP. The block detector reads each response. If a wall appears, the router escalates one stratum at a time. You never pay for a depth you didn't need.

/00

Surface

Plain HTTP with real-Chrome TLS, JA3/JA4+, HTTP/2 frame ordering. ~50ms per request, $0.001 per page.

/01

Subsurface

Camoufox — Firefox patched at the C++ level. Coherent fingerprint bundles, behavioral simulation.

/02

Deep

Browser plus token-injection CAPTCHA. Cloudflare Turnstile, reCAPTCHA v3, hCaptcha — solved in ~5s.

/03

Bedrock

Managed unblock fallback (Scrapfly, Bright Data). The last 3% — when the first three strata won't yield.

§ 02 · INVENTORY

Everything in one box.

No glue code. No second vendor. No CSV sync at 3am. The full pipeline ships in a single binary and a single SQLite file.

TLS · JA3 / JA4+

/01

Real-browser handshakes via curl_cffi. Akamai hash matched, extension ordering preserved.

Stealth browsers

/02

Camoufox + Nodriver. Coherent UA, screen, timezone, locale, fonts, WebGL — sourced from real devices.

Sticky proxies

/03

Per-(host,fp) sticky residential sessions, automatic health scoring, 10-minute cooldowns on burned IPs.

CAPTCHA solving

/04

Turnstile, reCAPTCHA v3, hCaptcha. Token injected & resubmitted automatically.

AI extraction

/05

Claude with prompt caching. Drop in a JSON schema; get clean rows. Cache hit ratio = your discount.

Built for scale

/06

Per-host rate limiter, async fan-out, content-addressed raw HTML, full Prometheus surface.

99.6%

AVG SUCCESS

<50ms

STRATUM 0 LATENCY

$0.001

PER FETCH

4 STRATA

OF ESCALATION

§ 03 · METHOD

From URL to clean row.

Three operations. No tier-selection logic to maintain. The router decides; you receive.

Submit URLs

Paste in the dashboard, POST to the API, or pipe from your CSV. Pick a max stratum.

II.

We dig

Each URL starts at the surface. The router escalates only on confirmed block. ~80% never leave Stratum 0.

III.

Receive findings

Stream results live, download JSON / CSV, or push to your webhook. Fully typed.

§ 04 · INSTRUMENTATION

For people who ship.

A typed Python SDK. An OpenAPI 3.1 REST surface. SSE for live progress. HMAC-signed webhooks. Per-job extraction schemas. Use the dashboard for ad-hoc digs; wire the API into your pipeline for everything else.

→Idempotent job submission with content hashing
→Server-Sent Events for live progress
→Webhooks on completion (HMAC signed)
→Per-job extraction schemas
→JSON, CSV, NDJSON exports
→Per-fetch cost telemetry — proxy bytes & solver $

API reference Tutorials

filed under · excavate.python

from scrape import Client, Stratum

client = Client(api_key="sk_live_...")

job = client.jobs.create(
    name="product prices · q2",
    urls=[f"https://shop.example.com/p/{s}" for s in skus],
    max_stratum=Stratum.DEEP,
    schema={
        "title":    "string",
        "price":    "number",
        "currency": "string",
        "in_stock": "boolean",
    },
)

# Stream findings as they're excavated
for row in client.jobs.stream(job.id):
    print(row.url, row.data["price"])

§ 04½ · COST LEDGER

You see every penny.

Every fetch records what it actually cost: residential proxy bytes plus paid CAPTCHA solver USD. Per row in storage, per tier in Prometheus, per job in the dashboard. No surprise invoice.

scrape_proxy_bytes_total
scrape_solver_cost_usd_total

Tier

Wall time

Proxy traffic

USD / page

00 · Surface

~50 ms

~80 KB

~$0.0001

01 · Browser

60–90 s

~5 MB

$0.002–$0.02

02 · CAPTCHA

+3–10 s

$0.001–$0.003

03 · FlareSolverr

60–120 s

0 (own browser)

03 · Bright Data

5–30 s

0 (own farm)

~$0.003

03 · Scrapfly

5–30 s

0 (own farm)

$0.001–$0.025

§ 05 · CHARTER

PRINCIPLE 01

Built ethically

Honors robots.txt by default. Per-host rate limits enforced before egress. Never auth-walled or PII content.

PRINCIPLE 02

Audited proxies

Bright Data, Decodo, IPRoyal, Oxylabs. Providers with documented ethical-sourcing audits, never malware botnets.

PRINCIPLE 03

Open source

Apache-2.0 licensed. Self-host or use the managed cloud — same code. Auditable, forkable, yours.

§ 06 · QUESTIONS · ANSWERED

Common questions, plain answers.

If you're picking a scraping stack and trying to understand whether Scrape fits, start here. Linked sources for everything.

What makes Scrape different from a simple HTTP scraper or a headless browser?→

A single-tier scraper either over-pays (everything goes through a $0.02 browser) or under-delivers (gets blocked on protected pages). Scrape routes each URL through four tiers — TLS-impersonated HTTP, stealth browser, CAPTCHA solver, managed unblock — and only pays for the depth a given page actually requires. ~80% of pages clear at Tier 0.

Does Scrape handle Cloudflare, DataDome, and PerimeterX?→

Yes. Cloudflare Turnstile and DataDome are typically beaten by Tier 0 (curl_cffi + residential IP) or Tier 1 (Camoufox stealth Firefox). PerimeterX and other behavioral-scoring vendors require Tier 3 — a commercial managed unblocker (Bright Data Web Unlocker or Scrapfly) wired in via UNBLOCK_PROVIDER.

Can I self-host Scrape?→

Yes — the entire stack is Apache-2.0 licensed and ships as a Docker Compose file. The free FlareSolverr container can serve as Tier 3, and Ollama can replace Anthropic for LLM extraction. Self-hosting needs zero paid services.

Does Scrape respect robots.txt?→

By default, yes. CRAWL_RESPECT_ROBOTS=true is the shipped default and the orchestrator skips disallowed URLs. Operators can opt out per deployment, but the default is ethical-by-design.

What does it cost to run at scale?→

At ~80% Tier-0 success on 1M pages, total spend is ~$280 (proxy bandwidth + LLM extraction with prompt caching). The same workload through a browser-only scraper is ~$17,000 — Scrape's tier router exists specifically to avoid that bill.

DIRECTIVE

Begin your first dig
in under sixty seconds.

The free tier ships with 10,000 monthly fetches at Stratum 00. No credit card. First user becomes the admin.

Get access View plans

# Stratum 0 — Surface $ scrape crawl https://target.example/p \ --max-tier 2 --llm --schema product.yaml ╭─ field report ─────────────────────────────────────╮ │ 1,243 / 1,248 succeeded ↑ 99.6% │ ├─ stratum 0 HTTP 1,012 pages $1.01 │ ├─ stratum 1 Browser 218 pages $1.09 │ ├─ stratum 2 + CAPTCHA 13 pages $0.26 │ ╰─ filed: results.json $2.36 ──╯

For people who ship.

→Idempotent job submission with content hashing

→Server-Sent Events for live progress

→Webhooks on completion (HMAC signed)

→Per-job extraction schemas

→JSON, CSV, NDJSON exports

→Per-fetch cost telemetry — proxy bytes & solver $

from scrape import Client, Stratum client = Client(api_key="sk_live_...") job = client.jobs.create( name="product prices · q2", urls=[f"https://shop.example.com/p/{s}" for s in skus], max_stratum=Stratum.DEEP, schema={ "title": "string", "price": "number", "currency": "string", "in_stock": "boolean", }, ) # Stream findings as they're excavated for row in client.jobs.stream(job.id): print(row.url, row.data["price"])

The web,excavated.

Cheap first.Deep when forced.

Everything in one box.

From URL to clean row.

For people who ship.

You see every penny.

Common questions, plain answers.

Begin your first digin under sixty seconds.

The web,excavated.

Cheap first.Deep when forced.

Everything in one box.

From URL to clean row.

For people who ship.

You see every penny.

Common questions, plain answers.

Begin your first digin under sixty seconds.

The web,
excavated.

Cheap first.
Deep when forced.

Begin your first dig
in under sixty seconds.

The web,
excavated.

Cheap first.
Deep when forced.

Begin your first dig
in under sixty seconds.