- →Cloudflare Turnstile + reCAPTCHA v3 token injection via CapSolver
- →Behavioral simulation: Bezier mouse paths, jittered scroll easing
- →Per-host robots.txt enforcement with 24h cache
- →Job runner background tasks with SSE live progress
/ FIELD LOGS
Dispatches from
the dig site.
New strata, smarter heuristics, faster pipelines — every release we file.
- →Multi-tenant FastAPI backend with JWT cookie auth
- →Next.js 15 dashboard: jobs, fetches, extracted, exports
- →First admin user auto-promotion
- →Live job progress over Server-Sent Events
- →Claude Haiku 4.5 schema extractor with prompt caching
- →Markdown pipeline (selectolax) — token-efficient output
- →Per-site CSS selector registry with longest-suffix match
- →Camoufox integration with coherent fingerprint bundles
- →Browser session pool keyed by (proxy, fingerprint, host)
- →Tier router with auto-escalation on block detection
- →Tier 0 HTTP client with curl_cffi (real Chrome TLS)
- →Proxy provider abstraction (Decodo / IPRoyal / Bright Data)
- →SQLite storage with content-addressed raw HTML
- →Per-host rate limiting + concurrency control
- →OpenTelemetry / Prometheus metrics