Output formats
Results export as JSON, CSV, or NDJSON. Supported sinks: the local filesystem, Postgres, and S3-compatible object storage.
From the dashboard
Each job has Download JSON and Download CSV buttons. The CSV export builds its header from the union of columns across all extracted rows, so heterogeneous schemas still flatten into a single table.
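The column-union behavior can be sketched as follows. This is a minimal illustration, not the actual export code; `rows` stands in for the extracted records:

```python
import csv
import io

def rows_to_csv(rows):
    """Flatten heterogeneous dicts into one CSV using the union of all keys."""
    # Build the header as the union of every row's keys, in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    # restval="" fills cells for columns a given row doesn't have.
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "sku": "G-42"},  # different schema, still one flat table
]
print(rows_to_csv(rows))
```

Rows that lack a column simply get an empty cell, so no extraction schema is ever dropped from the table.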
From the API
curl -b cookies.txt http://localhost:8000/api/jobs/JOB_ID/export.json -o results.json
curl -b cookies.txt http://localhost:8000/api/jobs/JOB_ID/export.csv -o results.csv
Streaming with SSE
The /api/jobs/JOB_ID/events endpoint emits Server-Sent Events for live progress. Use it to drive UIs or fan out to downstream consumers as rows complete.
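A consumer only needs a small SSE line parser. The sketch below assumes nothing about the event names or payload fields, which are not documented here; the parsing itself follows the standard `event:` / `data:` / blank-line framing:

```python
import json
import urllib.request

def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of SSE-formatted text lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:  # a blank line terminates one event
            yield event, "\n".join(data)
            event, data = "message", []

# Live usage (requires a running server and the session cookie, as with curl):
# req = urllib.request.Request("http://localhost:8000/api/jobs/JOB_ID/events",
#                              headers={"Cookie": "session=..."})
# with urllib.request.urlopen(req) as resp:
#     for name, payload in iter_sse_events(line.decode() for line in resp):
#         print(name, json.loads(payload))  # payload shape is an assumption
```

Because SSE is plain text over a long-lived HTTP response, the same parser works whether you fan events out to a message queue or feed a UI directly.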
Direct DB access
By default everything lives in data/scrape.db. Tables: users, jobs, fetches, extracted. Raw HTML is stored content-addressed under data/raw/.
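Assuming the single-file default database is SQLite, you can inspect the schema before writing queries; the helper below lists the tables, and the commented lines show how you might open the real file read-only (column names beyond the table names above are not documented, so check them yourself):

```python
import sqlite3

def table_names(conn):
    """Return the user tables present in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

# In practice, open the default database read-only so inspection can't create
# or modify anything (path from the docs above; SQLite is an assumption):
# conn = sqlite3.connect("file:data/scrape.db?mode=ro", uri=True)
conn = sqlite3.connect(":memory:")  # stand-in for this demo
print(table_names(conn))
```

Opening with `mode=ro` keeps ad-hoc exploration from touching a database the scraper may be writing to concurrently.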