LLM extraction is the right answer when you're crawling many sites with a stable schema. Selectors break every time the source DOM changes; Claude reads the page semantically and fills your schema regardless.
The problem: at $3 / 1M input tokens, naive LLM extraction at scale gets expensive fast.
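For concreteness, a "stable schema" here just means the same fixed set of fields pulled from every page. The snippet below is a hypothetical example; the field names, `SCHEMA`, and `SYSTEM_PROMPT` are illustrative placeholders (reused by the later snippets), not the schema from the actual crawl:

```python
# Illustrative product schema; a real crawl's fields will differ.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price"],
}

# Instructions that stay identical across every page.
SYSTEM_PROMPT = (
    "Extract the product from the page and return only JSON that "
    "matches the provided schema. Use null for fields you cannot find."
)
```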
The fix: prompt caching
Anthropic's prompt caching lets you tag the stable parts of the prompt (system instructions, the JSON schema) as ephemeral cache entries. The first call writes the cache and pays a 25% surcharge on those tokens; subsequent calls within the cache window read them at a tenth of the base input price.
```python
import json
import anthropic

client = anthropic.AsyncAnthropic()

msg = await client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,  # required by the Messages API; extraction output is small
    system=[
        # Stable prefix: written to the cache on the first call, read from it afterwards
        {"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": json.dumps(SCHEMA), "cache_control": {"type": "ephemeral"}},
    ],
    # Only the page content varies between calls
    messages=[{"role": "user", "content": markdown}],
)
```
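To confirm the cache is doing the work, the response's usage block reports how many input tokens were written to and read from the cache. A quick check (parsing msg.content[0].text into JSON assumes the model returns the bare object the system prompt asks for):

```python
# First call: cache_creation_input_tokens is large, cache_read_input_tokens is 0.
# Later calls within the TTL: the reverse, which is what drives the savings below.
print(msg.usage.cache_creation_input_tokens, msg.usage.cache_read_input_tokens)

record = json.loads(msg.content[0].text)  # the extracted product record
```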
What we measured

On a 10,000-page product crawl, average input was 6,200 tokens, of which 5,800 were cacheable (system + schema). Cache hit rate after the first 50 pages: 99%. Total cost dropped from $186 to $19, a 90% reduction.
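The arithmetic behind the drop is simple: cacheable tokens are paid once at the write rate and afterwards at the much cheaper read rate, while the per-page remainder always costs the base rate. A back-of-envelope model (the rates are placeholder parameters, and output tokens and repeat cache writes are ignored, so it will not reproduce the exact dollar figures above):

```python
def input_cost(pages, cacheable_tok, unique_tok,
               base=3.00, write=3.75, read=0.30):
    """Rough input-token cost in dollars, without and with prompt caching.

    Rates are $ per million tokens and are assumptions, not quoted pricing.
    """
    naive = pages * (cacheable_tok + unique_tok) * base / 1e6
    cached = (
        cacheable_tok * write / 1e6                   # first call writes the cache
        + (pages - 1) * cacheable_tok * read / 1e6    # later calls read it
        + pages * unique_tok * base / 1e6             # page content is never cached
    )
    return naive, cached

naive, cached = input_cost(pages=10_000, cacheable_tok=5_800, unique_tok=400)
```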