LLM extraction is the right answer when you're crawling many sites with a stable schema. Selectors break every time the source DOM changes; Claude reads the page semantically and fills your schema regardless.
The problem: at $3 / 1M input tokens, naive LLM extraction at scale gets expensive fast.
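For concreteness, a "stable schema" here just means the same fixed set of fields pulled from every page. The snippet below is a hypothetical example; the field names, `SCHEMA`, and `SYSTEM_PROMPT` are illustrative placeholders (reused by the later snippets), not the schema from the actual crawl:

```python
# Illustrative product schema; a real crawl's fields will differ.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price"],
}

# Instructions that stay identical across every page.
SYSTEM_PROMPT = (
    "Extract the product from the page and return only JSON that "
    "matches the provided schema. Use null for fields you cannot find."
)
```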
The fix: prompt caching
Anthropic's prompt caching lets you tag the stable parts of the prompt (system instructions, the JSON schema) as ephemeral cache entries. The first call writes the cache and pays a 25% surcharge on those tokens; subsequent calls within the cache window read them at a tenth of the base input price.
```python
import json
import anthropic

client = anthropic.AsyncAnthropic()

msg = await client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,  # required by the Messages API; extraction output is small
    system=[
        # Stable prefix: written to the cache on the first call, read from it afterwards
        {"type": "text", "text": SYSTEM_PROMPT, "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": json.dumps(SCHEMA), "cache_control": {"type": "ephemeral"}},
    ],
    # Only the page content varies between calls
    messages=[{"role": "user", "content": markdown}],
)
```
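To confirm the cache is doing the work, the response's usage block reports how many input tokens were written to and read from the cache. A quick check (parsing msg.content[0].text into JSON assumes the model returns the bare object the system prompt asks for):

```python
# First call: cache_creation_input_tokens is large, cache_read_input_tokens is 0.
# Later calls within the TTL: the reverse, which is what drives the savings below.
print(msg.usage.cache_creation_input_tokens, msg.usage.cache_read_input_tokens)

record = json.loads(msg.content[0].text)  # the extracted product record
```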
What we measured

On a 10,000-page product crawl, average input was 6,200 tokens, of which 5,800 were cacheable (system + schema). Cache hit rate after the first 50 pages: 99%. Total cost dropped from $186 to $19, a 90% reduction.
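The arithmetic behind the drop is simple: cacheable tokens are paid once at the write rate and afterwards at the much cheaper read rate, while the per-page remainder always costs the base rate. A back-of-envelope model (the rates are placeholder parameters, and output tokens and repeat cache writes are ignored, so it will not reproduce the exact dollar figures above):

```python
def input_cost(pages, cacheable_tok, unique_tok,
               base=3.00, write=3.75, read=0.30):
    """Rough input-token cost in dollars, without and with prompt caching.

    Rates are $ per million tokens and are assumptions, not quoted pricing.
    """
    naive = pages * (cacheable_tok + unique_tok) * base / 1e6
    cached = (
        cacheable_tok * write / 1e6                   # first call writes the cache
        + (pages - 1) * cacheable_tok * read / 1e6    # later calls read it
        + pages * unique_tok * base / 1e6             # page content is never cached
    )
    return naive, cached

naive, cached = input_cost(pages=10_000, cacheable_tok=5_800, unique_tok=400)
```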