Anthropic Claude API Pricing & Cost Calculator (2026)
Current Anthropic Claude API pricing per million tokens — Opus, Sonnet, Haiku and Fable — with a live cost calculator, prompt-caching maths, worked examples and savings tips.
Anthropic's Claude API is a favourite for code-heavy and reasoning-heavy work, and its pricing rewards teams that structure their prompts well. More than any other major provider, the way you use Claude — especially prompt caching — determines what you actually pay. This page covers the lineup, the caching maths that defines a Claude bill, a worked example and how to keep costs down. The table above is pulled from a live feed; plug your own usage into the cost calculator for the real monthly number.
The Claude lineup
Claude is sold as a tiered family, each billed per million input and output tokens:
- Opus — the most capable tier, for the hardest reasoning and agentic coding. Priced accordingly, with the largest context (up to 1M tokens on the top models).
- Sonnet — the balanced workhorse most production traffic should default to: strong quality at a mid-tier price.
- Haiku — fast and inexpensive, for high-volume, simpler tasks.
- Fable — Anthropic's premium creative/agentic line, at the top of the price range.
The table above shows the live spread. As a rule, run the bulk of your traffic on Haiku/Sonnet and reserve Opus and Fable for work that genuinely needs them.
Prompt caching: the lever that defines a Claude bill
This is the single most important cost feature of the Claude API. If your requests share a long, stable prefix — a detailed system prompt, coding guidelines, a fixed knowledge base, a long document you ask several questions about — those cached tokens are billed at roughly one-tenth of the normal input rate on a cache hit.
There's a small nuance worth understanding. Writing to the cache costs slightly more than a normal input token (a one-time premium), while reading from it costs about 10%. So caching pays off once the same prefix is reused even a couple of times. For an agent that resends the same 8–16k-token system prompt on every call, this turns the input half of the bill from a major line item into a rounding error.
The catch: caching only helps the repeated prefix. The dynamic part of each request — the user's actual message, fresh retrieved context — is billed at full input price. Model a realistic cached percentage in the calculator (what share of your input is genuinely stable), not an optimistic one.
Batch processing
Bulk, non-interactive jobs — evaluations, offline summarisation, backfills — can go through the Batch API for roughly a 50% discount on both input and output, in exchange for asynchronous delivery. Caching and batch discounts stack.
A worked example
A coding agent sends a 12,000-token system prompt (tools, conventions, repo context) plus a 3,000-token user/task message, and gets back 1,500 output tokens, 10,000 times a month.
input = 15,000 tokens (12,000 stable + 3,000 dynamic)
output = 1,500 tokens
Without caching, you pay full input price on all 15,000 tokens every call. With caching, the 12,000-token prefix bills at ~10% after the first call — so your effective input drops to roughly 3,000 + 12,000×0.1 = 4,200 "billed" tokens, a ~72% cut on the input half. Output is unaffected. Run both scenarios in the calculator with caching on/off to see the difference on your own numbers.
How to cut your Claude bill
- Cache aggressively. Put stable instructions and context in a cacheable prefix and keep it stable between calls.
- Right-size the model. Default to Sonnet; use Haiku for simple, high-volume tasks; reserve Opus/Fable for the hard cases.
- Cap output. Output is the expensive half — prompt for concise answers and set output limits.
- Batch offline work. Evals and bulk jobs belong in the discounted tier.
- Trim dynamic context. Only the changing part is full-price, so tighten retrieval and summarise long histories.
When Claude is worth it
Claude's frontier models are consistently strong at long-context reasoning, careful instruction-following and code that respects existing conventions. For agentic coding — where the model reads files, plans and edits — reliability per token tends to be high, and the big context window makes it practical to feed whole files without aggressive chunking. A model that gets it right on the first try is cheaper than a cheaper model you call three times, so compare per-task cost, not just the sticker rate. Where Claude loses on pure price is high-volume simple work, where Gemini Flash-Lite or DeepSeek are dramatically cheaper.
Privacy note
Anthropic's commercial terms exclude API inputs and outputs from training by default, and enterprise options add controls. But if your requirement is "no data ever leaves our infrastructure", no managed API qualifies — that's where self-hosted open-weight inference becomes the honest comparison, on total cost of ownership.
Frequently asked questions
How much can prompt caching actually save? On workloads with a large stable prefix, it can cut the input portion of the bill by 70–90%. It does nothing for output tokens, so the overall saving depends on your input/output ratio.
Which Claude model should I default to? Sonnet for most production traffic, Haiku for simple high-volume tasks, Opus/Fable only for the hardest work. Don't run everyday traffic on the top tier.
Does Claude have a free tier? New accounts get a small amount of trial credit; sustained use is pay-as-you-go per token, which this calculator estimates.
Is Opus worth the premium over Sonnet? Only for tasks where the extra reasoning changes the outcome. Test both on your real workload and compare success rate and total calls, not just per-token price.
Read our hands-on Anthropic Claude API review, or compare Claude against every other provider in the LLM API cost calculator.
Prices are auto-refreshed from a live source and dated. Confirm current pricing on Anthropic's page before deciding.