LLM API Cost Calculator

Enter your usage profile and instantly compare what every major LLM provider would charge — per request and per month. The default scenario below is real, server-rendered output; change any field to recalculate live.

Quick scenarios

Input tokens / request

Output tokens / request

Tip: you can type 4k, 100k or 1.5m.

Requests

per day / per month

Model category

Display currency

Advanced options

Prompt caching

Cached input %

Batch mode discount

Estimated cost — per month

Shown in USD. Pricing last verified 2026-08-01.

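API price = the provider's list price per 1M tokens (input → output). Your cost = estimate for the usage profile you set on the left (4,000 in + 1,000 out per request). The monthly figures shown correspond to 1,000 requests per month. The ×N figure next to a per-request cost is that cost relative to the cheapest model in the list.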

Showing 50 of 250 models.

Tip: click a column header to sort, or filter by category above. Sorted by cost per request by default.

Provider / Model	Context	API price / 1M in → out	Your cost / request	Your cost / month	Links
Mistral Mistral Nemo fast cheapest batch	131K	$0.0190 → $0.0300	$0.000106	$0.1060	Visit
IBM Granite Granite 4.0 Micro open-weight	131K	$0.0170 → $0.1120	$0.000180 ×1.7	$0.1800	Visit Review
OpenAI gpt-oss-20b fast batch	131K	$0.0300 → $0.1400	$0.000260 ×2.5	$0.2600	Visit Review
Amazon Nova Micro 1.0 fast	128K	$0.0350 → $0.1400	$0.000280 ×2.6	$0.2800	Visit
Meta Llama 3.1 8B Instruct open-weight cache	131K	$0.0500 → $0.0800	$0.000280 ×2.6	$0.2800	Visit Review
Mistral Mistral Small 3 fast batch	33K	$0.0500 → $0.0800	$0.000280 ×2.6	$0.2800	Visit
Cohere Command R7B (12-2024) fast	128K	$0.0375 → $0.1500	$0.000300 ×2.8	$0.3000	Visit
Google Gemma 3 4B fast batch	131K	$0.0500 → $0.1000	$0.000300 ×2.8	$0.3000	Visit
IBM Granite Granite 4.1 8B open-weight cache	131K	$0.0500 → $0.1000	$0.000300 ×2.8	$0.3000	Visit Review
Meta Llama 3.2 1B Instruct open-weight	60K	$0.0270 → $0.2010	$0.000309 ×2.9	$0.3090	Visit Review
OpenAI gpt-oss-120b fast batch	131K	$0.0370 → $0.1700	$0.000318 ×3.0	$0.3180	Visit Review
Google Gemma 3 12B fast batch	131K	$0.0500 → $0.1500	$0.000350 ×3.3	$0.3500	Visit
Google Gemma 3n 4B fast batch	33K	$0.0600 → $0.1200	$0.000360 ×3.4	$0.3600	Visit
Qwen Qwen3 30B A3B Instruct 2507 open-weight	262K	$0.0482 → $0.1931	$0.000386 ×3.6	$0.3859	Visit Review
NVIDIA Nemotron 3 Nano 30B A3B open-weight	262K	$0.0500 → $0.2000	$0.000400 ×3.8	$0.4000	Visit Review
Microsoft Phi 4 open-weight	16K	$0.0700 → $0.1400	$0.000420 ×4.0	$0.4200	Visit Review
Tencent Hy3 preview open-weight cache	262K	$0.0630 → $0.2100	$0.000462 ×4.4	$0.4620	Visit Review
Amazon Nova Lite 1.0 fast	300K	$0.0600 → $0.2400	$0.000480 ×4.5	$0.4800	Visit
Mistral Ministral 3 3B 2512 fast cache batch	131K	$0.1000 → $0.1000	$0.000500 ×4.7	$0.5000	Visit
Mistral Mistral Small 3.2 24B fast cache batch	256K	$0.0750 → $0.2000	$0.000500 ×4.7	$0.5000	Visit
Self-hosted Open-weight on your GPU (illustrative) open-weight	128K	$0.1000 → $0.1000	$0.000500 ×4.7	$0.5000	Visit Review
Qwen Qwen3.5-Flash open-weight	1M	$0.0650 → $0.2600	$0.000520 ×4.9	$0.5200	Visit Review
Meta Llama 3.2 3B Instruct open-weight	131K	$0.0500 → $0.3300	$0.000530 ×5.0	$0.5300	Visit Review
Qwen Qwen3.5-9B open-weight	262K	$0.1000 → $0.1500	$0.000550 ×5.2	$0.5500	Visit Review
Qwen Qwen3 Coder 30B A3B Instruct open-weight	262K	$0.0700 → $0.2700	$0.000550 ×5.2	$0.5500	Visit Review
OpenAI gpt-oss-safeguard-20b fast cache batch	131K	$0.0750 → $0.3000	$0.000600 ×5.7	$0.6000	Visit Review
OpenAI GPT-5 Nano fast cache batch	400K	$0.0500 → $0.4000	$0.000600 ×5.7	$0.6000	Visit Review
Qwen Qwen2.5 7B Instruct open-weight	33K	$0.1000 → $0.2000	$0.000600 ×5.7	$0.6000	Visit Review
Qwen Qwen3 32B open-weight	131K	$0.0800 → $0.2800	$0.000600 ×5.7	$0.6000	Visit Review
Google Gemma 4 26B A4B fast cache batch	262K	$0.0700 → $0.3400	$0.000620 ×5.8	$0.6200	Visit
Z.AI GLM 4.7 Flash open-weight cache	203K	$0.0600 → $0.4000	$0.000640 ×6.0	$0.6400	Visit Review
Meta Llama 4 Scout open-weight	1.3M	$0.1000 → $0.3000	$0.000700 ×6.6	$0.7000	Visit Review
Mistral Voxtral Small 24B 2507 fast cache batch	32K	$0.1000 → $0.3000	$0.000700 ×6.6	$0.7000	Visit
Google Gemma 4 31B fast batch	262K	$0.1000 → $0.3400	$0.000740 ×7.0	$0.7400	Visit
NVIDIA Nemotron 3 Super open-weight	1M	$0.0850 → $0.4000	$0.000740 ×7.0	$0.7400	Visit Review
Mistral Ministral 3 8B 2512 fast cache batch	262K	$0.1500 → $0.1500	$0.000750 ×7.1	$0.7500	Visit
Google Gemma 3 27B fast cache batch	262K	$0.0800 → $0.4500	$0.000770 ×7.3	$0.7700	Visit
Google Gemini 2.5 Flash Lite fast cache batch	1M	$0.1000 → $0.4000	$0.000800 ×7.5	$0.8000	Visit
OpenAI GPT-4.1 Nano fast cache batch	1M	$0.1000 → $0.4000	$0.000800 ×7.5	$0.8000	Visit Review
Qwen Qwen3 VL 32B Instruct open-weight	131K	$0.1040 → $0.4160	$0.000832 ×7.8	$0.8320	Visit Review
DeepSeek DeepSeek V4 Flash open-weight cache	1M	$0.1400 → $0.2800	$0.000840 ×7.9	$0.8400	Visit Review
Meta Llama Guard 4 12B open-weight	1M	$0.1800 → $0.1800	$0.000900 ×8.5	$0.9000	Visit Review
Qwen Qwen3 235B A22B Instruct 2507 open-weight	262K	$0.0900 → $0.5500	$0.000910 ×8.6	$0.9100	Visit Review
Meta Llama 3.3 70B Instruct open-weight	131K	$0.1300 → $0.4000	$0.000920 ×8.7	$0.9200	Visit Review
Qwen Qwen3 8B open-weight	131K	$0.1170 → $0.4550	$0.000923 ×8.7	$0.9230	Visit Review
Qwen Qwen3 VL 8B Instruct open-weight	262K	$0.1170 → $0.4550	$0.000923 ×8.7	$0.9230	Visit Review
Qwen Qwen3 30B A3B open-weight	131K	$0.1200 → $0.5000	$0.000980 ×9.2	$0.9800	Visit Review
Mistral Ministral 3 14B 2512 fast cache batch	262K	$0.2000 → $0.2000	$0.001000 ×9.4	$1.00	Visit
OpenAI GPT-5.6 Luna Pro balanced cache batch	1.1M	$0.1000 → $0.6000	$0.001000 ×9.4	$1.00	Visit Review
OpenAI GPT-5.6 Luna balanced cache batch	1.1M	$0.1000 → $0.6000	$0.001000 ×9.4	$1.00	Visit Review

Estimates only. Actual bills depend on exact token counts, tier pricing and provider changes. Always confirm on the provider's pricing page.
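
The ×N figure in the table appears to be each model's per-request cost divided by the cheapest model's cost. The sketch below recomputes it for the first three rows under that assumption; the helper name and the rounding to one decimal place are illustrative choices, not the calculator's actual code.

```python
# List prices per 1M tokens (input, output), copied from the first three table rows.
rows = {
    "Mistral Nemo": (0.0190, 0.0300),
    "Granite 4.0 Micro": (0.0170, 0.1120),
    "gpt-oss-20b": (0.0300, 0.1400),
}

def request_cost(in_price, out_price, in_tokens=4_000, out_tokens=1_000):
    """Cost of one request for the default profile (4,000 in + 1,000 out)."""
    return in_tokens / 1_000_000 * in_price + out_tokens / 1_000_000 * out_price

costs = {name: request_cost(*prices) for name, prices in rows.items()}
cheapest = min(costs.values())

# Assumption: the multiplier is cost / cheapest cost, rounded to one decimal.
multipliers = {name: round(cost / cheapest, 1) for name, cost in costs.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:.6f} x{multipliers[name]}")
```

Under this reading, Granite 4.0 Micro comes out at ×1.7 and gpt-oss-20b at ×2.5, matching the table.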

How LLM API pricing works

Every major LLM provider bills by the token — a chunk of text roughly ¾ of a word in English. You pay separately for input tokens (everything you send: system prompt, retrieved context and the user message) and output tokens (what the model writes back). Output is typically priced two to five times higher than input, which is why concise responses save real money at scale.
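
If you do not have a tokenizer to hand, the ¾-word rule of thumb gives a rough token count from a word count. The sketch below is only an approximation for English text; real tokenizers vary by model, and the function name is hypothetical.

```python
import math

# Assumption: about 0.75 English words per token (the rule of thumb above).
WORDS_PER_TOKEN = 0.75

def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))  # 8 words -> 11 tokens (estimated)
```

For billing-grade numbers, count tokens with the provider's own tokenizer or read the usage figures returned by the API.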

The formula

cost_per_request = (input_tokens  / 1,000,000) × input_price_per_M
                 + (output_tokens / 1,000,000) × output_price_per_M
cost_per_period  = cost_per_request × requests_in_period
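
As a worked example, the formula can be written as two small Python functions and checked against the Mistral Nemo row ($0.0190 → $0.0300 per 1M tokens). The function names mirror the formula; the 1,000 requests per month is an assumption inferred from the table's monthly column.

```python
def cost_per_request(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request, given list prices per 1M tokens."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)

def cost_per_period(per_request, requests_in_period):
    """Scale the per-request cost to a billing period."""
    return per_request * requests_in_period

# Default profile: 4,000 input + 1,000 output tokens per request.
per_request = cost_per_request(4_000, 1_000, 0.0190, 0.0300)
# Assumption: 1,000 requests in the month.
per_month = cost_per_period(per_request, 1_000)

print(f"${per_request:.6f} per request, ${per_month:.4f} per month")
# -> $0.000106 per request, $0.1060 per month
```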

What moves the number

Prompt caching. If you reuse a long prefix (a big system prompt, a fixed knowledge base), many providers bill those cached tokens at a fraction of the normal input price.
Batch mode. Non-interactive workloads submitted in bulk often get a flat discount (commonly 50% off, i.e. a 0.5× price multiplier) in exchange for slower, asynchronous delivery.
Context tiers. Some providers raise the per-token price once a request exceeds a context threshold. The dataset supports tiered pricing for exactly this case.
Self-hosting. Open-weight models on your own GPUs change the equation entirely: you trade per-token fees for fixed infrastructure cost and operational work. See our self-hosted cost breakdown.
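
The first three factors above can be folded into the base formula, as in the minimal sketch below. The specific numbers are assumptions for illustration: a 0.1× price for cached input, a 0.5× batch multiplier, and a context tier that scales the whole request once input exceeds a threshold. Whether these discounts stack, and at what rates, differs by provider.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pricing:
    input_per_m: float                     # list price per 1M input tokens
    output_per_m: float                    # list price per 1M output tokens
    cached_input_multiplier: float = 0.1   # assumption: cached input billed at 10%
    batch_multiplier: float = 0.5          # assumption: batch mode halves the bill
    tier_threshold: Optional[int] = None   # assumption: input size that triggers a tier
    tier_multiplier: float = 1.0           # assumption: price factor above the threshold

def adjusted_cost(input_tokens, output_tokens, pricing, cached_share=0.0, batch=False):
    """Per-request cost with caching, context tier and batch adjustments applied."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    # Cached tokens count at a fraction of the normal input price.
    billable_input = fresh + cached * pricing.cached_input_multiplier
    total = (billable_input / 1_000_000 * pricing.input_per_m
             + output_tokens / 1_000_000 * pricing.output_per_m)
    if pricing.tier_threshold is not None and input_tokens > pricing.tier_threshold:
        total *= pricing.tier_multiplier
    if batch:
        total *= pricing.batch_multiplier
    return total

# GPT-5 Nano list prices from the table: $0.05 in, $0.40 out per 1M tokens.
nano = Pricing(input_per_m=0.05, output_per_m=0.40)
base = adjusted_cost(4_000, 1_000, nano)
with_cache = adjusted_cost(4_000, 1_000, nano, cached_share=0.5)
with_cache_and_batch = adjusted_cost(4_000, 1_000, nano, cached_share=0.5, batch=True)
print(f"{base:.6f} {with_cache:.6f} {with_cache_and_batch:.6f}")
```

In this example, caching half of the input trims the request from $0.000600 to $0.000510, a modest saving because output tokens dominate the bill. Caching pays off most when input is large relative to output.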

Frequently asked questions

How are LLM API costs calculated?

Providers bill per million tokens, separately for input (your prompt) and output (the model's response). Cost per request = (input tokens / 1,000,000 × input price) + (output tokens / 1,000,000 × output price). Multiply by your request volume for the period.

What is the difference between input and output tokens?

Input tokens are everything you send to the model — system prompt, context and user message. Output tokens are what the model generates. Output is usually priced several times higher than input, so response length matters a lot for cost.

Does prompt caching reduce cost?

Yes, where supported. Repeated prompt prefixes (e.g. a long system prompt) can be billed at a steep discount. The calculator lets you set what share of your input tokens is served from cache.

Why do prices vary so much between providers?

Model size, hardware efficiency, context window, and business strategy all play a part. Frontier reasoning models cost the most; fast/cheap and open-weight self-hosted options can be orders of magnitude cheaper for suitable tasks.