LLM API Cost Calculator
Enter your usage profile and instantly compare what every major LLM provider would charge — per request and per month. The default scenario below is real, server-rendered output; change any field to recalculate live.
Estimated cost — per month
Shown in USD. Pricing last verified 2026-06-17.
API price = the provider's list price per 1M tokens (input → output). Your cost = estimate for the usage profile you set on the left (4,000 in + 1,000 out per request).
Tip: click a column header to sort, or filter by category above. Sorted by cost per request by default.
| Provider / Model | Context | API price / 1M in → out | Your cost / request | Your cost / month | Links |
|---|---|---|---|---|---|
| Meta Llama 3.1 8B Instruct open-weight cheapest | 131K | $0.0200 → $0.0300 | $0.000110 | $0.1100 | Visit Review |
| Mistral Mistral Nemo fast | 131K | $0.0200 → $0.0300 | $0.000110 | $0.1100 | Visit |
| IBM Granite Granite 4.0 Micro open-weight | 131K | $0.0170 → $0.1120 | $0.000180 | $0.1800 | Visit Review |
| Liquid LFM2-24B-A2B open-weight | 128K | $0.0300 → $0.1200 | $0.000240 | $0.2400 | Visit Review |
| OpenAI gpt-oss-20b fast | 131K | $0.0290 → $0.1400 | $0.000256 | $0.2560 | Visit Review |
| Qwen Qwen2.5 7B Instruct open-weight | 131K | $0.0400 → $0.1000 | $0.000260 | $0.2600 | Visit Review |
| Amazon Nova Micro 1.0 fast | 128K | $0.0350 → $0.1400 | $0.000280 | $0.2800 | Visit |
| Mistral Mistral Small 3 fast | 33K | $0.0500 → $0.0800 | $0.000280 | $0.2800 | Visit |
| Cohere Command R7B (12-2024) fast | 128K | $0.0375 → $0.1500 | $0.000300 | $0.3000 | Visit |
| Google Gemma 3 4B fast | 131K | $0.0500 → $0.1000 | $0.000300 | $0.3000 | Visit |
| IBM Granite Granite 4.1 8B open-weight | 131K | $0.0500 → $0.1000 | $0.000300 | $0.3000 | Visit Review |
| Meta Llama 3.2 1B Instruct open-weight | 131K | $0.0270 → $0.2010 | $0.000309 | $0.3090 | Visit Review |
| OpenAI gpt-oss-120b fast | 131K | $0.0390 → $0.1800 | $0.000336 | $0.3360 | Visit Review |
| Google Gemma 3 12B fast | 131K | $0.0500 → $0.1500 | $0.000350 | $0.3500 | Visit |
| Google Gemma 3n 4B fast | 33K | $0.0600 → $0.1200 | $0.000360 | $0.3600 | Visit |
| Qwen Qwen3 30B A3B Instruct 2507 open-weight | 131K | $0.0482 → $0.1931 | $0.000386 | $0.3859 | Visit Review |
| NVIDIA Nemotron 3 Nano 30B A3B open-weight | 262K | $0.0500 → $0.2000 | $0.000400 | $0.4000 | Visit Review |
| Microsoft Phi 4 open-weight | 16K | $0.0650 → $0.1400 | $0.000400 | $0.4000 | Visit Review |
| Qwen Qwen3 235B A22B Instruct 2507 open-weight | 262K | $0.0900 → $0.1000 | $0.000460 | $0.4600 | Visit Review |
| Amazon Nova Lite 1.0 fast | 300K | $0.0600 → $0.2400 | $0.000480 | $0.4800 | Visit |
| Google Gemma 3 27B fast | 131K | $0.0800 → $0.1600 | $0.000480 | $0.4800 | Visit |
| Mistral Ministral 3 3B 2512 fast | 131K | $0.1000 → $0.1000 | $0.000500 | $0.5000 | Visit |
| Mistral Mistral Small 3.2 24B fast | 128K | $0.0750 → $0.2000 | $0.000500 | $0.5000 | Visit |
| Qwen Qwen3 235B A22B Thinking 2507 open-weight | 262K | $0.1000 → $0.1000 | $0.000500 | $0.5000 | Visit Review |
| Self-hosted Open-weight on your GPU (illustrative) open-weight | 128K | $0.1000 → $0.1000 | $0.000500 | $0.5000 | Visit Review |
| Qwen Qwen3.5-Flash open-weight | 1M | $0.0650 → $0.2600 | $0.000520 | $0.5200 | Visit Review |
| Tencent Hy3 preview open-weight | 262K | $0.0660 → $0.2600 | $0.000524 | $0.5240 | Visit Review |
| Meta Llama 3.2 3B Instruct open-weight | 131K | $0.0509 → $0.3350 | $0.000539 | $0.5386 | Visit Review |
| DeepSeek DeepSeek V4 Flash open-weight | 1M | $0.0900 → $0.1800 | $0.000540 | $0.5400 | Visit Review |
| Qwen Qwen3.5-9B open-weight | 262K | $0.1000 → $0.1500 | $0.000550 | $0.5500 | Visit Review |
| Qwen Qwen3 Coder 30B A3B Instruct open-weight | 160K | $0.0700 → $0.2700 | $0.000550 | $0.5500 | Visit Review |
| Google Gemma 4 26B A4B fast | 262K | $0.0600 → $0.3300 | $0.000570 | $0.5700 | Visit |
| OpenAI gpt-oss-safeguard-20b fast | 131K | $0.0750 → $0.3000 | $0.000600 | $0.6000 | Visit Review |
| OpenAI GPT-5 Nano fast | 400K | $0.0500 → $0.4000 | $0.000600 | $0.6000 | Visit Review |
| Qwen Qwen3 32B open-weight | 131K | $0.0800 → $0.2800 | $0.000600 | $0.6000 | Visit Review |
| Qwen Qwen3 8B open-weight | 131K | $0.0500 → $0.4000 | $0.000600 | $0.6000 | Visit Review |
| Qwen Qwen3 14B open-weight | 132K | $0.1000 → $0.2400 | $0.000640 | $0.6400 | Visit Review |
| Z.AI GLM 4.7 Flash open-weight | 203K | $0.0600 → $0.4000 | $0.000640 | $0.6400 | Visit Review |
| Microsoft Phi 4 Mini Instruct open-weight | 131K | $0.0800 → $0.3500 | $0.000670 | $0.6700 | Visit Review |
| Meta Llama 4 Scout open-weight | 10M | $0.1000 → $0.3000 | $0.000700 | $0.7000 | Visit Review |
| Mistral Voxtral Small 24B 2507 fast | 32K | $0.1000 → $0.3000 | $0.000700 | $0.7000 | Visit |
| Meta Llama 3 8B Instruct open-weight | 8K | $0.1400 → $0.1400 | $0.000700 | $0.7000 | Visit Review |
| Meta Llama 3.3 70B Instruct open-weight | 131K | $0.1000 → $0.3200 | $0.000720 | $0.7200 | Visit Review |
| Qwen Qwen3 30B A3B Thinking 2507 open-weight | 131K | $0.0800 → $0.4000 | $0.000720 | $0.7200 | Visit Review |
| Mistral Ministral 3 8B 2512 fast | 262K | $0.1500 → $0.1500 | $0.000750 | $0.7500 | Visit |
| Google Gemini 2.5 Flash Lite Preview 09-2025 fast | 1M | $0.1000 → $0.4000 | $0.000800 | $0.8000 | Visit |
| Google Gemini 2.5 Flash Lite fast | 1M | $0.1000 → $0.4000 | $0.000800 | $0.8000 | Visit |
| OpenAI GPT-4.1 Nano fast | 1M | $0.1000 → $0.4000 | $0.000800 | $0.8000 | Visit Review |
| NVIDIA Nemotron 3 Super open-weight | 1M | $0.0900 → $0.4500 | $0.000810 | $0.8100 | Visit Review |
| Qwen Qwen3 VL 8B Instruct open-weight | 256K | $0.0800 → $0.5000 | $0.000820 | $0.8200 | Visit Review |
Estimates only. Actual bills depend on exact token counts, tier pricing and provider changes. Always confirm on the provider's pricing page.
How LLM API pricing works
Every major LLM provider bills by the token — a chunk of text roughly ¾ of a word in English. You pay separately for input tokens (everything you send: system prompt, retrieved context and the user message) and output tokens (what the model writes back). Output is typically priced two to five times higher than input, which is why concise responses save real money at scale.
The formula
cost_per_request = (input_tokens / 1,000,000) × input_price_per_M
+ (output_tokens / 1,000,000) × output_price_per_M
cost_per_period = cost_per_request × requests_in_period
What moves the number
- Prompt caching. If you reuse a long prefix (a big system prompt, a fixed knowledge base), many providers bill those cached tokens at a fraction of the normal input price.
- Batch mode. Non-interactive workloads submitted in bulk often get a flat discount (commonly 0.5×) in exchange for slower, asynchronous delivery.
- Context tiers. Some providers raise the per-token price once a request exceeds a context threshold. The dataset supports tiered pricing for exactly this case.
- Self-hosting. Open-weight models on your own GPUs change the equation entirely: you trade per-token fees for fixed infrastructure cost and operational work. See our self-hosted cost breakdown.
Frequently asked questions
How are LLM API costs calculated?
Providers bill per million tokens, separately for input (your prompt) and output (the model's response). Cost per request = (input tokens / 1,000,000 × input price) + (output tokens / 1,000,000 × output price). Multiply by your request volume for the period.
What is the difference between input and output tokens?
Input tokens are everything you send to the model — system prompt, context and user message. Output tokens are what the model generates. Output is usually priced several times higher than input, so response length matters a lot for cost.
Does prompt caching reduce cost?
Yes, where supported. Repeated prompt prefixes (e.g. a long system prompt) can be billed at a steep discount. The calculator lets you set what share of input tokens are cached.
Why do prices vary so much between providers?
Model size, hardware efficiency, context window, and business strategy all play a part. Frontier reasoning models cost the most; fast/cheap and open-weight self-hosted options can be orders of magnitude cheaper for suitable tasks.