LLM API Cost Calculator

Enter your usage profile and instantly compare what every major LLM provider would charge — per request and per month. The default scenario below is real, server-rendered output; change any field to recalculate live.


Estimated cost — per month

Shown in USD. Pricing last verified 2026-06-17.

API price = the provider's list price per 1M tokens (input → output). Your cost = estimate for the selected usage profile: 4,000 input + 1,000 output tokens per request, at 1,000 requests per month.

Showing 50 of 247 models.


Rows are sorted by cost per request, cheapest first. Tags mark each model's category (open-weight, fast, cheapest).

Provider | Model | Tags | Context | Input price / 1M | Output price / 1M | Your cost / request | Your cost / month
Meta | Llama 3.1 8B Instruct | open-weight, cheapest | 131K | $0.0200 | $0.0300 | $0.000110 | $0.1100
Mistral | Mistral Nemo | fast | 131K | $0.0200 | $0.0300 | $0.000110 | $0.1100
IBM Granite | Granite 4.0 Micro | open-weight | 131K | $0.0170 | $0.1120 | $0.000180 | $0.1800
Liquid | LFM2-24B-A2B | open-weight | 128K | $0.0300 | $0.1200 | $0.000240 | $0.2400
OpenAI | gpt-oss-20b | fast | 131K | $0.0290 | $0.1400 | $0.000256 | $0.2560
Qwen | Qwen2.5 7B Instruct | open-weight | 131K | $0.0400 | $0.1000 | $0.000260 | $0.2600
Amazon | Nova Micro 1.0 | fast | 128K | $0.0350 | $0.1400 | $0.000280 | $0.2800
Mistral | Mistral Small 3 | fast | 33K | $0.0500 | $0.0800 | $0.000280 | $0.2800
Cohere | Command R7B (12-2024) | fast | 128K | $0.0375 | $0.1500 | $0.000300 | $0.3000
Google | Gemma 3 4B | fast | 131K | $0.0500 | $0.1000 | $0.000300 | $0.3000
IBM Granite | Granite 4.1 8B | open-weight | 131K | $0.0500 | $0.1000 | $0.000300 | $0.3000
Meta | Llama 3.2 1B Instruct | open-weight | 131K | $0.0270 | $0.2010 | $0.000309 | $0.3090
OpenAI | gpt-oss-120b | fast | 131K | $0.0390 | $0.1800 | $0.000336 | $0.3360
Google | Gemma 3 12B | fast | 131K | $0.0500 | $0.1500 | $0.000350 | $0.3500
Google | Gemma 3n 4B | fast | 33K | $0.0600 | $0.1200 | $0.000360 | $0.3600
Qwen | Qwen3 30B A3B Instruct 2507 | open-weight | 131K | $0.0482 | $0.1931 | $0.000386 | $0.3859
NVIDIA | Nemotron 3 Nano 30B A3B | open-weight | 262K | $0.0500 | $0.2000 | $0.000400 | $0.4000
Microsoft | Phi 4 | open-weight | 16K | $0.0650 | $0.1400 | $0.000400 | $0.4000
Qwen | Qwen3 235B A22B Instruct 2507 | open-weight | 262K | $0.0900 | $0.1000 | $0.000460 | $0.4600
Amazon | Nova Lite 1.0 | fast | 300K | $0.0600 | $0.2400 | $0.000480 | $0.4800
Google | Gemma 3 27B | fast | 131K | $0.0800 | $0.1600 | $0.000480 | $0.4800
Mistral | Ministral 3 3B 2512 | fast | 131K | $0.1000 | $0.1000 | $0.000500 | $0.5000
Mistral | Mistral Small 3.2 24B | fast | 128K | $0.0750 | $0.2000 | $0.000500 | $0.5000
Qwen | Qwen3 235B A22B Thinking 2507 | open-weight | 262K | $0.1000 | $0.1000 | $0.000500 | $0.5000
Self-hosted | Open-weight on your GPU (illustrative) | open-weight | 128K | $0.1000 | $0.1000 | $0.000500 | $0.5000
Qwen | Qwen3.5-Flash | open-weight | 1M | $0.0650 | $0.2600 | $0.000520 | $0.5200
Tencent | Hy3 preview | open-weight | 262K | $0.0660 | $0.2600 | $0.000524 | $0.5240
Meta | Llama 3.2 3B Instruct | open-weight | 131K | $0.0509 | $0.3350 | $0.000539 | $0.5386
DeepSeek | DeepSeek V4 Flash | open-weight | 1M | $0.0900 | $0.1800 | $0.000540 | $0.5400
Qwen | Qwen3.5-9B | open-weight | 262K | $0.1000 | $0.1500 | $0.000550 | $0.5500
Qwen | Qwen3 Coder 30B A3B Instruct | open-weight | 160K | $0.0700 | $0.2700 | $0.000550 | $0.5500
Google | Gemma 4 26B A4B | fast | 262K | $0.0600 | $0.3300 | $0.000570 | $0.5700
OpenAI | gpt-oss-safeguard-20b | fast | 131K | $0.0750 | $0.3000 | $0.000600 | $0.6000
OpenAI | GPT-5 Nano | fast | 400K | $0.0500 | $0.4000 | $0.000600 | $0.6000
Qwen | Qwen3 32B | open-weight | 131K | $0.0800 | $0.2800 | $0.000600 | $0.6000
Qwen | Qwen3 8B | open-weight | 131K | $0.0500 | $0.4000 | $0.000600 | $0.6000
Qwen | Qwen3 14B | open-weight | 132K | $0.1000 | $0.2400 | $0.000640 | $0.6400
Z.AI | GLM 4.7 Flash | open-weight | 203K | $0.0600 | $0.4000 | $0.000640 | $0.6400
Microsoft | Phi 4 Mini Instruct | open-weight | 131K | $0.0800 | $0.3500 | $0.000670 | $0.6700
Meta | Llama 4 Scout | open-weight | 10M | $0.1000 | $0.3000 | $0.000700 | $0.7000
Mistral | Voxtral Small 24B 2507 | fast | 32K | $0.1000 | $0.3000 | $0.000700 | $0.7000
Meta | Llama 3 8B Instruct | open-weight | 8K | $0.1400 | $0.1400 | $0.000700 | $0.7000
Meta | Llama 3.3 70B Instruct | open-weight | 131K | $0.1000 | $0.3200 | $0.000720 | $0.7200
Qwen | Qwen3 30B A3B Thinking 2507 | open-weight | 131K | $0.0800 | $0.4000 | $0.000720 | $0.7200
Mistral | Ministral 3 8B 2512 | fast | 262K | $0.1500 | $0.1500 | $0.000750 | $0.7500
Google | Gemini 2.5 Flash Lite Preview 09-2025 | fast | 1M | $0.1000 | $0.4000 | $0.000800 | $0.8000
Google | Gemini 2.5 Flash Lite | fast | 1M | $0.1000 | $0.4000 | $0.000800 | $0.8000
OpenAI | GPT-4.1 Nano | fast | 1M | $0.1000 | $0.4000 | $0.000800 | $0.8000
NVIDIA | Nemotron 3 Super | open-weight | 1M | $0.0900 | $0.4500 | $0.000810 | $0.8100
Qwen | Qwen3 VL 8B Instruct | open-weight | 256K | $0.0800 | $0.5000 | $0.000820 | $0.8200

Estimates only. Actual bills depend on exact token counts, tier pricing and provider changes. Always confirm on the provider's pricing page.

How LLM API pricing works

Every major LLM provider bills by the token, a chunk of text that averages roughly ¾ of a word in English. You pay separately for input tokens (everything you send: system prompt, retrieved context and the user message) and output tokens (what the model writes back). Output is usually priced higher than input, often several times higher: among the models listed above, the output-to-input price ratio runs from about 1× to 8×. That is why concise responses save real money at scale.
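The ¾-of-a-word rule gives a quick way to turn a draft prompt into a token estimate before pricing it. A rough Python sketch, assuming whitespace-separated English text; `estimate_tokens` is a hypothetical helper, and real counts come from each provider's own tokenizer, so treat the result as a ballpark figure only.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Ballpark token count for English text using the ~3/4-word-per-token rule.

    Splits on whitespace only, so punctuation, code and non-English text
    will be miscounted; use the provider's tokenizer when exact counts matter.
    """
    words = len(text.split())
    # tokens ~= words / 0.75, rounded to a whole token
    return round(words / words_per_token)


prompt = "Summarize the attached report in three bullet points for an executive audience."
print(estimate_tokens(prompt))  # 12 words -> 16 tokens
```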

The formula

cost_per_request = (input_tokens  / 1,000,000) × input_price_per_M
                 + (output_tokens / 1,000,000) × output_price_per_M
cost_per_period  = cost_per_request × requests_in_period
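The formula translates directly into code. A minimal Python sketch: the function names mirror the pseudocode above, and the example reproduces the Meta Llama 3.1 8B Instruct row of the table. The figure of 1,000 requests per month is an assumption inferred from the table, where every monthly cost is exactly 1,000 times the per-request cost.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """USD cost of one request, given list prices per 1M tokens."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)


def cost_per_period(cost_per_req: float, requests_in_period: int) -> float:
    """USD cost over a period such as a month."""
    return cost_per_req * requests_in_period


# Default profile: 4,000 input + 1,000 output tokens per request.
# Meta Llama 3.1 8B Instruct: $0.02 in / $0.03 out per 1M tokens (from the table).
per_request = cost_per_request(4_000, 1_000, 0.02, 0.03)
# ASSUMPTION: 1,000 requests per month, inferred from the table's monthly column.
per_month = cost_per_period(per_request, 1_000)
print(f"${per_request:.6f} per request, ${per_month:.4f} per month")
```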

What moves the number

Frequently asked questions

How are LLM API costs calculated?

Providers bill per million tokens, separately for input (your prompt) and output (the model's response). Cost per request = (input tokens / 1,000,000 × input price) + (output tokens / 1,000,000 × output price). Multiply by your request volume for the period.

What is the difference between input and output tokens?

Input tokens are everything you send to the model — system prompt, context and user message. Output tokens are what the model generates. Output is usually priced several times higher than input, so response length matters a lot for cost.
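To see why response length matters, split one request's cost into its input and output parts. A short Python sketch using the GPT-5 Nano list prices from the table ($0.05 in, $0.40 out per 1M tokens) and the default 4,000 + 1,000 token profile; `cost_split` is a hypothetical helper. Output is only a fifth of the tokens here but about two-thirds of the cost, so trimming it pays off directly.

```python
def cost_split(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost, output_cost


# GPT-5 Nano list prices from the table: $0.05 in / $0.40 out per 1M tokens.
inp, out = cost_split(4_000, 1_000, 0.05, 0.40)
share = out / (inp + out)
print(f"Output is {share:.0%} of the per-request cost")

# Same prompt, but the response is trimmed from 1,000 to 500 tokens.
inp_short, out_short = cost_split(4_000, 500, 0.05, 0.40)
saving = (inp + out) - (inp_short + out_short)
print(f"Saving per request: ${saving:.6f}")
```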

Does prompt caching reduce cost?

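Yes, where supported. Providers that offer prompt caching bill input tokens served from the cache (typically a repeated prefix such as a long system prompt) at a steep discount to the normal input price; the size of the discount and any extra charge for writing the cache vary by provider. The calculator lets you set what share of input tokens is cached.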
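One way to fold caching into the formula is to bill the cached share of input tokens at a reduced rate. A minimal Python sketch, assuming a single flat multiplier for cached tokens: `cost_with_caching` and its `cached_price_multiplier` parameter are hypothetical, and the prices in the example ($1.00 in, $4.00 out per 1M tokens, cached tokens at 10% of the input price) are illustrative, not any provider's actual rates. It ignores cache-write premiums and minimum prefix lengths, which some providers apply.

```python
def cost_with_caching(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float,
                      cached_share: float, cached_price_multiplier: float) -> float:
    """Per-request USD cost when a share of input tokens is read from a prompt cache.

    cached_share: fraction of input tokens billed at the cached rate (0 to 1).
    cached_price_multiplier: HYPOTHETICAL flat ratio of the cached-token price
        to the normal input price (0.1 means cached tokens cost 10%).
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached_tokens = input_tokens * cached_share
    uncached_tokens = input_tokens - cached_tokens
    # Cached tokens are billed at the discounted rate, the rest at list price.
    billed_input = uncached_tokens + cached_tokens * cached_price_multiplier
    input_cost = billed_input / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Illustrative prices only: $1.00 in / $4.00 out per 1M tokens.
no_cache = cost_with_caching(4_000, 1_000, 1.00, 4.00, 0.0, 0.1)
with_cache = cost_with_caching(4_000, 1_000, 1.00, 4.00, 0.75, 0.1)
print(f"${no_cache:.4f} without caching, ${with_cache:.4f} with 75% of input cached")
```

With a cached share of 0 the function reduces to the plain per-request formula, which makes it easy to compare the two cases side by side.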

Why do prices vary so much between providers?

Model size, hardware efficiency, context window, and business strategy all play a part. Frontier reasoning models cost the most; small fast models and self-hosted open-weight options can be orders of magnitude cheaper for suitable tasks.