API Pricing

OpenAI API Pricing & Cost Calculator (2026)

Up-to-date OpenAI API pricing per million tokens for the GPT-5 family, with a live cost calculator. Input, output, cached and batch pricing, worked examples and how to cut your bill.

Updated 2026-06-17 · prices auto-refreshed

Compare OpenAI in the calculator → Visit OpenAI

OpenAI's API is the broadest, most widely-integrated LLM platform on the market, and for most teams the real question isn't whether it can do the job but which model in the GPT-5 ladder is the right cost/quality point — and how to keep the bill predictable as you scale. This page explains how OpenAI prices its API, walks through a worked example, and lists the levers that actually move the number. The prices in the table above are pulled from a live feed and refreshed automatically; plug your own token volume into the cost calculator to see your real monthly spend.

How OpenAI prices its API

Like every major provider, OpenAI bills per token — a chunk of text roughly ¾ of a word in English — and charges separately for input (everything you send: system prompt, retrieved context, the user message) and output (what the model generates). Prices are quoted per million tokens. Output is consistently the more expensive half, typically several times the input rate, which is why response length is such a large cost driver.

Three structural factors shape an OpenAI bill:

Model tier. The GPT-5 family spans from the flagship down to mini and nano variants that cost a small fraction as much, plus pro reasoning variants at the very top that cost many times the flagship. The table above shows the live spread.
Cached input. Repeated prompt prefixes — a long fixed system prompt, a stable instruction block — are billed at a steep discount, typically around one-tenth of the normal input rate. You only pay the discount on the repeated portion; the dynamic part of each prompt is full price.
Batch tier. Non-interactive workloads submitted through the Batch API receive roughly a 50% discount in exchange for asynchronous delivery.

The model ladder, and which to use

Treating the lineup as a ladder is the single most important cost habit:

nano / mini — classification, routing, extraction, short rewrites, simple chat. Often good enough for the majority of production traffic, at a tiny fraction of the flagship price.
Flagship GPT-5 — the default for general reasoning, coding and anything user-facing where quality matters.
pro reasoning — reserve for genuinely hard problems (complex multi-step reasoning, difficult debugging). At several times the flagship price, sending everyday traffic here is the most common way teams overspend.

The biggest mistake is routing everything to the top model out of habit. A cheap classifier — or even a heuristic on input length and task type — that picks the right rung per request routinely halves spend on mixed workloads.

A worked example

Take a customer-support assistant: 2,000 input tokens (system prompt + retrieved article + user question) and 400 output tokens per reply, 100,000 replies a month.

cost_per_reply = (2,000 / 1,000,000 × input_price)
               + (400  / 1,000,000 × output_price)
monthly_cost   = cost_per_reply × 100,000

Run those numbers against the flagship and against mini in the calculator and the gap is usually an order of magnitude. Now add prompt caching: if 1,500 of those input tokens are a fixed system prompt, caching bills them at ~10% — the input half of the bill nearly disappears. This is why "which model + caching" matters far more than the headline per-token price.

How to cut your OpenAI bill

Cap output length. Output is the expensive half; set sensible max_tokens and prompt for concise answers.
Route by difficulty. A small model picking the rung saves more than any negotiation.
Cache the stable prefix. Long system prompts and fixed context are ideal candidates.
Batch the offline work. Evals, backfills and bulk classification belong in the discounted tier.
Trim retrieved context. RAG pipelines often stuff far more context than the model needs; tighter retrieval cuts input tokens directly.
Shorten chat history. The full transcript is re-sent every turn, so cost grows with conversation length — summarise or window old turns.

OpenAI vs the alternatives

OpenAI is the low-risk default thanks to ecosystem maturity: the SDKs, tool/function calling, structured outputs and assistant tooling are the most polished, and almost every third-party library targets it first. If "everything already integrates with it" matters, it's the safe pick.

On raw price, though, it is rarely the cheapest. For high-volume simple work, Google's Gemini Flash-Lite and DeepSeek often undercut it substantially. For agentic coding, it's worth A/B testing against Anthropic's Claude, because per-token price tells you little about per-task cost when first-try accuracy differs — a model that solves the task once beats a cheaper one you call three times.

Privacy and data handling

Standard API terms exclude inputs and outputs from training, and enterprise agreements add further controls. As with any managed API, though, "data never leaves our infrastructure" is not on the menu — for that requirement, compare against self-hosted open-weight inference on total cost of ownership.

Frequently asked questions

Is the OpenAI API cheaper than ChatGPT Plus? They're different products. ChatGPT is a flat monthly subscription for the chat app; the API is usage-based and billed per token. For programmatic use you want the API, and this page's calculator estimates that usage cost.

What's the difference between mini and the flagship? mini (and nano) are smaller, faster and far cheaper, tuned for simpler tasks. The flagship is stronger on hard reasoning. Route easy traffic to the small models and reserve the flagship for where quality matters.

How much does prompt caching save? It bills the repeated prefix at roughly one-tenth of the input rate. The bigger and more stable your prefix (long system prompts, fixed context), the larger the saving — model a realistic cached percentage in the calculator.

Do these prices include taxes? No. Prices are list prices excluding any applicable tax, and enterprise discounts may apply at volume.

Read our hands-on OpenAI API review for the verdict, or compare every OpenAI model against rivals in the LLM API cost calculator.

Prices are auto-refreshed from a live source and dated for transparency. Always confirm current pricing on OpenAI's own page before committing.

Model	Context	API price / 1M in → out	Sample cost / request
gpt-oss-20b fast	131K	$0.0290 → $0.1400	$0.000256
gpt-oss-120b fast	131K	$0.0390 → $0.1800	$0.000336
gpt-oss-safeguard-20b fast	131K	$0.0750 → $0.3000	$0.000600
GPT-5 Nano fast	400K	$0.0500 → $0.4000	$0.000600
GPT-4.1 Nano fast	1M	$0.1000 → $0.4000	$0.000800
GPT-4o-mini Search Preview fast	128K	$0.1500 → $0.6000	$0.001200
GPT-4o-mini (2024-07-18) fast	128K	$0.1500 → $0.6000	$0.001200
GPT-4o-mini fast	128K	$0.1500 → $0.6000	$0.001200
GPT-5.4 Nano fast	400K	$0.2000 → $1.25	$0.002050
GPT-5.1-Codex-Mini fast	400K	$0.2500 → $2.00	$0.003000
GPT-5 Mini fast	400K	$0.2500 → $2.00	$0.003000
GPT-4.1 Mini fast	1M	$0.4000 → $1.60	$0.003200
GPT-3.5 Turbo fast	16K	$0.5000 → $1.50	$0.003500
GPT Audio Mini fast	128K	$0.6000 → $2.40	$0.004800
GPT-3.5 Turbo (older v0613) fast	4K	$1.00 → $2.00	$0.006000
GPT-5.4 Mini fast	400K	$0.7500 → $4.50	$0.007500
GPT-3.5 Turbo Instruct fast	4K	$1.50 → $2.00	$0.008000
o4 Mini High fast	200K	$1.10 → $4.40	$0.008800
o4 Mini fast	200K	$1.10 → $4.40	$0.008800
o3 Mini High fast	200K	$1.10 → $4.40	$0.008800
o3 Mini fast	200K	$1.10 → $4.40	$0.008800
GPT-5 Image Mini fast	400K	$2.50 → $2.00	$0.0120
GPT-5.1-Codex-Max balanced	400K	$1.25 → $10.00	$0.0150
GPT-5.1 balanced	400K	$1.25 → $10.00	$0.0150
GPT-5.1 Chat balanced	128K	$1.25 → $10.00	$0.0150
GPT-5.1-Codex balanced	400K	$1.25 → $10.00	$0.0150
GPT-5 Codex balanced	400K	$1.25 → $10.00	$0.0150
GPT-5 Chat balanced	128K	$1.25 → $10.00	$0.0150
GPT-5 balanced	400K	$1.25 → $10.00	$0.0150
GPT-3.5 Turbo 16k fast	16K	$3.00 → $4.00	$0.0160
o4 Mini Deep Research balanced	200K	$2.00 → $8.00	$0.0160
o3 balanced	200K	$2.00 → $8.00	$0.0160
GPT-4.1 balanced	1M	$2.00 → $8.00	$0.0160
GPT Audio balanced	128K	$2.50 → $10.00	$0.0200
GPT-4o Search Preview balanced	128K	$2.50 → $10.00	$0.0200
GPT-4o (2024-11-20) balanced	128K	$2.50 → $10.00	$0.0200
GPT-4o (2024-08-06) balanced	128K	$2.50 → $10.00	$0.0200
GPT-4o balanced	128K	$2.50 → $10.00	$0.0200
GPT-5.3 Chat balanced	128K	$1.75 → $14.00	$0.0210
GPT-5.3-Codex balanced	400K	$1.75 → $14.00	$0.0210
GPT-5.2-Codex balanced	400K	$1.75 → $14.00	$0.0210
GPT-5.2 Chat balanced	128K	$1.75 → $14.00	$0.0210
GPT-5.2 balanced	400K	$1.75 → $14.00	$0.0210
GPT-5.4 frontier	1.1M	$2.50 → $15.00	$0.0250
GPT-4o (2024-05-13) frontier	128K	$5.00 → $15.00	$0.0350
GPT-5.4 Image 2 frontier	272K	$8.00 → $15.00	$0.0470
GPT-5 Image balanced	400K	$10.00 → $10.00	$0.0500
GPT Chat Latest frontier	400K	$5.00 → $30.00	$0.0500
GPT-5.5 frontier	1.1M	$5.00 → $30.00	$0.0500
GPT-4 Turbo frontier	128K	$10.00 → $30.00	$0.0700
GPT-4 Turbo Preview frontier	128K	$10.00 → $30.00	$0.0700
o3 Deep Research frontier	200K	$10.00 → $40.00	$0.0800
o1 frontier	200K	$15.00 → $60.00	$0.1200
o3 Pro frontier	200K	$20.00 → $80.00	$0.1600
GPT-4 frontier	8K	$30.00 → $60.00	$0.1800
GPT-5 Pro frontier	400K	$15.00 → $120.00	$0.1800
GPT-5.2 Pro frontier	400K	$21.00 → $168.00	$0.2520
GPT-5.5 Pro frontier	1.1M	$30.00 → $180.00	$0.3000
GPT-5.4 Pro frontier	1.1M	$30.00 → $180.00	$0.3000
o1-pro frontier	200K	$150.00 → $600.00	$1.20

OpenAI API Pricing & Cost Calculator (2026)

How OpenAI prices its API

The model ladder, and which to use

A worked example

How to cut your OpenAI bill

OpenAI vs the alternatives

Privacy and data handling

Frequently asked questions

OpenAI models & current pricing

Read our OpenAI review

Anthropic pricing

Google pricing

LLM API Cost Calculator