OpenAI API Pricing & Cost Calculator (2026)
Up-to-date OpenAI API pricing per million tokens for the GPT-5 family, with a live cost calculator. Input, output, cached and batch pricing, worked examples and how to cut your bill.
OpenAI's API is the broadest, most widely integrated LLM platform on the market. For most teams the real question isn't whether it can do the job but which model in the GPT-5 ladder is the right cost/quality point, and how to keep the bill predictable as you scale. This page explains how OpenAI prices its API, walks through a worked example, and lists the levers that actually move the number. The prices in the table above are pulled from a live feed and refreshed automatically; plug your own token volume into the cost calculator to see your real monthly spend.
How OpenAI prices its API
Like every major provider, OpenAI bills per token — a chunk of text roughly ¾ of a word in English — and charges separately for input (everything you send: system prompt, retrieved context, the user message) and output (what the model generates). Prices are quoted per million tokens. Output is consistently the more expensive half, typically several times the input rate, which is why response length is such a large cost driver.
Three structural factors shape an OpenAI bill:
- Model tier. The GPT-5 family spans from the flagship down to mini and nano variants that cost a small fraction as much, plus pro reasoning variants at the very top that cost many times the flagship. The table above shows the live spread.
- Cached input. Repeated prompt prefixes, such as a long fixed system prompt or a stable instruction block, are billed at a steep discount, typically around one-tenth of the normal input rate. The discount applies only to the repeated portion; the dynamic part of each prompt is billed at full price.
- Batch tier. Non-interactive workloads submitted through the Batch API receive roughly a 50% discount in exchange for asynchronous delivery.
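The three factors above can be folded into one per-request estimate. The following is a minimal sketch, not OpenAI's billing logic: the `Rates` class, the `request_cost` function and the demo prices are all hypothetical, the 0.10 and 0.50 multipliers are the approximate figures quoted in the text, and applying the batch discount on top of cached input is an assumption to verify against OpenAI's current pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """List prices in USD per million tokens (placeholders, not real prices)."""
    input_per_m: float
    output_per_m: float
    cached_fraction: float = 0.10  # cached input billed at roughly 1/10 of the input rate
    batch_fraction: float = 0.50   # Batch API billed at roughly 50% of list price


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the USD cost of a single request."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens  # billed at the full input rate
    cost = (
        fresh_tokens * rates.input_per_m
        + cached_tokens * rates.input_per_m * rates.cached_fraction
        + output_tokens * rates.output_per_m
    ) / 1_000_000
    # Assumption: the batch discount applies to the whole request.
    return cost * rates.batch_fraction if batch else cost


if __name__ == "__main__":
    demo = Rates(input_per_m=1.00, output_per_m=4.00)  # made-up prices
    print(request_cost(demo, 2_000, 400))
    print(request_cost(demo, 2_000, 400, cached_tokens=1_500))
    print(request_cost(demo, 2_000, 400, batch=True))
```

Swapping in the live prices from the table turns this into a rough check on the calculator's output.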
The model ladder, and which to use
Treating the lineup as a ladder is the single most important cost habit:
- nano / mini: classification, routing, extraction, short rewrites, simple chat. Often good enough for the majority of production traffic, at a tiny fraction of the flagship price.
- Flagship GPT-5: the default for general reasoning, coding and anything user-facing where quality matters.
- pro reasoning: reserve for genuinely hard problems (complex multi-step reasoning, difficult debugging). At several times the flagship price, sending everyday traffic here is the most common way teams overspend.
The biggest mistake is routing everything to the top model out of habit. A cheap classifier — or even a heuristic on input length and task type — that picks the right rung per request routinely halves spend on mixed workloads.
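A heuristic router of the kind described can be very small. The sketch below is illustrative only: the task labels, the token thresholds and the tier names returned by `pick_tier` are hypothetical and would need tuning against your own traffic and quality evals.

```python
# Hypothetical task labels, assigned upstream by your application.
SIMPLE_TASKS = {"classification", "routing", "extraction", "short_rewrite", "simple_chat"}
HARD_TASKS = {"multi_step_reasoning", "hard_debugging"}

# Hypothetical thresholds; tune them against real traffic.
NANO_MAX_INPUT_TOKENS = 500
MINI_MAX_INPUT_TOKENS = 4_000


def pick_tier(task_type: str, input_tokens: int) -> str:
    """Return the cheapest rung of the ladder that plausibly fits the request."""
    if task_type in HARD_TASKS:
        return "pro"
    if task_type in SIMPLE_TASKS:
        if input_tokens <= NANO_MAX_INPUT_TOKENS:
            return "nano"
        if input_tokens <= MINI_MAX_INPUT_TOKENS:
            return "mini"
    # Unknown task types and very long simple requests fall back to the flagship.
    return "flagship"


if __name__ == "__main__":
    for task, tokens in [("classification", 200), ("extraction", 2_000),
                         ("coding", 800), ("hard_debugging", 100)]:
        print(task, tokens, "->", pick_tier(task, tokens))
```

Defaulting to the flagship rather than the cheapest tier keeps unknown traffic on the safe side; the savings come from the requests the router can confidently send down the ladder.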
A worked example
Take a customer-support assistant: 2,000 input tokens (system prompt + retrieved article + user question) and 400 output tokens per reply, 100,000 replies a month.
cost_per_reply = (2,000 / 1,000,000 × input_price)
+ (400 / 1,000,000 × output_price)
monthly_cost = cost_per_reply × 100,000
Run those numbers against the flagship and against mini in the calculator and the gap is usually an order of magnitude. Now add prompt caching: if 1,500 of those input tokens are a fixed system prompt, caching bills them at ~10%, so the 2,000 input tokens are charged like roughly 650 and the input side of the bill falls by about two thirds. This is why "which model + caching" matters far more than the headline per-token price.
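The arithmetic for this support-assistant scenario can be checked in a few lines. This is a sketch under stated assumptions: the token counts and the ~10% cached rate come from the example above, while `monthly_cost`, `billable_input_share` and the prices passed in the demo are hypothetical placeholders, not OpenAI's actual rates.

```python
REPLIES_PER_MONTH = 100_000
INPUT_TOKENS = 2_000           # system prompt + retrieved article + user question
OUTPUT_TOKENS = 400
CACHED_PREFIX_TOKENS = 1_500   # fixed system prompt
CACHED_RATE_FRACTION = 0.10    # cached input billed at roughly 10% of the input rate


def billable_input_share(cached_prefix_tokens: int) -> float:
    """Fraction of the uncached input charge that remains after caching."""
    fresh = INPUT_TOKENS - cached_prefix_tokens
    return (fresh + cached_prefix_tokens * CACHED_RATE_FRACTION) / INPUT_TOKENS


def monthly_cost(input_price: float, output_price: float,
                 cached_prefix_tokens: int = 0) -> float:
    """Monthly USD cost; prices are USD per million tokens."""
    input_equiv = INPUT_TOKENS * billable_input_share(cached_prefix_tokens)
    per_reply = (input_equiv * input_price + OUTPUT_TOKENS * output_price) / 1_000_000
    return per_reply * REPLIES_PER_MONTH


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    print(monthly_cost(1.00, 4.00))
    print(monthly_cost(1.00, 4.00, CACHED_PREFIX_TOKENS))
    print(billable_input_share(CACHED_PREFIX_TOKENS))
```

At this volume the month adds up to 200 million input tokens and 40 million output tokens, so the uncached bill is simply 200 times the input price plus 40 times the output price.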
How to cut your OpenAI bill
- Cap output length. Output is the expensive half; set sensible max_tokens and prompt for concise answers.
- Route by difficulty. A small model picking the rung saves more than any negotiation.
- Cache the stable prefix. Long system prompts and fixed context are ideal candidates.
- Batch the offline work. Evals, backfills and bulk classification belong in the discounted tier.
- Trim retrieved context. RAG pipelines often stuff far more context than the model needs; tighter retrieval cuts input tokens directly.
- Shorten chat history. The full transcript is re-sent every turn, so cost grows with conversation length — summarise or window old turns.
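To see why chat history matters, the sketch below counts the input tokens billed over a whole conversation when the transcript is re-sent on every turn, with and without a sliding window. It is a simplified model: `cumulative_input_tokens` is a hypothetical helper, each entry in `turn_tokens` stands for everything a turn adds to the transcript, and the system prompt and any summarisation cost are ignored.

```python
from typing import Optional, Sequence


def cumulative_input_tokens(turn_tokens: Sequence[int],
                            window: Optional[int] = None) -> int:
    """Total input tokens billed across a conversation.

    On turn i the request carries the current turn plus earlier turns.
    With window=None the full history is re-sent; otherwise only the
    most recent `window` turns (including the current one) are sent.
    """
    total = 0
    for i in range(len(turn_tokens)):
        start = 0 if window is None else max(0, i + 1 - window)
        total += sum(turn_tokens[start:i + 1])
    return total


if __name__ == "__main__":
    turns = [100] * 10  # ten turns of 100 tokens each (illustrative)
    print(cumulative_input_tokens(turns))            # full history every turn
    print(cumulative_input_tokens(turns, window=3))  # last three turns only
```

Without a window the total grows roughly with the square of the number of turns; with a fixed window it grows linearly, which is the effect the last bullet describes.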
OpenAI vs the alternatives
OpenAI is the low-risk default thanks to ecosystem maturity: the SDKs, tool/function calling, structured outputs and assistant tooling are the most polished, and almost every third-party library targets it first. If "everything already integrates with it" matters, it's the safe pick.
On raw price, though, it is rarely the cheapest. For high-volume simple work, Google's Gemini Flash-Lite and DeepSeek often undercut it substantially. For agentic coding, it's worth A/B testing against Anthropic's Claude, because per-token price tells you little about per-task cost when first-try accuracy differs — a model that solves the task once beats a cheaper one you call three times.
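The per-task point can be made concrete with a simple expected-cost model. This is a sketch under loud assumptions: each attempt succeeds independently with the same probability and you retry until one succeeds, so the expected number of calls is 1 / success_rate. The function name, the per-call costs and the success rates are made-up illustrations, not measurements of any provider.

```python
def expected_cost_per_solved_task(cost_per_call: float, success_rate: float) -> float:
    """Expected spend to get one successful result, retrying until success.

    Assumes independent attempts with a constant success probability,
    so the expected number of calls is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_call / success_rate


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    pricier = expected_cost_per_solved_task(cost_per_call=0.030, success_rate=0.90)
    cheaper = expected_cost_per_solved_task(cost_per_call=0.015, success_rate=1 / 3)
    print(f"pricier model: ${pricier:.4f} per solved task")
    print(f"cheaper model: ${cheaper:.4f} per solved task")
```

With these illustrative inputs the model that costs twice as much per call is still cheaper per solved task, which is why an A/B test on your own workload is more informative than the price table alone.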
Privacy and data handling
Standard API terms exclude inputs and outputs from training, and enterprise agreements add further controls. As with any managed API, though, "data never leaves our infrastructure" is not on the menu — for that requirement, compare against self-hosted open-weight inference on total cost of ownership.
Frequently asked questions
Is the OpenAI API cheaper than ChatGPT Plus? They're different products. ChatGPT is a flat monthly subscription for the chat app; the API is usage-based and billed per token. For programmatic use you want the API, and this page's calculator estimates that usage cost.
What's the difference between mini and the flagship? mini (and nano) are smaller, faster and far cheaper, tuned for simpler tasks. The flagship is stronger on hard reasoning. Route easy traffic to the small models and reserve the flagship for where quality matters.
How much does prompt caching save? It bills the repeated prefix at roughly one-tenth of the input rate. The bigger and more stable your prefix (long system prompts, fixed context), the larger the saving — model a realistic cached percentage in the calculator.
Do these prices include taxes? No. Prices are list prices excluding any applicable tax, and enterprise discounts may apply at volume.
Read our hands-on OpenAI API review for the verdict, or compare every OpenAI model against rivals in the LLM API cost calculator.
Prices are auto-refreshed from a live source and dated for transparency. Always confirm current pricing on OpenAI's own page before committing.