Google Gemini API Pricing & Cost Calculator (2026)
Current Google Gemini API pricing per million tokens — Pro, Flash and Flash-Lite — with a live calculator, context-tier pricing explained, worked examples and cost tips.
Google's Gemini API is often the value leader, especially for high-volume and long-context workloads, thanks to very competitive Flash and Flash-Lite tiers and enormous context windows. It also has a pricing quirk that most providers lack, context-tiered pricing, which is worth understanding before you commit. This page explains the lineup, the tier mechanics, a worked example and how to keep costs low. The table above is from a live feed; use the cost calculator to see your real spend.
The Gemini lineup
Gemini is sold in a clear ladder, billed per million input and output tokens:
- Pro — frontier reasoning, the largest practical context, and the only tier with context-tiered pricing.
- Flash — the balanced, fast default: strong quality at a fraction of Pro's price.
- Flash-Lite — the cheapest tier, excellent for classification, extraction and summarisation at scale.
The enormous context windows (up to ~1M tokens) make Gemini attractive whenever you genuinely need to put a lot of material in front of the model in one shot — long documents, large codebases, big transcripts.
Context-tiered pricing (the quirk to watch)
On the Pro tier, the per-token price rises once a request's prompt crosses a context threshold (around 200k tokens): both the input and the output rates for that request are higher. Our dataset records these tiers; the headline figure shown in the table is the standard (≤200k) rate. In practice, a workflow that only occasionally sends very long contexts can still cost noticeably more than the headline suggests. If you routinely operate above the threshold, budget for the higher tier, and consider whether chunking or summarising would keep you under it.
Flash and Flash-Lite generally use flat pricing, which is part of what makes them so predictable for high-volume work.
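To make the difference between tiered and flat pricing concrete, here is a minimal sketch of a per-request cost function. It assumes the tier is chosen by the prompt's input size and that, once the ~200k threshold is exceeded, the higher rates apply to the whole request; check Google's pricing page for the exact rule. The rates in `PRO` are illustrative placeholders, not quoted prices, and `Rates`, `TieredRates`, `flat_cost` and `tiered_cost` are hypothetical names for this example.

```python
# Per-request cost under flat vs context-tiered pricing (illustrative sketch).
# ASSUMPTIONS: rates are hypothetical placeholders in USD per 1M tokens; the
# prompt (input) size picks the tier, and the chosen tier's rates then apply
# to the whole request, output included.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


@dataclass(frozen=True)
class TieredRates:
    standard: Rates       # used when the prompt is at or under the threshold
    long_context: Rates   # used when the prompt exceeds the threshold
    threshold: int = 200_000


def flat_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Flat pricing (Flash/Flash-Lite style): same rate at any prompt size."""
    return (input_tokens * rates.input_per_m + output_tokens * rates.output_per_m) / 1_000_000


def tiered_cost(tiers: TieredRates, input_tokens: int, output_tokens: int) -> float:
    """Context-tiered pricing (Pro style): the prompt size selects the tier."""
    rates = tiers.standard if input_tokens <= tiers.threshold else tiers.long_context
    return flat_cost(rates, input_tokens, output_tokens)


# Hypothetical Pro-style rates: the long-context tier costs more per token.
PRO = TieredRates(standard=Rates(1.25, 10.00), long_context=Rates(2.50, 15.00))

if __name__ == "__main__":
    for prompt in (150_000, 200_000, 200_001, 300_000):
        print(f"{prompt:>7} input tokens -> ${tiered_cost(PRO, prompt, 2_000):.4f}")
```

With these placeholder rates, going from 200,000 to 200,001 prompt tokens roughly doubles the cost of the request, which is why staying under the threshold is worth the effort of chunking or summarising.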
Context caching
Gemini bills cached context at a fraction of the input rate, with a separate small storage component for very large cached blocks held over time. For repeated long prompts — the same big document queried many times — this is a meaningful saving on top of already-low Flash pricing.
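The trade-off between the per-call saving and the storage charge can be sketched with a simplified cost model. This is not Google's billing logic: it assumes the first call pays the normal input rate for the shared block, every later call pays a reduced cached rate for it, and storage is billed per million cached tokens per hour held. All rate constants are hypothetical placeholders, and the function names are illustrative.

```python
# Simplified context-caching cost model (a sketch, not Google's billing logic).
# ASSUMPTIONS: the first call pays the normal input rate for the shared block,
# later calls pay a reduced cached rate for it, and storage is billed per
# 1M cached tokens per hour held. All rates are hypothetical placeholders.

IN_RATE = 0.30        # USD per 1M uncached input tokens (placeholder)
OUT_RATE = 2.50       # USD per 1M output tokens (placeholder)
CACHED_RATE = 0.075   # USD per 1M cached input tokens (placeholder)
STORAGE_RATE = 1.00   # USD per 1M cached tokens per hour (placeholder)


def uncached_total(block: int, question: int, output: int, calls: int) -> float:
    """Every call resends the full block at the normal input rate."""
    per_call = ((block + question) * IN_RATE + output * OUT_RATE) / 1_000_000
    return per_call * calls


def cached_total(block: int, question: int, output: int, calls: int,
                 hours_held: float) -> float:
    """Block billed in full once, then at the cached rate, plus storage.

    Assumes calls >= 1.
    """
    first_use = block * IN_RATE / 1_000_000
    reuse = (calls - 1) * block * CACHED_RATE / 1_000_000
    # The question and the answer are never cached, so they cost the same either way.
    per_call_fresh = (question * IN_RATE + output * OUT_RATE) / 1_000_000
    storage = block / 1_000_000 * STORAGE_RATE * hours_held
    return first_use + reuse + calls * per_call_fresh + storage


if __name__ == "__main__":
    args = dict(block=50_000, question=200, output=600, calls=1_000)
    print(f"uncached: ${uncached_total(**args):.2f}")
    print(f"cached:   ${cached_total(**args, hours_held=24):.2f}")
```

The storage term grows with how long the cache is held while the saving grows with how often it is reused, so for low-traffic workloads it is worth comparing the two before caching.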
A worked example
A document-Q&A feature feeds a 50,000-token document plus a 200-token question, and returns a 600-token answer, 30,000 times a month, on Flash:
input tokens = 50,200 (50,000 document + 200 question)
output tokens = 600
monthly cost = ((50,200 / 1e6 × input price) + (600 / 1e6 × output price)) × 30,000, with prices per million tokens
Because the document is identical across many questions, context caching bills it at a fraction after the first call — turning a large input line item into a small one. Run it in the calculator with caching on to see the effect. Note how heavily the input dominates here: with 50k input vs 600 output, this is a workload where Gemini's cheap input tiers and caching shine.
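The worked formula translates directly into code. This sketch computes the uncached monthly cost and the input's share of it; the prices passed in are hypothetical placeholders, so substitute the current Flash rates from the table. `monthly_cost` is an illustrative name, not part of any SDK.

```python
# The worked example's formula, with per-token prices left as parameters.
# ASSUMPTION: the prices used below are hypothetical placeholders in USD per
# 1M tokens; substitute the current Flash rates from the table.

INPUT_TOKENS = 50_000 + 200   # document + question
OUTPUT_TOKENS = 600           # answer
CALLS_PER_MONTH = 30_000


def monthly_cost(input_price: float, output_price: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one month, without caching."""
    input_cost = INPUT_TOKENS / 1e6 * input_price * CALLS_PER_MONTH
    output_cost = OUTPUT_TOKENS / 1e6 * output_price * CALLS_PER_MONTH
    return input_cost, output_cost


if __name__ == "__main__":
    inp, out = monthly_cost(input_price=0.30, output_price=2.50)  # placeholders
    total = inp + out
    print(f"input ${inp:,.2f} + output ${out:,.2f} = ${total:,.2f}/month")
    print(f"input share: {inp / total:.0%}")
```

With these placeholder rates the input line is about 90% of the bill even though output is priced over eight times higher per token, so caching the document is the main lever for this workload.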
How to cut your Gemini bill
- Default to Flash-Lite, escalate to Flash, and only reach for Pro when quality genuinely demands it.
- Stay under the context tier threshold where you can — chunk or summarise long inputs to avoid the higher Pro rate.
- Use context caching for stable, repeated material like fixed documents or knowledge bases.
- Cap output length: as with every provider, output tokens are priced higher than input tokens.
- Send bulk jobs through the batch tier for the asynchronous discount.
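Several of these levers can be compared for a single workload with a rough sketch like the one below. Every rate in `PLACEHOLDER_MODELS` is a hypothetical placeholder rather than a quoted price, the batch tier is modelled as a flat 50% discount purely as an assumption, and `Model` and `job_cost` are illustrative names.

```python
# Rough comparison of cost levers (model choice, output cap, batch) for one job.
# ASSUMPTIONS: all rates are hypothetical placeholders in USD per 1M tokens, and
# the batch tier is modelled as a flat percentage discount on the whole job.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    input_per_m: float
    output_per_m: float


PLACEHOLDER_MODELS = [
    Model("flash-lite", 0.10, 0.40),
    Model("flash", 0.30, 2.50),
    Model("pro", 1.25, 10.00),
]


def job_cost(model: Model, input_tokens: int, output_tokens: int, calls: int,
             max_output: Optional[int] = None, batch_discount: float = 0.0) -> float:
    """Total USD for `calls` requests, optionally with an output cap and batch discount."""
    if max_output is not None:
        output_tokens = min(output_tokens, max_output)  # a cap bounds billed output
    per_call = (input_tokens * model.input_per_m + output_tokens * model.output_per_m) / 1e6
    return per_call * calls * (1.0 - batch_discount)


if __name__ == "__main__":
    workload = dict(input_tokens=2_000, output_tokens=800, calls=100_000)
    for m in PLACEHOLDER_MODELS:
        base = job_cost(m, **workload)
        capped = job_cost(m, **workload, max_output=300)
        batched = job_cost(m, **workload, max_output=300, batch_discount=0.5)
        print(f"{m.name:<10} base ${base:>8.2f}  capped ${capped:>8.2f}  "
              f"capped+batch ${batched:>8.2f}")
```

The model choice usually dominates, with the output cap and batch discount compounding on top; set the cap from what answers actually need, since a cap below that truncates responses.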
Gemini vs the alternatives
For high-volume, cost-sensitive tasks — classification, extraction, summarisation, simple chat — Flash-Lite is among the cheapest capable options anywhere, rivalled mainly by DeepSeek and open-weight models. At the top end, Pro competes with OpenAI's flagship and Anthropic's Opus and offers the largest practical context, though the context-tier pricing means you should model long-context costs carefully rather than trusting the headline rate.
Frequently asked questions
Why is my Gemini Pro bill higher than the headline price? Almost certainly context-tiered pricing: requests above ~200k tokens are billed at a higher per-token rate. Check whether your prompts cross that threshold.
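One way to check is to audit logged prompt sizes. This sketch assumes you already record each request's prompt and output token counts (for example from the `usage_metadata` returned with responses in Google's Python SDK); the threshold and both rate pairs are hypothetical placeholders, the higher rate is assumed to apply to the whole request, and `audit` is an illustrative name.

```python
# Audit logged requests for Pro tier-threshold crossings (illustrative sketch).
# ASSUMPTIONS: each log entry is (prompt_tokens, output_tokens); the threshold
# and rates are hypothetical placeholders in USD per 1M tokens, and the
# long-context rates apply to the whole request once the prompt exceeds it.

THRESHOLD = 200_000
STANDARD = (1.25, 10.00)       # (input, output) rates at or under the threshold
LONG_CONTEXT = (2.50, 15.00)   # (input, output) rates above the threshold


def audit(requests: list[tuple[int, int]]) -> dict:
    """Count threshold crossings and the extra spend they caused."""
    over = [r for r in requests if r[0] > THRESHOLD]
    extra = 0.0
    for prompt, output in over:
        standard = (prompt * STANDARD[0] + output * STANDARD[1]) / 1e6
        long_ctx = (prompt * LONG_CONTEXT[0] + output * LONG_CONTEXT[1]) / 1e6
        extra += long_ctx - standard  # premium paid versus the headline rate
    return {"requests": len(requests), "over_threshold": len(over),
            "extra_cost_usd": extra}


if __name__ == "__main__":
    log = [(120_000, 1_000), (250_000, 2_000), (199_999, 500), (400_000, 4_000)]
    print(audit(log))
```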
Is Flash-Lite good enough for production? For classification, extraction, routing and summarisation, usually yes — and at a fraction of frontier prices. Reserve Flash/Pro for tasks where quality clearly improves the outcome.
Does Gemini support prompt/context caching? Yes. Stable repeated context is billed at a reduced rate, with a small storage fee for large cached blocks — worthwhile when the same material is queried many times.
Which is cheaper, Gemini or OpenAI? It depends on the task and model tier. For high-volume simple work Gemini Flash-Lite is typically cheaper; at the frontier they're closer. Compare your exact scenario in the calculator.
Compare Gemini against OpenAI, Anthropic and the open-weight field in the LLM API cost calculator.
Prices are auto-refreshed from a live source and dated. Confirm current pricing on Google's page before committing.