LLM API pricing glossary

Plain-English definitions of the terms used across the cost calculator and our guides: tokens, context windows, caching, batch pricing and more.

Token
The unit LLMs read and write and the unit you are billed in. In English a token is roughly ¾ of a word (about 4 characters). "Hello world" is ~2 tokens.
Input tokens
Everything you send to the model in a request: the system prompt, any retrieved context or documents, and the user message. Priced per million tokens, usually cheaper than output.
Output tokens
The tokens the model generates in its response. Typically priced two to five times higher than input, so response length is a major cost driver.
Per-million pricing
API prices are quoted in USD per 1,000,000 tokens. Cost per request = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price).
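As a minimal sketch, the formula above can be written as a small Python function. The function name and the example prices ($3 input, $15 output per million tokens) are illustrative assumptions, not any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request in USD; both prices are USD per 1,000,000 tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(2_000, 500, input_price=3.0, output_price=15.0)
print(f"${cost:.4f}")  # 0.0060 for input + 0.0075 for output
```

In this example the 500 output tokens cost more than the 2,000 input tokens, which is why response length tends to dominate the bill.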
Context window
The maximum number of tokens (input + output) a model can consider at once. Larger windows let you pass more material in a single request, but very long contexts can cost more on models with tiered pricing.
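A quick way to reason about the window is to check that the input plus the space reserved for the response fits inside it. The sketch below assumes a hypothetical 128,000-token window; the function name is also an illustration.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """True if the input plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical model with a 128,000-token context window.
print(fits_in_context(120_000, 4_000, 128_000))  # 124,000 tokens needed
print(fits_in_context(126_000, 4_000, 128_000))  # 130,000 tokens needed
```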
Prompt caching
A discount on repeated prompt prefixes (a long system prompt, a fixed knowledge base): tokens the provider has already processed are billed at a steeply reduced rate, often one-tenth of the normal input price. Only the repeated, stable part of the prompt benefits.
Cached input price
The reduced per-million price charged for a cache hit on previously processed input tokens. Shown in the calculator as the cached rate.
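To see the effect of the cached rate, the input side of a request can be split into cached and uncached tokens. This is a rough sketch with made-up prices ($3 normal, $0.30 cached per million tokens); it ignores any surcharge a provider may apply when the cache is first written.

```python
def input_cost_with_cache(cached_tokens: int, uncached_tokens: int,
                          input_price: float, cached_price: float) -> float:
    """Input cost in USD; both prices are USD per 1,000,000 tokens."""
    return (cached_tokens * cached_price
            + uncached_tokens * input_price) / 1_000_000


# Hypothetical request: a 10,000-token system prompt served from cache
# plus a 500-token user message billed at the normal input price.
with_cache = input_cost_with_cache(10_000, 500, input_price=3.0, cached_price=0.30)
without_cache = input_cost_with_cache(0, 10_500, input_price=3.0, cached_price=0.30)
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}")
```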
Batch API
A processing tier for non-interactive workloads that trades latency for a discount (commonly 50%). Requests are submitted in bulk and returned asynchronously.
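The batch discount is a simple multiplier on the standard cost. The sketch below assumes the commonly quoted 50% discount as a default; the actual figure depends on the provider.

```python
def batch_cost(standard_cost: float, discount: float = 0.5) -> float:
    """Cost after the batch discount; discount is a fraction between 0 and 1."""
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return standard_cost * (1 - discount)


# A workload that would cost $10.00 at standard rates, with the assumed 50% discount.
print(f"${batch_cost(10.0):.2f}")
```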
Tiered pricing
Some providers raise the per-token price once a request exceeds a context threshold (e.g. above 200k tokens). The calculator stores these tiers; the headline price is the standard tier.
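One way to model tiers is to pick the highest tier whose threshold the request exceeds. The sketch below makes two assumptions that vary by provider: the threshold is measured on input tokens, and the whole request is billed at the selected tier's rates. The `Tier` type and all prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    above_tokens: int    # tier applies when input tokens exceed this count
    input_price: float   # USD per 1,000,000 input tokens
    output_price: float  # USD per 1,000,000 output tokens


def select_tier(standard: Tier, higher_tiers: list[Tier], input_tokens: int) -> Tier:
    """Return the highest tier whose threshold the request exceeds."""
    chosen = standard
    for tier in sorted(higher_tiers, key=lambda t: t.above_tokens):
        if input_tokens > tier.above_tokens:
            chosen = tier
    return chosen


# Hypothetical prices: the standard tier is the headline price.
standard = Tier(above_tokens=0, input_price=3.0, output_price=15.0)
long_context = Tier(above_tokens=200_000, input_price=6.0, output_price=22.5)

print(select_tier(standard, [long_context], 150_000).input_price)  # standard tier
print(select_tier(standard, [long_context], 250_000).input_price)  # long-context tier
```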
Frontier model
One of the most capable (and usually most expensive) reasoning models. Best for hard tasks where getting the answer right on the first try reduces the total number of calls.
Open-weight model
A model whose weights are publicly released, so it can be run via a hosting provider or self-hosted on your own hardware. Self-hosting gives you full control over data residency, because prompts and outputs stay on infrastructure you manage.
Self-hosting
Running an open-weight model on your own GPUs. Trades per-token fees for fixed infrastructure and operational cost; cheaper only at high, steady utilisation.
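A rough break-even estimate shows why utilisation matters. The sketch below assumes self-hosting has a fixed monthly cost and no per-token cost, and compares it against a single blended API price; both figures are hypothetical.

```python
def break_even_tokens_per_month(fixed_monthly_cost: float,
                                api_price_per_million: float) -> float:
    """Monthly token volume at which self-hosting matches the API bill.

    Assumes a fixed monthly infrastructure cost and zero marginal cost per token.
    """
    return fixed_monthly_cost / api_price_per_million * 1_000_000


# Hypothetical: $5,000/month for GPUs and operations vs. a blended API
# price of $2 per million tokens.
tokens = break_even_tokens_per_month(5_000.0, 2.0)
print(f"{tokens:,.0f} tokens per month")
```

Below that volume the API is cheaper under these assumptions; above it, the fixed cost is spread over enough tokens for self-hosting to win.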
Exchange rate (FX)
API prices are set in USD. This site converts them to your chosen currency using daily European Central Bank reference rates, fetched server-side and cached.
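ECB reference rates are quoted against the euro (units of currency per 1 EUR), so converting a USD price to another currency goes through EUR. The sketch below illustrates that arithmetic with made-up rates; how this site performs the conversion internally is not described here, and the function and dictionary names are assumptions.

```python
def convert_usd(amount_usd: float, target: str,
                rates_per_eur: dict[str, float]) -> float:
    """Convert a USD amount using euro-based rates (units of currency per 1 EUR)."""
    amount_eur = amount_usd / rates_per_eur["USD"]
    if target == "EUR":
        return amount_eur
    return amount_eur * rates_per_eur[target]


# Made-up rates for illustration only: 1 EUR = 1.25 USD = 0.75 GBP.
rates = {"USD": 1.25, "GBP": 0.75}
print(convert_usd(10.0, "EUR", rates))  # USD -> EUR
print(convert_usd(10.0, "GBP", rates))  # USD -> EUR -> GBP
```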

Try the LLM API cost calculator →