LLM API pricing glossary

Plain-English definitions of the terms used across the cost calculator and our guides — tokens, context windows, caching, batch pricing and more.

Token: The unit LLMs read and write and the unit you are billed in. In English a token is roughly ¾ of a word (about 4 characters). "Hello world" is ~2 tokens.
Input tokens: Everything you send to the model in a request: the system prompt, any retrieved context or documents, and the user message. Priced per million tokens, usually cheaper than output.
Output tokens: The tokens the model generates in its response. Typically priced two to five times higher than input, so response length is a major cost driver.
Per-million pricing: API prices are quoted in USD per 1,000,000 tokens. Cost per request = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price).
Context window: The maximum number of tokens (input + output) a model can consider at once. Larger windows let you pass more material in a single request, but very long contexts can cost more on tiered models.
Prompt caching: Billing repeated prompt prefixes (a long system prompt, a fixed knowledge base) at a steep discount — often one-tenth of the normal input price. Only the repeated, stable part benefits.
Cached input price: The reduced per-million price charged for a cache hit on previously processed input tokens. Shown in the calculator as the cached rate.
Batch API: A processing tier for non-interactive workloads that trades latency for a discount (commonly 50%). Requests are submitted in bulk and returned asynchronously.
Tiered pricing: Some providers raise the per-token price once a request exceeds a context threshold (e.g. above 200k tokens). The calculator stores these tiers; the headline price is the standard tier.
Frontier model: The most capable (and usually most expensive) reasoning models. Best for hard tasks where first-try accuracy reduces total calls.
Open-weight model: A model whose weights are publicly released, so it can be run via a hosting provider or self-hosted on your own hardware. Enables full data residency.
Self-hosting: Running an open-weight model on your own GPUs. Trades per-token fees for fixed infrastructure and operational cost; cheaper only at high, steady utilisation.
Exchange rate (FX): API prices are set in USD. This site converts them to your chosen currency using daily European Central Bank reference rates, fetched server-side and cached.

Try the LLM API cost calculator →