LLM API pricing glossary
Plain-English definitions of the terms used across the cost calculator and our guides — tokens, context windows, caching, batch pricing and more.
- Token
- The unit LLMs read and write and the unit you are billed in. In English a token is roughly ¾ of a word (about 4 characters). "Hello world" is ~2 tokens.
- Input tokens
- Everything you send to the model in a request: the system prompt, any retrieved context or documents, and the user message. Priced per million tokens, usually cheaper than output.
- Output tokens
- The tokens the model generates in its response. Typically priced two to five times higher than input, so response length is a major cost driver.
- Per-million pricing
- API prices are quoted in USD per 1,000,000 tokens. Cost per request = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price).
- Context window
- The maximum number of tokens (input + output) a model can consider at once. Larger windows let you pass more material in a single request, but very long contexts can cost more on tiered models.
- Prompt caching
- Billing repeated prompt prefixes (a long system prompt, a fixed knowledge base) at a steep discount — often one-tenth of the normal input price. Only the repeated, stable part benefits.
- Cached input price
- The reduced per-million price charged for a cache hit on previously processed input tokens. Shown in the calculator as the cached rate.
- Batch API
- A processing tier for non-interactive workloads that trades latency for a discount (commonly 50%). Requests are submitted in bulk and returned asynchronously.
- Tiered pricing
- Some providers raise the per-token price once a request exceeds a context threshold (e.g. above 200k tokens). The calculator stores these tiers; the headline price is the standard tier.
- Frontier model
- The most capable (and usually most expensive) reasoning models. Best for hard tasks where first-try accuracy reduces total calls.
- Open-weight model
- A model whose weights are publicly released, so it can be run via a hosting provider or self-hosted on your own hardware. Enables full data residency.
- Self-hosting
- Running an open-weight model on your own GPUs. Trades per-token fees for fixed infrastructure and operational cost; cheaper only at high, steady utilisation.
- Exchange rate (FX)
- API prices are set in USD. This site converts them to your chosen currency using daily European Central Bank reference rates, fetched server-side and cached.