Context Window
The maximum number of tokens an LLM can hold in view at once — input and output combined.
Reviewed by the RadarTrek editorial team · June 2026
The context window is the model's working memory: every token of your prompt, the conversation history, and the model's own response must fit inside this limit. A bigger context window lets you give the model more documents or more history, but it isn't free. More tokens mean more compute and more latency, and models tend to recall information from the middle of a very long context less reliably than information near the beginning or end.
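Here is a minimal Python sketch of the budgeting this implies. It assumes a simple list of messages, a hypothetical `fit_to_window` helper, and a crude whitespace-based `count_tokens` standing in for the model's real tokenizer. Real token counts are usually higher than word counts. The sketch reserves space for the response first and always keeps the system prompt. It then keeps the newest turns that still fit and drops the older ones.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # ASSUMPTION: crude stand-in for a real tokenizer. Subword tokenizers
    # usually produce more tokens than there are whitespace-separated words.
    return len(text.split())


def fit_to_window(system: Message, history: list[Message],
                  context_window: int, max_output_tokens: int) -> list[Message]:
    """Hypothetical helper: keep the system prompt plus the newest turns that fit,
    leaving max_output_tokens free for the model's response."""
    budget = context_window - max_output_tokens - count_tokens(system.text)
    if budget < 0:
        raise ValueError("system prompt plus reserved output exceed the window")

    kept: list[Message] = []
    # Walk from the newest turn backwards, keeping turns while they still fit.
    for msg in reversed(history):
        cost = count_tokens(msg.text)
        if cost > budget:
            break  # every older turn is dropped here: the "silent" information loss
        kept.append(msg)
        budget -= cost

    kept.reverse()  # restore chronological order
    return [system] + kept


if __name__ == "__main__":
    system = Message("system", "You are a helpful assistant.")  # 5 "tokens"
    history = [
        Message("user", "word " * 50),       # 50 "tokens"
        Message("assistant", "word " * 30),  # 30 "tokens"
        Message("user", "word " * 10),       # 10 "tokens"
    ]
    trimmed = fit_to_window(system, history, context_window=60, max_output_tokens=10)
    print([m.role for m in trimmed])  # prints ['system', 'assistant', 'user']
```

The loop stops at the first turn that doesn't fit. It does not skip that turn and keep looking, so the retained history stays contiguous. Summarising the dropped turns, rather than discarding them, trades some extra tokens for less information loss.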
Why it matters
- Exceeding the context window forces you to truncate or summarise, and information can be lost silently if you're not careful.
- Bigger context windows leave room for more retrieved documents in RAG and for longer conversations, but cost and latency grow with every token you include.
- The "lost in the middle" effect means that where you place information in a long prompt genuinely matters.
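The sketch below shows one way to act on the "lost in the middle" effect. Treat it as a heuristic, not a fixed rule. It assumes the retrieved documents are already sorted from most to least relevant, for example by a retriever's score. The helper name `reorder_for_long_context` is hypothetical. The documents alternate between the front and the back of the list, so the strongest material lands at the edges of the prompt and the weakest lands in the middle.

```python
def reorder_for_long_context(docs_by_relevance: list[str]) -> list[str]:
    """Hypothetical helper: put the most relevant docs at the start and end.

    ASSUMPTION: the input is sorted most-relevant first.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, doc in enumerate(docs_by_relevance):
        # Alternate: rank 0 -> front, rank 1 -> back, rank 2 -> front, ...
        if rank % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so the second-most relevant doc ends up last.
    return front + back[::-1]


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4", "d5"]  # d1 = most relevant
    print(reorder_for_long_context(ranked))  # prints ['d1', 'd3', 'd5', 'd4', 'd2']
```

The least relevant document (`d5`) ends up in the middle, where recall tends to be weakest. Whether this ordering helps in practice depends on the model and the task, so it is worth measuring rather than assuming.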
Where to learn this
Tokens and Context Windows
How LLMs Actually Work course
This is the exact lesson that covers this term in depth — with examples, diagrams, and a hands-on exercise.