
Context Window

The maximum number of tokens an LLM can hold in view at once — input and output combined.

Reviewed by the RadarTrek editorial team · June 2026

The context window is the model's working memory: every token of your prompt, the conversation history, and the model's own response must fit inside this limit. A bigger context window lets you hand the model more documents or history, but it isn't free: more tokens mean more compute and more latency, and models tend to recall information from the middle of a very long context less reliably than information near the beginning or end.
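As a rough illustration of "everything must fit", the sketch below checks whether a prompt, the conversation history, and a reserved output budget fit inside a window. The names `estimate_tokens` and `fits_in_window` are hypothetical, and the word-count estimate is only a stand-in assumption: real tokenizers split text differently, so use the model's own tokenizer for real counts.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): count whitespace-separated words."""
    return len(text.split())


def fits_in_window(
    prompt: str,
    history: list[str],
    max_output_tokens: int,
    context_window: int,
) -> bool:
    """Return True if the input tokens plus the reserved output budget fit the window."""
    input_tokens = estimate_tokens(prompt) + sum(estimate_tokens(m) for m in history)
    # Input and output share the same limit, so the response budget is added in.
    return input_tokens + max_output_tokens <= context_window
```

Reserving the output budget up front reflects the point that input and output are counted together: a prompt that exactly fills the window leaves no room for a reply.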

Why it matters

—Exceeding the context window forces you to truncate or summarise — losing information silently if you're not careful.
—Bigger context windows leave room for more retrieved documents (RAG) and longer conversations, but cost and latency grow with every token included.
—The "lost in the middle" effect means structuring what you put where in a long prompt genuinely matters.
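Two of the points above can be sketched in code: trimming history so that dropped messages are reported rather than lost silently, and ordering documents so the most relevant ones sit near the start and end of the prompt. This is a minimal sketch under stated assumptions. The helper names are hypothetical, the word-count token estimate is a crude stand-in for a real tokenizer, and edge placement is a common heuristic rather than a guaranteed fix.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): count whitespace-separated words."""
    return len(text.split())


def trim_history(messages: list[str], budget: int) -> tuple[list[str], int]:
    """Keep the most recent messages that fit in `budget` tokens.

    Returns (kept_messages, dropped_count) so the caller can log or
    summarise what was cut instead of losing it silently.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, len(messages) - len(kept)


def place_at_edges(docs_by_relevance: list[str]) -> list[str]:
    """Reorder docs (given most relevant first) so the strongest sit at the
    start and end of the prompt and the weakest end up in the middle."""
    front = docs_by_relevance[0::2]  # ranks 1, 3, 5, ...
    back = docs_by_relevance[1::2]   # ranks 2, 4, 6, ...
    return front + back[::-1]
```

For example, `place_at_edges(["A", "B", "C", "D", "E"])` puts A first and B last, with the least relevant document, E, in the middle.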

Where to learn this

Tokens and Context Windows

How LLMs Actually Work course

This is the exact lesson that covers this term in depth — with examples, diagrams, and a hands-on exercise.

Related terms

Token · RAG
