Tokens and Context Windows Deep Dive
Language models don't read text the way humans do. They process tokens — chunks of text that might be words, parts of words, or individual characters. Understanding tokenization helps you write better prompts and work within model limitations.
What Are Tokens?
Tokens are the fundamental units LLMs process. The model converts your text into tokens before processing and converts tokens back to text when generating responses.
Common English words typically equal one token. Rare words, technical terms, and non-English text often split into multiple tokens. Code can be particularly token-heavy due to special characters and naming conventions.
"Hello" → 1 token
"Hello, how are you?" → 5 tokens
"antidisestablishmentarianism" → 6 tokens
"def calculate_average(numbers):" → 7 tokens
"你好" → 2 tokens
This matters because you pay for tokens (with API models) and because context windows are measured in tokens, not words.
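You can check counts like these yourself. A minimal sketch using tiktoken, OpenAI's open-source tokenizer library, with the cl100k_base encoding; other vendors' models use different tokenizers, so treat the counts as illustrative:

```python
# Sketch: inspect how text splits into tokens with tiktoken.
# Counts are specific to the cl100k_base encoding; other models'
# tokenizers will produce different results.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello", "antidisestablishmentarianism",
             "def calculate_average(numbers):", "你好"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens: {tokens}")
```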
Context Windows Explained
The context window is the total amount of text a model can "see" at once. It includes everything: your system prompt, the conversation history, your current message, and the space needed for the response.
Context Window = System Prompt + Conversation History + Your Message + Response
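You can turn that equation into a rough budget check before sending a request. The sketch below assumes tiktoken and a hypothetical 8K-token model; the response reserve corresponds to the max_tokens you plan to request. Real chat APIs also add a few tokens of per-message formatting overhead not counted here.

```python
# Sketch: estimate how much of a hypothetical 8K context window a
# request will consume, reserving room for the model's reply.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count(text):
    return len(enc.encode(text))

CONTEXT_LIMIT = 8_000      # hypothetical model limit, in tokens
RESPONSE_RESERVE = 1_000   # tokens reserved for the reply (max_tokens)

system_prompt = "You are a helpful assistant."
history = ["Earlier user message...", "Earlier assistant reply..."]
message = "Summarize our discussion so far."

used = count(system_prompt) + sum(count(m) for m in history) + count(message)
remaining = CONTEXT_LIMIT - RESPONSE_RESERVE - used
print(f"{used} tokens used, {remaining} left before trimming is needed")
```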
Different models have different context sizes:
| Model | Context Window |
|---|---|
| GPT-3.5 | 4K - 16K tokens |
| GPT-4 | 8K - 128K tokens |
| Claude | 100K - 200K tokens |
| Llama 2 | 4K tokens |
When a conversation exceeds the context window, something has to give: the API rejects the request, or the chat application silently truncates the oldest messages to make it fit. Either way, in a long conversation the model loses access to what you discussed earlier.
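Chat applications typically handle this with a sliding window: keep the system prompt, drop the oldest turns first. A minimal sketch of that strategy (exact trimming policies vary by application):

```python
# Sketch: trim conversation history, oldest first, until it fits a
# token budget. The system prompt is always kept.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(system_prompt, history, budget):
    """Drop the oldest messages until system prompt + history fit `budget`.

    `history` is a list of message strings, oldest first. Per-message
    formatting overhead added by real chat APIs is not counted here.
    """
    kept = list(history)

    def total():
        return (len(enc.encode(system_prompt))
                + sum(len(enc.encode(m)) for m in kept))

    while kept and total() > budget:
        kept.pop(0)  # forget the oldest turn first
    return kept
```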
Practical Implications
Long conversations lose context — Start fresh conversations for new topics rather than continuing indefinitely.
Large codebases may not fit — You can't paste an entire project into the prompt. Be selective about what context you provide (see the sketch after this list).
More context isn't always better — Irrelevant information can confuse the model. Focused context often produces better results than comprehensive context.
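A minimal sketch of budget-aware context selection, assuming you have already ranked candidate snippets by relevance to the task (the ranking method, keyword match or embeddings, is up to you):

```python
# Sketch: greedily pack the most relevant snippets into a token budget.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def pack_context(snippets, budget):
    """Add snippets (most relevant first) until the token budget is spent.

    `snippets` is assumed to be pre-sorted by relevance, best first.
    """
    chosen, used = [], 0
    for text in snippets:
        n = len(enc.encode(text))
        if used + n > budget:
            continue  # skip anything that would overflow the budget
        chosen.append(text)
        used += n
    return chosen
```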
Token-Efficient Prompting
Concise prompts save tokens for what matters:
Less efficient (roughly 22 tokens, depending on the tokenizer):
"I would like you to please write a Python function that takes a list of numbers and returns the average value."
More efficient (roughly 9 tokens):
"Write Python function: average of number list."
Both prompts work, but the efficient version leaves more room for context and response.
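Rather than trusting rule-of-thumb counts, you can measure prompts directly before sending them. A quick check with tiktoken (exact numbers depend on the encoding):

```python
# Sketch: compare the token cost of two phrasings of the same request.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = ("I would like you to please write a Python function that "
           "takes a list of numbers and returns the average value.")
terse = "Write Python function: average of number list."

print(len(enc.encode(verbose)), "vs", len(enc.encode(terse)), "tokens")
```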