
Tokens and Context Windows Deep Dive

Language models don't read text the way humans do. They process tokens — chunks of text that might be words, parts of words, or individual characters. Understanding tokenization helps you write better prompts and work within model limitations.

What Are Tokens?

Tokens are the fundamental units LLMs process. The model converts your text into tokens before processing and converts tokens back to text when generating responses.

Common English words typically equal one token. Rare words, technical terms, and non-English text often split into multiple tokens. Code can be particularly token-heavy due to special characters and naming conventions.

"Hello" → 1 token
"Hello, how are you?" → 5 tokens
"antidisestablishmentarianism" → 6 tokens
"def calculate_average(numbers):" → 7 tokens
"你好" → 2 tokens

This matters because you pay for tokens (with API models) and because context windows are measured in tokens, not words.
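
You can check counts like these yourself. Here's a minimal sketch using OpenAI's tiktoken library (exact counts vary by tokenizer, so treat the figures above as illustrative):

import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several OpenAI chat models
enc = tiktoken.get_encoding("cl100k_base")
for text in ["Hello", "Hello, how are you?", "antidisestablishmentarianism",
             "def calculate_average(numbers):", "你好"]:
    print(f"{text!r} -> {len(enc.encode(text))} tokens")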

Context Windows Explained

The context window is the total amount of text a model can "see" at once. It includes everything: your system prompt, the conversation history, your current message, and the space needed for the response.

Context Window = System Prompt + Conversation History + Your Message + Response
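
In code, the budget is simple arithmetic. A sketch with made-up numbers (the function name and figures are illustrative, not from any particular API):

def response_budget(context_window, system, history, message):
    # Tokens left for the model's reply once the fixed parts are counted
    return context_window - (system + history + message)

# An 8K-token model with a 1,000-token system prompt, 4,500 tokens of
# history, and a 500-token message leaves about 2,200 tokens for the response.
print(response_budget(8192, 1000, 4500, 500))  # 2192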

Different models have different context sizes:

Model      Context Window
GPT-3.5    4K - 16K tokens
GPT-4      8K - 128K tokens
Claude     100K - 200K tokens
Llama 2    4K tokens
When you exceed the context window, the oldest content gets dropped. In a long conversation, the model simply stops seeing what you discussed earlier.
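
One common way chat applications handle this is to trim from the front of the history. A sketch of that strategy (the helper name and the count_tokens callback are assumptions for illustration, not any library's API):

def fit_history(messages, budget, count_tokens):
    # Drop the oldest messages until the remaining history fits the budget.
    # count_tokens is any per-message token counter, e.g. tiktoken-based.
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message is "forgotten" first
    return kept

# e.g. with a crude whitespace-based counter:
# trimmed = fit_history(history, budget=3000, count_tokens=lambda m: len(m.split()))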

Practical Implications

Long conversations lose context — Start fresh conversations for new topics rather than continuing indefinitely.

Large codebases may not fit — You can't paste an entire project into the prompt. Be selective about what context you provide.

More context isn't always better — Irrelevant information can confuse the model. Focused context often produces better results than comprehensive context.

Token-Efficient Prompting

Concise prompts save tokens for what matters:

Less efficient (about 22 tokens):
"I would like you to please write a Python function that 
takes a list of numbers and returns the average value."

More efficient (about 9 tokens):
"Write Python function: average of number list."

Both prompts work, but the efficient version leaves more room for context and response.
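
You can verify the difference with the same tokenizer as before (counts are approximate and tokenizer-dependent):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
verbose = ("I would like you to please write a Python function that "
           "takes a list of numbers and returns the average value.")
terse = "Write Python function: average of number list."
print(len(enc.encode(verbose)), len(enc.encode(terse)))  # roughly 22 vs. 9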
