When you type a message to an AI, it doesn't see letters or words. It sees tokens — chunks of text that can be whole words, parts of words, or even single characters. The model was trained on tokens, thinks in tokens, and generates tokens.
**How tokenization works:**
Most modern LLMs use Byte Pair Encoding (BPE). Training starts from individual characters (or bytes) and repeatedly merges the most frequent adjacent pair, so common words end up as single tokens ('the', 'and', 'is') while rare words are split into pieces ('tokenization' → 'token', 'ization'). Numbers are often split into short digit chunks, and uncommon Unicode characters may take multiple tokens.
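The merge loop at the heart of BPE training can be sketched in a few lines of Python. This is a toy trainer over a four-word corpus; real tokenizers train over byte sequences from huge corpora, but the mechanism is the same:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair."""
    # Start with each word as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for seq, freq in vocab.items():
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_vocab = Counter()
        for seq, freq in vocab.items():
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(seq[i] + seq[i + 1])
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "lowest"], 2)
print(merges)  # [('l', 'o'), ('lo', 'w')] — frequent pairs become tokens first
```

After two merges, 'low' is already a single token while the rarer suffixes '-er' and '-est' remain split, which is exactly the common-word/rare-word behavior described above.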
Examples with GPT-4's tokenizer (exact splits vary between tokenizer versions):
- 'Hello world' = 2 tokens
- 'Antidisestablishmentarianism' = 7 tokens
- '2024' = 1 token
- 'aaabbbccc' = 3 tokens
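At inference time, segmentation can be approximated with greedy longest-match against the learned vocabulary. Real BPE replays the learned merge rules instead, but the effect is similar; the vocabulary below is hypothetical:

```python
def tokenize(text, vocab):
    """Greedy longest-match segmentation against a fixed vocabulary.
    (An approximation: real BPE applies merge rules in learned order.)"""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i, then fall back to shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit as a single token
            i += 1
    return tokens

# Hypothetical vocabulary in which repeated runs are common enough to be tokens.
vocab = {"aaa", "bbb", "ccc", "Hello", " world"}
print(tokenize("aaabbbccc", vocab))    # ['aaa', 'bbb', 'ccc']
print(tokenize("Hello world", vocab))  # ['Hello', ' world']
```

Note the leading space in ' world': most BPE vocabularies attach whitespace to the start of the following token, which is why 'Hello world' costs two tokens, not three.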
**Why this matters:**
- **Cost**: APIs charge per token. Know your token count.
- **Length limits**: Context windows are measured in tokens, not words.
- **The 'strawberry' problem**: LLMs famously struggled to count letters in words because the model sees tokens, not letters. 'strawberry' is 3 tokens ('str', 'aw', 'berry') — the model never 'sees' individual letters during standard generation. Newer models with extended thinking have largely fixed this.
- **Arithmetic failures**: '1 + 1' is fine. Arithmetic on large numbers is less reliable because digits are grouped into tokens that don't align with place value, so the model must reason across token boundaries, carries included.
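Both failure modes come from the same cause: the model manipulates opaque token IDs, not characters. A small illustration, assuming a cl100k-style split of 'strawberry' and a groups-of-up-to-three digit rule (both are assumptions about specific tokenizers, not guarantees):

```python
import re

# Commonly cited GPT-4-style split of 'strawberry': the three 'r's are
# scattered across tokens, and each token is an opaque ID to the model.
tokens = ["str", "aw", "berry"]
per_token_rs = {t: t.count("r") for t in tokens}
print(per_token_rs)                # {'str': 1, 'aw': 0, 'berry': 2}
print("".join(tokens).count("r"))  # 3 — trivial with character access

def chunk_digits(number_string):
    """Split a digit string into groups of up to three digits, left to right,
    mimicking (as an assumption) how some modern tokenizers segment numbers."""
    return re.findall(r"\d{1,3}", number_string)

# The chunks don't align with place value, so carries must cross token
# boundaries: adding 1 to 999999 (['999', '999']) changes every token.
print(chunk_digits("1234567"))  # ['123', '456', '7']
```

With character-level access both tasks are trivial; the model's difficulty is purely a consequence of its input representation.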
**Practical rule of thumb**: 1 token ≈ 0.75 English words, 4 characters. GPT-4's 128K context = ~96,000 words = ~190 pages.
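The rule of thumb translates directly into a back-of-the-envelope estimator. This is a sketch only; for billing or hard limits, count with the provider's real tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text):
    """Rough token estimate from the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count):
    """Rough token estimate from the ~0.75 words/token rule of thumb."""
    return round(word_count / 0.75)

print(estimate_tokens("Hello world"))      # 11 chars -> ~3 tokens
print(estimate_tokens_from_words(96_000))  # ~128,000 tokens, a full GPT-4 context
```

Working backwards gives the figure in the text: a 128K-token context holds roughly 128,000 × 0.75 ≈ 96,000 English words.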
**Key takeaway:** AI reads tokens, not words or letters. This explains many surprising limitations and why counting letters in words was historically hard for LLMs.