When you type a message to an AI, it doesn't see letters or words. It sees tokens — chunks of text that can be whole words, parts of words, or even single characters. The model was trained on tokens, thinks in tokens, and generates tokens.
**How tokenization works:**
Most modern LLMs use Byte Pair Encoding (BPE). Training starts from individual characters (or bytes) and repeatedly merges the most frequent adjacent pair, so common words end up as single tokens ('the', 'and', 'is') while rare words are split into pieces ('tokenization' → 'token', 'ization'). Numbers are often split into short digit chunks, and uncommon Unicode characters may take multiple tokens.
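The merge loop at the heart of BPE training can be sketched in a few lines of Python. This is a toy trainer over a four-word corpus; real tokenizers train over byte sequences from huge corpora, but the mechanism is the same:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent pair."""
    # Start with each word as a sequence of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for seq, freq in vocab.items():
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_vocab = Counter()
        for seq, freq in vocab.items():
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(seq[i] + seq[i + 1])
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "lowest"], 2)
print(merges)  # [('l', 'o'), ('lo', 'w')] — frequent pairs become tokens first
```

After two merges, 'low' is already a single token while the rarer suffixes '-er' and '-est' remain split, which is exactly the common-word/rare-word behavior described above.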
Examples with GPT-4's tokenizer (exact splits vary between tokenizer versions):
- 'Hello world' = 2 tokens
- 'Antidisestablishmentarianism' = 7 tokens
- '2024' = 1 token
- 'aaabbbccc' = 3 tokens
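At inference time, segmentation can be approximated with greedy longest-match against the learned vocabulary. Real BPE replays the learned merge rules instead, but the effect is similar; the vocabulary below is hypothetical:

```python
def tokenize(text, vocab):
    """Greedy longest-match segmentation against a fixed vocabulary.
    (An approximation: real BPE applies merge rules in learned order.)"""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i, then fall back to shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit as a single token
            i += 1
    return tokens

# Hypothetical vocabulary in which repeated runs are common enough to be tokens.
vocab = {"aaa", "bbb", "ccc", "Hello", " world"}
print(tokenize("aaabbbccc", vocab))    # ['aaa', 'bbb', 'ccc']
print(tokenize("Hello world", vocab))  # ['Hello', ' world']
```

Note the leading space in ' world': most BPE vocabularies attach whitespace to the start of the following token, which is why 'Hello world' costs two tokens, not three.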
**Why this matters:**
- **Cost**: APIs charge per token. Know your token count.
- **Length limits**: Context windows are measured in tokens, not words.
- **The 'strawberry' problem**: LLMs famously struggled to count letters in words because the model sees tokens, not letters. 'strawberry' is 3 tokens ('str', 'aw', 'berry') — the model never 'sees' individual letters during standard generation. Newer models with extended thinking have largely fixed this.
- **Arithmetic failures**: '1 + 1' is fine. Arithmetic on large numbers is less reliable because digits are grouped into tokens that don't align with place value, so the model must reason across token boundaries, carries included.
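Both failure modes come from the same cause: the model manipulates opaque token IDs, not characters. A small illustration, assuming a cl100k-style split of 'strawberry' and a groups-of-up-to-three digit rule (both are assumptions about specific tokenizers, not guarantees):

```python
import re

# Commonly cited GPT-4-style split of 'strawberry': the three 'r's are
# scattered across tokens, and each token is an opaque ID to the model.
tokens = ["str", "aw", "berry"]
per_token_rs = {t: t.count("r") for t in tokens}
print(per_token_rs)                # {'str': 1, 'aw': 0, 'berry': 2}
print("".join(tokens).count("r"))  # 3 — trivial with character access

def chunk_digits(number_string):
    """Split a digit string into groups of up to three digits, left to right,
    mimicking (as an assumption) how some modern tokenizers segment numbers."""
    return re.findall(r"\d{1,3}", number_string)

# The chunks don't align with place value, so carries must cross token
# boundaries: adding 1 to 999999 (['999', '999']) changes every token.
print(chunk_digits("1234567"))  # ['123', '456', '7']
```

With character-level access both tasks are trivial; the model's difficulty is purely a consequence of its input representation.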
**Practical rule of thumb**: 1 token ≈ 0.75 English words, 4 characters. GPT-4's 128K context = ~96,000 words = ~190 pages.
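The rule of thumb translates directly into a back-of-the-envelope estimator. This is a sketch only; for billing or hard limits, count with the provider's real tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text):
    """Rough token estimate from the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count):
    """Rough token estimate from the ~0.75 words/token rule of thumb."""
    return round(word_count / 0.75)

print(estimate_tokens("Hello world"))      # 11 chars -> ~3 tokens
print(estimate_tokens_from_words(96_000))  # ~128,000 tokens, a full GPT-4 context
```

Working backwards gives the figure in the text: a 128K-token context holds roughly 128,000 × 0.75 ≈ 96,000 English words.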
**Key takeaway:** AI reads tokens, not words or letters. This explains many surprising limitations and why counting letters in words was historically hard for LLMs.