Hallucination sounds like a malfunction, but it's actually the model doing exactly what it was trained to do — just in a context where that behavior goes wrong.
LLMs generate text by predicting likely next tokens. They have no ground-truth database to check against. When asked about something obscure or at the edge of their training data, they **generate a plausible-sounding continuation** based on patterns, and that plausible continuation is often wrong.
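A minimal sketch of this failure mode: a toy "model" that only knows a probability distribution over continuations and samples from it with no truth check. The context string and the distribution here are invented for illustration, not taken from any real model.

```python
import random

# Hypothetical next-token distribution "learned" from patterns.
# Note there is no flag marking which continuation is actually true.
NEXT_TOKEN_PROBS = {
    "The capital of Freedonia is": {
        "Paris": 0.40,      # plausible-sounding but wrong
        "Fredville": 0.35,
        "unknown": 0.25,
    }
}

def sample_next_token(context: str) -> str:
    """Sample a continuation weighted by probability -- no truth check."""
    dist = NEXT_TOKEN_PROBS[context]
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# The model always produces *something* fluent; verifying it is not
# part of the generation process.
print(sample_next_token("The capital of Freedonia is"))
```

The point of the sketch: "hallucination" is just this sampling step landing on a fluent but false continuation.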
**Why it's hard to fix:**
- The model can't distinguish 'I know this' from 'I'm guessing this'
- Training data has noise — the model may have learned incorrect facts
- Models are incentivized to sound helpful, not to say 'I don't know'
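One partial workaround for the first point is to inspect the model's token-level probabilities where an API exposes them: a flat, high-entropy next-token distribution is weak evidence the model is guessing. A sketch of that heuristic, using invented distributions for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions, for illustration only:
confident = [0.95, 0.03, 0.02]        # mass concentrated on one token
guessing = [0.30, 0.25, 0.25, 0.20]   # mass spread across alternatives

print(entropy(confident))  # low entropy: model strongly prefers one token
print(entropy(guessing))   # high entropy: a rough "I'm guessing" signal
```

This is a heuristic, not a fix: a model can be confidently wrong, so low entropy does not guarantee truth.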
**Active mitigation strategies:**
1. **RAG**: Ground responses in retrieved documents. If the answer is in the context, the model usually uses it correctly.
2. **Self-consistency**: Sample multiple completions, pick the most consistent answer. Reduces random errors.
3. **Chain-of-thought**: Prompting the model to reason step by step can reduce hallucinations on factual and multi-step tasks.
4. **Citations**: Force the model to cite sources. Uncited claims are more likely hallucinated.
5. **Constitutional AI / RLHF**: Train models to refuse rather than guess.
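Strategy 2 above can be sketched as a majority vote over sampled answers. Here `sample_answer` is a hypothetical stand-in for one model call at nonzero temperature, with a simulated answer distribution; a real implementation would call an actual LLM.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one sampled LLM completion.
    Simulated: the correct answer dominates, random errors are scattered."""
    return random.choice(["42", "42", "42", "41", "43"])

def self_consistent_answer(question: str, n_samples: int = 25) -> str:
    """Sample n completions and return the most common final answer."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("What is 6 x 7?"))
```

The intuition: hallucinations from random sampling noise tend to disagree with each other, while the answer the model "knows" recurs across samples, so the vote filters out scattered errors (though not systematic ones the model repeats every time).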
Newer models hallucinate measurably less: GPT-4 hallucinates far less than GPT-3.5 on factuality benchmarks. But the problem is not solved. For high-stakes applications (medical, legal), human review is still essential.
**Key takeaway:** Hallucination happens because LLMs predict plausible text, not true text. RAG, chain-of-thought, and citations are the best defenses.