A 2023 Stanford study made a surprising discovery: when researchers placed a relevant document somewhere in a long context and asked the model to use it, accuracy depended strongly on position. Documents at the start of the context were used reliably, as were documents at the end. Documents in the middle were often missed entirely, even though the information was right there in the context. This "lost in the middle" effect persists across model families and has been replicated with progressively longer-context models.

The mechanism appears to be attention distribution: during training, models develop attention patterns that heavily weight the beginning (system prompt) and recent tokens (current question), while middle-context tokens receive less attention on average.

The practical implications are significant:

- For RAG systems: the order in which you place retrieved documents matters. Put the most relevant retrievals at the top and bottom of the retrieved set, not in random order.
- For long-document QA: don't rely on the model to find a needle in the middle of a 500-page document. Pre-filter or summarize before passing content to the model.
- For agentic systems with long conversation histories: critical instructions placed at the start of the conversation degrade in influence over long runs; periodic instruction-refresh prompts help.

The broader lesson: bigger context windows are powerful, but they are not a replacement for good information architecture. Retrieval-augmented approaches that surface only the most relevant content often outperform simply cramming everything into a massive context window.
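The RAG ordering advice above can be sketched as a small reordering step. This is a minimal illustration, not a library API: the function name and the assumption that documents arrive sorted most-relevant-first are both hypothetical. It alternately assigns documents to the front and back of the context, so the least relevant items end up in the middle where attention is weakest.

```python
def order_for_long_context(docs_by_relevance):
    """Reorder retrieved documents for a long prompt.

    Assumes `docs_by_relevance` is sorted most-relevant-first (hypothetical
    convention for this sketch). The most relevant docs are pushed to the
    edges of the context; the least relevant land in the middle, where the
    'lost in the middle' effect makes them most likely to be ignored anyway.
    """
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate: rank 1 -> front, rank 2 -> back, rank 3 -> front, ...
        (front if i % 2 == 0 else back).append(doc)
    # Reverse `back` so relevance increases again toward the end of the context.
    return front + back[::-1]


# Example with relevance ranks standing in for documents:
# ranks 1 and 2 sit at the very start and very end; rank 5 is buried mid-context.
print(order_for_long_context([1, 2, 3, 4, 5]))  # [1, 3, 5, 4, 2]
```

The same idea appears in some RAG toolkits as a "long-context reorder" transform; the exact interleaving scheme is a design choice, but any scheme that keeps top-ranked documents away from the middle serves the same purpose.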
Lost in the Middle: Why Bigger Context Windows Aren't Always Better
Research shows that LLMs perform significantly worse when the information they need sits in the middle of a long context — better at the beginning, better at the end, worse in the middle. This 'lost in the middle' effect has real implications for how to structure prompts in long-context applications.
context-window · lost-in-the-middle · long-context