A 200,000-token context window sounds enormous until you run a research agent for 30 minutes that makes 50 tool calls, each returning 2,000 tokens of results. That is 100,000 tokens of tool output alone, half the window, before counting the system prompt, reasoning traces, and intermediate outputs. Context management is the discipline of keeping the agent's working memory useful and within budget across long task horizons. Four strategies are used in practice.

Sliding window: drop the oldest turns once the context exceeds a fixed length. Simple to implement, but the agent can lose critical early context, such as the original goal specification.

Summarization: periodically compress older context segments into dense summaries using a separate LLM call. Preserves key information at lower token cost, but adds latency and risks losing nuance in compression.

Selective retention: score context segments by relevance and drop the low-relevance segments first. Requires a scoring mechanism, but tends to hold up much better than a plain sliding window on long-horizon tasks.

Offload to external memory: store detailed tool results in a vector store and retrieve them on demand when relevant, keeping the active context lean. This hybrid short-term/long-term memory architecture mirrors how human working memory interacts with long-term recall.

KV cache optimization at the infrastructure level is also relevant. Many providers cache the computation for repeated prompt prefixes, so structuring an agent's prompts with a stable prefix (system prompt, tool definitions) and a variable suffix can significantly reduce per-step latency and cost on long runs.
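To make the difference between a sliding window and selective retention concrete, here is a minimal Python sketch. It is illustrative only: the `Segment` type, the `relevance` score, the `pinned` flag, and the token counts are all assumptions made for this example, not part of any particular agent framework. In a real system the token counts would come from a tokenizer and the relevance score from whatever scoring mechanism you choose.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    tokens: int           # token count, assumed precomputed by a tokenizer
    relevance: float      # hypothetical score in [0, 1]; higher means keep longer
    pinned: bool = False  # pinned segments (e.g. the original goal) are never dropped


def sliding_window(segments, budget):
    """Keep the most recent contiguous segments that fit in the budget."""
    kept, used = [], 0
    for seg in reversed(segments):
        if used + seg.tokens > budget:
            break  # everything older than this point is dropped
        kept.append(seg)
        used += seg.tokens
    return list(reversed(kept))


def selective_retention(segments, budget):
    """Drop the lowest-relevance unpinned segments until the context fits."""
    total = sum(s.tokens for s in segments)
    # Removal order: least relevant first; ties drop the older segment first.
    droppable = sorted(
        (i for i, s in enumerate(segments) if not s.pinned),
        key=lambda i: (segments[i].relevance, i),
    )
    dropped = set()
    for i in droppable:
        if total <= budget:
            break
        dropped.add(i)
        total -= segments[i].tokens
    # Preserve the original chronological order of what remains.
    return [s for i, s in enumerate(segments) if i not in dropped]


history = [
    Segment("GOAL: compare three vector databases", 50, 1.0, pinned=True),
    Segment("search results A", 2000, 0.2),
    Segment("search results B", 2000, 0.9),
    Segment("page dump C", 2000, 0.1),
    Segment("latest reasoning", 500, 0.8),
]

print([s.text for s in sliding_window(history, 3000)])
print([s.text for s in selective_retention(history, 3000)])
```

With a 3,000-token budget, the sliding window keeps only the two most recent segments and loses the goal, while selective retention keeps the pinned goal, the high-relevance search result, and the latest reasoning. Pinning the goal is one simple design choice for avoiding the "forgotten objective" failure; summarizing dropped segments instead of discarding them would be a natural extension.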
Context Window Management: Engineering Agent Short-Term Memory
As agents execute long tasks, their context windows fill with tool results, reasoning traces, and intermediate outputs. Managing this finite resource — deciding what to keep, summarize, compress, or offload — is a core engineering challenge that directly determines how far an agent can run before degrading or failing.
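How far an agent can run before filling its window can be estimated with simple arithmetic. The sketch below assumes every step appends a fixed number of tokens and nothing is ever trimmed. The function name and the overhead figures (a 5,000-token system prompt, 1,500 tokens of reasoning per step) are hypothetical values chosen for illustration; the 200,000-token window and 2,000-token tool results are example figures.

```python
def steps_until_full(window, fixed_overhead, tokens_per_step):
    """Whole steps that fit in the window when nothing is trimmed.

    window: total context size in tokens
    fixed_overhead: tokens always present (system prompt, tool definitions)
    tokens_per_step: tokens each step appends (tool result plus reasoning)
    """
    if tokens_per_step <= 0:
        raise ValueError("tokens_per_step must be positive")
    return max(0, (window - fixed_overhead) // tokens_per_step)


# Tool results only: 2,000 tokens per call, no other overhead.
print(steps_until_full(200_000, 0, 2_000))  # 100

# Hypothetical overheads: 5,000-token system prompt, and each step adds
# 2,000 tokens of tool output plus 1,500 tokens of reasoning.
print(steps_until_full(200_000, 5_000, 2_000 + 1_500))  # 55
```

Under these assumed overheads the window fills after roughly 55 steps, which is why a 50-call research run is already near the limit without any trimming, summarization, or offloading.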