Short-term memory in AI agents is the working information available during a single task — the conversation history, tool results, and intermediate reasoning steps held in the model's context window. Like human working memory, it's fast and immediately accessible, but strictly limited in capacity.

When you give an AI agent a task, it needs to track what it has already done, what it has found out, and what it still needs to do. This tracking happens in short-term memory, which in concrete terms is the contents of the model's context window. Every message, tool call, and tool result is appended to this running record and fed back into the model on each step. Short-term memory is why an agent can say 'based on the search I did two steps ago': that result is still in its context window.

The critical constraint is that context windows are finite. Limits vary by model and provider, from roughly 8,000 tokens to 200,000 or more, and every token in the context costs money and adds latency. For long-running tasks, the agent's short-term memory can fill up, forcing choices about what to drop, summarize, or offload to long-term storage.

Short-term memory is also ephemeral: it resets completely between sessions. This is why agents that need to 'remember' things across conversations require an explicit long-term memory layer backed by a database or vector store. Understanding short-term memory as a finite, session-scoped resource is foundational to diagnosing why agents fail on long tasks and to architecting around that constraint.
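To make the "finite running record" idea concrete, here is a minimal sketch of a short-term memory buffer with a token budget. Everything in it is illustrative and not taken from any particular agent framework: the names `ShortTermMemory`, `Message`, and `estimate_tokens` are hypothetical, and the token count is a crude approximation of about four characters per token, where a real system would use the model's own tokenizer. The eviction policy shown (drop the oldest messages first) is only one of the options mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption: ~4 characters per token)."""
    return max(1, len(text) // 4)


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", "tool"
    content: str


@dataclass
class ShortTermMemory:
    """Hypothetical running record of one session, capped at max_tokens."""
    max_tokens: int
    messages: list[Message] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(estimate_tokens(m.content) for m in self.messages)

    def append(self, role: str, content: str) -> list[Message]:
        """Add a message, then evict the oldest ones until the budget fits.

        Returns the evicted messages so the caller can summarize them or
        write them to long-term storage instead of silently losing them.
        """
        self.messages.append(Message(role, content))
        evicted: list[Message] = []
        # Always keep the newest message, even if it alone exceeds the budget.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            evicted.append(self.messages.pop(0))
        return evicted


if __name__ == "__main__":
    memory = ShortTermMemory(max_tokens=10)
    memory.append("user", "a" * 16)                  # ~4 tokens
    memory.append("tool", "b" * 16)                  # ~4 tokens, ~8 in total
    dropped = memory.append("assistant", "c" * 16)   # over budget, oldest is evicted
    print([m.role for m in dropped])
    print([m.role for m in memory.messages])
```

Creating a new `ShortTermMemory` for each session mirrors the ephemeral behavior described above: nothing carries over unless the evicted or final messages are explicitly persisted elsewhere.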

What is Short-Term Memory in AI Agents?