How Much Memory Do AI Agents Need?
The Detailed Answer
People asking how much memory an agent needs are usually picturing a single quantity, like the amount of storage to provision. But memory in an agent works at two very different scales, and conflating them is the source of most confusion. One scale is the durable store, the database of everything the agent has ever saved, which can be enormous and is cheap to grow. The other is the working set, the memories actually pulled into the prompt for a given turn, which is tightly limited and expensive to expand.
The amount that matters for quality is almost always the second one. An agent can have a store of a million memories and still need only three of them to answer the question in front of it. The whole challenge is selecting those three, which is a retrieval problem, not a storage problem, and is covered in memory retrieval strategies. So the practical answer to how much memory an agent needs is: store as much as is useful, but retrieve and inject only the small, relevant slice each task actually requires.
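As a rough illustration of why this is a selection problem, the sketch below picks a small working set out of a store. The scoring function is a toy word-overlap measure chosen only to keep the example self-contained; a real system would more likely use embeddings or a search index. The names `score` and `retrieve_top_k` and the sample memories are hypothetical.

```python
import heapq

def score(query: str, memory: str) -> float:
    """Toy relevance score (an assumption): share of query words found in the memory."""
    query_words = set(query.lower().split())
    memory_words = set(memory.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & memory_words) / len(query_words)

def retrieve_top_k(query: str, store: list[str], k: int = 3) -> list[str]:
    """Return up to k of the highest-scoring memories, skipping zero-score ones."""
    scored = ((score(query, memory), memory) for memory in store)
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [memory for s, memory in best if s > 0]

# Hypothetical store: it could hold a million entries; only k are ever injected.
store = [
    "user prefers metric units",
    "user lives in Lisbon",
    "project deadline is Friday",
    "user is allergic to peanuts",
]

working_set = retrieve_top_k("what city does the user live in", store, k=2)
print(working_set)
```

The size of `store` never appears in the prompt; only `k` does, which is why the store can grow freely while the working set stays small.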
Two Different Questions: Storage and Context
Separating the two scales cleanly is the key to reasoning about agent memory capacity. Storage is about how much the agent can keep, and here the answer is generous: keep anything durably useful, because space is cheap and a larger store means more chances to recall the right thing later. The discipline at the storage scale is not limiting size but maintaining quality: pruning noise and stale entries so the store stays sharp, as covered in maintaining agent memory over time.
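One way to picture quality maintenance is a periodic pruning pass. The sketch below is a minimal version under stated assumptions: each memory carries a last-used day and a usefulness score, and the thresholds (`max_idle_days`, `min_usefulness`) are hypothetical values that would need tuning. The `Memory` fields and the `prune` function are illustrative, not a description of any particular memory system.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    last_used_day: int   # hypothetical: day number when this memory was last retrieved
    usefulness: float    # hypothetical: 0.0 (noise) to 1.0 (clearly valuable)

def prune(store: list[Memory], today: int,
          max_idle_days: int = 180, min_usefulness: float = 0.2) -> list[Memory]:
    """Keep memories that are both recently used and above the usefulness floor."""
    return [
        m for m in store
        if today - m.last_used_day <= max_idle_days and m.usefulness >= min_usefulness
    ]

store = [
    Memory("user prefers metric units", last_used_day=390, usefulness=0.9),
    Memory("temporary venue for last year's offsite", last_used_day=100, usefulness=0.9),
    Memory("user said 'ok thanks'", last_used_day=395, usefulness=0.05),
]

kept = prune(store, today=400)
print([m.text for m in kept])
```

Note that the pass filters on quality signals, not on a size cap: a store of any size passes through unchanged if every entry is still fresh and useful.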
Context is about how much the agent can consider at once, and here the answer is frugal: inject only what the current task needs. The context window is a fixed, shared, and relatively scarce resource, and memory is just one claimant on it. Spending it wisely means retrieving a small, high-relevance set rather than flooding the prompt. This frugality is exactly what adaptive approaches automate, scaling the amount retrieved to the difficulty of each turn, as described in adaptive recall. Hold these two scales apart and the apparent paradox (store a lot, but use a little) resolves into common sense.
It helps to put rough proportions on it. A context window holds a fixed budget of tokens, and those tokens are spent on the system instructions that define the agent, the running conversation, any tools and their descriptions, and the room the model needs to actually reason. Memory competes for whatever is left. In that light, injecting fifty retrieved memories to answer a simple question is like emptying a filing cabinet onto your desk to find one address: the sheer volume makes the task harder, not easier. A few well-chosen memories leave room for everything else the prompt must carry and let the model concentrate on producing a good answer.
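The budget arithmetic can be made concrete with a small sketch. Every number below is hypothetical, and `rough_token_count` is a deliberately crude stand-in (one token per whitespace-separated word); real tokenizers count differently. The point is only the shape of the calculation: memory gets what remains after the other claimants are paid.

```python
def memory_budget(window_tokens: int, system_tokens: int, conversation_tokens: int,
                  tool_tokens: int, reasoning_reserve: int) -> int:
    """Tokens left for memory once every other claimant on the window is counted."""
    used = system_tokens + conversation_tokens + tool_tokens + reasoning_reserve
    return max(window_tokens - used, 0)

def rough_token_count(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_budget(ranked_memories: list[str], budget: int) -> list[str]:
    """Take memories in ranked order and stop at the first one that would overflow."""
    chosen, spent = [], 0
    for memory in ranked_memories:
        cost = rough_token_count(memory)
        if spent + cost > budget:
            break
        chosen.append(memory)
        spent += cost
    return chosen

# Hypothetical allocation for an 8,000-token window.
budget = memory_budget(
    window_tokens=8000,
    system_tokens=1200,
    conversation_tokens=3000,
    tool_tokens=800,
    reasoning_reserve=2000,
)
print(budget)  # 1000 tokens remain for injected memories
```

Passing a relevance-ranked list through `fit_to_budget` then caps the injected set by what the window can actually spare, not by how much the store happens to contain.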
This is also why larger context windows have not made memory systems obsolete. As windows grow, it becomes tempting to simply pour in more, but the same dynamics apply at every size: relevance still beats volume, cost and latency still climb with every token, and models still attend better to a focused prompt than a sprawling one. A bigger window raises the ceiling on how much an agent can consider at once, but it does not change the goal, which remains retrieving the right information rather than the most information.
Finding the Right Amount for Your Agent
In practice, finding the right amount is an empirical exercise rather than a formula. Start by injecting a small number of retrieved memories, perhaps three to five, and measure whether the agent has what it needs to answer well. If it frequently lacks relevant information, the problem is usually retrieval quality rather than too little memory, so improve the search before reaching for a bigger slice. If answers are unfocused or the agent fixates on tangents, you are likely injecting too much, and trimming the set will help.
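A simple way to run that experiment is to sweep the number of injected memories and check how often the memory the agent needed actually made it into the set. The sketch below assumes a small hand-labelled evaluation set of (query, needed memory) pairs and a `retrieve(query, k)` callable; `toy_retrieve` and its fixed rankings are hypothetical stand-ins for a real retriever.

```python
from typing import Callable

def recall_at_k(eval_set: list[tuple[str, str]],
                retrieve: Callable[[str, int], list[str]],
                k: int) -> float:
    """Fraction of queries whose needed memory appears among the top k retrieved."""
    hits = sum(1 for query, needed in eval_set if needed in retrieve(query, k))
    return hits / len(eval_set)

# Hypothetical fixed rankings standing in for a real retriever's output.
RANKINGS = {
    "q1": ["needed-1", "a", "b", "c", "d", "e"],   # needed memory ranked first
    "q2": ["a", "b", "c", "needed-2", "d", "e"],   # needed memory ranked fourth
    "q3": ["a", "b", "c", "d", "e", "f"],          # needed memory never surfaces
}

def toy_retrieve(query: str, k: int) -> list[str]:
    return RANKINGS[query][:k]

eval_set = [("q1", "needed-1"), ("q2", "needed-2"), ("q3", "needed-3")]

results = {k: recall_at_k(eval_set, toy_retrieve, k) for k in (3, 5, 10)}
for k, recall in results.items():
    print(f"k={k}: recall={recall:.2f}")
```

In this toy data, recall rises from k=3 to k=5 and then stops improving: the remaining miss is a memory the retriever never ranks at all, so injecting more would only add noise. That plateau is the signal to work on the search rather than the slice size.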
The right amount also varies by task within the same agent, which is why a fixed number is rarely optimal. A simple factual lookup needs almost nothing, while a complex question spanning several stored facts needs more. Matching the amount to the moment, rather than always injecting the same quantity, is what separates an efficient memory system from a wasteful one. The total store, meanwhile, should be allowed to grow as long as maintenance keeps it clean, since a big, well-tended store is an asset while a big, noisy one is a liability. The underlying loop that moves memory between store and context is laid out in how memory systems work.
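One simple way to match the amount to the moment is to let relevance scores, not a fixed count, decide how many memories are injected. The sketch below is one possible approach under stated assumptions: scores are already computed on a 0 to 1 scale, and `min_score` and `max_k` are hypothetical tuning knobs. It is not a description of any specific adaptive recall implementation.

```python
def select_adaptive(scored: list[tuple[str, float]],
                    min_score: float = 0.5, max_k: int = 8) -> list[str]:
    """Inject every memory above the relevance floor, up to a hard cap of max_k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, s in ranked if s >= min_score][:max_k]

# Hypothetical scores for a simple factual lookup: one memory clearly matches.
simple_lookup = [
    ("user lives in Lisbon", 0.92),
    ("user prefers metric units", 0.21),
    ("project deadline is Friday", 0.08),
]

# Hypothetical scores for a question spanning several stored facts.
complex_question = [
    ("project deadline is Friday", 0.81),
    ("client asked for a demo before launch", 0.77),
    ("demo environment takes two days to provision", 0.70),
    ("user prefers metric units", 0.12),
]

print(len(select_adaptive(simple_lookup)))     # 1
print(len(select_adaptive(complex_question)))  # 3
```

The same function returns one memory for the lookup and three for the multi-fact question, so the working set grows only when the task gives it a reason to.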
An agent needs a store that can be as large as is useful, but a working set that stays small. Storage is cheap and abundant, so keep anything durably valuable and maintain its quality; the context window is scarce, so inject only the few most relevant memories each task needs. More stored memory can help, but more injected memory usually hurts, which makes retrieval quality, not raw volume, the thing that actually determines how well an agent performs.