Memory Retrieval Strategies: Keyword, Vector, Hybrid

Updated May 2026
Memory retrieval is the part of an agent's memory system that finds the right past information when a new task arrives, and there are three core strategies for doing it. Keyword retrieval matches exact terms and excels at precise identifiers. Vector retrieval matches meaning and excels at different wordings of the same idea. Hybrid retrieval runs both and merges their results, capturing the strengths of each. On top of these, a reranking step reorders the candidates for true relevance, and filtering by metadata and recency narrows the search to what actually applies. Because a memory the agent fails to retrieve is worthless no matter how well it was stored, choosing and tuning these strategies is where most of the engineering effort in a memory system goes.

Why Retrieval Is the Crux of Memory

Storing information is the easy half of memory; retrieving the right piece at the right moment is the hard half, and it is where memory systems succeed or fail. An agent can faithfully store every important fact and still seem forgetful if its retrieval never surfaces those facts when they are needed. From the user's perspective, a memory that is not retrieved does not exist. This is why serious memory engineering concentrates on retrieval quality rather than on storage, which is largely a solved problem.

Retrieval is hard because the system must select a handful of genuinely useful memories from a store that may hold millions, using only the current query as a guide, and it must do so in milliseconds and within a strict limit on how much it can return. Every strategy below is an attempt to improve the odds that the few memories pulled back are the ones that actually help. Understanding how they differ, and how they combine, is what lets you move an agent from frustratingly forgetful to reliably knowledgeable. These strategies sit in the retrieve stage of the broader loop described in how memory systems work.

Keyword Retrieval: Matching Exact Terms

Keyword retrieval, also called lexical search, is the classic approach: find memories that contain the same words as the query. Modern keyword search is more sophisticated than simple word matching. Scoring functions such as BM25 rank results by weighing how rare each query term is across the whole store against how often it appears in a given memory. The principle remains, however, that it matches on the literal text. Its great strength is precision on exact terms. If a query contains a specific error code, product name, account number, or unusual proper noun, keyword search reliably finds the memories containing that exact token, with no risk of the term being blurred away.

The weakness of keyword retrieval is that it is blind to meaning. It cannot connect a query about cancelling a subscription to a memory about ending a recurring plan, because the two share no meaningful words, even though they mean the same thing. For agent memory, where users and the agent itself naturally phrase the same idea in countless ways, this blindness is a serious limitation when keyword search is used on its own. Keyword search is therefore rarely used alone in modern systems, but it remains indispensable as one half of a hybrid approach, precisely because it catches the exact terms that meaning-based search misses.
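To make both the strength and the blind spot concrete, here is a minimal sketch of keyword scoring using the BM25 formula, one common scoring function of this kind. The whitespace tokenizer, the `k1` and `b` defaults, and the sample memories are illustrative assumptions, not taken from any particular system.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, enough for a sketch.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

memories = [
    "user hit error E4012 during checkout",
    "user wants to end a recurring plan",
    "user asked about checkout flow",
]
exact_scores = bm25_scores("E4012", memories)
paraphrase_scores = bm25_scores("cancel subscription", memories)
```

In this sketch the error code matches only the first memory, while the paraphrased query scores zero against every memory, including the one about ending a recurring plan.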

Vector Retrieval: Matching Meaning

Vector retrieval is the meaning-based counterpart to keyword search. It converts both the stored memories and the query into embeddings and finds the memories whose vectors are closest to the query's, so it matches on semantic similarity rather than shared words. This is what lets an agent connect differently worded expressions of the same idea, and it is the default backbone of modern agent memory because natural language is so variable. The mechanics of embeddings and similarity are covered in detail in vector search and embedding models.

Vector retrieval has its own blind spots that mirror keyword search's strengths. It can blur precise terms, treating two different product codes as nearly identical because their embeddings land close together, and it can rank a memory as relevant simply because it is on the same topic, even when it does not actually answer the query. It also depends entirely on the quality of the embedding model, which sets a hard ceiling on how well it can ever perform. These weaknesses are exactly the ones keyword search does not share, which is the entire motivation for combining the two.
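The core of vector retrieval can be sketched as a nearest-neighbor lookup by cosine similarity. The three-dimensional vectors below are hand-made stand-ins for illustration only; a real system would obtain them from an embedding model, and the memory IDs and the `vector_search` helper are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vector_search(query_vec, memory_vecs, k=2):
    """Return the k (memory_id, similarity) pairs closest to the query."""
    scored = [(mem_id, cosine(query_vec, vec)) for mem_id, vec in memory_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Assumption: toy embeddings, not produced by any real model.
memory_vecs = {
    "end_recurring_plan": [0.9, 0.1, 0.0],
    "checkout_error": [0.1, 0.9, 0.2],
    "shipping_address": [0.0, 0.2, 0.9],
}
# Stand-in embedding for a query like "cancel my subscription".
query_vec = [0.8, 0.2, 0.1]
top = vector_search(query_vec, memory_vecs, k=1)
```

Here the query lands closest to the memory about ending a recurring plan even though the two share no wording, which is the behavior keyword search cannot provide. A brute-force scan like this is only workable for small stores; large stores would typically rely on an approximate nearest-neighbor index.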

Hybrid Retrieval: Combining Both

Hybrid retrieval runs keyword and vector search together and merges their results, and it typically outperforms either method alone. The logic is simple: the two approaches fail in opposite ways, so combining them lets each cover the other's weakness. Vector search supplies the meaning-based matches that bridge different wordings, while keyword search guarantees that exact terms are caught, and the merged result captures both the gist and the specifics of what the query is asking for.

The practical challenge in hybrid retrieval is combining two different kinds of score into one ranking, since keyword relevance and vector similarity are measured on different scales. Common techniques either normalize and blend the scores or, as in reciprocal rank fusion, combine the rank positions from each search rather than the raw scores. Many modern vector databases offer hybrid search as a built-in feature, so adopting it is usually a configuration choice rather than a custom build. For the majority of agent memory systems, hybrid retrieval is the right default, delivering robust recall across the full range of queries an agent encounters.
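One widely used rank-based merging method is reciprocal rank fusion (RRF), which sidesteps the scale mismatch by ignoring raw scores entirely. The sketch below assumes each search returns an ordered list of memory IDs; the IDs are hypothetical, and the constant `k=60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of memory IDs into one ranking.

    Each memory earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            fused[mem_id] = fused.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda mem_id: fused[mem_id], reverse=True)

# Hypothetical results from the two searches, best match first.
keyword_hits = ["m3", "m1"]
vector_hits = ["m1", "m2", "m3"]
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
# merged == ["m1", "m3", "m2"]
```

Memories found by both searches ("m1" and "m3") outrank the one found by only a single search, which is usually the behavior you want from a hybrid merge.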

Reranking: The Precision Layer

Even a good hybrid search returns a rough ranking, because the initial retrieval is optimized for speed over a huge store, not for perfect judgment of relevance. Reranking adds a second, more careful pass: the system takes the top candidates from the first retrieval, perhaps the best twenty or fifty, and scores each one against the query with a more powerful model, typically a cross-encoder that reads the query and the memory together, which can weigh subtle relevance the fast first pass cannot. It then keeps only the few best for injection.

This two-stage pattern, fast retrieval followed by careful reranking, is one of the highest-impact improvements available to a memory system. The first stage casts a wide net cheaply, and the second stage applies expensive judgment to a small set, combining broad recall with sharp precision. The cost is the extra latency and compute of the reranking model, applied only to a handful of candidates, which is usually well worth it. Reranking is often what separates a memory system that returns vaguely related memories from one that returns exactly the right ones, and it pairs naturally with the budget decisions discussed in how much memory agents need.
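The shape of the second stage can be sketched as a function that takes the first-stage candidates and a pluggable scorer. In practice the scorer would be a stronger relevance model such as a cross-encoder; the `overlap_score` function here is only a cheap stand-in so the example runs without any model, and all names are hypothetical.

```python
def rerank(query, candidates, score_fn, keep=3):
    """Rescore first-stage candidates and keep only the best few."""
    ordered = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ordered[:keep]

def overlap_score(query, text):
    # Stand-in scorer (assumption): fraction of query words found in the memory.
    # A real reranker would call a relevance model here instead.
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

# Hypothetical candidates as returned by a fast first-stage search.
candidates = [
    "billing plan renewed in march",
    "user cancelled the billing plan",
    "user likes dark mode",
]
best = rerank("cancelled billing plan", candidates, overlap_score, keep=2)
```

Keeping the scorer as a parameter means the expensive model is called only on the short candidate list, which is what keeps the added latency bounded.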

Filtering, Recency, and the Context Budget

Two more levers shape retrieval beyond the core search. Metadata filtering restricts results to memories that meet exact criteria. The most important criterion is the current user, so that one person's memories are never surfaced in another's session, but filters can also cover date ranges, sources, or categories. Filtering is applied alongside the search and is essential both for correctness and for the privacy isolation that any multi-user memory system requires. Recency weighting tilts results toward newer memories, which matters because in many situations the latest information is the most relevant and the most likely to be current.
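Both levers can be sketched together: a hard filter on the user ID, followed by a recency weight applied to each remaining memory's search score. The `Memory` fields, the exponential half-life decay, and the 30-day default are illustrative assumptions; other decay shapes are equally plausible.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    user_id: str
    text: str
    relevance: float  # score from the search stage (assumed to be in [0, 1])
    age_days: float

def filter_and_weight(memories, user_id, half_life_days=30.0):
    """Drop other users' memories, then rank by relevance decayed with age."""
    weighted = []
    for mem in memories:
        if mem.user_id != user_id:
            continue  # hard filter: never surface another user's memory
        decay = 0.5 ** (mem.age_days / half_life_days)  # halves every half-life
        weighted.append((mem.relevance * decay, mem))
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [mem for _, mem in weighted]

store = [
    Memory("alice", "lives in Berlin", relevance=0.9, age_days=60),
    Memory("alice", "moved to Lisbon", relevance=0.8, age_days=0),
    Memory("bob", "lives in Paris", relevance=0.95, age_days=0),
]
results = filter_and_weight(store, user_id="alice")
```

In this example the newer Lisbon memory outranks the older Berlin one despite a slightly lower raw relevance, and Bob's memory never appears. In a real store the user filter would normally be pushed into the search query itself rather than applied to a list afterwards.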

All of these strategies ultimately serve one constraint: the limited budget of how much memory can be injected into the context window. Retrieval is not about finding everything relevant but about finding the small, high-value set that fits the budget and genuinely helps. Keyword and vector search find candidates, hybrid merging balances them, reranking sharpens the order, and filtering and recency trim to what applies, all so that the few memories that reach the model are the right ones. Tuning this pipeline against real queries is an ongoing operational task, not a one-time setting, and it is the difference between an agent that reliably recalls what it knows and one that does not.
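The final trimming step can be sketched as greedy packing: walk the ranked memories in order and keep each one that still fits the budget. Counting tokens by splitting on whitespace is a rough assumption made so the example is self-contained; a real system would use the model's own tokenizer, and the function name is hypothetical.

```python
def pack_within_budget(ranked_memories, budget_tokens, count_tokens=None):
    """Keep the highest-ranked memories whose combined size fits the budget."""
    if count_tokens is None:
        # Assumption: approximate token count by whitespace-separated words.
        count_tokens = lambda text: len(text.split())
    selected, used = [], 0
    for text in ranked_memories:
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip what does not fit; a smaller memory may still fit later
        selected.append(text)
        used += cost
    return selected

# Hypothetical memories, already ordered best-first by the earlier stages.
ranked = [
    "user prefers email over phone calls",
    "account id is A-7731",
    "user once mentioned a long story about their vacation in the mountains last year",
]
injected = pack_within_budget(ranked, budget_tokens=10)
```

With a budget of ten approximate tokens, the first two memories are injected and the long third one is left out. Skipping oversized items rather than stopping at the first one that does not fit is a design choice; stopping early preserves strict rank order at the cost of leaving budget unused.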

Key Takeaway

Agent memory retrieval rests on three core strategies: keyword search for exact terms, vector search for meaning, and hybrid retrieval that merges both for robust recall. A reranking pass then sharpens precision, while metadata filtering and recency weighting trim results to what actually applies within the context budget. Because an unretrieved memory is effectively lost, tuning this pipeline is where most of the value in a memory system is won.