What Is RAG and How Do Agents Use It

Updated May 2026
RAG, or retrieval augmented generation, is a technique that gives a language model relevant outside information at the moment it answers, by retrieving that information from an external source and adding it to the prompt. Instead of relying only on what the model absorbed during training, a RAG system searches a knowledge store for material relevant to the current question, injects the best matches into the context, and asks the model to answer using them. Agents use RAG to ground their responses in current, specific, or private information the model was never trained on, and agent memory is essentially RAG applied to an agent's own accumulated experience. It is the dominant pattern for making AI systems answer accurately from a body of knowledge they do not hold in their weights.

The Detailed Answer

A language model on its own can only draw on what it learned during training. That knowledge is broad but frozen at the moment training ended, it contains nothing private or specific to your situation, and the model cannot tell you where any particular claim came from. RAG addresses all three limits at once by adding a retrieval step before generation. When a question arrives, the system searches an external store of documents or data for the passages most relevant to that question, places those passages into the prompt, and instructs the model to answer based on them.

The effect is to separate knowledge from the model. The model supplies the language ability and reasoning, while the external store supplies the facts, which can be updated, expanded, or corrected at any time without retraining anything. This is why RAG has become the standard way to build AI systems that answer from a specific body of knowledge, whether that is a company's documentation, a user's history, or a constantly changing dataset. The retrieval machinery it relies on is the same one described under vector search.
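
The separation of knowledge from the model can be illustrated with a very small sketch. Everything here is hypothetical: the `answer` function is a fixed stand-in for a model (a plain substring lookup, not a real language model), and the store is an ordinary dictionary with made-up content. The point is only that editing the store changes the answer while the answering code stays untouched.

```python
def answer(question: str, store: dict[str, str]) -> str:
    """Fixed stand-in for a model: returns the stored fact whose topic appears in the question."""
    lowered = question.lower()
    for topic, fact in store.items():
        if topic in lowered:
            return fact
    return "Not found in the knowledge store."

# Made-up knowledge store content, for illustration only.
store = {"pricing": "The basic plan costs 10 dollars per month."}

before = answer("What is the pricing?", store)

# Correct the fact in the store. The answer function is not modified or retrained.
store["pricing"] = "The basic plan costs 12 dollars per month."

after = answer("What is the pricing?", store)

print(before)
print(after)
```

A real system would replace the dictionary with an indexed document store and the lookup with retrieval plus generation, but the update path has the same shape: change the store, not the model.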

Why do agents need RAG instead of just a bigger model?
Because a bigger model still has frozen, general knowledge and cannot know private or current information. Training a model is slow and expensive, and even the largest model knows nothing about your internal documents or what happened after its training cutoff. RAG lets a fixed model answer from fresh, specific, private material simply by retrieving it at query time, which is faster, cheaper, and more current than trying to bake every fact into the weights.

How is RAG different from fine-tuning?
Fine-tuning changes the model's weights to alter its behavior or style, while RAG leaves the model unchanged and supplies knowledge through the prompt. Fine-tuning is good for teaching the model how to respond; RAG is good for giving it what to know. They are complementary, but for keeping an agent current and grounded in specific facts, RAG is usually the right tool, since an update to a knowledge store takes effect as soon as the new content is indexed, while retraining takes far longer.

Is agent memory the same as RAG?
Agent memory is essentially RAG pointed at the agent's own experience rather than a fixed document set. Both retrieve relevant information and inject it into the prompt. The difference is the source: a classic RAG knowledge base is curated and loaded up front, while agent memory is written continuously from interactions. The retrieval mechanism is the same, which is why the two are built on identical foundations of embeddings and vector search.

How RAG Works Step by Step

RAG runs as a short pipeline each time the agent answers. First, the system takes the incoming question and converts it into an embedding, the same numeric representation of meaning used to index the knowledge store. Second, it searches the store for the passages whose embeddings are closest to the question, retrieving the handful most likely to be relevant, often refined with keyword matching and a reranking step for precision. Third, it assembles those passages into the prompt as reference material, clearly marked as context the model should use.
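
The first two steps above, embedding the question and finding the closest passages, can be sketched in a few lines. This is a minimal illustration under loud assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, `PASSAGES` is made-up handbook text, and the names `embed`, `cosine`, and `retrieve` are hypothetical. A production system would use learned embeddings and a vector index, possibly with keyword matching and reranking on top.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, passages: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every passage against the question and return the top k, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Made-up knowledge store content, for illustration only.
PASSAGES = [
    "Parental leave: employees receive sixteen weeks of paid parental leave.",
    "Expense reports must be submitted within thirty days of purchase.",
    "The office is closed on public holidays.",
]

hits = retrieve("How many weeks of parental leave do employees get?", PASSAGES)
for score, passage in hits:
    print(f"{score:.2f}  {passage}")
```

In practice the passage embeddings would be computed once at indexing time rather than on every query; they are recomputed here only to keep the sketch short.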

Fourth, the model generates an answer grounded in the supplied passages rather than from memory alone, ideally citing which source each claim came from. The quality of the whole pipeline hinges on the retrieval step, because the model can only answer well from material that was actually surfaced; if retrieval misses the relevant passage, no amount of model skill recovers it. This is why so much of building good RAG is really about building good retrieval, the subject of memory retrieval strategies, and why assembling the source material well, covered in how to build a knowledge base, matters so much.
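
The assembly and generation steps can be sketched as follows. This is an assumption-laden illustration, not a definitive implementation: `build_prompt` and `answer` are hypothetical names, the source ids and passage text are made up, and `fake_model` is a stub that stands in for a real model call so the example runs without any external service. The sketch also shows the dependency on retrieval: when nothing was retrieved, it declines rather than letting the model answer from memory.

```python
from typing import Callable

def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Assemble retrieved passages into the prompt, each tagged with its source id."""
    context = "\n".join(f"[{source}] {text}" for source, text in hits)
    return (
        "Answer using only the context below and cite the source id in brackets.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question: str, hits: list[tuple[str, str]], generate: Callable[[str], str]) -> str:
    """Generate a grounded answer, or decline when retrieval surfaced nothing."""
    if not hits:
        return "No relevant source was retrieved, so this cannot be answered from the knowledge store."
    return generate(build_prompt(question, hits))

def fake_model(prompt: str) -> str:
    """Stub for a real model call: echoes the first context line it was given."""
    first = next(line for line in prompt.splitlines() if line.startswith("["))
    return f"Per the retrieved context: {first}"

# Hypothetical retrieval result: (source id, passage text) pairs.
hits = [("handbook#parental-leave", "Employees receive sixteen weeks of paid parental leave.")]

prompt = build_prompt("How long is parental leave?", hits)
grounded = answer("How long is parental leave?", hits, fake_model)
missed = answer("How long is parental leave?", [], fake_model)

print(grounded)
print(missed)
```

Passing `generate` in as a parameter keeps the sketch independent of any particular model API; swapping the stub for a real client call would not change the assembly logic.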

A concrete example shows the value. Ask a bare model what your company's parental leave policy is, and it cannot know, because the policy lives in an internal document it never saw during training. Wrap the same model in RAG over the company handbook and the flow changes completely: the system retrieves the parental leave section, places it in the prompt, and the model answers accurately, quoting the actual policy and pointing to the document it came from. Nothing about the model itself changed, yet it went from useless to authoritative on that question purely because the right passage was retrieved and supplied at answer time. Multiply that across every internal question an organization fields, and the appeal of grounding answers in retrieved sources becomes obvious.

RAG, Memory, and Knowledge Bases

RAG is the umbrella pattern, and both agent memory and knowledge bases are specific applications of it. A knowledge base is RAG over a curated, relatively stable set of documents loaded in advance, such as product manuals or policies, giving the agent a reference library to answer from. Agent memory is RAG over information the agent writes continuously from its own interactions, such as user preferences and past outcomes, giving the agent personal recall. The two differ in where the content comes from and how often it changes, but they share the same retrieve-and-inject machinery underneath, described in how memory systems work.

Seeing them as one family clarifies how to build a capable agent. The same embedding model, vector store, and retrieval pipeline can serve both a knowledge base of reference material and a memory of personal experience, often side by side, with the agent drawing on each as the task requires. This shared foundation is why understanding RAG is so central to agent memory, and why the broader treatment of the pattern in the RAG guide connects directly to everything covered here. Master retrieval augmented generation and you have mastered the core mechanism behind both memory and knowledge in modern agents.
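
The shared foundation can be sketched with a single store that holds both kinds of content. Everything here is a hypothetical illustration: the `Store` class, the `"knowledge"` and `"memory"` labels, and the sample entries are made up, and word overlap stands in for real embedding similarity. The knowledge entries are loaded up front, the memory entry is written later as if from an interaction, and one search call draws on both.

```python
import re
from dataclasses import dataclass, field

def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

@dataclass
class Store:
    """One store for both curated knowledge and agent-written memory."""
    items: list[tuple[str, str]] = field(default_factory=list)  # (kind, text)

    def add(self, kind: str, text: str) -> None:
        self.items.append((kind, text))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return up to k entries sharing the most words with the query."""
        q = tokens(query)
        scored = [(len(q & tokens(text)), kind, text) for kind, text in self.items]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(kind, text) for _, kind, text in scored[:k]]

store = Store()

# Knowledge base: curated reference material, loaded in advance.
store.add("knowledge", "Refunds are issued within 14 days of a return.")
store.add("knowledge", "Standard shipping takes 3 to 5 business days.")

# Memory: written continuously as the agent interacts with the user.
store.add("memory", "User prefers express shipping for all orders.")

hits = store.search("Which shipping option should I pick for this order?")
for kind, text in hits:
    print(f"{kind}: {text}")
```

Tagging each entry with its kind lets the agent treat the two sources differently when needed, for example by restricting a search to memory only, while still reusing the same retrieval code.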

Key Takeaway

RAG, retrieval augmented generation, gives a language model relevant external information at answer time by retrieving it and adding it to the prompt, so the model answers from current, specific, or private knowledge it was never trained on. Agents use it to ground their responses, and agent memory is simply RAG applied to the agent's own accumulated experience. Because the model can only use what retrieval surfaces, building good RAG is mostly about building good retrieval.