Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation is an AI architecture pattern in which a system retrieves relevant documents from a knowledge base and includes them in the prompt context, grounding outputs in source material and reducing hallucination.
What is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture pattern where the system first retrieves relevant documents from a knowledge base (typically a vector database or search index), then includes those documents in the prompt sent to a language model. The retrieved context grounds the model's response in source material, reducing hallucination and enabling answers that reference specific documents. The pattern was popularized by a 2020 Facebook AI paper (Lewis et al.) but is now the dominant way enterprises deploy LLMs over internal knowledge.
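The retrieve-then-prompt flow can be sketched in a few lines. This is a toy illustration, not a production design: the bag-of-words "embedding" and cosine ranking stand in for a real embedding model and vector database, and the prompt template is an assumption.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": term counts. A real system would call a learned
    # embedding model and store vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Number the retrieved documents so the model can cite them.
    context = "\n".join(
        f"[{i + 1}] {d}" for i, d in enumerate(retrieve(query, docs))
    )
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The resulting prompt, with retrieved documents inlined, is what actually gets sent to the language model; the model itself is unchanged, which is why RAG can sit in front of any LLM.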
What buyers should ask
RAG looks deceptively simple but introduces real procurement questions. Where is the retrieval index hosted? Does it leak across customers in a multi-tenant deployment? Are retrieved documents themselves treated as trusted input, or sanitized for prompt injection? When a user query retrieves an outdated or deleted document, is that visible in the response? How is the embedding model used for retrieval kept in sync with documents that change? Mature RAG vendors answer these in their architecture documentation; less mature ones treat RAG as a checkbox feature.
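The embedding-sync question above has a common answer: re-embed only documents whose content has changed, and drop deleted ones from the index. A minimal sketch, assuming content hashes as the change signal and a placeholder `fake_embed` standing in for the real embedding call:

```python
import hashlib

def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embedding-model call (assumption).
    return [float(len(text))]

def sync_index(
    index: dict[str, tuple[str, list[float]]],
    docs: dict[str, str],
) -> dict[str, tuple[str, list[float]]]:
    """Bring the retrieval index in line with the current document set.

    index maps doc_id -> (content_hash, embedding); docs maps doc_id -> text.
    Unchanged documents reuse their cached embedding; new or edited ones are
    re-embedded; documents absent from `docs` fall out of the new index.
    """
    updated = {}
    for doc_id, text in docs.items():
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = index.get(doc_id)
        if cached and cached[0] == h:
            updated[doc_id] = cached                # unchanged: reuse
        else:
            updated[doc_id] = (h, fake_embed(text))  # new or edited: re-embed
    return updated
```

A vendor that can describe something like this, including how quickly deletions propagate, is answering the staleness question; one that cannot is likely serving retrieval results against an index that drifts from the source documents.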
RAG vs fine-tuning
RAG and fine-tuning solve overlapping but distinct problems. RAG is better when the underlying knowledge changes frequently, when you need to cite sources, or when you can't easily move sensitive data into a training pipeline. Fine-tuning is better when you need a consistent style or specialized capability that doesn't fit in a prompt window. Most enterprise deployments use both: RAG for current knowledge, fine-tuning for tone and task-specific behavior.