What is RAG (Retrieval-Augmented Generation)?

RAG, or Retrieval-Augmented Generation, lets a language model answer from your own documents instead of guessing, for more accurate and traceable answers.

Here is what RAG is, how it works step by step, and when your business needs it.

How does RAG work step by step?

RAG works in four steps: your question goes to a search system, the most relevant pieces of the knowledge source are retrieved, they are passed to the language model as context, and the model writes an answer grounded in them. The search itself usually relies on so-called embeddings.

At its core, RAG combines two kinds of memory: the model's built-in memory (what it learned during training) and an external memory (a knowledge base it consults for every question). The built-in memory is frozen at training time; the external one you can update whenever you like, without touching the model. The idea was introduced in a 2020 research paper by Patrick Lewis and colleagues.

Here is the flow:

  1. The question is turned into a search, typically by converting it into a numerical representation (an embedding).
  2. The search system retrieves the text passages from your knowledge source that sit closest to the question, often from a vector database.
  3. The retrieved passages are passed to the language model as context, along with the question itself.
  4. The model produces an answer based on the retrieved passages, not only on its training.
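The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `embed` is a toy word-count stand-in for a real embedding model, the passages are invented sample data, and all function names are hypothetical. Step 4 is left as a comment because it depends on which language model you call.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Steps 1 and 2: embed the question, then return the k closest passages."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: hand the retrieved passages to the model together with the question."""
    joined = "\n".join(f"- {p}" for p in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Invented sample knowledge source.
PASSAGES = [
    "Returns are accepted within 30 days of delivery.",
    "The support desk is open on weekdays from 9 to 17.",
    "Standard shipping takes three to five business days.",
]

question = "Within how many days are returns accepted?"
top = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, top)
# Step 4: send `prompt` to the language model of your choice and return its answer.
```

In a real system the word counts would be replaced by embeddings from a model, and the sorted list by a lookup in a vector database, but the shape of the flow stays the same.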

Two parts do the work. The retriever (the search system) finds the right information, and the generator (the language model) writes the answer. Amazon's overview of RAG describes the same split. For the search to hit the mark, the documents are usually broken into smaller chunks beforehand, so only the relevant pieces come along, not whole documents.
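Chunking can be as simple as splitting a document into overlapping windows of words. The sketch below assumes a fixed word count per chunk and a small overlap so that a sentence cut at a boundary still appears whole in one chunk; the function name and default sizes are illustrative, and real systems often split on headings or paragraphs instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, size=5, overlap=2)
```

Each chunk is then embedded and indexed on its own, so the retriever can return just the relevant piece.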

Why does RAG matter for businesses?

RAG makes AI useful on your own information. It lowers the risk of made-up answers, keeps the facts current, and lets every answer be traced back to a specific document. On top of that, you avoid retraining the model every time your data changes.

That last point is the one that matters for most companies. A language model knows nothing about your internal routines, your product range or last week's price list, because it never saw them. With RAG you point the model at your sources, and it answers from them. Google Cloud's overview highlights exactly that as RAG's main benefit: fresh, verifiable answers without retraining.
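To make the "no retraining" point concrete, here is a small sketch in which updating a document changes what is retrieved on the very next question. The `KnowledgeBase` class, the word-overlap scoring and the price-list text are all hypothetical, chosen only to keep the example self-contained.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set, used for a simple overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KnowledgeBase:
    """Tiny in-memory knowledge source keyed by document ID."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Add a document, or replace it if the ID already exists."""
        self._docs[doc_id] = text

    def search(self, question: str, k: int = 1) -> list[tuple[str, str]]:
        """Return up to k (doc_id, text) pairs sharing the most words with the question."""
        q = tokens(question)
        scored = [(len(q & tokens(text)), doc_id, text) for doc_id, text in self._docs.items()]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


kb = KnowledgeBase()
question = "What does the Basic plan cost per month?"

kb.upsert("price-list", "The Basic plan costs 100 SEK per month.")
before = kb.search(question)

# Last week's price list is replaced; the model itself is untouched.
kb.upsert("price-list", "The Basic plan costs 120 SEK per month.")
after = kb.search(question)
```

The language model never changes here. Only the document behind the `price-list` ID does, and the next retrieval picks up the new text.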

Control is an underrated advantage. You decide which sources the AI may use, and because the answer can be linked to a document, it can be reviewed. Salesforce highlights traceability as one of RAG's main business benefits, and IBM describes grounding in retrieved sources as what keeps made-up answers down. That sets RAG apart from a model that answers freely from memory, where you do not know where the information came from.
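One way to picture this control is an allow-list of sources combined with numbered citations in the prompt. The sketch below is an assumption about how that could look, not a description of any vendor's implementation; the `Passage` type, the file paths and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical document path or ID
    text: str


def filter_allowed(passages: list[Passage], allowed: set[str]) -> list[Passage]:
    """Keep only passages that come from sources you have approved."""
    return [p for p in passages if p.source in allowed]


def build_cited_prompt(question: str, passages: list[Passage]) -> tuple[str, dict[int, str]]:
    """Number each passage so the answer can cite [1], [2], ... and be traced back."""
    citations = {i: p.source for i, p in enumerate(passages, start=1)}
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(passages, start=1))
    prompt = (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return prompt, citations


ALLOWED_SOURCES = {"policies/returns.md", "pricing/price-list.md"}  # hypothetical allow-list

retrieved = [
    Passage("policies/returns.md", "Returns are accepted within 30 days of delivery."),
    Passage("drafts/old-notes.md", "Returns might move to 14 days."),
    Passage("pricing/price-list.md", "The Basic plan costs 120 SEK per month."),
]

approved = filter_allowed(retrieved, ALLOWED_SOURCES)
prompt, citations = build_cited_prompt("What is the return window?", approved)
```

The `citations` mapping is what a reviewer would use afterwards: a `[1]` in the answer points back to one specific document.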

How we build RAG

For us, the source of truth lives in version-controlled text, and RAG sits on top as a derived layer: it helps an agent find and navigate the information, but it must never quietly replace the documented source.

In practice we combine three things: structured navigation, which finds the right section and points to where an answer comes from; hybrid search, meaning vector search plus ordinary text search; and a reranker that orders the hits by relevance. Every answer has to be traceable back to its original source. The principle: structured navigation and source-tracing first, broad RAG second.

Figure: Eteya's RAG flow: a question goes via retrieval, navigation and search to agent context and a grounded answer, all resting on the version-controlled source of truth.
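Hybrid search needs some way to merge the vector ranking and the text ranking into one list. The article does not say which method is used, so the sketch below assumes reciprocal rank fusion (RRF), a common and simple choice; the document IDs are invented and the constant `k = 60` is just a conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked highly by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["doc-b", "doc-a", "doc-c"]   # hypothetical vector search result
keyword_hits = ["doc-a", "doc-d", "doc-b"]  # hypothetical text search result

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

A reranker would then take the top of the fused list and reorder it with a more careful relevance score before the passages reach the model.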

RAG or fine-tuning: what is the difference?

In short: RAG gives the model new knowledge, fine-tuning changes its behaviour. RAG feeds in facts and documents at question time, which suits current and proprietary information. Fine-tuning instead retrains the model to change its tone, format or skills. They solve different problems.

| Aspect | RAG | Fine-tuning |
| --- | --- | --- |
| What it changes | Which knowledge the model can reach | How the model behaves (tone, format) |
| Updating | Swap the documents, applies right away | Retrain, which takes time and money |
| Best for | Proprietary and current facts, citations | Style, structure, specialised tasks |
| Traceability | The answer can be linked to a source | Hard to trace where the answer came from |

For most businesses RAG is what is needed, not fine-tuning. If you want the AI to know your products, routines and documents, it is knowledge you are after, not behaviour. The two can be combined, but start with RAG: it is cheaper, faster to update, and easier to review.

When do you need RAG, and when not?

Use RAG when the answers should build on your own or current knowledge: support documents, internal policies, product catalogues, price lists. You do not need RAG for simple tasks, or when all the information you need already fits inside the question itself.

The rule of thumb is simple. If the AI must know something specific about your business, or something that changes often, RAG is the right call. If what the model already knows is enough, or the whole input fits in the prompt, RAG just adds complexity for no reason.
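The rule of thumb can be written down as a small check. This is one possible reading of it, not a definitive test: the four-characters-per-token estimate is a rough heuristic for English text (real tokenizers differ), and the function names and the token budget are hypothetical.

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: about four characters per token for English text."""
    return max(1, len(text) // 4)


def needs_rag(needs_business_knowledge: bool, documents: list[str], context_budget_tokens: int) -> bool:
    """Suggest RAG only when business knowledge is needed and it does not fit in the prompt."""
    if not needs_business_knowledge:
        return False  # what the model already knows is enough
    total = sum(rough_token_count(d) for d in documents)
    return total > context_budget_tokens  # too much to paste into every question


small_case = needs_rag(True, ["A short internal note."], context_budget_tokens=1000)
large_case = needs_rag(True, ["x" * 8000], context_budget_tokens=1000)
general_case = needs_rag(False, ["x" * 8000], context_budget_tokens=1000)
```

Knowledge that changes often is a further reason to lean towards RAG even when it would fit, since it saves you from rewriting the prompt by hand.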

In practice, RAG is often the engine behind an AI agent that answers questions about your specific business. For the bigger picture of how such agents work for businesses, read our pillar guide on AI agents, or the definition of what an AI agent is. When the agent also needs to read and write in your systems in real time, the Model Context Protocol (MCP) is often used for that part: RAG retrieves knowledge, MCP connects to systems.

Frequently asked questions

Is RAG the same thing as an AI agent?

No. RAG is a technique for retrieving knowledge, while an AI agent is a system that performs tasks and can use RAG as one part. An agent can look something up in your documents via RAG and then act on the answer, for example by making a booking, sending a reply or updating a case.

Does RAG require a vector database?

Usually, but not always. Most RAG solutions use embeddings and a vector database to find the most relevant passages. For small amounts of text, simpler search is sometimes enough. What fits depends on the volume of data and how fast and accurate the answers must be.

Does RAG eliminate made-up answers?

No, but it clearly reduces them. Grounding the answer in retrieved documents makes it more accurate and traceable. The model can still misread or stitch sources together wrongly, so important decisions still need human review of the answer.

What happens to our data in a RAG solution?

The data stays in your chosen knowledge source and is passed to the model as context for each question. Choose a provider with EU data processing and a data processing agreement. How to keep AI within data protection in practice is covered in our guide on [AI and GDPR](/en/blog/ai-agents/ai-and-gdpr-security).

What does it cost to build a RAG solution?

It depends on the data volume and integrations, but at its core it is the same kind of build as an AI agent. What such a build costs is covered in our [cost guide for AI agents](/en/blog/ai-agents/ai-agent-cost-guide).

Filip Thai, CEO & Founder

AI consultant focused on automation and AI agents for SMBs. Builds solutions that actually deliver measurable savings.
