Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation is the process of combining a user's prompt with relevant external contextual information. This enables LLMs to give more accurate answers and hallucinate less, and, combined with caching, it can also improve consistency and speed.

LLMs are usually trained on public data, so they cannot know about a company's internal memos or a user's personal emails. Asked about those, they will either be unable to answer or will hallucinate an incorrect answer. RAG solves this by retrieving the necessary contextual data and injecting it along with the prompt, giving the LLM the information it needs to answer properly.

Example

These are two cases for a simple scenario where the user asks the LLM what their name is.

Using raw LLM knowledge

Pasted image 20250630102214.png

Pasted image 20250630105329.png

Using RAG

Pasted image 20250630102236.png

Pasted image 20250630105448.png

This is a naive example of how it works. In real scenarios the user wouldn't need to input this data manually; the systems they are using would retrieve and inject it for them, as sketched below.
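A minimal Python sketch of what such a system might do behind the scenes. The `retrieve_user_context` and `call_llm` functions are hypothetical stand-ins for whatever retrieval layer and LLM client the system actually uses, and the user data is made up for illustration.

```python
# Minimal RAG flow: fetch relevant context, inject it into the prompt,
# then send the augmented prompt to the model.
# `retrieve_user_context` and `call_llm` are hypothetical stand-ins.

def retrieve_user_context(user_id: str, question: str) -> list[str]:
    # A real system would query files, a database or a vector store here.
    return ["The user's name is Alice."]

def call_llm(prompt: str) -> str:
    # Stand-in for whatever LLM client is in use.
    return f"(model answer for prompt: {prompt!r})"

def build_prompt(question: str, context: list[str]) -> str:
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )

def answer(user_id: str, question: str) -> str:
    context = retrieve_user_context(user_id, question)
    return call_llm(build_prompt(question, context))

print(answer("user-123", "What is my name?"))
```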

RAG can work with many types of data sources, such as files, databases, videos and so on, but in most of the content currently found on the internet it is used with vector databases.

Vector Databases

Vector databases specialize in storing data as numerical vectors and allowing it to be searched by similarity. This lets the user search using natural language: the database matches the question against the stored data by similarity and returns the most similar entries.

Going a bit deeper, instead of tables, a vector database organizes the data in a multi-dimensional space where each point is a piece of data and its location represents its meaning relative to the other pieces. The closer two points are, the more semantically similar they are; e.g. "apple" has a higher semantic similarity to "fruit" than to "machine", as the toy example below shows.
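As a toy illustration of that geometry, here is a cosine similarity computation over hand-picked 2-D vectors. The numbers are made up purely to show the idea; real embeddings have hundreds of dimensions.

```python
import math

# Toy 2-D "embeddings" chosen by hand purely to illustrate the idea;
# real embedding models produce vectors with hundreds of dimensions.
vectors = {
    "apple":   (0.9, 0.1),
    "fruit":   (0.8, 0.2),
    "machine": (0.1, 0.9),
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(vectors["apple"], vectors["fruit"]))    # ~0.99 (close in meaning)
print(cosine_similarity(vectors["apple"], vectors["machine"]))  # ~0.22 (far in meaning)
```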

Vector Search and embedding models

Vector search is a type of search that uses a special kind of language model called an embedding model, which is responsible for translating text into a numeric vector that represents the text's meaning.

This process is applied both to the stored data and to the user's query, making the two easy to compare mathematically and thus return relevant results.

Pasted image 20250630104659.png
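A sketch of this end to end, assuming the sentence-transformers library is available (the model name is just an example); any embedding model could play the same role. Documents are embedded once, the query is embedded at search time, and results are ranked by cosine similarity.

```python
# Vector search sketch: embed the documents once, embed the query at
# search time, and rank documents by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "The user's name is Alice.",
    "Quarterly revenue grew 12% year over year.",
    "The office is closed on public holidays.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
doc_vectors = model.encode(documents, normalize_embeddings=True)

def search(query: str, top_k: int = 1):
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    # With normalized vectors, the dot product equals cosine similarity.
    scores = doc_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in best]

print(search("What is my name?"))  # the name sentence should rank first
```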

Sources:
https://www.databricks.com/sites/default/files/2024-05/2024-05-EB-A_Compact_GuideTo_RAG.pdf