RAG (Retrieval Augmented Generation)¶
Terms¶
- Fine-tuning: Train an existing model on new data. It is expensive and limited by the number of parameters of the base model. It is not additive: the model can forget existing knowledge and replace it with the new knowledge.
- Few-shot prompting: Teach a language model to perform a new task by first providing it with a few examples in the prompt; the model is then asked to complete the task for new inputs (see the example after this list).
- Retrieval: Find relevant documents from a large corpus of documents.
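For example, a few-shot prompt for sentiment classification (the examples here are hypothetical) might look like this, with the model expected to continue the pattern and output "positive":

Classify the sentiment of each review.
Review: "The battery lasts forever." Sentiment: positive
Review: "It broke after two days." Sentiment: negative
Review: "Setup was quick and painless." Sentiment: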
RAG pipeline¶
We start with a base model. Our goal is to extend the model for QA on data it was not trained on, without the user having to look up the answers in external sources and provide them as context to the model.
- Identify the data source(s): PDF documents, websites, any other document source.
- Split the document(s) into chunks.
- Create an embedding for each chunk. An embedding is a fixed-size vector that represents the meaning of the chunk.
- Store the (embedding, chunk) pairs in a vector database.
- At query time, retrieve the relevant chunks from the vector database by searching for the embeddings most similar to the embedding of the query.
- Feed the most similar chunks into the base model's context and ask it to generate an answer.
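As a minimal sketch of this loop, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (neither is required; the LlamaIndex example below uses a different embedding model):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Paris is the capital of France.", "GPUs excel at matrix multiplication."]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)  # one embedding per chunk
query = "What is the capital of France?"
query_vec = model.encode([query], normalize_embeddings=True)[0]
scores = chunk_vecs @ query_vec                  # cosine similarity (vectors are unit length)
context = chunks[int(np.argmax(scores))]         # most similar chunk
prompt = f"Context:\n{context}\n\nQuestion: {query}"  # fed to the base model

A real pipeline replaces the in-memory array with a vector database, as in the LlamaIndex example below.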
Embedding vectors¶
Multi-dimensional vector representations of chunks, words, or any other meaningful unit. After training, the embeddings capture semantic relationships between chunks:
- We expect the angle between words with similar meanings to be small (the vectors point in similar directions) and the angle between words with different meanings to be large (the vectors point in different directions).
- Cosine similarity is a common way to measure the similarity (angle) between two vectors, based on the dot product of the two vectors.
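In code, for two numpy vectors a and b, cosine similarity is the dot product divided by the product of the norms, cos(θ) = (a · b) / (‖a‖ ‖b‖):

import numpy as np

def cosine_similarity(a, b):
    # 1 = same direction, 0 = orthogonal, -1 = opposite direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))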
Embedding models¶
These are models that generate embedding vectors for sentences, chunks, or words. An example is the Sentence-BERT model. The model generates an embedding vector for each token in the context; the token vectors are then averaged into a single vector that can be compared, via cosine similarity, against the embedding vector of the query. BERT has 768 dimensions by default. To represent an embedding in a lower dimensionality, a linear layer is applied right after the averaging, mapping these 768 dimensions to a smaller number of dimensions. We call this number of dimensions the embedding size.
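The averaging step can be made explicit with the transformers library. This sketch assumes the sentence-transformers/all-MiniLM-L6-v2 checkpoint, but any BERT-like encoder works the same way:

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
batch = tokenizer(["RAG retrieves chunks at query time."], return_tensors="pt")
with torch.no_grad():
    token_vecs = model(**batch).last_hidden_state   # one vector per token
mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding tokens
embedding = (token_vecs * mask).sum(1) / mask.sum(1)  # mean pooling -> one sentence vector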
Vector database¶
Stores vectors of fixed dimensions (embeddings) so that later on we can query the DB for the embeddings most similar to a given embedding vector. A common approach is approximate nearest-neighbour (ANN) search, as implemented by libraries such as Annoy or Faiss. This applies not only to text, but to audio, images, and other data as well.
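A minimal Faiss sketch (a flat index, i.e., exact brute-force search; random vectors stand in for real embeddings):

import faiss
import numpy as np

dim = 384                                         # embedding size of the model
vecs = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(vecs)                          # unit vectors: inner product == cosine similarity
index = faiss.IndexFlatIP(dim)                    # exact inner-product search
index.add(vecs)
query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)              # top-5 most similar stored vectors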
K-NN search¶
With a naive approach, we would have to compute the cosine similarity of the query embedding with every stored embedding. An optimization is HNSW, which stands for Hierarchical Navigable Small Worlds. It is itself an approximate nearest-neighbour algorithm (implemented in libraries such as Faiss or hnswlib) that stores the embeddings in a layered graph: upper layers contain few nodes with long-range links, while lower layers get progressively denser. A query enters at the top layer, greedily moves to the neighbour closest to the query, and descends layer by layer until it reaches the nearest embeddings at the bottom, visiting only a small fraction of the database.
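The hnswlib library exposes the HNSW parameters directly. A small sketch with random vectors (the parameter values here are illustrative, not tuned):

import hnswlib
import numpy as np

dim = 384
data = np.random.rand(1000, dim).astype("float32")
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=1000, M=16, ef_construction=200)  # M = links per node
index.add_items(data)
index.set_ef(50)                                  # search breadth: higher = more accurate, slower
labels, distances = index.knn_query(data[:1], k=3)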
RAG pipeline with LlamaIndex and Ollama¶
!pip install llama-index llama-index-llms-ollama bs4 requests llama-index-embeddings-huggingface
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3.1:8b", request_timeout=600)
response = llm.complete("Do you know about Claude 3.5 Sonnet?")
print(response.text)
I'm not familiar with a specific work called "Claude 3.5 Sonnet." It's possible that it might be a lesser-known or unpublished poem, or maybe even a joke or a fictional reference. However, I can try to help you learn more about it. Can you please provide me with some context or information about the Claude 3.5 Sonnet? For example:
* Who is Claude, and what's the significance of the number 3.5?
* Is this a specific poem by a known author, or perhaps a reference in a book, movie, or TV show?
* Do you have any details about the content, style, or themes of the sonnet?
Any information you can provide will help me better understand what you're looking for and try to assist you!
import requests
from bs4 import BeautifulSoup
from llama_index.core import Document
urls = [
"https://www.anthropic.com/news/claude-3-5-sonnet",
"https://www.reddit.com/r/ClaudeAI/comments/1dvfyp6/all_this_talk_about_claude_sonnet_35_being_good/"
]
docs = []
for url in urls:
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})  # some sites (e.g., Reddit) block the default user agent
    soup = BeautifulSoup(response.text, 'html.parser')
    docs.append(Document(text=soup.text, metadata={"source": url}))
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="llama3.1:8b", request_timeout=600)
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=1)
response = query_engine.query("Do you know about Claude 3.5 Sonnet?")
print(response)
Claude 3.5 Sonnet is a newly released AI model that outperforms competitor models and previous versions on various evaluations, including graduate-level reasoning, undergraduate-level knowledge, and coding proficiency. It operates at twice the speed of its predecessor and demonstrates improved capabilities in grasping nuance, humor, and complex instructions, with exceptional writing skills and natural tone.
response.metadata
{'2b1f3100-d8de-49db-a940-ef6d666b7810': {'source': 'https://www.anthropic.com/news/claude-3-5-sonnet'}}