Have you ever thought, “I wish I could find the document where I noted something down, but I can’t even remember whether it was in a local notebook or a gdoc, or what the filename was”? Wouldn’t it be nice to make all your data searchable without sharing your sensitive personal information?
Well, you can build a Glean-like application locally, keep all your personal information private, and power it with a local LLM. Let’s build an advanced local RAG.
Note: what follows is just a demonstration of the concept with one particular implementation.
What is RAG?
Retrieval-Augmented Generation (RAG) is an AI technique that enhances the accuracy and relevance of language model responses by combining retrieval and generation.
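To make those steps concrete, here is a minimal sketch of the flow, using sentence-transformers for the embeddings and plain cosine similarity standing in for a vector database. The model name and the sample documents are illustrative assumptions, not part of any specific setup.

```python
# Minimal RAG sketch: retrieve by embedding similarity, augment the prompt,
# then generate. Model name and documents are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Meeting notes 2024-03-12: we decided to move the backups to the NAS.",
    "Recipe: grandma's apple pie, 45 minutes at 180 C.",
    "The router admin password lives in the password manager.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    # Retrieval: rank documents by cosine similarity to the question.
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

def build_prompt(question: str) -> str:
    # Augmentation: pack the retrieved text into the prompt as context.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Generation: hand the augmented prompt to any local LLM (e.g. via Ollama).
print(build_prompt("Where did we decide to keep the backups?"))
```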
What is advanced RAG?
Combine multiple sources, both static (embedded documents) and dynamic (web search), and then use a re-rank model to select the most relevant context for the question.
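As a sketch of that pipeline, the snippet below merges candidates from a static and a dynamic source and lets a cross-encoder re-ranker pick the best ones. `vector_store_search` and `web_search` are hypothetical stubs for your embedded-document index and a web search API; the re-ranker is a public sentence-transformers checkpoint.

```python
# Advanced-RAG sketch: merge static and dynamic candidates, then re-rank.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def vector_store_search(question: str, k: int = 10) -> list[str]:
    # Hypothetical stub for the embedded-document (static) source.
    return ["local note: backups moved to the NAS in March",
            "old gdoc: backup rotation schedule"]

def web_search(question: str, k: int = 10) -> list[str]:
    # Hypothetical stub for the web search (dynamic) source.
    return ["web snippet: NAS backup best practices in 2024"]

def gather_context(question: str, top_n: int = 3) -> list[str]:
    candidates = vector_store_search(question) + web_search(question)
    # Re-rank: score every (question, candidate) pair jointly, keep the best.
    scores = reranker.predict([(question, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [text for _, text in ranked[:top_n]]

print(gather_context("Where do we keep the backups?"))
```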
What is the benefit of using re-rank models?
Using a re-rank model in a Retrieval-Augmented Generation (RAG) solution can significantly improve the quality of the information provided to the language model. Here’s why it’s beneficial, particularly when re-ranking results from a vector database:
- Improved Relevance: Vector databases often return a list of documents using similarity search (e.g., cosine similarity in embeddings). While this is efficient, the similarity score may not perfectly reflect relevance to the query’s intent. A re-ranker uses a more sophisticated language model to assess the context, relevance, and alignment with the query, leading to a more accurate ranking.
- Contextual Understanding: Re-rank models, especially cross-encoders based on transformer architectures (e.g., BERT-based models), evaluate the query and documents together. Unlike vector similarity, which treats query and document embeddings independently, re-rankers understand the relationship between them using deeper language understanding. This leads to better judgment of nuanced, contextual relevance (see the sketch after this list).
- Handling Ambiguity: If the query is ambiguous, vector databases might return a broad set of results with varied relevance. A re-rank model can prioritize results that are most likely to resolve the ambiguity by focusing on the query’s intent.
- Filtering Noise: Vector search may introduce irrelevant documents due to issues like semantic drift or embedding inaccuracies. A re-ranker serves as a second layer of filtration, pushing low-quality results further down the list or removing them.
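To illustrate the difference the list describes, this small experiment scores the same query-document pairs twice: once with independent bi-encoder embeddings (what the vector database does) and once with a cross-encoder that reads each pair jointly. The model names are common public checkpoints, chosen here as assumptions.

```python
# Bi-encoder cosine similarity vs. cross-encoder joint scoring.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "How do I reset the router?"
docs = [
    "Hold the reset button on the back of the router for ten seconds.",
    "The router ships in fully recyclable packaging.",  # related words, wrong answer
]

# Bi-encoder: query and documents are embedded independently, then compared.
bi = SentenceTransformer("all-MiniLM-L6-v2")
cosine = util.cos_sim(bi.encode(query, convert_to_tensor=True),
                      bi.encode(docs, convert_to_tensor=True))[0]

# Cross-encoder: each (query, doc) pair is read together by one transformer,
# so it can judge whether the document actually answers the question.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
joint = ce.predict([(query, d) for d in docs])

for doc, c, j in zip(docs, cosine.tolist(), joint):
    print(f"cosine={c:.2f}  cross-encoder={j:.2f}  {doc}")
```

On pairs like these, the cosine scores tend to sit close together because both documents share vocabulary with the query, while the cross-encoder separates the real answer from the noise much more sharply.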