Building Corrective RAG from scratch with open-source, local LLMs
AI Summary
Summary: Building Self-Reflective RAG Apps with Local Models
- Presenter: Lance from the LangChain team
- Topic: Creating self-reflective retrieval-augmented generation (RAG) applications using open-source, local models on a laptop.
Key Concepts:
- Self-Reflection in RAG: Involves reasoning and potentially retrying steps based on the relevance and quality of retrieved documents and generated responses.
- Corrective RAG (C-RAG): A paper illustrating self-reflection in RAG. It grades retrieved documents for relevance, refines knowledge from correct documents, and supplements with web search if documents are ambiguous or incorrect.
Implementation Using LangGraph:
- Local Models: Smaller local language models (LLMs) are used instead of large-scale API-gated models.
- Graph Layout: The process involves retrieval using local embeddings, grading documents for relevance, query rewriting, web search, and generation based on web search results.
Getting Started with Local LLMs:
- Ollama: A tool for running LLMs locally, available across platforms.
- Model Selection: Using Ollama to download and run models like Mistral Instruct (a 7-billion-parameter model).
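As a sketch of how a model download might be triggered programmatically rather than via the CLI, the snippet below builds a request body for a local Ollama server's pull endpoint. The endpoint path, default port, and `"model"` field are assumptions about a standard local install; the transcript itself just uses the `ollama` command line.

```python
import json

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint (assumption)

def pull_payload(model: str) -> bytes:
    """JSON body asking a local Ollama server to download a model."""
    return json.dumps({"model": model}).encode("utf-8")

# With a server running, one could POST this body, e.g.:
#   urllib.request.urlopen(f"{OLLAMA_URL}/api/pull", data=pull_payload("mistral:instruct"))
payload = pull_payload("mistral:instruct")
```

In practice `ollama pull` from the terminal does the same job with less ceremony; the programmatic form only matters if model setup needs to happen inside an application.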
Building the Index:
- Blog Post: Used as a target for retrieval, split into chunks.
- Embedding Model: GPT4All embeddings from Nomic, CPU-optimized and contrastively trained.
- Vector Store: Chroma, an open-source local vector store, is used for document retrieval.
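The indexing step can be sketched without the real libraries: `split_into_chunks` below is a hypothetical stand-in for a LangChain text splitter, and the word-overlap scoring in `retrieve` is a toy substitute for GPT4All embeddings plus Chroma similarity search. All names here are illustrative, not the actual APIs.

```python
def split_into_chunks(text: str, chunk_size: int = 100) -> list[str]:
    """Greedy word-based splitter standing in for a real text splitter."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question: a crude stand-in
    for embedding similarity search against a Chroma vector store."""
    q = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]
```

The real pipeline swaps the overlap score for dense embeddings, but the shape is the same: split once at index time, then rank chunks per question at query time.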
Graph State and Nodes:
- State Dictionary: Contains keys relevant to RAG (question, documents, generation).
- Functions for Nodes: Each node in the graph corresponds to a function that modifies the state.
- Conditional Edge: A logical gate that decides the next step based on document grading results.
Using Ollama JSON Mode:
- Structured Output: Enforcing JSON output with a binary score (yes/no) makes the grader's verdict reliably parseable.
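A sketch of how JSON mode constrains the grader: Ollama's generate endpoint accepts a `"format": "json"` field, and the reply can then be read with an ordinary JSON parser. The prompt wording and helper names here are illustrative, not taken from the video.

```python
import json

def grading_request(model: str, question: str, doc: str) -> dict:
    """Request body for a local Ollama generate call with JSON mode enabled."""
    prompt = ("Is this document relevant to the question?\n"
              f"Question: {question}\nDocument: {doc}\n"
              'Answer as JSON: {"score": "yes"} or {"score": "no"}')
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}

def parse_score(raw: str) -> str:
    """Read the binary score out of the model's JSON reply."""
    return json.loads(raw)["score"]
```

Because the output is forced to be valid JSON, the conditional edge can branch on `parse_score(...)` directly instead of scraping free-form text.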
Running the Graph:
- Compilation: The graph is compiled with nodes and edges according to the logical flow.
- Execution: The graph is executed with a question, traversing through the steps and printing the process.
Conclusion:
- Local Models for Logical Flows: Local models can be effective for logical reasoning tasks by performing specific tasks at each step of a constrained logical flow.
- Agents vs. State Machines: Depending on the problem, a state machine or graph with logical steps may be more reliable than using an agent for complex reasoning.
Encouragement:
- Lance encourages trying this approach of pairing local models with LangGraph for logical reasoning tasks, highlighting its reliability and usefulness.