Building Corrective RAG from scratch with open-source, local LLMs



AI Summary

Summary: Building Self-Reflective RAG Apps with Local Models

  • Presenter: Lance from the LangChain team
  • Topic: Creating self-reflective retrieval-augmented generation (RAG) applications using open-source, local models on a laptop.

Key Concepts:

  • Self-Reflection in RAG: Involves reasoning and potentially retrying steps based on the relevance and quality of retrieved documents and generated responses.
  • Corrective RAG (CRAG): A paper illustrating self-reflection in RAG. It grades retrieved documents for relevance, refines knowledge from documents judged correct, and supplements with web search when documents are ambiguous or incorrect.
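The CRAG decision described above can be sketched as a simple mapping from the evaluator's judgment to a knowledge-correction action. This is a dependency-free illustration; the function and action names are mine, not the paper's actual code.

```python
# Sketch of CRAG's three-way action policy. The retrieval evaluator
# judges the retrieved documents; the judgment decides what feeds
# generation. Names are illustrative.

def crag_action(judgment: str) -> str:
    """Map the evaluator's judgment to a knowledge-correction action."""
    actions = {
        "correct": "refine retrieved documents into knowledge strips",
        "incorrect": "discard documents and fall back to web search",
        "ambiguous": "combine refined documents with web search results",
    }
    return actions[judgment]
```

The video's implementation simplifies this to a binary relevant/not-relevant grade, but the paper's policy has the three branches shown here.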

Implementation Using LangGraph:

  • Local Models: Smaller, locally run LLMs are used instead of large-scale, API-gated models.
  • Graph Layout: The process involves retrieval using local embeddings, grading documents for relevance, query rewriting, web search, and generation based on web search results.

Getting Started with Local LLMs:

  • Ollama: A tool for running LLMs locally, with support for multiple platforms.
  • Model Selection: Using Ollama to download and run models such as Mistral Instruct (a 7-billion-parameter model).
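Once Ollama is running, it serves a REST API on localhost port 11434. A hedged sketch of building a request body for its generate endpoint follows; the endpoint path and field names reflect Ollama's API as I understand it, but check the Ollama docs for your installed version.

```python
import json

# Default local endpoint for a running Ollama server (assumption:
# standard install, default port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "mistral") -> dict:
    """Build the JSON body for a non-streaming generation request."""
    return {
        "model": model,    # e.g. the 7B Mistral Instruct model pulled via Ollama
        "prompt": prompt,
        "stream": False,   # ask for one complete response object
    }

payload = build_request("Why does self-reflection help RAG?")
body = json.dumps(payload)  # POST this to OLLAMA_URL with urllib or requests
```

The actual HTTP call is omitted so the sketch stays self-contained; in the video, LangChain's Ollama integration handles this plumbing.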

Building the Index:

  • Blog Post: Used as a target for retrieval, split into chunks.
  • Embedding Model: GPT4All embeddings from Nomic, which are CPU-optimized and contrastively trained.
  • Vector Store: Chroma, an open-source local vector store, is used for document retrieval.
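The indexing step above splits the blog post into chunks before embedding. A minimal, dependency-free approximation of a fixed-size, overlapping split is shown below; the video uses LangChain's text splitter with GPT4All embeddings and Chroma, so the chunk sizes here are placeholder values, not the ones from the video.

```python
def split_into_chunks(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` chars.

    Overlap preserves context across chunk boundaries so a sentence cut
    in half is still fully contained in at least one chunk.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk would then be embedded and stored in the vector store for retrieval.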

Graph State and Nodes:

  • State Dictionary: Contains keys relevant to RAG (question, documents, generation).
  • Functions for Nodes: Each node in the graph corresponds to a function that modifies the state.
  • Conditional Edge: A logical gate that decides the next step based on document grading results.
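The state, nodes, and conditional edge described above can be sketched in plain Python, assuming the state keys the summary names (question, documents, generation). The node and edge names, and the stub keyword grader, are illustrative only; the real graders in the video are LLM calls.

```python
from typing import TypedDict, List

class GraphState(TypedDict, total=False):
    """Keys carried through the graph, per the summary above."""
    question: str
    documents: List[str]
    generation: str

def grade_documents(state: GraphState) -> GraphState:
    """Node: keep only documents judged relevant (stub keyword grader)."""
    relevant = [d for d in state["documents"]
                if state["question"].lower() in d.lower()]
    return {**state, "documents": relevant}

def decide_to_generate(state: GraphState) -> str:
    """Conditional edge: route to generation, or rewrite the query."""
    return "generate" if state["documents"] else "transform_query"
```

Each node takes the state in and returns a modified state; the conditional edge returns only the name of the next node.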

Using Ollama JSON Mode:

  • Structured Output: Enforcing JSON output with a binary score (yes/no) makes the grader's verdict reliably parseable.
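With JSON mode enforced, the grader's reply can be parsed deterministically. A small sketch, assuming the grader is prompted to answer with a JSON object like `{"score": "yes"}` (the exact key name is my assumption):

```python
import json

def parse_grade(raw: str) -> bool:
    """Return True if the grader judged the document relevant.

    Malformed output is treated as "not relevant" rather than crashing,
    a defensive choice since even JSON mode can occasionally misfire.
    """
    try:
        return json.loads(raw).get("score", "no").strip().lower() == "yes"
    except json.JSONDecodeError:
        return False
```

This is why the binary yes/no format matters: free-text verdicts from a small local model would be much harder to interpret reliably.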

Running the Graph:

  • Compilation: The graph is compiled with nodes and edges according to the logical flow.
  • Execution: The graph is executed with a question, traversing through the steps and printing the process.
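The compile-and-run flow above can be mimicked without LangGraph to make the control flow visible. LangGraph does this with StateGraph, node/edge registration, and compile(); the dicts and node names below are a hand-rolled, illustrative stand-in, not LangGraph's API.

```python
# Stub nodes: each takes the state and returns an updated state.
def retrieve(state):   return {**state, "documents": ["local doc about RAG"]}
def web_search(state): return {**state, "documents": state["documents"] + ["web result"]}
def generate(state):   return {**state, "generation": f"answer to: {state['question']}"}

# Conditional edge: generate if any documents survived grading.
def route(state):      return "generate" if state["documents"] else "web_search"

NODES = {"retrieve": retrieve, "web_search": web_search, "generate": generate}
EDGES = {"retrieve": route,
         "web_search": lambda s: "generate",
         "generate": lambda s: None}  # None terminates the graph

def run_graph(question: str) -> dict:
    """Traverse the graph from the retrieve node until a terminal edge."""
    state, node = {"question": question, "documents": []}, "retrieve"
    while node is not None:
        state = NODES[node](state)  # execute the node
        node = EDGES[node](state)   # follow the (possibly conditional) edge
    return state
```

Executing `run_graph("what is CRAG?")` walks retrieve, then the conditional edge, then generate, mirroring the traversal-and-print behavior described above.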

Conclusion:

  • Local Models for Logical Flows: Local models can be effective for logical reasoning tasks by performing specific tasks at each step of a constrained logical flow.
  • Agents vs. State Machines: Depending on the problem, a state machine or graph with logical steps may be more reliable than using an agent for complex reasoning.

Encouragement:

  • Lance encourages trying out the approach of using local models and LangGraph for logical reasoning tasks, highlighting the reliability and usefulness of this method.