Guidance Language for Controlling LLMs



AI Nuggets

Instructions for Installing and Using Guidance with Large Language Models

Prerequisites

  • Use Google Colab with a T4 GPU runtime.

Installation Steps

  1. Install Guidance:
    pip install guidance  
  2. Install llama-cpp-python (the Python bindings for llama.cpp):
    pip install llama-cpp-python  
    • This may take a few minutes to build the wheel.

Downloading the Model

  1. Choose a model from the Hugging Face website:
    • Go to the Hugging Face models page.
    • Open the “Files and versions” tab to see the available files.
    • Select a version that fits your available disk space.
    • Click on “Download” or right-click and copy the link.
  2. Use the wget command in Colab to download the model:
    wget <MODEL_URL>  
    • Replace <MODEL_URL> with the actual URL of the model you want to download from Hugging Face.

Initializing Guidance

  1. Import the models module from guidance:
    from guidance import models  
  2. Create a model object using the models.LlamaCpp constructor:
    model = models.LlamaCpp(path_to_model, n_gpu_layers=-1, n_ctx=4096)  
    • path_to_model: The local path to the downloaded model file.
    • n_gpu_layers: Set to -1 to offload all layers to the GPU for faster processing.
    • n_ctx: The context size, i.e., the maximum sequence length the model can handle (e.g., 4096 tokens).

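The initialization steps above can be wrapped in a small helper. This is a sketch assuming the models.LlamaCpp constructor from the Guidance API, with the same parameter values as above.

```python
def load_model(path_to_model: str):
    """Load a local GGUF model file into a Guidance model object."""
    # Deferred import: requires `pip install guidance llama-cpp-python`.
    from guidance import models
    return models.LlamaCpp(
        path_to_model,
        n_gpu_layers=-1,  # -1 offloads all layers to the GPU
        n_ctx=4096,       # maximum context length in tokens
    )
```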
Using Guidance for Inference

  1. Specify your prompt with the model:
    prompt = "Your prompt here"  
  2. Append the prompt string to the model object:
    result = model + prompt  
  3. Generate output by appending a gen() call with a specified maximum number of tokens:
    from guidance import gen  
    output = result + gen(max_tokens=150)  
    • max_tokens: The maximum number of tokens to generate (e.g., 150).
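Putting the steps together, here is a minimal end-to-end sketch. The model path in the usage example is a hypothetical placeholder, and the function assumes the guidance and llama-cpp-python packages are installed.

```python
def generate(model_path: str, prompt: str, max_tokens: int = 150) -> str:
    """Run a single generation with Guidance over a local llama.cpp model."""
    from guidance import models, gen  # requires guidance + llama-cpp-python
    model = models.LlamaCpp(model_path, n_gpu_layers=-1, n_ctx=4096)
    # Appending gen() to the model+prompt state triggers generation;
    # name= captures the generated text so it can be read back.
    result = model + prompt + gen(name="answer", max_tokens=max_tokens)
    return result["answer"]

# Example (hypothetical local path):
# print(generate("/content/llama-2-7b.Q4_K_M.gguf", "Q: What is Guidance? A:"))
```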

Additional Capabilities

  • Guidance allows for multi-generation, variable capture, function encapsulation, and interleaved generation.
  • You can control various aspects of the generation process.
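As one concrete illustration of variable capture and interleaved generation, gen() accepts a name argument so each generated span can be read back from the result object. This is a sketch assuming a model object created as in the initialization section.

```python
def interview(model):
    """Interleave fixed prompt text with two named generations and capture both."""
    from guidance import gen  # requires `pip install guidance`
    lm = model + "Name a programming language: " + gen(name="lang", max_tokens=8)
    lm += "\nOne-sentence description: " + gen(name="desc", max_tokens=40)
    # Captured variables are read back by name from the final state.
    return lm["lang"], lm["desc"]
```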

Resources

  • The GitHub repo link for Guidance will be provided in the video description.

Tips

  • You can use any model that follows the Llama architecture with Guidance.
  • Guidance is designed to allow you to write code in a familiar Pythonic way while leveraging the power of large language models.

Note

  • The exact URLs for the models and the GitHub repository are not provided in the transcript. They should be available in the video description or by following the instructions to navigate the Hugging Face website.

Call to Action

  • Consider subscribing to the channel for more content.
  • Share the video with your network if you find it helpful.