Local and Open Source Speech to Speech Assistant



AI Summary

Summary of Video Transcript

  • The video is a tutorial on setting up a local voice assistant using the project “Verbi.”
  • Verbi allows communication with LLMs (Large Language Models) through voice.
  • The setup involves three local models: speech-to-text, LLM for generating responses, and text-to-speech.
  • The hardware used in the tutorial is a MacBook Pro with an M2 chip and 96 GB of unified memory, but the models can also run on a CPU.
  • The three API endpoints needed are:
    • Faster Whisper API for speech-to-text conversion.
    • A local LLM served through Ollama for generating responses.
    • MeloTTS for converting text back to speech.
  • The tutorial provides step-by-step instructions for setting up each component.
  • The setup process includes cloning repositories, installing packages, and running APIs.
  • The video demonstrates Verbi in use, showing real-time response speeds and interactions with the voice assistant.
  • The voice assistant can perform tasks and answer questions with varying response times based on hardware and text length.
  • The video also mentions upcoming updates to Verbi, including a UI and improvements to the codebase.
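The three-endpoint loop described above can be sketched in Python. Ollama's default port (11434) and its /api/generate payload shape are real; the STT and TTS routes and ports below are placeholders, not the endpoints used in the video.

```python
import json
from urllib import request

# Ollama listens on port 11434 by default; the STT/TTS routes below
# are hypothetical stand-ins for the Faster Whisper API and MeloTTS endpoints.
OLLAMA_URL = "http://localhost:11434/api/generate"
STT_URL = "http://localhost:8000/transcribe"   # placeholder route
TTS_URL = "http://localhost:8888/synthesize"   # placeholder route


def build_llm_request(prompt: str, model: str = "llama3") -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


def one_turn(user_text: str) -> str:
    """One turn of the loop: transcribed text in, LLM reply text out."""
    reply = post_json(OLLAMA_URL, build_llm_request(user_text))
    return reply.get("response", "")
```

Calling one_turn("Hello") requires a running Ollama server; in the full pipeline the input text would come from the speech-to-text endpoint and the returned reply would be sent on to the text-to-speech endpoint.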

Detailed Instructions and URLs

  • Local LLM Setup:
    • Install Ollama and run the desired model using the command ollama run <model_name>.
  • Faster Whisper API Setup:
    • Clone the Faster Whisper API repository (URL not provided).
    • Install required packages using pip install -r requirements.txt.
    • Run the API with uvicorn main:app --reload.
  • MeloTTS Setup:
    • Follow the installation instructions for your operating system (URL not provided).
    • Clone the MeloTTS repository (URL not provided).
    • Change directory to the cloned repo and install with the provided command.
    • Download the text-to-speech model using the provided command.
  • Verbi Configuration:
    • Update config.py to use the Faster Whisper API, Ollama, and MeloTTS.
    • Run the voice assistant with python run_voice_assistant.py.
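As a rough sketch of the configuration step, the selection in config.py might look like the fragment below. The variable names and values here are illustrative assumptions; the actual names in the project's config.py may differ.

```python
# config.py (illustrative sketch; actual variable names in the repo may differ)

TRANSCRIPTION_MODEL = "fastwhisperapi"  # speech-to-text via the local Faster Whisper API
RESPONSE_MODEL = "ollama"               # responses from a local LLM served by Ollama
TTS_MODEL = "melotts"                   # speech synthesis via a local MeloTTS endpoint

OLLAMA_LLM = "llama3"                   # whichever model was pulled with: ollama run llama3
```

With all three set to local backends, python run_voice_assistant.py runs the whole pipeline without any cloud API keys, which is the point the video emphasizes.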

Additional Notes

  • The video emphasizes Verbi's modularity: any of the three endpoints can be swapped for a model of the user's choice.
  • The presenter advises checking out previous videos for Verbi's initial setup and architecture overview.
  • Links to additional resources and videos are mentioned but not provided in the transcript.