Realtime AI in the Browser



AI Summary

Summary of AI-Powered Babel Fish Video

Concept

  • Build an AI-powered Babel Fish that transcribes speech to text and broadcasts it to an audience in real time.
  • Each audience member can select the language they want the transcript translated into.

Implementation

  • Use the Whisper base model from Hugging Face for offline transcription in the browser.
  • Broadcast transcribed text using Supabase Realtime.
  • Translate text into different languages using another model in the browser.

Technical Details

  • Use Hugging Face Transformers.js and ONNX Runtime to execute the models in the browser.
  • Use WebGPU for better performance where it is supported (transcription is WebGPU-enabled, translation is not yet).
  • The application is a client-side JavaScript application using React and Vite.
  • Use React Router with a hash router for GitHub Pages compatibility.
  • Web audio APIs are used to split the audio into chunks for transcription.
  • Web workers handle transcription and translation tasks.
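
The audio chunking step can be pictured with a small sketch. The summary does not say how the demo slices its audio, so the function name `chunkAudio`, the 16 kHz sample rate (the rate Whisper models expect), and the chunk length are illustrative assumptions, not the project's code.

```typescript
// Hypothetical helper: split mono PCM samples into fixed-length chunks.
// Not taken from the demo; the name and parameters are assumptions.
function chunkAudio(
  samples: Float32Array,
  sampleRate: number,
  chunkSeconds: number,
): Float32Array[] {
  const chunkSize = Math.floor(sampleRate * chunkSeconds);
  if (chunkSize <= 0) {
    throw new Error("sampleRate * chunkSeconds must be at least 1 sample");
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    // subarray returns a view on the same buffer, so no samples are copied.
    // An end index past the array length is clamped, so the last chunk may be shorter.
    chunks.push(samples.subarray(start, start + chunkSize));
  }
  return chunks;
}
```

With these assumptions, 2.5 seconds of 16 kHz audio split into 1-second chunks gives three chunks, the last one half a second long. Each chunk could then be posted to the transcription worker.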

Transcription Process

  • Create a transcription worker on the broadcaster page.
  • Load the Whisper model from Hugging Face once and cache it in the browser.
  • Use a Singleton pattern to ensure only one instance of the model is loaded.
  • The transcription worker builds a speech recognition pipeline.
  • Audio input is batch decoded into output text and sent back to the main thread.
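
The Singleton idea above can be sketched as a lazily loaded, cached promise. This is a minimal illustration, not the demo's code: the class name `PipelineSingleton` and the injected `load` function are assumptions, chosen so the sketch does not depend on any particular library.

```typescript
// Hypothetical lazy singleton. The loader is injected, so this sketch
// works with any async model-loading function.
type Loader<T> = () => Promise<T>;

class PipelineSingleton<T> {
  private instance: Promise<T> | null = null;
  private readonly load: Loader<T>;

  constructor(load: Loader<T>) {
    this.load = load;
  }

  // Returns the same promise on every call, so the loader runs at most once
  // even if several requests arrive while the model is still downloading.
  getInstance(): Promise<T> {
    if (this.instance === null) {
      this.instance = this.load();
    }
    return this.instance;
  }
}
```

In a transcription worker, the loader would presumably wrap a call to the Transformers.js `pipeline` function with the `"automatic-speech-recognition"` task and the chosen Whisper model. Caching the promise rather than the resolved model means concurrent callers share one download.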

Broadcasting Process

  • A utility function broadcasts each transcribed sentence using Supabase Realtime.
  • Generate a random channel ID for broadcasting.
  • Set up a Supabase project and configure .env.local with the necessary keys and URLs.
  • Ensure the Realtime service is running before broadcasting.
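
A rough sketch of the two helpers described above: one that generates a random channel ID and one that broadcasts a sentence. The names `randomChannelId` and `broadcastSentence`, the event name `"transcript"`, and the payload shape are assumptions for illustration. `ChannelLike` describes only the `send` method the sketch relies on, which a Supabase Realtime channel provides.

```typescript
// Shape of the broadcast message this sketch sends.
interface BroadcastMessage {
  type: "broadcast";
  event: string;
  payload: { message: string };
}

// Minimal view of a channel: only the method used below.
interface ChannelLike {
  send(message: BroadcastMessage): Promise<unknown>;
}

// Hypothetical helper: build a random channel ID from lowercase letters and digits.
function randomChannelId(length: number = 8): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let id = "";
  for (let i = 0; i < length; i += 1) {
    id += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return id;
}

// Hypothetical helper: broadcast one transcribed sentence on the channel.
// The event name "transcript" is an assumption, not the demo's actual name.
function broadcastSentence(channel: ChannelLike, sentence: string): Promise<unknown> {
  return channel.send({
    type: "broadcast",
    event: "transcript",
    payload: { message: sentence },
  });
}
```

With supabase-js, the channel would typically come from `createClient(url, anonKey)` followed by `supabase.channel(channelId)`, with the URL and key read from `.env.local`. In a Vite project, variables exposed to client code need the `VITE_` prefix and are read through `import.meta.env`.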

Receiver Process

  • The receiver subscribes to the real-time broadcast and translates the transcript.
  • Use a translation worker with a Singleton pattern for the translation pipeline.
  • The translation model is an open-source model from Meta, trained on 200 languages.
  • Stream translations back through a callback function for a live effect.
  • Use Supabase Realtime to subscribe to a channel and listen for transcripts.
  • Disable new translation tasks until the current one is complete to avoid overloading.
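
The last two ideas, streaming partial translations through a callback and refusing new tasks while one is running, can be sketched together. This is an illustration under assumptions: the class `TranslationGate`, the `update` and `complete` message statuses, and the callback names are hypothetical and not taken from the demo.

```typescript
// Messages the (hypothetical) translation worker sends back to the page.
type TranslationMessage =
  | { status: "update"; output: string } // partial translation so far
  | { status: "complete"; output: string }; // final translation

class TranslationGate {
  private busy = false;
  private readonly postToWorker: (text: string) => void;
  private readonly onText: (text: string) => void;

  constructor(postToWorker: (text: string) => void, onText: (text: string) => void) {
    this.postToWorker = postToWorker;
    this.onText = onText;
  }

  // Returns false and does nothing while a translation is still running.
  request(text: string): boolean {
    if (this.busy) {
      return false;
    }
    this.busy = true;
    this.postToWorker(text);
    return true;
  }

  // Call this from the worker's message handler. Every message updates the
  // displayed text; only "complete" reopens the gate for the next task.
  handleMessage(message: TranslationMessage): void {
    this.onText(message.output);
    if (message.status === "complete") {
      this.busy = false;
    }
  }
}
```

On the receiver page, `postToWorker` would wrap `worker.postMessage`, and `onText` would update the displayed translation, which gives the live effect as partial output arrives. A transcript that arrives while the gate is closed is simply not submitted in this sketch; queueing it would be an alternative design.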

Additional Notes

  • The application is purely static, hosted on GitHub Pages.
  • The code is available on the Supabase Community GitHub repository.
  • The demo showcases the capabilities of Transformers.js and Supabase Realtime for AI in the browser.

Conclusion

  • The video demonstrates building a real-time transcription and translation service using client-side technologies.
  • It emphasizes the use of open-source models and the potential of AI in the browser without the need for a server backend.