Realtime AI in the Browser
AI Summary
Summary of the AI-Powered Babel Fish Video
Concept
- Create an AI-powered Babel Fish that transcribes speech to text and broadcasts it to an audience in real time.
- Audience members can select their preferred language for translation.
Implementation
- Use the Whisper base model from Hugging Face for offline transcription in the browser.
- Broadcast transcribed text using Supabase Realtime.
- Translate text into different languages using another model in the browser.
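Because each audience member picks a target language, the receiver has to turn that choice into whatever identifiers the translation model expects. The Receiver Process notes describe a 200-language open-source model from Meta, which matches NLLB-200. NLLB-200 names languages with FLORES-200 codes such as eng_Latn and fra_Latn. The sketch below assumes that model, a small hand-picked subset of languages, and English as the speaker's language. The NLLB_CODES table and the translationOptions helper are hypothetical; only the src_lang/tgt_lang option names come from the Transformers.js translation pipeline.

```typescript
// Subset of FLORES-200 language codes used by NLLB-200 (hypothetical UI list).
const NLLB_CODES = {
  English: "eng_Latn",
  Spanish: "spa_Latn",
  French: "fra_Latn",
  German: "deu_Latn",
  Japanese: "jpn_Jpan",
} as const;

type Language = keyof typeof NLLB_CODES;

// Option names follow the Transformers.js translation pipeline for NLLB models.
interface TranslationOptions {
  src_lang: string;
  tgt_lang: string;
}

// Map the audience's choice to model options.
// Assumption: the speaker talks in English unless told otherwise.
function translationOptions(
  target: Language,
  source: Language = "English",
): TranslationOptions {
  return { src_lang: NLLB_CODES[source], tgt_lang: NLLB_CODES[target] };
}

// With a Transformers.js translation pipeline (not run here):
// const output = await translator(text, translationOptions("French"));
```

Keeping the table in one typed object means the UI's language picker and the model options cannot drift apart. An unsupported language becomes a compile-time error rather than a silent runtime failure.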
Technical Details
- Use Hugging Face Transformers.js, which runs models in the browser on ONNX Runtime.
- Enable WebGPU for better performance (transcription is WebGPU-enabled; translation is not yet).
- The application is a client-side JavaScript app built with React and Vite.
- Use React Router with a hash router for GitHub Pages compatibility.
- Browser audio APIs are used to split the recorded audio into chunks for transcription.
- Web workers handle transcription and translation tasks.
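Whisper models take 16 kHz mono audio. The summary does not say how the demo chunks its input, so the following is only a minimal sketch of one approach. It splits an already-decoded sample buffer into fixed-length pieces that could each be posted to the transcription worker. The chunkAudio helper, the chunk length, and the message shape in the final comment are hypothetical.

```typescript
// Whisper models operate on 16 kHz mono PCM audio.
const WHISPER_SAMPLE_RATE = 16_000;

/**
 * Split a mono PCM buffer into consecutive chunks of `seconds` length.
 * Hypothetical helper; the demo's real chunking strategy may differ.
 * The final chunk may be shorter than the others.
 */
function chunkAudio(
  samples: Float32Array,
  seconds: number,
  sampleRate: number = WHISPER_SAMPLE_RATE,
): Float32Array[] {
  const chunkSize = Math.floor(seconds * sampleRate);
  if (chunkSize <= 0) throw new RangeError("chunk length must be positive");

  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    // slice() copies, so each chunk owns its own buffer and can be posted
    // to a worker without cloning the whole recording.
    chunks.push(samples.slice(start, start + chunkSize));
  }
  return chunks;
}

// In the browser, each chunk could then go to the worker (message shape assumed):
// worker.postMessage({ audio: chunk });
```

In the browser, creating the context with new AudioContext({ sampleRate: 16000 }) makes decodeAudioData resample recordings to the rate Whisper expects before they reach a helper like this.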
Transcription Process
- Create a transcription worker on the broadcaster page.
- Load the Whisper model from Hugging Face once and cache it in the browser.
- Use a Singleton pattern to ensure only one instance of the model is loaded.
- The transcription worker builds a speech recognition pipeline.
- The model's output tokens are batch-decoded into text, which is sent back to the main thread.
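The Singleton pattern mentioned above works by caching the promise returned by the model loader instead of the resolved model. Requests that arrive during the first download then share it rather than starting a second one. The sketch below is library-agnostic and injects the loader so the pattern can run on its own. LazySingleton is a hypothetical name. In the commented usage, the model ID Xenova/whisper-base is an assumption; the demo may use a different checkpoint.

```typescript
type ProgressCallback = (info: unknown) => void;
type Loader<T> = (onProgress?: ProgressCallback) => Promise<T>;

/**
 * Lazily creates one shared instance. The promise is cached, so callers
 * that arrive while loading is still in progress await the same load.
 */
class LazySingleton<T> {
  private instance: Promise<T> | null = null;
  private readonly loader: Loader<T>;

  constructor(loader: Loader<T>) {
    this.loader = loader;
  }

  getInstance(onProgress?: ProgressCallback): Promise<T> {
    if (this.instance !== null) return this.instance;

    const loading = this.loader(onProgress);
    this.instance = loading;
    // If loading fails, forget the rejected promise so a later call can retry.
    loading.catch(() => {
      if (this.instance === loading) this.instance = null;
    });
    return loading;
  }
}

// Inside a worker, the loader might wrap Transformers.js (model ID assumed):
// const asr = new LazySingleton((progress_callback) =>
//   pipeline("automatic-speech-recognition", "Xenova/whisper-base", { progress_callback }));
// const transcriber = await asr.getInstance();
```

The browser's cache keeps the downloaded weights between visits. The singleton prevents duplicate loads within a single worker session.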
Broadcasting Process
- A utility function broadcasts each transcribed sentence using Supabase Realtime.
- Generate a random channel ID for broadcasting.
- Set up a Supabase project and configure .env.local with the necessary keys and URLs.
- Ensure the Realtime service is running for broadcasting.
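The broadcaster needs two things: a random channel ID to share with the audience, and a message per transcribed sentence. The following minimal sketch assumes supabase-js on the sending side. The { type: "broadcast", event, payload } shape is what supabase-js channel.send() accepts for broadcast messages. The helper names, the ID alphabet and length, the "transcript" event name, and the environment variable names are hypothetical.

```typescript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

/** Hypothetical helper: a short random channel ID the broadcaster can share. */
function randomChannelId(length = 8, random: () => number = Math.random): string {
  let id = "";
  for (let i = 0; i < length; i++) {
    // random() is in [0, 1), so the index is always within the alphabet.
    id += ALPHABET.charAt(Math.floor(random() * ALPHABET.length));
  }
  return id;
}

/** Shape accepted by supabase-js channel.send() for broadcast messages. */
interface BroadcastMessage {
  type: "broadcast";
  event: string;
  payload: { message: string };
}

/** Wrap one transcribed sentence; the "transcript" event name is an assumption. */
function transcriptMessage(sentence: string): BroadcastMessage {
  return { type: "broadcast", event: "transcript", payload: { message: sentence.trim() } };
}

// With supabase-js (not run here). Vite only exposes VITE_-prefixed variables
// to client code; these variable names are assumptions:
// const supabase = createClient(import.meta.env.VITE_SUPABASE_URL,
//                               import.meta.env.VITE_SUPABASE_ANON_KEY);
// const channel = supabase.channel(randomChannelId());
// channel.subscribe((status) => {
//   if (status === "SUBSCRIBED") channel.send(transcriptMessage(text));
// });
```

Math.random is adequate for an unguessable-enough demo room name but is not cryptographically secure. crypto.getRandomValues would be the safer choice if channel IDs must resist guessing.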
Receiver Process
- The receiver subscribes to the real-time broadcast and translates the transcript.
- Use a translation worker with a Singleton pattern for the translation pipeline.
- The translation model is Meta's open-source NLLB-200 ("No Language Left Behind"), which covers 200 languages.
- Stream partial translations back through a callback function so the text appears live.
- Use Supabase Realtime to subscribe to a channel and listen for transcripts.
- Disable new translation tasks until the current one completes, to avoid overloading the translation worker.
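The receiver-side rule "no new translation until the current one finishes," combined with streamed partial output, can be sketched as a small gate. This is an assumption-laden illustration, not the demo's code. It assumes that transcripts arriving while a translation is running are simply dropped; a queue would be an alternative design. It also assumes the translate function reports partial text through a callback. TranslationGate and the names in the wiring comment (translateInWorker, setTranslation) are hypothetical. The .on("broadcast", { event }, callback) subscription form is the supabase-js Realtime API.

```typescript
type OnPartial = (partial: string) => void;
type Translate = (text: string, onPartial: OnPartial) => Promise<void>;

/** Runs at most one translation at a time; extra transcripts are dropped. */
class TranslationGate {
  private busy = false;
  private readonly translate: Translate;
  private readonly onPartial: OnPartial;

  constructor(translate: Translate, onPartial: OnPartial) {
    this.translate = translate;
    this.onPartial = onPartial;
  }

  get isBusy(): boolean {
    return this.busy;
  }

  /** Returns false when the transcript is ignored because a task is in flight. */
  submit(text: string): boolean {
    if (this.busy) return false;
    this.busy = true;

    let task: Promise<void>;
    try {
      task = this.translate(text, this.onPartial);
    } catch (err) {
      task = Promise.reject(err); // a synchronous throw must still release the gate
    }
    task
      .catch((err: unknown) => console.error("translation failed:", err))
      .finally(() => {
        this.busy = false;
      });
    return true;
  }
}

// Receiver wiring with supabase-js (not run here; event name is an assumption):
// const gate = new TranslationGate(translateInWorker, (text) => setTranslation(text));
// supabase.channel(channelId)
//   .on("broadcast", { event: "transcript" }, ({ payload }) => gate.submit(payload.message))
//   .subscribe();
```

Dropping instead of queueing keeps the on-screen translation close to what the speaker is saying right now, at the cost of skipping sentences when the model is slower than the speaker.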
Additional Notes
- The application is purely static, hosted on GitHub Pages.
- The code is available on the Supabase Community GitHub repository.
- The demo showcases the capabilities of Transformers JS and Supabase Realtime for AI in the browser.
Conclusion
- The video demonstrates building a real-time transcription and translation service using client-side technologies.
- It emphasizes the use of open-source models and the potential of AI in the browser without the need for a server backend.