Self Hosted AI Server gateway for LLM APIs, Ollama, ComfyUI & FFmpeg servers



AI Summary

AI Server Overview

  • AI Server is a free, self-hosted unified API for interacting with various AI models and services.
  • It supports major LLM APIs, Open Router, Olama instances, Comfy UI workflows, and more.
  • Features include typed client support for 11 languages, API key access control, live background jobs monitoring, and a flexible, easy installation process.
  • Integrates with Comfy UI for simplified AI tasks like text-to-image, speech-to-text, text-to-speech, etc.

Installation Process

  1. Clone the AI Server repository from GitHub (server-stack/ai-server).
  2. Run the installation script (install.sh | bash) in the cloned AI Server folder.
  3. During installation, provide API keys for providers like Open Router, OpenAI, Mistal, Google Cloud, Gro Cloud, and Replicate.
  4. Set up an auth secret for accessing the AI Server’s admin UI.
  5. Configuration details are saved to a .env file for review.

Features and Configuration

  • AI Server acts as a single OpenAI-compatible entry point for LLM integrations.
  • Supports APIs for modalities like text-to-image, image-to-image, image-to-text, image upscaling, speech-to-text, and more.
  • Comfy UI agent can be deployed on separate hosts with specific hardware requirements (e.g., Nvidia GPUs).
  • Admin portal accessible at localhost:5006/for/admin for configuring AI providers, generating API keys, monitoring tasks, and testing integrations.

UIs and Testing

  • Chat UI: Test different AI providers and steer chats with system prompts.
  • Text to Image UI: Generate images with prompts, resolution, image count, and batch requests.
  • Image to Text UI: Describe images using the Florence 2 model via Comfy UI agent.
  • Image to Image UI: Transform images with prompts using an inpainting workflow.
  • Upscaling UI: Double image size using an upscale model on Comfy UI agent.
  • Speech to Text UI: Convert audio to text using OpenAI’s Whisper model.
  • Text to Speech UI: Generate spoken audio from text using Comfy UI agent or OpenAI’s service.

API and Client Support

  • OpenAI-compatible chat API accessible using service clients in languages like C#, TypeScript, Python, PHP, Java, Swift, etc.
  • Provides typed end-to-end experience with language-specific DTOs.
  • Offers sync and async endpoints with web callback feature for event-driven integration.

Additional Information

  • AI Server is open-source, actively developed, and available on GitHub.
  • Documentation and links to the project are provided in the video description.

(Note: The summary is based on the provided transcript and does not include any URLs or CLI commands as none were specified in the text.)