The Best Tiny LLMs



AI Summary

Summary: Best Tiny Large Language Models

  • Overview of Tiny LLMs
    • Motivation for using small LLMs
    • Performance comparison of DeepSeek Coder (1.3B), TinyLlama, and Microsoft’s Phi-2 (2.7B)
    • Fine-tuning tips for tiny LLMs
    • Function calling with tiny LLMs
    • Challenges with tiny models for function calling
    • Introduction of custom model “Trelis Tiny” (1.3B)
  • Reasons for Tiny Language Models
    • Run locally on consumer hardware
    • High throughput API for cost efficiency
  • Performance Comparison
    • Utilized a Jupyter notebook from the advanced inference repository
    • Compared TinyLlama, Microsoft’s Phi-2, and DeepSeek Coder
    • Microsoft’s Phi-2 is not available for commercial use
  • Fine-Tuning Tiny LLMs
    • Different scripts for various fine-tuning methods
    • Importance of training enough parameters in tiny models
    • LoRA (Low-Rank Adaptation) settings adjusted for tiny models
  • Function Calling with Tiny LLMs
    • Quantization effects on performance
    • Difficulty of getting tiny models to perform function calling reliably
    • Development of “Trelis Tiny” for API function calling
  • Quantization and Model Size
    • Quantizing the OpenChat model to reduce its size
    • Performance degradation with excessive quantization
  • Fine-Tuning for Function Calling
    • DeepSeek as the best starting point
    • Challenges with chained function calling and recursive function calls
    • Helper text and logic to prevent recursive function calling in tiny models
  • Running Models Locally
    • Practicality of tiny models for local use
    • High-speed inference and memory considerations
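The point above about training enough parameters with LoRA can be made concrete with simple arithmetic: a rank-r adapter on a d_out × d_in weight matrix adds r·(d_in + d_out) trainable parameters, which is why tiny models often need a higher rank (and more targeted modules) than large ones to train a meaningful fraction of the network. The layer count and dimensions below are illustrative, not taken from any specific model.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters added by one rank-r LoRA adapter.

    LoRA factors the update as B @ A, where A is (r x d_in) and
    B is (d_out x r), so the adapter adds r*(d_in + d_out) parameters.
    """
    return r * (d_in + d_out)

# Illustrative tiny model: 1.3B base parameters, 24 layers,
# LoRA applied to 4 attention projections of shape 2048x2048 each.
base_params = 1.3e9
adapter_params = 24 * 4 * lora_params(2048, 2048, r=32)
print(f"Rank-32 LoRA trains {adapter_params / base_params:.2%} of the base model")
```

Raising `r` scales the trainable fraction linearly, which is one cheap knob when a tiny model under-fits during fine-tuning.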
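The recursion-guard idea described under “Fine-Tuning for Function Calling” can be sketched as a plain dispatch loop: cap the number of chained calls and append helper text after each function result nudging the model to answer in prose. Everything here (`run_chat`, `call_model`, the `TOOLS` registry, the JSON call format) is a hypothetical illustration, not the API of any particular library.

```python
import json

# Illustrative tool registry; a real deployment would expose actual functions.
TOOLS = {"get_time": lambda: "12:00"}

MAX_CALLS = 3  # hard cap so a tiny model cannot call functions forever

def run_chat(call_model, user_msg):
    """Drive a function-calling loop with a recursion guard.

    `call_model` stands in for whatever inference call you use; it takes
    the message list and returns the model's raw text reply. The model is
    assumed to emit a JSON object like {"name": "get_time"} when it wants
    to call a function, and plain prose when it is done.
    """
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(MAX_CALLS):
        reply = call_model(messages)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text: treat as the final answer
        if call.get("name") not in TOOLS:
            return reply
        result = TOOLS[call["name"]]()
        # Helper text steers the model away from calling itself recursively.
        messages.append({
            "role": "function",
            "content": f"{result}\nAnswer the user now; do not call another function.",
        })
    return "Stopped: too many chained function calls."
```

The `MAX_CALLS` cap is the safety net: even if the helper text fails and the model keeps emitting calls, the loop terminates deterministically.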
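The memory trade-off behind the quantization points above can be estimated with back-of-the-envelope arithmetic for the weights alone (KV cache and activation overheads, which also matter for local inference, are ignored here):

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

n = 1.3e9  # a 1.3B-parameter model, e.g. in the DeepSeek Coder size class
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(n, bits):.2f} GB")
```

This is why 4-bit quantization makes tiny models comfortable on consumer hardware, and also why the summary warns about pushing quantization further: each halving of bits saves memory but costs accuracy.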

For more detailed guidance and resources, see the video description or visit tr.com.
