The Best Tiny LLMs
AI Summary
- Overview of tiny LLMs
- Motivation for using small LLMs
- Performance comparison of DeepSeek Coder (1.3B), TinyLlama (1.1B), and Microsoft's Phi-2 (2.7B)
- Fine-tuning tips for tiny LLMs
- Function calling with tiny LLMs
- Challenges with tiny models for function calling
- Introduction of the custom model "Trelis Tiny" (1.3B)
- Reasons for Tiny Language Models
  - Run locally on consumer hardware
  - High-throughput API serving for cost efficiency
- Performance Comparison
- Fine-Tuning Tiny LLMs
  - Different scripts for various fine-tuning methods
  - Importance of training enough parameters in tiny models
  - LoRA (Low-Rank Adaptation) adjustments for tiny models (see the sketch below)
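Concretely, "training enough parameters" in a 1-2B model usually means raising the LoRA rank and targeting more modules than the common defaults. Below is a minimal sketch using Hugging Face peft; the rank, alpha, and target module names are illustrative assumptions, not values taken from the transcript.

```python
# A minimal LoRA setup biased toward training more parameters, as suggested
# for tiny models. Rank and module list are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-base")

lora_config = LoraConfig(
    r=32,                      # higher rank than the usual 8-16, so more trainable weights
    lora_alpha=64,
    target_modules=[           # cover attention and MLP projections, not just q/v
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check that enough parameters are trainable
```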
- Function Calling with Tiny LLMs
  - Quantization effects on performance
  - Challenges in getting tiny models to work reliably for function calling
  - Development of "Trelis Tiny" for API function calling (see the sketch below)
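The summary does not spell out Trelis Tiny's prompt format, so the sketch below shows one generic way to do prompt-based function calling: pass function metadata as JSON in the prompt and parse the model's JSON reply. The `get_weather` function is hypothetical.

```python
# A generic illustration of prompt-based function calling; the exact format
# used by Trelis Tiny is not specified in the summary, so this schema is an
# assumption.
import json

functions = [{
    "name": "get_weather",            # hypothetical function for illustration
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

prompt = (
    "You have access to the following functions:\n"
    f"{json.dumps(functions, indent=2)}\n\n"
    "To call a function, respond with JSON only, e.g. "
    '{"name": "get_weather", "arguments": {"city": "Dublin"}}\n\n'
    "User: What's the weather in Dublin?\nAssistant:"
)

def parse_function_call(completion: str):
    """Return (name, arguments) if the model emitted a function call, else None."""
    try:
        call = json.loads(completion.strip())
        return call["name"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # plain-text answer, no function call
```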
- Quantization and Model Size
  - OpenChat model quantization to reduce size (see the sketch below)
  - Performance degradation with excessive quantization
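As one concrete view of the size/quality trade-off: a 4-bit load roughly quarters the memory of fp16 weights while usually staying close in quality, and pushing below 4 bits is where degradation tends to bite. Here is a minimal sketch with transformers and bitsandbytes; the checkpoint and quantization scheme are assumptions, since the summary does not name the exact variant used.

```python
# A minimal sketch of 4-bit quantized loading via transformers + bitsandbytes;
# 4-bit NF4 and this particular OpenChat checkpoint are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # ~4x smaller than fp16 weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "openchat/openchat-3.5-0106",          # one public OpenChat checkpoint; substitute as needed
    quantization_config=bnb_config,
    device_map="auto",
)
```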
- Fine-Tuning for Function Calling
- Deep Seek as the best starting point
- Challenges with chain function calling and recursive function calls
- Helper text and logic to prevent recursive function calling in tiny models
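The summary does not include the exact helper logic, but the idea is a dispatch loop that caps the number of chained calls and appends steering text after each function result. In the sketch below, `generate` and `run_function` are hypothetical stand-ins for your inference and dispatch code, and `parse_function_call` is the helper from the earlier sketch.

```python
# A sketch of the kind of guard logic the summary describes: cap chained
# function calls and nudge the model toward a final answer once results
# are available. `generate` and `run_function` are hypothetical stand-ins.
MAX_CALLS = 3  # illustrative limit on chained calls

def chat_with_functions(prompt: str, generate, run_function) -> str:
    for _ in range(MAX_CALLS):
        completion = generate(prompt)
        call = parse_function_call(completion)  # from the earlier sketch
        if call is None:
            return completion  # model answered in plain text; we're done
        name, args = call
        result = run_function(name, args)
        # Helper text steering a tiny model away from calling again needlessly.
        prompt += (
            f"\nFunction {name} returned: {result}\n"
            "Answer the user directly now; do not call another function "
            "unless it is strictly required.\nAssistant:"
        )
    # Call budget exhausted: force a plain-text answer.
    return generate(prompt + "\nRespond in plain text only.\nAssistant:")
```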
- Running Models Locally
  - Practicality of tiny models for local use
  - High-speed inference and memory considerations (see the sketch below)
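For scale: a 1.1B-parameter model needs about 2.2 GB for its fp16 weights, which fits comfortably on consumer GPUs and even in CPU RAM. A minimal local-inference sketch with transformers, using the public TinyLlama checkpoint purely as an example:

```python
# A minimal sketch of running a tiny model on consumer hardware with
# transformers; TinyLlama-1.1B-Chat-v1.0 is used here as a stand-in example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2.2 GB of weights at fp16 for a 1.1B model
    device_map="auto",          # places the model on GPU if available, else CPU
)

inputs = tokenizer("List three uses of a tiny LLM:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```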
For more detailed guidance and resources, refer to the video description or visit tr.com.