Fine-Tune Your Own TinyLlama on a Custom Dataset



AI Summary

  • Overview of Fine-Tuning TinyLlama:
    • TinyLlama is a small language model; it is less capable than large models at general language tasks.
    • It can, however, be fine-tuned for specific tasks and run on edge devices.
  • Fine-Tuning Process:
    • Using the “colors” dataset from Hugging Face, but custom datasets can be formatted similarly.
    • Inspired by Minyang Jiang’s blog post on fine-tuning TinyLlama with color data.
  • Dataset Structure:
    • Two columns: “description” (a natural-language color description) and “colors” (the corresponding hexadecimal code).
    • Goal: train TinyLlama to predict the hex code from the description alone, without explicit instructions (see the loading sketch below).
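A minimal loading sketch, assuming the dataset is pulled from the Hugging Face Hub; the repo path below is a placeholder, and the column names follow the structure above:

```python
from datasets import load_dataset

# Placeholder repo path: substitute the actual "colors" dataset repo,
# or your own dataset with the same two columns.
dataset = load_dataset("your-username/colors", split="train")

print(dataset.column_names)  # expected: ["description", "colors"]
print(dataset[0])            # e.g. {"description": "...", "colors": "#rrggbb"}
```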
  • Setup for Fine-Tuning:
    • Running on Google Colab with good inference speed.
    • Installation of the necessary packages: accelerate, bitsandbytes, transformers, TRL, etc. (see the setup sketch below).
    • Importing the required libraries: PyTorch, Hugging Face datasets, the LoRA configuration, and tokenizers.
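A sketch of the environment setup under those package assumptions (names as published on PyPI; PEFT is included for the LoRA configuration, and versions should be pinned in practice):

```python
# In a Colab cell:
# !pip install -q accelerate bitsandbytes transformers trl peft datasets

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer
```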
  • Data Formatting:
    • Using the ChatML format for the dataset.
    • The description becomes the user turn, and the colors column becomes the model’s response.
    • The data is formatted into a “text” column containing the full prompt template (see the formatting sketch below).
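A formatting sketch using ChatML-style turn markers as described above; note that TinyLlama chat checkpoints ship their own chat template, which tokenizer.apply_chat_template would apply instead:

```python
def to_chatml(example):
    # Description becomes the user turn, the hex code the assistant turn;
    # the result is stored in a new "text" column for the trainer.
    return {
        "text": (
            "<|im_start|>user\n"
            f"{example['description']}<|im_end|>\n"
            "<|im_start|>assistant\n"
            f"{example['colors']}<|im_end|>"
        )
    }

dataset = dataset.map(to_chatml)
```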
  • Model Preparation:
    • Using the chat version of TinyLlama for shorter training time.
    • Downloading and initializing the tokenizer and model.
    • Setting up the LoRA configuration and training parameters.
    • Creating a supervised fine-tuning (SFT) trainer object with the preformatted dataset.
    • Specifying the training arguments and max sequence length (see the trainer sketch below).
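A trainer sketch assuming the v1.0 chat checkpoint; the hyperparameter values are illustrative, and the keyword arguments match older TRL releases (newer ones move dataset_text_field and max_seq_length into SFTConfig):

```python
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # the chat checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative LoRA settings: rank-16 adapters on the attention projections.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="tinyllama-colors",
    per_device_train_batch_size=8,
    learning_rate=2e-4,
    max_steps=250,        # matches the step count in the Training section
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",   # the preformatted column
    max_seq_length=512,
    tokenizer=tokenizer,
)
```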
  • Training:
    • Running for 250 steps, monitoring the training loss.
    • Training only the LoRA adapters, not the entire model (see below).
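Continuing from the trainer above, kicking off the run is a single call; because peft_config is set, only the adapter matrices receive gradient updates:

```python
trainer.train()  # 250 steps; loss is logged every 10 steps
trainer.model.save_pretrained("tinyllama-colors-adapter")  # adapter weights only
```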
  • Post-Training:
    • Merging the trained LoRA adapters with the original model.
    • The final model has the LoRA weights baked into its base weights (merge sketch below).
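A merge sketch using PEFT’s merge_and_unload, which folds the low-rank updates back into the base weights so the result loads like an ordinary checkpoint; the adapter path matches the save call above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "tinyllama-colors-adapter")
merged = merged.merge_and_unload()  # bake the LoRA deltas into the base model
merged.save_pretrained("tinyllama-colors-merged")
tokenizer.save_pretrained("tinyllama-colors-merged")
```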
  • Inference Testing:
    • Using a function to generate responses from user input.
    • Measuring inference speed.
    • A helper function displays the color for a given hexadecimal code (see the inference sketch below).
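An inference sketch continuing from the merged model and tokenizer above; the prompt wrapper mirrors the training template, the timing is a rough wall-clock measure, and the swatch helper renders inline in a notebook:

```python
import time
from IPython.display import HTML, display

def generate(description, model, tokenizer, max_new_tokens=12):
    # Wrap the raw description in the same template used during training.
    prompt = f"<|im_start|>user\n{description}<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    print(f"generated in {time.time() - start:.2f}s")
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

def show_color(hex_code):
    # Render a small swatch for the predicted hex code in a notebook cell.
    display(HTML(f'<div style="width:80px;height:40px;background:{hex_code}"></div>'))

hex_code = generate("light orange", merged, tokenizer)
show_color(hex_code)
```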
  • Results:
    • Model predicts hex code for “light orange” quickly and accurately.
    • Demonstrates the potential of small language models for edge devices.
  • Conclusion:
    • Excitement for the future of small language models on consumer hardware.
    • Encouragement to explore and ask questions about the process.