A NEW TRUTH: AI in 2024




Summary: Understanding AI in 2024

Building Block 1: Pre-training Large Language Models (LLMs)

  • Data Sets for Pre-training LLMs:
    • Example data sets include Proof-Pile-2 (55 billion tokens) and a mathematics subset of Common Crawl (filtered down to 6.3 million documents).
    • Open-source data sets like RedPajama combine multiple sources into larger corpora (1.2 trillion tokens).
    • Older data sets, such as C4 from 2020, are still in use.
    • The mixture of data sources largely determines the LLM’s performance.
  • Pre-training Process:
    • Pre-training involves modifying the tensor weights of a Transformer-based LLM to replicate semantic patterns from the training data.
    • Example LLM: Mistral 7B, with 7 billion parameters, a 32k vocabulary, and an 8K (8,192) token context length.
    • Advances in 2024 include faster inference and attention mechanisms such as grouped-query attention (GQA) and sliding-window attention.
  • Cost and Time:
    • Pre-training on 1,000 GPUs for four months could cost around $100,000 in 2024.
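The scale of pre-training cost can be illustrated with the common ~6·N·D FLOPs heuristic (N parameters, D training tokens). The sketch below is a back-of-envelope estimate only; the GPU throughput, utilization, and hourly rate are illustrative assumptions, not quoted prices:

```python
# Back-of-envelope pre-training cost estimate (all rates are assumptions).
# Uses the common ~6 * N * D FLOPs heuristic for training a Transformer
# with N parameters on D tokens.

def pretraining_cost(params, tokens, gpu_flops=3e14, utilization=0.4,
                     gpu_hour_usd=2.0):
    """Return (gpu_hours, dollar_cost); gpu_flops is peak FLOP/s per GPU."""
    total_flops = 6 * params * tokens
    effective_flops = gpu_flops * utilization   # realistic sustained throughput
    gpu_seconds = total_flops / effective_flops
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * gpu_hour_usd

# Example: a 7B-parameter model on 1.2 trillion tokens (RedPajama-scale).
hours, usd = pretraining_cost(7e9, 1.2e12)
print(f"{hours:,.0f} GPU-hours, ~${usd:,.0f}")
```

Spread over 1,000 GPUs, the estimated GPU-hours translate into a few days to months of wall-clock time, depending on utilization.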

Building Block 2: Fine-tuning and Alignment

  • Fine-tuning LLMs:
    • Fine-tuning tailors the LLM to specific tasks like question-answering or summarization.
    • Instruction-based data sets are used for fine-tuning.
    • In 2024, DPO (Direct Preference Optimization) alignment steers the model toward desired behavior, such as being friendly and avoiding aggressive responses.
  • Alignment Data Sets:
    • Alignment data sets dictate desired and undesired behaviors.
    • Examples include the UltraChat and UltraFeedback data sets.
  • Open Source Models:
    • Open-source models like Mistral 7B can be downloaded and fine-tuned with standard Python tooling.
    • The community regularly updates these models with new methodologies.
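Preference-based alignment of the kind described above optimizes a simple per-pair loss. The following is a minimal sketch of the DPO loss for a single preference pair in pure Python; the function name and the default β are illustrative assumptions, not a library API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair of summed token log-probabilities.

    pi_*  : log-probs of the chosen/rejected response under the model
            being trained
    ref_* : log-probs under the frozen reference model
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(sigmoid(margin))  # -log(sigmoid(beta * margin))

# If the trained model prefers the chosen answer more strongly than the
# reference model does, the margin is positive and the loss falls below
# log(2); training pushes the margin up on the preference data.
print(dpo_loss(-10.0, -30.0, -20.0, -20.0))
```

Minimizing this loss over a preference data set (e.g., UltraFeedback-style chosen/rejected pairs) is what nudges the model toward the desired behavior without a separate reward model.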

Building Block 3: Prompt Engineering and In-Context Learning

  • Prompt Engineering:
    • Strategic formulation of prompts to guide the LLM’s responses.
    • Example: Asking the LLM to summarize a scientific article with specific details.
  • In-Context Learning (ICL):
    • Providing examples within prompts to prime the LLM for desired output formats.
    • ICL is temporary and limited by the maximum context length of the prompt (e.g., 8K, 32K, 100K tokens).
  • Fine-tuning vs. ICL:
    • Fine-tuning integrates new knowledge permanently into the LLM.
    • ICL provides temporary knowledge within the context window.
    • PEFT (parameter-efficient fine-tuning) techniques such as LoRA adapters add small trainable weight layers, enabling efficient fine-tuning.
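The context-window limit on ICL can be made concrete with a small prompt builder that drops the oldest few-shot examples once the prompt outgrows the window. This is a hypothetical helper; token counts are crudely approximated by whitespace words, whereas a real system would use the model's tokenizer:

```python
def build_fewshot_prompt(examples, query, max_tokens=8192):
    """Build a few-shot (in-context learning) prompt.

    `examples` is a list of (input, output) pairs. If the rendered prompt
    would exceed `max_tokens`, the oldest examples are dropped first --
    ICL knowledge only exists inside the context window.
    """
    def count(text):
        # Crude stand-in for a tokenizer: count whitespace-separated words.
        return len(text.split())

    kept = list(examples)

    def render():
        shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in kept)
        tail = f"Input: {query}\nOutput:"
        return f"{shots}\n\n{tail}" if kept else tail

    while kept and count(render()) > max_tokens:
        kept.pop(0)  # drop the oldest example first
    return render()

print(build_fewshot_prompt([("2+2", "4"), ("3+3", "6")], "4+4"))
```

Fine-tuning would instead bake such examples into the weights permanently, which is the trade-off the bullets above describe.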

Conclusion:

Understanding AI in 2024 involves grasping the pre-training process, fine-tuning and alignment for specific tasks, and the nuances of prompt engineering and in-context learning. Advances in technology and methodologies have made AI more accessible and customizable, with open-source models playing a significant role in the AI ecosystem.