A NEW TRUTH: AI in 2024
Summary: Understanding AI in 2024
Building Block 1: Pre-training Large Language Models (LLMs)
- Data Sets for Pre-training LLMs:
- Example data sets include Proof-Pile-2 (55 billion tokens) and a Common Crawl subset filtered down to 6.3 million mathematics documents.
- Open-source data sets like RedPajama combine multiple sources into larger data sets (1.2 trillion tokens).
- Historical data sets like C4 from 2020 are also used.
- The mixture of data sources largely determines the LLM’s performance.
- Pre-training Process:
- Pre-training adjusts the tensor weights of a Transformer-based LLM so that it reproduces the semantic patterns found in the training data.
- Example LLM: Mistral 7B, with 7 billion parameters, a 32,000-token vocabulary, and an 8,000-token context length.
- Advances in 2024 include faster inference and new attention mechanisms such as grouped-query attention (GQA) and sliding-window attention.
- Cost and Time:
- Pre-training on 1,000 GPUs for four months could cost around $100,000 in 2024.
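The role of the data mixture described above can be sketched as a weighted sampler over corpora. The source names and weights below are purely illustrative, not the actual mix used by any particular model:

```python
import random

# Hypothetical mixture weights for pre-training corpora (illustrative only;
# real weights are tuned empirically and strongly shape model performance).
mixture = {
    "common_crawl": 0.67,
    "c4": 0.15,
    "github": 0.05,
    "wikipedia": 0.05,
    "books": 0.05,
    "arxiv": 0.03,
}

def sample_source(weights, rng=random):
    """Pick which corpus the next training document is drawn from."""
    sources, probs = zip(*weights.items())
    return rng.choices(sources, weights=probs, k=1)[0]

# Assign a source corpus to each document in a small batch.
batch_sources = [sample_source(mixture) for _ in range(8)]
```

In practice the sampling is done over shuffled, tokenized shards rather than per document, but the principle is the same: the weights, not just the raw data, define what the model learns.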
Building Block 2: Fine-tuning and Alignment
- Fine-tuning LLMs:
- Fine-tuning tailors the LLM to specific tasks like question-answering or summarization.
- Instruction-based data sets are used for fine-tuning.
- In 2024, DPO (Direct Preference Optimization) alignment ensures the model behaves in a specific way, such as being friendly and avoiding aggressive responses.
- Alignment Data Sets:
- Alignment data sets dictate desired and undesired behaviors.
- Examples include the UltraChat and UltraFeedback data sets.
- Open Source Models:
- Open-source models like Mistral 7B can be used and fine-tuned with standardized Python programs.
- The community regularly updates these models with new methodologies.
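The DPO alignment objective mentioned above can be sketched in a few lines. This is a minimal single-pair version of the Direct Preference Optimization loss; the log-probability inputs are assumed to come from the policy model and a frozen reference model, which in a real pipeline would be computed over full responses:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (chosen vs. rejected response).

    logp_*     : policy model's total log-probability of each response
    ref_logp_* : frozen reference model's log-probability of each response
    beta       : strength of the KL-style penalty toward the reference model
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy increases the probability of the preferred (e.g., friendly) response relative to the rejected one, which is how preference data sets like UltraFeedback steer behavior without a separate reward model.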
Building Block 3: Prompt Engineering and In-Context Learning
- Prompt Engineering:
- Strategic formulation of prompts to guide the LLM’s responses.
- Example: Asking the LLM to summarize a scientific article with specific details.
- In-Context Learning (ICL):
- Providing examples within prompts to prime the LLM for desired output formats.
- ICL is temporary and limited by the maximum context length of the prompt (e.g., 8K, 32K, 100K tokens).
- Fine-tuning vs. ICL:
- Fine-tuning integrates new knowledge permanently into the LLM.
- ICL provides temporary knowledge within the context window.
- PEFT (parameter-efficient fine-tuning) methods such as LoRA adapters add small trainable weight layers, enabling efficient fine-tuning.
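In-context learning amounts to prepending worked examples to the prompt so the model imitates the demonstrated output format. The sentiment task, labels, and prompt layout below are a hypothetical illustration:

```python
# Few-shot demonstrations: (input, desired output) pairs shown to the model
# inside the prompt itself. Nothing is stored in the model's weights.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I regret buying this.", "negative"),
]

def build_icl_prompt(examples, query):
    """Assemble a few-shot prompt that primes the desired output format."""
    demos = "\n".join(f"Review: {text}\nSentiment: {label}"
                      for text, label in examples)
    # The trailing "Sentiment:" cues the model to complete in the same format.
    return f"{demos}\nReview: {query}\nSentiment:"

prompt = build_icl_prompt(examples, "Absolutely loved it.")
```

Everything here must fit inside the model's context window (8K, 32K, or 100K tokens), and the "knowledge" vanishes once the prompt is gone, in contrast to fine-tuning.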
Conclusion:
Understanding AI in 2024 involves grasping the pre-training process, fine-tuning and alignment for specific tasks, and the nuances of prompt engineering and in-context learning. Advances in technology and methodologies have made AI more accessible and customizable, with open-source models playing a significant role in the AI ecosystem.