A NEW TRUTH: AI in 2024




Summary: Understanding AI in 2024

Building Block 1: Pre-training Large Language Models (LLMs)

  • Data Sets for Pre-training LLMs:
    • Example data sets include Proof-Pile-2 (55 billion tokens) and a mathematics subset of Common Crawl (filtered down to 6.3 million documents).
    • Open-source data sets like RedPajama combine multiple sources into larger corpora (1.2 trillion tokens).
    • Older data sets, such as C4 from 2020, are still in use.
    • The mixture of data sources largely determines the LLM’s performance.
  • Pre-training Process:
    • Pre-training involves modifying the tensor weights of a Transformer-based LLM to replicate semantic patterns from the training data.
    • Example LLM: Mistral 7B, with 7 billion parameters, a 32k vocabulary, and an 8K (8,192) token context length.
    • Advances in 2024 include faster inference and attention mechanisms such as grouped-query attention (GQA) and sliding-window attention.
  • Cost and Time:
    • Pre-training on 1,000 GPUs for four months could cost around $100,000 in 2024.
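The scale of pre-training cost can be illustrated with the common ~6·N·D FLOPs heuristic (N parameters, D training tokens). The sketch below is a back-of-envelope estimate only; the GPU throughput, utilization, and hourly rate are illustrative assumptions, not quoted prices:

```python
# Back-of-envelope pre-training cost estimate (all rates are assumptions).
# Uses the common ~6 * N * D FLOPs heuristic for training a Transformer
# with N parameters on D tokens.

def pretraining_cost(params, tokens, gpu_flops=3e14, utilization=0.4,
                     gpu_hour_usd=2.0):
    """Return (gpu_hours, dollar_cost); gpu_flops is peak FLOP/s per GPU."""
    total_flops = 6 * params * tokens
    effective_flops = gpu_flops * utilization   # realistic sustained throughput
    gpu_seconds = total_flops / effective_flops
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * gpu_hour_usd

# Example: a 7B-parameter model on 1.2 trillion tokens (RedPajama-scale).
hours, usd = pretraining_cost(7e9, 1.2e12)
print(f"{hours:,.0f} GPU-hours, ~${usd:,.0f}")
```

Spread over 1,000 GPUs, the estimated GPU-hours translate into a few days to months of wall-clock time, depending on utilization.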

Building Block 2: Fine-tuning and Alignment

  • Fine-tuning LLMs:
    • Fine-tuning tailors the LLM to specific tasks like question-answering or summarization.
    • Instruction-based data sets are used for fine-tuning.
    • In 2024, DPO (Direct Preference Optimization) alignment steers the model toward desired behavior, such as being friendly and avoiding aggressive responses.
  • Alignment Data Sets:
    • Alignment data sets dictate desired and undesired behaviors.
    • Examples include the UltraChat and UltraFeedback data sets.
  • Open Source Models:
    • Open-source models like Mistral 7B can be downloaded and fine-tuned with standard Python tooling.
    • The community regularly updates these models with new methodologies.
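Preference-based alignment of the kind described above optimizes a simple per-pair loss. The following is a minimal sketch of the DPO loss for a single preference pair in pure Python; the function name and the default β are illustrative assumptions, not a library API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair of summed token log-probabilities.

    pi_*  : log-probs of the chosen/rejected response under the model
            being trained
    ref_* : log-probs under the frozen reference model
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(sigmoid(margin))  # -log(sigmoid(beta * margin))

# If the trained model prefers the chosen answer more strongly than the
# reference model does, the margin is positive and the loss falls below
# log(2); training pushes the margin up on the preference data.
print(dpo_loss(-10.0, -30.0, -20.0, -20.0))
```

Minimizing this loss over a preference data set (e.g., UltraFeedback-style chosen/rejected pairs) is what nudges the model toward the desired behavior without a separate reward model.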

Building Block 3: Prompt Engineering and In-Context Learning

  • Prompt Engineering:
    • Strategic formulation of prompts to guide the LLM’s responses.
    • Example: Asking the LLM to summarize a scientific article with specific details.
  • In-Context Learning (ICL):
    • Providing examples within prompts to prime the LLM for desired output formats.
    • ICL is temporary and limited by the maximum context length of the prompt (e.g., 8K, 32K, 100K tokens).
  • Fine-tuning vs. ICL:
    • Fine-tuning integrates new knowledge permanently into the LLM.
    • ICL provides temporary knowledge within the context window.
    • PEFT (parameter-efficient fine-tuning) techniques such as LoRA adapters add small trainable weight layers, enabling efficient fine-tuning.
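The context-window limit on ICL can be made concrete with a small prompt builder that drops the oldest few-shot examples once the prompt outgrows the window. This is a hypothetical helper; token counts are crudely approximated by whitespace words, whereas a real system would use the model's tokenizer:

```python
def build_fewshot_prompt(examples, query, max_tokens=8192):
    """Build a few-shot (in-context learning) prompt.

    `examples` is a list of (input, output) pairs. If the rendered prompt
    would exceed `max_tokens`, the oldest examples are dropped first --
    ICL knowledge only exists inside the context window.
    """
    def count(text):
        # Crude stand-in for a tokenizer: count whitespace-separated words.
        return len(text.split())

    kept = list(examples)

    def render():
        shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in kept)
        tail = f"Input: {query}\nOutput:"
        return f"{shots}\n\n{tail}" if kept else tail

    while kept and count(render()) > max_tokens:
        kept.pop(0)  # drop the oldest example first
    return render()

print(build_fewshot_prompt([("2+2", "4"), ("3+3", "6")], "4+4"))
```

Fine-tuning would instead bake such examples into the weights permanently, which is the trade-off the bullets above describe.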

Conclusion:

Understanding AI in 2024 involves grasping the pre-training process, fine-tuning and alignment for specific tasks, and the nuances of prompt engineering and in-context learning. Advances in technology and methodologies have made AI more accessible and customizable, with open-source models playing a significant role in the AI ecosystem.