More Agents Is All You Need


type: video
status: OK
published: 2025-03-22
share: true

AI Summary

  • Title: More Agents Is All You Need
  • Context:
    • The paper, from Tencent, explores using multiple large language models (LLMs) to answer the same question in order to improve output quality.
    • It distinguishes this approach from mixture of experts, emphasizing that it’s a simple ensembling method.
  • Ensembling Method:
    • Similar to a random forest in classical machine learning, where multiple decision trees are combined to improve results.
    • The paper proposes a sampling and voting approach with multiple LLMs to enhance performance.
  • Findings:
    • Performance scales with the number of LLM agents used.
    • Accuracy increases as more models are added to the ensemble.
  • Experiment:
    • Multiple experiments conducted with diverse datasets and model sizes.
    • Explores the correlation between performance improvements and task difficulty.
    • Three dimensions of difficulty: inherent difficulty, length of reasoning steps, and prior probability of the correct answer.
  • Methodology:
    • A query is sent to all models in the ensemble.
    • Each model provides an output, and majority voting determines the final result.
  • Models Used:
    • GPT series, including GPT-3.5 Turbo and GPT-4, and LLaMA-2 models with 13 billion and 70 billion parameters.
  • Results:
    • Ensembling improves performance relative to individual model baselines.
    • Larger, pre-trained models still outperform ensembles of smaller models.
    • Performance gains are more significant for smaller models facing difficult tasks.
  • Implications:
    • Ensembling can be cost-effective and improve efficiency.
    • Layered approach to problem-solving with different models for different difficulty levels.
    • Distributed and asynchronous inference methods are possible.
  • Conclusion:
    • While ensembling can enhance smaller models, the inherent knowledge from pre-training in larger models remains crucial.
    • The paper suggests that stacking models can improve performance but cannot fully replicate the capabilities of more advanced models.
  • Additional Notes:
    • The paper’s code is available for experimentation with LLaMA-2, GPT-3.5 Turbo, and GPT-4 models.
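The sampling-and-voting procedure described under Methodology can be sketched roughly as follows. This is a minimal illustration, not the paper's released code; `query_model` stands in for whatever LLM API is being sampled (e.g., GPT-3.5 Turbo):

```python
from collections import Counter

def sample_and_vote(query_model, prompt, n_agents=10):
    """Send the same prompt to n_agents independent model samples,
    then return the most common answer (majority voting)."""
    answers = [query_model(prompt) for _ in range(n_agents)]
    # most_common(1) yields the (answer, count) pair with the highest count
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

Per the paper's finding, accuracy tends to improve as `n_agents` grows, since individually noisy samples are averaged out by the vote.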

For further reading, see the paper itself, which covers the methodology in detail.