Mixture of Models (MoM) - SHOCKING Results on Hard LLM Problems!



AI Summary

Summary: Mixture of Models Experiment

Concept:

  • Based on the wisdom of the crowd principle.
  • Multiple models (referred to as “peasants”) solve the same problem.
  • Different architectures (King, Duopoly, Democracy) are used to synthesize answers.

Architectures:

  1. King:
    • Multiple LLMs provide answers to a query.
    • A “king” model (GPT-4 Turbo) synthesizes these answers with the original query to provide a final response.
  2. Duopoly:
    • Similar to King, but with two “co-founder” models (GPT-4 Turbo and Claude 3 Opus) discussing and agreeing on the best answer.
  3. Democracy:
    • Each model has equal weight.
    • Models vote on the best answer, and a “teller” model (GPT-4) counts votes to determine the final answer.
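The Democracy pattern above is the simplest to sketch. Below is a minimal illustration, assuming hypothetical stand-in "peasant" functions in place of the real LLM calls, and a plain tally in place of the GPT-4 "teller":

```python
from collections import Counter

# Hypothetical stand-in "peasant" models; in the video these are real LLM
# API calls (Llama, Claude 3, etc.), each returning an answer string.
def peasant_a(query): return "blue"
def peasant_b(query): return "blue"
def peasant_c(query): return "red"

PEASANTS = [peasant_a, peasant_b, peasant_c]

def democracy(query):
    """Every model gets an equal vote; a 'teller' tallies the votes.
    Here the teller is a simple Counter, whereas the video has GPT-4
    count the votes."""
    votes = [model(query) for model in PEASANTS]
    winner, _count = Counter(votes).most_common(1)[0]
    return winner
```

With these stubs, `democracy("What colour is the marble?")` returns the majority answer, `"blue"`.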

Implementation:

  • Models include LLMs such as Llama 8B, Claude 3, Hiu, etc.
  • User queries are processed, and answers are collected as context.
  • System messages guide models towards desired outcomes.
  • HTML responses are generated for user-friendly display.
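Assembling the collected answers into context for the "king" might look like the sketch below. The function name and prompt wording are illustrative, not the video's exact code:

```python
def build_king_prompt(query, answers):
    """Assemble the messages the 'king' model receives: a system message
    steering it toward synthesis, plus the original query and every
    peasant answer as context."""
    system = (
        "You are the king. Several models have answered the user's query. "
        "Synthesize their answers into one final, correct response."
    )
    context = "\n".join(
        f"Model {i + 1} answered: {a}" for i, a in enumerate(answers)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Query: {query}\n\n{context}"},
    ]
```

The resulting message list is in the chat-completions format most LLM APIs accept, so it can be passed straight to the "king" model's API call.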

Testing:

  • Problems tested include a marble logic puzzle, an age-related logic question, a hard coding problem from LeetCode, and a creative writing task.
  • Results varied by architecture, with the King setup generally performing best.

Code Setup:

  • Functions for different models are defined.
  • Main function orchestrates the process according to the chosen architecture.
  • System messages are crafted to guide the “king” or “co-founders” in their synthesis of answers.
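The orchestration described above can be sketched as a small dispatch table. All helper names here are hypothetical stand-ins; real code would call each model's API and use the crafted system messages, and Duopoly would add a discussion loop between the two co-founder models:

```python
def ask_peasants(query):
    # Stand-in for querying every peasant model with the same problem.
    return ["42", "42", "41"]

def king_synthesize(query, answers):
    # Stand-in for the GPT-4 Turbo "king"; here, just the most common answer.
    return max(set(answers), key=answers.count)

def tally_votes(query, answers):
    # Stand-in for the GPT-4 "teller"; here, a simple majority.
    return max(set(answers), key=answers.count)

ARCHITECTURES = {"king": king_synthesize, "democracy": tally_votes}

def main(query, architecture="king"):
    """Collect all peasant answers, then route them to the chosen
    architecture for the final response."""
    answers = ask_peasants(query)
    return ARCHITECTURES[architecture](query, answers)
```

Calling `main("…", "democracy")` runs the same answer-gathering step but routes the results through the voting path instead of the synthesizer.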

Results:

  • King architecture solved most problems correctly.
  • Duopoly had mixed results, with some correct and some incorrect answers.
  • Democracy was less reliable, since a majority of models can converge on the same wrong answer.

Conclusion:

  • The experiment showcased different ways to combine model outputs.
  • The King architecture was the most successful in this test.
  • The code and further details are available for channel members on GitHub and the community Discord.