Local AI Just Got Crazy Smart—And It’s Only 8B Thinking LLM!



AI Summary

Summary of Deep Hermes 3 Model Evaluation

  • Model Overview:
    • Deep Hermes 3 is a local thinking model with long Chain of Thought reasoning.
    • It can toggle between thinking and non-thinking modes using a system prompt.
    • The model is based on llama 3.1 and has 8 billion parameters.
    • It was tested using LM Studio.
  • Model Performance:
    • Google Sheets Formula Test:
      • Without thinking mode, the model failed to generate a correct formula.
      • With thinking mode enabled, it successfully created a formula based on given conditions.
    • Wolfram Alpha Factorization Test:
      • Without thinking mode, the model provided an incorrect factorization.
      • With thinking mode, it eventually arrived at the correct factorization after extensive deliberation.
    • Physics Simulation Test:
      • The model performed poorly in programming a bouncing ball inside a spinning hexagon.
      • The provided Python code did not meet the task requirements.
    • Chemistry Compound Identification Test:
      • Deep Hermes 3 correctly identified the compound vanillin, outperforming other models.
    • Medical Diagnosis Test:
      • The model incorrectly diagnosed a set of symptoms, providing an unrelated disease.
    • Math Problem Test:
      • The model failed to solve an IIT-JEE entrance exam problem, even after 10 minutes of processing.
  • Overall Experience:
    • The model excels in reasoning within context and performs well on local machines.
    • It shows improvement over the llama 3.1 8 billion parameter model, especially with thinking enabled.
    • The system prompt can be customized to direct the model’s behavior.
  • Usage Instructions:
    • To use Deep Hermes 3, download it from the Discover Tab in LM Studio.
    • The model requires about 4.7 GB of storage.
    • Add the system prompt from the Hugging Face model page to LM Studio to enable thinking mode.
  • Conclusion:
    • The model is highly capable in certain contexts but has limitations.
    • The creator encourages users to try the model and provide feedback on the tests conducted.

Detailed Instructions and URLs

  • No specific CLI commands, website URLs, or detailed instructions were provided in the transcript.