Building a fully local deep researcher with DeepSeek-R1



AI Summary

Summary of DeepSeek’s R1 Reasoning Model Release

  • DeepSeek released R1, a new open-source reasoning model, with an accompanying paper detailing its training strategy.
  • Reasoning models represent a new scaling paradigm for large language models (LLMs), emphasizing system 2 reasoning (deliberate and logical) over system 1 (fast and intuitive).
  • R1 is trained using a combination of fine-tuning and reinforcement learning (RL) with a method called GRPO.
  • The training process involves:
    • Fine-tuning DeepSeek-V3 on thousands of chain-of-thought reasoning examples (exact number unspecified).
    • Using GRPO-based RL on 144,000 hard, verifiable math and coding problems, generating 64 attempts per problem and scoring each for correctness.
    • Comparing each attempt to the batch mean and adjusting token-generation probabilities accordingly to reinforce good reasoning patterns.
    • Filtering the RL outputs to obtain 600,000 high-quality reasoning traces for further training.
    • A second fine-tuning phase with these traces and an additional 200,000 non-reasoning examples to restore general capabilities.
    • A second RL phase with rewards for helpfulness, harm, and reasoning, using a mix of reasoning and general problems.
  • DeepSeek also created distilled, smaller R1 models that can run on personal laptops.
  • Results show that R1 performs comparably to OpenAI’s o1 on coding and math challenges, with the distilled 14B model running effectively on a 32GB MacBook Pro.
  • The video demonstrates using R1 with Ollama (a tool for running models locally) in a notebook environment, highlighting the presence of “think tokens” in the outputs, which can be stripped out, e.g., when using JSON mode.
  • An example workflow using R1 for report writing is shown: generating queries, performing web searches, summarizing results, and reflecting on the summary to generate new queries in a loop.
  • The presenter concludes that while the “think token” issue is a minor annoyance, the ability to run reasoning models locally is a significant advancement, and the open-source nature of the training process is commendable.
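The core GRPO idea summarized above (score a group of attempts, compare each to the group mean, and reinforce above-average ones) can be sketched as follows. This is an illustrative simplification, not the paper's implementation; the function name and reward values are made up here.

```python
# Sketch of GRPO's group-relative advantage computation: each attempt's
# reward is normalized against the mean (and spread) of its own group,
# so above-average attempts get positive advantages and below-average
# attempts get negative ones. Illustrative only.
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Return a group-relative advantage for each attempt's reward."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# e.g. 8 attempts at one verifiable math problem, scored 1 if correct
rewards = [1, 0, 0, 1, 1, 0, 0, 0]
advantages = grpo_advantages(rewards)
```

In the full algorithm these advantages scale the policy-gradient update on each attempt's token probabilities; here they only illustrate the "compare to the batch mean" step.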
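The "think tokens" mentioned above can be removed with a small post-processing step. A minimal sketch, assuming the model wraps its chain of thought in `<think>...</think>` tags, as R1-style models typically do:

```python
# Strip the <think>...</think> reasoning block an R1-style model emits
# before its final answer. Assumes the tag convention described above.
import re

def strip_think(text: str) -> str:
    """Remove the chain-of-thought block, keeping only the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Let me reason step by step...</think>The answer is 42."
clean = strip_think(raw)  # -> "The answer is 42."
```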
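The report-writing loop described above (generate a query, search, summarize, reflect, repeat) can be sketched as a plain function. The callables `llm` and `search` are placeholders supplied by the caller; this shows the loop's shape, not the presenter's exact implementation.

```python
# Sketch of the iterative research loop: query -> search -> summarize
# -> reflect -> new query. `llm` and `search` are caller-supplied
# callables (e.g. an Ollama-backed model and a web-search client).
def research_loop(topic, llm, search, max_loops=3):
    """Build up a running summary of `topic` over several search rounds."""
    query = llm(f"Write a web search query about: {topic}")
    summary = ""
    for _ in range(max_loops):
        results = search(query)                      # perform web search
        summary = llm(                               # fold results into summary
            f"Update this summary:\n{summary}\nwith these results:\n{results}"
        )
        query = llm(                                 # reflect: find the gap
            f"Given this summary, write a query for a remaining knowledge gap:\n{summary}"
        )
    return summary
```

The reflection step is what distinguishes this from a single-shot search-and-summarize: each round's query is conditioned on what the summary still lacks.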

Detailed Instructions and URLs

  • No specific CLI commands, website URLs, or detailed instructions were provided in the transcript.