Building a fully local deep researcher with DeepSeek-R1

DeepSeek released R1, a new open-source reasoning model, with an accompanying paper detailing its training strategy.

Reasoning models represent a new scaling paradigm for large language models (LLMs), focusing on System 2 reasoning (deliberate and logical) rather than System 1 (fast and intuitive).

R1 is trained using a combination of fine-tuning and reinforcement learning (RL), with an RL method called GRPO (Group Relative Policy Optimization).

The training process involves:

Fine-tuning DeepSeek-V3 on thousands of chain-of-thought reasoning examples (exact number unspecified).
Using GRPO RL on 144,000 hard, verifiable problems in math and coding, generating 64 attempts per example and scoring each attempt on correctness.
Comparing each attempt's score to the mean score of its batch of attempts and adjusting token generation probabilities accordingly, so that above-average reasoning patterns are reinforced.
Filtering the RL outputs to obtain 600,000 high-quality reasoning traces for further training.
A second fine-tuning phase with these traces and an additional 200,000 non-reasoning examples to restore general capabilities.
A second RL phase with rewards for helpfulness, harmlessness, and reasoning, using a mix of reasoning and general problems.
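The "compare each attempt to the mean" step can be sketched in a few lines. This is a minimal illustration rather than DeepSeek's implementation: the function name and the example rewards are hypothetical, and dividing by the standard deviation is an assumption based on how GRPO is usually described, not something the summary above states.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each attempt relative to the other attempts at the same problem.

    Attempts that beat the group mean get a positive advantage (their tokens
    would be made more likely); attempts below the mean get a negative one.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)  # assumption: normalise by the group's std deviation
    return [(r - baseline) / (spread + eps) for r in rewards]

# Four hypothetical attempts at one problem: 1.0 = verified correct, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because the baseline is the group's own mean, no separate value model is needed to judge whether an attempt was better or worse than expected. If every attempt scores the same, all advantages are zero and that problem contributes no learning signal.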

DeepSeek also created distilled, smaller R1 models that can run on personal laptops.

Results show that R1 performs comparably to OpenAI’s o1 on coding and math challenges, with the distilled 14B model running effectively on a 32GB MacBook Pro.

The video demonstrates using R1 with Ollama (a tool for running models locally) in a notebook environment, highlighting the presence of “think tokens” in the outputs, which can be removed in JSON mode.
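When JSON mode is not an option, the reasoning text can also be stripped after the fact. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags; the helper name `strip_think` and the sample output are hypothetical.

```python
import re

# Assumption: the reasoning is delimited by <think> ... </think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_think(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()

raw = "<think>\nThe user wants a capital city.\n</think>\n\nParis is the capital of France."
print(strip_think(raw))
```

The non-greedy `.*?` together with `re.DOTALL` keeps the match confined to each think block even when the reasoning spans many lines.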

An example workflow using R1 for report writing is shown, involving generating queries, performing web searches, summarizing results, and reflecting to generate new queries in a loop.
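The loop described above might look roughly like the following. This is a sketch under stated assumptions, not the code from the video: all four callables (`generate_query`, `web_search`, `summarize`, `reflect`) are hypothetical placeholders that would wrap the local model and a search tool, and they are replaced here by trivial stand-ins so the control flow can be run on its own.

```python
def research_loop(topic, generate_query, web_search, summarize, reflect, max_loops=3):
    """Query -> search -> summarize -> reflect, repeated max_loops times."""
    summary = ""
    query = generate_query(topic)
    for step in range(max_loops):
        results = web_search(query)
        summary = summarize(summary, results)  # fold new results into the running summary
        if step < max_loops - 1:
            query = reflect(topic, summary)    # look for a gap and propose a follow-up query
    return summary

# Stand-ins for the model and search calls (hypothetical, for illustration only).
queries_seen = []

def fake_search(query):
    queries_seen.append(query)
    return f"results for '{query}'"

report = research_loop(
    topic="GRPO",
    generate_query=lambda topic: f"what is {topic}",
    web_search=fake_search,
    summarize=lambda summary, results: (summary + " | " + results).strip(" |"),
    reflect=lambda topic, summary: f"{topic} follow-up {summary.count('results') + 1}",
    max_loops=2,
)
print(report)
```

Passing the steps in as functions keeps the loop independent of any particular model runner or search API, and `max_loops` bounds how long the reflection cycle can run.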

The presenter concludes that while the “think token” issue is a minor annoyance, the ability to run reasoning models locally is a significant advancement, and the open-source nature of the training process is commendable.
