ThirdBrAIn.tech

DeepSeek R1 Cloned for $30?! PhD Student STUNNING Discovery

Apr 02, 2025 · 2 min read

AI Summary

Summary of the Video Transcript

A UC Berkeley PhD student, Jiayi Pan, reproduced the “aha moment” of the DeepSeek R1-Zero model using reinforcement learning (RL) for under $30.

The “aha moment” refers to a model’s emergent ability to allocate more thinking time to a problem by re-evaluating its initial approach.

This behavior demonstrates the model’s growing reasoning abilities and the potential of reinforcement learning to produce sophisticated outcomes.

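The student applied RL to the Countdown game, in which a set of numbers must be combined with basic arithmetic to reach a target number. During training, the model developed self-verification and search abilities on its own.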

The model’s performance improved with a well-defined reward function, which is easier to establish for tasks with definitive answers like math or logic.

The experiment showed that base model quality is crucial: models with 1.5B parameters and up developed the ability to search and self-verify.

The instruct model learns faster but converges to about the same performance as the base model, and its outputs are more structured and readable.

The specific RL algorithm used (PPO, GRPO, or PRIME) did not significantly affect the outcome.

The model’s reasoning behavior is task-dependent, with different strategies emerging for different tasks.

The findings have so far been validated only on the Countdown task, not on general reasoning, because of computational constraints.

The student’s work is open-sourced under the name “TinyZero,” with all resources available on GitHub.
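
To make the point about well-defined reward functions concrete, here is a minimal sketch of what a rule-based reward for the Countdown task could look like. It is an illustration, not the code from the project: the function name `countdown_reward`, the helper `_evaluate`, the partial-credit value `format_score=0.1`, and the rule that every given number must be used exactly once are all assumptions made for this example.

```python
import ast
from collections import Counter
from fractions import Fraction

# Binary operators allowed in a Countdown answer (assumed rule set).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate a parsed arithmetic expression exactly, recording each integer it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (function calls, names, unary minus, ...) is rejected.
    raise ValueError("unsupported syntax")


def countdown_reward(expression, numbers, target, format_score=0.1):
    """Hypothetical reward: 1.0 if correct, format_score if valid but wrong, else 0.0."""
    try:
        used = []
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        return 0.0  # unparseable, disallowed, or mathematically invalid answer
    if Counter(used) != Counter(numbers):
        return 0.0  # assumed rule: each given number is used exactly once
    return 1.0 if value == target else format_score
```

With the numbers 4, 5, and 25 and a target of 80, `countdown_reward("(25 - 5) * 4", [4, 5, 25], 80)` returns `1.0`, while `countdown_reward("25 + 5 + 4", [4, 5, 25], 80)` returns the partial credit of `0.1`. Because the answer can be checked mechanically, no learned reward model is needed, which is why tasks with definitive answers are easier to set up for RL.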

Detailed Instructions and URLs

No specific CLI commands, website URLs, or detailed instructions were provided in the transcript.
