TinyZero emulates DeepSeek for $30 on a specific task

2025-01-31
2 min read.
AI researchers claim they reproduced the core abilities of DeepSeek R1-Zero with the open-source model TinyZero, for just $30.

Artificial intelligence (AI) researchers at UC Berkeley claim that they reproduced the core abilities of DeepSeek R1-Zero for just $30, Tom's Hardware reports, demonstrating that advanced reasoning behaviors can emerge from surprisingly cheap training runs.

Research leader Jiayi Pan posted an X thread about this. "You can experience the Ahah moment yourself for < $30," he said. He wrote on his own website: "We release TinyZero, the first open reproduction of reasoning models. Through RL, the 3B base LM develops self-verification and search abilities all on its own."

The researchers taught the model to verify and search answers using reinforcement learning. They started with a basic language model, a prompt, and a reward system. They tested the model with the Countdown game, where players use basic math to reach a target number from given numbers.

The model began with wrong guesses but learned to revise and search for the right answer. For example, it would suggest an answer, check if it was correct, and adjust until it found the solution.
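The guess-check-revise loop above hinges on the reward system: the model earns a reward only when its proposed expression uses the given numbers and reaches the target. Here is a minimal, hypothetical sketch of what such a Countdown reward check could look like; the function names and scoring details are illustrative assumptions, not TinyZero's actual code.

```python
# Hypothetical Countdown-style reward check (illustrative, not TinyZero's code).
import ast
import operator

# Only basic arithmetic is allowed, as in the Countdown game.
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(node):
    """Evaluate an arithmetic expression tree using only +, -, *, /."""
    if isinstance(node, ast.Expression):
        return safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
        return ALLOWED_OPS[type(node.op)](safe_eval(node.left),
                                          safe_eval(node.right))
    raise ValueError("disallowed expression")

def countdown_reward(expression: str, numbers: list, target: int) -> float:
    """Return 1.0 if the expression uses only the given numbers (each at
    most once) and evaluates to the target; otherwise 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        # Each given number may be used at most as often as it appears.
        if any(used.count(v) > numbers.count(v) for v in set(used)):
            return 0.0
        return 1.0 if abs(safe_eval(tree) - target) < 1e-9 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
```

During RL training, the model's generated answer would be scored by a check like this, so revising toward a correct expression is the only way to collect reward.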

Impressive cost reduction for a specific task

The researchers tested different model sizes, starting with one of 500 million parameters and scaling up from there.

"We run Qwen-2.5-Base 0.5B, 1.5B, 3B to 7B. 0.5B guess a solution and stop," said Pan. "From 1.5B, the model start learning to search, to self-verify and to revise its solutions, enabling them to achieve much higher scores." Qwen is Alibaba's family of AI models, which the company recently updated.

The code for the model, called TinyZero, is available on GitHub.

Impressively, the whole experiment cost them only $30, far less than relying on commercial services: OpenAI's API costs $15 per million input tokens, and even DeepSeek-R1 costs $0.55 per million tokens. That low price tag makes the kind of research Pan's project represents far more accessible.

"We hope this project helps to demystify the emerging RL scaling research and make it more accessible!" concluded Pan. "One caveat, of course, is that it's validated only in the Countdown task but not the general reasoning domain."

#LargeLanguageModels(LLMs)



© 2025 MindPlex. All rights reserved