TinyZero emulates DeepSeek for $30 on a specific task

2025-01-31
2 min read.
AI researchers claim they reproduced the core abilities of DeepSeek R1-Zero with the open-source model TinyZero, for just $30.

Artificial intelligence (AI) researchers at UC Berkeley claim that they reproduced the core abilities of DeepSeek R1-Zero for just $30, Tom's Hardware reports, demonstrating that advanced reasoning behaviors can emerge from surprisingly cheap training runs.

Research leader Jiayi Pan posted an X thread about this. "You can experience the Ahah moment yourself for < $30," he said. He wrote on his own website: "We release TinyZero, the first open reproduction of reasoning models. Through RL, the 3B base LM develops self-verification and search abilities all on its own."

The researchers taught the model to verify and search answers using reinforcement learning. They started with a basic language model, a prompt, and a reward system. They tested the model with the Countdown game, where players use basic math to reach a target number from given numbers.

The model began with wrong guesses but learned to revise and search for the right answer. For example, it would suggest an answer, check if it was correct, and adjust until it found the solution.
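The guess-check-revise loop above hinges on the reward system: the model earns a reward only when its proposed expression uses the given numbers and reaches the target. Here is a minimal, hypothetical sketch of what such a Countdown reward check could look like; the function names and scoring details are illustrative assumptions, not TinyZero's actual code.

```python
# Hypothetical Countdown-style reward check (illustrative, not TinyZero's code).
import ast
import operator

# Only basic arithmetic is allowed, as in the Countdown game.
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(node):
    """Evaluate an arithmetic expression tree using only +, -, *, /."""
    if isinstance(node, ast.Expression):
        return safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
        return ALLOWED_OPS[type(node.op)](safe_eval(node.left),
                                          safe_eval(node.right))
    raise ValueError("disallowed expression")

def countdown_reward(expression: str, numbers: list, target: int) -> float:
    """Return 1.0 if the expression uses only the given numbers (each at
    most once) and evaluates to the target; otherwise 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        # Each given number may be used at most as often as it appears.
        if any(used.count(v) > numbers.count(v) for v in set(used)):
            return 0.0
        return 1.0 if abs(safe_eval(tree) - target) < 1e-9 else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
```

During RL training, the model's generated answer would be scored by a check like this, so revising toward a correct expression is the only way to collect reward.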

Impressive cost reduction for a specific task

The researchers tested different model sizes, starting with one of 500 million parameters and scaling up from there.

"We run Qwen-2.5-Base 0.5B, 1.5B, 3B to 7B. 0.5B guess a solution and stop," said Pan. "From 1.5B, the model start learning to search, to self-verify and to revise its solutions, enabling them to achieve much higher scores." Qwen is Alibaba's family of AI models, which the company recently updated.

The code for the model, called TinyZero, is available on GitHub.

Impressively, the whole experiment cost them only $30, far less than relying on commercial services: OpenAI's API costs $15 per million input tokens, and even DeepSeek-R1 costs $0.55 per million tokens. That low price tag makes the kind of research Pan's project represents far more accessible.

"We hope this project helps to demystify the emerging RL scaling research and make it more accessible!" concluded Pan. "One caveat, of course, is that it's validated only in the Countdown task but not the general reasoning domain."

#LargeLanguageModels(LLMs)



© 2025 MindPlex. All rights reserved