LIMO: small data, big results

2025-02-17
2 min read.
A small set of carefully chosen examples can train AI models for complex reasoning tasks, with impressive accuracy across diverse benchmarks.

A new study suggests that large language models (LLMs) can learn complex reasoning tasks from only a small set of examples, VentureBeat reported last week. The original story is not accessible at the moment, but an archived copy is available.

The researchers found that LLMs already acquire a great deal of knowledge during pre-training. With smart fine-tuning methods, it could be possible to create custom LLMs without the huge resources of big AI labs.

The study introduces a method called "less is more" (LIMO), which trains LLMs on fewer but carefully chosen examples. The researchers built a small LIMO dataset of hard math problems containing just a few hundred examples and fine-tuned the Qwen2.5-32B-Instruct LLM on it.
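For readers who want a concrete picture, here is a minimal sketch of what a small-dataset fine-tuning run of this kind might look like with the Hugging Face TRL library. The file name limo_examples.jsonl, the hyperparameters, and the data format are illustrative assumptions, not the authors' actual training setup; a 32B model would in practice also need multi-GPU hardware and TRL's API details vary by version.

```python
# Hedged sketch: supervised fine-tuning on a few hundred curated examples.
# Assumes a JSONL file (hypothetical name "limo_examples.jsonl") where each
# record has a "text" field holding a problem followed by its detailed,
# step-by-step solution.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="limo_examples.jsonl", split="train")

config = SFTConfig(
    output_dir="qwen2.5-32b-limo",     # illustrative output directory
    dataset_text_field="text",         # column containing problem + solution
    num_train_epochs=3,                # a few passes over a few hundred examples
    per_device_train_batch_size=1,     # a 32B model needs tiny per-device batches
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct", # the base model named in the article
    train_dataset=dataset,
    args=config,
)
trainer.train()
```

The point of the sketch is the scale: with only hundreds of examples, the training loop itself is small; the effort goes into choosing the examples.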

The results were impressive. The LIMO-trained model solved 57.1% of problems on the challenging AIME benchmark and 94.8% on the MATH benchmark, beating models trained on far more data. It also outperformed some advanced reasoning models, such as QwQ-32B-Preview and OpenAI o1-preview, that used more resources. The LIMO model even generalized to new, unfamiliar problems, scoring high on science benchmarks such as OlympiadBench and GPQA.

Quality, not quantity of data

Reasoning tasks often require fine-tuning, and experts had assumed this demands large amounts of data. LIMO challenges that assumption, making it easier to build specialized models.

The researchers say LIMO works for two reasons: LLMs already acquire reasoning knowledge during pre-training, and newer inference techniques let models "think" longer by generating detailed reasoning chains, which helps them solve harder problems. To build a LIMO-style dataset, you need to select hard problems that push the model to reason in new ways, paired with solutions that are clear and well-organized, guiding the model step by step.
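The snippet below is only a hedged illustration of those two curation criteria, hard problems and clear step-by-step solutions, not the researchers' actual selection pipeline. The solver callable (any function that returns a baseline model's answer to a problem) and the line-count heuristic are assumptions made for the sketch.

```python
from typing import Callable

def is_well_structured(solution: str, min_steps: int = 3) -> bool:
    """Crude proxy for 'clear and well-organized': require several non-empty lines."""
    steps = [line for line in solution.splitlines() if line.strip()]
    return len(steps) >= min_steps

def curate(candidates: list[dict], solver: Callable[[str], str]) -> list[dict]:
    """Keep problem/solution pairs that a baseline model gets wrong (hard)
    and whose reference solution reads as a step-by-step chain (clear)."""
    kept = []
    for ex in candidates:
        hard = solver(ex["problem"]).strip() != ex["answer"].strip()
        clear = is_well_structured(ex["solution"])
        if hard and clear:
            kept.append(ex)
    return kept
```

In spirit, difficulty is judged relative to what the model can already do, which is why a strong pre-trained base model matters as much as the dataset itself.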

The researchers shared their code and data on GitHub. They plan to apply LIMO to other areas in the future. This study suggests that quality, not quantity, is key to unlocking LLM reasoning power.

#LargeLanguageModels(LLMs)


