Samsung’s tiny AI model beats giant reasoning LLMs

A new paper by Samsung AI researchers shows how small networks can beat large language models (LLMs) at complex reasoning tasks.

In the race for AI supremacy, the industry mantra is often “bigger is better.” Tech giants have spent billions of dollars building ever-larger models, but the Tiny Recursive Model (TRM) takes a fundamentally different and more efficient approach, according to its author, Alexia Jolicoeur-Martineau of Samsung SAIL Montréal.

TRM achieves new state-of-the-art results on notoriously difficult benchmarks such as the ARC-AGI intelligence test with a model of just 7 million parameters, less than 0.01% of the size of leading LLMs. Samsung’s research challenges the assumption that the only way to advance AI capabilities is through ever-greater scale, and offers a more sustainable, parameter-efficient alternative.

Overcoming scale limitations

Although LLMs have shown a remarkable ability to generate human-like text, their ability to perform complex multi-step reasoning can be weak. Because they generate answers token by token, a single mistake early in the process can derail the entire solution and invalidate the final answer.

To mitigate this, techniques such as chain-of-thought prompting have been developed, in which the model “thinks out loud” before answering. However, these methods are computationally expensive, often require large amounts of high-quality reasoning data that may not be available, and can still produce flawed logic. Even with these enhancements, LLMs struggle with certain puzzles that demand flawless logical execution.

Samsung’s work builds on a recent AI model known as the Hierarchical Reasoning Model (HRM). HRM introduced a novel method using two small neural networks that recursively work on the problem at different frequencies to converge on an answer. Although it showed great promise, it was complex, relying on uncertain biological arguments and on fixed-point theorems whose conditions were not guaranteed to hold.

Instead of HRM’s two networks, TRM uses a single small network that recursively improves both its internal “reasoning” and its proposed “answer.”

The model is given the question, an initial guess at the answer, and a latent reasoning feature. It first iterates several steps to refine the latent reasoning based on all three inputs. The improved reasoning is then used to update the prediction of the final answer. This entire cycle can be repeated up to 16 times, allowing the model to incrementally correct its own mistakes in a parameter-efficient way.
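
To make that loop concrete, here is a minimal sketch of the recursive refinement in PyTorch. It assumes the question, answer, and reasoning state are fixed-size embeddings and uses a toy two-layer core network; the names, dimensions, and update order are illustrative assumptions, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn


class TinyRecursiveSketch(nn.Module):
    """Illustrative sketch of TRM-style recursion: one small network refines
    a latent reasoning state z and a proposed answer y over repeated cycles."""

    def __init__(self, dim: int, latent_steps: int = 6, cycles: int = 16):
        super().__init__()
        # A single tiny network, reused for every refinement step.
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )
        self.latent_steps = latent_steps  # inner refinements of the reasoning state
        self.cycles = cycles              # outer answer updates (up to 16 in the paper)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # initial guess at the answer
        z = torch.zeros_like(x)  # latent reasoning state
        for _ in range(self.cycles):
            for _ in range(self.latent_steps):
                # Refine the reasoning state from the question, current answer, and itself.
                z = self.core(torch.cat([x, y, z], dim=-1))
            # Use the improved reasoning to update the proposed answer.
            y = self.core(torch.cat([x, y, z], dim=-1))
        return y


model = TinyRecursiveSketch(dim=64)
answer = model(torch.randn(8, 64))  # batch of 8 embedded "questions"
```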

Counterintuitively, the study found that a small network with only two layers generalized far better than a four-layer version. The reduction in size appears to prevent overfitting, a common problem when training on small, specialized datasets.

TRM also does away with the complex mathematical justifications its predecessor relied on. The original HRM required the assumption that its functions converge to a fixed point in order to justify its training method. TRM bypasses this entirely by simply backpropagating through the full recursive process. This change alone significantly improved performance, raising accuracy on the Sudoku-Extreme benchmark from 56.5% to 87.4% in ablation studies.
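
A hedged sketch of what that training difference can look like in code, reusing the toy `core` network from the earlier snippet: detaching intermediate states mimics a fixed-point-style shortcut in which gradients only flow through the final step, while keeping the whole graph corresponds to the full backpropagation TRM uses. The helper and its arguments are illustrative, not the paper’s implementation.

```python
import torch


def refine(core, x, y, z, steps: int, full_backprop: bool):
    """Run `steps` refinement cycles with the shared tiny network `core`.

    full_backprop=False approximates a fixed-point-style shortcut (gradients
    flow only through the final step); full_backprop=True keeps the entire
    computation graph, as TRM does. Illustrative sketch only."""
    for step in range(steps):
        if not full_backprop and step < steps - 1:
            # Fixed-point-style approximation: drop gradients from earlier steps.
            y, z = y.detach(), z.detach()
        z = core(torch.cat([x, y, z], dim=-1))  # update latent reasoning
        y = core(torch.cat([x, y, z], dim=-1))  # update proposed answer
    return y, z
```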

Samsung model beats AI benchmarks with fewer resources

The results speak for themselves. On the Sudoku-Extreme dataset, using only 1,000 training examples, TRM achieved a test accuracy of 87.4%, far above HRM’s 55%. On Maze-Hard, the task of finding long paths through 30×30 mazes, TRM scored 85.3% versus HRM’s 74.5%.

Most notably, TRM makes significant progress on the Abstraction and Reasoning Corpus (ARC-AGI), a benchmark designed to measure genuine fluid intelligence in AI. With only 7 million parameters, TRM achieves 44.6% accuracy on ARC-AGI-1 and 7.8% on ARC-AGI-2. This outperforms HRM, which used a 27-million-parameter model, and even surpasses many of the world’s largest LLMs. For comparison, Gemini 2.5 Pro scores just 4.9% on ARC-AGI-2.

The TRM training process has also been streamlined. The adaptive computation time (ACT) mechanism, which decides when the model has refined its answer enough to move on to a new data sample, was simplified to remove the need for a second, costly forward pass through the network at each training step, with no significant difference in final generalization.
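
As an illustration of that kind of simplification (not the paper’s exact formulation), a small halting head can be trained to predict whether the current answer is already good enough, using a correctness label available from the same forward pass, so no second pass is needed. The `halt_head` and the 64-dimensional state below are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

halt_head = nn.Linear(64, 1)  # illustrative head over the latent reasoning state


def halting_loss(z: torch.Tensor, answer_is_correct: torch.Tensor) -> torch.Tensor:
    """Train the head to predict whether the current answer is good enough.

    The target comes from the answer produced in this same forward pass,
    so no second, costly forward pass is required (sketch of the
    simplification described above)."""
    logits = halt_head(z).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, answer_is_correct.float())
```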

The Samsung study makes a compelling argument against the current trajectory of ever-larger AI models. It shows that by designing architectures that can iteratively reason and self-correct, extremely difficult problems can be solved with a tiny fraction of the computational resources.

See also: Google’s new AI agent rewrites code to automate vulnerability fixes

Banner for AI & Big Data Expo by TechEx event.

Want to learn more about AI and big data from industry leaders? Check out the AI & Big Data Expo taking place in Amsterdam, California, and London. This comprehensive event is part of TechEx and co-located with other leading technology events, including the Cyber Security Expo. Click here for more information.

AI News is brought to you by TechForge Media. Learn about other upcoming enterprise technology events and webinars.
