Ben Rotenberg, an AI consultant specializing in enterprise technology adoption, opened with the question preoccupying everyone in the field: what will replace ChatGPT? Before answering, he invited his audience to understand the architecture behind all the major tools, and why the current technological "king" is reaching its limits.

What is the Transformer's built-in limitation?

The Transformer architecture powering ChatGPT, Claude, and Gemini operates like a person reading a thousand-page book who, to understand the current word, must go back and process every page from the beginning. This makes it thorough and powerful, but also increasingly slow and expensive as context grows: because every token attends to every token before it, the computation scales roughly quadratically with context length. Beyond that, the Transformer excels at detecting language patterns but is limited in genuine multi-step reasoning.
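The quadratic cost can be illustrated with a toy count of the pairwise comparisons causal self-attention performs. This is a minimal sketch, not a real Transformer: the function name and the counting are illustrative assumptions, but the growth pattern they show is the real one.

```python
def attention_comparisons(context_length: int) -> int:
    """Count the token-to-token comparisons in one causal self-attention pass.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n:
    it grows quadratically with context length.
    """
    return sum(i + 1 for i in range(context_length))

# Doubling the context roughly quadruples the work:
# 1_000 tokens  ->    500_500 comparisons
# 2_000 tokens  ->  2_001_000 comparisons
for n in (1_000, 2_000, 4_000):
    print(n, attention_comparisons(n))
```

This is why a model that "re-reads the whole book" for every new word gets disproportionately slower as the book gets longer.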

What does Mamba offer?

Mamba, commercially adopted by AI21 Labs, is based on an approach called State Space Models (SSMs). Instead of re-reading everything, the model maintains a compressed running state that is updated once, at a fixed cost, for each new word, like a reader holding a continuously refreshed summary in memory rather than returning to page one each time. This allows it to handle enormous contexts of millions of words without losing what actually matters.
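The "running summary" idea can be sketched in a few lines. This is a scalar cartoon of the state-space recurrence, not Mamba itself: real Mamba uses learned, input-dependent matrices over high-dimensional vectors, and the coefficients below are made-up illustrative values. The point is that each new token triggers one cheap state update, never a re-read of the past.

```python
def ssm_scan(inputs, a=0.9, b=0.1):
    """Toy state-space recurrence: h_t = a * h_{t-1} + b * x_t.

    One constant-cost update per token, so total cost is linear in
    sequence length -- the old tokens are never revisited.
    """
    h = 0.0
    states = []
    for x in inputs:
        h = a * h + b * x  # compress the new input into the running state
        states.append(h)
    return states

# A single early input fades gradually from the state instead of being re-read:
print(ssm_scan([1.0, 0.0, 0.0]))  # [0.1, 0.09, 0.081]
```

The contrast with the attention picture above is the whole trade-off: constant work per token, at the price of keeping only a compressed summary rather than the full text.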

What does Sapient Intelligence's HRM offer?

The Hierarchical Reasoning Model, developed by Sapient Intelligence, is not just trying to be faster. It is trying to think differently. The inspiration comes from the human brain and the model described by Daniel Kahneman in Thinking, Fast and Slow: a fast, shallow system alongside a slow, thorough one. The two layers of HRM work together to enable deep latent reasoning, meaning actual computation within the neural network, rather than generating words to simulate thinking as Chain-of-Thought does.
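The two-timescale structure can be caricatured as two nested loops. This is an illustrative cartoon, not Sapient's implementation: the function names, update rules, and coefficients are hypothetical. What it shows is the shape of the idea — a fast low-level module runs several internal steps for every single update of the slow high-level module, and the "reasoning" lives in these internal states, not in generated text.

```python
def low_update(low, high, x):
    # Hypothetical fast module: refine a detailed state toward the input,
    # steered by the current high-level plan.
    return 0.5 * low + 0.3 * high + 0.2 * x

def slow_update(high, low):
    # Hypothetical slow module: fold the fast module's result into the plan.
    return 0.5 * high + 0.5 * low

def hierarchical_step(high, low, x, fast_steps=4):
    """One outer step: several fast inner iterations, then one slow update."""
    for _ in range(fast_steps):      # fast loop: detailed computation
        low = low_update(low, high, x)
    high = slow_update(high, low)    # slow loop: abstract planning
    return high, low

# Iterating the two loops lets both states settle on the input internally,
# without ever emitting intermediate words:
high, low = 0.0, 0.0
for _ in range(50):
    high, low = hierarchical_step(high, low, x=1.0)
print(high, low)  # both converge toward 1.0
```

The contrast with Chain-of-Thought is the key design point: here the intermediate steps are numeric states inside the network, not tokens the model must write out and read back.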

The results are striking: HRM with only 27 million parameters solved complex Sudoku puzzles and massive mazes almost perfectly after learning from just 1,000 examples. Far larger models failed entirely. It also outperformed models many times its size on the ARC-AGI benchmark, which is considered a measure of general intelligence capabilities.

What does this mean for the future of AI?

Rotenberg emphasizes this is not a ChatGPT upgrade but a generational shift. We are on the verge of transitioning from "Large Language Models" to "Large Reasoning Models." The limitations troubling us today may look in retrospect like minor birth pangs of a young technology. What this means for OpenAI's lead and the future of work, Rotenberg admits he does not know, but it is clear we are living through a moment whose full significance cannot be grasped in real time.