Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Monday, September 22, 2025

Hierarchical Reasoning Model

Wang, G., Li, J., et al. (2025, June 26).
arXiv.org.

Abstract

Reasoning, the process of devising and executing complex goal-oriented action sequences, remains a critical challenge in AI. Current large language models (LLMs) primarily employ Chain-of-Thought (CoT) techniques, which suffer from brittle task decomposition, extensive data requirements, and high latency. Inspired by the hierarchical and multi-timescale processing in the human brain, we propose the Hierarchical Reasoning Model (HRM), a novel recurrent architecture that attains significant computational depth while maintaining both training stability and efficiency. HRM executes sequential reasoning tasks in a single forward pass without explicit supervision of the intermediate process, through two interdependent recurrent modules: a high-level module responsible for slow, abstract planning, and a low-level module handling rapid, detailed computations. With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples. The model operates without pre-training or CoT data, yet achieves nearly perfect performance on challenging tasks including complex Sudoku puzzles and optimal path finding in large mazes. Furthermore, HRM outperforms much larger models with significantly longer context windows on the Abstraction and Reasoning Corpus (ARC), a key benchmark for measuring artificial general intelligence capabilities. These results underscore HRM's potential as a transformative advancement toward universal computation and general-purpose reasoning systems.

Here are some thoughts:

This paper introduces the Hierarchical Reasoning Model (HRM), a biologically inspired neural architecture designed to mimic the brain's hierarchical, multi-timescale processing for complex reasoning. Unlike standard large language models, which rely on brittle, token-by-token Chain-of-Thought (CoT) prompting, HRM couples two recurrent modules: a slow-updating, high-level module for abstract planning and a fast-updating, low-level module for detailed computation. This structure lets HRM perform deep, iterative reasoning within its internal state space during a single forward pass, reaching near-perfect performance on demanding tasks such as complex Sudoku and maze navigation with only 1,000 training examples, without pre-training or explicit CoT supervision.

Critically, HRM exhibits an emergent representational hierarchy: the high-level module develops a significantly higher-dimensional state space than the low-level module, mirroring the dimensionality increase observed along the cortical hierarchy in the primate brain. This suggests HRM autonomously learns a functional organization akin to biological systems. It offers a data-efficient alternative to current AI reasoning paradigms and a novel computational model for studying the neural underpinnings of flexible, goal-directed cognition.
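To make the two-timescale coupling concrete, here is a minimal sketch in PyTorch of the slow/fast recurrence described above. It is an illustration under simplifying assumptions, not the paper's implementation: plain GRU cells stand in for HRM's actual modules, the update schedule is fixed rather than adaptive, and all names and sizes here are invented for the example.

```python
# Minimal sketch of a two-timescale recurrent reasoner: a fast,
# low-level module refines its state several times between each
# update of a slow, high-level module. Illustrative only; the HRM
# paper's block design and training procedure differ.
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    def __init__(self, input_dim, low_dim=128, high_dim=256,
                 n_high_steps=4, n_low_steps=8):
        super().__init__()
        # Fast, low-level module: conditioned on the input and the
        # current high-level (planning) state.
        self.low = nn.GRUCell(input_dim + high_dim, low_dim)
        # Slow, high-level module: updated once per inner loop,
        # from the final low-level state.
        self.high = nn.GRUCell(low_dim, high_dim)
        self.n_high_steps = n_high_steps  # slow (planning) updates
        self.n_low_steps = n_low_steps    # fast (computation) updates
        self.readout = nn.Linear(high_dim, input_dim)

    def forward(self, x):
        batch = x.size(0)
        z_low = x.new_zeros(batch, self.low.hidden_size)
        z_high = x.new_zeros(batch, self.high.hidden_size)
        for _ in range(self.n_high_steps):        # slow timescale
            for _ in range(self.n_low_steps):     # fast timescale
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            # The high-level state updates only after the low-level
            # module has finished its round of detailed computation.
            z_high = self.high(z_low, z_high)
        return self.readout(z_high)

model = TwoTimescaleReasoner(input_dim=64)
y = model(torch.randn(2, 64))  # one forward pass = nested reasoning loop
```

The key design point is the nesting: the low-level state is refined several times between each high-level update, so abstract plans change on a slower clock than the detailed computations they steer, all within a single forward pass.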
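The dimensionality comparison between the two modules can be quantified with a standard measure from systems neuroscience, the participation ratio of the hidden-state covariance eigenvalues: PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues). Below is a minimal sketch, assuming hidden states have been recorded as one-row-per-sample matrices; the variable names and the stand-in data are illustrative, not the paper's analysis code.

```python
# Participation ratio: an effective-dimensionality measure that is
# near 1 when variance concentrates in a single direction and
# approaches the full dimension when variance spreads evenly.
import torch

def participation_ratio(states: torch.Tensor) -> float:
    """states: (num_samples, hidden_dim) matrix of recorded hidden states."""
    centered = states - states.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (states.shape[0] - 1)
    eig = torch.linalg.eigvalsh(cov)              # covariance eigenvalues
    return float(eig.sum() ** 2 / (eig ** 2).sum())

# A high-level module occupying a higher-dimensional subspace should
# show a larger PR than the low-level module (stand-in data here):
z_high = torch.randn(1000, 256)                     # roughly full-rank activity
z_low = torch.randn(1000, 8) @ torch.randn(8, 128)  # activity confined to 8 dims
print(participation_ratio(z_high), participation_ratio(z_low))
```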