Amit Dhurandhar, Vijil Vijil, et al.
ICML 2026
Language Reasoning Models (LRMs) have shown impressive performance on complex problems that require multiple reasoning steps. However, a growing body of studies shows that LRMs remain inefficient, over-generating verification and self-reflection steps. To address this challenge, we introduce the Step-Tagging Early-Stopping (ST-ES) framework, which uses a lightweight sentence classifier to annotate, in real time, the type of each reasoning step an LRM generates. We show that limiting the count of specific step types, especially verification and self-reflection steps, yields a more accurate and token-efficient early-stopping criterion than a token-count baseline, and that each step type yields a different efficiency trade-off. Unlike prior dynamic early-stopping methods, ST-ES operates in a fully black-box setting and offers interpretable early-stopping criteria. We evaluate ST-ES on three mathematical reasoning benchmarks (MATH500, GSM8K, and AIME) and two knowledge and reasoning benchmarks (MMLU and GPQA). We achieve a 20% to 50% token reduction while maintaining accuracy comparable to standard generation.
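The step-count criterion described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `tag_step` is a hypothetical keyword-based stand-in for the learned sentence classifier, the label names and the `budgets` mapping are invented, and the input is a pre-split list of sentences rather than a live LRM stream. It is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def tag_step(sentence: str) -> str:
    """Hypothetical keyword tagger standing in for a learned sentence classifier."""
    lowered = sentence.lower()
    if any(k in lowered for k in ("verify", "check", "confirm")):
        return "verification"
    if any(k in lowered for k in ("wait", "hmm", "reconsider")):
        return "self-reflection"
    return "other"


def stop_on_step_budget(
    sentences: Iterable[str],
    budgets: Dict[str, int],
    tagger: Callable[[str], str] = tag_step,
) -> Tuple[List[str], Optional[str]]:
    """Consume reasoning sentences until one step type exceeds its budget.

    Returns the sentences kept and the label that triggered the stop
    (None if the stream ended without exceeding any budget).
    Only the generated text is inspected, so no model internals are needed.
    """
    counts: Counter = Counter()
    kept: List[str] = []
    for sentence in sentences:
        label = tagger(sentence)
        counts[label] += 1
        limit = budgets.get(label)
        if limit is not None and counts[label] > limit:
            return kept, label  # budget exceeded: stop before this step
        kept.append(sentence)
    return kept, None


trace = [
    "Compute 2 + 3 = 5.",
    "Let me verify: 3 + 2 = 5.",
    "Let me check again by counting.",
    "Wait, maybe I misread.",
    "The answer is 5.",
]
kept, reason = stop_on_step_budget(trace, {"verification": 1})
print(len(kept), reason)
```

With a budget of one verification step, the sketch stops at the second verification sentence and keeps the first two sentences. The stop reason is a named step type, which is what makes a criterion of this kind interpretable compared with a raw token limit.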
Naiyu Yin, Dennis Wei, et al.
ICML 2026
Nandana Mihindukulasooriya, Sarthak Dash, et al.
ISWC 2023
Song Wang, Lin Junhong, et al.
ICLR 2024