Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Mid-training is increasingly used to improve the reasoning capabilities of large language models (LLMs), yet its design choices and interaction with evaluation and reinforcement learning (RL) remain poorly understood. Prior work often focuses on narrow domain gains, overlooking retention of general abilities, long-context performance, and RL compatibility. We present PRISM (Demystifying Retention and Interaction in Mid-Training), a holistic empirical study that analyzes mid-training design choices, what to evaluate, and how domain mixtures and training stages interact across model families. Experiments on Granite-3.3 8B, LLaMA-3.1 8B, and Mistral-7B/24B base models show that a relatively small, high-quality mid-training phase of ~27B tokens acts as a critical stabilizing stage for reasoning. Across models, PRISM yields consistent gains of ~6-10 points on coding benchmarks and ~17-30 points on mathematical reasoning benchmarks while preserving general performance. RL applied on top of PRISM-mid-trained models produces stable, monotonic improvements, adding a further ~3-8 points across coding and math tasks such as LiveCodeBench, Codeforces, AIME and MATH500, and ~17-20 points on science (GPQA-Diamond), whereas RL applied directly to base models is substantially less effective. Our results demonstrate that retention-aware mid-training is a necessary intermediate step for reliable reasoning enhancement and RL scaling, and provide practical guidance for designing robust mid-training pipelines for modern LLMs.
Amit Dhurandhar, Vijil Vijil, et al.
ICML 2026
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
Daniel Karl I. Weidele, Hendrik Strobelt, et al.
SysML 2019