Universal Position Interpolation: Unified Context Scaling for Hybrid Mamba-Transformer Models

Haochen Shen; Davis Wertheimer; Zheng Wang; Garrett Goon; Derrick Liu; Naigang Wang; Mudhakar Srivatsa; Raghu Kiran Ganti; Minjia Zhang

ICLR 2026

Conference paper

23 Apr 2026

Abstract

Hybrid Mamba-Transformer models have emerged as promising alternatives to Transformers, offering efficiency and competitive performance. However, they struggle to generalize beyond their training context windows, collapsing on long-context tasks. We provide the first systematic analysis of this failure, showing that it arises from uncontrolled state growth and uneven receptive field contributions across the hybrid architecture. Guided by this understanding, we introduce Universal Position Interpolation (UPI), a lightweight, training-free scaling method that unifies Mamba’s cumulative decay with Transformer rotary frequency scaling. UPI selectively stabilizes unstable Mamba dynamics while rescaling Transformer encodings, controlling state growth and enabling reliable long-context generalization, with only a few auxiliary forward passes. Evaluation shows that UPI extends multiple state-of-the-art hybrid and pure Mamba models from 4K to up to 64K tokens on PG-19 perplexity, LongBench and RULER benchmarks, without sacrificing short-context accuracy. These findings establish the first principled bridge between context length extension on Transformers and state-space models and open a new direction for training-free context extension methods for emerging hybrid models.
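The analogy the abstract draws between rotary frequency scaling and Mamba's cumulative decay can be illustrated with a toy sketch. This is not the UPI algorithm, whose details are not given here: it assumes standard position interpolation for rotary angles and a scalar state-space model with a constant step size, and the names `rope_angles`, `cumulative_decay`, and `scale` are hypothetical. Under those assumptions, a single factor `train_len / target_len` maps both quantities at the extended length back to the values seen at the training length.

```python
import math

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale < 1 compresses positions
    (standard position interpolation, assumed here for illustration)."""
    return [scale * position / base ** (2 * i / dim) for i in range(dim // 2)]

def cumulative_decay(a, delta, length, scale=1.0):
    """Toy scalar SSM (an assumption, not the paper's model): the product of
    per-step decays exp(a * scale * delta) over `length` steps."""
    return math.exp(a * scale * delta * length)

train_len, target_len = 4096, 65536   # the 4K -> 64K setting from the abstract
scale = train_len / target_len        # one shared interpolation factor

# Rotary angles at the last extended position vs. the last trained position.
angles_scaled = rope_angles(target_len, scale=scale)
angles_train = rope_angles(train_len)

# Cumulative decay over the extended window vs. the trained window.
decay_scaled = cumulative_decay(-0.01, 0.1, target_len, scale=scale)
decay_train = cumulative_decay(-0.01, 0.1, train_len)
```

In this toy setting both pairs coincide, which is the sense in which one scaling factor can keep rotary phases and state decay inside the ranges encountered during training. The abstract describes UPI as selective on the Mamba side, so a uniform factor like this one should be read only as a simplification.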
