Control Flow Operators in PyTorch
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Exploration is a key capability of online reinforcement learning (RL), where agents interact with the environment to discover diverse trajectories and improve their policies. In contrast, offline RL relies on static datasets that typically consist of high-quality demonstrations, limiting state-space exploration. As a result, suboptimal or highly noisy trajectories are often discarded as harmful to learning. In this paper, we show that in offline goal-conditioned reinforcement learning (OGCRL), such imperfect trajectories can instead serve as a valuable source of exploration. We theoretically analyze how suboptimal and noisy trajectories expand state-space coverage and propose a learning pipeline that leverages them as exploration experts while preserving policy learning from high-quality demonstrations. Experiments show that incorporating large-scale noisy trajectories consistently outperforms baselines and improves over models trained solely on expert data, especially in environments with large and complex state spaces. Our findings reveal the untapped potential of imperfect trajectories in offline RL and suggest a scalable path in which increasingly diverse datasets drive policy improvement.
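The paper's actual pipeline is not reproduced here, but as a rough illustration of the idea, the following is a minimal PyTorch sketch of goal-conditioned behavior cloning that mixes high-quality demonstrations with noisy trajectories relabeled with hindsight goals. All names and the weighting scheme (GCPolicy, train_step, lambda_noisy) are illustrative assumptions, not the authors' method.

import torch
import torch.nn as nn

class GCPolicy(nn.Module):
    """Policy conditioned on (state, goal); outputs a continuous action."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def bc_loss(policy, states, goals, actions):
    # Simple regression (behavior-cloning) loss for continuous actions.
    return ((policy(states, goals) - actions) ** 2).mean()

def train_step(policy, opt, expert_batch, noisy_batch, lambda_noisy=0.5):
    # Expert data anchors policy quality; noisy trajectories (with goals
    # relabeled from their own future states, HER-style) broaden coverage.
    s_e, g_e, a_e = expert_batch
    s_n, g_n, a_n = noisy_batch
    loss = bc_loss(policy, s_e, g_e, a_e) \
        + lambda_noisy * bc_loss(policy, s_n, g_n, a_n)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

Down-weighting the noisy term (lambda_noisy < 1) is one plausible way to use imperfect data for coverage without letting it dominate learning from expert demonstrations.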
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Michael Hersche, Samuel Moor, et al.
ICLR 2026
Natalia Martinez Gil, Dhaval Patel, et al.
UAI 2024