Soft-Masked Diffusion Language Models
Michael Hersche, Samuel Moor, et al.
ICLR 2026
Dense Associative Memories (DenseAMs) are modern generalizations of Hopfield networks with high-capacity, energy-based retrieval dynamics, yet it remains unclear what training principle best suits these models. Contrastive divergence (CD) is theoretically well motivated but requires expensive iterative negative sampling, while backpropagating a reconstruction loss through long inference trajectories is likewise costly and does not directly exploit the explicit energy objective. Inspired by the Hebbian learning rule of classical Hopfield networks, we propose to train DenseAMs by direct energy minimization. For DenseAMs with translation-invariant kernel energies, we show that the partition function is independent of the memory parameters, so maximum likelihood estimation (MLE) reduces exactly to minimizing the energy of the training data. This yields a sampling-free training rule that retains an explicit energy formulation. We demonstrate the method in both ambient and latent space, where a stop-gradient coupling with an autoencoder enables stable joint training and memory synthesis from latent noise.
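As a concrete illustration of the sampling-free training rule described in the abstract, the sketch below assumes a Gaussian-kernel log-sum-exp energy E(x) = -log Σ_μ exp(-β‖x - m_μ‖²/2); it is a minimal reconstruction, not the authors' code, and the names DenseAM, num_memories, and beta are hypothetical. For this translation-invariant kernel, the change of variables u = x - m_μ gives Z = Σ_μ ∫ exp(-β‖x - m_μ‖²/2) dx = K (2π/β)^{d/2}, which does not involve the memories, so gradient descent on the mean data energy coincides with exact MLE.

```python
import torch

class DenseAM(torch.nn.Module):
    """Dense associative memory with a Gaussian-kernel log-sum-exp energy (illustrative)."""

    def __init__(self, num_memories: int, dim: int, beta: float = 1.0):
        super().__init__()
        self.memories = torch.nn.Parameter(torch.randn(num_memories, dim))
        self.beta = beta

    def energy(self, x: torch.Tensor) -> torch.Tensor:
        # E(x) = -log sum_mu exp(-beta * ||x - m_mu||^2 / 2); x has shape (batch, dim).
        sq_dist = torch.cdist(x, self.memories).pow(2)   # (batch, num_memories)
        return -torch.logsumexp(-0.5 * self.beta * sq_dist, dim=-1)


def train_step(model: DenseAM, x: torch.Tensor, opt: torch.optim.Optimizer) -> float:
    # Sampling-free rule: since the partition function does not depend on the
    # memories, minimizing the mean data energy is maximum likelihood
    # (no CD chain, no negative samples).
    opt.zero_grad()
    loss = model.energy(x).mean()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    model = DenseAM(num_memories=64, dim=32)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        x = torch.randn(128, 32)   # placeholder data batch
        train_step(model, x, opt)
```

The latent-space variant mentioned in the abstract would apply the same energy loss to autoencoder codes with a stop-gradient on the encoder output; that coupling is not part of this sketch.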