Workshop paper

HGR-TabE: Universal Tabular Embeddings via Maximal Correlation Alignment

Abstract

Universal text embedding models show that a single pretrained model can produce representations useful across tasks such as classification, clustering, and retrieval. In contrast, tabular foundation models remain largely task-specific. We ask whether a single tabular embedding model can generalize across tasks. We propose HGR-TabE, an initial approach that first aligns heterogeneous table-cell representations into a shared space using Hirschfeld–Gebelein–Rényi (HGR) maximal correlation, capturing relationships between numerical and non-numerical values within rows. We then apply message passing via a Hypergraph Transformer (All-Set Transformer modules) to preserve row and column permutation invariance. The model is trained entirely with self-supervision to learn consistent representations at the cell, row, column, and table levels. Without task-specific fine-tuning, it generates embeddings that perform well on row similarity, column similarity, and predictive tasks, demonstrating strong cross-task generalization compared to specialized models.
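
For readers unfamiliar with the alignment objective named above, the standard definition of HGR maximal correlation between two random variables is given here. Reading X as a numerical cell value and Y as a non-numerical cell value from the same row is our illustrative assumption; the abstract does not specify how the feature maps f and g are parameterized or trained.

```latex
% Standard definition of HGR maximal correlation.
% Assumption (illustrative only): X is a numerical cell value and Y a
% non-numerical cell value drawn from the same row.
\rho_{\mathrm{HGR}}(X, Y)
  \;=\; \sup_{f,\, g}\; \mathbb{E}\!\left[ f(X)\, g(Y) \right]
\quad \text{subject to} \quad
\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0,
\qquad
\mathbb{E}\!\left[f(X)^2\right] = \mathbb{E}\!\left[g(Y)^2\right] = 1 .

% Under these constraints, maximizing the correlation is equivalent to
% minimizing the mean squared distance between the two feature maps:
\mathbb{E}\!\left[ \bigl( f(X) - g(Y) \bigr)^2 \right]
  \;=\; 2 - 2\, \mathbb{E}\!\left[ f(X)\, g(Y) \right].
```

The supremum ranges over measurable real-valued functions f and g. The quantity lies in [0, 1] and equals 0 exactly when X and Y are independent. The second identity suggests why the measure suits alignment into a shared space: the optimal f and g map dependent values of different types to nearby points.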