Chen Chang Lew, Christof Ferreira Torres, et al.
EuroS&P 2024
Confidential collaborative machine learning (ML) enables multiple mutually distrusting data holders to jointly train an ML model while keeping their private datasets confidential, as required for regulatory or competitive reasons. However, existing approaches require frequent data and model exchanges over slow conventional network links during training. They face growing challenges as the sizes of models and datasets in modern training workloads, such as large language models (LLMs), increase exponentially, resulting in prohibitively high communication costs. In this paper, we propose a novel mechanism called GPU Travelling that leverages recently emerged confidential GPUs. With our rigorous design, the GPU can securely travel to a specific data holder, load the dataset directly into the GPU's protected memory, and then return for training, eliminating the need for data transmission while ensuring confidentiality up to the data-centre level. We developed a prototype using Intel TDX and an NVIDIA H100 and evaluated it on llm.c, a CUDA-based LLM training project, demonstrating its feasibility and performance while maintaining strong security guarantees. The results show at least a 4x speedup when transferring a 512 MiB dataset chunk compared to conventional transmission.
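The following is a minimal conceptual sketch in Python, not the authors' implementation, of the flow the abstract describes: the confidential GPU attests, is attached at the data holder's site, loads the dataset directly into protected GPU memory, and then returns for training. All names here (TravellingGpu, ProtectedGpuMemory, load_dataset_at_holder, train_on_protected_data) are hypothetical illustrations.

    # Conceptual sketch of the "GPU Travelling" flow (hypothetical names only).
    from dataclasses import dataclass

    @dataclass
    class ProtectedGpuMemory:
        """Stands in for the confidential GPU's protected memory region."""
        contents: bytes = b""

    class TravellingGpu:
        """Hypothetical confidential GPU that 'travels' to a data holder,
        loads the dataset into protected memory, then returns for training."""

        def __init__(self) -> None:
            self.protected_memory = ProtectedGpuMemory()

        def attest(self) -> bool:
            # In a real deployment this would be remote attestation of the
            # confidential GPU (e.g. an H100 in confidential-computing mode)
            # and the TEE host (e.g. Intel TDX). Stubbed out here.
            return True

        def load_dataset_at_holder(self, dataset_chunk: bytes) -> None:
            # At the data holder's site: load data directly into protected
            # GPU memory, so no plaintext dataset ever crosses the network.
            self.protected_memory.contents = dataset_chunk

    def train_on_protected_data(gpu: TravellingGpu) -> int:
        # Placeholder for training (e.g. llm.c kernels) over the protected data.
        return len(gpu.protected_memory.contents)

    if __name__ == "__main__":
        gpu = TravellingGpu()
        assert gpu.attest()
        gpu.load_dataset_at_holder(b"\x00" * 1024)   # small stand-in chunk
        print(train_on_protected_data(gpu))          # train after "returning"

The sketch only captures the control flow; the security of the real mechanism rests on attestation and the GPU's protected memory, which the stubs above do not model.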
Manoj Kumar, Pratap Pattnaik
HPEC 2020
Daniel Egger, Jakub Marecek, et al.
APS March Meeting 2021
Naorin Hossain, William Santiago Fernandez, et al.
ICMC 2024