- Pol G. Recasens, Ferran Agullo, et al. "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference." CLOUD 2025. Conference paper.
- Pol G. Recasens, Yue Zhu, et al. "Towards Pareto Optimal Throughput in Small Language Model Serving." EuroMLSys 2024. Conference paper.
- Connor Espenshade, Rachel Peng, et al. "Characterizing Training Performance and Energy for Foundation Models and Image Classifiers on Multi-Instance GPUs." EuroMLSys 2024. Conference paper.