Flash: Fast Model Adaptation in ML-Centric Cloud Platforms. Haoran Qiu, Weichao Mao, et al. 2024. MLSys 2024. Conference paper.
Queue Management for Large Language Model Serving. Archit Patke, Dhemath Reddy, et al. 2024. ASPLOS 2024. Workshop paper.
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction. Haoran Qiu, Weichao Mao, et al. 2024. ASPLOS 2024. Workshop paper.