Scalable and Efficient LLM Serving with the vLLM Production Stack. Junchen Jiang, Yue Zhu. OSSNA 2025 (talk).
Towards Optimal Preemptive GPU Time-Sharing for Edge Model Serving. Zhengxu Xia, Yitian Hao, et al. MIDDLEWARE 2023 (workshop paper).
DEFT: SLO-Driven Preemptive Scheduling for Containerized DNN Serving. Yitian Hao, Wenqing Wu, et al. NSDI 2023 (poster).