Publications

3 results for Shengkun Cui

Flash: Fast Model Adaptation in ML-Centric Cloud Platforms
- - Haoran Qiu
  - Weichao Mao
  - et al.
- 2024
- MLSys 2024
Conference paper
Queue Management for Large Language Model Serving
- - Archit Patke
  - Dhemath Reddy
  - et al.
- 2024
- ASPLOS 2024
Workshop paper
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
- - Haoran Qiu
  - Weichao Mao
  - et al.
- 2024
- ASPLOS 2024
Workshop paper