Niyama: Breaking the Silos of LLM Inference Serving
Published in arXiv, 2025
QoS-aware scheduling for LLM inference serving.
Recommended citation: Goel et al.
Download Paper
Published in HiPC, 2022
Accelerating key-value stores using page table walkers.
Recommended citation: Anupindi et al.
Download Paper
Published in INCET, 2022
Task partitioning framework for heterogeneous systems.
Recommended citation: Yekbote et al.
Download Paper
Published in arXiv, 2022
Preemptive and elastic scheduling of AI workloads at planet scale.
Recommended citation: Shukla et al.
Download Paper
Published in ICACCE, 2019
Simulation framework for energy-aware VM allocation in a cloud data center.
Recommended citation: Bhandia et al.
Download Paper