LithOS: An Operating System for Efficient Machine Learning on GPUs
a year ago
- #GPU
- #Machine Learning
- #Operating System
- LithOS is introduced as an operating system designed for efficient machine learning on GPUs.
- It features a TPC Scheduler for spatial scheduling at individual TPC granularity.
- Includes transparent kernel atomization to reduce head-of-line blocking.
- Offers lightweight hardware right-sizing to determine minimal TPC resources per atom.
- Implements transparent power management to reduce consumption based on workload behavior.
- LithOS is implemented in Rust; its evaluation shows the GPU efficiency gains listed below.
- Reduces tail latencies by up to 13x compared to NVIDIA's MPS when stacking inference workloads.
- Improves aggregate throughput by 1.6x over state-of-the-art solutions.
- Right-sizing saves a quarter of GPU capacity with a performance hit under 4%.
- Power management saves a quarter of GPU energy with a 7% performance hit.
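
The atomization and right-sizing ideas in the list above can be illustrated with a minimal sketch. This is not LithOS code: the summary does not say how atoms are formed or how TPC counts are chosen, so the sketch assumes an atom is a contiguous range of a kernel's thread blocks and that right-sizing picks the smallest TPC count whose predicted latency stays within a slowdown tolerance. The names `Atom`, `atomize`, `right_size`, and the `latency` model are all hypothetical.

```rust
/// Hypothetical "atom": a contiguous range of thread blocks taken from one
/// kernel launch. (Assumption: the summary does not define an atom's layout.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Atom {
    pub first_block: u32,
    pub num_blocks: u32,
}

/// Split a kernel grid of `grid_blocks` thread blocks into atoms of at most
/// `max_blocks_per_atom` blocks each, so a scheduler could interleave other
/// work between atoms instead of waiting for the whole kernel.
pub fn atomize(grid_blocks: u32, max_blocks_per_atom: u32) -> Vec<Atom> {
    assert!(max_blocks_per_atom > 0, "atom size must be positive");
    let mut atoms = Vec::new();
    let mut first = 0u32;
    while first < grid_blocks {
        // The last atom may be smaller than the others.
        let n = max_blocks_per_atom.min(grid_blocks - first);
        atoms.push(Atom { first_block: first, num_blocks: n });
        first += n;
    }
    atoms
}

/// Pick the smallest TPC count whose predicted latency is within `tolerance`
/// (e.g. 0.04 for 4%) of the latency when using all `max_tpcs` TPCs.
/// `latency` is a hypothetical model mapping a TPC count to a predicted time.
pub fn right_size(max_tpcs: u32, tolerance: f64, latency: impl Fn(u32) -> f64) -> u32 {
    assert!(max_tpcs > 0, "need at least one TPC");
    let budget = latency(max_tpcs) * (1.0 + tolerance);
    (1..=max_tpcs)
        .find(|&tpcs| latency(tpcs) <= budget)
        .unwrap_or(max_tpcs)
}
```

For example, `atomize(10, 4)` yields three atoms covering blocks 0-3, 4-7, and 8-9. With a toy latency model that stops improving beyond 4 TPCs, `right_size(8, 0.04, |t| 100.0 / (t.min(4) as f64))` selects 4 TPCs, leaving the other half of the TPCs free for other work.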