Client-side GPU load balancing with Redis and Lua
9 days ago
- #Redis
- #Load Balancing
- #GPU Optimization
- Achieved a 40% increase in GPU utilization and 70% reduction in tail latency by implementing a load-aware client-side balancer with Redis.
- Identified the default Kubernetes load balancer as ineffective for GPU inference: its round-robin approach does not account for the current load on each GPU, leading to uneven GPU utilization.
- Developed a client-side load balancing solution leveraging Redis and Lua scripting for atomic operations and real-time GPU load tracking.
- Used Redis sorted sets to maintain a global priority queue and per-GPU request logs for accurate load tracking and reconciliation.
- Implemented a cost function based on request payload size to estimate the GPU workload of each request, so that load is distributed by estimated work rather than by request count.
- Designed failure handling mechanisms including client crash recovery, Redis fallback, and GPU pod lifecycle management.
- Demonstrated significant latency improvements in load tests, especially for larger input sizes, with up to 73% reduction in p99 latency.
- Highlighted client-side load balancing benefits: lower latency, natural failure isolation, simpler implementation, and better scalability.
- Planned future enhancements include refining the scoring function and exploring throughput-optimized modes.