Hasty Briefs (beta)

Client-side GPU load balancing with Redis and Lua

9 days ago
  • #Redis
  • #Load Balancing
  • #GPU Optimization
  • Achieved a 40% increase in GPU utilization and 70% reduction in tail latency by implementing a load-aware client-side balancer with Redis.
  • Identified the default Kubernetes load balancer as a poor fit for GPU inference: its round-robin distribution does not account for how busy each GPU is, which leaves GPUs unevenly utilized.
  • Developed a client-side load balancing solution leveraging Redis and Lua scripting for atomic operations and real-time GPU load tracking.
  • Used Redis sorted sets to maintain a global priority queue and per-GPU request logs for accurate load tracking and reconciliation.
  • Implemented a cost function based on request payload size to estimate GPU workload, ensuring efficient load distribution.
  • Designed failure handling mechanisms including client crash recovery, Redis fallback, and GPU pod lifecycle management.
  • Demonstrated significant latency improvements in load tests, especially for larger input sizes, with up to 73% reduction in p99 latency.
  • Highlighted client-side load balancing benefits: lower latency, natural failure isolation, simpler implementation, and better scalability.
  • Planned future enhancements: refining the scoring function and exploring throughput-optimized modes.
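
The core idea in the summary above (pick the least-loaded GPU atomically, charge it an estimated cost, and credit the cost back on completion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the key layout, the Lua script, the one-unit-per-KiB cost function, and every name (`ACQUIRE_LUA`, `estimate_cost`, `InMemoryLoadTable`) are hypothetical. The Lua script is shown only as a string to indicate what a Redis-side version might look like with a sorted set; the Python class is an in-memory stand-in with the same semantics so the example runs without a Redis server.

```python
# Hypothetical Lua script a client could run with EVAL. KEYS[1] is assumed to be
# a sorted set mapping GPU id -> current estimated load; ARGV[1] is the request cost.
# Redis runs a script atomically, so two clients cannot pick the same "idle" GPU
# based on a stale read.
ACQUIRE_LUA = """
local gpu = redis.call('ZRANGE', KEYS[1], 0, 0)[1]
if not gpu then return false end
redis.call('ZINCRBY', KEYS[1], ARGV[1], gpu)
return gpu
"""


def estimate_cost(payload_bytes: int) -> int:
    """Assumed cost function: one unit per started KiB of payload, minimum 1."""
    return max(1, -(-payload_bytes // 1024))  # ceiling division


class InMemoryLoadTable:
    """In-memory stand-in for the sorted set, mirroring ACQUIRE_LUA's logic."""

    def __init__(self, gpu_ids):
        self.load = {gpu: 0 for gpu in gpu_ids}

    def acquire(self, cost: int) -> str:
        # Lowest score first; ties broken by member name, as in a sorted set.
        gpu = min(self.load, key=lambda g: (self.load[g], g))
        self.load[gpu] += cost
        return gpu

    def release(self, gpu: str, cost: int) -> None:
        # Credit the cost back when the request finishes; never go below zero.
        self.load[gpu] = max(0, self.load[gpu] - cost)


def load_aware(payload_sizes, gpu_ids):
    """Assign every request to the currently least-loaded GPU."""
    table = InMemoryLoadTable(gpu_ids)
    for size in payload_sizes:
        table.acquire(estimate_cost(size))
    return table.load


def round_robin(payload_sizes, gpu_ids):
    """Baseline: rotate through GPUs regardless of request size."""
    load = {gpu: 0 for gpu in gpu_ids}
    for i, size in enumerate(payload_sizes):
        load[gpu_ids[i % len(gpu_ids)]] += estimate_cost(size)
    return load


if __name__ == "__main__":
    sizes = [8192, 1024, 8192, 1024]  # alternating large and small payloads
    gpus = ["gpu-0", "gpu-1"]
    print("round robin:", round_robin(sizes, gpus))
    print("load aware: ", load_aware(sizes, gpus))
```

With alternating large and small payloads, the round-robin baseline in this toy model sends both large requests to the same GPU (loads of 16 and 2), while the load-aware table ends with 9 on each. A real deployment would also need the failure handling the summary mentions, such as reconciling load after a client crash, which this sketch leaves out.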