Hasty Briefs (beta)

Surprising Economics of Load-Balanced Systems

4 hours ago
  • #latency scaling
  • #queuing theory
  • #M/M/c model
  • The system is an M/M/c queuing model with c servers, each handling one request at a time.
  • Requests arrive at a rate of λ = 0.8c per second, so per-server utilization stays constant at 0.8 as c changes.
  • Mean service time is one second per request (service rate μ = 1 request per second per server).
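  • Erlang's C formula shows that, at fixed utilization, the probability that an arriving request has to queue decreases as c increases.
  • Client-observed mean latency approaches the one-second mean service time as c grows, because the mean queuing delay shrinks toward zero.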
  • Percentiles (median, 99th, 99.9th) follow a similar improvement pattern as the mean.
  • Larger c improves latency at the same utilization or allows better utilization at the same latency.
  • Assumptions include Poisson arrivals and exponential service times, though real systems may differ.
  • Stability requires λ/(cμ) < 1, which holds here with utilization 0.8.
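
The setup above can be checked numerically. What follows is a minimal Python sketch, assuming the standard M/M/c results: the Erlang C queuing probability (computed here through the Erlang B recursion for numerical stability) and mean queuing delay C / (cμ − λ). The function names `erlang_c` and `mean_latency` are illustrative choices, not taken from the original article.

```python
def erlang_c(c: int, offered_load: float) -> float:
    """Probability that an arriving request must queue in an M/M/c system.

    offered_load is a = lambda / mu (in Erlangs); stability requires a < c.
    """
    if c < 1 or not 0 <= offered_load < c:
        raise ValueError("need c >= 1 and 0 <= offered_load < c")
    # Erlang B via the standard recursion, which avoids large factorials.
    b = 1.0
    for k in range(1, c + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / c  # per-server utilization
    # Convert Erlang B (blocking) to Erlang C (queuing).
    return b / (1.0 - rho * (1.0 - b))


def mean_latency(c: int, utilization: float = 0.8, service_rate: float = 1.0) -> float:
    """Mean client-observed latency: mean queuing delay plus mean service time."""
    arrival_rate = utilization * c * service_rate  # lambda = 0.8 * c when mu = 1
    p_queue = erlang_c(c, arrival_rate / service_rate)
    mean_wait = p_queue / (c * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate


if __name__ == "__main__":
    # Utilization is held at 0.8 while the number of servers grows.
    for servers in (1, 2, 5, 10, 50, 100):
        p = erlang_c(servers, 0.8 * servers)
        print(f"c={servers:4d}  P(queue)={p:.4f}  mean latency={mean_latency(servers):.4f} s")
```

With c = 1 this reduces to the M/M/1 case, where the queuing probability equals the utilization (0.8) and mean latency is 1 / (μ − λ) = 5 seconds. Larger c pulls the mean toward the one-second service time.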