Surprising Economics of Load-Balanced Systems
4 hours ago
- #latency scaling
- #queuing theory
- #M/M/c model
- The system is an M/M/c queuing model with c servers, each handling one request at a time.
- Offered load is 0.8c requests per second, which keeps per-server utilization constant at 0.8 as c varies.
- Mean service time is one second per request.
- The Erlang C formula shows that, at this fixed utilization, the probability that an arriving request has to queue decreases as c increases.
- Client-observed mean latency asymptotically approaches the one-second mean service time as c grows.
- Latency percentiles (median, 99th, 99.9th) improve in a pattern similar to the mean.
- A larger c either lowers latency at the same utilization or permits higher utilization at the same latency.
- Assumptions include Poisson arrivals and exponential service times, though real systems may differ.
- Stability requires λ/(cμ) < 1, where λ is the arrival rate and μ is the per-server service rate; this holds here because λ/(cμ) = 0.8.
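
The points above can be checked numerically. The following is a minimal sketch, not the original author's code: it assumes the standard Erlang C formula for an M/M/c queue, computed through the Erlang B recursion for numerical stability, and the textbook mean response time of service time plus C / (cμ − λ). The function names `erlang_c` and `mean_latency` are illustrative, and the defaults (utilization 0.8, one-second service time) are taken from the setup described above.

```python
def erlang_c(c: int, rho: float) -> float:
    """Probability that an arriving request must queue in an M/M/c system.

    c   -- number of servers
    rho -- per-server utilization, lambda / (c * mu); must be below 1
    """
    if c < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need c >= 1 and 0 <= rho < 1 for a stable queue")
    a = c * rho  # offered load in Erlangs (lambda / mu)
    b = 1.0      # Erlang B blocking probability with zero servers
    for k in range(1, c + 1):
        # Erlang B recursion: avoids the large factorials of the closed form
        b = a * b / (k + a * b)
    # Convert Erlang B to Erlang C
    return b / (1.0 - rho * (1.0 - b))


def mean_latency(c: int, rho: float = 0.8, service_time: float = 1.0) -> float:
    """Mean client-observed latency (queueing delay plus service) in seconds."""
    mu = 1.0 / service_time   # per-server service rate
    lam = c * rho * mu        # arrival rate that holds utilization at rho
    mean_wait = erlang_c(c, rho) / (c * mu - lam)
    return service_time + mean_wait


# Print queueing probability and mean latency for a few cluster sizes
for c in (1, 2, 5, 10, 100):
    print(f"c={c:4d}  P(queue)={erlang_c(c, 0.8):.4f}  "
          f"mean latency={mean_latency(c):.4f} s")
```

With c = 1 this reduces to the M/M/1 case, where the queueing probability equals the utilization. As c grows at the same utilization, both the queueing probability and the mean wait shrink, so the mean latency moves toward the one-second service time.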