What 100k concurrent sandboxes has taught us so far
3 days ago
- #benchmarking
- #serverless
- #scalability
- The Scale Invitational's original goal was to test how providers handle spinning up tens of thousands of sandboxes simultaneously.
- Initial attempts using a single VM for 10,000 sandboxes revealed that the test harness itself became a bottleneck, skewing results.
- To better simulate real workloads, the architecture was redesigned to use sharding, distributing the load across multiple VMs.
- About 100 iterations per shard proved to be a workable balance: few enough that no individual VM becomes a bottleneck, yet enough per shard to keep the fleet size manageable.
- A key insight was the difference between measuring throughput (creating sandboxes quickly) and true concurrency (sustaining many sandboxes alive simultaneously).
- The test was adjusted to keep sandboxes alive until peak concurrency, introducing result categories like 'partial' to indicate sandboxes that died prematurely.
- Log aggregation was implemented across shards to enable debugging at scale, storing logs in durable object storage.
- A data pipeline using Tigris for cold storage and ClickHouse for analytics was set up to handle the large volume of results.
- Due to these complexities and iterative improvements, the event was postponed to June 17th to ensure accurate and meaningful results.
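
The sharding and concurrency points above can be made concrete with a small sketch. This is not the event's actual harness. It assumes that one iteration creates one sandbox, that each result carries a creation timestamp and an optional death timestamp, and that results fall into three categories. Only `partial` comes from the post; `ok` and `failed`, along with the names `plan_shards`, `SandboxRun`, `classify`, and `peak_concurrency`, are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def plan_shards(total_sandboxes: int, per_shard: int = 100) -> List[int]:
    """Split a target sandbox count into per-shard workloads.

    Assumes one iteration == one sandbox (an assumption, not from the post).
    """
    if total_sandboxes < 0 or per_shard <= 0:
        raise ValueError("total_sandboxes must be >= 0 and per_shard > 0")
    full, rest = divmod(total_sandboxes, per_shard)
    return [per_shard] * full + ([rest] if rest else [])


@dataclass
class SandboxRun:
    created_at: Optional[float]  # None if creation never succeeded
    died_at: Optional[float]     # None if still alive when the test ended


def classify(run: SandboxRun, peak_at: float) -> str:
    """Label a run relative to the moment of peak concurrency.

    'partial' mirrors the post; 'ok' and 'failed' are hypothetical labels.
    """
    if run.created_at is None:
        return "failed"
    if run.died_at is not None and run.died_at < peak_at:
        return "partial"  # created, but died before the peak was reached
    return "ok"


def peak_concurrency(runs: List[SandboxRun], test_end: float) -> int:
    """Maximum number of sandboxes alive at the same time (sweep line)."""
    events = []
    for run in runs:
        if run.created_at is None:
            continue
        end = run.died_at if run.died_at is not None else test_end
        events.append((run.created_at, 1))
        events.append((end, -1))
    # Tuples sort by time, then delta, so a death at time t is
    # processed before a creation at the same t.
    events.sort()
    alive = peak = 0
    for _, delta in events:
        alive += delta
        peak = max(peak, alive)
    return peak


if __name__ == "__main__":
    shards = plan_shards(10_000)
    print(len(shards), "shards")

    runs = [
        SandboxRun(0.0, None),
        SandboxRun(1.0, 2.0),   # dies early
        SandboxRun(3.0, None),
        SandboxRun(None, None), # never created
    ]
    created = sum(r.created_at is not None for r in runs)
    print("created:", created, "peak alive:", peak_concurrency(runs, 10.0))
    print([classify(r, peak_at=5.0) for r in runs])
```

In the sample data three sandboxes are created but at most two are ever alive together, which is the gap between throughput and true concurrency that the takeaways describe.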