What 100k concurrent sandboxes has taught us so far

3 days ago

The Scale Invitational's original goal was to test how providers handle spinning up tens of thousands of sandboxes simultaneously.
Initial attempts to drive 10,000 sandboxes from a single VM revealed that the test harness itself became the bottleneck, skewing results.
To better simulate real workloads, the architecture was redesigned to use sharding, distributing the load across multiple VMs.
A balance of about 100 iterations per shard was found to avoid individual VM bottlenecks while keeping fleet size manageable.
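As a rough illustration of the sharding arithmetic, here is a minimal Python sketch that splits a target sandbox count into shards of about 100 iterations each. The function name `plan_shards` and its parameters are hypothetical, not taken from the actual harness; only the figure of roughly 100 per shard comes from the text above.

```python
def plan_shards(total: int, per_shard: int = 100) -> list[range]:
    """Split `total` sandbox indices into contiguous shards.

    Hypothetical helper: each returned range is the block of sandbox
    indices one driver VM would be responsible for creating.
    """
    if total < 0 or per_shard <= 0:
        raise ValueError("total must be >= 0 and per_shard must be > 0")
    # Step through the index space in per_shard-sized chunks; the last
    # shard is shorter when total is not an exact multiple.
    return [
        range(start, min(start + per_shard, total))
        for start in range(0, total, per_shard)
    ]


if __name__ == "__main__":
    shards = plan_shards(100_000)
    print(len(shards), "shards, first:", shards[0], "last:", shards[-1])
```

Under these assumptions, 100,000 sandboxes at 100 per shard implies a fleet of 1,000 driver VMs, which shows the trade-off: smaller shards reduce per-VM load but grow the fleet.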
A key insight was the difference between measuring throughput (creating sandboxes quickly) and true concurrency (sustaining many sandboxes alive simultaneously).
The test was adjusted to keep sandboxes alive until peak concurrency was reached, and new result categories such as 'partial' were introduced to flag sandboxes that died prematurely.
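The distinction between throughput and concurrency can be sketched in Python as follows. This is an illustrative model, not the event's actual code: the `SandboxRun` type, the `classify` and `peak_concurrency` functions, and the 'success' and 'failed' labels are assumptions; only the 'partial' category is named in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxRun:
    started_at: Optional[float]  # None if the sandbox was never created
    ended_at: Optional[float]    # None if it was still alive at test end


def classify(run: SandboxRun, peak_at: float) -> str:
    """Label a run relative to the moment peak concurrency was measured."""
    if run.started_at is None:
        return "failed"    # assumed label: creation never succeeded
    if run.ended_at is not None and run.ended_at < peak_at:
        return "partial"   # created, but died before the peak
    return "success"       # assumed label: alive through the peak


def peak_concurrency(runs: list[SandboxRun]) -> int:
    """Return the maximum number of sandboxes alive at the same time."""
    events: list[tuple[float, int]] = []
    for run in runs:
        if run.started_at is None:
            continue
        events.append((run.started_at, 1))
        if run.ended_at is not None:
            events.append((run.ended_at, -1))
    # Tuples sort by time, then delta, so an end (-1) at the same
    # timestamp is processed before a start (+1).
    events.sort()
    alive = peak = 0
    for _, delta in events:
        alive += delta
        peak = max(peak, alive)
    return peak
```

A harness that only counts successful creations would report four attempts and three creations for the runs in the test data below, whereas this model reports how many were actually alive together and which ones dropped out early.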
Log aggregation was implemented across shards to enable debugging at scale, storing logs in durable object storage.
A data pipeline using Tigris for cold storage and ClickHouse for analytics was set up to handle the large volume of results.
Due to these complexities and iterative improvements, the event was postponed to June 17th to ensure accurate and meaningful results.
