Reduce gVisor Cold Starts with GPU Snapshotting

a day ago

Cold starts for GPU-backed AI models in production mean long startup times, which hurt scaling and resource utilization.
Cerebrium reduces cold starts by over 80% using CPU and GPU memory snapshots to restore fully initialized containers in seconds.
Checkpointing involves pausing execution, dumping CPU/GPU memory to files, uploading to storage, and restoring on demand.
The high-level architecture includes a checkpoint service and a modified gVisor containerd shim that decides between a normal boot and a restore.
Restores are fast: for example, a 9 GB container restores from S3 in 2.25 seconds, compared with 50 seconds for a full cold start.
Real-world challenges include handling network state, multiprocessing, local runtime files, and ensuring compatibility across environments.
Benchmarks show Cerebrium reduces cold starts by an average of 71% compared with starts that do not use snapshots, and is faster than competitors such as Baseten and Modal.
Checkpointing is ideal for workloads with deterministic initialization, such as framework imports, model loading, and CUDA graph capture.
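
The boot-or-restore decision and the checkpoint steps summarized above can be illustrated with a minimal Python sketch. It is a toy model, not Cerebrium's implementation: every name here (`cold_boot`, `checkpoint`, `restore`, `start_container`) is hypothetical, a local directory stands in for remote storage such as S3, and a pickled dict stands in for dumped CPU/GPU memory.

```python
import pickle
import tempfile
from pathlib import Path


def cold_boot() -> dict:
    """Hypothetical slow path: framework imports, model loading, warm-up."""
    return {"model": "loaded", "warm": True}


def checkpoint(state: dict, store: Path, key: str) -> Path:
    """Dump the state to a file in the snapshot store (stand-in for S3)."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{key}.snapshot"
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(pickle.dumps(state))
    tmp.replace(target)  # rename last so a partial dump is never restored
    return target


def restore(store: Path, key: str) -> dict:
    """Load a previously dumped state instead of re-initializing."""
    return pickle.loads((store / f"{key}.snapshot").read_bytes())


def start_container(store: Path, key: str) -> tuple[dict, str]:
    """Restore if a snapshot exists; otherwise boot normally and snapshot."""
    if (store / f"{key}.snapshot").exists():
        return restore(store, key), "restore"
    state = cold_boot()
    checkpoint(state, store, key)
    return state, "cold_boot"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        store = Path(tmpdir) / "snapshots"
        print(start_container(store, "demo-app")[1])  # first start: no snapshot yet
        print(start_container(store, "demo-app")[1])  # second start: snapshot found
```

The first start takes the slow path and writes a snapshot; later starts for the same key skip initialization entirely. This is why the approach suits deterministic initialization: the restored state must be equivalent to what a fresh boot would have produced.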
