Light Sleep: Waking VMs in 200ms with eBPF and snapshots
8 days ago
- #virtualization
- #serverless
- #eBPF
- Koyeb introduced Light Sleep to reduce cold starts to around 200ms for CPU workloads.
- Transitioned from Firecracker to Cloud Hypervisor for broader hardware support, including GPUs.
- Integrated Kata Containers to make it easy to swap VMM backends.
- Implemented snapshotting with pause_with_snapshot and resume_from_snapshot endpoints.
- Encountered and resolved issues with virtio-fs and network restoration during snapshotting.
- Used eBPF for kernel-level idle detection and to ignore health check traffic.
- Developed scaletozero-agent to monitor and manage VM sleep and wake cycles.
- Proxied health checks to prevent Nomad from restarting paused services.
- Achieved near-instant wakeups by relying on TCP retries and eBPF signaling.
- Plans to extend snapshotting to GPU-based services, where preserving VRAM remains a challenge.
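
The idle-detection idea in the summary (count real traffic as activity, ignore health checks, sleep after a quiet period) can be modeled in a few lines. This is a minimal user-space sketch, not Koyeb's implementation: the summary says the real detection runs in eBPF at the kernel level, and every name here (`IdleTracker`, `idle_timeout_s`, `health_check_sources`, the 300-second threshold, the example addresses) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class IdleTracker:
    """User-space model of the idle decision (the real detection runs in eBPF)."""

    idle_timeout_s: float                           # hypothetical quiet period before sleeping
    health_check_sources: frozenset = frozenset()   # hypothetical: addresses of health probers
    last_activity_s: float = 0.0                    # timestamp of the last real packet

    def on_packet(self, src_ip: str, now_s: float) -> None:
        # Health check probes must not count as activity,
        # otherwise the VM would never be considered idle.
        if src_ip in self.health_check_sources:
            return
        self.last_activity_s = now_s

    def should_sleep(self, now_s: float) -> bool:
        # Idle once the quiet period has fully elapsed since the last real packet.
        return now_s - self.last_activity_s >= self.idle_timeout_s


tracker = IdleTracker(
    idle_timeout_s=300.0,
    health_check_sources=frozenset({"10.0.0.1"}),
)
tracker.on_packet("203.0.113.7", now_s=10.0)   # real client traffic
tracker.on_packet("10.0.0.1", now_s=290.0)     # health check, ignored
```

With these inputs, `tracker.should_sleep(now_s=200.0)` is false and `tracker.should_sleep(now_s=310.0)` is true, because the health check at 290 seconds did not reset the timer. Passing the clock in as an argument keeps the logic deterministic and easy to test.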