The invisible engineering behind Lambda's network
7 hours ago
- #Cloud Computing
- #Network Optimization
- #Serverless Architecture
- AWS Lambda's network improvements involve invisible engineering that enhances performance without disrupting services, similar to upgrading an aircraft mid-flight.
- Lambda's network topology is software-defined and critical for managing data flow and isolation in a multi-tenant cloud environment, impacting latency and resource costs.
- VPC cold starts were historically slower because of network setup work such as Geneve tunnel creation, which took 150 milliseconds and was a barrier for latency-sensitive workloads.
- The team cut Geneve tunnel setup latency from 150 milliseconds to 200 microseconds by using eBPF to rewrite VNIs (virtual network identifiers) dynamically, which moved tunnel creation off the hot path.
- Lambda SnapStart required pre-created network devices for cloned execution environments. These were initially capped at 200 per host, but full adoption required scaling to 4,000 networks per host.
- To scale, networks were pre-created at worker boot (taking three minutes) to avoid on-demand creation overhead, eliminating CPU drain during function execution.
- Stateful NAT with iptables added latency at high density; it was replaced with stateless eBPF-based packet mangling, which cut NAT setup latency 100-fold.
- Iptables rules were simplified from over 125,000 in the root namespace to 144 static rules by moving slot-specific rules into individual network namespaces, eliminating performance skew.
- RTNL lock bottlenecks were addressed by reordering operations and batching eBPF attachments, speeding up network creation during worker initialization.
- The unified network topology supports both traditional and snapshot workloads, increased capacity 20x, reduced CPU usage by 1%, and enabled reuse by other AWS services like Aurora DSQL.
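
The pattern behind several of these points (pre-create networks at worker boot, then bind a tenant's VNI at invocation time with a cheap map update) can be sketched as a toy model. This is a minimal illustration in plain Python, not Lambda's implementation: `WorkerNetworkPool`, `NetworkSlot`, and `rewrite_vni` are hypothetical names, and an ordinary dict stands in for the eBPF map that a real datapath would consult.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NetworkSlot:
    """Hypothetical stand-in for one pre-created network (namespace + devices)."""
    slot_id: int
    in_use: bool = False


class WorkerNetworkPool:
    """Toy model: all slots are built once at boot; a dict plays the eBPF map."""

    def __init__(self, capacity: int) -> None:
        # Slow path, paid once at worker boot rather than per invocation.
        self._free: List[NetworkSlot] = [NetworkSlot(i) for i in range(capacity)]
        # Assumption: maps slot_id -> VNI, like a per-slot eBPF map entry.
        self.vni_map: Dict[int, int] = {}

    @property
    def free_count(self) -> int:
        return len(self._free)

    def acquire(self, vni: int) -> NetworkSlot:
        """Hot path: no device creation, only a pop and a map update."""
        if not self._free:
            raise RuntimeError("no pre-created network slots left")
        slot = self._free.pop()
        slot.in_use = True
        self.vni_map[slot.slot_id] = vni
        return slot

    def release(self, slot: NetworkSlot) -> None:
        """Return the slot to the pool and drop its VNI binding."""
        self.vni_map.pop(slot.slot_id, None)
        slot.in_use = False
        self._free.append(slot)

    def rewrite_vni(self, slot_id: int, packet: dict) -> dict:
        """Mimic the per-packet rewrite: stamp the slot's current VNI."""
        if slot_id not in self.vni_map:
            raise KeyError(f"slot {slot_id} has no VNI bound")
        return {**packet, "vni": self.vni_map[slot_id]}


pool = WorkerNetworkPool(capacity=4)
slot = pool.acquire(vni=1001)
packet = pool.rewrite_vni(slot.slot_id, {"payload": b"hello", "vni": 0})
pool.release(slot)
```

The design point the sketch tries to capture is that `acquire` does constant-time bookkeeping only; everything expensive happens in the constructor, which corresponds to the worker-boot phase described above.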