Lambda isn't leaking memory, your metrics are lying to you
3 days ago
- #memory management
- #AWS Lambda
- #performance tuning
- glibc's malloc serves allocations below its mmap threshold (128 KB by default) from per-thread arenas and can keep freed chunks there instead of returning them to the OS, so RSS stays high and looks like a memory leak.
- The @maxMemoryUsed metric in AWS Lambda is not a per-invocation value. It is a high-water mark for the whole execution environment and can only increase, which makes it unreliable for detecting memory leaks.
- Shrinking the ONNX model cache increased SIGKILLs: a smaller cache meant more model load/unload cycles, and those cycles made the arena hoarding worse.
- Lowering glibc's mmap threshold from 128 KB to 32 KB cut arena hoarding by 97% and lowered RSS significantly, at the cost of added latency from the extra mmap/munmap syscalls.
- Disabling ONNX Runtime's custom allocator and monitoring the heap with mallinfo2() showed that glibc, not the application, was holding on to the memory.
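A high-water mark cannot distinguish a one-off spike from a leak. Here is a small Python model of the "only increases" behavior described for @maxMemoryUsed. The function `reported_max_memory` and the per-invocation peaks are hypothetical; this models the semantics the post describes, not how Lambda actually measures memory.

```python
def reported_max_memory(per_invocation_peaks_mb):
    """Model @maxMemoryUsed as a running maximum over one execution environment.

    Hypothetical model: each invocation's true peak is known, and the reported
    value is the largest peak seen so far, so it can never go down.
    """
    reported = []
    high_water = 0
    for peak in per_invocation_peaks_mb:
        high_water = max(high_water, peak)
        reported.append(high_water)
    return reported


# Made-up numbers: one spiky invocation, then steady, lower usage.
true_peaks = [180, 950, 210, 205, 200, 198]
print(reported_max_memory(true_peaks))
# [180, 950, 950, 950, 950, 950]
```

A single 950 MB spike in the second invocation makes every later report read 950 MB, even though those invocations peaked near 200 MB. A genuinely leaking function would produce a similar-looking plateau once it stops growing.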
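A toy count of model loads shows why a smaller cache produces more load/unload cycles. The post doesn't say which eviction policy its ONNX model cache uses, so this sketch assumes LRU. `count_model_loads` and the request stream are hypothetical. Each miss stands for a model load, which is a burst of allocations, and each eviction stands for an unload, which is a burst of frees.

```python
from collections import OrderedDict


def count_model_loads(requests, cache_size):
    """Count model loads for a request stream under an LRU cache (assumed policy)."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    cache = OrderedDict()
    loads = 0
    for model_id in requests:
        if model_id in cache:
            cache.move_to_end(model_id)  # hit: mark as most recently used
            continue
        loads += 1  # miss: the model has to be loaded (allocations)
        cache[model_id] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict least recently used (frees)
    return loads


requests = ["a", "b", "c"] * 3
for size in (3, 2):
    print(size, count_model_loads(requests, size))
# 3 3
# 2 9
```

With three models cycling through the cache, a capacity of 3 loads each model once. A capacity of 2 misses on every request. With arena hoarding in play, each extra load/unload cycle is another chance for freed memory to stay in the arenas instead of going back to the OS.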
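The post doesn't say how it changed the mmap threshold. Below is one minimal way to do it from inside a Python process via ctypes, assuming a glibc-based runtime. `set_mmap_threshold` is a hypothetical helper. `mallopt()` and `M_MMAP_THRESHOLD` (defined as -3 in glibc's `<malloc.h>`) are real glibc interfaces.

```python
import ctypes
import ctypes.util

# M_MMAP_THRESHOLD as defined in glibc's <malloc.h>.
M_MMAP_THRESHOLD = -3


def set_mmap_threshold(num_bytes):
    """Set glibc's mmap threshold; return True only if mallopt() reports success."""
    libc_path = ctypes.util.find_library("c")
    if libc_path is None:
        return False
    try:
        libc = ctypes.CDLL(libc_path)
        mallopt = libc.mallopt
    except (OSError, AttributeError):
        return False  # C library not found or it has no mallopt()
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    # mallopt() returns 1 on success and 0 on error.
    return mallopt(M_MMAP_THRESHOLD, num_bytes) == 1


# 32 KB, the value the post moved to from glibc's 128 KB default.
print(set_mmap_threshold(32 * 1024))
```

According to the mallopt(3) man page, requests at or above the threshold that can't be satisfied from the free lists are served with mmap(), and such chunks are unmapped again on free(). That round trip is where the extra syscall latency comes from. Setting the value explicitly also disables glibc's dynamic threshold adjustment. glibc can read the same setting from the `MALLOC_MMAP_THRESHOLD_` environment variable at startup, which on Lambda avoids code changes. Whether the post used either mechanism is not stated.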
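To tell allocator hoarding apart from a real leak, you can inspect glibc's heap with mallinfo2(). The following is a minimal ctypes sketch that assumes glibc 2.33 or newer, where mallinfo2() was added. `malloc_snapshot` and its dict keys are hypothetical names. The post also disabled ONNX Runtime's own allocator first so that allocations would reach glibc. In the Python API, the usual switch is `SessionOptions.enable_cpu_mem_arena = False`, though the post doesn't say which setting it used.

```python
import ctypes
import ctypes.util


class MallInfo2(ctypes.Structure):
    """Mirror of glibc's struct mallinfo2; every field is a size_t."""

    _fields_ = [
        ("arena", ctypes.c_size_t),     # non-mmapped bytes obtained from the OS
        ("ordblks", ctypes.c_size_t),   # number of free chunks
        ("smblks", ctypes.c_size_t),    # number of free fastbin blocks
        ("hblks", ctypes.c_size_t),     # number of mmapped regions
        ("hblkhd", ctypes.c_size_t),    # bytes in mmapped regions
        ("usmblks", ctypes.c_size_t),   # unused, always 0
        ("fsmblks", ctypes.c_size_t),   # bytes in freed fastbin blocks
        ("uordblks", ctypes.c_size_t),  # bytes in use by the application
        ("fordblks", ctypes.c_size_t),  # free bytes still held by malloc
        ("keepcost", ctypes.c_size_t),  # releasable bytes at the top of the heap
    ]


def malloc_snapshot():
    """Return a few glibc heap statistics, or None if mallinfo2() is unavailable."""
    libc_path = ctypes.util.find_library("c")
    if libc_path is None:
        return None
    try:
        libc = ctypes.CDLL(libc_path)
        mallinfo2 = libc.mallinfo2
    except (OSError, AttributeError):
        return None  # not glibc, or glibc older than 2.33
    mallinfo2.argtypes = []
    mallinfo2.restype = MallInfo2
    info = mallinfo2()
    return {
        "in_use": info.uordblks,          # live allocations
        "free_in_arenas": info.fordblks,  # freed, but still held by malloc
        "mmapped": info.hblkhd,           # blocks served by mmap()
    }


print(malloc_snapshot())
```

If RSS stays high while `in_use` is small and `free_in_arenas` is large, malloc is holding freed memory rather than the application leaking it.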