The fix for a segfault that never shipped
3 months ago
- #fixed-point-arithmetic
- #debugging
- #audio-encoding
- Recall.ai processes millions of hours of meetings monthly, normalizing diverse audio formats into a consistent one.
- The company encountered a rare segfault in an ancient AAC encoder, leading to data loss in 1 out of 36 million meeting bot instances.
- The issue was traced to a fixed-point math bug in the VisualOn AAC encoder, which was patched over a decade ago but never distributed downstream.
- To diagnose the issue, Recall.ai developed 'Garbage Truck', a Rust binary to capture and upload core dumps from ephemeral EC2 instances to S3.
- Analysis revealed the crash occurred in `voAACEnc_pow2_xy` due to an out-of-bounds access caused by incorrect fixed-point arithmetic leading to `avgEn < minEn`.
- The team implemented a workaround by enforcing `avgEn >= minEn` and ultimately replaced the outdated encoder with a modern AAC encoder to prevent future crashes.