Hasty Briefs (beta)

We found a bug in Go's ARM64 compiler

8 hours ago
  • #arm64
  • #race-condition
  • #Go
  • Cloudflare discovered a bug in Go's arm64 compiler that introduces a race condition in generated code; the bug is rare enough that it only surfaced because of the massive scale at which Cloudflare runs Go services.
  • Sporadic panics first appeared on arm64 machines and pointed to stack corruption encountered during stack unwinding.
  • The panics were initially correlated with recovered panics and an old Go issue (#73259), which led to a temporary mitigation: avoiding panic/recover for error handling.
  • Fatal panics returned at a higher rate without clear triggers, prompting deeper investigation.
  • Two classes of crash were identified: invalid memory accesses, and explicit fatal errors raised by the runtime during stack unwinding.
  • The root cause was traced to asynchronous preemption landing between the two halves of a stack pointer adjustment that Go's arm64 compiler had split into separate instructions, which left the stack in an invalid state for the unwinder.
  • A minimal reproducer was created, confirming the bug was a runtime issue, not specific to Cloudflare's environment.
  • The bug was fixed in Go versions 1.23.12, 1.24.6, and 1.25.0 by ensuring stack pointer adjustments are atomic.
  • The investigation highlighted the challenges of debugging rare race conditions at scale and the importance of understanding low-level runtime behaviors.