Notes from Optimizing CPU-Bound Go Hot Paths

2 days ago
  • #Go Performance
  • #CPU Optimization
  • #Compiler Limitations
  • Go's idiomatic abstractions (generics, interfaces, closures) often hurt performance in hot loops because calls made through them are frequently not inlined, so every iteration pays call overhead.
  • Go does not fully monomorphize generic functions the way C++ and Rust do; it uses GC shape stenciling, so method calls on type parameters in hot paths can end up as interface-style indirect dispatch.
  • Performance workarounds include manual code duplication (e.g., 16 similar functions in a Brotli port) or code generation, but both increase the maintenance burden.
  • Benchmarks show substantial throughput penalties: generics (-15.18%), closures (-14.82%), and interfaces (-27.44%) compared to concrete implementations.
  • Assembly analysis reveals extra instructions in non-inlined versions (e.g., reloading arguments, bounds checks, nil checks) that degrade hot loop efficiency.
  • Go lacks intrinsics for operations such as memory prefetching, and SIMD support is only experimental in Go 1.26; the fallback is hand-written assembly, whose functions cannot be inlined and therefore add call overhead.
  • Compiler hints such as //go:inline and //go:nobounds do not exist, so optimizations cannot be forced; workarounds include reshaping code or using unsafe pointers.
  • Benchmarks are sensitive to code layout, which can add noise (e.g., ±3-4% variations) and makes optimizations harder to validate; Go has no layout-optimization tool comparable to BOLT.
  • CPU-bound Go code often requires specialization, duplication, and low-level tricks, diverging from idiomatic abstraction for performance gains.
  • Trade-offs in Go's design (e.g., runtime model for GC) prioritize IO-bound workloads, making CPU-bound optimization more manual and less elegant.
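
To make the comparison between concrete, interface, generic, and closure calls more tangible, here is a minimal sketch of one hot loop written four ways. It is not the benchmark from the original article: the `Hasher` interface, the `mulHasher` type, and all function names are hypothetical, and the comments describe what the compiler typically does, not a guarantee. Building with `go build -gcflags=-m` prints the compiler's inlining decisions for each variant.

```go
package main

import "fmt"

// Hasher is a hypothetical interface used only for this illustration.
type Hasher interface {
	Hash(b byte) uint32
}

// mulHasher is a hypothetical concrete implementation.
type mulHasher struct{}

func (mulHasher) Hash(b byte) uint32 { return uint32(b) * 2654435761 }

// sumConcrete calls the method on a concrete type, so the compiler
// can usually inline Hash into the loop body.
func sumConcrete(data []byte) uint32 {
	var h mulHasher
	var s uint32
	for _, b := range data {
		s += h.Hash(b)
	}
	return s
}

// sumInterface dispatches through an interface value; unless the
// compiler can devirtualize the call, each iteration is an indirect call.
func sumInterface(h Hasher, data []byte) uint32 {
	var s uint32
	for _, b := range data {
		s += h.Hash(b)
	}
	return s
}

// sumGeneric uses a type parameter. With GC shape stenciling the
// method call may be resolved at run time rather than inlined.
func sumGeneric[H Hasher](h H, data []byte) uint32 {
	var s uint32
	for _, b := range data {
		s += h.Hash(b)
	}
	return s
}

// sumClosure calls a function value; the call is indirect unless
// sumClosure itself is inlined into a caller that passes a known function.
func sumClosure(f func(byte) uint32, data []byte) uint32 {
	var s uint32
	for _, b := range data {
		s += f(b)
	}
	return s
}

func main() {
	data := []byte("hello, hot path")
	h := mulHasher{}
	want := sumConcrete(data)
	// All four variants compute the same value; only the call mechanism differs.
	fmt.Println(sumInterface(h, data) == want)
	fmt.Println(sumGeneric(h, data) == want)
	fmt.Println(sumClosure(h.Hash, data) == want)
}
```

Wrapping each variant in a `testing.B` benchmark over a large input is one way to measure the gap on a given machine and Go version; results will differ from the percentages quoted above.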
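
The two workarounds named for missing compiler hints, reshaping code and using unsafe pointers, might look roughly like the following. This is a sketch under stated assumptions, not code from the original article: every function name is hypothetical, and whether a particular bounds check is actually removed depends on the Go version. Building with `go build -gcflags=-d=ssa/check_bce/debug=1` reports the bounds checks that remain.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// sumIndexed reads little-endian 32-bit words by index. Depending on the
// compiler version, some of the four index expressions may keep a bounds check.
func sumIndexed(b []byte) uint32 {
	var s uint32
	for i := 0; i+4 <= len(b); i += 4 {
		s += uint32(b[i]) | uint32(b[i+1])<<8 | uint32(b[i+2])<<16 | uint32(b[i+3])<<24
	}
	return s
}

// sumResliced is the "reshaped" form: the loop condition proves len(b) >= 4,
// so the 4-byte load and the re-slice are easy for the compiler to prove safe.
func sumResliced(b []byte) uint32 {
	var s uint32
	for len(b) >= 4 {
		s += binary.LittleEndian.Uint32(b)
		b = b[4:]
	}
	return s
}

// load32Unsafe skips bounds checks entirely. The caller must guarantee
// 0 <= i and i+4 <= len(b). It reads in native byte order and assumes the
// target tolerates unaligned loads (true on amd64 and arm64).
func load32Unsafe(b []byte, i int) uint32 {
	return *(*uint32)(unsafe.Add(unsafe.Pointer(unsafe.SliceData(b)), i))
}

func main() {
	data := []byte{1, 0, 0, 0, 2, 0, 0, 0, 9}
	fmt.Println(sumIndexed(data) == sumResliced(data))
	fmt.Println(load32Unsafe([]byte{7, 7, 7, 7, 7}, 1) == 0x07070707)
}
```

The re-sliced loop keeps memory safety and usually gets most of the benefit; the unsafe load trades that safety for a guaranteed check-free read, so it is best kept behind a small, well-tested helper.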