Hung by a thread

3 months ago
  • #deadlock
  • #debugging
  • #robotics
  • The author's robot control loop froze consistently 16 seconds after a client connected, with no crashes or errors reported.
  • Debugging attempts included changing thread handling and mutex types, but the issue persisted at iteration 1,615 every time.
  • A heartbeat thread revealed the loop was blocked, not slow or starved, indicating a deadlock.
  • GDB identified unexpected Rayon worker threads, traced back to the Rerun visualization SDK used for telemetry.
  • The deadlock occurred because the author called Rerun's `recorder.log()` while holding a mutex; holding a lock across a call into Rayon-backed code is a known hazard of Rayon's work-stealing threads.
  • The fix was to shorten the time the mutex was held so that it was no longer held across the logging call, which resolved the deadlock with minimal code changes.
  • Key lessons include the value of GDB for deadlocks, being wary of unexpected threads, understanding dependency threading models, and the utility of heartbeat threads.
  • The author submitted a PR to Rerun to document the issue, hoping to prevent future occurrences.
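
The heartbeat idea from the summary can be sketched as follows. This is a minimal illustration using only the Rust standard library, not the author's code: the names `is_stalled` and `spawn_heartbeat`, the shared iteration counter, and the sampling window are all assumptions made for the example. The control loop is assumed to call `counter.fetch_add(1, Ordering::Relaxed)` once per iteration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: samples the iteration counter twice, `window` apart,
/// and returns true if it did not advance at all in between.
fn is_stalled(counter: &AtomicU64, window: Duration) -> bool {
    let before = counter.load(Ordering::Relaxed);
    thread::sleep(window);
    counter.load(Ordering::Relaxed) == before
}

/// Hypothetical helper: spawns a watchdog thread that prints a message
/// whenever the control loop makes no progress for a whole `window`.
fn spawn_heartbeat(counter: Arc<AtomicU64>, window: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        if is_stalled(&counter, window) {
            eprintln!(
                "heartbeat: control loop stuck at iteration {}",
                counter.load(Ordering::Relaxed)
            );
        }
    })
}
```

Because the watchdog runs on its own thread, it keeps reporting even when the control loop is blocked. A counter that is frozen at the same value across many windows points to a blocked loop, whereas a counter that still advances, only slowly, points to a slow or starved one.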
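
The fix described in the summary, holding the mutex for less time, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the author's actual change: `RobotState`, `step_and_log`, and the `log` callback are hypothetical, and the callback merely stands in for a telemetry call such as Rerun's `recorder.log()`, so no real Rerun API is used here.

```rust
use std::sync::Mutex;

/// Hypothetical shared robot state; not taken from the original post.
#[derive(Clone, Debug, PartialEq)]
struct RobotState {
    joint_angles: Vec<f64>,
}

/// Updates the state and copies it while holding the lock, releases the
/// lock, and only then calls `log` with the copy.
/// `log` is a stand-in for a telemetry call that may use other threads.
fn step_and_log<F: FnOnce(&RobotState)>(state: &Mutex<RobotState>, log: F) {
    let snapshot = {
        let mut guard = state.lock().unwrap();
        guard.joint_angles[0] += 0.01; // placeholder for the real control update
        (*guard).clone()
    }; // the guard is dropped here, so the mutex is free again

    // No lock is held while logging, so work that the logger hands to
    // another thread can never end up waiting on this mutex.
    log(&snapshot);
}
```

The design choice is to trade a small copy of the state for a critical section that contains no calls into third-party code. Whether a copy is cheap enough depends on the size of the state, which the summary does not specify.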