Tracing Discord's Elixir Systems (Without Melting Everything)

6 hours ago
  • #Discord
  • #Observability
  • #Elixir
  • Discord aims to make user interactions feel instant, using Elixir's concurrency to run each guild (Discord's term for a server) independently of the others.
  • When guilds lag or fail, on-call engineers use observability tools to diagnose and prevent recurrence.
  • Initial investigations rely on metrics and logs, which can hint at bursty activity but do not show what users actually experienced.
  • For deeper insight, engineers use 'guild timings,' a custom tool that records how actions are processed minute by minute, although the recorded data is volatile.
  • Distributed tracing (APM) gives a detailed view of individual operations, but it required a custom integration because Elixir's process messaging has no built-in way to carry trace context along with a message.
  • Discord successfully integrated distributed tracing without downtime, enhancing performance monitoring.
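
The trace-context point above can be made concrete with a small sketch. It is written in Python purely for illustration (the systems described are Elixir), and every name in it (`TraceContext`, `Envelope`, `wrap`, `unwrap`, `handle`) is hypothetical rather than taken from Discord's code. The idea is that the sender wraps each message in an envelope that carries the trace context, and the receiver accepts both wrapped and bare messages, which is one plausible way to roll such a change out gradually without downtime.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class TraceContext:
    """Identifiers that tie a unit of work to its parent (hypothetical shape)."""
    trace_id: str
    span_id: str


@dataclass(frozen=True)
class Envelope:
    """A message plus optional tracing metadata (hypothetical wrapper)."""
    payload: Any
    trace: Optional[TraceContext] = None


def new_span_id() -> str:
    # 16 hex characters, taken from a random UUID.
    return uuid.uuid4().hex[:16]


def wrap(payload: Any, current: Optional[TraceContext]) -> Envelope:
    """Sender side: attach the caller's trace context to the outgoing message."""
    return Envelope(payload=payload, trace=current)


def unwrap(message: Any) -> Tuple[Any, Optional[TraceContext]]:
    """Receiver side: accept wrapped and bare messages alike.

    Tolerating bare messages lets old and new senders coexist during a rollout.
    """
    if isinstance(message, Envelope):
        return message.payload, message.trace
    return message, None


def handle(message: Any) -> dict:
    """Process a message, opening a child span only when a parent context arrived."""
    payload, parent = unwrap(message)
    if parent is None:
        # Untraced message: do the work without recording a span.
        return {"payload": payload, "trace_id": None,
                "parent_span_id": None, "span_id": None}
    # Traced message: keep the trace id and record the sender's span as parent.
    return {"payload": payload, "trace_id": parent.trace_id,
            "parent_span_id": parent.span_id, "span_id": new_span_id()}
```

Calling `handle(wrap({"op": "send_message"}, TraceContext("t1", "s1")))` returns a record whose `trace_id` is `"t1"` and whose `parent_span_id` is `"s1"`, while `handle({"op": "send_message"})` processes the same payload with no span attached.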