Hasty Briefs (beta)


Optimizing Tail Sampling in OpenTelemetry with Retroactive Sampling

4 days ago
  • #Distributed Tracing
  • #OpenTelemetry
  • #Performance Optimization
  • VictoriaMetrics presented retroactive sampling at KubeCon Europe 2026 to reduce costs in OpenTelemetry trace collection.
  • In retroactive sampling, edge agents buffer the full span data locally and send only a minimal set of span attributes (e.g., trace_id, status_code) to the collector, which makes the sampling decision.
  • Compared with tail sampling, retroactive sampling cuts network traffic by up to 70% and CPU and memory usage by 60–70%.
  • Edge agents use an on-disk FIFO queue instead of in-memory buffers, cutting memory pressure and enabling efficient data retrieval for sampled traces.
  • In a benchmark at 15,000–30,000 spans/s, retroactive sampling used 1.7 GB of disk, whereas tail sampling used 4 GB of memory.
  • The main limitation is reduced decision context: sampling policies that need many span attributes cannot be evaluated from the minimal metadata alone. Hybrid approaches can combine agent-side and collector-side decisions.
  • Other disk-based designs (such as the Pebble-based approach in OpenTelemetry) also reduce memory usage but increase CPU usage, which illustrates the trade-off.
  • VictoriaMetrics plans to donate retroactive sampling to OpenTelemetry as a processor and to integrate it into vtagent in 2026.
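The retroactive-sampling flow summarized above (full spans buffered on disk at the edge, only trace_id and status_code sent upstream, spans of sampled traces retrieved afterwards) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `EdgeAgent`, `collector_decide`, the JSON-lines spool file, the simplified "OK"/"ERROR" status strings, and the keep-traces-with-errors policy are assumptions for illustration, not VictoriaMetrics' implementation or an OpenTelemetry API. It also makes one decision over the whole spool instead of per time window.

```python
import json
import os
import tempfile


class EdgeAgent:
    """Hypothetical edge agent: spools full spans to an append-only file on
    disk and returns only a minimal summary to send upstream."""

    def __init__(self, spool_path):
        self.spool_path = spool_path
        open(spool_path, "a", encoding="utf-8").close()  # create an empty spool

    def record(self, span):
        # Append the full span to the tail of the on-disk FIFO spool.
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(span) + "\n")
        # Only these two fields go to the collector for the decision.
        return {"trace_id": span["trace_id"], "status_code": span["status_code"]}

    def drain(self, sampled_ids):
        # Read spans head to tail (FIFO order) and keep only sampled traces.
        with open(self.spool_path, encoding="utf-8") as f:
            kept = [s for s in map(json.loads, f) if s["trace_id"] in sampled_ids]
        # Every spooled span is now exported or dropped, so empty the spool.
        open(self.spool_path, "w", encoding="utf-8").close()
        return kept


def collector_decide(summaries):
    """Hypothetical collector policy: keep any trace containing an error span."""
    return {s["trace_id"] for s in summaries if s["status_code"] == "ERROR"}


with tempfile.TemporaryDirectory() as tmp:
    agent = EdgeAgent(os.path.join(tmp, "spool.jsonl"))
    spans = [
        {"trace_id": "t1", "name": "GET /a", "status_code": "OK"},
        {"trace_id": "t2", "name": "GET /b", "status_code": "ERROR"},
        {"trace_id": "t2", "name": "db.query", "status_code": "OK"},
    ]
    summaries = [agent.record(s) for s in spans]  # sent to the collector
    sampled = collector_decide(summaries)         # decision sent back to agent
    exported = agent.drain(sampled)               # full spans for sampled traces
    print([s["name"] for s in exported])  # ['GET /b', 'db.query']
```

Before the decision, only the two-field summaries cross the network; the full span for the unsampled trace t1 never leaves the agent and is discarded when the spool is drained.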
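One possible shape of the hybrid approach mentioned in the limitations: the agent evaluates a rule that needs a rich span attribute and forwards its verdict as a single flag alongside the minimal summary, so the collector can combine both rules without ever receiving the full span. This Python sketch is an assumption, not a documented design; `agent_vote`, `summarize`, `hybrid_decision`, the `http.route == "/checkout"` rule, and the error policy are all hypothetical.

```python
def agent_vote(span):
    """Hypothetical agent-side rule: needs a full span attribute that the
    collector never receives."""
    return span.get("attributes", {}).get("http.route") == "/checkout"


def summarize(span):
    # Minimal metadata plus a one-bit agent verdict; attributes stay on the agent.
    return {
        "trace_id": span["trace_id"],
        "status_code": span["status_code"],
        "agent_keep": agent_vote(span),
    }


def hybrid_decision(summaries):
    """Hypothetical collector policy: keep a trace if any of its spans is an
    error (collector rule) or any agent flagged it (agent rule)."""
    return {
        s["trace_id"]
        for s in summaries
        if s["status_code"] == "ERROR" or s["agent_keep"]
    }


spans = [
    {"trace_id": "t1", "status_code": "OK", "attributes": {"http.route": "/checkout"}},
    {"trace_id": "t2", "status_code": "OK", "attributes": {"http.route": "/health"}},
    {"trace_id": "t3", "status_code": "ERROR", "attributes": {"http.route": "/health"}},
]
sampled = hybrid_decision([summarize(s) for s in spans])
print(sorted(sampled))  # ['t1', 't3']
```

Packing the agent's verdict into one boolean keeps the upstream message nearly as small as the plain summary while still letting an attribute-heavy rule influence the decision.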