Optimizing Tail Sampling in OpenTelemetry with Retroactive Sampling

4 days ago

VictoriaMetrics presented retroactive sampling at KubeCon Europe 2026 to reduce costs in OpenTelemetry trace collection.
Retroactive sampling sends only minimal span attributes (e.g., trace_id, status_code) to the collector, which makes the sampling decision, while the raw span data stays buffered on the edge agents.
It lowers network traffic by up to 70% and reduces CPU and memory usage by 60–70% compared to tail sampling.
Edge agents use an on-disk FIFO queue instead of in-memory buffers, cutting memory pressure and enabling efficient data retrieval for sampled traces.
In a benchmark at 15,000–30,000 spans/s, retroactive sampling used 1.7 GB of disk, whereas tail sampling used 4 GB of memory.
The collector sees less context per span, which is a limitation when a sampling policy depends on many attributes; hybrid approaches can combine agent-side and collector-side decisions.
Disk-based designs (like Pebble in OpenTelemetry) also reduce memory usage but increase CPU usage, so the choice involves a trade-off.
VictoriaMetrics plans to donate retroactive sampling as an OpenTelemetry processor and integrate it into vtagent in 2026.
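
The flow described above (agents buffer full spans on disk, the collector decides from minimal attributes, and sampled traces are fetched afterwards) can be illustrated with a minimal Python sketch. This is not the VictoriaMetrics implementation: the class names `EdgeAgent` and `Collector`, the JSON-lines queue file, and the "keep any trace that contains an error" policy are all assumptions made for illustration, and the sketch omits eviction of old entries that a real FIFO queue would need.

```python
import json
import os
import tempfile
from collections import defaultdict


class EdgeAgent:
    """Hypothetical agent: appends full spans to a file, forwards a digest."""

    def __init__(self, path):
        self.path = path
        open(self.path, "w").close()  # start with an empty queue file

    def record(self, span):
        # Append the full span to the on-disk queue (arrival order is kept).
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(span) + "\n")
        # Only these minimal attributes leave the agent.
        return {"trace_id": span["trace_id"], "status_code": span["status_code"]}

    def fetch(self, sampled_trace_ids):
        # Read the queue back and keep only spans of sampled traces.
        kept = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                span = json.loads(line)
                if span["trace_id"] in sampled_trace_ids:
                    kept.append(span)
        return kept


class Collector:
    """Hypothetical collector: decides per trace from digests alone."""

    def __init__(self):
        self.has_error = defaultdict(bool)

    def observe(self, digest):
        # Assumed policy: remember whether any span of the trace failed.
        self.has_error[digest["trace_id"]] |= digest["status_code"] == "ERROR"

    def decide(self):
        return {tid for tid, failed in self.has_error.items() if failed}


def run_demo():
    spans = [
        {"trace_id": "t1", "name": "GET /cart", "status_code": "OK"},
        {"trace_id": "t2", "name": "GET /home", "status_code": "OK"},
        {"trace_id": "t1", "name": "db.query", "status_code": "ERROR"},
        {"trace_id": "t2", "name": "render", "status_code": "OK"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        agent = EdgeAgent(os.path.join(tmp, "spans.queue"))
        collector = Collector()
        for span in spans:
            collector.observe(agent.record(span))
        sampled = collector.decide()
        return sampled, agent.fetch(sampled)


if __name__ == "__main__":
    sampled_ids, kept_spans = run_demo()
    print(sampled_ids, [s["name"] for s in kept_spans])
```

In this sketch only the two-field digest crosses the network for every span; the full payload is read from disk and sent only for trace `t1`, which is where the reported traffic and memory savings would come from.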
