What Are Traces and Spans in OpenTelemetry?

14 days ago

Distributed tracing in OpenTelemetry revolves around two core concepts: Trace (the full journey of a request across services) and Span (a timed unit of work within that journey).
Each span carries metadata (name, start and end times, trace and span IDs, attributes, events, status, and links) that helps diagnose performance issues and follow a request's path.
Setting up tracing in Node.js involves installing OpenTelemetry packages, configuring exporters, samplers, and resources, and initializing telemetry before the application starts.
Manual instrumentation allows for detailed tracing of specific operations, with utilities like `withSpan` to handle errors and context propagation automatically.
Context propagation keeps a trace continuous across asynchronous boundaries and network calls, carrying W3C Trace Context headers (`traceparent`, `tracestate`) over HTTP and messaging systems.
Sampling strategies include head sampling (deciding at trace start) and tail sampling (deciding after seeing the full trace), balancing cost and observability.
Best practices for span naming emphasize stability, low cardinality, and action-oriented names to maintain trace usability and searchability.
Common anti-patterns include span explosion, high-cardinality names, logging everything as events, and mixing domains in a single span.
Correlating traces with metrics and logs enhances observability, enabling comprehensive monitoring and troubleshooting of distributed systems.
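
To make the `withSpan` idea from the manual-instrumentation point concrete, here is a minimal sketch of such a wrapper. It is not the article's implementation and does not use the OpenTelemetry API: `SketchSpan` and `finishedSpans` are hypothetical stand-ins for a real span and exporter, and context propagation is left out. It only shows the error-handling shape: record the failure on the span, rethrow, and always end the span.

```typescript
// Hypothetical minimal span shape for illustration only; a real
// OpenTelemetry span has more fields and methods.
type SpanStatus = "UNSET" | "OK" | "ERROR";

interface SketchSpan {
  name: string;
  startTime: number;
  endTime?: number;
  status: SpanStatus;
  attributes: Record<string, string | number | boolean>;
  events: { name: string; time: number; message?: string }[];
}

// Finished spans are collected in memory instead of being exported.
const finishedSpans: SketchSpan[] = [];

// Runs `fn` inside a span. If `fn` throws or rejects, the span gets an
// "exception" event and ERROR status and the error is rethrown.
// The span is ended in every case.
async function withSpan<T>(
  name: string,
  fn: (span: SketchSpan) => Promise<T> | T,
): Promise<T> {
  const span: SketchSpan = {
    name,
    startTime: Date.now(),
    status: "UNSET",
    attributes: {},
    events: [],
  };
  try {
    return await fn(span);
  } catch (err) {
    span.status = "ERROR";
    span.events.push({
      name: "exception",
      time: Date.now(),
      message: err instanceof Error ? err.message : String(err),
    });
    throw err;
  } finally {
    span.endTime = Date.now();
    finishedSpans.push(span);
  }
}
```

A caller would wrap one unit of work, for example `await withSpan("orders.create", (span) => { span.attributes["order.items"] = 3; return saveOrder(); })`, where `saveOrder` is a placeholder for application code.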
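
The context-propagation point rests on the W3C `traceparent` header, which has the form `version-traceid-parentid-flags`. The sketch below parses and formats that header by hand to show what travels between services. In practice an OpenTelemetry propagator does this for you; the function and type names here are illustrative assumptions, and only version `00` is handled.

```typescript
// Illustrative type, not an OpenTelemetry API type.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters (the caller's span)
  sampled: boolean; // bit 0x01 of the trace-flags byte
}

// version "00", 16-byte trace ID, 8-byte parent span ID, 1-byte flags.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Builds the header value to attach to an outgoing request.
function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

// Parses an incoming header value; returns undefined when it is malformed
// or when the trace ID or span ID is all zeros (both are invalid).
function parseTraceparent(header: string): TraceContext | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const traceId = match[1] ?? "";
  const spanId = match[2] ?? "";
  const flags = match[3] ?? "00";
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}
```

A receiving service would parse the incoming header, start its own span with the same `traceId` and the parsed `spanId` as parent, then send a new header carrying its own span ID downstream.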
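
The difference between head and tail sampling can be sketched in a few lines. This is a simplified illustration, not the algorithm any OpenTelemetry sampler or collector actually uses: `shouldSampleHead` derives a deterministic decision from the trace ID alone, so it can run when the trace starts, while `shouldKeepTail` applies a hypothetical "keep errors and slow traces" policy that needs the finished spans. All names and the policy itself are assumptions.

```typescript
// Head sampling: decide at trace start using only the trace ID.
// The first 8 hex characters are read as a 32-bit bucket, so the same
// trace ID always gets the same decision for a given ratio.
function shouldSampleHead(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  const bucket = parseInt(traceId.slice(0, 8), 16); // 0 .. 2^32 - 1
  return bucket < ratio * 0x100000000;
}

// Illustrative summary of a finished span; not an OpenTelemetry type.
interface FinishedSpan {
  durationMs: number;
  error: boolean;
}

// Tail sampling: decide after the whole trace is available.
// Hypothetical policy: keep the trace if any span failed or was slow.
function shouldKeepTail(spans: FinishedSpan[], slowMs: number): boolean {
  return spans.some((s) => s.error || s.durationMs >= slowMs);
}
```

The trade-off is visible in the signatures: the head decision costs almost nothing but cannot see outcomes, while the tail decision can keep every failed trace at the price of buffering all spans until the trace completes.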
