OpenTelemetry Distributed Tracing Practical Guide: Building and Operating Instrumentation, Collection, and Analysis Pipelines


Introduction

In a microservices architecture, a single user request can traverse dozens of services before being fulfilled. Distributed tracing is essential for identifying which service introduced latency and where errors propagated along the call chain. OpenTelemetry (OTel) is a CNCF incubating project that provides a unified, vendor-neutral API and SDK for collecting traces, metrics, and logs as an observability standard.

This article covers everything needed for production operations, from OpenTelemetry's architecture and core concepts, to language-specific instrumentation in Python/Node.js/Go, Collector pipeline configuration, sampling strategies, backend comparison, and failure cases with checklists encountered in production environments.

OpenTelemetry Architecture Overview

Core Components

OpenTelemetry consists of the following components:

  • API: A vendor-neutral instrumentation interface. Used by library developers.
  • SDK: Concrete implementation of the API. Handles sampling, batching, and exporting.
  • Collector: A standalone process that receives, processes, and exports telemetry data.
  • Exporters: Modules that send collected data to backends such as Jaeger, Tempo, or Datadog.
  • Instrumentation Libraries: Framework-specific libraries supporting automatic instrumentation.

Trace Model

The core concepts of distributed tracing are as follows:

Concept     | Description
------------|------------------------------------------------------------
Trace       | The complete path of a single request; composed of multiple Spans
Span        | An individual unit of work within a trace
SpanContext | Context containing TraceID, SpanID, TraceFlags, and TraceState
TraceID     | 128-bit ID that uniquely identifies a trace
SpanID      | 64-bit ID that uniquely identifies a span
Parent Span | The span that created the current span
Baggage     | Key-value pairs propagated across the entire trace
Attributes  | Metadata (key-value pairs) attached to a span
Events      | Point-in-time events within a span (similar to logs)
Links       | Causal connections to other traces/spans

Manual Instrumentation

Python Instrumentation

# pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes

# Define resource
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "order-service",
    ResourceAttributes.SERVICE_VERSION: "1.2.0",
    ResourceAttributes.DEPLOYMENT_ENVIRONMENT: "production",
})

# Configure TracerProvider
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317")
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Create Tracer
tracer = trace.get_tracer("order-service", "1.2.0")


# Usage example: Order processing
def create_order(customer_id: str, items: list) -> dict:
    with tracer.start_as_current_span(
        "create_order",
        attributes={
            "customer.id": customer_id,
            "order.item_count": len(items),
        },
    ) as span:
        try:
            # Check inventory
            with tracer.start_as_current_span("check_inventory") as inventory_span:
                available = check_inventory(items)
                inventory_span.set_attribute("inventory.all_available", available)

            if not available:
                span.set_status(trace.StatusCode.ERROR, "Inventory not available")
                raise ValueError("Some items are out of stock")

            # Process payment
            with tracer.start_as_current_span("process_payment") as payment_span:
                payment_result = process_payment(customer_id, items)
                payment_span.set_attribute("payment.transaction_id", payment_result["tx_id"])
                payment_span.add_event("payment_completed", {
                    "amount": payment_result["amount"],
                    "currency": "KRW",
                })

            # Save order
            with tracer.start_as_current_span("save_order"):
                order = save_to_database(customer_id, items, payment_result)

            span.set_attribute("order.id", order["id"])
            return order

        except Exception as e:
            span.set_status(trace.StatusCode.ERROR, str(e))
            span.record_exception(e)
            raise

Node.js Instrumentation

// npm install @opentelemetry/api @opentelemetry/sdk-node
// npm install @opentelemetry/exporter-trace-otlp-grpc
// npm install @opentelemetry/semantic-conventions

const { NodeSDK } = require('@opentelemetry/sdk-node')
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc')
const { Resource } = require('@opentelemetry/resources')
const { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions')
const { trace, SpanStatusCode } = require('@opentelemetry/api')

// Initialize SDK
const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'user-service',
    [ATTR_SERVICE_VERSION]: '2.1.0',
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
})

sdk.start()

const tracer = trace.getTracer('user-service', '2.1.0')

// Usage example: User lookup
async function getUser(userId) {
  return tracer.startActiveSpan('getUser', async (span) => {
    try {
      span.setAttribute('user.id', userId)

      // Database query
      const user = await tracer.startActiveSpan('db.query', async (dbSpan) => {
        dbSpan.setAttribute('db.system', 'postgresql')
        dbSpan.setAttribute('db.statement', 'SELECT * FROM users WHERE id = ?')
        const result = await db.query('SELECT * FROM users WHERE id = $1', [userId])
        dbSpan.setAttribute('db.row_count', result.rows.length)
        dbSpan.end()
        return result.rows[0]
      })

      if (!user) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'User not found' })
        return null
      }

      // Cache update
      await tracer.startActiveSpan('cache.set', async (cacheSpan) => {
        cacheSpan.setAttribute('cache.system', 'redis')
        cacheSpan.setAttribute('cache.key', `user:${userId}`)
        await redis.set(`user:${userId}`, JSON.stringify(user), 'EX', 3600)
        cacheSpan.end()
      })

      span.setStatus({ code: SpanStatusCode.OK })
      return user
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message })
      span.recordException(error)
      throw error
    } finally {
      span.end()
    }
  })
}

Go Instrumentation

package main

import (
    "context"
    "fmt"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
    "go.opentelemetry.io/otel/trace"
)

func initTracer() (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(
        context.Background(),
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    res, err := resource.New(
        context.Background(),
        resource.WithAttributes(
            semconv.ServiceName("payment-service"),
            semconv.ServiceVersion("3.0.1"),
            semconv.DeploymentEnvironmentKey.String("production"),
        ),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
    )
    otel.SetTracerProvider(tp)
    return tp, nil
}

var tracer = otel.Tracer("payment-service")

func ProcessPayment(ctx context.Context, orderID string, amount float64) error {
    ctx, span := tracer.Start(ctx, "ProcessPayment",
        trace.WithAttributes(
            attribute.String("order.id", orderID),
            attribute.Float64("payment.amount", amount),
        ),
    )
    defer span.End()

    // Fraud detection check
    ctx, fraudSpan := tracer.Start(ctx, "fraud_detection")
    isFraud, err := checkFraud(ctx, orderID, amount)
    if err != nil {
        fraudSpan.SetStatus(codes.Error, err.Error())
        fraudSpan.RecordError(err)
        fraudSpan.End()
        return err
    }
    fraudSpan.SetAttributes(attribute.Bool("fraud.detected", isFraud))
    fraudSpan.End()

    if isFraud {
        span.SetStatus(codes.Error, "Fraud detected")
        return fmt.Errorf("fraud detected for order %s", orderID)
    }

    // Payment gateway call
    ctx, gwSpan := tracer.Start(ctx, "payment_gateway_call")
    txID, err := callPaymentGateway(ctx, amount)
    if err != nil {
        gwSpan.SetStatus(codes.Error, err.Error())
        gwSpan.RecordError(err)
        gwSpan.End()
        return err
    }
    gwSpan.SetAttributes(attribute.String("payment.transaction_id", txID))
    gwSpan.End()

    span.SetStatus(codes.Ok, "Payment processed successfully")
    return nil
}

Auto-Instrumentation

Python Auto-Instrumentation

# Install auto-instrumentation packages
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Run with environment variable configuration
OTEL_SERVICE_NAME=order-service \
OTEL_TRACES_EXPORTER=otlp \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
OTEL_PYTHON_LOG_CORRELATION=true \
opentelemetry-instrument python app.py

Node.js Auto-Instrumentation

// tracing.js - Load before app startup
const { NodeSDK } = require('@opentelemetry/sdk-node')
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node')
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc')

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': {
        ignoreIncomingPaths: ['/health', '/ready'],
      },
      '@opentelemetry/instrumentation-express': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-pg': {
        enabled: true,
        enhancedDatabaseReporting: true,
      },
    }),
  ],
})

sdk.start()

process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0))
})

# Run app with auto-instrumentation
node --require ./tracing.js app.js

OpenTelemetry Collector Pipeline

Collector Architecture

The Collector consists of three components:

  • Receivers: Entry points that receive telemetry data. Supports protocols like OTLP, Jaeger, Zipkin.
  • Processors: Transform, filter, and batch data. Add/remove attributes, apply sampling, etc.
  • Exporters: Send processed data to backends. Jaeger, Tempo, Datadog, etc.

Collector Configuration Example

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Can also receive Jaeger format
  jaeger:
    protocols:
      thrift_http:
        endpoint: 0.0.0.0:14268

processors:
  # Batch processing for network efficiency
  batch:
    send_batch_size: 1024
    send_batch_max_size: 2048
    timeout: 5s

  # Limit memory usage
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  # Add resource attributes
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
      - key: cluster
        value: ap-northeast-2-prod
        action: upsert

  # Remove sensitive attributes (cost reduction)
  attributes:
    actions:
      - key: http.request.header.authorization
        action: delete
      - key: db.statement
        action: hash # Hash SQL queries (security)

  # Tail-based sampling
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  # Send to Grafana Tempo
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  # Send to Jaeger
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Debug log output
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, resource, attributes, tail_sampling, batch]
      exporters: [otlp/tempo, debug]

  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888

Collector Deployment Modes

# Deploy Collector with Docker Compose
version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    command: ['--config=/etc/otel/config.yaml']
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml
    ports:
      - '4317:4317' # OTLP gRPC
      - '4318:4318' # OTLP HTTP
      - '8888:8888' # Prometheus metrics
      - '8889:8889' # Prometheus exporter
      - '13133:13133' # Health check
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1.0'

Sampling Strategies

Head-based vs Tail-based Sampling

Characteristic | Head-based                        | Tail-based
---------------|-----------------------------------|-------------------------------------------
Decision point | At trace start                    | After trace completion
Based on       | TraceID hash                      | Complete trace data
Advantages     | Low overhead, simple to implement | Guaranteed capture of error/latency traces
Disadvantages  | May miss important traces         | High memory usage, complex
Best for       | High traffic, cost-sensitive      | Debugging-focused, quality-first
Implementation | SDK (client-side)                 | Collector (server-side)
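Head-based sampling is deterministic on the TraceID: every service applying the same ratio reaches the same verdict for a given trace, which keeps traces intact without any coordination. A simplified model of the ratio check (real SDKs differ in exactly which bits they compare, so treat this as illustration only):

```python
# Simplified model of TraceID-ratio sampling: compare the low 64 bits of
# the trace ID against a bound derived from the sampling ratio.
def ratio_sampled(trace_id: int, ratio: float) -> bool:
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

tid = 0x4bf92f3577b34da6a3ce929d0e0e4736
print(ratio_sampled(tid, 1.0))  # True: ratio 1.0 samples everything
print(ratio_sampled(tid, 0.0))  # False: ratio 0.0 samples nothing
# The 10% verdict is identical no matter which service evaluates it
print(ratio_sampled(tid, 0.1) == ratio_sampled(tid, 0.1))
```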

Sampling Configuration Examples

# Head-based sampling in Python SDK
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    TraceIdRatioBased,
    ParentBased,
    ALWAYS_ON,
    ALWAYS_OFF,
)

# 10% probability sampling (follows parent span decision)
sampler = ParentBased(root=TraceIdRatioBased(0.1))

provider = TracerProvider(
    resource=resource,
    sampler=sampler,
)
# Tail-based sampling in Collector
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    expected_new_traces_per_sec: 1000
    policies:
      # Collect 100% of error traces
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]

      # Collect 100% of traces exceeding 1 second
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000

      # Collect 50% of traces from critical services
      # (an `and` policy combines the service match with probabilistic sampling)
      - name: critical-service
        type: and
        and:
          and_sub_policy:
            - name: service-match
              type: string_attribute
              string_attribute:
                key: service.name
                values: [payment-service, auth-service]
                enabled_regex_matching: false
            - name: sample-critical
              type: probabilistic
              probabilistic:
                sampling_percentage: 50

      # Collect 5% of remaining traces
      - name: default
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

Backend Comparison

Item                     | Jaeger                | Grafana Tempo             | Zipkin               | Datadog           | New Relic
-------------------------|-----------------------|---------------------------|----------------------|-------------------|------------------
License                  | Apache 2.0            | AGPLv3                    | Apache 2.0           | Commercial        | Commercial
Storage                  | Cassandra, ES, Memory | Object storage (S3 etc.)  | Cassandra, ES, MySQL | Proprietary       | Proprietary
Query language           | Built-in UI/API       | TraceQL                   | Built-in UI/API      | Built-in query    | NRQL
Cost                     | Free (infra costs)    | Free (infra costs)        | Free (infra costs)   | Per-trace billing | Per-trace billing
Scaling                  | Horizontal            | Excellent                 | Limited              | Automatic         | Automatic
OTel support             | Native                | Native                    | Native               | Native            | Native
Logs/metrics integration | Limited               | Grafana stack integration | Limited              | Full integration  | Full integration
Operational complexity   | Moderate              | Low (object storage)      | Low                  | None (SaaS)       | None (SaaS)

Context Propagation

W3C TraceContext

W3C TraceContext is the standard for propagating trace information through standard HTTP headers.

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^
           version     trace-id (32 hex)        parent-id (16 hex) flags
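The header layout above can be decoded in a few lines of Python (a hand-rolled sketch for illustration; production code should let the SDK's propagator do this):

```python
# Decode a W3C traceparent header: version-traceid-parentid-flags
def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    assert len(trace_id) == 32 and len(parent_id) == 16
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 = sampled flag
    }

tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp["sampled"])  # True
```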
# W3C TraceContext propagation setup in Python
from opentelemetry import propagate
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# W3C TraceContext (the SDK default)
propagate.set_global_textmap(
    CompositePropagator([
        TraceContextTextMapPropagator(),  # W3C TraceContext
    ])
)

# Inject context into HTTP request
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Automatically adds traceparent, tracestate headers
response = requests.get("http://downstream-service/api/data", headers=headers)

B3 Propagation (Zipkin Compatible)

# B3 propagation setup (when Zipkin compatibility is needed)
# pip install opentelemetry-propagator-b3
from opentelemetry import propagate
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

propagate.set_global_textmap(
    CompositePropagator([
        TraceContextTextMapPropagator(),  # W3C
        B3MultiFormat(),                  # B3 (Zipkin compatible)
    ])
)

Log and Metric Correlation

# Include trace ID in logs for correlation
import logging
from opentelemetry import trace

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        span = trace.get_current_span()
        if span.is_recording():
            ctx = span.get_span_context()
            record.trace_id = format(ctx.trace_id, '032x')
            record.span_id = format(ctx.span_id, '016x')
        else:
            record.trace_id = '0' * 32
            record.span_id = '0' * 16
        return True

# Log configuration
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(levelname)s [trace_id=%(trace_id)s span_id=%(span_id)s] %(message)s'
))
handler.addFilter(TraceIdFilter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)

eBPF-Based Zero-Code Instrumentation

Using eBPF (extended Berkeley Packet Filter), tracing data can be collected at the kernel level without modifying application code. Grafana Beyla is a representative tool for this approach.

# Deploy Grafana Beyla on Kubernetes
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: beyla
spec:
  selector:
    matchLabels:
      app: beyla
  template:
    metadata:
      labels:
        app: beyla
    spec:
      hostPID: true
      containers:
        - name: beyla
          image: grafana/beyla:latest
          securityContext:
            privileged: true
          env:
            - name: BEYLA_OPEN_PORT
              value: '80,443,8080,3000'
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: 'http://otel-collector:4317'
            - name: BEYLA_SERVICE_NAMESPACE
              value: 'production'
          volumeMounts:
            - name: sys-kernel
              mountPath: /sys/kernel
      volumes:
        - name: sys-kernel
          hostPath:
            path: /sys/kernel

Pros and cons of eBPF-based instrumentation:

  • Advantages: No code changes required, language-independent, low overhead
  • Disadvantages: Cannot add business context (user IDs, etc.), requires Linux kernel 4.18+, supports limited protocols only

Failure Cases and Recovery Procedures

Case 1: Context Loss Across Async Boundaries

# Problem: Trace context lost in background tasks
# Note: asyncio.create_task copies the current contextvars, so the OTel
# context usually survives within a single event loop. Loss typically
# occurs when work crosses thread pools, process pools, or message queues.
import asyncio
from opentelemetry import trace, context

async def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        # Risky: if send_notification hops to a thread/executor internally,
        # the active context does not follow it
        asyncio.create_task(send_notification(order_id))  # Context may be lost

# Fix: Explicitly pass context
async def process_order_fixed(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        ctx = context.get_current()
        asyncio.create_task(send_notification_with_context(order_id, ctx))

async def send_notification_with_context(order_id: str, ctx):
    token = context.attach(ctx)
    try:
        with tracer.start_as_current_span("send_notification"):
            # Notification sending logic
            pass
    finally:
        context.detach(token)

Case 2: Trace Loss Due to Sampling Misconfiguration

# Problem: Head-based sampling at 0.1% drops most error traces too
# SDK configuration
sampler = ParentBased(root=TraceIdRatioBased(0.001))  # 0.1% - too low

# Fix: Use ParentBased + tail_sampling combination
# Collect everything in the SDK
sampler = ParentBased(root=ALWAYS_ON)

# Use tail-based sampling in Collector to guarantee error/latency capture
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency
        type: latency
        latency:
          threshold_ms: 500
      - name: default
        type: probabilistic
        probabilistic:
          sampling_percentage: 1

Case 3: Collector Out of Memory (OOM)

# Problem: Collector terminates with OOM during traffic spikes

# Fix: Always add memory_limiter processor
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024 # Hard memory limit
    spike_limit_mib: 256 # Spike allowance (soft limit = limit_mib - spike_limit_mib)
    # Alternatively, use limit_percentage/spike_limit_percentage instead of
    # the _mib settings when container memory is the reference point

  batch:
    send_batch_size: 512 # Reduce batch size
    timeout: 2s

service:
  pipelines:
    traces:
      # Place memory_limiter at the front of the processor chain, batch last
      processors: [memory_limiter, tail_sampling, batch]

Case 4: Propagation Header Mismatch Between Services

When Service A uses W3C TraceContext and Service B uses B3 format, the context breaks.

The solution is to either use the same propagation format across all services, or support multiple formats simultaneously using CompositePropagator.

Production Checklist

Instrumentation

  • Verify OpenTelemetry SDK is installed in all services
  • Verify service name, version, and environment are included in resource attributes
  • Verify custom spans are added for key business transactions
  • Verify sensitive information (passwords, tokens, etc.) is not included in span attributes
  • Verify context is correctly propagated across async boundaries

Collector

  • Verify memory_limiter processor is placed first in the pipeline
  • Verify batch processor size and timeout are appropriate
  • Verify health check endpoint is configured
  • Verify the Collector's own metrics are being monitored
  • Verify security-sensitive attributes are removed/hashed by the attributes processor

Sampling

  • Verify error traces are collected at 100%
  • Verify latency traces (SLO violations) are collected
  • Verify sampling rate is within cost budget
  • Verify the combination of head-based and tail-based sampling is appropriate

Operations

  • Verify trace-log-metric correlation is configured
  • Verify service maps and dependency graphs display in dashboards
  • Verify alert rules are linked to trace-based SLOs
  • Verify trace data retention period is configured
  • Verify context propagation format is consistent across all services
