OpenTelemetry Collector Production Guide: Pipeline Architecture, Custom Processors, and Scaling Strategies


Introduction

The OpenTelemetry Collector is the vendor-agnostic backbone of modern observability infrastructure. It receives, processes, and exports telemetry data (traces, metrics, and logs) through a configurable pipeline model. While getting a basic Collector running is straightforward, operating it at production scale across hundreds of microservices and millions of spans per second demands careful architecture, tuned processors, and deliberate scaling strategies.

This guide covers the full lifecycle of production Collector deployment: pipeline architecture internals, receiver/processor/exporter configuration, custom processor development in Go, agent vs gateway deployment patterns, memory limiter and batch processor tuning, tail sampling strategies, failure recovery procedures, and a detailed comparison against alternative telemetry pipelines.

Telemetry Pipeline Comparison

Before diving into the Collector itself, it is worth understanding where it stands relative to other popular telemetry tools.

Feature | OTel Collector | Fluentd | Vector | Telegraf
------- | -------------- | ------- | ------ | --------
Language | Go | Ruby/C | Rust | Go
Signal types | Traces, Metrics, Logs | Logs only | Logs, Metrics, Traces | Metrics primarily
Protocol support | OTLP, Jaeger, Zipkin, Prometheus, and more | Syslog, HTTP, TCP, file tail | Syslog, HTTP, file, Kafka | SNMP, StatsD, Prometheus, and more
Plugin ecosystem | 100+ receivers/exporters in contrib | 800+ community plugins | 100+ sources/sinks | 300+ input/output plugins
Memory (10K events/s) | 100-200 MB | 200-400 MB | 50-100 MB | 80-150 MB
Processing model | Pipeline (receiver-processor-exporter) | Filter/match/output chain | Transform/sink DAG | Input/processor/aggregator/output
CNCF status | Incubating (OpenTelemetry) | Graduated | Not CNCF | Not CNCF
Best for | Full-stack observability | Log aggregation | High-perf log routing | Metrics collection
Vendor lock-in | None (vendor-agnostic by design) | Low | Low | Low (InfluxDB affinity)

The Collector wins when you need unified telemetry collection across all three signal types with a single deployment. Fluentd remains strong for pure log pipelines with complex parsing. Vector excels at raw throughput with minimal resource usage. Telegraf is ideal for metrics-heavy workloads, especially with InfluxDB.

Pipeline Architecture Deep Dive

A Collector pipeline defines the path telemetry data follows from ingestion to export. Each pipeline operates on one signal type: traces, metrics, or logs.

Core Components

Receivers listen on network ports or actively scrape endpoints to ingest telemetry data. A single receiver instance can feed multiple pipelines through an internal fan-out consumer.

Processors transform data in sequence. They can filter, enrich, batch, sample, or rate-limit telemetry. Processor ordering matters: the memory limiter should always come first.

Exporters send processed data to backends. Multiple pipelines can share the same exporter instance. Exporters handle retry logic and queuing.

Connectors act as both an exporter in one pipeline and a receiver in another, enabling data to flow between pipelines. A common use case is the spanmetrics connector, which generates RED metrics (rate, errors, duration) from trace spans.

Pipeline Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1800
    spike_limit_mib: 400
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert
      - key: collector.version
        value: '0.98.0'
        action: insert
  resource:
    attributes:
      - key: service.namespace
        value: platform
        action: upsert

exporters:
  otlp/tempo:
    endpoint: tempo-distributor.observability:4317
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
  prometheusremotewrite:
    endpoint: http://mimir-distributor.observability:9009/api/v1/push
    resource_to_telemetry_conversion:
      enabled: true
  otlphttp/loki:
    endpoint: http://loki-distributor.observability:3100/otlp

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    dimensions:
      - name: http.method
      - name: http.status_code
    metrics_flush_interval: 15s

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo, spanmetrics]
    metrics:
      receivers: [otlp, prometheus, spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlphttp/loki]

This configuration demonstrates a complete three-signal pipeline with the spanmetrics connector bridging traces to metrics.

Memory Limiter and Batch Processor

These two processors are non-negotiable in production. Every pipeline should include both.

Memory Limiter

The memory limiter prevents OOM kills by monitoring the Collector process's memory usage on every check_interval tick. The spike_limit_mib setting is headroom for sudden bursts, so the effective soft limit is limit_mib - spike_limit_mib: above the soft limit the Collector starts refusing incoming data, and above the hard limit (limit_mib) it also forces garbage collection.

Best practices:

  • Set limit_mib to 70-80% of your container memory limit. For a 2Gi pod, use 1400-1600 MiB.
  • Set spike_limit_mib to 20-25% of limit_mib.
  • Set check_interval to 1s. Shorter intervals increase CPU usage; longer intervals risk overshooting.
  • Always place the memory limiter first in the processor chain.
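The arithmetic behind these ratios can be sketched as a small helper. The 75% and 20% figures below are illustrative midpoints of the recommended ranges, not Collector defaults:

```go
package main

import "fmt"

// memoryLimiterSettings derives memory_limiter values from a container
// memory limit in MiB, using the rules of thumb above:
// limit_mib ~75% of the container limit, spike_limit_mib ~20% of limit_mib.
func memoryLimiterSettings(containerMiB int) (limitMiB, spikeMiB int) {
	limitMiB = containerMiB * 75 / 100
	spikeMiB = limitMiB * 20 / 100
	return limitMiB, spikeMiB
}

func main() {
	// A 2Gi (2048 MiB) pod lands inside the recommended 1400-1600 MiB window.
	limit, spike := memoryLimiterSettings(2048)
	fmt.Printf("limit_mib: %d, spike_limit_mib: %d\n", limit, spike)
	// The effective soft limit is limit_mib - spike_limit_mib.
	fmt.Printf("soft limit: %d MiB\n", limit-spike)
}
```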

Batch Processor

The batch processor groups telemetry into batches before export, reducing per-request overhead and improving compression ratios.

processors:
  batch:
    # Send a batch after this many items accumulate
    send_batch_size: 1024
    # Hard cap to prevent oversized batches
    send_batch_max_size: 2048
    # Send whatever is buffered after this duration
    timeout: 1s

Guidelines:

  • send_batch_size of 512-2048 works well for most workloads.
  • timeout of 1-5s balances latency vs throughput. Lower for real-time alerting pipelines.
  • Place the batch processor after the memory limiter and any filtering/sampling processors. Batching data before dropping it wastes CPU.
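To see why batching matters, here is a rough calculation of the export request rate. The 50,000 spans/sec figure is an assumed workload, not a benchmark:

```go
package main

import "fmt"

// exportRequestsPerSec shows the effect of batching: without it, every item
// is its own export request; with a batch size of N, the request rate drops
// roughly N-fold (the timeout flushes partial batches, so this is a lower bound).
func exportRequestsPerSec(itemsPerSec, batchSize int) float64 {
	return float64(itemsPerSec) / float64(batchSize)
}

func main() {
	// 50,000 spans/sec with send_batch_size: 1024 (illustrative rate).
	fmt.Printf("%.1f requests/sec\n", exportRequestsPerSec(50000, 1024))
}
```

Compared to 50,000 unbatched requests per second, the per-request overhead for serialization, TLS, and backend ingestion drops by three orders of magnitude.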

Tail Sampling Strategies

Tail sampling makes sampling decisions after collecting all spans of a trace. This enables intelligent decisions: keep all error traces, keep slow traces, sample healthy traces at a lower rate. The tail sampling processor lives in the opentelemetry-collector-contrib distribution.

Critical Constraint

All spans belonging to the same trace must arrive at the same Collector instance. In a gateway deployment, use a load balancing exporter with trace_id routing on the agent tier.

# Agent-tier config: route by trace ID to gateway
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-gateway-headless.observability
        port: 4317

# Gateway-tier config: tail sampling
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    policies:
      # Policy 1: Always keep error traces
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Policy 2: Always keep slow traces (over 2 seconds)
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 2000
      # Policy 3: Keep traces touching critical services
      - name: critical-services
        type: string_attribute
        string_attribute:
          key: service.name
          values: [payment-service, auth-service, order-service]
      # Policy 4: Probabilistic sampling for everything else
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
      # Composite policy: combine the above
      - name: composite-policy
        type: composite
        composite:
          max_total_spans_per_second: 5000
          policy_order: [errors-policy, latency-policy, critical-services, probabilistic-policy]
          rate_allocation:
            - policy: errors-policy
              percent: 40
            - policy: latency-policy
              percent: 25
            - policy: critical-services
              percent: 25
            - policy: probabilistic-policy
              percent: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]

Tail Sampling Memory Sizing

The decision_wait parameter determines how long the Collector holds spans in memory before making a sampling decision. With num_traces: 100000 and an average trace size of 5 KB, you need approximately 500 MB just for the trace buffer. Size your gateway pods accordingly: if decision_wait is 10s and you receive 2,000 new traces/sec, you are holding 20,000 traces concurrently at any moment. Multiply by average spans per trace and span size to estimate memory.
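This estimate can be written out as a quick calculation. The 10 spans per trace below is an assumed workload parameter, not a Collector default:

```go
package main

import "fmt"

// tailSamplingBufferMB estimates the in-memory trace buffer for tail sampling:
// traces held concurrently = decision_wait * new traces/sec, and
// memory = held traces * avg spans per trace * avg span size.
func tailSamplingBufferMB(decisionWaitSec, newTracesPerSec, spansPerTrace int, spanKB float64) float64 {
	heldTraces := decisionWaitSec * newTracesPerSec
	return float64(heldTraces*spansPerTrace) * spanKB / 1024
}

func main() {
	// The gateway example: 10s decision_wait, 2000 new traces/sec,
	// 10 spans per trace averaging 5 KB each (illustrative numbers).
	mb := tailSamplingBufferMB(10, 2000, 10, 5)
	fmt.Printf("estimated trace buffer: %.0f MB\n", mb)
}
```

With these inputs the buffer alone is roughly 1 GB, before accounting for batching, queues, and Go runtime overhead, which is why the gateway pods below request 4Gi.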

Custom Processor Development

When built-in processors do not meet your needs, you can write a custom processor in Go using the Collector SDK. Below is a complete example of a processor that adds deployment metadata to all telemetry.

Project Structure

# Create the processor module
mkdir -p processor/deploymetadataprocessor
cd processor/deploymetadataprocessor

# Initialize Go module
go mod init github.com/myorg/otelcol-custom/processor/deploymetadataprocessor
go get go.opentelemetry.io/collector/processor
go get go.opentelemetry.io/collector/pdata

Configuration

// config.go
package deploymetadataprocessor

type Config struct {
    ClusterName    string `mapstructure:"cluster_name"`
    Region         string `mapstructure:"region"`
    DeploymentEnv  string `mapstructure:"deployment_env"`
    AddBuildInfo   bool   `mapstructure:"add_build_info"`
}

func createDefaultConfig() *Config {
    return &Config{
        ClusterName:   "default",
        Region:        "us-east-1",
        DeploymentEnv: "production",
        AddBuildInfo:  true,
    }
}

Factory

// factory.go
package deploymetadataprocessor

import (
    "context"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/consumer"
    "go.opentelemetry.io/collector/processor"
)

const (
    typeStr   = "deploy_metadata"
    stability = component.StabilityLevelDevelopment
)

func NewFactory() processor.Factory {
    return processor.NewFactory(
        component.MustNewType(typeStr),
        func() component.Config {
            return createDefaultConfig()
        },
        processor.WithTraces(createTracesProcessor, stability),
        processor.WithMetrics(createMetricsProcessor, stability),
        processor.WithLogs(createLogsProcessor, stability),
    )
}

func createTracesProcessor(
    ctx context.Context,
    set processor.CreateSettings,
    cfg component.Config,
    next consumer.Traces,
) (processor.Traces, error) {
    pCfg := cfg.(*Config)
    return &deployMetadataTracesProcessor{
        config: pCfg,
        next:   next,
        logger: set.Logger,
    }, nil
}

// createMetricsProcessor and createLogsProcessor follow the same pattern

Processor Implementation

// processor_traces.go
package deploymetadataprocessor

import (
    "context"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/consumer"
    "go.opentelemetry.io/collector/pdata/ptrace"
    "go.uber.org/zap"
)

type deployMetadataTracesProcessor struct {
    config *Config
    next   consumer.Traces
    logger *zap.Logger
}

func (p *deployMetadataTracesProcessor) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
    rss := td.ResourceSpans()
    for i := 0; i < rss.Len(); i++ {
        rs := rss.At(i)
        attrs := rs.Resource().Attributes()
        attrs.PutStr("deploy.cluster", p.config.ClusterName)
        attrs.PutStr("deploy.region", p.config.Region)
        attrs.PutStr("deploy.env", p.config.DeploymentEnv)
        if p.config.AddBuildInfo {
            attrs.PutStr("deploy.collector_build", "custom-v1.2.0")
        }
    }
    return p.next.ConsumeTraces(ctx, td)
}

func (p *deployMetadataTracesProcessor) Capabilities() consumer.Capabilities {
    return consumer.Capabilities{MutatesData: true}
}

func (p *deployMetadataTracesProcessor) Start(_ context.Context, _ component.Host) error {
    p.logger.Info("deploy_metadata processor started",
        zap.String("cluster", p.config.ClusterName),
        zap.String("region", p.config.Region))
    return nil
}

func (p *deployMetadataTracesProcessor) Shutdown(_ context.Context) error {
    return nil
}

Building with OCB

Use the OpenTelemetry Collector Builder (ocb) to compile a custom Collector binary that includes your processor.

# builder-config.yaml
dist:
  name: custom-otelcol
  description: Custom OTel Collector with deploy metadata processor
  output_path: ./dist
  version: 1.0.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.98.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.98.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.98.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.98.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.98.0
  - gomod: github.com/myorg/otelcol-custom/processor/deploymetadataprocessor v0.0.1

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.98.0

# Build the custom collector
ocb --config builder-config.yaml

# Run with your config
./dist/custom-otelcol --config otel-collector-config.yaml

Agent vs Gateway Deployment

Production environments typically deploy the Collector in two tiers: agents on every node and gateways as centralized processors.

Agent Mode (DaemonSet)

Agents run as Kubernetes DaemonSets with one pod per node. Their job is lightweight: receive local telemetry, add node-level metadata, and forward to gateways.

# agent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-agent
  template:
    metadata:
      labels:
        app: otel-agent
    spec:
      containers:
        - name: otel-agent
          image: otel/opentelemetry-collector-contrib:0.98.0
          args: ['--config=/etc/otel/config.yaml']
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/otel
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-agent-config
        - name: varlog
          hostPath:
            path: /var/log

Agent configuration should be minimal: OTLP receiver, memory limiter, resource detection and k8sattributes processors (to add node and pod metadata), and a load-balancing exporter pointed at the gateway.

Gateway Mode (Deployment)

Gateways run as a Kubernetes Deployment with a Horizontal Pod Autoscaler. They handle heavy processing: tail sampling, span-to-metrics conversion, attribute enrichment, and export to backends.

# gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-gateway
  template:
    metadata:
      labels:
        app: otel-gateway
    spec:
      containers:
        - name: otel-gateway
          image: custom-otelcol:1.0.0
          args: ['--config=/etc/otel/config.yaml']
          resources:
            requests:
              cpu: '1'
              memory: 4Gi
            limits:
              cpu: '2'
              memory: 6Gi
          ports:
            - containerPort: 4317
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway-hpa
  namespace: observability
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70

Scaling Decision Framework

  • Under 1,000 spans/sec: A single Collector in standalone mode is sufficient.
  • 1,000-10,000 spans/sec: Agent mode on each node with a small gateway cluster (2-3 replicas).
  • 10,000-100,000 spans/sec: Agent-gateway pattern with HPA, load-balancing exporter for trace-ID affinity, and tail sampling on the gateway.
  • Over 100,000 spans/sec: Multi-tier gateways with sharding. Consider multiple gateway clusters partitioned by signal type or tenant. Implement backpressure propagation with the memory limiter.
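These thresholds can be turned into a rough capacity estimate. A sketch, assuming roughly 10,000 spans/sec per gateway replica, which is workload-dependent and should be measured with your own processing chain:

```go
package main

import "fmt"

// gatewayReplicas estimates the gateway replica count for a target span rate.
// perReplicaSpansPerSec is workload-dependent; ~10,000 spans/sec is an
// illustrative starting point for a tail-sampling gateway, not a benchmark.
func gatewayReplicas(spansPerSec, perReplicaSpansPerSec int) int {
	replicas := (spansPerSec + perReplicaSpansPerSec - 1) / perReplicaSpansPerSec // ceiling division
	if replicas < 3 {
		replicas = 3 // HA floor, matching the HPA minReplicas above
	}
	return replicas + 1 // one spare replica as failover headroom
}

func main() {
	fmt.Println(gatewayReplicas(50000, 10000)) // prints 6
}
```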

Failure Cases and Recovery Procedures

Case 1: Gateway OOM Kill

Symptom: Gateway pods repeatedly restart with OOMKilled status.

Root cause: Tail sampling decision_wait is too long for the traffic volume, or num_traces is undersized.

Recovery:

# 1. Check current memory usage
kubectl top pods -n observability -l app=otel-gateway

# 2. Inspect OOM events
kubectl describe pod otel-gateway-xxx -n observability | grep -A5 "Last State"

# 3. Immediate relief: reduce decision_wait and increase replicas
kubectl scale deployment otel-gateway -n observability --replicas=6

# 4. Long-term: adjust tail sampling config
# Reduce decision_wait from 10s to 5s
# Increase num_traces from 100000 to 200000
# Increase pod memory limit to 8Gi and memory_limiter limit_mib to 5600

Case 2: Data Loss During Export Failure

Symptom: A backend (Tempo or Mimir) becomes temporarily unavailable, and telemetry is lost once the in-memory sending queue fills.

Recovery: Configure persistent queuing with the file storage extension.

extensions:
  file_storage:
    directory: /var/otel/queue
    timeout: 10s
    compaction:
      on_start: true
      directory: /var/otel/queue/compaction

exporters:
  otlp/tempo:
    endpoint: tempo-distributor:4317
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
      storage: file_storage
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 60s
      max_elapsed_time: 600s

service:
  extensions: [file_storage]

This configuration persists the export queue to disk: queued data survives Collector restarts and is drained automatically once the backend recovers.
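A useful back-of-the-envelope check is how long the queue can absorb an outage. The ingest rate below is an assumed workload figure; queue_size counts batches, not individual items:

```go
package main

import "fmt"

// queueSurvivalSec estimates how long a sending queue can absorb a backend
// outage: capacity in items is queue_size (batches) * batch size, divided
// by the ingest rate.
func queueSurvivalSec(queueSize, batchSize, itemsPerSec int) float64 {
	return float64(queueSize*batchSize) / float64(itemsPerSec)
}

func main() {
	// queue_size: 10000 with send_batch_size: 1024 at 50k spans/sec (illustrative).
	fmt.Printf("%.0f seconds of buffer\n", queueSurvivalSec(10000, 1024, 50000))
}
```

About 200 seconds of outage tolerance at that rate; size queue_size (and the file_storage volume) to cover your longest expected backend downtime.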

Case 3: High Cardinality Explosion

Symptom: Metrics pipeline memory grows unbounded. Prometheus remote write rejects data.

Recovery: Use the filter processor to drop high-cardinality attributes before export.

processors:
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - '.*_temp_.*'
        resource_attributes:
          - key: service.instance.id
            value: '.*debug.*'
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "http.url")
          - delete_key(attributes, "user.id")

Case 4: Agent Cannot Reach Gateway

Symptom: Agents log connection refused errors. Telemetry is dropped silently.

Recovery: Enable the sending queue on agents with bounded retry. Add health check extension for readiness probes.

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
    path: /health

exporters:
  otlp:
    endpoint: otel-gateway.observability:4317
    sending_queue:
      enabled: true
      queue_size: 2000
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 120s

service:
  extensions: [health_check]

Production Readiness Checklist

  • Memory limiter is the first processor in every pipeline
  • Batch processor is the last processor before exporters
  • Sending queues are enabled on all exporters with retry configured
  • Health check extension is deployed for Kubernetes liveness/readiness probes
  • Collector self-observability metrics (port 8888) are scraped by Prometheus
  • Resource limits are set in Kubernetes manifests (requests and limits)
  • HPA is configured for gateway deployments
  • Load-balancing exporter with trace-ID routing is used when tail sampling is active
  • File storage extension is configured for persistent queuing in critical pipelines
  • Dashboard monitoring Collector queue depth, dropped spans, and export latency
  • Alerting on otelcol_exporter_send_failed_spans_total and otelcol_processor_refused_spans_total
