- Introduction
- Telemetry Pipeline Comparison
- Pipeline Architecture Deep Dive
- Memory Limiter and Batch Processor
- Tail Sampling Strategies
- Custom Processor Development
- Agent vs Gateway Deployment
- Failure Cases and Recovery Procedures
- Production Readiness Checklist
- References

Introduction
The OpenTelemetry Collector is the vendor-agnostic backbone of modern observability infrastructure. It receives, processes, and exports telemetry data (traces, metrics, and logs) through a configurable pipeline model. While getting a basic Collector running is straightforward, operating it at production scale across hundreds of microservices and millions of spans per second demands careful architecture, tuned processors, and deliberate scaling strategies.
This guide covers the full lifecycle of production Collector deployment: pipeline architecture internals, receiver/processor/exporter configuration, custom processor development in Go, agent vs gateway deployment patterns, memory limiter and batch processor tuning, tail sampling strategies, failure recovery procedures, and a detailed comparison against alternative telemetry pipelines.
Telemetry Pipeline Comparison
Before diving into the Collector itself, it is worth understanding where it stands relative to other popular telemetry tools.
| Feature | OTel Collector | Fluentd | Vector | Telegraf |
|---|---|---|---|---|
| Language | Go | Ruby/C | Rust | Go |
| Signal types | Traces, Metrics, Logs | Logs only | Logs, Metrics, Traces | Metrics primarily |
| Protocol support | OTLP, Jaeger, Zipkin, Prometheus, and more | Syslog, HTTP, TCP, file tail | Syslog, HTTP, file, Kafka | SNMP, StatsD, Prometheus, and more |
| Plugin ecosystem | 100+ receivers/exporters in contrib | 800+ community plugins | 100+ sources/sinks | 300+ input/output plugins |
| Memory (10K events/s) | 100-200 MB | 200-400 MB | 50-100 MB | 80-150 MB |
| Processing model | Pipeline (receiver-processor-exporter) | Filter/match/output chain | Transform/sink DAG | Input/processor/aggregator/output |
| CNCF status | Graduated (OpenTelemetry) | Graduated | Not CNCF (Datadog-maintained) | Not CNCF |
| Best for | Full-stack observability | Log aggregation | High-perf log routing | Metrics collection |
| Vendor lock-in | None (vendor-agnostic by design) | Low | Low | Low (InfluxDB affinity) |
The Collector wins when you need unified telemetry collection across all three signal types with a single deployment. Fluentd remains strong for pure log pipelines with complex parsing. Vector excels at raw throughput with minimal resource usage. Telegraf is ideal for metrics-heavy workloads, especially with InfluxDB.
Pipeline Architecture Deep Dive
A Collector pipeline defines the path telemetry data follows from ingestion to export. Each pipeline operates on one signal type: traces, metrics, or logs.
Core Components
Receivers listen on network ports or actively scrape endpoints to ingest telemetry data. A single receiver instance can feed multiple pipelines through an internal fan-out consumer.
Processors transform data in sequence. They can filter, enrich, batch, sample, or rate-limit telemetry. Processor ordering matters: the memory limiter should always come first.
Exporters send processed data to backends. Multiple pipelines can share the same exporter instance. Exporters handle retry logic and queuing.
Connectors act as both exporter and receiver, enabling data to flow between pipelines. A common use case is the spanmetrics connector, which generates RED (rate, errors, duration) metrics from trace spans.
Pipeline Configuration
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1800
    spike_limit_mib: 400
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  attributes:
    actions:
      - key: environment
        value: production
        action: upsert
      - key: collector.version
        value: '0.98.0'
        action: insert
  resource:
    attributes:
      - key: service.namespace
        value: platform
        action: upsert

exporters:
  otlp/tempo:
    endpoint: tempo-distributor.observability:4317
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
  prometheusremotewrite:
    endpoint: http://mimir-distributor.observability:9009/api/v1/push
    resource_to_telemetry_conversion:
      enabled: true
  otlp/loki:
    endpoint: loki-distributor.observability:3100
    tls:
      insecure: true

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    dimensions:
      - name: http.method
      - name: http.status_code
    metrics_flush_interval: 15s

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo, spanmetrics]
    metrics:
      receivers: [otlp, prometheus, spanmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp, filelog]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlp/loki]
```
This configuration demonstrates a complete three-signal pipeline with the spanmetrics connector bridging traces to metrics.
Memory Limiter and Batch Processor
These two processors are non-negotiable in production. Every pipeline should include both.
Memory Limiter
The memory limiter prevents OOM kills by monitoring the Collector process heap usage. When memory exceeds limit_mib, the Collector starts refusing incoming data and triggers garbage collection. The spike_limit_mib defines the maximum expected spike, so the effective soft limit becomes limit_mib - spike_limit_mib.
Best practices:
- Set `limit_mib` to 70-80% of your container memory limit. For a 2Gi pod, use 1400-1600 MiB.
- Set `spike_limit_mib` to 20-25% of `limit_mib`.
- Set `check_interval` to 1s. Shorter intervals increase CPU usage; longer intervals risk overshooting.
- Always place the memory limiter first in the processor chain.
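These rules of thumb can be sanity-checked with a few lines of Go. The helper name `memoryLimiterSettings` and the 75%/25% midpoints are illustrative choices, not Collector APIs:

```go
package main

import "fmt"

// memoryLimiterSettings derives limit_mib and spike_limit_mib from a
// container memory limit, using the midpoints of the 70-80% and 20-25%
// rules of thumb. The soft limit is where the Collector starts refusing data.
func memoryLimiterSettings(containerMiB int) (limitMiB, spikeMiB, softLimitMiB int) {
	limitMiB = containerMiB * 75 / 100
	spikeMiB = limitMiB * 25 / 100
	softLimitMiB = limitMiB - spikeMiB
	return
}

func main() {
	limit, spike, soft := memoryLimiterSettings(2048) // a 2Gi pod
	fmt.Printf("limit_mib: %d, spike_limit_mib: %d, effective soft limit: %d MiB\n",
		limit, spike, soft)
}
```

Note how much headroom the spike limit consumes: a 2Gi pod ends up refusing data well before the hard limit, which is the intended safety margin.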
Batch Processor
The batch processor groups telemetry into batches before export, reducing per-request overhead and improving compression ratios.
```yaml
processors:
  batch:
    # Send a batch after this many items accumulate
    send_batch_size: 1024
    # Hard cap to prevent oversized batches
    send_batch_max_size: 2048
    # Send whatever is buffered after this duration
    timeout: 1s
```
Guidelines:
- `send_batch_size` of 512-2048 works well for most workloads.
- `timeout` of 1-5s balances latency against throughput. Use the lower end for real-time alerting pipelines.
- Place the batch processor after the memory limiter and any filtering/sampling processors. Batching data before dropping it wastes CPU.
Tail Sampling Strategies
Tail sampling makes sampling decisions after collecting all spans of a trace. This enables intelligent decisions: keep all error traces, keep slow traces, sample healthy traces at a lower rate. The tail sampling processor lives in the opentelemetry-collector-contrib distribution.
Critical Constraint
All spans belonging to the same trace must arrive at the same Collector instance. In a gateway deployment, use a load balancing exporter with trace_id routing on the agent tier.
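To see why trace-ID routing gives this affinity, consider a minimal Go sketch. This is a plain hash-modulo illustration; the actual loadbalancing exporter uses consistent hashing over DNS-resolved endpoints, and the backend names here are hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// backendForTrace picks a gateway for a trace ID by hashing it, so every
// span carrying the same trace ID lands on the same instance regardless
// of which agent forwarded it.
func backendForTrace(traceID string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return backends[int(h.Sum32())%len(backends)]
}

func main() {
	gateways := []string{"otel-gateway-0", "otel-gateway-1", "otel-gateway-2"}
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	// Two spans of the same trace always route to the same gateway.
	fmt.Println(backendForTrace(id, gateways) == backendForTrace(id, gateways))
}
```

The flip side of this scheme is that scaling the gateway tier reshuffles trace assignments, which is why consistent hashing (rather than plain modulo) is used in practice.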
```yaml
# Agent-tier config: route by trace ID to gateway
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-gateway-headless.observability
        port: 4317
```

```yaml
# Gateway-tier config: tail sampling
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 2000
    policies:
      # Policy 1: Always keep error traces
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Policy 2: Always keep slow traces (over 2 seconds)
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 2000
      # Policy 3: Keep traces touching critical services
      - name: critical-services
        type: string_attribute
        string_attribute:
          key: service.name
          values: [payment-service, auth-service, order-service]
      # Policy 4: Probabilistic sampling for everything else
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
      # Composite policy: combine the above
      - name: composite-policy
        type: composite
        composite:
          max_total_spans_per_second: 5000
          policy_order: [errors-policy, latency-policy, critical-services, probabilistic-policy]
          rate_allocation:
            - policy: errors-policy
              percent: 40
            - policy: latency-policy
              percent: 25
            - policy: critical-services
              percent: 25
            - policy: probabilistic-policy
              percent: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]
```
Tail Sampling Memory Sizing
The `decision_wait` parameter determines how long the Collector holds spans in memory before making a sampling decision. With `num_traces: 100000` and an average span size of 5 KB, you need approximately 500 MB just for the trace buffer. Size your gateway pods accordingly: if `decision_wait` is 10s and you receive 2,000 new traces/sec, you are holding 20,000 traces concurrently. Multiply by the average spans per trace and span size to estimate memory.
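The arithmetic above can be packaged as a small estimator. `tailSamplingBufferMiB` is an illustrative helper, and the 5 spans per trace used below is an assumed average:

```go
package main

import "fmt"

// tailSamplingBufferMiB estimates the span buffer held in memory while
// traces await a sampling decision: traces held concurrently times
// spans per trace times bytes per span.
func tailSamplingBufferMiB(newTracesPerSec, decisionWaitSec, spansPerTrace, spanBytes int) int {
	tracesHeld := newTracesPerSec * decisionWaitSec
	return tracesHeld * spansPerTrace * spanBytes / (1024 * 1024)
}

func main() {
	// 2000 new traces/sec held for 10s, ~5 spans/trace at ~5 KiB each.
	fmt.Printf("~%d MiB trace buffer\n", tailSamplingBufferMiB(2000, 10, 5, 5*1024))
}
```

With these assumptions the estimate lands near the ~500 MB figure above; remember this is only the trace buffer, on top of the Collector's baseline and batching memory.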
Custom Processor Development
When built-in processors do not meet your needs, you can write a custom processor in Go using the Collector SDK. Below is a complete example of a processor that adds deployment metadata to all telemetry.
Project Structure
```bash
# Create the processor module
mkdir -p processor/deploymetadataprocessor
cd processor/deploymetadataprocessor

# Initialize the Go module
go mod init github.com/myorg/otelcol-custom/processor/deploymetadataprocessor
go get go.opentelemetry.io/collector/processor
go get go.opentelemetry.io/collector/pdata
```
Configuration
```go
// config.go
package deploymetadataprocessor

type Config struct {
	ClusterName   string `mapstructure:"cluster_name"`
	Region        string `mapstructure:"region"`
	DeploymentEnv string `mapstructure:"deployment_env"`
	AddBuildInfo  bool   `mapstructure:"add_build_info"`
}

func createDefaultConfig() *Config {
	return &Config{
		ClusterName:   "default",
		Region:        "us-east-1",
		DeploymentEnv: "production",
		AddBuildInfo:  true,
	}
}
```
Factory
```go
// factory.go
package deploymetadataprocessor

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/processor"
)

const (
	typeStr   = "deploy_metadata"
	stability = component.StabilityLevelDevelopment
)

func NewFactory() processor.Factory {
	return processor.NewFactory(
		component.MustNewType(typeStr),
		func() component.Config {
			return createDefaultConfig()
		},
		processor.WithTraces(createTracesProcessor, stability),
		processor.WithMetrics(createMetricsProcessor, stability),
		processor.WithLogs(createLogsProcessor, stability),
	)
}

func createTracesProcessor(
	ctx context.Context,
	set processor.Settings,
	cfg component.Config,
	next consumer.Traces,
) (processor.Traces, error) {
	pCfg := cfg.(*Config)
	return &deployMetadataTracesProcessor{
		config: pCfg,
		next:   next,
		logger: set.Logger,
	}, nil
}

// createMetricsProcessor and createLogsProcessor follow the same pattern
```
Processor Implementation
```go
// processor_traces.go
package deploymetadataprocessor

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
	"go.uber.org/zap"
)

type deployMetadataTracesProcessor struct {
	config *Config
	next   consumer.Traces
	logger *zap.Logger
}

func (p *deployMetadataTracesProcessor) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
	rss := td.ResourceSpans()
	for i := 0; i < rss.Len(); i++ {
		rs := rss.At(i)
		attrs := rs.Resource().Attributes()
		attrs.PutStr("deploy.cluster", p.config.ClusterName)
		attrs.PutStr("deploy.region", p.config.Region)
		attrs.PutStr("deploy.env", p.config.DeploymentEnv)
		if p.config.AddBuildInfo {
			attrs.PutStr("deploy.collector_build", "custom-v1.2.0")
		}
	}
	return p.next.ConsumeTraces(ctx, td)
}

func (p *deployMetadataTracesProcessor) Capabilities() consumer.Capabilities {
	return consumer.Capabilities{MutatesData: true}
}

func (p *deployMetadataTracesProcessor) Start(_ context.Context, _ component.Host) error {
	p.logger.Info("deploy_metadata processor started",
		zap.String("cluster", p.config.ClusterName),
		zap.String("region", p.config.Region))
	return nil
}

func (p *deployMetadataTracesProcessor) Shutdown(_ context.Context) error {
	return nil
}
```
Building with OCB
Use the OpenTelemetry Collector Builder (ocb) to compile a custom Collector binary that includes your processor.
```yaml
# builder-config.yaml
dist:
  name: custom-otelcol
  description: Custom OTel Collector with deploy metadata processor
  output_path: ./dist
  version: 1.0.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.98.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.98.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.98.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.98.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.98.0
  - gomod: github.com/myorg/otelcol-custom/processor/deploymetadataprocessor v0.0.1

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.98.0
```

```bash
# Build the custom collector
ocb --config builder-config.yaml

# Run with your config
./dist/custom-otelcol --config otel-collector-config.yaml
```
Agent vs Gateway Deployment
Production environments typically deploy the Collector in two tiers: agents on every node and gateways as centralized processors.
Agent Mode (DaemonSet)
Agents run as Kubernetes DaemonSets with one pod per node. Their job is lightweight: receive local telemetry, add node-level metadata, and forward to gateways.
```yaml
# agent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-agent
  template:
    metadata:
      labels:
        app: otel-agent
    spec:
      containers:
        - name: otel-agent
          image: otel/opentelemetry-collector-contrib:0.98.0
          args: ['--config=/etc/otel/config.yaml']
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 4317 # OTLP gRPC
            - containerPort: 4318 # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /etc/otel
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-agent-config
        - name: varlog
          hostPath:
            path: /var/log
```
Agent configuration should be minimal: OTLP receiver, memory limiter, resource detection processor (to add k8s metadata), and a load-balancing exporter pointed at the gateway.
Gateway Mode (Deployment)
Gateways run as a Kubernetes Deployment with a Horizontal Pod Autoscaler. They handle heavy processing: tail sampling, span-to-metrics conversion, attribute enrichment, and export to backends.
```yaml
# gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-gateway
  template:
    metadata:
      labels:
        app: otel-gateway
    spec:
      containers:
        - name: otel-gateway
          image: custom-otelcol:1.0.0
          args: ['--config=/etc/otel/config.yaml']
          resources:
            requests:
              cpu: '1'
              memory: 4Gi
            limits:
              cpu: '2'
              memory: 6Gi
          ports:
            - containerPort: 4317
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway-hpa
  namespace: observability
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```
Scaling Decision Framework
- Under 1,000 spans/sec: A single Collector in standalone mode is sufficient.
- 1,000-10,000 spans/sec: Agent mode on each node with a small gateway cluster (2-3 replicas).
- 10,000-100,000 spans/sec: Agent-gateway pattern with HPA, load-balancing exporter for trace-ID affinity, and tail sampling on the gateway.
- Over 100,000 spans/sec: Multi-tier gateways with sharding. Consider multiple gateway clusters partitioned by signal type or tenant. Implement backpressure propagation with the memory limiter.
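The framework above can be encoded as a simple lookup, handy in capacity-planning scripts. This is a sketch: the thresholds are the ones listed, but the function name and tier labels are invented:

```go
package main

import "fmt"

// deploymentTier maps sustained span throughput to the recommended
// topology from the scaling decision framework.
func deploymentTier(spansPerSec int) string {
	switch {
	case spansPerSec < 1000:
		return "standalone collector"
	case spansPerSec < 10000:
		return "agents + small gateway cluster (2-3 replicas)"
	case spansPerSec < 100000:
		return "agent-gateway with HPA and trace-ID load balancing"
	default:
		return "multi-tier sharded gateways"
	}
}

func main() {
	for _, rate := range []int{500, 5000, 50000, 500000} {
		fmt.Printf("%d spans/sec -> %s\n", rate, deploymentTier(rate))
	}
}
```

Treat the boundaries as soft: sustained peaks, span size, and processing cost (especially tail sampling) shift them in practice.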
Failure Cases and Recovery Procedures
Case 1: Gateway OOM Kill
Symptom: Gateway pods repeatedly restart with OOMKilled status.
Root cause: Tail sampling decision_wait is too long for the traffic volume, or num_traces is undersized.
Recovery:
```bash
# 1. Check current memory usage
kubectl top pods -n observability -l app=otel-gateway

# 2. Inspect OOM events
kubectl describe pod otel-gateway-xxx -n observability | grep -A5 "Last State"

# 3. Immediate relief: reduce decision_wait and increase replicas
kubectl scale deployment otel-gateway -n observability --replicas=6

# 4. Long-term: adjust tail sampling config
#    - Reduce decision_wait from 10s to 5s
#    - Increase num_traces from 100000 to 200000
#    - Increase pod memory limit to 8Gi and memory_limiter limit_mib to 5600
```
Case 2: Data Loss During Export Failure
Symptom: Backend (Tempo/Mimir) is temporarily unavailable. Telemetry data is lost.
Recovery: Configure persistent queuing with the file storage extension.
```yaml
extensions:
  file_storage:
    directory: /var/otel/queue
    timeout: 10s
    compaction:
      on_start: true
      directory: /var/otel/queue/compaction

exporters:
  otlp/tempo:
    endpoint: tempo-distributor:4317
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
      storage: file_storage
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 60s
      max_elapsed_time: 600s

service:
  extensions: [file_storage]
```
This configuration persists the export queue to disk, so queued batches survive Collector restarts and are drained once the backend recovers.
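To size `queue_size` against an expected outage window, a rough Go estimate helps (`queueSurvivalSec` is a hypothetical helper; it assumes a steady batch rate and ignores retries consuming queue slots):

```go
package main

import "fmt"

// queueSurvivalSec estimates how long a sending queue can absorb a
// backend outage before it fills and new batches start being dropped.
func queueSurvivalSec(queueSize, batchesPerSec int) int {
	return queueSize / batchesPerSec
}

func main() {
	// queue_size 10000 batches, ingesting ~20 batches/sec
	// (e.g. ~20k spans/sec with send_batch_size 1024).
	fmt.Printf("queue survives ~%d seconds of outage\n", queueSurvivalSec(10000, 20))
}
```

If the tolerated outage window is longer than this estimate, either grow the queue (disk-backed queues make large sizes cheap) or accept data loss for the tail of the outage.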
Case 3: High Cardinality Explosion
Symptom: Metrics pipeline memory grows unbounded. Prometheus remote write rejects data.
Recovery: Use the filter processor to drop high-cardinality attributes before export.
```yaml
processors:
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - '.*_temp_.*'
        resource_attributes:
          - key: service.instance.id
            value: '.*debug.*'
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "http.url")
          - delete_key(attributes, "user.id")
```
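Conceptually, those `delete_key` statements do the equivalent of the following Go sketch on each data point's attribute map. This is illustrative only: real Collector attributes are a `pcommon.Map`, not a plain Go map, and the sample values are invented:

```go
package main

import "fmt"

// dropHighCardinalityKeys removes attributes whose unbounded values
// (URLs, user IDs, session tokens) explode time-series cardinality.
func dropHighCardinalityKeys(attrs map[string]string, keys ...string) {
	for _, k := range keys {
		delete(attrs, k)
	}
}

func main() {
	attrs := map[string]string{
		"http.method": "GET",
		"http.url":    "/orders/8f3a1c?session=91b2", // unbounded values
		"user.id":     "u-48213",                     // unbounded values
	}
	dropHighCardinalityKeys(attrs, "http.url", "user.id")
	fmt.Println(len(attrs)) // only bounded attributes remain
}
```

Every distinct attribute value combination becomes a separate series in Prometheus-style backends, which is why dropping even one unbounded key can shrink series counts by orders of magnitude.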
Case 4: Agent Cannot Reach Gateway
Symptom: Agents log connection refused errors. Telemetry is dropped silently.
Recovery: Enable the sending queue on agents with bounded retry. Add health check extension for readiness probes.
```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
    path: /health

exporters:
  otlp:
    endpoint: otel-gateway.observability:4317
    sending_queue:
      enabled: true
      queue_size: 2000
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 120s

service:
  extensions: [health_check]
```
Production Readiness Checklist
- Memory limiter is the first processor in every pipeline
- Batch processor is the last processor before exporters
- Sending queues are enabled on all exporters with retry configured
- Health check extension is deployed for Kubernetes liveness/readiness probes
- Collector self-observability metrics (port 8888) are scraped by Prometheus
- Resource limits are set in Kubernetes manifests (requests and limits)
- HPA is configured for gateway deployments
- Load-balancing exporter with trace-ID routing is used when tail sampling is active
- File storage extension is configured for persistent queuing in critical pipelines
- Dashboard monitoring Collector queue depth, dropped spans, and export latency
- Alerting is configured on `otelcol_exporter_send_failed_spans_total` and `otelcol_processor_refused_spans_total`
References
- OpenTelemetry Collector Official Documentation
- OpenTelemetry Collector Contrib Repository
- CNCF OpenTelemetry Project
- Grafana Labs - OpenTelemetry Integration Guide
- Honeycomb - OpenTelemetry Collector Guide
- OpenTelemetry Collector Gateway Deployment Pattern
- Tail Sampling Processor - opentelemetry-collector-contrib
- Memory Limiter Processor - opentelemetry-collector