
- Overview
- Tempo Architecture
- Deployment Modes
- Quick Start with Docker Compose
- TraceQL Query Syntax
- Span Metrics and Service Graphs
- Tempo vs Jaeger vs Zipkin Comparison
- OpenTelemetry Collector Integration
- Storage Optimization
- Grafana Dashboard Configuration
- Troubleshooting
- Operations Checklist
- Failure Cases and Recovery
- References
Overview
As microservices architecture has become mainstream, environments where a single request passes through dozens of services are now commonplace. In such environments, distributed tracing is essential for tracking the root cause of failures. Grafana Tempo is an open-source distributed tracing backend released by Grafana Labs in 2020 that can operate with only object storage, dramatically reducing infrastructure complexity and cost.
Tempo's core philosophy is simple. It does not create separate indexes for trace data, instead searching spans through Trace ID-based lookups and the TraceQL query engine. Thanks to this approach, storage costs are significantly lower compared to Jaeger or Zipkin, and petabyte-scale traces can be reliably stored.
This article covers Tempo's internal architecture, three deployment modes, TraceQL query syntax, span metrics generation and service graphs, OpenTelemetry Collector integration, storage optimization, Grafana dashboard configuration, troubleshooting, and real-world failure cases and recovery experiences from production operations.
Tempo Architecture
Internally, Tempo uses multiple components that work together to collect, store, and query trace data. Understanding each component's role helps quickly identify bottlenecks when failures occur.
Core Components
Distributor is the entry point that receives span data from clients. It supports various protocols including Jaeger, Zipkin, and OpenTelemetry (OTLP), and routes received spans to the appropriate Ingester using consistent hashing based on Trace ID hash.
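The routing idea can be sketched in a few lines. This is an illustrative consistent-hash ring, not Tempo's actual implementation (Tempo uses its own ring library), but it shows why all spans sharing a Trace ID land on the same Ingester:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring mapping Trace IDs to ingester instances."""

    def __init__(self, ingesters, vnodes=64):
        # Place several virtual nodes per ingester around the ring
        # so load spreads evenly.
        self.ring = sorted(
            (self._hash(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def route(self, trace_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the hash.
        idx = bisect.bisect(self.keys, self._hash(trace_id)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
# Every span of a given trace routes to the same ingester:
assert ring.route("4bf92f3577b34da6") == ring.route("4bf92f3577b34da6")
```

Because routing depends only on the Trace ID, adding or removing an Ingester remaps only a fraction of traces rather than reshuffling everything.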
Ingester buffers received span data in memory, cuts it into blocks, and flushes those blocks to object storage after a configurable period. It maintains a WAL (Write-Ahead Log) to minimize data loss even during abnormal process termination.
Query Frontend is the component called when clients like Grafana request Trace ID lookups or TraceQL searches. It distributes requests across multiple Queriers to search block data in parallel, reducing response time.
Querier is the worker that actually processes requests received from the Query Frontend. It searches both the Ingester's in-memory data and object storage block data to combine results.
Compactor periodically merges small blocks stored in object storage into larger blocks. This improves query performance and optimizes storage usage.
Metrics Generator is an optional component that automatically generates RED (Rate, Error, Duration) metrics and service graphs from received span data. Generated metrics are sent to Mimir or Prometheus via Prometheus-compatible remote write.
Data Flow
```
[Application] --> [OTel Collector] --> [Distributor]
                                            |
                                       [Hash Ring]
                                            |
                                        [Ingester]
                                        /         \
                                    [WAL]   [Object Storage]
                                                  |
                                  [Compactor] <---+
                                                  |
[Query Frontend] ---> [Querier] ------------------+
```
Spans arrive at the Distributor from the application via OTel Collector, then are distributed to Ingesters through the hash ring. The Ingester first writes to the WAL, then flushes blocks to object storage at configured intervals (default 30 minutes). The Compactor merges small blocks, and the Querier searches both Ingester in-memory and object storage data.
Deployment Modes
Tempo provides three deployment modes that can be selected based on the organization's scale and requirements.
Deployment Mode Comparison
| Item | Monolithic | Scalable Single Binary | Microservices |
|---|---|---|---|
| Structure | Single binary, single process | Single binary, multiple instances | Independent process per component |
| Scalability | Vertical scaling only | Horizontal scaling | Independent horizontal per component |
| Recommended traffic | Under 100GB/day | 100GB to 1TB/day | Over 1TB/day |
| Operational complexity | Low | Medium | High |
| High availability | Limited | Basic support | Full support |
| Suitable environment | Dev/test, small-scale | Medium-scale production | Large-scale production, multi-tenant |
| Kubernetes required | No | Recommended | Required |
Monolithic Mode
All components run in a single process. Suitable for local environments or small workloads with the simplest configuration.
```yaml
# tempo-config.yaml (Monolithic)
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'
        http:
          endpoint: '0.0.0.0:4318'
    jaeger:
      protocols:
        thrift_http:
          endpoint: '0.0.0.0:14268'
    zipkin:
      endpoint: '0.0.0.0:9411'

ingester:
  max_block_duration: 5m
  max_block_bytes: 1073741824 # 1GB

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
    pool:
      max_workers: 100
      queue_depth: 10000

compactor:
  compaction:
    block_retention: 72h

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: local
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces
  processor:
    service_graphs:
      dimensions:
        - service.namespace
        - deployment.environment
    span_metrics:
      dimensions:
        - http.method
        - http.status_code
        - http.route

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics
```
Scalable Single Binary Mode
Achieves horizontal scaling by running the same binary as multiple instances. As a middle ground between Monolithic and Microservices, it provides scalability without significantly increasing configuration complexity. Each instance runs with the target flag set to scalable-single-binary.
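As a minimal sketch (the flag value follows the mode description above; the shared config file path is an assumption), each replica is started like this, on every node or as replicas of a single StatefulSet:

```shell
# Every instance runs the full binary and discovers its peers via memberlist.
tempo -config.file=/etc/tempo/tempo.yaml -target=scalable-single-binary
```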
Microservices Mode
Each component is deployed as an independent process, enabling individual scaling. In large-scale environments, specific components (e.g., Ingester) can be scaled out, or Queriers can be adjusted to match traffic patterns. In Kubernetes environments, using the Helm chart (tempo-distributed) makes deployment convenient.
Quick Start with Docker Compose
To quickly try Tempo in a local environment, use Docker Compose. The configuration below brings up Tempo (Monolithic), OTel Collector, Grafana, and Prometheus all at once.
```yaml
# docker-compose.yaml
version: '3.9'

services:
  tempo:
    image: grafana/tempo:2.7.1
    command: ['-config.file=/etc/tempo/tempo.yaml']
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - '3200:3200' # Tempo HTTP API
      - '4317:4317' # OTLP gRPC
      - '4318:4318' # OTLP HTTP
      - '9411:9411' # Zipkin
      - '14268:14268' # Jaeger HTTP
    networks:
      - observability

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.118.0
    command: ['--config=/etc/otel-collector/config.yaml']
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector/config.yaml
    ports:
      - '4327:4317' # OTLP gRPC (for app access)
      - '4328:4318' # OTLP HTTP
    depends_on:
      - tempo
    networks:
      - observability

  prometheus:
    image: prom/prometheus:v3.2.1
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'
    networks:
      - observability

  grafana:
    image: grafana/grafana:11.5.2
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    ports:
      - '3000:3000'
    depends_on:
      - tempo
      - prometheus
    networks:
      - observability

volumes:
  tempo-data:

networks:
  observability:
    driver: bridge
```
After running docker compose up -d, access Grafana at http://localhost:3000 where the Tempo datasource is automatically provisioned, allowing you to search traces immediately.
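A quick smoke test can confirm Tempo itself is healthy before opening Grafana (endpoints as exposed by the compose file above; `/ready` and `/api/echo` are standard Tempo HTTP endpoints, but verify against your Tempo version):

```shell
curl -s http://localhost:3200/ready     # returns "ready" once startup completes
curl -s http://localhost:3200/api/echo  # returns "echo" if the query API is up
```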
TraceQL Query Syntax
TraceQL is Tempo's dedicated query language, following a syntax system similar to PromQL and LogQL. It selects spansets with curly braces {} and chains filters and aggregations with pipeline operators.
Basic Structure
A TraceQL query consists of three main elements:
- Intrinsics: the span's built-in properties (`name`, `status`, `duration`, `kind`, `rootName`, `rootServiceName`, `traceDuration`)
- Attributes: custom key-value pairs addressed with scope prefixes (`span.`, `resource.`, `link.`, `event.`)
- Operators: comparison (`=`, `!=`, `>`, `<`, `>=`, `<=`), regex (`=~`, `!~`), logical (`&&`, `||`), structural (`>`, `>>`, `<`, `<<`, `~`)
TraceQL Query Examples
```
// 1. Find error spans for a specific service
{ resource.service.name = "payment-service" && status = error }

// 2. HTTP GET request spans taking over 500ms
{ span.http.method = "GET" && duration > 500ms }

// 3. Spans returning 5xx responses on a specific route
{ span.http.route = "/api/v1/orders" && span.http.status_code >= 500 }

// 4. Trace call relationship between two services (structural operator)
{ resource.service.name = "api-gateway" } >> { resource.service.name = "order-service" }

// 5. Filter spans with direct parent-child relationship
{ resource.service.name = "frontend" } > { span.http.status_code = 503 }

// 6. Explore sibling span relationships
{ span.db.system = "postgresql" } ~ { span.db.system = "redis" }

// 7. Span name matching using regex
{ name =~ "HTTP.*POST" && resource.deployment.environment = "production" }

// 8. Filter by total trace duration
{ traceDuration > 3s }

// 9. Filter by root service
{ rootServiceName = "ingress-nginx" && duration > 1s }

// 10. Analysis using aggregation functions
{ resource.service.name = "checkout-service" } | rate()

// 11. Check latency distribution with histogram
{ resource.service.name = "search-service" } | histogram_over_time(duration)

// 12. Anomaly detection based on count
{ status = error } | count() > 100
```
Key Aggregation Functions
| Function | Description | Example |
|---|---|---|
| `rate()` | Spans-per-second rate | `{} \| rate()` |
| `count()` | Matching span count | `{ status = error } \| count()` |
| `avg(field)` | Field average value | `{} \| avg(duration)` |
| `max(field)` | Field maximum value | `{} \| max(duration)` |
| `min(field)` | Field minimum value | `{} \| min(duration)` |
| `p50/p90/p95/p99(field)` | Percentiles | `{} \| p99(duration)` |
| `histogram_over_time(field)` | Histogram over time | `{} \| histogram_over_time(duration)` |
| `quantile_over_time(field, q)` | Quantile over time | `{} \| quantile_over_time(duration, 0.95)` |
Span Metrics and Service Graphs
Tempo's Metrics Generator is a powerful feature that automatically generates metrics from received spans. Without separate metric collection, you can obtain RED metrics and service dependency graphs from trace data alone.
Span Metrics Generator
The span metrics processor converts Request Rate, Error Rate, and Duration distribution from all incoming spans into Prometheus metrics. The main metrics generated are:
- `traces_spanmetrics_calls_total`: total span call count
- `traces_spanmetrics_latency_bucket`: latency histogram buckets
- `traces_spanmetrics_size_total`: total span size
By configuring dimensions, you can add span attributes like http.method, http.status_code, and http.route as metric labels, allowing fine-grained RED metrics observation per endpoint.
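For example, a per-endpoint p95 can be read straight from the generated histogram. This PromQL sketch assumes the `dimensions` configured earlier; note that dotted attribute names are typically sanitized into underscore label names (e.g. `http.route` becomes `http_route`), so verify the label names against your actual series:

```promql
histogram_quantile(
  0.95,
  sum(rate(traces_spanmetrics_latency_bucket{http_route="/api/v1/orders"}[5m])) by (le)
)
```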
Service Graph Generator
The service graph processor analyzes client-server span pairs to automatically map call relationships between services. The service topology can be visually confirmed in Grafana's service graph view, with request rate, error rate, and latency displayed on each edge.
Key configuration parameters include:
- `max_items`: maximum number of service pairs to track (default 10000)
- `wait`: wait time for incomplete edges (default 10s)
- `dimensions`: custom labels to add to the service graph
- `histogram_buckets`: latency histogram bucket boundaries (default 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8)
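Put together, these knobs live under the `metrics_generator.processor.service_graphs` section; the values below are illustrative, not recommendations:

```yaml
metrics_generator:
  processor:
    service_graphs:
      max_items: 20000   # track more service pairs on busy meshes
      wait: 15s          # give slow servers longer to complete an edge
      histogram_buckets: [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]
      dimensions:
        - deployment.environment
```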
Tempo vs Jaeger vs Zipkin Comparison
When selecting a distributed tracing backend, comparing the characteristics of each tool is important.
| Item | Grafana Tempo | Jaeger | Zipkin |
|---|---|---|---|
| Initial release | 2020 (Grafana Labs) | 2015 (Uber) | 2012 (Twitter) |
| CNCF status | - | Graduated | - |
| Storage method | Object storage (no index) | Elasticsearch, Cassandra, etc. | Elasticsearch, Cassandra, MySQL |
| Indexing | None (Trace ID + TraceQL) | Tag-based index creation | Tag-based index creation |
| Storage cost | Low (S3/GCS pricing) | High (includes index storage) | High |
| Ingestion protocols | OTLP, Jaeger, Zipkin | OTLP, Jaeger | Zipkin, OTLP (limited) |
| Query language | TraceQL | Tag-based search | Tag-based search |
| Built-in UI | Grafana integration | Jaeger UI | Zipkin UI |
| Metrics generation | Built-in (Metrics Generator) | External tools needed | External tools needed |
| Scalability | Excellent (PB scale) | Moderate | Limited |
| Grafana integration | Native | Plugin | Plugin |
| Maintained by | Grafana Labs (commercial support) | CNCF community | Volunteer community |
Selection Criteria Summary: If you already use the Grafana ecosystem and want to store large-scale traces at low cost, Tempo is optimal. If you need an independent tracing system and rich tag-based search is essential, consider Jaeger. For small teams looking to quickly adopt tracing, Zipkin remains a viable option.
OpenTelemetry Collector Integration
The most recommended way to send traces to Tempo is using OpenTelemetry Collector as an intermediate pipeline. The Collector collects traces from various sources, performs batch processing and retries, then reliably sends them to Tempo.
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '0.0.0.0:4317'
      http:
        endpoint: '0.0.0.0:4318'

processors:
  batch:
    timeout: 5s
    send_batch_size: 10000
    send_batch_max_size: 11000
  memory_limiter:
    check_interval: 1s
    limit_mib: 4096
    spike_limit_mib: 512
  attributes:
    actions:
      - key: deployment.environment
        value: production
        action: upsert
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  otlp/tempo:
    endpoint: 'tempo:4317'
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
  debug:
    verbosity: basic

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: '0.0.0.0:8888'
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, attributes, batch]
      exporters: [otlp/tempo, debug]
```
The key aspects of this configuration are:
- tail_sampling: Error spans are collected at 100%, slow traces over 1 second are also fully collected, and the rest are sampled at 10% probability. This ensures important traces are not missed while reducing storage costs.
- memory_limiter: Limits Collector memory usage to 4GB to prevent OOM.
- sending_queue: Buffers data in the queue and retries even during temporary Tempo outages.
- batch: Groups spans into batches of 10,000 for transmission, improving network efficiency.
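On the application side, instrumented services point at the Collector rather than Tempo directly. With the docker-compose port mapping shown earlier, the standard OpenTelemetry SDK environment variables look like this (the service name is illustrative):

```shell
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4327
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_SERVICE_NAME=payment-service
```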
Storage Optimization
Tempo's storage design is centered on object storage. In production environments, choose S3, GCS, or Azure Blob Storage as the backend.
Storage Backend Comparison
| Item | Amazon S3 | Google Cloud Storage | Azure Blob Storage |
|---|---|---|---|
| Config key | s3 | gcs | azure |
| Authentication | IAM Role, Access Key | Service Account, Workload Identity | Managed Identity, SAS Token |
| Cost (GB/month) | $0.023 (Standard) | $0.020 (Standard) | $0.018 (Hot) |
| Region availability | 33+ regions | 40+ regions | 60+ regions |
| Tempo compatibility | Full support | Full support | Full support |
| Lifecycle policy | S3 Lifecycle | Object Lifecycle | Lifecycle Management |
S3 Backend Configuration Example
```yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces-prod
      endpoint: s3.ap-northeast-2.amazonaws.com
      region: ap-northeast-2
      access_key: ${S3_ACCESS_KEY}
      secret_key: ${S3_SECRET_KEY}
      # Or omit access_key/secret_key when using IAM Role
    wal:
      path: /var/tempo/wal
    block:
      bloom_filter_false_positive: 0.01
      v2_index_downsample_bytes: 1048576
      v2_encoding: zstd
    blocklist_poll: 5m
    pool:
      max_workers: 200
      queue_depth: 20000

compactor:
  compaction:
    block_retention: 336h # 14-day retention
    compacted_block_retention: 1h
    compaction_window: 4h
    max_block_bytes: 107374182400 # 100GB
    max_compaction_objects: 6000000
    retention_concurrency: 10
  ring:
    kvstore:
      store: memberlist
```
Storage Optimization Tips
Block Encoding: Setting v2_encoding to zstd achieves approximately 30-40% higher compression ratio compared to snappy, but with slightly increased CPU usage. Choose snappy for write-heavy workloads, or zstd when storage cost is the priority.
Bloom Filter Tuning: Lowering bloom_filter_false_positive (e.g., from 0.01 to 0.005) does not change result correctness; it reduces the number of blocks that are needlessly fetched and scanned during a lookup, at the cost of larger bloom filters. In environments with frequent queries, a lower false positive rate is usually a net win for overall performance.
Block Retention Period: Set block_retention according to business requirements. 14 days (336h) is typical, but compliance requirements may necessitate 90 days or more. In such cases, using object storage lifecycle policies to automatically transition to Infrequent Access (S3) or Nearline (GCS) tiers can reduce costs.
Compactor Tuning: Setting max_block_bytes too high causes Compactor memory usage to spike, while setting it too low increases the number of blocks and degrades query performance. Around 100GB is a balanced value.
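A back-of-the-envelope sizing calculation ties the encoding and retention choices together. All numbers here are illustrative assumptions, not measured Tempo figures:

```python
def stored_tb(daily_ingest_gb: float, compression_ratio: float,
              retention_days: int) -> float:
    """Approximate steady-state object-storage footprint in TB:
    daily raw ingest, divided by compression ratio, times retention."""
    return daily_ingest_gb / compression_ratio * retention_days / 1024

# Assume 500 GB/day of raw span data and 14-day retention.
# Compression ratios of ~2x (snappy) and ~3x (zstd) are rough assumptions.
snappy = stored_tb(500, compression_ratio=2.0, retention_days=14)  # ~3.42 TB
zstd = stored_tb(500, compression_ratio=3.0, retention_days=14)    # ~2.28 TB
print(f"snappy: {snappy:.2f} TB, zstd: {zstd:.2f} TB")
```

Multiplying the result by the per-GB prices in the backend comparison table gives a first-order monthly cost estimate for each encoding choice.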
Grafana Dashboard Configuration
Tempo integrates natively with Grafana, providing rich tracing visualization without a separate UI. Below are the Grafana datasource provisioning configuration and dashboard configuration examples.
Datasource Provisioning
```yaml
# grafana-datasources.yaml
apiVersion: 1

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    jsonData:
      httpMethod: GET
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        filterByTraceID: true
        filterBySpanID: true
      tracesToMetrics:
        datasourceUid: prometheus
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        tags:
          - key: service.name
            value: service
          - key: http.method
            value: method
      tracesToProfiles:
        datasourceUid: pyroscope
        profileTypeId: 'process_cpu:cpu:nanoseconds:cpu:nanoseconds'
        tags:
          - key: service.name
            value: service_name
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
      search:
        hide: false
      traceQuery:
        timeShiftEnabled: true
        spanStartTimeShift: '-30m'
        spanEndTimeShift: '30m'
```
Dashboard JSON Snippet
The following is a Grafana dashboard panel configuration showing request rate and error rate by service.
```json
{
  "panels": [
    {
      "title": "Service Request Rate",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "sum(rate(traces_spanmetrics_calls_total{status_code!=\"STATUS_CODE_ERROR\"}[5m])) by (service)",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "custom": { "drawStyle": "line", "lineWidth": 2 }
        }
      }
    },
    {
      "title": "Service Error Rate",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "sum(rate(traces_spanmetrics_calls_total{status_code=\"STATUS_CODE_ERROR\"}[5m])) by (service) / sum(rate(traces_spanmetrics_calls_total[5m])) by (service) * 100",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              { "color": "green", "value": null },
              { "color": "yellow", "value": 1 },
              { "color": "red", "value": 5 }
            ]
          }
        }
      }
    },
    {
      "title": "P99 Latency by Service",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum(rate(traces_spanmetrics_latency_bucket[5m])) by (le, service))",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": { "unit": "s" }
      }
    }
  ]
}
```
Key Integration Features
When using Tempo in Grafana, the most powerful features are the three cross-datasource integrations: Traces to Logs, Traces to Metrics, and Traces to Profiles.
- Traces to Logs: Clicking a specific span in the trace view navigates directly to Loki logs for that time window. It automatically filters by Trace ID and Span ID, showing only related logs.
- Traces to Metrics: You can jump to Prometheus metric queries based on span attributes. When slow spans are found, you can immediately check CPU and memory metrics for that service.
- Traces to Profiles: When integrated with Pyroscope, you can trace the cause of slow spans down to the code level (function call profiles).
Troubleshooting
This section covers common issues and solutions encountered when operating Tempo.
Ingester Out of Memory (OOM)
Symptom: Ingester Pods repeatedly restart with OOMKilled status.
Cause: In-memory blocks become excessively large due to traffic spikes, or max_block_duration is set too long.
Solution: Reduce ingester.max_block_duration to 5 minutes to shorten the flush cycle, and limit ingester.max_block_bytes to a range of 500MB to 1GB. Kubernetes resource requests and limits should also be set sufficiently. Increasing the number of Ingester instances to distribute load is also effective.
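The flush-related settings from this solution look like the following in `tempo.yaml` (the 500MB value is one point in the suggested 500MB-1GB range):

```yaml
ingester:
  max_block_duration: 5m          # flush blocks more often under load
  max_block_bytes: 536870912      # 500MB; keep within 500MB-1GB
```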
TraceQL Query Timeout
Symptom: "context deadline exceeded" errors occur repeatedly during TraceQL searches.
Cause: Occurs when there are too many blocks (Compactor not functioning) or the search scope is too broad.
Solution: Verify that the Compactor is operating normally and adjust compaction_window appropriately. Set query_frontend.max_retries to 3 and limit results with query_frontend.search.default_result_limit. Narrowing the query time range is also an immediate mitigation.
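As a config sketch, the query-frontend limits mentioned here would look like this (key names follow this section; the result limit value is illustrative, so verify both against the Tempo version you run):

```yaml
query_frontend:
  max_retries: 3
  search:
    default_result_limit: 20
```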
Missing Spans
Symptom: Some spans are missing from traces, resulting in incomplete trace queries.
Cause: Often caused by hash ring inconsistency between Distributor and Ingester, network partitions, or sampling policy mismatches.
Solution: Check for "ring not healthy" messages in distributor logs. Verify that the Memberlist communication port (default 7946) is open in the firewall. Validate that the OTel Collector's tail_sampling policy is working as intended, and temporarily enable the debug exporter to trace span flow.
Compactor Block Merge Failure
Symptom: The number of blocks in object storage keeps increasing and query performance gradually degrades.
Cause: Compactor memory shortage, object storage permission issues, or max_compaction_objects limit exceeded.
Solution: Increase the Compactor's memory allocation and reconfirm storage IAM permissions (ListBucket, GetObject, PutObject, DeleteObject). Gradually increase compaction.max_compaction_objects to handle large blocks.
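A minimal S3 policy covering the four permissions listed above might look like this (bucket name taken from the earlier storage config; adapt the ARNs to your environment):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::tempo-traces-prod"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::tempo-traces-prod/*"
    }
  ]
}
```

Note that `ListBucket` applies to the bucket ARN while the object actions apply to the `/*` object ARN; conflating the two is a common cause of exactly this Compactor failure.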
Operations Checklist
This is a checklist for reliably operating Tempo in production environments.
Pre-deployment Checks
- Determine deployment mode (based on daily traffic: under 100GB Monolithic, 100GB-1TB Scalable, over 1TB Microservices)
- Create object storage bucket and configure IAM permissions
- Verify disk IOPS for WAL storage path (SSD recommended, minimum 3000 IOPS)
- Configure network policies (Memberlist 7946/TCP, OTLP 4317-4318/TCP)
- Provision TLS certificates (mTLS recommended)
- Set resource requests/limits (Ingester: minimum 4GB RAM, Compactor: minimum 8GB RAM)
Essential Monitoring Metrics
- `tempo_ingester_live_traces`: active trace count (memory pressure indicator)
- `tempo_ingester_bytes_received_total`: bytes received per second
- `tempo_compactor_blocks_total`: object storage block count (alert on sustained increase)
- `tempo_distributor_spans_received_total`: received span count (check for drops)
- `tempo_query_frontend_queries_total`: query throughput and error rate
- `tempo_discarded_spans_total`: discarded span count (investigate immediately if non-zero)
Regular Inspection Items
- Weekly: Check Compactor block merge status, monitor block count trends
- Weekly: Check WAL disk usage and verify flush operation
- Monthly: Review storage costs and reassess retention periods
- Monthly: Benchmark TraceQL query performance (track response times for key query patterns)
- Quarterly: Plan Tempo version upgrades and conduct compatibility tests
Failure Cases and Recovery
Case 1: Data Loss Due to Ingester WAL Corruption
Situation: An unexpected Kubernetes node shutdown corrupted the WAL on 2 out of 3 Ingesters. The Ingesters failed to recover WAL on restart, resulting in approximately 15 minutes of trace data loss.
Recovery Process: First, manually cleared the corrupted WAL directories and restarted the Ingesters. For the lost time window, partial recovery was achieved by resending some data buffered in the OTel Collector's sending_queue.
Lessons Learned: Set the Ingester's replication_factor to 3 so that identical spans are replicated to at least 2 Ingesters. Fixed the WAL path to local NVMe SSD and changed the PV (PersistentVolume) reclaimPolicy to Retain to preserve WAL even during Pod rescheduling. Increased Ingester Pod's terminationGracePeriodSeconds to 300 seconds to allow flush time during shutdown.
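As a sketch, the two WAL-protection settings combine as below. The replication_factor path follows Tempo's lifecycler config and should be verified against your Tempo version; the StatefulSet fragment is a generic Kubernetes setting:

```yaml
# tempo.yaml - write each span to three Ingesters
ingester:
  lifecycler:
    ring:
      replication_factor: 3
---
# Ingester StatefulSet - give the process time to flush on shutdown
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 300
```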
Case 2: Query Performance Collapse Due to Compactor Failure
Situation: After an S3 IAM policy change, the Compactor lost DeleteObject permissions, and block merging was interrupted for 2 weeks. Over 500,000 small blocks accumulated, causing TraceQL search response time to surge from the usual 2 seconds to 45 seconds.
Recovery Process: The S3 IAM policy was immediately corrected and the Compactor was restarted. However, attempting to merge 500,000 blocks at once caused Compactor OOM. By lowering compaction.max_compaction_objects from 1 million to 100,000 and reducing compaction_window to 1 hour, blocks were gradually merged. Full normalization took 3 days.
Lessons Learned: Set up an alarm on the tempo_compactor_blocks_total metric to receive immediate notification when the block count increases abnormally. Added a check item to the change management process to verify whether Tempo-related permissions are affected when IAM policies change.
Case 3: Cardinality Explosion from Indiscriminate Custom Attributes
Situation: The development team indiscriminately added user IDs (user.id) as span attributes, and this attribute was included in the Metrics Generator's dimensions, causing cardinality to explode to millions. Prometheus remote write became a bottleneck, delaying the entire metrics collection.
Recovery Process: Immediately removed user.id from dimensions and restarted the Metrics Generator. Deleted the affected time series in Prometheus to reclaim storage.
Lessons Learned: Always verify the cardinality of attributes added to dimensions in advance. Established a policy where attributes that could exceed 1000 cardinality are used only for TraceQL search instead of as metric labels. Also added a safety measure by setting overrides.defaults.metrics_generator.max_active_series to limit the number of time series.
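The series cap from this lesson goes in the overrides block; the option name follows the text and the limit value is illustrative:

```yaml
overrides:
  defaults:
    metrics_generator:
      max_active_series: 100000  # drop new series beyond this per-tenant cap
```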
References
- Grafana Tempo Official Documentation
- TraceQL Query Construction Guide
- Tempo Architecture Documentation
- Tempo Deployment Mode Comparison
- Tempo Storage Configuration Documentation
- Tempo Metrics Generator Documentation
- Tempo Docker Compose Examples (GitHub)
- Grafana Tempo vs Jaeger Comparison (Last9)
- Open-Source Tracing Tools: Jaeger vs Zipkin vs Tempo (CoderSociety)
- OpenTelemetry Collector Configuration