Grafana Tempo Distributed Tracing and TraceQL Operations Guide 2026


Overview

As microservices architecture has become mainstream, environments where a single request passes through dozens of services are now commonplace. In such environments, distributed tracing is essential for tracking the root cause of failures. Grafana Tempo is an open-source distributed tracing backend released by Grafana Labs in 2020 that can operate with only object storage, dramatically reducing infrastructure complexity and cost.

Tempo's core philosophy is simple. It does not create separate indexes for trace data, instead searching spans through Trace ID-based lookups and the TraceQL query engine. Thanks to this approach, storage costs are significantly lower compared to Jaeger or Zipkin, and petabyte-scale traces can be reliably stored.

This article covers Tempo's internal architecture, three deployment modes, TraceQL query syntax, span metrics generation and service graphs, OpenTelemetry Collector integration, storage optimization, Grafana dashboard configuration, troubleshooting, and real-world failure cases and recovery experiences from production operations.

Tempo Architecture

Internally, Tempo uses multiple components that work together to collect, store, and query trace data. Understanding each component's role helps quickly identify bottlenecks when failures occur.

Core Components

Distributor is the entry point that receives span data from clients. It supports multiple protocols, including Jaeger, Zipkin, and OpenTelemetry (OTLP), and routes incoming spans to the appropriate Ingester via consistent hashing of the Trace ID.
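The ring behavior can be sketched in a few lines of Python. This is an illustrative model (the ingester names and virtual-node count are made up), not Tempo's actual ring implementation:

```python
import bisect
import hashlib

def build_ring(ingesters, vnodes=64):
    """Build a simplified hash ring: each ingester owns `vnodes` tokens."""
    ring = []
    for name in ingesters:
        for i in range(vnodes):
            token = int(hashlib.sha256(f"{name}-{i}".encode()).hexdigest(), 16)
            ring.append((token, name))
    ring.sort()
    return ring

def route(ring, trace_id):
    """Route a trace ID to the ingester owning the next token clockwise."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    tokens = [t for t, _ in ring]
    idx = bisect.bisect(tokens, h) % len(ring)
    return ring[idx][1]

ring = build_ring(["ingester-0", "ingester-1", "ingester-2"])
tid = "4bf92f3577b34da6a3ce929d0e0e4736"
# The same trace ID always lands on the same ingester,
# so all spans of one trace accumulate in the same block.
assert route(ring, tid) == route(ring, tid)
```

Because routing is deterministic per Trace ID, spans of a single trace never scatter across ingesters, which is what makes Trace ID lookups cheap later.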

Ingester buffers received span data in memory and flushes it as blocks to object storage after a configured period. It maintains a WAL (Write-Ahead Log) to minimize data loss even during abnormal process termination.

Query Frontend is the component called when clients like Grafana request Trace ID lookups or TraceQL searches. It distributes requests across multiple Queriers to search block data in parallel, reducing response time.

Querier is the worker that actually processes requests received from the Query Frontend. It searches both the Ingester's in-memory data and object storage block data to combine results.

Compactor periodically merges small blocks stored in object storage into larger blocks. This improves query performance and optimizes storage usage.

Metrics Generator is an optional component that automatically generates RED (Rate, Error, Duration) metrics and service graphs from received span data. Generated metrics are sent to Mimir or Prometheus via Prometheus-compatible remote write.

Data Flow

[Application] --> [OTel Collector] --> [Distributor]
                                           |
                                    [Hash Ring]
                                           |
                                      [Ingester]
                                       /      \
                              [WAL]         [Object Storage]
                                                  |
                              [Compactor] <-------+
                                                  |
                              [Query Frontend] ---+---> [Querier]

Spans arrive at the Distributor from the application via OTel Collector, then are distributed to Ingesters through the hash ring. The Ingester first writes to the WAL, then flushes blocks to object storage at configured intervals (default 30 minutes). The Compactor merges small blocks, and the Querier searches both Ingester in-memory and object storage data.
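The cut-and-flush behavior governed by max_block_duration and max_block_bytes can be modeled roughly as below. MiniIngester and its fields are illustrative stand-ins, not Tempo's real data structures:

```python
import time

class MiniIngester:
    """Toy model of the ingester's block-cut conditions: flush when the
    in-memory block exceeds max_block_bytes, or has been open longer
    than max_block_duration (seconds)."""

    def __init__(self, max_block_bytes=1_073_741_824, max_block_duration=300):
        self.max_block_bytes = max_block_bytes
        self.max_block_duration = max_block_duration
        self.block = []          # stands in for the WAL-backed head block
        self.block_bytes = 0
        self.block_opened = time.monotonic()
        self.flushed = []        # stands in for blocks written to object storage

    def push(self, span_bytes):
        self.block.append(span_bytes)
        self.block_bytes += span_bytes
        if (self.block_bytes >= self.max_block_bytes or
                time.monotonic() - self.block_opened >= self.max_block_duration):
            self.flush()

    def flush(self):
        if self.block:
            self.flushed.append((len(self.block), self.block_bytes))
        self.block, self.block_bytes = [], 0
        self.block_opened = time.monotonic()

ing = MiniIngester(max_block_bytes=1000, max_block_duration=300)
for _ in range(10):
    ing.push(150)  # crossing the 1000-byte limit triggers one flush
assert len(ing.flushed) == 1
```

Shrinking either threshold trades smaller, more frequent blocks (more Compactor work) for lower memory pressure, which is exactly the tuning lever used in the OOM troubleshooting section later.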

Deployment Modes

Tempo provides three deployment modes that can be selected based on the organization's scale and requirements.

Deployment Mode Comparison

| Item | Monolithic | Scalable Single Binary | Microservices |
| --- | --- | --- | --- |
| Structure | Single binary, single process | Single binary, multiple instances | Independent process per component |
| Scalability | Vertical scaling only | Horizontal scaling | Independent horizontal per component |
| Recommended traffic | Under 100GB/day | 100GB to 1TB/day | Over 1TB/day |
| Operational complexity | Low | Medium | High |
| High availability | Limited | Basic support | Full support |
| Suitable environment | Dev/test, small-scale | Medium-scale production | Large-scale production, multi-tenant |
| Kubernetes required | No | Recommended | Required |

Monolithic Mode

All components run in a single process. Suitable for local environments or small workloads with the simplest configuration.

# tempo-config.yaml (Monolithic)
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'
        http:
          endpoint: '0.0.0.0:4318'
    jaeger:
      protocols:
        thrift_http:
          endpoint: '0.0.0.0:14268'
    zipkin:
      endpoint: '0.0.0.0:9411'

ingester:
  max_block_duration: 5m
  max_block_bytes: 1073741824 # 1GB

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks
    pool:
      max_workers: 100
      queue_depth: 10000

compactor:
  compaction:
    block_retention: 72h

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: local
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces
  processor:
    service_graphs:
      dimensions:
        - service.namespace
        - deployment.environment
    span_metrics:
      dimensions:
        - http.method
        - http.status_code
        - http.route

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics

Scalable Single Binary Mode

Achieves horizontal scaling by running the same binary as multiple instances. As a middle ground between Monolithic and Microservices, it provides scalability without significantly increasing configuration complexity. Each instance is started with the flag -target=scalable-single-binary.

Microservices Mode

Each component is deployed as an independent process, enabling individual scaling. In large-scale environments, specific components (e.g., Ingester) can be scaled out, or Queriers can be adjusted to match traffic patterns. In Kubernetes environments, using the Helm chart (tempo-distributed) makes deployment convenient.

Quick Start with Docker Compose

To quickly try Tempo in a local environment, use Docker Compose. The configuration below brings up Tempo (Monolithic), OTel Collector, Grafana, and Prometheus all at once.

# docker-compose.yaml
version: '3.9'

services:
  tempo:
    image: grafana/tempo:2.7.1
    command: ['-config.file=/etc/tempo/tempo.yaml']
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - '3200:3200' # Tempo HTTP API
      - '4317:4317' # OTLP gRPC
      - '4318:4318' # OTLP HTTP
      - '9411:9411' # Zipkin
      - '14268:14268' # Jaeger HTTP
    networks:
      - observability

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.118.0
    command: ['--config=/etc/otel-collector/config.yaml']
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector/config.yaml
    ports:
      - '4327:4317' # OTLP gRPC (for app access)
      - '4328:4318' # OTLP HTTP
    depends_on:
      - tempo
    networks:
      - observability

  prometheus:
    image: prom/prometheus:v3.2.1
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'
    networks:
      - observability

  grafana:
    image: grafana/grafana:11.5.2
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    ports:
      - '3000:3000'
    depends_on:
      - tempo
      - prometheus
    networks:
      - observability

volumes:
  tempo-data:

networks:
  observability:
    driver: bridge

After running docker compose up -d, access Grafana at http://localhost:3000 where the Tempo datasource is automatically provisioned, allowing you to search traces immediately.

TraceQL Query Syntax

TraceQL is Tempo's dedicated query language, following a syntax system similar to PromQL and LogQL. It selects spansets with curly braces {} and chains filters and aggregations with pipeline operators.

Basic Structure

A TraceQL query consists of three main elements:

  • Intrinsics: Span's built-in properties (name, status, duration, kind, rootName, rootServiceName, traceDuration)
  • Attributes: Custom key-value pairs using scope prefixes (span., resource., link., event.)
  • Operators: Comparison (=, !=, >, <, >=, <=), regex (=~, !~), logical (&&, ||), structural (>, >>, <, <<, ~)

TraceQL Query Examples

// 1. Find error spans for a specific service
{ resource.service.name = "payment-service" && status = error }

// 2. HTTP GET request spans taking over 500ms
{ span.http.method = "GET" && duration > 500ms }

// 3. Spans returning 5xx responses on a specific route
{ span.http.route = "/api/v1/orders" && span.http.status_code >= 500 }

// 4. Trace call relationship between two services (structural operator)
{ resource.service.name = "api-gateway" } >> { resource.service.name = "order-service" }

// 5. Filter spans with direct parent-child relationship
{ resource.service.name = "frontend" } > { span.http.status_code = 503 }

// 6. Explore sibling span relationships
{ span.db.system = "postgresql" } ~ { span.db.system = "redis" }

// 7. Span name matching using regex
{ name =~ "HTTP.*POST" && resource.deployment.environment = "production" }

// 8. Filter by total trace duration
{ traceDuration > 3s }

// 9. Filter by root service
{ rootServiceName = "ingress-nginx" && duration > 1s }

// 10. Analysis using aggregation functions
{ resource.service.name = "checkout-service" } | rate()

// 11. Check latency distribution with histogram
{ resource.service.name = "search-service" } | histogram_over_time(duration)

// 12. Anomaly detection based on count
{ status = error } | count() > 100
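To make the selection semantics concrete, here is a rough Python equivalent of example 2 evaluated over plain dictionaries. The dict layout is an assumption for illustration only, not Tempo's internal span representation:

```python
def matches(span, method="GET", min_duration_ms=500):
    """Rough Python equivalent of
    { span.http.method = "GET" && duration > 500ms }."""
    return (span.get("attributes", {}).get("http.method") == method
            and span.get("duration_ms", 0) > min_duration_ms)

spans = [
    {"name": "GET /api/v1/orders", "duration_ms": 820,
     "attributes": {"http.method": "GET"}},
    {"name": "GET /healthz", "duration_ms": 3,
     "attributes": {"http.method": "GET"}},
    {"name": "POST /api/v1/orders", "duration_ms": 900,
     "attributes": {"http.method": "POST"}},
]
slow_gets = [s for s in spans if matches(s)]
assert [s["name"] for s in slow_gets] == ["GET /api/v1/orders"]
```

Both conditions must hold for a span to enter the result spanset, which is why the healthy-but-fast and slow-but-POST spans are excluded.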

Key Aggregation Functions

| Function | Description | Example |
| --- | --- | --- |
| rate() | Spans per second rate | { } \| rate() |
| count() | Matching span count | { status = error } \| count() |
| avg(field) | Field average value | { } \| avg(duration) |
| max(field) | Field maximum value | { } \| max(duration) |
| min(field) | Field minimum value | { } \| min(duration) |
| p50/p90/p95/p99(field) | Percentiles | { } \| p99(duration) |
| histogram_over_time(field) | Histogram over time | { } \| histogram_over_time(duration) |
| quantile_over_time(field, q) | Quantile over time | { } \| quantile_over_time(duration, 0.95) |
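As a sanity check on what the percentile functions report, a nearest-rank quantile can be computed by hand. Note that Tempo derives quantiles from histogram buckets, so this exact-value sketch is a simplification:

```python
def quantile(values, q):
    """Nearest-rank quantile, a simplified stand-in for
    TraceQL's quantile_over_time(duration, q)."""
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered) + 0.5) - 1))
    return ordered[idx]

durations_ms = [12, 15, 18, 22, 30, 45, 80, 120, 400, 1500]
assert quantile(durations_ms, 0.5) == 30
assert quantile(durations_ms, 0.95) == 1500
```

The single 1500ms outlier dominates p95 and p99 while leaving the median untouched, which is why percentile-based alerts catch tail latency that averages hide.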

Span Metrics and Service Graphs

Tempo's Metrics Generator is a powerful feature that automatically generates metrics from received spans. Without separate metric collection, you can obtain RED metrics and service dependency graphs from trace data alone.

Span Metrics Generator

The span metrics processor derives request rate, error rate, and duration (RED) distributions from all incoming spans and exposes them as Prometheus metrics. The main metrics generated are:

  • traces_spanmetrics_calls_total: Total span call count
  • traces_spanmetrics_latency_bucket: Latency histogram buckets
  • traces_spanmetrics_size_total: Total span size

By configuring dimensions, you can add span attributes like http.method, http.status_code, and http.route as metric labels, allowing fine-grained RED metrics observation per endpoint.
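How configured dimensions turn into metric label sets can be sketched as follows. The span dict shape and the span_to_labels helper are assumptions for illustration, not the processor's actual code:

```python
from collections import Counter

DIMENSIONS = ("http.method", "http.status_code", "http.route")

def span_to_labels(span):
    """Project a span's attributes onto the configured dimensions,
    mimicking how the span metrics processor builds a label set."""
    attrs = span.get("attributes", {})
    return tuple((d, str(attrs.get(d, ""))) for d in DIMENSIONS)

calls_total = Counter()
spans = [
    {"attributes": {"http.method": "GET", "http.status_code": 200,
                    "http.route": "/api/v1/orders"}},
    {"attributes": {"http.method": "GET", "http.status_code": 200,
                    "http.route": "/api/v1/orders"}},
    {"attributes": {"http.method": "POST", "http.status_code": 500,
                    "http.route": "/api/v1/orders"}},
]
for s in spans:
    calls_total[span_to_labels(s)] += 1  # like traces_spanmetrics_calls_total

assert max(calls_total.values()) == 2  # identical GET label sets merge into one series
```

Every distinct label tuple becomes its own time series, which is the mechanism behind the cardinality explosion described in the failure cases at the end of this article.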

Service Graph Generator

The service graph processor analyzes client-server span pairs to automatically map call relationships between services. The service topology can be visually confirmed in Grafana's service graph view, with request rate, error rate, and latency displayed on each edge.

Key configuration parameters include:

  • max_items: Maximum number of service pairs to track (default 10000)
  • wait: Wait time for incomplete edges (default 10s)
  • dimensions: Custom labels to add to the service graph
  • histogram_buckets: Latency histogram bucket boundaries (default 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8)
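The client-server pairing at the heart of the processor can be approximated in a few lines. The span fields and the build_edges helper are illustrative; the real processor also handles messaging spans and the wait timeout for incomplete edges:

```python
from collections import Counter

def build_edges(spans):
    """Pair client spans with the server spans they triggered: a server
    span whose parent_id is a client span's span_id forms one
    client->server edge, as the service graph processor does."""
    clients = {s["span_id"]: s for s in spans if s["kind"] == "client"}
    edges = Counter()
    for s in spans:
        if s["kind"] == "server" and s.get("parent_id") in clients:
            edges[(clients[s["parent_id"]]["service"], s["service"])] += 1
    return edges

spans = [
    {"span_id": "a1", "parent_id": None, "kind": "client", "service": "api-gateway"},
    {"span_id": "b1", "parent_id": "a1", "kind": "server", "service": "order-service"},
    {"span_id": "c1", "parent_id": "b2", "kind": "server", "service": "payment-service"},
]
edges = build_edges(spans)
assert edges[("api-gateway", "order-service")] == 1
```

The third span has no matching client within the window, which is the kind of incomplete edge the wait parameter exists to expire.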

Tempo vs Jaeger vs Zipkin Comparison

When selecting a distributed tracing backend, comparing the characteristics of each tool is important.

| Item | Grafana Tempo | Jaeger | Zipkin |
| --- | --- | --- | --- |
| Initial release | 2020 (Grafana Labs) | 2015 (Uber) | 2012 (Twitter) |
| CNCF status | - | Graduated | - |
| Storage method | Object storage (no index) | Elasticsearch, Cassandra, etc. | Elasticsearch, Cassandra, MySQL |
| Indexing | None (Trace ID + TraceQL) | Tag-based index creation | Tag-based index creation |
| Storage cost | Low (S3/GCS pricing) | High (includes index storage) | High |
| Ingestion protocols | OTLP, Jaeger, Zipkin | OTLP, Jaeger | Zipkin, OTLP (limited) |
| Query language | TraceQL | Tag-based search | Tag-based search |
| Built-in UI | Grafana integration | Jaeger UI | Zipkin UI |
| Metrics generation | Built-in (Metrics Generator) | External tools needed | External tools needed |
| Scalability | Excellent (PB scale) | Moderate | Limited |
| Grafana integration | Native | Plugin | Plugin |
| Maintained by | Grafana Labs (commercial support) | CNCF community | Volunteer community |

Selection Criteria Summary: If you already use the Grafana ecosystem and want to store large-scale traces at low cost, Tempo is optimal. If you need an independent tracing system and rich tag-based search is essential, consider Jaeger. For small teams looking to quickly adopt tracing, Zipkin remains a viable option.

OpenTelemetry Collector Integration

The most recommended way to send traces to Tempo is using OpenTelemetry Collector as an intermediate pipeline. The Collector collects traces from various sources, performs batch processing and retries, then reliably sends them to Tempo.

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: '0.0.0.0:4317'
      http:
        endpoint: '0.0.0.0:4318'

processors:
  batch:
    timeout: 5s
    send_batch_size: 10000
    send_batch_max_size: 11000

  memory_limiter:
    check_interval: 1s
    limit_mib: 4096
    spike_limit_mib: 512

  attributes:
    actions:
      - key: deployment.environment
        value: production
        action: upsert

  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

exporters:
  otlp/tempo:
    endpoint: 'tempo:4317'
    tls:
      insecure: true
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000

  debug:
    verbosity: basic

service:
  telemetry:
    logs:
      level: info
    metrics:
      address: '0.0.0.0:8888'

  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, attributes, batch]
      exporters: [otlp/tempo, debug]

The key aspects of this configuration are:

  • tail_sampling: Error spans are collected at 100%, slow traces over 1 second are also fully collected, and the rest are sampled at 10% probability. This ensures important traces are not missed while reducing storage costs.
  • memory_limiter: Limits Collector memory usage to 4GB to prevent OOM.
  • sending_queue: Buffers data in the queue and retries even during temporary Tempo outages.
  • batch: Groups spans into batches of 10,000 for transmission, improving network efficiency.
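The combined effect of the three sampling policies can be simulated; keep_trace below is a simplified stand-in for the Collector's actual decision engine, operating on illustrative span dicts:

```python
import random

def keep_trace(trace, sample_rate=0.10, latency_threshold_ms=1000, rng=random):
    """Decision logic mirroring the three policies above: keep all
    errors, keep all slow traces, sample the rest probabilistically."""
    if any(s.get("status") == "error" for s in trace):
        return True                     # errors-policy
    if max(s.get("duration_ms", 0) for s in trace) > latency_threshold_ms:
        return True                     # slow-traces-policy
    return rng.random() < sample_rate   # probabilistic-policy

rng = random.Random(42)  # seeded for reproducibility
assert keep_trace([{"status": "error", "duration_ms": 20}], rng=rng)
assert keep_trace([{"status": "ok", "duration_ms": 2500}], rng=rng)

kept = sum(keep_trace([{"status": "ok", "duration_ms": 30}], rng=rng)
           for _ in range(10_000))
assert 800 < kept < 1200  # roughly 10% of ordinary traces survive
```

Errors and slow traces are kept at 100% regardless of the probabilistic policy, so storage savings come entirely from the large pool of fast, healthy traces.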

Storage Optimization

Tempo's storage design is centered on object storage. In production environments, choose S3, GCS, or Azure Blob Storage as the backend.

Storage Backend Comparison

| Item | Amazon S3 | Google Cloud Storage | Azure Blob Storage |
| --- | --- | --- | --- |
| Config key | s3 | gcs | azure |
| Authentication | IAM Role, Access Key | Service Account, Workload Identity | Managed Identity, SAS Token |
| Cost (GB/month) | $0.023 (Standard) | $0.020 (Standard) | $0.018 (Hot) |
| Region availability | 33+ regions | 40+ regions | 60+ regions |
| Tempo compatibility | Full support | Full support | Full support |
| Lifecycle policy | S3 Lifecycle | Object Lifecycle | Lifecycle Management |

S3 Backend Configuration Example

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces-prod
      endpoint: s3.ap-northeast-2.amazonaws.com
      region: ap-northeast-2
      access_key: ${S3_ACCESS_KEY}
      secret_key: ${S3_SECRET_KEY}
      # Or omit access_key/secret_key when using IAM Role
    wal:
      path: /var/tempo/wal
    block:
      bloom_filter_false_positive: 0.01
      v2_index_downsample_bytes: 1048576
      v2_encoding: zstd
    blocklist_poll: 5m
    pool:
      max_workers: 200
      queue_depth: 20000

compactor:
  compaction:
    block_retention: 336h # 14-day retention
    compacted_block_retention: 1h
    compaction_window: 4h
    max_block_bytes: 107374182400 # 100GB
    max_compaction_objects: 6000000
    retention_concurrency: 10
  ring:
    kvstore:
      store: memberlist

Storage Optimization Tips

Block Encoding: Setting v2_encoding to zstd achieves approximately 30-40% higher compression ratio compared to snappy, but with slightly increased CPU usage. Choose snappy for write-heavy workloads, or zstd when storage cost is the priority.

Bloom Filter Tuning: Lowering bloom_filter_false_positive (e.g., 0.01 to 0.005) improves query accuracy but increases bloom filter size. In environments with frequent queries, reducing the false positive rate is beneficial for overall performance.
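The size cost of a tighter false positive rate follows from the standard bloom filter formula, independent of Tempo's exact implementation. A quick calculation:

```python
import math

def bloom_bits_per_item(p):
    """Bits per stored item for target false positive rate p
    (standard formula: m/n = -ln(p) / (ln 2)^2)."""
    return -math.log(p) / (math.log(2) ** 2)

for p in (0.05, 0.01, 0.005):
    print(f"fp={p}: {bloom_bits_per_item(p):.1f} bits/item")

# Halving the false positive rate from 0.01 to 0.005
# costs only about 1.4 extra bits per item.
assert bloom_bits_per_item(0.005) > bloom_bits_per_item(0.01)
```

The growth is logarithmic in 1/p, so modest accuracy improvements are cheap, but chasing very small false positive rates inflates the filters that every query must download.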

Block Retention Period: Set block_retention according to business requirements. 14 days (336h) is typical, but compliance requirements may necessitate 90 days or more. In such cases, using object storage lifecycle policies to automatically transition to Infrequent Access (S3) or Nearline (GCS) tiers can reduce costs.
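A back-of-the-envelope cost model helps when choosing block_retention; all figures below (ingest volume, compression ratio, unit price) are assumptions for illustration, not measured values:

```python
def monthly_storage_cost(daily_ingest_gb, retention_days, price_per_gb_month,
                         compression_ratio=1.0):
    """Steady-state object storage cost: retained volume x unit price."""
    stored_gb = daily_ingest_gb / compression_ratio * retention_days
    return stored_gb * price_per_gb_month

# 500 GB/day ingest, 14-day retention, S3 Standard at $0.023/GB-month,
# assuming roughly 2x compression from zstd block encoding
cost = monthly_storage_cost(500, 14, 0.023, compression_ratio=2.0)
assert round(cost, 2) == 80.50
```

Running the same numbers at 90-day retention multiplies the bill by about 6.4x, which is when lifecycle transitions to cheaper tiers start to matter.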

Compactor Tuning: Setting max_block_bytes too high causes Compactor memory usage to spike, while setting it too low increases the number of blocks and degrades query performance. Around 100GB is a balanced value.

Grafana Dashboard Configuration

Tempo integrates natively with Grafana, providing rich tracing visualization without a separate UI. Below are the Grafana datasource provisioning configuration and dashboard configuration examples.

Datasource Provisioning

# grafana-datasources.yaml
apiVersion: 1

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    jsonData:
      httpMethod: GET
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        filterByTraceID: true
        filterBySpanID: true
      tracesToMetrics:
        datasourceUid: prometheus
        spanStartTimeShift: '-1h'
        spanEndTimeShift: '1h'
        tags:
          - key: service.name
            value: service
          - key: http.method
            value: method
      tracesToProfiles:
        datasourceUid: pyroscope
        profileTypeId: 'process_cpu:cpu:nanoseconds:cpu:nanoseconds'
        tags:
          - key: service.name
            value: service_name
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
      search:
        hide: false
      traceQuery:
        timeShiftEnabled: true
        spanStartTimeShift: '-30m'
        spanEndTimeShift: '30m'

Dashboard JSON Snippet

The following is a Grafana dashboard panel configuration showing request rate and error rate by service.

{
  "panels": [
    {
      "title": "Service Request Rate",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "sum(rate(traces_spanmetrics_calls_total{status_code!=\"STATUS_CODE_ERROR\"}[5m])) by (service)",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "custom": { "drawStyle": "line", "lineWidth": 2 }
        }
      }
    },
    {
      "title": "Service Error Rate",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "sum(rate(traces_spanmetrics_calls_total{status_code=\"STATUS_CODE_ERROR\"}[5m])) by (service) / sum(rate(traces_spanmetrics_calls_total[5m])) by (service) * 100",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "steps": [
              { "color": "green", "value": null },
              { "color": "yellow", "value": 1 },
              { "color": "red", "value": 5 }
            ]
          }
        }
      }
    },
    {
      "title": "P99 Latency by Service",
      "type": "timeseries",
      "datasource": { "uid": "prometheus", "type": "prometheus" },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum(rate(traces_spanmetrics_latency_bucket[5m])) by (le, service))",
          "legendFormat": "{{ service }}"
        }
      ],
      "fieldConfig": {
        "defaults": { "unit": "s" }
      }
    }
  ]
}

Key Integration Features

When using Tempo in Grafana, the most powerful features are the three cross-datasource integrations: Traces to Logs, Traces to Metrics, and Traces to Profiles.

  • Traces to Logs: Clicking a specific span in the trace view navigates directly to Loki logs for that time window. It automatically filters by Trace ID and Span ID, showing only related logs.
  • Traces to Metrics: You can jump to Prometheus metric queries based on span attributes. When slow spans are found, you can immediately check CPU and memory metrics for that service.
  • Traces to Profiles: When integrated with Pyroscope, you can trace the cause of slow spans down to the code level (function call profiles).

Troubleshooting

This section covers common issues and solutions encountered when operating Tempo.

Ingester Out of Memory (OOM)

Symptom: Ingester Pods repeatedly restart with OOMKilled status.

Cause: In-memory blocks become excessively large due to traffic spikes, or max_block_duration is set too long.

Solution: Reduce ingester.max_block_duration to 5 minutes to shorten the flush cycle, and limit ingester.max_block_bytes to a range of 500MB to 1GB. Kubernetes resource requests and limits should also be set sufficiently. Increasing the number of Ingester instances to distribute load is also effective.

TraceQL Query Timeout

Symptom: "context deadline exceeded" errors occur repeatedly during TraceQL searches.

Cause: Occurs when there are too many blocks (Compactor not functioning) or the search scope is too broad.

Solution: Verify that the Compactor is operating normally and adjust compaction_window appropriately. Set query_frontend.max_retries to 3 and limit results with query_frontend.search.default_result_limit. Narrowing the query time range is also an immediate mitigation.

Missing Spans

Symptom: Some spans are missing from traces, resulting in incomplete trace queries.

Cause: Often caused by hash ring inconsistency between Distributor and Ingester, network partitions, or sampling policy mismatches.

Solution: Check for "ring not healthy" messages in distributor logs. Verify that the Memberlist communication port (default 7946) is open in the firewall. Validate that the OTel Collector's tail_sampling policy is working as intended, and temporarily enable the debug exporter to trace span flow.

Compactor Block Merge Failure

Symptom: The number of blocks in object storage keeps increasing and query performance gradually degrades.

Cause: Compactor memory shortage, object storage permission issues, or max_compaction_objects limit exceeded.

Solution: Increase the Compactor's memory allocation and reconfirm storage IAM permissions (ListBucket, GetObject, PutObject, DeleteObject). Gradually increase compaction.max_compaction_objects to handle large blocks.

Operations Checklist

This is a checklist for reliably operating Tempo in production environments.

Pre-deployment Checks

  • Determine deployment mode (based on daily traffic: under 100GB Monolithic, 100GB-1TB Scalable, over 1TB Microservices)
  • Create object storage bucket and configure IAM permissions
  • Verify disk IOPS for WAL storage path (SSD recommended, minimum 3000 IOPS)
  • Configure network policies (Memberlist 7946/TCP, OTLP 4317-4318/TCP)
  • Provision TLS certificates (mTLS recommended)
  • Set resource requests/limits (Ingester: minimum 4GB RAM, Compactor: minimum 8GB RAM)

Essential Monitoring Metrics

  • tempo_ingester_live_traces: Active trace count (memory pressure indicator)
  • tempo_ingester_bytes_received_total: Bytes received per second
  • tempo_compactor_blocks_total: Object storage block count (alert on sustained increase)
  • tempo_distributor_spans_received_total: Received span count (check for drops)
  • tempo_query_frontend_queries_total: Query throughput and error rate
  • tempo_discarded_spans_total: Discarded span count (investigate immediately if non-zero)
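For ad-hoc checks, a minimal stdlib parser can pull one of these counters out of a /metrics scrape. The label sets shown are illustrative, and a production setup would alert through Prometheus rules rather than a script like this:

```python
def parse_counter(metrics_text, name):
    """Extract samples of one counter from Prometheus exposition format.
    Returns {label_string: value}; a minimal parser for illustration only."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(name) or line.startswith("#"):
            continue
        metric, value = line.rsplit(" ", 1)
        samples[metric[len(name):]] = float(value)
    return samples

scrape = """\
# TYPE tempo_discarded_spans_total counter
tempo_discarded_spans_total{reason="rate_limited",tenant="single-tenant"} 42
tempo_discarded_spans_total{reason="trace_too_large",tenant="single-tenant"} 0
"""
discarded = parse_counter(scrape, "tempo_discarded_spans_total")
assert sum(discarded.values()) == 42  # non-zero: investigate immediately
```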

Regular Inspection Items

  • Weekly: Check Compactor block merge status, monitor block count trends
  • Weekly: Check WAL disk usage and verify flush operation
  • Monthly: Review storage costs and reassess retention periods
  • Monthly: Benchmark TraceQL query performance (track response times for key query patterns)
  • Quarterly: Plan Tempo version upgrades and conduct compatibility tests

Failure Cases and Recovery

Case 1: Data Loss Due to Ingester WAL Corruption

Situation: An unexpected Kubernetes node shutdown corrupted the WAL on 2 out of 3 Ingesters. The Ingesters failed to recover WAL on restart, resulting in approximately 15 minutes of trace data loss.

Recovery Process: First, manually cleared the corrupted WAL directories and restarted the Ingesters. For the lost time window, partial recovery was achieved by resending some data buffered in the OTel Collector's sending_queue.

Lessons Learned: Set the Ingester's replication_factor to 3 so that identical spans are replicated to at least 2 Ingesters. Fixed the WAL path to local NVMe SSD and changed the PV (PersistentVolume) reclaimPolicy to Retain to preserve WAL even during Pod rescheduling. Increased Ingester Pod's terminationGracePeriodSeconds to 300 seconds to allow flush time during shutdown.

Case 2: Query Performance Collapse Due to Compactor Failure

Situation: After an S3 IAM policy change, the Compactor lost DeleteObject permissions, and block merging was interrupted for 2 weeks. Over 500,000 small blocks accumulated, causing TraceQL search response time to surge from the usual 2 seconds to 45 seconds.

Recovery Process: The S3 IAM policy was immediately corrected and the Compactor was restarted. However, attempting to merge 500,000 blocks at once caused Compactor OOM. By lowering compaction.max_compaction_objects from 1 million to 100,000 and reducing compaction_window to 1 hour, blocks were gradually merged. Full normalization took 3 days.

Lessons Learned: Set up an alarm on the tempo_compactor_blocks_total metric to receive immediate notification when the block count increases abnormally. Added a check item to the change management process to verify whether Tempo-related permissions are affected when IAM policies change.

Case 3: Cardinality Explosion from Indiscriminate Custom Attributes

Situation: The development team indiscriminately added user IDs (user.id) as span attributes, and this attribute was included in the Metrics Generator's dimensions, causing cardinality to explode to millions. Prometheus remote write became a bottleneck, delaying the entire metrics collection.

Recovery Process: Immediately removed user.id from dimensions and restarted the Metrics Generator. Deleted the affected time series in Prometheus to reclaim storage.

Lessons Learned: Always verify the cardinality of attributes added to dimensions in advance. Established a policy where attributes that could exceed 1000 cardinality are used only for TraceQL search instead of as metric labels. Also added a safety measure by setting overrides.defaults.metrics_generator.max_active_series to limit the number of time series.
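The pre-check itself is simple arithmetic: the worst-case series count is the product of each label's distinct-value count. estimated_series below is a hypothetical helper illustrating that upper bound:

```python
from math import prod

def estimated_series(label_cardinalities):
    """Worst-case active series count: the product of per-label distinct
    value counts. Real cardinality is usually lower, but this upper
    bound shows why unbounded labels like user.id are dangerous."""
    return prod(label_cardinalities.values())

safe = {"service": 50, "http.method": 5, "http.status_code": 10}
risky = dict(safe, **{"user.id": 100_000})

assert estimated_series(safe) == 2_500
assert estimated_series(risky) == 250_000_000  # one unbounded label explodes it
```

A bounded label set stays in the thousands of series, while a single unbounded dimension pushes the bound into the hundreds of millions, which is the failure mode described above.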
