- 1. What Is the LGTM Stack
- 2. Building LGTM with Docker Compose
- 3. Application Instrumentation
- 4. Integration in Grafana
- 5. Kubernetes Deployment
- 6. Alert Configuration
- 7. Quiz
1. What Is the LGTM Stack
LGTM (Loki, Grafana, Tempo, Mimir) is an open-source observability stack from Grafana Labs:
| Component | Role | Alternatives |
|---|---|---|
| Loki | Log collection/search | Elasticsearch, Splunk |
| Grafana | Visualization/Dashboard | Kibana, Datadog |
| Tempo | Distributed tracing | Jaeger, Zipkin |
| Mimir | Metric storage/query | Thanos, Cortex |
```mermaid
graph TB
    subgraph "Applications"
        App1[Service A]
        App2[Service B]
        App3[Service C]
    end
    subgraph "Collection"
        OTel[OpenTelemetry Collector]
        Alloy[Grafana Alloy]
    end
    subgraph "LGTM Stack"
        Mimir[Mimir<br/>Metrics]
        Loki[Loki<br/>Logs]
        Tempo[Tempo<br/>Traces]
        Grafana[Grafana<br/>Dashboard]
    end
    App1 & App2 & App3 -->|OTLP| OTel
    App1 & App2 & App3 -->|logs| Alloy
    OTel -->|metrics| Mimir
    OTel -->|traces| Tempo
    Alloy -->|logs| Loki
    OTel -->|logs| Loki
    Grafana --> Mimir & Loki & Tempo
    style Grafana fill:#ff9,stroke:#333
    style Mimir fill:#f96,stroke:#333
    style Loki fill:#6f9,stroke:#333
    style Tempo fill:#69f,stroke:#333
```
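To make the OTLP arrows in the diagram concrete, here is a minimal stdlib-only sketch of the JSON body an application (or a script) could POST to the collector's OTLP/HTTP logs endpoint. The nesting (`resourceLogs` → `scopeLogs` → `logRecords`) and attribute encoding follow the OTLP/JSON specification; the service name and message are illustrative:

```python
import json
import time

def otlp_log_payload(service_name: str, body: str, trace_id_hex: str) -> dict:
    """Build an OTLP/JSON log export body (resourceLogs -> scopeLogs -> logRecords)."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": "ERROR",
                    "body": {"stringValue": body},
                    # 32 hex chars; this is what lets Grafana link a log line to its trace
                    "traceId": trace_id_hex,
                }],
            }],
        }],
    }

payload = otlp_log_payload("order-service", "payment failed",
                           "0af7651916cd43dd8448eb211c80319c")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to `http://localhost:4318/v1/logs` with `Content-Type: application/json` is all an OTLP/HTTP log export amounts to; the SDK exporters used later in this chapter do the same thing in protobuf form.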
2. Building LGTM with Docker Compose
2.1 Directory Structure
```text
lgtm-stack/
├── docker-compose.yaml
├── config/
│   ├── mimir.yaml
│   ├── loki.yaml
│   ├── tempo.yaml
│   ├── grafana/
│   │   └── datasources.yaml
│   └── otel-collector.yaml
└── data/
    ├── mimir/
    ├── loki/
    └── tempo/
```
2.2 docker-compose.yaml
```yaml
services:
  # === Mimir (Metrics) ===
  mimir:
    image: grafana/mimir:2.14.0
    command: ['-config.file=/etc/mimir.yaml']
    volumes:
      - ./config/mimir.yaml:/etc/mimir.yaml
      - ./data/mimir:/data
    ports:
      - '9009:9009'

  # === Loki (Logs) ===
  loki:
    image: grafana/loki:3.3.0
    command: ['-config.file=/etc/loki.yaml']
    volumes:
      - ./config/loki.yaml:/etc/loki.yaml
      - ./data/loki:/loki
    ports:
      - '3100:3100'

  # === Tempo (Traces) ===
  tempo:
    image: grafana/tempo:2.6.0
    command: ['-config.file=/etc/tempo.yaml']
    volumes:
      - ./config/tempo.yaml:/etc/tempo.yaml
      - ./data/tempo:/var/tempo
    ports:
      - '3200:3200' # Tempo API
      - '4317:4317' # OTLP gRPC (direct to Tempo)

  # === OpenTelemetry Collector ===
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.112.0
    command: ['--config=/etc/otel-collector.yaml']
    volumes:
      - ./config/otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - '4318:4318' # OTLP HTTP
      - '8889:8889' # Prometheus exporter

  # === Grafana ===
  grafana:
    image: grafana/grafana:11.4.0
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    ports:
      - '3000:3000'
    depends_on:
      - mimir
      - loki
      - tempo
```
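After `docker compose up -d`, each component exposes a readiness endpoint (`/ready` for Mimir, Loki, and Tempo; `/api/health` for Grafana). A small stdlib-only script to poll them — host and ports assume the port mappings in the Compose file above:

```python
import urllib.request
import urllib.error

def ready_urls(host: str = "localhost") -> dict:
    """Readiness endpoints for the LGTM services as mapped in docker-compose."""
    return {
        "mimir":   f"http://{host}:9009/ready",
        "loki":    f"http://{host}:3100/ready",
        "tempo":   f"http://{host}:3200/ready",
        "grafana": f"http://{host}:3000/api/health",
    }

def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    for name, url in ready_urls().items():
        print(f"{name:8s} {'UP' if is_ready(url) else 'DOWN'}  {url}")
```

Note that Loki in particular can report not-ready for the first 15 seconds or so while its ring settles, so a `DOWN` immediately after startup is normal.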
2.3 Mimir Configuration
```yaml
# config/mimir.yaml
multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  bucket_store:
    sync_dir: /data/tsdb-sync
  filesystem:
    dir: /data/tsdb

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 1

server:
  http_listen_port: 9009

store_gateway:
  sharding_ring:
    replication_factor: 1
```
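Mimir's ingest API is Prometheus-compatible, so an existing Prometheus server can forward everything it scrapes with a one-line `remote_write` block (a minimal fragment; the URL is the same push endpoint Tempo's `metrics_generator` uses later in this chapter):

```yaml
# prometheus.yml (fragment) -- ship scraped metrics to Mimir for long-term storage
remote_write:
  - url: http://mimir:9009/api/v1/push
```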
2.4 Loki Configuration
```yaml
# config/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  allow_structured_metadata: true
  volume_enabled: true
```
2.5 Tempo Configuration
```yaml
# config/tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
        send_exemplars: true
```
2.6 OTel Collector Configuration
```yaml
# config/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  resource:
    attributes:
      - key: service.instance.id
        from_attribute: host.name
        action: insert

exporters:
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  debug:
    verbosity: basic

service:
  pipelines:
    # processors run in list order; batch should come last so it batches the final stream
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp/tempo]
```
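Applications built on an OpenTelemetry SDK can be pointed at this collector with standard environment variables instead of code changes. The variable names below are defined by the OpenTelemetry specification; the values assume the Compose network above:

```yaml
# docker-compose.yaml fragment: environment for an instrumented service
environment:
  - OTEL_SERVICE_NAME=order-service
  - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
  - OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```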
2.7 Grafana Data Source Auto-Configuration
```yaml
# config/grafana/datasources.yaml
apiVersion: 1
datasources:
  - name: Mimir
    uid: mimir # explicit uid so other data sources can reference it
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true
  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: '"traceId":"(\w+)"'
          name: TraceID
          url: '$${__value.raw}' # $$ escapes Grafana's env-var expansion
  - name: Tempo
    uid: tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        filterByTraceID: true
      tracesToMetrics:
        datasourceUid: mimir
      serviceMap:
        datasourceUid: mimir
```
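The `derivedFields` regex is what turns a `traceId` inside a JSON log line into a clickable Tempo link. A quick stdlib check that the provisioned pattern actually matches the shape of log line the app emits (the sample line is illustrative):

```python
import re

# Same pattern as derivedFields.matcherRegex above
MATCHER = r'"traceId":"(\w+)"'

line = ('{"message":"Order created","order_id":"o-42",'
        '"traceId":"0af7651916cd43dd8448eb211c80319c"}')

m = re.search(MATCHER, line)
print(m.group(1) if m else "no match")
# → 0af7651916cd43dd8448eb211c80319c
```

If your logs nest the trace ID differently (e.g. under `trace_id` or without quotes), the regex must be adjusted to match, or the link will silently never appear.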
3. Application Instrumentation
3.1 Python (FastAPI + OpenTelemetry)
```python
# app.py
import logging

from fastapi import FastAPI
from opentelemetry import trace, metrics
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Resource definition: identifies this service on every metric, log, and trace
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})

# Traces setup
trace.set_tracer_provider(TracerProvider(resource=resource))
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)

# Metrics setup
metrics.set_meter_provider(MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint="http://otel-collector:4317")
    )],
))

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Custom metrics
order_counter = meter.create_counter("orders.created", description="Orders created")
order_duration = meter.create_histogram("orders.duration_ms", description="Order processing time")

@app.post("/orders")
async def create_order(order: dict):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("order.customer_id", order["customer_id"])
        result = process_order(order)  # business logic, defined elsewhere
        order_counter.add(1, {"status": "success"})
        logging.info("Order created", extra={
            "order_id": result["id"],
            # trace_id is an int; log the 32-char hex form Tempo and Loki expect
            "traceId": format(span.get_span_context().trace_id, "032x"),
        })
        return result
```
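For the Loki `| json` pipelines and the derived-field regex in the Grafana provisioning to work, the service must actually emit its logs as JSON lines. A minimal stdlib formatter sketch — the field names are chosen to match that regex, and a production setup would more likely reach for a library such as `python-json-logger`:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line: message, level, plus extras like traceId."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        # Pass through fields supplied via `extra={...}` at the call site
        for key in ("traceId", "order_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.info("Order created",
             extra={"order_id": "o-42",
                    "traceId": "0af7651916cd43dd8448eb211c80319c"})
```

With this in place, `{service_name="order-service"} | json` in Loki parses every field into a label, and clicking the `traceId` value jumps straight to the trace in Tempo.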
4. Integration in Grafana
4.1 Logs to Traces Integration
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant G as Grafana
    participant L as Loki
    participant T as Tempo
    participant M as Mimir
    Dev->>G: Search error logs
    G->>L: LogQL query
    L-->>G: Logs + traceId
    Dev->>G: Click traceId
    G->>T: Query trace
    T-->>G: Visualize full trace
    Dev->>G: "View related metrics"
    G->>M: PromQL query
    M-->>G: Metrics at that point in time
```
4.2 LogQL Examples
```logql
# Search error logs
{service_name="order-service"} |= "error" | json | line_format "{{.message}}"

# Filter by a specific traceId
{service_name=~"order-service|payment-service"} | json | traceId="abc123"

# Log volume by level (turns logs into a metric)
sum by (level) (count_over_time({service_name="order-service"} | json [5m]))
```
4.3 TraceQL Examples
```traceql
# Spans slower than 500ms that also returned a 5xx status
{ duration > 500ms && span.http.status_code >= 500 }

# Slow DB queries for a specific service
{ resource.service.name = "order-service" && span.db.system = "postgresql" && duration > 100ms }
```
5. Kubernetes Deployment
```bash
# Deploy the LGTM stack with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
kubectl create namespace monitoring

# Mimir
helm install mimir grafana/mimir-distributed -n monitoring \
  --set mimir.structuredConfig.common.storage.backend=s3 \
  --set mimir.structuredConfig.common.storage.s3.endpoint=minio:9000

# Loki (bucket and S3 credential settings omitted for brevity)
helm install loki grafana/loki -n monitoring \
  --set loki.storage.type=s3

# Tempo
helm install tempo grafana/tempo-distributed -n monitoring

# Grafana
helm install grafana grafana/grafana -n monitoring \
  --set adminPassword=admin
```
6. Alert Configuration
```yaml
# Grafana alert rule (file provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: Service Health
    folder: Alerts
    interval: 1m
    rules:
      - uid: high_error_rate
        title: High Error Rate
        condition: C
        data:
          - refId: A
            datasourceUid: mimir
            model:
              expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              conditions:
                - evaluator:
                    type: gt
                    params: [0.05] # fire when errors exceed 5% of requests
```
7. Quiz
Q1. What are the roles of each component in the LGTM stack?
Loki: Log collection/search, Grafana: Visualization/dashboard, Tempo: Distributed tracing, Mimir: Long-term metric storage/query (Prometheus-compatible).
Q2. What is the relationship between Mimir and Prometheus?
Mimir serves as long-term storage for Prometheus. It is PromQL-compatible and supports multi-tenancy, horizontal scaling, and global queries. Prometheus sends collected metrics to Mimir via remote_write.
Q3. What is the role of the OpenTelemetry Collector?
It is an intermediary agent that receives, processes, and exports Metrics, Logs, and Traces collected from applications. It is vendor-neutral and can route data to various backends.
Q4. Why is Loki lighter than Elasticsearch?
Loki does not index log content -- it only indexes labels. Instead of full-text search, it uses label-based filtering combined with a grep-style approach. This makes the index size dramatically smaller.
Q5. What are the benefits of Traces to Logs to Metrics integration?
From an error trace, you can instantly view related logs and analyze metrics (CPU, memory, error rate) at that point in time, enabling rapid root cause identification.
Q6. What is Tempo's metrics_generator?
It automatically generates RED metrics (Rate, Error, Duration) from trace data and sends them to Mimir. This enables building service performance dashboards from traces alone without separate metric instrumentation.
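As a concrete illustration of Q6, the generated span metrics land in Mimir and can be queried with ordinary PromQL. The metric names below are Tempo's default `spanmetrics` names; verify them against your Tempo version:

```promql
# Request rate per service (the R in RED)
sum by (service) (rate(traces_spanmetrics_calls_total[5m]))

# p99 latency per service (the D in RED)
histogram_quantile(0.99,
  sum by (le, service) (rate(traces_spanmetrics_latency_bucket[5m])))
```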