
Grafana LGTM Stack Complete Guide: Building Unified Observability with Loki + Grafana + Tempo + Mimir


1. What Is the LGTM Stack

LGTM is an open-source observability stack from Grafana Labs:

| Component | Role                      | Alternatives           |
| --------- | ------------------------- | ---------------------- |
| Loki      | Log collection/search     | Elasticsearch, Splunk  |
| Grafana   | Visualization/dashboards  | Kibana, Datadog        |
| Tempo     | Distributed tracing       | Jaeger, Zipkin         |
| Mimir     | Metric storage/query      | Thanos, Cortex         |
graph TB
    subgraph "Applications"
        App1[Service A]
        App2[Service B]
        App3[Service C]
    end

    subgraph "Collection"
        OTel[OpenTelemetry Collector]
        Alloy[Grafana Alloy]
    end

    subgraph "LGTM Stack"
        Mimir[Mimir<br/>Metrics]
        Loki[Loki<br/>Logs]
        Tempo[Tempo<br/>Traces]
        Grafana[Grafana<br/>Dashboard]
    end

    App1 & App2 & App3 -->|OTLP| OTel
    App1 & App2 & App3 -->|logs| Alloy

    OTel -->|metrics| Mimir
    OTel -->|traces| Tempo
    Alloy -->|logs| Loki
    OTel -->|logs| Loki

    Grafana --> Mimir & Loki & Tempo

    style Grafana fill:#ff9,stroke:#333
    style Mimir fill:#f96,stroke:#333
    style Loki fill:#6f9,stroke:#333
    style Tempo fill:#69f,stroke:#333

2. Building LGTM with Docker Compose

2.1 Directory Structure

lgtm-stack/
├── docker-compose.yaml
├── config/
│   ├── mimir.yaml
│   ├── loki.yaml
│   ├── tempo.yaml
│   ├── grafana/
│   │   └── datasources.yaml
│   └── otel-collector.yaml
└── data/
    ├── mimir/
    ├── loki/
    └── tempo/

2.2 docker-compose.yaml

# The top-level `version` key is obsolete in Compose v2 and can be omitted.

services:
  # === Mimir (Metrics) ===
  mimir:
    image: grafana/mimir:2.14.0
    command: ['-config.file=/etc/mimir.yaml']
    volumes:
      - ./config/mimir.yaml:/etc/mimir.yaml
      - ./data/mimir:/data
    ports:
      - '9009:9009'

  # === Loki (Logs) ===
  loki:
    image: grafana/loki:3.3.0
    command: ['-config.file=/etc/loki.yaml']
    volumes:
      - ./config/loki.yaml:/etc/loki.yaml
      - ./data/loki:/loki
    ports:
      - '3100:3100'

  # === Tempo (Traces) ===
  tempo:
    image: grafana/tempo:2.6.0
    command: ['-config.file=/etc/tempo.yaml']
    volumes:
      - ./config/tempo.yaml:/etc/tempo.yaml
      - ./data/tempo:/var/tempo
    ports:
      - '3200:3200' # Tempo API
      - '4317:4317' # OTLP gRPC (via Tempo)

  # === OpenTelemetry Collector ===
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.112.0
    command: ['--config=/etc/otel-collector.yaml']
    volumes:
      - ./config/otel-collector.yaml:/etc/otel-collector.yaml
    ports:
      - '4318:4318' # OTLP HTTP
      - '8889:8889' # Prometheus exporter

  # === Grafana ===
  grafana:
    image: grafana/grafana:11.4.0
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    ports:
      - '3000:3000'
    depends_on:
      - mimir
      - loki
      - tempo
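After `docker compose up -d`, each component exposes a readiness endpoint on the ports published above. A small smoke-test script (a sketch assuming the default localhost port mappings from this Compose file; Grafana uses `/api/health` while the others use `/ready`):

```python
# Smoke-test the LGTM stack's readiness endpoints.
# build_ready_urls is pure; probe() performs the actual HTTP calls.
from urllib.request import urlopen


def build_ready_urls(host: str = "localhost") -> dict[str, str]:
    """Map each LGTM component to its readiness-check URL."""
    ports = {"mimir": 9009, "loki": 3100, "tempo": 3200, "grafana": 3000}
    paths = {"grafana": "/api/health"}  # Grafana differs from the /ready convention
    return {
        name: f"http://{host}:{port}{paths.get(name, '/ready')}"
        for name, port in ports.items()
    }


def probe(urls: dict[str, str]) -> dict[str, bool]:
    """Return True for every component answering HTTP 200."""
    results = {}
    for name, url in urls.items():
        try:
            with urlopen(url, timeout=3) as resp:
                results[name] = resp.status == 200
        except OSError:
            results[name] = False  # not up (yet), connection refused, or timeout
    return results


if __name__ == "__main__":
    print(probe(build_ready_urls()))
```

Loki in particular can take a minute to report ready on first start while it initializes its ring, so rerun the probe rather than assuming a failure.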

2.3 Mimir Configuration

# config/mimir.yaml
multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  bucket_store:
    sync_dir: /data/tsdb-sync
  filesystem:
    dir: /data/tsdb

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 1

server:
  http_listen_port: 9009

store_gateway:
  sharding_ring:
    replication_factor: 1

2.4 Loki Configuration

# config/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  allow_structured_metadata: true
  volume_enabled: true

2.5 Tempo Configuration

# config/tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: '0.0.0.0:4317'

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
        send_exemplars: true

2.6 OTel Collector Configuration

# config/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000

  resource:
    attributes:
      - key: service.instance.id
        from_attribute: host.name
        action: insert

exporters:
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp

  otlphttp/loki:
    endpoint: http://loki:3100/otlp

  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  debug:
    verbosity: basic

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlp/tempo]

2.7 Grafana Data Source Auto-Configuration

# config/grafana/datasources.yaml
apiVersion: 1

datasources:
  - name: Mimir
    uid: mimir # explicit uid so datasourceUid references below resolve
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true

  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: '"traceId":"(\w+)"'
          name: TraceID
          url: '$${__value.raw}'

  - name: Tempo
    uid: tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        filterByTraceID: true
      tracesToMetrics:
        datasourceUid: mimir
      serviceMap:
        datasourceUid: mimir

3. Application Instrumentation

3.1 Python (FastAPI + OpenTelemetry)

# app.py
import logging

from fastapi import FastAPI
from opentelemetry import trace, metrics
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Resource definition
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})

# Traces setup
trace.set_tracer_provider(TracerProvider(resource=resource))
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)

# Metrics setup
metrics.set_meter_provider(MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint="http://otel-collector:4317")
    )],
))

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)

tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Custom metrics
order_counter = meter.create_counter("orders.created", description="Orders created")
order_duration = meter.create_histogram("orders.duration_ms", description="Order processing time")

@app.post("/orders")
async def create_order(order: dict):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("order.customer_id", order["customer_id"])

        # Business logic (process_order is application code, not shown here)
        result = process_order(order)

        order_counter.add(1, {"status": "success"})
        # Log the trace id as 32-char lowercase hex so it matches the
        # derivedFields regex configured in the Loki data source
        logging.info("Order created", extra={
            "order_id": result["id"],
            "traceId": format(span.get_span_context().trace_id, "032x"),
        })
        return result

4. Integration in Grafana

4.1 Logs to Traces Integration

sequenceDiagram
    participant Dev as Developer
    participant G as Grafana
    participant L as Loki
    participant T as Tempo
    participant M as Mimir

    Dev->>G: Search error logs
    G->>L: LogQL query
    L-->>G: Logs + traceId
    Dev->>G: Click traceId
    G->>T: Query trace
    T-->>G: Visualize full trace
    Dev->>G: "View related metrics"
    G->>M: PromQL query
    M-->>G: Metrics at that point in time

4.2 LogQL Examples

# Search error logs
{service_name="order-service"} |= "error" | json | line_format "{{.message}}"

# Filter by specific traceId
{service_name=~"order-service|payment-service"} | json | traceId="abc123"

# Log volume (metric-like)
sum by (level) (count_over_time({service_name="order-service"} | json [5m]))
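The third query parses each line with `| json`, counts lines per window, and sums by the extracted `level` label. The aggregation it performs can be sketched in plain Python (purely illustrative; this is not how Loki evaluates LogQL internally):

```python
import json
from collections import Counter


def sum_by_level(raw_lines: list[str]) -> Counter:
    """sum by (level) (count_over_time(... | json [window])) over one window."""
    counts = Counter()
    for raw in raw_lines:
        record = json.loads(raw)  # the `| json` parsing stage
        counts[record.get("level", "unknown")] += 1
    return counts


lines = [
    '{"level": "info", "message": "order created"}',
    '{"level": "error", "message": "payment failed"}',
    '{"level": "info", "message": "order shipped"}',
]
print(sum_by_level(lines))  # Counter({'info': 2, 'error': 1})
```

In Grafana this query renders as a time series per level, which makes it a cheap substitute for an explicit log-volume metric.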

4.3 TraceQL Examples

# Traces longer than 500ms that also returned HTTP 5xx
{ duration > 500ms && span.http.status_code >= 500 }

# Slow DB queries for a specific service
{ resource.service.name = "order-service" && span.db.system = "postgresql" && duration > 100ms }

5. Kubernetes Deployment

# Deploy LGTM stack with Helm
helm repo add grafana https://grafana.github.io/helm-charts

# Mimir
helm install mimir grafana/mimir-distributed -n monitoring \
  --set mimir.structuredConfig.common.storage.backend=s3 \
  --set mimir.structuredConfig.common.storage.s3.endpoint=minio:9000

# Loki
helm install loki grafana/loki -n monitoring \
  --set loki.storage.type=s3

# Tempo
helm install tempo grafana/tempo-distributed -n monitoring

# Grafana
helm install grafana grafana/grafana -n monitoring \
  --set adminPassword=admin

6. Alert Configuration

# Grafana Alert Rule (via provisioning)
apiVersion: 1
groups:
  - orgId: 1
    name: Service Health
    folder: Alerts
    interval: 1m
    rules:
      - uid: high_error_rate
        title: High Error Rate
        condition: C
        data:
          - refId: A
            datasourceUid: mimir
            model:
              expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              expression: A # evaluate the threshold against query A
              conditions:
                - evaluator:
                    type: gt
                    params: [0.05] # More than 5% errors
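Query A computes the ratio of 5xx request rate to total request rate, and condition C fires when that ratio exceeds 0.05. The arithmetic the rule evaluates, sketched as a plain function:

```python
def should_fire(rate_5xx: float, rate_total: float, threshold: float = 0.05) -> bool:
    """Mirror of the provisioned rule: fire when the 5xx ratio exceeds 5%."""
    if rate_total == 0:
        return False  # no traffic: the PromQL ratio is NaN, so nothing fires
    return (rate_5xx / rate_total) > threshold


print(should_fire(3.0, 100.0))  # 3% error rate -> False
print(should_fire(8.0, 100.0))  # 8% error rate -> True
```

The zero-traffic branch is worth keeping in mind: in PromQL a 0/0 division yields NaN, which silently never crosses the threshold, so very low-traffic services may need an additional absent/no-data alert.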

7. Quiz

Q1. What are the roles of each component in the LGTM stack?

Loki: Log collection/search, Grafana: Visualization/dashboard, Tempo: Distributed tracing, Mimir: Long-term metric storage/query (Prometheus-compatible).

Q2. What is the relationship between Mimir and Prometheus?

Mimir serves as long-term storage for Prometheus. It is PromQL-compatible and supports multi-tenancy, horizontal scaling, and global queries. Prometheus sends collected metrics to Mimir via remote_write.
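A minimal `prometheus.yml` fragment for that remote_write hookup, assuming the Mimir service name and port from the Compose setup above:

```yaml
# prometheus.yml -- forward everything Prometheus scrapes to Mimir
remote_write:
  - url: http://mimir:9009/api/v1/push
```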

Q3. What is the role of the OpenTelemetry Collector?

It is an intermediary agent that receives, processes, and exports Metrics, Logs, and Traces collected from applications. It is vendor-neutral and can route data to various backends.

Q4. Why is Loki lighter than Elasticsearch?

Loki indexes only labels, never the log content itself. Instead of full-text search, it combines label-based filtering with a grep-style scan over compressed chunks, which keeps the index dramatically smaller than an Elasticsearch inverted index.

Q5. What are the benefits of Traces to Logs to Metrics integration?

From an error trace, you can instantly view related logs and analyze metrics (CPU, memory, error rate) at that point in time, enabling rapid root cause identification.

Q6. What is Tempo's metrics_generator?

It automatically generates RED metrics (Rate, Error, Duration) from trace data and sends them to Mimir. This enables building service performance dashboards from traces alone without separate metric instrumentation.
