Complete Guide to the Grafana LGTM Stack: Unified Observability with Loki + Grafana + Tempo + Mimir
- 1. What Is the LGTM Stack?
- 2. Building LGTM with Docker Compose
- 3. Application Instrumentation
- 4. Wiring It Together in Grafana
- 5. Kubernetes Deployment
- 6. Alerting
- 7. Quiz
1. What Is the LGTM Stack?
LGTM is Grafana Labs' open-source observability stack:
| Component | Role | Alternatives |
|---|---|---|
| Loki | Log aggregation/search | Elasticsearch, Splunk |
| Grafana | Visualization/dashboards | Kibana, Datadog |
| Tempo | Distributed tracing | Jaeger, Zipkin |
| Mimir | Metrics storage/querying | Thanos, Cortex |
graph TB
subgraph "Applications"
App1[Service A]
App2[Service B]
App3[Service C]
end
subgraph "Collection"
OTel[OpenTelemetry Collector]
Alloy[Grafana Alloy]
end
subgraph "LGTM Stack"
Mimir[Mimir<br/>Metrics]
Loki[Loki<br/>Logs]
Tempo[Tempo<br/>Traces]
Grafana[Grafana<br/>Dashboard]
end
App1 & App2 & App3 -->|OTLP| OTel
App1 & App2 & App3 -->|logs| Alloy
OTel -->|metrics| Mimir
OTel -->|traces| Tempo
Alloy -->|logs| Loki
OTel -->|logs| Loki
Grafana --> Mimir & Loki & Tempo
style Grafana fill:#ff9,stroke:#333
style Mimir fill:#f96,stroke:#333
style Loki fill:#6f9,stroke:#333
style Tempo fill:#69f,stroke:#333
2. Building LGTM with Docker Compose
2.1 Directory Layout
lgtm-stack/
├── docker-compose.yaml
├── config/
│ ├── mimir.yaml
│ ├── loki.yaml
│ ├── tempo.yaml
│ ├── grafana/
│ │ └── datasources.yaml
│ └── otel-collector.yaml
└── data/
├── mimir/
├── loki/
└── tempo/
2.2 docker-compose.yaml
services:
# === Mimir (Metrics) ===
mimir:
image: grafana/mimir:2.14.0
command: ['-config.file=/etc/mimir.yaml']
volumes:
- ./config/mimir.yaml:/etc/mimir.yaml
- ./data/mimir:/data
ports:
- '9009:9009'
# === Loki (Logs) ===
loki:
image: grafana/loki:3.3.0
command: ['-config.file=/etc/loki.yaml']
volumes:
- ./config/loki.yaml:/etc/loki.yaml
- ./data/loki:/loki
ports:
- '3100:3100'
# === Tempo (Traces) ===
tempo:
image: grafana/tempo:2.6.0
command: ['-config.file=/etc/tempo.yaml']
volumes:
- ./config/tempo.yaml:/etc/tempo.yaml
- ./data/tempo:/var/tempo
ports:
- '3200:3200' # Tempo API
- '4317:4317' # OTLP gRPC (via Tempo)
# === OpenTelemetry Collector ===
otel-collector:
image: otel/opentelemetry-collector-contrib:0.112.0
command: ['--config=/etc/otel-collector.yaml']
volumes:
- ./config/otel-collector.yaml:/etc/otel-collector.yaml
ports:
- '4318:4318' # OTLP HTTP
- '8889:8889' # Prometheus exporter
# === Grafana ===
grafana:
image: grafana/grafana:11.4.0
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
- GF_AUTH_ANONYMOUS_ENABLED=true
volumes:
- ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
ports:
- '3000:3000'
depends_on:
- mimir
- loki
- tempo
2.3 Mimir Configuration
# config/mimir.yaml
multitenancy_enabled: false
blocks_storage:
backend: filesystem
bucket_store:
sync_dir: /data/tsdb-sync
filesystem:
dir: /data/tsdb
compactor:
data_dir: /data/compactor
sharding_ring:
kvstore:
store: memberlist
distributor:
ring:
kvstore:
store: memberlist
ingester:
ring:
kvstore:
store: memberlist
replication_factor: 1
server:
http_listen_port: 9009
store_gateway:
sharding_ring:
replication_factor: 1
2.4 Loki Configuration
# config/loki.yaml
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2024-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
limits_config:
allow_structured_metadata: true
volume_enabled: true
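With this configuration running, Loki can be smoke-tested directly over its HTTP push API (`POST http://localhost:3100/loki/api/v1/push`). A minimal sketch of building the request payload — the labels and log lines are made up for illustration, and only the payload is constructed here (sending it requires the compose stack above to be up):

```python
import json
import time

def build_loki_push_payload(labels: dict, lines: list[str]) -> dict:
    """Build a payload for Loki's push API (POST /loki/api/v1/push).

    Loki expects nanosecond-precision string timestamps, and a label set
    per stream; the label set is what gets indexed, not the log content.
    """
    now_ns = str(time.time_ns())
    return {
        "streams": [
            {
                "stream": labels,  # indexed stream labels
                "values": [[now_ns, line] for line in lines],  # [ts, log line]
            }
        ]
    }

payload = build_loki_push_payload(
    {"service_name": "order-service", "level": "info"},
    ["order created", "order shipped"],
)
print(json.dumps(payload, indent=2))
```

POSTing this JSON with `Content-Type: application/json` is enough to make the stream queryable in Grafana's Explore view.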
2.5 Tempo Configuration
# config/tempo.yaml
server:
http_listen_port: 3200
distributor:
receivers:
otlp:
protocols:
grpc:
endpoint: '0.0.0.0:4317'
storage:
trace:
backend: local
local:
path: /var/tempo/traces
wal:
path: /var/tempo/wal
metrics_generator:
registry:
external_labels:
source: tempo
storage:
path: /var/tempo/generator/wal
remote_write:
- url: http://mimir:9009/api/v1/push
send_exemplars: true
2.6 OTel Collector Configuration
# config/otel-collector.yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 5s
send_batch_size: 1000
resource:
attributes:
- key: service.instance.id
from_attribute: host.name
action: insert
exporters:
otlphttp/mimir:
endpoint: http://mimir:9009/otlp
otlphttp/loki:
endpoint: http://loki:3100/otlp
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
debug:
verbosity: basic
service:
pipelines:
metrics:
receivers: [otlp]
processors: [batch, resource]
exporters: [otlphttp/mimir]
logs:
receivers: [otlp]
processors: [batch, resource]
exporters: [otlphttp/loki]
traces:
receivers: [otlp]
processors: [batch, resource]
exporters: [otlp/tempo]
2.7 Auto-Provisioning Grafana Data Sources
# config/grafana/datasources.yaml
apiVersion: 1
datasources:
- name: Mimir
type: prometheus
access: proxy
url: http://mimir:9009/prometheus
isDefault: true
- name: Loki
type: loki
access: proxy
url: http://loki:3100
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: '"traceId":"(\w+)"'
name: TraceID
url: '$${__value.raw}'
- name: Tempo
type: tempo
access: proxy
url: http://tempo:3200
jsonData:
tracesToLogsV2:
datasourceUid: loki
filterByTraceID: true
tracesToMetrics:
datasourceUid: mimir
serviceMap:
datasourceUid: mimir
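The `derivedFields.matcherRegex` above is what turns a `traceId` embedded in a JSON log line into a clickable Tempo link. Its behavior can be checked with a quick regex test (the log line is a made-up example):

```python
import re

# The same pattern as matcherRegex in the datasource provisioning above.
matcher = re.compile(r'"traceId":"(\w+)"')

log_line = '{"level":"error","message":"payment failed","traceId":"4bf92f3577b34da6a3ce929d0e0e4736"}'
m = matcher.search(log_line)
if m:
    trace_id = m.group(1)
    # Grafana substitutes this capture group into the derived field's
    # url template (${__value.raw}) to build the Tempo link.
    print(trace_id)  # 4bf92f3577b34da6a3ce929d0e0e4736
```

Note the pattern only matches `\w+` inside double quotes, so application logs must emit the trace id as a quoted hex string for the link to appear.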
3. Application Instrumentation
3.1 Python (FastAPI + OpenTelemetry)
# app.py
import logging
import time

from fastapi import FastAPI
from opentelemetry import trace, metrics
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

# Define the Resource (attached to every span and metric)
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})

# Traces setup
trace.set_tracer_provider(TracerProvider(resource=resource))
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)

# Metrics setup
metrics.set_meter_provider(MeterProvider(
    resource=resource,
    metric_readers=[PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint="http://otel-collector:4317")
    )]
))

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Custom metrics
order_counter = meter.create_counter("orders.created", description="Orders created")
order_duration = meter.create_histogram("orders.duration_ms", description="Order processing time")

@app.post("/orders")
async def create_order(order: dict):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("order.customer_id", order["customer_id"])
        start = time.perf_counter()
        result = process_order(order)  # business logic (not shown)
        order_duration.record((time.perf_counter() - start) * 1000)
        order_counter.add(1, {"status": "success"})
        logging.info("Order created", extra={
            "order_id": result["id"],
            # format as hex so it matches the derivedFields regex in Grafana
            "traceId": format(span.get_span_context().trace_id, "032x"),
        })
        return result
4. Wiring It Together in Grafana
4.1 Logs → Traces Correlation
sequenceDiagram
participant Dev as Developer
participant G as Grafana
participant L as Loki
participant T as Tempo
participant M as Mimir
Dev->>G: Search error logs
G->>L: LogQL query
L-->>G: Logs + traceId
Dev->>G: Click the traceId
G->>T: Fetch trace
T-->>G: Full trace visualization
Dev->>G: "View related metrics"
G->>M: PromQL query
M-->>G: Metrics at that point in time
4.2 LogQL Examples
# Search error logs
{service_name="order-service"} |= "error" | json | line_format "{{.message}}"
# Filter by a specific traceId
{service_name=~"order-service|payment-service"} | json | traceId="abc123"
# Log volume (treated like a metric)
sum by (level) (count_over_time({service_name="order-service"} | json [5m]))
4.3 TraceQL Examples
# Traces slower than 500 ms that returned a 5xx
{ duration > 500ms && span.http.status_code >= 500 }
# Slow DB queries from a specific service
{ resource.service.name = "order-service" && span.db.system = "postgresql" && duration > 100ms }
5. Kubernetes Deployment
# Deploy the LGTM stack with Helm
helm repo add grafana https://grafana.github.io/helm-charts
# Mimir
helm install mimir grafana/mimir-distributed -n monitoring \
--set mimir.structuredConfig.common.storage.backend=s3 \
--set mimir.structuredConfig.common.storage.s3.endpoint=minio:9000
# Loki
helm install loki grafana/loki -n monitoring \
--set loki.storage.type=s3
# Tempo
helm install tempo grafana/tempo-distributed -n monitoring
# Grafana
helm install grafana grafana/grafana -n monitoring \
--set adminPassword=admin
6. Alerting
# Grafana Alert Rule (via provisioning)
apiVersion: 1
groups:
- orgId: 1
name: Service Health
folder: Alerts
interval: 1m
rules:
- uid: high_error_rate
title: High Error Rate
condition: C
data:
- refId: A
datasourceUid: mimir
model:
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
- refId: C
datasourceUid: __expr__
model:
type: threshold
expression: A
conditions:
- evaluator:
type: gt
params: [0.05] # fires above a 5% error rate
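The rule above amounts to the following check. An illustrative sketch with made-up counter rates — the `0.05` threshold mirrors the evaluator's `params`:

```python
def error_rate(requests_by_status: dict[str, float]) -> float:
    """Mirror of the alert expression: 5xx request rate / total request rate."""
    total = sum(requests_by_status.values())
    errors = sum(v for s, v in requests_by_status.items() if s.startswith("5"))
    return errors / total if total else 0.0

THRESHOLD = 0.05  # fires above a 5% error rate

# Hypothetical per-status request rates (req/s) from http_requests_total
rates = {"200": 950.0, "404": 10.0, "500": 30.0, "503": 10.0}
rate = error_rate(rates)
print(rate, rate > THRESHOLD)  # 0.04 False — below the 5% threshold
```

Note the `~"5.."` regex in the PromQL counts all 5xx statuses, while 4xx responses count toward the denominator only.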
7. Quiz
Q1. What does each component of the LGTM stack do?
Loki: log aggregation/search; Grafana: visualization/dashboards; Tempo: distributed tracing; Mimir: long-term metrics storage/querying (Prometheus-compatible).
Q2. How do Mimir and Prometheus relate?
Mimir is long-term storage for Prometheus. It is PromQL-compatible and adds multi-tenancy, horizontal scaling, and global querying. Prometheus forwards the metrics it scrapes to Mimir via remote_write.
Q3. What is the OpenTelemetry Collector's role?
An intermediary agent that receives, processes, and exports the metrics, logs, and traces emitted by applications. It is vendor-neutral and can route each signal to a variety of backends.
Q4. Why is Loki lighter than Elasticsearch?
Loki indexes only labels, never the log content. Instead of full-text search it filters by labels and then greps within the matching streams, so its index stays dramatically smaller.
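That label-index-plus-grep model can be sketched as a toy in-memory store (`MiniLoki` is a hypothetical illustration, not Loki's actual code):

```python
class MiniLoki:
    """Toy model of Loki's storage: only label sets are indexed; log
    content is scanned grep-style within the streams a selector matches."""

    def __init__(self):
        self.streams: dict[frozenset, list[str]] = {}  # label set -> log lines

    def push(self, labels: dict, line: str) -> None:
        self.streams.setdefault(frozenset(labels.items()), []).append(line)

    def query(self, selector: dict, contains: str = "") -> list[str]:
        want = set(selector.items())
        out = []
        for label_set, lines in self.streams.items():
            if want <= label_set:  # index lookup touches labels only
                out += [l for l in lines if contains in l]  # grep phase
        return out

db = MiniLoki()
db.push({"service_name": "order-service"}, "order created id=1")
db.push({"service_name": "order-service"}, "error: payment timeout")
db.push({"service_name": "auth-service"}, "error: bad token")
print(db.query({"service_name": "order-service"}, contains="error"))
```

The index never grows with log volume, only with label-set cardinality — which is also why Loki's docs warn against high-cardinality labels.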
Q5. What is the benefit of Traces → Logs → Metrics correlation?
From an error trace you can jump straight to the related logs and inspect the metrics at that moment (CPU, memory, error rate), pinpointing the **root cause** quickly.
Q6. What is Tempo's metrics_generator?
It automatically derives **RED metrics (Rate, Errors, Duration)** from trace data and writes them to Mimir, so a service-performance dashboard can be built from traces alone, with no separate metrics instrumentation.
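The kind of aggregation the metrics_generator performs can be approximated from a list of finished spans. A sketch with hypothetical span fields (`status_code`, `duration_ms`) and a deliberately crude p95:

```python
def red_metrics(spans: list[dict], window_s: float) -> dict:
    """Compute Rate/Errors/Duration from finished spans — roughly what
    Tempo's metrics_generator derives per service from trace data."""
    n = len(spans)
    errors = sum(1 for s in spans if s["status_code"] >= 500)
    durations = sorted(s["duration_ms"] for s in spans)
    # crude nearest-rank p95; real implementations use histogram buckets
    p95 = durations[max(0, int(0.95 * n) - 1)] if durations else 0.0
    return {
        "rate_per_s": n / window_s,       # R: request rate
        "error_ratio": errors / n if n else 0.0,  # E: error ratio
        "duration_p95_ms": p95,           # D: latency percentile
    }

spans = [
    {"status_code": 200, "duration_ms": 12.0},
    {"status_code": 200, "duration_ms": 30.0},
    {"status_code": 500, "duration_ms": 480.0},
    {"status_code": 200, "duration_ms": 25.0},
]
print(red_metrics(spans, window_s=60.0))
```

In the real pipeline these values land in Mimir via the `remote_write` block under `metrics_generator.storage` shown in section 2.5.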