The Complete Guide to the Grafana LGTM Stack: Building Unified Observability with Loki + Grafana + Tempo + Mimir


1. What Is the LGTM Stack

**LGTM** is Grafana Labs' open-source observability stack:

| Component | Role | Alternatives |
| ----------- | ---------------------------- | --------------------- |
| **L**oki | Log collection and search | Elasticsearch, Splunk |
| **G**rafana | Visualization and dashboards | Kibana, Datadog |
| **T**empo | Distributed tracing | Jaeger, Zipkin |
| **M**imir | Metric storage and querying | Thanos, Cortex |

    graph TB
        subgraph "Applications"
            App1[Service A]
            App2[Service B]
            App3[Service C]
        end

        subgraph "Collection"
            OTel[OpenTelemetry Collector]
            Alloy[Grafana Alloy]
        end

        subgraph "LGTM Stack"
            Mimir[Mimir<br/>Metrics]
            Loki[Loki<br/>Logs]
            Tempo[Tempo<br/>Traces]
            Grafana[Grafana<br/>Dashboard]
        end

        App1 & App2 & App3 -->|OTLP| OTel
        App1 & App2 & App3 -->|logs| Alloy
        OTel -->|metrics| Mimir
        OTel -->|traces| Tempo
        Alloy -->|logs| Loki
        OTel -->|logs| Loki
        Grafana --> Mimir & Loki & Tempo

        style Grafana fill:#ff9,stroke:#333
        style Mimir fill:#f96,stroke:#333
        style Loki fill:#6f9,stroke:#333
        style Tempo fill:#69f,stroke:#333

2. Building LGTM with Docker Compose

2.1 Directory Structure

    lgtm-stack/
    ├── docker-compose.yaml
    ├── config/
    │   ├── mimir.yaml
    │   ├── loki.yaml
    │   ├── tempo.yaml
    │   ├── grafana/
    │   │   └── datasources.yaml
    │   └── otel-collector.yaml
    └── data/
        ├── mimir/
        ├── loki/
        └── tempo/

2.2 docker-compose.yaml

    version: '3.8'

    services:
      # === Mimir (Metrics) ===
      mimir:
        image: grafana/mimir:2.14.0
        command: ['-config.file=/etc/mimir.yaml']
        volumes:
          - ./config/mimir.yaml:/etc/mimir.yaml
          - ./data/mimir:/data
        ports:
          - '9009:9009'

      # === Loki (Logs) ===
      loki:
        image: grafana/loki:3.3.0
        command: ['-config.file=/etc/loki.yaml']
        volumes:
          - ./config/loki.yaml:/etc/loki.yaml
          - ./data/loki:/loki
        ports:
          - '3100:3100'

      # === Tempo (Traces) ===
      tempo:
        image: grafana/tempo:2.6.0
        command: ['-config.file=/etc/tempo.yaml']
        volumes:
          - ./config/tempo.yaml:/etc/tempo.yaml
          - ./data/tempo:/var/tempo
        ports:
          - '3200:3200' # Tempo API
          - '4317:4317' # OTLP gRPC (via Tempo)

      # === OpenTelemetry Collector ===
      otel-collector:
        image: otel/opentelemetry-collector-contrib:0.112.0
        command: ['--config=/etc/otel-collector.yaml']
        volumes:
          - ./config/otel-collector.yaml:/etc/otel-collector.yaml
        ports:
          - '4318:4318' # OTLP HTTP
          - '8889:8889' # Prometheus exporter

      # === Grafana ===
      grafana:
        image: grafana/grafana:11.4.0
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
          - GF_AUTH_ANONYMOUS_ENABLED=true
        volumes:
          - ./config/grafana/datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
        ports:
          - '3000:3000'
        depends_on:
          - mimir
          - loki
          - tempo
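
After the stack is started, a quick readiness probe can confirm that each service is answering. The sketch below is an illustration rather than part of the original guide: it assumes the host ports published in the Compose file above and polls the `/ready` endpoints of Mimir, Loki, and Tempo plus Grafana's `/api/health`. The names `READY_URLS`, `is_ready`, and `check_stack` are hypothetical helpers.

```python
import urllib.request

# Readiness endpoints, assuming the host ports from docker-compose.yaml.
READY_URLS = {
    "mimir": "http://localhost:9009/ready",
    "loki": "http://localhost:3100/ready",
    "tempo": "http://localhost:3200/ready",
    "grafana": "http://localhost:3000/api/health",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False


def check_stack(urls=READY_URLS) -> dict:
    """Probe every service and return a name -> ready mapping."""
    return {name: is_ready(url) for name, url in urls.items()}
```

Calling `check_stack()` right after startup may report some services as not ready for a short while; retrying after a few seconds is usually enough.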

2.3 Mimir Configuration

    # config/mimir.yaml
    multitenancy_enabled: false

    blocks_storage:
      backend: filesystem
      bucket_store:
        sync_dir: /data/tsdb-sync
      filesystem:
        dir: /data/tsdb

    compactor:
      data_dir: /data/compactor
      sharding_ring:
        kvstore:
          store: memberlist

    distributor:
      ring:
        kvstore:
          store: memberlist

    ingester:
      ring:
        kvstore:
          store: memberlist
        replication_factor: 1

    server:
      http_listen_port: 9009

    store_gateway:
      sharding_ring:
        replication_factor: 1
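
Because Mimir exposes a Prometheus-compatible HTTP API, a stored metric can be read back with an ordinary instant query. The following is a minimal sketch under assumptions: it targets the host port `9009` from the Compose file and the `/prometheus` prefix used by the Grafana data source later in this guide; `instant_query_url` and `instant_query` are illustrative names.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Mimir is reachable on the host port published by Compose.
MIMIR_BASE = "http://localhost:9009/prometheus"


def instant_query_url(promql: str, base: str = MIMIR_BASE) -> str:
    """Build the URL for a Prometheus-style instant query against Mimir."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(promql: str, base: str = MIMIR_BASE) -> list:
    """Run the query and return the 'result' list of the JSON response."""
    with urllib.request.urlopen(instant_query_url(promql, base), timeout=5) as resp:
        body = json.load(resp)
    return body["data"]["result"]
```

For example, `instant_query("up")` returns one entry per series once metrics have been ingested.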

2.4 Loki Configuration

    # config/loki.yaml
    auth_enabled: false

    server:
      http_listen_port: 3100

    common:
      path_prefix: /loki
      storage:
        filesystem:
          chunks_directory: /loki/chunks
          rules_directory: /loki/rules
      replication_factor: 1
      ring:
        kvstore:
          store: inmemory

    schema_config:
      configs:
        - from: 2024-01-01
          store: tsdb
          object_store: filesystem
          schema: v13
          index:
            prefix: index_
            period: 24h

    limits_config:
      allow_structured_metadata: true
      volume_enabled: true
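
To check that Loki accepts writes, a test line can be sent to its push API. The sketch below assumes the host port `3100` from the Compose file and uses Loki's JSON push format (one stream of labels with `[timestamp_ns, line]` pairs). The helper names `build_push_payload` and `push` are made up for this example.

```python
import json
import time
import urllib.request
from typing import Optional

# Assumption: Loki is reachable on the host port published by Compose.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_payload(labels: dict, lines: list, ts_ns: Optional[int] = None) -> dict:
    """Wrap log lines in the JSON body expected by /loki/api/v1/push."""
    # Loki expects the timestamp as a string of Unix epoch nanoseconds.
    ts = str(time.time_ns() if ts_ns is None else ts_ns)
    return {"streams": [{"stream": labels, "values": [[ts, line] for line in lines]}]}


def push(payload: dict, url: str = LOKI_PUSH_URL) -> int:
    """POST the payload and return the HTTP status (204 on success)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A call such as `push(build_push_payload({"service_name": "order-service"}, ["hello"]))` should make the line searchable under that label.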

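2.5 Tempo Configuration

    # config/tempo.yaml
    server:
      http_listen_port: 3200

    distributor:
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: '0.0.0.0:4317'

    storage:
      trace:
        backend: local
        local:
          path: /var/tempo/traces
        wal:
          path: /var/tempo/wal

    metrics_generator:
      registry:
        external_labels:
          source: tempo
      storage:
        path: /var/tempo/generator/wal
        remote_write:
          - url: http://mimir:9009/api/v1/push
            send_exemplars: true

    # Not in the original config: the metrics generator stays idle until its
    # processors are enabled, so this block is added as an assumption.
    overrides:
      defaults:
        metrics_generator:
          processors: [service-graphs, span-metrics]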
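
Once traces are flowing, a single trace can be fetched from Tempo's HTTP API by ID. The helper below is a small sketch, assuming the host port `3200` from the Compose file and 128-bit trace IDs written as 32 hex characters; `trace_url` is a hypothetical name.

```python
import re

# Assumption: Tempo's API is reachable on the host port published by Compose.
TEMPO_BASE = "http://localhost:3200"
_HEX32 = re.compile(r"[0-9a-f]{32}")


def trace_url(trace_id: str, base: str = TEMPO_BASE) -> str:
    """Return the Tempo URL that serves the trace with the given ID."""
    tid = trace_id.strip().lower()
    if not _HEX32.fullmatch(tid):
        raise ValueError(f"expected a 32-character hex trace ID, got {trace_id!r}")
    return f"{base}/api/traces/{tid}"
```

Validating the ID up front avoids sending requests for values that were logged as integers or truncated strings.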

2.6 OTel Collector Configuration

    # config/otel-collector.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1000
      resource:
        attributes:
          - key: service.instance.id
            from_attribute: host.name
            action: insert

    exporters:
      otlphttp/mimir:
        endpoint: http://mimir:9009/otlp
      otlphttp/loki:
        endpoint: http://loki:3100/otlp
      otlp/tempo:
        endpoint: tempo:4317
        tls:
          insecure: true
      debug:
        verbosity: basic

    service:
      pipelines:
        # Processors run in the listed order; batch goes last so that
        # attributes are set before the data is grouped for export.
        metrics:
          receivers: [otlp]
          processors: [resource, batch]
          exporters: [otlphttp/mimir]
        logs:
          receivers: [otlp]
          processors: [resource, batch]
          exporters: [otlphttp/loki]
        traces:
          receivers: [otlp]
          processors: [resource, batch]
          exporters: [otlp/tempo]
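
The two processors in this config are easier to reason about with a small model. The Python below is only a sketch of their behavior, not Collector code: `insert_from_attribute` mimics the `insert` action (copy `host.name` into `service.instance.id` only when the key is absent), and `batches` mimics size-based flushing with `send_batch_size`. The timeout-based flush of the real batch processor is not modeled.

```python
from typing import Iterable, Iterator, List


def insert_from_attribute(attrs: dict, key: str, from_attribute: str) -> dict:
    """Model of action=insert: set `key` from another attribute if `key` is absent."""
    out = dict(attrs)
    if key not in out and from_attribute in out:
        out[key] = out[from_attribute]
    return out


def batches(items: Iterable, send_batch_size: int) -> Iterator[List]:
    """Model of size-based batching: emit a batch whenever it reaches the limit."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) >= send_batch_size:
            yield batch
            batch = []
    if batch:
        # In the real processor the timeout (5s above) flushes a partial batch.
        yield batch
```

An existing `service.instance.id` is left untouched, which is why `insert` is a safe default when some services already set the attribute themselves.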

2.7 Provisioning Grafana Data Sources

    # config/grafana/datasources.yaml
    apiVersion: 1

    datasources:
      - name: Mimir
        uid: mimir # explicit uid so the datasourceUid references below resolve
        type: prometheus
        access: proxy
        url: http://mimir:9009/prometheus
        isDefault: true

      - name: Loki
        uid: loki
        type: loki
        access: proxy
        url: http://loki:3100
        jsonData:
          derivedFields:
            - datasourceUid: tempo
              matcherRegex: '"traceId":"(\w+)"'
              name: TraceID
              url: '$${__value.raw}'

      - name: Tempo
        uid: tempo
        type: tempo
        access: proxy
        url: http://tempo:3200
        jsonData:
          tracesToLogsV2:
            datasourceUid: loki
            filterByTraceID: true
          tracesToMetrics:
            datasourceUid: mimir
          serviceMap:
            datasourceUid: mimir
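
The derived field only produces a link when `matcherRegex` matches the raw log line, so it is worth testing the pattern against a sample line. The sketch below uses Python's `re` as a stand-in for Grafana's regex engine (an assumption that holds for a pattern this simple) and a made-up trace ID. It shows that the pattern matches compact JSON but not JSON with a space after the colon.

```python
import json
import re
from typing import Optional

# Same pattern as matcherRegex in datasources.yaml.
MATCHER = re.compile(r'"traceId":"(\w+)"')


def extract_trace_id(line: str) -> Optional[str]:
    """Return the captured trace ID, or None if the line does not match."""
    m = MATCHER.search(line)
    return m.group(1) if m else None


record = {"message": "Order created", "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"}
compact = json.dumps(record, separators=(",", ":"))  # "traceId":"4bf9..."
spaced = json.dumps(record)                           # "traceId": "4bf9..."
```

If the application emits JSON with default separators, either switch to compact output or loosen the pattern, for example by allowing optional whitespace after the colon.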

3. Application Instrumentation

3.1 Python (FastAPI + OpenTelemetry)

    # app.py
    import logging
    import time

    from fastapi import FastAPI
    from opentelemetry import trace, metrics
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

    # Define the Resource
    resource = Resource.create({
        "service.name": "order-service",
        "service.version": "1.0.0",
        "deployment.environment": "production",
    })

    # Traces
    trace.set_tracer_provider(TracerProvider(resource=resource))
    trace.get_tracer_provider().add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
    )

    # Metrics
    metrics.set_meter_provider(MeterProvider(
        resource=resource,
        metric_readers=[PeriodicExportingMetricReader(
            OTLPMetricExporter(endpoint="http://otel-collector:4317")
        )],
    ))

    app = FastAPI()
    FastAPIInstrumentor.instrument_app(app)

    tracer = trace.get_tracer(__name__)
    meter = metrics.get_meter(__name__)

    # Custom metrics
    order_counter = meter.create_counter("orders.created", description="Orders created")
    order_duration = meter.create_histogram("orders.duration_ms", unit="ms", description="Order processing time")


    def process_order(order: dict) -> dict:
        # Hypothetical placeholder: the original does not show this function.
        return {"id": "order-1", **order}


    @app.post("/orders")
    async def create_order(order: dict):
        with tracer.start_as_current_span("create_order") as span:
            span.set_attribute("order.customer_id", order["customer_id"])

            # Business logic, timed so the histogram defined above is actually recorded
            started = time.perf_counter()
            result = process_order(order)
            order_duration.record((time.perf_counter() - started) * 1000)

            order_counter.add(1, {"status": "success"})
            logging.info("Order created", extra={
                "order_id": result["id"],
                # trace_id is an int; log it as the 32-character hex ID that Tempo uses
                "traceId": format(span.get_span_context().trace_id, "032x"),
            })
            return result
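
The handler passes `order_id` and `traceId` through `extra`, but Python's default log format does not print extra fields, so the trace ID would never reach the log line that the Loki derived field in section 2.7 inspects. One possible remedy, sketched here with only the standard library, is a JSON formatter that writes those fields in compact form. `JsonFormatter`, `EXTRA_FIELDS`, `trace_id_hex`, and `configure_logging` are illustrative names, not part of the original application.

```python
import json
import logging


def trace_id_hex(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex characters."""
    return format(trace_id, "032x")


class JsonFormatter(logging.Formatter):
    """Emit compact JSON containing the message plus selected extra fields."""

    EXTRA_FIELDS = ("order_id", "traceId")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname.lower(), "message": record.getMessage()}
        for name in self.EXTRA_FIELDS:
            if hasattr(record, name):
                value = getattr(record, name)
                # Accept a raw integer trace ID and normalize it to hex.
                if name == "traceId" and isinstance(value, int):
                    value = trace_id_hex(value)
                payload[name] = value
        # Compact separators keep the output matchable by "traceId":"(\w+)".
        return json.dumps(payload, separators=(",", ":"))


def configure_logging() -> None:
    """Install the JSON formatter on the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
```

Calling `configure_logging()` once at startup, before the first request, is enough for `logging.info(...)` calls in the handler to produce JSON lines.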

4. Correlating Signals in Grafana

4.1 Linking Logs to Traces

    sequenceDiagram
        participant Dev as Developer
        participant G as Grafana
        participant L as Loki
        participant T as Tempo
        participant M as Mimir

        Dev->>G: Search error logs
        G->>L: LogQL query
        L-->>G: Logs + traceId
        Dev->>G: Click traceId
        G->>T: Fetch trace
        T-->>G: Full trace visualization
        Dev->>G: "View related metrics"
        G->>M: PromQL query
        M-->>G: Metrics for that point in time

4.2 LogQL Examples

Search error logs:

    {service_name="order-service"} |= "error" | json | line_format "{{.message}}"

Filter by a specific traceId:

    {service_name=~"order-service|payment-service"} | json | traceId="abc123"

Log volume (queried like a metric):

    sum by (level) (count_over_time({service_name="order-service"} | json [5m]))
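
The last query is the least obvious of the three, so here is a rough Python model of what it computes: every log entry inside the trailing 5-minute window is counted, and the counts are grouped by the `level` field extracted by `json`. This is a sketch, not Loki's implementation; the exact window boundaries and the empty-string bucket for entries without a level are assumptions, and `count_by_level` is a hypothetical name.

```python
from collections import Counter


def count_by_level(entries, now_s: float, window_s: float = 300.0) -> dict:
    """Count entries in the window (now - window, now], grouped by 'level'.

    `entries` is an iterable of (timestamp_seconds, parsed_fields) pairs.
    """
    counts = Counter()
    for ts, fields in entries:
        if now_s - window_s < ts <= now_s:
            counts[fields.get("level", "")] += 1
    return dict(counts)


sample = [
    (100, {"level": "error"}),  # too old for a window ending at 400
    (350, {"level": "error"}),
    (390, {"level": "info"}),
    (400, {"level": "error"}),
    (401, {"level": "info"}),   # in the future relative to 400
]
```

Evaluating the query at many points in time, as Grafana does for a graph panel, repeats this count for each step.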

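4.3 TraceQL Examples

Traces containing a span that took longer than 500 ms and returned an HTTP 5xx status:

    { duration > 500ms && span.http.status_code >= 500 }

Slow database queries in a specific service:

    { resource.service.name = "order-service" && span.db.system = "postgresql" && duration > 100ms }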
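
A detail that is easy to miss in these queries: conditions joined with `&&` inside one pair of braces must all hold on the same span. The sketch below models that rule in plain Python for the first query; `Span`, `matches_slow_5xx`, and `matching_traces` are illustrative names and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    service: str
    duration_ms: float
    attrs: dict = field(default_factory=dict)


def matches_slow_5xx(span: Span) -> bool:
    """Model of { duration > 500ms && span.http.status_code >= 500 }."""
    return span.duration_ms > 500 and span.attrs.get("http.status_code", 0) >= 500


def matching_traces(traces: dict, predicate) -> list:
    """Return IDs of traces that have at least one span satisfying the predicate."""
    return [tid for tid, spans in traces.items() if any(predicate(s) for s in spans)]


traces = {
    # Slow span and failing span are different spans: no match.
    "t1": [Span("order-service", 600, {"http.status_code": 200}),
           Span("payment-service", 100, {"http.status_code": 500})],
    # One span is both slow and failing: match.
    "t2": [Span("order-service", 700, {"http.status_code": 503})],
}
```

Trace `t1` is excluded even though it contains both a slow span and a 5xx span, because no single span satisfies both conditions.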

5. Deploying to Kubernetes

    # Deploy the LGTM stack with Helm
    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update

    # Mimir (the first install also creates the namespace if it does not exist)
    helm install mimir grafana/mimir-distributed -n monitoring --create-namespace \
      --set mimir.structuredConfig.common.storage.backend=s3 \
      --set mimir.structuredConfig.common.storage.s3.endpoint=minio:9000

    # Loki
    helm install loki grafana/loki -n monitoring \
      --set loki.storage.type=s3

    # Tempo
    helm install tempo grafana/tempo-distributed -n monitoring

    # Grafana
    helm install grafana grafana/grafana -n monitoring \
      --set adminPassword=admin

6. Alerting

    # Grafana alert rule (via provisioning)
    apiVersion: 1

    groups:
      - orgId: 1
        name: Service Health
        folder: Alerts
        interval: 1m
        rules:
          - uid: high_error_rate
            title: High Error Rate
            condition: C
            data:
              - refId: A
                datasourceUid: mimir
                model:
                  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
                  instant: true # not in the original; assumed so C receives a single value
              - refId: C
                datasourceUid: __expr__
                model:
                  type: threshold
                  expression: A # not in the original; assumed link from the threshold to query A
                  conditions:
                    - evaluator:
                        type: gt
                        params: [0.05] # fires when the error ratio exceeds 5%
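
The arithmetic behind this rule is simple enough to check by hand. The sketch below reproduces it in Python under stated assumptions: both inputs are per-second rates over the same 5-minute window, a zero total yields NaN as in PromQL's float division, and `gt` is a strict comparison. `error_ratio` and `is_firing` are hypothetical names.

```python
import math


def error_ratio(error_rate: float, total_rate: float) -> float:
    """5xx request rate divided by total request rate (NaN when there is no traffic)."""
    if total_rate == 0:
        return math.nan
    return error_rate / total_rate


def is_firing(ratio: float, threshold: float = 0.05) -> bool:
    """Model of the 'gt' evaluator: strictly greater than the threshold."""
    # Any comparison with NaN is False, so no traffic never fires the alert.
    return ratio > threshold
```

With 3 errors per second out of 50 requests per second the ratio is 0.06 and the rule fires; at exactly 5 out of 100 it does not, because the comparison is strict.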

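7. Quiz

**L**oki: log collection and search; **G**rafana: visualization and dashboards; **T**empo: distributed tracing; **M**imir: long-term metric storage and querying (Prometheus-compatible).

Mimir is **long-term storage** for Prometheus. It is PromQL-compatible and supports multi-tenancy, horizontal scaling, and global queries. Prometheus sends the metrics it scrapes to Mimir via remote_write.

The OpenTelemetry Collector is an intermediary agent that **receives** the metrics, logs, and traces emitted by applications, processes them, and exports them. It is vendor-neutral and can route data to a variety of backends.

Loki **indexes only labels, not the log content**. Instead of full-text search, it filters by label and then scans the matching lines grep-style, so its index is dramatically smaller.

From an error trace you can jump straight to the related logs and analyze the metrics for that moment (CPU, memory, error rate) alongside them, which makes it faster to find the **root cause**.

Tempo's metrics generator automatically derives **RED metrics (Rate, Errors, Duration)** from trace data and sends them to Mimir, so a service performance dashboard can be built from traces alone, without separate metric instrumentation.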
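
To make the RED idea concrete, the following sketch derives the three numbers from a list of finished spans. It is a simplified model, not Tempo's implementation: it reports an average duration instead of histogram buckets, and `SpanRecord` and `red_metrics` are invented names.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    duration_ms: float
    is_error: bool


def red_metrics(spans: list, window_s: float) -> dict:
    """Compute request rate, error ratio, and average duration for one window."""
    n = len(spans)
    if n == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "avg_duration_ms": 0.0}
    errors = sum(1 for s in spans if s.is_error)
    return {
        "rate_per_s": n / window_s,                               # Rate
        "error_ratio": errors / n,                                # Errors
        "avg_duration_ms": sum(s.duration_ms for s in spans) / n, # Duration
    }


spans = [
    SpanRecord(100, False),
    SpanRecord(200, False),
    SpanRecord(300, True),
    SpanRecord(400, False),
]
```

For the four sample spans over a 60-second window, the error ratio is 0.25 and the average duration is 250 ms.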
