Grafana + Loki + Promtail Log Pipeline Setup Guide

Overview

In production environments, logs are essential for incident response and debugging. The ELK (Elasticsearch + Logstash + Kibana) stack has long been the standard, but Elasticsearch's high resource consumption and complex operational overhead have been persistent issues. Grafana Loki was created to solve these problems as a lightweight log aggregation system that performs only label-based indexing without indexing the log content itself, dramatically reducing storage costs and operational complexity.

In this article, we cover the entire process of building a Promtail → Loki → Grafana pipeline with Docker Compose, then configuring LogQL queries and alert rules.

Architecture Overview

The overall log pipeline flow is as follows:

graph LR
    A[Application Logs] -->|tail| B[Promtail]
    B -->|HTTP Push| C[Loki]
    C -->|Store| D[Object Storage / Filesystem]
    C -->|Query| E[Grafana]
    E -->|Alert| F[Slack / Email]

The role of each component:

Component   Role
Promtail    Agent that tails log files and sends them to Loki
Loki        Log storage: label indexing + compressed chunk storage
Grafana     Log visualization, dashboards, and alert configuration
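The handoff between Promtail and Loki happens over HTTP: each push is a POST to /loki/api/v1/push whose JSON body is a list of streams, each a label set plus nanosecond-timestamped log lines. A minimal sketch of building that payload (build_push_payload is a hypothetical helper; the sample log line is illustrative):

```python
import json
import time

def build_push_payload(labels: dict, lines: list) -> str:
    """Build the JSON body pushed to Loki's /loki/api/v1/push.

    Each stream pairs a label set with [timestamp_ns, line] entries;
    timestamps are nanosecond-precision strings.
    """
    now_ns = str(time.time_ns())
    return json.dumps({
        "streams": [
            {
                "stream": labels,
                "values": [[now_ns, line] for line in lines],
            }
        ]
    })

payload = build_push_payload(
    {"job": "syslog", "host": "myserver"},
    ["Oct 10 12:00:00 myserver sshd[123]: error: auth failed"],
)
body = json.loads(payload)
print(body["streams"][0]["stream"]["job"])  # syslog
```

Every unique label combination becomes a separate stream in Loki, which is why high-cardinality labels (request IDs, user IDs) should stay in the log line rather than the label set.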

Environment Setup

Project Directory Structure

mkdir -p loki-stack/{config,data}
cd loki-stack

# Directory structure
# loki-stack/
# ├── docker-compose.yml
# ├── config/
# │   ├── loki-config.yml
# │   └── promtail-config.yml
# └── data/

Docker Compose Configuration

docker-compose.yml

version: '3.8'

services:
  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    ports:
      - '3100:3100'
    volumes:
      - ./config/loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
    networks:
      - loki-net

  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    volumes:
      - ./config/promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    restart: unless-stopped
    networks:
      - loki-net

  grafana:
    image: grafana/grafana:11.4.0
    container_name: grafana
    ports:
      - '3000:3000'
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
    restart: unless-stopped
    networks:
      - loki-net

volumes:
  loki-data:
  grafana-data:

networks:
  loki-net:
    driver: bridge

Loki Configuration

config/loki-config.yml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h # 30-day retention
  max_query_length: 721h
  max_query_parallelism: 4
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem

Key configuration points:

  • schema: v13 — Latest TSDB schema for improved query performance
  • retention_period: 720h — Automatic deletion after 30 days
  • auth_enabled: false — Single-tenant mode (for development/small-scale operations)
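The retention arithmetic above is easy to sanity-check: 720h is exactly 30 days, and max_query_length is set one hour beyond retention so the full retained window remains queryable. A quick sketch (the hours() helper is hypothetical and handles only h-suffixed durations):

```python
def hours(spec: str) -> int:
    """Parse an h-suffixed duration like '720h' into a number of hours."""
    assert spec.endswith("h"), "sketch handles only h-suffixed durations"
    return int(spec[:-1])

retention = hours("720h")
print(retention // 24)            # 30 (days of retention)
print(hours("721h") - retention)  # 1  (max_query_length headroom in hours)
```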

Promtail Configuration

config/promtail-config.yml

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576 # 1MB

scrape_configs:
  # System log collection
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myserver
          __path__: /var/log/syslog

  # Docker container log collection
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/**/*.log
    pipeline_stages:
      - docker: {}
      - json:
          expressions:
            stream: stream
            time: time
            log: log
      - labels:
          stream:
      - output:
          source: log

  # Nginx access logs
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          type: access
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\w+) (?P<request_uri>\S+) \S+" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
      - labels:
          method:
          status:
      - metrics:
          http_requests_total:
            type: Counter
            description: 'Total HTTP requests'
            match_all: true
            action: inc
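The positions file configured above (/tmp/positions.yaml) is what lets Promtail survive restarts without duplicating or losing lines: it records the byte offset last read for each file. A simplified sketch of the mechanism, using an in-memory file and a plain dict in place of the YAML file (tail_from is a hypothetical helper, not Promtail code):

```python
import io

def tail_from(log, positions: dict, path: str) -> list:
    """Resume reading `log` from the offset recorded for `path`,
    return the new lines, and persist the updated offset."""
    log.seek(positions.get(path, 0))
    new_lines = log.read().splitlines()
    positions[path] = log.tell()
    return new_lines

positions = {}
log = io.StringIO("line1\nline2\n")
print(tail_from(log, positions, "/var/log/app.log"))  # ['line1', 'line2']

# New lines appended after a "restart" are picked up from the saved offset.
log.seek(0, io.SEEK_END)
log.write("line3\n")
print(tail_from(log, positions, "/var/log/app.log"))  # ['line3']
```

This is also why /tmp is a questionable location in production: if the positions file is lost on reboot, Promtail re-reads files from the beginning.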

Pipeline Stages Detailed Flow

Visualizing Promtail's pipeline processing flow:

graph TD
    A[Raw Log Line] --> B[docker stage]
    B --> C[json stage - Field Extraction]
    C --> D[labels stage - Label Assignment]
    D --> E[output stage - Final Log]
    E --> F[Loki Push]

    G[Nginx Log Line] --> H[regex stage - Pattern Matching]
    H --> I[labels stage - method, status]
    I --> J[metrics stage - Counter Increment]
    J --> F
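The nginx regex stage can be exercised outside Promtail to confirm which groups it captures. The expression below is the same one used in promtail-config.yml; the sample access-log line is illustrative:

```python
import re

# Same expression as the regex stage in the nginx scrape config.
NGINX_RE = re.compile(
    r'^(?P<remote_addr>[\w.]+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<request_uri>\S+) \S+" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+)'
)

line = '192.168.1.10 - - [10/Oct/2025:13:55:36 +0900] "GET /api/users HTTP/1.1" 200 612'
m = NGINX_RE.match(line)
print(m.group("method"), m.group("status"))       # GET 200
print(m.group("request_uri"))                     # /api/users
```

Only method and status are promoted to labels in the config; the remaining groups stay available to later stages without inflating stream cardinality.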

Running and Verification

# Start the stack
docker compose up -d

# Check status
docker compose ps

# Check Loki status
curl -s http://localhost:3100/ready
# ready

# Check Promtail targets
curl -s http://localhost:9080/targets | jq '.[] | .labels'

# Check labels stored in Loki
curl -s http://localhost:3100/loki/api/v1/labels | jq
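The labels endpoint wraps its result in a JSON envelope of the form {"status": ..., "data": [...]}. A sketch of consuming it programmatically (the sample response below is illustrative, not captured output):

```python
import json

# Illustrative response shape from GET /loki/api/v1/labels;
# the label names here are assumptions, not real captured output.
raw = '{"status": "success", "data": ["job", "host", "stream", "method", "status"]}'

resp = json.loads(raw)
if resp["status"] == "success":
    for label in sorted(resp["data"]):
        print(label)
```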

Connecting Loki in Grafana

1. Add Data Source

After accessing Grafana (http://localhost:3000):

  1. Connections → Data Sources → Add data source
  2. Select Loki
  3. URL: http://loki:3100
  4. Click Save & Test

2. Basic LogQL Queries

Let's run various LogQL queries in the Explore menu:

# View all syslogs
{job="syslog"}

# Filter for ERROR keyword
{job="syslog"} |= "error"

# View only Nginx 5xx errors (status is a label set by the Promtail pipeline)
{job="nginx", type="access", status=~"5.."}

# Regex filter
{job="docker"} |~ "(?i)exception|panic|fatal"

# Error count per 1-minute window (graph this over the last hour)
count_over_time({job="syslog"} |= "error" [1m])

# Top 10 error patterns (topk wraps a metric query; it cannot be piped
# directly onto a log stream)
topk(10,
  sum by (message) (
    count_over_time({job="syslog"} |= "error" | pattern `<_> error: <message>` [1h])
  )
)
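The same queries can be issued against Loki's HTTP API directly via /loki/api/v1/query_range. A sketch that only constructs the request URL without sending anything (query_range_url is a hypothetical helper):

```python
from urllib.parse import urlencode

def query_range_url(base: str, logql: str, start_ns: int, end_ns: int,
                    limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint.

    start/end are Unix timestamps in nanoseconds; the LogQL query
    is URL-encoded into the `query` parameter.
    """
    params = urlencode({
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

url = query_range_url("http://localhost:3100", '{job="syslog"} |= "error"', 0, 1)
print(url)
```

In a real client the URL would be fetched with any HTTP library and the response's data.result streams iterated.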

LogQL Operator Summary

Operator   Description        Example
|=         Contains string    {job="app"} |= "error"
!=         Does not contain   {job="app"} != "debug"
|~         Regex match        {job="app"} |~ "err|warn"
!~         Regex non-match    {job="app"} !~ "health"
| json     JSON parsing       {job="app"} | json
| logfmt   logfmt parsing     {job="app"} | logfmt

Dashboard Configuration

Provisioning Dashboards with JSON Model

You can create a config/dashboards/logs-overview.json file for automatic provisioning:

{
  "dashboard": {
    "title": "Logs Overview",
    "panels": [
      {
        "title": "Error Rate (1m)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(count_over_time({job=~\".+\"} |= \"error\" [1m]))",
            "legendFormat": "errors/min"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Log Volume by Job",
        "type": "barchart",
        "targets": [
          {
            "expr": "sum by (job) (count_over_time({job=~\".+\"} [5m]))",
            "legendFormat": "{{job}}"
          }
        ],
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Recent Errors",
        "type": "logs",
        "targets": [
          {
            "expr": "{job=~\".+\"} |= \"error\""
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 10 }
      }
    ]
  }
}
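The gridPos values above follow Grafana's 24-column dashboard grid: x + w must not exceed 24, and the two top panels are meant to tile the full width side by side. A quick check of that layout:

```python
# Panel geometry copied from the dashboard JSON above.
panels = [
    {"title": "Error Rate (1m)",   "gridPos": {"x": 0,  "y": 0, "w": 12, "h": 8}},
    {"title": "Log Volume by Job", "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}},
    {"title": "Recent Errors",     "gridPos": {"x": 0,  "y": 8, "w": 24, "h": 10}},
]

GRID_WIDTH = 24  # Grafana lays panels out on a 24-column grid

# No panel may spill past the right edge of the grid.
for p in panels:
    g = p["gridPos"]
    assert g["x"] + g["w"] <= GRID_WIDTH, p["title"]

# The two panels on row y=0 together span the full width.
top = [p["gridPos"] for p in panels if p["gridPos"]["y"] == 0]
print(sum(g["w"] for g in top))  # 24
```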

Alerting Configuration

Let's use Grafana's Unified Alerting to send notifications when errors spike:

Contact Point Configuration (Slack)

# Grafana provisioning: config/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-alerts
    receivers:
      - uid: slack-1
        type: slack
        settings:
          url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
          title: '🚨 {{ .CommonLabels.alertname }}'
          text: |
            **Status:** {{ .Status }}
            **Summary:** {{ .CommonAnnotations.summary }}

Creating Alert Rules

In the Grafana UI:

  1. Alerting → Alert Rules → New Alert Rule
  2. Query: count_over_time({job="syslog"} |= "error" [5m]) > 50
  3. Evaluation interval: 1 minute
  4. Pending period: 5 minutes (to ignore temporary spikes)
  5. Contact Point: slack-alerts
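The rule's threshold logic amounts to counting error lines inside a sliding window and comparing against 50, mirroring count_over_time({job="syslog"} |= "error" [5m]) > 50. A sketch (alert_firing is a hypothetical stand-in for Grafana's evaluation, not its actual implementation):

```python
def alert_firing(error_ts: list, now: float,
                 window_s: float = 300.0, threshold: int = 50) -> bool:
    """Fire when more than `threshold` error timestamps fall inside
    the last `window_s` seconds before `now`."""
    recent = [t for t in error_ts if now - window_s <= t <= now]
    return len(recent) > threshold

now = 1_000_000.0
burst = [now - i for i in range(60)]   # 60 errors in the last minute
print(alert_firing(burst, now))        # True
print(alert_firing(burst[:10], now))   # False
```

The 5-minute pending period configured above means this condition must stay true across several consecutive evaluations before a notification is actually sent.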

Production Operation Tips

1. Multi-Tenant Configuration

# loki-config.yml
auth_enabled: true

# Sending tenant ID from Promtail
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-backend
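On the wire, tenant_id is simply carried as the X-Scope-OrgID HTTP header on each push, which Loki uses to isolate logs per tenant when auth_enabled is true. A sketch (push_headers is a hypothetical helper):

```python
def push_headers(tenant_id=None) -> dict:
    """Headers for a push to Loki. With auth_enabled: true, Promtail's
    tenant_id setting is sent as the X-Scope-OrgID header."""
    headers = {"Content-Type": "application/json"}
    if tenant_id:
        headers["X-Scope-OrgID"] = tenant_id
    return headers

print(push_headers("team-backend")["X-Scope-OrgID"])  # team-backend
```

Queries are tenant-scoped the same way: a request without the header (or with a different tenant ID) cannot see team-backend's streams.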

2. Using S3-Compatible Object Storage

# loki-config.yml (production)
common:
  storage:
    s3:
      endpoint: minio:9000
      bucketnames: loki-chunks
      access_key_id: ${MINIO_ACCESS_KEY}
      secret_access_key: ${MINIO_SECRET_KEY}
      insecure: true
      s3forcepathstyle: true

3. Helm Deployment in Kubernetes

# Loki Stack Helm chart (note: loki-stack is deprecated upstream in favor
# of the grafana/loki chart, but remains a quick way to stand up the stack)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack \
  --namespace observability \
  --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi

4. Log Volume Control

# promtail-config.yml - Drop unnecessary logs
pipeline_stages:
  - match:
      selector: '{job="nginx"}'
      stages:
        - regex:
            expression: '"(?P<method>\w+) (?P<uri>\S+)'
        - drop:
            expression: '^/health$'
            source: uri
        - drop:
            expression: '^/metrics$'
            source: uri
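The effect of the two drop stages can be simulated outside Promtail using the same regex and drop expressions. A sketch (keep_line is a hypothetical stand-in for the pipeline; the sample lines are illustrative):

```python
import re

# Same patterns as the regex and drop stages in the config above.
URI_RE = re.compile(r'"(?P<method>\w+) (?P<uri>\S+)')
DROP_PATTERNS = [re.compile(r"^/health$"), re.compile(r"^/metrics$")]

def keep_line(line: str) -> bool:
    """Return False for lines whose extracted uri matches a drop pattern."""
    m = URI_RE.search(line)
    if not m:
        return True  # unparsable lines pass through untouched
    uri = m.group("uri")
    return not any(p.match(uri) for p in DROP_PATTERNS)

print(keep_line('10.0.0.1 - - [..] "GET /health HTTP/1.1" 200 2'))      # False
print(keep_line('10.0.0.1 - - [..] "GET /api/users HTTP/1.1" 200 612')) # True
```

Because dropped lines never leave the agent, this trims both network traffic to Loki and stored chunk volume.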

End-to-End Data Flow Summary

sequenceDiagram
    participant App as Application
    participant PT as Promtail
    participant LK as Loki
    participant S3 as Storage
    participant GF as Grafana
    participant User as Operator

    App->>App: Write logs to /var/log/app.log
    PT->>App: Tail log file
    PT->>PT: Pipeline processing (parse, label, filter)
    PT->>LK: HTTP POST /loki/api/v1/push
    LK->>LK: Index labels + compress chunks
    LK->>S3: Store chunks & index
    User->>GF: Open Dashboard
    GF->>LK: LogQL query
    LK->>S3: Read chunks
    LK->>GF: Return results
    GF->>User: Render logs & charts
    GF->>GF: Evaluate alert rules
    GF-->>User: 🚨 Slack notification (if threshold exceeded)

Conclusion

The Grafana + Loki + Promtail stack enables building an effective log pipeline with significantly fewer resources compared to ELK. Here are the key advantages:

  • Low resource usage: Storage and memory savings since log content is not indexed
  • Native Grafana integration: Query metrics (Prometheus) and logs (Loki) from a single dashboard
  • LogQL: Gentle learning curve with syntax similar to PromQL
  • Horizontal scaling: Independent scaling of read/write paths with microservice architecture

For production environments, make sure to consider S3-compatible storage, multi-tenancy, and appropriate retention policies.

Quiz

Q1: What is the core reason Loki has lower storage costs compared to ELK? Loki does not index the log content itself and only indexes labels. Elasticsearch performs full-text indexing, consuming much more storage and memory.

Q2: What is the role of Promtail's positions file? It records the last read position (offset) for each log file. When Promtail restarts, it can resume reading from the previous position without duplicate sending or missing entries.

Q3: What is the difference between |= and |~ in LogQL? |= checks for exact string containment, while |~ performs regex pattern matching. For example: |= "error" checks for the string "error", while |~ "err|warn" matches "err" or "warn".

Q4: What index store does Loki's schema v13 use? It uses the TSDB (Time Series Database) index store. Query performance and compression efficiency are significantly improved over the earlier boltdb-shipper index.

Q5: What is the purpose of the drop stage in pipeline stages? It discards log lines matching specific conditions instead of sending them to Loki. This filters out unnecessary logs such as health checks and metrics endpoints to reduce storage costs.

Q6: How does Promtail distinguish tenants in multi-tenant mode? When tenant_id is specified in Promtail's clients configuration, it is sent to Loki via the X-Scope-OrgID HTTP header, isolating logs per tenant.

Q7: What is the role of the count_over_time function? It is a LogQL metric query that counts the number of log lines within a specified time range. For example: count_over_time({job="app"} |= "error" [5m]) returns the number of error logs in the last 5 minutes.

Q8: What is the relationship between Loki's retention_period and the compactor? retention_period defines the log retention duration, and the compactor is responsible for finding and deleting expired chunks. The compactor's retention_enabled: true must be configured for retention to work.