
Grafana + Loki + Promtail Log Pipeline Setup Guide


Overview

In production environments, logs are essential for incident response and debugging. The ELK (Elasticsearch + Logstash + Kibana) stack has long been the standard, but Elasticsearch's high resource consumption and complex operational overhead have been persistent issues. Grafana Loki was created to solve these problems as a lightweight log aggregation system that performs only label-based indexing without indexing the log content itself, dramatically reducing storage costs and operational complexity.

In this article, we build a Promtail → Loki → Grafana pipeline with Docker Compose, then configure LogQL queries, dashboards, and alert rules.
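Here is a toy in-memory model in Python that makes the label-only indexing idea concrete. It illustrates the idea under simplifying assumptions and does not reproduce Loki's actual data structures. The only "index" is a map from each stream's label set to its lines. A query therefore first narrows the streams by label and then scans their content the way grep would. The class name ToyLabelStore is hypothetical.

```python
from collections import defaultdict


class ToyLabelStore:
    """Toy model of label-only indexing (not Loki's real data structures).

    The 'index' maps each distinct label set (a stream) to its lines; the line
    content itself is never indexed and is scanned at query time.
    """

    def __init__(self):
        # frozenset of (label, value) pairs -> list of log lines (stand-in for chunks)
        self.streams = defaultdict(list)

    def push(self, labels: dict, line: str) -> None:
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict, contains: str = "") -> list:
        """Pick streams whose labels include `selector` (label match), then
        brute-force scan their lines for `contains` (like LogQL's |=)."""
        wanted = set(selector.items())
        hits = []
        for key, lines in self.streams.items():
            if wanted <= key:
                hits.extend(ln for ln in lines if contains in ln)
        return hits


store = ToyLabelStore()
store.push({"job": "syslog", "host": "a"}, "disk error on sda")
store.push({"job": "syslog", "host": "a"}, "all good")
store.push({"job": "nginx", "host": "a"}, "GET / 500 error")
print(len(store.streams))                        # 2 streams -> 2 index entries
print(store.query({"job": "syslog"}, "error"))   # ['disk error on sda']
```

The index grows with the number of distinct label combinations, not with log volume. This is why Loki deployments try to keep label cardinality low.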

Architecture Overview

The overall log pipeline flow is as follows:

graph LR
    A[Application Logs] -->|tail| B[Promtail]
    B -->|HTTP Push| C[Loki]
    C -->|Store| D[Object Storage / Filesystem]
    C -->|Query| E[Grafana]
    E -->|Alert| F[Slack / Email]

The role of each component:

  • Promtail: agent that tails log files and sends them to Loki
  • Loki: log storage; indexes labels and stores compressed log chunks
  • Grafana: log visualization, dashboards, and alert configuration

Environment Setup

Project Directory Structure

mkdir -p loki-stack/{config,data}
cd loki-stack

# Directory structure
# loki-stack/
# ├── docker-compose.yml
# ├── config/
# │   ├── loki-config.yml
# │   └── promtail-config.yml
# └── data/

Docker Compose Configuration

docker-compose.yml

services:
  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    ports:
      - '3100:3100'
    volumes:
      - ./config/loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
    networks:
      - loki-net

  promtail:
    image: grafana/promtail:3.3.2
    container_name: promtail
    volumes:
      - ./config/promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    restart: unless-stopped
    networks:
      - loki-net

  grafana:
    image: grafana/grafana:11.4.0
    container_name: grafana
    ports:
      - '3000:3000'
    environment:
      # Development only: change the admin password and disable anonymous access in production
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_AUTH_ANONYMOUS_ENABLED=true
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - loki
    restart: unless-stopped
    networks:
      - loki-net

volumes:
  loki-data:
  grafana-data:

networks:
  loki-net:
    driver: bridge

Loki Configuration

config/loki-config.yml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h # 30-day retention
  max_query_length: 721h
  max_query_parallelism: 4
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: filesystem

Key configuration points:

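  • schema: v13 (store: tsdb). The latest schema version, built on the TSDB index for better query performance.
  • retention_period: 720h. Logs older than 30 days are deleted automatically. The compactor enforces this only because retention_enabled: true is set.
  • auth_enabled: false. Single-tenant mode, suitable for development or small-scale operations.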
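The following Python sketch shows roughly how these settings interact. It is a simplified model, not the compactor's actual algorithm. It computes when a chunk becomes eligible for deletion and gives an approximate upper bound on when the chunk is actually removed. That bound is the expiry time, plus the wait for the next compaction run to mark the chunk, plus retention_delete_delay. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values mirror config/loki-config.yml above.
RETENTION_PERIOD = timedelta(hours=720)      # limits_config.retention_period
COMPACTION_INTERVAL = timedelta(minutes=10)  # compactor.compaction_interval
DELETE_DELAY = timedelta(hours=2)            # compactor.retention_delete_delay


def is_expired(chunk_end: datetime, now: datetime) -> bool:
    """Simplified: a chunk is eligible once all of its data is older than
    the retention period."""
    return chunk_end < now - RETENTION_PERIOD


def latest_deletion(chunk_end: datetime) -> datetime:
    """Rough upper bound: expiry, plus waiting for the next compaction run
    to mark the chunk, plus the delete delay before it is removed."""
    return chunk_end + RETENTION_PERIOD + COMPACTION_INTERVAL + DELETE_DELAY


end = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(RETENTION_PERIOD.days)                      # 30
print(is_expired(end, end + timedelta(days=29)))  # False
print(latest_deletion(end))                       # 2024-01-31 02:10:00+00:00
```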

Promtail Configuration

config/promtail-config.yml

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml # inside the container; mount a volume here to keep offsets when the container is re-created

clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576 # 1MB

scrape_configs:
  # System log collection
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myserver
          __path__: /var/log/syslog

  # Docker container log collection
  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/**/*.log
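    pipeline_stages:
      # The docker stage parses Docker's json-file format on its own: it extracts
      # log, stream, and time, sets the stream label, uses time as the entry
      # timestamp, and makes log the final line. Extra json/labels/output stages
      # would re-parse the already-unwrapped application line and are not needed.
      - docker: {}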

  # Nginx access logs
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          type: access
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\w+) (?P<request_uri>\S+) \S+" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
      - labels:
          method:
          status:
      - metrics:
          http_requests_total:
            type: Counter
            description: 'Total HTTP requests'
            match_all: true
            action: inc
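You can approximate the nginx job's regex, labels, and metrics stages in Python to check what a given line produces. This is a hypothetical emulation, not Promtail code. It assumes the pattern behaves the same in Python's re as in Go's RE2, which holds for this particular expression. A plain dict stands in for the Prometheus counter.

```python
import re

# Same expression as the regex stage above, split across lines and concatenated.
# Go's RE2 and Python's `re` treat this particular pattern the same way.
NGINX_RE = re.compile(
    r'^(?P<remote_addr>[\w.]+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<method>\w+) (?P<request_uri>\S+) \S+" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+)'
)
LABEL_KEYS = ("method", "status")     # the labels stage promotes only these
metrics = {"http_requests_total": 0}  # stand-in for the metrics stage counter


def process(line: str, static_labels: dict) -> dict:
    """Rough emulation of regex -> labels -> metrics for a single line."""
    labels = dict(static_labels)
    m = NGINX_RE.match(line)
    if m:  # on no match, nothing is extracted and the line keeps its static labels
        for key in LABEL_KEYS:
            labels[key] = m.group(key)
    # match_all: true -> every line that reaches the metrics stage increments it
    metrics["http_requests_total"] += 1
    return labels


line = ('192.168.1.10 - - [10/Oct/2024:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0"')
print(process(line, {"job": "nginx", "type": "access"}))
# {'job': 'nginx', 'type': 'access', 'method': 'GET', 'status': '200'}
```

Promtail exposes counters defined in a metrics stage on its own /metrics endpoint (port 9080 here), prefixed with promtail_custom_ by default. Because of match_all: true, the counter also counts lines the regex did not match.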

Pipeline Stages Detailed Flow

Visualizing Promtail's pipeline processing flow:

graph TD
    A[Raw Log Line] --> B[docker stage]
    B --> C[json stage - Field Extraction]
    C --> D[labels stage - Label Assignment]
    D --> E[output stage - Final Log]
    E --> F[Loki Push]

    G[Nginx Log Line] --> H[regex stage - Pattern Matching]
    H --> I[labels stage - method, status]
    I --> J[metrics stage - Counter Increment]
    J --> F
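The docker stage in the first flow bundles several steps. This Python sketch approximates them for one line written by Docker's json-file logging driver. It parses the JSON wrapper, adds the stream label, takes time as the entry timestamp, and emits log as the final line. It is an illustration, not Promtail's implementation, and it assumes UTC timestamps ending in Z.

```python
import json
from datetime import datetime, timezone


def rfc3339nano_to_ns(ts: str) -> int:
    """Docker RFC3339Nano UTC timestamp -> Unix epoch nanoseconds.
    Simplified: assumes the trailing 'Z' (UTC) that the json-file driver writes."""
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def docker_stage(raw: str, labels: dict):
    """Emulates the docker stage: parse the JSON wrapper, set the `stream`
    label, use `time` as the timestamp, and emit `log` as the final line."""
    entry = json.loads(raw)
    return (
        rfc3339nano_to_ns(entry["time"]),
        {**labels, "stream": entry["stream"]},
        entry["log"],  # trailing newline left as-is in this sketch
    )


raw = '{"log":"connection refused\\n","stream":"stderr","time":"2024-01-01T00:00:00.123456789Z"}'
ts, labels, line = docker_stage(raw, {"job": "docker"})
print(ts, labels, repr(line))
# 1704067200123456789 {'job': 'docker', 'stream': 'stderr'} 'connection refused\n'
```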

Running and Verification

# Start the stack
docker compose up -d

# Check status
docker compose ps

# Check Loki status
curl -s http://localhost:3100/ready
# ready

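# Check Promtail (port 9080 is not published in the Compose file above;
# add '9080:9080' under promtail to browse http://localhost:9080/targets)
docker compose logs --tail=50 promtail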

# Check labels stored in Loki
curl -s http://localhost:3100/loki/api/v1/labels | jq
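You can also script the same HTTP API instead of using curl. This sketch uses only the Python standard library to build a /loki/api/v1/query_range request for the last hour. It passes start and end as Unix nanoseconds, which the endpoint accepts. The helper names and the default limit of 100 are choices made for this example.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

LOKI_URL = "http://localhost:3100"  # port published by the Compose file above


def query_range_url(query: str, minutes: int = 60, limit: int = 100,
                    now_ns: Optional[int] = None) -> str:
    """Build a /loki/api/v1/query_range URL covering the last `minutes`.
    start and end are given as Unix epoch nanoseconds."""
    end = time.time_ns() if now_ns is None else now_ns
    start = end - minutes * 60 * 1_000_000_000
    params = {"query": query, "start": start, "end": end, "limit": limit}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"


def run_query(query: str) -> list:
    """Send the query (needs the stack running) and return data.result."""
    with urllib.request.urlopen(query_range_url(query), timeout=10) as resp:
        return json.load(resp)["data"]["result"]
```

For example, run_query('{job="syslog"} |= "error"') returns a list of streams. Each stream carries its label set under "stream" and its [timestamp, line] pairs under "values".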

Connecting Loki in Grafana

1. Add Data Source

After accessing Grafana (http://localhost:3000):

  1. Connections → Data sources → Add data source
  2. Select Loki
  3. URL: http://loki:3100 (the Compose service name, resolvable on loki-net)
  4. Click Save & test

2. Basic LogQL Queries

Let's run various LogQL queries in the Explore menu:

# View all syslogs
{job="syslog"}

# Lines containing "error" (|= is case-sensitive; use |~ "(?i)error" to match ERROR too)
{job="syslog"} |= "error"

# View only Nginx 5xx errors (status is already a label set by Promtail, and
# access logs are not JSON, so no | json parser is needed)
{job="nginx", type="access", status=~"5.."}

# Regex filter
{job="docker"} |~ "(?i)exception|panic|fatal"

# Error lines per stream, counted over 1-minute windows
# (choose the 1-hour range in the time picker)
count_over_time({job="syslog"} |= "error" [1m])

# Top 10 error messages over a 1-hour window
topk(10, sum by (message) (count_over_time({job="syslog"} |= "error" | pattern `<_> error: <message>` [1h])))
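The following Python sketch is a simplified single-stream model of what count_over_time returns in a range query. At each evaluation step t, it counts the lines whose timestamps fall in the window (t - range, t]. Treat the exact window boundaries as an assumption of this sketch. The sample timestamps avoid them.

```python
from bisect import bisect_right


def count_over_time(timestamps, range_s, start, end, step):
    """Rough model of count_over_time(... [range]) in a range query for ONE stream:
    at each step t, count lines whose timestamp falls in (t - range_s, t]."""
    ts = sorted(timestamps)
    points = []
    t = start
    while t <= end:
        # bisect_right(ts, x) = number of timestamps <= x
        points.append((t, bisect_right(ts, t) - bisect_right(ts, t - range_s)))
        t += step
    return points


# Error lines at these offsets (in seconds); 1-minute range, evaluated every minute.
error_ts = [5, 30, 61, 65, 170]
print(count_over_time(error_ts, range_s=60, start=60, end=180, step=60))
# [(60, 2), (120, 2), (180, 1)]
```

With several streams, Loki returns one such series per stream. Wrapping the query in sum(...) adds them into a single series, as the dashboard panels below do.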

LogQL Operator Summary

  • |= (contains string): {job="app"} |= "error"
  • != (does not contain): {job="app"} != "debug"
  • |~ (regex match): {job="app"} |~ "err|warn"
  • !~ (regex does not match): {job="app"} !~ "health"
  • | json (JSON parsing): {job="app"} | json
  • | logfmt (logfmt parsing): {job="app"} | logfmt
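The four line-filter operators can be modeled as simple predicates chained left to right. This Python sketch is illustrative only. LogQL uses Go's RE2 regex syntax, and Python's re behaves the same only for simple patterns like the ones shown. Regex filters match anywhere in the line, which is why the sketch uses re.search rather than re.match.

```python
import re

# Sketch of LogQL's four line-filter operators as Python predicates.
FILTERS = {
    "|=": lambda line, arg: arg in line,                        # case-sensitive substring
    "!=": lambda line, arg: arg not in line,
    "|~": lambda line, arg: re.search(arg, line) is not None,   # unanchored regex
    "!~": lambda line, arg: re.search(arg, line) is None,
}


def apply_filters(lines, *filters):
    """Keep lines that pass every (operator, argument) filter, applied left to right."""
    return [ln for ln in lines if all(FILTERS[op](ln, arg) for op, arg in filters)]


lines = ["GET /health 200", "ERROR db timeout", "warn: slow query", "debug: cache hit"]
print(apply_filters(lines, ("|~", "(?i)err|warn"), ("!=", "debug")))
# ['ERROR db timeout', 'warn: slow query']
```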

Dashboard Configuration

Provisioning Dashboards with JSON Model

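The JSON model below wraps the dashboard in a "dashboard" key, which is the shape Grafana's HTTP API (POST /api/dashboards/db) accepts. For automatic file provisioning, take these steps:

  1. Save only the inner dashboard object, for example as config/dashboards/logs-overview.json.
  2. Add a dashboard provider file under /etc/grafana/provisioning/dashboards/ that points at that directory.
  3. Mount both into the Grafana container. The Compose file above does not do this yet.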

{
  "dashboard": {
    "title": "Logs Overview",
    "panels": [
      {
        "title": "Error Rate (1m)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(count_over_time({job=~\".+\"} |= \"error\" [1m]))",
            "legendFormat": "errors/min"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Log Volume by Job",
        "type": "barchart",
        "targets": [
          {
            "expr": "sum by (job) (count_over_time({job=~\".+\"} [5m]))",
            "legendFormat": "{{job}}"
          }
        ],
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Recent Errors",
        "type": "logs",
        "targets": [
          {
            "expr": "{job=~\".+\"} |= \"error\""
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 10 }
      }
    ]
  }
}
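If you provision from files, Grafana's file provisioner expects the bare dashboard model rather than the API envelope used above. The small helper below strips that envelope. It is hypothetical and not a Grafana tool, and the file paths are placeholders.

```python
import json
from pathlib import Path


def to_provisioning_model(payload: dict) -> dict:
    """Return the bare dashboard model. If `payload` uses the HTTP-API envelope
    ({"dashboard": {...}, ...}), unwrap it; otherwise return it unchanged."""
    return payload["dashboard"] if "dashboard" in payload else payload


def convert(src: Path, dst: Path) -> None:
    """Rewrite an API-style dashboard file as a provisioning-style one."""
    model = to_provisioning_model(json.loads(src.read_text()))
    dst.write_text(json.dumps(model, indent=2))
```

For example, convert(Path("logs-overview-api.json"), Path("config/dashboards/logs-overview.json")) writes a file that starts directly with "title" and "panels".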

Alerting Configuration

Let's use Grafana's Unified Alerting to send notifications when errors spike:

Contact Point Configuration (Slack)

# Grafana provisioning: config/alerting/contact-points.yml (mount into /etc/grafana/provisioning/alerting/)
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-alerts
    receivers:
      - uid: slack-1
        type: slack
        settings:
          url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
          title: '🚨 {{ .CommonLabels.alertname }}'
          text: |
            *Status:* {{ .Status }}
            *Summary:* {{ .CommonAnnotations.summary }}

Creating Alert Rules

In the Grafana UI:

  1. Alerting → Alert rules → New alert rule
  2. Query: sum(count_over_time({job="syslog"} |= "error" [5m])), with a Threshold condition of "above 50"
  3. Evaluation interval: 1 minute
  4. Pending period: 5 minutes (to ignore temporary spikes)
  5. Contact point: slack-alerts
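The pending period is what filters out short spikes. The Python sketch below is a simplified model of the Normal, Pending, and Firing states for a single series. It assumes one evaluation per minute and the threshold of 50. The model shows why a three-minute burst never fires while a sustained one does. It leaves out Grafana's NoData and Error states and other rule options.

```python
from datetime import timedelta

EVAL_INTERVAL = timedelta(minutes=1)   # evaluation interval from the steps above
PENDING_PERIOD = timedelta(minutes=5)  # pending period from the steps above
THRESHOLD = 50


def evaluate(counts):
    """Simplified state model for one alert series; counts[i] is the query
    value at the i-th evaluation (one per EVAL_INTERVAL)."""
    states, pending_since = [], None
    for i, value in enumerate(counts):
        now = i * EVAL_INTERVAL
        if value > THRESHOLD:
            if pending_since is None:
                pending_since = now  # condition just became true
            held = now - pending_since
            states.append("Firing" if held >= PENDING_PERIOD else "Pending")
        else:
            pending_since = None     # any healthy evaluation resets the timer
            states.append("Normal")
    return states


print(evaluate([10, 80, 90, 70, 12]))  # 3-minute burst: never reaches Firing
print(evaluate([60] * 7))              # sustained: Firing from the 6th evaluation
```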

Production Operation Tips

1. Multi-Tenant Configuration

# loki-config.yml
auth_enabled: true

# Sending tenant ID from Promtail
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: team-backend
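Under the hood, the tenant ID travels as the X-Scope-OrgID header on each push. This Python sketch builds, but does not send, such a request in the push API's JSON format. Promtail itself sends snappy-compressed protobuf. The sketch uses JSON because the endpoint also accepts it and it needs no extra libraries. The function name is hypothetical.

```python
import json
import time
import urllib.request


def build_push_request(url: str, tenant: str, labels: dict, lines: list) -> urllib.request.Request:
    """Build (but do not send) a Loki push request in JSON form, tagged with a
    tenant ID via X-Scope-OrgID, as Promtail's tenant_id setting does."""
    now = time.time_ns()
    body = {
        "streams": [{
            "stream": labels,
            # each value: [Unix epoch nanoseconds as a string, log line]
            "values": [[str(now + i), line] for i, line in enumerate(lines)],
        }]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Scope-OrgID": tenant},
        method="POST",
    )


req = build_push_request("http://localhost:3100/loki/api/v1/push",
                         "team-backend", {"job": "app"}, ["started", "ready"])
print(req.get_method(), req.get_header("X-scope-orgid"))  # POST team-backend
```

Sending the request takes a single urllib.request.urlopen(req) call, and Loki replies with 204 No Content on success. With auth_enabled: true, queries need the same header too. Add X-Scope-OrgID as a custom HTTP header in the Grafana Loki data source.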

2. Using S3-Compatible Object Storage

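# loki-config.yml (production)
# The ${...} placeholders are expanded only if Loki runs with -config.expand-env=true.
# Also set object_store: s3 in schema_config when switching from the filesystem.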
common:
  storage:
    s3:
      endpoint: minio:9000
      bucketnames: loki-chunks
      access_key_id: ${MINIO_ACCESS_KEY}
      secret_access_key: ${MINIO_SECRET_KEY}
      insecure: true
      s3forcepathstyle: true

3. Helm Deployment in Kubernetes

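# Loki Stack Helm chart (bundles an older Loki 2.x release and is no longer actively
# developed; for current Loki versions consider the grafana/loki chart)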
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack \
  --namespace observability \
  --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi

4. Log Volume Control

# promtail-config.yml - Drop unnecessary logs
pipeline_stages:
  - match:
      selector: '{job="nginx"}'
      stages:
        - regex:
            expression: '"(?P<method>\w+) (?P<uri>\S+)'
        - drop:
            expression: '^/health$'
            source: uri
        - drop:
            expression: '^/metrics$'
            source: uri
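Here is a rough Python emulation of the match, regex, and drop sequence, useful for checking which requests these stages would discard. It is not Promtail code. It assumes that a line with no extracted uri passes through the drop stages untouched. The req helper is a hypothetical test-line builder.

```python
import re

# Mirrors the match -> regex -> drop stages above (rough emulation, not Promtail code).
URI_RE = re.compile(r'"(?P<method>\w+) (?P<uri>\S+)')
DROP_URIS = [re.compile(r"^/health$"), re.compile(r"^/metrics$")]


def keep(line: str, labels: dict) -> bool:
    """Return False if the line would be dropped before reaching Loki."""
    if labels.get("job") != "nginx":   # match selector: {job="nginx"}
        return True
    m = URI_RE.search(line)            # unanchored search for the request line
    if m is None:                      # assumption: no extracted uri -> nothing to drop on
        return True
    return not any(p.search(m.group("uri")) for p in DROP_URIS)


def req(uri: str) -> str:
    """Hypothetical helper that builds a sample access-log line."""
    return f'10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET {uri} HTTP/1.1" 200 2'


print(keep(req("/health"), {"job": "nginx"}), keep(req("/api/users"), {"job": "nginx"}))
# False True
```

A request for /health?full=1 survives because both expressions are anchored with ^...$. A pattern such as ^/health(\?.*)?$ would also drop requests with a query string.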

End-to-End Data Flow Summary

sequenceDiagram
    participant App as Application
    participant PT as Promtail
    participant LK as Loki
    participant S3 as Storage
    participant GF as Grafana
    participant User as Operator

    App->>App: Write logs to /var/log/app.log
    PT->>App: Tail log file
    PT->>PT: Pipeline processing (parse, label, filter)
    PT->>LK: HTTP POST /loki/api/v1/push
    LK->>LK: Index labels + compress chunks
    LK->>S3: Store chunks & index
    User->>GF: Open Dashboard
    GF->>LK: LogQL query
    LK->>S3: Read chunks
    LK->>GF: Return results
    GF->>User: Render logs & charts
    GF->>GF: Evaluate alert rules
    GF-->>User: 🚨 Slack notification (if threshold exceeded)

Conclusion

The Grafana + Loki + Promtail stack enables building an effective log pipeline with significantly fewer resources compared to ELK. Here are the key advantages:

  • Low resource usage: Storage and memory savings since log content is not indexed
  • Native Grafana integration: Query metrics (Prometheus) and logs (Loki) from a single dashboard
  • LogQL: Gentle learning curve with syntax similar to PromQL
  • Horizontal scaling: Independent scaling of read/write paths with microservice architecture

For production environments, make sure to consider S3-compatible storage, multi-tenancy, and appropriate retention policies.

Quiz

Q1: What is the core reason Loki has lower storage costs compared to ELK? Loki does not index the log content itself and only indexes labels. Elasticsearch performs full-text indexing, consuming much more storage and memory.

Q2: What is the role of Promtail's positions file? It records the last read position (offset) for each log file. When Promtail restarts, it can resume reading from the previous position without duplicate sending or missing entries.

Q3: What is the difference between |= and |~ in LogQL? |= checks for exact string containment, while |~ performs regex pattern matching. For example: |= "error" checks for the string "error", while |~ "err|warn" matches "err" or "warn".

Q4: What index store does Loki's schema v13 use? It uses the TSDB (Time Series Database) index store. Query performance and compression efficiency are significantly better than with the earlier BoltDB-based index (boltdb-shipper).

Q5: What is the purpose of the drop stage in pipeline stages? It discards log lines matching specific conditions instead of sending them to Loki. This filters out unnecessary logs such as health checks and metrics endpoints to reduce storage costs.

Q6: How does Promtail distinguish tenants in multi-tenant mode? When tenant_id is specified in Promtail's clients configuration, it is sent to Loki via the X-Scope-OrgID HTTP header, isolating logs per tenant.

Q7: What is the role of the count_over_time function? It is a LogQL metric query that counts the number of log lines within a specified time range. For example, count_over_time({job="app"} |= "error" [5m]) returns, for each stream, the number of error lines in the preceding 5 minutes.

Q8: What is the relationship between Loki's retention_period and the compactor? retention_period defines the log retention duration, and the compactor is responsible for finding and deleting expired chunks. The compactor's retention_enabled: true must be configured for retention to work.
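The positions mechanism from Q2 fits in a few lines of Python. The sketch stores a byte offset per file, resumes from it on restart, and only advances past complete lines. It is a simplified illustration, not Promtail's implementation. The class and file names are hypothetical, and file rotation and truncation are not handled.

```python
import json
import tempfile
from pathlib import Path


class Tailer:
    """Minimal sketch of Promtail-style position tracking (not Promtail's code):
    keep a byte offset per file and resume from it after a restart.
    Promtail writes YAML; JSON is used here to stay within the standard library."""

    def __init__(self, positions_path: Path):
        self.positions_path = positions_path
        self.positions = (json.loads(positions_path.read_text())
                          if positions_path.exists() else {})

    def read_new_lines(self, log_path: Path) -> list:
        offset = self.positions.get(str(log_path), 0)
        with log_path.open("rb") as f:
            f.seek(offset)
            data = f.read()
        end = data.rfind(b"\n") + 1  # consume only complete lines
        self.positions[str(log_path)] = offset + end
        self.positions_path.write_text(json.dumps(self.positions))
        return data[:end].decode().splitlines()


with tempfile.TemporaryDirectory() as d:
    log, pos = Path(d, "app.log"), Path(d, "positions.json")
    log.write_bytes(b"a\nb\n")
    first = Tailer(pos).read_new_lines(log)    # ['a', 'b']
    with log.open("ab") as f:
        f.write(b"c\npart")
    second = Tailer(pos).read_new_lines(log)   # new instance = "restart": ['c']
    with log.open("ab") as f:
        f.write(b"ial\n")
    third = Tailer(pos).read_new_lines(log)    # ['partial']
    saved_offset = json.loads(pos.read_text())[str(log)]  # 14
```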