
Building Continuous Profiling for Kubernetes Applications with Pyroscope


Introduction

Continuous Profiling is often called the fourth pillar of observability, after Metrics, Logs, and Traces. What if CPU usage looks normal but latency is high? What if memory usage is gradually creeping up? With traditional monitoring alone, finding the root cause of such problems is difficult.

Pyroscope is a continuous profiling tool integrated into the Grafana ecosystem that continuously tracks performance bottlenecks at the code level.

What Is Continuous Profiling?

Traditional profiling is run manually after a problem has already occurred; Continuous Profiling keeps a low-overhead profiler always on.

| Aspect      | Traditional Profiling | Continuous Profiling      |
| ----------- | --------------------- | ------------------------- |
| Timing      | Manual, after issues  | Always-on, auto-collected |
| Overhead    | High (10-50%)         | Low (2-5%)                |
| History     | None                  | Stored as time series     |
| Environment | Dev/Staging           | Production                |
| Comparison  | Difficult             | Time-range comparison     |

Pyroscope Architecture

Pyroscope consists of two components:

  • Pyroscope Server: Collects, stores, and queries profile data
  • Agent/SDK: Collects profile data from applications and sends it to the server

Installing Pyroscope on Kubernetes

# Install Pyroscope with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install pyroscope grafana/pyroscope \
  --namespace monitoring \
  --create-namespace \
  --set "pyroscope.extraArgs.storage\.backend=filesystem" \
  --set persistence.enabled=true \
  --set persistence.size=50Gi

eBPF Profiling with Grafana Alloy

Using Grafana Alloy (the successor to Grafana Agent), you can collect profiles via eBPF without any code changes.

# alloy-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-config
  namespace: monitoring
data:
  config.alloy: |
    pyroscope.ebpf "instance" {
      forward_to = [pyroscope.write.endpoint.receiver]
      targets_only = false
      default_target = {"service_name" = "unspecified"}

      targets = discovery.kubernetes.pods.targets
    }

    discovery.kubernetes "pods" {
      role = "pod"
    }

    pyroscope.write "endpoint" {
      endpoint {
        url = "http://pyroscope:4040"
      }
    }

# alloy-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alloy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: alloy
  template:
    metadata:
      labels:
        app: alloy
    spec:
      hostPID: true # Required for eBPF
      containers:
        - name: alloy
          image: grafana/alloy:latest
          args:
            - run
            - /etc/alloy/config.alloy
          securityContext:
            privileged: true
            runAsUser: 0
          volumeMounts:
            - name: config
              mountPath: /etc/alloy
            - name: sys-kernel
              mountPath: /sys/kernel
      volumes:
        - name: config
          configMap:
            name: alloy-config
        - name: sys-kernel
          hostPath:
            path: /sys/kernel

SDK Integration by Language

Python (FastAPI)

# pip install pyroscope-io
import pyroscope

pyroscope.configure(
    application_name="order-service",
    server_address="http://pyroscope:4040",
    tags={
        "region": "ap-northeast-2",
        "env": "production",
    },
)

from fastapi import FastAPI
app = FastAPI()

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    # CPU/memory usage of this function is automatically profiled
    order = await db.fetch_order(order_id)
    return order

Go

package main

import (
    "net/http"
    "runtime"

    "github.com/grafana/pyroscope-go"
)

func main() {
    // Mutex and block profiles are only collected if these runtime
    // sampling rates are set
    runtime.SetMutexProfileFraction(5)
    runtime.SetBlockProfileRate(5)

    pyroscope.Start(pyroscope.Config{
        ApplicationName: "api-gateway",
        ServerAddress:   "http://pyroscope:4040",
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
            pyroscope.ProfileMutexCount,
            pyroscope.ProfileMutexDuration,
            pyroscope.ProfileBlockCount,
            pyroscope.ProfileBlockDuration,
        },
        Tags: map[string]string{
            "env": "production",
        },
    })

    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    http.ListenAndServe(":8080", nil)
}

Java (Spring Boot)

// build.gradle
// implementation 'io.pyroscope:agent:0.13.1'

// application.yml
// pyroscope:
//   application-name: user-service
//   server-address: http://pyroscope:4040
//   format: jfr

import io.pyroscope.javaagent.EventType;
import io.pyroscope.javaagent.PyroscopeAgent;
import io.pyroscope.javaagent.config.Config;
import io.pyroscope.http.Format;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class UserServiceApplication {
    public static void main(String[] args) {
        PyroscopeAgent.start(
            new Config.Builder()
                .setApplicationName("user-service")
                .setServerAddress("http://pyroscope:4040")
                .setProfilingEvent(EventType.ITIMER)
                .setFormat(Format.JFR)
                .build()
        );
        SpringApplication.run(UserServiceApplication.class, args);
    }
}

Flame Graph Analysis

How to Read a Flame Graph

In a Flame Graph:

  • X-axis: Sample proportion (wider means more time spent in that function)
  • Y-axis: Call stack depth (bottom to top)
  • Color: Random (for differentiation)

# Query profiles with Pyroscope CLI
pyroscope query \
  --app-name order-service \
  --from "now-1h" \
  --until "now" \
  --profile-type cpu
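The flame-graph geometry described above can be sketched in a few lines of Python (a toy illustration, not Pyroscope's implementation): each frame's width on the X-axis is simply the fraction of collected stack samples that contain that frame at that depth.

```python
from collections import Counter

# Toy stack samples (root-first), as a sampling profiler might collect them.
samples = [
    ("main", "handleRequest", "db.QueryRow"),
    ("main", "handleRequest", "db.QueryRow"),
    ("main", "handleRequest", "renderResponse"),
    ("main", "idle"),
]

def frame_widths(samples):
    """Flame-graph X-axis: each (depth, frame) width is the share of
    samples whose stack contains that frame at that depth."""
    total = len(samples)
    counts = Counter()
    for stack in samples:
        for depth, frame in enumerate(stack):
            counts[(depth, frame)] += 1
    return {key: count / total for key, count in counts.items()}

widths = frame_widths(samples)
# "main" spans the whole X-axis; db.QueryRow spans half of it
```

This is why a wide frame near the top of the stack is the first place to look: its width is a direct estimate of the CPU share of that function and its children.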

Bottleneck Diagnosis Patterns

CPU Bottleneck Example:

main.handleRequest (40%)
  └── db.QueryRow (35%)
      └── crypto/tls.(*Conn).Read (28%)
          └── crypto/tls.(*Conn).Handshake (25%)

Interpretation: TLS handshakes consume 25% of CPU because each query opens a fresh connection - resolve with connection pooling

Memory Leak Example:

runtime.mallocgc (60%)
  └── encoding/json.(*Decoder).Decode (45%)
      └── main.processLargePayload (40%)

Interpretation: Massive memory allocation during JSON decoding - switch to a streaming parser
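The streaming fix suggested above can be sketched with the standard library alone (real code might use a dedicated streaming parser such as ijson; the payload format here is illustrative): `json.JSONDecoder.raw_decode` consumes concatenated JSON values one at a time instead of materializing one huge structure.

```python
import json

def iter_json_values(payload: str):
    """Decode a stream of concatenated JSON values one at a time,
    yielding each value instead of building one giant object."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(payload)
    while idx < end:
        # Skip whitespace/newlines between values
        while idx < end and payload[idx].isspace():
            idx += 1
        if idx >= end:
            break
        value, idx = decoder.raw_decode(payload, idx)
        yield value

records = list(iter_json_values('{"id": 1}\n{"id": 2}\n{"id": 3}'))
```

Because each decoded value becomes garbage-collectable as soon as you are done with it, the inuse_space profile stays flat instead of climbing with payload size.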

Grafana Dashboard Integration

# Add Pyroscope data source to Grafana
# Connections > Data sources > Add data source > Grafana Pyroscope
# URL: http://pyroscope:4040

Useful Grafana Panel Configuration

{
  "targets": [
    {
      "datasource": { "type": "grafana-pyroscope-datasource" },
      "profileTypeId": "process_cpu:cpu:nanoseconds:cpu:nanoseconds",
      "labelSelector": "{service_name=\"order-service\"}",
      "queryType": "profile"
    }
  ]
}

Integrating Traces with Profiling

By integrating Tempo (distributed tracing) with Pyroscope, you can view the profile at the exact moment of a slow trace.

# Add Pyroscope integration to Tempo configuration
# tempo.yaml
overrides:
  defaults:
    profiles:
      pyroscope:
        url: http://pyroscope:4040

Real-World Performance Improvement Cases

Case 1: Discovering N+1 Queries

# Profile shows db.fetch_order consuming 80% of CPU
# Flame Graph reveals the same query being called repeatedly

# Before: N+1
async def get_orders_with_items(user_id):
    orders = await db.fetch_orders(user_id)
    for order in orders:
        order.items = await db.fetch_items(order.id)  # Called N times!
    return orders

# After: Single query with JOIN
async def get_orders_with_items(user_id):
    return await db.fetch_orders_with_items(user_id)  # Called once
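The effect on query volume can be demonstrated with a toy in-memory "database" (all names here are illustrative): the N+1 version issues one query per order, while the batched version issues a constant two regardless of order count.

```python
# Toy in-memory tables and a counter standing in for real DB round-trips.
ORDERS = {101: {"user_id": 7}, 102: {"user_id": 7}, 103: {"user_id": 7}}
ITEMS = {101: ["apple"], 102: ["banana"], 103: ["cherry", "fig"]}
queries = 0

def fetch_orders(user_id):
    global queries
    queries += 1
    return [oid for oid, o in ORDERS.items() if o["user_id"] == user_id]

def fetch_items(order_id):
    global queries
    queries += 1
    return ITEMS[order_id]

def fetch_items_bulk(order_ids):
    # One query with an IN (...) clause instead of one per order
    global queries
    queries += 1
    return {oid: ITEMS[oid] for oid in order_ids}

def n_plus_one(user_id):
    return {oid: fetch_items(oid) for oid in fetch_orders(user_id)}

def batched(user_id):
    return fetch_items_bulk(fetch_orders(user_id))

queries = 0
n_plus_one(7)
n_plus_one_queries = queries  # 1 + N queries, grows with order count

queries = 0
batched(7)
batched_queries = queries  # always 2
```

In the flame graph this shows up as the per-item query function's width collapsing, since its sample share scales with the number of round-trips.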

Case 2: Tracking Memory Leaks

// Profile shows runtime.mallocgc continuously increasing
// Confirmed via inuse_space profile

// Before: Allocating new buffer every time
func processRequest(data []byte) {
    buf := make([]byte, 1024*1024)  // 1MB allocated per request
    // ...
}

// After: Reusing buffers with sync.Pool
var bufPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 1024*1024)
        return &buf
    },
}

func processRequest(data []byte) {
    buf := bufPool.Get().(*[]byte)
    defer bufPool.Put(buf)
    // ...
}
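The same buffer-reuse idea can be sketched as a minimal free-list pool in Python (illustrative only; Go's sync.Pool additionally cooperates with the garbage collector, which this toy does not):

```python
class BufferPool:
    """Minimal free-list pool: hand out recycled buffers instead of
    allocating a fresh 1 MB bytearray on every request."""

    def __init__(self, size: int):
        self._size = size
        self._free: list = []

    def get(self) -> bytearray:
        # Reuse a returned buffer if one is available, else allocate
        return self._free.pop() if self._free else bytearray(self._size)

    def put(self, buf: bytearray) -> None:
        self._free.append(buf)

pool = BufferPool(1024 * 1024)
buf = pool.get()       # first call allocates
pool.put(buf)          # return it to the pool
reused = pool.get()    # the same object comes back, no new allocation
```

Under steady load, allocation-rate profiles (alloc_space) drop sharply once buffers are recycled, which is exactly the signal to look for after deploying such a fix.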

Profile Comparison (Diff View)

# Compare profiles before and after deployment
pyroscope diff \
  --app-name order-service \
  --left-from "2026-03-02T10:00:00Z" \
  --left-until "2026-03-02T11:00:00Z" \
  --right-from "2026-03-03T10:00:00Z" \
  --right-until "2026-03-03T11:00:00Z" \
  --profile-type cpu

In Grafana, you can also overlay and compare profiles from two time periods in the Explore panel. Red indicates an increase and green indicates a decrease.
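The arithmetic behind the diff view is simple and can be sketched as follows (a toy model, not Pyroscope's implementation): compare each function's share of total samples between the two ranges; positive deltas are the "red" regressions.

```python
def profile_diff(before: dict, after: dict) -> dict:
    """Per-function change in sample share between two profiles.
    Positive values (regressions) render red, negative ones green."""
    before_total = sum(before.values()) or 1
    after_total = sum(after.values()) or 1
    return {
        fn: after.get(fn, 0) / after_total - before.get(fn, 0) / before_total
        for fn in set(before) | set(after)
    }

delta = profile_diff(
    {"handleRequest": 40, "QueryRow": 35, "other": 25},
    {"handleRequest": 40, "QueryRow": 15, "other": 45},
)
# QueryRow's share dropped by 0.20 - it would show green in the diff view
```

Comparing shares rather than raw sample counts is what makes the diff robust to the two ranges having different traffic levels or durations.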

Summary

Continuous Profiling is the final piece of the observability puzzle:

  • Metrics: tell you "that" something is slow
  • Logs: tell you "what" happened
  • Traces: tell you "where" it is slow
  • Profiles: tell you "why" it is slow (at the code level)

Use the Pyroscope + Grafana combination to trace performance issues in production environments down to the code level.


Quiz: Continuous Profiling Comprehension Check (7 Questions)

Q1. How does Continuous Profiling differ from traditional profiling?

It is always on, continuously collecting in production with only 2-5% overhead, and enables time-range comparisons.

Q2. What are the advantages of eBPF profiling?

It can collect profiles from all processes at the kernel level without any code changes.

Q3. What does the X-axis represent in a Flame Graph?

The proportion of time spent in that function (and its child functions). Wider means more time consumed.

Q4. Why is hostPID: true required in the Grafana Alloy DaemonSet?

eBPF needs access to the host's process information, which requires the host PID namespace.

Q5. Which profile type is suitable for tracking memory leaks?

The inuse_space (currently used memory) profile tracks allocations that increase over time.

Q6. What are the use cases for Diff View?

Comparing profiles before and after deployment to see how new code impacted performance.

Q7. What are the benefits of integrating Traces with Profiling?

You can jump directly from a slow trace to the code-level profile at that exact moment, pinpointing the exact cause of the bottleneck.