- Introduction
- What Is Continuous Profiling?
- Pyroscope Architecture
- SDK Integration by Language
- Flame Graph Analysis
- Grafana Dashboard Integration
- Real-World Performance Improvement Cases
- Profile Comparison (Diff View)
- Summary
Introduction
Continuous profiling is often called the fourth pillar of observability, after metrics, logs, and traces. What if CPU usage looks fine but latency is high? What if memory usage keeps creeping up? Traditional monitoring tells you that something is wrong, but makes it hard to find the root cause in code.
Pyroscope is a continuous profiling tool integrated into the Grafana ecosystem that continuously tracks performance bottlenecks at the code level.
What Is Continuous Profiling?
Traditional profiling was run manually, after a problem had already occurred. Continuous profiling instead keeps a low-overhead profiler always on, so profile data from the moment of an incident has already been collected by the time you go looking for it.
| Aspect | Traditional Profiling | Continuous Profiling |
|---|---|---|
| Timing | Manual, after issues | Always-on, auto-collected |
| Overhead | High (10-50%) | Low (2-5%) |
| History | None | Stored as time series |
| Environment | Dev/Staging | Production |
| Comparison | Difficult | Time-range comparison |
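The low overhead in the table comes from sampling: instead of instrumenting every function call, the profiler periodically snapshots call stacks, so cost is bounded by the sampling rate rather than by how hot the code is. This is not how Pyroscope's agents are implemented, but a toy Python sketch of the idea:

```python
import collections
import sys
import threading
import time

def sample_stacks(counts, interval=0.01, duration=0.3):
    """Toy always-on sampler: periodically snapshot the main thread's stack.

    Cost is ~one stack walk per interval, regardless of how busy the code is,
    which is the intuition behind the 2-5% overhead figure above.
    """
    end = time.time() + duration
    main_id = threading.main_thread().ident
    while time.time() < end:
        frame = sys._current_frames().get(main_id)
        if frame is not None:
            # Collapse the stack into "outer;inner" form (folded-stack style)
            stack = []
            while frame is not None:
                stack.append(frame.f_code.co_name)
                frame = frame.f_back
            counts[";".join(reversed(stack))] += 1
        time.sleep(interval)

def busy_work():
    total = 0
    for i in range(10_000_000):
        total += i
    return total

counts = collections.Counter()
sampler = threading.Thread(target=sample_stacks, args=(counts,))
sampler.start()
busy_work()
sampler.join()

# The hottest folded stack should involve busy_work
print(counts.most_common(3))
```

Real continuous profilers do the same thing in spirit, only in-process or via eBPF, at ~100 Hz, with the folded stacks shipped to a server as time series.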
Pyroscope Architecture
Pyroscope consists of two components:
- Pyroscope Server: Collects, stores, and queries profile data
- Agent/SDK: Collects profile data from applications and sends it to the server
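To make the push model concrete, an agent periodically POSTs folded samples to the server. The sketch below only builds the request URL (it does not send anything); the `/ingest` endpoint and its `name`/`from`/`until` query parameters follow Pyroscope's original push API and should be treated as an assumption, not a spec for current versions:

```python
import urllib.parse

def ingest_url(base, app_name, from_ts, until_ts):
    """Build a legacy-style Pyroscope ingest URL for one batch of samples.

    Assumed endpoint shape (Pyroscope's original push API):
      POST {base}/ingest?name=<app>&from=<unix>&until=<unix>
    with folded stacks ("main;handler 42") in the request body.
    """
    query = urllib.parse.urlencode(
        {"name": app_name, "from": from_ts, "until": until_ts}
    )
    return f"{base}/ingest?{query}"

print(ingest_url("http://pyroscope:4040", "order-service",
                 1700000000, 1700000010))
```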
Installing Pyroscope on Kubernetes
# Install Pyroscope with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install pyroscope grafana/pyroscope \
  --namespace monitoring \
  --create-namespace \
  --set pyroscope.extraArgs.storage.backend=filesystem \
  --set persistence.enabled=true \
  --set persistence.size=50Gi
eBPF Profiling with Grafana Alloy
Using Grafana Alloy (formerly Grafana Agent), you can collect profiles via eBPF without any code changes.
# alloy-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-config
  namespace: monitoring
data:
  config.alloy: |
    pyroscope.ebpf "instance" {
      forward_to     = [pyroscope.write.endpoint.receiver]
      targets_only   = false
      default_target = {"service_name" = "unspecified"}
      targets        = discovery.kubernetes.pods.targets
    }

    discovery.kubernetes "pods" {
      role = "pod"
    }

    pyroscope.write "endpoint" {
      endpoint {
        url = "http://pyroscope:4040"
      }
    }
# alloy-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alloy
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: alloy
  template:
    metadata:
      labels:
        app: alloy
    spec:
      hostPID: true  # Required for eBPF
      containers:
        - name: alloy
          image: grafana/alloy:latest
          args:
            - run
            - /etc/alloy/config.alloy
          securityContext:
            privileged: true
            runAsUser: 0
          volumeMounts:
            - name: config
              mountPath: /etc/alloy
            - name: sys-kernel
              mountPath: /sys/kernel
      volumes:
        - name: config
          configMap:
            name: alloy-config
        - name: sys-kernel
          hostPath:
            path: /sys/kernel
SDK Integration by Language
Python (FastAPI)
# pip install pyroscope-io
import pyroscope

pyroscope.configure(
    application_name="order-service",
    server_address="http://pyroscope:4040",
    tags={
        "region": "ap-northeast-2",
        "env": "production",
    },
)

from fastapi import FastAPI

app = FastAPI()

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    # CPU/memory usage of this function is automatically profiled
    order = await db.fetch_order(order_id)
    return order
Go
package main

import (
    "net/http"

    "github.com/grafana/pyroscope-go"
)

func main() {
    pyroscope.Start(pyroscope.Config{
        ApplicationName: "api-gateway",
        ServerAddress:   "http://pyroscope:4040",
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
            pyroscope.ProfileMutexCount,
            pyroscope.ProfileMutexDuration,
            pyroscope.ProfileBlockCount,
            pyroscope.ProfileBlockDuration,
        },
        Tags: map[string]string{
            "env": "production",
        },
    })

    http.HandleFunc("/health", healthHandler)
    http.ListenAndServe(":8080", nil)
}
Java (Spring Boot)
// build.gradle
// implementation 'io.pyroscope:agent:0.13.1'

// application.yml
// pyroscope:
//   application-name: user-service
//   server-address: http://pyroscope:4040
//   format: jfr

import io.pyroscope.http.Format;
import io.pyroscope.javaagent.EventType;
import io.pyroscope.javaagent.PyroscopeAgent;
import io.pyroscope.javaagent.config.Config;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class UserServiceApplication {
    public static void main(String[] args) {
        PyroscopeAgent.start(
            new Config.Builder()
                .setApplicationName("user-service")
                .setServerAddress("http://pyroscope:4040")
                .setProfilingEvent(EventType.ITIMER)
                .setFormat(Format.JFR)
                .build()
        );
        SpringApplication.run(UserServiceApplication.class, args);
    }
}
Flame Graph Analysis
How to Read a Flame Graph
In a Flame Graph:
- X-axis: Sample proportion (wider means more time spent in that function)
- Y-axis: Call stack depth (bottom to top)
- Color: Random (for differentiation)
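Flame graphs are typically built from "folded" stack samples (`outer;inner` paths with counts); the width of a frame is the share of samples in which it appears. A toy calculation with made-up sample counts (and no recursion, so each frame appears at most once per stack):

```python
import collections

# Folded stacks: "root;child;..." -> number of samples (made-up values)
folded = {
    "main;handleRequest;db.QueryRow": 70,
    "main;handleRequest;render": 20,
    "main;background": 10,
}

total = sum(folded.values())

# A frame's width = samples in which it appears anywhere on the stack
width = collections.Counter()
for stack, n in folded.items():
    for frame in stack.split(";"):
        width[frame] += n

for frame, n in sorted(width.items(), key=lambda kv: -kv[1]):
    print(f"{frame:15s} {n / total:.0%}")
```

So `main` spans the full width (it is on every stack), `handleRequest` spans 90%, and the graph narrows as the stacks diverge, which is exactly what you see when reading a flame graph bottom to top.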
# Query profiles with Pyroscope CLI
pyroscope query \
  --app-name order-service \
  --from "now-1h" \
  --until "now" \
  --profile-type cpu
Bottleneck Diagnosis Patterns
CPU Bottleneck Example:
main.handleRequest (40%)
└── db.QueryRow (35%)
    └── net/http.(*conn).readRequest (30%)
        └── crypto/tls.(*Conn).Read (25%)
Interpretation: TLS handshake consumes 25% of CPU - resolve with connection pooling
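Connection pooling fixes this pattern because the expensive setup (the TLS handshake) happens once per pooled connection instead of once per request. A toy sketch with hypothetical `FakeConn`/`ConnPool` classes (stand-ins for illustration, not a real driver API):

```python
import queue

class FakeConn:
    handshakes = 0  # counts expensive setups across all connections

    def __init__(self):
        # Stands in for an expensive TLS handshake on connection setup
        FakeConn.handshakes += 1

class ConnPool:
    """Minimal fixed-size pool: pay the setup cost once per slot."""

    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConn())

    def acquire(self):
        return self._q.get()

    def release(self, conn):
        self._q.put(conn)

pool = ConnPool(size=4)
for _ in range(100):          # 100 requests...
    conn = pool.acquire()
    pool.release(conn)

print(FakeConn.handshakes)    # ...but only 4 handshakes
```

In the flame graph above, the `crypto/tls` frames shrink accordingly, because the handshake cost is amortized over the lifetime of each pooled connection.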
Memory Leak Example:
runtime.mallocgc (60%)
└── encoding/json.(*Decoder).Decode (45%)
    └── main.processLargePayload (40%)
Interpretation: Massive memory allocation during JSON decoding - switch to a streaming parser
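The streaming idea can be shown with stdlib Python, using newline-delimited JSON as the assumed input format: peak memory is one record at a time instead of the whole decoded payload.

```python
import io
import json

def process_streaming(fp):
    """Process newline-delimited JSON one record at a time.

    Only one record is decoded and alive at any moment, so the
    allocation spike from decoding the whole payload disappears.
    """
    total = 0
    for line in fp:                # one small allocation per record
        record = json.loads(line)
        total += record["amount"]
    return total

payload = io.StringIO(
    '{"amount": 10}\n{"amount": 20}\n{"amount": 30}\n'
)
print(process_streaming(payload))  # 60
```

For a single large JSON document (rather than NDJSON), the same effect needs an incremental parser, but the profile signature is identical: the decoder's share of `mallocgc` collapses.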
Grafana Dashboard Integration
# Add Pyroscope data source to Grafana
# Settings > Data Sources > Add > Grafana Pyroscope
# URL: http://pyroscope:4040
Useful Grafana Panel Configuration
{
  "targets": [
    {
      "datasource": { "type": "grafana-pyroscope-datasource" },
      "profileTypeId": "process_cpu:cpu:nanoseconds:cpu:nanoseconds",
      "labelSelector": "{service_name=\"order-service\"}",
      "queryType": "profile"
    }
  ]
}
Integrating Traces with Profiling
By integrating Tempo (distributed tracing) with Pyroscope, you can view the profile at the exact moment of a slow trace.
# Add Pyroscope integration to Tempo configuration
# tempo.yaml
overrides:
  defaults:
    profiles:
      pyroscope:
        url: http://pyroscope:4040
Real-World Performance Improvement Cases
Case 1: Discovering N+1 Queries
# Profile shows db.fetch_order consuming 80% of CPU
# Flame graph reveals the same query being called repeatedly

# Before: N+1
async def get_orders_with_items(user_id):
    orders = await db.fetch_orders(user_id)
    for order in orders:
        order.items = await db.fetch_items(order.id)  # Called N times!
    return orders

# After: single query with JOIN
async def get_orders_with_items(user_id):
    return await db.fetch_orders_with_items(user_id)  # Called once
Case 2: Tracking Memory Leaks
// Profile shows runtime.mallocgc continuously increasing
// Confirmed via the inuse_space profile

// Before: allocating a new buffer on every request
func processRequest(data []byte) {
    buf := make([]byte, 1024*1024) // 1MB allocated per request
    // ...
}

// After: reusing buffers with sync.Pool
var bufPool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 1024*1024)
        return &buf
    },
}

func processRequest(data []byte) {
    buf := bufPool.Get().(*[]byte)
    defer bufPool.Put(buf)
    // ...
}
Profile Comparison (Diff View)
# Compare profiles before and after deployment
pyroscope diff \
  --app-name order-service \
  --left-from "2026-03-02T10:00:00Z" \
  --left-until "2026-03-02T11:00:00Z" \
  --right-from "2026-03-03T10:00:00Z" \
  --right-until "2026-03-03T11:00:00Z" \
  --profile-type cpu
In Grafana, you can also overlay and compare profiles from two time periods in the Explore panel. Red indicates an increase and green indicates a decrease.
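The semantics of a diff view are simple to sketch: subtract each function's sample count in the "left" (before) profile from its count in the "right" (after) profile; positive deltas render red (regression), negative deltas render green (improvement). The sample numbers below are made up:

```python
def diff_profiles(before, after):
    """Per-function sample delta between two profiles.

    Positive delta = more samples after (red / regression),
    negative delta = fewer samples after (green / improvement).
    """
    functions = set(before) | set(after)
    return {f: after.get(f, 0) - before.get(f, 0) for f in functions}

# Hypothetical per-function CPU sample counts before/after a deploy
before = {"handleRequest": 40, "db.QueryRow": 35, "tls.Read": 25}
after  = {"handleRequest": 41, "db.QueryRow": 12, "tls.Read": 5}

delta = diff_profiles(before, after)
for func, d in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{func:15s} {d:+d}")
```

Here the deploy would show `db.QueryRow` and `tls.Read` in green (the connection-pooling fix paid off) and `handleRequest` essentially unchanged.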
Summary
Continuous Profiling is the final piece of the observability puzzle:
- Metrics: tell you "when" something is wrong
- Logs: tell you "what happened"
- Traces: tell you "where" it is slow
- Profiles: tell you "why" it is slow (at the code level)
Use the Pyroscope + Grafana combination to trace performance issues in production environments down to the code level.
Quiz: Continuous Profiling Comprehension Check (7 Questions)
Q1. How does Continuous Profiling differ from traditional profiling?
It is always on, continuously collecting in production with only 2-5% overhead, and enables time-range comparisons.
Q2. What are the advantages of eBPF profiling?
It can collect profiles from all processes at the kernel level without any code changes.
Q3. What does the X-axis represent in a Flame Graph?
The proportion of time spent in that function (and its child functions). Wider means more time consumed.
Q4. Why is hostPID: true required in the Grafana Alloy DaemonSet?
eBPF needs access to the host's process information, which requires the host PID namespace.
Q5. Which profile type is suitable for tracking memory leaks?
The inuse_space (currently used memory) profile tracks allocations that increase over time.
Q6. What are the use cases for Diff View?
Comparing profiles before and after deployment to see how new code impacted performance.
Q7. What are the benefits of integrating Traces with Profiling?
You can jump directly from a slow trace to the code-level profile at that exact moment, pinpointing the exact cause of the bottleneck.