- Authors

- Name
- Youngju Kim
- @fjvbn20031
- 1. Overview
- 2. Complete Query Processing Pipeline
- 3. Lexer
- 4. Parser
- 5. AST (Abstract Syntax Tree)
- 6. Query Evaluation Engine
- 7. Function Evaluation
- 8. Vector Matching (Binary Operations)
- 9. Subqueries
- 10. Memory Management
- 11. Native Histogram Processing
- 12. Performance Optimization Techniques
- 13. Error Handling and Limitations
- 14. Summary
1. Overview
PromQL (Prometheus Query Language) is the core query language for Prometheus, a functional expression language for selecting and aggregating time series data. This post analyzes the PromQL engine internals at the source code level.
The PromQL engine code resides in the promql/ package, consisting of the parser, AST (Abstract Syntax Tree), and evaluation engine.
2. Complete Query Processing Pipeline
PromQL String
|
v
+----------+
| Lexer | (tokenization)
+----------+
|
v
+----------+
| Parser | (syntax analysis)
+----------+
|
v
+----------+
| AST | (abstract syntax tree)
+----------+
|
v
+----------+
| Analyzer | (optimization/validation)
+----------+
|
v
+----------+
| Engine | (evaluation/execution)
+----------+
|
v
Result (Matrix/Vector/Scalar)
3. Lexer
3.1 Tokenization Process
The lexer converts a PromQL string into a token stream:
Input: rate(http_requests_total[5m]) > 10
Token stream:
IDENTIFIER("rate")
LEFT_PAREN
IDENTIFIER("http_requests_total")
LEFT_BRACKET
DURATION("5m")
RIGHT_BRACKET
RIGHT_PAREN
GTR
NUMBER(10)
3.2 Token Types
Major token types recognized by the PromQL lexer:
Literals:
NUMBER - integer/float (e.g., 42, 3.14)
STRING - string literal (e.g., "hello")
DURATION - time range (e.g., 5m, 1h, 30s)
Identifiers:
IDENTIFIER - metric names, label names
METRIC_IDENTIFIER - metric identifier
Operators:
ADD, SUB, MUL, DIV, MOD, POW
EQL, NEQ, LSS, GTR, LTE, GTE
AND, OR, UNLESS
Keywords:
BY, WITHOUT, ON, IGNORING
GROUP_LEFT, GROUP_RIGHT
OFFSET, BOOL
SUM, AVG, MIN, MAX, COUNT, ...
3.3 Lexer Implementation
The lexer is implemented as a manual state machine in Go. Key characteristics:
- 1-char lookahead: Examines current and next character to determine token boundaries
- Duration parsing: Numbers followed by time units (ms, s, m, h, d, w, y) are recognized as DURATION tokens
- String escaping: Supports single-quote, double-quote, and backtick string literals
- No comments: PromQL has no comment syntax
4. Parser
4.1 Parser Structure
The PromQL parser uses the Pratt parser (Top-Down Operator Precedence) approach:
Parser flow:
1. Read next token from stream
2. Call prefix parsing function (literals, unary ops, parentheses, etc.)
3. If higher-precedence infix operator exists, perform infix parsing
4. Return AST node
4.2 Operator Precedence
Precedence (lowest first):
1. OR
2. AND, UNLESS
3. ==, !=, <, >, <=, >= (comparison)
4. +, - (addition/subtraction)
5. *, /, % (multiplication/division)
6. ^ (power, right-associative)
7. Unary +, -
4.3 Parsing Rules
Key parsing rules:
Vector Selector Parsing:
When curly braces follow a metric name, parse label matchers
Label matching operators: =, !=, =~, !~
Function Call Parsing:
IDENTIFIER + LEFT_PAREN -> function call
Validate name against registered function list
Verify argument count and types
Range Vector Parsing:
Vector selector + LEFT_BRACKET + DURATION + RIGHT_BRACKET
Example: http_requests_total[5m]
5. AST (Abstract Syntax Tree)
5.1 Node Types
The AST consists of the following node types:
Expr (expression interface)
|
+-- NumberLiteral (numeric literal)
+-- StringLiteral (string literal)
+-- VectorSelector (instant vector selector)
+-- MatrixSelector (range vector selector)
+-- AggregateExpr (aggregation expression)
+-- BinaryExpr (binary operation)
+-- UnaryExpr (unary operation)
+-- Call (function call)
+-- ParenExpr (parenthesized expression)
+-- SubqueryExpr (subquery expression)
+-- StepInvariantExpr (step-invariant expression)
5.2 AST Example
Query: sum(rate(http_requests_total[5m])) by (job)
AggregateExpr (sum, by=[job])
|
+-- Call (rate)
|
+-- MatrixSelector
|
+-- VectorSelector (http_requests_total)
+-- Range: 5m
5.3 VectorSelector Structure
VectorSelector:
Name: "http_requests_total"
LabelMatchers:
- __name__ = "http_requests_total"
- job = "prometheus" (user-specified)
- instance =~ ".*:9090" (regex matching)
Offset: 0s (offset modifier)
Timestamp: nil (@ modifier)
OriginalOffset: 0s
StartOrEnd: 0 (start() or end())
6. Query Evaluation Engine
6.1 Engine Configuration
Engine:
+-- queryable (TSDB storage interface)
+-- maxSamples (maximum sample count limit)
+-- timeout (query timeout)
+-- activeQueries (active query counter)
+-- maxConcurrency (maximum concurrent queries)
6.2 Instant Query vs Range Query
Instant Query: Evaluation at a single point in time
Request: GET /api/v1/query?query=up&time=1609459200
Processing:
1. Generate AST
2. Evaluate once at time=1609459200
3. Result: Vector (single value per series)
Range Query: Step-by-step evaluation over a time range
Request: GET /api/v1/query_range?query=up&start=1609455600&end=1609459200&step=60
Processing:
1. Generate AST
2. Evaluate repeatedly from start to end at step (60s) intervals
3. Each step evaluates like an instant query
4. Result: Matrix (time-indexed value array per series)
6.3 Step Evaluation Mechanism
Each step in a Range Query is an independent instant evaluation:
query: rate(http_requests_total[5m])
start: T0, end: T0+300s, step: 60s
Step evaluations:
T0: rate(samples from T0-5m to T0)
T0+60: rate(samples from T0+60-5m to T0+60)
T0+120: rate(samples from T0+120-5m to T0+120)
T0+180: rate(samples from T0+180-5m to T0+180)
T0+240: rate(samples from T0+240-5m to T0+240)
T0+300: rate(samples from T0+300-5m to T0+300)
6.4 Lookback Delta
The Lookback Delta (default 5 minutes) is the time window within which an instant vector selector searches for samples:
Evaluation time: T
lookback delta: 5m
Sample search range: (T - 5m, T]
Behavior:
1. Find most recent sample within 5 minutes before T
2. Include in result if not a stale marker
3. Exclude series if no sample within 5 minutes
Important: An exact sample at time T is not required
Uses the most recent sample within lookback delta
7. Function Evaluation
7.1 Function Registry
All PromQL functions are registered in a central registry:
Function registration:
Name: "rate"
ArgTypes: [ValueTypeMatrix] (argument types)
ReturnType: ValueTypeVector (return type)
Call: funcRate (actual implementation)
Variadic: 0 (variadic argument count)
7.2 rate() Implementation
Internal operation of rate():
rate(counter[5m]) evaluation:
1. Extract samples from 5-minute range: [s1, s2, s3, ..., sN]
2. Counter reset detection and correction:
- If value decreases, add previous value to cumulative total
3. Calculate corrected total increase:
resultValue = (last - first) + counterCorrection
4. Extrapolation:
- Calculate time ratio beyond range start/end
- durationToStart = first.time - rangeStart
- durationToEnd = rangeEnd - last.time
- Apply extrapolation ratio
5. Per-second rate:
result = extrapolatedIncrease / rangeDurationSeconds
7.3 histogram_quantile() Implementation
histogram_quantile(phi, buckets) evaluation:
1. Sort buckets by le label
2. Collect cumulative count for each bucket
3. Calculate target_count = phi (e.g., 0.95) * total_count
4. Find bucket containing target_count
5. Linear interpolation between bucket boundaries:
bucketStart + (bucketEnd - bucketStart) *
(target - countBelow) / (countInBucket)
Notes:
- le="+Inf" bucket must exist
- Returns NaN if buckets are empty
- Inaccurate if bucket boundaries do not match actual distribution
7.4 Aggregation Function Evaluation
sum by (job) (metric) evaluation:
1. Collect all series of metric
2. Group by job label value
3. Sum values within each group
4. Result series: only job label retained
Implementation:
- Group management via hashmap
- Label set hash used as key
- Exact label comparison on hash collision
8. Vector Matching (Binary Operations)
8.1 One-to-One Matching
metric_a + metric_b
Matching algorithm:
1. Index series from both vectors by label set
2. Find pairs with identical matching label sets
3. Perform operation (addition here)
4. Result series uses left-side label set
Performance:
- Hash join approach with O(n+m) time complexity
- Smaller side built as hash table
8.2 Many-to-One/One-to-Many Matching
metric_a * on(job) group_left metric_b
Processing:
1. Determine matching key with on(job)
2. Index metric_b (one side) by job in hashmap
3. For each series in metric_a (many side):
a. Look up matching item in metric_b by job label
b. Perform operation
c. Retain all labels from metric_a (group_left)
9. Subqueries
9.1 Subquery Processing
max_over_time(rate(http_requests_total[5m])[30m:1m])
Evaluation:
1. Outer function (max_over_time) needs range vector
2. Subquery part: rate(http_requests_total[5m])[30m:1m]
3. Inner evaluation:
- From 30 minutes before current time (T) to T
- Evaluate rate(http_requests_total[5m]) at 1-minute intervals
- Generate 30 instant vector results
4. Convert to range vector
5. max_over_time calculates maximum for each series
9.2 Subquery Optimization
Key optimizations for subquery evaluation:
1. Reuse inner query range vectors (same series)
2. Step alignment: align subquery step with outer step
3. Use global evaluation_interval when resolution not specified
10. Memory Management
10.1 Sample Counting
--query.max-samples (default 50,000,000)
Counting approach:
- Accumulate samples loaded/generated at each step
- For range vectors, count all samples within range
- Immediately abort query when limit exceeded
10.2 Query Concurrency Control
--query.max-concurrency (default 20)
Implementation:
- Semaphore limits concurrent queries
- New queries wait for semaphore acquisition
- Return error if acquisition fails within timeout
10.3 Point Pool
The PromQL engine uses point pools to minimize memory allocation:
Point pool operation:
1. Allocate point slices from pool at query start
2. Reuse between step evaluations
3. Return to pool at query completion
4. Uses sync.Pool to reduce GC pressure
11. Native Histogram Processing
11.1 Native Histogram Aggregation
Native Histograms have different processing paths than float samples:
1. Align histogram schemas (resolution)
2. Merge at bucket level
3. Auto-convert schemas when needed during aggregation
sum(native_histogram_metric) evaluation:
1. Collect histograms from each series
2. Unify to lower resolution if schemas differ
3. Sum per-bucket counts
4. Sum sum and count fields
12. Performance Optimization Techniques
12.1 Series Set Merging
When same series exist across multiple blocks:
1. Search series set in each block
2. Consolidate via merge sort
3. Filter at chunk level by time range
4. Deduplicate
12.2 StepInvariantExpr Optimization
Detect expressions returning identical results regardless of step:
- Numeric literals
- Scalar functions without offset
- Certain aggregation results
When detected:
- Evaluate once and copy result to all steps
- Eliminate unnecessary repeated evaluation
12.3 Query Preprocessing
Optimization passes:
1. Constant folding: 2 + 3 -> 5
2. Label matching optimization: generate index hints
3. Remove unnecessary label operations
4. Early return for empty results
13. Error Handling and Limitations
13.1 Common Error Conditions
1. "query processing would load too many samples"
- max-samples limit exceeded
- Fix: reduce query range or increase limit
2. "query timed out in expression evaluation"
- query.timeout exceeded
- Fix: optimize query or increase timeout
3. "found duplicate series for the match group"
- Many-to-one relationship without group_left/right
- Fix: add group_left or group_right
4. "many-to-many matching not allowed"
- Many-on-both-sides matching not supported
- Fix: aggregate one side to reduce to one
13.2 Known Limitations
1. Limited join capability: no SQL-style joins
2. Limited string operations: no string comparison/manipulation
3. No sort guarantee: series order in results is non-deterministic
4. Subquery nesting limits: excessive nesting risks memory explosion
14. Summary
The PromQL engine efficiently processes a functional query language specialized for time series data. Key aspects include clean syntax analysis via Pratt parser, step-based range query evaluation, flexible sample lookup via lookback delta, and memory efficiency through point pools and StepInvariantExpr optimization.
In the next post, we will analyze the Prometheus service discovery mechanism, covering the kubernetes_sd watch mechanism, relabeling internals, and target lifecycle.