Split View: PromQL 엔진 내부 구조: 파서부터 실행 엔진까지

PromQL 엔진 내부 구조: 파서부터 실행 엔진까지

1. 개요
2. 전체 쿼리 처리 파이프라인
3. 렉서 (Lexer)
4. 파서 (Parser)
5. AST (Abstract Syntax Tree)
6. 쿼리 평가 엔진
7. 함수 평가
8. 벡터 매칭 (Binary Operations)
- 8.1 일대일 매칭
- 8.2 다대일/일대다 매칭
9. 서브쿼리
- 9.1 서브쿼리 처리
- 9.2 서브쿼리 최적화
10. 메모리 관리
11. Native Histogram 처리
- 11.1 Native Histogram 집계
- 11.2 histogram_quantile과 Native Histogram
12. 성능 최적화 기법
13. 에러 처리와 한계
- 13.1 일반적인 에러 상황
- 13.2 알려진 제한사항
14. 정리

1. 개요

PromQL(Prometheus Query Language)은 Prometheus의 핵심 쿼리 언어로, 시계열 데이터를 선택하고 집계하기 위한 함수형 표현식 언어입니다. 이 글에서는 PromQL 엔진의 내부 구조를 소스코드 레벨에서 분석합니다.

PromQL 엔진의 코드는 promql/ 패키지에 위치하며, 크게 파서(parser), AST(Abstract Syntax Tree), 평가 엔진(engine)으로 구성됩니다.

2. 전체 쿼리 처리 파이프라인

PromQL 문자열
    |
    v
+----------+
| Lexer    |  (토큰화)
+----------+
    |
    v
+----------+
| Parser   |  (구문 분석)
+----------+
    |
    v
+----------+
| AST      |  (추상 구문 트리)
+----------+
    |
    v
+----------+
| Analyzer |  (최적화/검증)
+----------+
    |
    v
+----------+
| Engine   |  (평가/실행)
+----------+
    |
    v
결과 (Matrix/Vector/Scalar)

3. 렉서 (Lexer)

3.1 토큰화 과정

렉서는 PromQL 문자열을 토큰 스트림으로 변환합니다:

입력: rate(http_requests_total[5m]) > 10

토큰 스트림:
  IDENTIFIER("rate")
  LEFT_PAREN
  IDENTIFIER("http_requests_total")
  LEFT_BRACKET
  DURATION("5m")
  RIGHT_BRACKET
  RIGHT_PAREN
  GTR
  NUMBER(10)

3.2 토큰 타입

PromQL 렉서가 인식하는 주요 토큰 타입:

리터럴:
  NUMBER      - 정수/부동소수점 (예: 42, 3.14)
  STRING      - 문자열 리터럴 (예: "hello")
  DURATION    - 시간 범위 (예: 5m, 1h, 30s)

식별자:
  IDENTIFIER  - 메트릭 이름, 레이블 이름
  METRIC_IDENTIFIER - 메트릭 식별자

연산자:
  ADD, SUB, MUL, DIV, MOD, POW
  EQL, NEQ, LSS, GTR, LTE, GTE
  AND, OR, UNLESS

키워드:
  BY, WITHOUT, ON, IGNORING
  GROUP_LEFT, GROUP_RIGHT
  OFFSET, BOOL
  SUM, AVG, MIN, MAX, COUNT, ...

3.3 렉서 구현 특성

렉서는 Go의 수동 상태 머신 방식으로 구현되어 있습니다. 핵심 특성:

1-char lookahead: 현재 문자와 다음 문자를 확인하여 토큰 경계를 결정
duration 파싱: 숫자 뒤에 시간 단위(ms, s, m, h, d, w, y)가 오면 DURATION 토큰으로 인식
문자열 이스케이프: 작은따옴표, 큰따옴표, 백틱 문자열 리터럴 지원
주석 미지원: PromQL에는 주석 문법이 없음

4. 파서 (Parser)

4.1 파서 구조

PromQL 파서는 Pratt 파서(Top-Down Operator Precedence) 방식을 사용합니다:

파서 동작 흐름:
1. 토큰 스트림에서 다음 토큰 읽기
2. prefix 파싱 함수 호출 (리터럴, 단항 연산, 괄호 등)
3. 현재 우선순위보다 높은 infix 연산자가 있으면 infix 파싱
4. AST 노드 반환

4.2 연산자 우선순위

우선순위 (낮은 것부터):
1. OR
2. AND, UNLESS
3. ==, !=, <, >, <=, >= (비교)
4. +, - (가감)
5. *, /, % (곱나눗셈)
6. ^ (거듭제곱, 우측 결합)
7. 단항 +, -

4.3 파싱 규칙

주요 파싱 규칙:

벡터 셀렉터 파싱:

메트릭이름 뒤에 중괄호가 오면 레이블 매칭 파싱
예: http_requests_total 뒤에 중괄호가 있으면 레이블 셀렉터 파싱

레이블 매칭 연산자: =, !=, =~, !~

함수 호출 파싱:

IDENTIFIER + LEFT_PAREN -> 함수 호출
등록된 함수 목록에서 이름 확인
인수 개수와 타입 검증

범위 벡터 파싱:

벡터 셀렉터 + LEFT_BRACKET + DURATION + RIGHT_BRACKET
예: http_requests_total[5m]

5. AST (Abstract Syntax Tree)

5.1 노드 타입

AST는 다음과 같은 노드 타입으로 구성됩니다:

Expr (표현식 인터페이스)
  |
  +-- NumberLiteral        (숫자 리터럴)
  +-- StringLiteral        (문자열 리터럴)
  +-- VectorSelector       (즉시 벡터 셀렉터)
  +-- MatrixSelector       (범위 벡터 셀렉터)
  +-- AggregateExpr        (집계 표현식)
  +-- BinaryExpr           (이항 연산 표현식)
  +-- UnaryExpr            (단항 연산 표현식)
  +-- Call                 (함수 호출)
  +-- ParenExpr            (괄호 표현식)
  +-- SubqueryExpr         (서브쿼리 표현식)
  +-- StepInvariantExpr    (step 불변 표현식)

5.2 AST 예시

쿼리: sum(rate(http_requests_total[5m])) by (job)

AggregateExpr (sum, by=[job])
  |
  +-- Call (rate)
        |
        +-- MatrixSelector
              |
              +-- VectorSelector (http_requests_total)
              +-- Range: 5m

5.3 VectorSelector 구조

VectorSelector:
  Name: "http_requests_total"
  LabelMatchers:
    - __name__ = "http_requests_total"
    - job = "prometheus"       (사용자 지정)
    - instance =~ ".*:9090"    (정규식 매칭)
  Offset: 0s                   (offset 수정자)
  Timestamp: nil               (@ 수정자)
  OriginalOffset: 0s
  StartOrEnd: 0                (start() 또는 end())

6. 쿼리 평가 엔진

6.1 엔진 구성

Engine:
  +-- queryable      (TSDB 스토리지 인터페이스)
  +-- maxSamples     (최대 샘플 수 제한)
  +-- timeout        (쿼리 타임아웃)
  +-- activeQueries  (활성 쿼리 카운터)
  +-- maxConcurrency (최대 동시 쿼리 수)

6.2 Instant Query vs Range Query

Instant Query: 단일 시점에서의 평가

요청: GET /api/v1/query?query=up&time=1609459200

처리:
1. AST 생성
2. time=1609459200에서 한 번 평가
3. 결과: Vector (각 시계열에 대한 단일 값)

Range Query: 시간 범위에서의 step별 평가

요청: GET /api/v1/query_range?query=up&start=1609455600&end=1609459200&step=60

처리:
1. AST 생성
2. start부터 end까지 step(60초) 간격으로 반복 평가
3. 각 step에서 instant query처럼 평가
4. 결과: Matrix (각 시계열에 대한 시간별 값 배열)

6.3 Step 평가 메커니즘

Range Query의 각 step은 독립적인 instant evaluation입니다:

query: rate(http_requests_total[5m])
start: T0, end: T0+300s, step: 60s

Step 평가:
  T0:      rate(samples from T0-5m to T0)
  T0+60:   rate(samples from T0+60-5m to T0+60)
  T0+120:  rate(samples from T0+120-5m to T0+120)
  T0+180:  rate(samples from T0+180-5m to T0+180)
  T0+240:  rate(samples from T0+240-5m to T0+240)
  T0+300:  rate(samples from T0+300-5m to T0+300)

6.4 Lookback Delta

Lookback Delta(기본 5분)는 instant vector 셀렉터가 샘플을 찾는 시간 윈도우입니다:

평가 시점: T
lookback delta: 5m

샘플 검색 범위: (T - 5m, T]

동작:
1. T 이전 5분 이내의 가장 최근 샘플을 찾음
2. 해당 샘플이 stale marker가 아니면 결과에 포함
3. 5분 이내에 샘플이 없으면 해당 시계열 제외

중요: 정확히 T 시점의 샘플이 없어도 됨
      lookback delta 내의 가장 최근 샘플 사용

7. 함수 평가

7.1 함수 등록 시스템

모든 PromQL 함수는 중앙 레지스트리에 등록됩니다:

함수 등록 정보:
  Name:       "rate"
  ArgTypes:   [ValueTypeMatrix]     (인수 타입)
  ReturnType: ValueTypeVector       (반환 타입)
  Call:       funcRate              (실제 구현 함수)
  Variadic:   0                     (가변 인수 수)

7.2 rate() 함수 구현

rate()의 내부 동작:

rate(counter[5m]) 평가:

1. 5분 범위의 샘플 추출: [s1, s2, s3, ..., sN]
2. Counter 리셋 감지 및 보정:
   - 값이 감소하면 이전 값을 누적에 더함
3. 보정된 총 증가량 계산:
   resultValue = (last - first) + counterCorrection
4. 외삽(extrapolation):
   - 범위 시작/끝을 넘어서는 시간 비율 계산
   - durationToStart = first.time - rangeStart
   - durationToEnd = rangeEnd - last.time
   - 외삽 비율 적용
5. 초당 증가율:
   result = extrapolatedIncrease / rangeDurationSeconds

7.3 histogram_quantile() 구현

histogram_quantile(phi, buckets) 평가:

1. le 레이블로 버킷 정렬
2. 각 버킷의 누적 카운트 수집
3. phi(예: 0.95) * total_count = target_count 계산
4. target_count를 포함하는 버킷 찾기
5. 해당 버킷 경계 사이에서 선형 보간:
   bucketStart + (bucketEnd - bucketStart) *
   (target - countBelow) / (countInBucket)

주의사항:
- le="+Inf" 버킷은 반드시 있어야 함
- 버킷이 비어있으면 NaN 반환
- 버킷 경계가 실제 분포와 맞지 않으면 부정확

7.4 집계 함수 평가

sum by (job) (metric) 평가:

1. metric의 모든 시계열 수집
2. job 레이블 값별로 그룹핑
3. 각 그룹 내에서 값 합산
4. 결과 시계열: job 레이블만 유지

내부 구현:
- 해시맵으로 그룹 관리
- 레이블 셋의 해시를 키로 사용
- 충돌 시 정확한 레이블 비교

8. 벡터 매칭 (Binary Operations)

8.1 일대일 매칭

metric_a + metric_b

매칭 알고리즘:
1. 양쪽 벡터의 시계열을 레이블 셋으로 인덱싱
2. 매칭 레이블 셋이 동일한 쌍을 찾음
3. 연산 수행 (여기서는 값 더하기)
4. 결과 시계열에 왼쪽 레이블 셋 사용

성능:
- 해시 조인 방식으로 O(n+m) 시간 복잡도
- 작은 쪽을 해시 테이블로 구성

8.2 다대일/일대다 매칭

metric_a * on(job) group_left metric_b

처리:
1. on(job)으로 매칭 키 결정
2. metric_b(one 쪽)를 job별 해시맵에 인덱싱
3. metric_a(many 쪽)의 각 시계열에 대해:
   a. job 레이블로 metric_b에서 매칭 항목 검색
   b. 연산 수행
   c. metric_a의 모든 레이블 유지 (group_left)

9. 서브쿼리

9.1 서브쿼리 처리

max_over_time(rate(http_requests_total[5m])[30m:1m])

평가 과정:
1. 외부 함수(max_over_time)가 range vector 필요
2. 서브쿼리 부분: rate(http_requests_total[5m])[30m:1m]
3. 내부 평가:
   - 현재 시점(T)에서 30분 전부터 T까지
   - 1분 간격으로 rate(http_requests_total[5m]) 평가
   - 30개의 instant vector 결과 생성
4. 이를 range vector로 변환
5. max_over_time이 각 시계열의 최대값 계산

9.2 서브쿼리 최적화

서브쿼리 평가의 핵심 최적화:
1. 내부 쿼리의 range vector를 재사용 (동일 시계열)
2. step alignment: 서브쿼리의 step을 외부 step에 정렬
3. resolution이 지정되지 않으면 global evaluation_interval 사용

10. 메모리 관리

10.1 샘플 카운팅

--query.max-samples (기본 50,000,000)

카운팅 방식:
- 각 step에서 로드/생성된 샘플 수 누적
- range vector의 경우 범위 내 모든 샘플 카운트
- 제한 초과 시 쿼리 즉시 중단

10.2 쿼리 동시성 제어

--query.max-concurrency (기본 20)

구현:
- semaphore(세마포어)로 동시 쿼리 수 제한
- 새 쿼리가 세마포어 획득 대기
- 타임아웃 내 획득 실패 시 에러 반환

10.3 포인트 풀 (Point Pool)

PromQL 엔진은 메모리 할당을 최소화하기 위해 포인트 풀을 사용합니다:

포인트 풀 동작:
1. 쿼리 시작 시 포인트 슬라이스 풀에서 할당
2. step 평가 간 재사용
3. 쿼리 완료 시 풀에 반환
4. sync.Pool을 사용하여 GC 부담 감소

11. Native Histogram 처리

11.1 Native Histogram 집계

Native Histogram은 기존 float 샘플과 다른 처리 경로:
1. 히스토그램 스키마(resolution) 정렬
2. 버킷 단위 병합
3. 집계 시 스키마 변환 필요 시 자동 수행

sum(native_histogram_metric) 평가:
1. 각 시계열의 히스토그램 수집
2. 스키마가 다르면 더 낮은 해상도로 통일
3. 버킷별 카운트 합산
4. sum, count 합산

11.2 histogram_quantile과 Native Histogram

Native Histogram에서의 quantile 계산:
1. 지수 버킷 경계 자동 결정
2. 기존보다 정확한 보간 가능
3. le 레이블 불필요 (내부 구조에서 직접 계산)

12. 성능 최적화 기법

12.1 Series Set 병합

여러 블록에 걸쳐 동일 시계열이 존재할 때:
1. 각 블록에서 시계열 집합 검색
2. 병합 정렬(merge sort)으로 통합
3. 청크 레벨에서 시간 범위로 필터링
4. 중복 제거

12.2 StepInvariantExpr 최적화

step에 관계없이 동일한 결과를 반환하는 표현식 감지:
- 숫자 리터럴
- offset이 없는 스칼라 함수
- 특정 집계 결과

감지된 경우:
- 한 번만 평가하고 모든 step에 결과 복사
- 불필요한 반복 평가 제거

12.3 쿼리 전처리

최적화 패스:
1. 상수 접기(constant folding): 2 + 3 -> 5
2. 레이블 매칭 최적화: 인덱스 힌트 생성
3. 불필요한 레이블 연산 제거
4. 비어있는 결과 조기 반환

13. 에러 처리와 한계

13.1 일반적인 에러 상황

1. "query processing would load too many samples"
   - max-samples 제한 초과
   - 해결: 쿼리 범위 축소 또는 제한 증가

2. "query timed out in expression evaluation"
   - query.timeout 초과
   - 해결: 쿼리 최적화 또는 타임아웃 증가

3. "found duplicate series for the match group"
   - 벡터 매칭에서 다대일 관계인데 group_left/right 미지정
   - 해결: group_left 또는 group_right 추가

4. "many-to-many matching not allowed"
   - 양쪽 모두 다수인 매칭은 지원되지 않음
   - 해결: 집계를 통해 한쪽을 일(one)로 축소

13.2 알려진 제한사항

1. 조인 기능 제한: SQL 스타일 조인 불가
2. 문자열 연산 제한: 문자열 비교/조작 불가
3. 정렬 보장 없음: 결과의 시계열 순서가 비결정적
4. 서브쿼리 중첩 제한: 과도한 중첩은 메모리 폭발 위험

14. 정리

PromQL 엔진은 시계열 데이터에 특화된 함수형 쿼리 언어를 효율적으로 처리하는 시스템입니다. Pratt 파서를 활용한 깔끔한 구문 분석, step 기반의 range query 평가, lookback delta를 이용한 유연한 샘플 검색, 그리고 포인트 풀과 StepInvariantExpr 최적화를 통한 메모리 효율성이 핵심입니다.

다음 글에서는 Prometheus의 서비스 디스커버리 메커니즘을 분석합니다. kubernetes_sd의 watch 메커니즘, relabeling의 동작 원리, 타겟 생명주기를 살펴볼 예정입니다.

[Prometheus] PromQL Engine Internals: From Parser to Execution Engine

1. Overview
2. Complete Query Processing Pipeline
3. Lexer
4. Parser
5. AST (Abstract Syntax Tree)
6. Query Evaluation Engine
7. Function Evaluation
8. Vector Matching (Binary Operations)
- 8.1 One-to-One Matching
- 8.2 Many-to-One/One-to-Many Matching
9. Subqueries
- 9.1 Subquery Processing
- 9.2 Subquery Optimization
10. Memory Management
11. Native Histogram Processing
- 11.1 Native Histogram Aggregation
12. Performance Optimization Techniques
13. Error Handling and Limitations
- 13.1 Common Error Conditions
- 13.2 Known Limitations
14. Summary

1. Overview

PromQL (Prometheus Query Language) is the core query language for Prometheus, a functional expression language for selecting and aggregating time series data. This post analyzes the PromQL engine internals at the source code level.

The PromQL engine code resides in the promql/ package, consisting of the parser, AST (Abstract Syntax Tree), and evaluation engine.

2. Complete Query Processing Pipeline

PromQL String
    |
    v
+----------+
| Lexer    |  (tokenization)
+----------+
    |
    v
+----------+
| Parser   |  (syntax analysis)
+----------+
    |
    v
+----------+
| AST      |  (abstract syntax tree)
+----------+
    |
    v
+----------+
| Analyzer |  (optimization/validation)
+----------+
    |
    v
+----------+
| Engine   |  (evaluation/execution)
+----------+
    |
    v
Result (Matrix/Vector/Scalar)

3. Lexer

3.1 Tokenization Process

The lexer converts a PromQL string into a token stream:

Input: rate(http_requests_total[5m]) > 10

Token stream:
  IDENTIFIER("rate")
  LEFT_PAREN
  IDENTIFIER("http_requests_total")
  LEFT_BRACKET
  DURATION("5m")
  RIGHT_BRACKET
  RIGHT_PAREN
  GTR
  NUMBER(10)

3.2 Token Types

Major token types recognized by the PromQL lexer:

Literals:
  NUMBER      - integer/float (e.g., 42, 3.14)
  STRING      - string literal (e.g., "hello")
  DURATION    - time range (e.g., 5m, 1h, 30s)

Identifiers:
  IDENTIFIER  - metric names, label names
  METRIC_IDENTIFIER - metric identifier

Operators:
  ADD, SUB, MUL, DIV, MOD, POW
  EQL, NEQ, LSS, GTR, LTE, GTE
  AND, OR, UNLESS

Keywords:
  BY, WITHOUT, ON, IGNORING
  GROUP_LEFT, GROUP_RIGHT
  OFFSET, BOOL
  SUM, AVG, MIN, MAX, COUNT, ...

3.3 Lexer Implementation

The lexer is implemented as a manual state machine in Go. Key characteristics:

1-char lookahead: Examines current and next character to determine token boundaries
Duration parsing: Numbers followed by time units (ms, s, m, h, d, w, y) are recognized as DURATION tokens
String escaping: Supports single-quote, double-quote, and backtick string literals
No comments: PromQL has no comment syntax

4. Parser

4.1 Parser Structure

The PromQL parser uses the Pratt parser (Top-Down Operator Precedence) approach:

Parser flow:
1. Read next token from stream
2. Call prefix parsing function (literals, unary ops, parentheses, etc.)
3. If higher-precedence infix operator exists, perform infix parsing
4. Return AST node

4.2 Operator Precedence

Precedence (lowest first):
1. OR
2. AND, UNLESS
3. ==, !=, <, >, <=, >= (comparison)
4. +, - (addition/subtraction)
5. *, /, % (multiplication/division)
6. ^ (power, right-associative)
7. Unary +, -

4.3 Parsing Rules

Key parsing rules:

Vector Selector Parsing:

When curly braces follow a metric name, parse label matchers
Label matching operators: =, !=, =~, !~

Function Call Parsing:

IDENTIFIER + LEFT_PAREN -> function call
Validate name against registered function list
Verify argument count and types

Range Vector Parsing:

Vector selector + LEFT_BRACKET + DURATION + RIGHT_BRACKET
Example: http_requests_total[5m]

5. AST (Abstract Syntax Tree)

5.1 Node Types

The AST consists of the following node types:

Expr (expression interface)
  |
  +-- NumberLiteral        (numeric literal)
  +-- StringLiteral        (string literal)
  +-- VectorSelector       (instant vector selector)
  +-- MatrixSelector       (range vector selector)
  +-- AggregateExpr        (aggregation expression)
  +-- BinaryExpr           (binary operation)
  +-- UnaryExpr            (unary operation)
  +-- Call                 (function call)
  +-- ParenExpr            (parenthesized expression)
  +-- SubqueryExpr         (subquery expression)
  +-- StepInvariantExpr    (step-invariant expression)

5.2 AST Example

Query: sum(rate(http_requests_total[5m])) by (job)

AggregateExpr (sum, by=[job])
  |
  +-- Call (rate)
        |
        +-- MatrixSelector
              |
              +-- VectorSelector (http_requests_total)
              +-- Range: 5m

5.3 VectorSelector Structure

VectorSelector:
  Name: "http_requests_total"
  LabelMatchers:
    - __name__ = "http_requests_total"
    - job = "prometheus"       (user-specified)
    - instance =~ ".*:9090"    (regex matching)
  Offset: 0s                   (offset modifier)
  Timestamp: nil               (@ modifier)
  OriginalOffset: 0s
  StartOrEnd: 0                (start() or end())

6. Query Evaluation Engine

6.1 Engine Configuration

Engine:
  +-- queryable      (TSDB storage interface)
  +-- maxSamples     (maximum sample count limit)
  +-- timeout        (query timeout)
  +-- activeQueries  (active query counter)
  +-- maxConcurrency (maximum concurrent queries)

6.2 Instant Query vs Range Query

Instant Query: Evaluation at a single point in time

Request: GET /api/v1/query?query=up&time=1609459200

Processing:
1. Generate AST
2. Evaluate once at time=1609459200
3. Result: Vector (single value per series)

Range Query: Step-by-step evaluation over a time range

Request: GET /api/v1/query_range?query=up&start=1609455600&end=1609459200&step=60

Processing:
1. Generate AST
2. Evaluate repeatedly from start to end at step (60s) intervals
3. Each step evaluates like an instant query
4. Result: Matrix (time-indexed value array per series)

6.3 Step Evaluation Mechanism

Each step in a Range Query is an independent instant evaluation:

query: rate(http_requests_total[5m])
start: T0, end: T0+300s, step: 60s

Step evaluations:
  T0:      rate(samples from T0-5m to T0)
  T0+60:   rate(samples from T0+60-5m to T0+60)
  T0+120:  rate(samples from T0+120-5m to T0+120)
  T0+180:  rate(samples from T0+180-5m to T0+180)
  T0+240:  rate(samples from T0+240-5m to T0+240)
  T0+300:  rate(samples from T0+300-5m to T0+300)

6.4 Lookback Delta

The Lookback Delta (default 5 minutes) is the time window within which an instant vector selector searches for samples:

Evaluation time: T
lookback delta: 5m

Sample search range: (T - 5m, T]

Behavior:
1. Find most recent sample within 5 minutes before T
2. Include in result if not a stale marker
3. Exclude series if no sample within 5 minutes

Important: An exact sample at time T is not required
           Uses the most recent sample within lookback delta

7. Function Evaluation

7.1 Function Registry

All PromQL functions are registered in a central registry:

Function registration:
  Name:       "rate"
  ArgTypes:   [ValueTypeMatrix]     (argument types)
  ReturnType: ValueTypeVector       (return type)
  Call:       funcRate              (actual implementation)
  Variadic:   0                     (variadic argument count)

7.2 rate() Implementation

Internal operation of rate():

rate(counter[5m]) evaluation:

1. Extract samples from 5-minute range: [s1, s2, s3, ..., sN]
2. Counter reset detection and correction:
   - If value decreases, add previous value to cumulative total
3. Calculate corrected total increase:
   resultValue = (last - first) + counterCorrection
4. Extrapolation:
   - Calculate time ratio beyond range start/end
   - durationToStart = first.time - rangeStart
   - durationToEnd = rangeEnd - last.time
   - Apply extrapolation ratio
5. Per-second rate:
   result = extrapolatedIncrease / rangeDurationSeconds

7.3 histogram_quantile() Implementation

histogram_quantile(phi, buckets) evaluation:

1. Sort buckets by le label
2. Collect cumulative count for each bucket
3. Calculate target_count = phi (e.g., 0.95) * total_count
4. Find bucket containing target_count
5. Linear interpolation between bucket boundaries:
   bucketStart + (bucketEnd - bucketStart) *
   (target - countBelow) / (countInBucket)

Notes:
- le="+Inf" bucket must exist
- Returns NaN if buckets are empty
- Inaccurate if bucket boundaries do not match actual distribution

7.4 Aggregation Function Evaluation

sum by (job) (metric) evaluation:

1. Collect all series of metric
2. Group by job label value
3. Sum values within each group
4. Result series: only job label retained

Implementation:
- Group management via hashmap
- Label set hash used as key
- Exact label comparison on hash collision

8. Vector Matching (Binary Operations)

8.1 One-to-One Matching

metric_a + metric_b

Matching algorithm:
1. Index series from both vectors by label set
2. Find pairs with identical matching label sets
3. Perform operation (addition here)
4. Result series uses left-side label set

Performance:
- Hash join approach with O(n+m) time complexity
- Smaller side built as hash table

8.2 Many-to-One/One-to-Many Matching

metric_a * on(job) group_left metric_b

Processing:
1. Determine matching key with on(job)
2. Index metric_b (one side) by job in hashmap
3. For each series in metric_a (many side):
   a. Look up matching item in metric_b by job label
   b. Perform operation
   c. Retain all labels from metric_a (group_left)

9. Subqueries

9.1 Subquery Processing

max_over_time(rate(http_requests_total[5m])[30m:1m])

Evaluation:
1. Outer function (max_over_time) needs range vector
2. Subquery part: rate(http_requests_total[5m])[30m:1m]
3. Inner evaluation:
   - From 30 minutes before current time (T) to T
   - Evaluate rate(http_requests_total[5m]) at 1-minute intervals
   - Generate 30 instant vector results
4. Convert to range vector
5. max_over_time calculates maximum for each series

9.2 Subquery Optimization

Key optimizations for subquery evaluation:
1. Reuse inner query range vectors (same series)
2. Step alignment: align subquery step with outer step
3. Use global evaluation_interval when resolution not specified

10. Memory Management

10.1 Sample Counting

--query.max-samples (default 50,000,000)

Counting approach:
- Accumulate samples loaded/generated at each step
- For range vectors, count all samples within range
- Immediately abort query when limit exceeded

10.2 Query Concurrency Control

--query.max-concurrency (default 20)

Implementation:
- Semaphore limits concurrent queries
- New queries wait for semaphore acquisition
- Return error if acquisition fails within timeout

10.3 Point Pool

The PromQL engine uses point pools to minimize memory allocation:

Point pool operation:
1. Allocate point slices from pool at query start
2. Reuse between step evaluations
3. Return to pool at query completion
4. Uses sync.Pool to reduce GC pressure

11. Native Histogram Processing

11.1 Native Histogram Aggregation

Native Histograms have different processing paths than float samples:
1. Align histogram schemas (resolution)
2. Merge at bucket level
3. Auto-convert schemas when needed during aggregation

sum(native_histogram_metric) evaluation:
1. Collect histograms from each series
2. Unify to lower resolution if schemas differ
3. Sum per-bucket counts
4. Sum sum and count fields

12. Performance Optimization Techniques

12.1 Series Set Merging

When same series exist across multiple blocks:
1. Search series set in each block
2. Consolidate via merge sort
3. Filter at chunk level by time range
4. Deduplicate

12.2 StepInvariantExpr Optimization

Detect expressions returning identical results regardless of step:
- Numeric literals
- Scalar functions without offset
- Certain aggregation results

When detected:
- Evaluate once and copy result to all steps
- Eliminate unnecessary repeated evaluation

12.3 Query Preprocessing

Optimization passes:
1. Constant folding: 2 + 3 -> 5
2. Label matching optimization: generate index hints
3. Remove unnecessary label operations
4. Early return for empty results

13. Error Handling and Limitations

13.1 Common Error Conditions

1. "query processing would load too many samples"
   - max-samples limit exceeded
   - Fix: reduce query range or increase limit

2. "query timed out in expression evaluation"
   - query.timeout exceeded
   - Fix: optimize query or increase timeout

3. "found duplicate series for the match group"
   - Many-to-one relationship without group_left/right
   - Fix: add group_left or group_right

4. "many-to-many matching not allowed"
   - Many-on-both-sides matching not supported
   - Fix: aggregate one side to reduce to one

13.2 Known Limitations

1. Limited join capability: no SQL-style joins
2. Limited string operations: no string comparison/manipulation
3. No sort guarantee: series order in results is non-deterministic
4. Subquery nesting limits: excessive nesting risks memory explosion

14. Summary

The PromQL engine efficiently processes a functional query language specialized for time series data. Key aspects include clean syntax analysis via Pratt parser, step-based range query evaluation, flexible sample lookup via lookback delta, and memory efficiency through point pools and StepInvariantExpr optimization.

In the next post, we will analyze the Prometheus service discovery mechanism, covering the kubernetes_sd watch mechanism, relabeling internals, and target lifecycle.