Chaos and Order

Prologue — Why time series needs its own database

The fantasy a new backend engineer meets in 2026: "Metrics? Just add a timestamp column to Postgres and INSERT."

Three months later: 100k metrics per second arrive, Postgres dies on disk I/O, the index fattens by 10GB a day, a single `SELECT avg(value) FROM metrics WHERE time > now() - interval '1 hour' GROUP BY tag` takes 90 seconds, and 30 days of retention overflows a single NVMe.

**Time-series data is fundamentally different from typical OLTP data.** Four differences matter.

- **Append-heavy** — 99.9% INSERT, almost no UPDATE. Every new row is always "now."

- **Time-ordered** — data arrives sorted on the time axis, and queries almost always carry a time range.

- **Downsample-able** — last hour at 1-second resolution, yesterday at 1-minute, last year at 1-hour. It's fine to drop precision as time passes.

- **Retention policies** — auto-delete after 30 days. Keep only 365-day rollups.

The class of products that exploit these four traits is called a **time-series database (TSDB)**. The same 1 trillion rows of metrics need 10TB in Postgres but fit in 300GB in InfluxDB or ClickHouse. **30x compression, 100x query speed** — that is why TSDBs exist.

This post dissects the 2026 TSDB stack: InfluxDB 3.0's Rust rewrite, TimescaleDB hypertables, ClickHouse MergeTree, Prometheus 3.0, VictoriaMetrics 1.110+, Grafana Mimir · Cortex · Thanos, TDengine · GreptimeDB.

1. The five traits of time-series data

**1. Time-ordered append.** Virtually every write arrives with a monotonically increasing timestamp. Sorted data compresses well, and an LSM-tree or column-store fits better than a B-tree.

**2. High cardinality.** "100k metrics per second" really means an explosion of label combinations like (host, region, container, pod, namespace). Five labels with 100 values each gives a cardinality of 10^10 — TSDB's number-one scaling problem.

**3. Compression-friendly.** Adjacent values resemble each other. CPU utilization wobbles around 73.2, 73.3, 73.1. Combining Gorilla XOR encoding, delta-of-delta, and ZSTD yields 30–50x compression.

**4. Range queries dominate.** "Average over the last hour," "max between 9–10 yesterday," "p95 this month" — time range plus aggregate covers 99% of patterns. Point lookups are rare.

**5. Downsample-able.** Rolling 1-second data up to 1-minute, 1-hour, or 1-day buckets and dropping the originals barely loses analytical fidelity. That is why **continuous aggregates** or **materialized views** are headline features.

How well a TSDB handles these five traits is what differentiates it from the next one.
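To make the downsampling trait concrete, here is a minimal sketch of a rollup from 1-second samples into 1-minute buckets. The function name `downsample`, the sample data, and the choice of average plus max as the aggregates are illustrative assumptions, not any particular engine's implementation.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, bucket_seconds):
    """Roll (unix_ts, value) points up into fixed-width time buckets.

    Returns a sorted list of (bucket_start, avg, max) tuples.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor the timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(vals), max(vals)) for start, vals in sorted(buckets.items())]

# Hypothetical data: 120 one-second CPU samples starting on a minute boundary.
base = 28_333_333 * 60
raw = [(base + i, 70.0 + (i % 3)) for i in range(120)]

rollup = downsample(raw, 60)
for start, avg, peak in rollup:
    print(start, avg, peak)
```

The 120 raw points collapse into two rows. Once the rollup is stored, the raw points can be dropped by a retention policy without losing the minute-level trend.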

2. The TSDB market map — six camps

The 2026 TSDB landscape divides into six camps.

**1. Influx camp.** InfluxDB 3.0/3.1 (Rust + Arrow + Parquet) and the legacy InfluxDB 1.x/2.x (TSM + Flux). The oldest dedicated TSDB lineage.

**2. Postgres extension camp.** TimescaleDB 2.18, from Timescale Inc., rebranded to **TigerData** in 2025. A hybrid of relational and time-series.

**3. ClickHouse camp.** ClickHouse 25.x — strictly an OLAP database, yet the most widely deployed "unofficial" TSDB. Born at Yandex, commercialized by ClickHouse Inc.

**4. Prometheus ecosystem.** Prometheus 3.0 (CNCF), VictoriaMetrics 1.110+, Grafana Mimir, Cortex, Thanos. Specialized for monitoring metrics.

**5. Newcomers.** QuestDB 8.x (SIMD), GreptimeDB (Rust, cloud-native), TDengine (Chinese IoT), CrateDB (distributed SQL).

**6. OLAP adjacents.** Apache Druid, Apache Pinot, StarRocks — primarily OLAP but they also chew through time-series workloads.

On top, **commercial SaaS** (Datadog, New Relic, Splunk, Honeycomb, Lightstep) sits as a separate tier. Those are "observability platforms" more than "databases."

Memorizing the camps speeds up decisions. The default mapping is: **metrics monitoring → Prometheus ecosystem; IoT or financial ticks → InfluxDB or QuestDB; BI plus analytics → ClickHouse or TimescaleDB.**

3. InfluxDB 1.x / 2.x — the TSM era's legacy

InfluxDB shipped in 2013 as the first generation of dedicated TSDBs.

**1.x (2016–2020).** TSM (Time-Structured Merge tree) engine. InfluxQL (SQL-flavored). Single node. Telegraf collected, InfluxDB stored, Chronograf visualized, Kapacitor alerted. The famous TICK stack.

**2.x (2020–2024).** TSM plus a custom key-value index. Flux language (functional data pipeline). UI integration, token auth, task scheduler. First-generation Cloud offering.

**Flux example.**

from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_user")
  |> aggregateWindow(every: 1m, fn: mean)
  |> yield(name: "mean_cpu")

**The problem.** Flux had a steep learning curve. SQL-fluent data engineers had to relearn the language. The index also collapsed once cardinality crossed one million. Clustering was Enterprise-only.

In 2024, InfluxData (the company behind InfluxDB) decided to **scrap TSM and Flux and rewrite from scratch**. The result is 3.0. **1.x/2.x is legacy** and new projects should target 3.0.

4. InfluxDB 3.0 / 3.1 — the Rust + Arrow + Parquet rewrite

InfluxDB 3.0 shipped GA in June 2024 (3.1 in 2026) and is effectively a different database.

**Architecture.**

- **Language**: Go to **Rust**, full rewrite.

- **In-memory format**: Apache Arrow (column-oriented).

- **Query engine**: Apache DataFusion (Rust SQL engine).

- **On-disk format**: Apache Parquet (column-store, ZSTD compression).

- **Query languages**: SQL plus InfluxQL (Flux is deprecated).

- **Separated architecture**: ingest, storage, query, compaction run as distinct components.

**Three SKUs.**

- **InfluxDB 3 Core** — open source (MIT). Single node. For edge or IoT.

- **InfluxDB 3 Enterprise** — commercial. Cluster, HA, security.

- **InfluxDB 3 Cloud (Serverless / Dedicated)** — fully managed on AWS, GCP, or Azure regions.

**Why Arrow plus Parquet.** Column-oriented layouts are overwhelmingly better for time-series. When (timestamp, host=A, cpu=73.2) and (timestamp, host=A, cpu=73.3) flow in, isolating the cpu column compresses spectacularly because the values are nearly identical. Parquet is the industry standard for that model.

**Arrow's real value is zero-copy interop**: a query result from InfluxDB 3 flows straight into Pandas, Polars, or DuckDB. No ETL between database and analysis tool.

**Migration caveat.** 2.x to 3.x **changes the API and query language.** Flux is deprecated and the write API endpoint paths shift. Telegraf output plugins remain mostly compatible, but hand-written code almost always needs a refactor. **That is why many shops are still on 2.x in 2026.**

5. TimescaleDB 2.18 / TigerData — Postgres becomes a TSDB

TimescaleDB shipped in 2017 and now runs at 2.18 in 2026. **The headline is that it's a Postgres extension.**

**Core concept: hypertable.** It looks like an ordinary Postgres table but is partitioned automatically along the time axis under the hood.

CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id INT,
    cpu DOUBLE PRECISION,
    memory DOUBLE PRECISION
);

SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');

That alone auto-creates a new chunk per day. Retention is one line away.

SELECT add_retention_policy('metrics', INTERVAL '30 days');
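The idea behind time-based chunking and chunk-level retention can be sketched in a few lines. This is a conceptual model only: the names `chunk_start`, `insert`, and `apply_retention` are hypothetical, and the dictionary stands in for on-disk chunks. It is not how TimescaleDB is implemented internally.

```python
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(days=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def chunk_start(ts):
    """Floor a timestamp to the start of the chunk that would hold it."""
    n = (ts - EPOCH) // CHUNK_INTERVAL  # timedelta // timedelta gives an int
    return EPOCH + n * CHUNK_INTERVAL

def insert(chunks, ts, row):
    """Route a row to its chunk, creating the chunk on first use."""
    chunks.setdefault(chunk_start(ts), []).append(row)

def apply_retention(chunks, now, keep):
    """Drop whole chunks whose time range ends at or before the cutoff."""
    cutoff = now - keep
    for start in [s for s in chunks if s + CHUNK_INTERVAL <= cutoff]:
        del chunks[start]

chunks = {}
now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
insert(chunks, now - timedelta(days=45), {"cpu": 71.0})
insert(chunks, now - timedelta(days=45, hours=1), {"cpu": 70.5})
insert(chunks, now - timedelta(hours=2), {"cpu": 73.2})

chunks_before = len(chunks)
apply_retention(chunks, now, keep=timedelta(days=30))
print(chunks_before, "->", len(chunks))
```

Dropping an expired chunk removes a whole partition at once, which is why retention on a partitioned table is cheap compared with a row-by-row `DELETE`.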

**Continuous aggregates.** Materialized views over the 1-second data that automatically roll up to minute, hour, or day. A background job applies incremental refresh.

CREATE MATERIALIZED VIEW metrics_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(cpu) AS avg_cpu,
       max(cpu) AS max_cpu
FROM metrics
GROUP BY bucket, device_id;

**Column compression.** Convert row-store chunks to column-store. Typically 90–95% reduction.

ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id');

SELECT add_compression_policy('metrics', INTERVAL '7 days');

**TigerData rebrand (2025).** Timescale Inc. renamed itself to **TigerData** in 2025. The product is still TimescaleDB. The cloud SaaS rebranded from Timescale Cloud to Tiger Cloud. The reasoning was shedding the "Timescale equals retrospective analytics" perception and leaning into AI and real-time messaging.

**Strengths.** 100% Postgres compatibility — psql, pgAdmin, ORMs, BI tools all work unchanged. Relational joins are free and SQL is rich.

**Weaknesses.** Single-node ingest caps near one million rows per second. Beyond that, ClickHouse or InfluxDB pull ahead. Multi-node TimescaleDB was deprecated in 2.13, with distribution living only in the cloud product.

6. QuestDB 8.x — a SIMD-armed ingest monster

QuestDB is an open-source TSDB started in 2014, now at 8.x in 2026.

**The quirk.** Written in Java plus native code that uses SIMD intrinsics. Ingest speeds are absurd: benchmarks of **four million rows per second** are routine.

**Storage.** Columnar plus time partitioning. Column files lay down on disk in time order — `cpu.d`, `memory.d`, `timestamp.d`. New data is always appended at the tail.

**Queries.** SQL with time-series extensions like SAMPLE BY and LATEST ON.

SELECT timestamp, avg(cpu)
FROM metrics
WHERE timestamp > dateadd('h', -1, now())
SAMPLE BY 1m;

SELECT * FROM metrics
LATEST ON timestamp PARTITION BY device_id;

**Interfaces.** InfluxDB Line Protocol over UDP/TCP (Telegraf-compatible), the PostgreSQL wire protocol (psql connects), and a REST API.
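For readers who have not seen InfluxDB Line Protocol, the sketch below builds one line by hand. The helper names (`escape_tag`, `format_field`, `to_line`) and the sample tags are hypothetical, and the escaping covers only the common cases (commas, spaces, equals signs, quotes); a production client library handles more edge cases.

```python
def escape_tag(text):
    # Commas, equals signs and spaces are backslash-escaped in tag/field keys and tag values.
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text

def format_field(value):
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"           # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'            # strings are double-quoted

def to_line(measurement, tags, fields, ts_ns):
    """Format: measurement,tag=value,... field=value,... timestamp"""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {ts_ns}"

line = to_line(
    "cpu",
    {"region": "ap-northeast-2", "host": "web 01"},
    {"usage_user": 73.2, "cores": 8},
    1_700_000_000_000_000_000,
)
print(line)
```

Tags are indexed metadata and fields are the measured values, so anything with unbounded variety belongs in a field rather than a tag.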

**Strengths.** Top-of-industry single-node ingest. Low memory footprint (tens of billions of rows in 10GB), simple operations. Licensed Apache 2.0.

**Weaknesses.** Weak distribution — replication arrived in 8.0 but is still new. Very high cardinality (tens of millions of label values) breaks it. SQL is real but standard JOIN support is limited.

**Where it fits.** Financial market data (ticks), IoT sensors, game telemetry. Workloads that pour one trillion time-sorted rows into a single node.

7. ClickHouse 24.x / 25.x — OLAP that doubles as a TSDB

ClickHouse began at Yandex in 2009 as an internal column-store OLAP database. Open-sourced in 2016, spun out as ClickHouse Inc. in 2021. 2026 versioning is 25.x.

**Why people treat it as a TSDB.** Column-store plus time-sorted data is overwhelmingly fast. The MergeTree engine resembles an LSM-tree, merging sorted chunks in the background.

**Core engines.**

- **MergeTree** — the default. Background merge of chunks sorted by an ORDER BY key.

- **ReplicatedMergeTree** — replication coordinated through ZooKeeper or ClickHouse Keeper. For HA.

- **SummingMergeTree / AggregatingMergeTree** — auto-aggregate during merge (count, sum, min, max).

- **ReplacingMergeTree** — on duplicate key, keep the latest row.

**Table definition.**

CREATE TABLE metrics
(
    timestamp DateTime,
    device_id UInt32,
    cpu Float64,
    memory Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp);

**Materialized views.** A ClickHouse materialized view is an INSERT trigger. It transforms or aggregates incoming rows and INSERTs them into another table.

CREATE MATERIALIZED VIEW metrics_1h
ENGINE = AggregatingMergeTree()
ORDER BY (device_id, bucket)
AS SELECT
    toStartOfHour(timestamp) AS bucket,
    device_id,
    avgState(cpu) AS avg_cpu,
    maxState(cpu) AS max_cpu
FROM metrics
GROUP BY bucket, device_id;
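The `avgState` calls above store a partial aggregation state rather than a finished average, and reading the view typically applies the matching `-Merge` combinator (for example `avgMerge`). The sketch below shows why a state is needed. The `AvgState` class is a hypothetical stand-in that models the state as a sum and a count; ClickHouse's real state format is internal.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Mergeable partial state for an average: keep sum and count, not the mean."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1
        return self

    def merge(self, other):
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self):
        return self.total / self.count

# Two INSERT batches land in the same (device_id, hour) bucket.
batch_a = [10.0, 20.0, 30.0]
batch_b = [50.0]

state_a = AvgState()
for v in batch_a:
    state_a.add(v)
state_b = AvgState()
for v in batch_b:
    state_b.add(v)

merged = state_a.merge(state_b).finalize()
naive = (state_a.finalize() + state_b.finalize()) / 2  # average of averages

print("merged states:", merged)
print("average of averages:", naive)
```

Averaging the two per-batch averages weights the single-row batch as heavily as the three-row batch, so it drifts from the true mean. Merging states keeps the result exact no matter how rows were split across inserts or background merges.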

**Strengths.** Overwhelmingly fast analytics. A 1-billion-row GROUP BY in 1–3 seconds. 10–30x compression. Rich SQL (window functions, arrays, maps). Distributed clustering as a standard feature.

**Weaknesses.** UPDATE and DELETE are expensive (ALTER TABLE DELETE is an async mutation). No transactions. Weak single-row lookups. Operationally more complex than InfluxDB.

**2026 stature.** Cloudflare, Uber, Bloomberg, and eBay run it as their metrics store. The relationship with Grafana Mimir is **complementary** rather than purely competitive.

8. Prometheus 2.x / 3.0 — the de facto standard for metrics monitoring

Prometheus started at SoundCloud in 2012 and joined CNCF in 2016. **Prometheus 3.0** shipped in November 2024. It is the de facto standard for metrics monitoring.

**Architecture.**

- **Pull model** — the Prometheus server periodically scrapes targets exposing `/metrics`.

- **Custom TSDB engine** — an in-memory head block backed by a WAL, cut into persistent 2-hour blocks on disk.

- **PromQL** — a time-series query language with operators like `rate`, `sum by`, `histogram_quantile`.

- **Alertmanager** — a separate component for routing and deduplication.

- **Service discovery** — auto-discovery from Kubernetes, EC2, Consul, and more.

**PromQL examples.**

rate(http_requests_total[5m])

sum by (status) (rate(http_requests_total[5m]))

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
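What `rate()` does with a counter can be approximated in a few lines. This is a simplified sketch: the function name `simple_rate` is hypothetical, and unlike the real PromQL function it does not extrapolate to the edges of the window. It only sums increases between samples and treats a drop in value as a counter reset.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (unix_ts, counter_value), sorted by time with distinct timestamps.
    Returns None when there are fewer than two samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset (process restart): it started again from zero.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical scrape results, 15 s apart, with a restart before the fourth sample.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
per_second = simple_rate(scrapes)
print(per_second)
```

Reset handling is the reason counters are exposed as ever-increasing totals: the query side can recover a correct rate even when the process restarts.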

**3.0 changes.**

- **UTF-8 label support** — dots and hyphens are valid in label names (matching OpenTelemetry semantic conventions).

- **Native histograms** — dynamic bucketing that solves the cardinality blowups of classic bucket histograms.

- **Remote-write 2.0** — efficient transport to remote storage.

- **OpenTelemetry ingest** — direct OTLP metric reception.

**Weaknesses.** Single node. Long retention is bounded by local disk. HA is delegated to federation or external tools (Mimir, Thanos, VictoriaMetrics). Cardinality explosions push memory off a cliff once series counts exceed one million.

**Why it is still the 2026 standard.** Kubernetes ecosystem integration is bulletproof. Every infrastructure tool emits the Prometheus exposition format. Single-node deployments are still enough for many environments.

9. VictoriaMetrics 1.110+ — the efficient Prometheus alternative

VictoriaMetrics is an open-source TSDB started in 2018, currently at 1.110+ in 2026. **Prometheus-compatible** but more efficient under the hood.

**Architecture.**

- **vmstorage** — data store with column-orientation and ZSTD compression.

- **vminsert** — write-side proxy.

- **vmselect** — query router.

- **vmagent** — Prometheus-style scraper that sends via remote-write.

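- **vmalert** — evaluates alerting and recording rules and sends notifications to Alertmanager (it replaces Prometheus's rule evaluation, not Alertmanager itself).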

**MetricsQL.** A superset of PromQL with extra functions (`keep_last_value`, `aggr_over_time`, `running_sum`) and a faster executor.

**Advantages over Prometheus.**

- **10x better compression.** Same data: 100GB on Prometheus, 10GB on VM.

- **8x less memory.** Ten million active series take 32GB on Prometheus and only 4GB on VM.

- **7x faster ingest.** Holds up better to cardinality blowups.

- **The clustered variant is open source.** Prometheus HA requires federation or Mimir; VM clusters on its own.

**Replacing Prometheus with vmagent.** Prometheus juggles memory storage, disk flush, and remote-write at once. vmagent scrapes and remote-writes immediately, dropping memory usage by an order of magnitude.

**Where it fits.** Large-scale metrics (100M+ active series), cost reduction, and shops that want to ship a single open-source tool end-to-end.

**Weaknesses.** Grafana and Prometheus ecosystem integration is 99% compatible, with a 1% edge-case tail. Recording rules syntax differs slightly.

10. Grafana Mimir — horizontally scalable Prometheus

Grafana Mimir is an open-source project announced by Grafana Labs in 2022 — a **fork and re-architecture of Cortex**. It is the backend behind Grafana Cloud Metrics.

**The pitch.** "Scale Prometheus horizontally to one billion series."

**Architecture.**

- **Distributor** — receives ingest, hash-routes to ingesters.

- **Ingester** — WAL plus in-memory chunks.

- **Store-Gateway** — queries historical data from S3, GCS, or Azure Blob.

- **Querier** — query execution.

- **Query-Frontend** — query caching and sharding.

- **Compactor** — block merges.

- **Object storage** — all persistent data lives on S3-compatible storage.
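The Distributor's hash routing can be pictured as a consistent-hash ring. The sketch below is a generic ring, not Mimir's actual implementation: the `HashRing` class, the SHA-256 hash, the 64 tokens per node, and the replication factor of 3 are all illustrative assumptions.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, tokens_per_node=64):
        # Each node owns several tokens so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(tokens_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    def owners(self, series_key, replication_factor=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._tokens, _hash(series_key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == replication_factor:
                break
        return found

ring = HashRing(["ingester-1", "ingester-2", "ingester-3", "ingester-4"])
series = 'http_requests_total{method="GET",status="200"}'
replicas = ring.owners(series)
print(replicas)
```

Because the position depends only on the series key, every Distributor replica routes the same series to the same ingesters, and adding an ingester moves only the series adjacent to its tokens.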

**Object storage is the key.** Mimir uses RAM and SSD only as cache; persistence sits in S3 (or GCS or Azure Blob). Infinite retention, cheap storage, and easy node scale-out or scale-in.

**PromQL compatibility.** Prometheus queries, recording rules, and Alertmanager all work unchanged.

**Where it fits.** SaaS shops already on the Grafana stack scaling metrics into the hundreds of millions of series. The self-hosted backbone of Grafana Cloud.

**Weaknesses.** Operationally heavy. Eight to ten components. The hash ring needs a KV store: the built-in memberlist works, or an external store such as Consul. Even with Helm the tuning is non-trivial.

11. Cortex and Thanos — the two branches before Mimir

Before Mimir, two solutions delivered horizontally scaled Prometheus.

**Cortex (2017–).** Started at Weaveworks, CNCF Incubating. Multi-tenant Prometheus, S3-backed. Grafana Labs forked Cortex to build Mimir and released it in 2022 under AGPLv3, while Cortex itself remains Apache 2.0.

**Thanos (2017–).** Started at Improbable, CNCF Incubating. A different approach — keep Prometheus instances intact, attach a sidecar that uploads data to S3, then query across multiple Prometheus instances plus S3 at query time. Simplicity is the selling point.

**2026 status.**

- **Thanos** — still popular. Augments existing Prometheus deployments. Strong for multi-cluster, multi-region.

- **Cortex** — eclipsed by Mimir. Some legacy operators remain.

- **Mimir** — the most active development. Grafana Labs pushes hard.

- **VictoriaMetrics** — a separate camp from the three. Simpler.

**Selection guide.** Greenfield: Mimir or VictoriaMetrics. Existing Prometheus instances scattered across clusters and you want to leave them alone: Thanos.

12. Cardinality — TSDB's number-one scaling problem

Anyone who has operated a TSDB for any length of time knows this: **every problem starts with cardinality.**

**Definition.** The number of unique time series. `http_requests_total{method="GET", status="200", path="/api/v1/users"}` is one series. Label-combination explosions explode series counts.

**Dangerous patterns.**

- **user_id as a label** — one million users, one million series.

- **request_id as a label** — a new series per request. Instant memory death.

- **timestamp as a label** — equally fatal: every sample becomes its own series.

- **Full error messages as labels** — unbounded variance.

**Memory cost.** A Prometheus single node spends roughly 2GB on one million active series. Ten million is 20GB. One hundred million is 200GB — impossible on a single node.
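The arithmetic is worth running before a label ships. The sketch below multiplies per-label value counts to get a worst-case series count and applies the rough 2GB-per-million-series figure from the paragraph above. The label names and counts are hypothetical, and real series counts are usually lower than the product because not every combination occurs.

```python
from math import prod

BYTES_PER_SERIES = 2_000  # rough figure from the text: about 2 GB per million series

def worst_case_series(label_values):
    """Upper bound on series count: the product of distinct values per label."""
    return prod(label_values.values())

def memory_gb(series):
    return series * BYTES_PER_SERIES / 1e9

# Hypothetical label schema for one metric.
labels = {"method": 5, "status": 8, "path": 50, "pod": 200}

base = worst_case_series(labels)
with_user = worst_case_series({**labels, "user_id": 1_000})

print(f"{base:,} series, about {memory_gb(base):.1f} GB")
print(f"{with_user:,} series, about {memory_gb(with_user):.1f} GB")
```

Adding a single label with 1,000 values multiplies both the series count and the memory estimate by 1,000, which is the whole problem in one line.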

**Mitigations.**

- **Label normalization.** Use `/api/v1/users/:id` instead of paths with raw ids.

- **Bucketing.** Use histograms instead of per-millisecond latency labels.

- **Send high-cardinality labels elsewhere.** Push `request_id` to logs (Loki or Elasticsearch), keep it out of metrics.

- **Scale via VictoriaMetrics or Mimir.** Up to 100M series is a tooling problem.
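Label normalization, the first mitigation in the list, can be as small as the sketch below. The function name `normalize_path`, the `:id` placeholder, and the rule "numeric or UUID segments are ids" are assumptions for illustration; where the web framework exposes the matched route template, using that directly is usually more reliable.

```python
import re

_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)

def normalize_path(path):
    """Replace id-like path segments with a placeholder before using the path as a label."""
    segments = []
    for segment in path.split("/"):
        if segment.isdigit() or _UUID.match(segment):
            segments.append(":id")
        else:
            segments.append(segment)
    return "/".join(segments)

print(normalize_path("/api/v1/users/48213"))
print(normalize_path("/orders/123e4567-e89b-12d3-a456-426614174000/items"))
```

With this in place, one million users produce one `path` label value instead of one million.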

**Design cardinality from day one.** Once a label schema is baked into the code, ripping it out is hard. "**Write a labeling guide before adopting a TSDB**" is an industry rule of thumb.

13. Compression — Gorilla XOR, delta-of-delta, ZSTD

TSDBs compress 30x better than typical databases because they layer four techniques.

**1. Delta encoding for timestamps.** Store the difference from the previous value instead of the absolute. With 1-second intervals the delta is almost always 1000ms — a variable-length 1–2 bytes instead of 8.

**2. Delta-of-delta encoding for timestamps.** One step further: store the delta of the delta. Even spacing collapses to runs of zeros and compresses even tighter. The core idea of the Gorilla paper (Facebook 2015).

**3. Gorilla XOR encoding for floats.** Store the XOR with the previous value. Similar floats yield many leading and trailing zeros in the XOR, and variable-length encoding rides that. Smoothly varying metrics like CPU utilization respond especially well.

**4. ZSTD compression at the block level.** A second pass on top of the above. Usually another 2–3x.
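The two timestamp encodings are easy to see on a small example. The sketch below uses hypothetical millisecond timestamps at 1-second spacing with one sample arriving 3 ms late; the helper names `deltas` and `decode` are illustrative, and the bit-level variable-length packing is left out.

```python
def deltas(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]

def decode(first, first_delta, delta_of_deltas):
    """Rebuild the original timestamps from the delta-of-delta stream."""
    out = [first, first + first_delta]
    delta = first_delta
    for dod in delta_of_deltas:
        delta += dod
        out.append(out[-1] + delta)
    return out

timestamps_ms = [1_700_000_000_000 + 1000 * i for i in range(6)]
timestamps_ms[4] += 3  # one sample arrives 3 ms late

first_order = deltas(timestamps_ms)    # delta encoding
second_order = deltas(first_order)     # delta-of-delta encoding

print(first_order)   # [1000, 1000, 1000, 1003, 997]
print(second_order)  # [0, 0, 3, -6]

restored = decode(timestamps_ms[0], first_order[0], second_order)
```

Thirteen-digit absolute values shrink to four-digit deltas, and then to values that are mostly zero, which a variable-length encoder can store in a bit or two each.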

**Effect.** A 16-byte pair (timestamp plus float64) collapses to an average of 1.5 bytes. **10x compression is the baseline; 30x is common.**
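The float side of that saving comes from how few bits differ between neighbouring values. The sketch below XORs the IEEE 754 bit patterns of two doubles and counts the leading zeros, trailing zeros, and the "meaningful" bits in between, which are the only bits an XOR encoder has to store. The helper names are hypothetical and the actual bit packing is omitted.

```python
import struct

def float_bits(x):
    """The 64-bit IEEE 754 pattern of a double, as an unsigned int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_summary(prev, curr):
    """Return (leading_zeros, trailing_zeros, meaningful_bits) of prev XOR curr."""
    x = float_bits(prev) ^ float_bits(curr)
    if x == 0:
        return 64, 0, 0  # identical value: a single control bit is enough
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return leading, trailing, 64 - leading - trailing

same = xor_summary(73.2, 73.2)
doubled = xor_summary(12.0, 24.0)   # (11, 52, 1): only one bit differs
nearby = xor_summary(73.2, 73.3)

print(same, doubled, nearby)
```

Repeated values cost almost nothing, and values that share sign and exponent always start with a run of zeros. Decimal readings such as 73.2 and 73.3 have long binary mantissas, so they compress less dramatically than round values do.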

**Per-engine implementations.**

- **Prometheus / Mimir / Thanos** — Gorilla-based.

- **InfluxDB 3** — Parquet encodings (DELTA_BINARY_PACKED, RLE_DICTIONARY, and so on) plus ZSTD.

- **TimescaleDB** — segment-by column compression plus ZSTD or LZ4.

- **VictoriaMetrics** — custom algorithm plus ZSTD. About 10x more efficient than Prometheus on the same data.

- **ClickHouse** — DoubleDelta plus Gorilla plus LZ4 or ZSTD.

Compression is not merely disk savings; it is **I/O savings**. SSD bandwidth is query performance, and good compression buys more data per I/O.

14. Continuous aggregates vs materialized views

Downsampling and pre-aggregation surface as two distinct patterns.

**Continuous aggregates (TimescaleDB).** Materialized views with **incremental refresh.** A background job aggregates only freshly arrived data and appends to the view. The user just SELECTs.

**Materialized views (ClickHouse).** **INSERT triggers.** When a new row arrives it is transformed or aggregated and INSERTed into another table. There is no automatic incremental refresh — only forward-looking data is captured from the definition point.

**Recording rules (Prometheus).** Periodically (say every minute) evaluate a PromQL expression and store the result as a new series. Useful for pre-computing expensive queries that drive Grafana panels.

**Tasks (InfluxDB 3).** Schedule SQL or InfluxQL to run periodically and write results into another measurement.

**Continuous queries (1.x InfluxDB).** Legacy. Superseded by Tasks from 2.x onward.

**Picking the right one.**

- **Need automatic, correct backfill: TimescaleDB continuous aggregates.** Past-data mutations are auto-reflected.

- **Need aggregation at ingest time: ClickHouse materialized view.** Backfill must be handled separately.

- **Already on Prometheus: stick with recording rules.**

15. OpenTelemetry metrics — the new standard

In 2026 the de facto standard for metric transport is **OpenTelemetry (OTel).**

**OTel Metrics model.**

- **Counter** — monotonically increasing. Think `http_requests_total`.

- **UpDownCounter** — increments and decrements. Think `goroutines_active`.

- **Gauge** — instantaneous values. Think `memory_used_bytes`.

- **Histogram** — distribution. Think `request_duration_seconds`.

- **ExponentialHistogram** — dynamic-bucket histogram, compatible with Prometheus 3.0's native histograms.

**OTLP (OpenTelemetry Protocol).** Carries metrics over gRPC or HTTP/protobuf.

**TSDB ingest.** Nearly every 2026 TSDB ingests OTLP directly.

- **Prometheus 3.0** — built-in OTLP receiver.

- **VictoriaMetrics** — OTLP endpoint.

- **InfluxDB 3** — OTLP compatibility.

- **Grafana Mimir** — OTLP ingest.

- **ClickHouse** — via the OTel Collector.

**Why it matters.** The era where Prometheus exposition format (pull) was the lone standard has given way to OTLP (push) coexisting with Prometheus (pull). Application code uses a single OTel SDK and config decides where data flows.

**Korean and Japanese adoption.** Kakao is migrating internal metrics from Prometheus to OTel. NTT defaults to OTel. Greenfield projects choose OTel by default.

16. TDengine, GreptimeDB, CrateDB — the newcomer camp

Three newcomers worth tracking in 2026.

**TDengine 3.x.** Open-source TSDB from China's Taos Data. Built for IoT. Written in C, lean and fast. The "1 device = 1 table" model — each device has its own table, isolating series. A single node handles 100 million devices. SQL. Clustering is enterprise.

**GreptimeDB 0.10+.** Shipped in 2022, written in Rust, cloud-native. Storage and compute separated, S3 backend, Kubernetes-friendly. Supports InfluxDB Line Protocol, the MySQL and PostgreSQL wire protocols, and PromQL simultaneously. **Stabilized at v0.10 in 2025.** Late mover but a direct competitor to InfluxDB 3.

**CrateDB.** Shipped in 2014 by Crate.io (Austria). A distributed SQL database that also handles time-series. Sharded like Elasticsearch with Lucene indexing. Rich SQL plus full-text search. Popular with some IoT customers.

**When to choose them.**

- **TDengine** — hundreds of millions of IoT devices sharing schema. China market.

- **GreptimeDB** — cloud-native environments on K8s plus S3. Evaluating an InfluxDB 3 alternative.

- **CrateDB** — time-series plus full text plus relational mixed. A SQL-friendly distributed database.

These three target **specific niches** rather than dethroning Prometheus, ClickHouse, InfluxDB, or TimescaleDB.

17. OLAP adjacents — Apache Druid, Pinot, StarRocks

Three OLAP databases that absorb time-series workloads.

**Apache Druid.** Born at Metamarkets in 2011, Apache in 2018. Time-series plus OLAP. Real-time ingest (loves Kafka), columnar storage, bitmap indexes. Netflix, Airbnb, and eBay run it for user-behavior analytics and ad metrics.

**Apache Pinot.** Started at LinkedIn in 2014, Apache in 2019. Similar space to Druid — real-time OLAP. Strong on low-latency (sub-second) queries. LinkedIn, Uber, and Stripe deploy it.

**StarRocks.** Started in 2020 (China). MPP query engine. Mixes analytics and time-series. Competes head-on with ClickHouse. Growing fast in 2026.

**OLAP TSDB vs dedicated TSDB.**

- Dedicated TSDB strengths: **append-heavy, retention, compression.**

- OLAP strengths: **ad-hoc GROUP BY, JOIN, complex analytics.**

**When to choose OLAP.** Time-series plus dimensional analytics (slice-and-dice, drill-down) together. Ad attribution, user-behavior analytics, business intelligence.

**When to stick with TSDB.** Monitoring metrics, IoT sensors, financial ticks — 99% time-axis aggregation is best served by a dedicated TSDB.

18. Log adjacents — VictoriaLogs, Loki, OpenObserve

Metrics and logs sit next to each other, and many tools try to merge them.

**Grafana Loki.** Shipped in 2018 by Grafana Labs. "Prometheus for logs." Indexes labels only (no full-text index). LogQL for queries. S3-backed. The sibling product to Mimir.

**VictoriaLogs.** The logs counterpart to VictoriaMetrics, shipped in 2023. Single binary, excellent compression, LogsQL query language.

**OpenObserve.** Shipped in 2023. An all-in-one observability tool written in Rust — logs, metrics, traces. Parquet plus S3 backend. A late mover challenging Elasticsearch and Splunk.

**Elasticsearch / OpenSearch.** Still the most common log store. Heavy and expensive but unmatched on full-text search.

**ClickHouse for logs.** Uber and Cloudflare also run logs on ClickHouse. Better compression and speed than ELK, weaker full-text search.

**Common combinations.**

- **Grafana stack** = Prometheus/Mimir (metrics) + Loki (logs) + Tempo (traces).

- **VictoriaMetrics stack** = VM (metrics) + VictoriaLogs (logs).

- **OpenObserve alone.** One tool for all three.

- **ELK plus Prometheus.** The classic split.

19. Commercial SaaS — Datadog, New Relic, Splunk, Honeycomb

Many teams skip self-hosted TSDB and pay for SaaS.

**Datadog.** Founded 2010, market cap roughly $40B in 2026. Metrics plus APM plus logs plus RUM plus Security in one pane. World-class UX, infamously expensive. Billed by series or by host.

**New Relic.** A Datadog competitor. Simplified pricing in 2020 (the SKU sprawl was too much before). Gaining share in the Korean and Japanese markets.

**Splunk.** Founded 2003. Log-centric (weaker on metrics). Strong in enterprise security (SIEM). Acquired by Cisco in 2024.

**Honeycomb.** Founded by Charity Majors in 2016. "Observability 2.0" — event-based high-cardinality analytics. More about traces and events than classical metrics. Cult following in the SRE community.

**Lightstep.** Founded by Ben Sigelman, acquired by ServiceNow in 2021. Trace-centric.

**Grafana Cloud.** Grafana Labs' SaaS. Managed Mimir, Loki, and Tempo.

**Selection guide.**

- **Full-stack quickly: Datadog.** Expensive but fast.

- **Metrics only with a self-hosted option: Grafana Cloud.**

- **High-cardinality debugging: Honeycomb.**

- **Enterprise security: Splunk.**

**Cost trap.** Datadog bills go vertical when cardinality explodes. Adding one "custom metric" label can pile on six figures monthly.

20. Hardware — NVMe I/O patterns and memory budgets

TSDB performance is directly tied to hardware decisions.

**NVMe SSD is mandatory.** SATA SSD bandwidth (550MB/s) cannot keep up with ingest. NVMe (3–7 GB/s read, 1–5 GB/s write) is standard. PCIe 5.0 NVMe is mainstream in 2026.

**Time-series I/O patterns.**

- **Sequential writes** — new data is append-only, nearly sequential.

- **Range reads** — time-range queries read contiguous blocks.

- **Few random reads** — point lookups are rare.

That pattern favors sequential-friendly structures like LSM-tree, MergeTree, or Parquet.

**Memory budget per series.** A simplified rule of thumb.

- **Prometheus single node** — 1.5–2 KB per active series. One million series equals 2GB.

- **VictoriaMetrics** — about 400 bytes per series. Roughly 4–5x more efficient by this rule of thumb.

- **TimescaleDB** — varies with chunk_cache settings.

- **ClickHouse** — primary key index plus mark cache. Explicitly tuned.

- **InfluxDB 3** — explicit Arrow in-memory buffer sizing.

**The S3 backend implication.** Mimir, Thanos, and GreptimeDB rest persistence on S3, treating disk as cache only. Nodes scale freely and storage costs roughly 1/10 of SSD. The trade-off is 100ms-plus extra query latency versus direct SSD access.

**ARM CPU effect.** On AWS Graviton or Ampere Altra ARM servers, ClickHouse and VictoriaMetrics deliver 20–30% better price-performance than x86. InfluxDB 3 runs well on ARM thanks to Rust.

21. Korea — NHN Cloud, Kakao, Coupang

Korean usage of TSDBs.

**NHN Cloud.** Offers a managed service called **NHN Cloud Time Series**. Internally built on InfluxDB plus custom extensions. Marketed for game telemetry and IoT.

**Kakao.** KakaoTalk backend metrics ride Prometheus plus Thanos. In 2024 some workloads moved to VictoriaMetrics. Metric series counts exceed 100 million. Internal data platforms also run ClickHouse.

**Naver.** Naver Cloud (NCP) offers a managed Cloud DB for InfluxDB. Internal monitoring defaults to Prometheus plus Grafana. The AI platform team uses ClickHouse for metrics.

**Coupang.** Coupang R&D is a heavy ClickHouse adopter for search, recommendation, and logistics metrics. Prometheus handles infrastructure monitoring.

**Toss.** Payment workload metrics run on Prometheus plus VictoriaMetrics. The team aggressively distributes a cardinality-management guide internally.

**Samsung SDS / LG CNS.** Enterprises lean Splunk and Datadog, but cloud-native workloads are quickly moving to Prometheus plus Grafana.

**Korean learning resources.** Owen Lee's Prometheus Korean guide, the VictoriaMetrics Korean Slack community. The ClickHouse user meetup is driven by Kakao and Daangn Market.

22. Japan — NTT R&D, Rakuten, CyberAgent, Mercari

Japan adopts TSDBs aggressively.

**NTT R&D.** Standardized on Prometheus plus Thanos. Some IoT research projects run InfluxDB. The most aggressive OpenTelemetry adopter among major Japanese companies.

**Rakuten.** The Rakuten Data Platform runs ClickHouse at scale for user-behavior analytics and ad metrics. Datadog handles infrastructure monitoring on top.

**CyberAgent.** AbemaTV metrics run on Prometheus plus Grafana Mimir. Ad attribution lives in BigQuery plus an in-house time-series tool. The team regularly speaks about Mimir at internal conferences.

**Mercari.** Microservices number in the thousands. Prometheus plus Cortex (migrating to Mimir). Datadog APM runs in parallel.

**LINE.** Post-merger LINE Yahoo (2023) operates a Prometheus stack with custom extensions for metrics infrastructure. Hadoop/Hive-based time-series analytics continue to run.

**SoftBank Robotics, Sony.** IoT and robotics shops carry InfluxDB 1.x/2.x legacy and are evaluating the migration to InfluxDB 3.

**Japanese resources.** O'Reilly Japan's "Prometheus 実践ガイド" and Nikkei BP's time-series data processing series are standards. The Grafana Tokyo user group meets quarterly.

23. Migration patterns

Scenarios for moving between TSDBs.

**InfluxDB 2 → InfluxDB 3.** **The query API is not compatible.** The write API stays line-protocol compatible, but the query API and Flux diverge completely. Options:

1. **Big-bang migration** — short downtime window (overnight) for export and import. Only feasible on small datasets.

2. **Dual-write** — write to both for a period, validate, then cut over.

3. **Frozen archive** — pin 2.x as a historical read-only store, send new data only to 3.x.
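
The dual-write option can be sketched roughly as follows. This is a minimal illustration, not a client for any real InfluxDB API: `Sink`, `MemorySink`, `DualWriter`, and `missing_in_shadow` are hypothetical names, and the in-memory sink stands in for whatever write client each system actually uses. The point it tries to show is the asymmetry: a failure on the old system must surface, while a failure on the new one is only counted.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Sink(Protocol):
    """Anything that accepts one line-protocol record (hypothetical interface)."""

    def write(self, line: str) -> None: ...


@dataclass
class MemorySink:
    """Stand-in for a real client; it only records the lines it receives."""

    lines: list[str] = field(default_factory=list)

    def write(self, line: str) -> None:
        self.lines.append(line)


@dataclass
class DualWriter:
    primary: Sink  # old system, still the source of truth for reads
    shadow: Sink   # migration target, validated before cutover
    shadow_errors: int = 0

    def write(self, line: str) -> None:
        # A primary failure propagates to the caller unchanged.
        self.primary.write(line)
        try:
            self.shadow.write(line)
        except Exception:
            # A shadow failure is counted, not raised, so the migration
            # cannot take down production ingest.
            self.shadow_errors += 1


def missing_in_shadow(primary: MemorySink, shadow: MemorySink) -> list[str]:
    """Validation step: lines the old system holds that the new one lacks."""
    seen = set(shadow.lines)
    return [line for line in primary.lines if line not in seen]
```

In practice the validation step would compare query results over a time window rather than raw lines, but an empty `missing_in_shadow` result together with `shadow_errors == 0` is the kind of signal that gates the cutover.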

**Prometheus → Mimir.** Mostly painless. Point Prometheus remote-write at Mimir, run both in parallel for a while, then switch dashboards and alerts to read directly from Mimir. Mimir is PromQL-compatible, so existing queries carry over.

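**Prometheus → VictoriaMetrics.** Similar. Replace Prometheus with vmagent for scraping and persist on vmstorage (the storage component of the cluster version). Recording rules may need small adjustments because MetricsQL differs from PromQL in a few behaviors, so convert and test them before cutover.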

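**Raw PostgreSQL → TimescaleDB.** Run `CREATE EXTENSION timescaledb`, then `create_hypertable()` on the target table. Existing SQL keeps working. Downtime is near zero for an empty or small table. A table that already holds data needs `migrate_data => true`, and that conversion can hold a lock for a long time on large tables, so plan a window for it.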

**Postgres → ClickHouse.** Big SQL syntax delta. Transactions are gone. The usual pattern is a split rather than a full move: transactional data stays on PG, while analytics and time-series migrate to ClickHouse. Use CDC (Debezium) for PG-to-ClickHouse replication.

**OpenTSDB → anything.** OpenTSDB is effectively dead in 2026. HBase operations are too heavy. Most shops migrated to Mimir, VictoriaMetrics, or ClickHouse.

**Migration pitfalls.** Watch for label schema differences (especially UTF-8 handling: Prometheus 3.0 accepts UTF-8 metric and label names that other targets may reject), timestamp precision (ns/μs/ms/s), and retention policy mapping. A dry-run is non-negotiable.
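
The timestamp-precision pitfall is easy to check in a dry-run. The sketch below assumes integer epoch timestamps and uses hypothetical helper names (`convert_timestamp`, `collides_after_downcast`). It shows why moving from a nanosecond source (the line-protocol default) to a millisecond target (what Prometheus stores) is lossy: distinct points can land on the same timestamp.

```python
# Multipliers to nanoseconds for each supported precision.
_NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def convert_timestamp(value: int, src: str, dst: str) -> int:
    """Convert an integer epoch timestamp between precisions.

    Converting to a coarser unit truncates, so information is lost.
    """
    ns = value * _NS_PER_UNIT[src]
    return ns // _NS_PER_UNIT[dst]


def collides_after_downcast(timestamps_ns: list[int], dst: str) -> bool:
    """True if two distinct ns timestamps map to the same value in `dst`."""
    converted = {convert_timestamp(t, "ns", dst) for t in timestamps_ns}
    return len(converted) < len(set(timestamps_ns))
```

Running `collides_after_downcast` over a sample of each series before the real migration indicates whether the target precision would merge points that the source kept separate.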

24. Decision matrix

A by-scenario selection guide.

**Case A — single team, metrics monitoring, K8s.**

- Under 10M series: **Prometheus 3.0 single node.**

- 10M–100M series: **VictoriaMetrics** or **Prometheus + Thanos.**

- Over 100M series: **Grafana Mimir** or **VictoriaMetrics cluster.**

**Case B — IoT sensor data.**

- Hundreds of thousands of devices, edge computing required: **InfluxDB 3 Core (Edge).**

- Hundreds of millions of devices, China market: **TDengine.**

- Devices plus full-text search: **CrateDB.**

**Case C — financial ticks, single-node ingest optimization.**

- One million-plus rows per second on one node: **QuestDB.**

- Distributed plus rich analytical queries: **ClickHouse.**

**Case D — business KPIs plus analytics.**

- Postgres-friendly shop: **TimescaleDB.**

- Overwhelming analytics: **ClickHouse.**

- Real-time OLAP: **Apache Druid** or **StarRocks.**

**Case E — full-stack observability (metrics + logs + traces).**

- Prefer the Grafana stack: **Mimir + Loki + Tempo.**

- Prefer the VictoriaMetrics stack: **VM + VictoriaLogs.**

- Single tool: **OpenObserve.**

- SaaS speed: **Datadog.**

**Case F — AI/ML workload metrics.**

- Model inference latency, GPU utilization: **Prometheus + Grafana + OpenTelemetry.**

- Training metrics: Weights & Biases / MLflow plus Prometheus.

**Default starting point.** When in doubt, start with **Prometheus plus Grafana** and scale out to VictoriaMetrics or Mimir when cardinality explodes.
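
As a rough illustration, the Case A thresholds can be written down as a small lookup. The function name `pick_metrics_backend` is made up for this sketch, and the boundaries are simply the ones listed under Case A, not hard limits of any product.

```python
def pick_metrics_backend(active_series: int) -> str:
    """Map an active-series count to the Case A recommendation."""
    million = 1_000_000
    if active_series < 10 * million:
        return "Prometheus 3.0 single node"
    if active_series <= 100 * million:
        return "VictoriaMetrics or Prometheus + Thanos"
    return "Grafana Mimir or VictoriaMetrics cluster"
```

Feeding it the current value of `prometheus_tsdb_head_series` gives a quick sanity check on whether the present setup is still in the right bracket.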

25. Ten anti-patterns

The TSDB mistakes you see most often.

1. **user_id as a label.** Number-one cause of cardinality explosions.

2. **Using Prometheus as long-term storage.** Past 30 days the disk dies. Send to Mimir, Thanos, or VictoriaMetrics.

3. **Raw time-series in PostgreSQL.** Without TimescaleDB the cost is 10x.

4. **Mixing OLTP and TSDB workloads.** Time-series load kills OLTP performance.

5. **Timestamps or UUIDs as labels.** Every row becomes a new series.

6. **No backups.** Prometheus stores everything on a single node's disk. Lose the node, lose everything.

7. **Storing raw data with no downsampling.** Disk explodes.

8. **No cardinality monitoring.** Always track `prometheus_tsdb_head_series`.

9. **Starting a new project on Flux or Influx 1.x.** Go straight to 3.x.

10. **Adopting Datadog without a cardinality cap.** "Custom metric" growth balloons the invoice. Build label-policy guardrails.
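
Several of these anti-patterns (1, 5, 8, 10) come down to label cardinality, and a label-policy guardrail can be sketched in a few lines. The following assumes a hypothetical deny-list (`FORBIDDEN_LABELS`) and a per-metric series budget. Neither comes from any specific tool, and the worst-case estimate is just the product of distinct values per label.

```python
from math import prod

# Hypothetical policy: labels that are unbounded by nature.
FORBIDDEN_LABELS = {"user_id", "request_id", "timestamp", "uuid"}


def worst_case_series(label_values: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_values.values())


def policy_violations(label_values: dict[str, int], max_series: int) -> list[str]:
    """Return human-readable problems; an empty list means the metric passes."""
    problems = [
        f"forbidden label: {name}"
        for name in sorted(label_values)
        if name in FORBIDDEN_LABELS
    ]
    estimate = worst_case_series(label_values)
    if estimate > max_series:
        problems.append(f"worst-case series {estimate} exceeds budget {max_series}")
    return problems
```

A check like this fits in CI or code review for new metrics: five labels with 100 values each already give a worst case of 10^10 series, so the estimate catches the problem before it reaches the TSDB or the invoice.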

26. References

- [InfluxDB 3 — official docs](https://docs.influxdata.com/influxdb3/)

- [InfluxDB GitHub — influxdb_iox (Rust core)](https://github.com/influxdata/influxdb)

- [Apache Arrow — official](https://arrow.apache.org/)

- [Apache DataFusion — official](https://datafusion.apache.org/)

- [Apache Parquet — official](https://parquet.apache.org/)

- [TimescaleDB / TigerData — official](https://www.tigerdata.com/)

- [TimescaleDB GitHub](https://github.com/timescale/timescaledb)

- [QuestDB — official](https://questdb.io/)

- [QuestDB GitHub](https://github.com/questdb/questdb)

- [ClickHouse — official docs](https://clickhouse.com/docs)

- [ClickHouse GitHub](https://github.com/ClickHouse/ClickHouse)

- [Prometheus — official](https://prometheus.io/)

- [Prometheus 3.0 — release notes](https://prometheus.io/blog/2024/11/14/prometheus-3-0/)

- [VictoriaMetrics — official](https://victoriametrics.com/)

- [VictoriaMetrics GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics)

- [Grafana Mimir — official](https://grafana.com/oss/mimir/)

- [Grafana Mimir GitHub](https://github.com/grafana/mimir)

- [Cortex — official](https://cortexmetrics.io/)

- [Thanos — official](https://thanos.io/)

- [TDengine — official](https://tdengine.com/)

- [GreptimeDB — official](https://greptime.com/)

- [CrateDB — official](https://crate.io/)

- [Apache Druid — official](https://druid.apache.org/)

- [Apache Pinot — official](https://pinot.apache.org/)

- [StarRocks — official](https://www.starrocks.io/)

- [Grafana Loki — official](https://grafana.com/oss/loki/)

- [VictoriaLogs — official docs](https://docs.victoriametrics.com/victorialogs/)

- [OpenObserve — official](https://openobserve.ai/)

- [Datadog — official](https://www.datadoghq.com/)

- [New Relic — official](https://newrelic.com/)

- [Splunk — official](https://www.splunk.com/)

- [Honeycomb — official](https://www.honeycomb.io/)

- [OpenTelemetry — official](https://opentelemetry.io/)

- [OpenTelemetry Metrics spec](https://opentelemetry.io/docs/specs/otel/metrics/)

- [Gorilla paper — Facebook 2015](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf)

- [Awesome Time-Series Databases — GitHub](https://github.com/xephonhq/awesome-time-series-database)

- [NHN Cloud Time Series — official](https://www.nhncloud.com/kr/service/database/time-series)

- [Naver Cloud — Cloud DB for InfluxDB](https://www.ncloud.com/product/database/cloudDbForInfluxdb)

— End of Time Series Databases 2026.