Next-Gen Search Engines 2026 — Meilisearch vs Typesense vs Elasticsearch vs OpenSearch vs Quickwit vs Vespa Deep Dive (The Landscape After Elastic)

Prologue — Elasticsearch Is Still a Giant, But the Landscape Has Scattered

In 2018, the answer to "what search engine should we use?" was essentially one word: Elasticsearch. ELK for logs, Elasticsearch for site search, and (somehow) Elasticsearch for recommendations too.

In 2026, the same question splits into at least five branches.

  • "For in-app search, Meilisearch or Typesense. We don't bother with ES."
  • "We moved log search to Quickwit. Just leave it on S3 — the cost is insane."
  • "We went to OpenSearch because of Elastic's license. On AWS it's natural."
  • "Recommendation, RAG, real-time AI serving — Vespa. Nothing else competes."
  • "Still Elasticsearch. The AGPL v3 option is fine and ESQL is good."

This post compares the search engines honestly as of May 2026. The strengths and weaknesses of each, the reality of hybrid search (full-text + vector + reranker) that has entered every engine, and a decision framework for "when to pick what."

One-line spoiler: The "full-text vs vector" dichotomy is dead. The 2026 answer is almost always hybrid. The question is who makes hybrid the smoothest.


1. The Landscape — Seven Engines Plus Some

Let's draw the map first, so categories don't collapse.

| Category | Engine | One line |
|---|---|---|
| in-app / site search (simple) | Meilisearch, Typesense | Rust/C++, lightweight, self-host friendly |
| general-purpose search and analytics | Elasticsearch, OpenSearch | the giant plus the AWS fork |
| log / observability search | Quickwit, Loki, Elasticsearch | S3-based is the new shape |
| AI-serving search | Vespa | real-time ranking, tensors, ML are first-class |
| SaaS site search | Algolia | hosted-only, very fast |
| ultra-lightweight | Sonic | autocomplete-tier, Rust |
| search + analytics blur | Apache Doris, StarRocks, ClickHouse | OLAP encroaches on search |
| pure vector DBs | Pinecone, Weaviate, Qdrant, Milvus, LanceDB | covered separately |

Most people don't evaluate all of these at once. With three scenarios — in-app search, log search, RAG search — the candidate list narrows naturally.

  • In-app search — site, product, document search. Meilisearch, Typesense, Algolia, Elasticsearch, OpenSearch.
  • Log / observability — traces, metrics, logs. Quickwit, Elasticsearch, OpenSearch, Loki.
  • RAG / real-time AI — embeddings + full-text + reranker. Vespa, Elasticsearch, OpenSearch, Weaviate, Qdrant, pgvector, plus Meilisearch hybrid.

2. Elasticsearch — The Giant Is Alive

Elasticsearch is not dead. In 2024 and 2025 it fired back on several fronts.

2.1 License — From SSPL / Elastic License v2 to AGPL v3

In late 2024, Elastic re-opened Elasticsearch and Kibana under AGPL v3 (alongside Elastic License v2). That decision was a clear signal aimed at the OpenSearch camp, and it revived energy in OSS-friendly clouds — Bonsai, Elastic Cloud, and self-hosting. It does not mean AWS managed Elasticsearch comes back — AWS has fully committed to OpenSearch.

The licensing essentials.

  • AGPL v3 — real OSS. But if you host it as a service, you must publish modifications.
  • Elastic License v2 — free unless you resell it as a managed SaaS.
  • Either/or — users pick one of the two.

2.2 ESQL — A New SQL-Like Query Language

ESQL (the Elasticsearch Query Language) shipped in Elasticsearch 8.11, first as a preview and then GA in later 8.x releases. Instead of writing the JSON query DSL directly, you write a pipe-based, SQL-ish syntax that does the same job.

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour AND http.status >= 500
| STATS cnt = COUNT(*) BY service.name
| SORT cnt DESC
| LIMIT 10

ESQL's meaning is simple: for logs and metrics, Elasticsearch moves closer to OLAP — a surface that can compete with Apache Doris and ClickHouse. For stats and aggregations, ESQL takes an execution path tuned for that work rather than leaning on the inverted index.

2.3 Vector Search and Hybrid as First-Class

Since the later 8.x releases, dense_vector fields with HNSW indexes, plus BM25 + kNN hybrid search, have been standard features. Learning-to-rank and ELSER (Elastic's pretrained sparse-embedding model) are available too, making ES a respectable option at the center of a RAG pipeline without a separate vector DB.
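Wiring the two legs together looks roughly like the request body below — a sketch assuming the `retriever`/`rrf` syntax introduced around ES 8.14; the index fields (`description`, `embedding`) and the 384-dim vector are hypothetical placeholders:

```javascript
// Sketch of an Elasticsearch hybrid request body: a BM25 leg and a kNN leg,
// merged server-side with RRF. Assumes the `retriever` syntax from ES 8.14+;
// field names and vector size are hypothetical.
const queryVector = Array(384).fill(0.01); // stand-in for a real embedding

const hybridBody = {
  retriever: {
    rrf: {
      retrievers: [
        // Lexical leg: plain BM25 full-text match
        { standard: { query: { match: { description: 'blue runners' } } } },
        // Semantic leg: approximate kNN over an HNSW-indexed dense_vector field
        { knn: { field: 'embedding', query_vector: queryVector, k: 50, num_candidates: 500 } },
      ],
      rank_constant: 60, // the k in the RRF formula
    },
  },
  size: 20,
};

// With the official client this would be sent as:
//   const res = await esClient.search({ index: 'products', ...hybridBody });
console.log(JSON.stringify(hybridBody, null, 2));
```

The point of the `retriever` shape is that the fusion happens inside the cluster, so the application never sees two separate result lists.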

2.4 Elasticsearch Strengths and Weaknesses

  • Strengths: works for every scenario, rich ecosystem (Kibana, Logstash, Beats, Fleet, APM), 8.x hybrid and ESQL.
  • Weaknesses: operational complexity (JVM, shards, replicas, node tuning), cost, cold-data spend. Keeping all log data on hot tier gets scary fast.

3. OpenSearch — Politics and Engineering of the AWS Fork

OpenSearch, forked by AWS in 2021 in reaction to the SSPL change, is no longer just a "fork" five years on — as of 2026 it's a project in its own right.

  • Governance — In September 2024, AWS moved OpenSearch to the OpenSearch Software Foundation under the Linux Foundation. SAP, Uber, Aiven, Atlassian joined as founding members. The change matters: it breaks the "AWS-only" perception and creates multi-vendor governance.
  • Compatibility — OpenSearch split from Elasticsearch 7.10. Since then API compatibility is no longer 1:1. At the client-library level, much still works on both, though.
  • Vector — the opensearch-knn plugin: HNSW and IVF indexes via k-NN search. Faiss and Lucene back-ends are selectable.
  • OpenSearch Dashboards — Kibana fork. Feature pace lags Kibana but is catching up.

Two common reasons to pick OpenSearch.

  1. You live on AWS and want managed search — Amazon OpenSearch Service is the natural pick.
  2. You need to avoid Elastic License v2's SaaS resale restriction — managed-search vendors whose product itself exposes a search engine.

For most in-app and log scenarios, ES and OpenSearch are roughly feature-equivalent. New ES features like ESQL arrive in OpenSearch in a different shape (PPL — Piped Processing Language), so once you've written code against one query language, switching costs are real.


4. Meilisearch — Minimalism in Rust

Meilisearch is a Rust-written search engine specialized for in-app search. As of May 2026 it's on the v1.13+ line. One line: "Algolia-like experience, self-hosted."

4.1 Why People Love It

  • Single binary — a single static Rust binary. No JVM, no dependencies.
  • Near-zero-config start — run meilisearch, POST JSON documents, search immediately.
  • Typo tolerance, prefix, highlighting, facets, synonyms — almost everything an in-app search needs is built in.
  • Smooth JS, React, Vue SDKs — with tools like InstantMeiliSearch, you can wire UI in five minutes.

4.2 The 2026 Additions — Vector and Hybrid

Meilisearch introduced dense vector search in v1.6, and through 2025 embeddings, rerankers, and hybrid stabilized.

// Meilisearch 1.x — hybrid search
const results = await client.index('products').search('blue runners', {
  hybrid: {
    embedder: 'openai',
    semanticRatio: 0.7,
  },
  limit: 20,
})

semanticRatio: 0.7 is "70 percent vector, 30 percent BM25." Zero is full-text only, one is vector only.
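Under the hood the knob is easiest to picture as a normalized weighted sum. A minimal sketch of the idea — not Meilisearch's actual implementation, just the arithmetic the ratio implies:

```javascript
// Weighted-sum view of a semanticRatio-style hybrid score. Assumes both
// scores are already normalized to 0..1; real engines also handle score
// calibration, which this sketch skips.
function hybridScore(lexicalScore, semanticScore, semanticRatio) {
  return (1 - semanticRatio) * lexicalScore + semanticRatio * semanticScore;
}

console.log(hybridScore(0.9, 0.2, 0));   // 0.9 — full-text only
console.log(hybridScore(0.9, 0.2, 1));   // 0.2 — vector only
console.log(hybridScore(0.9, 0.2, 0.7)); // ≈ 0.41 — 70% semantic, 30% lexical
```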

4.3 Meilisearch Strengths and Weaknesses

  • Strengths: simplest operations, fastest start, almost everything for in-app search is built in, edge and self-host friendly.
  • Weaknesses: distributed search and sharding are less mature than ES (multi-node clustering remains a back-burner concern), it's unsuited to log or OLAP scenarios, and very large single indexes (hundreds of millions of documents and up) are more stable on ES.

When to pick it — product catalogs, document search, blog search, small-to-medium in-app search. Almost every case of replacing Algolia with self-hosting.


5. Typesense — Another In-App Powerhouse in C++

Typesense is a search engine written in C++. It targets the same scenarios as Meilisearch with a slightly different philosophy.

5.1 How It Differs From Meilisearch

  • Language — Meilisearch in Rust, Typesense in C++. Both single-binary. Performance is comparable.
  • Distribution — Typesense supports Raft-based distributed clustering from the start. Multi-node HA was first-class from v0.x.
  • Multi-tenancy — many collections per cluster, and scoping each API key to a search scope per collection is smooth. Fits SaaS in-app search well.
  • Visualization — Typesense Dashboard isn't Kibana-tier, but the tool itself is simple, so a heavy dashboard isn't needed.

5.2 The 2026 Additions — Conversational, Vector, Hybrid

Typesense added vector search, hybrid, and natural-language search from v0.25+. "Conversational search" — an endpoint that answers natural-language questions directly via LLMs — went GA in v28.x.

const result = await client.collections('products').documents().search({
  q: 'red running shoes under 100',
  query_by: 'name,description,embedding',
  vector_query: 'embedding:([], k: 50, alpha: 0.4)',
})

alpha: 0.4 is the same idea as Meilisearch's semanticRatio — the weight between full-text and vector.

5.3 Typesense Strengths and Weaknesses

  • Strengths: first-class distributed clustering, multi-tenant friendly, very fast response time, well-organized SDKs.
  • Weaknesses: community a little smaller than Meilisearch (not small in absolute terms, just relatively), some natural-language and typo-heavy scenarios feel slightly smoother on Meilisearch.

When to pick it — multi-tenant in-app search for SaaS, in-app search that needs distribution from day one.


6. Quickwit — S3-Native Log Search, and the Datadog Acquisition

Quickwit is the most recent big event in the search-engine landscape. A Rust-written log-specialized search engine, with object storage (S3, GCS, Azure Blob) as primary storage — which is the decisive difference from ES.

6.1 Why It Matters — "Just Leave It On S3, The Cost Is Insane"

Anyone who has run ES on logs knows the pain. Keeping time-series on hot tier means endless SSDs, and frozen-tier, snapshot, and ILM settings eat half your ops time.

Quickwit's model is the opposite.

  • Index files (splits) stay on S3, and only the parts needed for a query are fetched.
  • Compute and storage are fully separated. Search nodes are stateless. Only indexing nodes keep some state.
  • The unit cost is S3 pricing — i.e., a single-digit fraction of EBS/SSD.
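Back-of-envelope arithmetic shows why. The per-GB prices below are rough us-east-1 list-price assumptions for illustration, not quotes:

```javascript
// Rough monthly storage cost for 10 TB of retained logs. Assumed prices:
// S3 Standard ~$0.023/GB-month, gp3 EBS ~$0.08/GB-month (illustrative only).
const gb = 10 * 1024;
const s3Monthly  = gb * 0.023;     // Quickwit: one copy on object storage
const ebsMonthly = gb * 0.08 * 2;  // ES hot tier: primary + 1 replica on SSD

console.log(`S3 ~$${s3Monthly.toFixed(0)}/mo vs EBS ~$${ebsMonthly.toFixed(0)}/mo`);
console.log(`ratio ~${(ebsMonthly / s3Monthly).toFixed(1)}x`); // ~7.0x
```

With replication and hot-tier headroom factored in, the real-world gap is often larger than this storage-only view.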

6.2 The 2024 Datadog Acquisition — What Happened Next

In October 2024 Datadog acquired Quickwit. The market's biggest question post-acquisition was the OSS fate. The 2025–2026 answer: "Maintained, but core development moves deeper into Datadog's internal log and trace systems."

  • The OSS itself — stays under Apache 2.0. The GitHub repo is still actively updated.
  • The commercial cloud — Quickwit Cloud closed to new signups. Instead, Datadog Managed Logs runs on Quickwit internally.
  • Risk for new adopters — post-acquisition, the OSS lost its "official managed" option. If you're confident self-hosting, it's still the best choice. For "I just want it managed without thinking" companies, this is a warning signal.

6.3 Quickwit's Data Model

Two differences from ES.

  1. Append-only, time-series first — updates are limited. Optimized for logs and traces.
  2. Schema-less is first-class — accepts JSON straight and indexes it. Dynamic field mapping is smoother than ES.

Queries support an Elasticsearch DSL-like shape, and OpenTelemetry, Jaeger, and Grafana integrations are first-class.

6.4 Quickwit Strengths and Weaknesses

  • Strengths: overwhelming cost efficiency via S3-native, optimized for log and trace scenarios, OpenTelemetry / Grafana integration, stateless search nodes.
  • Weaknesses: unsuited for in-app search (weak updates), the OSS managed option vanished after the Datadog acquisition, some features (vector, parts of aggregation) are less rich than ES.

When to pick it — log, trace, observability search, putting cold data on S3 to keep the bill sane.


7. Vespa — The AI-Serving Search Engine

Vespa sits in an interesting place. It started as Yahoo's internal search and recommendation infrastructure, was open-sourced in 2017, and spun out as the independent company Vespa.ai in 2023. As of 2026 it's on v8.x.

Spotify and Pinterest run search and recommendation on Vespa, and Wikipedia has experimented with it for recommendations. What makes it special?

7.1 The Decisive Differences From Other Engines

  • Tensors are first-class — embeddings and multi-dimensional tensors live directly in index, storage, and queries. Vector search is the core, not an add-on.
  • Ranking pipeline is first-class — first-phase, second-phase, and global-phase ranking are structural. ONNX and TensorFlow models run directly inside the ranking step.
  • Real-time write + serving — indexing and serving share the same nodes. Real-time updates are smooth.

That's why it dominates RAG, recommendation, and real-time AI serving.

7.2 ColBERTv2 — The Reranker Vespa Handles Most Smoothly

ColBERTv2 is the efficient middle ground between dense retrieval and a cross-encoder reranker. It computes late interaction with per-token embeddings, and Vespa makes it first-class — ColBERT tensors are stored directly in the index, and the ranking step does efficient MaxSim computation.
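The MaxSim step itself is small enough to show directly. A toy sketch with 2-d token embeddings (real ColBERT token embeddings are higher-dimensional, and Vespa evaluates this inside the ranking expression rather than in application code):

```javascript
// MaxSim as used in ColBERT-style late interaction: for each query-token
// embedding, take the max dot product over all document-token embeddings,
// then sum over the query tokens. Toy 2-d vectors for illustration.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function maxSim(queryTokens, docTokens) {
  return queryTokens.reduce(
    (total, q) => total + Math.max(...docTokens.map((d) => dot(q, d))),
    0
  );
}

const query = [[1, 0], [0, 1]];          // two query-token embeddings
const docA  = [[0.9, 0.1], [0.2, 0.8]];  // tokens that align with the query
const docB  = [[0.5, 0.5], [0.4, 0.4]];  // diffuse, weaker matches

console.log(maxSim(query, docA)); // ≈ 1.7 (0.9 + 0.8)
console.log(maxSim(query, docB)); // 1.0 (0.5 + 0.5)
```

Because each document's token tensors are precomputed and stored, this costs far less at query time than running a full cross-encoder over every candidate.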

7.3 Vespa Strengths and Weaknesses

  • Strengths: real AI-serving search, tensors and embeddings and models are first-class, real-time updates, proven at very large scale.
  • Weaknesses: steep learning curve, operational complexity bigger than ES, overkill for small in-app search, ops burden outside Vespa Cloud is real.

When to pick it — the core of a recommendation system, the actual core of RAG, places where both retrieval and ranking are ML-driven.


8. Algolia, Sonic, Apache Doris, StarRocks — The Rest

8.1 Algolia — SaaS Only

Algolia has no self-hosted option. SaaS only. That's the strength and the weakness.

  • Strengths: very fast response (global edge CDN), zero ops burden, the most mature InstantSearch UI kit.
  • Weaknesses: cost grows fast (per-record, per-query pricing), data-sovereignty constraints, no self-host.

When to pick it — low-traffic in-app search that ships fast, site or blog search. Once traffic or record count grows, migrating to Meilisearch or Typesense is a common pattern.

8.2 Sonic — Extremely Lightweight Autocomplete

Sonic is a Rust-written extremely lightweight search engine. Specialized for autocomplete / suggest. SQLite-like index file, nearly zero memory footprint.

  • Strengths: very lightweight, fast to start.
  • Weaknesses: simplistic full-text search, limited facets and highlighting, no vector.

When to pick it — search is essentially autocomplete and a different tool handles main search. In 2026, Meilisearch and Typesense handle most autocomplete well too, so Sonic's space has narrowed.

8.3 Apache Doris, StarRocks, ClickHouse — OLAP Encroaches on Search

Columnar OLAP engines like Doris, StarRocks, and ClickHouse have started supporting inverted indexes and text search, opening a new scenario: "search and analytics on the same data." It's especially attractive for log and event data, where analysis and search hit the same dataset.

  • Strengths: one system for analytics and search.
  • Weaknesses: the smoothness of full-text search (ranking, synonyms, facets) is not as mature as ES.

When to pick it — analytics is primary and search is secondary. Running main site search on this is still a stretch.


9. Hybrid Search — The 2026 Default

The "full-text vs vector DB" split is over. In 2026, search is almost always hybrid. Here's the structure.

9.1 The Three Layers of Hybrid

1. retrieval (first pass)
   - BM25 (full-text, lexical match)
   - Dense vector (semantic, embedding ANN)
   - Merge the two results via RRF or a weighted sum

2. reranking (second pass)
   - A cross-encoder or ColBERT-style model
   - Re-rank only the top 50 to 200 results
   - Expensive but a big quality lift

3. business logic (third pass)
   - Popularity, recency, personalization, business rules

How each engine implements these three layers differs.

9.2 RRF (Reciprocal Rank Fusion) — The Standard Merger for Hybrid

How do you merge two ranked lists (BM25 results and vector results)? The most common method is RRF.

RRF_score(d) = sum( 1 / (k + rank_i(d)) )  for each ranker i

k is usually 60. Simple but very robust. ES, OpenSearch, Vespa, Meilisearch, and Typesense all support RRF or a variant.
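The formula fits in a few lines. A sketch of RRF over a BM25 list and a vector list, with the conventional k = 60:

```javascript
// Reciprocal Rank Fusion over any number of ranked lists of document IDs.
// Each list contributes 1 / (k + rank) per document; k = 60 is the usual
// smoothing constant. Higher fused score = better.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, idx) => {
      const rank = idx + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

const bm25Results   = ['doc1', 'doc2', 'doc3'];
const vectorResults = ['doc3', 'doc1', 'doc4'];

// doc1 ranks high in both lists, so it wins the fused ranking.
console.log(rrfFuse([bm25Results, vectorResults])); // ['doc1', 'doc3', 'doc2', 'doc4']
```

Note that RRF only looks at ranks, never raw scores — which is exactly why it is robust to BM25 and cosine scores living on incompatible scales.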

9.3 Rerankers — Cohere Rerank, Voyage, ColBERT

  • Cohere Rerank — a Cohere-trained cross-encoder. API call. Priced per query. The most common pick.
  • Voyage AI rerankers — the voyage-rerank series. Direct competitor to Cohere. Better in some domains.
  • Vespa's ColBERTv2 — late-interaction. The model is embedded in the index, so the reranker doesn't depend on an external API.
  • Open-source rerankers — bge-reranker, mxbai-rerank, jina-rerank. Self-hosted.

Reranking usually runs only on the top 50 to 200 retrieved results. Anything more and cost explodes.
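In outline, the second pass is just "slice the head, rescore it, reattach the tail." The helper below is hypothetical, and `scoreFn` stands in for a real cross-encoder call (Cohere Rerank over HTTP, a self-hosted bge-reranker, and so on):

```javascript
// Retrieve wide, rerank narrow: rescore only the top-K retrieved candidates
// with an expensive model, leave the rest in their original order.
// `scoreFn` is a placeholder for a real cross-encoder call.
async function rerankTopK(candidates, scoreFn, topK = 100) {
  const head = candidates.slice(0, topK); // only the head is rescored
  const scored = await Promise.all(
    head.map(async (doc) => ({ doc, score: await scoreFn(doc) }))
  );
  scored.sort((a, b) => b.score - a.score);
  // Reranked head first, untouched tail after it.
  return [...scored.map((s) => s.doc), ...candidates.slice(topK)];
}

// Toy scorer: pretend longer texts are more relevant.
const fakeScorer = async (doc) => doc.text.length;

rerankTopK(
  [{ text: 'a' }, { text: 'long relevant passage' }, { text: 'mid text' }],
  fakeScorer,
  2
).then((r) => console.log(r.map((d) => d.text)));
```

The `topK` cutoff is the whole cost model: reranker pricing and latency scale with K, so the first-pass retriever's job is to make sure the right answers are inside that window.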

9.4 Hybrid Support — Side-by-Side

| Engine | BM25 | dense vector | RRF | external reranker | ColBERT / late-interaction |
|---|---|---|---|---|---|
| Elasticsearch | yes | yes (HNSW) | yes | call yourself | limited |
| OpenSearch | yes | yes (k-NN) | yes | call yourself | limited |
| Vespa | yes | yes (HNSW + more) | yes | in-index possible | first-class |
| Meilisearch | yes | yes | semanticRatio | embedder integration | none |
| Typesense | yes | yes | alpha | call yourself | none |
| Quickwit | yes | limited | - | - | none |
| Weaviate | limited | yes | hybrid query | module | none |
| Qdrant | sparse | yes | RRF | external | limited |

10. Full-Text vs Vector DB vs Both — A Decision Framework

10.1 Full-Text Alone Is Enough When…

  • Product catalog — SKUs, names, tags need precise match. Low need for semantic search.
  • Logs and traces — exact keyword and identifier match is the core.
  • Developer-tool search — code, identifiers, exact match.

→ Meilisearch, Typesense, Quickwit, Elasticsearch, or OpenSearch, depending on the scenario.

10.2 Vector Alone Is Enough When…

Almost never. Pure vector search is weak for short queries, exact matches, and identifier search. An environment with keyword search beside it is almost always better.

Exception — image, audio, video similarity search where no text index exists. Then Pinecone, Qdrant, Milvus, Weaviate, or LanceDB are natural.

10.3 Hybrid Is the Answer (Most of the Time)

  • Semantic expansion of site search — "blue runners" → running shoes.
  • RAG retrieval — pulling context documents for a user question.
  • Internal docs and knowledge-base search — many synonyms, fluid keywords.
  • The candidate-generation stage of a recommender.

→ Vespa, Elasticsearch, OpenSearch, Meilisearch hybrid, or Typesense hybrid, balancing scenario vs ops overhead.


11. Capability Matrix — At a Glance

| Capability | Elasticsearch | OpenSearch | Meilisearch | Typesense | Quickwit | Vespa | Algolia |
|---|---|---|---|---|---|---|---|
| Full-text (BM25-ish) | excellent | excellent | good | good | good | good | excellent |
| Vector search | good (HNSW) | good (k-NN) | good | good | limited | excellent | side feature |
| Hybrid search | yes (RRF) | yes (RRF) | yes (ratio) | yes (alpha) | limited | very smooth | yes |
| Reranker integration | external call | external call | embedder | external call | - | in-index | partial |
| Distributed | excellent | excellent | fair | good | good (S3) | excellent | SaaS only |
| Logs / time-series | good | good | unsuited | unsuited | excellent | unsuited | unsuited |
| in-app / site search | good | good | excellent | excellent | unsuited | good (overkill) | excellent |
| RAG retrieval | good | good | fair to good | fair to good | unsuited | excellent | fair |
| Ops complexity | high | high | very low | low | medium | very high | SaaS only |
| Cost | expensive (hot tier) | expensive | cheap | cheap | very cheap (S3) | expensive | expensive at scale |
| License | AGPL + ELv2 | Apache 2.0 | MIT | GPL v3 | Apache 2.0 | Apache 2.0 | proprietary |

12. Scenario Recommendations — In-App, Logs, RAG

Three scenarios, each as a tiny decision tree.

12.1 In-App Search (Site, Product, Document)

- Low traffic, ship fast: Algolia
- Self-host, simple: Meilisearch
- Self-host, multi-tenant: Typesense
- ES/OS already in place, ops capable: Elasticsearch / OpenSearch
- Search needs ML ranking: Vespa (accept the ops cost)

12.2 Logs and Observability

- Cost first, large data: Quickwit (self-hosted)
- Managed, AWS: Amazon OpenSearch Service
- Managed, multi-cloud: Elastic Cloud, Grafana Cloud Loki
- Already on Datadog: Datadog (Quickwit underneath)

12.3 RAG and Real-Time AI Serving

- Only first-stage retrieval needed: Meilisearch/Typesense hybrid + external reranker
- Full ranking pipeline needed: Vespa
- ES/OS already in place, ops capable: ES/OS dense_vector + reranker
- Image/audio-first: Pinecone, Qdrant, Milvus, Weaviate
- Postgres is the source of truth: pgvector + external reranker
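The three trees above fold into one toy helper. The inputs are deliberately oversimplified, and the answers just restate this post's recommendations:

```javascript
// Toy decision helper mirroring the three scenario trees in this section.
// Real evaluations weigh many more inputs; this only encodes the headlines.
function pickEngine({ scenario, managed = false, onAws = false,
                      multiTenant = false, needsMlRanking = false }) {
  switch (scenario) {
    case 'in-app':
      if (managed) return 'Algolia';
      if (needsMlRanking) return 'Vespa';
      return multiTenant ? 'Typesense' : 'Meilisearch';
    case 'logs':
      if (managed) return onAws ? 'Amazon OpenSearch Service' : 'Elastic Cloud / Grafana Loki';
      return 'Quickwit';
    case 'rag':
      return needsMlRanking ? 'Vespa' : 'Meilisearch/Typesense hybrid + reranker';
    default:
      throw new Error(`unknown scenario: ${scenario}`);
  }
}

console.log(pickEngine({ scenario: 'in-app' }));                           // Meilisearch
console.log(pickEngine({ scenario: 'logs', managed: true, onAws: true })); // Amazon OpenSearch Service
```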

13. Real Cases — Who Uses What

  • Spotify — Vespa at the core of search and recommendation. Ranking for songs, artists, playlists.
  • Wikipedia — Elasticsearch-based search (CirrusSearch); some recommendation experimented on Vespa.
  • Pinterest — Vespa for recommendation, Elasticsearch for some search.
  • Hacker News search — Algolia (a long-running partnership).
  • Datadog log search — Quickwit (absorbed after acquisition).
  • GitHub Code Search — Blackbird (in-house engine). Migrated off ES in 2023–2024.
  • GitLab search — Elasticsearch.
  • Notion search — in-house plus Elasticsearch plus vector DB hybrid.
  • Shopify — Elasticsearch-based product search plus in-house ML ranking.
  • Small SaaS — Algolia, Meilisearch, or Typesense.

Patterns visible in this distribution.

  • Large-scale recommendation and ranking — Vespa or in-house.
  • Large-scale full-text and logs — Elasticsearch.
  • Log-only, cost-optimized — Quickwit / Datadog.
  • In-app search — Algolia, Meilisearch, Typesense.
  • Already on AWS — OpenSearch.

14. Operating Cost and Complexity — Where the Real Cost Lives

The variable most often ignored in engine choice is operating cost. Not the license sticker, but human time + infrastructure + data movement.

14.1 Infrastructure Cost

Order-of-magnitude differences.

  • Quickwit (S3-based) — 1x baseline.
  • Meilisearch, Typesense (single node) — similar, depending on data size.
  • Elasticsearch, OpenSearch (hot-tier SSD) — 5–10x for large logs.
  • Algolia (per-record) — cheap when small, very expensive at scale.
  • Vespa Cloud — mid to high.

14.2 Operating Time Cost

  • Algolia — basically none (SaaS).
  • Meilisearch, Typesense — basically none (single binary, simple ops).
  • OpenSearch, Elasticsearch — large (shards, nodes, ILM, tuning).
  • Vespa — very large (learning curve, ops complexity).
  • Quickwit — medium (object-storage operation plus self-host).

14.3 Migration Cost

  • Full-text to full-text (e.g., ES to Meilisearch) — moderate.
  • Full-text to AI-serving (e.g., ES to Vespa) — very large.
  • In-app to log engine (e.g., Algolia to Quickwit) — not a real migration path (different scenarios).

Before ripping out an engine because the license looks expensive, add up migration + ops time + learning cost.


If we summarize May 2026 in one line.

"Elasticsearch is still a giant, in-app and logs split off, AI-serving was taken by Vespa, and hybrid landed in every engine."

Decision Checklist

  • What is the scenario? — narrow to one of in-app / logs / RAG / recommendation.
  • Do you have operators? — if not, managed SaaS (Algolia, Elastic Cloud, OpenSearch Service).
  • If self-hosting, who runs it? — for simplicity, Meilisearch or Typesense.
  • Is AGPL acceptable? — if not, OpenSearch.
  • Are log costs exploding? — investigate Quickwit.
  • Does ranking need ML models? — Vespa.
  • Do you need hybrid? — almost always yes. Pick the engine where hybrid is smoothest.
  • Do you really need a vector DB? — if text is alongside, a hybrid engine usually beats a pure vector DB.

Anti-Patterns

  1. "Elasticsearch does it all" — using ES for every scenario blows up ops cost.
  2. "One vector DB for hybrid and full-text" — keyword-match accuracy collapses.
  3. "Quickwit for in-app search" — weak updates, scenario mismatch.
  4. "Vespa for all search" — ops complexity is overwhelming compared to in-app.
  5. "Algolia at infinite scale" — beyond tens of millions of records, evaluate self-host.
  6. "OS equals ES" — after 7.10, API and feature drift accumulate.
  7. "Hybrid isn't necessary" — in 2026 user expectations disagree.

Coming Up

Candidates next — First 90 Days With Vespa — a Spotify-Style Recommender Self-Host Guide, Self-Hosting Quickwit — Log Search on S3 + Kubernetes, RAG in an Hour With Meilisearch + Cohere Rerank.

"Search is not a database. Search is user experience. The tool gets picked at the end of that thought."

— Next-Gen Search Engines 2026, end.

