
Site & Docs Search 2026 — Algolia / Meilisearch / Typesense / Pagefind / Orama / Markprompt / kapa.ai Deep Dive


In 2026, open any static site, docs page, or SaaS dashboard and you will find a small magnifier icon in the top right. Click it, a modal pops up, you type, results appear. Users think it is all the same thing. For developers, that box has four completely different camps behind it.

  1. Managed SaaS — Algolia, Algolia DocSearch, the former Sajari that became Algolia NeuralSearch. Sign up, drop in a key, done. You pay money and your data sits in their cloud.
  2. Self-hosted engines — Meilisearch, Typesense, Elastic, OpenSearch. You run it, you index, you operate. Lower cost, more control, more ops burden.
  3. Static / in-browser — Pagefind, Orama, Lunr.js, Fuse.js, MiniSearch, FlexSearch. No backend at all, or an index baked at build time and served statically.
  4. AI docs search (question to answer) — Markprompt, kapa.ai, Inkeep, Sourcegraph Cody, AnswerOverflow. Not search but a RAG chatbot wearing a search-box hat.

This post maps all four as of May 2026: where each tool fits, pricing model, the shape of the index and query API, and answers to "what should I use for my site". It also covers embeddings vs keyword vs hybrid retrieval, with a chapter on Korean and Japanese search SDKs.


1. The 2026 site-search map — four camps

Same surface, very different responsibilities.

| Camp | Examples | Index lives | Query runs | Ops burden | Cost model |
|---|---|---|---|---|---|
| Managed SaaS | Algolia, DocSearch | Cloud | Cloud | None | Per search / record |
| Self-hosted | Meilisearch, Typesense, Elastic, OpenSearch | Your servers | Your servers | High | Infra |
| Static / client | Pagefind, Orama, Lunr, Fuse | Build / browser | Browser | None | Free |
| AI docs search | Markprompt, kapa.ai, Inkeep | Cloud | Cloud | Low | Per site / question |

Two key forks:

  • Where does the data go? — Cloud (SaaS) vs yours (self-hosted) vs browser (static).
  • What is the result? — A list of matching pages (search) vs a synthesized answer (AI search).

The first is a cost and compliance question; the second is a UX question. Neither has a single right answer — you do not run an Elastic cluster for a 100-page static site, and you cannot stuff a 500k-page manual into a Lunr.js index. Site size, update frequency, content type, and compliance pick the camp.


2. Algolia + DocSearch — the de facto standard for OSS docs

Algolia (founded Paris, 2012) is the household name in managed site search. Sign up, create an index, drop in a key, wire up InstantSearch, and a search box is live in five minutes. Result quality, typo tolerance, faceting, synonyms, and analytics are all good. The downside is price — the free tier (about 10k searches / month) gets blown past quickly and the bill ramps fast.

The decisive thing for OSS and docs sites is DocSearch. Algolia gives crawler + index + query free of charge to open-source docs sites. React, Vue, Vite, Tailwind, Next.js, MDN — most of those "Search Docs" boxes you see are DocSearch. Apply, get approved, add a meta tag plus the DocSearch component, done.

// Next.js + DocSearch component
import { DocSearch } from '@docsearch/react'
import '@docsearch/css'

export function Search() {
  return (
    <DocSearch
      appId="YOUR_APP_ID"
      apiKey="YOUR_SEARCH_ONLY_API_KEY"
      indexName="your-docs"
    />
  )
}

DocSearch v3 ships the Cmd+K keyboard shortcut, modal UI, recent searches, and favorites. No self-hosting, so ops is zero, and for OSS the cost is also zero.

For commercial sites, the Standard plan starts at $50/month and scales with searches and records. Large e-commerce sites get expensive, but they get the result quality and analytics in return. In 2026 Algolia ships NeuralSearch (built on the former Sajari acquisition), bringing embeddings-based semantic + keyword hybrid retrieval inside one index — hard to match by self-hosting.
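DocSearch and InstantSearch hide the HTTP layer, but it helps to see what a query looks like on the wire. The sketch below assumes Algolia's documented REST search endpoint (POST /1/indexes/{indexName}/query on the {appId}-dsn.algolia.net host, with credentials in the X-Algolia-Application-Id and X-Algolia-API-Key headers). The app ID, key, and index name are placeholders, the official clients add retries and host fallback that this omits, and the function only builds the request so it runs offline.

```typescript
// Sketch: build (but do not send) an Algolia search request.
// "APP_ID", "SEARCH_KEY" and "your-docs" are placeholders.
interface SearchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildAlgoliaQuery(
  appId: string,
  searchOnlyKey: string,
  indexName: string,
  query: string,
  hitsPerPage = 10,
): SearchRequest {
  // Search (read) traffic goes to the "-dsn" host.
  const url = `https://${appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(indexName)}/query`;
  // Search parameters travel as one URL-encoded string under "params".
  const params = `query=${encodeURIComponent(query)}&hitsPerPage=${hitsPerPage}`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": appId,
        // Only a search-only key belongs in browser code, never the admin key.
        "X-Algolia-API-Key": searchOnlyKey,
      },
      body: JSON.stringify({ params }),
    },
  };
}

const req = buildAlgoliaQuery("APP_ID", "SEARCH_KEY", "your-docs", "algolia typo");
console.log(req.url, req.init.body);
// To send it: const res = await fetch(req.url, req.init); const { hits } = await res.json();
```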


3. Meilisearch — the Rust open source champion

Meilisearch (France, 2018) is the most common answer when people ask for a self-hosted Algolia. It is a single Rust binary with a REST API; typo tolerance and prefix search are on by default, and faceting works as soon as you declare filterable attributes. One docker line away.

docker run -p 7700:7700 getmeili/meilisearch:v1.10

Indexing is simple.

curl -X POST 'http://localhost:7700/indexes/posts/documents' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": 1, "title": "Hello", "body": "world"}]'

Querying.

curl 'http://localhost:7700/indexes/posts/search?q=worlld'
# → worlld matches world (typo tolerance: by default 1 typo is allowed for words of 5+ characters, 2 for 9+)
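Typo tolerance is not unlimited: Meilisearch scales the number of allowed typos with the length of the query word. Below is a minimal sketch of that budget. It assumes plain Levenshtein distance and the documented defaults (one typo from 5 characters, two from 9, configurable through the minWordSizeForTypos setting). The real engine uses Levenshtein automata and extra ranking rules that this ignores.

```typescript
// Sketch of a Meilisearch-style typo budget. Assumptions: plain Levenshtein
// distance and the default thresholds (1 typo from 5 chars, 2 from 9).

// Classic dynamic-programming edit distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// How many typos a query word may contain, based on its length.
function allowedTypos(word: string, oneTypo = 5, twoTypos = 9): number {
  if (word.length >= twoTypos) return 2;
  if (word.length >= oneTypo) return 1;
  return 0;
}

function typoMatch(queryWord: string, indexedWord: string): boolean {
  const d = levenshtein(queryWord.toLowerCase(), indexedWord.toLowerCase());
  return d <= allowedTypos(queryWord);
}

console.log(typoMatch("serch", "search"));             // true: 5 chars, 1 typo allowed
console.log(typoMatch("meilisaerch", "meilisearch"));  // true: 11 chars, 2 typos allowed
console.log(typoMatch("cst", "cat"));                  // false: 3 chars, no typo budget
```

The length-based budget is the design choice worth noticing: short words have so many near neighbors that allowing a typo on them mostly adds noise.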

They closed Series B in 2024, so they are well funded, and v1.x is stable as of 2026. Embeddings-based semantic and hybrid search are first-class, with built-in embedders for OpenAI, Hugging Face, and Ollama, plus a generic REST embedder that covers other providers such as Cohere. There is a managed option (Meilisearch Cloud) that is usually cheaper than Algolia for the same load.

When to pick it: when you do not want your data in someone else's cloud, and when you index non-Latin scripts like Korean / Japanese / Chinese (Meili handles segmentation well via its charabia tokenizer). It does not yet match Algolia on analytics, synonyms, and rule engines, but the OSS license, Rust stability, and a single self-contained binary make it attractive.


4. Typesense — Meilisearch's biggest rival

Typesense (2017, C++) plays in the same "Algolia alternative" space. It often comes up next to Meili, and the big picture is similar. The details differ.

| Item | Meilisearch | Typesense |
|---|---|---|
| Language | Rust | C++ |
| License | MIT | GPL v3 |
| Single binary | Yes | Yes |
| Embeddings / semantic | Built-in | Built-in |
| Clustering | Cloud only | In OSS too |
| Managed hosting | Meilisearch Cloud | Typesense Cloud |
| Korean segmentation | Yes | Yes |

Typesense's differentiator is that clustering and Raft consensus are part of the open source build. If you plan to run a cluster, Typesense is a touch friendlier. Conversely, for a single-node simple setup, Meili tends to be a little lighter.

Cloud pricing for both is RAM / CPU based, and both are usually cheaper than Algolia for the same data size. Both are stable as of 2026 and actively developed (Meilisearch on v1.x; Typesense dropped its 0.x numbering in 2024 and has used whole-number major versions since v26).
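One API difference shows up on day one. Meilisearch searches every searchable attribute by default, while a Typesense search must name its fields in the required query_by parameter. The sketch below only builds the request URL. It assumes a local Typesense node on its default port 8108 and a collection called posts; the host, collection, field names, and key are all placeholders. q, query_by, and num_typos are Typesense search parameters, and the key travels in the X-TYPESENSE-API-KEY header.

```typescript
// Sketch: build (not send) a Typesense search request.
// Host, collection, field names and key are placeholders.
function buildTypesenseSearch(
  host: string,
  apiKey: string,
  collection: string,
  q: string,
  queryBy: string[],
  numTypos = 2,
): { url: string; headers: Record<string, string> } {
  const params = new URLSearchParams({
    q,
    query_by: queryBy.join(","), // required: which fields to search
    num_typos: String(numTypos), // max typos tolerated per word
  });
  return {
    url: `${host}/collections/${encodeURIComponent(collection)}/documents/search?${params}`,
    headers: { "X-TYPESENSE-API-KEY": apiKey },
  };
}

const req = buildTypesenseSearch("http://localhost:8108", "SEARCH_ONLY_KEY", "posts", "hllo", ["title", "body"]);
console.log(req.url);
// To send it: await fetch(req.url, { headers: req.headers })
```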


5. Pagefind — the answer for static sites

Pagefind (2022, built by CloudCannon) is "search built for static sites". The core idea:

  • After build, scan the output directory and generate JSON / WASM index files.
  • At runtime, with no backend, lazy-load only the index chunks needed for the query in the browser.
  • Indices are chunked, so initial load stays small even with many pages.

# After the site build
npx pagefind --site dist

Browser-side code.

import * as pagefind from '/pagefind/pagefind.js'

const search = await pagefind.search('algolia')
for (const result of search.results) {
  const data = await result.data()
  console.log(data.url, data.excerpt)
}

Works with Astro, Eleventy, Next.js (static build), Hugo, anywhere. Zero backend, scales horizontally with whatever CDN you have. For sites under tens of thousands of pages, this is the closest thing to a 2026 default answer.

Limits are real. Hundreds of thousands of pages bloat the index and lengthen builds. No real-time updates (the index is baked at build). Analytics and faceting do not match a managed SaaS. But for blogs, docs, OSS project sites, it is almost always enough.
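The chunked index is what keeps the first load small. The sketch below is hypothetical and does not reproduce Pagefind's actual index format. At build time it splits an inverted index into shards keyed by a two-character term prefix, and at query time it touches only the shards the query terms need. The in-memory Map stands in for files on a CDN, and buildShards / searchShards are illustrative names.

```typescript
// Hypothetical sketch of a chunked (sharded) search index. NOT Pagefind's
// real format: shard keys, names and the in-memory "CDN" are illustrative.

type Postings = Record<string, number[]>; // term -> ids of pages containing it

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build step: split the inverted index into shards keyed by a term prefix.
function buildShards(pages: string[], prefixLen = 2): Map<string, Postings> {
  const shards = new Map<string, Postings>();
  pages.forEach((text, id) => {
    for (const term of new Set(tokenize(text))) {
      const key = term.slice(0, prefixLen);
      const shard = shards.get(key) ?? {};
      (shard[term] ??= []).push(id);
      shards.set(key, shard);
    }
  });
  return shards;
}

// Query step: load only the shards the query terms fall into, then AND the hits.
function searchShards(shards: Map<string, Postings>, query: string, prefixLen = 2) {
  const fetched = new Set<string>(); // in a browser: one small request per shard
  let hits: Set<number> | null = null;
  for (const term of tokenize(query)) {
    const key = term.slice(0, prefixLen);
    fetched.add(key);
    const ids = new Set(shards.get(key)?.[term] ?? []);
    hits = hits === null ? ids : new Set([...hits].filter((id) => ids.has(id)));
  }
  return { pageIds: [...(hits ?? [])], shardsFetched: fetched.size, shardsTotal: shards.size };
}

const pages = [
  "Algolia DocSearch for open source docs",
  "Pagefind indexes static sites after the build",
  "Meilisearch and Typesense are self-hosted engines",
];
const shards = buildShards(pages);
const result = searchShards(shards, "static build");
console.log(result); // page 1 only, after loading 2 of the 19 shards
```

Adding pages adds shards rather than growing what a single query downloads; what does grow is build time and the total index on disk, which matches the limits described above.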


6. Orama — TypeScript + typo tolerance + semantic

Orama (2022, Italy) is "a TypeScript search engine that runs both in the browser and on the server". The same code runs on Node, Deno, browsers, edge runtimes.

import { create, insert, search } from '@orama/orama'

const db = create({
  schema: {
    title: 'string',
    body: 'string',
    tags: 'string[]',
  },
})

insert(db, { title: 'Hello', body: 'world', tags: ['a', 'b'] })

const result = await search(db, { term: 'hllo', tolerance: 1 })

Notable points:

  • Zero-dependency TypeScript, small bundle.
  • Typo tolerance / BM25 / facets / synonyms built in.
  • Semantic search (embeddings) in the same library.
  • Managed option via Orama Cloud.

Compared with Pagefind: Pagefind shines at auto-indexing the built output of a static site, while Orama shines as a library where you write your own indexing and querying code. Both can run in the browser. For a SaaS product or app, pick Orama; for a static site, Pagefind usually wins on convenience.


7. Elastic + OpenSearch — the enterprise heavyweights

Elasticsearch (2010, the E in ELK) was the dominant search engine for a long time. Their 2021 license change to SSPL / Elastic License pushed AWS to fork, and OpenSearch was born. As of 2026, both forks evolve in parallel.

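  • Elastic — the company offers both Elastic Cloud and self-managed deployments. ML, semantic search, and observability (APM, logs) all live in the same stack. In 2024 Elastic added AGPLv3 as an open source license option for Elasticsearch and Kibana, alongside SSPL and the Elastic License.
  • OpenSearch — started by AWS and licensed Apache 2.0; since 2024 it has been governed by the OpenSearch Software Foundation under the Linux Foundation. Available as Amazon OpenSearch Service. Common in Korean and Japanese cloud teams.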

Both are powerful, but for pure site search they are overkill. The real value shows up when you need log analytics, metrics, APM, full semantic search, and a flexible DSL all on one platform. Running an Elastic cluster just for docs search inverts the cost equation. Tens to hundreds of GB of data + complex queries + analytics on one stack is the sweet spot for Elastic / OpenSearch.

# Single-node dev cluster in docker (security disabled: local development only)
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.15.0
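The flexible DSL is a large part of what you get with Elastic / OpenSearch. Here is a minimal sketch of a docs-search request body, assuming an index named docs with title, headings, and body fields (all placeholders). multi_match searches several fields at once, ^N boosts a field, fuzziness AUTO scales the allowed edits with term length, and highlight returns matching snippets. For a query this basic the same body should work against either engine.

```typescript
// Sketch: an Elasticsearch / OpenSearch query DSL body for a docs search box.
// The index name "docs" and the field names are placeholders.
function docsQuery(text: string, size = 10) {
  return {
    size,
    query: {
      multi_match: {
        query: text,
        fields: ["title^3", "headings^2", "body"], // ^N multiplies that field's score
        fuzziness: "AUTO", // allowed edits grow with term length
      },
    },
    highlight: { fields: { body: {} } }, // matching snippets for the result list
  };
}

const body = docsQuery("single node cluster");
console.log(JSON.stringify(body, null, 2));
// To send it against a local cluster that already has a "docs" index:
// await fetch("http://localhost:9200/docs/_search", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(body),
// });
```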

8. Lunr.js / Fuse.js / MiniSearch / FlexSearch — in-browser options

The classic "no backend" search libraries. Each still has a place in 2026.

Lunr.js

Around 2012, this was everywhere on Jekyll / Hugo sites. Build a JSON index, query it in the browser. BM25-based ranking, an English stemmer by default, and other languages via the lunr-languages plugins. Still functional, but development has slowed. For a new project, Pagefind or Orama is usually the better pick.
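Lunr ranks keyword matches with BM25, as do Elastic / OpenSearch. Here is a minimal sketch of the formula. It assumes whitespace tokenization with no stemming or stop words and the common k1 = 1.2, b = 0.75 defaults. The IDF term uses the log(1 + ...) form, which keeps scores positive even for very common terms.

```typescript
// Minimal BM25 sketch: no stemming, no stop words, k1 = 1.2, b = 0.75.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function bm25(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / N;
  const scores = new Array<number>(N).fill(0);
  for (const term of new Set(tokenize(query))) {
    const df = tokenized.filter((d) => d.includes(term)).length; // docs containing term
    if (df === 0) continue;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5)); // rarer terms weigh more
    tokenized.forEach((d, i) => {
      const tf = d.filter((t) => t === term).length;
      if (tf === 0) return;
      // Term-frequency saturation (k1) plus document-length normalization (b).
      scores[i] += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * d.length) / avgLen));
    });
  }
  return scores;
}

const docs = [
  "Pagefind builds a static search index",
  "Lunr builds a JSON index in the browser",
  "Typesense runs a search cluster with Raft",
];
const scores = bm25(docs, "static index");
console.log(scores.map((s) => s.toFixed(3)));
```

The first document ranks highest because it matches the rarer term "static" as well as "index"; the third matches neither and scores 0.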

Fuse.js — fuzzy / typo-friendly matching

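import Fuse from 'fuse.js'

const items = [
  { title: 'Algolia DocSearch', body: 'Free search for OSS docs' },
  { title: 'Pagefind', body: 'Static search index' },
]
const fuse = new Fuse(items, { keys: ['title', 'body'], threshold: 0.3 })
const result = fuse.search('algoria') // typo for "algolia"; hits come back as { item, refIndex }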

Not a search engine but a library for fuzzily matching a JavaScript array. Great for menus, command palettes, autocomplete. It will not power a full site search, but for small datasets it is perfect.

MiniSearch

Small, fast in-browser full-text search that typically scales to tens of thousands of documents. Clean API, BM25-based ranking, fuzzy and prefix search, field boosting, and filtering.

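import MiniSearch from 'minisearch'

const docs = [
  { id: 1, title: 'Hello', body: 'world' },
  { id: 2, title: 'MiniSearch', body: 'in-browser full-text search' },
]
// Documents need a unique `id` field by default
const ms = new MiniSearch({ fields: ['title', 'body'], storeFields: ['title'] })
ms.addAll(docs)
const results = ms.search('hello world', { fuzzy: 0.2 }) // fuzzy < 1: max edit distance as a fraction of term length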

FlexSearch

Often cited as the fastest JS search library. Compact index format and tiny memory footprint. The API is a bit more finicky, but when raw performance matters it pays off.

Pick guide: for static sites, Pagefind; for in-browser library use in SaaS, Orama or MiniSearch; for fuzzy matching, Fuse.js; for max performance, FlexSearch.


9. Markprompt and AI docs search

If classic search is "keyword to page", AI docs search is "question to answer". The user asks in natural language, a RAG pipeline pulls relevant chunks, and an LLM synthesizes a reply.

Markprompt (2023) started as "AI search on top of your markdown / MDX docs". The workflow:

  1. Crawl the docs site or hook into a GitHub repo.
  2. Chunk and embed into a vector index.
  3. Expose a chat widget, a search box, or an API.
  4. User asks a question, relevant chunks are retrieved, LLM synthesizes, answer comes back with source links.

// Simplified React widget
import { Markprompt } from '@markprompt/react'

export function AskDocs() {
  return <Markprompt projectKey="YOUR_PROJECT_KEY" />
}

They run an open-source library set (the @markprompt/react package etc.) alongside managed hosting. Pricing is per answer / message. The big wins are citation of sources + fast integration, and both chat and search UIs are available.
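The four-step Markprompt workflow (crawl, chunk and embed, retrieve, then synthesize with sources) can be sketched end to end. This is hypothetical code, not Markprompt's implementation. embed() is a toy hashed bag-of-words vector so the example runs offline, chunking splits on "## " headings, and the LLM call is left out: the code stops at the prompt it would send, with numbered sources to cite.

```typescript
// Hypothetical RAG sketch: chunk -> embed -> retrieve -> build prompt.
// embed() is a toy hashed bag-of-words vector so this runs offline; a real
// pipeline would call an embedding model here and send the prompt to an LLM.

interface Chunk { url: string; heading: string; text: string; vector: number[] }

const DIM = 256;

function embed(text: string): number[] {
  const v = new Array<number>(DIM).fill(0);
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
    v[h % DIM] += 1;
  }
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm); // unit length, so a dot product is cosine similarity
}

// Chunk: one chunk per "## " section of a Markdown page.
function chunkMarkdown(url: string, md: string): Chunk[] {
  return md
    .split(/^## /m)
    .filter((s) => s.trim())
    .map((section) => {
      const [heading, ...rest] = section.split("\n");
      const text = rest.join("\n").trim();
      return { url, heading: heading.trim(), text, vector: embed(`${heading} ${text}`) };
    });
}

// Retrieve: top-k chunks by cosine similarity to the question.
function retrieve(chunks: Chunk[], question: string, k = 2): Chunk[] {
  const q = embed(question);
  const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
  return [...chunks].sort((a, b) => dot(b.vector, q) - dot(a.vector, q)).slice(0, k);
}

// Synthesize (stubbed): the prompt an LLM would get, with numbered sources.
function buildPrompt(question: string, context: Chunk[]): string {
  const slug = (s: string) => s.toLowerCase().replace(/\s+/g, "-");
  const sources = context
    .map((c, i) => `[${i + 1}] ${c.url}#${slug(c.heading)}\n${c.text}`)
    .join("\n\n");
  return `Answer only from the sources below and cite them as [n].\n\n${sources}\n\nQuestion: ${question}`;
}

const page = `## Install
Run npm install to add the package.

## Configure the API key
Set the API key in your environment before starting the server.`;

const chunks = chunkMarkdown("https://example.com/docs/setup", page);
const top = retrieve(chunks, "Where do I set my API key?", 1);
console.log(buildPrompt("Where do I set my API key?", top));
```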


10. kapa.ai — AI search specifically for developer docs

kapa.ai positions itself as "a chatbot trained on your technical docs". Docker, OpenAI, Mapbox, Reflex and others run it as the "Ask AI" button next to their docs.

The differentiator is that it is heavily tuned for developer documentation. Code-block citations, version / library awareness, ingestion of GitHub Issues, Discord, Stack Overflow, and an answer-verification pipeline that reduces hallucinations. They started with LangChain under the hood and now run their own pipeline.

Deployment is usually one widget script.

<script src="https://widget.kapa.ai/kapa-widget.bundle.js"
  data-website-id="YOUR_ID"
  data-project-name="YourProject"
  async></script>

Pricing is enterprise quote, generally higher than Markprompt, but they invest more in answer quality and ops (hallucination monitoring, human review workflows).


11. Inkeep, Sourcegraph Cody, AnswerOverflow and others

The ecosystem has more.

Inkeep

AI chatbot that merges product docs, community, and code. Used by SaaS companies like Anthropic, Pinecone, Speakeasy. Rich set of surfaces — SDK, widget, Slack bot, Discord bot.

Sourcegraph Cody

Built by Sourcegraph, originally a code search company. Codebase search, not site search — inside the IDE you ask things like "where is this function called from". Not the category to drop into your docs site, but powerful as a developer tool.

AnswerOverflow

Turns Discord help channels into a searchable index so Google and site search can surface them. The fix for the OSS-project pain of "the answer is in Discord but no one can find it". Free and self-hostable, with a managed option too.

(Note) Mark (RIP)

Some search tools popular in the 2010s are gone. Sajari was acquired by Algolia and folded into NeuralSearch, and small projects like Mark are no longer maintained. For a new project, lean on the active tools above.


12. Embeddings vs Keyword vs Hybrid

The defining debate of 2026 search: did embedding-based semantic search replace keyword search?

Answer: no, the standard became hybrid — both, side by side.

The three modes.

| Mode | How | Strong at | Weak at |
|---|---|---|---|
| Keyword (BM25 etc.) | Word match, weights | Exact terms, code, proper nouns | Synonyms, meaning |
| Embeddings (vectors) | Semantic embedding, cosine similarity | "Related content" | Exact terms; also more expensive |
| Hybrid | Both + rerank | Strengths of both | Operationally complex, needs tuning |

The common production pattern.

  1. Keyword (BM25) for the top 100 candidates.
  2. Re-rank those candidates with an embeddings-based semantic score.
  3. Optionally, run a neural reranker (a cross-encoder such as Cohere Rerank) for the final 10.
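Stage 2 of that pattern, re-ranking keyword candidates with embeddings, fits in a few lines. The sketch assumes the BM25 scores and vectors are already computed (the numbers below are made up). It uses a weighted blend with alpha = 0.5 after min-max normalizing BM25 to the 0..1 range. Reciprocal rank fusion, which combines ranks instead of raw scores, is a common alternative.

```typescript
// Sketch: re-rank BM25 candidates by blending normalized keyword scores with
// embedding cosine similarity. Inputs and alpha are illustrative.
interface Candidate { id: string; bm25: number; vector: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

function hybridRerank(cands: Candidate[], queryVec: number[], alpha = 0.5, topN = 10) {
  // Min-max normalize BM25 so it sits on a 0..1 scale like cosine.
  const raw = cands.map((c) => c.bm25);
  const min = Math.min(...raw), max = Math.max(...raw);
  const norm = (s: number) => (max > min ? (s - min) / (max - min) : 1);
  return cands
    .map((c) => ({ id: c.id, score: alpha * norm(c.bm25) + (1 - alpha) * cosine(c.vector, queryVec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

const queryVec = [1, 0, 0];
const ranked = hybridRerank([
  { id: "exact-keyword-page", bm25: 12.0, vector: [0, 1, 0] },
  { id: "related-concept-page", bm25: 7.5, vector: [0.9, 0.1, 0] },
  { id: "weak-match", bm25: 2.0, vector: [0, 0, 1] },
], queryVec);
console.log(ranked.map((r) => r.id)); // related-concept-page, exact-keyword-page, weak-match
```

The page with the strongest keyword hit drops to second because its embedding points elsewhere. That is exactly the trade hybrid search makes, and alpha is the knob that needs tuning.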

Algolia NeuralSearch, Meilisearch, Typesense, Elastic, and OpenSearch all support hybrid in 2026. In-browser libraries such as Orama can add embeddings too, but an embedding index is heavy to ship to the browser, so they hit limits as the site grows. AI docs search (Markprompt, kapa.ai) uses embeddings as the primary axis, with keyword as the assist.


13. Korea / Japan — Naver, Kakao, Yahoo! search SDKs

The global standards above cover most of the world, but Korea and Japan have their own context.

Korea — Naver / Kakao

  • Naver Search API — Search Naver's index of blogs / news / encyclopedias / shopping / books / movies and get results. The model is "borrow Naver's index", not "search your site". Sign up and get keys at NAVER Developers.
  • Kakao Search API — Same idea for the Daum / Kakao index, with similar categories (web / blog / cafe / book / video / image).
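  • Korean tokenization — Elastic and OpenSearch use the nori analyzer (Lucene's Korean analyzer, built on the mecab-ko-dic dictionary), Meilisearch segments Korean through charabia, and Algolia and Typesense handle it through their own language settings. Result quality is dominated by analyzer configuration.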

For your own site search, the global tools above combined with a Korean analyzer is the usual choice. Naver / Kakao APIs fit when you want to "display Naver / Daum's results inside your page".

Japan: Yahoo! JAPAN

  • Yahoo! Search API — exposes Yahoo! JAPAN web search through an API. Licensing and rate limits have made it less common as a primary site search box.
  • Japanese tokenization — MeCab / Sudachi / Kuromoji are the staples. Kuromoji is available as an official analysis plugin for Elastic and OpenSearch, Algolia has a Japanese analyzer, and Meilisearch uses its own charabia tokenizer. For Japanese, where there are no spaces, the tokenizer pretty much determines search quality.

Summary: both Korea and Japan use the global tools for the index itself and put effort into the tokenizer. Naver / Yahoo! APIs are for the separate scenario of showing other sites' results inside yours.
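Why the tokenizer matters is easy to show. Korean attaches particles directly to words (검색엔진은 is 검색엔진 plus the particle 은), and Japanese has no spaces at all, so whitespace splitting alone misses matches. Below is a minimal sketch of character bigrams, a crude dictionary-free baseline. It is not what nori, MeCab, Sudachi, or Kuromoji do (they run morphological analysis), and it trades index size and precision for zero configuration.

```typescript
// Sketch: character bigrams as a dictionary-free fallback tokenizer for
// Korean / Japanese. Not morphological analysis; a crude baseline only.
function bigrams(text: string): string[] {
  const out: string[] = [];
  // Split on whitespace/punctuation first so bigrams never span two words.
  for (const word of text.split(/[\s\p{P}]+/u).filter(Boolean)) {
    const chars = [...word];
    if (chars.length === 1) out.push(word); // keep single-character words
    for (let i = 0; i + 1 < chars.length; i++) out.push(chars[i] + chars[i + 1]);
  }
  return out;
}

// Share of the query's bigrams that also appear in the document.
function overlap(query: string, doc: string): number {
  const q = bigrams(query);
  const d = new Set(bigrams(doc));
  return q.length ? q.filter((g) => d.has(g)).length / q.length : 0;
}

console.log(bigrams("검색엔진은"));                    // 검색, 색엔, 엔진, 진은
console.log(overlap("검색엔진", "검색엔진은 빠르다")); // 1: the particle 은 does not block the match
console.log(bigrams("全文検索"));                      // 全文, 文検, 検索
```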


14. Who should pick what — recommendations by scenario

| Scenario | First pick | Second pick |
|---|---|---|
| OSS docs site (under 10k pages) | DocSearch (free) | Pagefind |
| OSS docs site (large) | DocSearch | Algolia paid |
| Personal blog / static site | Pagefind | Lunr.js |
| Astro / Eleventy / Hugo | Pagefind | Orama |
| SaaS in-browser search | Orama | MiniSearch |
| Command palette / autocomplete | Fuse.js | FlexSearch |
| Commercial site (mid-size) | Algolia | Meilisearch Cloud |
| Data must stay inside | Meilisearch (self) | Typesense (self) |
| Enterprise + logs in the same stack | Elastic / OpenSearch | - |
| Docs chatbot (developer) | kapa.ai | Inkeep |
| Docs chatbot (product) | Markprompt | Inkeep |
| Discord answers on the web | AnswerOverflow | - |
| Codebase search | Sourcegraph Cody | - |
| Korean site (own index) | Meili / Algolia + Korean analyzer | - |
| Japanese site (own index) | Elastic + kuromoji / Algolia | Meili + charabia |

Big principles:

  • If the site is small, finish with static / client-side. Zero ops.
  • As size / update rate / analytics needs grow, move to managed SaaS or a self-hosted engine.
  • If compliance / data sovereignty drives it, self-host.
  • When users start asking in natural language, add an AI docs search.

Nothing is a perfect first pick. Start small (Pagefind / DocSearch / Lunr), move when you outgrow it, scale to self-hosted or SaaS as data grows, layer in AI search when users start asking conversationally — that is the typical 2026 path.


15. Wrap-up

That one search box hides four invisible decisions.

  1. Data hosting — Cloud / your server / browser.
  2. Retrieval algorithm — Keyword / embeddings / hybrid.
  3. Result shape — Page list / synthesized answer.
  4. Operational responsibility — Managed / self.

The good news for 2026 is that each decision has a strong open source or free option. DocSearch for OSS docs, Pagefind for static sites, Meilisearch / Typesense for self-indexed, Orama / Fuse / MiniSearch for in-browser, Markprompt / kapa.ai for AI answers. The era when "users feel this site searches well" required massive infra is over.

What is left is the simple work — watch how users actually search on your site, and pick the right camp from the map above.

