Site & Docs Search 2026: Algolia / Meilisearch / Typesense / Pagefind / Orama / Markprompt / kapa.ai Deep Dive
Prologue: Four worlds behind one search box
In 2026, open any static site, docs page, or SaaS dashboard and you will find a small magnifier icon in the top right. Click it, a modal pops up, you type, results appear. Users think it is all the same thing. For developers, that box has **four completely different camps** behind it.
1. **Managed SaaS** — Algolia, Algolia DocSearch, the former Sajari that became Algolia NeuralSearch. Sign up, drop in a key, done. You pay money and your data sits in their cloud.
2. **Self-hosted engines** — Meilisearch, Typesense, Elastic, OpenSearch. You run it, you index, you operate. Lower cost, more control, more ops burden.
3. **Static / in-browser** — Pagefind, Orama, Lunr.js, Fuse.js, MiniSearch, FlexSearch. No backend at all, or an index baked at build time and served statically.
4. **AI docs search (question to answer)** — Markprompt, kapa.ai, Inkeep, Sourcegraph Cody, AnswerOverflow. Not search but a RAG chatbot wearing a search-box hat.
This post maps all four as of May 2026: where each tool fits, pricing model, the shape of the index and query API, and answers to "what should I use for my site". It also covers embeddings vs keyword vs hybrid retrieval, with a chapter on Korean and Japanese search SDKs.
1. The 2026 site-search map — four camps
Same surface, very different responsibilities.
| Camp | Examples | Index lives | Query runs | Ops burden | Cost model |
| --- | --- | --- | --- | --- | --- |
| Managed SaaS | Algolia, DocSearch | Cloud | Cloud | None | Per search / record |
| Self-hosted | Meilisearch, Typesense, Elastic, OpenSearch | Your servers | Your servers | High | Infra |
| Static / client | Pagefind, Orama, Lunr, Fuse | Build / browser | Browser | None | Free |
| AI docs search | Markprompt, kapa.ai, Inkeep | Cloud | Cloud | Low | Per site / question |
Two key forks:
- **Where does the data go?** — Cloud (SaaS) vs yours (self-hosted) vs browser (static).
- **What is the result?** — A list of matching pages (search) vs a synthesized answer (AI search).
The first is a cost and compliance question; the second is a UX question. Neither has a single right answer — you do not run an Elastic cluster for a 100-page static site, and you cannot stuff a 500k-page manual into a Lunr.js index. Site size, update frequency, content type, and compliance pick the camp.
2. Algolia + DocSearch — the de facto standard for OSS docs
Algolia (founded Paris, 2012) is the household name in managed site search. Sign up, create an index, drop in a key, wire up InstantSearch, and a search box is live in five minutes. Result quality, typo tolerance, faceting, synonyms, and analytics are all good. The downside is price — the free tier (about 10k searches / month) gets blown past quickly and the bill ramps fast.
The decisive thing for OSS and docs sites is **DocSearch**. Algolia provides the crawler, the index, and the query layer free of charge to open-source docs sites. React, Vue, Vite, Tailwind, Next.js: most of those "Search Docs" boxes you see are DocSearch. Apply, get approved, add the DocSearch component with your app ID and search-only key, done.
```tsx
// Next.js + DocSearch component
import { DocSearch } from '@docsearch/react'
import '@docsearch/css'

export function Search() {
  return (
    <DocSearch
      appId="YOUR_APP_ID"
      apiKey="YOUR_SEARCH_ONLY_API_KEY"
      indexName="your-docs"
    />
  )
}
```
DocSearch v3 ships the Cmd+K keyboard shortcut, modal UI, recent searches, and favorites. No self-hosting, so ops is zero, and for OSS the cost is also zero.
For commercial sites, the Standard plan starts at $50/month and scales with searches and records. Large e-commerce sites get expensive, but they get the result quality and analytics in return. **In 2026 Algolia ships NeuralSearch (built on the former Sajari acquisition), bringing embeddings-based semantic + keyword hybrid retrieval inside one index — hard to match by self-hosting.**
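To make the shape of the query API concrete without InstantSearch, here is a minimal sketch of a raw search request against Algolia's REST endpoint. The host pattern and the two `X-Algolia-*` headers follow Algolia's REST conventions as I understand them; `buildAlgoliaSearchRequest` is a hypothetical helper name, and the app ID, key, and index name are placeholders.

```typescript
interface AlgoliaRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Hypothetical helper: builds (but does not send) a search request.
function buildAlgoliaSearchRequest(
  appId: string,
  searchOnlyKey: string,
  indexName: string,
  query: string,
  hitsPerPage = 5,
): AlgoliaRequest {
  // Search parameters travel as a URL-encoded string inside the JSON body.
  const params = `query=${encodeURIComponent(query)}&hitsPerPage=${hitsPerPage}`
  return {
    url: `https://${appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(indexName)}/query`,
    method: 'POST',
    headers: {
      'X-Algolia-Application-Id': appId,
      'X-Algolia-API-Key': searchOnlyKey, // search-only key, never the admin key
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ params }),
  }
}

const req = buildAlgoliaSearchRequest('YOUR_APP_ID', 'YOUR_SEARCH_ONLY_API_KEY', 'your-docs', 'typo tolerance')
// Send with: fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

Only the search-only key belongs in browser code; indexing uses a separate admin key that stays on the server.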
3. Meilisearch — the Rust open source champion
Meilisearch (France, 2018) is the most common answer when people ask for a self-hosted Algolia. A single Rust binary, a REST API, typo tolerance / stemming / facets enabled by default. One docker line away.
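```shell
docker run -p 7700:7700 getmeili/meilisearch:v1.10
```
Indexing is simple.
```shell
curl -X POST 'http://localhost:7700/indexes/posts/documents' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": 1, "title": "Hello", "body": "world"}]'
```
Querying.
```shell
curl 'http://localhost:7700/indexes/posts/search?q=hllo'
# hllo is meant to match hello via typo tolerance. By default Meilisearch only
# tolerates a typo in query words of five or more characters, so a four-letter
# query like this needs minWordSizeForTypos.oneTypo lowered in the index settings.
```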
They closed Series B in 2024, so they are well funded, and v1.x is stable as of 2026. Embeddings-based semantic and hybrid search are first-class, with built-in support for OpenAI / Ollama / Cohere embedders. There is a managed option (Meilisearch Cloud) that is usually cheaper than Algolia for the same load.
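As a rough illustration of what a hybrid query looks like on the wire, the sketch below builds the JSON body for Meilisearch's search route with a `hybrid` object. It assumes an embedder has already been configured in the index settings; the embedder name `default`, the key placeholder, and the helper name `buildMeiliHybridSearch` are assumptions for illustration.

```typescript
interface MeiliSearchRequest {
  url: string
  headers: Record<string, string>
  body: { q: string; limit: number; hybrid: { embedder: string; semanticRatio: number } }
}

// Hypothetical helper: builds (but does not send) a hybrid search request.
function buildMeiliHybridSearch(
  host: string,
  indexUid: string,
  apiKey: string,
  q: string,
  semanticRatio = 0.5, // 0 = keyword only, 1 = semantic only
): MeiliSearchRequest {
  const ratio = Math.min(1, Math.max(0, semanticRatio)) // clamp into [0, 1]
  return {
    url: `${host}/indexes/${encodeURIComponent(indexUid)}/search`,
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    // 'default' is an assumed embedder name; it must match the index settings.
    body: { q, limit: 10, hybrid: { embedder: 'default', semanticRatio: ratio } },
  }
}

const meiliReq = buildMeiliHybridSearch('http://localhost:7700', 'posts', 'YOUR_API_KEY', 'how to deploy', 0.7)
// Send with: fetch(meiliReq.url, { method: 'POST', headers: meiliReq.headers, body: JSON.stringify(meiliReq.body) })
```

A ratio near 0 behaves like the plain keyword query shown earlier, which makes it easy to compare both modes on the same index before committing to one.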
When to pick it: **when you do not want your data in someone else's cloud, and when you index non-Latin scripts like Korean / Japanese / Chinese** (Meili handles segmentation well via charabia). It does not yet match Algolia on analytics, synonyms, and rule engines, but the OSS license plus Rust stability plus a single self-contained binary makes it attractive.
4. Typesense — Meilisearch's biggest rival
Typesense (2017, C++) plays in the same "Algolia alternative" space. It often comes up next to Meili, and the big picture is similar. The details differ.
| Item | Meilisearch | Typesense |
| --- | --- | --- |
| Language | Rust | C++ |
| License | MIT | GPL v3 |
| Single binary | Yes | Yes |
| Embeddings / semantic | Built-in | Built-in |
| Clustering | Cloud only | In OSS too |
| Managed hosting | Meilisearch Cloud | Typesense Cloud |
| Korean default | Segmentable | Segmentable |
Typesense's differentiator is that clustering and Raft consensus are part of the open source build. If you plan to run a cluster, Typesense is a touch friendlier. Conversely, for a single-node simple setup, Meili tends to be a little lighter.
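Cloud pricing for both is RAM / CPU based, and both are usually cheaper than Algolia for the same data size. Both are stable and actively developed as of 2026: Meilisearch is on v1.x, and Typesense has used whole-number versions (v26, v27, and later) since dropping the 0.x prefix after v0.25.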
5. Pagefind — the answer for static sites
Pagefind (2022, built by CloudCannon) is "search built for static sites". The core idea:
- After build, scan the output directory and **generate JSON / WASM index files**.
- At runtime, with no backend, **lazy-load only the index chunks needed** for the query in the browser.
- Indices are chunked, so initial load stays small even with many pages.
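The chunking idea in the list above can be sketched with a toy inverted index. This is not Pagefind's actual file format or loading logic, only an illustration of the principle: terms are grouped into shards at build time, and a query pulls just the shard it needs. The first-character sharding rule and every name here are assumptions.

```typescript
type Posting = { url: string; count: number }
type Shard = Map<string, Posting[]> // term -> pages containing it

// Build time: split one inverted index into shards keyed by a term's first character.
function buildShards(pages: { url: string; text: string }[]): Map<string, Shard> {
  const shards = new Map<string, Shard>()
  for (const page of pages) {
    const counts = new Map<string, number>()
    for (const term of page.text.toLowerCase().split(/\W+/).filter(Boolean)) {
      counts.set(term, (counts.get(term) ?? 0) + 1)
    }
    counts.forEach((count, term) => {
      const key = term.charAt(0)
      const shard = shards.get(key) ?? new Map<string, Posting[]>()
      const postings = shard.get(term) ?? []
      postings.push({ url: page.url, count })
      shard.set(term, postings)
      shards.set(key, shard)
    })
  }
  return shards
}

// Run time: load a shard only the first time a query needs it, then cache it.
function createLazySearch(loadShard: (key: string) => Shard) {
  const cache = new Map<string, Shard>()
  return (query: string): string[] => {
    const term = query.toLowerCase().trim()
    const key = term.charAt(0)
    let shard = cache.get(key)
    if (!shard) {
      shard = loadShard(key)
      cache.set(key, shard)
    }
    // Rank pages by how often the term occurs in them.
    return (shard.get(term) ?? []).slice().sort((a, b) => b.count - a.count).map((p) => p.url)
  }
}

const shards = buildShards([
  { url: '/algolia', text: 'Algolia is managed search. Algolia has DocSearch.' },
  { url: '/pagefind', text: 'Pagefind is static search, unlike Algolia.' },
])
let shardLoads = 0
const lazySearch = createLazySearch((key) => {
  shardLoads += 1 // in a browser this would be a fetch of one small static file
  return shards.get(key) ?? new Map<string, Posting[]>()
})
```

Here the loader is synchronous to keep the sketch short; in a browser it would be an asynchronous fetch of a static file from the CDN, which is why no backend is needed.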
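```shell
# After the site build
npx pagefind --site dist
```
Browser-side code.
```js
// Pagefind writes its bundle into the built site, under /pagefind/ by default
const pagefind = await import('/pagefind/pagefind.js')
const search = await pagefind.search('algolia')
for (const result of search.results) {
  const data = await result.data()
  console.log(data.url, data.excerpt)
}
```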
Works with Astro, Eleventy, Next.js (static build), Hugo, anywhere. Zero backend, scales horizontally with whatever CDN you have. **For sites under tens of thousands of pages, this is the closest thing to a 2026 default answer.**
Limits are real. Hundreds of thousands of pages bloat the index and lengthen builds. No real-time updates (the index is baked at build). Analytics and faceting do not match a managed SaaS. But for blogs, docs, OSS project sites, it is almost always enough.
6. Orama — TypeScript + typo tolerance + semantic
Orama (2022, Italy) is "a TypeScript search engine that runs both in the browser and on the server". The same code runs on Node, Deno, browsers, edge runtimes.
```ts
import { create, insert, search } from '@orama/orama'

const db = await create({
  schema: {
    title: 'string',
    body: 'string',
    tags: 'string[]',
  },
})
await insert(db, { title: 'Hello', body: 'world', tags: ['a', 'b'] })
const result = await search(db, { term: 'hllo', tolerance: 1 })
```
Notable points:
- Zero-dependency TypeScript, small bundle.
- Typo tolerance / BM25 / facets / synonyms built in.
- Semantic search (embeddings) in the same library.
- Managed option via Orama Cloud.
Compared to Pagefind — Pagefind shines at "auto-index from the built output of a static site"; Orama shines at "library where you write your own indexing and querying code". Both can run in the browser. SaaS or app, pick Orama; static site, Pagefind usually wins on convenience.
7. Elastic + OpenSearch — the enterprise heavyweights
Elasticsearch (2010, the E in ELK) was the dominant search engine for a long time. Its 2021 license change to SSPL / Elastic License pushed AWS to fork, and OpenSearch was born. As of 2026, the two projects evolve in parallel.
- **Elastic**: the company runs cloud and self-hosted. ML, semantic search, observability (APM, logs) all live in the same stack. In 2024 Elastic added AGPLv3 as a license option for Elasticsearch and Kibana, so the code is available under an OSI-approved open source license again (this was not a return to Apache 2.0).
- **OpenSearch**: AWS-led. Apache 2.0. Available as Amazon OpenSearch Service. Common in Korean and Japanese cloud teams.
Both are powerful, but **for pure site search they are overkill**. The real value shows up when you need log analytics, metrics, APM, full semantic search, and a flexible DSL all on one platform. Running an Elastic cluster just for docs search inverts the cost equation. **Tens to hundreds of GB of data + complex queries + analytics on one stack** is the sweet spot for Elastic / OpenSearch.
```shell
# Single-node dev cluster in docker (security disabled, for local experiments only)
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.15.0
```
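To give a feel for the "flexible DSL" mentioned above, here is a small sketch of a site-search query body in the Elasticsearch / OpenSearch query DSL. The field names `title` and `body` and the helper name `buildDocsSearchBody` are assumptions; the `multi_match`, `fuzziness`, and `highlight` keys are standard DSL.

```typescript
// Hypothetical helper: returns a JSON body for POST /<index>/_search.
function buildDocsSearchBody(text: string, size = 10) {
  return {
    size,
    query: {
      multi_match: {
        query: text,
        fields: ['title^2', 'body'], // boost title matches over body matches
        fuzziness: 'AUTO', // allowed edit distance scales with term length
      },
    },
    highlight: { fields: { body: {} } }, // return matching body snippets
  }
}

const esBody = buildDocsSearchBody('hllo wrld')
// Send with: fetch('http://localhost:9200/posts/_search', { method: 'POST',
//   headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(esBody) })
```

Even this small query shows the trade-off: everything is tunable, but boosts, fuzziness, and highlighting are all yours to configure.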
8. Lunr.js / Fuse.js / MiniSearch / FlexSearch — in-browser options
The classic "no backend" search libraries. Each still has a place in 2026.
Lunr.js — the classic, the original static-site search
Around 2012, this was everywhere on Jekyll / Hugo sites. Build a JSON index, query it in the browser. BM25-based, English stemmer by default, support for other languages. **Still functional, but development has slowed.** For a new project, Pagefind or Orama is usually the better pick.
Fuse.js — fuzzy / typo-friendly matching
```js
import Fuse from 'fuse.js'
const fuse = new Fuse(items, { keys: ['title', 'body'], threshold: 0.3 })
const result = fuse.search('algoria') // typo for "algolia" still matches
```
Not a search engine but **a library for fuzzily matching a JavaScript array**. Great for menus, command palettes, autocomplete. It will not power a full site search, but for small datasets it is perfect.
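One simple way to think about fuzzy matching over a small array is edit distance: how many single-character insertions, deletions, or substitutions turn one string into another. The sketch below is a plain Levenshtein implementation for illustration only; Fuse.js uses its own Bitap-based scoring, and `fuzzyFilter` is a hypothetical helper.

```typescript
// Levenshtein distance using two rolling rows of the dynamic-programming table.
function editDistance(a: string, b: string): number {
  // prev is the previous row: distances from a's first i-1 chars to every prefix of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i] // distance from a's first i chars to the empty prefix
    for (let j = 1; j <= b.length; j++) {
      const substitution = (prev[j - 1] ?? 0) + (a[i - 1] === b[j - 1] ? 0 : 1)
      const deletion = (prev[j] ?? 0) + 1
      const insertion = (curr[j - 1] ?? 0) + 1
      curr.push(Math.min(substitution, deletion, insertion))
    }
    prev = curr
  }
  return prev[b.length] ?? 0
}

// Keep items within maxDistance edits of the query (case-insensitive).
function fuzzyFilter(items: string[], query: string, maxDistance = 1): string[] {
  const q = query.toLowerCase()
  return items.filter((item) => editDistance(item.toLowerCase(), q) <= maxDistance)
}

const matches = fuzzyFilter(['Algolia', 'Orama', 'Pagefind'], 'algoria')
```

Comparing every item against the query is fine for a command palette with a few hundred entries, which is exactly the size range where this style of matching fits.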
MiniSearch
Small, fast in-browser full-text search. Typically scales to tens of thousands of documents. Clean API, with BM25-style ranking, fuzzy and prefix matching, and result filtering.
```js
import MiniSearch from 'minisearch'

// Each doc needs a unique `id` field by default
const ms = new MiniSearch({ fields: ['title', 'body'] })
ms.addAll(docs)
const results = ms.search('hello world', { fuzzy: 0.2 })
```
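Since BM25 comes up for several of these libraries, a compact sketch of the textbook formula may help. This is not the internal implementation of MiniSearch or Lunr.js; it is the standard BM25 scoring function with the commonly used defaults `k1 = 1.2` and `b = 0.75`, and a deliberately naive tokenizer.

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

// Returns one BM25 score per document, in input order.
function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize)
  const avgLength = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(1, docs.length)
  const queryTerms = Array.from(new Set(tokenize(query)))
  return tokenized.map((doc) => {
    let score = 0
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length // term frequency in this doc
      if (tf === 0) continue
      const docsWithTerm = tokenized.filter((d) => d.includes(term)).length
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (docs.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5))
      // Longer-than-average docs are penalized, shorter ones are favored.
      const lengthNorm = 1 - b + b * (doc.length / avgLength)
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm)
    }
    return score
  })
}

const scores = bm25Scores(
  ['static search for static sites', 'hosted search', 'fuzzy matching library'],
  'static search',
)
```

The first document wins because it contains the rarer term "static" twice, while the third scores zero because it shares no term with the query.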
FlexSearch
Often cited as the fastest JS search library. Compact index format and tiny memory footprint. The API is a bit more finicky, but when raw performance matters it pays off.
**Pick guide**: for static sites, Pagefind; for in-browser library use in SaaS, Orama or MiniSearch; for fuzzy matching, Fuse.js; for max performance, FlexSearch.
9. Markprompt — one path for AI docs search
If classic search is "keyword to page", AI docs search is "question to answer". The user asks in natural language, a RAG pipeline pulls relevant chunks, and an LLM synthesizes a reply.
Markprompt (2023) started as "AI search on top of your markdown / MDX docs". The workflow:
1. Crawl the docs site or hook into a GitHub repo.
2. Chunk and embed into a vector index.
3. Expose a chat widget, a search box, or an API.
4. User asks a question, relevant chunks are retrieved, LLM synthesizes, answer comes back with source links.
// Simplified React widget
They run an open-source library set (the @markprompt/react package etc.) alongside managed hosting. Pricing is per answer / message. The big wins are **citation of sources + fast integration**, and **both chat and search UIs** are available.
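The four-step workflow above can be sketched end to end in a few functions. This is a toy under loud assumptions, not Markprompt's pipeline: `embed` is a bag-of-words stand-in for a real embedding model, the vector index is just an array, and the LLM call is left out, so the sketch stops at the prompt that would be sent.

```typescript
interface Chunk { url: string; text: string }

// Stand-in for an embedding model: a sparse bag-of-words vector.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>()
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    vec.set(token, (vec.get(token) ?? 0) + 1)
  }
  return vec
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((value, key) => { dot += value * (b.get(key) ?? 0); normA += value * value })
  b.forEach((value) => { normB += value * value })
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Step 2: split a page into paragraph-sized chunks.
function chunkPage(url: string, markdown: string): Chunk[] {
  return markdown.split(/\n\s*\n/).map((t) => t.trim()).filter(Boolean).map((text) => ({ url, text }))
}

// Step 4a: retrieve the k chunks closest to the question.
function retrieve(question: string, chunks: Chunk[], k: number): Chunk[] {
  const q = embed(question)
  return chunks
    .map((chunk) => ({ chunk, score: cosine(q, embed(chunk.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk)
}

// Step 4b: build the prompt an LLM would receive, with numbered sources to cite.
function buildPrompt(question: string, sources: Chunk[]): string {
  const context = sources.map((s, i) => `[${i + 1}] ${s.url}\n${s.text}`).join('\n\n')
  return `Answer using only the sources below and cite them by number.\n\n${context}\n\nQuestion: ${question}`
}

const docChunks = [
  ...chunkPage('/docs/install', 'Install the CLI with npm install and run init.'),
  ...chunkPage('/docs/deploy', 'Deploy the site to a CDN after the build step.'),
]
const userQuestion = 'How do I install the CLI?'
const topSources = retrieve(userQuestion, docChunks, 1)
const ragPrompt = buildPrompt(userQuestion, topSources)
```

Numbering the sources in the prompt is what makes the "answer comes back with source links" step possible: the model cites `[1]`, and the widget maps that back to a URL.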
10. kapa.ai — AI search specifically for developer docs
kapa.ai positions itself as "a chatbot trained on your technical docs". Docker, OpenAI, Mapbox, Reflex and others run it as the "Ask AI" button next to their docs.
The differentiator is that it is **heavily tuned for developer documentation**. Code-block citations, version / library awareness, ingestion of GitHub Issues, Discord, Stack Overflow, and an answer-verification pipeline that reduces hallucinations. They started with LangChain under the hood and now run their own pipeline.
Deployment is usually one widget script.
```html
<script
  src="YOUR_KAPA_WIDGET_SCRIPT_URL"
  data-website-id="YOUR_ID"
  data-project-name="YourProject"
  async></script>
```
Pricing is enterprise quote, generally higher than Markprompt, but they invest more in answer quality and ops (hallucination monitoring, human review workflows).
11. Inkeep / Sourcegraph Cody / AnswerOverflow — other AI search
The ecosystem has more.
Inkeep
AI chatbot that merges product docs, community, and code. Used by SaaS companies like Anthropic, Pinecone, Speakeasy. Rich set of surfaces — SDK, widget, Slack bot, Discord bot.
Sourcegraph Cody
Built by Sourcegraph, originally a code search company. **Codebase search, not site search** — inside the IDE you ask things like "where is this function called from". Not the category to drop into your docs site, but powerful as a developer tool.
AnswerOverflow
Turns Discord help channels into a searchable index so Google and site search can surface them. The fix for the OSS-project pain of "the answer is in Discord but no one can find it". Free and self-hostable, with a managed option too.
(Note) Mark (RIP)
Some popular search libraries from the 2010s are gone. Sajari was acquired by Algolia and folded into NeuralSearch. Small projects like Mark have unmaintained code. For a new project, lean on the active tools above.
12. Embeddings vs Keyword vs Hybrid
The defining debate of 2026 search: **did embedding-based semantic search replace keyword search**?
Answer: **no, the standard became hybrid — both, side by side.**
The three modes.
| Mode | How | Strong at | Weak at |
| --- | --- | --- | --- |
| Keyword (BM25 etc.) | Word match, weights | Exact terms, code, proper nouns | Synonyms, meaning |
| Embeddings (vectors) | Semantic embedding, cosine similarity | "Related content" | Exact terms, expensive |
| Hybrid | Both + rerank | Strengths of both | Operationally complex, needs tuning |
The common production pattern.
1. Keyword (BM25) for the top 100 candidates.
2. Embeddings semantic score to re-rank.
3. Optionally an LLM-based reranker (cross-encoder, Cohere Rerank) for the final 10.
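The first two stages of that pattern can be sketched as follows. The keyword scores and embeddings are assumed to be precomputed elsewhere (by a BM25 index and an embedding model), the two-dimensional vectors are hand-made for illustration, and the optional third-stage reranker is omitted.

```typescript
interface Candidate { id: string; keywordScore: number; embedding: number[] }

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  const denom = norm(a) * norm(b)
  return denom === 0 ? 0 : dot / denom
}

function hybridSearch(
  candidates: Candidate[],
  queryEmbedding: number[],
  recallSize: number, // how many keyword hits survive stage 1 (100 in the text)
  finalSize: number, // how many results are returned after reranking
): string[] {
  // Stage 1: keyword recall. Keep the top `recallSize` by keyword score.
  const recalled = candidates
    .slice()
    .sort((a, b) => b.keywordScore - a.keywordScore)
    .slice(0, recallSize)
  // Stage 2: semantic rerank. Reorder the survivors by embedding similarity.
  return recalled
    .map((c) => ({ id: c.id, score: cosineSimilarity(c.embedding, queryEmbedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, finalSize)
    .map((c) => c.id)
}

const ranked = hybridSearch(
  [
    { id: 'a', keywordScore: 3.0, embedding: [1, 0] },
    { id: 'b', keywordScore: 2.0, embedding: [0, 1] },
    { id: 'c', keywordScore: 1.0, embedding: [0.1, 0.9] },
    { id: 'd', keywordScore: 0.5, embedding: [0, 1] },
  ],
  [0, 1],
  3,
  2,
)
```

Document `d` is a perfect semantic match but never reaches stage 2 because its keyword score is too low. That is the tuning cost named in the table: the recall size bounds what the reranker can rescue.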
Algolia NeuralSearch, Meilisearch, Typesense, Elastic, OpenSearch all support hybrid in 2026. Static and in-browser tools (Pagefind, Orama) bolt on some embeddings too, but the embedding index itself is heavy, so they hit limits as the site grows. **AI docs search (Markprompt, kapa.ai) lives on embeddings as the primary axis, with keyword as the assist.**
13. Korea / Japan — Naver, Kakao, Yahoo! search SDKs
The global standards above cover most of the world, but Korea and Japan have their own context.
Korea — Naver / Kakao
- **Naver Search API**: Search Naver's index of blogs / news / encyclopedias / shopping / books / movies and get results. The model is "borrow Naver's index", not "search your site". Sign up and get keys at NAVER Developers.
- **Kakao Search API**: Same idea for the Daum / Kakao index, with similar categories (web / blog / cafe / book / video / image).
- **Korean tokenization**: The engines handle Korean differently. Elastic and OpenSearch use nori, Lucene's Korean morphological analyzer; Meilisearch segments Korean through charabia; Algolia and Typesense rely on their own language settings. MeCab-based Korean dictionaries are another common building block. Result quality is dominated by analyzer config.
For your own site search, the global tools above combined with a Korean analyzer is the usual choice. Naver / Kakao APIs fit when you want to "display Naver / Daum's results inside your page".
Japan — Yahoo! / site search
- **Yahoo! Search API** — exposes Yahoo! JAPAN web search through an API. Licensing and rate limits have made it less common as a primary site search box.
- **Japanese tokenization** — MeCab / Sudachi / Kuromoji are the staples. Kuromoji plugins ship with Elastic and OpenSearch, Algolia has a Japanese analyzer, Meilisearch uses its own charabia tokenizer. **For Japanese, where there are no spaces, the tokenizer pretty much determines search quality.**
Summary: **both Korea and Japan use the global tools for the index itself and put effort into the tokenizer**. Naver / Yahoo! APIs are for the separate scenario of showing other sites' results inside yours.
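To see why the tokenizer matters so much for text without spaces, consider the simplest fallback, character bigrams. This sketch is not what nori, kuromoji, or charabia do (they use dictionaries and morphological analysis); it only illustrates the failure mode that those analyzers exist to avoid. Both function names are hypothetical.

```typescript
// Split text into overlapping two-character tokens, ignoring whitespace.
function cjkBigrams(text: string): string[] {
  const chars = Array.from(text.replace(/\s+/g, '')) // Array.from splits by code point
  if (chars.length < 2) return chars
  return chars.slice(0, -1).map((ch, i) => ch + (chars[i + 1] ?? ''))
}

// A document matches when it contains every bigram of the query.
function bigramMatch(doc: string, query: string): boolean {
  const indexed = new Set(cjkBigrams(doc))
  return cjkBigrams(query).every((gram) => indexed.has(gram))
}

const tokyoBigrams = cjkBigrams('東京都') // 東京 + 京都
const falsePositive = bigramMatch('東京都の天気', '京都')
```

A page about the weather in 東京都 (Tokyo Metropolis) matches a query for 京都 (Kyoto), because the bigram 京都 appears inside 東京都. A morphological analyzer that segments 東京都 as one place name avoids this, which is the difference the analyzer config buys you.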
14. Who should pick what — recommendations by scenario
| Scenario | First pick | Second pick |
| --- | --- | --- |
| OSS docs site (under 10k pages) | DocSearch (free) | Pagefind |
| OSS docs site (large) | DocSearch | Algolia paid |
| Personal blog / static site | Pagefind | Lunr.js |
| Astro / Eleventy / Hugo | Pagefind | Orama |
| SaaS in-browser search | Orama | MiniSearch |
| Command palette / autocomplete | Fuse.js | FlexSearch |
| Commercial site (mid-size) | Algolia | Meilisearch Cloud |
| Data must stay inside | Meilisearch (self) | Typesense (self) |
| Enterprise + logs in the same stack | Elastic / OpenSearch | - |
| Docs chatbot (developer) | kapa.ai | Inkeep |
| Docs chatbot (product) | Markprompt | Inkeep |
| Discord answers on the web | AnswerOverflow | - |
| Codebase search | Sourcegraph Cody | - |
| Korean site (own index) | Meili / Algolia + Korean analyzer | - |
| Japanese site (own index) | Elastic + kuromoji / Algolia | Meili + charabia |
Big principles:
- **If the site is small, finish with static / client-side**. Zero ops.
- **As size / update rate / analytics needs grow**, move to managed SaaS or a self-hosted engine.
- **If compliance / data sovereignty drives it**, self-host.
- **When users start asking in natural language**, add an AI docs search.
Nothing is a perfect first pick. Start small (Pagefind / DocSearch / Lunr), move when you outgrow it, scale to self-hosted or SaaS as data grows, layer in AI search when users start asking conversationally — that is the typical 2026 path.
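The principles above can be condensed into a small decision function. Treat it as a conversation starter rather than a rule: the 10,000-page threshold and the field names are assumptions chosen for illustration, and real decisions also weigh budget, team skills, and analytics needs.

```typescript
type Camp = 'static / in-browser' | 'self-hosted engine' | 'managed SaaS'

interface SiteProfile {
  pages: number
  needsRealtimeUpdates: boolean // index must change between builds
  dataMustStayInHouse: boolean // compliance or data sovereignty
  usersAskQuestions: boolean // users phrase queries as natural-language questions
}

function pickCamp(site: SiteProfile): { search: Camp; addAiDocsSearch: boolean } {
  let search: Camp
  if (site.pages < 10_000 && !site.needsRealtimeUpdates) {
    search = 'static / in-browser' // small site: zero ops
  } else if (site.dataMustStayInHouse) {
    search = 'self-hosted engine' // compliance drives it
  } else {
    search = 'managed SaaS' // size or update rate outgrew static
  }
  // AI docs search is layered on top of whichever camp handles keyword search.
  return { search, addAiDocsSearch: site.usersAskQuestions }
}

const blog = pickCamp({ pages: 200, needsRealtimeUpdates: false, dataMustStayInHouse: false, usersAskQuestions: false })
const bank = pickCamp({ pages: 500_000, needsRealtimeUpdates: true, dataMustStayInHouse: true, usersAskQuestions: true })
```

The AI layer is modeled as an add-on flag rather than a fourth branch, matching the point that conversational search is layered in after the basic search camp is chosen.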
Closing — the decisions hidden behind that search box
That one search box hides four invisible decisions.
1. **Data hosting** — Cloud / your server / browser.
2. **Retrieval algorithm** — Keyword / embeddings / hybrid.
3. **Result shape** — Page list / synthesized answer.
4. **Operational responsibility** — Managed / self.
The good news for 2026 is that **each decision has a strong open source or free option**. DocSearch for OSS docs, Pagefind for static sites, Meilisearch / Typesense for self-indexed, Orama / Fuse / MiniSearch for in-browser, Markprompt / kapa.ai for AI answers. The era when "users feel this site searches well" required massive infra is over.
What is left is the simple work — watch how users actually search on your site, and pick the right camp from the map above.
References
- [Algolia DocSearch](https://docsearch.algolia.com/) — free search for OSS docs.
- [Algolia InstantSearch](https://www.algolia.com/products/search-and-discovery/instantsearch/) — UI components.
- [Algolia NeuralSearch](https://www.algolia.com/products/ai-search/) — semantic + keyword hybrid.
- [Meilisearch](https://www.meilisearch.com/) — Rust self-hosted search.
- [Meilisearch Cloud](https://cloud.meilisearch.com/) — managed option.
- [Typesense](https://typesense.org/) — C++ self-hosted search.
- [Pagefind](https://pagefind.app/): search for static sites (CloudCannon).
- [Orama](https://orama.com/) — TypeScript search engine.
- [Elasticsearch](https://www.elastic.co/elasticsearch) — enterprise search + observability.
- [OpenSearch](https://opensearch.org/) — Apache 2.0, AWS-led fork.
- [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) — AWS managed.
- [Lunr.js](https://lunrjs.com/) — classic static-site browser search.
- [Fuse.js](https://www.fusejs.io/) — JS fuzzy matching.
- [MiniSearch](https://lucaong.github.io/minisearch/) — lightweight full-text search.
- [FlexSearch](https://github.com/nextapps-de/flexsearch) — fast JS search.
- [Markprompt](https://markprompt.com/) — AI docs search.
- [kapa.ai](https://www.kapa.ai/) — AI chatbot for developer docs.
- [Inkeep](https://inkeep.com/) — AI search across product + community.
- [Sourcegraph Cody](https://sourcegraph.com/cody) — AI for codebases.
- [AnswerOverflow](https://www.answeroverflow.com/) — Discord answers on the web.
- [NAVER Developers — Search API](https://developers.naver.com/docs/serviceapi/search/) — Naver search API.
- [Kakao Developers — Search](https://developers.kakao.com/docs/latest/ko/daum-search/common) — Daum / Kakao search.
- [Yahoo! JAPAN Developer Network](https://developer.yahoo.co.jp/) — Yahoo! search and other Japanese APIs.
- [Elasticsearch kuromoji plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) — Japanese analyzer.
- [Lucene nori Korean analyzer](https://lucene.apache.org/core/9_10_0/analysis/nori/index.html) — Korean morphology.
- [Cohere Rerank](https://cohere.com/rerank) — reranking model for hybrid search.