AI Search Engines 2026 Head-to-Head: Perplexity · You.com · Phind · Exa · SearchGPT · Gemini AI Mode · Kagi · Tavily, and the Deep Research Category

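Prologue: the "ten blue links" era is ending
1. The new layer between query and answer
2. Consumer AI search apps: what people use directly
3. Developer search APIs: hands and feet for agents
4. The Deep Research category: when a five-minute answer becomes a thirty-minute report
5. The citation reliability problem: AI search's biggest weakness
6. When AI search wins and when it does not
7. The AI-native browser thesis: Comet and beyond
8. An honest decision tree: what to use when
Epilogue: checklist, anti-patterns, next post
References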

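Prologue: the "ten blue links" era is ending

Through 2024, "searching" meant one clear thing. You typed words into the Google or Bing box, got ten blue links on a results page (SERP), clicked a few, read them yourself, and synthesized the answer in your head. The engine searched; the human did the synthesis.

In spring 2026 that division collapsed. Search engines now answer. Perplexity was designed that way from the start, OpenAI's ChatGPT Search/SearchGPT followed, and Google began integrating AI Mode into google.com, its core product. A new category arrived on top: Deep Research, which spends 5 to 10 minutes automatically browsing dozens of pages and produces a cited report.

Meanwhile another market grew underneath: search infrastructure. Exa, Tavily, Serper, and You.com's Search API do not give answers to consumers. They sell search APIs for AI agents to call, used as tools in RAG pipelines and agent workflows. A good share of the search calls running behind the questions we ask Perplexity pass through infrastructure of this kind.

This post looks at the two markets together. It puts consumer AI search (used directly by people) and developer AI search APIs (used by agents) on one page, and treats Deep Research as its own animal. It closes with an honest question: in which cases does AI search actually beat traditional search, and in which does it not? It also asks how plausible the thesis behind Perplexity's Comet browser is: "search eventually becomes the browser."

Prices and features change fast. Every number in this post is as of May 2026, and the focus is on the decision frame. Even if prices differ six months from now, the axes of "consumer vs infrastructure," "single query vs Deep Research," and "citation reliability" will survive.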

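1. The new layer between query and answer

Traditional search, drawn in one line:

user query → search engine (index match + ranking) → SERP (ten links)
                                                          ↓
                                                human reads + synthesizes

AI search, drawn in one line:

user query → intent parsing (LLM) → multiple sub-queries generated
           → search index called (own index or external API)
           → result pages fetched + body text extracted
           → model synthesizes → cited answer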
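
The AI-search flow above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `plan`, `search`, and `synthesize` callables and the `Page` type are hypothetical stand-ins for the LLM and index calls, and the stubs exist only so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    url: str
    text: str


def answer_with_citations(
    question: str,
    plan: Callable[[str], list[str]],               # LLM step: question -> sub-queries
    search: Callable[[str], list[Page]],            # index call: sub-query -> fetched pages
    synthesize: Callable[[str, list[Page]], str],   # LLM step: pages -> answer text
) -> tuple[str, list[str]]:
    """Run plan -> search -> dedupe -> synthesize; return (answer, cited URLs)."""
    pages: dict[str, Page] = {}
    for sub_query in plan(question):
        for page in search(sub_query):
            pages.setdefault(page.url, page)  # keep the first copy of each URL
    ordered = list(pages.values())
    return synthesize(question, ordered), [p.url for p in ordered]


# Hypothetical stubs standing in for the model and the search index.
def fake_plan(question: str) -> list[str]:
    return [question, question + " pricing"]


def fake_search(sub_query: str) -> list[Page]:
    return [
        Page("https://example.com/a", "shared page"),
        Page(f"https://example.com/{len(sub_query)}", sub_query),
    ]


def fake_synthesize(question: str, pages: list[Page]) -> str:
    return f"{len(pages)} sources consulted for: {question}"


answer, cited = answer_with_citations("exa", fake_plan, fake_search, fake_synthesize)
```

Passing the three stages in as callables mirrors the point of this chapter: the index, the planner, and the synthesizer can each come from a different vendor.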

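The important point is that a layer has been added. The user's query no longer goes straight to the index. An LLM first parses the intent, decomposes it into several sub-queries, fetches the search results and reads the body text, and feeds that back into the model to synthesize an answer. Two things changed along the way.

First, the definition of a search engine blurred. Perplexity runs its own crawler but also depends on external search APIs from Bing and Google. ChatGPT Search sits on the Bing index with OpenAI's curation layer on top. You.com has its own index but mixes in external sources. Owning an index and giving an answer are now separate questions.

Second, the search result is the answer. Users have fewer reasons to click a link themselves. By Perplexity's internal statistics, an average answer draws 1.2 to 1.5 citation clicks, which means most users read the answer and stop. For publishers, the traffic disappears. This is the core of the 2025-2026 copyright and licensing disputes between publishers and Perplexity.

Three architectures are worth distinguishing.

Own index + own synthesis: Google AI Mode is the representative case. Gemini sits on Google's own index and synthesizes the answer. Kagi's AI features are similar.
External index + own synthesis: some Perplexity modes and ChatGPT Search (Bing-based). They borrow the index and build only the synthesis and UX.
Own index + API exposure: Exa, Tavily, and the API side of You.com. They do not answer; they expose search results (or extracted body text) through an API so that another LLM can synthesize.

The honest view in 2026 is that these three architectures are not a game of who beats whom but different markets. Consumers see the first and second; developers see the third. Many companies also stand in both markets at once (You.com is the clearest example).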

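2. Consumer AI search apps: what people use directly

2.1 Perplexity: the company that defined the category

Perplexity launched in late 2022 and defined the AI search category itself. As of May 2026 it has four main surfaces.

Pro Search. The default mode. It takes the user's question, and an LLM (the user chooses among GPT-4, Claude, and Perplexity's own Sonar models) decomposes it into several sub-queries, searches the web, and synthesizes the results. Every answer carries inline citations, and "related questions" follow in the sidebar. Free users get a limited quota; Pro subscribers ($20 per month) get unlimited use and model choice.

Deep Research. Arrived in early 2025 and became the key differentiator through 2025-2026. It spends 5 to 10 minutes on one question, automatically fetching, analyzing, and cross-checking dozens of pages to produce a report. Where ordinary Pro Search reads 5 to 10 sources and answers within a minute, Deep Research reads 30 to 80 sources and delivers a report in 5 to 10 minutes. Pro subscribers have a daily limit (around 5 to 10 runs per day as of May 2026). The Max plan (around $200 per month) has a much larger limit.

Spaces. A workspace concept introduced in 2024. For a specific topic (for example "my PhD research" or "Korean ramen market analysis") you accumulate context and upload your own PDFs and notes so they are used in search. Collaboration is supported. The positioning is roughly "Notion built on top of Perplexity."

Comet browser. Beta in 2025, general release in early 2026. It bills itself not as a plain browser but as an "AI-native browser." Every page has a sidebar assistant, so you can ask questions with the current page as context. It can summarize multiple tabs at once. Agentic mode operates pages on the user's behalf to carry out tasks (for example, "compare these companies' prices and put them in a table"). The thesis is clear: the future of search is not a "box" but the browser itself.

Perplexity's strengths are UX consistency and readable citation display: each answer makes clear which sources it came from. Its weaknesses are the freshness of its own index and the accuracy of cited facts; in multi-hop reasoning, citation and body text often diverge. In 2026 the in-house Sonar model line became faster and cheaper, so Pro Search often responds faster than ChatGPT Search.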

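2.2 You.com: the first mover that cannibalized itself

You.com was in fact the first company to attempt AI search. Richard Socher founded it in 2020, and it already had an AI answer feature in 2022. Yet by 2026 You.com has lost consumer search to Perplexity.

The reason is clear. You.com tried too many things at once: search, AI chat, image generation, a code agent, and ad slots on top. The UX scattered and no surface became number one. From late 2025 the company visibly shifted its center of gravity to the API business: the You.com Search API became the core revenue product, and consumer you.com took on more of a demo and marketing role. It also emphasizes enterprise B2B search solutions.

So there are two ways to evaluate You.com.

As a consumer search engine? Perplexity or ChatGPT Search is better.
As a developer search API? The You.com Search API deserves serious consideration. Pricing is reasonable, and results for non-English regions such as Korea and Japan are surprisingly good.

You.com's real value shows in the infrastructure market covered in the next chapter.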

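2.3 Phind: the tool only developers use

Phind went in a completely different direction from the others: it specializes in developer search. It started as a Stack Overflow substitute and gives priority weight to code blocks, library documentation, and GitHub Issues. Answers are heavy on code, and citations are overwhelmingly official docs, GitHub, SO, and MDN.

As of 2026 Phind has evolved along two branches.

Phind Search: the default mode, AI search specialized for coding questions. The free tier has a daily limit; Phind Pro is around $20 per month.
Phind 70B / Phind Models: its own line of code-specialized models. Some were released as open weights and can be used in other tools (Cursor and others).

One notable point is that it also offers a CLI tool. The phind command lets you search and generate code directly in the terminal. For some developers it is a faster first-response tool than Stack Overflow.

The weaknesses are clear. Outside coding Perplexity is far ahead, and the narrow user base makes data effects slow. The coding area itself is being eroded quickly by IDE and terminal tools such as GitHub Copilot Chat, Cursor's inline AI, and Claude Code. The "search and read the answer" workflow is itself moving into the IDE.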

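2.4 SearchGPT / ChatGPT Search: OpenAI's late entry

OpenAI announced the SearchGPT prototype in July 2024 and formally integrated search into ChatGPT in October 2024. As of May 2026 it works as follows.

Inside ChatGPT, web search starts when the user explicitly presses the "Search" button or when the model decides on its own that search is needed.
The search index is Bing-based, supplemented by OpenAI's own crawler.
Answers carry inline citations, and a side panel lists the sources.
Search is open to free users too, which is the biggest differentiator in terms of user count.

OpenAI also offers Deep Research as a separate mode. It was released in early 2025 and layers multi-hop browsing on top of OpenAI's o-series reasoning models. It spends 10 to 30 minutes on a question on average. Plus ($20) gets a limited number of runs per month; Pro ($200) gets more. OpenAI Deep Research is often deeper than Perplexity Deep Research, but slower. It suits heavy work such as academic and market research.

The strength of ChatGPT Search is that search comes free inside ChatGPT's enormous user base. That is severe pressure on dedicated apps like Perplexity. For most ordinary users, it is fair to say ChatGPT already holds the default seat for "ask the AI and have it search too."

The weakness is that the UX is part of a chat rather than search-only, so citation click-through is low and source display is not as clean as Perplexity's. In multi-hop reasoning, citation and body text diverge about as often, or slightly more often.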

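2.5 Gemini AI Mode / Grounded Search: Google changes its home base

The most important yet least discussed change happened at Google. In 2024 Google began showing "AI Overviews" at the top of the SERP (in the US). In 2025 it added "AI Mode" as a separate tab, and in early 2026 it is running broad experiments that gradually make AI Mode the default search UX.

How it works:

The user types a query into the google.com search box. If AI Mode is switched on automatically for that case, a Gemini-synthesized answer with inline citations appears at the top and the conventional SERP follows below.
Entering the "AI Mode" tab gives a UI much like Perplexity's: chat with Gemini, citations, follow-up questions.
For developers there is Grounding with Google Search in the Gemini API, which automatically includes Google search results, with citations, in your own LLM calls.

Google's overwhelming advantage is the index. No AI search company matches Google's index freshness and coverage. The weakness is that Google's ad revenue comes from clicks, so pushing AI Mode too hard cannibalizes its own revenue. Google therefore moves slowly on purpose. In 2026, experiments with publisher compensation and ad models are still under way.

For a traditional search user, Gemini AI Mode is in fact "already close enough." Reasons to go to Perplexity are shrinking: result quality is on par, the index is fresher, and it is free.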

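2.6 Bing / Copilot: Microsoft's two tracks

In early 2023 Bing integrated GPT-4-based chat into search and was the first to bring AI search to consumers. After the excitement settled, Microsoft organized the work into two tracks.

Bing Search itself: supplied wholesale as an index to other services (ChatGPT Search, parts of Perplexity). It has effectively become infrastructure.
Copilot (formerly Bing Chat): evolved into an assistant deeply integrated into Microsoft 365. It ships by default in Windows, in the Edge sidebar, and throughout the Office apps. It does not only search; it handles documents, code, and email drafting.

As a standalone consumer AI search tool, Copilot's market share is not large. Inside the Microsoft ecosystem, however, it is there by default, so its user count is far from small.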

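2.7 Kagi: the AI side of paid, privacy-first search

Kagi is a very different kind of company. It is a paid search engine (around $10 per month) with no ads. It does not collect user data. Users can block or promote search results themselves. It has a small, highly loyal user base.

Kagi has three AI features.

Quick Answer: a short AI summary above the search results, with citations.
The Assistant: a separate chat interface where the user chooses among several models such as Claude, GPT, and Gemini.
Universal Summarizer: a separate tool that summarizes URLs, YouTube videos, and the like.

Kagi's value proposition is clear: search with no ads and no data collection, with AI layered on honestly as an option. You can turn AI off and use search alone. That is the opposite philosophy to Perplexity or Gemini AI Mode, which push an answer by default.

The weaknesses are price ($10 per month is a barrier for the many people not used to paying for search) and index coverage (weaker than Google and Bing in some areas; it combines its own crawler with several external indexes).

For serious information workers, academics, and journalists it is a seriously good choice. It is not a mass-market tool.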

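3. Developer search APIs: hands and feet for agents

Over the same period, a market invisible to users grew explosively: search APIs that RAG pipelines and AI agents call directly. This is the infrastructure layer.

3.1 Exa: embedding-first developer search

Exa (formerly Metaphor) was designed from the start on the assumption that an LLM would be the caller. Its core is meaning-based (embedding) search rather than ordinary keyword search. Requests like "find pages similar to this page" or "find blog posts that cover this kind of content" work well. Keyword search is supported too.

Core endpoints:

/search: semantic and keyword search. Returns URL, title, publication date, and summary.
/contents: cleanly extracts the body text of result pages. It keeps the main content and removes ads and navigation, in a form that can go straight into an LLM.
/findSimilar: given one URL, returns similar pages.
/answer: a convenience endpoint that combines the three above to produce a short answer.

Exa's strength is an API design that is friendly to LLMs and agents. Body extraction (/contents) is clean enough that a RAG pipeline can hand the result to an LLM with almost no further processing. findSimilar is a behavior ordinary search engines lack, and it is surprisingly powerful in research work.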
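
To make the endpoint list concrete, here is a small sketch that only describes the HTTP requests and never sends them. The endpoint names come from the text above; the base URL, the `x-api-key` header, and the `numResults` field are assumptions on my part and should be checked against Exa's current API reference before use.

```python
import json

EXA_BASE = "https://api.exa.ai"  # assumed base URL, not confirmed by this article
ENDPOINTS = {"search", "contents", "findSimilar", "answer"}


def exa_request(endpoint: str, api_key: str, **body) -> dict:
    """Describe (but do not send) a POST to one of the endpoints listed above."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return {
        "method": "POST",
        "url": f"{EXA_BASE}/{endpoint}",
        # Header name is an assumption; some deployments may expect a bearer token.
        "headers": {"x-api-key": api_key, "Content-Type": "application/json"},
        "data": json.dumps(body),
    }


# Semantic search phrased the way the article describes it.
search_req = exa_request(
    "search", "YOUR_KEY", query="blog posts about RAG evaluation", numResults=5
)
# "Find pages similar to this page."
similar_req = exa_request("findSimilar", "YOUR_KEY", url="https://example.com/post")
```

Each returned dict can be handed to whatever HTTP client the project already uses, which keeps the sketch free of network calls and third-party imports.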

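Pricing is usage-based (a few dollars per thousand queries, with an extra charge for content extraction). It gets expensive quickly if you build a consumer product on it.

Exa is the infrastructure most often called by startups that want to build a consumer answer engine like Perplexity or ChatGPT. Perplexity has its own index; small startups build on Exa.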

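3.2 Tavily: the de facto standard for agent search

Tavily targeted only a narrow market from the start: a search API for LLM agents. It was integrated early into frameworks such as LangChain and LlamaIndex and became the default search tool in agent workflows.

The API is simple. A single tavily.search(query, depth=...) call interprets the intent, runs multiple queries, and returns extracted, cleaned results. depth='basic' is fast and cheap; depth='advanced' is more expensive and deeper.

One notable point is that answer synthesis is available as an option (include_answer=True), so Tavily can synthesize the results and return a short answer. The answer quality is weaker than having an LLM synthesize directly, though. The usual pattern is to let your own LLM write the answer and use Tavily only for search.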
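
The "search only, synthesize yourself" pattern can be sketched as an agent tool that wraps any client exposing a `search` method. The article abbreviates the depth argument as `depth`; the keyword names used below (`search_depth`, `include_answer`, `max_results`) and the `results` response shape are what I believe the published Python client uses, so treat them as assumptions. `FakeClient` is a hypothetical stand-in so the example runs without a key.

```python
from typing import Any, Protocol


class SearchClient(Protocol):
    def search(self, query: str, **kwargs: Any) -> dict: ...


def web_search_tool(client: SearchClient, query: str, deep: bool = False) -> list[dict]:
    """Agent-facing tool: run one search and return trimmed results for the LLM."""
    response = client.search(
        query,
        search_depth="advanced" if deep else "basic",  # assumed parameter name
        include_answer=False,  # leave synthesis to our own LLM
        max_results=5,
    )
    # Keep only the fields the downstream prompt needs.
    return [
        {"url": r["url"], "title": r.get("title", ""), "content": r.get("content", "")}
        for r in response.get("results", [])
    ]


class FakeClient:
    """Hypothetical offline client that records calls and returns one canned result."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict]] = []

    def search(self, query: str, **kwargs: Any) -> dict:
        self.calls.append((query, kwargs))
        return {"results": [{"url": "https://example.com", "title": "Example",
                             "content": "body", "score": 0.9}]}


fake = FakeClient()
results = web_search_tool(fake, "agent search APIs", deep=True)
```

Typing the client as a `Protocol` means the same tool function can later be pointed at a different backend without touching the agent code.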

Tavily's value proposition is clear and narrow: "Want to fill in LangChain's WebSearchTool quickly? Tavily." That is all. Pricing is simple too: there is a free quota, and usage-based billing above it.

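3.3 Serper / SerpAPI: Google results as they are

Companies such as Serper, SerpAPI, and ScaleSerp sell one thing: Google's results through an API. They have no index of their own. They obtain Google search results by scraping (or through official channels) and return them as structured JSON.

Why is this a market? Google's official Programmable Search API is expensive and restrictive (daily limits, limited result processing). So unofficial workaround infrastructure has become the de facto standard.

Characteristics:

The lowest prices (around $1 to $2 per thousand queries).
Results are identical to Google's, so index freshness and coverage are unmatched.
No answer synthesis, just a result list.

When Tavily or Exa feels expensive in a RAG or agent pipeline, a common pattern is to step down to Serper and save cost. Body extraction then has to be done separately (for example with a Reader API or a library such as Trafilatura).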
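
What "body extraction done separately" involves can be shown with the standard library alone. This is a deliberately crude stand-in for a dedicated extractor such as Trafilatura, not a replacement for one: it drops the contents of a fixed set of boilerplate tags (my own choice) and keeps all other visible text.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/nav/header/footer/aside subtrees."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # >0 while inside a skipped subtree
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><nav>Home | About</nav><article><h1>Title</h1>"
    "<p>Body text.</p></article><script>track()</script></html>"
)
clean = extract_text(sample)
```

Real pages need far more than a tag blocklist (cookie banners, comment threads, infinite-scroll widgets), which is why the hosted APIs charge for this step.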

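3.4 You.com Search API: infrastructure with its own index

Earlier I said You.com faltered in the consumer market, but in the infrastructure market it has made a place for itself. The You.com Search API offers:

Its own web index combined with external sources. Dependence on Bing and Google is small.
Body extraction included, similar to Exa's /contents.
Pricing similar to or slightly below Tavily and Exa.
Surprisingly good multilingual results (including Korean and Japanese).

It explicitly targets the enterprise market (own data combined with web search) and has been adopted as the search backend of some large B2B SaaS products.

3.5 Brave Search API: another own index

Brave is better known as a browser company, but it runs its own search index (Brave Search) and exposes an API on top of it: the Brave Search API. Pricing is very reasonable and the data-use policy is clear (queries and results are not used for training).

Because the index is its own, results differ from Google's and quality is lower in some domains. Its clear appeal on privacy and licensing has nevertheless led some AI companies to adopt it as a backend.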

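3.6 Consumer vs API matrix

Product	Consumer app	Developer API	Own index	Deep Research	Price (monthly/starting)	Key differentiator
Perplexity	Strong (Pro Search, Spaces, Comet)	Weak (Sonar API)	Partial (own + Bing etc.)	Strong (Deep Research)	Free / Pro $20 / Max ~$200	UX, citations, Comet browser
You.com	Weak	Strong (Search API)	Strong (own)	Partial	Free / Pro ~$20 / API usage-based	Multilingual, enterprise
Phind	Medium (developers only)	Weak	Partial	None	Free / Pro $20	Code and docs focus
Exa	None	Strong	Strong	Partial (Research API)	Usage-based (a few dollars per 1,000 queries)	Semantic search, body extraction, `findSimilar`
OpenAI Search/Deep Research	Strong (ChatGPT Search)	Medium (web_search tool)	Partial (Bing-based)	Strong (Deep Research)	ChatGPT Plus $20 / Pro $200	User base, model integration
Gemini AI Mode / Grounding	Strong (google.com AI Mode)	Strong (Grounding API)	Very strong (Google index)	Strong (Gemini Deep Research)	Free / Google One AI ~$20 / Vertex usage-based	Index freshness, free
Bing / Copilot	Medium (Copilot)	Medium (Bing API)	Very strong (Bing index)	Partial (Copilot Pages)	Free / Copilot Pro $20 / API usage-based	M365 integration
Kagi	Medium (Kagi Search + Assistant)	Weak (small Kagi API)	Partial (own + external)	Partial	From $10 per month	No ads or tracking, user control
Tavily	None	Strong	None (external curation)	Partial (Research API)	Free quota / usage-based	LangChain and LlamaIndex default
Serper / SerpAPI	None	Strong (Google results)	None	None	Usage-based ($1-2 per 1,000 queries)	Cheapest, Google results as they are
Brave Search API	Weak (Brave Search)	Medium	Strong (own)	None	Free quota / usage-based	Own index, not used for training

With this matrix in mind, on to the next chapter.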

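4. The Deep Research category: when a five-minute answer becomes a thirty-minute report

The most interesting single event of 2025 was the rise of a new category, Deep Research. OpenAI, Perplexity, and Google shipped products with the same name at almost the same time. All three do one thing: they spend 5 to 30 minutes on a single question, autonomously browse dozens of pages, and produce a cited report.

The mechanism is similar across them.

user query → model drafts a research plan (which sub-topics to cover)
           → generates multiple search queries automatically
           → fetches result pages in turn, accumulating body text in the model context
           → if coverage is thin, searches again and fills in
           → checks conflicting facts and cross-verifies
           → synthesizes a structured report with a citation on every claim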
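
The gather-and-refill part of that loop can be sketched as follows. This is an illustration under stated assumptions, not how any of the three products is built: `plan`, `search`, and `find_gaps` are hypothetical callables standing in for model and API calls, the round budget is my own addition, and the final cross-verification and report synthesis steps are left out.

```python
from typing import Callable


def deep_research(
    question: str,
    plan: Callable[[str], list[str]],                       # question -> sub-topics
    search: Callable[[str], list[tuple[str, str]]],         # sub-topic -> [(url, text)]
    find_gaps: Callable[[str, dict[str, str]], list[str]],  # what is still missing
    max_rounds: int = 3,
) -> dict[str, str]:
    """Gather sources round by round until no gaps remain or the budget is spent."""
    notes: dict[str, str] = {}
    queue = plan(question)
    for _ in range(max_rounds):
        if not queue:
            break
        for topic in queue:
            for url, text in search(topic):
                notes.setdefault(url, text)   # accumulate context, one entry per URL
        queue = find_gaps(question, notes)    # "if coverage is thin, search again"
    return notes


# Hypothetical stubs so the loop can run offline.
def fake_plan(question: str) -> list[str]:
    return ["market size", "funding"]


def fake_search(topic: str) -> list[tuple[str, str]]:
    return [(f"https://example.com/{topic.replace(' ', '-')}", f"notes on {topic}")]


def fake_gaps(question: str, notes: dict[str, str]) -> list[str]:
    return [] if any("regulation" in url for url in notes) else ["regulation"]


notes = deep_research("SEA fintech 2026", fake_plan, fake_search, fake_gaps)
```

The explicit `max_rounds` cap is the design choice worth noting: without a budget, a gap-finding model can keep discovering new sub-topics indefinitely.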

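The three products differ as follows.

OpenAI Deep Research: the deepest. Runs of 30 minutes or more are common. It layers tool calls on top of the o-series reasoning models. It is strong on heavy work such as academic research, market research, and due-diligence documents. Downsides: it is slow, it is expensive (Pro plan or API usage), and it sometimes digs so deep that it misses the main point.

Perplexity Deep Research: the fastest and the easiest to use often. Usually 5 to 10 minutes. The output is lighter than OpenAI's but deep enough for everyday information work. Pro users have a daily limit, and the barrier to entry is lower than for the other two.

Gemini Deep Research: included in the Google One AI plan. The freshness of Google's index carries over. Synthesis quality has improved steadily, and Gemini's context window of 1M+ tokens lets it hold a lot of accumulated information. That makes a qualitative difference when combining long, scattered information.

When is Deep Research worth using?

Tasks worth it:

Market research ("key players and funding flows in the Southeast Asian fintech market in 2026")
Academic literature review ("summarize the past year's papers comparing Mamba and Transformer architectures")
Company due diligence ("company XYZ: product, team, funding, competitive landscape, risks")
Policy and legal tracking ("2026 changes to the EU AI Act implementing rules")

Tasks not worth it:

Simple fact checks ("React 19 release date"): ordinary search answers in 5 seconds.
Debugging code: reading and running the code yourself is faster than Deep Research.
Real-time information (stock prices, breaking news): index freshness matters more.
Cases where the answer lives in a single source (one page of official documentation is enough).

Deep Research is good at work that requires synthesis across multiple sources: questions that no single page answers. Anything else is overkill and a waste of time.

One more thing: hallucination is more dangerous in Deep Research. When a model attaches a wrong citation in an ordinary search answer, the user notices quickly (click the citation and it does not match). When one sentence in a 30-page Deep Research report carries a wrong citation, the user does not check all 30 citations. The longer the output, the higher the verification cost and the higher the chance that a bad citation goes uncaught. For serious deliverables, a human must check the citations behind the key claims directly.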

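5. The citation reliability problem: AI search's biggest weakness

The biggest promise of AI search is "it has citations, so you can trust it." To be honest, that promise is only half true.

Several independent evaluations in 2024-2025 reached the same conclusion. When citations in AI search answers are verified on a random sample, citation and source text disagree 20 to 40% of the time. The mismatches come in three kinds.

The source does not contain the fact (the most dangerous). The model attached a fact it picked up elsewhere to that citation. When the user clicks the citation, the page has no basis for the sentence.
The source contains a similar fact that differs subtly. A number is different, a condition is missing, or the time frame is off. This is the most common pattern.
The source does contain the fact, but it was attached to the wrong citation slot. The model swapped the citations of two sentences.

All three are mistakes the LLM makes at the synthesis stage. Search itself is not the problem: it retrieved relevant pages well. The mismatch arises when the model carries what it "saw" on the page over into its own output.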
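
A first-pass screen for the first two kinds of mismatch can be automated with simple string checks. The heuristic below is my own illustration, not a method from the evaluations mentioned above: the word-length cutoff and the 0.6 overlap threshold are arbitrary, and it cannot detect the third kind (swapped slots) without also checking each claim against the other cited sources.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphabetic tokens of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def _numbers(text: str) -> set[str]:
    """Integer and decimal literals as they appear in the text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def check_citation(claim: str, source_text: str, min_overlap: float = 0.6) -> str:
    """Crude screen returning 'missing', 'number_mismatch', or 'plausible'."""
    claim_words = _words(claim)
    overlap = len(claim_words & _words(source_text)) / max(len(claim_words), 1)
    if overlap < min_overlap:
        return "missing"            # kind 1: the source barely mentions the claim
    if not _numbers(claim) <= _numbers(source_text):
        return "number_mismatch"    # kind 2: similar wording, different figures
    return "plausible"              # still needs a human read for anything important


source = "The company raised 40 million dollars in its Series B round in 2024."
ok = check_citation("The company raised 40 million dollars in Series B.", source)
off = check_citation("The company raised 45 million dollars in Series B.", source)
gone = check_citation("The founder previously worked at Google.", source)
```

A "plausible" result only means the cheap checks passed; a dropped condition or a shifted time frame will slip through, so this narrows where a human looks rather than replacing the look.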

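An honest impression of each product's citation reliability (an average across independent evaluations):

Gemini AI Mode (built on the Google index): the highest citation accuracy on average. It is especially good on quick responses (answers drawn from 2 to 3 pages).
Perplexity Pro Search: the most readable citation display, and short answers usually match. In multi-hop answers (synthesis across several sources), mistakes of the first and second kind increase.
ChatGPT Search: a similar level, but the citation UX is more buried, so users verify less.
Deep Research products: out of 30 or more citations, 3 to 6 are wrong on average. The per-citation rate may be better than for short answers, but across a whole document a few wrong citations are all but certain to be in there.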
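
The arithmetic behind that last point is worth making explicit. Taking the figures above at face value (3 to 6 bad citations out of 30, so a per-citation error rate of roughly 0.1 to 0.2) and assuming, as a simplification of mine, that errors are independent, the sketch below computes the chance a report contains at least one bad citation and the chance a small spot check catches one.

```python
from math import comb


def p_any_bad(n_citations: int, error_rate: float) -> float:
    """P(at least one bad citation), assuming independent errors."""
    return 1.0 - (1.0 - error_rate) ** n_citations


def p_spot_check_hits(n_citations: int, n_bad: int, n_checked: int) -> float:
    """P(a random sample of n_checked citations includes at least one bad one)."""
    return 1.0 - comb(n_citations - n_bad, n_checked) / comb(n_citations, n_checked)


report_risk = p_any_bad(30, 0.10)           # roughly 0.96
spot_check = p_spot_check_hits(30, 3, 3)    # roughly 0.28
```

Under these assumptions a 30-citation report almost certainly contains an error, while checking three citations at random finds one only about a quarter of the time. That is why the checks should target the claims that matter rather than a random sample.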

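The practical conclusion is clear.

Low-stakes decisions (where to eat lunch, how useEffect behavior changed in React 19): trusting AI search as-is is fine.
Medium-stakes decisions (a first draft of a market-entry analysis, which tech stacks to evaluate): use AI search as the first-pass material, but check the citations for 2 to 3 key facts yourself.
High-stakes decisions (legal, medical, financial, policy): AI search is only a starting point. Verify every key fact directly against the primary source. Even if the report runs 30 pages, click and check all 30 citations.

This is not advice to avoid AI search. It is advice not to assume results are 100% accurate, and to grade the reliability of each answer as you use it. The same grading was needed for traditional search (a single Wikipedia sentence needs verification too). What differs with AI search is that the answer looks so smooth that users easily lose the instinct to verify.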

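6. When AI search wins and when it does not

Many articles declare the death of traditional search too early. The honest diagnosis in 2026 is that the two markets coexist. Which one is the default differs case by case.

Cases where AI search clearly wins:

Explanatory questions of the "what is this / how does this work" kind, which ask about a concept rather than a single fact. Example: "How does the Mamba architecture differ from the Transformer?" AI synthesis is far faster than reading the SERP yourself.
When facts scattered across several pages must be combined: comparisons, market research, trend analysis. Work that would have a human open and read 5 to 10 pages takes the AI under a minute.
Vague natural-language queries. "That company, the one that raised a Series B last year, what was it called?" When you do not know the keyword, the LLM's intent parsing shines.
Cross-language search. Ask in Korean and get an answer synthesized from English sources. AI search does this naturally.
The first step of coding ("how do I do X in this library"), although IDE-integrated tools (Cursor, Claude Code, Copilot Chat) are quickly taking over this spot.

Cases where traditional search still wins:

Clear single-fact lookups. "Samsung Electronics closing price yesterday." It is one number. AI search is slower and more expensive; the first box on the SERP is the answer.
Questions answered by one page of official documentation. For the React or Python official docs, going there directly is faster. AI search summarizes the page but often drops details.
Searches where exploration is the goal, such as browsing for inspiration: image search, design references, shopping. An AI answer is definitive, which works against browsing.
Breaking news and real-time information, where index freshness is decisive. Google is strongest here.
When the source itself is the information, as when it matters which news outlet reported something. An AI answer, in synthesizing, obscures "where it was seen."
When leaving private data behind through searching is a concern. AI search sees more context, so privacy exposure is larger. This is the thesis of products like Kagi.

The 2026 pattern among serious information workers is clear: they use both. The default search box is still Google (especially for quick fact checks), while deeper questions go to Perplexity or ChatGPT Search. Heavy work that needs Deep Research is handed to OpenAI or Perplexity Deep Research, and they wait over a cup of tea. A human then verifies the results again.

Do not try to use only one tool. That is the anti-pattern.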

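7. The AI-native browser thesis: Comet and beyond

Perplexity's Comet browser, Arc's Max (the AI browser effort The Browser Company once pursued), Brave's Leo integration, Opera's Aria: this trend comes down to one thesis. It is not the search box but the browser itself that must become AI-native.

The argument runs as follows.

The user's context does not fit in a search box. The user is already looking at some page, and that page is the context. Going back to the box for a new search breaks the context. It is natural for a sidebar assistant to hold the current page as context.
Multiple tabs are one task. In comparison shopping or market research the user already has several tabs open. The AI has to see the contents of all of them to be of real help.
AI evolves into an agent. It does not just give answers; it operates pages directly to carry out tasks. For that it has to live inside the browser.

Comet implements this thesis most clearly: sidebar assistant, multi-tab summarization, agentic mode. As of 2026 its user base is still small, in the hundreds of thousands. Next to the market share of giants like Chrome and Safari it is negligible.

The barriers are clear.

The cost of switching browsers is very high. People do not move their bookmarks, extensions, and sessions.
Chrome has already begun integrating Gemini. If Google ships the same features in its own browser, Comet's thesis is absorbed.
Safari has begun integrating ChatGPT (the iOS/macOS 26 direction). Apple is after the same spot.

Comet's real value, in the end, is as a lock that keeps Perplexity from losing the users it has won. The consumer AI search market is squeezed between ChatGPT and Google, and owning a browser gives Perplexity a direct channel to users. The business thesis is the clearer one.

On the other hand, the AI browser is likely the final form of consumer AI search. Picture five years out: rather than performing a separate act called "search," we have an always-on assistant inside the browser, and the pages we view, the tabs we open, and the text we write all become its context. Which of Perplexity Comet, Chrome+Gemini, and Safari+ChatGPT takes that seat is the next round of the fight.

The thesis is likely right. Who implements it is undecided.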

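8. An honest decision tree: what to use when

General users: information workers, academics, journalists.

Quick fact checks: Google (if the search box is already familiar) or Gemini AI Mode (AI answers in the same box). The point is an answer within 5 seconds.
Concept explanations and synthesis questions: Perplexity Pro or ChatGPT Search. Both are good. If you already use ChatGPT, ChatGPT is the natural choice.
Heavy research (market research, academic reviews, due diligence): Deep Research. Choose among OpenAI (deepest), Perplexity (fastest), and Gemini (freshest index) to suit the task. A human must verify the results.
Privacy and avoiding tracking matter: Kagi, accepting the cost.
Code questions: tools inside the IDE (Claude Code, Cursor, Copilot Chat) first. If you need a search tool, Phind.

Developers: people building RAG or agents.

Want to get running fast with standard integrations: Tavily, the LangChain and LlamaIndex default.
Need semantic search or findSimilar: Exa. Body extraction is clean too.
Google results as they are at minimum cost: Serper / SerpAPI. Body extraction is separate.
Multilingual results matter and enterprise governance is required: You.com Search API.
Index freshness plus citations in your own LLM calls: Grounding with Google Search in the Gemini API.
Own index plus a no-training guarantee: Brave Search API.
Heavy usage with a need to cut cost: mix several backends to trade off cost and quality, and avoid a single dependency.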
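
Avoiding a single dependency usually means an ordered fallback chain. The sketch below is one minimal way to do it, under the assumption that each backend is wrapped as a plain function from query to a list of URLs; the backend names and stub functions are hypothetical and make no network calls.

```python
from typing import Callable

Backend = Callable[[str], list[str]]  # query -> list of result URLs


def search_with_fallback(
    query: str, backends: list[tuple[str, Backend]]
) -> tuple[str, list[str]]:
    """Try each backend in order; return (backend name, results) from the first that works."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            results = backend(query)
        except Exception as exc:  # timeouts, quota errors, bad responses
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: empty result")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real API wrappers.
def flaky_primary(query: str) -> list[str]:
    raise TimeoutError("timed out")


def cheap_secondary(query: str) -> list[str]:
    return ["https://example.com/1"]


used, urls = search_with_fallback(
    "vector databases", [("primary", flaky_primary), ("secondary", cheap_secondary)]
)
```

Ordering the list by cost or by quality is the actual policy decision; the loop itself stays the same either way.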

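Teams and organizations.

Building a small consumer SaaS (a consumer AI answer engine): look at Exa plus your own LLM, or the Perplexity Sonar API.
Internal knowledge search: enterprise search such as Glean or Mem is a different market and outside the scope of this post.
Agent workflows (for example, a Slack bot that finds material): Tavily is the fastest choice to get running. As cost grows, spread the load across Exa and Serper.
Copyright and licensing caution is mandatory (media, legal): Brave Search API (not used for training), or your own crawler plus explicit licensing agreements.

Price sensitivity.

Can it be done for free? For consumers, Gemini AI Mode is already good enough at no cost. Developers can start within the free quotas of Tavily, Exa, and Brave.
Around $20 per month: pick one of Perplexity Pro, ChatGPT Plus, or Kagi Pro to fit your workflow.
$100 to $200 per month: Perplexity Max or ChatGPT Pro, for heavy Deep Research users.
Variable usage: the API side is usage-based by nature. If heavy usage is possible, set a monthly budget and track it on a dashboard.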
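
A back-of-the-envelope budget check is enough to start with. The per-thousand price below is illustrative, taken from the rough range this post quotes for Google-result APIs; the traffic figure and the budget are made-up inputs, and real invoices also include extraction and LLM costs.

```python
def monthly_cost(queries_per_day: int, usd_per_1k_queries: float, days: int = 30) -> float:
    """Search-API spend for a month at a flat per-thousand-queries price."""
    return queries_per_day * days * usd_per_1k_queries / 1000


def within_budget(queries_per_day: int, usd_per_1k_queries: float, budget_usd: float) -> bool:
    """True if the projected monthly spend stays at or under the budget."""
    return monthly_cost(queries_per_day, usd_per_1k_queries) <= budget_usd


# 20,000 queries a day at $1.50 per 1,000 queries (illustrative numbers).
projected = monthly_cost(20_000, 1.5)
fits = within_budget(20_000, 1.5, budget_usd=500)
```

Running the same two lines for each candidate backend gives a quick comparison before committing to one.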

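The most common mistake is cramming every workflow into one consumer tool. Do not try to do everything with Perplexity. Google is faster for quick facts, IDE tools are better for code, and Deep Research is a separate tool for heavy research. Splitting the work across two or three tools by workflow is the 2026 norm.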

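Epilogue: checklist, anti-patterns, next post

Checklist to confirm within a week of adopting a tool

I ran five searches I did last week through two or three tools and compared them.
I picked three citations at random, clicked through, and checked that the source text actually contains the fact.
I tried Deep Research once and judged for myself whether it was really worth the time.
I checked my mobile usage pattern (UX differences across browser, desktop app, and mobile app).
I checked the privacy settings (whether searches are used for training, the data retention policy).
For development work, I built a light RAG demo on one API (Tavily or Exa).
I calculated the expected monthly cost (subscription + API usage + Deep Research calls).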

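Anti-patterns: common mistakes

Copying the answer without verifying citations. The most common and the most dangerous. A smooth answer is not necessarily a true one. Checking the primary source for 2 to 3 key claims is mandatory.
Using Deep Research for light questions. Do not generate a 30-minute report for a fact check that takes 5 seconds. The tools serve different purposes.
Trying to handle every workflow with one consumer tool. Quick facts, synthesis questions, heavy research, and code are each done best by a different tool.
Assuming AI search answers are 100% accurate. The 20 to 40% citation mismatch shows up in every product. Grade the reliability of the answer.
Wiring only one API into a RAG pipeline as a developer. Cost volatility and availability risk are high. Mix two or three backends and build a fallback structure.
Ignoring the publisher and copyright side. At media and legal firms, prefer indexes not used for training (Brave and others) or sources with explicit licenses.
Declaring "search is dead" too early. In 2026 Google still handles the overwhelming majority of searches. The two markets coexist.
Switching wholesale to a new browser like Comet as your main one. Use it on the side for a month before deciding. The cost of losing a familiar browser is high.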

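Next post preview

The next post compares search backends for RAG pipelines: a same-query benchmark of Exa vs Tavily vs You.com vs Serper. It sends the same 100 queries to the four APIs and quantitatively compares result relevance, body extraction quality, response time, and cost. After that comes the internal structure of AI-native browsers: how Comet sees pages, combines multiple tabs, and performs agentic actions, and how to imitate that in your own product.

After that comes building a Deep Research system yourself: how to make a multi-hop research agent like OpenAI Deep Research work in your own domain, and the structure of the search API + LLM + synthesis loop.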

References

AI Search Engines 2026 Head-to-Head — Perplexity · You.com · Phind · Exa · SearchGPT · Gemini AI Mode · Kagi · Tavily, and the Deep Research Category

Prologue — the "ten blue links" era is ending
1. The new layer between query and answer
2. Consumer AI search apps — what humans type into
3. Developer search APIs — hands and feet for agents
4. The Deep Research category — when a five-minute answer becomes a thirty-minute report
5. The citation reliability problem — AI search's largest weakness
6. When AI search wins vs when it doesn't
7. The AI-native browser thesis — Comet, and beyond
8. Honest decision tree — what to use when
Epilogue — checklist, anti-patterns, what's next
References

Prologue — the "ten blue links" era is ending

Through 2024, the act of "searching" was unambiguous. You typed a few words into Google or Bing, you got a SERP with ten blue links, you clicked some of them, you read the pages, and you synthesized the answer yourself. The engine searched. The human composed.

By spring 2026 that split had collapsed. The engines themselves answer now. Perplexity was designed that way from day one; OpenAI's ChatGPT Search and SearchGPT followed; Google began integrating AI Mode into its own crown jewel at google.com. On top of all that, a new category emerged — Deep Research, the multi-minute autonomous browse-and-synthesize tools that fetch dozens of pages, cross-reference, and hand back a cited report.

Below that, a separate market quietly exploded — search infrastructure. Exa, Tavily, Serper, You.com's Search API. These are not consumer products. They sell search APIs for AI agents to call. RAG pipelines, agent workflows, the toolboxes underneath the LLMs we already use. Many of the queries you fire at Perplexity are themselves powered by infrastructure of this shape.

This piece looks at both markets together. Consumer AI search (humans typing) versus developer AI search APIs (agents calling) on one page. We pull Deep Research out as its own animal. And we ask the honest question — when does AI search actually beat classic search, and when does it not? Plus the thesis Perplexity's Comet browser keeps making — that search will eventually be the browser — and how plausible that really is.

Prices and features move fast. Every number in this post is as of May 2026 and the focus is on the decision frame. Six months from now the numbers will shift, but the axes — consumer vs infra, single-query vs Deep Research, citation reliability — will still apply.

1. The new layer between query and answer

Classic search in one line.

user query → search engine (index match + ranking) → SERP (ten links)
                                                          ↓
                                                human reads + synthesizes

AI search in one line.

user query → intent parsing (LLM) → multiple sub-queries generated
           → search index called (own index or external API)
           → result pages fetched + main content extracted
           → model synthesizes → cited answer

The crucial point is that a layer has been added. The user's query no longer goes straight to an index. An LLM first parses intent, decomposes the question into sub-queries, fetches results, reads the body text, and feeds all of that back into the model to synthesize an answer. Two things flipped.

First, the definition of a "search engine" blurred. Perplexity runs its own crawler but also leans on Bing and Google APIs. ChatGPT Search sits on the Bing index with OpenAI's curation layer over it. You.com has its own index but mixes in external sources. Owning an index and producing an answer are now decoupled.

Second, results are the answer. Users click out less. Perplexity's own data shows roughly 1.2 to 1.5 citation clicks per answer on average — meaning most users read and stop. For publishers, that is traffic vaporizing, and it is the heart of the 2025–2026 publisher-versus-Perplexity licensing fights.

Three architectures are worth naming.

Own index + own synthesis — Google AI Mode is the cleanest example. Google's index plus Gemini on top. Kagi's AI features are similar.
External index + own synthesis — Perplexity in some modes, ChatGPT Search (Bing-backed). They rent the index and own the synthesis and UX.
Own index + API exposure — Exa, Tavily, the API side of You.com. They do not answer for the user. They expose results (or extracted body text) for some other LLM to synthesize on.

These are not three players fighting for the same crown. They are three markets. Consumers see (1) and (2). Developers see (3). And several companies — You.com being the clearest — sit in both at once.

2. Consumer AI search apps — what humans type into

2.1 Perplexity — the company that defined the category

Perplexity launched late 2022 and basically named the AI-search category. As of May 2026 it has four major surfaces.

Pro Search. The default. The user's question goes to an LLM (the user picks among GPT-4, Claude, or Perplexity's own Sonar line), which fans out sub-queries, searches the web, and synthesizes a cited answer with inline references and a sidebar of follow-up questions. Free tier has a cap; Pro at about $20/month removes the cap and unlocks model choice.

Deep Research. Arrived in early 2025 and became the headline differentiator through 2025–2026. Spends five to ten minutes on one question, fetching dozens of pages, cross-referencing, and producing a structured report. Where Pro Search reads five to ten sources in under a minute, Deep Research reads thirty to eighty sources in five to ten minutes. Pro plan has a daily cap (in the order of five to ten runs as of May 2026); the Max plan (around $200/month) raises it dramatically.

Spaces. A workspace concept added in 2024. You pin a topic — say "my PhD reading list" or "Korean instant noodle market" — accumulate context inside it, upload your own PDFs and notes, and let them flow into search. Collaborative. Roughly Notion-shaped on top of Perplexity.

Comet browser. Beta in 2025, generally available in early 2026. Not just a browser — it pitches itself as an AI-native browser. Every page has a sidebar assistant that holds the current page as context. Multi-tab summarization. Agentic mode that can drive pages on your behalf (for example, "compare the pricing on these companies and put it in a table"). The thesis is direct — search's future is not a box, it's the browser itself.

Perplexity's strengths are UX consistency and readable citation rendering — every claim is traceable. The weaknesses are freshness of its own index and fidelity of cited facts — in multi-hop reasoning, citations and underlying text often diverge. Going into 2026 the Sonar line got faster and cheaper, so Pro Search frequently feels snappier than ChatGPT Search.

2.2 You.com — the early mover that ate its own lunch

You.com was actually first to ship AI search. Richard Socher founded it in 2020 and shipped AI answers inside it by 2022. By 2026, though, You.com had clearly lost the consumer race to Perplexity.

The reason is plain. You.com tried to do too many things at once — search, AI chat, image generation, code agent, ads bolted on — and the UX fractured. None of the surfaces ever became number one in their lane. By late 2025 the company very visibly pivoted weight onto the API business — You.com Search API became the headline revenue product, and consumer you.com leaned more toward demo and marketing. Their enterprise B2B angle leans hard on the same API.

So there are two ways to score You.com.

As a consumer search engine? Perplexity or ChatGPT Search is better.
As a developer search API? You.com Search API is genuinely worth a look — pricing is reasonable and non-English (Korean, Japanese) results are surprisingly good.

The real story lives in the next chapter, in the infra market.

2.3 Phind — the one only developers use

Phind chose a totally different lane — developer search. It started as a Stack Overflow alternative and weights code blocks, library docs, and GitHub issues heavily. Answers are code-dense, and citations skew to official docs, GitHub, SO, MDN.

By 2026, Phind ran along two tracks.

Phind Search — the default, an AI search tuned for code. Free tier with daily caps, Phind Pro around $20/month.
Phind 70B / Phind Models — its own code-tuned model family, some of which were released as open weights and got integrated into other tools (Cursor and similar).

A specific quirk — Phind also ships a CLI. Running the phind command in your terminal does search and code generation right there. Some developers treat it as a faster-first-response than Stack Overflow.

The weaknesses are obvious. Outside coding Perplexity wins easily; the user base is narrow so data effects compound slowly; and coding itself is being eaten from inside the IDE — GitHub Copilot Chat, Cursor's inline AI, Claude Code. The "search then answer" workflow keeps getting subsumed into IDE tooling.

2.4 SearchGPT / ChatGPT Search — OpenAI's late entry

OpenAI announced SearchGPT as a prototype in July 2024 and folded full web search into ChatGPT in October 2024. As of May 2026, how it works:

Inside ChatGPT the user can hit a "Search" toggle, or the model auto-decides search is needed.
The index is Bing-backed plus OpenAI's own crawler topping up gaps.
Answers carry inline citations; a side panel lists sources.
Free users have search — that is the biggest distribution wedge.

OpenAI's Deep Research is a separate mode, launched early 2025, built on the o-series reasoning models with multi-hop browsing layered on. Averages ten to thirty minutes per question. Limited runs on Plus ($20), much higher caps on Pro ($200). OpenAI Deep Research is often deeper than Perplexity Deep Research but also slower. It shines on academic and heavy market-research jobs.

ChatGPT Search's biggest strength is that search is free inside a product that already has hundreds of millions of users. That is a brutal pressure on dedicated apps like Perplexity. For mainstream users the default seat for "ask AI and have it search" is already ChatGPT.

The weakness — the UX is chat-first, not search-first, so citation clickthrough is lower and source rendering is less crisp than Perplexity. Citation-vs-claim drift in multi-hop answers is similar or a hair worse.

2.5 Gemini AI Mode / Grounded Search — Google rewires its core

The most important shift gets the least airtime — Google itself. Google started showing "AI Overviews" above the SERP in 2024 (US). In 2025 it added an "AI Mode" tab. By early 2026 it was running broad experiments to make AI Mode the default search UX for many query classes.

How it works:

You type into google.com → if AI Mode is on, a Gemini-synthesized answer with inline citations appears at the top, with the conventional SERP underneath.
Click the "AI Mode" tab and you get something very close to Perplexity — chat with Gemini, citations, follow-ups.
For developers, the Gemini API's Grounding with Google Search automatically attaches cited search results to your LLM calls.

Google's overwhelming asset is the index. Nobody matches Google's freshness and coverage. The constraint is that Google's ad revenue lives on clicks, so pushing AI Mode aggressively cannibalizes its own P&L. Google goes deliberately slow. Through 2026 the publisher compensation and ad model experiments are still in flight.

For mainstream search users, Gemini AI Mode is honestly "already close enough" — quality is comparable to Perplexity, the index is fresher, and it is free. The reason to keep going to Perplexity is shrinking.

2.6 Bing / Copilot — Microsoft's two tracks

Bing got the AI search story started in early 2023 by bolting GPT-4 chat onto search. Once the hype settled, Microsoft cleanly split the work in two.

Bing Search itself — became wholesale infrastructure, powering ChatGPT Search and parts of Perplexity. Effectively an index for hire.
Copilot (formerly Bing Chat) — evolved into a Microsoft 365 assistant. Baked into Windows, in the Edge sidebar, threaded through Office apps. It does more than search — documents, code, email.

As a standalone consumer AI search tool, Copilot is not the leader. But inside the Microsoft ecosystem it is the default, and the user count is not small.

2.7 Kagi — paid privacy-first search with optional AI

Kagi is a different kind of company. Paid search engine, around $10/month. No ads. No user data collection. You can block or boost domains yourself. Loyal, small user base.

Its AI surfaces:

Quick Answer — a short AI summary above the result list, with citations.
The Assistant — separate chat UI; the user picks the model (Claude, GPT, Gemini, others).
Universal Summarizer — a side tool for summarizing URLs or YouTube videos.

The value prop is sharp — search without ads, without tracking, with AI as an honest option. You can turn AI off and just search. The opposite philosophy of Perplexity or Gemini AI Mode, both of which force an answer by default.

Weaknesses: the price tag ($10/month is friction in a market trained on free search), and index coverage that lags Google/Bing in some niches (it blends its own crawler with external indexes).

For serious knowledge workers, academics, journalists — Kagi is a genuinely good pick. Not a mass-market tool.

3. Developer search APIs — hands and feet for agents

In the same window, a market most users never see has exploded — search APIs for agents to call directly. Infrastructure layer.

3.1 Exa — embeddings-first developer search

Exa (née Metaphor) was designed from the start assuming an LLM would be the caller. Its signature is semantic search, not keyword search. "Find me pages similar to this one" or "find blog posts that talk about this topic" works well. Keyword search is also supported.

Core surface:

/search — semantic or keyword. Returns URLs, titles, publication dates, snippets.
/contents — fetches clean body text for a page, stripped of ads and navigation. Drop-in ready for LLM context.
/findSimilar — give it one URL, get back similar pages. Not something traditional engines expose.
/answer — convenience endpoint that stacks the above into a short synthesized answer.

Exa's strength is LLM-friendly API design. The body extraction (/contents) is clean enough that RAG pipelines hand it to the LLM with almost no post-processing. findSimilar is genuinely useful for research workflows and has no real equivalent in classic engines.
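
To make "semantic, not keyword" concrete, here is a toy sketch of similarity ranking over embedding vectors. Everything in it is an assumption for illustration: the three-dimensional vectors are made up by hand (a real system would use a learned embedding model), the URLs are placeholders, and the function name `find_similar` only echoes the endpoint name for readability. It is not Exa's API or implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(seed_url: str, index: Dict[str, Vector], top_k: int = 2) -> List[str]:
    """Rank every other page in the index by similarity to the seed page."""
    seed_vec = index[seed_url]
    scored = [
        (cosine(seed_vec, vec), url)
        for url, vec in index.items()
        if url != seed_url
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [url for _, url in scored[:top_k]]


# Hand-made toy "embeddings" (hypothetical values, not model output).
toy_index = {
    "https://example.com/rag-intro": [0.9, 0.1, 0.0],
    "https://example.com/rag-eval": [0.8, 0.2, 0.1],
    "https://example.com/vector-db": [0.7, 0.3, 0.0],
    "https://example.com/sourdough": [0.0, 0.1, 0.9],
}

similar = find_similar("https://example.com/rag-intro", toy_index)
```

Note that no keyword from the seed page has to appear in the matches. The ranking depends only on vector proximity, which is why "pages like this one" is a natural query for this kind of index.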

Pricing is usage-based (a few dollars per thousand queries; content extraction extra). For consumer-scale products that gets expensive fast.

Startups building Perplexity-shaped answer engines call Exa more often than any other infra. Perplexity has its own index; smaller teams plug in Exa.

3.2 Tavily — the de facto agent search API

Tavily targeted one narrow market from day one — search APIs for LLM agents. It got integrated into LangChain and LlamaIndex early and became the default web search tool for agent workflows.

The API is simple: client.search(query, search_depth=...) parses intent, runs multiple queries, scrapes and cleans results. search_depth='basic' is fast and cheap; search_depth='advanced' is heavier and more thorough.

A wrinkle — answer synthesis is optional (include_answer=True). Tavily will synthesize a short answer for you. Quality is okay but generally weaker than letting your own LLM compose. Most teams use Tavily for retrieval only.
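
A retrieval-only call might look like the sketch below. The parameter names `search_depth` and `include_answer` and the `results` / `url` / `content` response keys follow the tavily-python client as I understand it; treat them as assumptions and check the current docs. `retrieve` and `StubClient` are hypothetical names, and the stub exists only so the example runs offline without an API key.

```python
from typing import Any, Dict, List


def retrieve(client: Any, query: str, thorough: bool = False) -> List[Dict[str, str]]:
    """Use a Tavily-style client for retrieval only, keeping url + content."""
    response = client.search(
        query=query,
        search_depth="advanced" if thorough else "basic",
        include_answer=False,  # leave answer synthesis to your own LLM
    )
    return [
        {"url": item["url"], "content": item["content"]}
        for item in response.get("results", [])
    ]


class StubClient:
    """Offline stand-in for the real client (hypothetical, for demonstration)."""

    def __init__(self) -> None:
        self.last_depth = None

    def search(self, query: str, search_depth: str = "basic",
               include_answer: bool = False) -> dict:
        self.last_depth = search_depth
        return {
            "query": query,
            "results": [
                {"url": "https://example.com/a", "title": "A", "content": "snippet text"},
            ],
        }


stub = StubClient()
docs = retrieve(stub, "agent search APIs", thorough=True)
```

With the real package you would pass a `TavilyClient(api_key=...)` instance instead of the stub; the rest of the function should not need to change if the assumptions above hold.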

Tavily's pitch is narrow and obvious: "need a web search tool for a LangChain agent, fast? Tavily." That is the whole pitch. A free tier exists; pricing is usage-based above it.

3.3 Serper / SerpAPI — Google results, plain

Serper, SerpAPI, ScaleSerp and friends sell exactly one thing — Google results as an API. No index of their own. They scrape (or pull through approved channels) and hand back structured JSON.

Why is there a market? Google's official Programmable Search API is expensive and capped, with limited result processing. So unofficial proxies became the standard.

Characteristics:

Cheapest pricing (a dollar or two per thousand queries).
Same results Google would show — index freshness and coverage matched.
No synthesis. Just a result list.

When Tavily or Exa start to feel expensive in a RAG pipeline, dropping down to Serper is a common cost lever. Body extraction has to be added separately (Reader API, Trafilatura, similar libraries).
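
What "body extraction is your problem" means in practice: you get a URL list, fetch the HTML yourself, and strip everything that is not article text. The sketch below is a deliberately crude standard-library stand-in for that step, not a replacement for Trafilatura or a reader API. The tag list in `SKIP_TAGS` and the names `BodyTextExtractor` and `extract_body` are my own assumptions.

```python
from html.parser import HTMLParser
from typing import List

# Elements whose text is almost never article body (an assumed, incomplete list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class BodyTextExtractor(HTMLParser):
    """Collect visible text while ignoring anything nested in SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def extract_body(html: str) -> str:
    """Return the text a RAG pipeline would pass on to the LLM."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample_html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Menu</nav><p>Hello world</p>"
    "<script>var x = 1;</script><footer>Copyright</footer>"
    "</body></html>"
)
body = extract_body(sample_html)
```

Real pages need far more than this (boilerplate detection, encoding handling, paywalls), which is the cost you take on when you drop from an extraction-included API to a plain results API.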

3.4 You.com Search API — infrastructure with its own index

In the consumer chapter we said You.com stumbled. On the infra side they actually built a real seat. The You.com Search API:

Own web index plus external blends. Less dependent on Bing or Google than competitors.
Content extraction included, similar to Exa's /contents.
Pricing comparable to or slightly under Tavily and Exa.
Multilingual results (Korean, Japanese) are surprisingly strong.

They actively target enterprise (combining first-party data with web search) and have been adopted as the search backend of several large B2B SaaS products.

3.5 Brave Search API — another own-index option

Brave is best known as the browser, but it runs its own search index (Brave Search) and exposes it via the Brave Search API. Pricing is reasonable, and the data policy is clean — queries and results are not used for training.

Because it is a separate index it disagrees with Google on some queries, and in some domains the quality lags. But its privacy and licensing posture is unambiguous, and several AI companies have picked it up as a backend.

3.6 Consumer vs API matrix

| Product | Consumer app | Developer API | Own index | Deep Research | Starting price | Headline differentiator |
| --- | --- | --- | --- | --- | --- | --- |
| Perplexity | Strong (Pro Search, Spaces, Comet) | Weak (Sonar API) | Partial (own + Bing etc.) | Strong (Deep Research) | Free / Pro $20 / Max ~$200 | UX, citations, Comet browser |
| You.com | Weak | Strong (Search API) | Strong (own) | Partial | Free / Pro ~$20 / API usage | Multilingual, enterprise |
| Phind | Medium (devs only) | Weak | Partial | None | Free / Pro $20 | Code and docs focus |
| Exa | None | Strong | Strong | Partial (Research API) | Usage (a few dollars per 1k queries) | Semantic search, content extraction, `findSimilar` |
| OpenAI Search / Deep Research | Strong (ChatGPT Search) | Medium (web_search tool) | Partial (Bing-backed) | Strong (Deep Research) | ChatGPT Plus $20 / Pro $200 | User base, model integration |
| Gemini AI Mode / Grounding | Strong (google.com AI Mode) | Strong (Grounding API) | Very strong (Google index) | Strong (Gemini Deep Research) | Free / Google One AI ~$20 / Vertex usage | Index freshness, free |
| Bing / Copilot | Medium (Copilot) | Medium (Bing API) | Very strong (Bing index) | Partial (Copilot Pages) | Free / Copilot Pro $20 / API usage | M365 integration |
| Kagi | Medium (Search + Assistant) | Weak (small API) | Partial (own + external) | Partial | From $10/month | No ads or tracking, user control |
| Tavily | None | Strong | None (external curation) | Partial (Research API) | Free tier / usage | LangChain and LlamaIndex default |
| Serper / SerpAPI | None | Strong (Google results) | None | None | Usage ($1–2 per 1k queries) | Cheapest, Google results as-is |
| Brave Search API | Weak (Brave Search) | Medium | Strong (own) | None | Free tier / usage | Own index, no training use |

Pin this matrix to memory. We move on.

4. The Deep Research category — when a five-minute answer becomes a thirty-minute report

The single most interesting event of 2025 was the rise of Deep Research as a category. OpenAI, Perplexity, and Google all shipped a product with that name within months of each other. All three do the same shape of work — spend five to thirty minutes per question, autonomously browse dozens of pages, and return a cited report.

The mechanism is similar.

user query → model lays out a research plan (what subtopics to cover)
           → auto-generates multiple search queries
           → fetches result pages, accumulates body text into context
           → if gaps remain → more search → fill in
           → reconciles conflicting facts → cross-check
           → synthesizes a structured report with citations on every claim
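
The same loop as a minimal Python sketch. All of it is hypothetical scaffolding: `plan`, `search`, and `fetch` are caller-supplied stand-ins for an LLM planner, a search API, and a page fetcher, and the conflict-reconciliation step is left out entirely. It only shows the plan, search, gap-fill, cite shape, not how any vendor actually builds it.

```python
from typing import Callable, Dict, List, Tuple


def deep_research(
    question: str,
    plan: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    fetch: Callable[[str], str],
    max_rounds: int = 3,
) -> List[str]:
    """Plan subtopics, search each, refill gaps, return cited report lines."""
    subtopics = plan(question)
    notes: Dict[str, List[Tuple[str, str]]] = {topic: [] for topic in subtopics}

    for round_no in range(max_rounds):
        gaps = [topic for topic in subtopics if not notes[topic]]
        if not gaps:
            break  # every subtopic has at least one source
        for topic in gaps:
            # First pass: specific query. Later passes: broaden to the bare topic.
            query = f"{question} {topic}" if round_no == 0 else topic
            for url in search(query):
                text = fetch(url)
                if text:
                    notes[topic].append((url, text))

    report: List[str] = []
    for topic in subtopics:
        if notes[topic]:
            for url, text in notes[topic]:
                report.append(f"{topic}: {text} [{url}]")
        else:
            report.append(f"{topic}: no source found")
    return report


# Toy stand-ins so the sketch runs offline (all data is made up).
fake_index = {
    "fintech players": ["https://example.com/a"],
    "funding": ["https://example.com/b"],
}
fake_pages = {
    "https://example.com/a": "Alpha and Beta lead.",
    "https://example.com/b": "Series B rounds dominate.",
}

report = deep_research(
    "fintech",
    plan=lambda q: ["players", "funding"],
    search=lambda q: fake_index.get(q, []),
    fetch=lambda url: fake_pages.get(url, ""),
)
```

In this toy run the first round finds nothing for "funding", so the second round retries with a broader query. That retry is the "if gaps remain, search more" arrow in the diagram.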

The three products differ on style.

OpenAI Deep Research — the deepest. Often runs over thirty minutes. Built on o-series reasoning models with tool calls bolted on. Excels at academic research, market reports, due diligence. Downsides: slow, expensive (Pro plan or API usage), and occasionally so deep it overshoots the actual question.

Perplexity Deep Research — the fastest and most often-used. Usually five to ten minutes. Lighter output than OpenAI's, but enough depth for everyday information work. Pro users hit a daily cap; entry barrier is lower than the other two.

Gemini Deep Research — bundled into Google One AI. Inherits Google's index freshness, which matters. Synthesis quality has steadily climbed, and Gemini's 1M+ token context window holds more accumulated material in working memory at once. That qualitative difference helps when you are stitching long, scattered information together.

When is Deep Research worth it?

Worth it:

Market research ("major players and funding flows in 2026 SE Asia fintech")
Academic synthesis ("recent papers comparing Mamba and Transformer architectures")
Company DD ("XYZ Inc — product, team, funding, competition, risks")
Policy and legal tracking ("EU AI Act rulemaking changes in 2026")

Not worth it:

Simple fact checks ("React 19 release date") — a basic search answers in five seconds.
Coding debugging — reading and running the code yourself is faster.
Real-time info (stock prices, breaking news) — index freshness matters more.
Anything where the answer lives in a single authoritative page (one official doc and you are done).

Deep Research wins when the work is cross-source synthesis — when no one page has the answer. Anything else is overkill and a waste of clock.

One more thing — hallucinations are more dangerous in Deep Research output. In a short answer, when a citation does not match the text, the user can verify in a click. In a thirty-page report with thirty citations, nobody clicks all thirty. The longer the output, the higher the verification cost and the more likely a bad citation slips through. For serious deliverables every load-bearing claim must be human-verified against the source. Always.
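
A back-of-envelope calculation shows why length hurts. Assume, purely for illustration, that each citation is wrong independently with the same probability p; real errors are neither independent nor uniform. Then the chance that a report with n citations contains at least one bad one is 1 - (1 - p)^n.

```python
def p_at_least_one_bad(per_citation_error: float, n_citations: int) -> float:
    """Probability that at least one of n citations is wrong.

    Assumes independent, identically likely errors (a simplification).
    """
    if not 0.0 <= per_citation_error <= 1.0:
        raise ValueError("per_citation_error must be between 0 and 1")
    if n_citations < 0:
        raise ValueError("n_citations must be non-negative")
    return 1.0 - (1.0 - per_citation_error) ** n_citations


# Hypothetical rate: one bad citation in ten.
short_answer = p_at_least_one_bad(0.10, 3)    # three citations
long_report = p_at_least_one_bad(0.10, 30)    # thirty citations
print(f"3 citations:  {short_answer:.2f}")
print(f"30 citations: {long_report:.2f}")
```

Under those assumptions a three-citation answer has roughly a one-in-four chance of containing a bad cite, while a thirty-citation report is almost certain to contain one. The per-citation rate did not change; only the length did.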

5. The citation reliability problem — AI search's largest weakness

The headline promise of AI search is "citations attached, so trustworthy." Be honest — that promise is half-true.

Multiple independent evaluations across 2024–2025 converged on a similar range. Sample random citations from AI search answers and 20 to 40 percent of them diverge from the underlying text. The divergence comes in three flavors.

1. The fact is not in the source. (Most dangerous.) The model picked up the fact elsewhere and pinned it to that citation. Click through and the claim is not on that page.
2. A similar fact is in the source but subtly different. Numbers off, conditions stripped, dates shifted. The most common pattern.
3. The fact is correctly in the source but pinned to the wrong sentence. The model swapped citation slots.

All three are synthesis-stage failures, not retrieval-stage. Search did its job and pulled relevant pages. But moving from "what the page says" to "what the answer says" is where the model slips.
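
A first-pass automated check for the first two flavors can be very simple. The sketch below is a crude heuristic of my own, not a method from any of the evaluations: it flags a claim as missing when few of its content words appear in the source, and as a number mismatch when the claim cites a figure the source never mentions. The 0.5 overlap threshold and the function name `check_citation` are arbitrary assumptions, and it cannot detect the third flavor (right fact, wrong citation slot).

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
WORD = re.compile(r"[a-z]{4,}")  # ignore short function words


def check_citation(claim: str, source_text: str, min_overlap: float = 0.5) -> str:
    """Classify a claim against its cited source with a lexical heuristic."""
    claim_words = set(WORD.findall(claim.lower()))
    source_words = set(WORD.findall(source_text.lower()))
    overlap = len(claim_words & source_words) / len(claim_words) if claim_words else 0.0
    if overlap < min_overlap:
        return "not_in_source"        # flavor 1: claim absent from the page
    claim_numbers = set(NUMBER.findall(claim))
    source_numbers = set(NUMBER.findall(source_text))
    if claim_numbers - source_numbers:
        return "numbers_differ"       # flavor 2: similar fact, different figures
    return "plausibly_supported"      # still needs a human for high stakes


source = "The company said revenue grew 14 percent in 2025."
verdict_ok = check_citation("Revenue grew 14 percent in 2025", source)
verdict_drift = check_citation("Revenue grew 40 percent in 2025", source)
verdict_missing = check_citation("The founder resigned in March", source)
```

A check like this only narrows down where a human should look. "Plausibly supported" means the words and numbers line up, not that the claim is true.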

Honest impressions across products (averaging across independent evaluations):

Gemini AI Mode (Google index-backed) — citation accuracy ranks highest on average, especially on quick answers (two to three pages).
Perplexity Pro Search — best citation rendering, short answers usually hold up. Multi-hop answers see more type 1 and 2 drift.
ChatGPT Search: accuracy roughly comparable to Perplexity Pro Search, but the citation rendering is more buried, so users verify less.
Deep Research products — three to six bad citations among thirty-plus on average. Absolute accuracy per citation may be higher than a quick answer, but a long report almost always carries some bad cites somewhere.

The practical conclusion is clean.

Low-stakes decisions (where to eat, how useEffect changed in React 19) — trust AI search as-is.
Mid-stakes decisions (drafting a market-entry analysis, choosing a stack) — use AI search for the first pass, manually verify two or three load-bearing facts.
High-stakes decisions (legal, medical, financial, policy) — AI search is a starting point only. Every load-bearing fact verified against primary sources. If the report has thirty citations, you click thirty citations.

This is not "do not use AI search." It is "do not assume the answer is 100% correct; grade your answers by stakes." The same grading was always needed for classic search; Wikipedia sentences also require verification. The difference is that AI search reads so fluently that users lose the verification reflex.
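
The three tiers reduce to a small policy function. This is one possible encoding under stated assumptions: the tier names are mine, and "two or three load-bearing facts" is rounded up to three.

```python
def citations_to_verify(stakes: str, total_citations: int) -> int:
    """How many citations a human should click through, by stakes tier."""
    if total_citations < 0:
        raise ValueError("total_citations must be non-negative")
    if stakes == "low":
        return 0                          # trust as-is
    if stakes == "mid":
        return min(3, total_citations)    # spot-check load-bearing facts
    if stakes == "high":
        return total_citations            # verify every one
    raise ValueError(f"unknown stakes tier: {stakes!r}")


to_check_for_dd_report = citations_to_verify("high", 30)
```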

6. When AI search wins vs when it doesn't

A lot of writing announces the death of classic search prematurely. The honest 2026 read is that the two markets coexist. The default tool changes case by case.

Where AI search clearly wins.

Explanatory questions — "what is X, how does X work". Concept questions, not fact questions. Example: "how does the Mamba architecture differ from Transformers". The synthesis is dramatically faster than reading the SERP yourself.
Cross-source synthesis — comparisons, market scans, trend analysis. Work that would have meant opening five to ten tabs and stitching it together.
Vague natural-language queries — "that company, the one that raised a Series B last year, what was its name". When you do not know the keywords, intent parsing carries you home.
Cross-language searches — ask in Korean, synthesize from English sources. AI search just does this.
First-step coding — "how do I do X in this library" — though IDE-integrated tools (Cursor, Claude Code, Copilot Chat) are eating this seat fast.

Where classic search still wins.

Clean single-fact lookups. "Samsung closing price yesterday." It is one number. AI search is slower and pricier; the SERP knowledge box answers in a second.
Single official documentation page. React docs, Python docs — going straight there is faster. AI summaries lose subtle details.
Exploratory browsing. When you want to be surprised. Image search, design references, shopping. AI answers are too decisive to leave room for browsing.
Breaking and real-time information. Index freshness rules. Google still wins by a margin.
When the source itself is the information. Which outlet broke the story matters. AI answers smear the "who said this" question.
When the act of searching itself is sensitive. AI search gives away more context. The privacy posture is worse. That is exactly Kagi's pitch.

The serious 2026 information worker uses both. Default search box is still Google for fast facts. Deeper questions go to Perplexity or ChatGPT Search. Heavy research jobs go to Deep Research and you go make coffee. Then a human verifies.

Do not try to use only one tool. That is the anti-pattern.

7. The AI-native browser thesis — Comet, and beyond

Perplexity's Comet, Arc's Max (the AI-browser push the Browser Company once made), Brave's Leo integration, Opera's Aria — they all reduce to one claim. The future is not a search box; the browser itself becomes AI-native.

The argument:

A user's context does not fit in a search box. The user is already on a page. That page is context. Going back to a search box to start over breaks the thread. A sidebar assistant that already sees the current page is the natural shape.
Multi-tab is one task. Comparison shopping, market research — the user already has tabs open. The AI has to see all of them to be genuinely useful.
AI is becoming agentic. Not just answering, but operating pages on the user's behalf. That requires living inside the browser.

Comet is the cleanest implementation of this thesis. Sidebar assistant, multi-tab summarization, agentic mode. As of 2026 its installed base is small — hundreds of thousands. Against Chrome and Safari's market share it is a rounding error.

The headwinds are sharp.

Browser-switching cost is huge. Bookmarks, extensions, sessions. People do not move.
Chrome is integrating Gemini into itself. When Google puts the same feature into the browser most people already use, Comet's thesis gets absorbed.
Safari is integrating ChatGPT (the iOS/macOS 26 motion). Apple is going for the same seat.

The real value of Comet is that it serves as a lock-in mechanism so Perplexity does not lose the users it has. Consumer AI search is squeezed between ChatGPT and Google; owning the browser owns the user channel. The business logic is cleaner than the UX logic.

On the other hand, an AI-native browser is probably the final form of consumer AI search. Project forward five years and we likely do not perform "search" as a discrete act. An assistant is always on inside the browser; the pages we view, the tabs we open, the text we write — all of it is context. Perplexity Comet, Chrome+Gemini, Safari+ChatGPT — which one claims that seat is the next round of the fight.

The thesis is probably right. Who implements it is undecided.

8. Honest decision tree — what to use when

General users — knowledge workers, academics, journalists.

Fast fact checks — Google (if the search bar is already a habit) or Gemini AI Mode (same box, with an AI answer on top). Answer-in-five-seconds is what matters.
Explanatory or synthesis questions — Perplexity Pro or ChatGPT Search. Both are fine. If ChatGPT is already in your day, lean there.
Heavy research (market scans, academic synthesis, DD) — Deep Research. Pick among OpenAI (deepest), Perplexity (fastest), Gemini (freshest index) by job character. Always human-verify the output.
Privacy and tracking matter — Kagi. Accept the monthly cost.
Code-related — IDE-internal tools first (Claude Code, Cursor, Copilot Chat). If you still need a search tool, Phind.

Developers — RAG/agent builders.

Need to ship fast, standard integrations — Tavily. LangChain and LlamaIndex default.
Need semantic search and findSimilar — Exa. Content extraction is clean.
Need Google results, minimum cost — Serper / SerpAPI. Body extraction is your problem.
Multilingual results, enterprise governance — You.com Search API.
Index freshness plus citations attached to your own LLM calls — Gemini API Grounding with Google Search.
Own index plus no-training guarantee — Brave Search API.
Heavy usage, cost pressure — mix backends; trade quality vs cost. Avoid single-vendor dependency.
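
"Mix backends" usually means an ordered fallback chain. Here is a minimal sketch, assuming each backend is wrapped as a plain function from query to a list of URLs. The wrapper functions, their names, and the ordering are all hypothetical; in practice each one would call a real API.

```python
from typing import Callable, List, Sequence, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str, backends: Sequence[Tuple[str, SearchFn]]
) -> Tuple[str, List[str]]:
    """Try backends in order; return the first non-empty result and its name."""
    errors: List[str] = []
    for name, search in backends:
        try:
            results = search(query)
        except Exception as exc:  # timeouts, quota errors, bad responses
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: empty result")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends so the sketch runs offline.
def flaky_primary(query: str) -> List[str]:
    raise TimeoutError("upstream timed out")


def cheap_secondary(query: str) -> List[str]:
    return [f"https://example.com/result?q={query}"]


backend_used, urls = search_with_fallback(
    "exa vs tavily",
    [("primary", flaky_primary), ("secondary", cheap_secondary)],
)
```

Returning the backend name alongside the results makes it easy to log how often the fallback fires, which is the number you need when deciding whether the primary is still worth its price.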

Teams and organizations.

Building a small consumer answer engine: look at Exa plus your own LLM, or Perplexity's Sonar API.
Internal knowledge search: Glean, Mem, and similar enterprise search live in a different market. Out of scope here.
Agent workflows (e.g. a Slack bot that fetches research): Tavily ships fastest. Diversify to Exa or Serper once costs rise.
Copyright- and licensing-sensitive work (media, legal): Brave Search API (no training use), or your own crawl with explicit licensing.

Price sensitivity.

Can you stay free? — For consumers, Gemini AI Mode is already good enough free. For developers, start inside the Tavily / Exa / Brave free tiers.
Around $20/month — Perplexity Pro, ChatGPT Plus, Kagi Pro. Pick one based on workflow.
$100–$200/month — Perplexity Max or ChatGPT Pro. Heavy Deep Research users.
Variable usage — API side is inherently usage-based. If heavy use is likely, set a monthly budget and dashboard it.

Most common mistake: trying to cram every workflow into one consumer tool. Do not try to do everything in Perplexity. Fast facts are faster on Google. Code is better in IDE tools. Heavy research is a separate tool. Splitting workflows across two or three tools is the 2026 standard.

Epilogue — checklist, anti-patterns, what's next

A one-week check after adopting a tool

Replayed five real searches from the past week across two or three tools and compared.
Picked three random citations and clicked through to verify the fact actually appears in the source.
Ran Deep Research once and honestly assessed whether the time was worth it.
Checked each surface (browser, desktop app, mobile app); the UX differs.
Checked privacy settings (whether searches train models, retention policy).
For developer work — wired a small RAG demo against one API (Tavily or Exa).
Estimated monthly cost (subscriptions + API usage + Deep Research calls).
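
The last item is simple arithmetic, sketched here with made-up inputs. The per-thousand price and the query volume are hypothetical placeholders, not quotes from any vendor; plug in your own numbers.

```python
from typing import Sequence


def monthly_cost(
    subscriptions: Sequence[float],
    api_queries: int,
    price_per_1k_queries: float,
    deep_research_runs: int = 0,
    price_per_run: float = 0.0,
) -> float:
    """Subscriptions + usage-based API spend + metered Deep Research calls."""
    api_spend = api_queries / 1000 * price_per_1k_queries
    research_spend = deep_research_runs * price_per_run
    return sum(subscriptions) + api_spend + research_spend


# Example: a $20 plan and a $10 plan, plus 50,000 API queries
# at an assumed $2 per thousand.
estimate = monthly_cost([20.0, 10.0], api_queries=50_000, price_per_1k_queries=2.0)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

In this example the usage-based line is more than three times the subscriptions combined, which is typical once an agent is making the calls instead of a person.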

Anti-patterns — common mistakes

Lifting answers without verifying citations. The most common, most dangerous failure. Smooth ≠ true. Verify two or three load-bearing claims at the primary source.
Using Deep Research for trivia. Don't burn a thirty-minute report on a five-second fact. The tools have different purposes.
Forcing every workflow into one consumer tool. Fast facts, explanatory questions, heavy research, code — different tools win each lane.
Assuming AI search answers are 100% correct. 20–40% citation drift is a real number across products. Grade your answers by stakes.
Putting a RAG pipeline on a single API. Cost volatility and availability risk. Mix two or three with a fallback structure.
Ignoring publisher and copyright posture. For media or legal, prefer indexes with no-training guarantees (Brave) or explicit licensing.
Declaring search dead too early. In 2026 Google still handles the vast majority of queries. Both markets coexist.
Switching your entire browser to a new one like Comet overnight. Run it as a side browser for a month first. The cost of losing your daily browser is real.

Coming next

The next post is a RAG search backend bake-off: Exa vs Tavily vs You.com vs Serper on the same queries. We throw the same one hundred queries at four APIs and quantify result relevance, body-extraction quality, latency, and cost. After that comes a look inside an AI-native browser: how Comet reads pages, fuses multi-tab context, and runs agentic actions, and how to imitate the parts that matter inside your own product.

And after that, building a Deep Research system yourself — getting an OpenAI Deep Research-shaped multi-hop research agent to work in your own domain. The structure of search API plus LLM plus synthesis loop.