Chaos and Order

https://www.youngju.dev/blog
Slowly and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
ko
fjvbn2003@gmail.com (Youngju Kim)
Sat, 16 May 2026 00:00:00 GMT

Voice AI & TTS 2026 Deep Dive - ElevenLabs · Cartesia Sonic · OpenAI Voice · Play.HT · Hume · Sesame · Fish Audio · Deepgram Aura
https://www.youngju.dev/blog/culture/2026-05-16-voice-ai-tts-2026-elevenlabs-cartesia-openai-voice-play-ht-hume-sesame-fish-deepgram-aura-deep-dive.en
2026 is the year voice AI moved from the STT → LLM → TTS pipeline to full-duplex, real-time voice agents as the norm. ElevenLabs v3 holds the multilingual and emotional TTS throne, Cartesia Sonic 3 hits a 75 ms TTFW and is the default TTS for LiveKit Agents, and OpenAI Realtime API, Google Gemini Live, and Anthropic Claude voice mode have cemented LLM-native voice. Hume EVI 2 and Sesame Maya/Miles redefined emotional naturalness; Fish Audio, CosyVoice 2, and F5-TTS own the open/Chinese ecosystem. On the STT side, Deepgram Nova-3 leads on latency at under 50 ms, while AssemblyAI Universal-2 and OpenAI GPT-4o transcribe lead on accuracy. LiveKit Agents, Pipecat, Vapi, Retell AI, and Bland AI run the orchestration layer; Tennessee's ELVIS Act and the EU AI Act draw the first legal lines on voice cloning. Typecast and Clova Dubbing dominate Korea; CoeFont and VOICEVOX dominate Japan. This is the complete map.
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
voice-ai, tts, elevenlabs, cartesia, openai-voice, play-ht, hume, sesame, fish-audio, deepgram, asr, conversational-ai, english

Voice AI & TTS 2026 Complete Guide - ElevenLabs · Cartesia Sonic · OpenAI Voice · Play.HT · Hume · Sesame · Fish Audio · Deepgram Aura In-Depth Explanation
https://www.youngju.dev/blog/culture/2026-05-16-voice-ai-tts-2026-elevenlabs-cartesia-openai-voice-play-ht-hume-sesame-fish-deepgram-aura-deep-dive.ja
2026 is the year voice AI fully shifted from the "STT → LLM → TTS" pipeline to full-duplex, real-time voice agents. ElevenLabs v3 keeps the throne for multilingual and emotional TTS, Cartesia Sonic 3 is the default for LiveKit Agents at a 75 ms TTFW, and OpenAI Realtime API, Google Gemini Live, and Anthropic Claude voice mode have established LLM-native voice. Hume EVI 2 and Sesame Maya/Miles redefined the standard for emotion and naturalness, while Fish Audio, CosyVoice 2, and F5-TTS hold the open/Chinese ecosystem. On the STT side, Deepgram Nova-3 is the fastest at under 50 ms, with AssemblyAI Universal-2 and OpenAI GPT-4o transcribe competing on accuracy. LiveKit Agents, Pipecat, Vapi, Retell AI, and Bland AI handle orchestration, and Tennessee's ELVIS Act and the EU AI Act drew the first lines on the ethics of voice cloning. Typecast and Clova Dubbing split the market in Korea, as do CoeFont and VOICEVOX in Japan. This article is the full map.
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
voice-ai, tts, elevenlabs, cartesia, openai-voice, play-ht, hume, sesame, fish-audio, deepgram, asr, conversational-ai, 日本語

Voice AI & TTS 2026 Complete Guide - ElevenLabs · Cartesia Sonic · OpenAI Voice · Play.HT · Hume · Sesame · Fish Audio · Deepgram Aura In-Depth Analysis
https://www.youngju.dev/blog/culture/2026-05-16-voice-ai-tts-2026-elevenlabs-cartesia-openai-voice-play-ht-hume-sesame-fish-deepgram-aura-deep-dive
For voice AI, 2026 is the year the STT → LLM → TTS pipeline ended and full-duplex, real-time voice agents became the standard.
While ElevenLabs v3 holds the throne for multilingual and emotional TTS, Cartesia Sonic became the default TTS for LiveKit Agents with a 75 ms TTFW, and OpenAI Realtime API, Google Gemini Live, and Anthropic Claude voice mode established LLM-native voice. Hume EVI 2 and Sesame Maya/Miles took over emotional voice, and Fish Audio, CosyVoice 2, and F5-TTS took over the open/Chinese ecosystem. On the STT side, Deepgram Nova-3 is the fastest at under 50 ms, while AssemblyAI Universal-2 and OpenAI GPT-4o transcribe compete on accuracy. LiveKit Agents, Pipecat, Vapi, Retell AI, and Bland AI handle orchestration, and Tennessee's ELVIS Act and the EU AI Act drew a line on the ethics of cloning. Typecast and Clova Dubbing split the market in Korea, as do CoeFont and VOICEVOX in Japan. This article draws the whole map.
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
voice-ai, tts, elevenlabs, cartesia, openai-voice, play-ht, hume, sesame, fish-audio, deepgram, asr, conversational-ai