
Opening — September 2024, the day NotebookLM changed the room

One day in September 2024, Google quietly added a button labeled "Audio Overview" to NotebookLM. Drop in a PDF, a slide deck, or a webpage, and a 5-to-15-minute two-host "Deep Dive" podcast came out the other side. The voices were natural, the hosts cracked jokes, and the information flow was smooth. Within two days, X (Twitter) was flooded with posts like "I listened to my own paper through NotebookLM" and "I generated a podcast about my résumé".

The roughly twenty months from then to May 2026 have been the explosion phase of AI podcast tooling. Wondercraft, Podcastle, and Castmagic grew into full-stack AI podcast platforms, Descript and Riverside hardened their AI editing features, and ElevenLabs and Cartesia raised the ceiling for TTS quality. This article walks through the full landscape of "I'm making a podcast with AI" as of May 2026.

AI podcast evolution 2024-2026 — three leaps

- **Phase 1 (through Aug 2024)**: AI plays a supporting role. Descript Overdub, Adobe Podcast Enhance, Otter auto-transcription were about it. Hosts were human.

- **Phase 2 (Sep 2024 to mid-2025)**: NotebookLM Audio Overview is the dividing line. A new category emerges: "AI playing two hosts in a convincingly real conversation". Wondercraft and others follow.

- **Phase 3 (late 2025 to May 2026)**: Personalized and interactive AI podcasts. NotebookLM's "Interactive Mode", first previewed in December 2024, takes listener questions, and Spotify ships AI playlists and auto-narration based on listening history.

Full AI podcast generators — NotebookLM, Wondercraft, Podcastle

In the "give it a document or a topic and it makes the whole thing" category, the May 2026 leaders are:

- **Google NotebookLM Audio Overview / Deep Dive** (launched Sep 2024, customization added Oct 2024): free. Expanded beyond English to 50+ languages. Korean and Japanese officially supported from mid-2025.

- **Wondercraft**: flagship of the AI podcast SaaS category. Text becomes multi-speaker dialogue with BGM and effects auto-inserted. Ad insertion automated too.

- **Podcastle**: AI hosts plus real recording in one platform. Strong free tier.

- **NoteGPT, Audyo**: NotebookLM clones. Focused on URL/PDF to audio conversion.

- **ElevenLabs Audio Native**: an embedded widget that turns blog text into audio on the page. Less an AI podcast than a "text to voice" widget, but in the same ecosystem.

- **Meta Audiobox**: research-stage, but a Q1 2026 demo dropped publicly. Combines text and voice prompts.

How does NotebookLM Audio Overview actually work? When a user adds sources, Gemini summarizes and structures them, combines that with a two-host persona system prompt to draft a script, then synthesizes it with multi-speaker TTS. Hit "Customize" and you can steer tone, length, and focus in natural language.
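The three stages described above can be sketched as a small pipeline. This is not Google's implementation: `summarize`, `draft_script`, and `synthesize` are hypothetical stand-ins for the Gemini summarization, persona-driven script drafting, and multi-speaker TTS steps, stubbed out so the data flow is visible without any API calls.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "A" or "B"
    text: str


def summarize(sources: list[str], max_points: int = 4) -> list[str]:
    """Stand-in for LLM summarization: keep the first sentence of each source."""
    points = [src.split(".")[0].strip() for src in sources if src.strip()]
    return points[:max_points]


def draft_script(points: list[str], steering: str = "") -> list[Turn]:
    """Stand-in for script drafting: alternate the two hosts over the key points."""
    turns = [Turn("A", f"Today's focus: {steering or 'a general overview'}.")]
    for i, point in enumerate(points):
        speaker = "B" if i % 2 == 0 else "A"
        turns.append(Turn(speaker, point + "."))
    return turns


def synthesize(turns: list[Turn], voices: dict[str, str]) -> list[tuple[str, str]]:
    """Stand-in for multi-speaker TTS: pair each line with the voice that reads it."""
    return [(voices[t.speaker], t.text) for t in turns]


sources = [
    "RSS carries podcast episodes as enclosures. It is an old format.",
    "TTS engines now handle multi-speaker dialogue. Latency keeps falling.",
]
script = draft_script(summarize(sources), steering="keep it under five minutes")
for voice, line in synthesize(script, {"A": "host_voice_1", "B": "host_voice_2"}):
    print(f"[{voice}] {line}")
```

The `steering` argument plays the role of the "Customize" box: a natural-language hint that shapes the script before any audio is produced.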

NotebookLM's limits — why it's not the final boss

NotebookLM is strong, but as of May 2026 it still has rough edges.

- **No editing**: you can't trim the generated audio or regenerate a specific section. Length is automatic.

- **Limited host voice selection**: changing the default two voices is restricted.

- **Murky commercial-use guidelines**: Google did publish an "Ethical Use" document, but whether you can directly distribute an ad-monetized podcast from it is a gray zone.

- **Disclosure norms**: guidelines for telling listeners "this was AI-generated" are still settling.

Tools like Wondercraft aim straight at this gap: editable output, a wide voice selection, and a clear commercial-use license.

Voice cloning + TTS — ElevenLabs, Cartesia, Play.HT, OpenAI Voice

The "AI host" voice in a podcast ultimately comes from a TTS engine. As of May 2026, six or seven vendors split the market.

- **ElevenLabs**: market leader. Strongest on naturalness, emotion, multilingual. Voice Lab (personal voice cloning) is the killer feature.

- **Cartesia**: opened the low-latency (`<100ms`) TTS market with Sonic. A fit for interactive AI hosts.

- **Play.HT**: direct competitor to ElevenLabs. Holds the mid-market on pricing.

- **HeyGen Voice**: HeyGen bundles video and voice in one package.

- **OpenAI Voice + Realtime API**: based on GPT-4o voice mode. A contender for interactive podcasts.

- **Hume EVI (Empathic Voice Interface)**: emotion recognition and tonal modulation. Human-like reactions are the strength.

- **Sesame**: a voice AI that emerged in 2025 and now stands up to ElevenLabs on naturalness.

TTS APIs increasingly look alike across vendors. An ElevenLabs example:

```python
from elevenlabs import play
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="sk_xxx")  # placeholder key

# Synthesize one line of host dialogue as MP3 (44.1 kHz, 128 kbps).
audio = client.text_to_speech.convert(
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_multilingual_v2",
    text="May 2026: the AI podcast market exploded after NotebookLM shipped.",
    output_format="mp3_44100_128",
)

play(audio)  # play the result through the local audio device
```

The OpenAI Realtime API is bidirectional voice, so its shape is different.

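```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Interactive podcast: the listener's turn goes in, the AI host responds.
# A text turn stands in for microphone audio to keep the example short.
with client.beta.realtime.connect(model="gpt-4o-realtime-preview") as connection:
    connection.session.update(session={"modalities": ["audio", "text"]})
    connection.conversation.item.create(item={
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "Summarize this week's AI news"}],
    })
    connection.response.create()

    # Read server events until the host's response is complete.
    for event in connection:
        if event.type == "response.done":
            break
```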

Podcast editors + AI — Descript, Riverside, Adobe Podcast

The editing phase after recording is still human-driven, but AI assistance is now table stakes.

- **Descript**: edit audio by editing text (the original). Overdub (voice cloning), Studio Sound (noise removal), Underlord (AI assistant), Eye Contact. The standard for podcasters and YouTubers.

- **Riverside.fm**: remote recording with 4K video and AI magic editing. Since 2024: AI Show Notes, AI Magic Clips, Magic Audio.

- **Cleanvoice**: specializes in filler-word (uh, um) removal. A standalone version of one Descript feature.

- **Adobe Podcast** (formerly Project Shasta): Enhance Speech is the game-changer. Restores noisy audio to near studio quality. Mic Check and Background Remover added.

- **Auphonic**: the classic for automatic leveling and noise removal. A 14-year-old service that's still the standard.

- **Veed.io**: video-first but strong for pulling podcast clips.

- **Hindenburg PRO**: journalist-friendly DAW with integrated AI noise removal.

- **Podcastle Magic Dust**: one-click noise and reverb removal.

As of May 2026 Descript is the most integrated "podcast plus video" tool. A typical session runs like this.

1. Record or import (mp3, wav, mp4) → automatic transcript

2. Select "uh, um" in the text and delete → audio sync

3. Use Overdub to regenerate missing words (requires prior voice-model consent)

4. Use Studio Sound to clean up room tone

5. Tell Underlord in natural language: "make me five one-minute clips"

6. Video track auto-captions plus speaker tracking

7. Push mp3 or mp4 plus RSS — one screen
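Step 2 works because every transcript word carries a timestamp, so deleting text tells the editor which audio spans to drop. The sketch below illustrates that idea only; it is not Descript's algorithm, and the `Word` type, the `FILLERS` set, and the `max_gap` threshold are assumptions made for the example.

```python
from dataclasses import dataclass

FILLERS = {"uh", "um"}  # assumption: a tiny filler list for illustration


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


def keep_ranges(words: list[Word], max_gap: float = 0.05) -> list[tuple[float, float]]:
    """Return the audio spans to keep after dropping filler words.

    Consecutive kept words are merged into one span when the silence
    between them is at most max_gap seconds.
    """
    ranges: list[tuple[float, float]] = []
    for w in words:
        if w.text.lower().strip(",.") in FILLERS:
            continue  # deleting the word in text removes its audio span
        if ranges and w.start - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], w.end)  # extend the current span
        else:
            ranges.append((w.start, w.end))
    return ranges


transcript = [
    Word("So", 0.00, 0.20),
    Word("um,", 0.20, 0.65),
    Word("podcasts", 0.65, 1.20),
    Word("are", 1.20, 1.35),
    Word("uh", 1.35, 1.90),
    Word("everywhere.", 1.90, 2.60),
]
print(keep_ranges(transcript))
```

A real editor would also crossfade at each cut so the joins are inaudible; the ranges here are just the cut list.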

Transcription + show notes — Otter, Castmagic, Capsho, Whisper

The core of the publishing workflow is the show notes. By 2026 they're nearly 100% automated.

- **Otter.ai**: the standard for meeting and interview transcription. Generative Summaries handle summaries and action items.

- **Castmagic**: podcast-specific. Transcript becomes show notes, chapters, quotes, newsletter, tweets, LinkedIn posts in one pass.

- **Capsho**: direct competitor to Castmagic. Optimized for podcaster content reuse.

- **Podsqueeze**: one episode into 80 marketing outputs — that's literally their pitch.

- **Swell AI**: aimed at enterprise podcast teams, with SOC 2 compliance and similar enterprise requirements.

- **OpenAI Whisper**: foundation model under many of the above. Free and open-weight.

- **Deepgram (Aura)**: Aura is Deepgram's TTS model, offered alongside its STT API. Strong in enterprise STT.

- **AssemblyAI**: API-first STT. Strong on diarization, sentiment, entity tagging.

A typical Castmagic output looks like this.

[Episode 99: The NotebookLM Shock]

- Chapters:

00:00 Intro

01:23 NotebookLM launch background

05:40 First-use experience

12:15 Limits and ethics

18:50 Outro

- 5-sentence summary: NotebookLM, in September 2024 ... (omitted)

- 3 listener questions

- 8 tweet/X threads

- 1 LinkedIn post

- 600-character newsletter body

- 8 pull quotes
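The chapter list in that output is simple to reproduce once you have chapter start times, for example from a transcript with timestamps. A minimal sketch, assuming chapters arrive as `(start_second, title)` pairs; the function names are illustrative and not part of any Castmagic API.

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the episode passes an hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_chapters(chapters: list[tuple[int, str]]) -> str:
    """Turn (start_second, title) pairs into show-note chapter lines, sorted by time."""
    return "\n".join(
        f"{format_timestamp(start)} {title}" for start, title in sorted(chapters)
    )


chapters = [
    (0, "Intro"),
    (83, "NotebookLM launch background"),
    (340, "First-use experience"),
    (735, "Limits and ethics"),
    (1130, "Outro"),
]
print(render_chapters(chapters))
```

The `MM:SS Title` line format is also what YouTube and most podcast apps recognize as chapter markers when pasted into a description.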

Hosting + distribution — Spotify for Podcasters, Apple, Podbbang, Buzzsprout

Once recorded and edited, the mp3 lands on a hosting platform. The May 2026 market is two giants plus a healthy independent layer.

- **Spotify for Podcasters** (formerly Anchor; renamed Spotify for Creators in late 2024): the de facto standard for free hosting. The Anchor brand was folded into Spotify for Podcasters in March 2023.

- **Apple Podcasts Connect**: the starting point since the iTunes era. The standard for RSS feed registration.

- **Podbean**: strong on video podcasts and live.

- **Buzzsprout**: a favorite among indie podcasters. Friendly UX.

- **Transistor.fm**: the leader for multi-show operations (agencies, companies).

- **Simplecast**: under SiriusXM. Enterprise.

- **RSS.com, Captivate.fm, Acast**: mid-size hosting.

- **Spreaker**: live broadcasting plus hosting. Under iHeartMedia.

- **Megaphone**: under Spotify. Ad-insertion infrastructure.

Buzzsprout's upload-to-distribution flow has become almost a template.

1. Upload mp3 (metadata auto-recognized)

2. Write episode title and description (paste Castmagic output)

3. Auto-register the RSS feed with Apple, Spotify, and other directories (Google Podcasts shut down in 2024)

4. View 30/90-day stats in Buzzsprout Stats

5. Auto-captions plus a transcript page

6. Magic Mastering (automatic audio mastering)

Discovery + SEO — Listen Notes, Podchaser, Goodpods

Podcast search is still hard, because audio isn't text.

- **Listen Notes**: "the Google for podcasts Google doesn't index". Around 3.5M shows indexed.

- **Podchaser**: "IMDb for podcasts". A guest-host credit database.

- **Goodpods**: friend-driven recommendations. "Letterboxd for podcasts".

- **Podscribe**: transcript search.

- **Chartable** (acquired by Spotify and integrated): charts and attribution are now inside Spotify.

From an SEO angle, what works in 2026 hasn't changed: publish full transcripts on your own site, post a video version on YouTube, push clips on X and LinkedIn.

AI video clips — Headliner, Opus Clip, Repurpose

Audio alone isn't enough. Short video clips are the core of podcast marketing.

- **Headliner**: the original audiograms (waveform plus captions). Strong free tier.

- **Wavve**: Headliner alternative. Mobile-friendly.

- **Repurpose.io**: automated multi-platform distribution (TikTok, Reels, Shorts, LinkedIn).

- **Opus Clip**: AI auto-extracts one-minute clips from long videos. Became the podcaster standard during 2024.

- **Descript 1-click clips**: same job, inside Descript.

- **Munch**: same category. AI clip extraction.

Opus Clip's ClipAnything flow is typical. Upload a one-hour video, the AI auto-extracts clips by topic, emotion, and highlight, captions them in vertical 9:16, and ranks them by score.

Live audio — Spaces, Discord, the Clubhouse residue

After the 2021 Clubhouse mania, live audio went through a full boom-bust cycle.

- **Clubhouse**: peaked in 2021, then sharply declined. Relaunched around asynchronous voice messaging in 2023. Still operating as of 2026 but with minimal influence.

- **X (Twitter) Spaces**: rebranded from Twitter Spaces to X Spaces. The most active live audio venue.

- **Discord Stages**: mostly community and gaming-internal events.

- **LinkedIn Live Audio**: B2B and enterprise conferences.

- **Spotify Live**: sunset in 2023. Persists as a case study.

- **Telegram Voice Chats**: sometimes used as informal live audio.

The lesson from Clubhouse's decline is that "live audio isn't as large a market as async podcasting". The hybrid — record live, repost as a podcast — turned out to be more efficient.

Korean podcasts — Podbbang, Naver Audio Clip, Kakao, Welaaa

The Korean podcast market evolved as a distinct ecosystem.

- **Podbbang**: the leader in Korean podcasting. Carries accumulated assets from the "Naneun Ggomsuda" era. Hosting plus ad insertion.

- **Naver Audio Clip**: Naver's unified audio platform. Podcasts plus audiobooks plus radio.

- **Kakao Channel Audio**: audio inside KakaoTalk. Moments-style format.

- **Welaaa**: audiobooks plus classes. As of 2026 a top candidate for Korean audiobook market leader.

- **Millie's Library**: e-books plus audiobooks. Accelerated after the KT acquisition.

- **Government and public podcasts**: an established information channel.

Two things make the Korean market unusual. First, political and current-affairs podcasts make up a much larger share than globally (Naneun Ggomsuda legacy). Second, the video-first culture means YouTube podcasts (with video) are far bigger than pure-audio podcasts.

Japanese podcasts — Voicy, Stand.fm, Radiotalk

Japan is another ecosystem.

- **Voicy**: effectively number one in Japanese podcasting. Positioned as "radio-like audio social".

- **Stand.fm**: anyone can start a live broadcast. Many Japanese indie creators.

- **Radiotalk**: mobile-friendly. Short episodes.

- **Spoon Japan**: Korea's Spoon expanded to Japan. Live radio plus voice social.

- **Anchor Japan (Spotify)**: the Japan version of global Spotify.

- **himalaya Japan**: Chinese audio platform's Japan presence.

What's distinctive about Japan is the deep radio-listening habit shaped by NHK and commercial broadcasters. Podcasts modeled after radio formats (like Voicy) do better there.

Workflow comparison — human vs AI host

A step-by-step time comparison for a 60-minute episode.

| Step | Human host (2026) | AI host (2026) |
| --- | --- | --- |
| Planning + research | 4-8 hours | 30 minutes |
| Guest outreach + scheduling | 2-5 hours | 0 |
| Recording | 60-90 minutes | 5 minutes (gen) |
| Transcript | auto 5 min | auto |
| Editing | 1-3 hours | 0 |
| Show notes | Castmagic auto | auto |
| Mastering | Auphonic 10 min | auto |
| Upload + distribute | 10 minutes | 10 minutes |
| **Total** | **~8.5-18 hours** | **~45 minutes** |

This is purely a quantitative comparison. A human host's hour is probably much more valuable to a listener than an AI host's hour. But as a strategy for allocating finite time, putting some content on AI rails and reserving human hours for flagship episodes is a reasonable hybrid.
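The column totals can be checked by summing the per-step ranges from the table. In this sketch, steps marked "auto" with no stated duration are counted as zero minutes, which is an assumption; the dictionary names are illustrative.

```python
# (low, high) minutes per step, taken from the table above.
HUMAN = {
    "Planning + research": (240, 480),
    "Guest outreach + scheduling": (120, 300),
    "Recording": (60, 90),
    "Transcript": (5, 5),
    "Editing": (60, 180),
    "Show notes": (0, 0),       # "Castmagic auto": counted as 0 (assumption)
    "Mastering": (10, 10),
    "Upload + distribute": (10, 10),
}
AI = {
    "Planning + research": (30, 30),
    "Guest outreach + scheduling": (0, 0),
    "Recording": (5, 5),
    "Transcript": (0, 0),       # "auto": counted as 0 (assumption)
    "Editing": (0, 0),
    "Show notes": (0, 0),
    "Mastering": (0, 0),
    "Upload + distribute": (10, 10),
}


def total_minutes(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every step range."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


human_low, human_high = total_minutes(HUMAN)
ai_low, ai_high = total_minutes(AI)
print(f"Human: {human_low / 60:.1f}-{human_high / 60:.1f} hours")
print(f"AI:    {ai_low}-{ai_high} minutes")
print(f"Ratio: {human_low / ai_high:.0f}x to {human_high / ai_low:.0f}x")
```

Summed this way the human column spans roughly 8.4 to 17.9 hours against 45 minutes for the AI column, a gap of about 11x to 24x in wall-clock time.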

RSS feeds and standards — the infrastructure under the soil

The base infrastructure of podcasting is RSS, a format that dates to 1999 and has carried audio enclosures since the early 2000s. That hasn't changed in 2026.

- **RSS 2.0**: the standard feed format.

- **iTunes Podcast Tags**: Apple's extension.

- **Podcast 2.0 Namespace**: led by podcastindex.org. Extension tags for transcripts, chapters, value-for-value.

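- **IAB Podcast Measurement Technical Guidelines**: the IAB Tech Lab standard for counting downloads. The Open Podcast Analytics Working Group (OPAWG) is a separate, community-led measurement effort.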

- **Spotify Open Access**: Spotify's own measurement and recommendation.

Most hosting services auto-generate an RSS feed; once you register it with Apple and Spotify, new episodes get pushed automatically. However far AI workflows evolve, "an mp3 advertised in an RSS feed" remains a sturdy model.
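To make "an mp3 advertised in an RSS feed" concrete, here is a minimal sketch that builds a one-episode RSS 2.0 feed with the standard library. The show name and all `example.com` URLs are placeholders, and the optional `podcast:transcript` tag follows the Podcast 2.0 namespace mentioned above; a real feed would also carry the iTunes tags that directories expect.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def build_feed(title: str, link: str, description: str, episodes: list[dict]) -> str:
    """Build a bare-bones RSS 2.0 podcast feed and return it as an XML string."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["audio_url"]
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The enclosure is what makes an RSS item a podcast episode.
        ET.SubElement(item, "enclosure", {
            "url": ep["audio_url"],
            "length": str(ep["size_bytes"]),
            "type": "audio/mpeg",
        })
        if ep.get("transcript_url"):
            ET.SubElement(item, f"{{{PODCAST_NS}}}transcript", {
                "url": ep["transcript_url"],
                "type": "text/vtt",
            })
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    title="Example Show",  # placeholder values throughout
    link="https://example.com/show",
    description="A demo feed.",
    episodes=[{
        "title": "Episode 99: The NotebookLM Shock",
        "audio_url": "https://example.com/audio/ep99.mp3",
        "size_bytes": 28_800_000,
        "published": datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc),
        "transcript_url": "https://example.com/transcripts/ep99.vtt",
    }],
)
print(feed_xml)
```

Whether the audio came from a microphone or a TTS engine makes no difference at this layer: the feed only points at a file.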

NotebookLM Korean and Japanese — official from mid-2025

NotebookLM Audio Overview's multilingual support expanded beyond English to 50+ languages in May 2025. Korean and Japanese quality breaks down like this.

- **Korean Audio Overview**: officially supported from mid-2025. Natural, but loanword pronunciation (especially names and places) is often awkward. The two host voices read close to Korean native.

- **Japanese Audio Overview**: supported in the same wave. Accuracy and naturalness exceed Korean, plausibly because more Japanese NLP training data is available.

- **Mixed-language sources**: a Korean PDF with English tables is still weak. The system sometimes blends the two awkwardly.

This is the opening for competitors like Wondercraft. There's no proper Korean-specialized multi-speaker TTS tool yet.

AI risks — voice cloning, fake interviews, authenticity debates

The shadow side of AI podcasting is clear.

- **Voice cloning fraud**: A fake Biden robocall went viral during the 2024 US election season; a similar incident hit the 2025 Korean election. Cloning a politician's or celebrity's voice for fraud is now criminally prosecutable.

- **AI fake interviews**: In November 2024 a fake Lex Fridman x Trump podcast video racked up millions of views on X (later labeled). NotebookLM Audio Overview's naturalness is a double-edged sword.

- **Voice rights**: Cases of reproducing a celebrity's voice for ads or podcasts without consent, including the voices of deceased performers, are increasing. The best-known dispute over a living performer's voice is Scarlett Johansson's 2024 objection to OpenAI's "Sky" voice.

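- **Authenticity debate**: should listeners know a podcast was AI-generated? Labeling obligations are gaining momentum now that the EU AI Act has been applying in stages since 2025, with its transparency rules for AI-generated content scheduled to apply from August 2026.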

- **Persona consistency**: an AI host can crack jokes, but doesn't really have opinions. Listeners have a right to know that.

Right after the NotebookLM launch, Google published additional Ethical Use Guidelines: (1) disclose AI generation to listeners, (2) don't voice personal or confidential material without explicit consent, (3) don't use for political disinformation.

2025-2026 trends — personalization, interactivity, AI curation

The themes of the last two years.

1. **Personalized AI podcasts**: one episode auto-generated daily from your reading list, Pocket, Readwise, highlights. NotebookLM, Recall, Snipd are experimenting.

2. **Interactive AI podcasts**: NotebookLM added a "Join" feature in December 2024. Speak up mid-episode and the AI host responds. OpenAI's Realtime API (public beta since October 2024) and Cartesia Sonic give developers the building blocks for the same pattern.

3. **AI curation**: Spotify AI Playlist plus auto-narration. Splices in short AI commentary based on listener mood and time of day.

4. **Public broadcaster AI experiments**: NPR ran an AI-assisted news podcast pilot in Q1 2025. BBC has similar work at R&D stage. All under a transparency-and-disclosure first principle.

5. **Voice cloning security**: stronger identity verification. ElevenLabs has run identity checks plus Voice Captcha since 2024.

6. **Low-power on-device inference**: low-latency models like Cartesia Sonic enable in-device AI hosts on smart speakers and earbuds.

7. **Multilingual simultaneous dubbing**: one speaker's voice generated simultaneously in English, Korean, Japanese. ElevenLabs Dubbing is the de facto standard.

8. **AI-inserted podcast ads**: Spotify and Megaphone's dynamic ad insertion is evolving into AI-voiced ads.

Tool selection guide — by goal

A May 2026 recommendation by "what do I actually want to make".

- **Source documents to a short summary podcast (15 minutes)**: NotebookLM Audio Overview, free.

- **Full-stack AI podcast production**: Wondercraft or Podcastle.

- **Human host plus AI editing**: Descript + Cleanvoice + Castmagic.

- **High-quality remote recording**: Riverside.fm or Squadcast.

- **Just clean noise quickly**: Adobe Podcast Enhance (free) or Auphonic.

- **Voice cloning or multilingual dubbing**: ElevenLabs.

- **Interactive AI host**: OpenAI Realtime API or Cartesia plus ElevenLabs.

- **Content reuse**: Castmagic + Opus Clip + Repurpose.io.

- **Korean podcast hosting**: Podbbang or Buzzsprout.

- **Japanese podcast hosting**: Spotify for Podcasters or Voicy.

Pricing — from free to enterprise

Rough pricing as of May 2026 (USD per month).

- NotebookLM: free (a Google One / Workspace bundled feature).

- Wondercraft: from $25 (starter), from $100 (pro).

- Podcastle: free to $24 (pro) to $48 (storyteller).

- Descript: free to $24 (creator) to $50 (pro).

- Riverside: free to $24 (standard) to $49 (pro).

- Cleanvoice: from $11 (per-hour pricing).

- Adobe Podcast Enhance: free (ongoing beta).

- ElevenLabs: $5 (starter) to $22 (creator) to enterprise.

- Castmagic: $39 to $99 to enterprise.

- Otter.ai: free to $10 (pro) to $20 (business).

- Buzzsprout: $12 to $24 to $36 (tiered by hosting hours).

- Podbbang: free plus ad revenue share.

- Voicy: free (creator) plus premium channel monetization.

Annual billing typically discounts 15-20%.
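As a worked example of the arithmetic, the sketch below prices the "human host plus AI editing" stack from the list above. It assumes the entry-level paid tier of each tool, treats Cleanvoice's $11 as a monthly figure, and applies the same annual discount to every vendor, none of which is guaranteed in practice.

```python
# Monthly USD prices taken from the list above (entry paid tiers; an assumption).
STACK = {
    "Descript (creator)": 24,
    "Cleanvoice": 11,
    "Castmagic": 39,
}


def annual_cost(monthly_prices: dict[str, float], discount: float = 0.0) -> float:
    """Yearly cost of a tool stack, with an optional annual-billing discount."""
    return round(sum(monthly_prices.values()) * 12 * (1 - discount), 2)


print(f"Monthly:            ${sum(STACK.values())}")
print(f"Annual, no discount: ${annual_cost(STACK):.2f}")
print(f"Annual, 15% off:     ${annual_cost(STACK, 0.15):.2f}")
print(f"Annual, 20% off:     ${annual_cost(STACK, 0.20):.2f}")
```

Under these assumptions the stack costs $74 a month, or between about $710 and $755 a year on annual billing.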

Final check — what to measure

Podcasts are hard to measure. The metrics commonly used as of May 2026.

- **Downloads (IAB-certified)**: hosting services provide it by default.

- **Listen-through rate**: Spotify/Apple analytics.

- **30/60/90-day download curve**: new-episode momentum.

- **New vs returning ratio**: listener loyalty.

- **Per-episode chapter listen rate**: enabled by AI show-note chaptering.

- **Transcript search traffic**: SEO effect on your own site.

- **AI-generated clip CTR**: Opus/Headliner clip reach.

- **Ad-insertion impressions**: Megaphone/Acast dynamic ads.
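The 30/60/90-day curve is just a cumulative count of downloads by episode age. A minimal sketch, assuming a list of download dates per episode exported from your host; it skips the deduplication and bot filtering that certified download counting requires, so treat the numbers as illustrative.

```python
from datetime import date


def download_curve(
    publish_date: date,
    download_dates: list[date],
    windows: tuple[int, ...] = (30, 60, 90),
) -> dict[int, int]:
    """Count downloads that happened within N days of publication, per window."""
    ages = [(d - publish_date).days for d in download_dates]
    return {w: sum(1 for age in ages if 0 <= age < w) for w in windows}


published = date(2026, 3, 1)
log = [
    date(2026, 3, 1), date(2026, 3, 2), date(2026, 3, 20),
    date(2026, 4, 10), date(2026, 5, 15), date(2026, 7, 1),
]
print(download_curve(published, log))
```

A curve that keeps climbing between the 30-day and 90-day marks suggests back-catalog discovery; one that flattens early means the episode lived and died with its launch push.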

Even with AI helping, the signal that matters is still whether people listen through and click the next episode. The point that 100 loyal listeners matter more than raw download counts hasn't changed.

Closing — "AI is a tool; the host is still a person"

Two things are clear as of May 2026.

First, **low-cost and low-frequency content has largely been ceded to AI**. Turning internal training materials into a podcast, listening to a paper or blog post as audio, or hearing the week's news as a five-minute summary — NotebookLM and Wondercraft cover all of that.

Second, **podcasts that pull listeners through the magnetism of host and guest still belong to humans**. AI can imitate Lex Fridman's four-hour interviews, Joe Rogan's controversies, or political talk-show riffs, but the reason listeners come is "that person's opinion, that person's experience". AI doesn't reach there — at least not in May 2026.

People who use tools well buy back time. Hand transcripts to Whisper, show notes to Castmagic, clips to Opus, noise to Adobe Podcast. With the hours you reclaim, prepare better questions and book better guests. That's the podcaster workflow of 2026.

References

- NotebookLM Audio Overview announcement: https://blog.google/technology/ai/notebooklm-audio-overviews/

- NotebookLM Customization (Oct 2024): https://blog.google/technology/ai/notebooklm-update-october-2024/

- NotebookLM Interactive Mode: https://blog.google/technology/ai/notebooklm-interactive-mode/

- NotebookLM Ethical Use Guidelines: https://support.google.com/notebooklm

- Wondercraft: https://www.wondercraft.ai/

- Podcastle: https://podcastle.ai/

- Descript: https://www.descript.com/

- Descript Underlord: https://www.descript.com/underlord

- Riverside.fm: https://riverside.fm/

- Riverside Magic Clips: https://riverside.fm/blog/magic-clips

- Cleanvoice: https://cleanvoice.ai/

- Adobe Podcast: https://podcast.adobe.com/

- Adobe Podcast Enhance: https://podcast.adobe.com/enhance

- Auphonic: https://auphonic.com/

- Hindenburg PRO: https://hindenburg.com/

- ElevenLabs: https://elevenlabs.io/

- ElevenLabs Voice Lab: https://elevenlabs.io/voice-lab

- Cartesia Sonic: https://cartesia.ai/sonic

- Play.HT: https://play.ht/

- OpenAI Realtime API: https://platform.openai.com/docs/guides/realtime

- Hume EVI: https://www.hume.ai/products/empathic-voice-interface-evi

- Sesame: https://www.sesame.com/

- Otter.ai: https://otter.ai/

- Castmagic: https://www.castmagic.io/

- Capsho: https://www.capsho.com/

- Podsqueeze: https://podsqueeze.com/

- Swell AI: https://www.swellai.com/

- OpenAI Whisper: https://github.com/openai/whisper

- Deepgram Aura: https://deepgram.com/learn/aura-text-to-speech-api

- AssemblyAI: https://www.assemblyai.com/

- Spotify for Podcasters: https://podcasters.spotify.com/

- Apple Podcasts Connect: https://podcasters.apple.com/

- Podbean: https://www.podbean.com/

- Buzzsprout: https://www.buzzsprout.com/

- Transistor.fm: https://transistor.fm/

- Simplecast: https://simplecast.com/

- Acast: https://www.acast.com/

- Listen Notes: https://www.listennotes.com/

- Podchaser: https://www.podchaser.com/

- Goodpods: https://www.goodpods.com/

- Headliner: https://www.headliner.app/

- Opus Clip: https://www.opus.pro/

- Repurpose.io: https://repurpose.io/

- X Spaces: https://help.x.com/en/using-x/spaces

- Podbbang: https://www.podbbang.com/

- Naver Audio Clip: https://audioclip.naver.com/

- Welaaa: https://www.welaaa.com/

- Millie's Library: https://www.millie.co.kr/

- Voicy: https://voicy.jp/

- Stand.fm: https://stand.fm/

- Radiotalk: https://radiotalk.jp/

- Meta Audiobox research: https://audiobox.metademolab.com/

- EU AI Act labeling guidance: https://artificialintelligenceact.eu/