WebRTC & Real-Time Communications 2026 Deep Dive: LiveKit, Daily, Agora, Twilio, Pion, Mediasoup, Jitsi, Janus, AWS IVS, Cloudflare Calls
Prologue: The Death of Programmable Video and the New Order
On 2024-12-05, Twilio officially shut down Programmable Video. The product that had been the de facto WebRTC PaaS for half a decade stopped accepting new sign-ups in March 2024, and exactly nine months later the existing workloads were cut as well. Thousands of apps that had been putting off the migration scrambled across LiveKit, Daily, Agora, Vonage, Dolby.io and Zoom Video SDK looking for a new home.
Who filled that void is, in itself, the 2026 map of real-time communications.
- **LiveKit** became the de facto standard for AI voice agents when OpenAI Realtime API picked it as its official transport. The open-source SFU, LiveKit Cloud and Agents SDK move together as one bundle.
- **Daily.co** packaged LLM, STT, TTS and SFU into a single pipeline alongside Daily Bots, and Pipecat standardized the glue code in between.
- **Cloudflare Calls** entered with an aggressive 0.05 USD/GB egress price and put pressure on every other SFU vendor.
- **AWS IVS Real-Time** (GA 2023-08) borrowed the Twitch infrastructure wholesale to ship sub-300ms global multi-host live streaming.
- And on the standards side, **WebRTC NV** finally promoted years of experiments — RTCRtpScriptTransform, Encoded Audio/Video Frame, AV1, L1T3 SVC — into working drafts.
This article maps that landscape end to end: from a minimal RTCPeerConnection example, through a platform-by-platform comparison and AI voice agent integration, to the local markets in Korea and Japan.
1 · What WebRTC Actually Does — Three Legs
WebRTC is not magic. It is a standard that stands on three legs.
```
[Browser A]                                          [Browser B]
     |                                                    |
     | (1) Signaling: agree on how to meet                |
     |     (NOT part of the WebRTC spec; WebSocket etc)   |
     +----> Signaling Server (you operate this) <---------+
     |                                                    |
     | (2) ICE: gather candidates on where to connect     |
     +----> STUN (learn your public IP and port)          |
     |      TURN (relay when NAT traversal fails)         |
     |                                                    |
     | (3) Media: actually stream A/V or data             |
     +--------------- DTLS-SRTP encrypted ----------------+
                   (P2P or through SFU/MCU)
```
Memorize the responsibility of each leg and tool selection gets easier.
- **(1) Signaling is out of scope for the WebRTC standard.** Any channel works — WebSocket, Server-Sent Events, MQTT, SIP — as long as both sides can exchange the same SDP (Session Description Protocol) and ICE candidates. LiveKit, Daily and Agora write this themselves and hide it in their SDKs.
- **(2) ICE is the heart of the standard.** STUN learns your public IP and port, and TURN relays the media when that is not enough. Open-source coturn is the de facto standard.
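- **(3) Media is always encrypted with DTLS-SRTP.** Plaintext media does not exist in the standard. Codecs sit on top of it. For audio, Opus (together with G.711) is mandatory to implement. For video, VP8 and H.264 are mandatory to implement, and VP9 and AV1 are also widely supported (AV1 and H.265 entered serious production in 2024).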
Take any leg away and the call does not happen. What a PaaS sells you is the labor of binding those three legs into a working bundle.
2 · RTCPeerConnection — The Core API on One Page
The surface the browser exposes is surprisingly small. Three objects are at the center.
- `RTCPeerConnection` — represents one peer. SDP offer/answer, ICE candidates and track send/receive all happen here.
- `MediaStream` and `MediaStreamTrack` — the abstraction for camera, microphone and screen share.
- `RTCDataChannel` — sends arbitrary, non-media data over the same PeerConnection.
The smallest two-party example (signaling is pseudocode):
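```javascript
// common setup on both sides (`signaling` and `remoteVideo` are placeholders for your own code)
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.com:3478', username: 'u', credential: 'p' },
  ],
})
// trickle ICE: send our candidates as they are gathered, apply the peer's as they arrive
pc.onicecandidate = (e) => e.candidate && signaling.send('ice', e.candidate)
signaling.on('ice', (candidate) => pc.addIceCandidate(candidate))
// render the remote stream once its tracks arrive
pc.ontrack = (e) => (remoteVideo.srcObject = e.streams[0])

const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true })
stream.getTracks().forEach((t) => pc.addTrack(t, stream))

// caller: send the offer, then apply the answer when it comes back
const offer = await pc.createOffer()
await pc.setLocalDescription(offer)
signaling.send('sdp', offer)
signaling.on('sdp', (answer) => pc.setRemoteDescription(answer))

// callee: apply the offer, then create and send the answer
signaling.on('sdp', async (remoteOffer) => {
  await pc.setRemoteDescription(remoteOffer)
  const answer = await pc.createAnswer()
  await pc.setLocalDescription(answer)
  signaling.send('sdp', answer)
})
```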
A couple of dozen lines and a 1:1 call lights up. The trouble starts when you scale to 3, 30 or 300 participants, because at that point the cost of maintaining a P2P mesh explodes. That is why the next chapter is necessary.
3 · Topology — Mesh vs MCU vs SFU
There are three topologies, and in 2026 production almost every group call uses an SFU.
```
Mesh (full N:N graph)
  A <----> B
  ^ \    / ^
  |   \/   |
  |   /\   |
  v /    \ v
  D <----> C
  Up and downlink: O(N) per peer, total O(N^2)
  Pros: zero server cost, end-to-end encryption is natural
  Cons: client CPU and bandwidth explode beyond 4-5 peers

MCU (Multipoint Control Unit: server decodes, mixes, re-encodes)
  A -+
  B -+- [MCU: decode -> composite -> re-encode] -> one video -> everyone
  C -+
  D -+
  Pros: only one stream of client bandwidth, legacy device compatible
  Cons: very expensive server CPU, end-to-end encryption impossible

SFU (Selective Forwarding Unit: server only routes)
  A -> [SFU] -> B, C, D
  B -> [SFU] -> A, C, D
  ...
  Pros: lightweight server CPU, easy to scale, per-recipient quality with simulcast/SVC
  Cons: client downlink is O(N), end-to-end encryption needs Insertable Streams
```
In 2026, MCU rarely shows up in newly built group-call systems. SFU is the standard, and audience modes over 100 viewers typically combine SFU with HLS/LL-HLS or WHEP fan-out. LiveKit, Mediasoup, Janus, Jitsi Videobridge, Agora, Daily and Cloudflare Calls are all SFUs.
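A small helper makes the asymptotics concrete. This is a sketch under simplifying assumptions: every participant publishes exactly one video stream at a fixed bitrate, with no simulcast layers and no audio. The function names are hypothetical.

```javascript
// Stream counts for an N-party call in which every participant publishes one
// video stream (assumption: no simulcast layers, no audio).
function streamCounts(topology, n) {
  if (!Number.isInteger(n) || n < 2) throw new RangeError('n must be an integer >= 2')
  switch (topology) {
    case 'mesh':
      // each peer sends to n-1 peers and receives n-1 streams; there is no server
      return { up: n - 1, down: n - 1, serverIn: 0, serverOut: 0 }
    case 'mcu':
      // one stream up, one composited stream down; the server encodes n outputs
      return { up: 1, down: 1, serverIn: n, serverOut: n }
    case 'sfu':
      // one stream up; the SFU forwards the other n-1 streams to every client
      return { up: 1, down: n - 1, serverIn: n, serverOut: n * (n - 1) }
    default:
      throw new Error(`unknown topology: ${topology}`)
  }
}

// Client uplink in kbps for a fixed per-stream bitrate.
function uplinkKbps(topology, n, streamKbps) {
  return streamCounts(topology, n).up * streamKbps
}
```

At 1.5 Mbps per stream, a five-person mesh asks every client for 6 Mbps of uplink. An SFU keeps the uplink at 1.5 Mbps but sends 20 streams (30 Mbps) out of the server. That quadratic server egress is why SFU bandwidth pricing matters so much.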
4 · WebRTC NV — Where Does the Standard Stand
WebRTC 1.0 became a W3C Recommendation on 2021-01, and every new feature since has been bundled under "WebRTC NV (Next Version)", progressed through the W3C WebRTC Working Group and IETF RTCWEB. At the 2026 mark, the items that matter in practice are:
- **Insertable Streams / RTCRtpScriptTransform**: JavaScript access to encoded frames, enabling E2EE, watermarking and adaptive transcoding. Chromium has shipped it for years, originally as the `createEncodedStreams()` API. Safari (15.4+) and Firefox (117+) implement the standard `RTCRtpScriptTransform`.
- **Encoded Audio/Video Frame** — the unit those transforms operate on. `RTCEncodedAudioFrame`, `RTCEncodedVideoFrame`, with metadata.
- **AV1** — codec. About 30% bitrate savings over VP9 at the same quality. Decode on Chrome 90+, Safari 17+, Firefox 116+. Encode depends on the device; hardware encoders are still scarce so a fallback is necessary.
- **L1T3 SVC (Scalable Video Coding)** — one encode produces three temporal layers, and the SFU picks per recipient. Complements simulcast (spatial layers).
- **Selectable Audio Output (setSinkId)** — pick the output device. Stable across Chrome, Edge, Safari and Firefox.
- **WebRTC-HTTP Ingest/Egress (WHIP/WHEP)**: WHIP is IETF RFC 9725, while WHEP is still an IETF draft. The encoder POSTs its SDP once to push media (WHIP), and the viewer POSTs its SDP once to pull (WHEP). OBS Studio 30+, FFmpeg 7, Cloudflare Stream Live, AWS IVS and LiveKit all support WHIP.
- **WebCodecs and WebTransport** — adjacent standards. Decoupling encode/decode and transport, useful for games and cloud gaming where lower latency is required.
The point is that what was "experimental" through 2024 became commodity standard by 2026.
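To see what "the SFU picks per recipient" means for L1T3, here is a minimal sketch of temporal-layer forwarding. It makes three assumptions:

- The sender uses the common dyadic L1T3 pattern (T0, T2, T1, T2 repeating).
- The SFU can read each frame's temporal ID, in practice from the VP8/VP9 payload descriptor or the AV1 dependency descriptor.
- The function names are hypothetical.

On the sending side, browsers that implement the WebRTC-SVC extension request this mode with `scalabilityMode: 'L1T3'` in `sendEncodings`.

```javascript
// Temporal layer ID of frame `i` in the repeating L1T3 pattern T0, T2, T1, T2.
function temporalId(i) {
  if (i % 4 === 0) return 0 // every 4th frame: base layer
  if (i % 2 === 1) return 2 // odd frames: top layer
  return 1 // remaining even frames: middle layer
}

// Indices of the frames an SFU forwards to a recipient subscribed up to `maxLayer`.
function forwardedFrames(frameCount, maxLayer) {
  const out = []
  for (let i = 0; i < frameCount; i++) {
    if (temporalId(i) <= maxLayer) out.push(i)
  }
  return out
}

// Frame rate a recipient sees at `maxLayer` when the sender encodes at `fps`.
function layerFps(fps, maxLayer) {
  return fps / 2 ** (2 - maxLayer)
}
```

With a 30 fps source, recipients get 7.5, 15 or 30 fps depending on the layer. The SFU can drop higher layers without re-encoding because no lower-layer frame references them.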
5 · Codecs — Opus Is God, Video Is Politics
Audio has effectively no choice. The mandatory codec in the WebRTC standard is Opus. From 8kHz speech to 48kHz music, variable bitrate, low latency. Voice AI agents use Opus as-is.
Video is politics.
- **VP8** — the most universal and compatible. No license fee. Hardware acceleration is weak.
- **VP9** — about 50% savings over VP8 at the same quality. Decode on Chrome, Edge, Firefox and Safari 14+.
- **H.264 (Baseline / Constrained Baseline)** — near-universal mobile hardware acceleration. Patent fees exist but PaaS handles them.
- **H.265 / HEVC** — long licensing disputes kept it niche on WebRTC. Apple ecosystem mostly.
- **AV1** — the future. 30% more efficient. Encoding is expensive, so phased rollout with simulcast/SVC.
- **Lyra** — Google's neural speech codec. Intelligible at 3kbps. Not in the WebRTC standard but usable inside SDKs.
Recommended default in 2026: Opus only for audio, simulcast with VP9 plus H.264 for video, AV1 as an opt-in. Screen sharing benefits from VP9 or AV1 for text clarity.
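In the browser, a default like this is usually expressed in two steps. First, reorder the codecs returned by `RTCRtpReceiver.getCapabilities('video')`. Then pass the result to `RTCRtpTransceiver.setCodecPreferences()` before creating the offer. The helper below is a hypothetical sketch of the reordering step. It is a pure function so it can run outside a browser, and AV1 becomes opt-in simply by adding `'video/AV1'` to the preference list.

```javascript
// Return a copy of `codecs` (RTCRtpCodecCapability-like objects) with the
// preferred MIME types first, in the given order. Everything else (including
// rtx/red/ulpfec entries) keeps the browser's original relative order.
function orderCodecs(codecs, preferredMimeTypes) {
  const wanted = preferredMimeTypes.map((m) => m.toLowerCase())
  const rank = (codec) => {
    const i = wanted.indexOf(codec.mimeType.toLowerCase())
    return i === -1 ? wanted.length : i
  }
  // Array.prototype.sort is stable (ES2019+), so equal ranks keep their order
  return [...codecs].sort((a, b) => rank(a) - rank(b))
}

// Browser usage (not executed here):
//   const { codecs } = RTCRtpReceiver.getCapabilities('video')
//   transceiver.setCodecPreferences(orderCodecs(codecs, ['video/VP9', 'video/H264']))
```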
6 · ICE, STUN, TURN — Where Things Break Most Often
In WebRTC operations the single most frequent failure point is ICE candidate gathering. Behind corporate firewalls, carrier-grade NATs and half-deployed IPv6 networks, STUN alone is often not enough.
- **STUN** — only tells you your public IP and port. Very lightweight. Google's `stun:stun.l.google.com:19302` is, in practice, public infrastructure.
- **TURN** — relays the actual media. The server is moving real traffic, so the bandwidth bill matters. Roughly 10-25% of global call traffic ends up going through TURN.
- **TURN-TLS over port 443** — fallback for corporate firewalls that block UDP. Higher latency but raises connect rate.
- **Trickle ICE** — send candidates as soon as they are gathered. Cuts the time-to-first-packet. Long standardized but implementation quality varies.
The open-source standard is **coturn**. PaaS vendors run their own global TURN, bundle a quota for free and bill the excess, or charge separately. Cloudflare TURN (launched 2023) acted as a price disruptor and dragged other PaaS TURN prices down with it.
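The way ICE ranks candidates shows why TURN is the last resort. RFC 8445 (section 5.1.2) computes a candidate's priority as `2^24 * type preference + 2^8 * local preference + (256 - component ID)`. Its recommended type preferences are:

- 126 for host candidates
- 110 for peer-reflexive candidates
- 100 for server-reflexive (STUN) candidates
- 0 for relayed (TURN) candidates

The sketch below applies that formula. The helper names are ours, and real stacks may pick different local preferences.

```javascript
// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 }

// Candidate priority per the RFC 8445 recommended formula.
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  const typePref = TYPE_PREFERENCE[type]
  if (typePref === undefined) throw new Error(`unknown candidate type: ${type}`)
  return 2 ** 24 * typePref + 2 ** 8 * localPreference + (256 - componentId)
}

// Sort candidates highest priority first, roughly the order in which ICE prefers them.
function byPriority(candidates) {
  return [...candidates].sort((a, b) => candidatePriority(b.type) - candidatePriority(a.type))
}
```

A host candidate for RTP (component 1) comes out at 2130706431, the value often seen in browser candidate lines. A relay candidate is only 16777215, so it is chosen only when nothing better connects.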
7 · Ten Platforms in One Line
A one-line summary of the candidate set in 2026:
- **LiveKit** — open-source SFU + LiveKit Cloud + Agents SDK. Official transport for OpenAI Realtime API. Apache 2.0.
- **Daily.co** — managed SFU + Daily Bots + Pipecat glue. Smoothest experience for AI voice integration.
- **Agora** — global plus China simultaneous coverage. 4.x SDK. Sub-100ms latency average globally.
- **Twilio Video** — shut down on 2024-12-05. Recommended replacements: Zoom Video SDK, LiveKit, Daily, Vonage.
- **Pion** — Go-based open-source WebRTC stack. A library, for when you build the infrastructure yourself.
- **Mediasoup 3** — Node.js + C++ SFU library. Very strong router, for teams that want to write their own.
- **Jitsi Meet / JVB / Jicofo** — open-source meeting stack. The self-host standard. Operated by 8x8.
- **Janus Gateway** — Meetecho's modular gateway. Plugins for conferencing, streaming, recording, SIP gateway.
- **AWS IVS Real-Time** — built on Twitch infrastructure. Managed. Live streaming and multi-host.
- **Cloudflare Calls (Realtime SFU)** — 0.05 USD/GB egress. SFU on the Cloudflare edge.
The next chapters look at each platform in detail.
8 · LiveKit — The Standard Transport for OpenAI Realtime API
LiveKit is an open-source project started in 2021. When OpenAI Realtime API chose LiveKit Agents as its first official SDK in 2024, it cemented LiveKit's position as the de facto standard.
Three layers move together as one bundle.
- **LiveKit Server** — Go-based SFU. Apache 2.0. Tens of thousands of concurrent sessions per node, hundreds of thousands in a cluster. Together with ION and Mediasoup, one of the three big open-source SFUs.
- **LiveKit Cloud** — that server run as a managed service on a global edge. Workspace-level billing. Free 50GB/month, Build plan from 100 USD/month.
- **LiveKit Agents** — a voice agent framework. Bundles STT, LLM and TTS into a single pipeline and joins a LiveKit Room as a bot. Adapters for OpenAI Realtime, GPT-4o Realtime, Anthropic Claude, Google Gemini, Cartesia, ElevenLabs, Deepgram and Whisper.
The core API revolves around Room.
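```javascript
import { Room, RoomEvent } from 'livekit-client'

const room = new Room({ adaptiveStream: true, dynacast: true })

room
  .on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    // attach() returns an <audio> or <video> element; remote audio needs it too
    if (track.kind === 'video' || track.kind === 'audio') {
      document.body.appendChild(track.attach())
    }
  })
  .on(RoomEvent.ParticipantConnected, (p) => console.log('joined', p.identity))

// `token` is an access token (JWT) minted by your backend
await room.connect('wss://your.livekit.cloud', token)
await room.localParticipant.enableCameraAndMicrophone()
```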
`adaptiveStream` asks the SFU for a different simulcast layer based on the recipient's render size, and `dynacast` auto-pauses tracks no one is watching. Both directly cut cloud bandwidth bills.
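The idea behind `adaptiveStream` can be shown with a simplified layer picker. Given the simulcast layers a publisher sends and the height at which a subscriber renders the video, it requests the smallest layer that still covers that height. This is an illustrative sketch, not LiveKit's actual algorithm; the layer list and function are hypothetical.

```javascript
// Pick the smallest simulcast layer whose height covers the rendered height
// (in device pixels); fall back to the largest layer if none is big enough.
function pickLayer(layers, renderHeightCssPx, devicePixelRatio = 1) {
  if (layers.length === 0) throw new Error('no layers published')
  const needed = renderHeightCssPx * devicePixelRatio
  const sorted = [...layers].sort((a, b) => a.height - b.height)
  return sorted.find((l) => l.height >= needed) ?? sorted[sorted.length - 1]
}

// Hypothetical three-layer simulcast publication.
const LAYERS = [
  { rid: 'f', height: 720 },
  { rid: 'h', height: 360 },
  { rid: 'q', height: 180 },
]
```

A thumbnail rendered 120 px tall pulls the 180p layer instead of 720p. If every subscriber does the same, nobody consumes the publisher's top layer, and that is exactly the case `dynacast` exploits by pausing it.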
9 · Daily.co, Daily Bots and Pipecat — Smoothest Stack for AI Voice
Daily has been selling managed WebRTC since 2016. In 2024 it bundled Daily Bots and the open-source Pipecat framework to claim the "easiest place to build AI calls" position.
- **Daily SDK** — JS, iOS, Android, React Native. `daily-js` drops an embeddable call UI in a single line.
- **Daily Bots** — managed server-side bots. Bundles LLM, STT and TTS to join a call.
- **Pipecat** — open source. Connects STT, LLM and TTS as a node graph. Daily built it but it is transport-independent and runs over LiveKit, Twilio Programmable Voice and the PSTN too.
The canonical one-liner:
```javascript
import DailyIframe from '@daily-co/daily-js'

const call = DailyIframe.createFrame({ url: 'https://yourdomain.daily.co/room' })
call.join()
```
`createFrame` creates an iframe and embeds Daily's UI wholesale. When UI customization is required you drop down to `createCallObject` mode and control every track yourself.
Pricing is per-minute, with Free 10,000 minutes/month and a Scale plan from 600 USD/month plus usage. Unlike most PaaS vendors, the meter is participants and minutes rather than bandwidth, which makes forecasting easier.
10 · Agora — Effectively the Only Choice for Global Plus China
Agora, founded in 2014 with headquarters in Shanghai and Silicon Valley, runs its own global network called SD-RTN (Software-Defined Real-Time Network). The 4.x SDK advertises a global average first-frame time of 76ms and sub-100ms global e2e latency under 5G.
Anyone with workloads inside China seriously considers Agora at this point. Other PaaS vendors struggle to operate reliably on the mainland and lack experience with regulatory items like ICP. Agora has the most.
The API is a Channel model.
```javascript
import AgoraRTC from 'agora-rtc-sdk-ng'

// appId, channel, token and uid come from your Agora project and token server
const client = AgoraRTC.createClient({ mode: 'rtc', codec: 'vp9' })
await client.join(appId, channel, token, uid)
const [mic, cam] = await Promise.all([
  AgoraRTC.createMicrophoneAudioTrack(),
  AgoraRTC.createCameraVideoTrack({ encoderConfig: '1080p_1' }),
])
await client.publish([mic, cam])
```
Pricing is per-minute and per-quality. 1080p video runs about 0.0099 USD/minute, HD at 0.00399. Global and China routing is billed separately. The matrix is more complex than other PaaS vendors but the volume discount is generous.
11 · The Twilio Video Shutdown and Migration Paths
On 2024-03-04 Twilio closed new sign-ups for Programmable Video. On 2024-12-05, EOL. The product that was the de facto WebRTC PaaS for five years died — Twilio Voice/Messaging survived, but Video did not.
The official migration guide pointed at Zoom Video SDK. Zoom produced migration packages and code-conversion tools. In reality the market dispersed further.
- **Zoom Video SDK** — closest API shape to Twilio's. Per-minute pricing. Best migration tooling.
- **LiveKit Cloud** — for open-source-friendly teams. Official guide mapping Twilio Programmable Video almost 1:1 to LiveKit Room.
- **Daily.co** — for teams that want to pivot quickly to AI voice.
- **Vonage Video API** (formerly OpenTok) — the oldest PaaS. Abstraction level similar to Twilio's.
- **Dolby.io Communications** — for teams that need spatial audio or high-fidelity audio.
2025 was the year of migration, and market data through 2026 shows traffic split across those five destinations.
12 · Pion — A WebRTC Stack Written in Go
Pion is a Go-based WebRTC library Sean DuBois started in 2018. Not a managed service — a "way to write WebRTC directly in Go". The first place a team that has decided to implement WebRTC themselves looks.
Notes.
- **Modular** — DTLS, SRTP, ICE, SCTP, RTP and RTCP are separate packages. Pull only what you need.
- **Go's concurrency model** — handling tens of thousands of PeerConnections on a single node is straightforward.
- **Production adoption** — Twitch Cloud Game, Hopin, WeWork, parts of LiveKit.
In the Rust camp there is **WebRTC-rs**, a port that deliberately keeps an API close to Pion's, and in C++ there is Google's libwebrtc reference implementation. Anyone building a managed SFU from scratch reasonably picks Pion, WebRTC-rs or libwebrtc.
13 · Mediasoup 3 — The Essence of the Router/Worker Model
Mediasoup is a Node.js + C++ SFU library created by Iñaki Baz Castillo and José Luis Millán of Versatica, the team behind JsSIP. It is not a managed service but a library, and it is what teams that intend to build their own SFU reach for most often.
- **C++ Worker process** — one process pins one CPU core. The media-routing hot path.
- **Node.js Router object** — JS abstraction over Worker, managing rooms, peers and transports.
- **Pipe Transport** — media delivery between Routers. The standard pattern for multi-node scale.
- **DataProducer / DataConsumer** — SCTP-on-DataChannel routed through the SFU.
Representative users include Atlassian, Whereby and Versatica, the project's own maintainer. The usual pattern is that teams without the resources to build a full managed product from scratch build their own SFU on top of Mediasoup and host their business rooms on it.
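The one-worker-per-core model usually comes with a small scheduler that decides which worker hosts a new room's Router. The sketch below is a hypothetical least-loaded picker over plain objects. In a real deployment, each slot would hold a worker created with `mediasoup.createWorker()`, and a room would get its Router from `worker.createRouter({ mediaCodecs })`. Treating "load" as the number of rooms is an assumption; production schedulers often weigh consumer counts or CPU usage instead.

```javascript
const os = require('node:os')

// One scheduling slot per CPU core, mirroring one mediasoup worker per core.
function createPool(coreCount = Math.max(1, os.cpus().length)) {
  return Array.from({ length: coreCount }, (_, id) => ({ id, rooms: new Set() }))
}

// Assign a room to the least-loaded slot (fewest rooms); reuse an existing assignment.
function assignRoom(pool, roomId) {
  if (pool.length === 0) throw new Error('empty worker pool')
  const existing = pool.find((slot) => slot.rooms.has(roomId))
  if (existing) return existing.id
  let best = pool[0]
  for (const slot of pool) {
    if (slot.rooms.size < best.rooms.size) best = slot
  }
  best.rooms.add(roomId)
  return best.id
}
```

Rooms that outgrow a single worker are where the Pipe Transport comes in: a second Router on another worker receives the media over a pipe.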
14 · Jitsi Meet — The Self-Hosting Standard
Jitsi started at BlueJimp, was acquired by Atlassian in 2015 and moved to 8x8 in 2018. It is the most frequently chosen bundle for self-hosted meetings.
- **Jitsi Videobridge (JVB)** — Java-based SFU. Hundreds of participants per node, thousands with cascading.
- **Jicofo (Jitsi Conference Focus)** — XMPP-based signaling and conference focus.
- **Prosody** — XMPP server.
- **Jitsi Meet** — React frontend. meet.jit.si is the reference instance.
- **Jibri** — recording and live-streaming bot. Spins up Chromium and captures the screen.
The canonical self-host is the docker-jitsi-meet bundle. Spins up in one command on a single VM. Most often seen in government, healthcare and education where security demands are high.
15 · Janus Gateway — The Essence of the Modular Gateway
Janus is a C-based WebRTC gateway from the Italian company Meetecho. The structure swaps SFU, MCU, SIP gateway, recording, streaming and NoSIP modes by loading plugins. While Jitsi specializes in meetings, Janus is closer to a "generic gateway that attaches WebRTC to anything".
Key plugins.
- **videoroom** — SFU-mode group calls.
- **streaming** — accept RTSP/RTP and fan out as WebRTC. Camera and OBS live distribution.
- **sip** — SIP gateway. Pulls PSTN calls into WebRTC.
- **recordplay** — record and playback.
- **textroom** — DataChannel-based chat rooms.
Discord voice channels in the 2017-2020 era, early Slack Huddle, and parts of Microsoft Teams reportedly used Janus or a fork. Maximum flexibility, in exchange for operational complexity.
16 · AWS IVS Real-Time and Cloudflare Calls
The two newest axes in managed SFU came from cloud providers.
**AWS IVS (Interactive Video Service)** is Twitch infrastructure rebranded as an AWS product. It launched as a live-streaming-only service in 2020 and added IVS Real-Time on 2023-08 with multi-host (stage) capabilities.
- **Stage** — up to 12 hosts in real-time bidirectional. Broadcast live to the audience.
- **Channel** — one-way live streaming. LL-HLS at sub-5-second latency.
- **Composition** — server-side composition into a single video for downstream broadcast.
- **WHIP** support — push directly from OBS, FFmpeg or your own encoder.
Pricing for Stage is 0.0149 USD/minute/participant. Channel breaks down into viewing-time and encoding-time.
**Cloudflare Calls (Realtime SFU)** launched in beta on 2023-11 and went GA on 2024-09. The single-line price was a shock.
- **0.05 USD/GB** egress — 5x to 10x cheaper than other PaaS.
- Global edge — uses Cloudflare's 300-plus PoPs directly as the SFU.
- **TURN over 443** included free. Cloudflare TURN is also sold standalone.
That price pressure pulled down every other PaaS price sheet. Cloudflare also runs WebRTC WHIP/WHEP gateways on top of Calls.
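The difference between bandwidth-metered and minute-metered pricing is easier to see with numbers. The sketch below converts a subscriber's bitrate and watch time into GB of egress at the 0.05 USD/GB figure quoted above. It then compares that with a per-participant-minute rate such as the 0.0149 USD figure quoted for IVS Stage. It ignores free tiers, TURN, audio and protocol overhead, and the two products meter different things. Treat it as back-of-the-envelope arithmetic, not a quote.

```javascript
// Egress volume in decimal GB for one subscriber receiving `kbps` for `minutes`.
function egressGB(kbps, minutes) {
  const bits = kbps * 1000 * minutes * 60
  return bits / 8 / 1e9
}

// Cost when billed per GB of egress.
function perGBCost(kbps, minutes, usdPerGB) {
  return egressGB(kbps, minutes) * usdPerGB
}

// Cost when billed per participant-minute.
function perMinuteCost(minutes, usdPerMinute) {
  return minutes * usdPerMinute
}
```

Take a 1 Mbps stream watched for an hour. That is 0.45 GB, or about 0.02 USD per viewer at 0.05 USD/GB, versus about 0.89 USD at 0.0149 USD per participant-minute.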
17 · OpenAI Realtime API, Claude Voice, Cartesia and ElevenLabs — The Transport for AI Voice
In October 2024 OpenAI shipped the Realtime API. The core change was that GPT-4o now hears and speaks audio directly, rather than through a text bridge. Before this, voice agents were a three-model serial pipeline (STT, LLM, TTS); Realtime collapsed those three stages into a single GPT-4o.
LiveKit was decisive here. OpenAI built its official SDK on LiveKit Agents, and as a result LiveKit became the de facto standard transport. Direct WebSocket connections are possible, but for production almost everyone runs LiveKit Agents on top of an SFU.
Competitors that appeared around the same time.
- **Anthropic Claude voice** — voice agents on top of Claude 3.5/3.7. Adapters for Pipecat and LiveKit Agents.
- **Google Gemini Live API** — announced at 2024 I/O. Multimodal bidirectional streaming.
- **Cartesia Sonic** — TTS. 90ms first-token latency. Most frequently used to shave response latency off voice agents.
- **ElevenLabs Conversational AI** — managed STT, LLM, TTS and turn-taking. Own transport or Twilio/LiveKit connectors.
- **Deepgram Aura** — TTS. 200ms latency.
- **Whisper and Whisper-Large-v3** — STT. Open-source foundation.
With WebRTC P95 e2e latency under 200ms, the 2026 baseline goal for AI voice is total user-perceived latency (end of human turn to start of agent response) under 500ms. The combination that made that goal reachable is exactly the WebRTC standard plus modern AI models.
18 · WHIP and WHEP — Live Streaming Moves onto WebRTC
For years live streaming meant the RTMP-push and HLS-pull pair. Encoders like OBS pushed to a media server over RTMP and viewers pulled with HLS. Latency was typically 6 to 30 seconds.
IETF started standardizing WHIP (WebRTC-HTTP Ingest Protocol) and WHEP (WebRTC-HTTP Egress Protocol) in 2022 to replace that pair. The key is simplicity.
- **WHIP** — encoder POSTs SDP over HTTP. Server responds with an SDP answer. The rest is plain WebRTC.
- **WHEP** — viewer POSTs SDP over HTTP. The rest is WebRTC.
That single POST unified ingest and egress on top of the same WebRTC stack. Latency drops to the sub-second range.
State of support in 2026.
- **OBS Studio 30+** — WHIP output supported by default.
- **FFmpeg 7+** — WHIP muxer.
- **Cloudflare Stream Live, AWS IVS, Mux, Daily, LiveKit, Wowza, Ant Media** — all support WHIP/WHEP.
- Browsers can implement WHEP without a special SDK, just `RTCPeerConnection` and `fetch`.
RTMP is not dead, but the standards crowd's consensus is that by 2030 new systems will almost all be on WHIP/WHEP.
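As a sketch of the browser point above, here is a minimal non-trickle WHEP viewer. It assumes the flow described above:

- POST the SDP offer with `Content-Type: application/sdp`.
- Expect `201 Created`, with the SDP answer in the body and a `Location` header naming the session resource.
- Later, DELETE that resource to stop watching.

The endpoint URL is a placeholder. `pc` and `fetchImpl` are injectable only so the function can be exercised without a browser; in a page you would pass `new RTCPeerConnection()` and the global `fetch`.

```javascript
// Resolve once ICE gathering is complete so the offer carries all candidates (no trickle).
function iceGatheringComplete(pc) {
  if (pc.iceGatheringState === 'complete') return Promise.resolve()
  return new Promise((resolve) => {
    const onChange = () => {
      if (pc.iceGatheringState === 'complete') {
        pc.removeEventListener('icegatheringstatechange', onChange)
        resolve()
      }
    }
    pc.addEventListener('icegatheringstatechange', onChange)
  })
}

// Subscribe to a WHEP endpoint; returns the absolute URL of the session resource.
async function whepSubscribe(endpointUrl, pc, fetchImpl = fetch) {
  // receive-only: the viewer sends no media of its own
  pc.addTransceiver('video', { direction: 'recvonly' })
  pc.addTransceiver('audio', { direction: 'recvonly' })
  await pc.setLocalDescription(await pc.createOffer())
  await iceGatheringComplete(pc)

  const res = await fetchImpl(endpointUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/sdp' },
    body: pc.localDescription.sdp,
  })
  if (res.status !== 201) throw new Error(`WHEP endpoint returned ${res.status}`)
  await pc.setRemoteDescription({ type: 'answer', sdp: await res.text() })

  // the Location header identifies the session; DELETE it to stop watching
  const location = res.headers.get('Location')
  if (!location) throw new Error('WHEP response has no Location header')
  return new URL(location, endpointUrl).href
}
```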
19 · The Korean and Japanese RTC Markets
Korea and Japan are markets where strong local vendors exist on top of the global PaaS.
**Korea — NHN TalkN, NCP Real-Time Comms, KakaoTalk Voice**
- **NHN TalkN** — NHN Cloud's managed WebRTC. Frequently chosen in gaming and education. Strong on domestic data-residency requirements.
- **Naver Cloud Platform Real-Time Comms** — Naver Cloud's managed SFU. Strong on Korean KISA and VoIP certifications.
- **KakaoTalk voice/video** — proprietary stack. No public SDK, but pieces are exposed in Kakao i BizCall and KakaoWork.
- **Zoom, Google Meet, Microsoft Teams** — the Korean enterprise collaboration standard. All ride on WebRTC.
**Japan — Skyway (NTT Communications), Yahoo!Japan, NTT-X**
- **Skyway** — operated by NTT Communications. The oldest domestic PaaS since 2014. v2 launched in 2023 with a fully revamped SDK. Strong on domestic data centers and Japanese documentation.
- **NTT Communications Smart Workspace** — a collaboration SaaS layered on top of Skyway.
- **LINE voice/video** — proprietary stack. Integrated into LINE WORKS.
- **Yahoo!Japan** — proprietary meeting system.
Japan's Skyway is solid enough that global PaaS share is lower than in Korea. Korea has seen faster penetration by global PaaS — Zoom, Google Meet, Agora and LiveKit in particular.
20 · Decision Matrix — Where to Use What
Recommended choices by scenario, May 2026.
- **1:1 video calls — SaaS where UI freedom matters most** → LiveKit Cloud or Daily.co. Full UI customization is easy.
- **Group meetings — Zoom-class features from day one** → Zoom Video SDK is the fastest ramp-up. UI freedom is limited.
- **Global plus China simultaneous coverage required** → Agora is effectively the only choice.
- **Self-hosting, government, healthcare, education** → Jitsi Meet or Janus Gateway. Data does not leave the premises.
- **Live streaming plus multi-host** → AWS IVS Real-Time. Twitch-friendly workflow.
- **Price-sensitive large audiences** → Cloudflare Calls plus WHEP. 0.05 USD/GB.
- **AI voice agents** → LiveKit Agents plus OpenAI Realtime or Claude voice plus Cartesia/ElevenLabs. De facto standard.
- **Want to build your own SFU** → Mediasoup 3 (Node.js) or Pion (Go) or ION.
- **Domestic data residency — Korea** → NHN TalkN or NCP Real-Time Comms.
- **Domestic data residency — Japan** → Skyway v2.
This matrix is not a universal answer. Pricing, the team's language stack, operations staffing, data governance and government certifications are all variables. It is, however, enough to narrow the candidate set.
21 · Operations Traps — Where Things Actually Break
Whether managed or self-hosted, the operational failure points look similar.
- **First connection fails** — TURN configuration. Corporate networks block UDP and TURN-TLS over 443 is often required.
- **Quality regression** — adaptive simulcast disabled. If the sender's uplink is narrow and auto-downlayer is off, everyone breaks together.
- **Missing recordings** — server-side recording SDK permissions missing, or container disk full.
- **Audio echo** — `echoCancellation: true` not actually applied on some devices. Forcing AEC3 helps.
- **Mobile background drops** — iOS Safari background restrictions, especially in PWAs. The standard is gradually relaxing.
- **CPU overruns** — software VP9/AV1 encoding pinning mobile CPU. Fall back to H.264 simulcast.
- **Clock drift** — bad NTP breaks RTCP. Common in containers.
- **TURN cost overruns** — when TURN usage crosses 25%, revisit the cost model.
The standard operations runbook is to always keep four graphs visible: P95 latency, connect failure rate, TURN usage and per-device encoder fallback rate.
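Of those four graphs, TURN usage can be measured directly in the browser. `RTCPeerConnection.getStats()` returns a Map-like report where the chain runs as follows:

- The `transport` entry's `selectedCandidatePairId` points at the active `candidate-pair`.
- That pair's `localCandidateId` points at a `local-candidate` entry.
- The local candidate's `candidateType` is `host`, `srflx`, `prflx` or `relay`.

The helpers below are a hypothetical sketch that follows this chain in any Map-shaped report and aggregates a relay share across sessions. Browser coverage of these fields varies, hence the fallback to a nominated, succeeded pair. The 25% threshold is the one from the list above.

```javascript
// candidateType of the local side of the active candidate pair, or null if unknown.
function activeLocalCandidateType(report) {
  let pair = null
  for (const stat of report.values()) {
    if (stat.type === 'transport' && stat.selectedCandidatePairId) {
      pair = report.get(stat.selectedCandidatePairId)
      break
    }
  }
  if (!pair) {
    // fallback when no transport entry names the selected pair
    for (const stat of report.values()) {
      if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
        pair = stat
        break
      }
    }
  }
  const local = pair && report.get(pair.localCandidateId)
  return local ? local.candidateType : null
}

// Share of sessions (one stats report each) whose active local path is a TURN relay.
function relayShare(reports) {
  const types = reports.map(activeLocalCandidateType).filter((t) => t !== null)
  if (types.length === 0) return 0
  return types.filter((t) => t === 'relay').length / types.length
}

const TURN_ALERT_THRESHOLD = 0.25
```

In a page, `relayShare([await pc.getStats()])` gives the value for one connection. Aggregating the per-session results on your backend gives the fleet-wide graph.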
22 · Security — DTLS-SRTP, E2EE, Workflow
The default security of WebRTC is strong. All media is encrypted with DTLS-SRTP and the keys are not in the SDP. The standard itself has no "encryption off" toggle.
The catch is the SFU. Routing media requires unwrapping DTLS-SRTP once, which means the SFU operator can see plaintext media. For genuine end-to-end encryption you encrypt once more before the SFU using **Insertable Streams / RTCRtpScriptTransform**.
- **Jitsi E2EE** — Insertable Streams based. A single key shared by all participants.
- **Google Meet E2EE** — 1:1 and small group only. Same mechanism.
- **Zoom E2EE** — opt-in. Disables some features (recording, dial-in).
- **LiveKit E2EE** — Insertable Streams plus Web Crypto. Built into the client library.
Turning E2EE on disables server-side recording, server-side captions and SFU transcoding all at once. That trade-off needs to be designed in from the start.
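Here is a minimal sketch of the "encrypt once more before the SFU" step. It assumes a key already shared among participants; key exchange is out of scope. Each encoded frame is encrypted with AES-GCM. A few leading bytes stay in the clear so the SFU can still read the codec header, and those bytes are authenticated as additional data. The frame layout (clear header, ciphertext, 12-byte IV at the end) is a hypothetical choice, not any product's wire format. It uses Node's WebCrypto so it runs standalone. In a browser, the same `crypto.subtle` calls would run inside the `RTCRtpScriptTransform` worker on each frame's `data`.

```javascript
const { webcrypto } = require('node:crypto')
const { subtle } = webcrypto
const IV_BYTES = 12

// Encrypt one encoded frame. The first `clearBytes` bytes stay readable (so an SFU
// can still parse the codec header) but are authenticated as AES-GCM additional data.
// Output layout (hypothetical): [clear header][ciphertext + tag][12-byte IV].
async function encryptFrame(key, frame, clearBytes) {
  const iv = webcrypto.getRandomValues(new Uint8Array(IV_BYTES))
  const header = frame.subarray(0, clearBytes)
  const sealed = new Uint8Array(
    await subtle.encrypt({ name: 'AES-GCM', iv, additionalData: header }, key, frame.subarray(clearBytes)),
  )
  const out = new Uint8Array(clearBytes + sealed.length + IV_BYTES)
  out.set(header, 0)
  out.set(sealed, clearBytes)
  out.set(iv, clearBytes + sealed.length)
  return out
}

// Reverse of encryptFrame; rejects if the header, ciphertext or tag was altered.
async function decryptFrame(key, data, clearBytes) {
  const header = data.subarray(0, clearBytes)
  const iv = data.subarray(data.length - IV_BYTES)
  const sealed = data.subarray(clearBytes, data.length - IV_BYTES)
  const plain = new Uint8Array(
    await subtle.decrypt({ name: 'AES-GCM', iv, additionalData: header }, key, sealed),
  )
  const out = new Uint8Array(clearBytes + plain.length)
  out.set(header, 0)
  out.set(plain, clearBytes)
  return out
}
```

The SFU can still route these frames because the header bytes it reads are untouched. It cannot decode the payload, which is exactly why server-side recording and captions stop working.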
Other items to mind.
- **JWT token expiry** — keep room entry token TTL short.
- **TURN credential rotation** — static credentials are risky. Use the ephemeral credential standard.
- **Watermarking** — user-ID watermarks on screen share. Doable with RTCRtpScriptTransform.
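The ephemeral credential scheme most TURN deployments use is the "TURN REST API" convention, which coturn implements through its `use-auth-secret` / `static-auth-secret` options. It works like this:

- The username is an expiry Unix timestamp joined to a user ID with a colon.
- The password is the Base64 HMAC-SHA1 of that username.
- The HMAC key is a secret shared between your backend and the TURN server.

Below is a minimal Node sketch; the secret, user ID and TTL are hypothetical.

```javascript
const crypto = require('node:crypto')

// Mint short-lived TURN credentials for `userId`, valid for `ttlSeconds`.
// `sharedSecret` must match the TURN server's configured static auth secret.
function turnCredentials(userId, sharedSecret, ttlSeconds = 3600, nowMs = Date.now()) {
  const expiry = Math.floor(nowMs / 1000) + ttlSeconds
  const username = `${expiry}:${userId}`
  const credential = crypto.createHmac('sha1', sharedSecret).update(username).digest('base64')
  return { username, credential, expiresAt: expiry }
}

// Hypothetical usage: hand these to the client as an RTCIceServer entry.
//   const { username, credential } = turnCredentials('alice', process.env.TURN_SECRET)
//   iceServers: [{ urls: 'turn:turn.example.com:3478', username, credential }]
```

Because the TURN server can recompute the HMAC itself, nothing has to be stored per user. Once the timestamp passes, the credential simply stops working.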
23 · The Future — WebTransport, QUIC, Cloud Gaming Adjacent
Other standards are growing next to WebRTC, solving similar problems differently.
- **WebTransport** — bidirectional, optionally-reliable transport over QUIC. The successor to WebSocket and WebRTC DataChannel. Chrome, Edge, Firefox, Safari 26+.
- **WebCodecs** — codecs exposed to the browser, decoupled from transport. Useful in games where you build your own routing.
- **Media over QUIC (MoQ)** — IETF working group standardizing live media over QUIC. Target standardization 2027-2028.
- **HTTP/3 multiplexing** — collapse signaling onto a single connection too.
- **Cloud gaming — NVIDIA GeForce NOW, Xbox Cloud Gaming** — combinations of WebRTC, WebCodecs and WebTransport.
WebRTC is not going away. The recognition that WebRTC is heavy for certain workloads has hardened over five years, and WebTransport and MoQ have grown alongside it. By around 2030 the likely lineup is calls on WebRTC, live and games and messaging on WebTransport and MoQ.
24 · Closing — One-Line Recommendation
If a team starting fresh in 2026 asked for a one-line recommendation:
- **AI voice agents** — LiveKit Cloud plus LiveKit Agents plus OpenAI Realtime or Claude voice.
- **Human group meetings** — Daily.co or LiveKit Cloud. The best balance of UI freedom and price.
- **Global plus China** — Agora.
- **Self-host** — Jitsi Meet or LiveKit Server.
- **Live streaming plus multi-host** — AWS IVS Real-Time.
- **Cheap mass audiences** — Cloudflare Calls.
- **Domestic data** — Korea NHN TalkN, Japan Skyway.
The standards are stable. The tools are varied enough. What is left is to pick the bundle that fits the workload and keep the P95 and ops graphs on screen.
References
- W3C WebRTC 1.0: https://www.w3.org/TR/webrtc/
- W3C WebRTC NV Use Cases: https://www.w3.org/TR/webrtc-nv-use-cases/
- IETF RTCWEB Working Group: https://datatracker.ietf.org/wg/rtcweb/about/
- WHIP RFC 9725: https://datatracker.ietf.org/doc/rfc9725/
- WHEP draft: https://datatracker.ietf.org/doc/draft-ietf-wish-whep/
- LiveKit: https://livekit.io/
- LiveKit Agents docs: https://docs.livekit.io/agents/
- Daily.co: https://www.daily.co/
- Pipecat: https://www.pipecat.ai/
- Agora: https://www.agora.io/
- Twilio Programmable Video EOL: https://www.twilio.com/en-us/changelog/programmable-video-eol
- Pion: https://github.com/pion/webrtc
- WebRTC-rs: https://github.com/webrtc-rs/webrtc
- Mediasoup: https://mediasoup.org/
- Jitsi Meet: https://jitsi.org/
- Janus Gateway: https://janus.conf.meetecho.com/
- AWS IVS Real-Time: https://aws.amazon.com/ivs/
- Cloudflare Calls: https://developers.cloudflare.com/calls/
- OpenAI Realtime API: https://platform.openai.com/docs/guides/realtime
- Anthropic Voice: https://www.anthropic.com/
- Cartesia Sonic: https://cartesia.ai/
- ElevenLabs Conversational AI: https://elevenlabs.io/conversational-ai
- coturn: https://github.com/coturn/coturn
- Skyway (NTT): https://skyway.ntt.com/
- NHN TalkN: https://www.toast.com/service/realtime/talkn
- Naver Cloud Real-Time Comms: https://www.ncloud.com/product/media/rtc