How KubeVirt Runs VMs on Pods

Introduction
The Mental Model to Grasp First
Why Use Pods as VM Execution Units
What Actually Runs Inside the Pod
The Boundary Between Kubernetes and KubeVirt
Where This Structure Appears in Source Code
The Core Mechanisms That Make "VMs on Pods" Work
Common Misconceptions
Debugging Checkpoints Operators Can Use Immediately
Conclusion

Introduction

What puzzles most people when they first meet KubeVirt is this: "Kubernetes is a container orchestrator, so how can it run VMs on Pods?" Answering that question starts with clearing up a misconception. KubeVirt does not build VM functionality into kubelet. Instead, it separates what Kubernetes already does well from what the virtualization stack does well.

What Kubernetes handles: API storage, scheduling, Pod networking, volume mounting, node placement, retries
What KubeVirt handles: VM-related CRDs, VM-specific controllers, node agents, libvirt and QEMU orchestration
What Linux and the hypervisor handle: cgroup, namespace, tap, netlink, /dev/kvm, QEMU virtualization execution

In other words, KubeVirt is not about "emulating VMs on top of the container runtime." More precisely, it uses a Pod as the sandbox and control unit for VM execution, and runs QEMU and libvirt inside that Pod.

The Mental Model to Grasp First

KubeVirt's structure can be summarized in five steps:

The user creates a VirtualMachine or VirtualMachineInstance.
virt-controller sees this and creates a virt-launcher Pod.
Kubernetes schedules that Pod like any ordinary Pod.
The virt-handler on the assigned node sees the Pod and VMI and directs the actual VM launch.
libvirt and QEMU inside the virt-launcher Pod execute the VM process.

The important point is that the VM is "packaged" inside a Pod; the guest operating system does not become a container. The guest OS still runs on virtual hardware provided by QEMU. What changes is that the QEMU process executes within the Pod's resource boundaries.
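To make step one concrete, here is a minimal sketch of what a VirtualMachineInstance object might look like, built as a plain Python dict and printed as JSON. The field layout follows the `kubevirt.io/v1` API as commonly documented; the VMI name, disk name, and image reference are placeholders invented for illustration, and `minimal_vmi` is a hypothetical helper, not part of KubeVirt.

```python
import json


def minimal_vmi(name: str, memory: str, image: str) -> dict:
    """Build a minimal VirtualMachineInstance manifest as a plain dict.

    Assumption: field names follow the kubevirt.io/v1 schema. The values
    passed in (name, image) are illustrative placeholders.
    """
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstance",
        "metadata": {"name": name},
        "spec": {
            "domain": {
                "resources": {"requests": {"memory": memory}},
                "devices": {
                    # Guest-visible disk; its name must match a volume below.
                    "disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}],
                    # Guest NIC bound to the Pod network via masquerade.
                    "interfaces": [{"name": "default", "masquerade": {}}],
                },
            },
            # The network backing the interface: the Pod's own network.
            "networks": [{"name": "default", "pod": {}}],
            # The volume backing the disk: an image shipped as a container disk.
            "volumes": [{"name": "rootdisk", "containerDisk": {"image": image}}],
        },
    }


vmi = minimal_vmi("demo-vmi", "1Gi", "registry.example.com/demo/disk:latest")
print(json.dumps(vmi, indent=2))
```

Nothing in this object mentions a Pod. The launcher Pod is derived from it by virt-controller, which is why the user-facing API can stay VM-shaped.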

KubeVirt's official architecture documents docs/architecture.md and docs/components.md also clearly show this structure. There, KubeVirt is described as an additional control plane and node agent set layered on top of Kubernetes.

Why Use Pods as VM Execution Units

This design is practical because KubeVirt does not have to rebuild a scheduler, volume attach logic, or a network allocator of its own.

1. No Need to Recreate Scheduling

VMs ultimately need to be placed on a node. Kubernetes already places Pods well based on resource requests, affinity, taints, topology spread, and priority. KubeVirt reuses this capability.

2. No Need to Create Networking from Scratch

In the default model, the virt-launcher Pod first receives a network through CNI. Then KubeVirt performs additional wiring such as bridge, masquerade, and TAP within that Pod's network namespace to attach guest NICs.

3. No Need to Create Storage from Scratch

Resources such as PVCs, DataVolumes, container disks, secrets, and config maps are all delivered to virt-launcher through the Pod volume model. KubeVirt then attaches them to the guest as disk images or block devices.
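The link between a Pod-level volume and a guest-visible disk is made by name in the VMI spec: each entry under `spec.domain.devices.disks` refers to an entry under `spec.volumes`. The sketch below checks that pairing on a plain dict. `unmatched_disks` is a hypothetical helper written for this article, and the PVC and image names are placeholders.

```python
def unmatched_disks(vmi_spec: dict) -> list:
    """Return the names of disks that have no volume with the same name.

    vmi_spec is the 'spec' section of a VirtualMachineInstance as a dict.
    """
    volume_names = {volume["name"] for volume in vmi_spec.get("volumes", [])}
    disks = vmi_spec.get("domain", {}).get("devices", {}).get("disks", [])
    return [disk["name"] for disk in disks if disk["name"] not in volume_names]


spec = {
    "domain": {
        "devices": {
            "disks": [
                {"name": "rootdisk", "disk": {"bus": "virtio"}},
                {"name": "data", "disk": {"bus": "virtio"}},
            ]
        }
    },
    "volumes": [
        # Placeholder image reference, for illustration only.
        {"name": "rootdisk", "containerDisk": {"image": "registry.example.com/demo/disk:latest"}},
    ],
}

print(unmatched_disks(spec))  # the "data" disk has no backing volume yet

# Back the second disk with a PVC (placeholder claim name).
spec["volumes"].append({"name": "data", "persistentVolumeClaim": {"claimName": "demo-pvc"}})
print(unmatched_disks(spec))
```

The volume side is what Kubernetes understands and mounts into the Pod; the disk side is what the guest sees. KubeVirt sits between the two.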

What Actually Runs Inside the Pod

The core Pod is the virt-launcher Pod. As a rule of thumb, think of it as "one Pod per VM." The important processes inside this Pod are:

virt-launcher
A libvirt control daemon (libvirtd or virtqemud)
QEMU
Optionally sidecars or hook containers

Users typically say "a VM is running," but from the node's perspective, a QEMU process is actually executing within the Pod's cgroup and namespace. This is the most practical reason KubeVirt can run VMs on Pods.

docs/components.md describes the purpose of the virt-launcher Pod as "providing cgroups and namespaces for the VMI process." That wording matters: the Pod here is not just a deployment unit but the execution boundary of the VM.

The Boundary Between Kubernetes and KubeVirt

The part of KubeVirt that confuses people most often is "who is responsible for what."

What Kubernetes Continues to Be Responsible For

Pod scheduling
Volume mount preparation
Pod network attach
Container lifecycle
Node state reflection

What KubeVirt Additionally Takes Responsibility For

Providing VM-related API types
Converting VM spec to launcher Pod spec
Coordinating VM process lifecycle on nodes
Network binding and DHCP assistance for guests
Live migration orchestration

What libvirt and QEMU Are Responsible For

Domain XML interpretation
Virtual hardware model configuration
CPU, memory, disk, NIC virtualization
Live migration data transfer

Because this separation is well-designed, KubeVirt can attach VM workloads without forking Kubernetes or making large-scale modifications to kubelet.

Where This Structure Appears in Source Code

The core packages you will see repeatedly throughout this series are:

staging/src/kubevirt.io/api/core/v1
pkg/virt-controller/watch
pkg/virt-handler
pkg/virt-launcher/virtwrap
pkg/network

Summarizing each layer in one line:

staging/src/kubevirt.io/api/core/v1: VM-related CRD schemas
pkg/virt-controller/watch: cluster-wide reconcile logic
pkg/virt-handler: per-node VM agent
pkg/virt-launcher/virtwrap: libvirt, QEMU control
pkg/network: code connecting Pod network to guest NIC

The Core Mechanisms That Make "VMs on Pods" Work

Let's return to the original question. How can VM functionality be implemented on top of Pods at all?

Three things make it work:

First, a Pod Is Already a Process Isolation Boundary

A Pod provides boundaries such as the network namespace, mount namespace, PID namespace, and cgroups. QEMU is ultimately just a Linux process, so it can run inside these boundaries.
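These boundaries are visible from inside any Linux process through `/proc`. The following sketch parses the `hierarchy-ID:controller-list:cgroup-path` lines of `/proc/<pid>/cgroup` and lists the namespace links under `/proc/<pid>/ns`. It is a generic Linux illustration rather than KubeVirt code; both function names are hypothetical, and the sample cgroup path is invented.

```python
import os


def parse_cgroup(text: str) -> list:
    """Parse the content of /proc/<pid>/cgroup.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    On cgroup v2 the line looks like '0::/some/path'.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path may contain colons.
        hierarchy_id, controllers, path = line.strip().split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def namespaces_of(pid: str = "self") -> dict:
    """Map namespace type (net, mnt, pid, ...) to its link target.

    Returns an empty dict where /proc/<pid>/ns is not available.
    """
    ns_dir = os.path.join("/proc", pid, "ns")
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


# Invented sample line in cgroup v2 format, for illustration only.
sample = "0::/kubepods.slice/demo-pod.slice/demo-container.scope\n"
print(parse_cgroup(sample))
print(namespaces_of())
```

Run against the PID of a QEMU process on a node, the same two views would show the cgroup and namespaces of the launcher Pod rather than those of the host, which is the sense in which the Pod is the VM's execution boundary.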

Second, Host Capabilities Like `/dev/kvm` Can Be Exposed to Pods

Hardware-accelerated virtualization requires access to the KVM device. KubeVirt makes that device, along with the permissions needed to use it, available to virt-launcher, so the guest runs with hardware acceleration instead of falling back to pure software emulation.
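Whether a process can actually use KVM comes down to whether the device node exists and is readable and writable from where that process runs. The sketch below performs that check with the standard library only; `kvm_device_status` is a hypothetical helper. In a typical install the device reaches the launcher Pod as a Kubernetes device plugin resource (commonly named `devices.kubevirt.io/kvm`), but treat that name as an assumption to verify against your own cluster.

```python
import os


def kvm_device_status(path: str = "/dev/kvm") -> dict:
    """Report whether the KVM device node exists and is usable.

    'read_write' is True only when the node exists and the current
    process has both read and write permission on it.
    """
    exists = os.path.exists(path)
    return {
        "path": path,
        "exists": exists,
        "read_write": exists and os.access(path, os.R_OK | os.W_OK),
    }


print(kvm_device_status())
```

Running this on the host and again inside a container is a quick way to see the difference between a node that has KVM and a Pod that has been given access to it.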

Third, A Translation Layer Is Created Between the Kubernetes Resource Model and the Virtualization Model

Users declare CPU, memory, disks, and NICs in the VMI spec. KubeVirt translates these in turn into a Pod spec, a libvirt domain spec, and a guest-visible device configuration. In other words, KubeVirt is at its core a translation engine.
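The idea of one declaration feeding several targets can be shown with a deliberately tiny model. The sketch below takes a VMI-like dict and derives two things from it: a Pod-style resource request and a skeletal libvirt domain XML. It is a toy under stated assumptions, not KubeVirt's converter: the function names are hypothetical, only CPU cores and disk buses are translated, and details the real translation has to handle (memory overhead for the launcher, the many other device types) are left out.

```python
import xml.etree.ElementTree as ET


def to_pod_resources(vmi_spec: dict) -> dict:
    """Derive a Pod-style resource request from a VMI-like spec (toy version)."""
    domain = vmi_spec["domain"]
    cores = domain.get("cpu", {}).get("cores", 1)
    memory = domain["resources"]["requests"]["memory"]
    # A real launcher Pod also needs extra memory for QEMU itself; omitted here.
    return {"requests": {"cpu": str(cores), "memory": memory}}


def to_domain_xml(name: str, vmi_spec: dict) -> str:
    """Derive a skeletal libvirt domain XML from the same spec (toy version)."""
    domain = vmi_spec["domain"]
    root = ET.Element("domain", type="kvm")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "vcpu").text = str(domain.get("cpu", {}).get("cores", 1))
    devices = ET.SubElement(root, "devices")
    for index, disk in enumerate(domain.get("devices", {}).get("disks", [])):
        element = ET.SubElement(devices, "disk", type="file", device="disk")
        # Name guest devices vda, vdb, ... in declaration order.
        ET.SubElement(element, "target", dev="vd" + chr(ord("a") + index),
                      bus=disk["disk"]["bus"])
    return ET.tostring(root, encoding="unicode")


spec = {
    "domain": {
        "cpu": {"cores": 2},
        "resources": {"requests": {"memory": "1Gi"}},
        "devices": {
            "disks": [
                {"name": "rootdisk", "disk": {"bus": "virtio"}},
                {"name": "data", "disk": {"bus": "virtio"}},
            ]
        },
    }
}

print(to_pod_resources(spec))
print(to_domain_xml("demo-vmi", spec))
```

The same input yields a scheduler-facing view (what the Pod asks for) and a hypervisor-facing view (what the domain contains). Keeping those two derived from one source is the job the article attributes to the translation layer.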

Common Misconceptions

Misconception 1: Since VMs Run Inside Containers, They Are Essentially the Same as Containers

No. The execution boundary reuses Pods, but the guest OS runs on virtual hardware provided by QEMU. The process model and guest OS model are different.

Misconception 2: KubeVirt Implements All Networking and Storage Itself

No. By design, it reuses Kubernetes, CNI, and the volume system as much as possible. KubeVirt adds VM-friendly wiring on top.

Misconception 3: kubelet Understands the VM Lifecycle

Not directly. kubelet manages the virt-launcher Pod and nothing more. The detailed state of the VM lifecycle is coordinated separately by virt-handler and virt-launcher.

Debugging Checkpoints Operators Can Use Immediately

If a VMI has been created but no Pod exists, check virt-controller's reconcile loop.
If the Pod is up but the VM does not boot, check the communication between virt-handler and virt-launcher.
If guest networking misbehaves, check in this order: Pod NIC, bridge, TAP, DHCP.
If a migration fails, remember that the controller stage and the libvirt migration stage are separate, and find out which one failed.
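The checkpoints above map onto a handful of `kubectl` invocations. The sketch below only builds and prints the command lines; it does not run them. Several details are assumptions to verify on your own cluster: that KubeVirt is installed in a namespace called `kubevirt`, that its components carry labels of the form `kubevirt.io=virt-controller`, `kubevirt.io=virt-handler`, and `kubevirt.io=virt-launcher`, and that the short resource names `vmi` and `vmim` are registered. `triage_commands` is a hypothetical helper.

```python
import shlex


def triage_commands(vmi: str, namespace: str, install_ns: str = "kubevirt") -> dict:
    """Map each symptom to kubectl commands worth running first.

    Labels and the install namespace are assumptions based on a
    default-style install; adjust them to your environment.
    """
    return {
        "vmi_without_pod": [
            ["kubectl", "get", "vmi", vmi, "-n", namespace, "-o", "yaml"],
            ["kubectl", "logs", "-n", install_ns, "-l", "kubevirt.io=virt-controller"],
        ],
        "pod_up_vm_not_booting": [
            ["kubectl", "get", "pods", "-n", namespace, "-l", "kubevirt.io=virt-launcher"],
            ["kubectl", "logs", "-n", install_ns, "-l", "kubevirt.io=virt-handler"],
        ],
        "guest_network": [
            ["kubectl", "get", "vmi", vmi, "-n", namespace,
             "-o", "jsonpath={.status.interfaces}"],
        ],
        "migration": [
            ["kubectl", "get", "vmim", "-n", namespace],
        ],
    }


for symptom, commands in triage_commands("demo-vmi", "default").items():
    print(symptom)
    for command in commands:
        print("  " + shlex.join(command))
```

Keeping the commands as argument lists rather than strings makes them safe to hand to `subprocess.run` later without shell quoting issues.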

Conclusion

The reason KubeVirt was able to implement "VMs on Pods" is not because it changed Kubernetes, but because it leveraged Kubernetes's strengths as-is and added the translation layers needed for VMs. Pods become execution sandboxes, virt-controller handles orchestration, virt-handler manages per-node execution, and libvirt and QEMU inside virt-launcher create the actual VM.

In the next article, we will examine the object model behind this structure: what VirtualMachine, VirtualMachineInstance, and VirtualMachineInstanceMigration each represent, based on the source schema.