
How KubeVirt Runs VMs on Pods


Introduction

The most common point of confusion when first encountering KubeVirt is this: "Kubernetes is a container orchestrator, so how can it run VMs on Pods?" To answer this question, we first need to clear up a misconception. KubeVirt does not inject VM functionality into kubelet. Instead, it separates what Kubernetes already does well from what the virtualization stack does well.

  • What Kubernetes handles: API storage, scheduling, Pod networking, volume mounting, node placement, retries
  • What KubeVirt handles: VM-related CRDs, VM-specific controllers, node agents, libvirt and QEMU orchestration
  • What Linux and the hypervisor handle: cgroup, namespace, tap, netlink, /dev/kvm, QEMU virtualization execution

In other words, the essence of KubeVirt is not "emulating VMs on top of the container runtime." More precisely, it is a structure that uses Pods as sandboxes and control units for VM execution, running QEMU and libvirt within them.

The Mental Model to Grasp First

KubeVirt's structure can be summarized as a single control flow:

  1. The user creates a VirtualMachine or VirtualMachineInstance.
  2. virt-controller sees this and creates a virt-launcher Pod.
  3. Kubernetes schedules that Pod like any ordinary Pod.
  4. The virt-handler on the assigned node sees the Pod and VMI and directs the actual VM launch.
  5. libvirt and QEMU inside the virt-launcher Pod execute the VM process.

The important point here is that the VM is "packaged" inside a Pod, not that the guest operating system becomes a container. The guest OS still runs on virtual hardware provided by QEMU. However, that QEMU process executes within the Pod's resource boundaries.
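For concreteness, here is roughly what a minimal VirtualMachineInstance manifest looks like (field names follow the kubevirt.io/v1 API; the VMI name and demo disk image are illustrative):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi            # illustrative name
spec:
  domain:
    devices:
      disks:
        - name: containerdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 1Gi
  volumes:
    - name: containerdisk
      containerDisk:
        image: quay.io/kubevirt/cirros-container-disk-demo
```

When this object is created, virt-controller renders a corresponding virt-launcher Pod, and the flow above takes over from there.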

KubeVirt's official architecture documents, docs/architecture.md and docs/components.md, also show this structure clearly. There, KubeVirt is described as an additional control plane and node agent set layered on top of Kubernetes.

Why Use Pods as VM Execution Units

This design is very practical. KubeVirt doesn't need to recreate schedulers, volume attach logic, or network allocators itself.

1. No Need to Recreate Scheduling

VMs ultimately need to be placed on a node. Kubernetes already places Pods well based on resource requests, affinity, taints, topology spread, and priority. KubeVirt reuses this capability.

2. No Need to Create Networking from Scratch

In the default model, the virt-launcher Pod first receives a network through CNI. Then KubeVirt performs additional wiring such as bridge, masquerade, and TAP within that Pod's network namespace to attach guest NICs.
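In manifest terms, this wiring is selected per interface on the VMI; masquerade and bridge are the built-in bindings, and the interface and network names below are illustrative:

```yaml
spec:
  domain:
    devices:
      interfaces:
        - name: default
          masquerade: {}   # NAT the guest behind the Pod IP; bridge: {} is the alternative
  networks:
    - name: default
      pod: {}              # back the interface with the CNI-provided Pod network
```

KubeVirt reads this and performs the bridge, TAP, and DHCP plumbing inside the Pod's network namespace so the guest NIC ends up attached to the network CNI already provided.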

3. No Need to Create Storage from Scratch

Resources like PVCs, DataVolumes, container disks, secrets, and config maps are all delivered to virt-launcher through the Pod volume model. KubeVirt then connects these as disk images or block devices to the guest on top of that.
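A sketch of how these volume sources appear in a VMI spec (the claim name and image are illustrative; each volume is surfaced to the guest as a disk):

```yaml
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          disk:
            bus: virtio
        - name: data
          disk:
            bus: virtio
  volumes:
    - name: rootdisk
      containerDisk:
        image: quay.io/kubevirt/cirros-container-disk-demo
    - name: data
      persistentVolumeClaim:
        claimName: vm-data   # an existing PVC; name is illustrative
```

Kubernetes mounts or attaches these into the virt-launcher Pod exactly as it would for any other Pod; KubeVirt then presents them to the guest as virtio disks.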

What Actually Runs Inside the Pod

The core Pod is the virt-launcher Pod. Think of it as "one Pod per VM." The important processes inside this Pod are:

  • virt-launcher
  • libvirtd, or the modular virtqemud family of libvirt daemons
  • QEMU
  • Optionally sidecars or hook containers

Users typically say "a VM is running," but from the node's perspective, a QEMU process is actually executing within the Pod's cgroup and namespace. This is the most practical reason KubeVirt can run VMs on Pods.

docs/components.md describes the purpose of the virt-launcher Pod as "providing cgroups and namespaces for the VMI process." This expression is very important. The Pod here is not just a deployment unit but a VM execution boundary.

The Boundary Between Kubernetes and KubeVirt

The most confusing part when understanding KubeVirt is "who is responsible for what."

What Kubernetes Continues to Be Responsible For

  • Pod scheduling
  • Volume mount preparation
  • Pod network attach
  • Container lifecycle
  • Node state reflection

What KubeVirt Additionally Takes Responsibility For

  • Providing VM-related API types
  • Converting VM spec to launcher Pod spec
  • Coordinating VM process lifecycle on nodes
  • Network binding and DHCP assistance for guests
  • Live migration orchestration

What libvirt and QEMU Are Responsible For

  • Domain XML interpretation
  • Virtual hardware model configuration
  • CPU, memory, disk, NIC virtualization
  • Live migration data transfer

Because this separation is well-designed, KubeVirt can attach VM workloads without forking Kubernetes or making large-scale modifications to kubelet.

Where This Structure Appears in Source Code

The core packages you will see repeatedly throughout this series are:

staging/src/kubevirt.io/api/core/v1
pkg/virt-controller/watch
pkg/virt-handler
pkg/virt-launcher/virtwrap
pkg/network

Summarizing each layer in one line:

  • staging/src/kubevirt.io/api/core/v1: VM-related CRD schemas
  • pkg/virt-controller/watch: cluster-wide reconcile logic
  • pkg/virt-handler: per-node VM agent
  • pkg/virt-launcher/virtwrap: libvirt, QEMU control
  • pkg/network: code connecting Pod network to guest NIC

The Core Mechanisms That Make "VMs on Pods" Work

Let's return to the question. How exactly was it possible to implement VM functionality in Pods?

The key is three things:

First, Pods Are Originally Process Isolation Boundaries

Pods provide boundaries including network namespace, mount namespace, PID namespace, and cgroups. QEMU is ultimately a Linux process, so it can execute within these boundaries.

Second, Host Capabilities Like /dev/kvm Can Be Exposed to Pods

For accelerated hardware virtualization, KVM device access is needed. KubeVirt connects the appropriate devices and permissions to the virt-launcher side to ensure guest execution performance.
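In practice this shows up as extended resources on the virt-launcher Pod, advertised by KubeVirt's device plugins (resource names as used by KubeVirt; the exact set on a given Pod depends on configuration):

```yaml
resources:
  limits:
    devices.kubevirt.io/kvm: "1"        # access to /dev/kvm for hardware virtualization
    devices.kubevirt.io/tun: "1"        # TAP device creation for guest NICs
    devices.kubevirt.io/vhost-net: "1"  # accelerated virtio networking
```

Because these are ordinary extended resources, the scheduler only places the Pod on nodes where the devices are actually available.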

Third, A Translation Layer Is Created Between the Kubernetes Resource Model and the Virtualization Model

Users declare CPU, memory, disks, and NICs in the VMI spec. KubeVirt converts these sequentially into Pod spec, libvirt domain spec, and guest-visible device configuration. In other words, the essence of KubeVirt is a translation engine.
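The first hop of that translation can be sketched with heavily simplified, hypothetical types. The real conversion lives in virt-controller's rendering of the virt-launcher Pod and in virtwrap's generation of the libvirt domain; `VMISpec` and `PodResources` below are illustrative stand-ins, not KubeVirt API types:

```go
package main

import "fmt"

// VMISpec is a toy stand-in for the user-facing VMI spec
// (kubevirt.io/api/core/v1 in the real tree).
type VMISpec struct {
	Cores    int
	MemoryMi int
	Disks    []string
}

// PodResources is a toy stand-in for the launcher Pod's
// resource requests and volume list.
type PodResources struct {
	CPURequest    string
	MemoryRequest string
	Volumes       []string
}

// translate maps VM-level requirements onto Pod-level requests,
// illustrating how a launcher Pod is derived from a VMI: guest CPU
// and memory become Pod resource requests, and each VMI disk needs
// a backing Pod volume.
func translate(vmi VMISpec) PodResources {
	return PodResources{
		CPURequest:    fmt.Sprintf("%d", vmi.Cores),
		MemoryRequest: fmt.Sprintf("%dMi", vmi.MemoryMi),
		Volumes:       append([]string{}, vmi.Disks...),
	}
}

func main() {
	vmi := VMISpec{Cores: 2, MemoryMi: 2048, Disks: []string{"rootdisk", "cloudinit"}}
	fmt.Println(translate(vmi))
}
```

The second hop, Pod spec to libvirt domain XML, follows the same pattern one layer down, which is why describing KubeVirt as a translation engine is accurate.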

Common Misconceptions

Misconception 1: Since VMs Run Inside Containers, They Are Essentially the Same as Containers

No. The execution boundary reuses Pods, but the guest OS runs on virtual hardware provided by QEMU. The process model and guest OS model are different.

Misconception 2: KubeVirt Implements All Networking and Storage Itself

No. By design philosophy, it maximally reuses Kubernetes, CNI, and the volume system. KubeVirt adds VM-friendly wiring on top.

Misconception 3: kubelet Understands the VM Lifecycle

It doesn't directly understand it. kubelet manages the virt-launcher Pod. The detailed state of the VM lifecycle is additionally coordinated by virt-handler and virt-launcher.

Debugging Checkpoints Operators Can Use Immediately

  • If a VMI is created but there is no Pod, look at the virt-controller side reconcile.
  • If the Pod is up but the VM is not booting, look at the communication between virt-handler and virt-launcher.
  • If guest networking is abnormal, check in order: Pod NIC, bridge, TAP, DHCP.
  • If migration issues occur, remember that the controller stage and libvirt migration stage are separated.

Conclusion

The reason KubeVirt was able to implement "VMs on Pods" is not because it changed Kubernetes, but because it leveraged Kubernetes's strengths as-is and added the translation layers needed for VMs. Pods become execution sandboxes, virt-controller handles orchestration, virt-handler manages per-node execution, and libvirt and QEMU inside virt-launcher create the actual VM.

In the next article, we will examine the object model that constitutes this structure -- what VirtualMachine, VirtualMachineInstance, and VirtualMachineInstanceMigration each represent based on the source schema.