- Authors
  - Youngju Kim (@fjvbn20031)
- Introduction
- QEMU Architecture
- TCG: Tiny Code Generator
- Block Layer: Storage Management
- Device Emulation and VirtIO
- Network Backends
- KVM Integration: Hardware-Accelerated Virtualization
- GPU Passthrough (VFIO)
- Practical VM Creation
- Use Cases
Introduction
QEMU (Quick Emulator) is an open-source machine emulator and virtualizer. On its own, it provides software emulation; when combined with KVM, it delivers hardware-accelerated virtualization. It is the core engine behind major platforms such as OpenStack and Proxmox, which typically drive it through libvirt.
QEMU Architecture
System Emulation vs User-mode Emulation
QEMU offers two execution modes.
[System Emulation]          [User-mode Emulation]
+-------------------+       +---------------------+
| Guest OS          |       | Guest Binary        |
| (Full System)     |       | (Single Process)    |
+-------------------+       +---------------------+
| Virtual Hardware  |       | Syscall Translation |
| CPU, RAM, Disk,   |       | Layer (linux-user / |
| NIC, GPU, USB     |       | bsd-user)           |
+-------------------+       +---------------------+
| QEMU Engine       |       | QEMU Engine         |
+-------------------+       +---------------------+
| Host OS / HW      |       | Host OS / HW        |
+-------------------+       +---------------------+
System Emulation (qemu-system-*):
- Emulates a complete computer system
- Virtualizes all hardware: CPU, memory, disk, network, and more
- Can run OSes from different architectures (e.g., ARM Linux on x86)
User-mode Emulation (qemu-*):
- Emulates a single binary only
- Runs programs compiled for different architectures on the current system
- Translates system calls to host OS system calls
# System Emulation: Boot an ARM system
qemu-system-aarch64 \
    -M virt -cpu cortex-a72 \
    -m 2G -nographic \
    -kernel Image -append "console=ttyAMA0"

# User-mode Emulation: Run an ARM binary on x86
qemu-aarch64 ./arm-binary
TCG: Tiny Code Generator
When running without KVM, QEMU uses TCG (Tiny Code Generator), a JIT compiler.
TCG Workflow
Guest Code (e.g., ARM)
|
v
[Frontend] Decode guest instructions
|
v
TCG IR (Intermediate Representation)
|
v
[Backend] Convert to host instructions (e.g., x86)
|
v
Translation Block (TB) cached
|
v
Execute on Host CPU
Translation Blocks (TB):
- Guest code is translated in basic block units (up to branch points)
- Translated TBs are cached for reuse
- TB chaining optimizes frequently executed paths
- TCG is pure software, so it typically runs 10-100x slower than KVM
[TCG Translation Block Cache]
Guest PC 0x1000 --> TB #1 (Host code at 0x7f001000)
Guest PC 0x1040 --> TB #2 (Host code at 0x7f001200)
Guest PC 0x1080 --> TB #3 (Host code at 0x7f001400)
...
TB #1 --> TB #2 --> TB #3 (Chaining for direct jumps)
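The caching and chaining behavior above can be sketched as a toy simulation (Python for illustration only; real TCG emits host machine code, and `translate`, `run`, and the 0x40-byte block size here are invented for the sketch):

```python
# Illustrative sketch of TCG translation-block caching and chaining.
# A real TB holds host machine code; here a "translation" is just a record.

tb_cache = {}        # guest PC -> translation block
translations = 0     # how many times the (expensive) frontend/backend ran

def translate(guest_pc):
    """Translate one basic block starting at guest_pc (frontend + backend)."""
    global translations
    translations += 1
    # A real TB ends at the first branch; we pretend each block is 0x40
    # bytes and falls through to the next one (the chaining target).
    return {"guest_pc": guest_pc, "next": guest_pc + 0x40, "chained": None}

def lookup_or_translate(guest_pc):
    tb = tb_cache.get(guest_pc)
    if tb is None:                      # cache miss -> run the JIT once
        tb = translate(guest_pc)
        tb_cache[guest_pc] = tb
    return tb

def run(start_pc, steps):
    tb = lookup_or_translate(start_pc)
    for _ in range(steps):
        if tb["chained"] is None:       # first time: patch a direct jump
            tb["chained"] = lookup_or_translate(tb["next"])
        tb = tb["chained"]              # chained TBs skip the cache lookup

run(0x1000, steps=3)   # executes TBs at 0x1000, 0x1040, 0x1080, 0x10c0
run(0x1000, steps=3)   # second pass reuses every cached TB
print(translations)    # 4 translations despite 8 block executions
```

The point of the sketch: translation cost is paid once per block, and chaining turns repeated block-to-block transitions into direct jumps that bypass the cache lookup entirely.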
Block Layer: Storage Management
Disk Image Formats
| Format | Features | Use Case |
|---|---|---|
| qcow2 | Copy-on-Write, snapshots, compression, encryption | QEMU default, production |
| raw | Zero overhead, best I/O performance | Performance-critical environments |
| vmdk | VMware compatible | VMware migration |
| vdi | VirtualBox compatible | VirtualBox migration |
| vpc/vhdx | Hyper-V compatible | Hyper-V migration |
# Create a qcow2 image (thin provisioned)
qemu-img create -f qcow2 disk.qcow2 100G
# Create a snapshot
qemu-img snapshot -c snap1 disk.qcow2
# List snapshots
qemu-img snapshot -l disk.qcow2
# Convert image format
qemu-img convert -f vmdk -O qcow2 source.vmdk target.qcow2
# Check image info
qemu-img info disk.qcow2
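The copy-on-write idea behind qcow2's thin provisioning can be sketched with an invented in-memory model (the real format stores two-level L1/L2 cluster tables on disk; only the allocate-on-write and backing-file behavior is modeled here):

```python
# Sketch of thin provisioning with copy-on-write, qcow2-style.
# Hypothetical in-memory model, not the real qcow2 on-disk format.

CLUSTER = 64 * 1024          # qcow2's default cluster size: 64 KiB

class ThinImage:
    def __init__(self, virtual_size, backing=None):
        self.virtual_size = virtual_size
        self.backing = backing   # backing file (base of a snapshot chain)
        self.clusters = {}       # cluster index -> data, allocated on write

    def write(self, offset, data):
        self.clusters[offset // CLUSTER] = data     # allocate on first write

    def read(self, offset):
        idx = offset // CLUSTER
        if idx in self.clusters:            # locally allocated cluster
            return self.clusters[idx]
        if self.backing is not None:        # fall through to the backing file
            return self.backing.read(offset)
        return b"\x00"                      # unallocated data reads as zeros

    def allocated_bytes(self):
        return len(self.clusters) * CLUSTER

base = ThinImage(100 * 2**30)               # "100G" image, 0 bytes allocated
base.write(0, b"bootloader")
overlay = ThinImage(100 * 2**30, backing=base)  # like a snapshot overlay
overlay.write(CLUSTER, b"new data")         # COW: only the overlay changes

print(base.allocated_bytes())               # 65536 (one cluster of "100G")
print(overlay.read(0))                      # b'bootloader' (from the backing file)
```

This is why `qemu-img create -f qcow2 disk.qcow2 100G` returns instantly and consumes almost no host storage: clusters exist only once they are written.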
External Storage Backends
+---------------------------------------------+
|                  Guest VM                   |
+---------------------------------------------+
|               VirtIO-blk/SCSI               |
+---------------------------------------------+
|              QEMU Block Layer               |
+--------+--------+--------+--------+---------+
    |        |        |        |         |
    v        v        v        v         v
  File      NBD    iSCSI  Ceph RBD  GlusterFS
- NBD (Network Block Device): Remote disk sharing
- Ceph RBD: Distributed storage block device
- GlusterFS: Distributed filesystem volumes
- iSCSI: IP-based block storage
Device Emulation and VirtIO
Emulation vs Paravirtualization
[Traditional Emulation]            [VirtIO Paravirtualization]

Guest OS                           Guest OS
    |                                  |
    v                                  v
Emulated e1000 NIC                 VirtIO-net driver
    |                                  |
    v                                  v
QEMU processes every               Shared memory ring buffer
register access                    (minimal VM Exits)
(many VM Exits)
    |                                  |
    v                                  v
Host NIC                           Host NIC
VirtIO Device Types:
| VirtIO Device | Function | Performance Gain |
|---|---|---|
| virtio-net | Network | 10x+ (vs e1000) |
| virtio-blk | Block storage | 2-5x (vs IDE emulation) |
| virtio-scsi | SCSI storage | Optimal for many disks |
| virtio-gpu | Graphics | 3D acceleration (virgl) |
| virtio-fs | File sharing | Host-guest filesystem sharing |
| virtio-balloon | Memory | Dynamic memory adjustment |
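The ring-buffer idea that makes VirtIO fast can be sketched as a toy model (real virtqueues use split descriptor/available/used rings with wrap-around index counters; only the batched-notification behavior is modeled here):

```python
# Simplified model of a virtqueue: the guest posts buffers in shared
# memory and "kicks" the host once per batch, instead of trapping on
# every register access the way an emulated e1000 does.

class Virtqueue:
    def __init__(self, size=256):
        self.ring = [None] * size       # descriptor ring (shared memory)
        self.avail_idx = 0              # guest side: next free slot
        self.used_idx = 0               # host side: next slot to consume
        self.vm_exits = 0               # each "kick" costs one VM exit

    # --- guest side ---
    def add_buffer(self, buf):
        self.ring[self.avail_idx % len(self.ring)] = buf
        self.avail_idx += 1             # plain memory write, no VM exit

    def kick(self):
        self.vm_exits += 1              # one notification for the whole batch
        self.host_process()

    # --- host side ---
    def host_process(self):
        while self.used_idx < self.avail_idx:
            _ = self.ring[self.used_idx % len(self.ring)]  # consume buffer
            self.used_idx += 1

vq = Virtqueue()
for pkt in range(64):                   # queue 64 packets...
    vq.add_buffer(b"packet %d" % pkt)
vq.kick()                               # ...with a single VM exit
print(vq.vm_exits)                      # 1 exit for 64 packets
```

An emulated NIC would have trapped to QEMU on every register write; here the cost of the exit is amortized over the whole batch, which is where the 10x+ gain in the table comes from.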
Network Backends
+--------+    +--------+    +--------+
|  VM 1  |    |  VM 2  |    |  VM 3  |
| virtio |    | virtio |    | virtio |
+---+----+    +---+----+    +---+----+
    |             |             |
+---+-------------+-------------+---+
|           Linux Bridge            |
|             (virbr0)              |
+----------------+------------------+
                 |
           Physical NIC
              (eth0)
| Backend | Description | Use Case |
|---|---|---|
| SLIRP (user) | User-mode NAT, simple setup | Development/testing |
| TAP/TUN | Kernel-level virtual NIC | Production |
| Bridge | Connect TAP to bridge | Inter-VM, external access |
| macvtap | macvlan + TAP combined | Simple L2 connectivity |
| vhost-net | In-kernel VirtIO processing | High-performance networking |
# TAP + Bridge network setup
sudo ip link add br0 type bridge
sudo ip link set eth0 master br0
sudo ip link set br0 up
# Create the TAP device and attach it to the bridge
sudo ip tuntap add dev tap0 mode tap
sudo ip link set tap0 master br0
sudo ip link set tap0 up

# Use the TAP device in QEMU
qemu-system-x86_64 \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device virtio-net-pci,netdev=net0
KVM Integration: Hardware-Accelerated Virtualization
KVM_RUN ioctl Flow
QEMU Process (User Space)
|
| ioctl(KVM_RUN)
v
KVM Module (Kernel Space)
|
| VMLAUNCH / VMRESUME
v
Guest Mode (VMX non-root)
|
| VM Exit occurs
v
KVM Module (Kernel Space)
|
| If KVM can handle it -> handle directly
| If not -> return to QEMU
v
QEMU Process (User Space)
|
| Process device emulation etc.
| Call ioctl(KVM_RUN) again
v
... (repeat)
VM Exits handled by KVM:
- Most MSR accesses
- Simple I/O port accesses
- EPT violations (memory mapping)
- External interrupts
VM Exits forwarded to QEMU:
- Complex device I/O
- MMIO (Memory-Mapped I/O) access
- Certain CPUID requests
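The split between kernel-handled and QEMU-handled exits can be sketched as the userspace run loop (illustrative Python; the real loop is `ioctl(KVM_RUN)` in C, and the exit-reason names here are simplified stand-ins):

```python
# Sketch of the QEMU/KVM run loop: exits the kernel can handle never
# reach userspace; only the rest bounce out to QEMU for emulation.

KERNEL_HANDLED = {"MSR_ACCESS", "PIO_SIMPLE", "EPT_VIOLATION", "EXT_INTERRUPT"}

def kvm_run(pending_exits):
    """Model of ioctl(KVM_RUN): run the guest until an exit QEMU must see."""
    while pending_exits:
        reason = pending_exits.pop(0)
        if reason in KERNEL_HANDLED:
            continue                    # handled in-kernel, guest resumes
        return reason                   # heavyweight exit to userspace
    return "HLT"                        # guest idled

def qemu_main_loop(pending_exits):
    userspace_exits = 0
    while True:
        reason = kvm_run(pending_exits)     # enter guest mode
        if reason == "HLT":
            return userspace_exits
        userspace_exits += 1
        # emulate the device access (MMIO, complex I/O, CPUID...) here,
        # then loop back into KVM_RUN

exits = ["MSR_ACCESS", "EPT_VIOLATION", "MMIO", "EXT_INTERRUPT", "MMIO"]
print(qemu_main_loop(exits))            # 2: only the MMIO exits reached QEMU
```

This is why QEMU+KVM gets so close to native performance in the comparison below: the expensive kernel-to-userspace round trip happens only for the minority of exits KVM cannot resolve itself.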
QEMU + KVM Performance
[Performance Comparison (relative, native = 100)]
CPU Computation (integer/floating-point):
Native: ████████████████████████████████████████ 100%
QEMU+KVM: ███████████████████████████████████████ 98%
QEMU(TCG): ██████████ 25%
Disk I/O (VirtIO):
Native: ████████████████████████████████████████ 100%
QEMU+KVM: ████████████████████████████████████ 90%
QEMU(TCG): ████████████ 30%
Network (vhost-net):
Native: ████████████████████████████████████████ 100%
QEMU+KVM: ██████████████████████████████████████ 95%
QEMU(TCG): ██████████ 25%
GPU Passthrough (VFIO)
Directly attach a GPU to a VM for near-native graphics performance.
Setup Steps
# 1. Enable IOMMU (in GRUB)
# Intel: intel_iommu=on iommu=pt
# AMD: amd_iommu=on iommu=pt
# 2. Check IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=$(basename "$d")
    echo "IOMMU Group $(basename $(dirname $(dirname "$d"))): $n $(lspci -nns "$n")"
done
# 3. Bind the GPU to vfio-pci
# Check the GPU vendor:device ID
lspci -nn | grep -i nvidia
# Example output: 01:00.0 VGA ... [10de:2484]
# Example output: 01:00.1 Audio ... [10de:228b]

# Load vfio-pci, then bind (a plain `>` redirect fails under sudo, so use tee)
sudo modprobe vfio-pci
echo "10de 2484" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
echo "10de 228b" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
# 4. Run QEMU with GPU passthrough
qemu-system-x86_64 \
    -enable-kvm \
    -m 16G \
    -cpu host \
    -smp 8 \
    -device vfio-pci,host=01:00.0,multifunction=on \
    -device vfio-pci,host=01:00.1 \
    -drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
    -drive if=pflash,format=raw,file=OVMF_VARS.fd \
    -drive file=vm-disk.qcow2,format=qcow2,if=virtio
VFIO Architecture
+-------------------+
|     Guest VM      |
|   (GPU Driver)    |
+-------------------+
|     VFIO-PCI      |
|   (Passthrough)   |
+-------------------+
|       IOMMU       |
|  (DMA Remapping)  |
+-------------------+
|   Physical GPU    |
|   (NVIDIA/AMD)    |
+-------------------+
- IOMMU remaps DMA requests to ensure memory isolation between VMs
- All devices in the same IOMMU group must be passed through together
- OVMF (UEFI firmware) allows the guest to initialize the GPU through its UEFI option ROM (VBIOS)
- A single GPU is exclusively assigned to one VM
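The memory isolation the IOMMU enforces can be sketched as an address-remapping table (a hypothetical model for illustration, not the actual VT-d/AMD-Vi page-table format; the group number and addresses are made up):

```python
# Sketch of IOMMU DMA remapping: a device can only reach host pages
# that were explicitly mapped for its IOMMU group. Hypothetical model.

class Iommu:
    def __init__(self):
        self.maps = {}          # group id -> {device IOVA -> host phys addr}

    def map(self, group, iova, host_pa):
        self.maps.setdefault(group, {})[iova] = host_pa

    def dma(self, group, iova):
        """A DMA access from any device in `group` to address `iova`."""
        try:
            return self.maps[group][iova]
        except KeyError:
            raise PermissionError("DMA fault: unmapped address") from None

iommu = Iommu()
# VFIO maps the guest's RAM for the GPU's group when the VM starts:
iommu.map(group=27, iova=0x1000, host_pa=0x7f00_0000)

print(hex(iommu.dma(group=27, iova=0x1000)))    # remapped into guest RAM
try:
    iommu.dma(group=27, iova=0xdead_0000)       # stray DMA outside the VM
except PermissionError as e:
    print(e)                                    # DMA fault: unmapped address
```

Because the mapping is keyed per group, every device sharing the group shares the same DMA view, which is exactly why whole groups must be assigned together.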
Practical VM Creation
# Create and run a basic VM
qemu-system-x86_64 \
    -enable-kvm \
    -name "ubuntu-server" \
    -m 4G \
    -smp cores=4 \
    -cpu host \
    -drive file=ubuntu.qcow2,format=qcow2,if=virtio \
    -cdrom ubuntu-22.04-server.iso \
    -boot d \
    -device virtio-net-pci,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::2222-:22 \
    -vnc :1 \
    -monitor stdio

# SSH into the VM (from the host)
ssh -p 2222 user@localhost

# Live migration (from the source host)
# On the destination host, start the VM with the same config plus -incoming tcp:0:4444
# On the source (in the QEMU monitor):
#   migrate tcp:dest-host:4444
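Under the hood, QEMU's default live-migration strategy is iterative pre-copy: copy all RAM while the guest keeps running, then repeatedly re-copy the pages the guest dirtied, until the remainder is small enough for a brief stop-and-copy. A toy sketch of that convergence loop (all numbers and the dirtying model are invented):

```python
# Sketch of pre-copy live migration: copy everything, then keep
# re-copying dirtied pages until the last pass is small enough to do
# with the VM briefly paused. Simplified model; numbers are made up.

import random
random.seed(7)

TOTAL_PAGES = 10_000
STOP_THRESHOLD = 100        # small enough for the final stopped copy

def guest_dirties(pages_sent):
    # While the VM runs, it re-dirties some fraction of what we just sent.
    return {random.randrange(TOTAL_PAGES) for _ in range(pages_sent // 10)}

def migrate():
    dirty = set(range(TOTAL_PAGES))     # round 1: everything is "dirty"
    rounds = 0
    while len(dirty) > STOP_THRESHOLD:
        rounds += 1
        sent = len(dirty)               # stream these pages to the target
        dirty = guest_dirties(sent)     # guest keeps running meanwhile
    # final round: pause the VM, copy the last few pages, resume on target
    return rounds, len(dirty)

rounds, downtime_pages = migrate()
print(rounds, downtime_pages)
```

If the guest dirties memory faster than the link can copy it, the loop never converges; this is why QEMU exposes knobs such as downtime limits and migration bandwidth caps.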
Use Cases
| Use Case | Description |
|---|---|
| OpenStack | Nova compute uses libvirt+QEMU/KVM as the default hypervisor |
| Proxmox VE | Web UI-based KVM virtualization + LXC container management |
| Development | Cross-architecture development (ARM on x86, etc.) |
| Security Research | Isolated environments for malware analysis and vulnerability research |
| CI/CD | Quickly create/destroy clean test environments |
| Embedded Development | Emulate various boards with QEMU |
Quiz: QEMU/KVM Knowledge Check
Q1. What is the difference between QEMU System Emulation and User-mode Emulation?
System Emulation emulates an entire computer system (CPU, memory, disk, etc.) to run a full OS. User-mode Emulation emulates only a single binary and translates system calls to the host OS.
Q2. Why is TCG slower than KVM?
TCG converts guest instructions to an IR (Intermediate Representation) then to host instructions via software JIT compilation. KVM leverages hardware virtualization extensions (VT-x/AMD-V) to execute guest code directly on the CPU, making it much faster.
Q3. Why is VirtIO faster than an emulated e1000 NIC?
An emulated e1000 triggers a VM Exit for every register access. VirtIO uses shared memory ring buffers to transfer large amounts of data with minimal VM Exits, greatly reducing overhead.
Q4. Why are IOMMU groups important in GPU Passthrough?
Devices in the same IOMMU group can access each other's memory via DMA. For security, all devices in the same group must be assigned to a single VM together, or all must be isolated.
Q5. What are the advantages of the qcow2 format?
It supports thin provisioning via Copy-on-Write, with built-in snapshots, compression, and encryption. Creating a 100GB disk only consumes host storage proportional to the actual data written.