
[Virtualization] 04. Complete Guide to GPU Virtualization: Passthrough, vGPU, MIG, SR-IOV


Introduction

GPU virtualization has become a critical infrastructure technology due to the explosive growth of AI/ML workloads and cloud computing. Various approaches exist, from exclusively assigning a physical GPU to a single VM to sharing one GPU across multiple VMs or containers.

[GPU Virtualization Technology Spectrum]

Exclusive <----------------------------------------> Maximum Sharing

GPU Passthrough    vGPU (SR-IOV)    MIG    vGPU (mdev)    Time-Slicing
 (1 GPU : 1 VM)   (HW partition)  (spatial) (SW timeslice)  (K8s share)

GPU Passthrough (VFIO-PCI)

Overview

GPU Passthrough directly attaches a physical GPU to a VM -- the simplest and most powerful approach.

+------------------+     +------------------+
|    VM 1          |     |    VM 2          |
|  GPU Driver      |     |  GPU Driver      |
|  (Full Access)   |     |  (Full Access)   |
+--------+---------+     +--------+---------+
         |                         |
    IOMMU/VT-d                IOMMU/VT-d
         |                         |
+--------+---------+     +--------+---------+
|  Physical GPU 1  |     |  Physical GPU 2  |
|  (Exclusive)     |     |  (Exclusive)     |
+------------------+     +------------------+

Characteristics

Aspect          Details
--------------  -------------------------------------------
Performance     95-99% of native
GPU Sharing     Not possible (1 GPU = 1 VM)
Supported GPUs  All GPUs (NVIDIA, AMD, Intel)
Licensing       No additional license needed
Requirements    IOMMU (VT-d/AMD-Vi) capable CPU/motherboard
Drivers         Standard GPU driver installed in guest OS
Limitation      Only one VM per GPU

Setup Flow

# 1. Enable VT-d/AMD-Vi (IOMMU) in BIOS

# 2. Add IOMMU to boot parameters
# /etc/default/grub:
# GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"

# 3. Check IOMMU groups
find /sys/kernel/iommu_groups/ -type l | sort -V

# 4. Bind GPU to vfio-pci driver
#    (use your own GPU's VGA + audio function IDs from `lspci -nn`)
# /etc/modprobe.d/vfio.conf:
# options vfio-pci ids=10de:2484,10de:228b

# 5. Rebuild initramfs
sudo update-initramfs -u   # Debian/Ubuntu (use `dracut -f` on RHEL/Fedora)

# 6. Verify after reboot
lspci -nnk -s 01:00.0
# Kernel driver in use: vfio-pci
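
Step 3's IOMMU-group check matters because every device in the GPU's group (typically the VGA and audio functions) must be handed to vfio-pci together. A small script makes the grouping easier to read; a minimal sketch (the sysfs layout is standard Linux, the helper name is mine):

```python
import os
from collections import defaultdict

def iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI devices it contains.

    Passthrough is only safe when every device in the GPU's group
    is bound to vfio-pci together.
    """
    groups = defaultdict(list)
    if not os.path.isdir(sysfs_root):
        return groups  # IOMMU disabled, or not a Linux host
    for group in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, group, "devices")
        for dev in sorted(os.listdir(dev_dir)):
            groups[group].append(dev)
    return groups

# On a host with IOMMU enabled: print(iommu_groups())
```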

vGPU (NVIDIA GRID)

Architecture

vGPU enables sharing a single physical GPU across multiple VMs.

+--------+ +--------+ +--------+ +--------+
| VM 1   | | VM 2   | | VM 3   | | VM 4   |
| vGPU   | | vGPU   | | vGPU   | | vGPU   |
| 4GB    | | 4GB    | | 4GB    | | 4GB    |
+--------+-+--------+-+--------+-+--------+
|              vGPU Manager               |
|           (Hypervisor Module)           |
+-----------------------------------------+
|           Physical GPU (16GB)           |
|         (NVIDIA A100/L40S/etc.)         |
+-----------------------------------------+

mdev (Mediated Devices) - Pre-Ampere

[Software Time-Slicing]

Time -->  |  VM1  |  VM2  |  VM3  |  VM1  |  VM2  |  VM3  |
          +-------+-------+-------+-------+-------+-------+
Full GPU   used    used    used    used    used    used

- Entire GPU shared via time-division
- vGPU Manager handles context switching
- VRAM statically partitioned
- Compute resources time-shared
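
The time-division above can be sketched as a toy round-robin. This is a conceptual model only, not NVIDIA's actual scheduler (real slice lengths and preemption policy are driver-internal):

```python
from itertools import cycle, islice

def schedule(vms, slices):
    """Round-robin time-slicing: each VM gets the whole GPU for one
    slice, then yields; VRAM stays statically partitioned throughout."""
    return list(islice(cycle(vms), slices))

# Six slices across three VMs reproduce the timeline in the diagram.
print(schedule(["VM1", "VM2", "VM3"], 6))
# -> ['VM1', 'VM2', 'VM3', 'VM1', 'VM2', 'VM3']
```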

SR-IOV - Ampere and Later

[Hardware Partitioning - SR-IOV]

+-------------------------------------------+
| Physical Function (PF)                    |
| - GPU management and configuration        |
+-------------------------------------------+
| VF 0         | VF 1         | VF 2        |
| (VM 1)       | (VM 2)       | (VM 3)      |
| Own queues   | Own queues   | Own queues  |
| Own IRQs     | Own IRQs     | Own IRQs    |
+-------------------------------------------+
|          Physical GPU (PCIe)              |
+-------------------------------------------+

- PF (Physical Function): Managed by host
- VF (Virtual Function): Directly assigned to each VM
- IOMMU ensures memory isolation between VFs
- Hardware-level isolation for better stability

vGPU Profile Series

Series     Purpose        VRAM          Typical Use Cases
C-series   Compute        Large         AI/ML training, HPC
Q-series   Quadro         Medium-Large  CAD, 3D rendering, professional graphics
B-series   Desktop        Small-Medium  VDI, general desktop
A-series   App Streaming  Small-Medium  App virtualization, remote workstations

[NVIDIA A100 vGPU Profile Examples]

GPU: NVIDIA A100 40GB

Profile           VRAM    Max Instances
A100-1-5C         5GB      7
A100-2-10C       10GB      4
A100-4-20C       20GB      2
A100-8-40C       40GB      1
A100-1-5CME       5GB      7  (MIG-backed)
A100-2-10CME     10GB      4  (MIG-backed)

License Requirements:

  • NVIDIA vGPU Software License required (vApps, vPC, vCS, vWS)
  • License server (DLS or CLS) needed
  • License type determined by GPU model and profile
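
The instance counts in the profile table fall out of the static VRAM split, capped at 7 instances as the table shows. A sketch of that arithmetic (the cap and sizes are taken from the table above; real limits vary by profile and vGPU software release):

```python
def max_instances(total_vram_gb, profile_vram_gb, cap=7):
    """Instances per GPU = static VRAM split, capped at 7 instances
    (the cap shown in the profile table above)."""
    return min(total_vram_gb // profile_vram_gb, cap)

# A100 40GB examples from the table above
print(max_instances(40, 10))  # A100-2-10C -> 4
print(max_instances(40, 5))   # A100-1-5C  -> 7 (capped)
print(max_instances(40, 40))  # A100-8-40C -> 1
```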

MIG (Multi-Instance GPU)

Overview

MIG is a hardware-level spatial partitioning technology first introduced with NVIDIA A100.

[MIG Architecture - A100 Example]

+----------------------------------------------------------+
|                    NVIDIA A100 (80GB)                    |
+----------+----------+----------+----------+--------------+
| GPC 0-1  | GPC 2-3  | GPC 4    | GPC 5    | GPC 6        |
| MIG 1    | MIG 2    | MIG 3    | MIG 4    | MIG 5        |
| 2g.20gb  | 2g.20gb  | 1g.10gb  | 1g.10gb  | 1g.10gb      |
+----------+----------+----------+----------+--------------+
| Memory   | Memory   | Memory   | Memory   | Memory       |
| Ctrl 0-1 | Ctrl 2-3 | Ctrl 4   | Ctrl 5   | Ctrl 6       |
| 20GB HBM | 20GB HBM | 10GB HBM | 10GB HBM | 10GB HBM     |
+----------+----------+----------+----------+--------------+
| L2 Cache | L2 Cache | L2 Cache | L2 Cache | L2 Cache     |
| Slice 0-1| Slice 2-3| Slice 4  | Slice 5  | Slice 6      |
+----------------------------------------------------------+

Key Characteristics

  • Hardware Spatial Partitioning: Up to 7 independent instances
  • Dedicated Resources: Each instance gets its own SMs, memory controllers, L2 cache, and HBM
  • Complete Isolation: No resource contention between instances
  • Error Isolation: Errors in one instance do not affect others

Supported GPUs

GPU        Max Instances  HBM          Architecture
A100 40GB  7              40GB HBM2    Ampere
A100 80GB  7              80GB HBM2e   Ampere
A30        4              24GB HBM2    Ampere
H100       7              80GB HBM3    Hopper
H200       7              141GB HBM3e  Hopper

MIG Configuration Example

# Enable MIG mode (reboot required)
sudo nvidia-smi -i 0 --mig 1

# After reboot, create MIG instances
# List available profiles
nvidia-smi mig -lgip

# Create two 3g.40gb instances (A100 80GB)
sudo nvidia-smi mig -cgi 9,9 -C

# Verify created instances
nvidia-smi mig -lgi
nvidia-smi mig -lci

# Delete MIG instances
sudo nvidia-smi mig -dci
sudo nvidia-smi mig -dgi

# Disable MIG mode (reboot required)
sudo nvidia-smi -i 0 --mig 0
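
The `-cgi 9,9` request above succeeds because two 3g instances consume 6 of the GPU's 7 compute slices. A simplified fit check (the slice cost follows the leading "Ng" in each profile name; real MIG additionally enforces fixed placement positions that this sum check ignores):

```python
# Compute-slice cost per GPU-instance profile (A100 80GB naming).
SLICES = {"1g.10gb": 1, "2g.20gb": 2, "3g.40gb": 3, "4g.40gb": 4, "7g.80gb": 7}

def fits(profiles, total_slices=7):
    """Simplified capacity check: do the requested profiles fit in
    the available compute slices?"""
    return sum(SLICES[p] for p in profiles) <= total_slices

print(fits(["3g.40gb", "3g.40gb"]))             # True: uses 6 of 7 slices
print(fits(["3g.40gb", "3g.40gb", "2g.20gb"]))  # False: 8 slices > 7
```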

MIG-backed vGPU

Combining MIG with vGPU provides both hardware isolation and VM support.

+--------+ +--------+ +--------+
| VM 1   | | VM 2   | | VM 3   |
| vGPU   | | vGPU   | | vGPU   |
+--------+ +--------+ +--------+
| vGPU Manager                  |
+------+-------+-------+-------+
| MIG 1| MIG 2 | MIG 3 | MIG 4 |  <-- Hardware isolation
+------+-------+-------+-------+
|        Physical GPU           |
+-------------------------------+

- Between MIG slices: hardware isolation
- Within the GPU: each MIG-backed vGPU maps 1:1 to its own MIG instance (only one vGPU per GPU instance)

SR-IOV for GPUs

PCIe-Level Virtualization

[SR-IOV Structure]

PCIe Bus
    |
+-------+------+------+------+
| PF    | VF 0 | VF 1 | VF 2 |
| (Host)| (VM1)| (VM2)| (VM3)|
+-------+------+------+------+
|        Physical GPU        |
+----------------------------+

PF = Physical Function (host management)
VF = Virtual Function (direct VM assignment)
  • NVIDIA supports GPU SR-IOV from Ampere onward
  • AMD MxGPU is also SR-IOV-based (Radeon Instinct/Pro)
  • Each VF is an independent PCIe function, isolated via IOMMU
  • PF manages the entire GPU and creates/configures VFs
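
Because each VF is an ordinary PCIe function, the host can inspect VF state through standard sysfs attributes. A sketch (the `sriov_numvfs`/`sriov_totalvfs` attributes are standard Linux SR-IOV; the PCI address in the comment is illustrative):

```python
import os

def vf_status(pf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return (currently enabled VFs, hardware maximum) for a PF,
    read from the standard Linux SR-IOV sysfs attributes."""
    dev = os.path.join(sysfs_root, pf_addr)
    def read_int(name):
        with open(os.path.join(dev, name)) as f:
            return int(f.read())
    return read_int("sriov_numvfs"), read_int("sriov_totalvfs")

# e.g. vf_status("0000:41:00.0") -> (4, 16) on a PF with 4 VFs enabled
```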

GPU Time-Slicing in Kubernetes

Time-slicing is the simplest GPU-sharing approach: it is implemented purely in software and requires no special hardware features.

[K8s GPU Time-Slicing]

+---Pod A---+ +---Pod B---+ +---Pod C---+
|  Process  | |  Process  | |  Process  |
|  CUDA     | |  CUDA     | |  CUDA     |
+-----------+-+-----------+-+-----------+
|  NVIDIA Device Plugin (time-slicing)  |
+---------------------------------------+
|             Physical GPU              |
|       (shared by all processes)       |
+---------------------------------------+

- NVIDIA GPU Operator TimeSlicing configuration
- No hardware isolation
- Shared memory space (OOM possible)
- MPS (Multi-Process Service) option for better performance

# NVIDIA GPU Operator ConfigMap example
# time-slicing-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4
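
With `replicas: 4`, the device plugin advertises four schedulable `nvidia.com/gpu` resources per physical GPU, so up to four Pods land on one GPU. Pods request the resource exactly as usual (illustrative manifest; the image tag is just an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1   # one time-slice replica, not a whole GPU
```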

Comprehensive Comparison Table

Aspect            Passthrough       vGPU (mdev)      vGPU (SR-IOV)   MIG                  Time-Slicing
GPU Sharing       No                Yes (SW)         Yes (HW)        Yes (HW)             Yes (SW)
Isolation Level   Full (exclusive)  Time-sliced      PCIe VF         Spatial              None
Memory Isolation  Full              VRAM partition   VRAM partition  HBM partition        Shared
Performance       95-99%            85-95%           88-96%          85-95%               Variable
Max Instances     1                 GPU-dependent    VF count        Up to 7              Unlimited
Supported GPUs    All GPUs          NVIDIA vGPU      NVIDIA Ampere+  A100/H100 etc.       All NVIDIA
Licensing         None              NVIDIA vGPU      NVIDIA vGPU     None                 None
Primary Use       Dedicated GPU VM  VDI, multi-user  Enterprise      AI/ML multi-tenancy  K8s dev/test

Decision Guide

[GPU Virtualization Selection Flowchart]

Start: Do you need to share the GPU?
  |
  +-- No --> GPU Passthrough
  |          (Best performance, 1:1 assignment)
  |
  +-- Yes --> Do you need hardware isolation?
               |
               +-- No --> Is this Kubernetes?
               |           |
               |           +-- Yes --> Time-Slicing
               |           |          (Simplest approach)
               |           |
               |           +-- No --> vGPU (mdev)
               |                      (VM-based sharing)
               |
               +-- Yes --> MIG-capable GPU (A100/H100)?
                           |
                           +-- Yes --> MIG or MIG-backed vGPU
                           |          (Best isolation)
                           |
                           +-- No --> vGPU (SR-IOV)
                                      (Ampere+ HW partitioning)
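
The flowchart reduces to a few branches; encoded as a function for quick reference (the parameter names are mine, the logic mirrors the chart):

```python
def choose(share_gpu, hw_isolation=False, kubernetes=False, mig_capable=False):
    """The selection flowchart above, as code."""
    if not share_gpu:
        return "GPU Passthrough"
    if not hw_isolation:
        return "Time-Slicing" if kubernetes else "vGPU (mdev)"
    return "MIG or MIG-backed vGPU" if mig_capable else "vGPU (SR-IOV)"

print(choose(share_gpu=False))                            # GPU Passthrough
print(choose(True, hw_isolation=True, mig_capable=True))  # MIG or MIG-backed vGPU
print(choose(True, kubernetes=True))                      # Time-Slicing
```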

Quiz: GPU Virtualization Knowledge Check

Q1. What is the fundamental difference between GPU Passthrough and vGPU?

GPU Passthrough exclusively assigns a physical GPU to a single VM, providing native performance. vGPU allows multiple VMs to share one GPU, distributing resources through time-slicing or SR-IOV.

Q2. Why does MIG provide better isolation than other GPU sharing methods?

MIG spatially partitions the GPU at the hardware level. Each instance has dedicated SMs, memory controllers, L2 cache, and HBM, fundamentally eliminating resource contention between instances. Errors are also isolated.

Q3. What are the roles of PF and VF in SR-IOV?

PF (Physical Function) is the PCIe function used by the host to manage the entire GPU. VFs (Virtual Functions) are created from the PF and directly assigned to VMs, providing independent GPU access.

Q4. What is the biggest weakness of Kubernetes Time-Slicing?

There is no hardware isolation, and all Pods share GPU memory. Excessive memory usage by one Pod can cause OOM in others, and security boundaries are unclear.

Q5. What is absolutely required to use vGPU?

An NVIDIA vGPU Software License is required. A license server (DLS or CLS) must be deployed, and vGPU Manager software must be installed on the hypervisor. GPU Passthrough and standalone MIG do not require licensing.