
[Virtualization] 04. Complete Guide to GPU Virtualization: Passthrough, vGPU, MIG, SR-IOV


Introduction

GPU virtualization has become a critical infrastructure technology due to the explosive growth of AI/ML workloads and cloud computing. Various approaches exist, from exclusively assigning a physical GPU to a single VM to sharing one GPU across multiple VMs or containers.

[GPU Virtualization Technology Spectrum]

Exclusive <----------------------------------------> Maximum Sharing

GPU Passthrough    vGPU (SR-IOV)    MIG    vGPU (mdev)    Time-Slicing
 (1 GPU : 1 VM)   (HW partition)  (spatial) (SW timeslice)  (K8s share)

GPU Passthrough (VFIO-PCI)

Overview

GPU Passthrough directly attaches a physical GPU to a VM -- the simplest and most powerful approach.

+------------------+     +------------------+
|    VM 1          |     |    VM 2          |
|  GPU Driver      |     |  GPU Driver      |
|  (Full Access)   |     |  (Full Access)   |
+--------+---------+     +--------+---------+
         |                         |
    IOMMU/VT-d                IOMMU/VT-d
         |                         |
+--------+---------+     +--------+---------+
|  Physical GPU 1  |     |  Physical GPU 2  |
|  (Exclusive)     |     |  (Exclusive)     |
+------------------+     +------------------+

Characteristics

Aspect          Details
--------------  -------------------------------------------
Performance     95-99% of native
GPU Sharing     Not possible (1 GPU = 1 VM)
Supported GPUs  All GPUs (NVIDIA, AMD, Intel)
Licensing       No additional license needed
Requirements    IOMMU (VT-d/AMD-Vi) capable CPU/motherboard
Drivers         Standard GPU driver installed in guest OS
Limitation      Only one VM per GPU

Setup Flow

# 1. Enable VT-d/AMD-Vi (IOMMU) in BIOS

# 2. Add IOMMU to boot parameters
# /etc/default/grub:
# GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"

# 3. Check IOMMU groups
find /sys/kernel/iommu_groups/ -type l | sort -V

# 4. Bind GPU to vfio-pci driver
#    (use your own GPU's VGA + audio function IDs from `lspci -nn`)
# /etc/modprobe.d/vfio.conf:
# options vfio-pci ids=10de:2484,10de:228b

# 5. Rebuild initramfs
sudo update-initramfs -u   # Debian/Ubuntu (use `dracut -f` on RHEL/Fedora)

# 6. Verify after reboot
lspci -nnk -s 01:00.0
# Kernel driver in use: vfio-pci
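
Step 3's IOMMU-group check matters because every device in the GPU's group (typically the VGA and audio functions) must be handed to vfio-pci together. A small script makes the grouping easier to read; a minimal sketch (the sysfs layout is standard Linux, the helper name is mine):

```python
import os
from collections import defaultdict

def iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI devices it contains.

    Passthrough is only safe when every device in the GPU's group
    is bound to vfio-pci together.
    """
    groups = defaultdict(list)
    if not os.path.isdir(sysfs_root):
        return groups  # IOMMU disabled, or not a Linux host
    for group in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, group, "devices")
        for dev in sorted(os.listdir(dev_dir)):
            groups[group].append(dev)
    return groups

# On a host with IOMMU enabled: print(iommu_groups())
```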

vGPU (NVIDIA GRID)

Architecture

vGPU enables sharing a single physical GPU across multiple VMs.

+--------+ +--------+ +--------+ +--------+
| VM 1   | | VM 2   | | VM 3   | | VM 4   |
| vGPU   | | vGPU   | | vGPU   | | vGPU   |
| 4GB    | | 4GB    | | 4GB    | | 4GB    |
+--------+-+--------+-+--------+-+--------+
|              vGPU Manager               |
|           (Hypervisor Module)           |
+-----------------------------------------+
|           Physical GPU (16GB)           |
|         (NVIDIA A100/L40S/etc.)         |
+-----------------------------------------+

mdev (Mediated Devices) - Pre-Ampere

[Software Time-Slicing]

Time -->  |  VM1  |  VM2  |  VM3  |  VM1  |  VM2  |  VM3  |
          +-------+-------+-------+-------+-------+-------+
Full GPU   used    used    used    used    used    used

- Entire GPU shared via time-division
- vGPU Manager handles context switching
- VRAM statically partitioned
- Compute resources time-shared
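
The time-division above can be sketched as a toy round-robin. This is a conceptual model only, not NVIDIA's actual scheduler (real slice lengths and preemption policy are driver-internal):

```python
from itertools import cycle, islice

def schedule(vms, slices):
    """Round-robin time-slicing: each VM gets the whole GPU for one
    slice, then yields; VRAM stays statically partitioned throughout."""
    return list(islice(cycle(vms), slices))

# Six slices across three VMs reproduce the timeline in the diagram.
print(schedule(["VM1", "VM2", "VM3"], 6))
# -> ['VM1', 'VM2', 'VM3', 'VM1', 'VM2', 'VM3']
```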

SR-IOV - Ampere and Later

[Hardware Partitioning - SR-IOV]

+-------------------------------------------+
| Physical Function (PF)                    |
| - GPU management and configuration        |
+-------------------------------------------+
| VF 0         | VF 1         | VF 2        |
| (VM 1)       | (VM 2)       | (VM 3)      |
| Own queues   | Own queues   | Own queues  |
| Own IRQs     | Own IRQs     | Own IRQs    |
+-------------------------------------------+
|          Physical GPU (PCIe)              |
+-------------------------------------------+

- PF (Physical Function): Managed by host
- VF (Virtual Function): Directly assigned to each VM
- IOMMU ensures memory isolation between VFs
- Hardware-level isolation for better stability

vGPU Profile Series

Series     Purpose        VRAM          Typical Use Cases
C-series   Compute        Large         AI/ML training, HPC
Q-series   Quadro         Medium-Large  CAD, 3D rendering, professional graphics
B-series   Desktop        Small-Medium  VDI, general desktop
A-series   App Streaming  Small-Medium  App virtualization, remote workstations

[NVIDIA A100 vGPU Profile Examples]

GPU: NVIDIA A100 40GB

Profile           VRAM    Max Instances
A100-1-5C         5GB      7
A100-2-10C       10GB      4
A100-4-20C       20GB      2
A100-8-40C       40GB      1
A100-1-5CME       5GB      7  (MIG-backed)
A100-2-10CME     10GB      4  (MIG-backed)

License Requirements:

  • NVIDIA vGPU Software License required (vApps, vPC, vCS, vWS)
  • License server (DLS or CLS) needed
  • License type determined by GPU model and profile
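
The instance counts in the profile table fall out of the static VRAM split, capped at 7 instances as the table shows. A sketch of that arithmetic (the cap and sizes are taken from the table above; real limits vary by profile and vGPU software release):

```python
def max_instances(total_vram_gb, profile_vram_gb, cap=7):
    """Instances per GPU = static VRAM split, capped at 7 instances
    (the cap shown in the profile table above)."""
    return min(total_vram_gb // profile_vram_gb, cap)

# A100 40GB examples from the table above
print(max_instances(40, 10))  # A100-2-10C -> 4
print(max_instances(40, 5))   # A100-1-5C  -> 7 (capped)
print(max_instances(40, 40))  # A100-8-40C -> 1
```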

MIG (Multi-Instance GPU)

Overview

MIG is a hardware-level spatial partitioning technology first introduced with NVIDIA A100.

[MIG Architecture - A100 Example]

+----------------------------------------------------------+
|                    NVIDIA A100 (80GB)                    |
+----------+----------+----------+----------+--------------+
| GPC 0-1  | GPC 2-3  | GPC 4    | GPC 5    | GPC 6        |
| MIG 1    | MIG 2    | MIG 3    | MIG 4    | MIG 5        |
| 2g.20gb  | 2g.20gb  | 1g.10gb  | 1g.10gb  | 1g.10gb      |
+----------+----------+----------+----------+--------------+
| Memory   | Memory   | Memory   | Memory   | Memory       |
| Ctrl 0-1 | Ctrl 2-3 | Ctrl 4   | Ctrl 5   | Ctrl 6       |
| 20GB HBM | 20GB HBM | 10GB HBM | 10GB HBM | 10GB HBM     |
+----------+----------+----------+----------+--------------+
| L2 Cache | L2 Cache | L2 Cache | L2 Cache | L2 Cache     |
| Slice 0-1| Slice 2-3| Slice 4  | Slice 5  | Slice 6      |
+----------------------------------------------------------+

Key Characteristics

  • Hardware Spatial Partitioning: Up to 7 independent instances
  • Dedicated Resources: Each instance gets its own SMs, memory controllers, L2 cache, and HBM
  • Complete Isolation: No resource contention between instances
  • Error Isolation: Errors in one instance do not affect others

Supported GPUs

GPU        Max Instances  HBM          Architecture
A100 40GB  7              40GB HBM2    Ampere
A100 80GB  7              80GB HBM2e   Ampere
A30        4              24GB HBM2    Ampere
H100       7              80GB HBM3    Hopper
H200       7              141GB HBM3e  Hopper

MIG Configuration Example

# Enable MIG mode (reboot required)
sudo nvidia-smi -i 0 --mig 1

# After reboot, create MIG instances
# List available profiles
nvidia-smi mig -lgip

# Create two 3g.40gb instances (A100 80GB)
sudo nvidia-smi mig -cgi 9,9 -C

# Verify created instances
nvidia-smi mig -lgi
nvidia-smi mig -lci

# Delete MIG instances
sudo nvidia-smi mig -dci
sudo nvidia-smi mig -dgi

# Disable MIG mode (reboot required)
sudo nvidia-smi -i 0 --mig 0
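
The `-cgi 9,9` request above succeeds because two 3g instances consume 6 of the GPU's 7 compute slices. A simplified fit check (the slice cost follows the leading "Ng" in each profile name; real MIG additionally enforces fixed placement positions that this sum check ignores):

```python
# Compute-slice cost per GPU-instance profile (A100 80GB naming).
SLICES = {"1g.10gb": 1, "2g.20gb": 2, "3g.40gb": 3, "4g.40gb": 4, "7g.80gb": 7}

def fits(profiles, total_slices=7):
    """Simplified capacity check: do the requested profiles fit in
    the available compute slices?"""
    return sum(SLICES[p] for p in profiles) <= total_slices

print(fits(["3g.40gb", "3g.40gb"]))             # True: uses 6 of 7 slices
print(fits(["3g.40gb", "3g.40gb", "2g.20gb"]))  # False: 8 slices > 7
```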

MIG-backed vGPU

Combining MIG with vGPU provides both hardware isolation and VM support.

+--------+ +--------+ +--------+
| VM 1   | | VM 2   | | VM 3   |
| vGPU   | | vGPU   | | vGPU   |
+--------+ +--------+ +--------+
| vGPU Manager                  |
+------+-------+-------+-------+
| MIG 1| MIG 2 | MIG 3 | MIG 4 |  <-- Hardware isolation
+------+-------+-------+-------+
|        Physical GPU           |
+-------------------------------+

- Between MIG slices: hardware isolation
- Within the GPU: each MIG-backed vGPU maps 1:1 to its own MIG instance (only one vGPU per GPU instance)

SR-IOV for GPUs

PCIe-Level Virtualization

[SR-IOV Structure]

PCIe Bus
    |
+-------+------+------+------+
| PF    | VF 0 | VF 1 | VF 2 |
| (Host)| (VM1)| (VM2)| (VM3)|
+-------+------+------+------+
|        Physical GPU        |
+----------------------------+

PF = Physical Function (host management)
VF = Virtual Function (direct VM assignment)
  • NVIDIA supports GPU SR-IOV from Ampere onward
  • AMD MxGPU is also SR-IOV-based (Radeon Instinct/Pro)
  • Each VF is an independent PCIe function, isolated via IOMMU
  • PF manages the entire GPU and creates/configures VFs
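
Because each VF is an ordinary PCIe function, the host can inspect VF state through standard sysfs attributes. A sketch (the `sriov_numvfs`/`sriov_totalvfs` attributes are standard Linux SR-IOV; the PCI address in the comment is illustrative):

```python
import os

def vf_status(pf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return (currently enabled VFs, hardware maximum) for a PF,
    read from the standard Linux SR-IOV sysfs attributes."""
    dev = os.path.join(sysfs_root, pf_addr)
    def read_int(name):
        with open(os.path.join(dev, name)) as f:
            return int(f.read())
    return read_int("sriov_numvfs"), read_int("sriov_totalvfs")

# e.g. vf_status("0000:41:00.0") -> (4, 16) on a PF with 4 VFs enabled
```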

GPU Time-Slicing in Kubernetes

Time-slicing is the simplest GPU-sharing approach: it is implemented purely in software and requires no special hardware features.

[K8s GPU Time-Slicing]

+---Pod A---+ +---Pod B---+ +---Pod C---+
|  Process  | |  Process  | |  Process  |
|  CUDA     | |  CUDA     | |  CUDA     |
+-----------+-+-----------+-+-----------+
|  NVIDIA Device Plugin (time-slicing)  |
+---------------------------------------+
|             Physical GPU              |
|       (shared by all processes)       |
+---------------------------------------+

- NVIDIA GPU Operator TimeSlicing configuration
- No hardware isolation
- Shared memory space (OOM possible)
- MPS (Multi-Process Service) option for better performance

# NVIDIA GPU Operator ConfigMap example
# time-slicing-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4
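
With `replicas: 4`, the device plugin advertises four schedulable `nvidia.com/gpu` resources per physical GPU, so up to four Pods land on one GPU. Pods request the resource exactly as usual (illustrative manifest; the image tag is just an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1   # one time-slice replica, not a whole GPU
```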

Comprehensive Comparison Table

Aspect            Passthrough       vGPU (mdev)      vGPU (SR-IOV)   MIG                  Time-Slicing
GPU Sharing       No                Yes (SW)         Yes (HW)        Yes (HW)             Yes (SW)
Isolation Level   Full (exclusive)  Time-sliced      PCIe VF         Spatial              None
Memory Isolation  Full              VRAM partition   VRAM partition  HBM partition        Shared
Performance       95-99%            85-95%           88-96%          85-95%               Variable
Max Instances     1                 GPU-dependent    VF count        Up to 7              Unlimited
Supported GPUs    All GPUs          NVIDIA vGPU      NVIDIA Ampere+  A100/H100 etc.       All NVIDIA
Licensing         None              NVIDIA vGPU      NVIDIA vGPU     None                 None
Primary Use       Dedicated GPU VM  VDI, multi-user  Enterprise      AI/ML multi-tenancy  K8s dev/test

Decision Guide

[GPU Virtualization Selection Flowchart]

Start: Do you need to share the GPU?
  |
  +-- No --> GPU Passthrough
  |          (Best performance, 1:1 assignment)
  |
  +-- Yes --> Do you need hardware isolation?
               |
               +-- No --> Is this Kubernetes?
               |           |
               |           +-- Yes --> Time-Slicing
               |           |          (Simplest approach)
               |           |
               |           +-- No --> vGPU (mdev)
               |                      (VM-based sharing)
               |
               +-- Yes --> MIG-capable GPU (A100/H100)?
                           |
                           +-- Yes --> MIG or MIG-backed vGPU
                           |          (Best isolation)
                           |
                           +-- No --> vGPU (SR-IOV)
                                      (Ampere+ HW partitioning)
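
The flowchart reduces to a few branches; encoded as a function for quick reference (the parameter names are mine, the logic mirrors the chart):

```python
def choose(share_gpu, hw_isolation=False, kubernetes=False, mig_capable=False):
    """The selection flowchart above, as code."""
    if not share_gpu:
        return "GPU Passthrough"
    if not hw_isolation:
        return "Time-Slicing" if kubernetes else "vGPU (mdev)"
    return "MIG or MIG-backed vGPU" if mig_capable else "vGPU (SR-IOV)"

print(choose(share_gpu=False))                            # GPU Passthrough
print(choose(True, hw_isolation=True, mig_capable=True))  # MIG or MIG-backed vGPU
print(choose(True, kubernetes=True))                      # Time-Slicing
```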

Quiz: GPU Virtualization Knowledge Check

Q1. What is the fundamental difference between GPU Passthrough and vGPU?

GPU Passthrough exclusively assigns a physical GPU to a single VM, providing native performance. vGPU allows multiple VMs to share one GPU, distributing resources through time-slicing or SR-IOV.

Q2. Why does MIG provide better isolation than other GPU sharing methods?

MIG spatially partitions the GPU at the hardware level. Each instance has dedicated SMs, memory controllers, L2 cache, and HBM, fundamentally eliminating resource contention between instances. Errors are also isolated.

Q3. What are the roles of PF and VF in SR-IOV?

PF (Physical Function) is the PCIe function used by the host to manage the entire GPU. VFs (Virtual Functions) are created from the PF and directly assigned to VMs, providing independent GPU access.

Q4. What is the biggest weakness of Kubernetes Time-Slicing?

There is no hardware isolation, and all Pods share GPU memory. Excessive memory usage by one Pod can cause OOM in others, and security boundaries are unclear.

Q5. What is absolutely required to use vGPU?

An NVIDIA vGPU Software License is required. A license server (DLS or CLS) must be deployed, and vGPU Manager software must be installed on the hypervisor. GPU Passthrough and standalone MIG do not require licensing.