[Virtualization] 05. AWS EC2 and Nitro System: The Evolution of Cloud Virtualization

Introduction

AWS EC2 (Elastic Compute Cloud) is the world's largest cloud virtualization platform. At its core is the Nitro System, developed in-house by AWS. Nitro overcomes the limitations of traditional hypervisors by offloading I/O processing to dedicated hardware, delivering nearly all host resources to instances.

AWS Nitro System Architecture

Traditional Virtualization vs Nitro

[Traditional Virtualization]          [AWS Nitro]

+------------------+                  +------------------+
|   VM 1  |  VM 2  |                  |   VM 1  |  VM 2  |
+------------------+                  +------------------+
| Hypervisor       |                  | Nitro Hypervisor |
| - CPU/Mem mgmt   |                  | (lightweight,    |
| - Network proc   |                  |  CPU/Mem only)   |
| - Storage proc   |                  +------------------+
| - Security/Mgmt  |                  | Nitro Cards (HW) |
+------------------+                  | NIC|EBS|Mgmt|Sec |
|   Hardware       |                  +------------------+
+------------------+                  |   Hardware       |
                                      +------------------+
Host CPU: ~30%                        Host CPU: ~100%
consumed by hypervisor                available to instances
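The capacity claim in the diagram can be sanity-checked with trivial shell arithmetic. The ~30% overhead figure is the illustrative number from the diagram above, not a measured value:

```shell
# Illustrative only: vCPUs a ~30% software-hypervisor I/O tax would
# consume on a 192-vCPU host (numbers taken from the diagram above).
total_vcpus=192
overhead_pct=30
lost=$(( total_vcpus * overhead_pct / 100 ))   # vCPUs burned on hypervisor I/O
echo "Traditional: $(( total_vcpus - lost )) of $total_vcpus vCPUs for VMs"
echo "Nitro:       $total_vcpus of $total_vcpus vCPUs for VMs"
```

On a 192-vCPU host, a 30% software I/O tax would leave only 135 vCPUs for guests; Nitro moves that I/O work onto the dedicated cards instead.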

Nitro System Components

+----------------------------------------------------------+
|                     AWS Nitro System                      |
+----------------------------------------------------------+
|                                                          |
|  +--------------------+  +----------------------------+  |
|  | Nitro Hypervisor   |  |        Nitro Cards         |  |
|  | - Lightweight      |  |  +--------+  +--------+    |  |
|  |   KVM-based        |  |  | VPC    |  | EBS    |    |  |
|  | - CPU/Memory       |  |  | Card   |  | Card   |    |  |
|  |   isolation only   |  |  +--------+  +--------+    |  |
|  +--------------------+  |  +--------+  +--------+    |  |
|                          |  | NVMe   |  | Mgmt   |    |  |
|  +--------------------+  |  | Card   |  | Card   |    |  |
|  | Nitro Security     |  |  +--------+  +--------+    |  |
|  | Chip               |  +----------------------------+  |
|  | - HW Root of Trust |                                  |
|  | - Firmware protect |  +----------------------------+  |
|  +--------------------+  |       Nitro Enclaves       |  |
|                          | - Isolated compute         |  |
|                          | - Sensitive data processing|  |
|                          +----------------------------+  |
+----------------------------------------------------------+

1. Nitro Hypervisor

  • Lightweight KVM-based: Handles only CPU and memory isolation
  • Network, storage, and management functions all offloaded to Nitro Cards
  • Delivers nearly 100% of host CPU/memory to instances
  • Minimizes software attack surface

2. Nitro Cards

Hardware cards built with dedicated ASICs.

Nitro Card        Role
---------------   ------------------------------------------------
VPC Card          Virtual network processing (VPC, SG, NACL, EFA)
EBS Card          EBS volume I/O, encryption, NVMe protocol
Local NVMe Card   Instance store NVMe SSD management
Management Card   Instance monitoring, boot, security management

3. Nitro Security Chip

[Nitro Security Chain]

Server Boot
    |
    v
Nitro Security Chip (HW Root of Trust)
    |
    v  Firmware integrity verification
    |
Nitro Hypervisor loads
    |
    v  Hypervisor integrity verification
    |
EC2 Instance starts
    |
    v  Runtime monitoring (continuous)

  • Hardware-based Root of Trust
  • Verifies server firmware integrity at every boot
  • Even AWS employees cannot access instance memory
  • NitroTPM provides instance-level TPM 2.0

4. Nitro Enclaves

Isolated compute environments for processing sensitive data.

+-------------------------------------+
|           EC2 Instance              |
|  +-------------+  +-------------+  |
|  | Application |  | Nitro       |  |
|  | (general)   |  | Enclave     |  |
|  |             |  | (isolated)  |  |
|  |             |  | - own kernel|  |
|  |             |  | - no network|  |
|  |             |  | - no storage|  |
|  |             |  | - vsock only|  |
|  +-------------+  +-------------+  |
+-------------------------------------+

  • Parent instance cannot access Enclave memory
  • No network or storage access (vsock communication only)
  • Cryptographic attestation for integrity verification
  • Use cases: encryption key management, financial data, medical information

GPU Instance Types

P-Series (Training/HPC)

Instance        GPU        Count  GPU Memory     vCPU  Memory  Network
p5.48xlarge     H100       8      640GB HBM3     192   2TB     3,200 Gbps EFA
p5e.48xlarge    H200       8      1,128GB HBM3e  192   2TB     3,200 Gbps EFA
p5en.48xlarge   H200       8      1,128GB HBM3e  192   2TB     3,200 Gbps EFAv2
p4d.24xlarge    A100       8      320GB HBM2e    96    1.1TB   400 Gbps EFA
p4de.24xlarge   A100 80GB  8      640GB HBM2e    96    1.1TB   400 Gbps EFA

G-Series (Inference/Graphics)

Instance         GPU   Count  GPU Memory  vCPU   Memory    Network
g6.xlarge-48xl   L4    1-8    24-192GB    4-192  16-768GB  Up to 100 Gbps
g6e.xlarge-48xl  L40S  1-8    48-384GB    4-192  16-768GB  Up to 100 Gbps
g5.xlarge-48xl   A10G  1-8    24-192GB    4-192  16-768GB  Up to 100 Gbps

GPU Provisioning Method

AWS provides GPUs in passthrough mode via the Nitro System.

[AWS GPU Passthrough via Nitro]

+-------------------------------------------------------+
|                     EC2 Instance                      |
|                     (GPU Driver)                      |
+-------------------------------------------------------+
|                   Nitro Hypervisor                    |
|                    (CPU/Mem only)                     |
+-------------------------------------------------------+
|  Nitro VPC Card  |  Nitro EBS Card  |  Nitro Mgmt Card|
+------------------+------------------+-----------------+
|                    Physical Server                    |
|  CPU | RAM | GPU (Direct Passthrough) | NVMe          |
+-------------------------------------------------------+

  • GPUs are directly assigned to instances (not vGPU)
  • Bare-metal equivalent GPU performance
  • Full native GPU stack available (CUDA, cuDNN, NCCL, etc.)
  • Users can configure MIG directly within instances (A100/H100)

EC2 Networking

EFA (Elastic Fabric Adapter)

[EFA Architecture]

+----------+  +----------+  +----------+  +----------+
| Instance |  | Instance |  | Instance |  | Instance |
| GPU x8   |  | GPU x8   |  | GPU x8   |  | GPU x8   |
+----+-----+  +----+-----+  +----+-----+  +----+-----+
     |              |              |              |
+----+--------------+--------------+--------------+----+
|              EFA Network (RDMA-like)                  |
|         (OS bypass, low-latency, high-bandwidth)      |
+------------------------------------------------------+

Feature               Description
--------------------  ---------------------------------------------------
OS Bypass             Direct NIC access bypassing the kernel
Bandwidth             P5: 3,200 Gbps, P4d: 400 Gbps
NCCL Support          Direct GPU-to-GPU communication (All-reduce, etc.)
GPUDirect RDMA (GDR)  Direct network transfer from GPU memory
SRD Protocol          Scalable Reliable Datagram

[GPUDirect RDMA Path]

Standard path:     GPU -> CPU Memory -> NIC -> Network
GPUDirect RDMA:    GPU -> NIC -> Network  (CPU bypass)
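A minimal environment sketch for steering NCCL onto the EFA/GPUDirect RDMA path; the variable names match the mpirun example in the hands-on section, but exact knobs and defaults vary by NCCL and libfabric version, so treat them as assumptions to verify:

```shell
# Environment for NCCL over EFA with GPUDirect RDMA (sketch)
export FI_PROVIDER=efa             # select the EFA libfabric provider
export FI_EFA_USE_DEVICE_RDMA=1    # let the NIC read GPU memory directly (p4d/p5)
export NCCL_DEBUG=INFO             # log which transport NCCL actually chose
```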

Placement Groups

Optimize network performance for GPU clusters.

[Cluster Placement Group]

+----------------------------------------------------+
|  Same AZ, Same Rack (or Adjacent Racks)            |
|                                                    |
|  +--------+  +--------+  +--------+  +--------+    |
|  | p5.48xl|  | p5.48xl|  | p5.48xl|  | p5.48xl|    |
|  | 8xH100 |  | 8xH100 |  | 8xH100 |  | 8xH100 |    |
|  +--------+  +--------+  +--------+  +--------+    |
|                                                    |
|  --> Minimal network latency, maximum bandwidth    |
|  --> Essential for large-scale distributed training|
+----------------------------------------------------+

Placement Group Type  Description                         Use Case
--------------------  ----------------------------------  ------------------------------
Cluster               Dense placement in same AZ          Distributed GPU training, HPC
Spread                Distributed across different racks  High availability
Partition             Separate racks per partition        Large distributed systems

Elastic Graphics (Deprecated)

Elastic Graphics was a service that attached remote GPUs to EC2 instances over the network.

  • Officially deprecated in 2024
  • Only offered limited OpenGL support
  • Alternatives: G-series instances or NICE DCV protocol

Comparison with On-Premises GPU Virtualization

Aspect           AWS EC2 (Nitro)                    On-Premises (ESXi/KVM)
---------------  ---------------------------------  ----------------------------------
GPU Assignment   Passthrough (Nitro)                Passthrough, vGPU, MIG selectable
GPU Sharing      Exclusive per instance             Multi-VM sharing via vGPU
Network          EFA (up to 3,200 Gbps)             InfiniBand (up to 400 Gbps/port)
GPU Type Change  Instant via instance type change   Physical GPU replacement needed
Scalability      Hundreds of GPUs in minutes        Weeks to months of procurement
Cost Model       Pay-per-use (per second)           CAPEX + maintenance
MIG Support      User configures within instance    Managed at hypervisor level
Multi-tenancy    Nitro HW isolation per instance    Logical isolation via vGPU/MIG

EC2 Instance Selection Guide

[GPU Instance Selection Flowchart]

What is your use case?
  |
  +-- AI/ML Training --> Model size?
  |                       |
  |                       +-- Large LLM --> P5 (H100/H200)
  |                       |                 8 GPUs, EFA 3200Gbps
  |                       |
  |                       +-- Medium --> P4d (A100)
  |                                      8 GPUs, EFA 400Gbps
  |
  +-- Inference --> Throughput needs?
  |                  |
  |                  +-- High throughput --> G6e (L40S)
  |                  |                      Up to 8 GPUs
  |                  |
  |                  +-- Cost efficient --> G6 (L4)
  |                                         Up to 8 GPUs
  |
  +-- Graphics/Rendering --> G5 (A10G)
  |                          3D rendering, video processing
  |
  +-- Dev/Prototype --> G6.xlarge (1x L4)
                        Most affordable GPU option
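The flowchart above can be encoded as a small shell helper. This is a sketch: `pick_gpu_instance` is a hypothetical function, and the sizes returned are just one entry point per instance family:

```shell
# Hypothetical helper encoding the selection flowchart above.
pick_gpu_instance() {
  local use_case="$1" detail="$2"
  case "$use_case" in
    training)
      case "$detail" in
        large-llm) echo "p5.48xlarge"  ;;  # 8x H100, EFA 3,200 Gbps
        *)         echo "p4d.24xlarge" ;;  # 8x A100, EFA 400 Gbps
      esac ;;
    inference)
      case "$detail" in
        high-throughput) echo "g6e.48xlarge" ;;  # 8x L40S
        *)               echo "g6.xlarge"    ;;  # 1x L4, cost efficient
      esac ;;
    graphics)  echo "g5.xlarge" ;;  # A10G: 3D rendering, video
    *)         echo "g6.xlarge" ;;  # dev/prototype default
  esac
}

pick_gpu_instance training large-llm   # -> p5.48xlarge
```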

Hands-On: EC2 GPU Instance Setup

# Create the cluster placement group referenced below (one-time setup)
aws ec2 create-placement-group \
  --group-name my-gpu-cluster \
  --strategy cluster

# Launch GPU instance with AWS CLI
aws ec2 run-instances \
  --instance-type p4d.24xlarge \
  --image-id ami-0abcdef1234567890 \
  --key-name my-key \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678 \
  --placement "GroupName=my-gpu-cluster,Tenancy=default" \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=gpu-training}]'

# Create an EFA network interface
# (an EFA can only be attached at launch or while the instance is stopped)
aws ec2 create-network-interface \
  --subnet-id subnet-12345678 \
  --interface-type efa \
  --groups sg-12345678

# Check GPU status (inside instance)
nvidia-smi

# Enable MIG mode on GPU 0 (A100/H100 instances)
sudo nvidia-smi -i 0 --mig 1

# After the GPU resets (or after a reboot), create two MIG GPU
# instances from profile 9 along with their compute instances
sudo nvidia-smi mig -cgi 9,9 -C

# NCCL test (multi-node)
# All-reduce benchmark for GPU-to-GPU communication
mpirun -np 16 --hostfile hosts \
  -x NCCL_DEBUG=INFO \
  -x FI_PROVIDER=efa \
  -x FI_EFA_USE_DEVICE_RDMA=1 \
  all_reduce_perf -b 8 -e 1G -f 2 -g 8

Key Innovations of the Nitro System

[Core Innovation: I/O Offloading]

Before Nitro:
  Host CPU:  [==VM==][==VM==][===Hypervisor I/O===]
                                    ~30% consumed

After Nitro:
  Host CPU:  [========VM========][========VM========]
  Nitro HW:  [Net][EBS][NVMe][Mgmt][Security]
                  ~100% available to VMs

  1. I/O Hardware Offload: Network, storage, and management handled by dedicated chips
  2. Security Isolation: Hardware-level root of trust and memory isolation
  3. Bare-metal Performance: Near-zero virtualization overhead
  4. Consistent Performance: I/O processing independent of CPU eliminates "noisy neighbor" issues
  5. Rapid Innovation: Hardware components can be updated independently

Quiz: AWS EC2/Nitro Knowledge Check

Q1. What is the key advantage of the Nitro System over traditional hypervisors?

It offloads network, storage, and management functions to dedicated Nitro Cards (ASICs), providing nearly 100% of host CPU/memory to instances. Traditional hypervisors process these in software, consuming about 30% of host resources.

Q2. How does AWS EC2 assign GPUs?

Via passthrough. The Nitro System directly assigns physical GPUs to instances, delivering bare-metal equivalent performance. It does not use vGPU-style sharing.

Q3. How does EFA differ from standard networking?

EFA provides OS bypass for direct NIC access without going through the kernel. It offers RDMA-like low-latency, high-bandwidth communication, and GPUDirect RDMA enables direct network transfer from GPU memory.

Q4. What is the security model of Nitro Enclaves?

Even the parent instance cannot access Enclave memory. There is no network or storage access; communication occurs only through vsock. Cryptographic attestation verifies Enclave integrity.

Q5. Why are Cluster Placement Groups important for distributed GPU training?

They place instances densely in adjacent racks within the same AZ, minimizing network latency and maximizing bandwidth. Distributed training involves frequent GPU-to-GPU communication (All-reduce, etc.), making network performance directly impact training speed.