[Virtualization] 05. AWS EC2 and Nitro System: The Evolution of Cloud Virtualization
Author: Youngju Kim (@fjvbn20031)
- Introduction
- AWS Nitro System Architecture
- GPU Instance Types
- EC2 Networking
- Elastic Graphics (Deprecated)
- Comparison with On-Premises GPU Virtualization
- EC2 Instance Selection Guide
- Hands-On: EC2 GPU Instance Setup
- Key Innovations of the Nitro System
- Quiz: AWS EC2/Nitro Knowledge Check
Introduction
AWS EC2 (Elastic Compute Cloud) is the world's largest cloud virtualization platform. At its core is the Nitro System, developed in-house by AWS. Nitro overcomes the limitations of traditional hypervisors by offloading I/O processing to dedicated hardware, delivering nearly all host resources to instances.
AWS Nitro System Architecture
Traditional Virtualization vs Nitro
[Traditional Virtualization]
+------------------+
| VM 1    | VM 2   |
+------------------+
| Hypervisor       |
|  - CPU/Mem mgmt  |
|  - Network proc  |
|  - Storage proc  |
|  - Security/Mgmt |
+------------------+
| Hardware         |
+------------------+
Host CPU: ~30% consumed by the hypervisor

[AWS Nitro]
+------------------------+
| VM 1      | VM 2       |
+------------------------+
| Nitro Hypervisor       |
| (lightweight,          |
|  CPU/Mem only)         |
+------------------------+
| Nitro Cards            |
| (dedicated HW)         |
| NIC | EBS | Mgmt | Sec |
+------------------------+
| Hardware               |
+------------------------+
Host CPU: ~100% available to instances
Nitro System Components
+----------------------------------------------------------+
| AWS Nitro System |
+----------------------------------------------------------+
| |
| +------------------+ +------------------------------+ |
| | Nitro Hypervisor | | Nitro Cards | |
| | - Lightweight | | +--------+ +--------+ | |
| | KVM-based | | | VPC | | EBS | | |
| | - CPU/Memory | | | Card | | Card | | |
| | isolation only | | +--------+ +--------+ | |
| +------------------+ | +--------+ +--------+ | |
| | | NVMe | | Mgmt | | |
| +------------------+ | | Card | | Card | | |
| | Nitro Security | | +--------+ +--------+ | |
| | Chip | +------------------------------+ |
| | - HW Root of Trust| |
| | - Firmware protect| +------------------------------+ |
| +------------------+ | Nitro Enclaves | |
| | - Isolated compute | |
| | - Sensitive data processing | |
| +------------------------------+ |
+----------------------------------------------------------+
1. Nitro Hypervisor
- Lightweight KVM-based: Handles only CPU and memory isolation
- Network, storage, and management functions all offloaded to Nitro Cards
- Delivers nearly 100% of host CPU/memory to instances
- Minimizes software attack surface
2. Nitro Cards
Hardware cards built with dedicated ASICs.
| Nitro Card | Role |
|---|---|
| VPC Card | Virtual network processing (VPC, SG, NACL, EFA) |
| EBS Card | EBS volume I/O, encryption, NVMe protocol |
| Local NVMe Card | Instance store NVMe SSD management |
| Management Card | Instance monitoring, boot, security management |
3. Nitro Security Chip
[Nitro Security Chain]
Server Boot
    |
    v
Nitro Security Chip (HW Root of Trust)
    |   firmware integrity verification
    v
Nitro Hypervisor loads
    |   hypervisor integrity verification
    v
EC2 Instance starts
    |
    v
Runtime monitoring (continuous)
- Hardware-based Root of Trust
- Verifies server firmware integrity at every boot
- Even AWS employees cannot access instance memory
- NitroTPM provides instance-level TPM 2.0
4. Nitro Enclaves
Isolated compute environments for processing sensitive data.
+-------------------------------------+
| EC2 Instance |
| +-------------+ +-------------+ |
| | Application | | Nitro | |
| | (general) | | Enclave | |
| | | | (isolated) | |
| | | | - own kernel| |
| | | | - no network| |
| | | | - no storage| |
| | | | - vsock only| |
| +-------------+ +-------------+ |
+-------------------------------------+
- Parent instance cannot access Enclave memory
- No network or storage access (vsock communication only)
- Cryptographic attestation for integrity verification
- Use cases: encryption key management, financial data, medical information
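On the parent instance, the Enclave lifecycle is managed with AWS's nitro-cli. A minimal sketch of the workflow, assuming an instance launched with enclave support enabled; the Docker image name, resource sizes, and CID below are placeholder examples:

```shell
# Build an enclave image file (EIF) from a Docker image
# (image name is a placeholder)
nitro-cli build-enclave \
    --docker-uri my-enclave-app:latest \
    --output-file app.eif

# Start the enclave: 2 vCPUs and 512 MiB of memory are carved
# out of the parent instance; the CID is the vsock address the
# parent uses to talk to the enclave
nitro-cli run-enclave \
    --eif-path app.eif \
    --cpu-count 2 \
    --memory 512 \
    --enclave-cid 16

# Inspect running enclaves (shows EnclaveID, CID, state)
nitro-cli describe-enclaves
```

build-enclave also prints the enclave's PCR measurements, which are what cryptographic attestation verifies at runtime.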
GPU Instance Types
P-Series (Training/HPC)
| Instance | GPU | Count | GPU Memory | vCPU | Memory | Network |
|---|---|---|---|---|---|---|
| p5.48xlarge | H100 | 8 | 640GB HBM3 | 192 | 2TB | 3,200 Gbps EFA |
| p5e.48xlarge | H200 | 8 | 1,128GB HBM3e | 192 | 2TB | 3,200 Gbps EFA |
| p5en.48xlarge | H200 | 8 | 1,128GB HBM3e | 192 | 2TB | 3,200 Gbps EFAv2 |
| p4d.24xlarge | A100 | 8 | 320GB HBM2e | 96 | 1.1TB | 400 Gbps EFA |
| p4de.24xlarge | A100 80G | 8 | 640GB HBM2e | 96 | 1.1TB | 400 Gbps EFA |
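The GPU Memory column is the per-instance aggregate, i.e. GPU count times per-GPU memory. A quick sanity check of the table, using NVIDIA's published per-GPU capacities:

```shell
# Per-instance GPU memory = GPU count x memory per GPU
gpus=8
h100_gb=80     # H100 SXM: 80 GB HBM3
h200_gb=141    # H200: 141 GB HBM3e
a100_gb=40     # p4d uses 40 GB A100s
echo "p5.48xlarge:  $(( gpus * h100_gb )) GB"   # 640 GB
echo "p5e.48xlarge: $(( gpus * h200_gb )) GB"   # 1128 GB
echo "p4d.24xlarge: $(( gpus * a100_gb )) GB"   # 320 GB
```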
G-Series (Inference/Graphics)
| Instance | GPU | Count | GPU Memory | vCPU | Memory | Network |
|---|---|---|---|---|---|---|
| g6.xlarge-48xl | L4 | 1-8 | 24-192GB | 4-192 | 16-768GB | Up to 100 Gbps |
| g6e.xlarge-48xl | L40S | 1-8 | 48-384GB | 4-192 | 16-768GB | Up to 100 Gbps |
| g5.xlarge-48xl | A10G | 1-8 | 24-192GB | 4-192 | 16-768GB | Up to 100 Gbps |
GPU Provisioning Method
AWS provides GPUs in passthrough mode via the Nitro System.
[AWS GPU Passthrough via Nitro]
+------------------+
| EC2 Instance |
| (GPU Driver) |
+------------------+
| Nitro Hypervisor|
| (CPU/Mem only) |
+------------------+
| Nitro VPC Card | Nitro EBS Card | Nitro Mgmt Card
+------------------+------------------+-----------------+
| Physical Server |
| CPU | RAM | GPU (Direct Passthrough) | NVMe |
+-------------------------------------------------------+
- GPUs are directly assigned to instances (not vGPU)
- Bare-metal equivalent GPU performance
- Full native GPU stack available (CUDA, cuDNN, NCCL, etc.)
- Users can configure MIG directly within instances (A100/H100)
EC2 Networking
EFA (Elastic Fabric Adapter)
[EFA Architecture]
+----------+ +----------+ +----------+ +----------+
| Instance | | Instance | | Instance | | Instance |
| GPU x8 | | GPU x8 | | GPU x8 | | GPU x8 |
+----+-----+ +----+-----+ +----+-----+ +----+-----+
| | | |
+----+--------------+--------------+--------------+----+
| EFA Network (RDMA-like) |
| (OS bypass, low-latency, high-bandwidth) |
+------------------------------------------------------+
| Feature | Description |
|---|---|
| OS Bypass | Direct NIC access bypassing the kernel |
| Bandwidth | P5: 3,200 Gbps, P4d: 400 Gbps |
| NCCL Support | Direct GPU-to-GPU communication (All-reduce, etc.) |
| GDR (GPUDirect RDMA) | Direct network transfer from GPU memory |
| SRD Protocol | Scalable Reliable Datagram |
[GPUDirect RDMA Path]
Standard path: GPU -> CPU Memory -> NIC -> Network
GPUDirect RDMA: GPU -> NIC -> Network (CPU bypass)
Placement Groups
Optimize network performance for GPU clusters.
[Cluster Placement Group]
+---------------------------------------------------+
| Same AZ, Same Rack (or Adjacent Racks) |
| |
| +--------+ +--------+ +--------+ +--------+ |
| | p5.48xl| | p5.48xl| | p5.48xl| | p5.48xl| |
| | 8xH100 | | 8xH100 | | 8xH100 | | 8xH100 | |
| +--------+ +--------+ +--------+ +--------+ |
| |
| --> Minimal network latency, maximum bandwidth |
| --> Essential for large-scale distributed training|
+---------------------------------------------------+
| Placement Group Type | Description | Use Case |
|---|---|---|
| Cluster | Dense placement in same AZ | Distributed GPU training, HPC |
| Spread | Distributed across different racks | High availability |
| Partition | Separate racks per partition | Large distributed systems |
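Creating a placement group is a single CLI call before launch; a minimal sketch, assuming configured credentials/region (the group name here is an example):

```shell
# Create a cluster placement group for a GPU training cluster
aws ec2 create-placement-group \
    --group-name my-gpu-cluster \
    --strategy cluster

# Verify it exists
aws ec2 describe-placement-groups --group-names my-gpu-cluster
```

Instances are then launched into the group via the --placement option of run-instances.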
Elastic Graphics (Deprecated)
Elastic Graphics was a service that attached remote GPUs to EC2 instances over the network.
- Officially deprecated in 2024
- Only offered limited OpenGL support
- Alternatives: G-series instances or NICE DCV protocol
Comparison with On-Premises GPU Virtualization
| Aspect | AWS EC2 (Nitro) | On-Premises (ESXi/KVM) |
|---|---|---|
| GPU Assignment | Passthrough (Nitro) | Passthrough, vGPU, MIG selectable |
| GPU Sharing | Exclusive per instance | Multi-VM sharing via vGPU |
| Network | EFA (up to 3,200 Gbps) | InfiniBand (up to 400 Gbps/port) |
| GPU Type Change | Instant via instance type change | Physical GPU replacement needed |
| Scalability | Hundreds of GPUs in minutes | Weeks-months procurement |
| Cost Model | Pay-per-use (per second) | CAPEX + maintenance |
| MIG Support | User configures within instance | Managed at hypervisor level |
| Multi-tenancy | Nitro HW isolation between instances | Logical isolation via vGPU/MIG |
EC2 Instance Selection Guide
[GPU Instance Selection Flowchart]
What is your use case?
|
+-- AI/ML Training --> Model size?
| |
| +-- Large LLM --> P5 (H100/H200)
| | 8 GPUs, EFA 3200Gbps
| |
| +-- Medium --> P4d (A100)
| 8 GPUs, EFA 400Gbps
|
+-- Inference --> Throughput needs?
| |
| +-- High throughput --> G6e (L40S)
| | Up to 8 GPUs
| |
| +-- Cost efficient --> G6 (L4)
| Up to 8 GPUs
|
+-- Graphics/Rendering --> G5 (A10G)
| 3D rendering, video processing
|
+-- Dev/Prototype --> g6.xlarge (1x L4)
Most affordable GPU option
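The flowchart above can be encoded as a small helper; the use-case keys are made up for illustration, and the instance types come from the tables earlier in this article:

```shell
# Map a use case to an instance family, per the flowchart above.
# Keys are illustrative; sizes shown are the smallest of each family.
pick_instance() {
  case "$1" in
    llm-training)   echo "p5.48xlarge"  ;;  # H100/H200, EFA 3200 Gbps
    training)       echo "p4d.24xlarge" ;;  # A100, EFA 400 Gbps
    inference-high) echo "g6e.xlarge"   ;;  # L40S
    inference-cost) echo "g6.xlarge"    ;;  # L4
    graphics)       echo "g5.xlarge"    ;;  # A10G
    *)              echo "g6.xlarge"    ;;  # dev/prototype default
  esac
}

pick_instance llm-training    # prints p5.48xlarge
pick_instance inference-cost  # prints g6.xlarge
```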
Hands-On: EC2 GPU Instance Setup
```shell
# Launch a GPU instance with the AWS CLI
aws ec2 run-instances \
    --instance-type p4d.24xlarge \
    --image-id ami-0abcdef1234567890 \
    --key-name my-key \
    --security-group-ids sg-12345678 \
    --subnet-id subnet-12345678 \
    --placement "GroupName=my-gpu-cluster,Tenancy=default" \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=gpu-training}]'

# Create an EFA network interface
aws ec2 create-network-interface \
    --subnet-id subnet-12345678 \
    --interface-type efa \
    --groups sg-12345678

# Check GPU status (inside the instance)
nvidia-smi

# Enable MIG mode (A100/H100 instances)
sudo nvidia-smi -i 0 --mig 1

# After a GPU reset/reboot: create two 3g.20gb GPU instances
# (profile 9 on A100 40GB) plus their compute instances
sudo nvidia-smi mig -cgi 9,9 -C

# NCCL all-reduce benchmark over EFA (multi-node):
# 16 ranks, one GPU per rank (e.g. two p4d nodes x 8 GPUs each)
mpirun -np 16 --hostfile hosts \
    -x NCCL_DEBUG=INFO \
    -x FI_PROVIDER=efa \
    -x FI_EFA_USE_DEVICE_RDMA=1 \
    all_reduce_perf -b 8 -e 1G -f 2 -g 1
```
Key Innovations of the Nitro System
[Core Innovation: I/O Offloading]
Before Nitro:
Host CPU: [==VM==][==VM==][===Hypervisor I/O===]
~30% consumed
After Nitro:
Host CPU: [========VM========][========VM========]
Nitro HW: [Net][EBS][NVMe][Mgmt][Security]
~100% available to VMs
- I/O Hardware Offload: Network, storage, management handled by dedicated chips
- Security Isolation: Hardware-level root of trust and memory isolation
- Bare-metal Performance: Near-zero virtualization overhead
- Consistent Performance: I/O processing independent of CPU eliminates "noisy neighbor" issues
- Rapid Innovation: Hardware components can be updated independently
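A back-of-the-envelope illustration of the offload benefit, using the ~30% software-hypervisor overhead figure from above and a 192-vCPU host as an example:

```shell
# vCPUs left for guest instances on a 192-vCPU host
host_vcpus=192
overhead_pct=30   # software hypervisor I/O overhead (figure above)
legacy_vcpus=$(( host_vcpus * (100 - overhead_pct) / 100 ))
echo "software hypervisor: ${legacy_vcpus} vCPUs for guests"   # 134
echo "Nitro:               ${host_vcpus} vCPUs for guests"     # 192
```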
Quiz: AWS EC2/Nitro Knowledge Check
Q1. What is the key advantage of the Nitro System over traditional hypervisors?
It offloads network, storage, and management functions to dedicated Nitro Cards (ASICs), providing nearly 100% of host CPU/memory to instances. Traditional hypervisors process these in software, consuming about 30% of host resources.
Q2. How does AWS EC2 assign GPUs?
Via passthrough. The Nitro System directly assigns physical GPUs to instances, delivering bare-metal equivalent performance. It does not use vGPU-style sharing.
Q3. How does EFA differ from standard networking?
EFA provides OS bypass for direct NIC access without going through the kernel. It offers RDMA-like low-latency, high-bandwidth communication, and GPUDirect RDMA enables direct network transfer from GPU memory.
Q4. What is the security model of Nitro Enclaves?
Even the parent instance cannot access Enclave memory. There is no network or storage access; communication occurs only through vsock. Cryptographic attestation verifies Enclave integrity.
Q5. Why are Cluster Placement Groups important for distributed GPU training?
They place instances densely in adjacent racks within the same AZ, minimizing network latency and maximizing bandwidth. Distributed training involves frequent GPU-to-GPU communication (All-reduce, etc.), making network performance directly impact training speed.