[Virtualization] 05. AWS EC2 and Nitro System: The Evolution of Cloud Virtualization

Introduction

AWS EC2 (Elastic Compute Cloud) is the world's largest cloud virtualization platform. At its core is the Nitro System, developed in-house by AWS. Nitro overcomes the limitations of traditional hypervisors by offloading I/O processing to dedicated hardware, delivering nearly all host resources to instances.

AWS Nitro System Architecture

Traditional Virtualization vs Nitro

[Traditional Virtualization]          [AWS Nitro]

+------------------+                  +------------------+
|   VM 1  |  VM 2  |                  |   VM 1  |  VM 2  |
+------------------+                  +------------------+
| Hypervisor       |                  | Nitro Hypervisor |
| - CPU/Mem mgmt   |                  | (lightweight,    |
| - Network proc   |                  |  CPU/Mem only)   |
| - Storage proc   |                  +------------------+
| - Security/Mgmt  |                  | Nitro Cards (HW) |
+------------------+                  | NIC|EBS|Mgmt|Sec |
|   Hardware       |                  +------------------+
+------------------+                  |   Hardware       |
                                      +------------------+
Host CPU: ~30%                        Host CPU: ~100%
consumed by hypervisor                available to instances
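The capacity claim in the diagram can be sanity-checked with trivial shell arithmetic. The ~30% overhead figure is the illustrative number from the diagram above, not a measured value:

```shell
# Illustrative only: vCPUs a ~30% software-hypervisor I/O tax would
# consume on a 192-vCPU host (numbers taken from the diagram above).
total_vcpus=192
overhead_pct=30
lost=$(( total_vcpus * overhead_pct / 100 ))   # vCPUs burned on hypervisor I/O
echo "Traditional: $(( total_vcpus - lost )) of $total_vcpus vCPUs for VMs"
echo "Nitro:       $total_vcpus of $total_vcpus vCPUs for VMs"
```

On a 192-vCPU host, a 30% software I/O tax would leave only 135 vCPUs for guests; Nitro moves that I/O work onto the dedicated cards instead.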

Nitro System Components

+----------------------------------------------------------+
|                     AWS Nitro System                      |
+----------------------------------------------------------+
|                                                          |
|  +--------------------+  +----------------------------+  |
|  | Nitro Hypervisor   |  |        Nitro Cards         |  |
|  | - Lightweight      |  |  +--------+  +--------+    |  |
|  |   KVM-based        |  |  | VPC    |  | EBS    |    |  |
|  | - CPU/Memory       |  |  | Card   |  | Card   |    |  |
|  |   isolation only   |  |  +--------+  +--------+    |  |
|  +--------------------+  |  +--------+  +--------+    |  |
|                          |  | NVMe   |  | Mgmt   |    |  |
|  +--------------------+  |  | Card   |  | Card   |    |  |
|  | Nitro Security     |  |  +--------+  +--------+    |  |
|  | Chip               |  +----------------------------+  |
|  | - HW Root of Trust |                                  |
|  | - Firmware protect |  +----------------------------+  |
|  +--------------------+  |       Nitro Enclaves       |  |
|                          | - Isolated compute         |  |
|                          | - Sensitive data processing|  |
|                          +----------------------------+  |
+----------------------------------------------------------+

1. Nitro Hypervisor

  • Lightweight KVM-based: Handles only CPU and memory isolation
  • Network, storage, and management functions all offloaded to Nitro Cards
  • Delivers nearly 100% of host CPU/memory to instances
  • Minimizes software attack surface

2. Nitro Cards

Hardware cards built with dedicated ASICs.

Nitro Card        Role
---------------   ------------------------------------------------
VPC Card          Virtual network processing (VPC, SG, NACL, EFA)
EBS Card          EBS volume I/O, encryption, NVMe protocol
Local NVMe Card   Instance store NVMe SSD management
Management Card   Instance monitoring, boot, security management

3. Nitro Security Chip

[Nitro Security Chain]

Server Boot
    |
    v
Nitro Security Chip (HW Root of Trust)
    |
    v  Firmware integrity verification
    |
Nitro Hypervisor loads
    |
    v  Hypervisor integrity verification
    |
EC2 Instance starts
    |
    v  Runtime monitoring (continuous)

  • Hardware-based Root of Trust
  • Verifies server firmware integrity at every boot
  • Even AWS employees cannot access instance memory
  • NitroTPM provides instance-level TPM 2.0

4. Nitro Enclaves

Isolated compute environments for processing sensitive data.

+-------------------------------------+
|           EC2 Instance              |
|  +-------------+  +-------------+  |
|  | Application |  | Nitro       |  |
|  | (general)   |  | Enclave     |  |
|  |             |  | (isolated)  |  |
|  |             |  | - own kernel|  |
|  |             |  | - no network|  |
|  |             |  | - no storage|  |
|  |             |  | - vsock only|  |
|  +-------------+  +-------------+  |
+-------------------------------------+

  • Parent instance cannot access Enclave memory
  • No network or storage access (vsock communication only)
  • Cryptographic attestation for integrity verification
  • Use cases: encryption key management, financial data, medical information

GPU Instance Types

P-Series (Training/HPC)

Instance        GPU        Count  GPU Memory     vCPU  Memory  Network
p5.48xlarge     H100       8      640GB HBM3     192   2TB     3,200 Gbps EFA
p5e.48xlarge    H200       8      1,128GB HBM3e  192   2TB     3,200 Gbps EFA
p5en.48xlarge   H200       8      1,128GB HBM3e  192   2TB     3,200 Gbps EFAv2
p4d.24xlarge    A100       8      320GB HBM2e    96    1.1TB   400 Gbps EFA
p4de.24xlarge   A100 80GB  8      640GB HBM2e    96    1.1TB   400 Gbps EFA

G-Series (Inference/Graphics)

Instance         GPU   Count  GPU Memory  vCPU   Memory    Network
g6.xlarge-48xl   L4    1-8    24-192GB    4-192  16-768GB  Up to 100 Gbps
g6e.xlarge-48xl  L40S  1-8    48-384GB    4-192  16-768GB  Up to 100 Gbps
g5.xlarge-48xl   A10G  1-8    24-192GB    4-192  16-768GB  Up to 100 Gbps

GPU Provisioning Method

AWS provides GPUs in passthrough mode via the Nitro System.

[AWS GPU Passthrough via Nitro]

+-------------------------------------------------------+
|                     EC2 Instance                      |
|                     (GPU Driver)                      |
+-------------------------------------------------------+
|                   Nitro Hypervisor                    |
|                    (CPU/Mem only)                     |
+-------------------------------------------------------+
|  Nitro VPC Card  |  Nitro EBS Card  |  Nitro Mgmt Card|
+------------------+------------------+-----------------+
|                    Physical Server                    |
|  CPU | RAM | GPU (Direct Passthrough) | NVMe          |
+-------------------------------------------------------+

  • GPUs are directly assigned to instances (not vGPU)
  • Bare-metal equivalent GPU performance
  • Full native GPU stack available (CUDA, cuDNN, NCCL, etc.)
  • Users can configure MIG directly within instances (A100/H100)

EC2 Networking

EFA (Elastic Fabric Adapter)

[EFA Architecture]

+----------+  +----------+  +----------+  +----------+
| Instance |  | Instance |  | Instance |  | Instance |
| GPU x8   |  | GPU x8   |  | GPU x8   |  | GPU x8   |
+----+-----+  +----+-----+  +----+-----+  +----+-----+
     |              |              |              |
+----+--------------+--------------+--------------+----+
|              EFA Network (RDMA-like)                  |
|         (OS bypass, low-latency, high-bandwidth)      |
+------------------------------------------------------+

Feature               Description
--------------------  ---------------------------------------------------
OS Bypass             Direct NIC access bypassing the kernel
Bandwidth             P5: 3,200 Gbps, P4d: 400 Gbps
NCCL Support          Direct GPU-to-GPU communication (All-reduce, etc.)
GPUDirect RDMA (GDR)  Direct network transfer from GPU memory
SRD Protocol          Scalable Reliable Datagram

[GPUDirect RDMA Path]

Standard path:     GPU -> CPU Memory -> NIC -> Network
GPUDirect RDMA:    GPU -> NIC -> Network  (CPU bypass)
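A minimal environment sketch for steering NCCL onto the EFA/GPUDirect RDMA path; the variable names match the mpirun example in the hands-on section, but exact knobs and defaults vary by NCCL and libfabric version, so treat them as assumptions to verify:

```shell
# Environment for NCCL over EFA with GPUDirect RDMA (sketch)
export FI_PROVIDER=efa             # select the EFA libfabric provider
export FI_EFA_USE_DEVICE_RDMA=1    # let the NIC read GPU memory directly (p4d/p5)
export NCCL_DEBUG=INFO             # log which transport NCCL actually chose
```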

Placement Groups

Optimize network performance for GPU clusters.

[Cluster Placement Group]

+----------------------------------------------------+
|  Same AZ, Same Rack (or Adjacent Racks)            |
|                                                    |
|  +--------+  +--------+  +--------+  +--------+    |
|  | p5.48xl|  | p5.48xl|  | p5.48xl|  | p5.48xl|    |
|  | 8xH100 |  | 8xH100 |  | 8xH100 |  | 8xH100 |    |
|  +--------+  +--------+  +--------+  +--------+    |
|                                                    |
|  --> Minimal network latency, maximum bandwidth    |
|  --> Essential for large-scale distributed training|
+----------------------------------------------------+

Placement Group Type  Description                         Use Case
--------------------  ----------------------------------  ------------------------------
Cluster               Dense placement in same AZ          Distributed GPU training, HPC
Spread                Distributed across different racks  High availability
Partition             Separate racks per partition        Large distributed systems

Elastic Graphics (Deprecated)

Elastic Graphics was a service that attached remote GPUs to EC2 instances over the network.

  • Officially deprecated in 2024
  • Only offered limited OpenGL support
  • Alternatives: G-series instances or NICE DCV protocol

Comparison with On-Premises GPU Virtualization

Aspect           AWS EC2 (Nitro)                    On-Premises (ESXi/KVM)
---------------  ---------------------------------  ----------------------------------
GPU Assignment   Passthrough (Nitro)                Passthrough, vGPU, MIG selectable
GPU Sharing      Exclusive per instance             Multi-VM sharing via vGPU
Network          EFA (up to 3,200 Gbps)             InfiniBand (up to 400 Gbps/port)
GPU Type Change  Instant via instance type change   Physical GPU replacement needed
Scalability      Hundreds of GPUs in minutes        Weeks to months of procurement
Cost Model       Pay-per-use (per second)           CAPEX + maintenance
MIG Support      User configures within instance    Managed at hypervisor level
Multi-tenancy    Nitro HW isolation per instance    Logical isolation via vGPU/MIG

EC2 Instance Selection Guide

[GPU Instance Selection Flowchart]

What is your use case?
  |
  +-- AI/ML Training --> Model size?
  |                       |
  |                       +-- Large LLM --> P5 (H100/H200)
  |                       |                 8 GPUs, EFA 3200Gbps
  |                       |
  |                       +-- Medium --> P4d (A100)
  |                                      8 GPUs, EFA 400Gbps
  |
  +-- Inference --> Throughput needs?
  |                  |
  |                  +-- High throughput --> G6e (L40S)
  |                  |                      Up to 8 GPUs
  |                  |
  |                  +-- Cost efficient --> G6 (L4)
  |                                         Up to 8 GPUs
  |
  +-- Graphics/Rendering --> G5 (A10G)
  |                          3D rendering, video processing
  |
  +-- Dev/Prototype --> G6.xlarge (1x L4)
                        Most affordable GPU option
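The flowchart above can be encoded as a small shell helper. This is a sketch: `pick_gpu_instance` is a hypothetical function, and the sizes returned are just one entry point per instance family:

```shell
# Hypothetical helper encoding the selection flowchart above.
pick_gpu_instance() {
  local use_case="$1" detail="$2"
  case "$use_case" in
    training)
      case "$detail" in
        large-llm) echo "p5.48xlarge"  ;;  # 8x H100, EFA 3,200 Gbps
        *)         echo "p4d.24xlarge" ;;  # 8x A100, EFA 400 Gbps
      esac ;;
    inference)
      case "$detail" in
        high-throughput) echo "g6e.48xlarge" ;;  # 8x L40S
        *)               echo "g6.xlarge"    ;;  # 1x L4, cost efficient
      esac ;;
    graphics)  echo "g5.xlarge" ;;  # A10G: 3D rendering, video
    *)         echo "g6.xlarge" ;;  # dev/prototype default
  esac
}

pick_gpu_instance training large-llm   # -> p5.48xlarge
```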

Hands-On: EC2 GPU Instance Setup

# Create the cluster placement group referenced below (one-time setup)
aws ec2 create-placement-group \
  --group-name my-gpu-cluster \
  --strategy cluster

# Launch GPU instance with AWS CLI
aws ec2 run-instances \
  --instance-type p4d.24xlarge \
  --image-id ami-0abcdef1234567890 \
  --key-name my-key \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678 \
  --placement "GroupName=my-gpu-cluster,Tenancy=default" \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=gpu-training}]'

# Create an EFA network interface
# (an EFA can only be attached at launch or while the instance is stopped)
aws ec2 create-network-interface \
  --subnet-id subnet-12345678 \
  --interface-type efa \
  --groups sg-12345678

# Check GPU status (inside instance)
nvidia-smi

# Enable MIG mode on GPU 0 (A100/H100 instances)
sudo nvidia-smi -i 0 --mig 1

# After the GPU resets (or after a reboot), create two MIG GPU
# instances from profile 9 along with their compute instances
sudo nvidia-smi mig -cgi 9,9 -C

# NCCL test (multi-node)
# All-reduce benchmark for GPU-to-GPU communication
mpirun -np 16 --hostfile hosts \
  -x NCCL_DEBUG=INFO \
  -x FI_PROVIDER=efa \
  -x FI_EFA_USE_DEVICE_RDMA=1 \
  all_reduce_perf -b 8 -e 1G -f 2 -g 8

Key Innovations of the Nitro System

[Core Innovation: I/O Offloading]

Before Nitro:
  Host CPU:  [==VM==][==VM==][===Hypervisor I/O===]
                                    ~30% consumed

After Nitro:
  Host CPU:  [========VM========][========VM========]
  Nitro HW:  [Net][EBS][NVMe][Mgmt][Security]
                  ~100% available to VMs

  1. I/O Hardware Offload: Network, storage, and management handled by dedicated chips
  2. Security Isolation: Hardware-level root of trust and memory isolation
  3. Bare-metal Performance: Near-zero virtualization overhead
  4. Consistent Performance: I/O processing independent of CPU eliminates "noisy neighbor" issues
  5. Rapid Innovation: Hardware components can be updated independently

Quiz: AWS EC2/Nitro Knowledge Check

Q1. What is the key advantage of the Nitro System over traditional hypervisors?

It offloads network, storage, and management functions to dedicated Nitro Cards (ASICs), providing nearly 100% of host CPU/memory to instances. Traditional hypervisors process these in software, consuming about 30% of host resources.

Q2. How does AWS EC2 assign GPUs?

Via passthrough. The Nitro System directly assigns physical GPUs to instances, delivering bare-metal equivalent performance. It does not use vGPU-style sharing.

Q3. How does EFA differ from standard networking?

EFA provides OS bypass for direct NIC access without going through the kernel. It offers RDMA-like low-latency, high-bandwidth communication, and GPUDirect RDMA enables direct network transfer from GPU memory.

Q4. What is the security model of Nitro Enclaves?

Even the parent instance cannot access Enclave memory. There is no network or storage access; communication occurs only through vsock. Cryptographic attestation verifies Enclave integrity.

Q5. Why are Cluster Placement Groups important for distributed GPU training?

They place instances densely in adjacent racks within the same AZ, minimizing network latency and maximizing bandwidth. Distributed training involves frequent GPU-to-GPU communication (All-reduce, etc.), making network performance directly impact training speed.