[AWS] Karpenter Complete Guide: Kubernetes Node Auto-Provisioning


1. What is Karpenter

Karpenter is an open-source Kubernetes node provisioner developed by AWS. Unlike Cluster Autoscaler, which manages nodes indirectly through Auto Scaling Groups (ASGs), Karpenter calls EC2 APIs directly to provision nodes optimized for your workloads within seconds.

Key Features

  • Direct EC2 Provisioning: Calls EC2 Fleet API directly without ASGs
  • Fast Scaling: Nodes come online in approximately 45-60 seconds (vs. 3-5 minutes with CA)
  • Intelligent Instance Selection: Automatically selects optimal instance types matching workload requirements
  • Bin-Packing Optimization: Advanced bin-packing algorithms maximize cluster utilization
  • Auto Consolidation: Automatically removes unused nodes and replaces with lower-cost alternatives
  • Drift Detection: Automatically detects and replaces out-of-date nodes when configurations change

2. Why Karpenter: Limitations of Cluster Autoscaler

Problems with Cluster Autoscaler

+------------------------------------------+
|            Cluster Autoscaler            |
|                                          |
|  Pod Pending                             |
|      |                                   |
|      v                                   |
|  CA requests scale-out from ASG          |
|      |                                   |
|      v                                   |
|  ASG launches EC2 instance using         |
|  pre-defined Launch Template             |
|      |                                   |
|      v                                   |
|  Node registration takes 3-5 minutes     |
+------------------------------------------+

Cluster Autoscaler has the following limitations:

  1. ASG Dependency: Tied to pre-defined Node Groups, reducing flexibility
  2. Slow Scaling: 3-5 minute provisioning time due to ASG intermediary
  3. Limited Instance Types: Only fixed instance types per Node Group
  4. Inefficient Bin-Packing: Scales at the Node Group level, causing resource waste
  5. Manual Management Overhead: Multiple Node Groups needed for diverse workloads

Karpenter's Approach

+------------------------------------------+
|                Karpenter                 |
|                                          |
|  Pod Pending                             |
|      |                                   |
|      v                                   |
|  Karpenter analyzes pod requirements     |
|  (CPU, Memory, GPU, Topology, etc.)      |
|      |                                   |
|      v                                   |
|  Auto-selects optimal instance type      |
|  (Cost optimization from 200+ types)     |
|      |                                   |
|      v                                   |
|  Calls EC2 Fleet API directly            |
|      |                                   |
|      v                                   |
|  Node Ready within 45-60 seconds         |
+------------------------------------------+

3. Karpenter Architecture

Overall Structure

+----------------------------------------------------------------+
|                     EKS Cluster                                |
|                                                                |
|  +------------------+     +-----------------------------+      |
|  | Karpenter        |     | Kubernetes API Server       |      |
|  | Controller       |---->| (Pod Watch, Node Mgmt)      |      |
|  | (Deployment)     |     +-----------------------------+      |
|  +--------+---------+                                          |
|           |                                                    |
|           |  References NodePool + EC2NodeClass                |
|           |                                                    |
|  +--------v---------+     +-----------------------------+      |
|  | Instance Type    |     | AWS Services                |      |
|  | Selection Engine |---->| - EC2 Fleet API             |      |
|  | (Cost/Capacity   |     | - SSM (AMI Discovery)       |      |
|  |  Optimization)   |     | - Pricing API               |      |
|  +------------------+     | - SQS (Interruption)        |      |
|                           | - EventBridge               |      |
|                           +-----------------------------+      |
+----------------------------------------------------------------+

Core Components

Karpenter uses three primary Custom Resource Definitions (CRDs):

+--------------+----------------------+---------------------------------------+
| CRD          | API Version          | Description                           |
+--------------+----------------------+---------------------------------------+
| NodePool     | karpenter.sh/v1      | Defines node provisioning constraints |
| EC2NodeClass | karpenter.k8s.aws/v1 | AWS-specific instance settings        |
| NodeClaim    | karpenter.sh/v1      | Runtime node request object           |
+--------------+----------------------+---------------------------------------+

Provisioning Flow

1. Pod detected in Pending state
       |
2. Karpenter analyzes pod resource requests,
   nodeSelector, affinity, tolerations, etc.
       |
3. Determines matching NodePool (weight-based priority)
       |
4. References AWS settings from EC2NodeClass
   (subnets, security groups, AMIs, etc.)
       |
5. Selects optimal instance type
   (based on cost, capacity, and requirements)
       |
6. Launches instance via EC2 Fleet API
       |
7. Creates and tracks NodeClaim object
       |
8. Node registration complete -> Pod scheduled

4. NodePool Configuration in Detail

NodePool is the core CRD in Karpenter v1 that replaces the legacy Provisioner.

Basic NodePool Example

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        team: platform
        environment: production
    spec:
      requirements:
        # Instance category constraint
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']

        # Instance generation constraint
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ['5']

        # Capacity type (on-demand or spot)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['on-demand', 'spot']

        # Availability zones
        - key: topology.kubernetes.io/zone
          operator: In
          values: ['us-east-1a', 'us-east-1b', 'us-east-1c']

        # Architecture
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64', 'arm64']

      # EC2NodeClass reference
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # Node expiration (auto-replace after 72 hours)
      expireAfter: 72h

  # Resource limits
  limits:
    cpu: '1000'
    memory: 1000Gi

  # Disruption policy
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: '10%'
      - nodes: '0'
        schedule: '0 9 * * MON-FRI'
        duration: 1h

  # NodePool weight (higher = higher priority)
  weight: 50

Key Requirement Keys

+---------------------------------------------+----------------------------------+
| Key                                         | Description                      |
+---------------------------------------------+----------------------------------+
| karpenter.sh/capacity-type                  | on-demand or spot                |
| karpenter.k8s.aws/instance-category         | Instance family (c, m, r, etc.)  |
| karpenter.k8s.aws/instance-generation       | Instance generation (5, 6, 7)    |
| karpenter.k8s.aws/instance-size             | Instance size (large, xlarge)    |
| karpenter.k8s.aws/instance-gpu-count        | GPU count                        |
| karpenter.k8s.aws/instance-gpu-name         | GPU name (a10g, t4, etc.)        |
| topology.kubernetes.io/zone                 | Availability zone                |
| kubernetes.io/arch                          | CPU architecture                 |
| kubernetes.io/os                            | Operating system                 |
+---------------------------------------------+----------------------------------+
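
Karpenter applies these keys as labels on the nodes it launches, so workloads can also target them directly with a nodeSelector or node affinity. A minimal sketch (the Deployment name, labels, and resource requests are illustrative) that pins pods to Spot capacity on arm64 nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: arm-spot-worker # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: arm-spot-worker
  template:
    metadata:
      labels:
        app: arm-spot-worker
    spec:
      # Both labels come from the requirement keys table above
      nodeSelector:
        karpenter.sh/capacity-type: spot
        kubernetes.io/arch: arm64
      containers:
        - name: worker
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 500m
              memory: 512Mi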

Multi-NodePool Strategy

# Production workloads (On-Demand only, high priority)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: production
spec:
  template:
    metadata:
      labels:
        workload-type: production
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['on-demand']
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['m', 'r']
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ['5']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: production
      expireAfter: 168h
  limits:
    cpu: '500'
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
  weight: 100
---
# Dev/Test workloads (Spot allowed, low priority)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: development
spec:
  template:
    metadata:
      labels:
        workload-type: development
      annotations:
        dev-team: 'true'
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot', 'on-demand']
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: development
      expireAfter: 24h
  limits:
    cpu: '200'
    memory: 200Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  weight: 10

5. EC2NodeClass Configuration in Detail

EC2NodeClass is the CRD that defines AWS-specific instance settings.

Complete EC2NodeClass Example

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # AMI configuration
  amiSelectorTerms:
    - alias: al2023@latest

  # IAM role
  role: KarpenterNodeRole-my-cluster

  # Subnet selection
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
        network-type: private

  # Security group selection
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

  # Block device mappings
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  # Metadata options
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required

  # Tags
  tags:
    Environment: production
    ManagedBy: karpenter
    Team: platform

  # User data (bootstrap script; note that with AL2023, shell scripts may
  # need to be supplied as a MIME multi-part archive so they merge cleanly
  # with the nodeadm NodeConfig that Karpenter generates)
  userData: |
    #!/bin/bash
    echo "Karpenter managed node"
    # Additional bootstrap logic

AMI Selection Options

# Option 1: Alias (recommended)
amiSelectorTerms:
  - alias: al2023@latest       # Amazon Linux 2023 latest
  # - alias: al2@latest        # Amazon Linux 2
  # - alias: bottlerocket@latest # Bottlerocket

# Option 2: Tag-based selection
amiSelectorTerms:
  - tags:
      environment: production
      ami-type: custom-al2023

# Option 3: Direct AMI ID
amiSelectorTerms:
  - id: ami-0123456789abcdef0

Supported AMI Families

+-----------------+------------------------------------------+
| AMI Family      | Description                              |
+-----------------+------------------------------------------+
| AL2023          | Amazon Linux 2023 (recommended)          |
| AL2             | Amazon Linux 2                           |
| Bottlerocket    | AWS Bottlerocket (container-optimized)   |
| Windows2019     | Windows Server 2019                      |
| Windows2022     | Windows Server 2022                      |
| Windows2025     | Windows Server 2025                      |
+-----------------+------------------------------------------+

6. Consolidation: The Core of Cost Optimization

Karpenter's Consolidation automatically optimizes cluster costs by reducing unnecessary resources.

How Consolidation Works

Consolidation Types:
+------------------------------------------------------------------+
|                                                                  |
|  1. Delete Consolidation                                         |
|     - When all pods on a node can run on other existing nodes    |
|     - Safely removes the node                                    |
|                                                                  |
|  2. Replace Consolidation                                        |
|     - When the current node can be replaced with a smaller,      |
|       cheaper instance                                           |
|     - Provisions new node -> Migrates pods -> Removes old node   |
|                                                                  |
+------------------------------------------------------------------+

Consolidation Policy Settings

# Policy 1: Consolidate only empty nodes
disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 30s

# Policy 2: Consolidate empty + underutilized nodes (recommended)
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m

Rate Limiting with Disruption Budgets

disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 1m
  budgets:
    # Allow disruption of up to 10% of total nodes simultaneously
    - nodes: '10%'

    # Block disruptions during business hours
    - nodes: '0'
      schedule: '0 9 * * MON-FRI'
      duration: 8h

    # Apply budget only for specific reasons (v1.0+)
    - nodes: '5%'
      reasons:
        - 'Underutilized'
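
Budgets throttle disruptions cluster-wide; individual pods can additionally opt out of voluntary disruption with the karpenter.sh/do-not-disrupt annotation, which keeps Karpenter from consolidating or drift-replacing the node they run on. A minimal sketch (pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nightly-batch # illustrative name
  annotations:
    # Blocks voluntary disruption (consolidation, drift) of the hosting node
    karpenter.sh/do-not-disrupt: 'true'
spec:
  containers:
    - name: main
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7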

Spot-to-Spot Consolidation

Spot-to-Spot consolidation requires at least 15 instance types to be configured.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-optimized
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot']
        # Diverse instance types to enable Spot-to-Spot consolidation
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ['4']
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ['large', 'xlarge', '2xlarge', '4xlarge']

7. Drift Detection

Drift detection identifies when existing nodes no longer match the current NodePool or EC2NodeClass configuration and automatically replaces them.

Scenarios That Trigger Drift

+---------------------------------------------------+
| Drift Detection Scenarios                         |
+---------------------------------------------------+
| - AMI has been updated                            |
| - NodePool requirements have changed              |
| - EC2NodeClass security groups have changed       |
| - EC2NodeClass subnets have changed               |
| - Block device settings have changed              |
| - Metadata options have changed                   |
| - Tags have changed                               |
+---------------------------------------------------+

Drift Replacement Process

1. Karpenter compares NodeClaim with current NodePool/EC2NodeClass
       |
2. If differences found, marks NodeClaim as Drifted
       |
3. Checks Disruption Budget
       |
4. Provisions new node (with latest settings)
       |
5. Safely drains pods from old node
       |
6. Terminates old node
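
Since an AMI update is the most common drift trigger, teams that want predictable rollouts often pin the AMI instead of tracking @latest; nodes then go through the replacement flow above only when the pin is changed deliberately. A sketch, assuming a pinned alias version (the version string and names are examples):

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: pinned-ami # illustrative name
spec:
  role: KarpenterNodeRole-my-cluster
  amiSelectorTerms:
    # Pinned alias version instead of @latest: drift replacement is
    # triggered only when this value is updated
    - alias: al2023@v20240807 # example version string
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster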

8. Interruption Handling

Karpenter automatically handles various EC2 interruption events.

Supported Interruption Types

+-----------------------------+------------------------------------------+
| Interruption Type           | Description                              |
+-----------------------------+------------------------------------------+
| Spot Interruption           | 2-minute warning for Spot reclamation    |
| Rebalance Recommendation    | Pre-alert when disruption risk increases |
| Scheduled Maintenance       | AWS scheduled maintenance events         |
| Instance State Change       | State changes (stopping, stopped)        |
+-----------------------------+------------------------------------------+

SQS-Based Interruption Handling Architecture

+-------------------+     +-------------------+     +------------------+
| EC2 Spot          |     | Amazon            |     | Amazon           |
| Interruption      |---->| EventBridge       |---->| SQS Queue        |
| Notice            |     | Rules             |     |                  |
+-------------------+     +-------------------+     +--------+---------+
                                                             |
+-------------------+     +-------------------+              |
| EC2 Rebalance     |---->| EventBridge       |----+         |
| Recommendation    |     |                   |    |         |
+-------------------+     +-------------------+    |         |
                                                   v         v
                                              +----+---------+----+
                                              | Karpenter         |
                                              | Controller        |
                                              |                   |
                                              | 1. Receive event  |
                                              | 2. Cordon node    |
                                              | 3. Drain pods     |
                                              | 4. Launch new node|
                                              | 5. Terminate old  |
                                              +-------------------+
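
This plumbing is normally provisioned by the CloudFormation template from Karpenter's getting-started guide; a trimmed sketch of just the Spot-interruption path (the queue name is an assumption and must match settings.interruptionQueue) looks roughly like:

# Trimmed CloudFormation sketch: the real template also adds rules for
# rebalance recommendations, scheduled changes, and state-change events.
Resources:
  KarpenterInterruptionQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: my-cluster-karpenter # must match settings.interruptionQueue
      MessageRetentionPeriod: 300
      SqsManagedSseEnabled: true
  SpotInterruptionRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source: ['aws.ec2']
        detail-type: ['EC2 Spot Instance Interruption Warning']
      Targets:
        - Id: KarpenterInterruptionQueueTarget
          Arn: !GetAtt KarpenterInterruptionQueue.Arn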

Interruption Handling Configuration

Specify the SQS queue name during Helm installation:

helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=my-cluster" \
  --set "settings.interruptionQueue=my-cluster-karpenter" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi

9. Spot Instance Best Practices

Diversified Instance Types

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-diverse
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot']
        # Diverse instance families for better Spot availability
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']
        # Multiple generations
        - key: karpenter.k8s.aws/instance-generation
          operator: In
          values: ['5', '6', '7']
        # Various sizes
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ['large', 'xlarge', '2xlarge', '4xlarge']
        # Multiple availability zones
        - key: topology.kubernetes.io/zone
          operator: In
          values: ['us-east-1a', 'us-east-1b', 'us-east-1c']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

Spot + On-Demand Mixed Strategy

# Spot-first NodePool (high weight)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['spot']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 100
  limits:
    cpu: '500'
---
# On-Demand fallback NodePool (low weight)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['on-demand']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  weight: 1
  limits:
    cpu: '200'

Spot Usage Best Practices

  • Diversify instance types: Allow at least 15 instance types to maximize Spot availability
  • Use multiple availability zones: Avoid single-AZ dependency for Spot capacity
  • Set PDB (Pod Disruption Budget): Ensure minimum availability for critical workloads
  • Handle graceful shutdown: Set appropriate terminationGracePeriodSeconds
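
The last two bullets can be sketched together: a PodDisruptionBudget that keeps a floor of replicas up while Karpenter drains nodes, plus a termination grace period that fits inside the 2-minute Spot interruption window (all names and the image are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb # illustrative name
spec:
  minAvailable: 2 # never drain below 2 ready replicas
  selector:
    matchLabels:
      app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web # illustrative name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Must complete shutdown within the 2-minute Spot interruption window
      terminationGracePeriodSeconds: 60
      containers:
        - name: web
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7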

10. Installing Karpenter with Helm

Prerequisites

# Set environment variables
export KARPENTER_NAMESPACE="karpenter"
export KARPENTER_VERSION="1.0.0"
export CLUSTER_NAME="my-eks-cluster"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"

IAM Role Creation

# Create Karpenter controller role
aws iam create-role \
  --role-name "KarpenterControllerRole-my-cluster" \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity"
    }]
  }'

# Create Karpenter node role
aws iam create-role \
  --role-name "KarpenterNodeRole-my-cluster" \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }]
  }'

Helm Installation

# Install via OCI registry
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=my-cluster" \
  --set "settings.interruptionQueue=my-cluster-karpenter" \
  --set "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn=arn:aws:iam::123456789012:role/KarpenterControllerRole-my-cluster" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

Verify Installation

# Check Karpenter pod status
kubectl get pods -n karpenter

# Verify CRDs
kubectl get crd | grep karpenter

# Check logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

11. Complete Deployment Example

Full Configuration (NodePool + EC2NodeClass)

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: production
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
        network-type: private
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  tags:
    Environment: production
    ManagedBy: karpenter
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: production
spec:
  template:
    metadata:
      labels:
        environment: production
        managed-by: karpenter
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ['c', 'm', 'r']
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ['5']
        - key: karpenter.sh/capacity-type
          operator: In
          values: ['on-demand']
        - key: topology.kubernetes.io/zone
          operator: In
          values: ['us-east-1a', 'us-east-1b', 'us-east-1c']
        - key: kubernetes.io/arch
          operator: In
          values: ['amd64']
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: production
      expireAfter: 168h
  limits:
    cpu: '1000'
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 2m
    budgets:
      - nodes: '10%'
      - nodes: '0'
        schedule: '0 2 * * *'
        duration: 1h
  weight: 100

Test Workload Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  namespace: default
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: '1'
              memory: 1Gi
Apply the manifest, then scale the deployment to watch Karpenter react:

# Scale up to trigger Karpenter provisioning
kubectl scale deployment inflate --replicas=10

# Watch node provisioning
kubectl get nodes -w

# Check NodeClaim status
kubectl get nodeclaims

# Scale down to observe consolidation
kubectl scale deployment inflate --replicas=0

12. Karpenter vs Cluster Autoscaler Comparison

+-----------------------------+----------------------------+----------------------------+
| Feature                     | Karpenter                  | Cluster Autoscaler         |
+-----------------------------+----------------------------+----------------------------+
| Provisioning Method         | Direct EC2 API calls       | Indirect via ASG           |
| Provisioning Speed          | 45-60 seconds              | 3-5 minutes                |
| Instance Selection          | Auto-optimized (200+ types)| Fixed per Node Group       |
| Bin Packing                 | Pod-level optimization     | Node Group level           |
| Consolidation               | Built-in (automatic)       | Limited (scale-down only)  |
| Drift Detection             | Automatic                  | Not supported              |
| Spot Instances              | Native, auto-diversified   | ASG Mixed Instances        |
| Spot Interruption Handling  | SQS-based automatic        | Separate tools needed      |
| Multi-Architecture          | Auto AMD64 + ARM64         | Separate Node Groups       |
| Configuration Complexity    | NodePool + EC2NodeClass    | ASG + Launch Template      |
| Multi-Cloud                 | AWS only (community ext.)  | Official multi-cloud       |
| Cost Savings                | 25-40% (bin-pack + Spot)   | 10-20%                     |
| Official K8s Project        | Yes (kubernetes-sigs)      | Yes (SIG Autoscaling)      |
+-----------------------------+----------------------------+----------------------------+

13. Common Issues and Troubleshooting

Pods Stuck in Pending State

# Check NodePool requirements
kubectl describe nodepool default

# Check Karpenter logs for provisioning failure reasons
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter | grep -i "error\|failed"

# Check pod events
kubectl describe pod my-pod-name

Common Causes and Solutions

  1. Missing subnet or security group tags: Verify EC2NodeClass selector tags match actual AWS resources
  2. Insufficient IAM permissions: Check Karpenter controller role has required EC2, IAM, SSM permissions
  3. Resource limits exceeded: Verify NodePool limits are not exceeded by current usage
  4. Instance type unavailability: Certain instance types may have insufficient capacity in specific AZs

14. Summary

Karpenter is transforming the paradigm of Kubernetes node provisioning. Moving away from static ASG-based node management, it enables dynamic, workload-centric infrastructure provisioning. It is a particularly strong fit for:

  • Workloads requiring diverse instance types
  • Environments actively leveraging Spot instances
  • Event-driven workloads needing rapid scaling
  • Clusters where cost optimization is a primary goal
  • Environments including GPU/ML workloads

Important Considerations

  • AWS EKS exclusive; consider Cluster Autoscaler alongside for multi-cloud environments
  • Recommended to use v1.0+ stable version
  • Always enable Spot interruption handling by configuring the SQS queue
  • Set appropriate NodePool limits to prevent cost overruns