Kubernetes Advanced Operations Guide 2025: Autoscaling, Scheduling, Resource Management, Multi-Cluster

Author: Youngju Kim (@fjvbn20031)
1. Introduction: Why Advanced Kubernetes Operations Matter
Running Kubernetes in production reveals challenges that basic deployments cannot address. Pods may not scale fast enough during traffic spikes, workloads may cluster on specific nodes causing cascading failures, or costs may explode without proper resource management.
This guide covers four core areas of advanced Kubernetes operations:
- Autoscaling - Scale workloads and infrastructure automatically with HPA, VPA, KEDA, and Karpenter
- Scheduling - Optimize Pod placement with Affinity, Taints, Priority, and Topology Spread
- Resource Management - Ensure stability with QoS, LimitRange, ResourceQuota, and PDB
- Multi-Cluster - Manage multiple clusters with Cluster API and Fleet
2. Autoscaling Strategies
2.1 HPA (Horizontal Pod Autoscaler) Deep Dive
HPA is the most fundamental autoscaler: it adjusts the number of Pod replicas. The autoscaling/v2 API adds support for custom and external metrics on top of CPU and memory.
Basic HPA Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  # CPU-based scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory-based scaling
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
      - type: Pods
        value: 10
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
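Under the hood, the controller applies one formula per metric and takes the largest result: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal shell sketch with illustrative numbers (not taken from a live cluster):

```shell
#!/bin/sh
# HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# Example: 10 replicas averaging 85% CPU against the 70% target above.
current_replicas=10
current_utilization=85
target_utilization=70

desired=$(awk -v r="$current_replicas" -v c="$current_utilization" -v t="$target_utilization" \
  'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }')
echo "desired replicas: $desired"   # 13, then clamped to [minReplicas, maxReplicas]
```

The behavior.scaleUp/scaleDown policies then rate-limit how fast the controller may actually move toward that number.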
Custom Metrics HPA
Using Prometheus Adapter, you can scale based on application-specific custom metrics.
# Prometheus Adapter configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "^(.*)$"
        as: "requests_per_second"
      metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
# Custom metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
External Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_depth
        selector:
          matchLabels:
            queue: "order-processing"
      target:
        type: AverageValue
        averageValue: "5"
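For AverageValue targets like this one, the math is simpler: the total external metric is divided by the per-Pod target. A sketch with made-up queue numbers:

```shell
#!/bin/sh
# AverageValue targets: desiredReplicas = ceil(totalMetric / targetAverageValue)
# Example: 120 messages in the queue, target of 5 per worker Pod.
queue_depth=120
target_per_pod=5

desired=$(( (queue_depth + target_per_pod - 1) / target_per_pod ))  # integer ceiling
echo "desired replicas: $desired"   # 24, clamped to [1, 30] by this HPA
```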
2.2 VPA (Vertical Pod Autoscaler)
VPA automatically adjusts CPU/memory requests for Pods. It is especially useful in early stages when optimal resource requests are unknown.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
VPA Operating Mode Comparison:
| Mode | Behavior | When to Use |
|---|---|---|
| Off | Provides recommendations only, no application | Initial analysis phase |
| Initial | Applied only at creation | Stable workloads |
| Recreate | Applied by recreating Pods | General operations |
| Auto | In-place if possible, otherwise recreate | Latest K8s environments |
Caution: Using HPA and VPA on the same metrics (CPU/memory) simultaneously causes conflicts. The recommended pattern is to use VPA in Off mode for recommendations only while HPA handles scaling.
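The recommendation-only pattern described above can be sketched like this, reusing the api-server Deployment from the earlier examples:

```yaml
# VPA in Off mode: writes recommendations to .status but never evicts Pods,
# so it cannot fight with an HPA scaling the same Deployment
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"
```

Read the suggestions with kubectl get vpa api-server-vpa-recommender -o jsonpath='{.status.recommendation}' and fold them into the Deployment manifest manually.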
2.3 KEDA (Kubernetes Event-Driven Autoscaling)
KEDA scales workloads based on external event sources, supporting over 60 scalers.
# Install KEDA
# helm repo add kedacore https://kedacore.github.io/charts
# helm install keda kedacore/keda --namespace keda-system --create-namespace

# ScaledObject example: Kafka-based scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: kafka-consumer
  pollingInterval: 15
  cooldownPeriod: 300
  idleReplicaCount: 0
  minReplicaCount: 1
  maxReplicaCount: 50
  fallback:
    failureThreshold: 3
    replicas: 5
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.production.svc:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: "100"
      offsetResetPolicy: latest
---
# ScaledObject example: AWS SQS-based scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
spec:
  scaleTargetRef:
    name: sqs-worker
  pollingInterval: 10
  cooldownPeriod: 60
  minReplicaCount: 0
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.ap-northeast-2.amazonaws.com/123456789012/order-queue
      queueLength: "5"
      awsRegion: ap-northeast-2
    authenticationRef:
      name: aws-credentials
---
# ScaledJob example: Batch job scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: image-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: myapp/image-processor:latest
        restartPolicy: Never
  pollingInterval: 10
  maxReplicaCount: 20
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 5
  triggers:
  - type: redis-lists
    metadata:
      address: redis.production.svc:6379
      listName: image-processing-queue
      listLength: "3"
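The authenticationRef: aws-credentials used by the SQS scaler above needs a matching TriggerAuthentication object; a hedged sketch (the Secret name and key names are assumptions):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-credentials
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID
    name: aws-sqs-secret        # assumed Secret holding the credentials
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: aws-sqs-secret
    key: AWS_SECRET_ACCESS_KEY
```

On EKS, pod identity (IRSA) via spec.podIdentity is generally preferred over static keys.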
2.4 Karpenter - Next-Generation Node Autoscaler
Karpenter is a node provisioning engine that overcomes the limitations of Cluster Autoscaler by launching right-sized instances directly from pending Pod requirements instead of scaling fixed node groups.
# NodePool definition
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: general-purpose
spec:
template:
metadata:
labels:
team: platform
tier: general
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64", "arm64"]
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand", "spot"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["5"]
nodeClassRef:
group: karpenter.k8s.aws
kind: EC2NodeClass
name: default
expireAfter: 720h
limits:
cpu: "1000"
memory: 2000Gi
disruption:
consolidationPolicy: WhenEmptyOrUnderutilized
consolidateAfter: 30s
---
# EC2NodeClass definition
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: default
spec:
amiSelectorTerms:
- alias: al2023@latest
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: my-cluster
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: my-cluster
role: KarpenterNodeRole-my-cluster
blockDeviceMappings:
- deviceName: /dev/xvda
ebs:
volumeSize: 100Gi
volumeType: gp3
iops: 3000
throughput: 125
Cluster Autoscaler vs Karpenter Comparison:
| Aspect | Cluster Autoscaler | Karpenter |
|---|---|---|
| Node Selection | Node Group based | Workload requirements based |
| Provisioning Speed | Minutes | Seconds |
| Instance Variety | Fixed per group | Automatic optimal selection |
| Spot Handling | Manual configuration | Automatic price/availability optimization |
| Consolidation | Not supported | Automatic node consolidation |
| Cloud Support | All clouds | AWS (Azure preview) |
3. Advanced Scheduling
3.1 nodeSelector
The simplest node selection method.
# Nodes must carry matching labels, e.g.:
# kubectl label nodes gpu-node-1 accelerator=nvidia-tesla-v100
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  nodeSelector:
    accelerator: nvidia-tesla-v100
    topology.kubernetes.io/zone: ap-northeast-2a
  containers:
  - name: gpu-worker
    image: myapp/gpu-worker:latest
    resources:
      limits:
        nvidia.com/gpu: 1
3.2 Node Affinity and Pod Affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        # Node Affinity: place on specific nodes
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-type
                operator: In
                values:
                - compute-optimized
                - general-purpose
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - ap-northeast-2a
          - weight: 20
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - ap-northeast-2c
        # Pod Affinity: place in same zone as specific Pods
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis-cache
            topologyKey: topology.kubernetes.io/zone
        # Pod Anti-Affinity: spread Pods of the same app across nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-frontend
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web-frontend
        image: myapp/web-frontend:latest
3.3 Taints and Tolerations
# Add Taints to nodes
# kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
# kubectl taint nodes spot-node-1 spot=true:PreferNoSchedule

# GPU workload: tolerate the gpu Taint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
      - name: trainer
        image: myapp/ml-trainer:latest
        resources:
          limits:
            nvidia.com/gpu: 4
---
# Spot instance workloads
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "PreferNoSchedule"
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60
      containers:
      - name: processor
        image: myapp/batch-processor:latest
3.4 Priority and Preemption
# PriorityClass definitions
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-production
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "For critical production services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: standard-production
value: 100000
globalDefault: true
preemptionPolicy: PreemptLowerPriority
description: "For standard production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 1000
globalDefault: false
preemptionPolicy: Never
description: "For batch jobs. No preemption"
---
# Using a PriorityClass
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      priorityClassName: critical-production
      containers:
      - name: payment
        image: myapp/payment:latest
3.5 Topology Spread Constraints
A powerful feature that distributes Pods evenly across topology domains.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 12
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      topologySpreadConstraints:
      # Spread across AZs
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api-gateway
      # Spread across nodes
      - maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-gateway
        nodeAffinityPolicy: Honor
        nodeTaintsPolicy: Honor
      containers:
      - name: api-gateway
        image: myapp/api-gateway:latest
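Skew is simply the difference between the most and least populated topology domains; a quick sketch checking a hypothetical 5/4/3 zone layout against maxSkew: 1:

```shell
#!/bin/sh
# skew = (max Pods in any zone) - (min Pods in any zone); must be <= maxSkew
max_skew=1
skew=$(printf '5\n4\n3\n' | \
  awk 'NR==1 { max = min = $1 } { if ($1 > max) max = $1; if ($1 < min) min = $1 } END { print max - min }')
echo "skew: $skew"
if [ "$skew" -le "$max_skew" ]; then echo "satisfies maxSkew"; else echo "violates maxSkew"; fi
```

Here the skew is 2, so with whenUnsatisfiable: DoNotSchedule the scheduler would refuse this placement and force something like 4/4/4 for the 12 replicas.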
4. Resource Management
4.1 Understanding Requests vs Limits
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: myapp/demo:latest
    resources:
      # Used for scheduling. This amount of resources is guaranteed
      requests:
        cpu: 500m
        memory: 512Mi
        ephemeral-storage: 1Gi
      # Upper bound. CPU is throttled when exceeded, memory triggers OOMKill
      limits:
        cpu: "2"
        memory: 1Gi
        ephemeral-storage: 2Gi
4.2 QoS Classes
| QoS Class | Condition | OOM Kill Priority |
|---|---|---|
| Guaranteed | Every container has requests = limits for both CPU and memory | Lowest (killed last) |
| Burstable | At least one container sets requests or limits, but the Guaranteed condition is not met | Medium |
| BestEffort | No container sets any requests or limits | Highest (killed first) |
# Guaranteed QoS
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: myapp/critical:latest
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 1Gi
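For contrast, a Pod whose requests are lower than its limits (or that sets only some values) is classified as Burstable:

```yaml
# Burstable QoS: requests set but lower than limits
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: myapp/demo:latest
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
```

kubectl get pod burstable-pod -o jsonpath='{.status.qosClass}' shows the class the kubelet assigned.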
4.3 LimitRange and ResourceQuota
# LimitRange: per-Pod/Container resource limits within a namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-backend
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: 50m
      memory: 64Mi
  - type: Pod
    max:
      cpu: "8"
      memory: 16Gi
  - type: PersistentVolumeClaim
    max:
      storage: 100Gi
    min:
      storage: 1Gi
---
# ResourceQuota: total resource cap for the entire namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-backend-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "20"
    persistentvolumeclaims: "30"
    requests.storage: 500Gi
    count/deployments.apps: "30"
    count/configmaps: "50"
    count/secrets: "50"
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values:
      - standard-production
      - critical-production
4.4 PodDisruptionBudget (PDB)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: "60%"
  selector:
    matchLabels:
      app: api-server
  unhealthyPodEvictionPolicy: IfHealthyBudget
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: redis-cluster
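How a percentage budget turns into evictable Pods, sketched with the 60% figure above applied to a hypothetical 5-replica Deployment:

```shell
#!/bin/sh
# minAvailable: 60% -> required healthy = ceil(replicas * 60 / 100)
# allowed disruptions = healthy replicas - required healthy
replicas=5
required=$(( (replicas * 60 + 99) / 100 ))  # integer ceiling of 60%
allowed=$(( replicas - required ))
echo "required healthy: $required, allowed disruptions: $allowed"   # 3 and 2
```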
5. Multi-Cluster Operations
5.1 Cluster API
Cluster API is a project for declaratively creating and managing Kubernetes clusters.
# Cluster definition
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-cluster
  namespace: clusters
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-cluster
---
# Control Plane definition
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-control-plane
  namespace: clusters
spec:
  replicas: 3
  version: v1.30.2
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: production-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          enable-admission-plugins: "NodeRestriction,PodSecurity"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
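The Cluster and control plane above still need worker machines; a hedged MachineDeployment sketch (names, replica count, and the KubeadmConfigTemplate are assumptions, and the selector is left for Cluster API to default):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-workers
  namespace: clusters
spec:
  clusterName: production-cluster
  replicas: 3
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production-cluster
      version: v1.30.2
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-workers
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: production-workers
```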
5.2 Fleet/Rancher Multi-Cluster Management
# Fleet GitRepo: deploy across multiple clusters
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: platform-apps
  namespace: fleet-default
spec:
  repo: https://github.com/myorg/platform-apps
  branch: main
  paths:
  - monitoring/
  - logging/
  - ingress/
  targets:
  - name: production
    clusterSelector:
      matchLabels:
        env: production
  - name: staging
    clusterSelector:
      matchLabels:
        env: staging
---
# Fleet Bundle customization
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: app-deployments
  namespace: fleet-default
spec:
  repo: https://github.com/myorg/app-deployments
  branch: main
  targets:
  - name: us-east
    clusterSelector:
      matchLabels:
        region: us-east
    helm:
      values:
        replicaCount: 5
        ingress:
          host: api-us.mycompany.com
  - name: ap-northeast
    clusterSelector:
      matchLabels:
        region: ap-northeast
    helm:
      values:
        replicaCount: 3
        ingress:
          host: api-ap.mycompany.com
5.3 Multi-Cluster Architecture Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Hub-Spoke | Central management cluster controls worker clusters | Basic multi-cluster |
| Federation | KubeFed syncs resources across clusters | Same app multi-region |
| Service Mesh | Istio Multi-cluster for inter-cluster communication | Distributed microservices |
| Virtual Kubelet | Admiralty, Liqo connect virtual nodes | Burst workloads |
6. Cluster Upgrade Strategies
6.1 In-place Upgrade
#!/bin/bash
# Control Plane upgrade
echo "=== Starting Control Plane Upgrade ==="

# 1. Check current version
kubectl get nodes
kubectl version

# 2. Upgrade kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm=1.30.2-1.1
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.30.2

# 3. Upgrade kubelet and kubectl
sudo apt-get install -y kubelet=1.30.2-1.1 kubectl=1.30.2-1.1
sudo systemctl daemon-reload
sudo systemctl restart kubelet

echo "=== Sequential Worker Node Upgrade ==="
NODES=$(kubectl get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[*].metadata.name}')
for NODE in $NODES; do
  echo "--- Starting upgrade for $NODE ---"
  # Cordon: prevent new Pod scheduling
  kubectl cordon "$NODE"
  # Drain: evict existing Pods (respects PDBs)
  kubectl drain "$NODE" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --grace-period=120 \
    --timeout=300s
  # Run the kubeadm/kubelet upgrade on the node itself (via SSH or an automation tool)
  echo "Running kubeadm and kubelet upgrade on node $NODE"
  # Uncordon: resume scheduling
  kubectl uncordon "$NODE"
  # Verify the node returns to Ready
  kubectl wait --for=condition=Ready "node/$NODE" --timeout=300s
  echo "--- Upgrade complete for $NODE ---"
done
6.2 Blue-Green Cluster Upgrade
# Create a new cluster with Cluster API
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-v130
  namespace: clusters
  labels:
    upgrade-group: production
    version: v1.30
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-v130-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-v130
7. Troubleshooting
7.1 Using kubectl debug
# Add debug container to running Pod
kubectl debug -it pod/api-server-abc123 \
--image=nicolaka/netshoot \
--target=api-server \
-- /bin/bash
# Node debugging
kubectl debug node/worker-1 \
-it --image=ubuntu:22.04 \
-- /bin/bash
# Debug with Pod copy (image change)
kubectl debug pod/api-server-abc123 \
-it --copy-to=debug-pod \
--container=api-server \
--image=myapp/api-server:debug \
-- /bin/sh
7.2 Common Issues and Solutions
Pending Pod Issues:
# Check why Pod is Pending
kubectl describe pod pending-pod-name
# Common causes:
# 1. Insufficient resources -> Add nodes or adjust resources
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
CPU_ALLOC:.status.allocatable.cpu,\
MEM_ALLOC:.status.allocatable.memory,\
CPU_CAP:.status.capacity.cpu
# 2. nodeSelector/affinity mismatch -> Check labels
kubectl get nodes --show-labels
# 3. Taints blocking -> Add tolerations
kubectl describe nodes | grep -A5 Taints
CrashLoopBackOff Resolution:
# Check logs
kubectl logs pod/crashing-pod --previous
kubectl logs pod/crashing-pod -c init-container-name
# Check events
kubectl get events --sort-by=.lastTimestamp \
--field-selector involvedObject.name=crashing-pod
# Check for OOM Kill
kubectl describe pod crashing-pod | grep -A5 "Last State"
# If OOMKilled appears, increase memory limits
# Run in debug mode
kubectl debug pod/crashing-pod \
-it --copy-to=debug-pod \
--container=app \
--image=busybox \
-- /bin/sh
Network Issue Diagnosis:
# DNS check
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never \
-- nslookup kubernetes.default.svc.cluster.local
# Service connectivity test
kubectl run curl-test --image=curlimages/curl --rm -it --restart=Never \
-- curl -v http://api-server.production.svc:8080/health
# Check network policies
kubectl get networkpolicy -A
kubectl describe networkpolicy -n production
8. Cost Optimization
8.1 Leveraging Spot Nodes
# Karpenter Spot NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-workloads
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: "500"
    memory: 1000Gi
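Workloads opt into this pool by selecting the capacity-type label Karpenter stamps on its nodes (the Deployment name is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-batch
spec:
  replicas: 4
  selector:
    matchLabels:
      app: spot-batch
  template:
    metadata:
      labels:
        app: spot-batch
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
      - name: worker
        image: myapp/batch:latest
```

Pairing this with a toleration for any spot-specific taint keeps on-demand nodes free for latency-sensitive services.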
8.2 Right-sizing Automation
#!/bin/bash
# Right-sizing report: current usage (kubectl top) plus VPA recommendations
echo "=== Current Resource Usage by Namespace ==="
for NS in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  CPU_USE=$(kubectl top pods -n "$NS" --no-headers 2>/dev/null | \
    awk '{sum += $2} END {print sum}')
  MEM_USE=$(kubectl top pods -n "$NS" --no-headers 2>/dev/null | \
    awk '{sum += $3} END {print sum}')
  if [ -n "$CPU_USE" ] && [ "$CPU_USE" != "0" ]; then
    echo "Namespace: $NS | CPU: ${CPU_USE}m | Memory: ${MEM_USE}Mi"
  fi
done

echo ""
echo "=== VPA Recommendations ==="
for VPA in $(kubectl get vpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'); do
  NS=$(echo "$VPA" | cut -d'/' -f1)
  NAME=$(echo "$VPA" | cut -d'/' -f2)
  echo "--- $VPA ---"
  kubectl get vpa "$NAME" -n "$NS" -o jsonpath='{.status.recommendation.containerRecommendations[*]}'
  echo ""
done
8.3 Namespace Cost Allocation
# Using Kubecost or OpenCost
# helm install kubecost kubecost/cost-analyzer \
#   --namespace kubecost --create-namespace \
#   --set prometheus.enabled=false \
#   --set prometheus.fqdn=http://prometheus-server.monitoring:80

# Cost allocation via namespace labels
apiVersion: v1
kind: Namespace
metadata:
  name: team-backend
  labels:
    cost-center: "backend-team"
    department: "engineering"
    project: "api-platform"
    environment: "production"
9. Practice Quiz
Q1: What component is needed for HPA to scale on custom metrics?
Answer: A custom metrics API server like Prometheus Adapter (or Datadog Cluster Agent, etc.) is required.
- Prometheus Adapter exposes Prometheus metrics through the Kubernetes Custom Metrics API (custom.metrics.k8s.io)
- HPA queries this API to retrieve custom metric values and make scaling decisions
- Flow: Prometheus collects -> Adapter transforms -> HPA queries -> Scaling executes
Q2: What conditions must be met for Guaranteed QoS Class?
Answer: All containers in the Pod must have CPU and memory requests and limits set, and each pair must be equal.
- requests.cpu = limits.cpu
- requests.memory = limits.memory
- Applies to all containers (including init containers)
- Guaranteed Pods are the last to be OOM Killed under node memory pressure
Q3: What is the difference between requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution in podAntiAffinity?
Answer: Required means the condition must be satisfied for the Pod to be scheduled. If no node satisfies the condition, the Pod stays Pending. Preferred means the scheduler tries to place the Pod where conditions are met, but will place it elsewhere if necessary.
- required: Hard constraint (mandatory)
- preferred: Soft constraint, priority adjustable via weight
- IgnoredDuringExecution: Already running Pods are not evicted even if conditions change
Q4: How does Karpenter consolidation save costs?
Answer: Karpenter consolidation moves Pods from idle or underutilized nodes to other nodes, then removes empty nodes or replaces them with smaller (cheaper) instances.
- WhenEmpty: Only removes nodes with no Pods
- WhenEmptyOrUnderutilized: Also relocates Pods from underutilized nodes before removing
- Can replace nodes with cheaper instance types (e.g., swapping an underutilized c5.2xlarge for a c5.xlarge)
- consolidateAfter sets the stabilization wait time
Q5: How does PodDisruptionBudget affect cluster upgrades?
Answer: PDB limits the number of Pods that can be simultaneously disrupted during voluntary disruptions.
- During kubectl drain, PDB is respected to evict Pods sequentially
- minAvailable: Guarantees minimum available Pod count/percentage
- maxUnavailable: Limits maximum disrupted Pod count/percentage
- Overly strict PDBs can cause drain timeouts
- unhealthyPodEvictionPolicy: AlwaysAllow lets not-Ready Pods be evicted even when the budget is exhausted; the default IfHealthyBudget only allows this while the budget is satisfied
10. References
- Kubernetes Official Docs - HPA
- Kubernetes Official Docs - VPA
- KEDA Official Documentation
- Karpenter Official Documentation
- Kubernetes Scheduling
- Topology Spread Constraints
- Resource Management
- Cluster API
- Fleet Manager
- Kubecost
- Kubernetes Best Practices - Google
- EKS Best Practices Guide
- Pod Priority and Preemption