Cluster API Deep Dive — Managing Clusters as Kubernetes Resources

Introduction — The Single-Cluster Era Is Over
Why Existing Approaches Hit Their Limits
The Core Philosophy of Cluster API — Kubernetes Managing Kubernetes
Architecture Dissection — Management Cluster vs Workload Cluster
- Core CRD Relationship Map
The Provider Model — The Substance of the Abstraction
- Infrastructure Providers
- Bootstrap / Control Plane Providers
Hands-On — Running a Cluster Factory on Your Laptop with CAPD
Operations Deep Dive — Upgrades, Scaling, Auto-Remediation
Combining with GitOps — Governing a Cluster Fleet from Git
- Naming and Label Strategy at Fleet Scale
Pivot, Backup/DR, Version Skew — Operating the Management Cluster Itself
Limits and Reality — What to Know Before Adopting
Troubleshooting — The Order to Look When a Machine Is Stuck
Adoption Checklist
Closing Thoughts
References

Introduction — The Single-Cluster Era Is Over

In the early days of Kubernetes adoption, an organization ran one cluster, maybe two or three. The situation today is completely different. Clusters are now split by environment (dev/stage/prod), by region, and by regulatory domain (even more so in network-segregated financial environments); tenants get dedicated clusters, and edge clusters are deployed into stores, factories, and vehicles. Dozens is the baseline; telcos and retail-edge operators run hundreds to thousands of clusters.

The moment you cross ten clusters, the following questions become daily reality.

How many days does it take to create one new cluster? Is the procedure documented, or does it live only in one engineer's head?
What Kubernetes version is cluster number 37 running? Can you see the version distribution of the whole fleet on one screen?
When a control plane node dies, who recovers it, when, and how?
How many weeks does it take to move the entire fleet from v1.31 to v1.32?

If you cannot answer these questions with confidence, cluster lifecycle management is a bottleneck for your organization. Cluster API (CAPI for short), the subject of this article, is the Kubernetes community's official answer to this problem. The core idea is simple yet radical: declare the cluster itself as a Kubernetes resource, and let controllers make that declaration real.

Why Existing Approaches Hit Their Limits

Console Clicking (ClickOps)

Creating a cluster by clicking buttons in a cloud console is the fastest way to make one. But it is not reproducible. To build an identical cluster three months later, you depend on screenshots and memory, and audit trails are hard to produce. Configuration drift between clusters is inevitable, and nobody can explain why "this one cluster is different."

Terraform / OpenTofu

As infrastructure-as-code, Terraform is an excellent tool, but it has structural weaknesses for cluster lifecycle.

It only acts at execution time. Once terraform apply finishes, state observation ends too. If someone deletes a node group in the console, nobody notices until the next apply. There is no continuous reconcile.
The state file is a single point of failure. State locking, backend management, and recovering from corrupted state are operational burdens in themselves.
Node-level lifecycle is hard to express. A declaration like "replace this node if it is NotReady for 10 minutes" does not fit Terraform's language.
At the scale of hundreds of clusters, workspace/module splitting, plan duration, and drift detection all become painful.

kubeadm Scripts

kubeadm is the de facto standard low-level tool for cluster bootstrapping, but it is strictly a tool for "putting one node into a cluster." VM provisioning, LB setup, certificate renewal, node replacement, and upgrade sequencing are all left to your shell scripts. Over time, those scripts become secret rituals nobody dares to touch.

The common defect of all three approaches is the same: there is no agent continuously converging declaration and reality. The exact problem Kubernetes solved for pods remained unsolved for clusters themselves.

The Core Philosophy of Cluster API — Kubernetes Managing Kubernetes

Cluster API is an official subproject of Kubernetes SIG Cluster Lifecycle. Its philosophy can be summarized in three sentences.

Declarative API. Clusters, machines, and control planes are all modeled through CRDs (CustomResourceDefinitions) and declared as custom resources. Declare "3 workers, version v1.32.2" and you are done.
Controller reconcile loops. Just as the Deployment controller maintains pod counts, CAPI controllers maintain machine counts and versions. When reality deviates from the declaration (node failure, manual deletion), they automatically converge it back.
Provider abstraction. Whether AWS, vSphere, or bare metal, infrastructure differences are hidden behind provider plugins. The user experience remains the same kubectl and YAML.
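
The reconcile idea in the second point can be sketched in a few lines of shell. This is only an illustration of the compare-and-converge step, not CAPI code: reconcile_replicas is a hypothetical helper, and real controllers watch API objects rather than taking two integers.

```shell
# Hypothetical illustration of one reconcile step: compare the declared
# replica count with the observed one and print the action a controller
# would take to converge them.
reconcile_replicas() {
  desired=$1
  observed=$2
  if [ "$observed" -lt "$desired" ]; then
    echo "create $((desired - observed))"
  elif [ "$observed" -gt "$desired" ]; then
    echo "delete $((observed - desired))"
  else
    echo "noop"
  fi
}

reconcile_replicas 3 2   # a machine was deleted by hand -> create 1
reconcile_replicas 3 3   # declaration and reality agree -> noop
```

The point of the sketch is that the function is run again and again, not once: whatever caused the gap, the next pass closes it.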

The analogy goes like this: pods have Deployments; clusters have Cluster API. Just as a ReplicaSet stamps out pods, a MachineSet stamps out machines (node VMs); just as a Deployment rolls pods, a MachineDeployment rolls nodes. It lifts the mental model Kubernetes operators already know up to the cluster level.

Another crucial principle is immutable infrastructure. CAPI does not "fix" machines. When configuration or version changes, it creates new machines and discards old ones. SSHing into a live node to upgrade packages is an act outside the model.

Architecture Dissection — Management Cluster vs Workload Cluster

There are two kinds of clusters in the CAPI world.

Management cluster: the cluster where CAPI controllers and CRDs are installed. The declarations (YAML) of other clusters are stored and reconciled here.
Workload cluster: the target cluster produced by the management cluster. Real applications run here. The workload cluster itself is unaware that CAPI exists.

+----------------------------------------------------------------------+
|                      Management Cluster                              |
|                                                                      |
|  +----------------+  +----------------+  +------------------------+  |
|  | CAPI Core      |  | Bootstrap      |  | Control Plane          |  |
|  | Controller     |  | Provider       |  | Provider               |  |
|  | (Cluster,      |  | (CABPK:        |  | (KCP: KubeadmControl-  |  |
|  |  Machine, MS,  |  |  KubeadmConfig)|  |  Plane controller)     |  |
|  |  MD, MHC)      |  +----------------+  +------------------------+  |
|  +----------------+                                                  |
|  +-------------------------------+                                   |
|  | Infrastructure Provider       |   Declarations stored in etcd:    |
|  | (CAPA/CAPZ/CAPV/CAPD/Metal3)  |   Cluster, MachineDeployment,     |
|  +-------------------------------+   KubeadmControlPlane ...         |
+----------------------------------------------------------------------+
        |                        |                          |
        | provision / reconcile  |                          |
        v                        v                          v
+----------------+      +----------------+        +----------------+
| Workload       |      | Workload       |        | Workload       |
| Cluster A      |      | Cluster B      |        | Cluster C      |
| (prod-seoul)   |      | (prod-tokyo)   |        | (edge-store-7) |
+----------------+      +----------------+        +----------------+

Core CRD Relationship Map

CAPI CRDs are cleanly divided by role. Let us look at the relationships as a diagram first.

                         +-----------+
                         |  Cluster  |  umbrella resource for the whole cluster
                         +-----+-----+
                               |
            +------------------+-------------------+
            | controlPlaneRef                      | infrastructureRef
            v                                      v
 +----------------------+              +---------------------------+
 | KubeadmControlPlane  |              | (Infra)Cluster            |
 | (replicas, version)  |              | e.g. DockerCluster,       |
 +----------+-----------+              |      AWSCluster: VPC/LB   |
            |                          +---------------------------+
            | creates/manages
            v
       +---------+     1:1      +------------------------+
       | Machine |------------- | (Infra)Machine         |
       +---------+              | e.g. DockerMachine,    |
            ^                   |      AWSMachine: a VM  |
            |                   +------------------------+
            |  creates                   ^ 1:1
   +--------+-------+                    |
   |   MachineSet   | <-- template refs: (Infra)MachineTemplate
   +--------+-------+                    KubeadmConfigTemplate
            ^
            | creates / rolling replace
   +--------+----------------+         +---------------------+
   |   MachineDeployment     |         | MachineHealthCheck  |
   |  (replicas, version,    |         |  detect unhealthy   |
   |   rolling strategy)     |         |  machines → replace |
   +-------------------------+         +---------------------+

The responsibilities of each resource are summarized below.

Resource	Analogy (pod world)	Responsibility
Cluster	Umbrella, somewhat like a Namespace	Top-level resource binding cluster network CIDRs and control plane / infra references
Machine	Pod	Declaration for one node. Immutable — replaced when spec changes
MachineSet	ReplicaSet	Maintains replica count of identical machines
MachineDeployment	Deployment	Orchestrates rolling updates of MachineSets
KubeadmControlPlane	Closest to a StatefulSet	Manages control plane machines, etcd membership, certificates, versions
KubeadmConfig	cloud-init generator	Generates kubeadm init/join configuration executed at machine boot
MachineHealthCheck	livenessProbe plus auto-replace	Detects unhealthy nodes and triggers remediation

KubeadmControlPlane (KCP) is especially important. Because of etcd quorum, control plane nodes cannot be swapped as casually as workers. During an upgrade, the KCP controller automates the sequence of adding one new control plane machine, joining it as an etcd member, safely removing the old machine from etcd, and then discarding it. A task that makes you sweat when done by hand finishes with a one-line declaration.
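
The quorum constraint behind that sequence is simple arithmetic: an etcd cluster of n members needs floor(n/2) + 1 of them to agree. A minimal sketch (the function names quorum and fault_tolerance are illustrative, not part of any tool) shows why adding a member before removing one keeps the number of tolerable failures constant:

```shell
# quorum MEMBERS: votes needed for a majority in an etcd cluster of that size
quorum() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance MEMBERS: members that can fail while quorum still holds
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

# Member counts seen while replacing one control plane machine: 3 -> 4 -> 3
for members in 3 4 3; do
  echo "members=$members quorum=$(quorum "$members") tolerated_failures=$(fault_tolerance "$members")"
done
```

Every step tolerates one failure. Removing the old member first would pass through 2 members, where fault_tolerance 2 is 0, which is why the order matters.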

The Provider Model — The Substance of the Abstraction

The CAPI core knows nothing about infrastructure. Actual VM creation belongs to infrastructure providers, node initialization script generation to bootstrap providers, and control plane orchestration to control plane providers.

Infrastructure Providers

Provider	Target infrastructure	Maturity / notes
CAPA	AWS (EC2, EKS)	Mature. Supports EKS managed control planes
CAPZ	Azure (VM, AKS)	Mature. Supports AKS managed topologies
CAPG	GCP (GCE, GKE)	Stable, but narrower feature breadth than CAPA/CAPZ
CAPV	vSphere	The de facto standard choice for on-prem virtualization
CAPO	OpenStack	Active in telco / private cloud
CAPD	Docker containers	Dev/test/CI only. Never production
Metal3	Bare metal (Ironic-based)	Controls physical server power/images via BMC
BYOH	Reuse existing hosts	Enrolls hosts that already have an OS as nodes

Bootstrap / Control Plane Providers

kubeadm is the default (CABPK + KCP), but the ecosystem is wider.

Provider	Distribution	Characteristics
kubeadm (default)	Vanilla Kubernetes	Most mature, the reference implementation
k3s	k3s	Lightweight edge environments. Maintained by the k3s-io community
RKE2	RKE2	Rancher family, security-hardened distribution
Talos	Talos Linux	Immutable OS with no SSH, managed only via API. Sidero Labs
Managed CP	EKS/AKS/GKE	Managed control plane resources built into infra providers

The relationship with managed Kubernetes is covered separately below, but the headline is this: with resources like CAPA's AWSManagedControlPlane, even an EKS control plane can become the target of a CAPI declaration.

Hands-On — Running a Cluster Factory on Your Laptop with CAPD

CAPD (Cluster API Provider Docker) treats Docker containers as "machines," so you can experience the full flow on a laptop without a cloud account. You need kind, Docker, clusterctl, and kubectl.

Step 1 — Prepare the Management Cluster and Run clusterctl init

# Create a kind cluster that lets CAPD use the Docker socket
cat > kind-mgmt.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: capi-mgmt
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /var/run/docker.sock
        containerPath: /var/run/docker.sock
EOF
kind create cluster --config kind-mgmt.yaml

# Enable the ClusterClass feature, then install CAPI + CAPD
export CLUSTER_TOPOLOGY=true
clusterctl init --infrastructure docker

clusterctl init installs cert-manager, the CAPI core, the kubeadm bootstrap/control-plane providers, and CAPD into the management cluster. Verify the installation:

kubectl get pods -A | grep -E "capi|capd|cert-manager"
# capd-system / capi-kubeadm-bootstrap-system /
# capi-kubeadm-control-plane-system / capi-system Running means healthy

Step 2 — The Full Workload Cluster Declaration YAML

You could generate a template with clusterctl generate cluster, but for learning purposes we will look at the core resources directly. Below is the full manifest for a cluster with 1 control plane node and 2 workers.

# dev-cluster-01.yaml — Cluster: the top-level umbrella
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-cluster-01
  namespace: default
  labels:
    env: dev
    region: local
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: dev-cluster-01-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster
    name: dev-cluster-01
---
# Infra cluster: in CAPD this represents the LB container etc.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerCluster
metadata:
  name: dev-cluster-01
  namespace: default
---
# Infra template for control plane machines
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
  name: dev-cluster-01-cp
  namespace: default
spec:
  template:
    spec:
      extraMounts:
        - containerPath: /var/run/docker.sock
          hostPath: /var/run/docker.sock
---
# KubeadmControlPlane: the control plane declaration
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: dev-cluster-01-cp
  namespace: default
spec:
  replicas: 1
  version: v1.31.4
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerMachineTemplate
      name: dev-cluster-01-cp
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        certSANs: [localhost, 127.0.0.1, 0.0.0.0, host.docker.internal]
    initConfiguration:
      nodeRegistration:
        criSocket: unix:///var/run/containerd/containerd.sock
    joinConfiguration:
      nodeRegistration:
        criSocket: unix:///var/run/containerd/containerd.sock
---
# Infra template for worker machines
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
  name: dev-cluster-01-md-0
  namespace: default
spec:
  template:
    spec: {}
---
# Bootstrap (join) config template for worker machines
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: dev-cluster-01-md-0
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          criSocket: unix:///var/run/containerd/containerd.sock
---
# MachineDeployment: the worker pool declaration
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: dev-cluster-01-md-0
  namespace: default
spec:
  clusterName: dev-cluster-01
  replicas: 2
  selector:
    matchLabels: null
  template:
    spec:
      clusterName: dev-cluster-01
      version: v1.31.4
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: dev-cluster-01-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: dev-cluster-01-md-0

Note that the examples in this article use the widely deployed v1beta1 API. The latest CAPI release lines are transitioning to v1beta2, so check the API version of the release you run before applying.

kubectl apply -f dev-cluster-01.yaml

Step 3 — Watching Convergence

# View overall state as a tree (the single most useful command)
clusterctl describe cluster dev-cluster-01

# Track individual resources
kubectl get cluster,machinedeployment,machineset,machine
kubectl get kubeadmcontrolplane
docker ps   # you will see the "machine" containers CAPD created

Watching a Machine transition through Pending → Provisioning → Provisioned → Running feels exactly like the pod lifecycle.

Step 4 — Obtain the kubeconfig and Install a CNI

Nodes in the new cluster are NotReady because there is no CNI yet. This is normal.

clusterctl get kubeconfig dev-cluster-01 > dev-01.kubeconfig

kubectl --kubeconfig dev-01.kubeconfig get nodes
# STATUS NotReady — expected before CNI installation

# Install Calico (Cilium, Flannel, anything works)
kubectl --kubeconfig dev-01.kubeconfig apply -f \
  https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/calico.yaml

kubectl --kubeconfig dev-01.kubeconfig get nodes
# Shortly afterwards, all nodes Ready

This is the moment of realization: you just created a cluster with a single kubectl apply. Want a hundred of them? Apply a hundred YAML sets. And those YAMLs can live in Git.

Operations Deep Dive — Upgrades, Scaling, Auto-Remediation

How Rolling Upgrades Work — Replace Nodes, Never Patch Them

CAPI upgrades are not OS-patch style but machine-replacement style. Change the version field and the controller creates new-version machines, joins them, drains the old ones, and discards them.

# Control plane upgrade: change only the KCP version
kubectl patch kubeadmcontrolplane dev-cluster-01-cp --type merge \
  -p '{"spec":{"version":"v1.32.2"}}'

# Worker upgrade: change the MachineDeployment version
kubectl patch machinedeployment dev-cluster-01-md-0 --type merge \
  -p '{"spec":{"template":{"spec":{"version":"v1.32.2"}}}}'

The internal sequence looks like this.

KCP upgrade (with replicas=3)
 1. Create 1 new v1.32 control plane machine (4 total)
 2. Join with kubeadm join --control-plane, etcd members = 4
 3. Pick 1 old v1.31 machine → etcd member remove → drain → delete (3 total)
 4. Repeat 1-3 until no old machines remain
 5. etcd membership alternates 3 → 4 → 3 in each round, so quorum is preserved throughout

MachineDeployment upgrade (RollingUpdate strategy)
 1. Create a new-version MachineSet
 2. Add new machines per maxSurge/maxUnavailable, drain and delete old ones
 3. Repeat until the old MachineSet reaches replicas = 0
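
To make the surge-then-remove rhythm of the MachineDeployment case concrete, here is a small simulation. It assumes maxSurge=1 and maxUnavailable=0 and ignores drain time and health checks; rolling_steps is a hypothetical helper that only counts machines.

```shell
# rolling_steps REPLICAS
# Simulates a rolling replacement with maxSurge=1, maxUnavailable=0:
# add one new machine, then remove one old machine, until no old ones remain.
rolling_steps() {
  old=$1
  new=0
  while [ "$old" -gt 0 ]; do
    new=$((new + 1))
    echo "surge: old=$old new=$new"
    old=$((old - 1))
    echo "remove: old=$old new=$new"
  done
}

rolling_steps 2
```

With two replicas the total never drops below 2 and never exceeds 3, which is also why a rollout stalls when the infrastructure has no spare capacity for the surge machine.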

The upgrade ordering rule follows the Kubernetes version skew policy exactly: control plane first, workers later. Since the kubelet may lag kube-apiserver by up to three minor versions, you have room to roll worker upgrades out in stages.
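
A staged worker rollout is easier to trust with a guard that checks the skew before each wave. The sketch below is an assumption-laden helper, not an official tool: minor_of and skew_ok are hypothetical names, versions are assumed to share the same major number, and the limit of three minor versions is the one quoted above.

```shell
# minor_of VERSION: extract the minor number from a version such as v1.32.2
minor_of() { echo "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'; }

# skew_ok APISERVER_VERSION KUBELET_VERSION
# Succeeds when the kubelet is not newer than kube-apiserver and lags it
# by at most three minor versions.
skew_ok() {
  api=$(minor_of "$1")
  kubelet=$(minor_of "$2")
  diff=$((api - kubelet))
  [ "$diff" -ge 0 ] && [ "$diff" -le 3 ]
}

skew_ok v1.32.2 v1.31.4 && echo "workers may stay on v1.31.4 for now"
skew_ok v1.32.2 v1.28.0 || echo "workers on v1.28.0 must be upgraded first"
```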

Scaling

# Workers 2 → 5
kubectl scale machinedeployment dev-cluster-01-md-0 --replicas=5

# Control plane 1 → 3 (switch to HA)
kubectl scale kubeadmcontrolplane dev-cluster-01-cp --replicas=3

If you need autoscaling, the cluster-autoscaler supports a CAPI provider mode that scales MachineDeployments instead of cloud node groups.

MachineHealthCheck — Automatic Node Failure Recovery

A MachineHealthCheck (MHC) watches node conditions and, when an unhealthy condition persists past its timeout, has the machine deleted so that its owner (a MachineSet or KCP) creates a replacement. This delete-and-replace cycle is what CAPI calls remediation.

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: dev-cluster-01-worker-mhc
  namespace: default
spec:
  clusterName: dev-cluster-01
  # Safety valve so too many machines are not replaced at once
  maxUnhealthy: 40%
  # Unhealthy if the machine fails to join as a node within this window
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: dev-cluster-01-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s

maxUnhealthy is a critical safety mechanism. For example, when a CNI outage makes every node NotReady, it prevents the catastrophe of MHC replacing all machines. Once the unhealthy ratio exceeds the threshold, remediation stops and waits for a human.
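
The safety valve boils down to one comparison. The sketch below illustrates it under a stated assumption: the percentage is converted to whole machines by rounding down, and remediation is allowed while the unhealthy count does not exceed that limit. remediation_allowed is a hypothetical helper, not the controller's code.

```shell
# remediation_allowed TOTAL UNHEALTHY MAX_PERCENT
# Succeeds while the unhealthy count stays within the allowed share.
# Assumption: the percentage is rounded down to whole machines.
remediation_allowed() {
  total=$1
  unhealthy=$2
  percent=$3
  limit=$((total * percent / 100))
  [ "$unhealthy" -le "$limit" ]
}

# 5 workers with maxUnhealthy 40% -> at most 2 machines may be unhealthy
remediation_allowed 5 2 40 && echo "2 unhealthy: remediate"
remediation_allowed 5 5 40 || echo "5 unhealthy: paused, waiting for a human"
```

The second case is the CNI outage scenario: every node looks broken, so the most likely culprit is something shared, and replacing machines would only add churn.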

ClusterClass — Turning Clusters into Templates

Managing seven YAML documents per cluster becomes torture at even ten clusters. ClusterClass defines a "class (template)" of cluster, and individual clusters become thin declarations that only fill in variables.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-dev
  namespace: default
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: standard-dev-cp
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: standard-dev-cp-machine
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      name: standard-dev-infra
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: standard-dev-worker-bootstrap
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
              kind: DockerMachineTemplate
              name: standard-dev-worker-machine
  variables:
    - name: workerReplicas
      required: true
      schema:
        openAPIV3Schema:
          type: integer
          default: 2

Now a single cluster shrinks to this. The topology field is the key.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: dev-cluster-02
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  topology:
    class: standard-dev
    version: v1.31.4
    controlPlane:
      replicas: 1
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 3
    variables:
      - name: workerReplicas
        value: 3

The real power of managed topologies is propagation of class changes. Modify the machine template in a ClusterClass, and every cluster using that class follows along (per policy) via rolling replacement. Standardizing hundreds of clusters runs on this mechanism.

Machine Image Management — image-builder

In production you use golden images with kubeadm and kubelet pre-baked. The Kubernetes SIG-maintained image-builder project builds AWS AMIs, Azure images, vSphere OVAs, and Raw/QCOW2 (bare metal) images on top of Packer + Ansible.

git clone https://github.com/kubernetes-sigs/image-builder.git
cd image-builder/images/capi
# Example: build an Ubuntu 22.04 + K8s v1.31 OVA for vSphere
make build-node-ova-vsphere-ubuntu-2204

Image versioning is node versioning. Pin "K8s version + OS patch level" into the image tag, and every node in a cluster boots from the same bit-for-bit identical image. This is the foundation of immutable infrastructure.

Combining with GitOps — Governing a Cluster Fleet from Git

CAPI resources are, in the end, ordinary Kubernetes YAML, so syncing them into the management cluster with Argo CD or Flux completes GitOps for clusters.

+-----------+     push      +-----------+     sync      +---------------------+
| Platform  | ------------> |    Git    | ------------> | Management Cluster  |
| Team      |   PR review   |  fleet-   |  Argo CD /    |  CAPI controllers   |
|           |               |  repo     |  Flux         |  reconcile          |
+-----------+               +-----------+               +----------+----------+
                                                                   |
                                            create/upgrade/repair  v
                                              +---------------------------+
                                              | Workload Clusters (fleet) |
                                              +---------------------------+

An example repository layout:

fleet-repo/
├── clusterclasses/
│   ├── standard-prod.yaml
│   └── standard-edge.yaml
├── clusters/
│   ├── prod/
│   │   ├── prod-seoul-01.yaml      # thin topology-based declarations
│   │   └── prod-tokyo-01.yaml
│   ├── stage/
│   │   └── stage-seoul-01.yaml
│   └── edge/
│       ├── store-0001.yaml
│       └── store-0002.yaml
└── addons/
    ├── cni/                        # default addons for new clusters
    └── monitoring/

The operational effects of this pattern are powerful.

Cluster creation and changes must pass PR review. The audit trail is the Git history itself.
Version upgrades become standardized as "a PR that changes the version field in YAML."
To auto-deploy addons like CNI and monitoring into new clusters, use the Argo CD ApplicationSet cluster generator or CAPI ClusterResourceSets.

Naming and Label Strategy at Fleet Scale

Across tens to hundreds of clusters, consistent naming and labels become the handles for automation.

Naming convention example: <purpose>-<region>-<serial>
  prod-seoul-01, stage-tokyo-01, edge-store-0042

Recommended labels (set on the Cluster resource):
  env: prod | stage | dev
  region: seoul | tokyo | ...
  tier: core | edge
  team: payments | search | ...
  upgrade-wave: "1" | "2" | "3"   # controls upgrade waves

The upgrade-wave label is especially useful. Apply the new version to wave 1 (internal clusters) first, let it bake for a few days, then spread to waves 2 and 3 — an operation you can automate with label selectors.
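
A convention only helps if something enforces it, for example a CI check on the fleet repo. The following is a minimal sketch of such a check. It assumes the <purpose>-<region>-<serial> pattern is lowercase words followed by a serial of at least two digits; that regex is inferred from the three example names and is not a rule the convention itself states. valid_cluster_name is a hypothetical helper.

```shell
# valid_cluster_name NAME
# Succeeds when NAME looks like <purpose>-<region>-<serial>,
# e.g. prod-seoul-01 or edge-store-0042 (pattern inferred from the examples).
valid_cluster_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+-[a-z]+-[0-9]{2,}$'
}

for name in prod-seoul-01 edge-store-0042 Prod_Seoul_1; do
  if valid_cluster_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: does not match <purpose>-<region>-<serial>"
  fi
done
```

For the wave rollout itself, an ordinary label selector such as kubectl get clusters -A -l upgrade-wave=1 lists the clusters that belong to a wave.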

Pivot, Backup/DR, Version Skew — Operating the Management Cluster Itself

Chicken and Egg — Who Creates the Management Cluster

The common bootstrap pattern looks like this.

1. Create a temporary local kind cluster and run clusterctl init
2. From the temporary cluster, create the "real" management cluster (cloud/on-prem) as a workload cluster
3. Move all CAPI resources to the new cluster with clusterctl move (pivot)
4. The management cluster then manages itself (self-hosted)

# Install providers on the new cluster, then move the resources
clusterctl init --kubeconfig mgmt-real.kubeconfig --infrastructure aws
clusterctl move --to-kubeconfig mgmt-real.kubeconfig

clusterctl move transfers each Cluster together with its whole ownership chain of objects and its secrets (kubeconfig, CA, etcd certificates). Reconciliation is paused during the move, so workload clusters are unaffected.

If the Management Cluster Dies, Do Workloads Die Too

No. This is an important virtue of CAPI design. Workload clusters operate fully independently without the management cluster. What you lose is lifecycle operations (creation/upgrade/auto-remediation). You still need a DR plan.

If the source of truth for CAPI resources is Git (GitOps), the primary recovery path is building a new management cluster and re-syncing. However, cluster secrets (CA and so on) are not in Git, so backups are mandatory.
Periodically back up the CAPI namespaces of the management cluster (resources plus secrets) with tools like Velero.
Rehearse the recovery. "We have backups" and "we can restore" are different propositions.

Version Skew and Upgrade Order

The management machinery has versions of its own, and they must be upgraded in a deliberate order.

Upgrade order (top to bottom)
 1. Kubernetes version of the management cluster
    (check the range supported by your CAPI release)
 2. The clusterctl binary
 3. CAPI core + providers:  clusterctl upgrade plan / apply
 4. Kubernetes versions of workload clusters (rolling by wave)
    - Inside each cluster: control plane → workers
    - No skipping minor versions (1.30 → 1.32 not allowed; go via 1.31)
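
The no-skipping rule means every upgrade is a chain of single-minor hops. A small sketch that expands a start and target minor version into that chain (upgrade_path is a hypothetical helper and assumes Kubernetes 1.x versions):

```shell
# upgrade_path FROM_MINOR TO_MINOR
# Prints the minor versions to pass through, one hop at a time.
upgrade_path() {
  from=$1
  to=$2
  path=""
  next=$((from + 1))
  while [ "$next" -le "$to" ]; do
    path="$path 1.$next"
    next=$((next + 1))
  done
  echo "${path# }"
}

upgrade_path 30 32   # prints: 1.31 1.32
```

Each printed version corresponds to one full control-plane-then-workers pass, so a two-minor jump costs two rollouts per cluster.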

Also check CAPI contract versions (the compatibility contract with infrastructure providers). Upgrading only the core while neglecting providers leads to incidents where reconciliation stops. clusterctl upgrade plan computes the compatibility matrix, so always look at the plan first.

Limits and Reality — What to Know Before Adopting

Learning Curve and Required Operational Maturity

CAPI has many abstraction layers. When something breaks, you must be able to ride the debugging chain down from Cluster → KCP → Machine → InfraMachine → cloud-init logs. For teams unfamiliar with Kubernetes controller patterns (owner references, conditions, finalizers), it is a steep hill.

Uneven Provider Maturity

CAPA/CAPZ/CAPV have many large-scale production references, but some providers have thin feature breadth and documentation. Before adopting, verify the provider's release cadence, issue response speed, and ClusterClass support.

The Relationship with Managed Kubernetes (EKS/AKS/GKE)

"We use EKS anyway — do we need CAPI?" is a fair question. The answer depends on the shape of your organization.

If you run only three EKS clusters, Terraform or eksctl is probably sufficient.
But if your fleet mixes EKS, on-prem vSphere, and edge bare metal, CAPI is close to the only option that covers everything with a single API. Managed control planes can also be declared as CAPI resources, via CAPA's AWSManagedControlPlane or CAPZ's AKS support.

Comparison with Crossplane / Terraform

Aspect	Cluster API	Crossplane	Terraform
Primary purpose	Dedicated to K8s cluster lifecycle	General cloud resource composition	General IaC
Operating model	Always-on controller reconcile	Always-on controller reconcile	Apply at execution time
Node/machine abstraction	First-class (Machine etc.)	None (managed K8s focused)	Indirect via modules
Control plane orchestration	KCP automates down to etcd	Delegated to managed CPs	Build it yourself
Auto-remediation	MachineHealthCheck built in	Drift correction level	None
Non-cluster resources (DBs etc.)	Out of scope	Strength	Strength
State store	etcd (K8s native)	etcd	State file

These three are often a division of labor rather than rivals. It is common to layer them: "VPC/IAM with Terraform, clusters with CAPI, the databases clusters use with Crossplane."

Which Organizations Does It Fit — A Decision Guide

Q1. You manage fewer than 5 clusters with no growth plans
  → CAPI is overkill. Managed K8s + IaC is enough.

Q2. You have 10+ clusters, or growth toward tenants/edge is coming
  → Strong CAPI candidate. Especially if cluster creation must be self-service.

Q3. Your estate mixes on-prem (vSphere/bare metal) or multi-cloud
  → CAPI's strongest use case. The single declarative model shines most here.

Q4. Does the team have K8s controller/CRD operating experience?
  → If not, build muscle first at small scale (CAPD labs, dev clusters).

Q5. Does a platform team exist?
  → CAPI is a platform team tool. Without a dedicated owner it rots unattended.

Troubleshooting — The Order to Look When a Machine Is Stuck

A machine stuck in Provisioning is the most common symptom in CAPI operations. Internalize the diagnostic order and you will find the cause within 20 minutes most of the time.

# 1. Big picture: see where it stopped, as a tree
clusterctl describe cluster dev-cluster-01 --show-conditions all

# 2. Machine conditions and events
kubectl describe machine dev-cluster-01-md-0-xxxxx
kubectl get events --field-selector involvedObject.kind=Machine

# 3. Infra machine (provider side) state — did the VM actually start
kubectl describe dockermachine dev-cluster-01-md-0-xxxxx
# On AWS: kubectl describe awsmachine ...

# 4. Was the bootstrap secret created (cloud-init data)
kubectl get secret | grep dev-cluster-01-md-0

# 5. Controller logs — layer by layer
kubectl logs -n capi-system deploy/capi-controller-manager
kubectl logs -n capi-kubeadm-bootstrap-system \
  deploy/capi-kubeadm-bootstrap-controller-manager
kubectl logs -n capi-kubeadm-control-plane-system \
  deploy/capi-kubeadm-control-plane-controller-manager
kubectl logs -n capd-system deploy/capd-controller-manager

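# 6. Bootstrap logs inside the machine (access differs per infra)
# VM-based providers: SSH in, then cat /var/log/cloud-init-output.log
# CAPD: no cloud-init runs inside the container (CAPD executes the bootstrap
#   commands itself), so check the capd-controller-manager logs from step 5,
#   or docker exec into the container and run journalctl -u kubelet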
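
The order above assumes you already know which machine is stuck. With many machines, a first pass that lists everything not yet in the Running phase can narrow the search. The following is a minimal sketch, not a clusterctl feature: the function name stuck_machines is hypothetical, and it assumes the three-column input (namespace, name, .status.phase) produced by the commented kubectl command.

```shell
#!/bin/sh
# Hypothetical helper: read "namespace name phase" rows on stdin and
# print every Machine whose phase is anything other than Running.
stuck_machines() {
  awk 'NF >= 3 && $3 != "Running" { print $3 "\t" $1 "/" $2 }'
}

# Against a live management cluster (needs kubectl and a kubeconfig):
#   kubectl get machines -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#     | stuck_machines

# Offline demonstration with sample rows (machine names are made up):
printf '%s\n' \
  'default dev-cluster-01-md-0-aaaaa Running' \
  'default dev-cluster-01-md-0-bbbbb Provisioning' \
  | stuck_machines
```

Each line of output names a machine worth feeding into steps 2 through 6. Machines in the Deleting phase also show up, which is usually what you want when chasing a deletion that never finishes.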

A list of frequently encountered causes:

Symptom	Common cause
InfraMachine never appears	Provider not installed, credential secret error, quota exceeded
VM started but never joins	Image lacks kubeadm/kubelet, network blocked toward the CP endpoint
Stuck on the first CP machine	LB/endpoint not created, certSANs mismatch, etcd failed to start
Nodes stay NotReady	CNI not yet installed (expected until you install one), CNI misconfiguration
Stuck mid-upgrade	Drain blocked by PDBs, insufficient capacity for maxSurge
Deletion never finishes	Waiting on finalizers — check provider logs for infra deletion failure

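Drain deadlocks caused by PodDisruptionBudgets deserve special mention because they recur at almost every upgrade. A PDB whose minAvailable equals the replica count allows zero evictions, so the drain never completes. Setting nodeDrainTimeout lets CAPI stop waiting and proceed with deleting the machine once the timeout expires; left unset, the drain is retried indefinitely.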
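
PDBs likely to stall a drain can be spotted before an upgrade starts, because a PDB that currently permits no evictions reports status.disruptionsAllowed as 0. A minimal sketch follows. The function name blocking_pdbs is hypothetical, the MachineDeployment name and the 10m value in the commented patch are illustrative, and the patch assumes the v1beta1 API, where nodeDrainTimeout sits under spec.template.spec of a MachineDeployment that is not managed through a ClusterClass topology.

```shell
#!/bin/sh
# Hypothetical helper: read "namespace name disruptionsAllowed" rows on stdin
# and print the PDBs that currently allow zero voluntary evictions.
blocking_pdbs() {
  awk 'NF >= 3 && $3 == 0 { print $1 "/" $2 }'
}

# Against a workload cluster (needs kubectl and that cluster's kubeconfig):
#   kubectl get pdb -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
#     | blocking_pdbs

# Setting a drain timeout on the management cluster (illustrative name and value):
#   kubectl patch machinedeployment dev-cluster-01-md-0 --type merge \
#     -p '{"spec":{"template":{"spec":{"nodeDrainTimeout":"10m"}}}}'

# Offline demonstration with sample rows (names are made up):
printf '%s\n' 'shop web 0' 'shop api 1' | blocking_pdbs
```

A PDB listed here is not necessarily misconfigured; it may simply have unhealthy pods at the moment. It is a candidate to raise with the owning team before the upgrade wave reaches that cluster.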

Adoption Checklist

[ ] Does your cluster count / growth plan justify adopting CAPI
[ ] Did you vet the maturity of your infra provider (CAPA/CAPV/Metal3 etc.)
[ ] Has the whole team experienced the full flow via a CAPD-based local lab
[ ] Did you build a golden image pipeline (image-builder)
[ ] Did you templatize the standard cluster shape with ClusterClass
[ ] Did you configure the MachineHealthCheck maxUnhealthy safety valve
[ ] Did you define the GitOps repo structure and PR review policy
[ ] Did you document naming/label/upgrade-wave strategies
[ ] Did you set up management cluster backups (Velero etc.) and complete a restore drill
[ ] Do you have a regular upgrade calendar driven by clusterctl upgrade plan
[ ] Did you write a runbook for stuck-machine troubleshooting
[ ] Did you agree on PDB/nodeDrainTimeout policies with workload teams

Closing Thoughts

The essence of Cluster API is not a "cluster creation tool" but a replacement of the operating model for clusters. In the world of console clicks and scripts, a cluster was a handcrafted artifact. In the CAPI world, a cluster is an ordinary resource — defined by declaration and guarded by controllers, like pods stamped out by a Deployment. Dead nodes get replaced, version bumps converge via rolling replacement, and every change lands in Git history.

Of course it is not free. There is the learning curve of the abstraction layers, the judgment needed about the provider ecosystem, and a new operational concern: the management cluster itself. For organizations with fewer than five clusters, it may be overkill. But if you are headed for dozens of clusters soon, and most platform organizations are, it is a technology that pays compound interest the earlier you learn it. Start today with a 30-minute lab on your laptop using kind and CAPD.

References

Cluster API official documentation (The Cluster API Book): https://cluster-api.sigs.k8s.io/
Cluster API GitHub repository: https://github.com/kubernetes-sigs/cluster-api
Quick Start (CAPD lab): https://cluster-api.sigs.k8s.io/user/quick-start
ClusterClass concept documentation: https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/
clusterctl reference: https://cluster-api.sigs.k8s.io/clusterctl/overview
Cluster API Provider AWS (CAPA): https://cluster-api-aws.sigs.k8s.io/
Cluster API Provider Azure (CAPZ): https://capz.sigs.k8s.io/
Cluster API Provider GCP (CAPG): https://github.com/kubernetes-sigs/cluster-api-provider-gcp
Cluster API Provider vSphere (CAPV): https://github.com/kubernetes-sigs/cluster-api-provider-vsphere
Cluster API Provider OpenStack (CAPO): https://github.com/kubernetes-sigs/cluster-api-provider-openstack
Metal3 (bare metal provider): https://metal3.io/
Kubernetes image-builder: https://image-builder.sigs.k8s.io/
kubeadm official documentation: https://kubernetes.io/docs/reference/setup-tools/kubeadm/
Kubernetes version skew policy: https://kubernetes.io/releases/version-skew-policy/
kind official documentation: https://kind.sigs.k8s.io/
Argo CD official documentation: https://argo-cd.readthedocs.io/
Flux official documentation: https://fluxcd.io/
Crossplane official site: https://www.crossplane.io/