
Advanced Operators — Finalizers, Admission Webhooks, and Status/Conditions Design

Introduction

If you have written a basic Reconcile loop, it is time to move your Operator from a "demo" to "production." Three topics sit right on that boundary: Finalizers, Admission Webhooks, and Status/Conditions design.

The reason these three matter is simple: the Reconcile loop focuses on "converging to the desired state," but real operations constantly raise other questions.

  • When a user deletes a CR (Custom Resource), who cleans up the external cloud resources we created (an S3 bucket, a DNS record, an external DB)? (Finalizers)
  • How do we stop an invalid spec from entering the cluster in the first place, or fill in default values? (Admission Webhooks)
  • How do we show resource status at a glance with kubectl get, and let other controllers or humans judge "is this resource ready" in a standard way? (Status/Conditions)

This post explains each topic with hands-on code, using the 2026 baseline where Kubebuilder supports Kubernetes 1.36 / Go 1.26, controller-runtime is v0.24.x, and controller-tools is v0.21.x. It also reflects the change where kube-rbac-proxy, formerly used to protect the metrics endpoint, has been removed and replaced by controller-runtime's WithAuthenticationAndAuthorization filter.

Let me first lay out the whole picture in one diagram.

                     +-------------------------------+
   kubectl apply --> | Admission (mutating)          |  defaulting
                     |   - fill in default values    |
                     +---------------+---------------+
                                     |
                                     v
                     +-------------------------------+
                     | Admission (validating)        |  validation
                     |   - validate spec, reject     |
                     +---------------+---------------+
                                     |
                                     v
                     +-------------------------------+
                     | etcd store (spec)             |
                     +---------------+---------------+
                                     |
                                     v
                     +-------------------------------+
                     | Reconcile loop                |  desired-state converge
                     |   - create/update external    |
                     |   - add finalizer             |
                     |   - update status/conditions  |
                     +-------------------------------+

Cleaning Up External Resources with Finalizers

What Is a Finalizer

A finalizer is a string in the object's metadata.finalizers list. As long as this list is not empty, the API server does not delete the object when the user requests deletion. Instead, it sets metadata.deletionTimestamp to mark the object as "scheduled for deletion" and leaves it in place. Only when the last finalizer has been removed does the API server complete the deletion and remove the object from etcd.

Using this mechanism, we can hold an object in place until our controller is certain that "external cleanup is done."

user: kubectl delete myresource foo
        |
        v
+-------------------------------------------+
| API server                                |
|   finalizers list empty?                  |
+--------------------+----------------------+
        | empty                       | not empty
        v                             v
+----------------+    +-------------------------------+
| deleted now    |    | set deletionTimestamp         |
+----------------+    | (object remains, pending del) |
                      +---------------+---------------+
                                      |
                                      v
                      +-------------------------------+
                      | Reconcile called again        |
                      |   deletionTimestamp is set    |
                      |   -> clean external resources |
                      |   -> remove finalizer         |
                      +---------------+---------------+
                                      |
                                      v
                      +-------------------------------+
                      | finalizers empty -> deleted   |
                      +-------------------------------+

The key point is that "delete" does not mean "disappear immediately." It means "a deletionTimestamp is stamped and Reconcile is called one more time." That one extra call is the last chance to clean up external resources.
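To make the deletion sequence concrete, here is a toy, in-memory model in plain Go. It is not the API server and not a Kubernetes client: object, toyAPIServer, requestDelete, and update are hypothetical names, and the model keeps only the rule described above (no finalizers means immediate removal; otherwise stamp deletionTimestamp and wait until the list is empty).

```go
package main

import "time"

// object is a hypothetical in-memory stand-in for an object's metadata; it
// models only the fields that matter for finalizers.
type object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// toyAPIServer is NOT the real API server; it only mirrors the documented
// deletion semantics for objects that carry finalizers.
type toyAPIServer struct {
	objects map[string]*object
}

func newToyAPIServer() *toyAPIServer {
	return &toyAPIServer{objects: map[string]*object{}}
}

func (s *toyAPIServer) create(o *object) { s.objects[o.Name] = o }

// requestDelete models `kubectl delete`: with no finalizers the object is
// removed at once; otherwise only deletionTimestamp is stamped.
func (s *toyAPIServer) requestDelete(name string) {
	o, ok := s.objects[name]
	if !ok {
		return
	}
	if len(o.Finalizers) == 0 {
		delete(s.objects, name)
		return
	}
	if o.DeletionTimestamp == nil {
		now := time.Now()
		o.DeletionTimestamp = &now
	}
}

// update models the controller writing the object back. Once an object that
// is being deleted has an empty finalizer list, the deletion completes.
func (s *toyAPIServer) update(o *object) {
	if o.DeletionTimestamp != nil && len(o.Finalizers) == 0 {
		delete(s.objects, o.Name)
		return
	}
	s.objects[o.Name] = o
}
```

In a real cluster, the "update" step corresponds to the controller's r.Update call after controllerutil.RemoveFinalizer.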

Implementing the Finalizer Flow in Reconcile

controller-runtime provides helpers to add, remove, and check finalizers. They live in the sigs.k8s.io/controller-runtime/pkg/controller/controllerutil package as AddFinalizer, RemoveFinalizer, and ContainsFinalizer.

package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	cloudv1 "example.com/operator/api/v1"
)

const bucketFinalizer = "cloud.example.com/bucket-cleanup"

func (r *BucketReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := ctrl.LoggerFrom(ctx)

	var bucket cloudv1.Bucket
	if err := r.Get(ctx, req.NamespacedName, &bucket); err != nil {
		// NotFound means already deleted, so ignore.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 1) Check whether deletion was requested.
	if !bucket.ObjectMeta.DeletionTimestamp.IsZero() {
		// Deletion in progress. If our finalizer is still present, clean up.
		if controllerutil.ContainsFinalizer(&bucket, bucketFinalizer) {
			if err := r.cleanupExternalBucket(ctx, &bucket); err != nil {
				// On cleanup failure, keep the finalizer and retry.
				log.Error(err, "failed to clean up external bucket")
				return ctrl.Result{}, err
			}

			// Cleanup succeeded. Remove the finalizer; with the list now empty, the API server deletes the object.
			controllerutil.RemoveFinalizer(&bucket, bucketFinalizer)
			if err := r.Update(ctx, &bucket); err != nil {
				return ctrl.Result{}, err
			}
		}
		// Nothing left to do if there is no finalizer.
		return ctrl.Result{}, nil
	}

	// 2) Normal create/update path: add the finalizer first if missing.
	if !controllerutil.ContainsFinalizer(&bucket, bucketFinalizer) {
		controllerutil.AddFinalizer(&bucket, bucketFinalizer)
		if err := r.Update(ctx, &bucket); err != nil {
			return ctrl.Result{}, err
		}
		// The Update created a new resource version, so return immediately
		// and continue the real work in the next Reconcile.
		return ctrl.Result{}, nil
	}

	// 3) The actual external-resource convergence logic.
	if err := r.reconcileExternalBucket(ctx, &bucket); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

The core patterns in the code above are as follows.

  1. Check whether deletion is in progress first with DeletionTimestamp.IsZero().
  2. If deleting, clean up external resources and remove the finalizer only after success. If cleanup fails, return an error to retry and leave the finalizer in place.
  3. On the normal path, add the finalizer before the real convergence logic. The finalizer must be attached before you create external resources, so that you do not lose the cleanup opportunity if a delete request arrives midway.
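The three rules can be condensed into a pure decision function, which is easy to unit-test without a cluster. This is a sketch with hypothetical names (nextAction, addFinalizer, removeFinalizer) operating on a plain []string; the slice helpers only approximate controllerutil's behavior. In recent controller-runtime releases, AddFinalizer and RemoveFinalizer likewise report whether the list changed, which lets a caller skip an unnecessary API write.

```go
package main

import "slices"

// action names what Reconcile should do next (hypothetical labels).
type action string

const (
	actionAddFinalizer action = "AddFinalizer"               // normal path, finalizer missing
	actionReconcile    action = "Reconcile"                  // normal path, finalizer present
	actionCleanup      action = "CleanupThenRemoveFinalizer" // deleting, finalizer present
	actionNothing      action = "Nothing"                    // deleting, our finalizer already gone
)

// nextAction encodes rules 1-3: check deletion first, clean up only while our
// finalizer is present, and attach the finalizer before any external work.
func nextAction(deleting bool, finalizers []string, finalizer string) action {
	has := slices.Contains(finalizers, finalizer)
	switch {
	case deleting && has:
		return actionCleanup
	case deleting:
		return actionNothing
	case !has:
		return actionAddFinalizer
	default:
		return actionReconcile
	}
}

// addFinalizer appends f if missing and reports whether the list changed.
func addFinalizer(fs []string, f string) ([]string, bool) {
	if slices.Contains(fs, f) {
		return fs, false
	}
	return append(fs, f), true
}

// removeFinalizer drops every occurrence of f and reports whether the list changed.
func removeFinalizer(fs []string, f string) ([]string, bool) {
	out := slices.DeleteFunc(slices.Clone(fs), func(s string) bool { return s == f })
	return out, len(out) != len(fs)
}
```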

The External Cleanup Function and Idempotency

cleanupExternalBucket must be idempotent. Reconcile may be called many times for the same object, so deleting a bucket that is already gone must not return an error.

func (r *BucketReconciler) cleanupExternalBucket(ctx context.Context, bucket *cloudv1.Bucket) error {
	name := bucket.Status.ExternalName
	if name == "" {
		// We never created an external resource -> nothing to clean up.
		return nil
	}

	err := r.CloudAPI.DeleteBucket(ctx, name)
	if isNotFound(err) {
		// Already gone -> treat as success (idempotency).
		return nil
	}
	return err
}

If Status.ExternalName is empty or the external API returns "already gone," treat it as success. This keeps the cleanup operation safely repeatable.

Admission Webhooks: Defaulting and Validation

Types of Webhooks

An Admission Webhook is a hook that intercepts an object before it is stored in etcd. There are two kinds.

| Kind | Purpose | Can modify the object | Typical use |
|---|---|---|---|
| Mutating | Transform the object | Yes | Fill defaults, inject labels |
| Validating | Validate the object | No | Reject specs, enforce policy |

The order is always Mutating first, then Validating (the CRD's OpenAPI schema validation runs in between). Because defaults are filled before validation, validation logic can assume that any defaults set by the mutating webhook are already present.
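The ordering guarantee can be modeled as a two-phase pipeline. This is a toy sketch with hypothetical types (bucketSpec, mutator, validator), not the API server's admission chain; the real chain also runs schema validation between the two phases and may call a mutating webhook more than once.

```go
package main

import "errors"

type bucketSpec struct {
	Region string
}

type mutator func(*bucketSpec)
type validator func(*bucketSpec) error

// admit runs every mutator first, then lets every validator inspect the
// already-mutated object; the first validation error rejects the request.
func admit(spec *bucketSpec, mutators []mutator, validators []validator) error {
	for _, m := range mutators {
		m(spec)
	}
	for _, v := range validators {
		if err := v(spec); err != nil {
			return err
		}
	}
	return nil
}

// defaultRegion is a hypothetical mutator that fills an empty region.
func defaultRegion(spec *bucketSpec) {
	if spec.Region == "" {
		spec.Region = "ap-northeast-2"
	}
}

// requireRegion is a hypothetical validator that rejects an empty region.
func requireRegion(spec *bucketSpec) error {
	if spec.Region == "" {
		return errors.New("spec.region must not be empty")
	}
	return nil
}
```

With the mutator in place, an object submitted without a region passes validation; without it, the same object is rejected.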

Creating a Webhook with Kubebuilder

Kubebuilder scaffolds webhooks with a single command. The command below enables both defaulting (mutating) and validation (validating).

kubebuilder create webhook \
  --group cloud \
  --version v1 \
  --kind Bucket \
  --defaulting \
  --programmatic-validation

This command generates a file such as internal/webhook/v1/bucket_webhook.go, the markers for registration, and the manifest patches under config/webhook/.

Implementing Defaulting (Mutating)

The latest controller-runtime webhook API uses the admission.CustomDefaulter interface. Inside the Default method, you fill in default values on the object.

package webhookv1

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	logf "sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

	cloudv1 "example.com/operator/api/v1"
)

// +kubebuilder:webhook:path=/mutate-cloud-example-com-v1-bucket,mutating=true,failurePolicy=fail,sideEffects=None,groups=cloud.example.com,resources=buckets,verbs=create;update,versions=v1,name=mbucket-v1.kb.io,admissionReviewVersions=v1

type BucketCustomDefaulter struct {
	DefaultRegion string
}

var _ admission.CustomDefaulter = &BucketCustomDefaulter{}

func (d *BucketCustomDefaulter) Default(ctx context.Context, obj runtime.Object) error {
	bucket, ok := obj.(*cloudv1.Bucket)
	if !ok {
		return fmt.Errorf("expected a Bucket object but got %T", obj)
	}
	log := logf.FromContext(ctx)

	// Fill the default region if empty.
	if bucket.Spec.Region == "" {
		bucket.Spec.Region = d.DefaultRegion
		log.Info("defaulting region", "region", d.DefaultRegion)
	}

	// Turn versioning on safely if not specified.
	if bucket.Spec.Versioning == nil {
		enabled := true
		bucket.Spec.Versioning = &enabled
	}

	return nil
}
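Two properties of this defaulter are worth checking in isolation. It should be idempotent, because Kubernetes may invoke a mutating webhook more than once for the same request (for example with reinvocationPolicy: IfNeeded). And Versioning is a *bool so that an explicit false from the user is distinguishable from "not set." The sketch below mirrors the Default logic on a hypothetical plain bucketSpec struct so both properties can be tested without controller-runtime.

```go
package main

// bucketSpec is a hypothetical plain stand-in for cloudv1.BucketSpec.
type bucketSpec struct {
	Region     string
	Versioning *bool // nil means "not set by the user"
}

// applyDefaults mirrors BucketCustomDefaulter.Default: it only fills fields
// that are unset, so applying it again never changes the result.
func applyDefaults(spec *bucketSpec, defaultRegion string) {
	if spec.Region == "" {
		spec.Region = defaultRegion
	}
	if spec.Versioning == nil {
		enabled := true
		spec.Versioning = &enabled
	}
}
```

Had Versioning been a plain bool, the defaulter could not tell "user wrote false" from "user wrote nothing," and would have to either override the user or never default.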

Implementing Validation (Validating)

Validation implements the admission.CustomValidator interface. It has methods for create, update, and delete, and returns an error to reject.

// +kubebuilder:webhook:path=/validate-cloud-example-com-v1-bucket,mutating=false,failurePolicy=fail,sideEffects=None,groups=cloud.example.com,resources=buckets,verbs=create;update,versions=v1,name=vbucket-v1.kb.io,admissionReviewVersions=v1

type BucketCustomValidator struct{}

var _ admission.CustomValidator = &BucketCustomValidator{}

func (v *BucketCustomValidator) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	bucket, ok := obj.(*cloudv1.Bucket)
	if !ok {
		return nil, fmt.Errorf("expected a Bucket object but got %T", obj)
	}
	return v.validate(bucket)
}

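func (v *BucketCustomValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) (admission.Warnings, error) {
	// Use checked type assertions so an unexpected type returns an error instead of panicking.
	oldBucket, ok := oldObj.(*cloudv1.Bucket)
	if !ok {
		return nil, fmt.Errorf("expected a Bucket object for oldObj but got %T", oldObj)
	}
	newBucket, ok := newObj.(*cloudv1.Bucket)
	if !ok {
		return nil, fmt.Errorf("expected a Bucket object for newObj but got %T", newObj)
	}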

	// Enforce region as an immutable field.
	if oldBucket.Spec.Region != newBucket.Spec.Region {
		return nil, fmt.Errorf("spec.region is immutable")
	}
	return v.validate(newBucket)
}

func (v *BucketCustomValidator) ValidateDelete(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	return nil, nil
}

func (v *BucketCustomValidator) validate(bucket *cloudv1.Bucket) (admission.Warnings, error) {
	var warnings admission.Warnings

	if bucket.Spec.Region == "" {
		return nil, fmt.Errorf("spec.region must not be empty")
	}

	allowed := map[string]bool{"ap-northeast-2": true, "us-east-1": true}
	if !allowed[bucket.Spec.Region] {
		return nil, fmt.Errorf("spec.region %q is not allowed", bucket.Spec.Region)
	}

	if bucket.Spec.RetentionDays > 365 {
		warnings = append(warnings, "retentionDays exceeds 365; be mindful of cost.")
	}

	return warnings, nil
}

Returning admission.Warnings lets you show a warning to the user without rejecting. Use an error when policy must block, and a warning when you only advise.
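The validator above returns on the first failure. A variation that reports every violation at once can be friendlier to users. The sketch below uses the standard library's errors.Join on a hypothetical plain bucketSpec, and the negative-retention rule is an invented example. In a real webhook you would more likely collect a field.ErrorList from k8s.io/apimachinery/pkg/util/validation/field and return it through apierrors.NewInvalid.

```go
package main

import (
	"errors"
	"fmt"
)

// bucketSpec is a hypothetical plain stand-in for cloudv1.BucketSpec.
type bucketSpec struct {
	Region        string
	RetentionDays int
}

var allowedRegions = map[string]bool{"ap-northeast-2": true, "us-east-1": true}

// validateSpec collects every violation instead of stopping at the first one,
// and returns non-blocking warnings separately from the blocking error.
func validateSpec(spec bucketSpec) (warnings []string, err error) {
	var errs []error
	switch {
	case spec.Region == "":
		errs = append(errs, errors.New("spec.region must not be empty"))
	case !allowedRegions[spec.Region]:
		errs = append(errs, fmt.Errorf("spec.region %q is not allowed", spec.Region))
	}
	// Hypothetical extra rule, for illustration only.
	if spec.RetentionDays < 0 {
		errs = append(errs, fmt.Errorf("spec.retentionDays must not be negative, got %d", spec.RetentionDays))
	}
	if spec.RetentionDays > 365 {
		warnings = append(warnings, "retentionDays exceeds 365; be mindful of cost.")
	}
	// errors.Join returns nil when errs is empty.
	return warnings, errors.Join(errs...)
}
```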

Registering the Webhook

The webhook must be registered with the manager. In current Kubebuilder scaffolds, a SetupBucketWebhookWithManager function wires the defaulter and validator through the webhook builder, and cmd/main.go calls it at startup.

func SetupBucketWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(&cloudv1.Bucket{}).
		WithDefaulter(&BucketCustomDefaulter{DefaultRegion: "ap-northeast-2"}).
		WithValidator(&BucketCustomValidator{}).
		Complete()
}

Managing Certificates with cert-manager

Admission Webhooks are called over HTTPS, so they need a TLS certificate the API server trusts. Managing this by hand is tedious, so using cert-manager is the standard. Kubebuilder generates the related manifests under config/certmanager/.

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
  namespace: system
spec:
  dnsNames:
    - webhook-service.system.svc
    - webhook-service.system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: webhook-server-cert

Then the cert-manager.io/inject-ca-from annotation on the webhook configuration tells cert-manager's CA injector to fill in the caBundle automatically. Its value has the form <namespace>/<certificate-name>.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: bucket-validating-webhook
  annotations:
    cert-manager.io/inject-ca-from: system/serving-cert
webhooks:
  - name: vbucket-v1.kb.io
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: webhook-service
        namespace: system
        path: /validate-cloud-example-com-v1-bucket
    rules:
      - apiGroups: ["cloud.example.com"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["buckets"]

The Status Subresource and the Conditions Standard

Why Use a Status Subresource

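Declaring status as a subresource splits writes across two endpoints: updates to the main resource ignore changes to status, and updates to the /status endpoint ignore changes to anything except status. As a result, a user's spec edit cannot overwrite the controller's status, and the controller's status write cannot change the spec. Status writes also do not increment metadata.generation, so generation tracks spec changes only, which is what makes ObservedGeneration meaningful. Both endpoints still share the object's resourceVersion for optimistic concurrency, so a write based on a stale copy can still fail with a conflict.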

Add a marker to the CRD type.

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

type Bucket struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   BucketSpec   `json:"spec,omitempty"`
	Status BucketStatus `json:"status,omitempty"`
}

The Standard Conditions Structure

The Kubernetes ecosystem treats placing a conditions array in status as the standard convention. Each Condition has the following fields.

| Field | Meaning |
|---|---|
| Type | Condition name (e.g. Ready, Progressing) |
| Status | True / False / Unknown |
| Reason | A short machine-readable reason code (CamelCase) |
| Message | A human-readable description |
| LastTransitionTime | When the status last changed |
| ObservedGeneration | The spec generation this condition reflects |

Use the metav1.Condition type directly.

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type BucketStatus struct {
	// ObservedGeneration is the spec generation the controller last processed.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// ExternalName is the actual external bucket name created.
	ExternalName string `json:"externalName,omitempty"`

	// +patchMergeKey=type
	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
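ObservedGeneration is what lets a reader tell a fresh Ready condition from a stale one. A sketch of that check, assuming a hypothetical plain condition struct in place of metav1.Condition; findCondition plays the role of meta.FindStatusCondition.

```go
package main

// condition is a hypothetical, trimmed-down stand-in for metav1.Condition.
type condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	ObservedGeneration int64
}

// findCondition returns the condition with the given Type, or nil.
func findCondition(conds []condition, condType string) *condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// isReadyForGeneration reports true only if Ready is True AND it was computed
// for the object's current metadata.generation. A True condition with an older
// ObservedGeneration describes a spec that no longer exists.
func isReadyForGeneration(generation int64, conds []condition) bool {
	c := findCondition(conds, "Ready")
	return c != nil && c.Status == "True" && c.ObservedGeneration == generation
}
```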

Using meta.SetStatusCondition

SetStatusCondition in the k8s.io/apimachinery/pkg/api/meta package updates a Condition of the same Type if one already exists, and appends it otherwise. The key behavior is that it only updates LastTransitionTime when the Status actually changes.

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	cloudv1 "example.com/operator/api/v1"
)

// markCondition sets one condition (Ready, Progressing, ...) via
// meta.SetStatusCondition and writes the status subresource.
func (r *BucketReconciler) markCondition(ctx context.Context, bucket *cloudv1.Bucket,
	condType string, status metav1.ConditionStatus, reason, msg string) error {
	meta.SetStatusCondition(&bucket.Status.Conditions, metav1.Condition{
		Type:               condType,
		Status:             status,
		Reason:             reason,
		Message:            msg,
		ObservedGeneration: bucket.Generation,
	})

	// Update observedGeneration too, recording "this generation was processed."
	bucket.Status.ObservedGeneration = bucket.Generation

	// Always use Status().Update() to update only the status subresource.
	return r.Status().Update(ctx, bucket)
}

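Note that you use r.Status().Update(), not r.Update(). The former writes the status subresource; the latter writes the main resource (spec and metadata such as finalizers), and with the subresource enabled the API server silently drops any status changes sent that way.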

Filling Conditions in Reconcile

func (r *BucketReconciler) reconcileExternalBucket(ctx context.Context, bucket *cloudv1.Bucket) error {
	// Record the in-progress state.
	if err := r.markCondition(ctx, bucket, "Progressing", metav1.ConditionTrue,
		"Creating", "creating external bucket"); err != nil {
		return err
	}

	name, err := r.CloudAPI.EnsureBucket(ctx, bucket.Spec.Region)
	if err != nil {
		_ = r.markCondition(ctx, bucket, "Ready", metav1.ConditionFalse,
			"CreateFailed", err.Error())
		return err
	}

	bucket.Status.ExternalName = name
	return r.markCondition(ctx, bucket, "Ready", metav1.ConditionTrue,
		"Created", "external bucket is ready")
}

Customizing kubectl Output with additionalPrinterColumns

To show status at a glance in kubectl get, add printer column markers. controller-tools generates additionalPrinterColumns in the CRD.

// +kubebuilder:printcolumn:name="Region",type=string,JSONPath=`.spec.region`
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.conditions[?(@.type=="Ready")].status`
// +kubebuilder:printcolumn:name="External",type=string,JSONPath=`.status.externalName`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

type Bucket struct {
	// ...
}

The generated CRD contains YAML like this.

additionalPrinterColumns:
  - name: Region
    type: string
    jsonPath: .spec.region
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=="Ready")].status
  - name: External
    type: string
    jsonPath: .status.externalName
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp

Now a user can check the state in a single line.

NAME      REGION           READY   EXTERNAL          AGE
my-data   ap-northeast-2   True    my-data-9af31     12m

Common Pitfalls

Resource Leak from a Missing Finalizer

If you do not add the finalizer before creating external resources, a delete request that arrives in between removes the object immediately, so the next Reconcile only sees NotFound and never gets a chance to clean up. The result is orphaned resources left in the cloud that keep costing money. Always attach the finalizer before creating external resources.

Cluster Paralysis from failurePolicy

A webhook (mutating or validating) with failurePolicy: Fail rejects every create and update of the matching resources whenever the API server cannot get a response from it. If the webhook Pod is down, or the webhook intercepts resources that its own Deployment needs in order to start, part of the cluster's API can be paralyzed.

| failurePolicy | Behavior on webhook failure | When it fits |
|---|---|---|
| Fail | Reject the request | When security/policy enforcement is critical |
| Ignore | Let the request through | When availability matters more |

It is safer to exclude core system namespaces from the webhook target with a namespaceSelector, so a webhook failure does not block cluster bootstrap.
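The interaction between namespaceSelector and failurePolicy can be summarized as a small decision function. This is a toy model with hypothetical names (outcome, errWebhookUnavailable) that describes a single webhook only; the map of excluded namespace names stands in for a real label selector (for example on the kubernetes.io/metadata.name label that recent Kubernetes versions set on every namespace).

```go
package main

import "errors"

// errWebhookUnavailable represents any failed call: timeout, Pod down, TLS error.
var errWebhookUnavailable = errors.New("webhook endpoint unavailable")

type failurePolicy string

const (
	policyFail   failurePolicy = "Fail"
	policyIgnore failurePolicy = "Ignore"
)

// outcome reports whether this one webhook lets a request through.
func outcome(namespace string, excluded map[string]bool, callErr error, webhookAllows bool, policy failurePolicy) bool {
	if excluded[namespace] {
		return true // namespaceSelector does not match: the webhook is never called
	}
	if callErr != nil {
		return policy == policyIgnore // failure handling depends on failurePolicy
	}
	return webhookAllows // a reachable webhook's answer stands
}
```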

Webhook Timeout

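Webhooks have a short timeout (timeoutSeconds, which defaults to 10 seconds in admissionregistration.k8s.io/v1 and can be at most 30). Slow work such as external API calls inside a webhook can exceed it, and the request is then handled according to failurePolicy (rejected under Fail). Use webhooks only for fast in-memory validation and defaulting, and defer heavy work to Reconcile.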

Confusing the Processing Order

Mutating always runs before Validating. So in Validating you may assume defaults are already filled. Conversely, if you forget to fill a field in Mutating, Validating may reject the object because of an "empty field," so divide the responsibilities of the two hooks clearly.

Status Update Conflicts

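With the status subresource enabled, status changes sent through r.Update() are silently ignored. Always use r.Status().Update() (or r.Status().Patch()), and update ObservedGeneration together to clearly record "which generation was processed." Status writes can still fail with a Conflict error when the cached object is stale; returning that error from Reconcile lets controller-runtime requeue the request and retry with a fresh copy.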

Testing

Testing Finalizers

With envtest you can spin up a real API server to verify the finalizer flow. Create an object, send a delete request, and check that external cleanup is called and the finalizer is removed.

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	cloudv1 "example.com/operator/api/v1"
)

// ctx, k8sClient, and fakeCloud are assumed to come from the suite setup
// (suite_test.go), which also starts the BucketReconciler against envtest.
var _ = Describe("Bucket finalizer", func() {
	It("cleans up external resources on deletion", func() {
		bucket := &cloudv1.Bucket{
			ObjectMeta: metav1.ObjectMeta{Name: "test", Namespace: "default"},
			Spec:       cloudv1.BucketSpec{Region: "ap-northeast-2"},
		}
		Expect(k8sClient.Create(ctx, bucket)).To(Succeed())

		// Wait until the finalizer is added.
		Eventually(func() bool {
			_ = k8sClient.Get(ctx, client.ObjectKeyFromObject(bucket), bucket)
			return len(bucket.Finalizers) > 0
		}).Should(BeTrue())

		// Delete request.
		Expect(k8sClient.Delete(ctx, bucket)).To(Succeed())

		// Eventually the object must disappear: Get returns NotFound.
		Eventually(func() bool {
			err := k8sClient.Get(ctx, client.ObjectKeyFromObject(bucket), bucket)
			return apierrors.IsNotFound(err)
		}).Should(BeTrue())

		// Confirm deletion was called on the fake cloud API. This assumes the
		// fake names each external bucket after the object.
		Expect(fakeCloud.DeletedBuckets()).To(ContainElement("test"))
	})
})

Testing Webhooks

Most webhook logic can be covered by unit tests that call the defaulter and validator directly. For integration tests, you can also run the webhook server against envtest.

var _ = Describe("Bucket validator", func() {
	It("rejects a disallowed region", func() {
		v := &BucketCustomValidator{}
		bucket := &cloudv1.Bucket{Spec: cloudv1.BucketSpec{Region: "mars-1"}}
		_, err := v.ValidateCreate(ctx, bucket)
		Expect(err).To(HaveOccurred())
	})

	It("fills the default region", func() {
		d := &BucketCustomDefaulter{DefaultRegion: "ap-northeast-2"}
		bucket := &cloudv1.Bucket{}
		Expect(d.Default(ctx, bucket)).To(Succeed())
		Expect(bucket.Spec.Region).To(Equal("ap-northeast-2"))
	})
})

Operations Checklist

| Item | Priority |
|---|---|
| Add finalizer before creating external resources | Required |
| Idempotency of the cleanup function | Required |
| Mutating only does fast defaulting | Recommended |
| Validating enforces immutable fields | Recommended |
| Review failurePolicy and namespaceSelector | Required |
| Update status only via Status().Update() | Required |
| Record ObservedGeneration | Recommended |
| Ensure visibility with printer columns | Recommended |
| Automate webhook certs with cert-manager | Recommended |

Conclusion

Finalizers, Admission Webhooks, and Status/Conditions are the three pillars that lift an Operator from a "working demo" to an "operable controller." In summary:

  • A Finalizer intercepts deletion and guarantees the last chance to safely clean up external resources. Attach it before creating external resources, and remove it only after cleanup succeeds.
  • A Webhook blocks invalid specs at the cluster entry stage and fills defaults. Keep Mutating fast, Validating strict, and configure failurePolicy and timeout carefully.
  • Status/Conditions expose resource state in a standard way, so humans and other controllers judge "ready" the same way. Use meta.SetStatusCondition and ObservedGeneration consistently.

When you implement all three on the common principle of idempotency, you get a robust Operator that does not buckle under retries and concurrency.
