Automating Ingress TLS — Fully Automated Certificates with cert-manager and ACME

Introduction

Think back to the days of managing TLS certificates by hand on a production cluster: setting calendar reminders to renew those 90-day Let’s Encrypt certificates, scrambling to run a renewal script right before expiry. Forget a single certificate and you would get a 3 a.m. page that "the site is down."

cert-manager solves this problem at its root. As a Kubernetes-native controller, it manages the certificate lifecycle (issuance, renewal, and private key rotation) declaratively. Add a single annotation to your Ingress, and cert-manager handles the rest: it talks to Let’s Encrypt over the ACME protocol, obtains a certificate, and renews it automatically before expiry.

A note on the 2026 landscape is also in order. The Ingress API is effectively frozen, with no new features being added. The successor standard is the Gateway API, and cert-manager supports TLS automation for the Gateway API as well. ingress-nginx has moved into maintenance mode, receiving mostly security patches, so for a greenfield build you should consider the Gateway API alongside it. This article focuses on Ingress, but covers the Gateway API integration at the end.

The goals of this article are:

  • Understanding cert-manager's CRD model and control flow precisely
  • Establishing the difference between HTTP-01 and DNS-01 challenges and when to choose each
  • Issuing wildcard certificates safely
  • Grasping how the annotation-based integration (ingress-shim) works
  • Monitoring renewals and expiry with Prometheus
  • Integrating internal CAs and the HashiCorp Vault Issuer
  • Diagnosing challenge failures methodically

cert-manager Architecture

cert-manager consists of several CRDs (Custom Resource Definitions) and the controllers that reconcile them. Understanding the relationships between the core resources first makes the whole flow clear.

┌──────────────────────────────────────────────────────────┐
│  Issuer / ClusterIssuer   (defines the issuing authority) │
│  - ACME(Let's Encrypt), CA, Vault, SelfSigned ...         │
└───────────────────────────┬──────────────────────────────┘
                            │ referenced by
┌──────────────────────────────────────────────────────────┐
│  Certificate              (declarative spec of desired cert)│
│  - dnsNames, secretName, issuerRef, duration ...          │
└───────────────────────────┬──────────────────────────────┘
                            │ creates an issuance request
┌──────────────────────────────────────────────────────────┐
│  CertificateRequest       (single issuance attempt, has CSR)│
└───────────────────────────┬──────────────────────────────┘
                            │ when ACME
┌──────────────────────────────────────────────────────────┐
│  Order                    (an ACME order = one transaction)│
└───────────────────────────┬──────────────────────────────┘
                            │ domain ownership validation
┌──────────────────────────────────────────────────────────┐
│  Challenge                (HTTP-01 or DNS-01 validation)  │
└───────────────────────────┬──────────────────────────────┘
                            │ on success
┌──────────────────────────────────────────────────────────┐
│  Secret (kubernetes.io/tls)   stores tls.crt + tls.key    │
└──────────────────────────────────────────────────────────┘

The roles of each resource are summarized below.

| Resource | Scope | Role |
| --- | --- | --- |
| Issuer | Namespace | Issues certificates only within its namespace |
| ClusterIssuer | Cluster-wide | A shared issuing authority for all namespaces |
| Certificate | Namespace | Declares the desired end state of a certificate |
| CertificateRequest | Namespace | Represents a single CSR issuance attempt |
| Order | Namespace | An ACME issuance transaction (a set of domains) |
| Challenge | Namespace | A unit of domain ownership proof (HTTP-01/DNS-01) |

The important mental model here is that the Certificate represents the desired state, and the remaining child resources are intermediate artifacts that cert-manager automatically creates and destroys to achieve that state. Operators typically deal directly only with the Issuer/ClusterIssuer and the Certificate (or Ingress annotations), and only look at CertificateRequest/Order/Challenge when debugging.

Installation

You install cert-manager with Helm or static manifests. In production, managing the CRDs explicitly is recommended.

# Add the Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install with CRDs included
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.17.0 \
  --set crds.enabled=true

# Verify installation
kubectl get pods -n cert-manager
kubectl get crd | grep cert-manager.io

Once installation completes, three core pods come up: cert-manager (the controller), cert-manager-webhook (the validating/mutating webhook), and cert-manager-cainjector (the CA bundle injector). If the webhook is not working, the API server rejects cert-manager resources such as Issuer and Certificate when you apply them, so always verify the webhook pod's status right after installing.

ACME Challenges: Proving Domain Ownership

For an ACME CA such as Let’s Encrypt to issue a certificate, the requester must prove that it actually controls the domain in question. The ACME protocol (RFC 8555) defines this as a challenge, and cert-manager supports two types: HTTP-01 and DNS-01.

HTTP-01 Challenge Flow

User domain: app.example.com

1. cert-manager -> ACME server: orders a cert for app.example.com
2. ACME server -> cert-manager: issues token + validation path
3. cert-manager: creates temporary pod/service/Ingress
   prepares response at /.well-known/acme-challenge/<token>
4. ACME server -> http://app.example.com/.well-known/acme-challenge/<token>
   (HTTP request on port 80)
5. If the response matches the expected key authorization, validation passes
6. ACME server -> cert-manager: issues the certificate
7. cert-manager: cleans up temporary resources + stores the Secret

HTTP-01 requires inbound internet traffic on port 80 to reach the cluster's Ingress. In other words, the domain's A record must point to the Ingress's public IP, and port 80 must be open in the firewall.
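Step 5 compares the served body against the key authorization, which RFC 8555 defines as the token joined to the RFC 7638 thumbprint of the ACME account key. The following is a minimal sketch of that calculation using only the Python standard library. The account key (ACCOUNT_JWK) and TOKEN are made-up values for illustration; cert-manager performs this step internally, so you would only use something like this to cross-check a response by hand.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME and JOSE use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members only,
    serialized with sorted keys and no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """The exact body expected at /.well-known/acme-challenge/<token>."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


# Hypothetical account public key and token, for illustration only.
ACCOUNT_JWK = {
    "kty": "EC",
    "crv": "P-256",
    "x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
    "y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0",
}
TOKEN = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"

print(key_authorization(TOKEN, ACCOUNT_JWK))
```

Because the thumbprint depends only on the account key, every challenge for the same ACME account ends with the same suffix; only the token part changes.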

DNS-01 Challenge Flow

User domain: app.example.com (or *.example.com)

1. cert-manager -> ACME server: orders a certificate
2. ACME server -> cert-manager: issues a token
3. cert-manager -> DNS provider API:
   registers a validation value in a TXT record at _acme-challenge.<domain>
   (_acme-challenge.app.example.com; for *.example.com, _acme-challenge.example.com)
4. ACME server -> DNS lookup: checks that _acme-challenge TXT record
5. If the TXT value matches, validation passes
6. ACME server -> cert-manager: issues the certificate
7. cert-manager: cleans up the TXT record + stores the Secret

DNS-01 requires no inbound traffic at all. Instead, cert-manager must have API credentials for your DNS provider (Route53, Cloud DNS, Cloudflare, and so on) so it can dynamically create and delete TXT records.
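The record name and value in steps 3 to 5 follow fixed rules from RFC 8555: the TXT record lives at _acme-challenge.<domain> (with any leading "*." stripped), and its value is the base64url-encoded SHA-256 digest of the key authorization. Here is a small sketch of both rules; the function names are illustrative and the key authorization string is a placeholder, not a real one.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Name of the TXT record the ACME server will query for `domain`.
    A wildcard is validated at the name it is a wildcard of."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base.rstrip('.')}"


def txt_value(key_authorization: str) -> str:
    """TXT record content: base64url(SHA-256(key authorization)), unpadded."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


for name in ("app.example.com", "*.example.com", "example.com"):
    print(name, "->", challenge_record_name(name))

# Placeholder key authorization ("<token>.<account key thumbprint>").
print(txt_value("example-token.example-thumbprint"))
```

Note that *.example.com and example.com share the same record name, which is why a certificate containing both needs two TXT values at _acme-challenge.example.com at the same time.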

HTTP-01 vs DNS-01 Comparison

| Aspect | HTTP-01 | DNS-01 |
| --- | --- | --- |
| Validation method | HTTP response on port 80 | DNS TXT record |
| Requires inbound traffic | Yes (public IP + port 80) | No |
| Wildcard certificates | Not possible | Possible (required) |
| DNS API credentials | Not needed | Needed |
| Internal/private clusters | Difficult | Suitable |
| Validation latency | Fast (seconds) | Must wait for DNS propagation |
| Multi-domain SAN | HTTP validation per domain (each must route to the cluster) | TXT record per domain (no routing needed) |
| Common failure causes | Routing/firewall/redirects | DNS permissions/propagation lag |

To simplify the decision: if you have a public single domain and can open inbound port 80, HTTP-01 is the simplest. If you need a wildcard certificate, run a private cluster where you cannot open inbound traffic, or want to handle many domains at once, choose DNS-01.

Configuring a ClusterIssuer

In most environments you use a cluster-wide ClusterIssuer that can be shared. It is recommended to create both a Let’s Encrypt staging and a production issuer first. Staging has generous rate limits and is ideal for testing, while production issues trusted certificates but enforces strict issuance limits.

HTTP-01 ClusterIssuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

privateKeySecretRef is the name of the Secret that stores the ACME account key. When you first apply the ClusterIssuer, cert-manager automatically registers an ACME account and keeps the account key in this Secret. Be careful: deleting this Secret triggers an account re-registration.

DNS-01 ClusterIssuer (Route53 Example)

To use DNS-01 you need DNS provider API credentials. Taking AWS Route53 as an example, you use IAM credentials or IRSA (IAM Roles for Service Accounts).

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-account-key
    solvers:
      - dns01:
          route53:
            region: ap-northeast-2
            hostedZoneID: Z0123456789ABCDEFGHIJ
            # With IRSA you can omit accessKeyID/secretAccessKey
        selector:
          dnsZones:
            - example.com

You can also branch between multiple solvers using selectors. For example, you can define two solvers in a single ClusterIssuer so that example.com and everything beneath it (including *.example.com) is handled via DNS-01, while domains outside that zone use HTTP-01.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-mixed
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-mixed-account-key
    solvers:
      - dns01:
          route53:
            region: ap-northeast-2
            hostedZoneID: Z0123456789ABCDEFGHIJ
        selector:
          dnsZones:
            - example.com
      - http01:
          ingress:
            ingressClassName: nginx

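For each requested domain, cert-manager picks the solver with the most specific matching selector: an exact dnsNames match wins over a dnsZones match, the longest matching zone wins among dnsZones, and a solver with no selector (the http01 entry here) acts as the fallback.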
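To make the selection rule concrete, here is a simplified sketch of the matching logic in Python. It is not cert-manager's actual code: it ignores matchLabels selectors and only models dnsNames, dnsZones, and the selector-less fallback. The SOLVERS list mirrors the letsencrypt-mixed issuer above.

```python
from typing import Optional


def zone_matches(zone: str, domain: str) -> bool:
    """A zone matches itself and any name beneath it."""
    return domain == zone or domain.endswith("." + zone)


def pick_solver(solvers: list, dns_name: str) -> Optional[dict]:
    """Return the most specific solver for dns_name, or None if nothing matches."""
    bare = dns_name[2:] if dns_name.startswith("*.") else dns_name
    best, best_rank = None, (-1, -1)
    for solver in solvers:
        selector = solver.get("selector") or {}
        names = selector.get("dnsNames") or []
        zones = selector.get("dnsZones") or []
        matching_zones = [z for z in zones if zone_matches(z, bare)]
        if dns_name in names:
            rank = (2, 0)                                   # exact name match
        elif matching_zones:
            rank = (1, max(len(z) for z in matching_zones))  # longest zone wins
        elif not names and not zones:
            rank = (0, 0)                                   # catch-all fallback
        else:
            continue                                        # selector set, no match
        if rank > best_rank:
            best, best_rank = solver, rank
    return best


SOLVERS = [
    {"dns01": {"route53": {"region": "ap-northeast-2"}},
     "selector": {"dnsZones": ["example.com"]}},
    {"http01": {"ingress": {"ingressClassName": "nginx"}}},
]

for name in ("*.example.com", "app.example.com", "shop.example.org"):
    solver = pick_solver(SOLVERS, name)
    print(name, "->", "dns01" if "dns01" in solver else "http01")
```

One practical consequence: with this issuer, app.example.com is also solved via DNS-01, not HTTP-01, because it falls inside the example.com zone.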

Writing the Certificate Resource Directly

Using Ingress annotations auto-generates the Certificate, but writing the Certificate explicitly lets you control issuance policy more finely.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls
  duration: 2160h        # 90 days
  renewBefore: 720h      # renew 30 days before expiry
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
  dnsNames:
    - app.example.com
    - www.app.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io

The key fields:

  • secretName: the Secret where the issued certificate is stored. Its type is kubernetes.io/tls, with tls.crt and tls.key keys.
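  • renewBefore: how long before expiry cert-manager starts renewing, given as a duration (720h above, not a number of days). If omitted, renewal starts once two-thirds of the lifetime has elapsed, that is, one-third of the lifetime before expiry.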
  • privateKey.rotationPolicy: Always generates a new key on every renewal. This is recommended for security.
  • issuerRef: specifies which Issuer/ClusterIssuer to use.
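The renewal arithmetic behind duration and renewBefore is easy to check by hand. The sketch below computes the renewal moment from a certificate's validity window, using the default described above (one-third of the lifetime before expiry) when renewBefore is not set. The dates are hypothetical and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> datetime:
    """Moment at which renewal should start for the given validity window."""
    lifetime = not_after - not_before
    if renew_before is None:
        renew_before = lifetime / 3  # default: last third of the lifetime
    if renew_before >= lifetime:
        raise ValueError("renewBefore must be shorter than the certificate lifetime")
    return not_after - renew_before


# Hypothetical 90-day certificate (duration: 2160h).
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=2160)

print("explicit 720h:", renewal_time(issued, expires, timedelta(hours=720)).date())
print("default      :", renewal_time(issued, expires).date())
```

For a 90-day certificate the explicit renewBefore: 720h and the default land on the same day, so setting it only matters when you want a wider or narrower margin than one-third.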

Wildcard Certificates

A wildcard certificate (*.example.com) can only be issued via the DNS-01 challenge. Let’s Encrypt does not accept HTTP-01 for wildcard names: serving a file from one host cannot demonstrate control over every possible subdomain, whereas writing a TXT record shows control over the DNS zone itself.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-tls
  namespace: production
spec:
  secretName: wildcard-example-tls
  dnsNames:
    - "*.example.com"
    - "example.com"
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
    group: cert-manager.io

As shown above, it is common to include both the wildcard and the apex domain as SANs. The wildcard *.example.com does not cover example.com itself, so both are needed.
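The coverage rule is worth seeing explicitly: a wildcard stands in for exactly one leftmost DNS label. The following sketch applies that rule; san_covers is an illustrative helper, not part of any library.

```python
def san_covers(san: str, hostname: str) -> bool:
    """True if a certificate SAN entry covers the hostname.
    A leading '*.' matches exactly one label, never zero and never several."""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not san.startswith("*."):
        return san == hostname
    suffix = san[1:]  # ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return label != "" and "." not in label


for host in ("app.example.com", "example.com", "a.b.example.com"):
    print(host, san_covers("*.example.com", host))
```

This is also why deeper names such as a.b.example.com need their own SAN (or a separate *.b.example.com wildcard).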

Ingress Integration (ingress-shim)

cert-manager's most convenient feature is annotation-based automation on Ingress. The component responsible for this is ingress-shim. When you attach a specific annotation to an Ingress resource, ingress-shim reads that Ingress's TLS spec and automatically creates and manages a Certificate resource.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-svc
                port:
                  number: 8080

Applying this manifest triggers the following sequence.

1. Ingress is created (with the cert-manager.io/cluster-issuer annotation)
2. ingress-shim reads tls[].secretName and hosts
3. ingress-shim auto-creates a Certificate resource
   - dnsNames = tls[].hosts
   - secretName = tls[].secretName
   - issuerRef = the cluster-issuer from the annotation
4. The main cert-manager controller processes the Certificate
   -> CertificateRequest -> Order -> Challenge -> Secret issued
5. The Ingress Controller uses the issued Secret for TLS termination
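Steps 2 and 3 amount to a pure mapping from the Ingress manifest to one Certificate per TLS entry. The sketch below models that mapping on plain dictionaries. It is a simplification under stated assumptions: it handles only the two issuer annotations, names the Certificate after the Secret, and leaves out owner references and the other annotations that the real ingress-shim processes.

```python
ISSUER_ANNOTATIONS = {
    "cert-manager.io/cluster-issuer": "ClusterIssuer",
    "cert-manager.io/issuer": "Issuer",
}


def certificates_for_ingress(ingress: dict) -> list:
    """Return the Certificate manifests a shim-like controller would derive."""
    metadata = ingress.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    issuer_ref = None
    for key, kind in ISSUER_ANNOTATIONS.items():
        if key in annotations:
            issuer_ref = {"name": annotations[key], "kind": kind, "group": "cert-manager.io"}
            break
    if issuer_ref is None:
        return []  # no issuer annotation: the Ingress is left alone

    certificates = []
    for tls in ingress.get("spec", {}).get("tls") or []:
        if not tls.get("secretName") or not tls.get("hosts"):
            continue  # nothing to issue without a target Secret and hosts
        certificates.append({
            "apiVersion": "cert-manager.io/v1",
            "kind": "Certificate",
            "metadata": {
                "name": tls["secretName"],  # assumption: named after the Secret
                "namespace": metadata.get("namespace", "default"),
            },
            "spec": {
                "secretName": tls["secretName"],
                "dnsNames": list(tls["hosts"]),
                "issuerRef": issuer_ref,
            },
        })
    return certificates


INGRESS = {
    "metadata": {
        "name": "app-ingress",
        "namespace": "production",
        "annotations": {"cert-manager.io/cluster-issuer": "letsencrypt-prod"},
    },
    "spec": {"tls": [{"hosts": ["app.example.com"], "secretName": "app-tls"}]},
}

for cert in certificates_for_ingress(INGRESS):
    print(cert["metadata"]["name"], cert["spec"]["dnsNames"], cert["spec"]["issuerRef"]["kind"])
```

Seen this way, it is clear why hosts listed only under rules but not under tls never receive a certificate: the mapping reads the tls section alone.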

The key annotations are:

| Annotation | Meaning |
| --- | --- |
| cert-manager.io/cluster-issuer | Name of the ClusterIssuer to use |
| cert-manager.io/issuer | Name of the namespaced Issuer to use |
| cert-manager.io/common-name | Sets the certificate CN |
| cert-manager.io/duration | Certificate lifetime |
| cert-manager.io/renew-before | Renewal timing |

One caveat: do not directly edit a Certificate that ingress-shim auto-generated. On the next reconcile loop, ingress-shim overwrites it based on the Ingress spec. If you need fine-grained control, it is better to remove the annotation and manage the Certificate directly.

Operations: Monitoring Renewal and Expiry

cert-manager handles renewal automatically, but the attitude of "it is automatic, so I do not need to worry" is dangerous. If renewal silently fails due to expired DNS API credentials, hitting a rate limit, or a misconfigured solver, the certificate will eventually expire. Monitoring is therefore essential.

Checking Certificate Status

# See the status of all Certificates at a glance
kubectl get certificate -A

# Details for a specific certificate (Ready condition, expiry date)
kubectl describe certificate app-tls -n production

# The issuance progress cert-manager tracks
kubectl get certificaterequest,order,challenge -A

In the output of kubectl get certificate, the READY column should be True for a healthy state. If it is False, use describe to check the Events and Conditions.
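For a cluster with many certificates, the same check can be scripted against the JSON form of that listing (kubectl get certificate -A -o json). The sketch below filters for Certificates whose Ready condition is not True. SAMPLE is a hand-written, trimmed stand-in for real kubectl output, and the reason strings in it are only examples.

```python
import json


def not_ready(cert_list_json: str) -> list:
    """Return (namespace, name, reason) for every Certificate that is not Ready."""
    problems = []
    for item in json.loads(cert_list_json).get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            reason = (ready or {}).get("reason", "NoReadyCondition")
            meta = item["metadata"]
            problems.append((meta["namespace"], meta["name"], reason))
    return problems


# Trimmed, hand-written sample in the shape of `kubectl get certificate -A -o json`.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"namespace": "production", "name": "app-tls"},
         "status": {"conditions": [{"type": "Ready", "status": "True", "reason": "Ready"}]}},
        {"metadata": {"namespace": "production", "name": "wildcard-example-tls"},
         "status": {"conditions": [{"type": "Ready", "status": "False", "reason": "DoesNotExist"}]}},
        {"metadata": {"namespace": "backend", "name": "internal-svc-tls"}, "status": {}},
    ]
})

for namespace, name, reason in not_ready(SAMPLE):
    print(f"{namespace}/{name}: {reason}")
```

A Certificate with no Ready condition at all is reported too, since that usually means the controller has not processed it yet.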

Prometheus Metrics

cert-manager exposes Prometheus metrics on port 9402 by default. The most important metric is the certificate expiry time.

# Certificate expiry time (Unix epoch seconds)
certmanager_certificate_expiration_timestamp_seconds

# Certificate renewal time
certmanager_certificate_renewal_timestamp_seconds

# Ready status gauge (1=Ready, 0=NotReady)
certmanager_certificate_ready_status

# ACME client request count (by status code)
certmanager_http_acme_client_request_count

# Number of sync calls made by each controller
certmanager_controller_sync_call_count

You can build near-expiry alerts with PromQL as below.

# Alert on certs expiring within 7 days (604800 seconds)
certmanager_certificate_expiration_timestamp_seconds - time() < 604800

# Alert on certs not in Ready state
certmanager_certificate_ready_status == 0

An example Prometheus alerting rule (which Alertmanager then routes) looks like this.

groups:
  - name: cert-manager
    rules:
      - alert: CertificateExpiringSoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < 604800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring within 7 days"
          description: "Check the namespace/name labels"
      - alert: CertificateNotReady
        expr: certmanager_certificate_ready_status == 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Certificate is not in Ready state"

Use a ServiceMonitor so that the Prometheus Operator scrapes cert-manager metrics.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: cert-manager
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
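    - port: tcp-prometheus-servicemonitor  # Service port name in the Helm chart; confirm with kubectl get svc -n cert-manager -o yaml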
      interval: 60s

Managing Rate Limits

The Let’s Encrypt production endpoint enforces strict rate limits. Notably, there is a limit of 50 certificates per registered domain per week. Always test against the staging endpoint, and only issue from production when you actually need it. Repeatedly creating and destroying clusters in CI pipelines hits the limit quickly, so be careful.
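One way to keep CI from exhausting the budget is to track issuance timestamps yourself and refuse to hit production when the window is nearly full. The sketch below is a local approximation only: it uses the 50-per-week figure quoted above with a simple sliding window, which may not match how Let’s Encrypt accounts for the limit internally. The names LIMIT, WINDOW, and remaining_budget are illustrative.

```python
from datetime import datetime, timedelta, timezone

LIMIT = 50                   # certificates per registered domain (figure from the text)
WINDOW = timedelta(days=7)   # sliding one-week window


def remaining_budget(issued_at: list, now: datetime,
                     limit: int = LIMIT, window: timedelta = WINDOW) -> int:
    """How many more issuances fit in the window ending at `now`."""
    recent = [t for t in issued_at if now - window < t <= now]
    return max(0, limit - len(recent))


now = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
# Hypothetical history: one issuance every 4 hours over the last 10 days.
history = [now - timedelta(hours=4 * i) for i in range(60)]

left = remaining_budget(history, now)
print("issuances left this week:", left)
if left < 5:
    print("use the staging issuer for this pipeline run")
```

A pipeline could consult such a counter before choosing between the letsencrypt-staging and letsencrypt-prod issuers.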

Internal CA and Vault Issuer

Not every certificate has to come from a public CA. For service-to-service mTLS or internal domains, issuing from an internal CA is common.

Self CA Issuer

First store a root or intermediate CA key pair as a Secret, then create a CA Issuer.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-key-pair
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-svc-tls
  namespace: backend
spec:
  secretName: internal-svc-tls
  dnsNames:
    - payments.backend.svc.cluster.local
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer

This approach needs no ACME challenge, so it issues immediately with no inbound or DNS requirements. However, you must distribute the CA bundle so that clients trust your internal CA.

HashiCorp Vault Issuer

Using Vault's PKI secrets engine as the backend lets you leverage centralized issuance policy and audit logs.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.internal.example.com:8200
    path: pki_int/sign/example-dot-com
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: cert-manager-vault

The Vault Issuer authenticates with a ServiceAccount token via Vault's Kubernetes auth method. path is the signing path of the Vault PKI engine, and on the Vault side you define the allowed domains and lifetime policy for that role. This structure centralizes certificate issuance authority through Vault policy, which is advantageous in regulated environments.

Relationship with the Gateway API

As mentioned earlier, the Ingress API is frozen and the Gateway API is the successor standard. cert-manager supports automatic certificate issuance for the Gateway API too. Just as ingress-shim reads Ingress annotations to create a Certificate, it reads annotations on a Gateway resource to create a Certificate.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: app-gateway
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: app.example.com
      tls:
        mode: Terminate
        certificateRefs:
          - name: app-tls
            kind: Secret

This feature was originally gated behind the experimental ExperimentalGatewayAPISupport feature gate. Since cert-manager v1.15 the feature gate is no longer needed, but support is still off by default: you enable it with the controller flag --enable-gateway-api, and the Gateway API CRDs must be installed before the controller starts. Once enabled, cert-manager auto-creates a Certificate using the listener's hostname as dnsNames and the first Secret in certificateRefs as secretName. For a greenfield cluster, adopting Gateway API based TLS automation from the start instead of Ingress is recommended.

Troubleshooting: Diagnosing Challenge Failures

Challenge failures are the most common problem in operating cert-manager. Following the diagnostic flow below will narrow down most causes.

        Certificate is Ready=False
                │
                ▼
   check kubectl describe certificate
        What message is in Events?
      ┌─────────┴──────────┐
      │                    │
 Order created        No Order / error
      │                    │
      ▼                    ▼
 kubectl describe       issuerRef typo?
   order ...            ClusterIssuer Ready?
      │                 ACME account reg failed?
      ▼
 Check Challenge status
 kubectl describe challenge ...
 ┌────┴─────┐
 │          │
HTTP-01    DNS-01
 │          │
 ▼          ▼
1. DNS A record ->     1. DNS API creds valid?
   Ingress IP?         2. TXT record actually created?
2. Port 80 reachable   3. DNS propagation done?
   externally?         4. CNAME delegation set up?
3. Challenge path      5. zone selector matches?
   returns 200?
4. Does forced HTTPS
   redirect block
   the port 80 check?

Common HTTP-01 Failures

The most frequent failure is when the ACME validation request arriving on port 80 is blocked by a forced HTTPS redirect. If ingress-nginx's ssl-redirect is on, it can redirect even the validation path to HTTPS and break validation. The temporary validation Ingress that cert-manager creates is designed to bypass this, but a strong global setting can still cause problems.

Use the following commands to check whether the validation path actually responds. Get the token portion of the URL from the real challenge resource.

# Check in-progress challenges
kubectl get challenge -A

# Get the token and URL from the challenge details
kubectl describe challenge <challenge-name> -n production

# Call the validation path directly from outside (TOKEN from above)
curl -v http://app.example.com/.well-known/acme-challenge/TOKEN

If the response is not 200, or it redirects to HTTPS with 301/302, review your routing and redirect settings.
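The same check can be scripted so it does not follow redirects and reports what it saw. This is a minimal sketch using Python's http.client, which never follows redirects on its own. The category names returned by classify are made up for this example, and probe needs real network access to the host, so only call it against a domain with an in-progress challenge.

```python
import http.client
from typing import Optional


def classify(status: int, location: Optional[str]) -> str:
    """Map a response on the challenge path to a rough diagnosis."""
    if status == 200:
        return "ok"
    if status in (301, 302, 307, 308):
        if location and location.startswith("https://"):
            return "https-redirect"   # review ssl-redirect settings
        return "redirect"
    if status == 404:
        return "not-routed"           # path is not reaching the solver pod
    return "unexpected"


def probe(host: str, token: str, timeout: float = 5.0) -> str:
    """Request the challenge path over plain HTTP on port 80, without following redirects."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", f"/.well-known/acme-challenge/{token}")
        response = conn.getresponse()
        return classify(response.status, response.getheader("Location"))
    finally:
        conn.close()


# Example call (requires network access and a live challenge):
#   print(probe("app.example.com", "TOKEN"))
print(classify(301, "https://app.example.com/.well-known/acme-challenge/TOKEN"))
```

Running the probe from outside the cluster matters, because that is the path the ACME server's validation request takes.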

Common DNS-01 Failures

For DNS-01, the key is to directly verify whether the TXT record was actually created and whether it has propagated.

# Directly look up the TXT record cert-manager registered
dig +short TXT _acme-challenge.example.com

# Query the authoritative nameserver directly
dig @ns-1.example-dns.com TXT _acme-challenge.example.com

# Check the DNS provider error in the challenge events
kubectl describe challenge <challenge-name> -n production

If the DNS API credentials have expired or lack permissions, TXT record creation itself fails. Also, if you use CNAME delegation, you must check that the record was created in the delegated target zone. With slow-propagating providers you may need to increase the propagation timeout.

Diagnosing Quickly with cmctl

cert-manager provides a diagnostic CLI called cmctl.

# Summarize certificate status
cmctl status certificate app-tls -n production

# Trigger a forced renewal
cmctl renew app-tls -n production

# Inspect the certificate stored in the Secret
cmctl inspect secret app-tls -n production

Operations Checklist

Checking the following items before and after a production rollout can greatly reduce incidents.

  • Did you validate with the staging ClusterIssuer first before switching to production?
  • Is the ClusterIssuer in Ready state and is the ACME account registered correctly?
  • Do wildcard certificates use a DNS-01 solver?
  • Do the DNS-01 API credentials have an expiry/rotation policy?
  • Is there an expiry alert based on certmanager_certificate_expiration_timestamp_seconds?
  • Is there an alert on certmanager_certificate_ready_status == 0?
  • Is the renewBefore value generous enough given your operational response time?
  • Are certificate Secrets backed up or reproducible via GitOps?
  • Have you ensured high availability for the cert-manager webhook pods (two or more replicas)?
  • Is the Let’s Encrypt rate limit isolated so CI does not exhaust it?
  • If you use an internal CA, is the CA bundle distributed to clients?
  • For a greenfield build, have you considered Gateway API based TLS automation?

Conclusion

cert-manager turns TLS certificate management from "an operational burden you must tend to regularly" into "infrastructure that just runs once you declare it." The key is to understand the CRD model precisely, choose the challenge type that fits your environment, and build monitoring alongside automation rather than blindly trusting it.

HTTP-01 is simple for a public single domain, while DNS-01 is essential for wildcards and private environments. The annotation-based ingress-shim solves most use cases with a single line, but you can manage the Certificate directly when you need fine-grained control. By centralizing internal certificates with an internal CA and the Vault Issuer, you can manage TLS across the entire cluster in one consistent flow.

Finally, to re-emphasize the 2026 direction: the Ingress API is frozen and the Gateway API is its successor. If you are building anew now, considering Gateway API based TLS automation from the start is the future-proof choice.
