Automating Ingress TLS — Fully Automated Certificates with cert-manager and ACME
- Authors

- Name
- Youngju Kim
- @fjvbn20031
- Introduction
- cert-manager Architecture
- ACME Challenges: Proving Domain Ownership
- Configuring a ClusterIssuer
- Writing the Certificate Resource Directly
- Ingress Integration (ingress-shim)
- Operations: Monitoring Renewal and Expiry
- Internal CA and Vault Issuer
- Relationship with the Gateway API
- Troubleshooting: Diagnosing Challenge Failures
- Operations Checklist
- Conclusion
- References
Introduction
Think back to the days of managing TLS certificates by hand on a production cluster: setting calendar reminders to renew those 90-day Let’s Encrypt certificates, scrambling to run a renewal script right before expiry. Forget a single certificate and you would get a 3 a.m. page that "the site is down."
cert-manager solves this problem at its root. As a Kubernetes-native controller, it manages the entire certificate lifecycle (issuance, renewal, revocation) declaratively. Add a single annotation to your Ingress, and cert-manager handles the rest: it talks to Let’s Encrypt over the ACME protocol, obtains a certificate, and renews it automatically before expiry.
A note on the 2026 landscape is also in order. The Ingress API is effectively frozen, with no new features being added. The successor standard is the Gateway API, and cert-manager supports TLS automation for the Gateway API as well. ingress-nginx has moved into maintenance mode, receiving mostly security patches, so for a greenfield build you should consider the Gateway API alongside it. This article focuses on Ingress, but covers the Gateway API integration at the end.
The goals of this article are:
- Understanding cert-manager's CRD model and control flow precisely
- Establishing the difference between HTTP-01 and DNS-01 challenges and when to choose each
- Issuing wildcard certificates safely
- Grasping how the annotation-based integration (ingress-shim) works
- Monitoring renewals and expiry with Prometheus
- Integrating internal CAs and the HashiCorp Vault Issuer
- Diagnosing challenge failures methodically
cert-manager Architecture
cert-manager consists of several CRDs (Custom Resource Definitions) and the controllers that reconcile them. Understanding the relationships between the core resources first makes the whole flow clear.
┌──────────────────────────────────────────────────────────┐
│ Issuer / ClusterIssuer  (defines the issuing authority)  │
│   - ACME (Let's Encrypt), CA, Vault, SelfSigned ...      │
└────────────────────────────┬─────────────────────────────┘
                             │ referenced by
                             ▼
┌──────────────────────────────────────────────────────────┐
│ Certificate  (declarative spec of desired cert)          │
│   - dnsNames, secretName, issuerRef, duration ...        │
└────────────────────────────┬─────────────────────────────┘
                             │ creates an issuance request
                             ▼
┌──────────────────────────────────────────────────────────┐
│ CertificateRequest  (single issuance attempt, has CSR)   │
└────────────────────────────┬─────────────────────────────┘
                             │ when ACME
                             ▼
┌──────────────────────────────────────────────────────────┐
│ Order  (an ACME order = one transaction)                 │
└────────────────────────────┬─────────────────────────────┘
                             │ domain ownership validation
                             ▼
┌──────────────────────────────────────────────────────────┐
│ Challenge  (HTTP-01 or DNS-01 validation)                │
└────────────────────────────┬─────────────────────────────┘
                             │ on success
                             ▼
┌──────────────────────────────────────────────────────────┐
│ Secret (kubernetes.io/tls)  stores tls.crt + tls.key     │
└──────────────────────────────────────────────────────────┘
The roles of each resource are summarized below.
| Resource | Scope | Role |
|---|---|---|
| Issuer | Namespace | Issues certificates only within its namespace |
| ClusterIssuer | Cluster-wide | A shared issuing authority for all namespaces |
| Certificate | Namespace | Declares the desired end state of a certificate |
| CertificateRequest | Namespace | Represents a single CSR issuance attempt |
| Order | Namespace | An ACME issuance transaction (a set of domains) |
| Challenge | Namespace | A unit of domain ownership proof (HTTP-01/DNS-01) |
The important mental model here is that the Certificate represents the desired state, and the remaining child resources are intermediate artifacts that cert-manager automatically creates and destroys to achieve that state. Operators typically deal directly only with the Issuer/ClusterIssuer and the Certificate (or Ingress annotations), and only look at CertificateRequest/Order/Challenge when debugging.
Installation
You install cert-manager with Helm or static manifests. In production, managing the CRDs explicitly is recommended.
# Add the Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update
# Install with CRDs included
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.17.0 \
  --set crds.enabled=true
# Verify installation
kubectl get pods -n cert-manager
kubectl get crd | grep cert-manager.io
Once installation completes, three core pods come up: cert-manager (the controller), cert-manager-webhook (the validating/mutating webhook), and cert-manager-cainjector (the CA bundle injector). If the webhook is not working, the API server rejects cert-manager resources such as Issuers and Certificates when you apply them, so always verify the webhook pod's status right after installing.
ACME Challenges: Proving Domain Ownership
For an ACME CA such as Let’s Encrypt to issue a certificate, the requester must prove that it actually controls the domain in question. The ACME protocol (RFC 8555) defines this as a challenge, and cert-manager supports two types: HTTP-01 and DNS-01.
HTTP-01 Challenge Flow
User domain: app.example.com
1. cert-manager -> ACME server: orders a cert for app.example.com
2. ACME server -> cert-manager: issues token + validation path
3. cert-manager: creates temporary pod/service/Ingress
   prepares response at /.well-known/acme-challenge/<token>
4. ACME server -> http://app.example.com/.well-known/acme-challenge/<token>
   (HTTP request on port 80)
5. If the response matches the expected key authorization, validation passes
6. ACME server -> cert-manager: issues the certificate
7. cert-manager: cleans up temporary resources + stores the Secret
HTTP-01 requires inbound internet traffic on port 80 to reach the cluster's Ingress. In other words, the domain's A record must point to the Ingress's public IP, and port 80 must be open in the firewall.
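The "expected key authorization" in step 5 can be made concrete with a minimal sketch. It assumes the RFC 8555 definition (the token, a dot, then the base64url-encoded RFC 7638 SHA-256 thumbprint of the ACME account key). The JWK and token below are hypothetical placeholders, not real key material, and this is not cert-manager's own code.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members, sorted, no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """The body the validation path must return for the given token."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


# Hypothetical values for illustration only.
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
example_token = "example-token"
expected_body = key_authorization(example_token, example_jwk)
print(expected_body)
```

The temporary solver pod serves this string at /.well-known/acme-challenge/<token>, which is why a response that is rewritten or redirected on the way fails validation.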
DNS-01 Challenge Flow
User domain: app.example.com (or *.example.com)
1. cert-manager -> ACME server: orders a certificate
2. ACME server -> cert-manager: issues a token
3. cert-manager -> DNS provider API:
   registers a validation value in a _acme-challenge.<domain> TXT record
   (_acme-challenge.app.example.com, or _acme-challenge.example.com for *.example.com)
4. ACME server -> DNS lookup: checks the _acme-challenge.<domain> TXT record
5. If the TXT value matches, validation passes
6. ACME server -> cert-manager: issues the certificate
7. cert-manager: cleans up the TXT record + stores the Secret
DNS-01 requires no inbound traffic at all. Instead, cert-manager must have API credentials for your DNS provider (Route53, Cloud DNS, Cloudflare, and so on) so it can dynamically create and delete TXT records.
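As a rough illustration of what ends up in DNS, the sketch below derives the record name and the TXT value. It assumes the RFC 8555 rules: the record lives at _acme-challenge.<domain> with any leading "*." removed, and its value is the base64url-encoded SHA-256 digest of the key authorization. The key authorization string used here is a made-up placeholder.

```python
import base64
import hashlib


def challenge_record_name(dns_name: str) -> str:
    """Name of the TXT record that validates dns_name (wildcard prefix stripped)."""
    base = dns_name[2:] if dns_name.startswith("*.") else dns_name
    return f"_acme-challenge.{base}"


def dns01_txt_value(key_authorization: str) -> str:
    """TXT record content: base64url(SHA-256(key authorization)), no padding."""
    digest = hashlib.sha256(key_authorization.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder key authorization, for illustration only.
txt_value = dns01_txt_value("example-token.example-thumbprint")
print(challenge_record_name("*.example.com"), txt_value)
```

Because the wildcard and the apex name map to the same record name, a certificate covering both needs two TXT values under _acme-challenge.example.com during validation.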
HTTP-01 vs DNS-01 Comparison
| Aspect | HTTP-01 | DNS-01 |
|---|---|---|
| Validation method | HTTP response on port 80 | DNS TXT record |
| Requires inbound traffic | Yes (public IP + port 80) | No |
| Wildcard certificates | Not possible | Possible (required) |
| DNS API credentials | Not needed | Needed |
| Internal/private clusters | Difficult | Suitable |
| Validation latency | Fast (seconds) | Must wait for DNS propagation |
| Multi-domain SAN | HTTP validation per domain | Batched TXT validation |
| Common failure causes | Routing/firewall/redirects | DNS permissions/propagation lag |
To simplify the decision: if you have a public single domain and can open inbound port 80, HTTP-01 is the simplest. If you need a wildcard certificate, run a private cluster where you cannot open inbound traffic, or want to handle many domains at once, choose DNS-01.
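The decision rule above can be written down as a small function. This is only a sketch of the article's rule of thumb; the function and parameter names are hypothetical, and real environments may have further constraints.

```python
def choose_challenge(wildcard: bool, inbound_port_80: bool, many_domains: bool = False) -> str:
    """Return the challenge type suggested by the rule of thumb above."""
    if wildcard:
        return "DNS-01"  # wildcards cannot be validated with HTTP-01
    if not inbound_port_80:
        return "DNS-01"  # private cluster: the ACME server cannot reach port 80
    if many_domains:
        return "DNS-01"  # many names handled through one DNS API
    return "HTTP-01"     # public single domain with port 80 open


print(choose_challenge(wildcard=False, inbound_port_80=True))
print(choose_challenge(wildcard=True, inbound_port_80=True))
```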
Configuring a ClusterIssuer
In most environments you use a cluster-wide ClusterIssuer that can be shared. It is recommended to create both a Let’s Encrypt staging and a production issuer first. Staging has generous rate limits and is ideal for testing, while production issues trusted certificates but enforces strict issuance limits.
HTTP-01 ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
privateKeySecretRef is the name of the Secret that stores the ACME account key. When you first apply the ClusterIssuer, cert-manager automatically registers an ACME account and keeps the account key in this Secret. Be careful: deleting this Secret triggers an account re-registration.
DNS-01 ClusterIssuer (Route53 Example)
To use DNS-01 you need DNS provider API credentials. Taking AWS Route53 as an example, you use IAM credentials or IRSA (IAM Roles for Service Accounts).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-account-key
    solvers:
      - dns01:
          route53:
            region: ap-northeast-2
            hostedZoneID: Z0123456789ABCDEFGHIJ
            # With IRSA you can omit accessKeyID/secretAccessKeySecretRef
        selector:
          dnsZones:
            - example.com
You can also branch between multiple solvers using selectors. For example, you can define two solvers in a single ClusterIssuer so that *.example.com is handled via DNS-01 while other public domains use HTTP-01.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-mixed
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com
    privateKeySecretRef:
      name: letsencrypt-mixed-account-key
    solvers:
      - dns01:
          route53:
            region: ap-northeast-2
            hostedZoneID: Z0123456789ABCDEFGHIJ
        selector:
          dnsZones:
            - example.com
      - http01:
          ingress:
            ingressClassName: nginx
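cert-manager picks the solver whose selector dnsZones most specifically match the requested domain. A solver with no selector, such as the http01 entry above, acts as the fallback for every name that no other selector matches.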
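The dnsZones matching can be approximated with a longest-suffix rule. The sketch below is a simplification under stated assumptions: it only looks at dnsZones, ignores the other selector fields cert-manager supports (such as matchLabels and dnsNames), and represents solvers as plain dictionaries with hypothetical names.

```python
from typing import Dict, List, Optional


def zone_matches(domain: str, zone: str) -> bool:
    """True when domain is the zone itself or a name underneath it."""
    return domain == zone or domain.endswith("." + zone)


def pick_solver(dns_name: str, solvers: List[Dict]) -> Optional[Dict]:
    """Pick the solver with the longest matching zone, else the first selector-less one."""
    domain = dns_name[2:] if dns_name.startswith("*.") else dns_name
    best: Optional[Dict] = None
    best_len = -1
    for solver in solvers:
        zones = solver.get("dnsZones")
        if not zones:
            if best is None:
                best = solver  # fallback; replaced by any real zone match
            continue
        for zone in zones:
            if zone_matches(domain, zone) and len(zone) > best_len:
                best, best_len = solver, len(zone)
    return best


# Mirrors the letsencrypt-mixed issuer; the "name" keys are illustrative only.
mixed = [
    {"name": "dns01-route53", "dnsZones": ["example.com"]},
    {"name": "http01-nginx"},
]
print(pick_solver("*.example.com", mixed)["name"])
print(pick_solver("shop.example.org", mixed)["name"])
```

Note that with this configuration every name under example.com, not only the wildcard, is routed to the DNS-01 solver.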
Writing the Certificate Resource Directly
Using Ingress annotations auto-generates the Certificate, but writing the Certificate explicitly lets you control issuance policy more finely.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls
  duration: 2160h # 90 days
  renewBefore: 720h # renew 30 days before expiry
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
  dnsNames:
    - app.example.com
    - www.app.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io
The key fields:
- secretName: the Secret where the issued certificate is stored. Its type is kubernetes.io/tls, with tls.crt and tls.key keys.
- renewBefore: how long before expiry to renew. The default is one-third of the certificate's lifetime before expiry.
- privateKey.rotationPolicy: Always generates a new key on every renewal. This is recommended for security.
- issuerRef: specifies which Issuer/ClusterIssuer to use.
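To see what renewBefore means in calendar terms, here is a small arithmetic sketch. It assumes renewal is scheduled at the expiry time minus renewBefore, with the one-third default described above; the function name and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """Expiry minus renewBefore; defaults to one-third of the lifetime before expiry."""
    lifetime = not_after - not_before
    if renew_before is None:
        renew_before = lifetime / 3
    return not_after - renew_before


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=2160)  # duration: 2160h (90 days)
explicit = renewal_time(issued, expires, timedelta(hours=720))  # renewBefore: 720h
default = renewal_time(issued, expires)
print(explicit, default)
```

For a 90-day certificate, an explicit 720h and the default give the same answer: renewal starts 60 days after issuance, leaving 30 days to notice and fix a failure.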
Wildcard Certificates
A wildcard certificate (*.example.com) can only be issued via the DNS-01 challenge. HTTP-01 does not support wildcards: Let’s Encrypt validates wildcard names only through DNS-01, because answering an HTTP request on one host says nothing about control of every name under the domain.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-tls
  namespace: production
spec:
  secretName: wildcard-example-tls
  dnsNames:
    - "*.example.com"
    - "example.com"
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
    group: cert-manager.io
As shown above, it is common to include both the wildcard and the apex domain as SANs. The wildcard *.example.com does not cover example.com itself, so both are needed.
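The coverage rule is easy to check with a small matcher. This sketch assumes the usual TLS hostname-matching behaviour, where the wildcard stands for exactly one leftmost label; the function name is hypothetical.

```python
def wildcard_covers(pattern: str, host: str) -> bool:
    """True if a certificate name (possibly a wildcard) covers the given host."""
    pattern, host = pattern.lower(), host.lower()
    if not pattern.startswith("*."):
        return pattern == host
    suffix = pattern[1:]  # ".example.com"
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    # The wildcard must match exactly one non-empty label.
    return label != "" and "." not in label


print(wildcard_covers("*.example.com", "app.example.com"))
print(wildcard_covers("*.example.com", "example.com"))
```

The same rule explains why *.example.com also does not cover a.b.example.com: deeper subdomains need their own wildcard or explicit SAN.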
Ingress Integration (ingress-shim)
cert-manager's most convenient feature is annotation-based automation on Ingress. The component responsible for this is ingress-shim. When you attach a specific annotation to an Ingress resource, ingress-shim reads that Ingress's TLS spec and automatically creates and manages a Certificate resource.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-svc
                port:
                  number: 8080
Applying this manifest works in the following order.
1. Ingress is created (with the cert-manager.io/cluster-issuer annotation)
2. ingress-shim reads tls[].secretName and hosts
3. ingress-shim auto-creates a Certificate resource
- dnsNames = tls[].hosts
- secretName = tls[].secretName
- issuerRef = the cluster-issuer from the annotation
4. The main cert-manager controller processes the Certificate
-> CertificateRequest -> Order -> Challenge -> Secret issued
5. The Ingress Controller uses the issued Secret for TLS termination
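Step 3 above is essentially a field mapping, which the following sketch spells out. It is an approximation rather than ingress-shim's actual code: it handles only the two issuer annotations, and naming the Certificate after the secretName is an assumption made for illustration.

```python
from typing import Any, Dict, List

CLUSTER_ISSUER_KEY = "cert-manager.io/cluster-issuer"
ISSUER_KEY = "cert-manager.io/issuer"


def certificates_for_ingress(ingress: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one Certificate manifest per spec.tls entry of an annotated Ingress."""
    metadata = ingress.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    if CLUSTER_ISSUER_KEY in annotations:
        kind, name = "ClusterIssuer", annotations[CLUSTER_ISSUER_KEY]
    elif ISSUER_KEY in annotations:
        kind, name = "Issuer", annotations[ISSUER_KEY]
    else:
        return []  # no annotation: the Ingress is ignored
    issuer_ref = {"name": name, "kind": kind, "group": "cert-manager.io"}
    certificates = []
    for entry in ingress.get("spec", {}).get("tls") or []:
        certificates.append({
            "apiVersion": "cert-manager.io/v1",
            "kind": "Certificate",
            "metadata": {
                "name": entry["secretName"],  # assumed naming convention
                "namespace": metadata.get("namespace", "default"),
            },
            "spec": {
                "secretName": entry["secretName"],
                "dnsNames": list(entry.get("hosts", [])),
                "issuerRef": issuer_ref,
            },
        })
    return certificates


example_ingress = {
    "metadata": {
        "name": "app-ingress",
        "namespace": "production",
        "annotations": {CLUSTER_ISSUER_KEY: "letsencrypt-prod"},
    },
    "spec": {"tls": [{"hosts": ["app.example.com"], "secretName": "app-tls"}]},
}
generated = certificates_for_ingress(example_ingress)
print(generated[0]["spec"])
```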
The key annotations are:
| Annotation | Meaning |
|---|---|
| cert-manager.io/cluster-issuer | Name of the ClusterIssuer to use |
| cert-manager.io/issuer | Name of the namespaced Issuer to use |
| cert-manager.io/common-name | Sets the certificate CN |
| cert-manager.io/duration | Certificate lifetime |
| cert-manager.io/renew-before | Renewal timing |
One caveat: do not directly edit a Certificate that ingress-shim auto-generated. On the next reconcile loop, ingress-shim overwrites it based on the Ingress spec. If you need fine-grained control, it is better to remove the annotation and manage the Certificate directly.
Operations: Monitoring Renewal and Expiry
cert-manager handles renewal automatically, but the attitude of "it is automatic, so I do not need to worry" is dangerous. If renewal silently fails due to expired DNS API credentials, hitting a rate limit, or a misconfigured solver, the certificate will eventually expire. Monitoring is therefore essential.
Checking Certificate Status
# See the status of all Certificates at a glance
kubectl get certificate -A
# Details for a specific certificate (Ready condition, expiry date)
kubectl describe certificate app-tls -n production
# The issuance progress cert-manager tracks
kubectl get certificaterequest,order,challenge -A
In the output of kubectl get certificate, the READY column should be True for a healthy state. If it is False, use describe to check the Events and Conditions.
Prometheus Metrics
cert-manager exposes Prometheus metrics on port 9402 by default. The most important metric is the certificate expiry time.
# Certificate expiry time (Unix epoch seconds)
certmanager_certificate_expiration_timestamp_seconds
# Certificate renewal time
certmanager_certificate_renewal_timestamp_seconds
# Ready status gauge (1=Ready, 0=NotReady)
certmanager_certificate_ready_status
# ACME client request count (by status code)
certmanager_http_acme_client_request_count
# Number of sync calls made by each controller
certmanager_controller_sync_call_count
Near-expiry alerts can be built with PromQL expressions such as the following.
# Alert on certs expiring within 7 days (604800 seconds)
certmanager_certificate_expiration_timestamp_seconds - time() < 604800
# Alert on certs not in Ready state
certmanager_certificate_ready_status == 0
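The threshold in the first expression is plain arithmetic on Unix timestamps, which the sketch below reproduces. The function names and the sample timestamps are hypothetical; the comparison mirrors the PromQL expression rather than any cert-manager API.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """Same quantity as the PromQL: expiration timestamp minus time()."""
    return expiry_ts - now_ts


def is_expiring_soon(expiry_ts: float, now_ts: float, window_days: int = 7) -> bool:
    """True when the certificate expires within the alert window."""
    return seconds_until_expiry(expiry_ts, now_ts) < window_days * SECONDS_PER_DAY


now = 1_770_000_000  # arbitrary "current" epoch seconds for the example
print(7 * SECONDS_PER_DAY)
print(is_expiring_soon(now + 3 * SECONDS_PER_DAY, now))
print(is_expiring_soon(now + 30 * SECONDS_PER_DAY, now))
```

With renewBefore set to 30 days, a healthy certificate should never come within 7 days of expiry, so this alert firing means renewal has already been failing for weeks.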
An example Prometheus alerting rule looks like this.
groups:
  - name: cert-manager
    rules:
      - alert: CertificateExpiringSoon
        expr: certmanager_certificate_expiration_timestamp_seconds - time() < 604800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring within 7 days"
          description: "Check the namespace/name labels"
      - alert: CertificateNotReady
        expr: certmanager_certificate_ready_status == 0
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "Certificate is not in Ready state"
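Use a ServiceMonitor so that the Prometheus Operator scrapes cert-manager metrics. The port value must match a named port on the cert-manager metrics Service, and the release label must match your Prometheus instance's selector, so verify both against your installation (for example with kubectl get svc -n cert-manager -o yaml) before applying.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: cert-manager
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: http-metrics
      interval: 60s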
Managing Rate Limits
The Let’s Encrypt production endpoint enforces strict rate limits. Notably, there is a limit of 50 certificates per registered domain per week. Always test against the staging endpoint, and only issue from production when you actually need it. Repeatedly creating and destroying clusters in CI pipelines hits the limit quickly, so be careful.
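A rough way to reason about the remaining budget is a sliding seven-day window, sketched below. This is a simplified model built only on the 50-per-week figure quoted above; Let’s Encrypt's real accounting (renewal exemptions, other limits) is more nuanced, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List

LIMIT = 50                  # certificates per registered domain
WINDOW = timedelta(days=7)  # sliding window


def remaining_issuances(issued_at: List[datetime], now: datetime) -> int:
    """How many more certificates fit in the current window under this model."""
    recent = [t for t in issued_at if now - t < WINDOW]
    return max(0, LIMIT - len(recent))


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
recent_issuances = [now - timedelta(hours=h) for h in range(1, 49)]  # 48 in the last two days
old_issuances = [now - timedelta(days=8)] * 5                        # outside the window
print(remaining_issuances(recent_issuances + old_issuances, now))
```

A CI job that recreates a cluster and requests a fresh production certificate on every run consumes this budget quickly, which is why pointing CI at the staging issuer matters.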
Internal CA and Vault Issuer
Not every certificate has to come from a public CA. For service-to-service mTLS or internal domains, issuing from an internal CA is common.
Self CA Issuer
First store a root or intermediate CA key pair as a Secret, then create a CA Issuer. For a ClusterIssuer, the Secret must live in cert-manager's cluster resource namespace (cert-manager by default).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-key-pair
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-svc-tls
  namespace: backend
spec:
  secretName: internal-svc-tls
  dnsNames:
    - payments.backend.svc.cluster.local
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
This approach needs no ACME challenge, so it issues immediately with no inbound or DNS requirements. However, you must distribute the CA bundle so that clients trust your internal CA.
HashiCorp Vault Issuer
Using Vault's PKI secrets engine as the backend lets you leverage centralized issuance policy and audit logs.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: vault-issuer
spec:
  vault:
    server: https://vault.internal.example.com:8200
    path: pki_int/sign/example-dot-com
    auth:
      kubernetes:
        role: cert-manager
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: cert-manager-vault
The Vault Issuer authenticates with a ServiceAccount token via Vault's Kubernetes auth method. path is the signing path of the Vault PKI engine, and on the Vault side you define the allowed domains and lifetime policy for that role. This structure centralizes certificate issuance authority through Vault policy, which is advantageous in regulated environments.
Relationship with the Gateway API
As mentioned earlier, the Ingress API is frozen and the Gateway API is the successor standard. cert-manager supports automatic certificate issuance for the Gateway API too. Just as ingress-shim reads Ingress annotations to create a Certificate, cert-manager reads annotations on a Gateway resource to create a Certificate.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: app-gateway
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: app.example.com
      tls:
        mode: Terminate
        certificateRefs:
          - name: app-tls
            kind: Secret
This feature originally had to be enabled through an experimental feature gate (ExperimentalGatewayAPISupport). Since cert-manager v1.15 the feature gate is no longer needed: you enable Gateway API support with the controller flag --enable-gateway-api, and the Gateway API CRDs must be installed before the controller starts. cert-manager then auto-creates a Certificate using the listener's hostname as dnsNames and the first Secret in certificateRefs as secretName. For a greenfield cluster, adopting Gateway API based TLS automation from the start instead of Ingress is recommended.
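The listener-to-Certificate mapping described above can be sketched as follows. It is a simplified illustration, not cert-manager's implementation: it handles only the cluster-issuer annotation, emits one Certificate per listener, and the Certificate naming is an assumption.

```python
from typing import Any, Dict, List


def certificates_for_gateway(gateway: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Derive Certificate manifests from annotated Gateway listeners."""
    metadata = gateway.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    issuer = annotations.get("cert-manager.io/cluster-issuer")
    if issuer is None:
        return []
    certificates = []
    for listener in gateway.get("spec", {}).get("listeners") or []:
        hostname = listener.get("hostname")
        refs = (listener.get("tls") or {}).get("certificateRefs") or []
        if not hostname or not refs:
            continue  # nothing to issue for listeners without hostname or TLS refs
        secret_name = refs[0]["name"]  # first Secret in certificateRefs
        certificates.append({
            "apiVersion": "cert-manager.io/v1",
            "kind": "Certificate",
            "metadata": {"name": secret_name, "namespace": metadata.get("namespace", "default")},
            "spec": {
                "secretName": secret_name,
                "dnsNames": [hostname],
                "issuerRef": {"name": issuer, "kind": "ClusterIssuer", "group": "cert-manager.io"},
            },
        })
    return certificates


example_gateway = {
    "metadata": {
        "name": "app-gateway",
        "namespace": "production",
        "annotations": {"cert-manager.io/cluster-issuer": "letsencrypt-prod"},
    },
    "spec": {"listeners": [
        {"name": "http", "protocol": "HTTP", "port": 80},
        {"name": "https", "protocol": "HTTPS", "port": 443, "hostname": "app.example.com",
         "tls": {"mode": "Terminate", "certificateRefs": [{"name": "app-tls", "kind": "Secret"}]}},
    ]},
}
gateway_certs = certificates_for_gateway(example_gateway)
print(gateway_certs[0]["spec"]["dnsNames"], gateway_certs[0]["spec"]["secretName"])
```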
Troubleshooting: Diagnosing Challenge Failures
Challenge failures are the most common problem in operating cert-manager. Following the diagnostic flow below will narrow down most causes.
         Certificate is Ready=False
                      │
                      ▼
        kubectl describe certificate
                      │
         What message is in Events?
                      │
            ┌─────────┴──────────┐
            │                    │
      Order created      No Order / error
            │                    │
            ▼                    ▼
    kubectl describe     issuerRef typo?
        order ...        ClusterIssuer Ready?
            │            ACME account reg failed?
            ▼
 Check Challenge status
 kubectl describe challenge ...
            │
   ┌────────┴─────────────────┐
   │                          │
HTTP-01                    DNS-01
   │                          │
   ▼                          ▼
1. DNS A record ->         1. DNS API creds valid?
   Ingress IP?             2. TXT record actually created?
2. Port 80 reachable       3. DNS propagation done?
   externally?             4. CNAME delegation set up?
3. Challenge path          5. zone selector matches?
   returns 200?
4. Does forced HTTPS
   redirect block
   the port 80 check?
Common HTTP-01 Failures
The most frequent failure is when the ACME validation request arriving on port 80 is blocked by a forced HTTPS redirect. If ingress-nginx's ssl-redirect is on, it can redirect even the validation path to HTTPS and break validation. The temporary validation Ingress that cert-manager creates is designed to bypass this, but a strong global setting can still cause problems.
Use the following commands to check whether the validation path actually responds. Get the token portion of the URL from the real challenge resource.
# Check in-progress challenges
kubectl get challenge -A
# Get the token and URL from the challenge details
kubectl describe challenge <challenge-name> -n production
# Call the validation path directly from outside (TOKEN from above)
curl -v http://app.example.com/.well-known/acme-challenge/TOKEN
If the response is not 200, or it redirects to HTTPS with 301/302 (ingress-nginx uses 308 by default), review your routing and redirect settings.
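The manual check above can be turned into a small classifier over what curl reports. This is a sketch with hypothetical names and messages; it takes the status code, Location header, and body as inputs rather than making the request itself, and the expected value is whatever key authorization the Challenge resource shows.

```python
from typing import Optional

REDIRECT_CODES = (301, 302, 307, 308)


def diagnose_http01(status: int, location: Optional[str], body: str, expected: str) -> str:
    """Classify one probe of the /.well-known/acme-challenge/<token> path."""
    if status in REDIRECT_CODES:
        if location and location.startswith("https://"):
            return "redirected to HTTPS: review forced ssl-redirect settings"
        return "redirected: review routing for the challenge path"
    if status != 200:
        return "non-200 response: check DNS A record, port 80, and solver Ingress"
    if body.strip() != expected:
        return "200 but unexpected body: another backend is answering the path"
    return "ok"


# Placeholder values for illustration only.
print(diagnose_http01(308, "https://app.example.com/.well-known/acme-challenge/TOKEN", "", "TOKEN.thumb"))
print(diagnose_http01(200, None, "TOKEN.thumb\n", "TOKEN.thumb"))
```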
Common DNS-01 Failures
For DNS-01, the key is to directly verify whether the TXT record was actually created and whether it has propagated.
# Directly look up the TXT record cert-manager registered
dig +short TXT _acme-challenge.example.com
# Query the authoritative nameserver directly
dig @ns-1.example-dns.com TXT _acme-challenge.example.com
# Check the DNS provider error in the challenge events
kubectl describe challenge <challenge-name> -n production
If the DNS API credentials have expired or lack permissions, TXT record creation itself fails. Also, if you use CNAME delegation, you must check that the record was created in the delegated target zone. With slow-propagating providers you may need to increase the propagation timeout.
Diagnosing Quickly with cmctl
cert-manager provides a diagnostic CLI called cmctl.
# Summarize certificate status
cmctl status certificate app-tls -n production
# Trigger a forced renewal
cmctl renew app-tls -n production
# Inspect the certificate stored in the Secret
cmctl inspect secret app-tls -n production
Operations Checklist
Checking the following items before and after a production rollout can greatly reduce incidents.
- Did you validate with the staging ClusterIssuer first before switching to production?
- Is the ClusterIssuer in Ready state and is the ACME account registered correctly?
- Do wildcard certificates use a DNS-01 solver?
- Do the DNS-01 API credentials have an expiry/rotation policy?
- Is there an expiry alert based on certmanager_certificate_expiration_timestamp_seconds?
- Is there an alert on certmanager_certificate_ready_status == 0?
- Is the renewBefore value generous enough given your operational response time?
- Are certificate Secrets backed up or reproducible via GitOps?
- Have you ensured high availability for the cert-manager webhook pods (two or more replicas)?
- Is the Let’s Encrypt rate limit isolated so CI does not exhaust it?
- If you use an internal CA, is the CA bundle distributed to clients?
- For a greenfield build, have you considered Gateway API based TLS automation?
Conclusion
cert-manager turns TLS certificate management from "an operational burden you must tend to regularly" into "infrastructure that just runs once you declare it." The key is to understand the CRD model precisely, choose the challenge type that fits your environment, and build monitoring alongside automation rather than blindly trusting it.
HTTP-01 is simple for a public single domain, while DNS-01 is essential for wildcards and private environments. The annotation-based ingress-shim solves most use cases with a single line, but you can manage the Certificate directly when you need fine-grained control. By centralizing internal certificates with an internal CA and the Vault Issuer, you can manage TLS across the entire cluster in one consistent flow.
Finally, to re-emphasize the 2026 direction: the Ingress API is frozen and the Gateway API is its successor. If you are building anew now, considering Gateway API based TLS automation from the start is the future-proof choice.
References
- cert-manager official docs: https://cert-manager.io/docs/
- cert-manager Ingress usage: https://cert-manager.io/docs/usage/ingress/
- cert-manager ACME configuration: https://cert-manager.io/docs/configuration/acme/
- cert-manager Gateway API support: https://cert-manager.io/docs/usage/gateway/
- Kubernetes Ingress concepts: https://kubernetes.io/docs/concepts/services-networking/ingress/
- ingress-nginx official docs: https://kubernetes.github.io/ingress-nginx/
- Let’s Encrypt docs and rate limits: https://letsencrypt.org/docs/
- Gateway API: https://gateway-api.sigs.k8s.io/
- RFC 8555 (ACME protocol): https://datatracker.ietf.org/doc/html/rfc8555
- HashiCorp Vault PKI secrets engine: https://developer.hashicorp.com/vault/docs/secrets/pki
- cert-manager Vault Issuer: https://cert-manager.io/docs/configuration/vault/