Automating Ingress TLS: Fully Automated Certificates with cert-manager and ACME

Introduction

Think back to the days of managing TLS certificates by hand on a production cluster: setting calendar reminders to renew those 90-day Let’s Encrypt certificates, scrambling to run a renewal script right before expiry. Forget a single certificate and you would get a 3 a.m. page that "the site is down."

cert-manager solves this problem at its root. As a Kubernetes-native controller, it manages the certificate lifecycle (issuance, storage, and renewal) declaratively. Add a single annotation to your Ingress, and cert-manager handles the rest: it talks to Let’s Encrypt over the ACME protocol, obtains a certificate, and renews it automatically before expiry.

A note on the 2026 landscape is also in order. The Ingress API is effectively frozen, with no new features being added. The successor standard is the Gateway API, and cert-manager supports TLS automation for the Gateway API as well. ingress-nginx has moved into maintenance mode, receiving mostly security patches, so for a greenfield build you should consider the Gateway API alongside it. This article focuses on Ingress, but covers the Gateway API integration at the end.

The goals of this article are:

- Understanding cert-manager's CRD model and control flow precisely

- Establishing the difference between HTTP-01 and DNS-01 challenges and when to choose each

- Issuing wildcard certificates safely

- Grasping how the annotation-based integration (ingress-shim) works

- Monitoring renewals and expiry with Prometheus

- Integrating internal CAs and the HashiCorp Vault Issuer

- Diagnosing challenge failures methodically

cert-manager Architecture

cert-manager consists of several CRDs (Custom Resource Definitions) and the controllers that reconcile them. Understanding the relationships between the core resources first makes the whole flow clear.

    ┌──────────────────────────────────────────────────────────┐
    │ Issuer / ClusterIssuer   (defines the issuing authority) │
    │   - ACME (Let's Encrypt), CA, Vault, SelfSigned ...      │
    └───────────────────────────┬──────────────────────────────┘
                                │ referenced by
                                ▼
    ┌──────────────────────────────────────────────────────────┐
    │ Certificate           (declarative spec of desired cert) │
    │   - dnsNames, secretName, issuerRef, duration ...        │
    └───────────────────────────┬──────────────────────────────┘
                                │ creates an issuance request
                                ▼
    ┌──────────────────────────────────────────────────────────┐
    │ CertificateRequest    (single issuance attempt, has CSR) │
    └───────────────────────────┬──────────────────────────────┘
                                │ when ACME
                                ▼
    ┌──────────────────────────────────────────────────────────┐
    │ Order                  (an ACME order = one transaction) │
    └───────────────────────────┬──────────────────────────────┘
                                │ domain ownership validation
                                ▼
    ┌──────────────────────────────────────────────────────────┐
    │ Challenge                 (HTTP-01 or DNS-01 validation) │
    └───────────────────────────┬──────────────────────────────┘
                                │ on success
                                ▼
    ┌──────────────────────────────────────────────────────────┐
    │ Secret (kubernetes.io/tls)      stores tls.crt + tls.key │
    └──────────────────────────────────────────────────────────┘

The roles of each resource are summarized below.

| Resource | Scope | Role |
|---|---|---|
| Issuer | Namespace | Issues certificates only within its namespace |
| ClusterIssuer | Cluster-wide | A shared issuing authority for all namespaces |
| Certificate | Namespace | Declares the desired end state of a certificate |
| CertificateRequest | Namespace | Represents a single CSR issuance attempt |
| Order | Namespace | An ACME issuance transaction (a set of domains) |
| Challenge | Namespace | A unit of domain ownership proof (HTTP-01/DNS-01) |

The important mental model here is that the Certificate represents the desired state, and the remaining child resources are intermediate artifacts that cert-manager automatically creates and destroys to achieve that state. Operators typically deal directly only with the Issuer/ClusterIssuer and the Certificate (or Ingress annotations), and only look at CertificateRequest/Order/Challenge when debugging.

Installation

You install cert-manager with Helm or static manifests. In production, managing the CRDs explicitly is recommended.

    # Add the Helm repository
    helm repo add jetstack https://charts.jetstack.io
    helm repo update

    # Install with CRDs included
    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.17.0 \
      --set crds.enabled=true

    # Verify installation
    kubectl get pods -n cert-manager
    kubectl get crd | grep cert-manager.io

Once installation completes, three core pods come up: `cert-manager` (the controller), `cert-manager-webhook` (the validating/mutating webhook), and `cert-manager-cainjector` (the CA bundle injector). If the webhook is not working, creating or updating cert-manager resources such as Issuers and Certificates is rejected outright, so always verify the webhook pod's status right after installing.

ACME Challenges: Proving Domain Ownership

For an ACME CA such as Let’s Encrypt to issue a certificate, the requester must prove that it actually controls the domain in question. The ACME protocol (RFC 8555) defines this as a challenge, and cert-manager supports two types: HTTP-01 and DNS-01.

HTTP-01 Challenge Flow

    User domain: app.example.com

    1. cert-manager -> ACME server: orders a cert for app.example.com
    2. ACME server -> cert-manager: issues token + validation path
    3. cert-manager: creates temporary pod/service/Ingress
       prepares response at /.well-known/acme-challenge/<token>
    4. ACME server -> http://app.example.com/.well-known/acme-challenge/<token>
       (HTTP request on port 80)
    5. If the response matches the expected key authorization, validation passes
    6. ACME server -> cert-manager: issues the certificate
    7. cert-manager: cleans up temporary resources + stores the Secret

HTTP-01 requires inbound internet traffic on port 80 to reach the cluster's Ingress. In other words, the domain's A record must point to the Ingress's public IP, and port 80 must be open in the firewall.
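
The "expected key authorization" in step 5 is defined by RFC 8555 as the challenge token joined to a thumbprint of the ACME account key (RFC 7638). The following is a minimal sketch of that computation, not cert-manager's own code; the JWK and token values are placeholders invented for illustration.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME uses for binary values."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members, sorted, no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """Body the solver must serve at /.well-known/acme-challenge/<token>."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


# Placeholder values for illustration only (not a real key or token).
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
example_token = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"

if __name__ == "__main__":
    print(key_authorization(example_token, example_jwk))
```

Because the thumbprint depends on the account key, a response served by any other ACME client or a stale solver pod will not match, which is one reason a `200` response alone does not guarantee validation.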

DNS-01 Challenge Flow

    User domain: app.example.com (or *.example.com)

    1. cert-manager -> ACME server: orders a certificate
    2. ACME server -> cert-manager: issues a token
    3. cert-manager -> DNS provider API:
       registers a validation value in a TXT record at _acme-challenge.<domain>
       (_acme-challenge.app.example.com for app.example.com,
        _acme-challenge.example.com for *.example.com)
    4. ACME server -> DNS lookup: checks that _acme-challenge TXT record
    5. If the TXT value matches, validation passes
    6. ACME server -> cert-manager: issues the certificate
    7. cert-manager: cleans up the TXT record + stores the Secret

DNS-01 requires no inbound traffic at all. Instead, cert-manager must have API credentials for your DNS provider (Route53, Cloud DNS, Cloudflare, and so on) so it can dynamically create and delete TXT records.
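
To make the TXT record concrete: RFC 8555 places the record at `_acme-challenge.` followed by the validated name (with any `*.` prefix removed), and its value is the unpadded base64url SHA-256 digest of the key authorization. The sketch below shows only that arithmetic; the function names are hypothetical and it does not call any DNS provider.

```python
import base64
import hashlib


def challenge_record_name(dns_name: str) -> str:
    """TXT record name for a requested name; a wildcard validates at its base domain."""
    base = dns_name[2:] if dns_name.startswith("*.") else dns_name
    return f"_acme-challenge.{base}"


def txt_record_value(key_authorization: str) -> str:
    """base64url(SHA-256(key authorization)) without padding (RFC 8555, section 8.4)."""
    digest = hashlib.sha256(key_authorization.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # "token.thumbprint" stands in for a real key authorization.
    print(challenge_record_name("*.example.com"), txt_record_value("token.thumbprint"))
```

Comparing this value with the output of a `dig TXT` lookup is a quick way to tell a propagation delay (no record yet) from a wrong record (value mismatch).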

HTTP-01 vs DNS-01 Comparison

| Aspect | HTTP-01 | DNS-01 |
|---|---|---|
| Validation method | HTTP response on port 80 | DNS TXT record |
| Requires inbound traffic | Yes (public IP + port 80) | No |
| Wildcard certificates | Not possible | Possible (required) |
| DNS API credentials | Not needed | Needed |
| Internal/private clusters | Difficult | Suitable |
| Validation latency | Fast (seconds) | Must wait for DNS propagation |
| Multi-domain SAN | HTTP validation per domain | Batched TXT validation |
| Common failure causes | Routing/firewall/redirects | DNS permissions/propagation lag |

To simplify the decision: if you have a public single domain and can open inbound port 80, HTTP-01 is the simplest. If you need a wildcard certificate, run a private cluster where you cannot open inbound traffic, or want to handle many domains at once, choose DNS-01.
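
The decision rule can be written down as a small function. This is only a sketch of the rule of thumb stated here; `choose_challenge` and its parameters are hypothetical names, not part of cert-manager.

```python
def choose_challenge(
    dns_names: list[str],
    port_80_reachable: bool,
    has_dns_api_credentials: bool,
) -> str:
    """Return "http-01" or "dns-01" following the rule of thumb in the text."""
    has_wildcard = any(name.startswith("*.") for name in dns_names)
    needs_dns01 = has_wildcard or not port_80_reachable
    if needs_dns01:
        if not has_dns_api_credentials:
            raise ValueError("DNS-01 is required here but no DNS provider credentials are available")
        return "dns-01"
    return "http-01"


if __name__ == "__main__":
    print(choose_challenge(["app.example.com"], True, False))
    print(choose_challenge(["*.example.com", "example.com"], True, True))
```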

Configuring a ClusterIssuer

In most environments you use a cluster-wide ClusterIssuer that can be shared. It is recommended to create both a Let’s Encrypt staging and a production issuer first. Staging has generous rate limits and is ideal for testing, while production issues trusted certificates but enforces strict issuance limits.

HTTP-01 ClusterIssuer

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-staging
    spec:
      acme:
        server: https://acme-staging-v02.api.letsencrypt.org/directory
        email: platform@example.com
        privateKeySecretRef:
          name: letsencrypt-staging-account-key
        solvers:
          - http01:
              ingress:
                ingressClassName: nginx
    ---
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: platform@example.com
        privateKeySecretRef:
          name: letsencrypt-prod-account-key
        solvers:
          - http01:
              ingress:
                ingressClassName: nginx

`privateKeySecretRef` is the name of the Secret that stores the ACME account key. When you first apply the ClusterIssuer, cert-manager automatically registers an ACME account and keeps the account key in this Secret. Be careful: deleting this Secret triggers an account re-registration.

DNS-01 ClusterIssuer (Route53 Example)

To use DNS-01 you need DNS provider API credentials. Taking AWS Route53 as an example, you use IAM credentials or IRSA (IAM Roles for Service Accounts).

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-dns
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: platform@example.com
        privateKeySecretRef:
          name: letsencrypt-dns-account-key
        solvers:
          - dns01:
              route53:
                region: ap-northeast-2
                hostedZoneID: Z0123456789ABCDEFGHIJ
                # With IRSA you can omit accessKeyID/secretAccessKey
            selector:
              dnsZones:
                - example.com

You can also branch between multiple solvers using selectors. For example, you can define two solvers in a single ClusterIssuer so that `*.example.com` is handled via DNS-01 while other public domains use HTTP-01.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-mixed
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: platform@example.com
        privateKeySecretRef:
          name: letsencrypt-mixed-account-key
        solvers:
          - dns01:
              route53:
                region: ap-northeast-2
                hostedZoneID: Z0123456789ABCDEFGHIJ
            selector:
              dnsZones:
                - example.com
          - http01:
              ingress:
                ingressClassName: nginx

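When several solvers could apply, cert-manager picks the one whose selector `dnsZones` entry most specifically matches the requested name. A solver without a selector, like the `http01` entry above, acts as the fallback for every name that no selector matches.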
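
The "most specific zone wins, solver without selector is the fallback" behavior can be sketched as follows. This is a simplified model that considers only `dnsZones` (cert-manager also supports `dnsNames` and `matchLabels` selectors, which are ignored here); `pick_solver` and the `name` keys are hypothetical.

```python
from typing import Optional


def zone_matches(zone: str, dns_name: str) -> bool:
    """True if dns_name is the zone itself or any name beneath it."""
    name = dns_name[2:] if dns_name.startswith("*.") else dns_name
    return name == zone or name.endswith("." + zone)


def pick_solver(solvers: list[dict], dns_name: str) -> Optional[dict]:
    """Pick the solver with the most specific matching zone, else a selector-less one."""
    best, best_score = None, -1
    for solver in solvers:
        zones = solver.get("selector", {}).get("dnsZones")
        if zones is None:
            score = 0  # no selector: matches everything with the lowest priority
        else:
            matching = [z for z in zones if zone_matches(z, dns_name)]
            if not matching:
                continue
            score = max(len(z.split(".")) for z in matching)  # more labels = more specific
        if score > best_score:
            best, best_score = solver, score
    return best


# Mirrors the two-solver layout of a mixed ClusterIssuer.
solvers = [
    {"name": "route53-dns01", "selector": {"dnsZones": ["example.com"]}},
    {"name": "nginx-http01"},
]

if __name__ == "__main__":
    for host in ("*.example.com", "app.example.com", "shop.example.org"):
        print(host, "->", pick_solver(solvers, host)["name"])
```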

Writing the Certificate Resource Directly

Using Ingress annotations auto-generates the Certificate, but writing the Certificate explicitly lets you control issuance policy more finely.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: app-tls
      namespace: production
    spec:
      secretName: app-tls
      duration: 2160h # 90 days
      renewBefore: 720h # renew 30 days before expiry
      privateKey:
        algorithm: ECDSA
        size: 256
        rotationPolicy: Always
      dnsNames:
        - app.example.com
        - www.app.example.com
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
        group: cert-manager.io

The key fields:

- `secretName`: the Secret where the issued certificate is stored. Its type is `kubernetes.io/tls`, with `tls.crt` and `tls.key` keys.

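- `renewBefore`: how long before expiry renewal starts, expressed as a duration (for example `720h`). If omitted, cert-manager renews once two-thirds of the certificate's lifetime has elapsed, that is, with one-third remaining.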

- `privateKey.rotationPolicy: Always` generates a new key on every renewal. This is recommended for security.

- `issuerRef`: specifies which Issuer/ClusterIssuer to use.
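
The interplay of `duration` and `renewBefore` is simple date arithmetic. The sketch below assumes the renewal moment is derived from the issued certificate's validity window; `renewal_time` is a hypothetical helper, not a cert-manager API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> datetime:
    """Moment at which renewal should start for a certificate's validity window."""
    lifetime = not_after - not_before
    if renew_before is None:
        renew_before = lifetime / 3  # default: renew with one-third of the lifetime left
    return not_after - renew_before


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=2160)  # duration: 2160h
    print(renewal_time(issued, expires, timedelta(hours=720)))  # renewBefore: 720h
```

With a 90-day certificate, `renewBefore: 720h` and the default coincide: both start renewal 60 days after issuance, leaving 30 days to notice and fix a failing renewal.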

Wildcard Certificates

A wildcard certificate (`*.example.com`) can only be issued via the DNS-01 challenge. HTTP-01 cannot validate wildcards: serving a token from one hostname says nothing about control over every possible subdomain, so Let’s Encrypt accepts only DNS-01 validation for wildcard names.

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: wildcard-example-tls
      namespace: production
    spec:
      secretName: wildcard-example-tls
      dnsNames:
        - "*.example.com"
        - "example.com"
      issuerRef:
        name: letsencrypt-dns
        kind: ClusterIssuer
        group: cert-manager.io

As shown above, it is common to include both the wildcard and the apex domain as SANs. The wildcard `*.example.com` does not cover `example.com` itself, so both are needed.
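
Why both names are needed becomes clear when wildcard matching is spelled out: the `*` stands for exactly one leftmost label. The following sketch models that rule; `covers` is a hypothetical helper and ignores edge cases such as internationalized names.

```python
def covers(san: str, host: str) -> bool:
    """True if a certificate SAN entry covers the given hostname."""
    san, host = san.lower(), host.lower()
    if not san.startswith("*."):
        return san == host
    suffix = san[1:]  # "*.example.com" -> ".example.com"
    if not host.endswith(suffix):
        return False
    left = host[: -len(suffix)]
    # The wildcard must match exactly one non-empty label.
    return left != "" and "." not in left


if __name__ == "__main__":
    for host in ("app.example.com", "example.com", "a.b.example.com"):
        print(host, covers("*.example.com", host))
```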

Ingress Integration (ingress-shim)

cert-manager's most convenient feature is annotation-based automation on Ingress. The component responsible for this is ingress-shim. When you attach a specific annotation to an Ingress resource, ingress-shim reads that Ingress's TLS spec and automatically creates and manages a Certificate resource.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: app-ingress
      namespace: production
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      ingressClassName: nginx
      tls:
        - hosts:
            - app.example.com
          secretName: app-tls
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: app-svc
                    port:
                      number: 8080

Applying this manifest works in the following order.

    1. Ingress is created (with the cert-manager.io/cluster-issuer annotation)
    2. ingress-shim reads tls[].secretName and hosts
    3. ingress-shim auto-creates a Certificate resource
       - dnsNames   = tls[].hosts
       - secretName = tls[].secretName
       - issuerRef  = the cluster-issuer from the annotation
    4. The main cert-manager controller processes the Certificate
       -> CertificateRequest -> Order -> Challenge -> Secret issued
    5. The Ingress Controller uses the issued Secret for TLS termination
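
Step 3 of this flow is essentially a field mapping. The sketch below reproduces that mapping on plain dictionaries to show which Ingress fields end up where; it is an illustration, not ingress-shim's actual implementation, and it assumes the Certificate is named after the TLS Secret.

```python
def certificates_for_ingress(ingress: dict) -> list[dict]:
    """Derive one Certificate per spec.tls entry from an annotated Ingress."""
    meta = ingress["metadata"]
    annotations = meta.get("annotations", {})
    if "cert-manager.io/cluster-issuer" in annotations:
        issuer_ref = {"name": annotations["cert-manager.io/cluster-issuer"], "kind": "ClusterIssuer"}
    elif "cert-manager.io/issuer" in annotations:
        issuer_ref = {"name": annotations["cert-manager.io/issuer"], "kind": "Issuer"}
    else:
        return []  # no cert-manager annotation: the Ingress is left alone

    certificates = []
    for tls in ingress.get("spec", {}).get("tls", []):
        certificates.append({
            "apiVersion": "cert-manager.io/v1",
            "kind": "Certificate",
            # Assumption: the Certificate takes the name of the TLS Secret.
            "metadata": {"name": tls["secretName"], "namespace": meta["namespace"]},
            "spec": {
                "secretName": tls["secretName"],
                "dnsNames": list(tls["hosts"]),
                "issuerRef": issuer_ref,
            },
        })
    return certificates


example_ingress = {
    "metadata": {
        "name": "app-ingress",
        "namespace": "production",
        "annotations": {"cert-manager.io/cluster-issuer": "letsencrypt-prod"},
    },
    "spec": {"tls": [{"hosts": ["app.example.com"], "secretName": "app-tls"}]},
}

if __name__ == "__main__":
    print(certificates_for_ingress(example_ingress))
```

Note that only hosts listed under `spec.tls` are requested; a host that appears in `rules` but not in `tls` gets no certificate.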

The key annotations are:

| Annotation | Meaning |
|---|---|
| cert-manager.io/cluster-issuer | Name of the ClusterIssuer to use |
| cert-manager.io/issuer | Name of the namespaced Issuer to use |
| cert-manager.io/common-name | Sets the certificate CN |
| cert-manager.io/duration | Certificate lifetime |
| cert-manager.io/renew-before | Renewal timing |

One caveat: do not directly edit a Certificate that ingress-shim auto-generated. On the next reconcile loop, ingress-shim overwrites it based on the Ingress spec. If you need fine-grained control, it is better to remove the annotation and manage the Certificate directly.

Operations: Monitoring Renewal and Expiry

cert-manager handles renewal automatically, but the attitude of "it is automatic, so I do not need to worry" is dangerous. If renewal silently fails due to expired DNS API credentials, hitting a rate limit, or a misconfigured solver, the certificate will eventually expire. Monitoring is therefore essential.

Checking Certificate Status

    # See the status of all Certificates at a glance
    kubectl get certificate -A

    # Details for a specific certificate (Ready condition, expiry date)
    kubectl describe certificate app-tls -n production

    # The issuance progress cert-manager tracks
    kubectl get certificaterequest,order,challenge -A

In the output of `kubectl get certificate`, the READY column should be True for a healthy state. If it is False, use describe to check the Events and Conditions.

Prometheus Metrics

cert-manager exposes Prometheus metrics on port 9402 by default. The most important metric is the certificate expiry time.

    # Certificate expiry time (Unix epoch seconds)
    certmanager_certificate_expiration_timestamp_seconds

    # Certificate renewal time
    certmanager_certificate_renewal_timestamp_seconds

    # Ready status gauge (1=Ready, 0=NotReady)
    certmanager_certificate_ready_status

    # ACME client request count (by status code)
    certmanager_http_acme_client_request_count

    # Controller sync call count (per controller)
    certmanager_controller_sync_call_count

You build near-expiry alerts with PromQL expressions such as these.

    # Alert on certs expiring within 7 days (604800 seconds)
    certmanager_certificate_expiration_timestamp_seconds - time() < 604800

    # Alert on certs not in Ready state
    certmanager_certificate_ready_status == 0
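
The threshold in the expiry expression is just seven days expressed in seconds. As a quick sanity check of the arithmetic, the same comparison can be sketched outside Prometheus; `expiring_soon` is a hypothetical helper that mirrors the expression and is not part of any monitoring stack.

```python
SECONDS_PER_DAY = 24 * 60 * 60
EXPIRY_THRESHOLD = 7 * SECONDS_PER_DAY  # 604800, the constant used in the PromQL expression


def expiring_soon(expiration_timestamp: float, now: float, threshold: float = EXPIRY_THRESHOLD) -> bool:
    """Same comparison as: expiration_timestamp_seconds - time() < threshold."""
    return expiration_timestamp - now < threshold


if __name__ == "__main__":
    now = 1_700_000_000.0  # arbitrary epoch value for illustration
    print(expiring_soon(now + 6 * SECONDS_PER_DAY, now))
    print(expiring_soon(now + 30 * SECONDS_PER_DAY, now))
```

If `renewBefore` is 30 days, a healthy certificate never gets within 7 days of expiry, so this alert firing means several weeks of renewal attempts have already failed.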

An example Prometheus alerting rule file looks like this.

    groups:
      - name: cert-manager
        rules:
          - alert: CertificateExpiringSoon
            expr: certmanager_certificate_expiration_timestamp_seconds - time() < 604800
            for: 1h
            labels:
              severity: warning
            annotations:
              summary: "Certificate expiring within 7 days"
              description: "Check the namespace/name labels"
          - alert: CertificateNotReady
            expr: certmanager_certificate_ready_status == 0
            for: 30m
            labels:
              severity: critical
            annotations:
              summary: "Certificate is not in Ready state"

Use a ServiceMonitor so that the Prometheus Operator scrapes cert-manager metrics.

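    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: cert-manager
      namespace: cert-manager
      labels:
        release: kube-prometheus-stack
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: cert-manager
      endpoints:
        # Must match the port name on the cert-manager Service (9402);
        # confirm with: kubectl get svc cert-manager -n cert-manager -o yaml
        - port: tcp-prometheus-servicemonitor
          interval: 60s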

Managing Rate Limits

The Let’s Encrypt production endpoint enforces strict rate limits. Notably, there is a limit of 50 certificates per registered domain per week. Always test against the staging endpoint, and only issue from production when you actually need it. Repeatedly creating and destroying clusters in CI pipelines hits the limit quickly, so be careful.
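
A rough way to reason about the remaining budget is to count issuances in a trailing seven-day window. The sketch below is only an approximation for capacity planning: Let’s Encrypt's real accounting is done on its side and may differ in detail, and `remaining_issuances` is a hypothetical helper.

```python
from datetime import datetime, timedelta

WEEKLY_LIMIT = 50  # certificates per registered domain per week, as stated in the text


def remaining_issuances(
    issued_at: list[datetime],
    now: datetime,
    limit: int = WEEKLY_LIMIT,
    window: timedelta = timedelta(days=7),
) -> int:
    """Issuances still available if every timestamp is for the same registered domain."""
    recent = [t for t in issued_at if now - window < t <= now]
    return max(0, limit - len(recent))


if __name__ == "__main__":
    now = datetime(2026, 1, 15)
    history = [now - timedelta(days=d) for d in (1, 2, 8)]
    print(remaining_issuances(history, now))
```

A CI job that recreates a cluster ten times a day and requests five production certificates each time would exhaust this budget within a single day, which is why ephemeral environments should point at the staging issuer.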

Internal CA and Vault Issuer

Not every certificate has to come from a public CA. For service-to-service mTLS or internal domains, issuing from an internal CA is common.

Self CA Issuer

First store a root or intermediate CA key pair as a Secret, then create a CA Issuer.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: internal-ca
    spec:
      ca:
        secretName: internal-ca-key-pair
    ---
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: internal-svc-tls
      namespace: backend
    spec:
      secretName: internal-svc-tls
      dnsNames:
        # Service FQDN: <service>.<namespace>.svc.cluster.local
        - payments.backend.svc.cluster.local
      issuerRef:
        name: internal-ca
        kind: ClusterIssuer

This approach needs no ACME challenge, so it issues immediately with no inbound or DNS requirements. However, you must distribute the CA bundle so that clients trust your internal CA.

HashiCorp Vault Issuer

Using Vault's PKI secrets engine as the backend lets you leverage centralized issuance policy and audit logs.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: vault-issuer
    spec:
      vault:
        server: https://vault.internal.example.com:8200
        path: pki_int/sign/example-dot-com
        auth:
          kubernetes:
            role: cert-manager
            mountPath: /v1/auth/kubernetes
            serviceAccountRef:
              name: cert-manager-vault

The Vault Issuer authenticates with a ServiceAccount token via Vault's Kubernetes auth method. `path` is the signing path of the Vault PKI engine, and on the Vault side you define the allowed domains and lifetime policy for that role. This structure centralizes certificate issuance authority through Vault policy, which is advantageous in regulated environments.

Relationship with the Gateway API

As mentioned earlier, the Ingress API is frozen and the Gateway API is the successor standard. cert-manager supports automatic certificate issuance for the Gateway API too. Just as ingress-shim reads Ingress annotations to create a Certificate, its Gateway counterpart (gateway-shim) reads annotations on a Gateway resource and creates a Certificate from the listener configuration.

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: app-gateway
      namespace: production
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      gatewayClassName: nginx
      listeners:
        - name: https
          protocol: HTTPS
          port: 443
          hostname: app.example.com
          tls:
            mode: Terminate
            certificateRefs:
              - name: app-tls
                kind: Secret

Gateway API support originally had to be switched on through an experimental feature gate (`ExperimentalGatewayAPISupport`). Since cert-manager v1.15 it is enabled with the controller flag `--enable-gateway-api` instead, and the Gateway API CRDs should be installed before the controller starts (or the controller restarted afterwards). Once enabled, cert-manager auto-creates a Certificate using the listener's hostname as dnsNames and the first Secret in certificateRefs as secretName. For a greenfield cluster, adopting Gateway API based TLS automation from the start instead of Ingress is recommended.

Troubleshooting: Diagnosing Challenge Failures

Challenge failures are the most common problem in operating cert-manager. Following the diagnostic flow below will narrow down most causes.

    Certificate is Ready=False
            │
            ▼
    check kubectl describe certificate
    What message is in Events?
            │
       ┌────┴───────────────┐
       │                    │
    Order created        No Order / error
       │                    │
       ▼                    ▼
    kubectl describe     issuerRef typo?
    order ...            ClusterIssuer Ready?
       │                 ACME account reg failed?
       ▼
    Check Challenge status
    kubectl describe challenge ...
       │
    ┌──┴─────────────────────────┐
    │                            │
    HTTP-01                      DNS-01
    │                            │
    ▼                            ▼
    1. DNS A record ->           1. DNS API creds valid?
       Ingress IP?               2. TXT record actually created?
    2. Port 80 reachable         3. DNS propagation done?
       externally?               4. CNAME delegation set up?
    3. Challenge path            5. Zone selector matches?
       returns 200?
    4. Does forced HTTPS
       redirect block
       the port 80 check?

Common HTTP-01 Failures

The most frequent failure is when the ACME validation request arriving on port 80 is blocked by a forced HTTPS redirect. If ingress-nginx's `ssl-redirect` is on, it can redirect even the validation path to HTTPS and break validation. The temporary validation Ingress that cert-manager creates is designed to bypass this, but a strong global setting can still cause problems.

Use the following commands to check whether the validation path actually responds. Get the token portion of the URL from the real challenge resource.

    # Check in-progress challenges
    kubectl get challenge -A

    # Get the token and URL from the challenge details
    kubectl describe challenge <challenge-name> -n production

    # Call the validation path directly from outside (TOKEN from above)
    curl -v http://app.example.com/.well-known/acme-challenge/TOKEN

If the response is not 200, or it redirects to HTTPS with 301/302, review your routing and redirect settings.
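
The interpretation of that `curl` result can be summarized as a small decision function. This is a sketch of the reasoning in this section only; `diagnose` and its messages are hypothetical and it performs no network requests itself.

```python
def challenge_url(host: str, token: str) -> str:
    """URL the ACME server requests during HTTP-01 validation."""
    return f"http://{host}/.well-known/acme-challenge/{token}"


def diagnose(status: int, body: str, expected_key_authorization: str) -> str:
    """Map an observed response for the challenge URL to a likely cause."""
    if status in (301, 302, 307, 308):
        return "redirect: review ssl-redirect and routing for the challenge path"
    if status != 200:
        return f"unexpected status {status}: the request is probably not reaching the solver"
    if body.strip() != expected_key_authorization:
        return "200 with wrong body: another backend is answering the challenge path"
    return "ok"


if __name__ == "__main__":
    print(challenge_url("app.example.com", "TOKEN"))
    print(diagnose(404, "", "TOKEN.thumbprint"))
```

The third branch matters in practice: a catch-all application route that returns its own `200` page for any path looks healthy in `curl` output but still fails validation.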

Common DNS-01 Failures

For DNS-01, the key is to directly verify whether the TXT record was actually created and whether it has propagated.

    # Directly look up the TXT record cert-manager registered
    dig +short TXT _acme-challenge.example.com

    # Query the authoritative nameserver directly
    dig @ns-1.example-dns.com TXT _acme-challenge.example.com

    # Check the DNS provider error in the challenge events
    kubectl describe challenge <challenge-name> -n production

If the DNS API credentials have expired or lack permissions, TXT record creation itself fails. Also, if you use CNAME delegation, you must check that the record was created in the delegated target zone. With slow-propagating providers you may need to increase the propagation timeout.

Diagnosing Quickly with cmctl

cert-manager provides a diagnostic CLI called `cmctl`.

    # Summarize certificate status (including in-flight requests, orders, and challenges)
    cmctl status certificate app-tls -n production

    # Trigger a forced renewal
    cmctl renew app-tls -n production

    # Inspect the certificate stored in the Secret
    cmctl inspect secret app-tls -n production

Operations Checklist

Checking the following items before and after a production rollout can greatly reduce incidents.

- Did you validate with the staging ClusterIssuer first before switching to production?

- Is the ClusterIssuer in Ready state and is the ACME account registered correctly?

- Do wildcard certificates use a DNS-01 solver?

- Do the DNS-01 API credentials have an expiry/rotation policy?

- Is there an expiry alert based on `certmanager_certificate_expiration_timestamp_seconds`?

- Is there an alert on `certmanager_certificate_ready_status == 0`?

- Is the renewBefore value generous enough given your operational response time?

- Are certificate Secrets backed up or reproducible via GitOps?

- Have you ensured high availability for the cert-manager webhook pods (two or more replicas)?

- Is the Let’s Encrypt rate limit isolated so CI does not exhaust it?

- If you use an internal CA, is the CA bundle distributed to clients?

- For a greenfield build, have you considered Gateway API based TLS automation?

Conclusion

cert-manager turns TLS certificate management from "an operational burden you must tend to regularly" into "infrastructure that just runs once you declare it." The key is to understand the CRD model precisely, choose the challenge type that fits your environment, and build monitoring alongside automation rather than blindly trusting it.

HTTP-01 is simple for a public single domain, while DNS-01 is essential for wildcards and private environments. The annotation-based ingress-shim solves most use cases with a single line, but you can manage the Certificate directly when you need fine-grained control. By centralizing internal certificates with an internal CA and the Vault Issuer, you can manage TLS across the entire cluster in one consistent flow.

Finally, to re-emphasize the 2026 direction: the Ingress API is frozen and the Gateway API is its successor. If you are building anew now, considering Gateway API based TLS automation from the start is the future-proof choice.

References

- cert-manager official docs: https://cert-manager.io/docs/

- cert-manager Ingress usage: https://cert-manager.io/docs/usage/ingress/

- cert-manager ACME configuration: https://cert-manager.io/docs/configuration/acme/

- cert-manager Gateway API support: https://cert-manager.io/docs/usage/gateway/

- Kubernetes Ingress concepts: https://kubernetes.io/docs/concepts/services-networking/ingress/

- ingress-nginx official docs: https://kubernetes.github.io/ingress-nginx/

- Let’s Encrypt docs and rate limits: https://letsencrypt.org/docs/

- Gateway API: https://gateway-api.sigs.k8s.io/

- RFC 8555 (ACME protocol): https://datatracker.ietf.org/doc/html/rfc8555

- HashiCorp Vault PKI secrets engine: https://developer.hashicorp.com/vault/docs/secrets/pki

- cert-manager Vault Issuer: https://cert-manager.io/docs/configuration/vault/
