SPIFFE/SPIRE Workload Identity — Service-to-Service Authentication Without Secrets

Introduction — Is Authentication Without Distributing Secrets Possible

The traditional answer to service-to-service authentication was secret distribution: mint API keys, shared passwords, and static certificates, then hand them out to each service. The result is today's secret sprawl. Credentials scattered across environment variables, CI variables, code repositories, and chat logs are a staple cause of breach incidents, and rotation has become the chore everyone postpones.

The 2026 trajectory is clear. Machines, services, pipelines, and now AI agents: non-human identities outnumber human accounts by an order of magnitude or more, and in that environment the model of "handing out" secrets is no longer operable. The alternative is a model in which a workload proves its identity from its properties (where it runs, who launched it) and automatically receives short-lived cryptographic identity documents. The standard for this model is SPIFFE (Secure Production Identity Framework For Everyone), and its reference implementation is SPIRE.

This post covers everything from SPIFFE ID and SVID concepts through SPIRE architecture, hands-on Kubernetes deployment, automatic mTLS via Envoy SDS integration, federation, Vault/cert-manager comparisons, and the extension to transaction tokens and AI agent identity.

The Secret Sprawl Problem and the Non-Human Identity Trend

First, consider the structural problems of traditional secret-based authentication.

| Problem | Description |
| --- | --- |
| Bootstrap paradox | Safely delivering a secret requires yet another secret (access credentials) |
| Rotation burden | Longer lifetimes mean bigger leak damage, yet rotation is manual and fragile |
| Unknown ownership | Six months later, nobody knows who created this API key or why |
| Trivial duplication | Once copied, a secret is untraceable; uses are indistinguishable |
| Audit difficulty | You only know "someone with the key", with no workload-level identification |

On top of this comes the explosion of non-human identity. Microservices, batch jobs, CI runners, serverless functions, and now AI agents have shifted the center of gravity of identity management from humans to workloads. Industry surveys report non-human identities outnumbering human ones by a factor of tens, and their credential management is cited as a leading cause of breaches.

SPIFFE's answer is a change of premise: do not distribute secrets — issue identities. A workload proves itself through the properties of its execution environment (attestation), and the platform automatically issues and renews short-lived identity documents (SVIDs). Human-touched secrets disappear.

SPIFFE Core Concepts — SPIFFE ID and SVID

SPIFFE ID

A SPIFFE ID is a URI identifying a workload. It consists of the spiffe scheme, a trust domain, and a path.

spiffe://prod.example.com/ns/orders/sa/orders-sa
└─┬──┘   └──────┬───────┘└──────────┬──────────┘
scheme    trust domain        workload path
                    (e.g. namespace/service account)

  • The trust domain is the unit of issuing authority. It typically maps to an organization or an environment (production/staging).
  • The path is free for the organization to design. On Kubernetes, encoding the namespace and service account is the common pattern.
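
The three parts can be pulled apart with nothing more than a URL parser. The following is a minimal sketch using only the Go standard library; the `SPIFFEID` type and `ParseSPIFFEID` function are illustrative names, and the checks cover only the basic shape described above rather than every rule in the SPIFFE specification.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// SPIFFEID holds the two variable parts of a SPIFFE ID.
type SPIFFEID struct {
	TrustDomain string
	Path        string
}

// ParseSPIFFEID splits a SPIFFE ID into trust domain and path.
// It enforces only the basic shape, not the full SPIFFE specification.
func ParseSPIFFEID(raw string) (SPIFFEID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return SPIFFEID{}, err
	}
	if u.Scheme != "spiffe" {
		return SPIFFEID{}, errors.New("scheme must be spiffe")
	}
	if u.Host == "" {
		return SPIFFEID{}, errors.New("trust domain is empty")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return SPIFFEID{}, errors.New("userinfo, port, query and fragment are not allowed")
	}
	return SPIFFEID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	id, err := ParseSPIFFEID("spiffe://prod.example.com/ns/orders/sa/orders-sa")
	if err != nil {
		panic(err)
	}
	fmt.Println("trust domain:", id.TrustDomain)
	fmt.Println("path:", id.Path)
}
```

In real services the go-spiffe `spiffeid` package is the better choice, since it implements the complete validation rules.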

SVID — The Identity Document

An SVID (SPIFFE Verifiable Identity Document) is a verifiable document carrying a SPIFFE ID. It comes in two formats.

| Aspect | X.509-SVID | JWT-SVID |
| --- | --- | --- |
| Format | X.509 certificate (SPIFFE ID in SAN URI) | JWT (SPIFFE ID in the sub claim) |
| Use | Mutual authentication of mTLS connections | Auth across TLS-terminating hops (L7 proxies) |
| Lifetime | Minutes to hours (about 1 hour default) | Minutes (audience required) |
| Replay risk | Low (proof of key possession) | Present (reusable within lifetime if stolen) |
| Recommended | The default choice | Fallback where mTLS cannot be maintained |

X.509-SVID is the default; JWT-SVID is the auxiliary mechanism for segments where mTLS cannot survive end to end (through L7 load balancers, for example). For both, the essence is short lifetime plus automatic renewal: even if an SVID leaks, the damage window is bounded by its remaining lifetime (minutes for a JWT-SVID, at most about an hour for a default X.509-SVID).
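
Because a stolen JWT-SVID can be replayed until it expires, the receiving side typically checks the audience and the expiry on every request. The sketch below illustrates those two checks under simplifying assumptions: the `Claims` struct and `CheckClaims` function are hypothetical, the token is assumed to be already decoded, and signature verification against the trust bundle is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Claims models only the JWT-SVID claims discussed above.
type Claims struct {
	Subject   string   // "sub": the caller's SPIFFE ID
	Audience  []string // "aud": who the token was minted for
	ExpiresAt int64    // "exp": expiry as Unix seconds
}

// CheckClaims rejects tokens that lack a SPIFFE ID subject, are expired,
// or were minted for a different audience.
// Signature verification against the trust bundle is NOT shown here.
func CheckClaims(c Claims, wantAudience string, nowUnix int64) error {
	if !strings.HasPrefix(c.Subject, "spiffe://") {
		return errors.New("sub is not a SPIFFE ID")
	}
	if nowUnix >= c.ExpiresAt {
		return errors.New("token expired")
	}
	for _, aud := range c.Audience {
		if aud == wantAudience {
			return nil
		}
	}
	return errors.New("audience mismatch")
}

func main() {
	payments := "spiffe://prod.example.com/ns/payments/sa/payments-sa"
	c := Claims{
		Subject:   "spiffe://prod.example.com/ns/orders/sa/orders-sa",
		Audience:  []string{payments},
		ExpiresAt: 1_700_000_300, // five minutes after the "now" used below
	}
	fmt.Println(CheckClaims(c, payments, 1_700_000_000))
	fmt.Println(CheckClaims(c, "spiffe://prod.example.com/ns/billing/sa/billing-sa", 1_700_000_000))
}
```

The narrow audience is what keeps a token captured at one service from being replayed against another.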

Trust Bundles

Verifiers validate SVIDs against the trust bundle, the set of CA certificates (for X.509-SVIDs) and public signing keys (for JWT-SVIDs) of a trust domain. SPIRE automates bundle distribution and refresh as well.
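
To make the role of the bundle concrete, the following sketch builds a throwaway CA and a one-hour leaf certificate carrying a SPIFFE ID in its URI SAN, then verifies the leaf against a certificate pool standing in for the trust bundle. It uses only the Go standard library; the helper names (`VerifySVID`, `sign`, `IssueAndVerify`) are illustrative, and a real verifier would obtain both the SVID and the bundle from the Workload API rather than generating them.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// VerifySVID validates leaf against the trust bundle and returns its SPIFFE ID.
func VerifySVID(leaf *x509.Certificate, bundle *x509.CertPool) (string, error) {
	opts := x509.VerifyOptions{
		Roots:     bundle,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	}
	if _, err := leaf.Verify(opts); err != nil {
		return "", err
	}
	if len(leaf.URIs) != 1 {
		return "", fmt.Errorf("expected exactly one URI SAN, got %d", len(leaf.URIs))
	}
	return leaf.URIs[0].String(), nil
}

// sign creates a certificate from tmpl, signed by parent using signerKey.
func sign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signerKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signerKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// IssueAndVerify builds a demo CA and leaf, then verifies the leaf against a
// bundle that contains the CA only when trustCA is true.
func IssueAndVerify(trustCA bool) (string, error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	now := time.Now()
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo trust domain CA"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	ca, err := sign(caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return "", err
	}
	id, err := url.Parse("spiffe://prod.example.com/ns/orders/sa/orders-sa")
	if err != nil {
		return "", err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		NotBefore:    now.Add(-time.Minute),
		NotAfter:     now.Add(time.Hour), // short-lived, like an SVID
		KeyUsage:     x509.KeyUsageDigitalSignature,
		URIs:         []*url.URL{id},
	}
	leaf, err := sign(leafTmpl, ca, &leafKey.PublicKey, caKey)
	if err != nil {
		return "", err
	}
	bundle := x509.NewCertPool() // stands in for the trust bundle
	if trustCA {
		bundle.AddCert(ca)
	}
	return VerifySVID(leaf, bundle)
}

func main() {
	fmt.Println(IssueAndVerify(true))  // CA is in the bundle
	fmt.Println(IssueAndVerify(false)) // CA is missing from the bundle
}
```

The second call fails chain verification, which is the behavior a verifier relies on when a peer presents a certificate from an unknown trust domain.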

SPIRE Architecture — Server, Agent, Attestation

SPIRE, the reference implementation of SPIFFE, is a two-tier structure of server and agent.

+--------------------------------------------------------------+
|                        SPIRE Server                           |
|  - stores registration entries                                |
|  - signs SVIDs as a CA (or delegates to an upstream CA)       |
|  - verifies node attestation                                  |
+------------------------------+-------------------------------+
                               | (1) node attestation
                               |     "is this node/agent genuine?"
+------------------------------+-------------------------------+
|                  SPIRE Agent (DaemonSet per node)             |
|  - exposes the Workload API (unix socket)                     |
|  - performs workload attestation                              |
|  - caches/renews SVIDs                                        |
+------------------------------+-------------------------------+
                               | (2) workload attestation
                               |     "which workload is this process?"
            +------------------+------------------+
            |                  |                  |
       +----+----+        +----+----+        +----+----+
       | Pod A   |        | Pod B   |        | Pod C   |
       | (orders)|        | (pay)   |        | (envoy) |
       +---------+        +---------+        +---------+

The flow works like this.

  1. Node attestation: the agent proves to the server that it runs on a legitimate node. On Kubernetes, the standard is k8s_psat (projected service account token), where the server validates the agent's projected service account token via the TokenReview API. On AWS/GCP/Azure, attestors based on instance identity documents are used.
  2. Workload attestation: when a workload connects to the agent's Workload API socket, the agent identifies the calling process through the kernel (the peer credentials of the Unix socket) and gathers its attributes (on Kubernetes: the pod's namespace, service account, labels, and so on).
  3. Registration entry matching: if the collected selectors match an entry registered on the server, an SVID for that SPIFFE ID is issued.
  4. Automatic renewal: the agent re-issues SVIDs before expiry and pushes them to the workload.

The key point: the workload needs no pre-provisioned secret whatsoever. Identity derives not from "what you know" (a secret) but from "where and how you are running" (properties).
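
Step 3 is essentially a subset test: an entry applies when every selector it lists is among the selectors the agent attested for the calling process. The sketch below shows that rule in isolation. The `Entry` type and `Match` function are illustrative simplifications, not SPIRE's internal data model, and real matching also takes the entry's parent ID into account.

```go
package main

import "fmt"

// Entry is a simplified registration entry: a SPIFFE ID plus the selectors
// a workload must present to receive it.
type Entry struct {
	SPIFFEID  string
	Selectors []string
}

// Match returns the SPIFFE ID of every entry whose selectors are all
// contained in the selectors attested for the calling process.
func Match(entries []Entry, attested []string) []string {
	have := make(map[string]bool, len(attested))
	for _, s := range attested {
		have[s] = true
	}
	var ids []string
	for _, e := range entries {
		matched := len(e.Selectors) > 0
		for _, s := range e.Selectors {
			if !have[s] {
				matched = false
				break
			}
		}
		if matched {
			ids = append(ids, e.SPIFFEID)
		}
	}
	return ids
}

func main() {
	entries := []Entry{
		{
			SPIFFEID:  "spiffe://prod.example.com/ns/orders/sa/orders-sa",
			Selectors: []string{"k8s:ns:orders", "k8s:sa:orders-sa"},
		},
		{
			SPIFFEID:  "spiffe://prod.example.com/ns/payments/sa/payments-sa",
			Selectors: []string{"k8s:ns:payments", "k8s:sa:payments-sa"},
		},
	}
	attested := []string{"k8s:ns:orders", "k8s:sa:orders-sa", "k8s:pod-label:app:orders"}
	fmt.Println(Match(entries, attested))
}
```

The subset rule is also why a loose entry is risky: an entry that lists only a namespace selector matches every pod in that namespace.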

Hands-On Kubernetes Deployment

For production the official SPIRE Helm charts are recommended, but to understand the structure let us look at the core manifests directly. First, the server configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      trust_domain = "prod.example.com"
      data_dir = "/run/spire/data"
      log_level = "INFO"
      ca_ttl = "24h"
      default_x509_svid_ttl = "1h"
    }
    plugins {
      DataStore "sql" {
        plugin_data {
          database_type = "sqlite3"
          connection_string = "/run/spire/data/datastore.sqlite3"
        }
      }
      NodeAttestor "k8s_psat" {
        plugin_data {
          clusters = {
            "prod-cluster" = {
              service_account_allow_list = ["spire:spire-agent"]
            }
          }
        }
      }
      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }
      Notifier "k8sbundle" {
        plugin_data {
          namespace = "spire"
        }
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
spec:
  serviceName: spire-server
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
        - name: spire-server
          image: ghcr.io/spiffe/spire-server:1.12.0
          args: ['-config', '/run/spire/config/server.conf']
          ports:
            - containerPort: 8081
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-data
              mountPath: /run/spire/data
      volumes:
        - name: spire-config
          configMap:
            name: spire-server
  volumeClaimTemplates:
    - metadata:
        name: spire-data
      spec:
        accessModes: ['ReadWriteOnce']
        resources:
          requests:
            storage: 1Gi

The agent is deployed to every node as a DaemonSet.

apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-agent
  namespace: spire
data:
  agent.conf: |
    agent {
      data_dir = "/run/spire"
      log_level = "INFO"
      server_address = "spire-server.spire.svc.cluster.local"
      server_port = "8081"
      socket_path = "/run/spire/sockets/agent.sock"
      trust_domain = "prod.example.com"
      trust_bundle_path = "/run/spire/bundle/bundle.crt"
    }
    plugins {
      NodeAttestor "k8s_psat" {
        plugin_data {
          cluster = "prod-cluster"
        }
      }
      KeyManager "memory" {
        plugin_data {}
      }
      WorkloadAttestor "k8s" {
        plugin_data {
          disable_container_selectors = false
        }
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent
    spec:
      hostPID: true
      serviceAccountName: spire-agent
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.12.0
          args: ['-config', '/run/spire/config/agent.conf']
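          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-bundle
              mountPath: /run/spire/bundle
              readOnly: true
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
            # Projected token that k8s_psat presents to the server
            - name: spire-token
              mountPath: /var/run/secrets/tokens
      volumes:
        - name: spire-config
          configMap:
            name: spire-agent
        - name: spire-bundle
          configMap:
            name: spire-bundle
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate
        - name: spire-token
          projected:
            sources:
              - serviceAccountToken:
                  path: spire-agent
                  audience: spire-server
                  expirationSeconds: 7200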

Finally, register the workload. The following entry grants a SPIFFE ID to pods running as the orders-sa service account in the orders namespace (NODE_UUID is a placeholder for the node-specific part of the attested agent's SPIFFE ID):

kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://prod.example.com/ns/orders/sa/orders-sa \
  -parentID spiffe://prod.example.com/spire/agent/k8s_psat/prod-cluster/NODE_UUID \
  -selector k8s:ns:orders \
  -selector k8s:sa:orders-sa

Manual registration does not scale. In practice you deploy the SPIRE Controller Manager alongside, managing registration declaratively via CRDs (ClusterSPIFFEID).

apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: default-workload-id
spec:
  spiffeIDTemplate: 'spiffe://prod.example.com/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}'
  podSelector:
    matchLabels:
      spiffe.io/spire-managed-identity: 'true'

With this, every labeled pod automatically receives a namespace/service-account-based SPIFFE ID.
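
The template syntax follows Go's text/template conventions, so the rendering step can be approximated in a few lines. In this sketch the `PodMeta`, `PodSpec`, and `TemplateData` structs are hypothetical stand-ins that expose only the two fields the template above references; the controller's real template context is richer.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical stand-ins for the pod data exposed to the template.
type PodMeta struct{ Namespace string }
type PodSpec struct{ ServiceAccountName string }

type TemplateData struct {
	PodMeta PodMeta
	PodSpec PodSpec
}

// Render expands a SPIFFE ID template for one pod.
func Render(tmpl string, data TemplateData) (string, error) {
	t, err := template.New("spiffeid").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	tmpl := "spiffe://prod.example.com/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
	pod := TemplateData{
		PodMeta: PodMeta{Namespace: "orders"},
		PodSpec: PodSpec{ServiceAccountName: "orders-sa"},
	}
	id, err := Render(tmpl, pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Rendering a template against a few representative pods like this is a cheap way to review what identities a ClusterSPIFFEID change would hand out.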

Envoy SDS Integration — Automatic mTLS

The standard pattern for applying mTLS without changing application code is Envoy sidecars plus the SPIRE agent's SDS (Secret Discovery Service) integration. The SPIRE agent can act as an SDS server, so Envoy receives certificates via API rather than files. Renewal happens with zero downtime too.

# Envoy sidecar configuration (excerpt) — orders service
static_resources:
  clusters:
    # Register the SPIRE agent Workload API as the SDS cluster
    - name: spire_agent
      connect_timeout: 1s
      http2_protocol_options: {}
      load_assignment:
        cluster_name: spire_agent
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    pipe:
                      path: /run/spire/sockets/agent.sock
    # Upstream: mTLS connection to the payments service
    - name: payments_upstream
      connect_timeout: 2s
      type: STRICT_DNS
      load_assignment:
        cluster_name: payments_upstream
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: payments.payments.svc.cluster.local
                      port_value: 8443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            # Fetch my identity (X.509-SVID) via SDS
            tls_certificate_sds_secret_configs:
              - name: spiffe://prod.example.com/ns/orders/sa/orders-sa
                sds_config:
                  api_config_source:
                    api_type: GRPC
                    transport_api_version: V3
                    grpc_services:
                      - envoy_grpc:
                          cluster_name: spire_agent
            # Peer validation: trust bundle + expected SPIFFE ID
            combined_validation_context:
              default_validation_context:
                match_typed_subject_alt_names:
                  - san_type: URI
                    matcher:
                      exact: spiffe://prod.example.com/ns/payments/sa/payments-sa
              validation_context_sds_secret_config:
                name: spiffe://prod.example.com
                sds_config:
                  api_config_source:
                    api_type: GRPC
                    transport_api_version: V3
                    grpc_services:
                      - envoy_grpc:
                          cluster_name: spire_agent

This one configuration automates the following.

  • The orders sidecar receives its X.509-SVID from SPIRE and uses it as the TLS client certificate.
  • The payments-side certificate is validated against the trust bundle, and the SAN URI is checked for an exact match with the expected SPIFFE ID. The point is authorization at the workload level, not "same CA, come on in."
  • Certificate renewal (1-hour lifetimes) is handled hitlessly via SDS pushes. No human-touched certificate files exist.
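
The difference between the two peer-validation policies in the second bullet can be shown in code. This sketch assumes the peer certificate has already passed chain verification against the trust bundle and looks only at the authorization decision. All function names are illustrative, and `certWithID` fabricates an unsigned certificate struct purely for demonstration.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// peerID returns the single URI SAN of an already chain-verified certificate.
func peerID(peer *x509.Certificate) (string, error) {
	if len(peer.URIs) != 1 {
		return "", errors.New("peer must carry exactly one URI SAN")
	}
	return peer.URIs[0].String(), nil
}

// AuthorizeExact accepts only the one expected workload.
func AuthorizeExact(peer *x509.Certificate, expected string) error {
	got, err := peerID(peer)
	if err != nil {
		return err
	}
	if got != expected {
		return fmt.Errorf("peer %q is not %q", got, expected)
	}
	return nil
}

// AuthorizeTrustDomain accepts any workload in the trust domain,
// which is the weaker "same CA, come on in" policy.
func AuthorizeTrustDomain(peer *x509.Certificate, trustDomain string) error {
	got, err := peerID(peer)
	if err != nil {
		return err
	}
	if !strings.HasPrefix(got, "spiffe://"+trustDomain+"/") {
		return fmt.Errorf("peer %q is outside %q", got, trustDomain)
	}
	return nil
}

// certWithID fabricates an unsigned certificate struct for demonstration only.
func certWithID(id string) *x509.Certificate {
	u, err := url.Parse(id)
	if err != nil {
		panic(err)
	}
	return &x509.Certificate{URIs: []*url.URL{u}}
}

func main() {
	payments := "spiffe://prod.example.com/ns/payments/sa/payments-sa"
	intruder := certWithID("spiffe://prod.example.com/ns/batch/sa/report-sa")

	fmt.Println(AuthorizeTrustDomain(intruder, "prod.example.com")) // accepted
	fmt.Println(AuthorizeExact(intruder, payments))                 // rejected
}
```

A workload from the same trust domain passes the domain-wide check but fails the exact match, which is the gap the `exact` SAN matcher in the Envoy configuration closes.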

To use SPIFFE directly in application code without sidecars, call the Workload API with an SDK such as go-spiffe.

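// Building an mTLS client with go-spiffe v2
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	source, err := workloadapi.NewX509Source(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer source.Close()
	serverID := spiffeid.RequireFromString("spiffe://prod.example.com/ns/payments/sa/payments-sa")
	tlsConfig := tlsconfig.MTLSClientConfig(source, source, tlsconfig.AuthorizeID(serverID))
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: tlsConfig}}
	resp, err := client.Get("https://payments.payments.svc.cluster.local:8443/healthz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
}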

Istio and SPIFFE Compatibility

Istio's mTLS identity scheme has been SPIFFE-formatted from the start. The SAN URI in certificates received by sidecars (or ambient's ztunnel) takes this form:

spiffe://cluster.local/ns/orders/sa/orders-sa

So, to the question "do I need SPIRE if I run Istio," the answer is:

  • If you only need identity inside the mesh — Istio's built-in CA (istiod) is enough. You get SPIFFE-format workload identity and automatic mTLS with no separate SPIRE.
  • If you need a single identity scheme beyond the mesh (VMs, other clusters, non-mesh workloads, CI runners) — making SPIRE the single source of identity is valid. Istio can integrate SPIRE as an external CA (via SDS), in which case workloads inside and outside the mesh mutually authenticate with SVIDs from the same trust domain.
  • Caveat: Istio's default trust domain is cluster.local, so to align with an organization-wide trust domain strategy (say prod.example.com), the trustDomain in meshConfig and SPIRE's trust_domain must be reconciled.

Federation — Authentication Across Trust Domains

For workloads in different trust domains (different clusters, organizations, clouds) to mutually authenticate, trust bundles must be exchanged. SPIRE's federation feature automates this.

# Adding federation to the SPIRE Server config (server.conf excerpt)
# The federation block is nested inside the server { ... } block
# federates_with points at the peer domain bundle endpoint
federation {
  bundle_endpoint {
    address = "0.0.0.0"
    port = 8443
  }
  federates_with "partner.example.org" {
    bundle_endpoint_url = "https://spire.partner.example.org:8443"
    bundle_endpoint_profile "https_spiffe" {
      endpoint_spiffe_id = "spiffe://partner.example.org/spire/server"
    }
  }
}

Registration entries must also declare federation targets.

spire-server entry create \
  -spiffeID spiffe://prod.example.com/ns/orders/sa/orders-sa \
  -parentID spiffe://prod.example.com/spire/agent/k8s_psat/prod-cluster/NODE_UUID \
  -selector k8s:ns:orders -selector k8s:sa:orders-sa \
  -federatesWith spiffe://partner.example.org

With this, the orders workload receives the peer domain's trust bundle along with its own SVID and can establish mTLS with workloads in the partner domain. Bundle refresh is synchronized periodically by both servers. This is the standard way to build mutual authentication "without a shared CA" across multi-cluster, hybrid cloud, and B2B integrations.
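
On the verifying side, federation means holding one bundle per trust domain and choosing the right one from the trust domain named in the peer's SPIFFE ID. The sketch below shows only that selection step. `BundleSet`, `RootsFor`, and `NewDemoBundles` are illustrative names, and the demo pools are empty placeholders where real ones would hold each domain's CA certificates.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net/url"
)

// BundleSet maps a trust domain name to the CA pool received for it.
type BundleSet map[string]*x509.CertPool

// RootsFor picks the bundle that must validate a peer claiming spiffeID.
func (s BundleSet) RootsFor(spiffeID string) (*x509.CertPool, error) {
	u, err := url.Parse(spiffeID)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "spiffe" {
		return nil, fmt.Errorf("%q is not a SPIFFE ID", spiffeID)
	}
	roots, ok := s[u.Host]
	if !ok {
		return nil, fmt.Errorf("no bundle for trust domain %q", u.Host)
	}
	return roots, nil
}

// NewDemoBundles returns placeholder pools for the given trust domains.
func NewDemoBundles(domains ...string) BundleSet {
	set := make(BundleSet, len(domains))
	for _, d := range domains {
		set[d] = x509.NewCertPool()
	}
	return set
}

func main() {
	bundles := NewDemoBundles("prod.example.com", "partner.example.org")

	_, err := bundles.RootsFor("spiffe://partner.example.org/ns/billing/sa/billing-sa")
	fmt.Println("federated peer:", err)

	_, err = bundles.RootsFor("spiffe://unknown.example.net/app")
	fmt.Println("unfederated peer:", err)
}
```

Keeping the bundles separate per domain matters: a partner's CA can vouch only for IDs in the partner's own trust domain, never for IDs in yours.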

Comparison with Vault and cert-manager

Here is how the roles divide among tools that look similar because they all "issue certificates automatically."

| Aspect | SPIRE | HashiCorp Vault | cert-manager |
| --- | --- | --- | --- |
| Essence | Workload identity issuance platform | Secret management + PKI issuance engine | Kubernetes certificate lifecycle manager |
| Identity proof | Attestation (credential-free bootstrap) | Requires an auth method (k8s auth etc.) | No direct proof (resource permissions) |
| Issues | Per-workload SVIDs (X.509/JWT) | Generic secrets, PKI certificates | Mostly TLS server certs (Ingress etc.) |
| Lifetime philosophy | Minutes to hours, fully auto-renewed | Configurable (short or long) | Usually tens of days, auto-renewed |
| Standard | SPIFFE (CNCF graduated) | Proprietary API | ACME and other cert standards |
| Complementarity | SVIDs usable as Vault auth | Vault PKI usable as SPIRE upstream CA | Coexists for non-mesh certificates |

The point is combination, not competition. Common setups:

  • SPIRE for identity, Vault for secrets — workloads log in to Vault with their SVID (JWT/cert auth) to fetch residual secrets like database passwords. The "secret to access secrets" disappears, dissolving the bootstrap paradox.
  • Vault PKI as SPIRE's UpstreamAuthority — subordinate the SPIRE CA inside the organizational PKI to preserve governance.
  • cert-manager for edge TLS — server certificates for externally exposed domains (Let's Encrypt and so on) belong to cert-manager, while internal workload-to-workload mTLS belongs to SPIRE. A natural division of labor.

Connecting Workload Identity and User Identity

A real request carries two identities at once. How do we express "the orders service (workload) calls payments on behalf of user-1234 (user)"?

  • Transport layer: authenticate the calling workload with mTLS (X.509-SVID).
  • Request layer: carry the user context in a JWT. Token exchange (RFC 8693), covered in the previous post, records the delegation chain.
  • Transaction tokens: the Transaction Tokens draft under discussion in the OAuth WG standardizes this pattern. At the trust boundary, the external token is exchanged for a short-lived (minutes) internal-only token carrying user identity, request context, and call chain, which is then propagated across all internal calls. The party requesting that exchange from the transaction token service authenticates with its workload identity (SVID).

[User JWT] --> Gateway --(exchange)--> [Txn-Token: user + context + chain]
                              |
                              v
        orders (mTLS via SVID) --> payments (mTLS via SVID)
        per request: verify Txn-Token + check caller SPIFFE ID
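
The per-request check in the last line of the diagram combines both identities in one decision. The following sketch shows that combination under loud assumptions: `TxnToken` is a hypothetical, already signature-verified token whose field names are illustrative and not taken from the Transaction Tokens draft, and the caller's SPIFFE ID is assumed to come from the verified mTLS peer certificate.

```go
package main

import (
	"errors"
	"fmt"
)

// TxnToken is a hypothetical, already signature-verified internal token.
// Field names are illustrative, not taken from the Transaction Tokens draft.
type TxnToken struct {
	User      string // end user the request is made for
	ExpiresAt int64  // expiry as Unix seconds
}

// Authorize requires a live token with a user AND an allowed mTLS caller.
func Authorize(tok TxnToken, callerID string, allowedCallers map[string]bool, nowUnix int64) error {
	if nowUnix >= tok.ExpiresAt {
		return errors.New("transaction token expired")
	}
	if tok.User == "" {
		return errors.New("transaction token carries no user")
	}
	if !allowedCallers[callerID] {
		return fmt.Errorf("caller %q is not allowed", callerID)
	}
	return nil
}

func main() {
	// The payments service only accepts calls from the orders workload.
	allowed := map[string]bool{
		"spiffe://prod.example.com/ns/orders/sa/orders-sa": true,
	}
	tok := TxnToken{User: "user-1234", ExpiresAt: 1_700_000_120}

	fmt.Println(Authorize(tok, "spiffe://prod.example.com/ns/orders/sa/orders-sa", allowed, 1_700_000_000))
	fmt.Println(Authorize(tok, "spiffe://prod.example.com/ns/batch/sa/report-sa", allowed, 1_700_000_000))
}
```

Neither identity is sufficient alone: a valid user token presented by an unexpected workload is rejected, and so is an allowed workload presenting an expired token.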

Workload identity (SPIFFE) and user identity (OIDC) combined within a single request, rather than living in separate systems — that is the finished form of Zero Trust architecture in 2026.

Extending to AI Agent Identity

The newest inflection point of non-human identity is the AI agent. Agents are created and destroyed more dynamically than traditional services, act on behalf of users, and exercise privileges through tool calls. From a SPIFFE perspective:

  • Agent runtimes are workloads too. Granting an SVID to the process/pod running an agent makes "which agent runtime invoked this tool" cryptographically identifiable, replacing the practice of embedding static API keys in agents.
  • Combining delegation chains — when an agent acts for a user, verify workload identity (SVID) together with user delegation (the act claim from token exchange, or a transaction token). "This agent, for this user, within this scope" is all proven by tokens and certificates.
  • Touchpoints with the MCP ecosystem — as MCP (Model Context Protocol) servers standardize as OAuth-protected resources (including Keycloak 26.6 experimental CIMD support), standard token flows now govern agent tool access. For mTLS between tool servers, SPIFFE is the natural companion.
  • Maximizing the value of short lifetimes — agents have wide action radii, so credential leaks have outsized impact. Minute-scale SVIDs combined with narrow-audience JWT-SVIDs are especially effective in agent environments.
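
The delegation chain mentioned in the second bullet is carried by the nested act claim defined in RFC 8693, where the outermost act is the current actor and each nested act is an earlier one. The sketch below walks that structure from an already decoded and verified token payload. Using SPIFFE IDs as the actors' sub values is an assumption made for this example, and the agent and gateway names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Actor mirrors the nested "act" claim from RFC 8693.
type Actor struct {
	Sub string `json:"sub"`
	Act *Actor `json:"act,omitempty"`
}

// Claims holds the subject (the user) and the actor chain.
type Claims struct {
	Sub string `json:"sub"`
	Act *Actor `json:"act,omitempty"`
}

// DelegationChain returns the subject and the actors, current actor first.
// The payload is assumed to come from a token whose signature was verified.
func DelegationChain(payload []byte) (string, []string, error) {
	var c Claims
	if err := json.Unmarshal(payload, &c); err != nil {
		return "", nil, err
	}
	var actors []string
	for a := c.Act; a != nil; a = a.Act {
		actors = append(actors, a.Sub)
	}
	return c.Sub, actors, nil
}

func main() {
	payload := []byte(`{
		"sub": "user-1234",
		"act": {
			"sub": "spiffe://prod.example.com/ns/agents/sa/planner",
			"act": {"sub": "spiffe://prod.example.com/ns/edge/sa/gateway"}
		}
	}`)
	user, actors, err := DelegationChain(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("on behalf of:", user)
	for i, a := range actors {
		fmt.Printf("actor %d: %s\n", i, a)
	}
}
```

A tool server can then compare the first actor in the chain with the SPIFFE ID on the mTLS connection, tying the delegation claim to the workload that actually made the call.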

Operational Challenges of Adoption

These are the challenges you actually meet when adopting SPIRE, and how to respond.

  1. Availability of SPIRE itself: renewals stall as soon as the server is unreachable, and with 1-hour SVIDs the first certificates begin to expire well within the hour. Run the server highly available (shared datastore + multiple replicas), absorb short outages with agent caches, and design SVID lifetime and outage tolerance together.
  2. Registration entry governance: loose selectors (say, namespace only) grant the same identity to more workloads than intended. Manage ClusterSPIFFEID templates under code review and write down the criteria for identity issuance.
  3. An incremental adoption path: converting every service simultaneously is impossible. You need a permissive stage (plaintext and mTLS in parallel) tightening to STRICT, plus a dashboard measuring mTLS adoption.
  4. Boundaries with non-SPIFFE systems: legacy databases and external SaaS still demand passwords and API keys. Consolidate those residual secrets in Vault, and authenticate Vault access with SVIDs, shrinking the "root of secrets" down to one.
  5. Observability: watch SVID issuance/renewal failures, attestation failures, and bundle sync delays as metrics with alerts. Cascading failures from certificate expiry start silently; the share of SVIDs nearing expiry is a good leading indicator.
  6. Clock sync and key protection: short-lived certificates are sensitive to clock skew, so NTP monitoring is mandatory; protecting the server's signing keys with a KMS/HSM backend (KeyManager plugin) is recommended.

Closing Thoughts

The shift SPIFFE/SPIRE proposes can be summarized in one sentence: stop moving secrets — issue identities. Attestation dissolves the bootstrap paradox, short-lived SVIDs make rotation a non-event, and the standard naming of SPIFFE IDs enables workload-level authorization and federation. With Envoy SDS or Istio integration, you reach automatic mTLS without touching application code.

And this foundation is expanding beyond workloads. User identity (OIDC), delegation (token exchange, transaction tokens), and AI agent identity combining with SPIFFE workload identity within a single request — that is the standard shape of the 2026 Zero Trust stack. If your organization is exhausted by secret sprawl, there is no reason left to postpone the move to identity-based authentication.

References