Istio Ambient vs Sidecar — Choosing Your Service Mesh Architecture in 2026
- Author: Youngju Kim (@fjvbn20031)
- Introduction
- The Cost of the Sidecar Model — What Was the Problem
- Dissecting the Ambient Architecture
- Data Path Comparison — How Packets Flow
- Feature Parity Status — As of 2026
- Resource and Performance Comparison — Doing the Math
- Migration Strategy — Incremental, Namespace by Namespace
- Hands-On Installation
- The Relationship with Cilium — Conflict or Coexistence
- When Sidecars Are Still the Right Choice
- Decision Tree
- Troubleshooting
- Adoption Checklist
- Closing Thoughts
- References
Introduction
If your team has ever operated a service mesh, you have probably faced this question: "We have 500 pods — why are there 500 extra Envoy containers running?" The sidecar model was the standard service mesh architecture, but its price was never small. Attaching a proxy to every pod nearly doubles resource consumption, forces a full workload restart on every mesh upgrade, and lets a single injection misconfiguration halt an entire deployment.
Istio Ambient mode is the structural answer to this problem. Since reaching GA (General Availability) in Istio 1.24 at the end of 2024, Ambient has been establishing itself in 2026 as the default choice for new mesh deployments. But the simplistic conclusion that "Ambient is always the answer" is dangerous. There are clearly workloads for which sidecars remain the better fit, and for most organizations the transition period — during which both modes run side by side — lasts a year or more.
In this article we will put numbers on the cost of the sidecar model, dissect the Ambient architecture (ztunnel, waypoint, HBONE), and compare the data paths of both modes down to packet flow. Finally, we will cover a namespace-by-namespace incremental migration strategy, a decision tree, and troubleshooting techniques from an operations perspective.
The Cost of the Sidecar Model — What Was the Problem
Resources Double
In the sidecar model, an istio-proxy (Envoy) container is injected into every application pod. The memory footprint of a single Envoy sidecar varies with configuration size, but because it must hold service discovery information for the whole mesh, it grows as the cluster grows. Let us do some simple arithmetic.
```
Assumptions:
- Pods in the cluster: 500
- Memory per sidecar: average 120MiB (for a mesh with ~1,000 services)
- CPU request per sidecar: 100m

Total sidecar-model overhead:
Memory      = 500 x 120MiB = 60,000MiB = about 58.6GiB
CPU request = 500 x 100m   = 50 vCPU reserved on nodes

If a node has 16GiB of memory:
- Sidecar memory alone consumes roughly 4 nodes' worth of capacity (58.6 / 16 ≈ 3.7)
- The 50 vCPU of requests eats into schedulable capacity accordingly
```
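The same arithmetic can be written as a small Python function, so you can rerun it with your own pod count, per-sidecar memory, and node size. The inputs below are the assumptions stated above, not measurements, and the function name is illustrative.

```python
import math

MIB_PER_GIB = 1024


def sidecar_overhead(pods: int, mem_mib_per_sidecar: int,
                     cpu_millicores_per_sidecar: int, node_mem_gib: int) -> dict:
    """Reserved sidecar overhead for a cluster under the stated assumptions."""
    memory_mib = pods * mem_mib_per_sidecar
    memory_gib = memory_mib / MIB_PER_GIB
    cpu_vcpu = pods * cpu_millicores_per_sidecar / 1000   # millicores -> vCPU
    return {
        "memory_mib": memory_mib,
        "memory_gib": round(memory_gib, 1),
        "cpu_vcpu": cpu_vcpu,
        # Whole nodes' worth of memory taken by sidecars alone (rounded up).
        "nodes_of_memory": math.ceil(memory_gib / node_mem_gib),
    }


print(sidecar_overhead(pods=500, mem_mib_per_sidecar=120,
                       cpu_millicores_per_sidecar=100, node_mem_gib=16))
```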
What matters here is the reserved requests rather than actual usage. Give sidecars conservative requests and you erode cluster scheduling capacity; give them too little and the proxy gets OOMKilled at traffic peaks, taking the application down with it. Tuning sidecar resources was an operational job in its own right.
Upgrades Are Painful
Because the sidecar lives inside the pod, upgrading the proxy version requires restarting the pod. In other words, upgrading the Istio control plane from 1.25 to 1.26 is not just a matter of swapping istiod — it entails a rolling restart of every workload in the mesh.
```
Sidecar upgrade procedure (canary revision approach):
1. Install istiod 1.26 as a new revision (alongside the existing 1.25)
2. Relabel namespaces to the new revision
3. Rollout-restart every Deployment in those namespaces
4. Repeat for all namespaces: hours to days for a 5,000-pod cluster
5. Remove the old istiod

Pain points:
- Workloads that are expensive or impossible to restart (StatefulSets,
  batch jobs, session-affine services) hold everything up
- Pods that miss the restart keep the old proxy -> version skew
- With quarterly upgrade cycles, this cost becomes a permanent burden
```
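Version skew is easy to spot once you have a mapping from each pod to its proxy version, for example one assembled from `istioctl proxy-status` output or from the istio-proxy image tags. The sketch below assumes that mapping already exists as a plain dictionary. The snapshot data and the function name are hypothetical.

```python
from collections import Counter


def find_version_skew(proxy_versions: dict, target: str) -> list:
    """Return the pods whose istio-proxy version differs from the target version."""
    return sorted(pod for pod, version in proxy_versions.items() if version != target)


# Hypothetical snapshot after a partial rollout-restart (namespace/pod -> proxy version).
snapshot = {
    "payments/api-7d9f": "1.26.0",
    "payments/worker-5c2a": "1.25.3",   # batch worker that was never restarted
    "orders/db-0": "1.25.3",            # StatefulSet pod whose restart was postponed
    "orders/web-66b1": "1.26.0",
}

print(Counter(snapshot.values()))       # how many pods run each proxy version
print(find_version_skew(snapshot, target="1.26.0"))
```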
Injection Breaks More Often Than You Think
Sidecar injection works through a mutating webhook. Namespace labels, pod annotations, webhook configuration, and revision tags must all line up; if any one of them is off, you end up with "some pods inside the mesh, some outside." Turn on STRICT mTLS in that state and all traffic from non-injected pods is cut off. And the classic problem of Jobs never completing because of the sidecar (the main container finishes, but istio-proxy stays alive so the Job never reaches Completed) was a chronic disease of the sidecar era. Kubernetes-native sidecar containers (initContainers with restartPolicy Always) alleviated it, but the root cause was structural: the proxy intrudes into the pod lifecycle.
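You can detect the "some pods inside, some outside" state by checking each pod for an istio-proxy container. The sketch below takes pod objects shaped like the items of `kubectl get pods -o json`, trimmed to the fields it reads. It also handles native sidecars, where istio-proxy appears as an init container with `restartPolicy: Always`. The sample pods and function names are hypothetical.

```python
def has_istio_proxy(pod: dict) -> bool:
    spec = pod.get("spec", {})
    # Classic injection: istio-proxy is a regular container.
    if any(c.get("name") == "istio-proxy" for c in spec.get("containers", [])):
        return True
    # Native sidecar: istio-proxy is an init container with restartPolicy: Always.
    return any(c.get("name") == "istio-proxy" and c.get("restartPolicy") == "Always"
               for c in spec.get("initContainers", []))


def pods_outside_mesh(pods: list) -> list:
    """Names of pods without a sidecar; their plaintext traffic fails against STRICT mTLS."""
    return [p["metadata"]["name"] for p in pods if not has_istio_proxy(p)]


# Hypothetical pods from a namespace that is supposed to be injected.
pods = [
    {"metadata": {"name": "api-7d9f"},
     "spec": {"containers": [{"name": "api"}, {"name": "istio-proxy"}]}},
    {"metadata": {"name": "worker-5c2a"},
     "spec": {"initContainers": [{"name": "istio-proxy", "restartPolicy": "Always"}],
              "containers": [{"name": "worker"}]}},
    {"metadata": {"name": "report-job-x1"},   # created before the injection label was fixed
     "spec": {"containers": [{"name": "report"}]}},
]
print(pods_outside_mesh(pods))
```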
Dissecting the Ambient Architecture
The core idea of Ambient mode is to pull the proxy out of the pod and push it down into the infrastructure layer — and to split L4 from L7 so that not every workload is forced to pay the cost of an L7 proxy.
Two Layers: ztunnel and waypoint
```
Kubernetes cluster

  +- Node A ---------------------+         +- Node B ---------------------+
  | Pod: app-a   Pod: app-b      |         | Pod: app-c                   |
  | (no sidecar) (no sidecar)    |         | (no sidecar)                 |
  |     |            |           |         |     |                        |
  |     v            v           |         |     v                        |
  | +--------------------------+ |         | +--------------------------+ |
  | | ztunnel (DaemonSet)      | |<=HBONE=>| | ztunnel (DaemonSet)      | |
  | | L4: mTLS, TCP telemetry  | |         | | L4: mTLS, L4 authz       | |
  | +--------------------------+ |         | +--------------------------+ |
  +------------------------------+         +------------------------------+

  Only for namespaces that need L7 features, optionally:
  +------------------------------------+
  | waypoint proxy (Envoy, Deployment) |
  | L7: HTTP routing, retries,         |
  |     L7 authz, HTTP telemetry       |
  +------------------------------------+

  +-----------------+
  | istiod          |--- xDS / certificates ---> ztunnel, waypoint
  | (control plane) |
  +-----------------+
```
- ztunnel (zero-trust tunnel): a DaemonSet with one instance per node. It is a lightweight proxy written in Rust — it is not Envoy. Its role is deliberately limited to L4: mTLS encryption between workloads, SPIFFE-identity-based L4 authorization, and TCP-level telemetry. ztunnel encrypts traffic on behalf of every mesh pod on its node, but because it never parses HTTP, its memory usage stays small and scales with connection count rather than pod count.
- waypoint proxy: an Envoy-based proxy deployed per namespace (or for specific services or workloads) wherever L7 features are needed: HTTP header-based routing, retries, timeouts, L7 AuthorizationPolicy, HTTP metrics. It is declared as a Gateway resource from the Kubernetes Gateway API. Because it is a regular Deployment, it can be scaled independently with HPA.
This separation is what creates Ambient economics. In real operations, only a fraction of services need L7 policy. The large majority are fine with "mTLS and basic telemetry," yet the sidecar model charged them the full Envoy price anyway. Ambient provides L4 by default (ztunnel) and bills L7 only where it is used (waypoint).
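The "pay for L7 only where it is used" rule can be expressed as a check over the features each namespace requires. This is a simplified model. The feature names are illustrative labels, not Istio API fields, and the L4/L7 assignment follows the split described above.

```python
# Simplified model of the L4/L7 split; feature names are illustrative, not Istio API fields.
ZTUNNEL_FEATURES = {"mtls", "l4_authz", "tcp_telemetry"}
WAYPOINT_FEATURES = {"http_routing", "retries", "timeouts", "l7_authz", "http_telemetry", "jwt"}


def needs_waypoint(required: set) -> bool:
    """True if any required feature can only be provided by a waypoint (L7)."""
    unknown = required - ZTUNNEL_FEATURES - WAYPOINT_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return bool(required & WAYPOINT_FEATURES)


# Hypothetical per-namespace requirements.
namespaces = {
    "internal-tools": {"mtls", "tcp_telemetry"},
    "payments": {"mtls", "l7_authz", "retries"},
    "batch": {"mtls", "l4_authz"},
}
with_waypoint = sorted(ns for ns, feats in namespaces.items() if needs_waypoint(feats))
print("waypoints needed in:", with_waypoint)
print("ztunnel only:", sorted(set(namespaces) - set(with_waypoint)))
```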
HBONE — the In-Mesh Tunnel Protocol
In Ambient, inter-node traffic flows through HBONE (HTTP-Based Overlay Network Environment) tunnels. HBONE uses the HTTP/2 CONNECT method to encapsulate the original TCP stream over an mTLS connection, on port 15008.
```
HBONE tunnel structure (port 15008):
+-------------------------------------------------------+
| TCP (Node A ztunnel -> Node B ztunnel, dest :15008)   |
| +---------------------------------------------------+ |
| | mTLS (mutual SPIFFE identity verification)        | |
| | +-----------------------------------------------+ | |
| | | HTTP/2                                        | | |
| | | CONNECT request: target pod IP:port           | | |
| | | +-------------------------------------------+ | | |
| | | | Original application TCP byte stream      | | | |
| | | +-------------------------------------------+ | | |
| | +-----------------------------------------------+ | |
| +---------------------------------------------------+ |
+-------------------------------------------------------+
```
Thanks to HTTP/2 stream multiplexing, multiple TCP connections that share the same source and destination workload identities can ride a single mTLS connection between two ztunnels. This reduces connection counts and handshake costs. The CONNECT request carries the original destination address, and the mTLS handshake authenticates the peer's SPIFFE identity. The receiving ztunnel can therefore make L4 authorization decisions without inspecting the payload.
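A toy model makes the multiplexing concrete. It assumes that an mTLS connection to a remote ztunnel can be reused when the source identity, destination identity, and destination node all match. That is a simplification, not ztunnel's exact pooling logic. Each application connection then becomes one HTTP/2 CONNECT stream whose authority is the original pod address. The SPIFFE IDs, IP addresses, and class names are made up.

```python
from dataclasses import dataclass, field

HBONE_PORT = 15008


@dataclass
class MtlsConnection:
    """One mTLS connection to a remote ztunnel, carrying many HTTP/2 CONNECT streams."""
    peer: str                                      # "<destination node IP>:15008"
    streams: list = field(default_factory=list)    # :authority of each CONNECT stream


@dataclass
class HbonePool:
    # Assumed reuse key: (source SPIFFE ID, destination SPIFFE ID, destination node IP).
    conns: dict = field(default_factory=dict)

    def open_stream(self, src_id: str, dst_id: str, dst_node_ip: str,
                    dst_pod_addr: str) -> MtlsConnection:
        key = (src_id, dst_id, dst_node_ip)
        conn = self.conns.get(key)
        if conn is None:
            # A new mTLS handshake happens only when nothing reusable exists for this key.
            conn = MtlsConnection(peer=f"{dst_node_ip}:{HBONE_PORT}")
            self.conns[key] = conn
        # Each application TCP connection becomes one CONNECT stream to the original pod IP:port.
        conn.streams.append(dst_pod_addr)
        return conn


a = "spiffe://cluster.local/ns/shop/sa/app-a"
b = "spiffe://cluster.local/ns/shop/sa/app-b"
c = "spiffe://cluster.local/ns/shop/sa/app-c"
pool = HbonePool()
pool.open_stream(a, b, "10.0.2.7", "10.244.2.15:8080")   # first stream: new connection
pool.open_stream(a, b, "10.0.2.7", "10.244.2.16:8080")   # same key: reuses it
pool.open_stream(a, c, "10.0.2.7", "10.244.2.20:9090")   # different identity: new connection
print(len(pool.conns), "mTLS connections for 3 streams")
```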
Traffic Redirection — How It Differs from Sidecars
The sidecar model used iptables REDIRECT rules inside the pod network namespace to steer traffic into istio-proxy. In Ambient, the istio-cni node agent also installs redirection rules inside the pod network namespace. The traffic, however, goes to the node's ztunnel, which listens on sockets inside that pod's network namespace instead of running as a container in the pod. The pod spec is never modified, so workloads can join or leave the mesh without restarts. All it takes is one label on the namespace.
```bash
# Enroll a namespace into the ambient mesh (no pod restarts needed)
kubectl label namespace payments istio.io/dataplane-mode=ambient
# Remove it from the mesh
kubectl label namespace payments istio.io/dataplane-mode-
```
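Because mesh membership is driven purely by labels, a namespace's mode can be read straight from them. The helper below is hypothetical and simplified. It treats `istio-injection=enabled`, or an `istio.io/rev` label without `istio-injection=disabled`, as sidecar mode. It treats `istio.io/dataplane-mode=ambient` as ambient mode, and both at once as a conflict. Its input can come from `kubectl get namespace payments -o jsonpath='{.metadata.labels}'`, which prints the labels as a JSON object.

```python
import json


def namespace_mesh_mode(labels: dict) -> str:
    """Classify a namespace as 'ambient', 'sidecar', 'conflict', or 'none' from its labels."""
    ambient = labels.get("istio.io/dataplane-mode") == "ambient"
    injection = labels.get("istio-injection")
    sidecar = injection == "enabled" or ("istio.io/rev" in labels and injection != "disabled")
    if ambient and sidecar:
        return "conflict"   # both modes requested on one namespace
    if ambient:
        return "ambient"
    if sidecar:
        return "sidecar"
    return "none"


# Hypothetical jsonpath output for a namespace caught mid-migration.
raw = '{"istio.io/rev":"1-26","istio.io/dataplane-mode":"ambient","kubernetes.io/metadata.name":"payments"}'
print(namespace_mesh_mode(json.loads(raw)))
```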
Data Path Comparison — How Packets Flow
Request Path in Sidecar Mode
```
[Sidecar mode] app-a (Node A) -> app-b (Node B) HTTP call

app-a container
  | (1) outbound connection intercepted by iptables in the pod network namespace
  v
istio-proxy in app-a pod (Envoy)   <- L4+L7 processing (hop 1)
  | (2) mTLS encryption, routing/retries/telemetry
  v
(inter-node network, mTLS)
  |
  v
istio-proxy in app-b pod (Envoy)   <- L4+L7 processing (hop 2)
  | (3) mTLS decryption, authz checks, L7 telemetry
  v
app-b container

Proxy hops: 2 (both parse L7 for this HTTP call)
```
Request Path in Ambient Mode — L4 Only
```
[Ambient, no waypoint] app-a (Node A) -> app-b (Node B) TCP/HTTP call

app-a container
  | (1) outbound, istio-cni rules redirect to the node's ztunnel
  v
Node A ztunnel   <- L4 processing (hop 1)
  | (2) HBONE tunnel established (mTLS, HTTP/2 CONNECT, :15008)
  v
Node B ztunnel   <- L4 processing (hop 2)
  | (3) decrypt, L4 authz check, deliver to pod
  v
app-b container

Proxy hops: 2 (both lightweight L4, with no HTTP parsing)
```
Request Path in Ambient Mode — Through a Waypoint
```
[Ambient + waypoint] app-a -> app-b (waypoint in app-b namespace)

app-a container
  |
  v
Node A ztunnel ── HBONE ──> waypoint pod (Envoy)   <- L7 processing:
                              |                       HTTP routing, retries,
                              |                       L7 authz, telemetry
                              v
                            ── HBONE ──> destination node ztunnel
                                           |
                                           v
                                         app-b container

Proxy hops: 3 (ztunnel -> waypoint -> ztunnel)
Key point: the waypoint is a destination-side resource;
it enforces the destination service's policies.
```
One design principle deserves emphasis here. In Ambient, the waypoint belongs to the destination (service producer) side. In the sidecar model, the client-side proxy performed routing and retries; in Ambient, the team that owns the destination service enforces its own L7 policy in its own waypoint. Policy ownership becomes clearer, but configurations that relied on client-side policy in the sidecar world (for example, different timeouts per caller) may need to be redesigned.
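Summarizing the three paths as data makes the hop counts easy to compare. This is a descriptive model of the diagrams above, not code that inspects a live mesh. The proxy names are informal.

```python
def proxy_hops(mode: str, waypoint: bool = False) -> list:
    """(proxy, layer) pairs on a cross-node request, following the three diagrams above."""
    if mode == "sidecar":
        return [("client istio-proxy", "L4+L7"), ("server istio-proxy", "L4+L7")]
    if mode == "ambient":
        hops = [("source ztunnel", "L4")]
        if waypoint:
            # The waypoint sits on the destination side and applies that service's L7 policy.
            hops.append(("waypoint", "L7"))
        hops.append(("destination ztunnel", "L4"))
        return hops
    raise ValueError(f"unknown mode: {mode}")


for mode, wp in [("sidecar", False), ("ambient", False), ("ambient", True)]:
    hops = proxy_hops(mode, wp)
    l7_hops = sum(1 for _, layer in hops if "L7" in layer)
    print(f"{mode:8} waypoint={wp!s:5} hops={len(hops)} l7_hops={l7_hops}")
```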
Feature Parity Status — As of 2026
Since reaching GA, Ambient has closed the gap with sidecars rapidly across releases. The table below reflects the status to the extent I have verified it as of 2026 (please re-confirm the exact state in the official documentation for your version).
| Feature | Sidecar | Ambient | Notes |
|---|---|---|---|
| mTLS (in-mesh encryption) | Supported | Supported | Provided at L4 by ztunnel |
| L4 AuthorizationPolicy | Supported | Supported | Enforced in ztunnel |
| L7 AuthorizationPolicy | Supported | Supported | Requires waypoint |
| HTTP routing / canary | Supported | Supported | Requires waypoint |
| Retries / timeouts / circuit breaking | Supported | Supported | Requires waypoint |
| RequestAuthentication (JWT) | Supported | Supported | Requires waypoint |
| Multi-cluster | Mature | In progress | Ambient multi-cluster evolving through alpha/beta |
| VM workload onboarding | Supported | Limited | An area where sidecars lead |
| EnvoyFilter customization | Supported | Partial | Only partially applicable to waypoints |
| Sidecar resource (scope trimming) | Supported | Not needed | Replaced by on-demand configuration in ztunnel |
In short, the core single-cluster scenario — mTLS plus L4/L7 policy plus traffic management — has reached parity. The remaining gaps are multi-cluster maturity, VM onboarding, and deep EnvoyFilter-level customization.
Resource and Performance Comparison — Doing the Math
Let us redo the earlier 500-pod example with Ambient.
```
Assumptions:
- 500 pods, 20 nodes
- Memory per ztunnel: average 50MiB (varies with connections, L4 only)
- Namespaces that need L7 policy: 5
- Per waypoint: 2 replicas, 200MiB each (Envoy, scaled by HPA)

Total Ambient overhead:
ztunnel  = 20 nodes x 50MiB           = 1,000MiB ≈ 1GiB
waypoint = 5 ns x 2 replicas x 200MiB = 2,000MiB ≈ 2GiB
Total                                 ≈ 3GiB

Sidecar model: ≈ 58.6GiB (calculated earlier)
Savings: about 95% (3,000MiB vs 60,000MiB, for this scenario)
```
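The Ambient side of the calculation is one ztunnel per node plus waypoints only in L7 namespaces. The figures are the assumptions listed above, and the function name is illustrative.

```python
def ambient_overhead_mib(nodes: int, ztunnel_mib: int, l7_namespaces: int,
                         waypoint_replicas: int, waypoint_mib: int) -> int:
    ztunnel = nodes * ztunnel_mib                                  # one ztunnel per node
    waypoints = l7_namespaces * waypoint_replicas * waypoint_mib   # only where L7 is needed
    return ztunnel + waypoints


sidecar_mib = 500 * 120            # 500 pods x 120MiB, from the sidecar calculation
ambient_mib = ambient_overhead_mib(nodes=20, ztunnel_mib=50, l7_namespaces=5,
                                   waypoint_replicas=2, waypoint_mib=200)
savings = 1 - ambient_mib / sidecar_mib
print(f"ambient={ambient_mib}MiB sidecar={sidecar_mib}MiB savings={savings:.0%}")
```

The waypoint term grows with the number of L7 namespaces, so the savings shrink as more of the mesh needs L7 features.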
Of course, this arithmetic favors Ambient. Honesty requires examining the other direction too.
- Latency: the L4 path (ztunnel only) is as fast as or faster than sidecars. But the waypoint path has 3 hops, so it can be longer than the sidecar path (2 hops). If the waypoint lives on a different node, an extra inter-node round trip is added.
- Blast radius: with sidecars, a proxy failure is confined to its single pod. If a ztunnel dies, all mesh traffic on that node is affected. ztunnel is designed to be simple and stable for exactly this reason, but the node-level failure domain must be reflected in your operational design (PodDisruptionBudget, priority classes, monitoring).
- Noisy neighbors: traffic from multiple tenants on one node shares the same ztunnel, so a workload generating extreme traffic can affect the mesh path of other workloads on the same node.
Migration Strategy — Incremental, Namespace by Namespace
The move from sidecar to Ambient should be an incremental, namespace-scoped transition, not a big bang. Istio supports mixed operation of sidecar and Ambient workloads within the same mesh, and the two modes interoperate over mTLS.
```
Recommended transition order:

Phase 0: Pre-checks
- Upgrade Istio to a stable Ambient-capable version
- Install istio-cni and ztunnel (no impact on existing sidecar traffic)
- Inventory all EnvoyFilter usage and identify Ambient-incompatible items

Phase 1: Pick one low-risk namespace (internal tools, staging)
- Remove the sidecar injection label + rollout restart
- Apply the istio.io/dataplane-mode=ambient label
- If L7 policies existed, deploy a waypoint and re-bind the policies
- Observe for 1-2 weeks: mTLS metrics, error rates, p99 latency

Phase 2: Expand to namespaces with simple (L4-centric) traffic patterns
- Start with areas that run on ztunnel alone, without waypoints

Phase 3: Core services with complex L7 policies
- Size waypoint capacity (HPA), verify policy behavior, then switch

Phase 4: Finalize the sidecar-resident areas
- VM integration and EnvoyFilter-dependent workloads may stay on sidecars
- Document mixed operation as an official, supported state
```
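With many namespaces, the phase assignment can be derived from an inventory instead of decided case by case. The rules below are a hypothetical reading of the phases above, not an Istio tool. The attribute names and sample namespaces are made up.

```python
from dataclasses import dataclass


@dataclass
class Namespace:
    name: str
    low_risk: bool = False        # internal tools, staging
    l7_policies: bool = False     # L7 routing, L7 AuthorizationPolicy, or JWT in use
    envoyfilter: bool = False     # depends on EnvoyFilter customization
    vm_workloads: bool = False    # serves VMs onboarded into the mesh


def migration_phase(ns: Namespace) -> int:
    """Map a namespace onto phases 1-4 from the list above (simplified rules)."""
    if ns.envoyfilter or ns.vm_workloads:
        return 4            # may stay on sidecars; document mixed operation
    if ns.low_risk:
        return 1            # pilot namespace, with a waypoint if it had L7 policies
    if not ns.l7_policies:
        return 2            # ztunnel alone is enough
    return 3                # needs waypoint sizing and policy verification first


inventory = [
    Namespace("staging-tools", low_risk=True),
    Namespace("batch"),
    Namespace("payments", l7_policies=True),
    Namespace("legacy-gw", l7_policies=True, envoyfilter=True),
]
plan = sorted(inventory, key=lambda ns: (migration_phase(ns), ns.name))
for ns in plan:
    print(migration_phase(ns), ns.name)
```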
The most common mistake during migration is applying the ambient label while the sidecar injection label is still present. Never enable both modes on one namespace at the same time. Your migration scripts must enforce the label order (remove injection, restart, then enroll) and verify the resulting state for every namespace.
```bash
# Check state before switching: the two labels must not coexist
kubectl get namespace payments -o jsonpath='{.metadata.labels}'
# Disable sidecar injection (remove the revision label if you use one)
kubectl label namespace payments istio-injection-
# Remove sidecars from pods
kubectl rollout restart deployment -n payments
# After the restart completes, enroll into ambient
kubectl label namespace payments istio.io/dataplane-mode=ambient
```
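You can generate the sequence above per namespace so that code enforces the order instead of people doing it by hand. The sketch only builds the kubectl argument lists and does not execute them. It refuses to continue if the ambient label is already present. By default it restarts only Deployments, so other controllers such as StatefulSets would need to be added. The function name is hypothetical.

```python
import shlex


def migration_commands(ns: str, labels: dict, kinds: tuple = ("deployment",)) -> list:
    """Return the sidecar -> ambient switch for one namespace as argv lists, in order."""
    if labels.get("istio.io/dataplane-mode") == "ambient":
        raise ValueError(f"{ns} already carries the ambient label; fix the labels first")
    cmds = []
    # 1) Disable sidecar injection by removing whichever injection label is present.
    for key in ("istio-injection", "istio.io/rev"):
        if key in labels:
            cmds.append(["kubectl", "label", "namespace", ns, f"{key}-"])
    # 2) Restart workloads so they come back without istio-proxy.
    for kind in kinds:
        cmds.append(["kubectl", "rollout", "restart", kind, "-n", ns])
    # 3) Enroll into ambient; the caller must wait for the restarts to finish first.
    cmds.append(["kubectl", "label", "namespace", ns, "istio.io/dataplane-mode=ambient"])
    return cmds


for argv in migration_commands("payments", {"istio-injection": "enabled"}):
    print(shlex.join(argv))
```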
Hands-On Installation
Installing the Ambient Profile with Helm
```bash
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

# 1) CRDs and base
helm install istio-base istio/base -n istio-system --create-namespace --wait

# 2) istiod with the ambient profile
helm install istiod istio/istiod -n istio-system \
  --set profile=ambient --wait

# 3) CNI node agent (handles traffic redirection)
helm install istio-cni istio/cni -n istio-system \
  --set profile=ambient --wait

# 4) ztunnel DaemonSet
helm install ztunnel istio/ztunnel -n istio-system --wait
```
If you prefer istioctl, a single line works too.
```bash
istioctl install --set profile=ambient --skip-confirmation
```
Deploying a Waypoint
A waypoint is declared as a Gateway resource from the Kubernetes Gateway API, so the Gateway API CRDs must be present in the cluster. Use the istioctl helper or apply YAML directly.
```bash
# Create a namespace waypoint and enroll the namespace's traffic onto it
istioctl waypoint apply -n payments --enroll-namespace
```

```yaml
# Declarative equivalent of the waypoint created above
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: payments
  labels:
    istio.io/waypoint-for: service   # service | workload | all
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
```

```bash
# Label the namespace to use this waypoint (same effect as --enroll-namespace)
kubectl label ns payments istio.io/use-waypoint=waypoint
```
If you want only specific services to go through the waypoint, put the istio.io/use-waypoint label on the Service object. Finer-grained bindings take precedence over the namespace label.
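The precedence rule can be sketched as a lookup in which a Service-level `istio.io/use-waypoint` label overrides the namespace-level one. This simplified model ignores workload-level bindings and any other opt-out mechanisms. The service and waypoint names are hypothetical.

```python
from typing import Optional

USE_WAYPOINT = "istio.io/use-waypoint"


def effective_waypoint(namespace_labels: dict, service_labels: dict) -> Optional[str]:
    """Waypoint used by a Service: its own label wins, else the namespace label, else none."""
    return service_labels.get(USE_WAYPOINT) or namespace_labels.get(USE_WAYPOINT)


ns_labels = {USE_WAYPOINT: "waypoint"}
services = {
    "checkout": {USE_WAYPOINT: "checkout-waypoint"},   # dedicated waypoint (hypothetical)
    "ledger": {},                                      # inherits the namespace waypoint
}
for name, labels in services.items():
    print(name, "->", effective_waypoint(ns_labels, labels))
```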
The Relationship with Cilium — Conflict or Coexistence
Whether Ambient can run on a cluster that uses Cilium as its CNI is a frequent question. The short answer: yes, but you must divide responsibilities clearly.
- CNI compatibility: istio-cni hooks into the CNI plugin chain, so it works on top of Cilium. However, there are known cases where Cilium settings must be adjusted for compatibility (for example, restricting socket-based load balancing to the host namespace — socketLB.hostNamespaceOnly), so check the compatibility documentation of both projects for your versions.
- Overlapping areas: Cilium also offers its own transport encryption (WireGuard/IPsec), L4 network policy, and L7 visibility (Hubble). These overlap with Istio Ambient, so your team must standardize on one answer to "which layer owns encryption" and "is L4 policy a NetworkPolicy or an AuthorizationPolicy" — otherwise you will descend into double-policy debugging hell.
- A practical division: Cilium handles CNI, network policy, and node-level observability; Istio Ambient handles workload-identity-based mTLS and L7 traffic management. This layered split has been the smoothest arrangement in the field.
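Before enabling Ambient on a Cilium cluster, it can help to lint the Cilium Helm values for the two points raised above: socket-based load balancing and overlapping transport encryption. The sketch works on an already-parsed values dictionary. `socketLB.hostNamespaceOnly`, `encryption.enabled`, and `encryption.type` are Cilium Helm value names, but the check itself is hypothetical and not exhaustive. Confirm the required settings in both projects' compatibility documentation for your versions.

```python
def check_cilium_values(values: dict) -> list:
    """Flag Cilium Helm values that need a decision when Istio Ambient runs on top."""
    findings = []
    if not values.get("socketLB", {}).get("hostNamespaceOnly", False):
        findings.append("set socketLB.hostNamespaceOnly=true for Istio compatibility")
    encryption = values.get("encryption", {})
    if encryption.get("enabled", False):
        kind = encryption.get("type", "transparent")
        findings.append(f"Cilium {kind} encryption overlaps ztunnel mTLS; pick one owner")
    return findings


# Hypothetical values, as they might be parsed from a Cilium values.yaml.
values = {"encryption": {"enabled": True, "type": "wireguard"}}
for finding in check_cilium_values(values):
    print("-", finding)
```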
When Sidecars Are Still the Right Choice
Even as Ambient becomes the default, sidecars remain reasonable under the following conditions.
- Workloads with deep EnvoyFilter customization — Lua/WASM filters and non-standard protocol handling are freer on the sidecar side.
- Environments where VM workloads must be first-class mesh citizens — VM onboarding is more mature in the sidecar model.
- When pod-level fault isolation is a contractual requirement — some organizations cannot accept a node-shared proxy (ztunnel) under their regulatory or audit constraints.
- Architectures deeply dependent on client-side L7 policy — when per-caller routing/retry logic cannot be redesigned in the short term.
- When you already operate advanced multi-cluster topologies — verify per version whether Ambient multi-cluster has reached the maturity your requirements demand.
Decision Tree
```
Building a new mesh?
├─ Yes → Any EnvoyFilter / VM / advanced multi-cluster requirements?
│        ├─ No  → Start with Ambient (recommended default)
│        └─ Yes → Sidecars only for those workloads, Ambient elsewhere
└─ No (existing sidecar mesh in operation)
         → Is sidecar pain (resources/upgrades) significant?
           ├─ Yes → Begin incremental namespace-by-namespace migration
           │        L4-centric namespaces first → L7 namespaces after
           │        validating waypoints
           └─ No  → Keep as is + new namespaces go Ambient
                    (gradual convergence)
```
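The tree is small enough to encode directly, which is handy if you want to run it over an inventory of clusters or teams. The parameter names are hypothetical condensations of the questions above.

```python
def recommend(new_mesh: bool, sidecar_only_needs: bool = False,
              sidecar_pain: bool = False) -> str:
    """Walk the decision tree above.

    sidecar_only_needs: EnvoyFilter, VM, or advanced multi-cluster requirements exist.
    sidecar_pain: resource/upgrade pain in an existing sidecar mesh is significant.
    """
    if new_mesh:
        if not sidecar_only_needs:
            return "Start with Ambient"
        return "Sidecars for those workloads, Ambient elsewhere"
    if sidecar_pain:
        return "Incremental migration: L4-centric namespaces first, L7 after waypoint validation"
    return "Keep sidecars; new namespaces go Ambient"


print(recommend(new_mesh=True))
print(recommend(new_mesh=False, sidecar_pain=True))
```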
Troubleshooting
Checking ztunnel State and Logs
```bash
# Check ztunnel pods
kubectl get pods -n istio-system -l app=ztunnel -o wide

# Logs of a specific node's ztunnel: enrollment / HBONE connection events
kubectl logs -n istio-system ztunnel-abcde --tail=100

# Workloads known to ztunnel (istioctl)
istioctl ztunnel-config workloads
istioctl ztunnel-config services
istioctl ztunnel-config certificates   # workload certificate issuance state
```
In the ztunnel-config workloads output, a PROTOCOL column value of HBONE means the workload is enrolled in the mesh; TCP means plaintext passthrough (outside the mesh). Problems of the "I applied the label but mTLS is not happening" variety are usually decided right here.
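When the table is long, filtering it for TCP rows is quicker than reading it. The sketch below parses captured output, for example from `istioctl ztunnel-config workloads > workloads.txt`. The sample table is illustrative. The column layout it relies on (namespace and pod name first, PROTOCOL last) is an assumption to verify against your istioctl version.

```python
SAMPLE = """\
NAMESPACE POD NAME      ADDRESS    NODE   WAYPOINT PROTOCOL
payments  api-7d9f      10.244.1.5 node-a waypoint HBONE
payments  report-job-x1 10.244.1.9 node-a None     TCP
orders    web-66b1      10.244.2.3 node-b None     HBONE
"""


def plaintext_workloads(table: str, namespace=None) -> list:
    """(namespace, pod) pairs whose PROTOCOL is TCP, i.e. not enrolled in the mesh.

    Assumes a header row, then rows that start with NAMESPACE and POD NAME and
    end with PROTOCOL.
    """
    rows = [line.split() for line in table.strip().splitlines()[1:] if line.strip()]
    return [(r[0], r[1]) for r in rows
            if r[-1] == "TCP" and (namespace is None or r[0] == namespace)]


print(plaintext_workloads(SAMPLE, namespace="payments"))
```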
istioctl analyze and Common Symptoms
```bash
# Configuration consistency diagnosis: detects mode mixing, unbound waypoints, etc.
istioctl analyze -n payments

# Verify the waypoint received policies/routes (waypoints are Envoy, so proxy-config works)
istioctl proxy-config routes deploy/waypoint -n payments
```
| Symptom | Common cause | How to verify |
|---|---|---|
| Plaintext traffic after enrollment | Typo in dataplane-mode label, istio-cni not installed | PROTOCOL column in ztunnel-config workloads |
| L7 policy not taking effect | Waypoint not deployed or use-waypoint not bound | Gateway resource status, istioctl analyze |
| Intermittent connection drops | ztunnel restarts (OOM), missing PDB | ztunnel restart count, resource usage |
| Job pods hanging | (Not applicable in Ambient — check if the ns still runs sidecars) | Namespace mode labels |
| Sidecar/ambient confusion | Injection label and ambient label coexisting on one ns | Full audit of ns labels |
Adoption Checklist
- Confirmed the Istio version is a stable post-GA Ambient release
- Verified istio-cni compatibility settings with the existing CNI (Cilium etc.) against version docs
- Inventoried all EnvoyFilter, VM onboarding, and multi-cluster dependencies
- Built a procedure that applies namespace mode labels in a fixed order and verifies the result
- Identified namespaces needing L7 policy and sized waypoint capacity (HPA)
- Set PodDisruptionBudget and monitoring (restarts, memory) for ztunnel
- Validated mTLS interoperation during mixed operation in staging
- Documented per-phase observation metrics (error rate, p99, mTLS ratio) and rollback procedures
- Reviewed whether a node-shared proxy meets security and audit requirements
Closing Thoughts
The sidecar model popularized the service mesh, but its cost structure — a full L7 proxy on every pod — was also the biggest barrier to mesh adoption. Ambient splits L4 (ztunnel) from L7 (waypoint), turning that cost into pay-for-what-you-use, and adds the operational benefits of restart-free mesh enrollment and infrastructure-layer upgrades.
The realistic conclusion for 2026 is this: treat Ambient as the default candidate for new deployments, but do not fear mixed operation if you have sidecar-favored areas such as EnvoyFilter, VMs, or advanced multi-cluster. For existing meshes, migrate slowly, namespace by namespace, watching your metrics. Architecture selection here is less a single decision and more a per-workload classification exercise.
References
- Istio Ambient Mode Official Documentation
- Istio Ambient Architecture Overview
- Istio ztunnel Traffic Redirection Explained
- Istio Waypoint Proxy Guide
- Istio Ambient Installation Guide (Helm)
- Migrating from Sidecars to Ambient in Istio
- Kubernetes Gateway API Specification
- Cilium Official Documentation
- SPIFFE Standard
- Envoy Proxy Official Documentation