Istio Ambient Mesh Internals: Service Mesh Without Sidecars

Introduction

Istio Ambient Mesh is a new data plane mode that provides service mesh capabilities without sidecar proxies. It was designed to address the fundamental limitations of the traditional sidecar approach.

This post provides an in-depth analysis of Ambient Mesh's internal architecture, core components, and differences from the sidecar approach.

Limitations of the Sidecar Approach

Resource Overhead

Traditional Sidecar Mode:
┌──────────────────────────────────┐
│ Pod                              │
│  ┌───────────┐  ┌─────────────┐  │
│  │    App    │  │ istio-proxy │  │  ← ~100MB RAM, 0.1 CPU per pod
│  │           │  │   (Envoy)   │  │
│  └───────────┘  └─────────────┘  │
└──────────────────────────────────┘

100 pods = ~10GB additional memory

Startup Latency

Sidecar injection impacts pod startup time:

  • istio-init container sets up iptables rules (1-2 seconds)
  • istio-proxy waits for configuration from istiod (2-5 seconds)
  • Adds 3-7 seconds to total startup time

Upgrade Complexity

Sidecar upgrades require restarting all pods:

  • Rolling update recreates pods one by one
  • Takes hours in large clusters
  • Impacts application availability

Ambient Mesh Architecture

┌───────────────────────────────────────────┐
│           Control Plane: istiod           │
└─────────────────────┬─────────────────────┘
                      │ xDS
          ┌───────────┼───────────┐
          ▼           ▼           ▼
  ┌────────────┐ ┌────────────┐ ┌────────────┐
  │  ztunnel   │ │  ztunnel   │ │  waypoint  │
  │  (Node 1)  │ │  (Node 2)  │ │   proxy    │
  │  DaemonSet │ │  DaemonSet │ │  (per-ns)  │
  └──────┬─────┘ └──────┬─────┘ └──────┬─────┘
         │              │              │
  ┌──────┴─────┐ ┌──────┴─────┐        │
  │   Pod A    │ │   Pod B    │        │
  │(no sidecar)│ │(no sidecar)│        │
  └────────────┘ └────────────┘        │
         │              │              │
         └──── HBONE tunnel ───────────┘

ztunnel: Per-Node L4 Proxy

Overview

ztunnel (Zero Trust Tunnel) is a lightweight L4 proxy deployed as a DaemonSet on each node:

  • Written in Rust: High performance and low memory usage
  • L4 only: Handles TCP-level processing exclusively
  • One per node: Handles all pod traffic on that node

Core ztunnel Functions

ztunnel functions:
├── 1. mTLS encryption/decryption
│   ├── Workload certificate management (SDS)
│   └── Encrypt all pod-to-pod communication
├── 2. L4 authorization
│   ├── AuthorizationPolicy (L4 rules only)
│   └── Source/destination identity-based access control
├── 3. HBONE tunneling
│   ├── HTTP/2 CONNECT-based tunnel
│   └── Listens on port 15008
└── 4. L4 telemetry
    ├── TCP connection metrics
    └── Byte transfer tracking
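
The L4 authorization step can be expressed with a standard AuthorizationPolicy that uses only L4 fields (source identity and destination port), so ztunnel can enforce it without a waypoint. A sketch reusing the prod/frontend/reviews identities that appear later in this post (names are illustrative):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-reviews
  namespace: prod
spec:
  # Applies to pods labeled app=reviews on this node's ztunnel
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
    - from:
        - source:
            # SPIFFE identity of the frontend ServiceAccount
            principals:
              - cluster.local/ns/prod/sa/frontend
      to:
        - operation:
            # L4-only match: destination port, no HTTP fields
            ports: ["9080"]
```

If the policy used L7 fields such as HTTP methods or paths, ztunnel could not evaluate it and a waypoint proxy would be required.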

ztunnel Internal Operation

[Outbound Traffic Processing]

App (Pod A, Node 1) → reviews:9080
[1] Traffic interception (eBPF or iptables)
Redirect traffic to local ztunnel
[2] ztunnel determines destination
    ├── Kubernetes Service resolution
    └── Locate destination Pod node
[3] Establish HBONE tunnel
    ├── mTLS connection to destination node ztunnel
    ├── Create tunnel via HTTP/2 CONNECT
    └── Use port 15008
[4] Send encrypted traffic

[Inbound Traffic Processing]

[5] Destination node ztunnel receives HBONE tunnel
[6] mTLS decryption and authorization check
    ├── Verify source SPIFFE ID
    └── Evaluate AuthorizationPolicy
[7] Forward as plaintext to destination Pod

ztunnel Certificate Management

ztunnel manages certificates for all pods on its node:

ztunnel (Node 1)
├── Pod A cert: spiffe://cluster.local/ns/prod/sa/frontend
├── Pod B cert: spiffe://cluster.local/ns/prod/sa/reviews
└── Pod C cert: spiffe://cluster.local/ns/prod/sa/ratings

Certificate acquisition:
1. ztunnel requests certificates for each workload from istiod
2. Authenticates via Kubernetes ServiceAccount token
3. Receives and renews certificates via SDS

Waypoint Proxy: Per-Namespace L7 Proxy

Overview

Waypoint proxy is an Envoy-based proxy deployed only when L7 functionality is needed:

When only L4 is needed:
Pod → ztunnel → ztunnel → Pod
(mTLS + L4 authz only)

When L7 is needed:
Pod → ztunnel → waypoint proxy → ztunnel → Pod
(HTTP routing, L7 authz, fault injection, etc.)
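
As a concrete example of the L7 path, a weight-based split can be declared with a VirtualService, which the waypoint (not ztunnel) evaluates. A sketch with hypothetical v1/v2 subsets; a matching DestinationRule defining those subsets is assumed:

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: reviews
  namespace: production
spec:
  hosts:
    - reviews
  http:
    - route:
        # 90/10 canary split, enforced by the waypoint proxy
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
```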

Deploying Waypoint Proxy

Create a waypoint proxy using the Kubernetes Gateway API:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: production
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE

Or using istioctl:

istioctl waypoint apply --namespace production

Functions Handled by Waypoint Proxy

Waypoint Proxy (Envoy):
├── L7 Traffic Management
│   ├── VirtualService (HTTP routing)
│   ├── Weight-based traffic splitting
│   ├── Retries, timeouts
│   ├── Fault injection
│   └── Traffic mirroring
├── L7 Security
│   ├── AuthorizationPolicy (L7 rules)
│   ├── RequestAuthentication (JWT)
│   └── HTTP header/path-based authorization
└── L7 Observability
    ├── HTTP metrics (istio_requests_total, etc.)
    ├── Distributed tracing spans
    └── Access logging
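
For instance, an L7 AuthorizationPolicy is bound to the waypoint rather than to individual pods; in recent Istio releases this is done with targetRefs pointing at the waypoint Gateway. A sketch (policy name and identities are illustrative):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: reviews-viewer
  namespace: production
spec:
  # Attach to the waypoint Gateway so Envoy enforces L7 rules
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: waypoint
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/production/sa/frontend
      to:
        - operation:
            # L7 fields: only readable by an Envoy-based waypoint
            methods: ["GET"]
            paths: ["/reviews/*"]
```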

HBONE Tunneling Protocol

HBONE Overview

HBONE (HTTP-Based Overlay Network Environment) is the core communication protocol of Ambient Mesh:

HBONE Tunnel Structure:
┌──────────────────────────────────────┐
│ TCP Connection (port 15008)          │
│  ┌────────────────────────────────┐  │
│  │ TLS (mTLS)                     │  │
│  │  ┌──────────────────────────┐  │  │
│  │  │ HTTP/2 CONNECT           │  │  │
│  │  │  ┌────────────────────┐  │  │  │
│  │  │  │ Tunneled TCP Data  │  │  │  │
│  │  │  └────────────────────┘  │  │  │
│  │  └──────────────────────────┘  │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

HBONE vs Sidecar Comparison

Feature                 Sidecar (iptables)         HBONE (Ambient)
---------------------   ------------------------   ---------------------
Traffic interception    iptables REDIRECT          eBPF or iptables
Proxy location          Inside pod                 Node level (ztunnel)
Tunneling               None (direct connection)   HTTP/2 CONNECT
mTLS endpoint           Sidecar proxy              ztunnel
Port                    Original service port      15008 (HBONE)
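
On the wire, each tunneled connection is opened as an HTTP/2 CONNECT stream inside the mTLS session, with the original destination carried in the :authority pseudo-header. A simplified sketch (addresses are illustrative):

```
Request  (ztunnel → ztunnel, HTTP/2 stream on port 15008)
  :method:    CONNECT
  :authority: 10.0.1.5:9080    ← original destination pod IP:port

Response
  :status:    200              ← tunnel established; raw TCP bytes
                                 now flow on this stream
```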

HBONE Connection Flow

Source Pod (Node 1)          Destination Pod (Node 2)
     │                              ▲
     ▼                              │
ztunnel (Node 1)              ztunnel (Node 2)
     │                              ▲
     │  TCP: Node1 → Node2:15008    │
     │  TLS: mTLS handshake         │
     │  HTTP/2: CONNECT method      │
     └──────────────────────────────┘

Traffic Interception

eBPF-Based Interception

eBPF programs redirect traffic at the kernel level:

App (socket connect) → eBPF hook → ztunnel
                TC (Traffic Control) or
                Socket-level redirection

Benefits:
- Higher performance than iptables
- Operates at kernel level with minimal overhead
- No per-pod iptables rules needed

iptables-Based Interception (Fallback)

For environments that do not support eBPF, iptables is used:

istio-cni sets up node-level iptables rules (simplified illustration):

# Inbound: redirect unmarked TCP traffic to ztunnel's HBONE port
iptables -t nat -A PREROUTING \
  -p tcp \
  -m mark ! --mark 0x539/0xfff \
  -j REDIRECT --to-ports 15008

# Outbound: redirect unmarked TCP traffic to ztunnel's outbound port
iptables -t nat -A OUTPUT \
  -p tcp \
  -m mark ! --mark 0x539/0xfff \
  -j REDIRECT --to-ports 15001

Workload Enrollment

Enabling Ambient Mesh

Add a label to a namespace to enroll in Ambient Mesh:

kubectl label namespace production istio.io/dataplane-mode=ambient

When this label is added:

  1. Pod traffic in the namespace is redirected to ztunnel
  2. ztunnel manages certificates for those pods
  3. mTLS is applied to all pod-to-pod communication
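
The same enrollment can be kept declarative in the Namespace manifest instead of applied imperatively:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio.io/dataplane-mode: ambient
```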

Enabling L7 Features

When L7 features are needed, additionally deploy a waypoint proxy:

# Deploy waypoint proxy in namespace
istioctl waypoint apply --namespace production

# Or apply only for a specific service account
istioctl waypoint apply --namespace production \
  --service-account reviews

Performance Comparison: Sidecar vs Ambient

Memory Usage

Sidecar mode (100 pods):
- Envoy proxies: 100 x ~100MB = ~10GB
- istio-init: used only at startup

Ambient mode (100 pods, 3 nodes):
- ztunnel: 3 x ~50MB = ~150MB
- waypoint (if needed): 1-2 x ~100MB = ~200MB
- Total: ~350MB (~97% reduction vs sidecar)
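
The figures above are back-of-the-envelope estimates, not measurements; the arithmetic can be checked directly:

```shell
# Estimated memory footprint: sidecar vs ambient (figures from this post)
PODS=100; NODES=3
SIDECAR_MB=$((PODS * 100))             # ~100 MB Envoy sidecar per pod
AMBIENT_MB=$((NODES * 50 + 2 * 100))   # ztunnel per node + 2 waypoints
echo "sidecar: ${SIDECAR_MB} MB, ambient: ${AMBIENT_MB} MB"
```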

Latency

Sidecar mode:
Client App → Client Envoy → Network → Server Envoy → Server App
             (L4+L7)                  (L4+L7)

Ambient mode (L4 only):
Client App → ztunnel → Network → ztunnel → Server App
             (L4)                 (L4)

Ambient mode (with L7):
Client App → ztunnel → waypoint → ztunnel → Server App
             (L4)      (L7)       (L4)

Ambient shows lower latency when using L4 only. When L7 is needed, an extra hop (waypoint) is added, but resource efficiency improves significantly.

Startup Time

Sidecar mode:
- istio-init iptables setup: 1-2 seconds
- Envoy startup and config reception: 2-5 seconds
- Total additional time: 3-7 seconds

Ambient mode:
- ztunnel already running (DaemonSet)
- Additional overhead at pod start: nearly zero
- eBPF rule application: milliseconds

Limitations and Current Status

Current Limitations

  1. Protocol detection: ztunnel handles L4 only, so automatic protocol detection is limited
  2. Some Envoy features unsupported: EnvoyFilter, some advanced traffic management features
  3. Multi-cluster: Some scenarios not fully supported yet
  4. Windows nodes: Currently unsupported

Migration from Sidecar to Ambient

Recommended migration order:

1. Install Ambient mode (can coexist with sidecars)

2. Enable Ambient on test namespace
   kubectl label namespace test istio.io/dataplane-mode=ambient
   kubectl label namespace test istio-injection-  # disable sidecar

3. After validation, apply to production namespaces

4. Deploy waypoint proxy for namespaces that need it

5. Remove existing sidecar labels

When to Choose Which Mode

Sidecar mode is suitable when:
├── Fine-grained per-pod L7 control is needed
├── Advanced configuration using EnvoyFilter
├── Maintaining an existing stable environment
└── Using Windows nodes

Ambient mode is suitable when:
├── Resource efficiency is important
├── Fast pod startup is needed
├── Minimizing sidecar upgrade burden
├── Mostly needing L4 security (mTLS) only
└── Gradually adding L7 features as needed

Conclusion

Istio Ambient Mesh is an innovative approach to solving the biggest complaint about service meshes: sidecar overhead. By separating functionality into ztunnel (L4) and waypoint proxy (L7):

  1. Resource savings: A single ztunnel per node handles mTLS for hundreds of pods
  2. Operational simplification: No sidecar injection/upgrade needed
  3. Gradual adoption: Start with L4 and add L7 features when needed

In the next post, we will explore the internals of Istio observability.