Complete Guide to Kubernetes Network Policy and Service Mesh (Istio, Cilium, Calico Comparison)
- Introduction
- Kubernetes Network Policy Basics
- Advanced Network Policy: Egress, CIDR, and Port Control
- Service Mesh Architecture Comparison (Istio vs Cilium vs Calico)
- Building Service Mesh with Istio
- Cilium eBPF-based Service Mesh
- mTLS and Zero Trust Networking
- Operational Considerations and Troubleshooting
- Failure Cases and Recovery Procedures
- Performance Benchmarks and Selection Guide
- Conclusion
- References

Introduction
In a Kubernetes cluster, Pod-to-Pod communication is allow-all by default. This means any Pod can freely access any other Pod within the same cluster. While this isn't a major issue in small-scale development environments, it becomes a serious security threat in production environments where dozens or hundreds of microservices are running.
If an attacker compromises a single Pod, lateral movement to all services within the cluster becomes possible. To prevent this, Network Policy for network segmentation and Service Mesh for mTLS encryption and zero trust architecture are essential.
This article covers Kubernetes Network Policy from basics to advanced topics, and provides a comparative analysis of three major Service Mesh solutions: Istio, Cilium, and Calico. It includes real-world troubleshooting cases and performance benchmarks to help you make the best choice for your environment.
Kubernetes Network Policy Basics
What is Network Policy?
Network Policy is a Kubernetes-native resource that acts as a firewall rule controlling inbound (Ingress) and outbound (Egress) traffic at the Pod level. It operates based on label selectors and provides consistent policy enforcement even when Pods are restarted or moved between nodes.
Important prerequisite: Even if you create Network Policy resources, the policies will not take effect without a CNI plugin (Calico, Cilium, Antrea, etc.) that implements them. Default kubenet and Flannel do not support Network Policy.
Default Deny Policy
The starting point for all network security is the Default Deny policy. First block all traffic, then apply a whitelist approach that explicitly allows only necessary communication.
# default-deny-all.yaml
# Block all Ingress/Egress traffic for all Pods in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {} # Empty selector = all Pods in namespace
  policyTypes:
    - Ingress
    - Egress
Once this policy is applied, all Pods in the production namespace will have both inbound and outbound traffic completely blocked. Since DNS lookups will also fail, you must apply a DNS allow policy alongside it.
# allow-dns.yaml
# Policy to allow access to kube-dns (CoreDNS)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Allowing Specific Pod-to-Pod Communication
Here's an example of allowing frontend access to the backend API from a Default Deny state.
# allow-frontend-to-backend.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
This policy allows only inbound traffic from Pods with the app: frontend label to TCP port 8080 of app: backend-api Pods.
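The matching logic can be sketched as a toy Python evaluator (the `matches` and `is_allowed` helpers are illustrative only; real enforcement happens in the CNI data plane):

```python
# Toy model of how allow-frontend-to-backend is evaluated.
# This only illustrates podSelector + ports matching semantics.

def matches(selector: dict, labels: dict) -> bool:
    """An empty selector matches every Pod; otherwise all labels must match."""
    return all(labels.get(k) == v for k, v in selector.items())

def is_allowed(src_labels: dict, dst_labels: dict, port: int) -> bool:
    target = {"app": "backend-api"}     # spec.podSelector
    allowed_from = {"app": "frontend"}  # ingress[0].from[0].podSelector
    allowed_port = 8080                 # ingress[0].ports[0].port
    if not matches(target, dst_labels):
        return True  # policy does not select this Pod, so it imposes nothing
    return matches(allowed_from, src_labels) and port == allowed_port

print(is_allowed({"app": "frontend"}, {"app": "backend-api"}, 8080))  # True
print(is_allowed({"app": "attacker"}, {"app": "backend-api"}, 8080))  # False
print(is_allowed({"app": "frontend"}, {"app": "backend-api"}, 9090))  # False
```

Note that a Pod not selected by any policy remains allow-all; the restriction applies only to Pods the `podSelector` matches (combined with the Default Deny policy above, that covers every Pod in the namespace).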
Advanced Network Policy: Egress, CIDR, and Port Control
Controlling External Access with Egress Policy
When microservices need to access external APIs or databases, Egress policies can precisely restrict allowed targets.
# egress-external-api-and-db.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Egress
  egress:
    # 1. Allow access to Redis in the same namespace
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # 2. Allow access to external PostgreSQL RDS (CIDR-based)
    - to:
        - ipBlock:
            cidr: 10.100.0.0/16
      ports:
        - protocol: TCP
          port: 5432
    # 3. Allow external HTTPS API access
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
    # 4. Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
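The `ipBlock` with `except` in rule 3 can be modeled with Python's standard `ipaddress` module (a toy check of that one rule, not a full policy evaluator):

```python
import ipaddress

# Toy check of egress rule 3: allow 0.0.0.0/0 on port 443,
# except the RFC 1918 private ranges listed in `except`.
ALLOWED = ipaddress.ip_network("0.0.0.0/0")
EXCEPTED = [ipaddress.ip_network(c) for c in
            ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def egress_allowed(dst_ip: str, port: int) -> bool:
    ip = ipaddress.ip_address(dst_ip)
    if port != 443:
        return False  # rule 3 only covers HTTPS
    if any(ip in net for net in EXCEPTED):
        return False  # carved out by `except`
    return ip in ALLOWED

print(egress_allowed("93.184.216.34", 443))  # public HTTPS endpoint -> True
print(egress_allowed("10.0.5.9", 443))       # private range -> False
```

In the full policy, traffic denied by rule 3 can still be allowed by another rule: for example, 10.100.0.0/16 falls inside the excluded 10.0.0.0/8, but rule 2 still permits it on port 5432 because egress rules are unioned.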
Cross-Namespace Communication Control
Namespace isolation is essential in multi-tenant environments. Here's a pattern that allows access from specific namespaces only.
# cross-namespace-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-access
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
  ingress:
    # Allow metrics scraping only from Prometheus in the monitoring namespace
    - from:
        - namespaceSelector:
            matchLabels:
              team: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090
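A subtle point in this manifest: because `podSelector` sits inside the same `from` entry as `namespaceSelector`, the two are ANDed (Prometheus Pods in monitoring namespaces only). Listing them as two separate `from` entries would OR them, which is far more permissive. A toy sketch of the difference (selector values hardcoded for illustration):

```python
# AND vs OR in NetworkPolicy `from` entries:
#   - namespaceSelector: {team: monitoring}
#     podSelector:       {app: prometheus}   <- one entry, ANDed
# versus
#   - namespaceSelector: {team: monitoring}
#   - podSelector:       {app: prometheus}   <- two entries, ORed

def matches(selector: dict, labels: dict) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())

def allowed_and(ns_labels: dict, pod_labels: dict) -> bool:
    return matches({"team": "monitoring"}, ns_labels) and \
           matches({"app": "prometheus"}, pod_labels)

def allowed_or(ns_labels: dict, pod_labels: dict) -> bool:
    # Note: a bare podSelector in `from` only matches Pods in the policy's
    # own namespace; this sketch models labels only, for illustration.
    return matches({"team": "monitoring"}, ns_labels) or \
           matches({"app": "prometheus"}, pod_labels)

# A Grafana Pod in the monitoring namespace:
print(allowed_and({"team": "monitoring"}, {"app": "grafana"}))  # False
print(allowed_or({"team": "monitoring"}, {"app": "grafana"}))   # True
```

A stray indentation change is enough to flip one form into the other, so it is worth diffing rendered policies (`kubectl get networkpolicy -o yaml`) after applying them.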
Limitations of Network Policy
The native Kubernetes NetworkPolicy API supports only L3/L4-level (IP, port, protocol) control. The following requirements cannot be addressed with it alone:
- L7 (HTTP path, method, header) based filtering
- mTLS encryption and service authentication
- Traffic observability and distributed tracing
- Advanced traffic management (canary deployments, circuit breakers, retries)
- FQDN (domain) based Egress control
Service Mesh comes into play when these advanced features are needed.
Service Mesh Architecture Comparison (Istio vs Cilium vs Calico)
Here's a comparison of the architecture and features of three major solutions.
| Category | Istio (Ambient Mode) | Cilium Service Mesh | Calico (Enterprise) |
|---|---|---|---|
| Data Plane | ztunnel(L4) + Waypoint Proxy(L7) | eBPF(L3/L4) + per-node Envoy(L7) | iptables/eBPF + Envoy(L7) |
| Sidecar | Not required (Ambient Mode) | Not required | Optional |
| mTLS | Automatic (HBONE protocol) | WireGuard/IPsec | WireGuard manual setup |
| L7 Policy | AuthorizationPolicy | CiliumNetworkPolicy | GlobalNetworkPolicy |
| Observability | Kiali, Jaeger, Prometheus | Hubble (built-in) | Calico Enterprise UI |
| Performance Overhead | Medium (via ztunnel) | Low (kernel level) | Medium |
| CPU Usage | Moderate | 30% less (L4 baseline) | Moderate |
| QPS Performance | High (excellent at low connections) | High (excellent at high connections) | Moderate |
| Multi-cluster | Supported (East-West Gateway) | Cluster Mesh supported | Federation supported |
| Learning Curve | High | Medium | Medium |
| Community | Very large (CNCF Graduated) | Large (CNCF Graduated) | Large (Tigera-led) |
| Windows Nodes | Not supported | Not supported | Supported |
| Best Use Case | Large multi-cluster, precise L7 control | High-performance L4, eBPF-based observability | Hybrid environments, enterprise compliance |
Architecture Selection Criteria
- Only L3/L4 network security needed: Kubernetes basic Network Policy + Calico/Cilium CNI
- L7 traffic management + mTLS is key: Istio Ambient Mode
- High performance + kernel-level observability: Cilium Service Mesh
- Enterprise compliance + hybrid: Calico Enterprise
Building Service Mesh with Istio
Installing Istio Ambient Mode
Istio Ambient Mode became GA starting from Istio 1.24. It provides mTLS and L7 traffic management while reducing CPU/memory overhead by over 90% compared to the traditional sidecar approach.
# Install istioctl
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.24.2 sh -
export PATH="$HOME/istio-1.24.2/bin:$PATH"
# Install with Ambient profile
istioctl install --set profile=ambient --skip-confirmation
# Verify installation
kubectl get pods -n istio-system
# NAME READY STATUS RESTARTS AGE
# istiod-7b69f4b6c-xxxxx 1/1 Running 0 60s
# ztunnel-xxxxx 1/1 Running 0 60s
# istio-cni-node-xxxxx 1/1 Running 0 60s
# Enable Ambient mode for namespace
kubectl label namespace production istio.io/dataplane-mode=ambient
Deploying Waypoint Proxy (for L7 policies)
If only L4-level mTLS is needed, ztunnel alone is sufficient. Deploy a Waypoint Proxy when you need L7-level precise traffic control.
# Create Waypoint Proxy
istioctl waypoint apply --namespace production --name backend-waypoint
# Connect Waypoint to specific service
kubectl label service backend-api \
istio.io/use-waypoint=backend-waypoint \
-n production
Istio AuthorizationPolicy Configuration
Istio's L7 policies control down to HTTP methods, paths, and headers through AuthorizationPolicy.
# istio-auth-policy.yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: backend-api-policy
  namespace: production
spec:
  targetRefs:
    - kind: Service
      group: ''
      name: backend-api
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - 'cluster.local/ns/production/sa/frontend'
      to:
        - operation:
            methods: ['GET', 'POST']
            paths: ['/api/v1/*']
    - from:
        - source:
            principals:
              - 'cluster.local/ns/monitoring/sa/prometheus'
      to:
        - operation:
            methods: ['GET']
            paths: ['/metrics']
This policy allows only GET and POST requests to paths under /api/v1/ from the frontend service account, and only GET requests to /metrics from Prometheus. Because the action is ALLOW, any request not matched by a rule is rejected with 403 Forbidden.
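The evaluation logic can be approximated in a few lines of Python (a toy evaluator, not Istio's implementation; `fnmatch` stands in for Istio's suffix-wildcard path matching):

```python
import fnmatch

# Toy approximation of the ALLOW AuthorizationPolicy above: a request is
# accepted only if some rule matches its principal, method, and path.
RULES = [
    {"principal": "cluster.local/ns/production/sa/frontend",
     "methods": {"GET", "POST"}, "path": "/api/v1/*"},
    {"principal": "cluster.local/ns/monitoring/sa/prometheus",
     "methods": {"GET"}, "path": "/metrics"},
]

def authorize(principal: str, method: str, path: str) -> int:
    """Return the HTTP status a request would receive (200 or 403)."""
    for r in RULES:
        if (principal == r["principal"] and method in r["methods"]
                and fnmatch.fnmatch(path, r["path"])):
            return 200
    return 403  # ALLOW policies reject everything not explicitly matched

print(authorize("cluster.local/ns/production/sa/frontend",
                "POST", "/api/v1/orders"))    # 200
print(authorize("cluster.local/ns/production/sa/frontend",
                "DELETE", "/api/v1/orders"))  # 403
print(authorize("cluster.local/ns/monitoring/sa/prometheus",
                "GET", "/metrics"))           # 200
```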
Cilium eBPF-based Service Mesh
Installing Cilium and Enabling Service Mesh
Cilium leverages eBPF to handle networking at the kernel level. It processes L4 traffic without sidecar proxies and uses a per-node shared Envoy proxy only when L7 processing is needed.
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --fail --remote-name-all \
"https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-${GOOS}-${GOARCH}.tar.gz"
sudo tar xzvfC "cilium-${GOOS}-${GOARCH}.tar.gz" /usr/local/bin
# Install Cilium with Helm (Service Mesh + Hubble enabled)
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.17.0 \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set envoy.enabled=true \
--set encryption.enabled=true \
--set encryption.type=wireguard
# Verify installation
cilium status --wait
cilium connectivity test
Applying L7 Policies with CiliumNetworkPolicy
Cilium supports precise L7 protocol-level policies for HTTP, gRPC, Kafka, and more through its own CRD, CiliumNetworkPolicy.
# cilium-l7-policy.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-l7-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: 'GET'
                path: '/api/v1/products'
              - method: 'POST'
                path: '/api/v1/orders'
              - method: 'GET'
                path: '/healthz'
    - fromEndpoints:
        - matchLabels:
            app: prometheus
      toPorts:
        - ports:
            - port: '9090'
              protocol: TCP
          rules:
            http:
              - method: 'GET'
                path: '/metrics'
Network Observability with Hubble
Cilium's Hubble monitors all network flows in real-time based on eBPF.
# Observe after installing Hubble CLI
hubble observe --namespace production --follow
# Filter traffic for specific Pod
hubble observe --namespace production \
--to-label app=backend-api \
--verdict DROPPED
# Observe HTTP requests (L7)
hubble observe --namespace production \
--protocol http \
--http-status 5xx
# Visualize network flows (Hubble UI)
cilium hubble port-forward &
# Access http://localhost:12000 in browser
Hubble output example:
TIMESTAMP SOURCE DESTINATION TYPE VERDICT SUMMARY
Mar 14 10:23:01.123 production/frontend production/backend-api L7/HTTP FORWARDED GET /api/v1/products => 200
Mar 14 10:23:01.456 production/attacker production/backend-api L7/HTTP DROPPED POST /api/v1/admin => Policy denied
Mar 14 10:23:02.789 production/backend-api production/redis L4/TCP FORWARDED TCP 6379
mTLS and Zero Trust Networking
What is Zero Trust?
Zero Trust is a security model of "trust nothing, verify everything." It encrypts communication within the cluster and verifies identity in every service-to-service call. Since Network Policy alone cannot encrypt traffic, a Service Mesh providing mTLS (mutual TLS) is required.
Istio Ambient Mode's mTLS
Istio Ambient Mode uses the HBONE (HTTP-Based Overlay Network Environment) protocol to automatically mTLS-encrypt all traffic. ztunnel manages certificates at the node level and issues a unique SPIFFE-based workload ID for each Pod.
# istio-peer-auth.yaml
# STRICT mode: Reject plaintext traffic without mTLS
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
When STRICT mode is set, all services in the namespace will only accept mTLS connections. Plaintext requests from services not included in the mesh are all rejected.
Cilium's WireGuard-based Encryption
Cilium uses kernel-built-in WireGuard to automatically encrypt inter-node traffic. Unlike Istio's mTLS, it operates at the kernel level rather than the application level, resulting in less performance overhead.
# Check WireGuard encryption status
cilium encrypt status
# Output example:
# Encryption: Wireguard
# Keys in use: 2
# Errors: 0
# Interfaces: cilium_wg0
# Check encryption key list
cilium encrypt get
Differences between WireGuard and mTLS:
- WireGuard: L3-level inter-node encryption, kernel-level processing, no individual Pod identity
- mTLS (Istio): L7-level inter-service encryption, SPIFFE-based workload ID, fine-grained authorization policies
In production environments, a dual security strategy is sometimes used: applying Cilium WireGuard for inter-node encryption while additionally implementing Istio mTLS for workload-level authentication.
Operational Considerations and Troubleshooting
1. Network Policy Application Order
Network Policies work in an additive (union) manner: when multiple policies select the same Pod, their allow rules are aggregated. The NetworkPolicy API has no deny rules, so there is no deny-beats-allow conflict resolution; if any single policy allows the traffic, it passes.
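The union semantics can be illustrated with a small sketch (the policy names and allow sets echo earlier examples in this article; the evaluator itself is a toy):

```python
# Toy model of additive NetworkPolicy semantics: the effective allow set
# for a Pod is the union of everything any selecting policy allows.
policies = {
    "allow-frontend-to-backend": {("frontend", 8080)},
    "allow-monitoring-access":   {("prometheus", 9090)},
}

def effective_allows(applied):
    """Union the (source, port) pairs allowed by every applied policy."""
    union = set()
    for name in applied:
        union |= policies[name]
    return union

allows = effective_allows(["allow-frontend-to-backend",
                           "allow-monitoring-access"])
print(("frontend", 8080) in allows)    # True
print(("prometheus", 9090) in allows)  # True - no other policy can revoke it
print(("attacker", 8080) in allows)    # False - allowed by no policy
```

The practical consequence: you cannot tighten access by adding another policy; you can only tighten it by editing or removing the policy that grants the unwanted allow.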
# Check all Network Policies applied to specific Pod
kubectl get networkpolicy -n production -o wide
# Check Pod labels (verify policy selector matching)
kubectl get pods -n production --show-labels
# For Calico, verify policy matching
calicoctl get networkpolicy -n production -o yaml
2. Network Policy Ignored Without CNI Plugin
This is the most common mistake. Even if Network Policy resources are created, policies are not enforced at all without a CNI plugin that implements them.
# Check CNI plugin
kubectl get pods -n kube-system | grep -E 'calico|cilium|antrea'
# Test to verify Network Policy is actually enforced
# 1. Apply Default Deny
kubectl apply -f default-deny-all.yaml
# 2. Test communication (should be blocked)
kubectl exec -n production deploy/frontend -- \
curl -s --connect-timeout 3 http://backend-api:8080/healthz
# Timeout indicates policy is properly enforced
3. DNS Resolution Failure
If DNS allow is omitted after applying a Default Deny policy, all service discovery will be disrupted.
# Diagnose DNS issue
kubectl exec -n production deploy/frontend -- nslookup backend-api
# ;; connection timed out; no servers could be reached
# Check CoreDNS Pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Re-test after applying DNS policy
kubectl apply -f allow-dns.yaml
kubectl exec -n production deploy/frontend -- nslookup backend-api
# Server: 10.96.0.10
# Name: backend-api.production.svc.cluster.local
4. Istio ztunnel Failure Response
ztunnel runs as a per-node DaemonSet, and when it fails, all Ambient Mesh traffic on that node is disrupted.
# Check ztunnel status
kubectl get pods -n istio-system -l app=ztunnel
# Check ztunnel logs (diagnose certificate issues)
kubectl logs -n istio-system -l app=ztunnel --tail=50
# Restart ztunnel
kubectl rollout restart daemonset/ztunnel -n istio-system
# Check xDS connection status with Istiod
istioctl proxy-status
5. Cilium eBPF Map Capacity Exceeded
In large-scale clusters, the default eBPF map sizes in Cilium may be insufficient.
# Check eBPF map usage
cilium bpf ct list global | wc -l
cilium bpf policy get --all
# Increase map sizes (modify Helm values)
# bpf.ctGlobalTCPMax: 524288 (increase from default)
# bpf.ctGlobalAnyMax: 262144
# bpf.policyMapMax: 65536
# Apply changes
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--reuse-values \
--set bpf.ctGlobalTCPMax=524288
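Larger maps cost pinned kernel memory on every node, so it is worth estimating before raising them. A back-of-envelope sketch (the ~64-byte per-entry figure is an assumption; actual eBPF struct sizes vary by Cilium and kernel version):

```python
# Rough memory cost of resizing the conntrack maps above.
# BYTES_PER_ENTRY is an assumed average; check your Cilium version's docs.
BYTES_PER_ENTRY = 64

def map_mib(entries: int) -> float:
    """Approximate pinned memory for an eBPF map, in MiB."""
    return entries * BYTES_PER_ENTRY / (1024 * 1024)

print(f"ctGlobalTCPMax=524288 -> ~{map_mib(524288):.0f} MiB per node")  # ~32 MiB
print(f"ctGlobalAnyMax=262144 -> ~{map_mib(262144):.0f} MiB per node")  # ~16 MiB
```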
Failure Cases and Recovery Procedures
Case 1: Full Service Outage from Incorrect Egress Policy
Situation: The operations team applied Default Deny Egress for security hardening but omitted the DNS allow policy, disrupting all inter-service communication.
Symptoms: Access via service names failed from all Pods. Direct IP access still worked.
Recovery procedure:
# 1. Immediately identify the problem
kubectl get networkpolicy -n production
# 2. Urgently apply DNS allow policy
kubectl apply -f - <<'POLICY'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: emergency-allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
POLICY
# 3. Verify service recovery
kubectl exec -n production deploy/frontend -- nslookup backend-api
Lesson: Default Deny policies must always be applied alongside DNS allow policies. Always test in a staging environment before making policy changes, and prepare a rollback plan.
Case 2: mTLS Mismatch During Istio Upgrade
Situation: During an Istio version upgrade, mTLS handshake failures occurred between old-version sidecars and new-version ztunnel.
Symptoms: 503 errors and "upstream connect error" messages between some services.
Recovery procedure:
# 1. Temporarily change mTLS mode to PERMISSIVE (allow both plaintext+mTLS)
kubectl apply -f - <<'POLICY'
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: permissive-during-upgrade
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
POLICY
# 2. Restart all workloads to apply latest proxy
kubectl rollout restart deployment -n production
# 3. Restore STRICT after all Pods are replaced with new version
kubectl rollout status deployment -n production --timeout=300s
kubectl apply -f - <<'POLICY'
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
POLICY
Case 3: Momentary Traffic Drop Due to Cilium Agent Restart
Situation: During a Cilium DaemonSet update, the eBPF programs on the node were briefly unloaded, interrupting Pod-to-Pod communication on that node for several seconds.
Recovery and prevention:
# Rolling Update strategy to update one node at a time
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--reuse-values \
--set upgradeCompatibility=1.16 \
--set rollOutCiliumPods=true
# Check PodDisruptionBudget
kubectl get pdb -n kube-system
# Monitor update progress
kubectl rollout status daemonset/cilium -n kube-system --timeout=600s
Performance Benchmarks and Selection Guide
Measured Benchmark Results (2025 baseline)
Here's a summary of comparison test results from recent large-scale enterprise environments.
| Metric | Network Policy Only | Istio Ambient | Cilium Service Mesh |
|---|---|---|---|
| P99 Latency (ms) | 1.2 | 3.8 | 2.1 |
| QPS (req/s) | 45,000 | 38,000 | 42,000 |
| QPS per Core | - | 2,178 | 1,815 |
| CPU Overhead | Baseline | +15% | +8% |
| Memory Overhead | Baseline | +120MB/node | +80MB/node |
| Low-connection Perf | - | Excellent | Moderate |
| High-connection Perf | - | Moderate | Excellent |
Note: Istio's higher QPS per Core includes L7 processing capability, and Cilium's CPU measurements exclude in-kernel WireGuard encryption costs.
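As a sanity check on the table, the data-plane core counts implied by the QPS and QPS-per-core figures can be derived with simple division:

```python
# Implied data-plane cores = QPS / (QPS per core), from the table above.
def implied_cores(qps: int, qps_per_core: float) -> float:
    return qps / qps_per_core

istio = implied_cores(38_000, 2_178)
cilium = implied_cores(42_000, 1_815)
print(f"Istio Ambient: ~{istio:.1f} cores")  # ~17.4
print(f"Cilium:        ~{cilium:.1f} cores") # ~23.1
```

So at these benchmark loads Cilium used more total cores despite its lower per-request overhead, which is consistent with the note above that its figures exclude in-kernel WireGuard costs.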
Selection Guide Flowchart
1. Only L3/L4 network isolation needed?
   - Yes: Kubernetes Network Policy + Calico or Cilium CNI is sufficient
   - No: Go to 2
2. L7 traffic management (canary, retry, circuit breaker) needed?
   - Yes: Istio Ambient Mode or Cilium + Envoy
   - No: Go to 3
3. mTLS-based zero trust required?
   - Yes: Istio Ambient Mode (SPIFFE-based workload ID)
   - No: Cilium WireGuard encryption for inter-node encryption
4. High performance + kernel-level observability a priority?
   - Yes: Cilium Service Mesh + Hubble
   - No: Istio (richer L7 features)
5. Mixed Windows nodes or hybrid environment?
   - Yes: Calico Enterprise
   - No: Istio or Cilium
Recommendations by Operational Scale
- Small clusters (10 nodes or fewer): Cilium CNI + basic Network Policy. Service Mesh adoption has limited benefit relative to overhead.
- Medium clusters (10-100 nodes): Cilium Service Mesh or Istio Ambient. Choose based on L7 requirements.
- Large clusters (100+ nodes): Istio Ambient Mode. Mature multi-cluster support, rich ecosystem, stability.
- Hybrid/Multi-cloud: Calico Enterprise or Cilium Cluster Mesh.
Conclusion
Kubernetes network security goes beyond simply applying Network Policy; it requires a comprehensive strategy tailored to your application characteristics and security requirements.
Key takeaways:
- Network Policy is the absolute baseline: Start with Default Deny and apply a whitelist approach that allows only necessary communication. Don't forget the DNS allow policy.
- Introduce Service Mesh when needed: Adopt it when mTLS, L7 policies, and advanced traffic management are actually required. Unnecessary complexity only increases operational burden.
- Istio vs Cilium is a trade-off: Istio has rich L7 features and ecosystem, while Cilium excels at kernel-level performance and observability. Choose based on your requirements.
- Gradual adoption is key: The safest approach is to start with Default Deny policies, thoroughly validate Network Policies, and then add Service Mesh as needed.
- Automation and testing: Policy changes must always be validated in staging first through CI/CD pipelines, and always have a rollback plan ready.
Network security is not a one-time setup. As services evolve, policies must be continuously reviewed and updated. Quarterly policy audits are recommended, and use observability tools like Hubble or Kiali to optimize policies based on actual traffic patterns.
References
- Kubernetes Network Policies Official Docs
- Istio Official Docs - Ambient Mesh
- Cilium Official Docs - Service Mesh
- Calico Official Docs - Network Policy
- Istio Ambient vs Cilium Performance Comparison
- CNCF - Unlocking Cloud Native Security with Cilium and eBPF
- Tigera - Sidecarless mTLS in Kubernetes with Istio Ambient Mesh