The Complete Guide to Kubernetes Network Policy: Zero Trust Network Security with Cilium and Calico
- Introduction
- Kubernetes NetworkPolicy Architecture
- Cilium CiliumNetworkPolicy Deep Dive
- Calico GlobalNetworkPolicy Deep Dive
- Cilium vs Calico vs Standard NetworkPolicy Comparison
- Implementation Guide
- Monitoring and Troubleshooting
- Failure Cases and Recovery Procedures
- Production Deployment Checklist
- Advanced Patterns: Multi-Cluster Network Policies
- Conclusion
- References

Introduction
In a Kubernetes cluster, Pods can freely communicate with all other Pods by default. While this is convenient during early development, it becomes a serious security threat in production environments. If a single Pod is compromised, lateral movement to every service in the cluster becomes possible. In fact, 67% of Kubernetes security incidents investigated in the 2024 CNCF Security Audit were caused by insufficient internal network isolation.
Zero Trust Network architecture addresses this problem by trusting no traffic within the network and allowing only explicitly permitted communication. The core tool for implementing this in Kubernetes is the NetworkPolicy.
However, the standard Kubernetes NetworkPolicy only supports L3/L4 (IP, port) level controls and does not provide advanced features like DNS-based policies or HTTP path-based filtering. To overcome these limitations, CNI plugins such as Cilium and Calico emerged. Cilium uses eBPF to provide precise policies up to L7, while Calico implements enterprise-grade network security through BGP routing and GlobalNetworkPolicy.
This article covers everything from Kubernetes NetworkPolicy fundamentals through advanced Cilium and Calico policy implementation, comparative analysis, monitoring and troubleshooting, real-world failure cases and recovery procedures, and a production deployment checklist.
Kubernetes NetworkPolicy Architecture
NetworkPolicy Basic Structure
Kubernetes NetworkPolicy is a namespace-scoped resource that controls network traffic at the Pod level. The CNI plugin enforces the actual policy; on CNIs that do not support NetworkPolicy (e.g., Flannel), creating the resource has no effect.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production
          podSelector:
            matchLabels:
              role: frontend
        - ipBlock:
            cidr: 10.0.0.0/8
            except:
              - 10.0.1.0/24
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
This policy works as follows:
- Target Pod Selection: Applied to Pods with the app: api-server label via podSelector
- Ingress Rules: Only allows access to TCP port 8080 from frontend Pods in the production namespace and from the 10.0.0.0/8 CIDR range (excluding 10.0.1.0/24)
- Egress Rules: Only allows TCP 5432 access to database Pods and DNS (UDP 53) traffic
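The ipBlock allow/except semantics above can be sketched in a few lines of Python using the standard ipaddress module. The function name ipblock_allows is hypothetical; this is an illustration of the matching logic, not how the CNI implements it.

```python
import ipaddress

def ipblock_allows(source_ip: str, cidr: str, excepts: list) -> bool:
    """Mimic NetworkPolicy ipBlock semantics: a source is allowed when it
    falls inside cidr and inside none of the except ranges."""
    addr = ipaddress.ip_address(source_ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return all(addr not in ipaddress.ip_network(ex) for ex in excepts)

# 10.2.3.4 is inside 10.0.0.0/8 and outside the except range -> allowed
print(ipblock_allows("10.2.3.4", "10.0.0.0/8", ["10.0.1.0/24"]))   # True
# 10.0.1.7 falls inside the 10.0.1.0/24 except range -> denied
print(ipblock_allows("10.0.1.7", "10.0.0.0/8", ["10.0.1.0/24"]))   # False
```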
Default Deny Strategy
The foundation of Zero Trust is blocking all traffic first, then explicitly allowing only what is needed. A Default Deny policy blocks all ingress and egress for every Pod in a namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
An empty podSelector selects all Pods in the namespace. Both Ingress and Egress are specified in policyTypes, but since there are no allow rules, all traffic is blocked. DNS must be separately allowed for service discovery to work.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Namespace Isolation Patterns
In large clusters, inter-namespace isolation is essential. The following pattern allows only intra-namespace communication.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
When podSelector: {} is used without a namespaceSelector, only Pods in the current namespace are matched. This provides a concise implementation of inter-namespace isolation.
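The selector semantics just described — a bare podSelector is scoped to the policy's own namespace, while combining namespaceSelector and podSelector in one peer entry ANDs them — can be modeled as plain dictionary matching. This is a minimal sketch; the function names are hypothetical and not part of any Kubernetes API.

```python
def selector_matches(labels: dict, selector: dict) -> bool:
    """matchLabels semantics: every key/value must be present; {} matches all."""
    return all(labels.get(k) == v for k, v in selector.items())

def peer_matches(pod_ns, pod_labels, ns_labels, policy_ns,
                 namespace_selector=None, pod_selector=None):
    """One 'from' peer entry. A bare podSelector only matches Pods in the
    policy's namespace; adding a namespaceSelector ANDs both conditions."""
    if namespace_selector is None:
        if pod_ns != policy_ns:
            return False
    elif not selector_matches(ns_labels, namespace_selector):
        return False
    if pod_selector is not None and not selector_matches(pod_labels, pod_selector):
        return False
    return True

# A Pod in team-alpha matches the namespace-isolation policy above...
print(peer_matches("team-alpha", {"app": "web"}, {}, "team-alpha", pod_selector={}))  # True
# ...while the same Pod in another namespace does not.
print(peer_matches("team-beta", {"app": "web"}, {}, "team-alpha", pod_selector={}))   # False
```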
Cilium CiliumNetworkPolicy Deep Dive
Cilium Architecture and eBPF
Cilium uses the Linux kernel's eBPF (extended Berkeley Packet Filter) technology to enforce network policies at the kernel level. Unlike the traditional iptables-based approach, eBPF provides a programmable data plane, so there is virtually no performance degradation even as the number of policies increases.
The core components of Cilium are:
- Cilium Agent: Runs as a DaemonSet on each node, compiling and loading eBPF programs into the kernel
- Cilium Operator: Manages cluster-wide resources
- Hubble: A monitoring tool that observes network flows in real time
- Envoy Proxy: Acts as a transparent proxy when enforcing L7 policies
L3-L7 Filtering Implementation
Cilium's CiliumNetworkPolicy includes all the features of the standard NetworkPolicy while adding fine-grained L7 (HTTP, gRPC, Kafka) level controls.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-api-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: GET
                path: '/api/v1/products'
              - method: POST
                path: '/api/v1/orders'
                headers:
                  - 'Content-Type: application/json'
              - method: GET
                path: '/healthz'
This policy allows only GET /api/v1/products, POST /api/v1/orders (with required JSON Content-Type header), and GET /healthz requests from frontend Pods to the api-server. Other HTTP methods like PUT and DELETE are blocked.
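The allow/deny decision the L7 proxy makes for each request can be approximated with method, path-regex, and header matching. This is a simplified sketch of the matching logic (function names are hypothetical), not Cilium's actual Envoy-based implementation.

```python
import re

# The three HTTP rules from the policy above: (method, path regex, required headers)
L7_RULES = [
    ("GET", r"/api/v1/products", []),
    ("POST", r"/api/v1/orders", ["Content-Type: application/json"]),
    ("GET", r"/healthz", []),
]

def request_allowed(method, path, headers=()):
    """A request passes if any rule matches its method, full path, and headers."""
    for m, p, required in L7_RULES:
        if method == m and re.fullmatch(p, path) and all(h in headers for h in required):
            return True
    return False

print(request_allowed("GET", "/api/v1/products"))     # True
print(request_allowed("DELETE", "/api/v1/products"))  # False: method not allowed
print(request_allowed("POST", "/api/v1/orders"))      # False: missing required header
print(request_allowed("POST", "/api/v1/orders",
                      ["Content-Type: application/json"]))  # True
```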
DNS-Based Policies
Cilium can control egress traffic based on FQDN (Fully Qualified Domain Name). This is useful for restricting external API calls to specific domains.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: dns-based-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  egress:
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: '53'
              protocol: ANY
          rules:
            dns:
              - matchPattern: '*.stripe.com'
              - matchPattern: '*.amazonaws.com'
    - toFQDNs:
        - matchPattern: '*.stripe.com'
      toPorts:
        - ports:
            - port: '443'
              protocol: TCP
    - toFQDNs:
        - matchPattern: '*.amazonaws.com'
      toPorts:
        - ports:
            - port: '443'
              protocol: TCP
This policy restricts the payment-service to HTTPS communication only with stripe.com and amazonaws.com domains. DNS queries themselves are also only allowed for those domains.
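Wildcard FQDN matching of this kind can be approximated with shell-style glob patterns. Note this is only an approximation for illustration — Cilium has its own matchPattern grammar, and fnmatch's `*` is not guaranteed to behave identically in every corner case.

```python
from fnmatch import fnmatch

ALLOWED_PATTERNS = ["*.stripe.com", "*.amazonaws.com"]

def fqdn_allowed(name: str) -> bool:
    # '*' is a wildcard; fnmatch approximates Cilium's matchPattern here.
    return any(fnmatch(name, pat) for pat in ALLOWED_PATTERNS)

print(fqdn_allowed("api.stripe.com"))              # True
print(fqdn_allowed("s3.us-east-1.amazonaws.com"))  # True
print(fqdn_allowed("evil.example.com"))            # False
```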
CiliumClusterwideNetworkPolicy
Policies that apply across the entire cluster use CiliumClusterwideNetworkPolicy.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: block-metadata-access
spec:
  endpointSelector: {}
  egressDeny:
    - toCIDR:
        - 169.254.169.254/32
      toPorts:
        - ports:
            - port: '80'
              protocol: TCP
This policy blocks all Pods in the cluster from accessing the cloud metadata service (169.254.169.254). This is a critical security measure to prevent credential theft attacks via IMDS.
Calico GlobalNetworkPolicy Deep Dive
Calico Architecture
Calico is a networking solution developed by Tigera that provides L3 routing and a policy engine based on BGP (Border Gateway Protocol). Its major components are:
- Felix: An agent running on each node that manages routing tables and iptables/eBPF rules
- BIRD: A BGP client that exchanges routing information between nodes
- Typha: A proxy between Felix and the Kubernetes API Server that reduces API Server load
- calicoctl: A CLI tool for managing Calico resources
GlobalNetworkPolicy Implementation
Calico's GlobalNetworkPolicy is a cluster-wide policy that is not namespace-scoped. It takes precedence over standard Kubernetes NetworkPolicies.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-egress-to-internet
spec:
  selector: environment == 'production'
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        nets:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    - action: Allow
      protocol: UDP
      destination:
        ports:
          - 53
    - action: Deny
      destination:
        notNets:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
This policy allows all production Pods to communicate only with RFC 1918 private IP ranges and DNS traffic, blocking direct communication to the internet.
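Calico evaluates a policy's rules in order and applies the first one that matches. A minimal first-match sketch of the three rules above (the function name egress_verdict is hypothetical):

```python
import ipaddress

RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

def egress_verdict(dst_ip, dst_port, protocol):
    """First-match evaluation of the three rules above; if no rule matches,
    Calico's implicit default for a selected workload is deny."""
    addr = ipaddress.ip_address(dst_ip)
    in_private = any(addr in ipaddress.ip_network(n) for n in RFC1918)
    if in_private:                              # rule 1: Allow to RFC 1918 nets
        return "Allow"
    if protocol == "UDP" and dst_port == 53:    # rule 2: Allow DNS
        return "Allow"
    if not in_private:                          # rule 3: Deny to public notNets
        return "Deny"
    return "Deny"

print(egress_verdict("10.1.2.3", 443, "TCP"))       # Allow
print(egress_verdict("8.8.8.8", 53, "UDP"))         # Allow
print(egress_verdict("93.184.216.34", 443, "TCP"))  # Deny
```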
Calico Tier System
In Calico Enterprise, policy tiers can be used to clearly manage the order of policy evaluation.
apiVersion: projectcalico.org/v3
kind: Tier
metadata:
  name: security
spec:
  order: 100
---
apiVersion: projectcalico.org/v3
kind: Tier
metadata:
  name: platform
spec:
  order: 200
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: security.block-known-threats
spec:
  tier: security
  order: 10
  selector: all()
  types:
    - Ingress
    - Egress
  ingress:
    - action: Deny
      source:
        nets:
          - 198.51.100.0/24
    - action: Pass
  egress:
    - action: Deny
      destination:
        nets:
          - 198.51.100.0/24
    - action: Pass
Since the security team's policies are evaluated before the platform team's policies, known threat IPs can be blocked regardless of platform policies.
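The tier mechanics — tiers evaluated by ascending order, first Allow/Deny wins, and a Pass action hands the packet to the next tier — can be sketched as follows. The data layout and function name are hypothetical, chosen only to illustrate the evaluation order.

```python
def evaluate(packet, tiers):
    """tiers: list of (tier_order, rules); rules are (match_fn, action) pairs.
    Within a tier the first matching Allow/Deny is final; a matching Pass
    stops this tier and defers to the next one."""
    for _, rules in sorted(tiers, key=lambda t: t[0]):
        for match, action in rules:
            if match(packet):
                if action in ("Allow", "Deny"):
                    return action
                break  # Pass: fall through to the next tier
    return "Deny"      # implicit default deny at the end of all tiers

security = [(lambda p: p["src"].startswith("198.51.100."), "Deny"),
            (lambda p: True, "Pass")]
platform = [(lambda p: p["app"] == "frontend", "Allow")]

print(evaluate({"src": "198.51.100.7", "app": "frontend"},
               [(100, security), (200, platform)]))  # Deny: security tier wins
print(evaluate({"src": "10.0.0.5", "app": "frontend"},
               [(100, security), (200, platform)]))  # Allow: passed to platform tier
```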
Cilium vs Calico vs Standard NetworkPolicy Comparison
The table below compares the key features and characteristics of each solution.
| Feature | Kubernetes NetworkPolicy | Cilium | Calico |
|---|---|---|---|
| Policy Scope | Namespace | Namespace + Cluster | Namespace + Cluster |
| L3/L4 Support | Yes | Yes | Yes |
| L7 Support | No | Yes (HTTP, gRPC, Kafka, DNS) | Partial (Enterprise only) |
| DNS-based Policy | No | Yes (FQDN matching) | Yes (Calico Enterprise) |
| Policy Engine | CNI-dependent | eBPF native | iptables / eBPF selectable |
| Performance (1000+ policies) | iptables-based degradation | Consistent with eBPF | Good in eBPF mode |
| Monitoring | Requires separate tools | Hubble built-in | calicoctl + Prometheus |
| FQDN Egress | No | Yes | Yes (Enterprise) |
| Policy Tiers | No | No | Yes (Enterprise) |
| Host Firewall | No | Yes | Yes |
| Encryption | No | WireGuard/IPsec | WireGuard |
| Multi-cluster | No | Cluster Mesh | Calico Federation |
| CNCF Status | Standard | Graduated | - (Tigera commercial) |
| GUI Management | No | Hubble UI | Calico Enterprise UI |
Implementation Guide
Step-by-Step Default Deny Rollout
Applying Default Deny all at once in a production environment can cause large-scale outages. Here is a safe step-by-step rollout procedure.
Step 1: Start with Audit Mode (Cilium)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: audit-default-deny
  namespace: staging
  annotations:
    policy.cilium.io/audit-mode: 'true'
spec:
  endpointSelector: {}
  ingress:
    - fromEndpoints:
        - matchLabels:
            'reserved:host': ''
  egress:
    - toEndpoints:
        - matchLabels:
            'reserved:host': ''
In audit mode, the policy does not actually block traffic but logs traffic that would have been blocked via Hubble.
Step 2: Analyze Traffic with Hubble
# Check traffic audited by the policy using Hubble CLI
hubble observe --namespace staging --verdict AUDIT --output json | \
jq '.flow | {src: .source.labels, dst: .destination.labels, port: .l4.TCP.destination_port}'
# Understand communication patterns within the namespace
hubble observe --namespace staging --type trace:to-endpoint \
--output compact --last 1000
Step 3: Create Allow Policies then Activate Default Deny
After writing all necessary allow policies based on audit log analysis, remove the audit mode annotation to activate the policy.
Microservice Policy Pattern
Common policy patterns used in real microservice environments.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: microservice-pattern
  namespace: ecommerce
spec:
  endpointSelector:
    matchLabels:
      app: order-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: api-gateway
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: POST
                path: '/orders'
              - method: GET
                path: '/orders/[0-9]+'
  egress:
    - toEndpoints:
        - matchLabels:
            app: inventory-service
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            app: payment-service
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: '53'
              protocol: ANY
This policy restricts the order-service to only receive order-related HTTP requests from the api-gateway, and to communicate only with the inventory-service and payment-service.
Monitoring and Troubleshooting
Cilium Monitoring with Hubble
Hubble is a network observability tool built into Cilium that collects all network flows in real time using eBPF.
# Enable Hubble (during Cilium Helm installation)
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
# Check dropped traffic in a specific namespace
hubble observe --namespace production --verdict DROPPED \
--output json --last 100
# Trace communication between specific Pods
hubble observe --from-pod production/api-server-7d9f8b6c5d-x2k4m \
--to-pod production/database-5f7b9c8d6e-m3n7p --output compact
# Check policy status for a specific endpoint
cilium endpoint get 12345
# Check Cilium endpoint status
cilium endpoint list -o json | \
jq '.[] | select(.status.policy.realized.denied > 0) | {id: .id, labels: .status.labels}'
Calico Troubleshooting with calicoctl
# Check Calico node status
calicoctl node status
# List applied policies
calicoctl get networkpolicy --all-namespaces -o wide
calicoctl get globalnetworkpolicy -o wide
# Simulate policy evaluation for a specific workload
calicoctl get workloadendpoint --all-namespaces -o yaml | \
grep -A 5 "api-server"
# Check policy application in Felix logs
kubectl logs -n calico-system -l k8s-app=calico-node -c calico-node \
--tail=100 | grep -i "policy"
# Check BGP peer status
calicoctl node status | grep -A 10 "BGP"
Prometheus Metrics Collection
Both Cilium and Calico expose Prometheus metrics.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Key monitoring metrics include:
- cilium_policy_verdict_total: Policy verdict result counters (FORWARDED, DROPPED, AUDIT)
- cilium_drop_count_total: Number of packets dropped by policies
- cilium_policy_import_errors_total: Number of policy parsing errors
- felix_active_local_policies: Number of policies actively managed by Felix
- felix_iptables_save_errors: Number of iptables rule save errors
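To alert on policy drops, these counters are usually wrapped in a PromQL rate() query. The per-second drop rate that such a query produces can be sketched from two raw counter scrapes; the function name and threshold below are hypothetical examples.

```python
def counter_rate(samples):
    """Approximate PromQL rate(): per-second increase of a monotonic counter
    from (timestamp, value) samples, treating a decrease as a counter reset."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    increase = v1 - v0 if v1 >= v0 else v1  # after a reset, count from zero
    return increase / (t1 - t0)

# Two scrapes of cilium_drop_count_total, 15 seconds apart
rate = counter_rate([(0, 1200), (15, 1275)])
print(rate)  # 5.0 drops/second

ALERT_THRESHOLD = 1.0  # example threshold for a policy-drop alert
print(rate > ALERT_THRESHOLD)  # True -> alert would fire
```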
Failure Cases and Recovery Procedures
Case 1: Complete Service Outage from Bulk Default Deny Application
Situation: The operations team applied a Default Deny policy to the production namespace without testing. They forgot the DNS allow policy, causing all service discovery to fail and all microservices to lose inter-service communication.
Symptoms:
- All Pod readiness probes failing
- HTTP request timeouts between services
- No DNS query responses
Recovery Procedure:
# 1. Emergency action: Remove the problematic Default Deny policy
kubectl delete networkpolicy default-deny-all -n production
# 2. Verify status
kubectl get pods -n production -o wide
kubectl get endpoints -n production
# 3. Verify DNS communication
kubectl exec -n production deploy/api-server -- nslookup kubernetes.default
# 4. Reapply correct policy set after root cause analysis
# - Apply DNS allow policy first
kubectl apply -f allow-dns-egress.yaml
# - Apply inter-service communication policies
kubectl apply -f service-communication-policies/
# - Apply Default Deny last
kubectl apply -f default-deny-all.yaml
Lesson Learned: Default Deny must always be applied last, after all allow policies are in place. Pre-validation in a staging environment is mandatory.
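The lesson above can be enforced mechanically with a pre-flight check in a CI pipeline: refuse to apply a default-deny manifest until some existing policy in the namespace already opens DNS egress. This is a minimal sketch over parsed policy dicts (as loaded from YAML); the function names are hypothetical.

```python
def allows_dns_egress(policy: dict) -> bool:
    """Inspect a parsed NetworkPolicy dict for an egress rule opening port 53."""
    for rule in policy.get("spec", {}).get("egress", []):
        for port in rule.get("ports", []):
            if port.get("port") == 53:
                return True
    return False

def preflight_default_deny(existing_policies) -> bool:
    """Only safe to apply default-deny once some policy allows DNS egress."""
    return any(allows_dns_egress(p) for p in existing_policies)

dns_policy = {"spec": {"egress": [{"to": [{"namespaceSelector": {}}],
                                   "ports": [{"protocol": "UDP", "port": 53},
                                             {"protocol": "TCP", "port": 53}]}]}}
print(preflight_default_deny([dns_policy]))  # True: safe to proceed
print(preflight_default_deny([]))            # False: would repeat this outage
```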
Case 2: Cilium DNS Policy and CoreDNS Cache Mismatch
Situation: After applying Cilium FQDN-based egress policies, CoreDNS cached DNS responses bypassed Cilium's DNS proxy, causing the policy to not be enforced.
Symptoms:
- Certain FQDN policies intermittently not working
- DNS queries for the FQDN not observed in Hubble
- Policy works normally after CoreDNS cache TTL expires
Recovery Procedure:
# 1. Check CoreDNS cache
kubectl exec -n kube-system deploy/coredns -- \
curl -s http://localhost:9153/metrics | grep 'coredns_cache'
# 2. Check policy drops and the Cilium DNS proxy's FQDN cache
cilium monitor --type drop
cilium fqdn cache list
# 3. Reapply DNS policy and restart Cilium Agent
kubectl rollout restart daemonset/cilium -n kube-system
# 4. Verify CoreDNS configuration for forwarding through Cilium DNS proxy
kubectl get configmap coredns -n kube-system -o yaml
Case 3: Calico Policy Order Conflict
Situation: Two teams independently created GlobalNetworkPolicies with the same order value, and an unintended Allow rule was evaluated before a Deny rule, effectively bypassing the security policy.
Symptoms:
- Traffic that should be blocked is being allowed
- Policy order conflict discovered during calicoctl inspection
Recovery Procedure:
# 1. Check order of all GlobalNetworkPolicies
calicoctl get globalnetworkpolicy -o yaml | grep -E "name:|order:"
# 2. Fix the order of conflicting policies
calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: security-block-external
spec:
  order: 50
  selector: all()
  types:
    - Egress
  egress:
    - action: Deny
      destination:
        notNets:
          - 10.0.0.0/8
EOF
# 3. Verify policy application order
calicoctl get globalnetworkpolicy -o wide | sort -k3 -n
Production Deployment Checklist
Pre-Deployment
- Verify CNI plugin (Cilium/Calico) is installed and functioning correctly
- Confirm all namespaces have appropriate labels
- Confirm all workloads have labels (app, role, etc.) for policy selection
- Prepare allow policies for critical kube-system services (CoreDNS, metrics-server)
Policy Application Order
- Step 1: Apply DNS allow policies to all namespaces
- Step 2: Allow essential communication with kube-system namespace
- Step 3: Apply inter-service communication allow policies
- Step 4: Apply external communication allow policies
- Step 5: Test Default Deny policy in staging first
- Step 6: Apply Default Deny to production (with concurrent traffic monitoring)
Monitoring Essentials
- Configure real-time dropped traffic monitoring via Hubble or calicoctl
- Add policy-related metrics to Prometheus + Grafana dashboards
- Set up PagerDuty/Slack alerts for policy violations
- Automate periodic policy audits
Operational Notes
- Always use --dry-run=client for pre-validation when changing policies
- Prefer label-based policies over CIDR-based policies (CIDR is vulnerable to IP changes)
- Consider separate policies for Headless Services (Pod IPs used directly)
- Verify ingress traffic paths for NodePort and LoadBalancer services
- Be cautious about blocking inter-node communication (kubelet, etcd) when using Cilium Host Policies
- Manage policy backups in Git repositories (GitOps integration recommended)
Rollback Plan
- Prepare scripts to immediately remove Default Deny policies in emergencies
- Establish Hubble/calicoctl snapshot comparison processes before and after policy changes
- Set up pipelines to immediately restore previous policy versions from Git
Advanced Patterns: Multi-Cluster Network Policies
Cilium Cluster Mesh
With Cilium Cluster Mesh, network policies can be applied across multiple clusters.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cross-cluster-policy
  namespace: shared-services
spec:
  endpointSelector:
    matchLabels:
      app: shared-database
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: backend
            io.cilium.k8s.policy.cluster: cluster-east
      toPorts:
        - ports:
            - port: '5432'
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            app: backend
            io.cilium.k8s.policy.cluster: cluster-west
      toPorts:
        - ports:
            - port: '5432'
              protocol: TCP
Calico Federation
With Calico, Federation enables configuring network policies for services in remote clusters.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: federated-service-access
spec:
  selector: app == 'frontend'
  types:
    - Egress
  egress:
    - action: Allow
      protocol: TCP
      destination:
        selector: app == 'api-gateway'
        namespaceSelector: global()
Conclusion
Kubernetes network policies are a core component of cluster security. While the standard NetworkPolicy provides L3/L4 level isolation, advanced features from CNI plugins like Cilium or Calico are essential in production environments.
Cilium excels with its eBPF-based high-performance data plane, L7 policies, DNS-based egress control, and Hubble observability. Calico is well-suited for enterprise environments with its BGP routing integration, GlobalNetworkPolicy, and policy tier system.
Regardless of which solution you choose, adopt a Zero Trust approach based on a Default Deny strategy, and always perform thorough validation in audit mode and staging environments before applying policies. Network policies are not a one-time deployment but a living security element that must be continuously managed and updated as services evolve.