The Complete Guide to Kubernetes Network Policy: Zero Trust Network Security with Cilium and Calico

Introduction

In a Kubernetes cluster, Pods can freely communicate with all other Pods by default. While this is convenient during early development, it becomes a serious security threat in production environments. If a single Pod is compromised, lateral movement to every service in the cluster becomes possible. In fact, 67% of Kubernetes security incidents investigated in the 2024 CNCF Security Audit were caused by insufficient internal network isolation.

Zero Trust Network architecture addresses this problem by trusting no traffic within the network and allowing only explicitly permitted communication. The core tool for implementing this in Kubernetes is the NetworkPolicy.

However, the standard Kubernetes NetworkPolicy only supports L3/L4 (IP, port) level controls and does not provide advanced features like DNS-based policies or HTTP path-based filtering. To overcome these limitations, CNI plugins such as Cilium and Calico emerged. Cilium uses eBPF to provide precise policies up to L7, while Calico implements enterprise-grade network security through BGP routing and GlobalNetworkPolicy.

This article covers everything from Kubernetes NetworkPolicy fundamentals through advanced Cilium and Calico policy implementation, comparative analysis, monitoring and troubleshooting, real-world failure cases and recovery procedures, and a production deployment checklist.

Kubernetes NetworkPolicy Architecture

NetworkPolicy Basic Structure

Kubernetes NetworkPolicy is a namespace-scoped resource that controls network traffic at the Pod level. The CNI plugin enforces the actual policy; on CNIs that do not support NetworkPolicy (e.g., Flannel), creating the resource has no effect.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production
          podSelector:
            matchLabels:
              role: frontend
        - ipBlock:
            cidr: 10.0.0.0/8
            except:
              - 10.0.1.0/24
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53

This policy works as follows:

  1. Target Pod Selection: Applied to Pods with the app: api-server label via podSelector
  2. Ingress Rules: Only allows access to TCP port 8080 from frontend Pods in the production namespace and the 10.0.0.0/8 CIDR range (excluding 10.0.1.0/24)
  3. Egress Rules: Only allows TCP 5432 access to database Pods and DNS (UDP 53) traffic

Default Deny Strategy

The foundation of Zero Trust is blocking all traffic first, then explicitly allowing only what is needed. A Default Deny policy blocks all ingress and egress for every Pod in a namespace.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

An empty podSelector selects all Pods in the namespace. Both Ingress and Egress are specified in policyTypes, but since there are no allow rules, all traffic is blocked. DNS must be separately allowed for service discovery to work.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
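The policy above opens port 53 toward every namespace. A tighter variant, sketched below, limits DNS egress to the kube-system namespace using the kubernetes.io/metadata.name label that Kubernetes applies to namespaces automatically (v1.22+):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress-kube-system
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # Only the namespace that runs the cluster DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```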

Namespace Isolation Patterns

In large clusters, inter-namespace isolation is essential. The following pattern allows only intra-namespace communication.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}

When podSelector: {} is used without a namespaceSelector, only Pods in the current namespace are matched. This provides a concise implementation of inter-namespace isolation.
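Strict namespace isolation usually needs at least one exception for cluster-wide services such as metrics scraping. The sketch below additionally admits traffic from a monitoring namespace; the namespace name and metrics port are assumptions to adapt to your cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-scrape
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Assumes a namespace literally named "monitoring"
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080 # assumed metrics port
```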

Cilium CiliumNetworkPolicy Deep Dive

Cilium Architecture and eBPF

Cilium uses the Linux kernel's eBPF (extended Berkeley Packet Filter) technology to enforce network policies at the kernel level. Unlike the traditional iptables-based approach, eBPF provides a programmable data plane, so there is virtually no performance degradation even as the number of policies increases.

The core components of Cilium are:

  • Cilium Agent: Runs as a DaemonSet on each node, compiling and loading eBPF programs into the kernel
  • Cilium Operator: Manages cluster-wide resources
  • Hubble: A monitoring tool that observes network flows in real time
  • Envoy Proxy: Acts as a transparent proxy when enforcing L7 policies

L3-L7 Filtering Implementation

Cilium's CiliumNetworkPolicy includes all the features of the standard NetworkPolicy while adding fine-grained L7 (HTTP, gRPC, Kafka) level controls.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: l7-api-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: GET
                path: '/api/v1/products'
              - method: POST
                path: '/api/v1/orders'
                headers:
                  - 'Content-Type: application/json'
              - method: GET
                path: '/healthz'

This policy allows only GET /api/v1/products, POST /api/v1/orders (with required JSON Content-Type header), and GET /healthz requests from frontend Pods to the api-server. Other HTTP methods like PUT and DELETE are blocked.
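L7 rules are not limited to HTTP. As one illustration, Cilium's Kafka-aware rules (a beta feature) can restrict which topics a client may produce to; the service and topic names below are invented for the example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: kafka-produce-orders
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: kafka-broker
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: order-service
      toPorts:
        - ports:
            - port: '9092'
              protocol: TCP
          rules:
            kafka:
              # order-service may only produce to the "orders" topic
              - role: produce
                topic: orders
```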

DNS-Based Policies

Cilium can control egress traffic based on FQDN (Fully Qualified Domain Name). This is useful for restricting external API calls to specific domains.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: dns-based-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  egress:
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: '53'
              protocol: ANY
          rules:
            dns:
              - matchPattern: '*.stripe.com'
              - matchPattern: '*.amazonaws.com'
    - toFQDNs:
        - matchPattern: '*.stripe.com'
      toPorts:
        - ports:
            - port: '443'
              protocol: TCP
    - toFQDNs:
        - matchPattern: '*.amazonaws.com'
      toPorts:
        - ports:
            - port: '443'
              protocol: TCP

This policy restricts the payment-service to HTTPS communication only with stripe.com and amazonaws.com domains. DNS queries themselves are also only allowed for those domains.
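When only a single host is needed, matchName gives an exact match and a smaller allow surface than a wildcard pattern. A sketch follows (the host name is an assumption, and a DNS-inspection rule like the one in the example above is still required so Cilium can learn which IPs the name resolves to):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: exact-fqdn-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  egress:
    - toFQDNs:
        # Exact host, no wildcard
        - matchName: api.stripe.com
      toPorts:
        - ports:
            - port: '443'
              protocol: TCP
```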

CiliumClusterwideNetworkPolicy

Policies that apply across the entire cluster use CiliumClusterwideNetworkPolicy.

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: block-metadata-access
spec:
  endpointSelector: {}
  egressDeny:
    - toCIDR:
        - 169.254.169.254/32
      toPorts:
        - ports:
            - port: '80'
              protocol: TCP

This policy blocks all Pods in the cluster from accessing the cloud metadata service (169.254.169.254). This is a critical security measure to prevent credential theft attacks via IMDS.

Calico GlobalNetworkPolicy Deep Dive

Calico Architecture

Calico is a networking solution developed by Tigera that provides L3 routing and a policy engine based on BGP (Border Gateway Protocol). Its major components are:

  • Felix: An agent running on each node that manages routing tables and iptables/eBPF rules
  • BIRD: A BGP client that exchanges routing information between nodes
  • Typha: A proxy between Felix and the Kubernetes API Server that reduces API Server load
  • calicoctl: A CLI tool for managing Calico resources

GlobalNetworkPolicy Implementation

Calico's GlobalNetworkPolicy is a cluster-wide policy that is not namespace-scoped. Its position in the evaluation sequence is controlled by the order field, which allows it to be evaluated before standard Kubernetes NetworkPolicies.

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-egress-to-internet
spec:
  selector: environment == 'production'
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        nets:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    - action: Allow
      protocol: UDP
      destination:
        ports:
          - 53
    - action: Deny
      destination:
        notNets:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16

This policy allows all production Pods to communicate only with RFC 1918 private IP ranges and DNS traffic, blocking direct communication to the internet.
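Rather than repeating CIDR lists in every policy, Calico can centralize them in a GlobalNetworkSet and match them by label. A sketch with an assumed blocklist label, using the documentation CIDR 198.51.100.0/24 in place of a real threat feed:

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: threat-blocklist
  labels:
    feed: blocklist
spec:
  nets:
    - 198.51.100.0/24
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-threat-blocklist
spec:
  order: 20
  selector: all()
  types:
    - Egress
  egress:
    - action: Deny
      destination:
        # Matches any GlobalNetworkSet carrying this label
        selector: feed == 'blocklist'
    - action: Pass
```

Updating the GlobalNetworkSet then changes enforcement everywhere without touching the policies themselves.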

Calico Tier System

In Calico Enterprise, policy tiers can be used to clearly manage the order of policy evaluation.

apiVersion: projectcalico.org/v3
kind: Tier
metadata:
  name: security
spec:
  order: 100

---
apiVersion: projectcalico.org/v3
kind: Tier
metadata:
  name: platform
spec:
  order: 200

---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: security.block-known-threats
spec:
  tier: security
  order: 10
  selector: all()
  types:
    - Ingress
    - Egress
  ingress:
    - action: Deny
      source:
        nets:
          - 198.51.100.0/24
    - action: Pass
  egress:
    - action: Deny
      destination:
        nets:
          - 198.51.100.0/24
    - action: Pass

Since the security team's policies are evaluated before the platform team's policies, known threat IPs can be blocked regardless of platform policies.
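Traffic that the security tier Passes falls through to the platform tier for its own evaluation. A minimal platform-tier policy sketch; the role label is an assumption:

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: platform.allow-monitoring
spec:
  tier: platform
  order: 10
  selector: all()
  types:
    - Ingress
  ingress:
    # Only traffic Passed down from the security tier reaches this rule
    - action: Allow
      source:
        selector: role == 'monitoring'
    - action: Pass
```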

Cilium vs Calico vs Standard NetworkPolicy Comparison

A comparison of the key features and characteristics of each solution.

| Feature | Kubernetes NetworkPolicy | Cilium | Calico |
| --- | --- | --- | --- |
| Policy Scope | Namespace | Namespace + Cluster | Namespace + Cluster |
| L3/L4 Support | Yes | Yes | Yes |
| L7 Support | No | Yes (HTTP, gRPC, Kafka, DNS) | Partial (Enterprise only) |
| DNS-based Policy | No | Yes (FQDN matching) | Yes (Calico Enterprise) |
| Policy Engine | CNI-dependent | eBPF native | iptables / eBPF selectable |
| Performance (1000+ policies) | iptables-based degradation | Consistent with eBPF | Good in eBPF mode |
| Monitoring | Requires separate tools | Hubble built-in | calicoctl + Prometheus |
| FQDN Egress | No | Yes | Yes (Enterprise) |
| Policy Tiers | No | No | Yes (Enterprise) |
| Host Firewall | No | Yes | Yes |
| Encryption | No | WireGuard/IPsec | WireGuard |
| Multi-cluster | No | Cluster Mesh | Calico Federation |
| CNCF Status | Standard | Graduated | - (Tigera commercial) |
| GUI Management | No | Hubble UI | Calico Enterprise UI |

Implementation Guide

Step-by-Step Default Deny Rollout

Applying Default Deny all at once in a production environment can cause large-scale outages. Here is a safe step-by-step rollout procedure.

Step 1: Start with Audit Mode (Cilium)

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: audit-default-deny
  namespace: staging
  annotations:
    policy.cilium.io/audit-mode: 'true'
spec:
  endpointSelector: {}
  ingress:
    - fromEndpoints:
        - matchLabels:
            reserved:host: ''
  egress:
    - toEndpoints:
        - matchLabels:
            reserved:host: ''

In audit mode, the policy does not actually block traffic; instead, the traffic that would have been blocked is logged and can be inspected with Hubble.

Step 2: Analyze Traffic with Hubble

# Check traffic audited by the policy using Hubble CLI
hubble observe --namespace staging --verdict AUDIT --output json | \
  jq '.flow | {src: .source.labels, dst: .destination.labels, port: .l4.TCP.destination_port}'

# Understand communication patterns within the namespace
hubble observe --namespace staging --type trace:to-endpoint \
  --output compact --last 1000

Step 3: Create Allow Policies then Activate Default Deny

After writing all necessary allow policies based on audit log analysis, remove the audit mode annotation to activate the policy.

Microservice Policy Pattern

Below is a common policy pattern used in real microservice environments.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: microservice-pattern
  namespace: ecommerce
spec:
  endpointSelector:
    matchLabels:
      app: order-service
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: api-gateway
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: POST
                path: '/orders'
              - method: GET
                path: '/orders/[0-9]+'
  egress:
    - toEndpoints:
        - matchLabels:
            app: inventory-service
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            app: payment-service
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: '53'
              protocol: ANY

This policy restricts the order-service to only receive order-related HTTP requests from the api-gateway, and to communicate only with the inventory-service and payment-service.

Monitoring and Troubleshooting

Cilium Monitoring with Hubble

Hubble is a network observability tool built into Cilium that collects all network flows in real time using eBPF.

# Enable Hubble (during Cilium Helm installation)
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Check dropped traffic in a specific namespace
hubble observe --namespace production --verdict DROPPED \
  --output json --last 100

# Trace communication between specific Pods
hubble observe --from-pod production/api-server-7d9f8b6c5d-x2k4m \
  --to-pod production/database-5f7b9c8d6e-m3n7p --output compact

# Check the policy realized for a specific endpoint
cilium endpoint get 12345 -o json | jq '.[0].status.policy'

# Check Cilium endpoint status and per-endpoint policy enforcement
cilium endpoint list

Calico Troubleshooting with calicoctl

# Check Calico node status
calicoctl node status

# List applied policies
calicoctl get networkpolicy --all-namespaces -o wide
calicoctl get globalnetworkpolicy -o wide

# Inspect the workload endpoints (and their labels) for a specific workload
calicoctl get workloadendpoint --all-namespaces -o yaml | \
  grep -A 5 "api-server"

# Check policy application in Felix logs
kubectl logs -n calico-system -l k8s-app=calico-node -c calico-node \
  --tail=100 | grep -i "policy"

# Check BGP peer status
calicoctl node status | grep -A 10 "BGP"

Prometheus Metrics Collection

Both Cilium and Calico expose Prometheus metrics.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cilium-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: cilium
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

Key monitoring metrics include:

  • cilium_policy_verdict_total: Policy verdict result counters (FORWARDED, DROPPED, AUDIT)
  • cilium_drop_count_total: Number of packets dropped by policies
  • cilium_policy_import_errors_total: Number of policy parsing errors
  • felix_active_local_policies: Number of policies actively managed by Felix
  • felix_iptables_save_errors: Number of iptables rule save errors
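These counters can drive alerting. A sketch of a PrometheusRule, assuming the Prometheus Operator CRDs are installed; the reason label value and the threshold are assumptions to tune against your own baseline:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: network-policy-alerts
  namespace: monitoring
spec:
  groups:
    - name: network-policy
      rules:
        - alert: PolicyDropSpike
          # Fires when policy drops exceed an assumed baseline of 10/s
          expr: sum(rate(cilium_drop_count_total{reason="Policy denied"}[5m])) by (pod) > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: 'Policy drop rate above baseline on {{ $labels.pod }}'
```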

Failure Cases and Recovery Procedures

Case 1: Complete Service Outage from Bulk Default Deny Application

Situation: The operations team applied a Default Deny policy to the production namespace without testing. They forgot the DNS allow policy, causing all service discovery to fail and all microservices to lose inter-service communication.

Symptoms:

  • All Pod readiness probes failing
  • HTTP request timeouts between services
  • No DNS query responses

Recovery Procedure:

# 1. Emergency action: Remove the problematic Default Deny policy
kubectl delete networkpolicy default-deny-all -n production

# 2. Verify status
kubectl get pods -n production -o wide
kubectl get endpoints -n production

# 3. Verify DNS communication
kubectl exec -n production deploy/api-server -- nslookup kubernetes.default

# 4. Reapply correct policy set after root cause analysis
# - Apply DNS allow policy first
kubectl apply -f allow-dns-egress.yaml
# - Apply inter-service communication policies
kubectl apply -f service-communication-policies/
# - Apply Default Deny last
kubectl apply -f default-deny-all.yaml

Lesson Learned: Default Deny must always be applied last, after all allow policies are in place. Pre-validation in a staging environment is mandatory.

Case 2: Cilium DNS Policy and CoreDNS Cache Mismatch

Situation: After applying Cilium FQDN-based egress policies, CoreDNS cached DNS responses bypassed Cilium's DNS proxy, causing the policy to not be enforced.

Symptoms:

  • Certain FQDN policies intermittently not working
  • DNS queries for the FQDN not observed in Hubble
  • Policy works normally after CoreDNS cache TTL expires

Recovery Procedure:

# 1. Check CoreDNS cache metrics (the coredns image has no shell, so port-forward)
kubectl -n kube-system port-forward deploy/coredns 9153:9153 &
curl -s http://localhost:9153/metrics | grep 'coredns_cache'

# 2. Check DNS-related drops at the Cilium DNS proxy
hubble observe --verdict DROPPED --protocol dns

# 3. Reapply DNS policy and restart Cilium Agent
kubectl rollout restart daemonset/cilium -n kube-system

# 4. Verify CoreDNS configuration for forwarding through Cilium DNS proxy
kubectl get configmap coredns -n kube-system -o yaml

Case 3: Calico Policy Order Conflict

Situation: Two teams independently created GlobalNetworkPolicies with the same order value, and an unintended Allow rule was evaluated before a Deny rule, effectively bypassing the security policy.

Symptoms:

  • Traffic that should be blocked is being allowed
  • Policy order conflict discovered during calicoctl inspection

Recovery Procedure:

# 1. Check order of all GlobalNetworkPolicies
calicoctl get globalnetworkpolicy -o yaml | grep -E "name:|order:"

# 2. Fix the order of conflicting policies
calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: security-block-external
spec:
  order: 50
  selector: all()
  types:
    - Egress
  egress:
    - action: Deny
      destination:
        notNets:
          - 10.0.0.0/8
EOF

# 3. Verify policy application order
calicoctl get globalnetworkpolicy -o wide | sort -k3 -n

Production Deployment Checklist

Pre-Deployment

  • Verify CNI plugin (Cilium/Calico) is installed and functioning correctly
  • Confirm all namespaces have appropriate labels
  • Confirm all workloads have labels (app, role, etc.) for policy selection
  • Prepare allow policies for critical kube-system services (CoreDNS, metrics-server)

Policy Application Order

  • Step 1: Apply DNS allow policies to all namespaces
  • Step 2: Allow essential communication with kube-system namespace
  • Step 3: Apply inter-service communication allow policies
  • Step 4: Apply external communication allow policies
  • Step 5: Test Default Deny policy in staging first
  • Step 6: Apply Default Deny to production (with concurrent traffic monitoring)

Monitoring Essentials

  • Configure real-time dropped traffic monitoring via Hubble or calicoctl
  • Add policy-related metrics to Prometheus + Grafana dashboards
  • Set up PagerDuty/Slack alerts for policy violations
  • Automate periodic policy audits

Operational Notes

  • Always use --dry-run=client for pre-validation when changing policies
  • Prefer label-based policies over CIDR-based policies (CIDR is vulnerable to IP changes)
  • Consider separate policies for Headless Services (Pod IPs used directly)
  • Verify ingress traffic paths for NodePort and LoadBalancer services
  • Be cautious about blocking inter-node communication (kubelet, etcd) when using Cilium Host Policies
  • Manage policy backups in Git repositories (GitOps integration recommended)

Rollback Plan

  • Prepare scripts to immediately remove Default Deny policies in emergencies
  • Establish Hubble/calicoctl snapshot comparison processes before and after policy changes
  • Set up pipelines to immediately restore previous policy versions from Git

Advanced Patterns: Multi-Cluster Network Policies

Cilium Cluster Mesh

With Cilium Cluster Mesh, network policies can be applied across multiple clusters.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: cross-cluster-policy
  namespace: shared-services
spec:
  endpointSelector:
    matchLabels:
      app: shared-database
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: backend
            io.cilium.k8s.policy.cluster: cluster-east
      toPorts:
        - ports:
            - port: '5432'
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            app: backend
            io.cilium.k8s.policy.cluster: cluster-west
      toPorts:
        - ports:
            - port: '5432'
              protocol: TCP

Calico Federation

With Calico, Federation enables configuring network policies for services in remote clusters.

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: federated-service-access
spec:
  selector: app == 'frontend'
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        selector: app == 'api-gateway'
        namespaceSelector: global()
      protocol: TCP

Conclusion

Kubernetes network policies are a core component of cluster security. While the standard NetworkPolicy provides L3/L4 level isolation, advanced features from CNI plugins like Cilium or Calico are essential in production environments.

Cilium excels with its eBPF-based high-performance data plane, L7 policies, DNS-based egress control, and Hubble observability. Calico is well-suited for enterprise environments with its BGP routing integration, GlobalNetworkPolicy, and policy tier system.

Regardless of which solution you choose, adopt a Zero Trust approach based on a Default Deny strategy, and always perform thorough validation in audit mode and staging environments before applying policies. Network policies are not a one-time deployment but a living security element that must be continuously managed and updated as services evolve.
