Kubernetes RBAC Deep Dive and OPA Gatekeeper Policy-as-Code Operations Guide


Why RBAC Alone Is Not Enough

Kubernetes RBAC (Role-Based Access Control) is the core mechanism for controlling "who can perform which actions on which resources." However, RBAC alone cannot satisfy the following requirements:

  • Constraints on resource content: RBAC controls whether you can create a Pod or not, but it cannot validate whether that Pod has privileged: true or uses images from allowed registries.
  • Enforcing naming conventions: Organizational policies requiring that specific labels or annotations must exist cannot be expressed through RBAC.
  • Dynamic policy changes: RBAC changes require YAML modifications followed by kubectl apply. Consistent policy deployment across hundreds of clusters is challenging.

The Admission Controller-based Policy-as-Code approach bridges this gap, and OPA Gatekeeper is its representative implementation. This article covers the entire process from advanced RBAC design through codifying policies with Gatekeeper and operating them in production.

Advanced RBAC Design Principles

Applying Least Privilege in Practice

The first principle of RBAC design is granting only the minimum necessary permissions. Follow these rules:

  1. Prefer RoleBinding over ClusterRoleBinding: Don't use ClusterRoleBinding when a namespace-scoped RoleBinding is sufficient.
  2. No wildcards: Never use resources: ["*"] or verbs: ["*"].
  3. No system:masters group: Members of this group bypass all RBAC checks. Manage it separately for break-glass procedures only.
# bad-example: Excessive permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: too-permissive
rules:
  - apiGroups: ['*']
    resources: ['*']
    verbs: ['*']
---
# good-example: Only necessary permissions specified
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: production
rules:
  - apiGroups: ['apps']
    resources: ['deployments']
    verbs: ['get', 'list', 'watch', 'create', 'update', 'patch']
  - apiGroups: ['']
    resources: ['services', 'configmaps']
    verbs: ['get', 'list', 'watch', 'create', 'update']
  - apiGroups: ['']
    resources: ['pods']
    verbs: ['get', 'list', 'watch']
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get']
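A Role grants nothing on its own until a RoleBinding attaches it to a subject. A minimal sketch of binding the app-deployer Role above to a team (the group name `deploy-team` is illustrative and would come from your identity provider):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: production
subjects:
  - kind: Group
    name: deploy-team # illustrative group name from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io
```

Because this is a namespace-scoped RoleBinding, the group's permissions stop at the production namespace boundary.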

Preventing Privilege Escalation

The Kubernetes RBAC API blocks privilege escalation by default. To create or modify a Role or RoleBinding, a user must already possess all the permissions included in that Role. However, the following two verbs can bypass this protection and require special attention:

| Dangerous Verb | Description | Mitigation |
| --- | --- | --- |
| escalate | Allows adding permissions to a Role that the user doesn't have | Grant only to platform admins; double-verify with OPA policies |
| bind | Allows binding to a Role with permissions the user doesn't have | Restrict ClusterRoleBinding creation permissions themselves |
| impersonate | Allows acting as another user/group | Mandatory audit log monitoring; allow only specific targets |
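One concrete way to contain the bind verb is RBAC's resourceNames field: a user can be allowed to create RoleBindings while only being permitted to reference pre-approved roles. A sketch (the `app-viewer` ClusterRole name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: restricted-binder
rules:
  # May create RoleBindings...
  - apiGroups: ['rbac.authorization.k8s.io']
    resources: ['rolebindings']
    verbs: ['create']
  # ...but may only bind this specific, pre-approved ClusterRole
  - apiGroups: ['rbac.authorization.k8s.io']
    resources: ['clusterroles']
    verbs: ['bind']
    resourceNames: ['app-viewer'] # illustrative role name
```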
# Audit policy for monitoring privilege escalation related verbs
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ['escalate', 'bind', 'impersonate']
    resources:
      - group: 'rbac.authorization.k8s.io'
        resources: ['clusterroles', 'clusterrolebindings', 'roles', 'rolebindings']

Using Aggregated ClusterRoles Safely

Aggregated ClusterRoles automatically sum up rules from multiple ClusterRoles based on label selectors. While convenient, unintended permission accumulation (Role Explosion) can occur.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-aggregate
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.example.com/aggregate-to-monitoring: 'true'
rules: [] # rules are automatically populated from matching ClusterRoles
---
# This ClusterRole's rules are automatically merged into the aggregate above
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-pods
  labels:
    rbac.example.com/aggregate-to-monitoring: 'true'
rules:
  - apiGroups: ['']
    resources: ['pods', 'pods/log']
    verbs: ['get', 'list', 'watch']

Operational tip: Periodically verify which rules are included in Aggregated ClusterRoles.

# Check actual rules of an Aggregated ClusterRole
kubectl get clusterrole monitoring-aggregate -o jsonpath='{.rules}' | jq .

# Query all ClusterRoles with a specific label
kubectl get clusterrole -l rbac.example.com/aggregate-to-monitoring=true

ServiceAccount Management Strategy

Every Namespace has a default ServiceAccount that is auto-created. You should not use it as-is.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
automountServiceAccountToken: false # Disable token mount if not needed
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      automountServiceAccountToken: false # Also specify at Pod level
      containers:
        - name: app
          image: registry.example.com/my-app:v2.1.0
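The dedicated ServiceAccount then receives its own narrowly scoped permissions. A sketch, assuming the app only needs to read its ConfigMaps (the Role and binding names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role # illustrative
  namespace: production
rules:
  - apiGroups: ['']
    resources: ['configmaps']
    verbs: ['get', 'list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-binding # illustrative
  namespace: production
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: production
roleRef:
  kind: Role
  name: my-app-role
  apiGroup: rbac.authorization.k8s.io
```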

RBAC vs ABAC vs OPA Comparison

Before choosing a policy engine, you need to clearly understand the differences between each approach.

| Item | RBAC | ABAC | OPA Gatekeeper |
| --- | --- | --- | --- |
| Policy unit | Role-based | Attribute-based | Rule (Rego)-based |
| Configuration method | Kubernetes API objects | Static files (requires API server restart) | CRDs (ConstraintTemplate + Constraint) |
| Resource content validation | Not possible | Limited | Fully supported |
| Dynamic updates | kubectl apply | API server restart | kubectl apply (zero-downtime) |
| Mutation support | N/A | N/A | Supported (Assign, AssignMetadata) |
| Audit capability | None | None | Audit of existing resources |
| Learning curve | Low | Medium | High (Rego required) |
| Community maturity | Built-in feature | Deprecated | CNCF Graduated (OPA) |

Key point: RBAC controls "access eligibility" while OPA Gatekeeper validates "resource content compliance." They are not replacements but complements to each other.

OPA Gatekeeper Architecture

Admission Controller Flow

When processing requests, the Kubernetes API server goes through Admission Controllers in the following order:

API Request -> Authentication -> Authorization(RBAC) -> Mutating Admission -> Validating Admission -> etcd Storage
                                                              ^                      ^
                                                        Gatekeeper Mutation    Gatekeeper Validation

Gatekeeper registers with both ValidatingAdmissionWebhook and MutatingAdmissionWebhook, performing policy validation before the API server stores resources.

Core Components

Gatekeeper consists of three main components:

  1. Controller Manager: Manages ConstraintTemplate and Constraint CRDs, and compiles Rego policies.
  2. Audit Controller: Periodically scans existing resources to detect policy violations (default 60-second interval).
  3. Webhook Server: Receives Admission requests from the API server and evaluates policies in real-time.

Gatekeeper Installation

# Install with Helm (recommended)
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --set replicas=3 \
  --set audit.replicas=1 \
  --set audit.logLevel=INFO \
  --set logDenies=true \
  --set emitAdmissionEvents=true \
  --set emitAuditEvents=true

# Verify installation
kubectl get pods -n gatekeeper-system
kubectl get crd | grep gatekeeper

CRDs to verify after installation:

assign.mutations.gatekeeper.sh
assignmetadata.mutations.gatekeeper.sh
configs.config.gatekeeper.sh
constraintpodstatuses.status.gatekeeper.sh
constrainttemplatepodstatuses.status.gatekeeper.sh
constrainttemplates.templates.gatekeeper.sh
expansiontemplate.expansion.gatekeeper.sh
modifyset.mutations.gatekeeper.sh
mutatorpodstatuses.status.gatekeeper.sh
providers.externaldata.gatekeeper.sh

ConstraintTemplate Authoring in Practice

A ConstraintTemplate defines the policy template, and a Constraint fills in parameters to activate the actual policy.

Example 1: Enforcing Required Labels

A policy requiring that all Deployments must have app.kubernetes.io/name and app.kubernetes.io/owner labels.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              description: 'List of label names that must exist'
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Resource is missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployment-must-have-labels
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['apps']
        kinds: ['Deployment']
    namespaces: ['production', 'staging']
    excludedNamespaces: ['kube-system', 'gatekeeper-system']
  parameters:
    labels:
      - 'app.kubernetes.io/name'
      - 'app.kubernetes.io/owner'

Example 2: Allow Only Approved Container Registries

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              description: 'List of allowed container registry prefixes'
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not startswith_any(container.image, input.parameters.repos)
          msg := sprintf("Container '%v' image '%v' is not from an allowed registry. Allowed: %v", [container.name, container.image, input.parameters.repos])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not startswith_any(container.image, input.parameters.repos)
          msg := sprintf("initContainer '%v' image '%v' is not from an allowed registry. Allowed: %v", [container.name, container.image, input.parameters.repos])
        }

        startswith_any(str, prefixes) {
          prefix := prefixes[_]
          startswith(str, prefix)
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos-production
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Pod']
    namespaces: ['production']
  parameters:
    repos:
      - 'registry.example.com/'
      - 'gcr.io/my-project/'

Example 3: Blocking Privileged Containers

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8spspprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sPSPPrivileged
      validation:
        openAPIV3Schema:
          type: object
          properties:
            exemptImages:
              type: array
              description: 'List of images exempt from this policy'
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8spspprivileged

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          not is_exempt(container.image)
          msg := sprintf("Privileged containers are not allowed: '%v'", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          container.securityContext.privileged == true
          not is_exempt(container.image)
          msg := sprintf("Privileged initContainers are not allowed: '%v'", [container.name])
        }

        is_exempt(image) {
          exempt := input.parameters.exemptImages[_]
          image == exempt
        }

enforcementAction Strategy: From Audit to Deny

The core operational strategy for Gatekeeper is phased rollout. Starting with deny from the beginning can cause mass blocking of existing workloads.

Phased Rollout Flow

Phase 1: dryrun  ->  Phase 2: warn  ->  Phase 3: deny
(audit only)         (show warnings)     (actual blocking)
| enforcementAction | Behavior | When to Use |
| --- | --- | --- |
| dryrun | Records violations in audit results only; allows requests | Initial policy deployment, impact assessment phase |
| warn | Returns warning messages on violations but allows requests | Phase for notifying dev teams ahead of enforcement |
| deny | Rejects requests on violation | Production enforcement after sufficient testing |

Checking Audit Results

# Check violations for a specific Constraint
kubectl get k8srequiredlabels deployment-must-have-labels -o yaml

# Filter only violating resources (using jq)
kubectl get k8srequiredlabels deployment-must-have-labels -o json | \
  jq '.status.violations[] | {name: .name, namespace: .namespace, message: .message}'

# Check Gatekeeper audit logs
kubectl logs -n gatekeeper-system -l control-plane=audit-controller --tail=100 | \
  grep '"process":"audit"'

Safe Method to Transition from dryrun to deny

#!/bin/bash
# safe-enforcement-switch.sh
# Check violations before transitioning a Constraint from dryrun -> deny
set -euo pipefail

CONSTRAINT_KIND=$1
CONSTRAINT_NAME=$2

echo "=== Checking current violation count ==="
# Default to 0 when the audit cycle has not populated the status yet
VIOLATIONS=$(kubectl get "${CONSTRAINT_KIND}" "${CONSTRAINT_NAME}" -o json | \
  jq '.status.totalViolations // 0')

echo "Total violations: ${VIOLATIONS}"

if [ "${VIOLATIONS}" -gt 0 ]; then
  echo ""
  echo "=== Violating resource list ==="
  kubectl get "${CONSTRAINT_KIND}" "${CONSTRAINT_NAME}" -o json | \
    jq -r '.status.violations[] | "\(.namespace)/\(.name): \(.message)"'
  echo ""
  echo "[WARNING] Violating resources exist. Switching to deny will block updates to those resources."
  echo "Fix the violating resources first."
  exit 1
fi

echo ""
echo "No violations found. Switching to deny."
kubectl patch "${CONSTRAINT_KIND}" "${CONSTRAINT_NAME}" --type=merge \
  -p '{"spec":{"enforcementAction":"deny"}}'
echo "Transition complete."

Gatekeeper vs Kyverno: Policy Engine Selection Guide

OPA Gatekeeper and Kyverno are the two leading Kubernetes policy engines. Choosing the right one for your project is important.

| Comparison Item | OPA Gatekeeper | Kyverno |
| --- | --- | --- |
| Policy language | Rego (dedicated language) | YAML (Kubernetes-native) |
| CNCF stage | Graduated (OPA) | Incubating |
| Validating webhook | Supported | Supported |
| Mutating webhook | Supported (Assign, AssignMetadata) | Supported (native) |
| Resource generation | Not supported | Supported |
| Image signature verification | Requires external data integration | Built-in (Cosign, Notary) |
| Audit capability | Built-in (periodic scan) | Built-in (PolicyReport CRD) |
| External data integration | External Data Provider API | API call support |
| Multi-cluster | External tools such as Config Sync | Limited built-in support |
| ValidatingAdmissionPolicy integration | From v3.22 | Supported |
| Learning curve | High (Rego) | Low (YAML) |
| Expressiveness | Very high (complex logic possible) | Medium (supplemented by CEL) |
| Resource usage | High (multiple Pods) | Medium (single controller) |

Selection criteria summary:

  • Teams already familiar with Rego, needing complex cross-resource policies: Gatekeeper
  • Teams familiar with Kubernetes YAML, where Mutation/Generation is core: Kyverno
  • Large enterprise environments leveraging the OPA ecosystem (Styra DAS, etc.): Gatekeeper

Integrating Policies into CI/CD Pipelines

Git-Based Policy Management Structure

policies/
├── templates/
│   ├── k8s-required-labels.yaml
│   ├── k8s-allowed-repos.yaml
│   └── k8s-psp-privileged.yaml
├── constraints/
│   ├── production/
│   │   ├── required-labels.yaml
│   │   └── allowed-repos.yaml
│   └── staging/
│       └── required-labels.yaml
├── tests/
│   ├── required-labels_test.rego
│   └── allowed-repos_test.rego
└── Makefile

Writing Rego Unit Tests

Rego policies must have unit tests, run with the OPA CLI's opa test command. Note that opa test only loads .rego (plus JSON/YAML data) files, so keep the raw Rego source of each template available to the CLI rather than relying solely on the Rego embedded in ConstraintTemplate YAML.

# tests/required-labels_test.rego
package k8srequiredlabels

test_violation_missing_label {
  # "input" is reserved in Rego and must not be shadowed;
  # build a local document and inject it with "with"
  test_input := {
    "review": {
      "object": {
        "metadata": {
          "labels": {
            "app.kubernetes.io/name": "myapp"
          }
        }
      }
    },
    "parameters": {
      "labels": ["app.kubernetes.io/name", "app.kubernetes.io/owner"]
    }
  }
  results := violation with input as test_input
  count(results) > 0
}

test_no_violation_all_labels_present {
  test_input := {
    "review": {
      "object": {
        "metadata": {
          "labels": {
            "app.kubernetes.io/name": "myapp",
            "app.kubernetes.io/owner": "team-platform"
          }
        }
      }
    },
    "parameters": {
      "labels": ["app.kubernetes.io/name", "app.kubernetes.io/owner"]
    }
  }
  results := violation with input as test_input
  count(results) == 0
}
# Run tests
opa test ./policies/templates/ ./policies/tests/ -v

# Makefile targets for CI
# Makefile
.PHONY: test-rego lint-rego apply-dryrun

test-rego:
	opa test ./policies/templates/ ./policies/tests/ -v

lint-rego:
	opa check ./policies/templates/ --strict

apply-dryrun:
	kubectl apply -f ./policies/templates/ --dry-run=server
	kubectl apply -f ./policies/constraints/ --dry-run=server

GitHub Actions Integration Example

# .github/workflows/policy-ci.yaml
name: Policy CI
on:
  pull_request:
    paths:
      - 'policies/**'

jobs:
  test-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup OPA
        uses: open-policy-agent/setup-opa@v2
        with:
          version: latest

      - name: Rego Lint
        run: |
          opa check ./policies/templates/ --strict

      - name: Rego Unit Tests
        run: |
          opa test ./policies/templates/ ./policies/tests/ -v

      - name: Validate YAML syntax
        run: |
          for f in $(find policies/ -name '*.yaml'); do
            echo "Validating: $f"
            kubectl apply -f "$f" --dry-run=client 2>&1 || exit 1
          done

      - name: Conftest Policy Check
        uses: instrumenta/conftest-action@main
        with:
          files: policies/constraints/

Troubleshooting Guide

Symptom 1: Gatekeeper Webhook Not Responding, Blocking All Requests

This is the most critical failure scenario. If all Gatekeeper Pods go down, behavior varies depending on the failurePolicy setting.

# Check webhook configuration
kubectl get validatingwebhookconfiguration gatekeeper-validating-webhook-configuration -o yaml | \
  grep failurePolicy

# failurePolicy: Fail -> All requests blocked during Gatekeeper outage (dangerous!)
# failurePolicy: Ignore -> Policy validation skipped during Gatekeeper outage

Emergency Response Procedure:

# 1. Temporarily disable webhook (emergency)
kubectl delete validatingwebhookconfiguration gatekeeper-validating-webhook-configuration

# 2. Check Gatekeeper Pod status and recover
kubectl get pods -n gatekeeper-system
kubectl describe pod -n gatekeeper-system -l control-plane=controller-manager

# 3. Re-register webhook after Pod recovery (Helm reapply)
helm upgrade gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --reuse-values

# 4. Verify webhook re-registration
kubectl get validatingwebhookconfiguration | grep gatekeeper

Operational recommendation: In production environments, set failurePolicy: Ignore to prevent Gatekeeper outages from cascading into cluster-wide outages. However, since policy validation is temporarily disabled when Gatekeeper is down, monitoring alerts must be configured.
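For reference, failurePolicy lives on each webhook entry of the ValidatingWebhookConfiguration. The relevant fragment looks roughly like this (the values shown are one reasonable production choice, not defaults you can rely on; in practice, set them through the Helm chart values rather than editing the object directly, since Gatekeeper may reconcile it):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
  - name: validation.gatekeeper.sh
    failurePolicy: Ignore # availability over strictness during Gatekeeper outages
    timeoutSeconds: 3     # keep admission latency bounded
    # remaining fields (clientConfig, rules, ...) are managed by the Helm chart
```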

Symptom 2: ConstraintTemplate Status Shows "Not Ready" After Applying

# Check ConstraintTemplate status
kubectl get constrainttemplate k8srequiredlabels -o yaml | grep -A 20 status

# Common cause: Rego syntax errors
# Check compilation errors in Controller Manager logs
kubectl logs -n gatekeeper-system -l control-plane=controller-manager --tail=50 | \
  grep -i "error\|compile\|template"

Common Rego syntax mistakes:

  • Typos in input.review.object field paths
  • Missing expression separators: each expression in a rule body needs its own line or a trailing semicolon
  • violation rules that don't return an object of the form {"msg": msg}
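Path typos are especially common with workload controllers: on a Deployment, the container list sits under the Pod template, not directly under spec. A minimal illustration:

```rego
# Wrong for Deployments: this path only exists on bare Pods
container := input.review.object.spec.containers[_]

# Right for Deployments: the Pod spec is nested under the template
container := input.review.object.spec.template.spec.containers[_]
```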

Symptom 3: Audit Not Detecting Violations

# Check if resource sync is configured in Config
kubectl get config config -n gatekeeper-system -o yaml

# Resources must be registered in Config for audit
# Gatekeeper Config: Register resources for Audit reference
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - group: ''
        version: 'v1'
        kind: 'Namespace'
      - group: ''
        version: 'v1'
        kind: 'Pod'
      - group: 'apps'
        version: 'v1'
        kind: 'Deployment'

Symptom 4: Constraint Not Excluding Specific Namespaces

# Check excludedNamespaces in the match block
spec:
  match:
    excludedNamespaces:
      - 'kube-system'
      - 'gatekeeper-system'
      - 'cert-manager' # System component namespace
      - 'monitoring' # Monitoring stack

Additionally, you can specify exempt namespaces in Gatekeeper's global configuration.

# Set global exclusions via Helm values
helm upgrade gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --set 'exemptNamespaces={kube-system,gatekeeper-system}'

ValidatingAdmissionPolicy Integration (Kubernetes 1.30+)

Starting with Kubernetes 1.30, ValidatingAdmissionPolicy (VAP) became GA. From Gatekeeper v3.22, integration with VAP has been strengthened, allowing the sync-vap-enforcement-scope flag to align Gatekeeper's enforcement scope with VAP's enforcement scope.

# Enable VAP integration in Gatekeeper 3.22+
helm upgrade gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --set 'controllerManager.extraArgs={--sync-vap-enforcement-scope=true}'

VAP performs validation using CEL expressions within the API server without external webhook calls, resulting in lower latency. A hybrid strategy where simple policies use VAP and complex cross-resource policies use Gatekeeper is effective.
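As a point of comparison, a simple required-label check expressed directly as a VAP with CEL, with no Gatekeeper involved (a sketch; the policy and label names are illustrative):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-owner-label # illustrative
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ['apps']
        apiVersions: ['v1']
        operations: ['CREATE', 'UPDATE']
        resources: ['deployments']
  validations:
    - expression: "has(object.metadata.labels) && 'app.kubernetes.io/owner' in object.metadata.labels"
      message: "Deployments must carry the app.kubernetes.io/owner label."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-owner-label-binding # illustrative
spec:
  policyName: require-owner-label
  validationActions: ['Deny']
```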

Operations Checklist

RBAC Checklist

  • Is the system:masters group free of regular users?
  • Is usage of the escalate, bind, and impersonate verbs being monitored?
  • Have all ServiceAccounts been granted only minimum permissions?
  • Is the default ServiceAccount not being directly used by workloads?
  • Is automountServiceAccountToken: false set on Pods that don't need it?
  • Are namespace-scoped RoleBindings being preferred over ClusterRoleBindings?
  • Are the actual rules of Aggregated ClusterRoles being regularly reviewed?
  • Are RBAC-related audit logs being collected?

OPA Gatekeeper Checklist

  • Are Gatekeeper Pods running with 3 or more replicas?
  • Is the failurePolicy setting appropriate for the environment? (Production: Ignore recommended)
  • Do all ConstraintTemplates have Rego unit tests?
  • Are new policies always deployed first as dryrun or warn?
  • Is the Audit Controller operating normally and monitoring violations?
  • Are system namespaces like kube-system and gatekeeper-system excluded?
  • Is Gatekeeper resource (CPU/memory) usage being monitored?
  • Is webhook response latency being monitored? (P99 latency)
  • Are policy changes going through Git-based PR reviews?
  • Is the webhook deactivation procedure for emergencies documented?

Incident Response Priority

| Priority | Failure Scenario | Immediate Action | Root Cause Response |
| --- | --- | --- | --- |
| P0 | All deployments blocked due to webhook failure | Delete the webhook and restore service | Change failurePolicy, increase replicas |
| P1 | Violations undetected because audit is not running | Restart the Audit Controller | Verify Config sync settings, check the log pipeline |
| P2 | False positive in a specific policy | Switch that Constraint to dryrun | Fix the Rego logic and strengthen tests |
| P3 | Violating resource deployed due to a missing policy | Manual audit, then fix the resources | Add a ConstraintTemplate, strengthen the CI pipeline |

End-to-End Practical Scenario

Scenario: Applying Image Registry Restriction Policy to Production Cluster

# Step 1: Assess current state - check which registry images are in use
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | \
  sort | uniq -c | sort -rn | head -20

# Step 2: Deploy ConstraintTemplate
kubectl apply -f policies/templates/k8s-allowed-repos.yaml

# Step 3: Deploy Constraint in dryrun mode
cat <<EOF | kubectl apply -f -
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos-production
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production"]
  parameters:
    repos:
      - "registry.example.com/"
      - "gcr.io/my-project/"
EOF

# Step 4: Check violations after 1-2 days
kubectl get k8sallowedrepos allowed-repos-production -o json | \
  jq '.status.totalViolations'

# Step 5: Fix violating resources then switch to warn
kubectl patch k8sallowedrepos allowed-repos-production --type=merge \
  -p '{"spec":{"enforcementAction":"warn"}}'

# Step 6: Switch to deny after collecting dev team feedback
kubectl patch k8sallowedrepos allowed-repos-production --type=merge \
  -p '{"spec":{"enforcementAction":"deny"}}'

# Step 7: Verify policy enforcement
kubectl run test-blocked --image=docker.io/nginx:latest -n production
# Error: admission webhook "validation.gatekeeper.sh" denied the request

Conclusion

RBAC and OPA Gatekeeper handle different layers of Kubernetes security. RBAC controls "who can access" while Gatekeeper validates "which resources are allowed." Operating both layers together is what creates a complete policy framework.

Here are the key principles once more:

  1. Apply the principle of least privilege rigorously to RBAC. Prefer RoleBinding over ClusterRoleBinding, and explicit resource/verb specification over wildcards.
  2. Manage Gatekeeper policies as code. Version-control them in a Git repository, go through PR reviews, and automatically run Rego tests in CI.
  3. Always follow phased rollout (dryrun, warn, deny). Deploying deny directly to production is the beginning of an incident.
  4. Prepare incident response procedures in advance. Include the webhook deletion command in your runbook and drill regularly.
