Kubernetes RBAC Deep Dive: Implementing Least-Privilege Access Control with Role, ClusterRole, and OPA Gatekeeper


Introduction

When operating Kubernetes clusters in production, access control is the first area that demands a well-structured framework. Cluster administrators, developers, CI/CD pipelines, monitoring agents, and many other entities send requests to the API Server, and these requests must be finely controlled to determine who (Subject) can perform which actions (Verb) on which resources (Resource). While Kubernetes' built-in authorization mechanism, RBAC (Role-Based Access Control), provides a substantial level of access control, policy-based controls such as "allow only specific image registries," "require resource limits on all Pods," or "prohibit namespace creation without specific labels" fall outside the scope of RBAC.

OPA Gatekeeper fills this gap. As a Kubernetes admission controller built on Open Policy Agent (OPA), it enforces policies written in the Rego language through ConstraintTemplate and Constraint CRDs applied to the cluster. While RBAC addresses "who should be granted which permissions," OPA Gatekeeper validates "whether an authorized request complies with policies" -- a complementary security layer.

This guide covers RBAC core components, ServiceAccount token management, namespace-level permission isolation design patterns, OPA Gatekeeper architecture and Rego policy authoring, real-world policy use cases, audit strategies, policy engine comparison, and failure case studies with recovery procedures.

RBAC Core Concepts

Kubernetes RBAC consists of four API objects. Understanding the relationships among these four is the starting point for access control design.

Role and ClusterRole

A Role defines permission rules for resources within a specific namespace. A ClusterRole is not scoped to any namespace and is used to define permissions for cluster-wide resources (nodes, persistentvolumes, etc.) or to create reusable permission sets across multiple namespaces.

# Namespace-scoped Role: Pod read access in the dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
  - apiGroups: ['']
    resources: ['pods']
    verbs: ['get', 'watch', 'list']
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get']
# Cluster-scoped ClusterRole: Deployment management across all namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deployment-manager
rules:
  - apiGroups: ['apps']
    resources: ['deployments']
    verbs: ['get', 'list', 'watch', 'create', 'update', 'patch']
  - apiGroups: ['apps']
    resources: ['deployments/scale']
    verbs: ['update', 'patch']

The core principle is to avoid wildcard (*) usage as much as possible. Using resources: ["*"] or verbs: ["*"] grants unlimited access not only to current resources but also to any future resources that may be added, dramatically increasing security risk.
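For contrast, the pattern to avoid looks like the following (a hypothetical name, shown only as a negative example). Combined with a ClusterRoleBinding, it is functionally equivalent to cluster-admin:

```yaml
# ANTI-PATTERN: wildcard grants; do not use outside break-glass tooling
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: do-not-do-this
rules:
  - apiGroups: ['*']
    resources: ['*']
    verbs: ['*']
```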

RoleBinding and ClusterRoleBinding

RoleBinding and ClusterRoleBinding connect defined Roles/ClusterRoles to actual Subjects.

# RoleBinding: Grant pod-reader Role to frontend-team group in dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: frontend-pod-reader
  namespace: dev
subjects:
  - kind: Group
    name: frontend-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Critical warning: Never add users to the system:masters group. Members of this group bypass all RBAC checks, and their permissions cannot be revoked by removing RoleBindings or ClusterRoleBindings.
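For completeness, a ClusterRoleBinding grants a ClusterRole across every namespace. A minimal sketch that binds the deployment-manager ClusterRole defined earlier to a hypothetical platform-team group:

```yaml
# ClusterRoleBinding: grant deployment-manager cluster-wide
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-deployment-manager
subjects:
  - kind: Group
    name: platform-team # hypothetical group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
```

Note that a ClusterRole can also be referenced from a namespaced RoleBinding; this reuses the rule set while limiting its effect to that one namespace, which is usually preferable to a ClusterRoleBinding.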

Subject Types

There are three types of Subjects that can receive permissions in RBAC:

  • User: A user identity provided by an external authentication system (OIDC, certificates, etc.)
  • Group: A logical grouping of users. Group information is conveyed by the authentication system
  • ServiceAccount: An in-cluster account managed directly by Kubernetes, used for workloads running inside Pods

ServiceAccount and Token Management

Since Kubernetes 1.24, ServiceAccounts no longer automatically generate permanent tokens. The TokenRequest API issues time-limited tokens by default, which is a significant security improvement.

# Create a ServiceAccount
kubectl create serviceaccount ci-deployer -n staging

# Issue a time-limited token (valid for 1 hour)
kubectl create token ci-deployer -n staging --duration=3600s

# Check ServiceAccount permissions
kubectl auth can-i create deployments --as=system:serviceaccount:staging:ci-deployer -n staging

# Find Roles bound to a specific ServiceAccount
kubectl get rolebindings -n staging -o json | \
  jq '.items[] | select(.subjects[]? | .name=="ci-deployer" and .kind=="ServiceAccount")'

When designing ServiceAccounts for CI/CD pipelines, follow these principles:

  1. Create dedicated ServiceAccounts per pipeline: Never share a single ServiceAccount across multiple pipelines
  2. Create RoleBindings only in required namespaces: Use namespace-scoped RoleBindings instead of ClusterRoleBindings
  3. Minimize token validity period: Set token expiration to match the pipeline execution duration
  4. Disable automountServiceAccountToken: Prevent unnecessary token mounting in Pods
# Disabling automountServiceAccountToken
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service
  namespace: production
automountServiceAccountToken: false
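Token mounting can also be controlled per Pod: spec.automountServiceAccountToken on the Pod takes precedence over the ServiceAccount setting, so a workload that genuinely needs API access can opt back in. A sketch (Pod name and image are hypothetical):

```yaml
# Pod-level opt-in overrides the ServiceAccount default above
apiVersion: v1
kind: Pod
metadata:
  name: needs-api-access
  namespace: production
spec:
  serviceAccountName: app-service
  automountServiceAccountToken: true
  containers:
    - name: app
      image: registry.internal.company.com/app:1.0 # hypothetical image
```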

RBAC Design Patterns

Namespace-Level Isolation

In multi-tenant environments, namespaces are separated by team or environment, with independent RBAC policies applied to each namespace.

# Team namespace + RBAC batch configuration example
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    environment: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-alpha
  name: team-alpha-developer
rules:
  - apiGroups: ['', 'apps', 'batch']
    resources: ['pods', 'deployments', 'services', 'configmaps', 'jobs']
    verbs: ['get', 'list', 'watch', 'create', 'update', 'patch', 'delete']
  - apiGroups: ['']
    resources: ['secrets']
    verbs: ['get', 'list'] # Restrict Secret write permissions
  - apiGroups: ['']
    resources: ['pods/exec']
    verbs: ['create'] # Allow exec for debugging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-developer-binding
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha-devs
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-alpha-developer
  apiGroup: rbac.authorization.k8s.io

Preventing Privilege Escalation

The most critical aspect of RBAC design is blocking privilege escalation paths. The following permissions require special attention:

  • pods/exec: Allows arbitrary command execution inside Pods, enabling ServiceAccount token theft
  • secrets read access: Risk of exposing other ServiceAccount tokens or database credentials
  • create on rolebindings/clusterrolebindings: Allows self-granting elevated permissions
  • escalate/bind verbs: Meta-permissions that allow modifying or binding Roles/ClusterRoles
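When delegated administrators do need to create RoleBindings, the standard mitigation is to grant the bind verb only for an explicit list of Roles via resourceNames. A minimal sketch, assuming the delegate should only ever hand out the pod-reader Role from earlier:

```yaml
# Allow creating RoleBindings in dev, but only binding the pod-reader Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: restricted-binder
rules:
  - apiGroups: ['rbac.authorization.k8s.io']
    resources: ['rolebindings']
    verbs: ['create', 'delete']
  - apiGroups: ['rbac.authorization.k8s.io']
    resources: ['roles']
    verbs: ['bind']
    resourceNames: ['pod-reader'] # only this Role may be bound
```

Without the bind grant, the API server's built-in escalation check would still require the delegate to already hold every permission contained in the Role being bound.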

OPA Gatekeeper Architecture

OPA Gatekeeper operates as a Kubernetes admission webhook, performing policy validation before the API Server creates, modifies, or deletes resources.

Components

  1. Gatekeeper Controller Manager: The core component that processes admission webhook requests and evaluates Rego policies
  2. Audit Controller: Periodically inspects already-deployed resources for policy compliance
  3. ConstraintTemplate CRD: Defines the Rego policy logic and parameter schema as a template
  4. Constraint CRD: Instantiates a ConstraintTemplate, specifying concrete policy targets and parameters

Installation

# Install Gatekeeper (Helm)
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --set audit.interval=60 \
  --set constraintViolationsLimit=50 \
  --set audit.fromCache=true

# Verify installation
kubectl get pods -n gatekeeper-system
kubectl get crd | grep gatekeeper
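Note that audit.fromCache=true only covers resources that are replicated into OPA's cache, and that replication is configured through Gatekeeper's Config resource (which must be named config in the gatekeeper-system namespace). A minimal sketch, assuming only Pods and Namespaces need to be audited from cache:

```yaml
# Sync Pods and Namespaces into OPA's cache for audit-from-cache
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - group: ''
        version: 'v1'
        kind: 'Pod'
      - group: ''
        version: 'v1'
        kind: 'Namespace'
```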

Request Flow

The Gatekeeper policy evaluation flow works as follows:

  1. A user or controller sends a resource create/modify request to the API Server
  2. The API Server performs authentication (AuthN) and authorization (AuthZ/RBAC)
  3. During the admission phase, the request is forwarded to the Gatekeeper webhook
  4. Gatekeeper finds matching Constraints for the resource and evaluates Rego policies
  5. If violations are found, the request is denied with violation messages
  6. If no violations exist, the request is allowed and persisted to etcd

ConstraintTemplate and Rego Policy Authoring

Basic Structure

A ConstraintTemplate consists of two parts: the CRD spec (parameter schema) and the Rego code (policy logic).

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
              description: 'List of required labels'
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Resource is missing required labels: %v", [missing])
        }

Applying a Constraint

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Namespace']
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    labels:
      - 'team'
      - 'cost-center'
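With this Constraint active, a Namespace is admitted only if it carries both required labels. A compliant example (the cost-center value is hypothetical):

```yaml
# Admitted: carries both required labels
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    cost-center: cc-1234 # hypothetical cost-center code
```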

Real-World Policy Use Cases

Use Case 1: Image Registry Restriction

Enforce that containers in production clusters can only pull images from approved registries.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not image_from_allowed(container.image)
          msg := sprintf("Container '%v' uses image '%v' from an unauthorized registry. Allowed registries: %v",
            [container.name, container.image, input.parameters.repos])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not image_from_allowed(container.image)
          msg := sprintf("Init container '%v' uses image '%v' from an unauthorized registry.",
            [container.name, container.image])
        }

        image_from_allowed(image) {
          repo := input.parameters.repos[_]
          startswith(image, repo)
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: prod-allowed-repos
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Pod']
    namespaces:
      - production
      - staging
  parameters:
    repos:
      - 'gcr.io/my-company/'
      - 'us-docker.pkg.dev/my-company/'
      - 'registry.internal.company.com/'
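Under this Constraint, a Pod in production or staging whose image does not start with one of the listed prefixes is rejected at admission; an image such as docker.io/library/nginx:latest would be denied. A Pod that passes (tag is hypothetical):

```yaml
# Admitted: image matches an allowed registry prefix
apiVersion: v1
kind: Pod
metadata:
  name: allowed-image-example
  namespace: production
spec:
  containers:
    - name: web
      image: gcr.io/my-company/web:1.4.2 # hypothetical tag
```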

Use Case 2: Mandatory Resource Requests and Limits

Require all containers to have CPU and memory requests/limits configured.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequireresourcelimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequireResourceLimits
      validation:
        openAPIV3Schema:
          type: object
          properties:
            requiredResources:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequireresourcelimits

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          resource := input.parameters.requiredResources[_]
          not container.resources.limits[resource]
          msg := sprintf("Container '%v' is missing resources.limits.%v", [container.name, resource])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          resource := input.parameters.requiredResources[_]
          not container.resources.requests[resource]
          msg := sprintf("Container '%v' is missing resources.requests.%v", [container.name, resource])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireResourceLimits
metadata:
  name: require-cpu-memory-limits
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Pod']
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    requiredResources:
      - 'cpu'
      - 'memory'
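A Pod satisfies this Constraint when every container declares both requests and limits for cpu and memory. A compliant sketch (name, image, and sizing values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compliant-app
  namespace: production
spec:
  containers:
    - name: app
      image: registry.internal.company.com/app:1.0 # hypothetical image
      resources:
        requests:
          cpu: '100m'
          memory: '128Mi'
        limits:
          cpu: '500m'
          memory: '256Mi'
```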

Use Case 3: Block Privileged Containers

Prevent containers from running in privileged mode.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowprivileged

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged containers are not allowed: '%v'", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged init containers are not allowed: '%v'", [container.name])
        }

        # Ephemeral containers (e.g. injected via kubectl debug) must be covered too
        violation[{"msg": msg}] {
          container := input.review.object.spec.ephemeralContainers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged ephemeral containers are not allowed: '%v'", [container.name])
        }
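This template takes no parameters, so the matching Constraint only needs a match section. A sketch following the same pattern as the earlier Constraints:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowPrivileged
metadata:
  name: disallow-privileged
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Pod']
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
```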

RBAC Auditing and Monitoring

kubectl-Based Auditing

# Check current user permissions
kubectl auth can-i --list

# Check a specific ServiceAccount's namespace permissions
kubectl auth can-i --list --as=system:serviceaccount:production:app-deployer -n production

# Verify a specific action
kubectl auth can-i delete pods --as=system:serviceaccount:staging:ci-runner -n staging

# Find subjects that can read secrets cluster-wide (kubectl-who-can plugin)
kubectl who-can get secrets --all-namespaces

# Discover ClusterRoleBindings with excessive permissions
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name, .subjects'

Gatekeeper Audit Capabilities

Gatekeeper's Audit feature inspects already-deployed resources for policy violations. Setting enforcementAction: dryrun records violations without blocking requests.
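A common rollout pattern is to ship a new Constraint with enforcementAction: dryrun, review the audit results, and only then switch to deny. For example, the registry Constraint from earlier could be staged like this (name and the single repo entry are illustrative):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: prod-allowed-repos-dryrun
spec:
  enforcementAction: dryrun # record violations, do not block
  match:
    kinds:
      - apiGroups: ['']
        kinds: ['Pod']
  parameters:
    repos:
      - 'gcr.io/my-company/'
```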

# View policy violation status
kubectl get k8sallowedrepos prod-allowed-repos -o yaml | \
  grep -A 100 "status:" | head -50

# Summary of all Constraint violations
kubectl get constraints -o json | \
  jq '.items[] | {name: .metadata.name, kind: .kind, violations: (.status.totalViolations // 0)}'

Monitoring Integration

Gatekeeper exposes Prometheus metrics by default:

  • gatekeeper_violations: Violations found by the most recent audit run, partitioned by enforcement action
  • gatekeeper_validation_request_duration_seconds: Admission request processing time
  • gatekeeper_validation_request_count: Total admission requests processed, partitioned by admission status
  • gatekeeper_constraint_templates: Number of ConstraintTemplates, partitioned by status
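If the Prometheus Operator is in use (an assumption), these metrics can back a simple alert rule. The threshold, duration, and labels below are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gatekeeper-alerts
  namespace: gatekeeper-system
spec:
  groups:
    - name: gatekeeper
      rules:
        - alert: GatekeeperAuditViolations
          # Fires when the audit finds deny-level violations for 15 minutes
          expr: sum(gatekeeper_violations{enforcement_action="deny"}) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: 'Gatekeeper audit found deny-level policy violations'
```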

Comparison: RBAC vs OPA Gatekeeper vs PSA vs Kyverno

A comprehensive comparison of Kubernetes access control and policy engines.

| Category | RBAC | OPA Gatekeeper | Pod Security Admission (PSA) | Kyverno |
| --- | --- | --- | --- | --- |
| Layer | Authorization | Admission | Admission | Admission |
| Policy Language | Declarative YAML | Rego | Built-in profiles (Privileged/Baseline/Restricted) | Declarative YAML |
| Learning Curve | Low | High (requires Rego) | Very Low | Low |
| Flexibility | Low (grant/deny only) | Very High | Low (only 3 profiles) | High |
| Mutation Support | N/A | Supported (Assign/AssignMetadata/ModifySet) | Not supported | Supported (mutate) |
| Resource Generation | N/A | Not supported | Not supported | Supported (generate) |
| Audit | Requires audit log analysis | Built-in Audit feature | audit/warn modes | PolicyReport CRD |
| CNCF Stage | Built into Kubernetes | Graduated | Built into Kubernetes | Incubating |
| Resource Consumption | None (built into API Server) | High (multiple Pods) | Very Low | Medium |
| Recommended For | Base access control | Complex policy logic | Pod security baseline enforcement | Kubernetes-native policies |

Recommended combination: Using RBAC (base authorization) + PSA (Pod security baseline) + Gatekeeper or Kyverno (custom policies) together is the best practice for production environments. Kyverno is well-suited for simple policies, while OPA Gatekeeper excels at complex cross-resource validation.

Failure Cases and Recovery Procedures

Case 1: Gatekeeper Webhook Failure Blocking All Deployments

Symptom: All Pod creations are rejected with error messages showing webhook "validation.gatekeeper.sh" denied the request or connection timeouts.

Root Cause: Gatekeeper Controller Pods have crashed or are unresponsive due to resource exhaustion.

Recovery Procedure:

# 1. Check Gatekeeper Pod status
kubectl get pods -n gatekeeper-system

# 2. Emergency: Temporarily disable webhook (change failurePolicy to Ignore)
kubectl get validatingwebhookconfigurations gatekeeper-validating-webhook-configuration -o yaml > webhook-backup.yaml
kubectl patch validatingwebhookconfigurations gatekeeper-validating-webhook-configuration \
  --type='json' -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'

# 3. Restart Gatekeeper Pods
kubectl rollout restart deployment gatekeeper-controller-manager -n gatekeeper-system

# 4. After confirming recovery, restore failurePolicy
kubectl apply -f webhook-backup.yaml

Prevention: Changing Gatekeeper's failurePolicy from Fail to Ignore improves availability but allows policy bypass. In production, maintain Fail while ensuring adequate resources and replicas for Gatekeeper Pods.

Case 2: Permission Leakage Through Excessive ClusterRoleBindings

Symptom: All ServiceAccounts can unexpectedly read cluster resources.

Diagnosis and Recovery:

# Find subjects bound to cluster-admin
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | {name: .metadata.name, subjects: .subjects}'

# Remove unnecessary ClusterRoleBindings
kubectl delete clusterrolebinding suspicious-admin-binding

# Audit all ClusterRoleBindings
kubectl get clusterrolebindings -o json | \
  jq '.items[] | {name: .metadata.name, role: .roleRef.name, subjects: [.subjects[]? | .kind + ":" + .name]}'

Case 3: ConstraintTemplate Syntax Error Causing Policy Inaction

Symptom: A Constraint has been created but the policy is not being enforced. The ConstraintTemplate's status field (visible with kubectl get constrainttemplates -o yaml) reports ingestion or Rego compilation errors.

Diagnosis:

# Check ConstraintTemplate status
kubectl get constrainttemplate k8srequiredlabels -o json | jq '.status'

# Check for Rego syntax errors
kubectl describe constrainttemplate k8srequiredlabels | grep -A 10 "Status:"

# Check Gatekeeper logs for errors
kubectl logs -n gatekeeper-system -l control-plane=controller-manager --tail=100

Operational Checklist

A checklist for auditing access control in production Kubernetes clusters.

RBAC Checklist

  • Are there any unnecessary users in the system:masters group?
  • Have all subjects bound to the cluster-admin ClusterRole been identified?
  • Do any Roles/ClusterRoles use wildcard (*) permissions?
  • Is automountServiceAccountToken: false set as the default for ServiceAccounts?
  • Are appropriate Roles/RoleBindings configured per namespace?
  • Are sensitive permissions (pods/exec, secrets, rolebindings) granted minimally?
  • Have unused ServiceAccounts and RoleBindings been cleaned up?
  • Are RBAC configurations managed in Git (GitOps)?

OPA Gatekeeper Checklist

  • Is the Gatekeeper Controller deployed with high availability (2+ replicas)?
  • Is failurePolicy configured appropriately for production requirements?
  • Are system namespaces (kube-system, gatekeeper-system) properly excluded?
  • Is there a process to test new policies in dryrun mode first?
  • Are Audit results reviewed regularly?
  • Does the ConstraintTemplate Rego code pass unit tests?
  • Are Gatekeeper metrics integrated with Prometheus/Grafana?
  • Is there a documented emergency recovery runbook for webhook failures?

Periodic Audit Items

  • Quarterly RBAC permission review: Identify subjects with excessive permissions
  • Monthly Gatekeeper Audit report: Current policy violation resource status
  • ServiceAccount token usage pattern analysis: Clean up unused tokens
  • Update RBAC and Gatekeeper policies when new CRDs/APIs are introduced

Conclusion

Kubernetes access control is not complete with RBAC alone. RBAC answers "who should be granted which permissions," but validating "whether authorized actions comply with policies" requires an admission controller like OPA Gatekeeper. Using both mechanisms together, enforcing baseline security with Pod Security Admission, and performing regular audits is how you realize the least-privilege principle in production environments.

The operational habits that prevent security incidents are managing policies as code (Policy as Code), always evaluating impact with dryrun mode before changes, and preparing emergency recovery procedures for failures in advance. RBAC and OPA Gatekeeper are technical tools, but effective operation requires integration with your organization's access control governance framework.