Operating GitHub Actions ARC: Ephemeral Runners, Scale Sets, and Security Boundaries

Introduction
Why ephemeral runners are usually the safer default
Split scale sets by workload and trust boundary
Security depends on the surrounding boundaries
Image strategy is a balance between speed and cleanliness
Apply least privilege to networking and secrets
Signals to monitor in production
Operational checklist
Common anti-patterns
Closing thoughts
References


Introduction

At team scale, self-hosted GitHub Actions runners stop being only a cost discussion and become an execution-isolation and security-boundary discussion. Once you run Actions Runner Controller (ARC) on Kubernetes, the important questions change:

Should runners be long-lived or ephemeral?
How should scale sets be separated by workload or trust boundary?
How should images, caches, secrets, and network access be isolated?

This guide stays close to GitHub's official ARC documentation and focuses on operational decisions rather than installation alone.

Why ephemeral runners are usually the safer default

Long-lived runners are easy to start with, but state accumulates over time.

files from previous jobs can remain
tool versions can drift
contamination or compromise can persist longer than expected

Ephemeral runners improve the default security and reproducibility posture because each runner handles a single job and is then discarded, so nothing a job leaves behind survives into the next one.

Benefits:

stronger job-to-job isolation
cleaner image-based standardization
less risk of carrying polluted state into later jobs

They do demand more operational discipline, because image builds, caching, and autoscaling all have to work reliably, but for most serious environments they are the safer baseline.
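The isolation argument can be made concrete with a toy model. The sketch below is not ARC code; `Runner`, `run_long_lived`, and `run_ephemeral` are hypothetical names, and the `workspace` set simply stands in for files, tools, and credentials left on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Runner:
    """Toy runner: `workspace` stands in for state left on the runner's disk."""
    workspace: set = field(default_factory=set)

    def run_job(self, name: str) -> set:
        # A job sees whatever earlier jobs left behind, then leaves its own residue.
        inherited = set(self.workspace)
        self.workspace.add(f"{name}-leftovers")
        return inherited


def run_long_lived(jobs: list) -> dict:
    """Reuse one runner for every job and record what each job inherited."""
    runner = Runner()
    return {job: runner.run_job(job) for job in jobs}


def run_ephemeral(jobs: list) -> dict:
    """Give every job a fresh runner that is discarded afterwards."""
    return {job: Runner().run_job(job) for job in jobs}


if __name__ == "__main__":
    jobs = ["build", "test", "deploy"]
    print(run_long_lived(jobs))  # later jobs inherit earlier leftovers
    print(run_ephemeral(jobs))   # every job starts from an empty workspace
```

In the long-lived case the third job inherits residue from the first two; in the ephemeral case every job starts empty, which is the property the benefits above rely on.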

Split scale sets by workload and trust boundary

One of the most common ARC design mistakes is putting every repository and workload behind a single giant scale set. That makes scheduling feel simple but weakens security and priority control.

Use boundaries like these instead:

untrusted versus trusted code paths
build versus deploy workloads
different network-access needs

Example:

Scale set	Purpose	Why separate it
`public-ci`	validation for external contributor PRs	isolates untrusted code
`internal-build`	internal builds and tests	allows controlled package and cache access
`deploy-ops`	deployment and operations workflows	needs stronger credential and approval boundaries
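One way to reason about the table is as a routing rule that checks the least trusted case first. The following is a minimal sketch, not part of ARC: the `Job` fields and `pick_scale_set` are hypothetical, and only the three scale set names come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    repo: str
    from_fork: bool  # True for pull requests opened from external forks
    deploys: bool    # True when the workflow needs deployment credentials


def pick_scale_set(job: Job) -> str:
    """Return the scale set for a job, evaluating the untrusted case first."""
    if job.from_fork:
        if job.deploys:
            # Refuse outright rather than fall through to a privileged runner.
            raise ValueError("untrusted code must never reach deployment runners")
        return "public-ci"
    if job.deploys:
        return "deploy-ops"
    return "internal-build"


if __name__ == "__main__":
    print(pick_scale_set(Job("example-org/app", from_fork=True, deploys=False)))
    print(pick_scale_set(Job("example-org/app", from_fork=False, deploys=True)))
```

In practice this decision usually lives in workflow definitions, where a job targets a scale set by name in `runs-on`, plus policy about which repositories may use which runners. The point of the sketch is the ordering: trust is evaluated before workload type.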

Security depends on the surrounding boundaries

Runner security is not only about patching the runner image. It depends on the whole execution perimeter.

Critical boundaries to review:

which secrets the job can access
Kubernetes service account permissions
image provenance and update path
outbound network policy
cache and artifact storage separation

The most dangerous pattern is letting untrusted pull request workloads share a boundary with deployment credentials.

Image strategy is a balance between speed and cleanliness

With ephemeral runners, image design matters more.

If the image is too minimal, every job pays heavy install time. If it is too large, image pull time dominates startup on nodes that do not already have the image cached.

A practical compromise:

put language runtimes and common build tools in the base image
restore project-specific dependencies through caches or workflow steps
version runner images clearly and track their rollout history
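The trade-off can be estimated with simple arithmetic: cold-start cost is roughly image pull time plus in-job install time. The sketch below uses entirely hypothetical sizes, install times, and pull bandwidth; measure your own cluster before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageOption:
    name: str
    size_mb: float          # compressed image size pulled onto a cold node
    install_seconds: float  # time a job spends installing what the image lacks


def startup_seconds(option: ImageOption, pull_mb_per_s: float) -> float:
    """Cold-start cost: time to pull the image plus in-job install time."""
    return option.size_mb / pull_mb_per_s + option.install_seconds


def cheapest(options: list, pull_mb_per_s: float) -> ImageOption:
    """Pick the option with the lowest estimated cold-start cost."""
    return min(options, key=lambda o: startup_seconds(o, pull_mb_per_s))


# Hypothetical numbers, for illustration only.
OPTIONS = [
    ImageOption("minimal", size_mb=300, install_seconds=240),
    ImageOption("runtimes-and-tools", size_mb=1500, install_seconds=40),
    ImageOption("everything", size_mb=9000, install_seconds=5),
]

if __name__ == "__main__":
    for option in OPTIONS:
        print(option.name, startup_seconds(option, pull_mb_per_s=50))
    print("cheapest:", cheapest(OPTIONS, pull_mb_per_s=50).name)
```

With these assumed numbers the middle option wins (70 seconds against 246 and 185), which matches the compromise described above. The model ignores layer caching on warm nodes, which lowers the pull cost for every option.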

Apply least privilege to networking and secrets

ARC runners often become far more privileged than teams realize. Recommended practices:

separate egress policy per scale set
expose cloud credentials only to deployment-oriented runners
keep basic test runners at the lowest practical privilege
prefer GitHub OIDC over long-lived cloud secrets where possible
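A separate egress policy per scale set can be expressed as one Kubernetes NetworkPolicy per set of runner pods. The sketch below builds such a manifest as a Python dictionary. The pod label `example.com/scale-set`, the namespace, and the CIDR are placeholders, not values ARC defines; substitute whatever labels your runner pods actually carry.

```python
import json


def egress_policy(scale_set: str, namespace: str, allowed_cidrs: list) -> dict:
    """Build a NetworkPolicy that limits egress for one scale set's runner pods."""
    if allowed_cidrs:
        egress = [
            {
                "to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs],
                "ports": [{"protocol": "TCP", "port": 443}],
            }
        ]
    else:
        # An empty rule list denies all egress. An empty "to" list would
        # instead match every destination, so it is never emitted here.
        egress = []
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{scale_set}-egress", "namespace": namespace},
        "spec": {
            # Assumed label; adjust to the labels on your runner pods.
            "podSelector": {"matchLabels": {"example.com/scale-set": scale_set}},
            "policyTypes": ["Egress"],
            "egress": egress,
        },
    }


if __name__ == "__main__":
    policy = egress_policy("public-ci", "arc-runners", ["192.0.2.0/24"])
    print(json.dumps(policy, indent=2))
```

NetworkPolicy is only enforced when the cluster's network plugin supports it, and a working policy also needs to allow DNS and whatever endpoints the runner requires to reach GitHub. Treat this as a starting shape rather than a complete policy.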

Signals to monitor in production

Watch at least these:

job queue wait time
runner creation latency
Pod scheduling failures
image pull time
cleanup behavior after failed jobs
scale-out and scale-in delay
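Averages hide the slow tail in most of these signals, so percentiles are usually more useful. As a minimal sketch, assuming you can export a queued timestamp and a started timestamp per job from your own metrics source, queue wait time could be summarized like this (`queue_waits` and `percentile` are illustrative helpers, not part of any ARC tooling):

```python
import math
from datetime import datetime


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile; `pct` is in the range (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def queue_waits(jobs: list) -> list:
    """Seconds between a job being queued and a runner starting it.

    `jobs` is a list of (queued_at, started_at) datetime pairs.
    """
    return [(started - queued).total_seconds() for queued, started in jobs]


if __name__ == "__main__":
    samples = [
        (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 0, 20)),
        (datetime(2024, 1, 1, 12, 1, 0), datetime(2024, 1, 1, 12, 1, 45)),
        (datetime(2024, 1, 1, 12, 2, 0), datetime(2024, 1, 1, 12, 5, 0)),
    ]
    waits = queue_waits(samples)
    print("p50:", percentile(waits, 50), "p95:", percentile(waits, 95))
```

The same helper applies to runner creation latency and image pull time. Tracking p50 and p95 separately per scale set makes it easier to see whether one trust boundary is starved while another sits idle.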

Operational checklist

Untrusted and privileged workloads are separated.
Ephemeral runners are the default pattern.
Runner image versions are traceable.
Scale sets have distinct network policy boundaries.
OIDC or minimum-privilege secret strategy is documented.
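Most of this checklist can be encoded as automated checks over an inventory of scale sets. The sketch below assumes a hypothetical `ScaleSet` record that you would populate from your own configuration; none of the field names come from ARC. The documentation item is left out because it cannot be verified mechanically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScaleSet:
    name: str
    runs_untrusted_code: bool
    has_deploy_credentials: bool
    ephemeral: bool
    image_tag: str                 # tag or digest the runner image is pinned to
    network_policy: Optional[str]  # name of the egress policy applied, if any


def audit(scale_sets: list) -> list:
    """Return one finding per checklist item a scale set fails."""
    findings = []
    policies = [s.network_policy for s in scale_sets]
    for s in scale_sets:
        if s.runs_untrusted_code and s.has_deploy_credentials:
            findings.append(f"{s.name}: untrusted code shares a boundary with deploy credentials")
        if not s.ephemeral:
            findings.append(f"{s.name}: runners are not ephemeral")
        if s.image_tag in ("", "latest"):
            findings.append(f"{s.name}: image version is not traceable")
        if s.network_policy is None or policies.count(s.network_policy) > 1:
            findings.append(f"{s.name}: no distinct network policy")
    return findings


if __name__ == "__main__":
    inventory = [
        ScaleSet("public-ci", True, False, True, "2024-05-01", "public-ci-egress"),
        ScaleSet("deploy-ops", False, True, True, "latest", None),
    ]
    for finding in audit(inventory):
        print(finding)
```

Running a check like this in CI against the declared runner configuration turns the checklist into something that fails loudly when a boundary erodes.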

Common anti-patterns

A single runner group shared by every repository

This simplifies administration on paper but weakens security and priority control.

Long-lived runners with a lot of local state

The apparent speed gain is often purchased with drift and contamination risk.

Untrusted pull request workloads sharing a boundary with deployment credentials

This is one of the highest-risk runner designs you can choose.

Closing thoughts

The core ARC challenge is not how to launch runners on Kubernetes. It is how to decide which jobs should run inside which trust boundary. Ephemeral runners, separate scale sets, and minimum-privilege network and secret policies are the center of that model.