
Operating GitHub Actions ARC: Ephemeral Runners, Scale Sets, and Security Boundaries


Introduction

At team scale, self-hosted GitHub Actions runners stop being only a cost discussion. They become an execution-isolation and security-boundary discussion. Once you run GitHub's Actions Runner Controller (ARC) on Kubernetes, the important questions change:

  • Should runners be long-lived or ephemeral?
  • How should scale sets be separated by workload or trust boundary?
  • How should images, caches, secrets, and network access be isolated?

This guide stays close to GitHub's official ARC documentation and focuses on operational decisions rather than installation alone.

Why ephemeral runners are usually the safer default

Long-lived runners are easy to start with, but state accumulates over time.

  • files from previous jobs can remain
  • tool versions can drift
  • contamination or compromise can persist longer than expected

Ephemeral runners improve the default security and reproducibility posture because each runner takes a single job and is then discarded, so the next job starts from a fresh copy of the image.

Benefits:

  • stronger job-to-job isolation
  • cleaner image-based standardization
  • less risk of carrying polluted state into later jobs

They do demand more operational discipline, but for most serious environments they are the safer baseline.

Split scale sets by workload and trust boundary

One of the most common ARC design mistakes is putting every repository and workload behind a single giant scale set. That makes scheduling feel simple, but it weakens security and priority control because every job then shares the same runner image, service account, and network policy.

Use boundaries like these instead:

  • untrusted versus trusted code paths
  • build versus deploy workloads
  • different network-access needs

Example:

| Scale set | Purpose | Why separate it |
| --- | --- | --- |
| public-ci | validation for external contributor PRs | isolates untrusted code |
| internal-build | internal builds and tests | allows controlled package and cache access |
| deploy-ops | deployment and operations workflows | needs stronger credential and approval boundaries |
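
The routing rule behind the example table can be written down as a small decision function. The following is only a sketch: `JobContext` and its two flags are hypothetical names invented for illustration, not part of any GitHub or ARC API, and the scale set names are the example names from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobContext:
    """Hypothetical description of a workflow job; not a GitHub API type."""
    from_fork: bool           # pull request opened from an external fork
    needs_deploy_creds: bool  # job deploys or touches production systems


def pick_scale_set(job: JobContext) -> str:
    """Map a job to one of the example scale sets."""
    if job.from_fork:
        # Untrusted code is checked first so it can never be routed to a
        # credentialed boundary, even if the job claims to need credentials.
        return "public-ci"
    if job.needs_deploy_creds:
        return "deploy-ops"
    return "internal-build"


if __name__ == "__main__":
    for job in (
        JobContext(from_fork=True, needs_deploy_creds=True),
        JobContext(from_fork=False, needs_deploy_creds=True),
        JobContext(from_fork=False, needs_deploy_creds=False),
    ):
        print(job, "->", pick_scale_set(job))
```

The ordering is the design choice that matters here: the trust check comes before the workload check, so a fork PR can never reach `deploy-ops`. In a workflow, the chosen name is what the job would put in `runs-on`.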

Security depends on the surrounding boundaries

Runner security is not only about patching the runner image. It depends on the whole execution perimeter.

Critical boundaries to review:

  • which secrets the job can access
  • Kubernetes service account permissions
  • image provenance and update path
  • outbound network policy
  • cache and artifact storage separation

The most dangerous pattern is letting untrusted pull request workloads share a boundary with deployment credentials.
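
One way to keep that pattern from creeping in is to describe each scale set in a small inventory and check it in CI. The sketch below assumes such an inventory exists. The `ScaleSetPolicy` type, the secret names, and the `DEPLOY_SECRETS` set are all hypothetical and would need to be mapped onto however your team actually records this information.

```python
from dataclasses import dataclass, field

# Hypothetical names for credentials that can change production systems.
DEPLOY_SECRETS = {"PROD_DEPLOY_KEY", "CLOUD_ADMIN_ROLE"}


@dataclass
class ScaleSetPolicy:
    """Hypothetical inventory record for one scale set."""
    name: str
    runs_untrusted_code: bool
    secrets: set = field(default_factory=set)


def find_violations(policies):
    """Return one message per scale set that mixes untrusted code with deploy secrets."""
    violations = []
    for policy in policies:
        exposed = sorted(policy.secrets & DEPLOY_SECRETS)
        if policy.runs_untrusted_code and exposed:
            violations.append(
                f"{policy.name}: untrusted code can reach {', '.join(exposed)}"
            )
    return violations


if __name__ == "__main__":
    inventory = [
        ScaleSetPolicy("public-ci", True, {"PROD_DEPLOY_KEY"}),  # misconfigured
        ScaleSetPolicy("internal-build", False, {"PACKAGE_READ_TOKEN"}),
        ScaleSetPolicy("deploy-ops", False, {"PROD_DEPLOY_KEY"}),
    ]
    for message in find_violations(inventory):
        print(message)
```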

Image strategy is a balance between speed and cleanliness

With ephemeral runners, image design matters more.

If the image is too empty, every job pays heavy install time. If the image is too large, image pull time dominates startup.

A practical compromise:

  • put language runtimes and common build tools in the base image
  • restore project-specific dependencies through caches or workflow steps
  • version runner images clearly and track their rollout history
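
The trade-off can be made concrete with a rough cost model: time to the first useful step is approximately image pull time plus per-job install time. The sketch below uses invented image sizes, install times, and pull throughput purely for illustration. Measure your own cluster before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageOption:
    """Hypothetical runner image profile; all numbers are illustrative."""
    name: str
    image_size_mb: float
    per_job_install_s: float


def time_to_first_step(option, pull_mb_per_s, image_cached_on_node=False):
    """Estimate seconds before a job can start doing real work."""
    if pull_mb_per_s <= 0:
        raise ValueError("pull_mb_per_s must be positive")
    # A node that already holds the image skips the pull entirely.
    pull_s = 0.0 if image_cached_on_node else option.image_size_mb / pull_mb_per_s
    return pull_s + option.per_job_install_s


OPTIONS = [
    ImageOption("thin", image_size_mb=300, per_job_install_s=240),
    ImageOption("fat", image_size_mb=6000, per_job_install_s=10),
    ImageOption("balanced", image_size_mb=1500, per_job_install_s=45),
]

if __name__ == "__main__":
    for option in OPTIONS:
        cold = time_to_first_step(option, pull_mb_per_s=50.0)
        warm = time_to_first_step(option, pull_mb_per_s=50.0, image_cached_on_node=True)
        print(f"{option.name}: cold={cold:.0f}s warm={warm:.0f}s")
```

With these made-up numbers the balanced image wins on a cold node, while the fat image wins once it is cached on the node. That is why node-level image caching and node churn belong in the same decision.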

Apply least privilege to networking and secrets

ARC runners often become far more privileged than teams realize. Recommended practices:

  • separate egress policy per scale set
  • expose cloud credentials only to deployment-oriented runners
  • keep basic test runners at the lowest practical privilege
  • prefer GitHub OIDC over long-lived cloud secrets where possible
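
A per-scale-set egress policy can be generated rather than hand-written, which keeps the boundaries consistent. The sketch below builds a Kubernetes `NetworkPolicy` manifest as a plain dictionary. The pod label key `example.com/scale-set`, the namespace, and the CIDR are assumptions chosen for illustration. Check which labels your runner pods actually carry before using a selector like this.

```python
import json


def egress_policy(scale_set, namespace, allowed_cidrs, port=443):
    """Build a NetworkPolicy manifest limiting runner egress to given CIDRs."""
    if allowed_cidrs:
        egress_rules = [
            {
                "to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs],
                "ports": [{"protocol": "TCP", "port": port}],
            }
        ]
    else:
        # An egress rule with an empty "to" list would match every destination,
        # so "no CIDRs" must become "no rules" (deny all egress).
        egress_rules = []
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{scale_set}-egress", "namespace": namespace},
        "spec": {
            # Hypothetical label key; replace with the label on your runner pods.
            "podSelector": {"matchLabels": {"example.com/scale-set": scale_set}},
            "policyTypes": ["Egress"],
            "egress": egress_rules,
        },
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation-only address range.
    manifest = egress_policy("public-ci", "arc-runners", ["203.0.113.0/24"])
    print(json.dumps(manifest, indent=2))
```

A real policy would also need a rule that allows DNS lookups, and it only takes effect when the cluster's network plugin enforces NetworkPolicy.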

Signals to monitor in production

Watch at least these:

  • job queue wait time
  • runner creation latency
  • Pod scheduling failures
  • image pull time
  • cleanup behavior after failed jobs
  • scale-out and scale-in delay
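
Queue wait time is the easiest of these signals to compute from job timestamps. The sketch below assumes each job record has `created_at` and `started_at` ISO 8601 timestamps. Those field names and the sample data are assumptions for illustration, and the percentile uses the simple nearest-rank method.

```python
import math
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def queue_wait_seconds(job):
    """Seconds between a job being created and a runner starting it."""
    return (parse_ts(job["started_at"]) - parse_ts(job["created_at"])).total_seconds()


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Invented sample data.
    jobs = [
        {"created_at": "2024-01-01T10:00:00Z", "started_at": "2024-01-01T10:00:05Z"},
        {"created_at": "2024-01-01T10:01:00Z", "started_at": "2024-01-01T10:01:12Z"},
        {"created_at": "2024-01-01T10:02:00Z", "started_at": "2024-01-01T10:02:40Z"},
        {"created_at": "2024-01-01T10:03:00Z", "started_at": "2024-01-01T10:04:30Z"},
    ]
    waits = [queue_wait_seconds(job) for job in jobs]
    print("p50:", percentile(waits, 50), "p95:", percentile(waits, 95))
```

Tracking a high percentile per scale set, rather than a global average, makes it easier to see when one boundary is starved for capacity while the others are idle.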

Operational checklist

  • Untrusted and privileged workloads are separated.
  • Ephemeral runners are the default pattern.
  • Runner image versions are traceable.
  • Scale sets have distinct network policy boundaries.
  • OIDC or minimum-privilege secret strategy is documented.

Common anti-patterns

A single runner group shared by every repository

This simplifies administration on paper but weakens security and priority control.

Long-lived runners with a lot of local state

The apparent speed gain is often purchased with drift and contamination risk.

Sharing deployment credentials with external PR validation

This is one of the highest-risk runner designs you can choose.

Closing thoughts

The core ARC challenge is not how to launch runners on Kubernetes. It is how to decide which jobs should run inside which trust boundary. Ephemeral runners, separate scale sets, and minimum-privilege network and secret policies are the center of that model.
