ArgoCD GitOps Multi-Cluster Deployment Strategy and Operations Automation Guide


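Introduction

As more organizations operate multi-cluster Kubernetes environments, a consistent deployment pipeline matters more than ever. Manually running kubectl apply against each cluster does not scale. GitOps is an operating model that treats a Git repository as the Single Source of Truth and manages infrastructure and application state declaratively.

ArgoCD is a CNCF Graduated project and one of the most widely adopted Kubernetes-native GitOps tools. This article covers what you need to run ArgoCD-based multi-cluster deployments in production: architecture design, ApplicationSet, the App of Apps pattern, Sync Waves and Hooks, secret management and RBAC, and failure recovery.

ArgoCD Architecture and Core Components

Architecture Overview

ArgoCD consists of the following core components: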

┌─────────────────────────────────────────────────┐
│                  ArgoCD Server                  │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │API Server│  │Repo Server│  │App Controller│  │
│  └──────────┘  └───────────┘  └──────────────┘  │
│  ┌──────────────────┐  ┌─────────────────────┐  │
│  │ ApplicationSet   │  │ Notification        │  │
│  │ Controller       │  │ Controller          │  │
│  └──────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────┘
         │                   │
    ┌────┴────┐          ┌───┴────┐
    │Git Repo │          │ K8s    │
    │(Source) │          │Clusters│
    └─────────┘          └────────┘

  • API Server: Handles UI and CLI requests, authentication, and authorization
  • Repo Server: Fetches manifests from Git repositories and renders them (Helm, Kustomize, Jsonnet, etc.)
  • Application Controller: Compares the live cluster state with the state in Git and synchronizes them
  • ApplicationSet Controller: Generates multiple Applications from a template
  • Notification Controller: Sends Sync status changes to Slack, Teams, webhooks, etc.

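ArgoCD Deployment Models for Multi-Cluster

In a multi-cluster environment, the first design decision is where and how to deploy ArgoCD itself.

| Model | Structure | Pros | Cons |
|---|---|---|---|
| Hub-Spoke | One ArgoCD instance on a management cluster deploys to worker clusters | Centralized management, consistent policies | Single point of failure (SPOF), network dependency |
| Standalone | ArgoCD installed on each cluster | Independence, network isolation | Management overhead, configuration drift |
| Hybrid | Hub manages common infrastructure; each cluster deploys its own apps | Balanced approach | Architectural complexity |

Hub-Spoke is the most common starting point. In this model, ArgoCD is installed on a management cluster, and remote clusters are registered with it as deployment targets.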

Multi-Cluster Registration and Secret Management

Cluster Registration

Register remote clusters with the ArgoCD CLI:

# Check kubeconfig contexts
kubectl config get-contexts

# Add clusters (argocd cluster add creates the argocd-manager ServiceAccount on the target cluster)
argocd cluster add staging-cluster \
  --name staging \
  --label env=staging \
  --label region=ap-northeast-2

argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2

argocd cluster add production-use1 \
  --name production-use1 \
  --label env=production \
  --label region=us-east-1

Internally, ArgoCD stores each cluster's connection details as a Secret resource. To manage clusters declaratively, create the Secret yourself:

apiVersion: v1
kind: Secret
metadata:
  name: production-apne2-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: production
    region: ap-northeast-2
type: Opaque
stringData:
  name: production-apne2
  server: 'https://kubernetes.production-apne2.internal:6443'
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }
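
The shape of that Secret is easy to get wrong by hand, because config is a JSON string and caData must be base64-encoded. A minimal Python sketch that assembles the same manifest is shown below; the helper name build_cluster_secret and the placeholder certificate are illustrative assumptions, not part of ArgoCD.

```python
import base64
import json

# Placeholder PEM bytes; substitute the real CA certificate of the cluster.
CA_PEM = b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"


def build_cluster_secret(name, server, token, ca_pem, labels):
    """Return an ArgoCD cluster Secret manifest as a plain dict (hypothetical helper)."""
    config = {
        "bearerToken": token,
        "tlsClientConfig": {
            "insecure": False,
            # caData carries the PEM certificate, base64-encoded.
            "caData": base64.b64encode(ca_pem).decode("ascii"),
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"{name}-cluster",
            "namespace": "argocd",
            # The secret-type label is what marks this Secret as a cluster.
            "labels": {"argocd.argoproj.io/secret-type": "cluster", **labels},
        },
        "type": "Opaque",
        "stringData": {
            "name": name,
            "server": server,
            # config is stored as a JSON string, not as nested YAML.
            "config": json.dumps(config, indent=2),
        },
    }


secret = build_cluster_secret(
    name="production-apne2",
    server="https://kubernetes.production-apne2.internal:6443",
    token="<service-account-token>",
    ca_pem=CA_PEM,
    labels={"env": "production", "region": "ap-northeast-2"},
)
print(json.dumps(secret, indent=2))
```

Because JSON is valid YAML, the printed manifest can be reviewed or applied like any other manifest. Since it contains a bearer token, it should go through one of the secret management tools discussed in this article rather than being committed in plaintext.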

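Secret Management Strategy

Committing plaintext secrets to Git is a security incident waiting to happen, and the exposure grows with every cluster that reads the repository. Pair ArgoCD with one of the following tools:

  • Sealed Secrets: Encrypt with a public key, store the ciphertext in Git, decrypt inside the cluster
  • External Secrets Operator (ESO): Sync secrets from AWS Secrets Manager, HashiCorp Vault, etc.
  • SOPS + age/KMS: File-level encryption

Combining ESO with ArgoCD is a common choice: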

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_HOST
      remoteRef:
        key: production/database
        property: host
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/database
        property: password
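
To make the mapping concrete, the sketch below models how each spec.data entry picks one property out of a JSON value in the remote store and writes it under secretKey. It is a simplified illustration, not ESO's implementation; the store contents (db.internal, example-password) are hypothetical.

```python
import json

# Hypothetical contents of the remote store, keyed by remoteRef.key.
REMOTE_STORE = {
    "production/database": json.dumps(
        {"host": "db.internal", "password": "example-password"}
    ),
}

# The spec.data entries from the ExternalSecret above.
DATA_SPEC = [
    {
        "secretKey": "DB_HOST",
        "remoteRef": {"key": "production/database", "property": "host"},
    },
    {
        "secretKey": "DB_PASSWORD",
        "remoteRef": {"key": "production/database", "property": "password"},
    },
]


def resolve(remote_store, data_spec):
    """Build the target Secret's key/value pairs from remote JSON values."""
    resolved = {}
    for item in data_spec:
        ref = item["remoteRef"]
        payload = json.loads(remote_store[ref["key"]])
        resolved[item["secretKey"]] = payload[ref["property"]]
    return resolved


print(sorted(resolve(REMOTE_STORE, DATA_SPEC)))
```

Only the ExternalSecret manifest lives in Git; the values themselves stay in the external store.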

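ApplicationSet Controller and Generator Patterns

ApplicationSet is the core tool for multi-cluster deployment. A single ApplicationSet can deploy the same application to dozens of clusters automatically.

Cluster Generator

The Cluster Generator creates Applications from the clusters registered in ArgoCD: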

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ['missingkey=error']
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
        values:
          helmReleaseName: kube-prometheus
  template:
    metadata:
      name: 'prometheus-{{.name}}'
    spec:
      project: infrastructure
      source:
        repoURL: 'https://github.com/org/k8s-infrastructure.git'
        targetRevision: main
        path: 'clusters/{{.metadata.labels.region}}/prometheus'
        helm:
          releaseName: '{{.values.helmReleaseName}}'
          valueFiles:
            - 'values.yaml'
            - 'values-{{.metadata.labels.env}}.yaml'
      destination:
        server: '{{.server}}'
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true

This configuration deploys the Prometheus stack to every cluster labeled env=production. When you register a new cluster with the same label, the corresponding Application is created automatically.
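
The selection logic can be sketched in a few lines of Python. This is a simplified model of matchLabels, not the controller's code, and the server URLs for staging and production-use1 are hypothetical.

```python
# Registered clusters, mirroring the registration example.
# Server URLs other than production-apne2 are hypothetical.
CLUSTERS = [
    {
        "name": "staging",
        "server": "https://kubernetes.staging.internal:6443",
        "labels": {"env": "staging", "region": "ap-northeast-2"},
    },
    {
        "name": "production-apne2",
        "server": "https://kubernetes.production-apne2.internal:6443",
        "labels": {"env": "production", "region": "ap-northeast-2"},
    },
    {
        "name": "production-use1",
        "server": "https://kubernetes.production-use1.internal:6443",
        "labels": {"env": "production", "region": "us-east-1"},
    },
]


def matches(labels, match_labels):
    """True when every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def generate_applications(clusters, match_labels):
    """Render one Application stub per cluster selected by the labels."""
    return [
        {
            "name": f"prometheus-{c['name']}",
            "server": c["server"],
            "path": f"clusters/{c['labels']['region']}/prometheus",
        }
        for c in clusters
        if matches(c["labels"], match_labels)
    ]


apps = generate_applications(CLUSTERS, {"env": "production"})
for app in apps:
    print(app["name"], "->", app["path"])
```

The staging cluster is skipped because its env label does not match; adding a fourth cluster with env=production would add a fourth stub without touching the template.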

Matrix Generator - Cluster x Service Combinations

The Matrix Generator is useful when you deploy several services to several clusters:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: production
          - list:
              elements:
                - service: order-service
                  replicas: '3'
                  memory: '512Mi'
                - service: payment-service
                  replicas: '2'
                  memory: '1Gi'
                - service: notification-service
                  replicas: '2'
                  memory: '256Mi'
  template:
    metadata:
      name: '{{.service}}-{{.name}}'
      annotations:
        notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
    spec:
      project: applications
      source:
        repoURL: 'https://github.com/org/k8s-apps.git'
        targetRevision: main
        path: 'apps/{{.service}}'
        helm:
          parameters:
            - name: replicaCount
              value: '{{.replicas}}'
            - name: resources.requests.memory
              value: '{{.memory}}'
      destination:
        server: '{{.server}}'
        namespace: '{{.service}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

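The Matrix Generator creates one Application for every cluster and service pair. With the two clusters labeled env=production in the registration example, that is 2 clusters x 3 services = 6 Applications; a third production cluster would raise it to 9 with no change to the ApplicationSet. This combinatorial expansion is one of ApplicationSet's most powerful features.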
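
The Application count is simply the size of the Cartesian product of the two generators' outputs. The short Python sketch below reproduces the naming template for the two clusters labeled env=production in the registration example and the three listed services; it models the combination only, not the controller.

```python
from itertools import product

# Clusters selected by env=production in the registration example.
clusters = ["production-apne2", "production-use1"]
# Elements of the list generator.
services = ["order-service", "payment-service", "notification-service"]

# One Application per (cluster, service) pair, named like the template.
app_names = [
    f"{service}-{cluster}" for cluster, service in product(clusters, services)
]

print(len(app_names))
for name in app_names:
    print(name)
```

Every generated name must be unique within the ArgoCD namespace, which is why the template combines both the service and the cluster name.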

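App of Apps Pattern Design

The App of Apps pattern is a hierarchical approach in which one Application manages other ArgoCD Applications. Application manifests are stored in a Git repository, and a root Application points at the directory that contains them.

Directory Structure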

k8s-gitops/
├── root-apps/
│   ├── infrastructure.yaml      # Infrastructure App of Apps
│   ├── platform.yaml            # Platform App of Apps
│   └── applications.yaml        # Business application App of Apps
├── infrastructure/
│   ├── cert-manager.yaml
│   ├── external-secrets.yaml
│   ├── ingress-nginx.yaml
│   └── prometheus-stack.yaml
├── platform/
│   ├── istio.yaml
│   ├── kafka.yaml
│   └── redis.yaml
└── applications/
    ├── order-service.yaml
    ├── payment-service.yaml
    └── notification-service.yaml

Root Application Configuration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-infrastructure
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: infrastructure
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-platform
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-applications
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '0'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

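Controlling Deployment Order with Sync Waves and Hooks

Sync Wave Basics

Sync Waves define the order in which resources are applied, using the argocd.argoproj.io/sync-wave annotation. Lower numbers are applied first, resources in the same wave are applied together, and ArgoCD waits for a wave to become healthy before starting the next one. Resources without the annotation default to wave 0.

# Wave -1: create the Namespace (and any RBAC) first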
apiVersion: v1
kind: Namespace
metadata:
  name: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
---
# Wave 0: ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-config
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '0'
data:
  DATABASE_URL: 'postgresql://db.internal:5432/payments'
  KAFKA_BROKERS: 'kafka-0.kafka:9092'
---
# Wave 1: main Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '1'
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment
          image: registry.internal/payment-service:v2.3.1
          envFrom:
            - configMapRef:
                name: payment-config
---
# Wave 2: expose the Service
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '2'
spec:
  selector:
    app: payment-service
  ports:
    - port: 8080
      targetPort: 8080
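
The resulting order can be sketched as a sort followed by a grouping on the wave number. The Python below is a simplified model of that ordering for the four manifests above, not ArgoCD's scheduler, and it ignores health checks between waves.

```python
from itertools import groupby

# (kind, name, sync-wave) for the manifests above, in arbitrary order.
RESOURCES = [
    ("Service", "payment-service", 2),
    ("Deployment", "payment-service", 1),
    ("ConfigMap", "payment-config", 0),
    ("Namespace", "payment-service", -1),
]


def plan_waves(resources):
    """Group resources into waves, lowest wave number first."""
    ordered = sorted(resources, key=lambda r: r[2])
    return [
        (wave, [f"{kind}/{name}" for kind, name, _ in group])
        for wave, group in groupby(ordered, key=lambda r: r[2])
    ]


for wave, members in plan_waves(RESOURCES):
    print(f"wave {wave}: {', '.join(members)}")
```

The order in which manifests appear in Git does not matter; only the annotation value does.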

Using Sync Hooks

Hooks are resources (typically Jobs) that run at specific points in the Sync lifecycle:

# PreSync Hook: run the database migration before deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-1'
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.internal/payment-service:v2.3.1
          command: ['python', 'manage.py', 'migrate']
          envFrom:
            - secretRef:
                name: database-credentials
      restartPolicy: Never
  backoffLimit: 3
---
# PostSync Hook: run a smoke test after deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: registry.internal/smoke-tester:latest
          command: ['./run-tests.sh']
          env:
            - name: TARGET_URL
              value: 'http://payment-service.payment-service:8080'
            - name: TEST_SUITE
              value: 'smoke'
      restartPolicy: Never
  backoffLimit: 1

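Hook types and execution timing:

| Hook | Execution Timing | Use Cases |
|---|---|---|
| PreSync | Before manifests are synchronized | DB migration, backup |
| Sync | Together with manifest synchronization | Deployments that need special ordering |
| PostSync | After all resources are Healthy | Smoke tests, notifications |
| SyncFail | When the Sync fails | Rollback, alerting |
| Skip | Excluded from synchronization | Manually managed resources |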
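
The phase order in the table can be modeled with a small Python sketch. It assumes that a failure in any phase ends the operation and triggers the SyncFail hooks; treat that as a simplification of the real sync engine, and note that the function name is illustrative.

```python
PHASES = ["PreSync", "Sync", "PostSync"]


def executed_phases(failing_phase=None):
    """Return the phases that run, in order, given an optional failing phase."""
    executed = []
    for phase in PHASES:
        executed.append(phase)
        if phase == failing_phase:
            # Assumption: any failed phase stops the sync and runs SyncFail hooks.
            executed.append("SyncFail")
            break
    return executed


print(executed_phases())
print(executed_phases("PreSync"))
```

In this model a failed db-migration Job in PreSync means the Deployment in the Sync phase is never applied, which is the point of running migrations as a PreSync hook.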

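ArgoCD vs Flux vs Jenkins GitOps Comparison

| Item | ArgoCD | Flux v2 | Jenkins + GitOps |
|---|---|---|---|
| CNCF Status | Graduated | Graduated | N/A |
| UI Dashboard | Built-in (feature-rich) | Weave GitOps (separate) | Built into Jenkins |
| Multi-Cluster | Hub-Spoke native | Per-cluster install recommended | Requires separate configuration |
| ApplicationSet | Native support | Similar via Kustomization | Not supported |
| Helm Support | Native | HelmRelease CRD | Plugin |
| Auto Image Update | Image Updater (separate) | Native support | Pipeline trigger |
| RBAC | Granular, project-based | Kubernetes RBAC | Jenkins' own RBAC |
| Sync Wave/Hook | Native support | Health Check based | Pipeline stages |
| Notifications | Notification Controller | Alert Provider | Plugin |
| Learning Curve | Medium | Medium to high | High (requires Groovy) |
| Community (GitHub stars at the time of writing) | Very active (17k+) | Active (6k+) | Very active |

ArgoCD stands out for its intuitive UI, ApplicationSet's templating, and Hub-Spoke multi-cluster support. Flux's strengths are built-in automatic image updates and a Kubernetes-native design philosophy.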

Failure Scenarios and Recovery

Scenario 1: Sync Gets Stuck

Symptom: the Application stays in the Syncing state and makes no progress.

# Check Application status
argocd app get payment-service --show-operation

# Terminate the running Sync operation
argocd app terminate-op payment-service

# Retry the Sync
argocd app sync payment-service --retry-limit 3 --retry-backoff-duration 10s

Cause: typically a PreSync hook Job that failed or never completes, or a resource that never reaches the Healthy state.
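
To see what the retry flags imply for timing, the sketch below computes the wait before each retry under exponential backoff. The factor of 2 and the 3-minute cap are assumptions intended to mirror the CLI's --retry-backoff-factor and --retry-backoff-max-duration defaults; check argocd app sync --help for your version.

```python
def retry_delays(limit, base_seconds, factor=2, max_seconds=180):
    """Seconds to wait before each retry, with exponential backoff and a cap."""
    return [
        min(base_seconds * factor**attempt, max_seconds)
        for attempt in range(limit)
    ]


# --retry-limit 3 --retry-backoff-duration 10s
print(retry_delays(3, 10))
```

Under these assumptions the three retries add roughly 70 seconds of waiting in total, so a hook that needs minutes to recover calls for a higher limit or a longer base duration.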

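Scenario 2: Git Repository Inaccessible

Symptom: all Applications show the Unknown status.

# Check Repo Server logs
kubectl logs -n argocd deployment/argocd-repo-server --tail=100

# Check the connection status of registered repositories
argocd repo list

# Re-register the repository
argocd repo add https://github.com/org/k8s-gitops.git \
  --username deploy-bot \
  --password "$GITHUB_TOKEN"

Recovery: ArgoCD caches the last manifests it rendered successfully, and workloads already running in the clusters are unaffected, so a short Git outage does not change the deployed state. New deployments, however, are blocked until access is restored.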

Scenario 3: Remote Cluster Connection Lost

# Check cluster status
argocd cluster list

# List all apps deployed to a specific cluster
argocd app list --cluster https://kubernetes.production-apne2.internal:6443

# Re-register the cluster from the local kubeconfig context to refresh its credentials
argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2 \
  --upsert

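Scenario 4: Rolling Back a Bad Deployment

# Check Application history
argocd app history payment-service

# Roll back to a specific history ID
argocd app rollback payment-service 42

# Or revert in Git and let auto Sync apply it
git revert HEAD
git push origin main

In GitOps, git revert is the recommended rollback method rather than argocd app rollback. Because Git is the source of truth, the rollback itself should be recorded in the Git history. Note also that argocd app rollback is rejected while automated sync is enabled on the Application, so with the syncPolicy used in this article the Git route is the one that works.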

Operations Automation Best Practices

Notification Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - description: Application sync has failed
      send:
        - slack-deploy-alert
      when: app.status.operationState.phase in ['Error', 'Failed']
  trigger.on-health-degraded: |
    - description: Application health has degraded
      send:
        - slack-deploy-alert
      when: app.status.health.status == 'Degraded'
  template.slack-deploy-alert: |
    message: |
      Application *{{.app.metadata.name}}* sync {{.app.status.operationState.phase}}.
      Revision: {{.app.status.sync.revision}}
      Cluster: {{.app.spec.destination.server}}
    slack:
      attachments: |
        [{
          "color": "#E96D76",
          "title": "{{.app.metadata.name}}",
          "title_link": "https://argocd.internal/applications/{{.app.metadata.name}}",
          "fields": [
            {"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
            {"title": "Health", "value": "{{.app.status.health.status}}", "short": true}
          ]
        }]
  service.slack: |
    token: $slack-token
    username: ArgoCD
    icon: ":argocd:"

Project Isolation with RBAC

# policy.csv in the argocd-rbac-cm ConfigMap
p, role:platform-team, applications, *, infrastructure/*, allow
p, role:platform-team, clusters, get, *, allow

p, role:dev-team-order, applications, get, applications/order-*, allow
p, role:dev-team-order, applications, sync, applications/order-*, allow
p, role:dev-team-order, applications, action/*, applications/order-*, allow

p, role:dev-team-payment, applications, get, applications/payment-*, allow
p, role:dev-team-payment, applications, sync, applications/payment-*, allow

g, platform-admins, role:platform-team
g, order-team, role:dev-team-order
g, payment-team, role:dev-team-payment
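
To check what such a policy actually grants, the following Python sketch evaluates requests against the same lines with glob matching. It is a simplified model under stated assumptions (first matching rule wins, default deny, fnmatch-style globs); ArgoCD's real enforcement is based on Casbin and also handles deny rules and a default policy.

```python
from fnmatch import fnmatchcase

POLICY_CSV = """\
p, role:platform-team, applications, *, infrastructure/*, allow
p, role:platform-team, clusters, get, *, allow
p, role:dev-team-order, applications, get, applications/order-*, allow
p, role:dev-team-order, applications, sync, applications/order-*, allow
p, role:dev-team-order, applications, action/*, applications/order-*, allow
p, role:dev-team-payment, applications, get, applications/payment-*, allow
p, role:dev-team-payment, applications, sync, applications/payment-*, allow
g, platform-admins, role:platform-team
g, order-team, role:dev-team-order
g, payment-team, role:dev-team-payment
"""


def parse(policy_csv):
    """Split policy.csv into permission rules (p) and group bindings (g)."""
    policies, groups = [], {}
    for line in policy_csv.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if fields[0] == "p":
            policies.append(tuple(fields[1:]))
        elif fields[0] == "g":
            groups.setdefault(fields[1], set()).add(fields[2])
    return policies, groups


def is_allowed(subject, resource, action, obj, policies, groups):
    """Default-deny check: the first rule matching the request decides."""
    roles = {subject} | groups.get(subject, set())
    for role, res, act, pattern, effect in policies:
        if (
            role in roles
            and res == resource
            and fnmatchcase(action, act)
            and fnmatchcase(obj, pattern)
        ):
            return effect == "allow"
    return False


policies, groups = parse(POLICY_CSV)
print(is_allowed("order-team", "applications", "sync",
                 "applications/order-service", policies, groups))
print(is_allowed("order-team", "applications", "sync",
                 "applications/payment-service", policies, groups))
```

The object pattern has the form project/application, so applications/order-* matches any Application in the applications project whose name starts with order-.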

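Conclusion

Multi-cluster GitOps with ArgoCD is not just a matter of installing a tool; it changes how an organization deploys. ApplicationSet manages clusters dynamically, the App of Apps pattern gives the setup a hierarchical structure, and Sync Waves and Hooks control deployment order. Every change is reviewed through a Git pull request and then synchronized automatically.

The most important principle is that Git is the source of truth. In an emergency it is tempting to fix things directly with kubectl edit, but with selfHeal enabled ArgoCD will revert the change to the state defined in Git. Only changes made through Git are persistent, traceable, and reversible.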


ArgoCD GitOps Multi-Cluster Deployment Strategy and Operations Automation Guide

ArgoCD GitOps Multi-Cluster Deployment

Introduction

As more organizations operate multi-cluster Kubernetes environments, the importance of consistent deployment pipelines continues to grow. The days of manually running kubectl apply on each cluster are over. GitOps is an operational paradigm that uses a Git repository as the Single Source of Truth, declaratively managing infrastructure and application state.

ArgoCD is a CNCF Graduated project and the de facto standard for Kubernetes-native GitOps tools. This article covers everything you need for production operations with ArgoCD-based multi-cluster deployments, from architecture design to ApplicationSet, App of Apps pattern, Sync Wave, security, and disaster recovery.

ArgoCD Architecture and Core Components

Architecture Overview

ArgoCD consists of the following core components:

┌─────────────────────────────────────────────────┐
ArgoCD Server│  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │ API Server│Repo Server│App Controller││  └──────────┘  └───────────┘  └──────────────┘  │
│  ┌──────────────────┐  ┌─────────────────────┐  │
│  │ ApplicationSet    │  │ Notification         │  │
│  │ Controller        │  │ Controller           │  │
│  └──────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────┘
         │                    │
    ┌────┴────┐          ┌───┴────┐
Git Repo │          │ K8s     (Source) │          │Clusters│
    └─────────┘          └────────┘
  • API Server: Handles UI and CLI requests, authentication/authorization
  • Repo Server: Fetches and renders manifests from Git repositories (Helm, Kustomize, Jsonnet, etc.)
  • Application Controller: Compares actual cluster state with Git state and synchronizes
  • ApplicationSet Controller: Automatically generates multiple Applications based on templates
  • Notification Controller: Sends alerts on Sync state changes via Slack, Teams, Webhook, etc.

ArgoCD Deployment Models for Multi-Cluster

In a multi-cluster environment, the first design decision is where and how to deploy ArgoCD itself.

ModelStructureProsCons
Hub-SpokeSingle ArgoCD on management cluster, deploys to worker clustersCentralized management, consistent policiesSPOF, network dependency
StandaloneArgoCD installed on each clusterIndependence, network isolationManagement overhead, configuration drift
HybridHub for common infra, each cluster for app deploymentsBalanced approachArchitecture complexity

Most organizations adopt the Hub-Spoke model. In this model, ArgoCD is installed on a management cluster, and remote clusters are registered for deployment.

Multi-Cluster Registration and Secret Management

Cluster Registration

Register remote clusters using the ArgoCD CLI:

# Check kubeconfig contexts
kubectl config get-contexts

# Add cluster (ArgoCD auto-creates ServiceAccount)
argocd cluster add staging-cluster \
  --name staging \
  --label env=staging \
  --label region=ap-northeast-2

argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2

argocd cluster add production-use1 \
  --name production-use1 \
  --label env=production \
  --label region=us-east-1

Internally, ArgoCD stores each cluster's information as a Secret resource. To manage it declaratively, create Secrets directly as follows:

apiVersion: v1
kind: Secret
metadata:
  name: production-apne2-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: production
    region: ap-northeast-2
type: Opaque
stringData:
  name: production-apne2
  server: 'https://kubernetes.production-apne2.internal:6443'
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }

Secret Management Strategy

Storing secrets in plaintext in Git in a multi-cluster environment leads to security incidents. Combine the following tools with ArgoCD:

  • Sealed Secrets: Encrypt with public key, store in Git, decrypt in cluster
  • External Secrets Operator (ESO): Sync secrets from AWS Secrets Manager, HashiCorp Vault, etc.
  • SOPS + age/KMS: File-level encryption

The pattern of combining ESO with ArgoCD is the most widely used:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_HOST
      remoteRef:
        key: production/database
        property: host
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/database
        property: password

ApplicationSet Controller and Generator Patterns

ApplicationSet is the core tool for multi-cluster deployment. A single ApplicationSet can automatically deploy the same application across dozens of clusters.

Cluster Generator

Generates Applications based on cluster information registered in ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ['missingkey=error']
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
        values:
          helmReleaseName: kube-prometheus
  template:
    metadata:
      name: 'prometheus-{{.name}}'
    spec:
      project: infrastructure
      source:
        repoURL: 'https://github.com/org/k8s-infrastructure.git'
        targetRevision: main
        path: 'clusters/{{.metadata.labels.region}}/prometheus'
        helm:
          releaseName: '{{.values.helmReleaseName}}'
          valueFiles:
            - 'values.yaml'
            - 'values-{{.metadata.labels.env}}.yaml'
      destination:
        server: '{{.server}}'
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true

This configuration deploys the Prometheus stack to all clusters with the env=production label. When you add a new cluster with the same label, an Application is automatically created.

Matrix Generator - Cluster x Service Combinations

The Matrix Generator is useful when deploying multiple services across multiple clusters:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: production
          - list:
              elements:
                - service: order-service
                  replicas: '3'
                  memory: '512Mi'
                - service: payment-service
                  replicas: '2'
                  memory: '1Gi'
                - service: notification-service
                  replicas: '2'
                  memory: '256Mi'
  template:
    metadata:
      name: '{{.service}}-{{.name}}'
      annotations:
        notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
    spec:
      project: applications
      source:
        repoURL: 'https://github.com/org/k8s-apps.git'
        targetRevision: main
        path: 'apps/{{.service}}'
        helm:
          parameters:
            - name: replicaCount
              value: '{{.replicas}}'
            - name: resources.requests.memory
              value: '{{.memory}}'
      destination:
        server: '{{.server}}'
        namespace: '{{.service}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

The Matrix Generator automatically creates 3 clusters x 3 services = 9 Applications. This is the most powerful aspect of ApplicationSet.

App of Apps Pattern Design

The App of Apps pattern is a hierarchical approach that creates an Application to manage ArgoCD Applications. Application manifests are stored in a Git repository, and a root Application references them.

Directory Structure

k8s-gitops/
├── root-apps/
│   ├── infrastructure.yaml      # Infrastructure App of Apps
│   ├── platform.yaml            # Platform App of Apps
│   └── applications.yaml        # Business app App of Apps
├── infrastructure/
│   ├── cert-manager.yaml
│   ├── external-secrets.yaml
│   ├── ingress-nginx.yaml
│   └── prometheus-stack.yaml
├── platform/
│   ├── istio.yaml
│   ├── kafka.yaml
│   └── redis.yaml
└── applications/
    ├── order-service.yaml
    ├── payment-service.yaml
    └── notification-service.yaml

Root Application Configuration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-infrastructure
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: infrastructure
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-platform
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-applications
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '0'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Controlling Deployment Order with Sync Wave and Hooks

Sync Wave Basics

Sync Wave defines the deployment order of resources using the argocd.argoproj.io/sync-wave annotation. Resources are deployed starting from the lowest number, and resources within the same Wave are processed in parallel.

# Wave -1: Create Namespace and RBAC first
apiVersion: v1
kind: Namespace
metadata:
  name: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
---
# Wave 0: ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-config
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '0'
data:
  DATABASE_URL: 'postgresql://db.internal:5432/payments'
  KAFKA_BROKERS: 'kafka-0.kafka:9092'
---
# Wave 1: Main Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '1'
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment
          image: registry.internal/payment-service:v2.3.1
          envFrom:
            - configMapRef:
                name: payment-config
---
# Wave 2: Service exposure
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '2'
spec:
  selector:
    app: payment-service
  ports:
    - port: 8080
      targetPort: 8080

Using Sync Hooks

Hooks are resources (typically Jobs) that execute at specific points in the Sync lifecycle:

# PreSync Hook: Database migration before deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-1'
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.internal/payment-service:v2.3.1
          command: ['python', 'manage.py', 'migrate']
          envFrom:
            - secretRef:
                name: database-credentials
      restartPolicy: Never
  backoffLimit: 3
---
# PostSync Hook: Smoke test after deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: registry.internal/smoke-tester:latest
          command: ['./run-tests.sh']
          env:
            - name: TARGET_URL
              value: 'http://payment-service.payment-service:8080'
            - name: TEST_SUITE
              value: 'smoke'
      restartPolicy: Never
  backoffLimit: 1

Hook types and execution timing:

HookExecution TimingUse Cases
PreSyncBefore manifest synchronizationDB migration, backup
SyncAlong with manifest syncSpecial order deployments
PostSyncAfter all resources are HealthySmoke tests, notifications
SyncFailWhen Sync failsRollback, alert notifications
SkipExcluded from synchronizationManually managed resources

ArgoCD vs Flux vs Jenkins GitOps Comparison

ItemArgoCDFlux v2Jenkins + GitOps
CNCF StatusGraduatedGraduatedN/A
UI DashboardBuilt-in (feature-rich)Weave GitOps (separate)Built-in to Jenkins
Multi-ClusterHub-Spoke nativePer-cluster install recommendedRequires separate config
ApplicationSetNative supportSimilar via KustomizationNot supported
Helm SupportNativeHelmRelease CRDPlugin
Auto Image UpdateImage Updater (separate)Native supportPipeline trigger
RBACProject-based granularKubernetes RBACJenkins native RBAC
Sync Wave/HookNative supportHealth Check basedPipeline stages
NotificationsNotification ControllerAlert ProviderPlugin
Learning CurveMediumMedium to HighHigh (requires Groovy)
Community SizeVery active (17k+ Stars)Active (6k+ Stars)Very active

ArgoCD excels in UI intuitiveness, powerful ApplicationSet templating capabilities, and Hub-Spoke multi-cluster support. Flux has strengths in automatic image updates and its Kubernetes-native design philosophy.

Failure Scenarios and Recovery

Scenario 1: Sync Gets Stuck

Symptom: Application is stuck in Syncing state and does not progress

# Check Application status
argocd app get payment-service --show-operation

# Force terminate Sync operation
argocd app terminate-op payment-service

# Retry Sync
argocd app sync payment-service --retry-limit 3 --retry-backoff-duration 10s

Cause: PreSync Hook Job failed, or resource did not reach Healthy state

Scenario 2: Git Repository Inaccessible

Symptom: All Applications show Unknown status

# Check Repo Server logs
kubectl logs -n argocd deployment/argocd-repo-server --tail=100

# Test Git connection
argocd repo list

# Re-register repository
argocd repo add https://github.com/org/k8s-gitops.git \
  --username deploy-bot \
  --password "$GITHUB_TOKEN"

Recovery: ArgoCD caches the last successfully applied manifests, so even if Git is temporarily unavailable, the existing deployment state is maintained. However, new deployments are not possible.

Scenario 3: Remote Cluster Connection Lost

# Check cluster status
argocd cluster list

# All apps for a specific cluster
argocd app list --dest-server https://kubernetes.production-apne2.internal:6443

# Renew cluster credentials
argocd cluster set production-apne2 \
  --server https://kubernetes.production-apne2.internal:6443 \
  --bearer-token "new-token-here"

Scenario 4: Rolling Back a Bad Deployment

# Check Application history
argocd app history payment-service

# Rollback to a specific revision
argocd app rollback payment-service 42

# Or Git revert followed by auto Sync
git revert HEAD
git push origin main

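In GitOps, the recommended rollback method is git revert rather than argocd app rollback. Git is the source of truth, so the rollback itself should be recorded in the Git history. Note also that argocd app rollback is rejected while automated sync is enabled for the Application.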
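
The Git-based rollback can be scripted so that the revert, the push, and the verification happen in one step. This is a sketch under assumptions: `gitops_rollback` is a hypothetical helper, it must run inside a clone of the GitOps repository, it assumes automated sync is enabled, and the `main` default and 300-second timeout are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: roll back through Git, then wait for ArgoCD to converge.
# gitops_rollback is a hypothetical helper; run it inside the GitOps repo clone.
gitops_rollback() {
  local app="$1"
  local commit="${2:-HEAD}"   # commit to revert
  local branch="${3:-main}"   # branch ArgoCD tracks (assumed)

  # Create a revert commit so the rollback is recorded in history
  git revert --no-edit "$commit" || return 1
  git push origin "$branch" || return 1

  # Refresh so ArgoCD notices the new revision without waiting for polling
  argocd app get "$app" --refresh >/dev/null || return 1
  # Block until the Application is Synced and Healthy again
  argocd app wait "$app" --sync --health --timeout 300
}
```

Usage: `gitops_rollback payment-service HEAD main`.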

Operations Automation Best Practices

Notification Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - description: Application sync has failed
      send:
        - slack-deploy-alert
      when: app.status.operationState.phase in ['Error', 'Failed']
  trigger.on-health-degraded: |
    - description: Application health has degraded
      send:
        - slack-deploy-alert
      when: app.status.health.status == 'Degraded'
  template.slack-deploy-alert: |
    message: |
      Application *{{.app.metadata.name}}* sync {{.app.status.operationState.phase}}.
      Revision: {{.app.status.sync.revision}}
      Cluster: {{.app.spec.destination.server}}
    slack:
      attachments: |
        [{
          "color": "#E96D76",
          "title": "{{.app.metadata.name}}",
          "title_link": "https://argocd.internal/applications/{{.app.metadata.name}}",
          "fields": [
            {"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
            {"title": "Health", "value": "{{.app.status.health.status}}", "short": true}
          ]
        }]
  service.slack: |
    token: $slack-token
    username: ArgoCD
    icon: ":argocd:"
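
Triggers defined in the ConfigMap only fire for Applications that subscribe to them, and the `$slack-token` reference is resolved from the `slack-token` key of the `argocd-notifications-secret` Secret. The sketch below adds the subscription annotations for both triggers; `subscribe_app` is a hypothetical helper and the channel name is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: subscribe one Application to both triggers on a Slack channel.
# subscribe_app is a hypothetical helper; the channel name is supplied by the caller.
subscribe_app() {
  local app="$1"
  local channel="$2"
  local trigger

  for trigger in on-sync-failed on-health-degraded; do
    # Annotation format: notifications.argoproj.io/subscribe.<trigger>.<service>=<recipient>
    kubectl annotate applications.argoproj.io "$app" -n argocd --overwrite \
      "notifications.argoproj.io/subscribe.${trigger}.slack=${channel}"
  done
}
```

Usage: `subscribe_app payment-service deploy-alerts`. In a GitOps setup the same annotations would normally be declared in the Application or ApplicationSet manifest rather than applied by hand.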

Project Isolation with RBAC

# policy.csv in argocd-rbac-cm ConfigMap
p, role:platform-team, applications, *, infrastructure/*, allow
p, role:platform-team, clusters, get, *, allow

p, role:dev-team-order, applications, get, applications/order-*, allow
p, role:dev-team-order, applications, sync, applications/order-*, allow
p, role:dev-team-order, applications, action/*, applications/order-*, allow

p, role:dev-team-payment, applications, get, applications/payment-*, allow
p, role:dev-team-payment, applications, sync, applications/payment-*, allow

g, platform-admins, role:platform-team
g, order-team, role:dev-team-order
g, payment-team, role:dev-team-payment
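
A policy like this can be checked before it is applied by using `argocd admin settings rbac can` against a local copy of the file. The following is a sketch: `expect_rbac` is a hypothetical helper, `order-api` is an illustrative Application name, and it assumes the command prints `Yes` or `No` as its answer.

```shell
#!/usr/bin/env bash
# Sketch: assert what a role may do, using a local policy file.
# expect_rbac ROLE ACTION OBJECT EXPECTED(Yes|No) [POLICY_FILE]; hypothetical helper.
expect_rbac() {
  local role="$1" action="$2" object="$3" expected="$4"
  local policy="${5:-policy.csv}"
  local answer

  # A denied check may exit non-zero, so the exit status is ignored here
  answer=$(argocd admin settings rbac can "$role" "$action" applications "$object" \
             --policy-file "$policy" 2>/dev/null) || true

  if [ "$answer" = "$expected" ]; then
    echo "ok: $role $action $object -> $answer"
  else
    echo "MISMATCH: $role $action $object -> ${answer:-no output}, expected $expected"
    return 1
  fi
}
```

Usage: `expect_rbac role:dev-team-order sync 'applications/order-api' Yes` and `expect_rbac role:dev-team-payment sync 'applications/order-api' No`. Running such checks in CI on every change to the policy file catches accidental privilege grants before they reach the cluster.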

Conclusion

Multi-cluster GitOps with ArgoCD is not just about installing a tool; it is about changing your organization's deployment culture. Use ApplicationSet to manage clusters dynamically, the App of Apps pattern to build hierarchical structures, and Sync Waves and Hooks to control deployment order. Every change is reviewed through a Git pull request and then synchronized automatically.

The most important principle is that Git is the source of truth. In an emergency it is tempting to change resources directly with kubectl edit, but when selfHeal is enabled ArgoCD reverts such changes to the state declared in Git. Only changes made through Git are persistent, traceable, and reversible.

