ArgoCD GitOps Multi-Cluster Deployment Strategy and Operations Automation Guide

Introduction

As more organizations operate multi-cluster Kubernetes environments, the importance of consistent deployment pipelines continues to grow. The days of manually running kubectl apply on each cluster are over. GitOps is an operational paradigm that uses a Git repository as the Single Source of Truth, declaratively managing infrastructure and application state.

ArgoCD is a CNCF Graduated project and one of the most widely adopted Kubernetes-native GitOps tools. This article covers production operation of ArgoCD-based multi-cluster deployments: architecture design, ApplicationSet, the App of Apps pattern, Sync Waves, security, and disaster recovery.

ArgoCD Architecture and Core Components

Architecture Overview

ArgoCD consists of the following core components:

┌─────────────────────────────────────────────────┐
│                  ArgoCD Server                  │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │API Server│  │Repo Server│  │App Controller│  │
│  └──────────┘  └───────────┘  └──────────────┘  │
│  ┌──────────────────┐  ┌─────────────────────┐  │
│  │ ApplicationSet   │  │ Notification        │  │
│  │ Controller       │  │ Controller          │  │
│  └──────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────┘
         │                   │
    ┌────┴────┐          ┌───┴────┐
    │Git Repo │          │ K8s    │
    │(Source) │          │Clusters│
    └─────────┘          └────────┘

  • API Server: Handles UI and CLI requests, authentication/authorization
  • Repo Server: Fetches and renders manifests from Git repositories (Helm, Kustomize, Jsonnet, etc.)
  • Application Controller: Compares actual cluster state with Git state and synchronizes
  • ApplicationSet Controller: Automatically generates multiple Applications based on templates
  • Notification Controller: Sends alerts on Sync state changes via Slack, Teams, Webhook, etc.

ArgoCD Deployment Models for Multi-Cluster

In a multi-cluster environment, the first design decision is where and how to deploy ArgoCD itself.

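Model      | Structure                                                        | Pros                                        | Cons
-----------|------------------------------------------------------------------|---------------------------------------------|------------------------------------------
Hub-Spoke  | Single ArgoCD on management cluster, deploys to worker clusters  | Centralized management, consistent policies | SPOF, network dependency
Standalone | ArgoCD installed on each cluster                                 | Independence, network isolation             | Management overhead, configuration drift
Hybrid     | Hub for common infra, each cluster for app deployments           | Balanced approach                           | Architecture complexity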

Hub-Spoke is a common starting point, and the rest of this article assumes it. In this model, ArgoCD is installed on a management cluster, and remote clusters are registered with it as deployment targets.

Multi-Cluster Registration and Secret Management

Cluster Registration

Register remote clusters using the ArgoCD CLI:

# Check kubeconfig contexts
kubectl config get-contexts

# Add clusters by kubeconfig context name; the CLI creates an argocd-manager
# ServiceAccount in the target cluster's kube-system namespace
argocd cluster add staging-cluster \
  --name staging \
  --label env=staging \
  --label region=ap-northeast-2

argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2

argocd cluster add production-use1 \
  --name production-use1 \
  --label env=production \
  --label region=us-east-1

Internally, ArgoCD stores each cluster's information as a Secret resource. To manage it declaratively, create Secrets directly as follows:

apiVersion: v1
kind: Secret
metadata:
  name: production-apne2-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: production
    region: ap-northeast-2
type: Opaque
stringData:
  name: production-apne2
  server: 'https://kubernetes.production-apne2.internal:6443'
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }

Secret Management Strategy

Storing secrets in plaintext in Git exposes them to everyone with repository access, and in a multi-cluster environment one leaked credential can affect every cluster. Combine one of the following tools with ArgoCD:

  • Sealed Secrets: Encrypt with public key, store in Git, decrypt in cluster
  • External Secrets Operator (ESO): Sync secrets from AWS Secrets Manager, HashiCorp Vault, etc.
  • SOPS + age/KMS: File-level encryption

Combining ESO with ArgoCD is a common pattern: Git holds only the ExternalSecret reference, and the operator pulls the actual values from the external store:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_HOST
      remoteRef:
        key: production/database
        property: host
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/database
        property: password
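
The ExternalSecret above refers to a ClusterSecretStore named aws-secrets-manager that the article does not show. A minimal sketch of what that store could look like follows. The region, the ServiceAccount name and namespace, and the use of IAM Roles for Service Accounts for authentication are assumptions for illustration, not details from the original setup.

```yaml
# Hypothetical ClusterSecretStore backing the ExternalSecret above.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager            # must match secretStoreRef.name
spec:
  provider:
    aws:
      service: SecretsManager
      region: ap-northeast-2           # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets     # assumed ServiceAccount bound to an IAM role
            namespace: external-secrets
```

Because the store is cluster-scoped, one definition per cluster can serve ExternalSecrets in every namespace, which keeps per-application manifests free of provider details.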

ApplicationSet Controller and Generator Patterns

ApplicationSet is the core tool for multi-cluster deployment. A single ApplicationSet can automatically deploy the same application across dozens of clusters.

Cluster Generator

Generates Applications based on cluster information registered in ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ['missingkey=error']
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
        values:
          helmReleaseName: kube-prometheus
  template:
    metadata:
      name: 'prometheus-{{.name}}'
    spec:
      project: infrastructure
      source:
        repoURL: 'https://github.com/org/k8s-infrastructure.git'
        targetRevision: main
        path: 'clusters/{{.metadata.labels.region}}/prometheus'
        helm:
          releaseName: '{{.values.helmReleaseName}}'
          valueFiles:
            - 'values.yaml'
            - 'values-{{.metadata.labels.env}}.yaml'
      destination:
        server: '{{.server}}'
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true

This configuration deploys the Prometheus stack to all clusters with the env=production label. When you add a new cluster with the same label, an Application is automatically created.
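
To make the template substitution concrete, here is roughly what the controller would generate for the production-apne2 cluster Secret shown earlier (labels env=production and region=ap-northeast-2). This is an abridged sketch: the real object also carries an owner reference back to the ApplicationSet and other controller-managed metadata.

```yaml
# Sketch of the Application rendered for the production-apne2 cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-production-apne2                  # 'prometheus-{{.name}}'
  namespace: argocd
spec:
  project: infrastructure
  source:
    repoURL: 'https://github.com/org/k8s-infrastructure.git'
    targetRevision: main
    path: 'clusters/ap-northeast-2/prometheus'       # from the region label
    helm:
      releaseName: kube-prometheus                   # from generator values
      valueFiles:
        - 'values.yaml'
        - 'values-production.yaml'                   # from the env label
  destination:
    server: 'https://kubernetes.production-apne2.internal:6443'
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
```

Note that the template reads the region label, so with missingkey=error a production cluster registered without that label makes rendering fail instead of silently producing an empty path segment.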

Matrix Generator - Cluster x Service Combinations

The Matrix Generator is useful when deploying multiple services across multiple clusters:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: production
          - list:
              elements:
                - service: order-service
                  replicas: '3'
                  memory: '512Mi'
                - service: payment-service
                  replicas: '2'
                  memory: '1Gi'
                - service: notification-service
                  replicas: '2'
                  memory: '256Mi'
  template:
    metadata:
      name: '{{.service}}-{{.name}}'
      annotations:
        notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
    spec:
      project: applications
      source:
        repoURL: 'https://github.com/org/k8s-apps.git'
        targetRevision: main
        path: 'apps/{{.service}}'
        helm:
          parameters:
            - name: replicaCount
              value: '{{.replicas}}'
            - name: resources.requests.memory
              value: '{{.memory}}'
      destination:
        server: '{{.server}}'
        namespace: '{{.service}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

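The Matrix Generator creates one Application for every cluster-service pair. Three production clusters and the three services listed above yield 3 x 3 = 9 Applications; with only the two production clusters registered earlier, the same manifest yields 2 x 3 = 6. Adding a cluster or a list element is enough to extend the set, with no per-Application manifests to maintain.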

App of Apps Pattern Design

The App of Apps pattern is a hierarchical approach that creates an Application to manage ArgoCD Applications. Application manifests are stored in a Git repository, and a root Application references them.

Directory Structure

k8s-gitops/
├── root-apps/
│   ├── infrastructure.yaml      # Infrastructure App of Apps
│   ├── platform.yaml            # Platform App of Apps
│   └── applications.yaml        # Business app App of Apps
├── infrastructure/
│   ├── cert-manager.yaml
│   ├── external-secrets.yaml
│   ├── ingress-nginx.yaml
│   └── prometheus-stack.yaml
├── platform/
│   ├── istio.yaml
│   ├── kafka.yaml
│   └── redis.yaml
└── applications/
    ├── order-service.yaml
    ├── payment-service.yaml
    └── notification-service.yaml
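
Each file under infrastructure/, platform/, and applications/ is itself an Application manifest. As an illustration, infrastructure/cert-manager.yaml might look like the sketch below. The chart version is a placeholder, and the project name and target namespace are assumptions rather than values taken from the original repository.

```yaml
# infrastructure/cert-manager.yaml (illustrative child Application)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  finalizers:
    # Delete the deployed resources when this Application is deleted.
    - resources-finalizer.argocd.argoproj.io
spec:
  project: infrastructure              # assumed project
  source:
    repoURL: 'https://charts.jetstack.io'
    chart: cert-manager
    targetRevision: '<chart-version>'  # placeholder: pin an explicit version
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager            # assumed namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```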

Root Application Configuration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-infrastructure
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: infrastructure
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-platform
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-applications
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '0'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
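
The sync-wave annotations on the three root Applications only influence ordering when those Applications are synced together as resources of one parent. A minimal sketch of such a parent follows, assuming the three manifests above live in root-apps/ as in the directory structure. The name bootstrap is hypothetical.

```yaml
# Hypothetical top-level Application that syncs everything in root-apps/.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: root-apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                  # child Applications are created here
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Applying this one manifest by hand is then the only imperative step; everything else is created from Git. Depending on the ArgoCD version, waiting for one root Application to become Healthy before the next wave starts may also require a health check for the Application kind to be configured in argocd-cm, so verify the ordering in your environment.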

Controlling Deployment Order with Sync Wave and Hooks

Sync Wave Basics

Sync Wave defines the deployment order of resources using the argocd.argoproj.io/sync-wave annotation. Resources without the annotation belong to wave 0. ArgoCD processes waves from the lowest number to the highest, applies the resources of one wave together, and waits for them to become healthy before starting the next wave.

# Wave -1: Create Namespace and RBAC first
apiVersion: v1
kind: Namespace
metadata:
  name: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
---
# Wave 0: ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-config
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '0'
data:
  DATABASE_URL: 'postgresql://db.internal:5432/payments'
  KAFKA_BROKERS: 'kafka-0.kafka:9092'
---
# Wave 1: Main Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '1'
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment
          image: registry.internal/payment-service:v2.3.1
          envFrom:
            - configMapRef:
                name: payment-config
---
# Wave 2: Service exposure
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '2'
spec:
  selector:
    app: payment-service
  ports:
    - port: 8080
      targetPort: 8080

Using Sync Hooks

Hooks are resources (typically Jobs) that execute at specific points in the Sync lifecycle:

# PreSync Hook: Database migration before deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-1'
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.internal/payment-service:v2.3.1
          command: ['python', 'manage.py', 'migrate']
          envFrom:
            - secretRef:
                name: database-credentials
      restartPolicy: Never
  backoffLimit: 3
---
# PostSync Hook: Smoke test after deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: registry.internal/smoke-tester:latest
          command: ['./run-tests.sh']
          env:
            - name: TARGET_URL
              value: 'http://payment-service.payment-service:8080'
            - name: TEST_SUITE
              value: 'smoke'
      restartPolicy: Never
  backoffLimit: 1

Hook types and execution timing:

Hook     | Execution Timing                 | Use Cases
---------|----------------------------------|-------------------------------
PreSync  | Before manifest synchronization  | DB migration, backup
Sync     | Along with manifest sync         | Special order deployments
PostSync | After all resources are Healthy  | Smoke tests, notifications
SyncFail | When Sync fails                  | Rollback, alert notifications
Skip     | Excluded from synchronization    | Manually managed resources
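
The examples above cover PreSync and PostSync. For completeness, a SyncFail hook could be sketched as follows. The Job name and the echo command are placeholders standing in for real cleanup or alerting logic.

```yaml
# SyncFail Hook: runs only when the sync operation fails (illustrative)
apiVersion: batch/v1
kind: Job
metadata:
  name: sync-fail-report               # hypothetical name
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: report
          image: busybox:1.36
          # Placeholder: replace with cleanup or an alerting call.
          command: ['sh', '-c', 'echo "payment-service sync failed"']
      restartPolicy: Never
  backoffLimit: 1
```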

ArgoCD vs Flux vs Jenkins GitOps Comparison

Item              | ArgoCD                    | Flux v2                          | Jenkins + GitOps
------------------|---------------------------|----------------------------------|--------------------------
CNCF Status       | Graduated                 | Graduated                        | N/A
UI Dashboard      | Built-in (feature-rich)   | Weave GitOps (separate)          | Built-in to Jenkins
Multi-Cluster     | Hub-Spoke native          | Per-cluster install recommended  | Requires separate config
ApplicationSet    | Native support            | Similar via Kustomization        | Not supported
Helm Support      | Native                    | HelmRelease CRD                  | Plugin
Auto Image Update | Image Updater (separate)  | Native support                   | Pipeline trigger
RBAC              | Project-based granular    | Kubernetes RBAC                  | Jenkins native RBAC
Sync Wave/Hook    | Native support            | Health Check based               | Pipeline stages
Notifications     | Notification Controller   | Alert Provider                   | Plugin
Learning Curve    | Medium                    | Medium to High                   | High (requires Groovy)
Community Size    | Very active (17k+ Stars)  | Active (6k+ Stars)               | Very active

ArgoCD excels in UI intuitiveness, powerful ApplicationSet templating capabilities, and Hub-Spoke multi-cluster support. Flux has strengths in automatic image updates and its Kubernetes-native design philosophy.

Failure Scenarios and Recovery

Scenario 1: Sync Gets Stuck

Symptom: Application is stuck in Syncing state and does not progress

# Check Application status
argocd app get payment-service --show-operation

# Force terminate Sync operation
argocd app terminate-op payment-service

# Retry Sync
argocd app sync payment-service --retry-limit 3 --retry-backoff-duration 10s

Cause: PreSync Hook Job failed, or resource did not reach Healthy state

Scenario 2: Git Repository Inaccessible

Symptom: All Applications show Unknown status

# Check Repo Server logs
kubectl logs -n argocd deployment/argocd-repo-server --tail=100

# Test Git connection
argocd repo list

# Re-register repository
argocd repo add https://github.com/org/k8s-gitops.git \
  --username deploy-bot \
  --password "$GITHUB_TOKEN"

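Recovery: Workloads already running in the clusters are unaffected, because ArgoCD only reconciles and does not remove anything when Git is unreachable. Manifests cached by the Repo Server may keep existing comparisons working for a while, but no new revision can be deployed until the repository is reachable again.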

Scenario 3: Remote Cluster Connection Lost

# Check cluster status
argocd cluster list

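# List all apps that target a specific cluster (by cluster name or URL)
argocd app list --cluster https://kubernetes.production-apne2.internal:6443

# Renew cluster credentials by re-registering the kubeconfig context.
# (argocd cluster set only changes the name, namespaces, labels, and annotations.)
argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2 \
  --upsert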

Scenario 4: Rolling Back a Bad Deployment

# Check Application history
argocd app history payment-service

# Rollback to a specific revision
argocd app rollback payment-service 42

# Or Git revert followed by auto Sync
git revert HEAD
git push origin main

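The recommended rollback method in GitOps is Git revert rather than argocd app rollback. Since Git is the source of truth, rollback records should be preserved in the Git history. argocd app rollback leaves the cluster out of step with Git, and it is rejected while automated sync is enabled on the Application, so auto-sync must be disabled first if you need it in an emergency.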

Operations Automation Best Practices

Notification Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - description: Application sync has failed
      send:
        - slack-deploy-alert
      when: app.status.operationState.phase in ['Error', 'Failed']
  trigger.on-health-degraded: |
    - description: Application health has degraded
      send:
        - slack-deploy-alert
      when: app.status.health.status == 'Degraded'
  template.slack-deploy-alert: |
    message: |
      Application *{{.app.metadata.name}}* sync {{.app.status.operationState.phase}}.
      Revision: {{.app.status.sync.revision}}
      Cluster: {{.app.spec.destination.server}}
    slack:
      attachments: |
        [{
          "color": "#E96D76",
          "title": "{{.app.metadata.name}}",
          "title_link": "https://argocd.internal/applications/{{.app.metadata.name}}",
          "fields": [
            {"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
            {"title": "Health", "value": "{{.app.status.health.status}}", "short": true}
          ]
        }]
  service.slack: |
    token: $slack-token
    username: ArgoCD
    icon: ":argocd:"
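
The $slack-token reference in service.slack is resolved from the argocd-notifications-secret Secret in the same namespace. A minimal sketch follows, with the token value as a placeholder. In practice this Secret would be delivered through one of the secret management tools described earlier rather than committed in plaintext.

```yaml
# Secret read by the Notification Controller; the key name must match $slack-token.
apiVersion: v1
kind: Secret
metadata:
  name: argocd-notifications-secret
  namespace: argocd
type: Opaque
stringData:
  slack-token: '<slack-bot-oauth-token>'   # placeholder, never commit the real value
```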

Project Isolation with RBAC

# policy.csv in argocd-rbac-cm ConfigMap
p, role:platform-team, applications, *, infrastructure/*, allow
p, role:platform-team, clusters, get, *, allow

p, role:dev-team-order, applications, get, applications/order-*, allow
p, role:dev-team-order, applications, sync, applications/order-*, allow
p, role:dev-team-order, applications, action/*, applications/order-*, allow

p, role:dev-team-payment, applications, get, applications/payment-*, allow
p, role:dev-team-payment, applications, sync, applications/payment-*, allow

g, platform-admins, role:platform-team
g, order-team, role:dev-team-order
g, payment-team, role:dev-team-payment
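
The policies above refer to projects named infrastructure and applications, which must exist as AppProject resources. A sketch of the applications project follows. The repository restriction, the namespace pattern, and the cluster-scoped whitelist are assumptions chosen to match the examples in this article, not settings from the original environment.

```yaml
# Hypothetical AppProject for the business applications.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: applications
  namespace: argocd
spec:
  description: Business applications deployed by the microservices ApplicationSet
  sourceRepos:
    - 'https://github.com/org/k8s-apps.git'   # only this repo may be deployed
  destinations:
    - server: '*'                             # any registered cluster
      namespace: '*-service'                  # order-service, payment-service, ...
  clusterResourceWhitelist:
    - group: ''
      kind: Namespace                         # assumed: the only cluster-scoped kind allowed
```

The RBAC policy controls who may act on Applications, while the AppProject limits what those Applications may deploy and where, so the two are usually defined together.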

Conclusion

Multi-cluster GitOps with ArgoCD is not just about installing a tool; it is about changing your organization's deployment culture. Use ApplicationSet to dynamically manage clusters, the App of Apps pattern to create hierarchical structures, and Sync Wave and Hooks to control deployment order. All changes are reviewed through Git PRs and automatically synchronized.

The most important principle is that Git is the source of truth. In emergencies, there may be a temptation to make direct changes with kubectl edit, but with selfHeal enabled ArgoCD reverts them to the state declared in Git. Only changes made through Git are persistent, traceable, and reversible.
