- Introduction
- ArgoCD Architecture and Core Components
- Multi-Cluster Registration and Secret Management
- ApplicationSet Controller and Generator Patterns
- App of Apps Pattern Design
- Controlling Deployment Order with Sync Wave and Hooks
- ArgoCD vs Flux vs Jenkins GitOps Comparison
- Failure Scenarios and Recovery
- Operations Automation Best Practices
- Conclusion
- References

Introduction
As more organizations operate multi-cluster Kubernetes environments, the importance of consistent deployment pipelines continues to grow. The days of manually running kubectl apply on each cluster are over. GitOps is an operational paradigm that uses a Git repository as the Single Source of Truth, declaratively managing infrastructure and application state.
ArgoCD is a CNCF Graduated project and the de facto standard for Kubernetes-native GitOps tools. This article covers everything you need for production operations with ArgoCD-based multi-cluster deployments, from architecture design to ApplicationSet, App of Apps pattern, Sync Wave, security, and disaster recovery.
ArgoCD Architecture and Core Components
Architecture Overview
ArgoCD consists of the following core components:
```
┌─────────────────────────────────────────────────┐
│                  ArgoCD Server                  │
│  ┌───────────┐ ┌────────────┐ ┌──────────────┐  │
│  │ API Server│ │ Repo Server│ │App Controller│  │
│  └───────────┘ └────────────┘ └──────────────┘  │
│  ┌──────────────────┐ ┌─────────────────────┐   │
│  │  ApplicationSet  │ │    Notification     │   │
│  │    Controller    │ │     Controller      │   │
│  └──────────────────┘ └─────────────────────┘   │
└─────────────────────────────────────────────────┘
          │                         │
    ┌─────┴─────┐             ┌─────┴────┐
    │  Git Repo │             │   K8s    │
    │ (Source)  │             │ Clusters │
    └───────────┘             └──────────┘
```
- API Server: Handles UI and CLI requests, authentication/authorization
- Repo Server: Fetches and renders manifests from Git repositories (Helm, Kustomize, Jsonnet, etc.)
- Application Controller: Compares actual cluster state with Git state and synchronizes
- ApplicationSet Controller: Automatically generates multiple Applications based on templates
- Notification Controller: Sends alerts on Sync state changes via Slack, Teams, Webhook, etc.
ArgoCD Deployment Models for Multi-Cluster
In a multi-cluster environment, the first design decision is where and how to deploy ArgoCD itself.
| Model | Structure | Pros | Cons |
|---|---|---|---|
| Hub-Spoke | Single ArgoCD on management cluster, deploys to worker clusters | Centralized management, consistent policies | SPOF, network dependency |
| Standalone | ArgoCD installed on each cluster | Independence, network isolation | Management overhead, configuration drift |
| Hybrid | Hub for common infra, each cluster for app deployments | Balanced approach | Architecture complexity |
Most teams adopt the Hub-Spoke model: ArgoCD runs on a dedicated management cluster, and each remote cluster is registered to it as a deployment target.
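In the Hub-Spoke model, an Application simply points its destination at a registered remote cluster. A minimal sketch (the repository URL and API server address are illustrative placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook-staging
  namespace: argocd          # lives on the hub, deploys elsewhere
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-apps.git
    targetRevision: main
    path: guestbook
  destination:
    server: https://kubernetes.staging.internal:6443   # registered spoke cluster
    namespace: guestbook
```

The only difference from a single-cluster setup is the destination.server field, which must match the API server URL of a cluster registered with ArgoCD.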
Multi-Cluster Registration and Secret Management
Cluster Registration
Register remote clusters using the ArgoCD CLI:
```bash
# Check kubeconfig contexts
kubectl config get-contexts

# Add cluster (ArgoCD auto-creates a ServiceAccount on the target)
argocd cluster add staging-cluster \
  --name staging \
  --label env=staging \
  --label region=ap-northeast-2

argocd cluster add production-apne2 \
  --name production-apne2 \
  --label env=production \
  --label region=ap-northeast-2

argocd cluster add production-use1 \
  --name production-use1 \
  --label env=production \
  --label region=us-east-1
```
Internally, ArgoCD stores each cluster's information as a Secret resource. To manage it declaratively, create Secrets directly as follows:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: production-apne2-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: production
    region: ap-northeast-2
type: Opaque
stringData:
  name: production-apne2
  server: 'https://kubernetes.production-apne2.internal:6443'
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }
```
Secret Management Strategy
Storing secrets in plaintext in Git is an incident waiting to happen, especially when that repository drives every cluster. Combine ArgoCD with one of the following tools:
- Sealed Secrets: Encrypt with public key, store in Git, decrypt in cluster
- External Secrets Operator (ESO): Sync secrets from AWS Secrets Manager, HashiCorp Vault, etc.
- SOPS + age/KMS: File-level encryption
Combining ESO with ArgoCD is the most widely used pattern: manifests in Git reference only secret names, while the actual values stay in the external store:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_HOST
      remoteRef:
        key: production/database
        property: host
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/database
        property: password
```
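The ExternalSecret above references a ClusterSecretStore named aws-secrets-manager, which must exist separately. A minimal sketch of such a store; the region, ServiceAccount name, and namespace are illustrative assumptions:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: ap-northeast-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa       # ServiceAccount with IRSA annotation
            namespace: external-secrets
```

Because it is cluster-scoped, a single store can serve ExternalSecrets in every namespace, which keeps per-team manifests free of provider credentials.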
ApplicationSet Controller and Generator Patterns
ApplicationSet is the core tool for multi-cluster deployment. A single ApplicationSet can automatically deploy the same application across dozens of clusters.
Cluster Generator
Generates Applications based on cluster information registered in ArgoCD:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ['missingkey=error']
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
        values:
          helmReleaseName: kube-prometheus
  template:
    metadata:
      name: 'prometheus-{{.name}}'
    spec:
      project: infrastructure
      source:
        repoURL: 'https://github.com/org/k8s-infrastructure.git'
        targetRevision: main
        path: 'clusters/{{.metadata.labels.region}}/prometheus'
        helm:
          releaseName: '{{.values.helmReleaseName}}'
          valueFiles:
            - 'values.yaml'
            - 'values-{{.metadata.labels.env}}.yaml'
      destination:
        server: '{{.server}}'
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true
```
This configuration deploys the Prometheus stack to all clusters with the env=production label. When you add a new cluster with the same label, an Application is automatically created.
Matrix Generator - Cluster x Service Combinations
The Matrix Generator is useful when deploying multiple services across multiple clusters:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  env: production
          - list:
              elements:
                - service: order-service
                  replicas: '3'
                  memory: '512Mi'
                - service: payment-service
                  replicas: '2'
                  memory: '1Gi'
                - service: notification-service
                  replicas: '2'
                  memory: '256Mi'
  template:
    metadata:
      name: '{{.service}}-{{.name}}'
      annotations:
        notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
    spec:
      project: applications
      source:
        repoURL: 'https://github.com/org/k8s-apps.git'
        targetRevision: main
        path: 'apps/{{.service}}'
        helm:
          parameters:
            - name: replicaCount
              value: '{{.replicas}}'
            - name: resources.requests.memory
              value: '{{.memory}}'
      destination:
        server: '{{.server}}'
        namespace: '{{.service}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```
The Matrix Generator takes the cartesian product of its child generators' parameter sets: with the two production clusters registered earlier, it automatically creates 2 clusters x 3 services = 6 Applications. This templated fan-out is the most powerful aspect of ApplicationSet.
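The fan-out is easy to reason about as a cartesian product. A short sketch using the two production clusters registered earlier and the three services from the list generator (illustrative only, not ArgoCD's actual rendering code):

```python
from itertools import product

# Clusters matching the env=production selector (as registered earlier)
clusters = [{"name": "production-apne2"}, {"name": "production-use1"}]

# Elements of the list generator
services = [
    {"service": "order-service", "replicas": "3", "memory": "512Mi"},
    {"service": "payment-service", "replicas": "2", "memory": "1Gi"},
    {"service": "notification-service", "replicas": "2", "memory": "256Mi"},
]

# Matrix generator: cartesian product of both parameter sets,
# rendered through the template's name field '{{.service}}-{{.name}}'
apps = [f"{svc['service']}-{c['name']}" for c, svc in product(clusters, services)]

print(len(apps))   # 6
print(apps[0])     # order-service-production-apne2
```

Adding a third production cluster would grow the product to 9 Applications with no change to the ApplicationSet itself.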
App of Apps Pattern Design
The App of Apps pattern is a hierarchical approach that creates an Application to manage ArgoCD Applications. Application manifests are stored in a Git repository, and a root Application references them.
Directory Structure
```
k8s-gitops/
├── root-apps/
│   ├── infrastructure.yaml   # Infrastructure App of Apps
│   ├── platform.yaml         # Platform App of Apps
│   └── applications.yaml     # Business app App of Apps
├── infrastructure/
│   ├── cert-manager.yaml
│   ├── external-secrets.yaml
│   ├── ingress-nginx.yaml
│   └── prometheus-stack.yaml
├── platform/
│   ├── istio.yaml
│   ├── kafka.yaml
│   └── redis.yaml
└── applications/
    ├── order-service.yaml
    ├── payment-service.yaml
    └── notification-service.yaml
```
Root Application Configuration
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-infrastructure
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-2'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: infrastructure
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-platform
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: platform
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-applications
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: '0'
spec:
  project: default
  source:
    repoURL: 'https://github.com/org/k8s-gitops.git'
    targetRevision: main
    path: applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Controlling Deployment Order with Sync Wave and Hooks
Sync Wave Basics
Sync Wave defines the deployment order of resources using the argocd.argoproj.io/sync-wave annotation. Resources are deployed starting from the lowest number, and resources within the same Wave are processed in parallel.
```yaml
# Wave -1: Create Namespace and RBAC first
apiVersion: v1
kind: Namespace
metadata:
  name: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '-1'
---
# Wave 0: ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-config
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '0'
data:
  DATABASE_URL: 'postgresql://db.internal:5432/payments'
  KAFKA_BROKERS: 'kafka-0.kafka:9092'
---
# Wave 1: Main Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '1'
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment
          image: registry.internal/payment-service:v2.3.1
          envFrom:
            - configMapRef:
                name: payment-config
---
# Wave 2: Service exposure
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payment-service
  annotations:
    argocd.argoproj.io/sync-wave: '2'
spec:
  selector:
    app: payment-service
  ports:
    - port: 8080
      targetPort: 8080
```
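The ordering behavior above boils down to sorting resources by their wave number, with a default of 0 when the annotation is absent. A simplified sketch (not ArgoCD's actual implementation):

```python
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"

def sync_wave(resource: dict) -> int:
    """Read the sync-wave annotation; resources without one default to wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))

# The four resources from the manifest above, reduced to kind + annotations
resources = [
    {"kind": "Service", "metadata": {"annotations": {WAVE_ANNOTATION: "2"}}},
    {"kind": "Namespace", "metadata": {"annotations": {WAVE_ANNOTATION: "-1"}}},
    {"kind": "Deployment", "metadata": {"annotations": {WAVE_ANNOTATION: "1"}}},
    {"kind": "ConfigMap", "metadata": {}},  # no annotation -> wave 0
]

# Lowest wave first; resources sharing a wave are applied in parallel
apply_order = [r["kind"] for r in sorted(resources, key=sync_wave)]
print(apply_order)  # ['Namespace', 'ConfigMap', 'Deployment', 'Service']
```

Note that ArgoCD additionally waits for every resource in a wave to become Healthy before starting the next wave, which plain sorting does not capture.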
Using Sync Hooks
Hooks are resources (typically Jobs) that execute at specific points in the Sync lifecycle:
```yaml
# PreSync Hook: Database migration before deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-1'
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.internal/payment-service:v2.3.1
          command: ['python', 'manage.py', 'migrate']
          envFrom:
            - secretRef:
                name: database-credentials
      restartPolicy: Never
  backoffLimit: 3
---
# PostSync Hook: Smoke test after deployment
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: payment-service
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: smoke-test
          image: registry.internal/smoke-tester:latest
          command: ['./run-tests.sh']
          env:
            - name: TARGET_URL
              value: 'http://payment-service.payment-service:8080'
            - name: TEST_SUITE
              value: 'smoke'
      restartPolicy: Never
  backoffLimit: 1
```
Hook types and execution timing:
| Hook | Execution Timing | Use Cases |
|---|---|---|
| PreSync | Before manifest synchronization | DB migration, backup |
| Sync | Along with manifest sync | Special order deployments |
| PostSync | After all resources are Healthy | Smoke tests, notifications |
| SyncFail | When Sync fails | Rollback, alert notifications |
| Skip | Excluded from synchronization | Manually managed resources |
ArgoCD vs Flux vs Jenkins GitOps Comparison
| Item | ArgoCD | Flux v2 | Jenkins + GitOps |
|---|---|---|---|
| CNCF Status | Graduated | Graduated | N/A |
| UI Dashboard | Built-in (feature-rich) | Weave GitOps (separate) | Built-in to Jenkins |
| Multi-Cluster | Hub-Spoke native | Per-cluster install recommended | Requires separate config |
| ApplicationSet | Native support | Similar via Kustomization | Not supported |
| Helm Support | Native | HelmRelease CRD | Plugin |
| Auto Image Update | Image Updater (separate) | Native support | Pipeline trigger |
| RBAC | Project-based granular | Kubernetes RBAC | Jenkins native RBAC |
| Sync Wave/Hook | Native support | Health Check based | Pipeline stages |
| Notifications | Notification Controller | Alert Provider | Plugin |
| Learning Curve | Medium | Medium to High | High (requires Groovy) |
| Community Size | Very active (17k+ Stars) | Active (6k+ Stars) | Very active |
ArgoCD excels in UI intuitiveness, powerful ApplicationSet templating capabilities, and Hub-Spoke multi-cluster support. Flux has strengths in automatic image updates and its Kubernetes-native design philosophy.
Failure Scenarios and Recovery
Scenario 1: Sync Gets Stuck
Symptom: Application is stuck in Syncing state and does not progress
```bash
# Check Application status
argocd app get payment-service --show-operation

# Force-terminate the running sync operation
argocd app terminate-op payment-service

# Retry the sync
argocd app sync payment-service --retry-limit 3 --retry-backoff-duration 10s
```
Common causes: a PreSync Hook Job that fails or never completes, or a resource that never reaches a Healthy state.
Scenario 2: Git Repository Inaccessible
Symptom: All Applications show Unknown status
```bash
# Check Repo Server logs
kubectl logs -n argocd deployment/argocd-repo-server --tail=100

# Check registered repositories and their connection state
argocd repo list

# Re-register the repository
argocd repo add https://github.com/org/k8s-gitops.git \
  --username deploy-bot \
  --password "$GITHUB_TOKEN"
```
Recovery: ArgoCD caches the most recently rendered manifests, so a temporary Git outage leaves the existing deployed state intact; only new deployments are blocked until connectivity is restored.
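To avoid re-registering repositories by hand after a restore, repository credentials can also be managed declaratively as a labeled Secret, the same mechanism used for clusters. A sketch; the Secret name and token placeholder are illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: k8s-gitops-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository   # ArgoCD picks this up automatically
stringData:
  type: git
  url: https://github.com/org/k8s-gitops.git
  username: deploy-bot
  password: <github-token>
```

Pair this with Sealed Secrets or ESO so the token itself never lands in Git in plaintext.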
Scenario 3: Remote Cluster Connection Lost
```bash
# Check cluster status
argocd cluster list

# List all apps targeting a specific cluster
argocd app list --dest-server https://kubernetes.production-apne2.internal:6443

# Renew cluster credentials
argocd cluster set production-apne2 \
  --server https://kubernetes.production-apne2.internal:6443 \
  --bearer-token "new-token-here"
```
Scenario 4: Rolling Back a Bad Deployment
```bash
# Check Application history
argocd app history payment-service

# Roll back to a specific revision
argocd app rollback payment-service 42

# Or: Git revert followed by auto sync
git revert HEAD
git push origin main
```
The recommended rollback path in GitOps is a Git revert rather than argocd app rollback; in fact, ArgoCD refuses to roll back an Application while automated sync is enabled. Since Git is the source of truth, rollbacks should be recorded in Git history like any other change.
Operations Automation Best Practices
Notification Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-sync-failed: |
    - description: Application sync has failed
      send:
        - slack-deploy-alert
      when: app.status.operationState.phase in ['Error', 'Failed']
  trigger.on-health-degraded: |
    - description: Application health has degraded
      send:
        - slack-deploy-alert
      when: app.status.health.status == 'Degraded'
  template.slack-deploy-alert: |
    message: |
      Application *{{.app.metadata.name}}* sync {{.app.status.operationState.phase}}.
      Revision: {{.app.status.sync.revision}}
      Cluster: {{.app.spec.destination.server}}
    slack:
      attachments: |
        [{
          "color": "#E96D76",
          "title": "{{.app.metadata.name}}",
          "title_link": "https://argocd.internal/applications/{{.app.metadata.name}}",
          "fields": [
            {"title": "Sync Status", "value": "{{.app.status.sync.status}}", "short": true},
            {"title": "Health", "value": "{{.app.status.health.status}}", "short": true}
          ]
        }]
  service.slack: |
    token: $slack-token
    username: ArgoCD
    icon: ":argocd:"
```
Project Isolation with RBAC
```
# policy.csv in the argocd-rbac-cm ConfigMap
p, role:platform-team, applications, *, infrastructure/*, allow
p, role:platform-team, clusters, get, *, allow

p, role:dev-team-order, applications, get, applications/order-*, allow
p, role:dev-team-order, applications, sync, applications/order-*, allow
p, role:dev-team-order, applications, action/*, applications/order-*, allow

p, role:dev-team-payment, applications, get, applications/payment-*, allow
p, role:dev-team-payment, applications, sync, applications/payment-*, allow

g, platform-admins, role:platform-team
g, order-team, role:dev-team-order
g, payment-team, role:dev-team-payment
```
Conclusion
Multi-cluster GitOps with ArgoCD is not just a tool installation; it is a change in your organization's deployment culture. ApplicationSet manages clusters dynamically, the App of Apps pattern builds hierarchical structures, and Sync Waves and Hooks control deployment order. Every change is reviewed through a Git PR and synchronized automatically.
The most important principle is that Git is the source of truth. In an emergency it is tempting to patch things directly with kubectl edit, but ArgoCD's selfHeal feature will promptly revert such changes. Only changes made through Git are persistent, traceable, and reversible.
References
- ArgoCD Official Documentation - Declarative Setup
- ArgoCD ApplicationSet - Generators
- ArgoCD Sync Waves and Hooks
- Codefresh - ArgoCD ApplicationSet Multi-Cluster Deployment
- Red Hat - How to Automate Multi-Cluster Deployments Using Argo CD
- ArgoCD Notifications Documentation
- DigitalOcean - Manage Multi-Cluster Deployments with ArgoCD