ArgoCD Sync Engine Analysis: Everything About Synchronization Mechanisms

1. Sync Engine Overview
- Sync State Machine
2. Sync Phases
3. Resource Hooks in Detail
4. Sync Waves and Ordering
5. Resource Tracking
6. Diff Engine Detailed Analysis
7. Health Assessment
8. Pruning in Detail
9. Retry Strategy and Backoff
10. Sync Options
11. Sync Windows
12. Practical Sync Strategies
- Safe Production Deployment Strategy
- A Complete Deployment Pipeline with Hooks
13. Summary

1. Sync Engine Overview

ArgoCD's Sync engine is the core module that applies the desired state defined in a Git repository to a Kubernetes cluster. It goes beyond a plain kubectl apply by adding Hooks, Waves, Health Checks, and Retry logic.

Sync State Machine

Pending --> Running --> Succeeded
                |
                +--> Failed --> (Retry or Manual)

A sync operation takes one of the following states:

State	Description
Pending	Sync is queued
Running	Sync is executing
Succeeded	All resources were synchronized successfully
Failed	An error occurred during sync

2. Sync Phases

ArgoCD splits a sync into several phases. Each phase processes the hooks and resources assigned to it.

Phase Execution Order

PreSync --> Sync --> PostSync
              |
              +--> SyncFail (only when Sync fails)

PreSync Phase

PreSync runs before the main synchronization:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migration:latest
          command: ['./migrate', 'up']
      restartPolicy: Never

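PreSync use cases:

Database schema migrations
Configuration pre-validation
External system health checks
Backup creation

Sync Phase

The main sync phase applies the actual Kubernetes resources to the cluster:

1. Sort all Sync Phase resources by Wave order
2. For each Wave group:
   a. Apply resources in resource type order
   b. Perform the equivalent of kubectl apply for each resource
   c. Wait until every resource in the Wave becomes Healthy
3. Proceed to the next Wave

PostSync Phase

PostSync runs only after the main synchronization has succeeded: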

apiVersion: batch/v1
kind: Job
metadata:
  name: notification
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment successful"}'
      restartPolicy: Never

PostSync use cases:

Deployment completion notifications
Smoke test execution
CDN cache invalidation
External system sync triggers

SyncFail Phase

SyncFail runs only when the Sync fails:

apiVersion: batch/v1
kind: Job
metadata:
  name: failure-notification
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify-failure
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment FAILED"}'
      restartPolicy: Never

3. Resource Hooks in Detail

Hook Annotations

metadata:
  annotations:
    # Hook type (pick one of the listed values)
    argocd.argoproj.io/hook: PreSync|Sync|PostSync|SyncFail|Skip
    # Deletion policy for the hook resource (pick from the listed values)
    argocd.argoproj.io/hook-delete-policy: HookSucceeded|HookFailed|BeforeHookCreation

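Hook Delete Policy

Policy	Description
HookSucceeded	Delete the resource when the hook succeeds
HookFailed	Delete the resource when the hook fails
BeforeHookCreation	Delete the existing resource before the hook is created on the next sync

BeforeHookCreation is the default and the most commonly used policy. On the next sync it deletes the previous hook resource first and then creates a new one.

Hook Execution Mechanism

1. The Application Controller starts the Sync
2. Identify the hook resources that belong to the current Phase
3. Sort the hook resources by Wave order
4. Apply each hook resource to the cluster (Job, Pod, etc.)
5. Wait until the hook completes (success or failure)
6. On success, proceed to the next step; on failure, abort the Sync or transition to the SyncFail Phase
7. Clean up hook resources according to the Delete Policy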

4. Sync Waves and Ordering

Sync Wave Concept

A Sync Wave is a mechanism for fine-grained control over the order in which resources are applied:

# Wave -1: Infrastructure resources (created first)
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '-1'

---
# Wave 0: Configuration resources (default)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: '0'

---
# Wave 1: Application resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '1'

---
# Wave 2: External access resources
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    argocd.argoproj.io/sync-wave: '2'

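Wave Execution Logic

1. Group all resources by Wave number
2. Execute Waves in ascending order, starting from the lowest
3. Within each Wave, apply the default resource type order
4. Wait until every resource in the current Wave is Healthy
5. Proceed to the next Wave
6. If any Wave fails, abort the entire Sync

Default Resource Type Order

Within the same Wave, resources are applied roughly in the following order (the authoritative list lives in the ArgoCD source and can differ between versions):

Phase 1: Namespaces and base configuration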
  1. Namespace
  2. NetworkPolicy
  3. ResourceQuota
  4. LimitRange
  5. PodSecurityPolicy
  6. ServiceAccount
  7. Secret
  8. SecretList
  9. ConfigMap

Phase 2: RBAC
  10. ClusterRole
  11. ClusterRoleBinding
  12. Role
  13. RoleBinding

Phase 3: CRD
  14. CustomResourceDefinition

Phase 4: Storage and volumes
  15. PersistentVolume
  16. PersistentVolumeClaim
  17. StorageClass

Phase 5: Services
  18. Service
  19. Endpoints

Phase 6: Workloads
  20. DaemonSet
  21. Deployment
  22. ReplicaSet
  23. StatefulSet
  24. Job
  25. CronJob

Phase 7: Routing
  26. Ingress
  27. IngressClass
  28. APIService

5. Resource Tracking

Tracking Methods

ArgoCD tracks the resources it manages with an annotation, a label, or both:

Annotation method (recommended; the default since ArgoCD 3.0):

metadata:
  annotations:
    argocd.argoproj.io/tracking-id: 'my-app:apps/Deployment:default/nginx'

Label method (legacy):

metadata:
  labels:
    app.kubernetes.io/instance: my-app

Tracking ID Structure

APP_NAME:GROUP/KIND:NAMESPACE/NAME

Examples:
  my-app:apps/Deployment:default/nginx
  my-app:/Service:default/nginx-svc
  my-app:networking.k8s.io/Ingress:default/nginx-ingress

Tracking Method Configuration

Set it in the argocd-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.resourceTrackingMethod: annotation # annotation | label | annotation+label

6. Diff Engine Detailed Analysis

3-Way Diff

ArgoCD compares three states:

1. Desired State: Manifests generated from Git (target state)
2. Live State: Actual state running in the cluster
3. Last Applied: The last applied configuration (recorded in the kubectl.kubernetes.io/last-applied-configuration annotation)

Structured Merge Diff

ArgoCD uses the Structured Merge Diff library that Kubernetes Server-Side Apply is built on:

// Diff logic (simplified)
func diff(desired, live *unstructured.Unstructured) (*DiffResult, error) {
    // 1. Normalize
    normalizedDesired := normalize(desired)
    normalizedLive := normalize(live)

    // 2. Remove ignored fields
    removeIgnoredFields(normalizedDesired)
    removeIgnoredFields(normalizedLive)

    // 3. Structural comparison
    result := structuredMergeDiff(normalizedDesired, normalizedLive)

    return result, nil
}

Normalization Details

Normalization is the key step that removes spurious diffs:

Fields removed:

metadata.resourceVersion
metadata.uid
metadata.generation
metadata.creationTimestamp
metadata.managedFields
status (for most resources)

Normalization rule examples:

When a container omits imagePullPolicy and the image tag is latest, Kubernetes sets Always automatically -> ignored in the diff
When a Service omits clusterIP, Kubernetes assigns one automatically -> ignored in the diff
Empty fields ("", [], null) are treated the same as unset fields

Interpreting Diff Results

Result	Meaning
NoDiff	Desired and live state are identical (Synced)
Diff	A difference exists (OutOfSync)
Modified	A field value changed
Added	A new field was added
Removed	A field was removed

Diff Customization

You can configure specific fields to be ignored in the diff:

# Configured on the Application resource
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # Ignore the replica count managed by HPA
    - group: ''
      kind: ConfigMap
      jqPathExpressions:
        - '.data["generated-field"]' # Ignore an auto-generated field

Global settings go in the argocd-cm ConfigMap:

data:
  resource.customizations.ignoreDifferences.all: |
    managedFieldsManagers:
      - kube-controller-manager
      - kube-scheduler

7. Health Assessment

Built-in Health Checks

ArgoCD ships built-in health checks for the main Kubernetes resources:

Deployment:

Healthy: All replicas are Ready and the update is complete
Progressing: A rollout is in progress (a new ReplicaSet is being created)
Degraded: Replicas failed to reach the Ready state

StatefulSet:

Healthy: All replicas are Ready and currentRevision == updateRevision
Progressing: An update is in progress
Degraded: Replicas are not Ready

Pod:

Healthy: Running with all containers Ready
Progressing: Pending or ContainerCreating
Degraded: CrashLoopBackOff, ImagePullBackOff, etc.

Service:

Healthy: Not of type LoadBalancer, or a LoadBalancer whose external address has been assigned
Progressing: LoadBalancer type waiting for an external IP

Ingress:

Healthy: A LoadBalancer address has been assigned
Progressing: Waiting for address assignment

Job:

Healthy: Completed successfully (Completed)
Progressing: Running (Active)
Degraded: Failed

Custom Lua Health Checks

When the built-in health checks are not enough, write custom logic as a Lua script:

# argocd-cm ConfigMap
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for _, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" then
            if condition.status == "True" then
              hs.status = "Healthy"
              hs.message = "Certificate is ready"
            else
              hs.status = "Degraded"
              hs.message = condition.message
            end
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs

Example CRDs with Health Checks

CRD	Project	Health Check Criterion
Certificate	cert-manager	Ready condition
VirtualService	Istio	Always Healthy (no status)
Rollout	Argo Rollouts	Based on the phase field
HelmRelease	Flux	Ready condition
Kustomization	Flux	Ready condition

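8. Pruning in Detail

How Prune Works

1. Build the list of all resources from the Git manifests
2. Query the resources ArgoCD manages in the cluster (tracking label/annotation)
3. Identify resources that exist in the cluster but not in Git (= Prune targets)
4. Delete the resources according to the deletion policy

Deletion Strategies

Pruned resources are deleted with the propagation policy selected by the PrunePropagationPolicy sync option:

Foreground deletion (default):

- Owned resources are deleted first, then the parent resource
- Deleting a Deployment removes its ReplicaSets and Pods before the Deployment itself disappears
- Ordering is enforced by the foregroundDeletion finalizer

Background deletion:

- The parent resource is deleted immediately
- The Kubernetes garbage collector removes owned resources afterwards

Orphan deletion:

- Only the parent resource is deleted; owned resources are left in place

Prune Protection

Protection mechanisms that prevent unintended deletion:

# Add the Prune prevention annotation to a resource
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false

# Disable Prune at the Application level
spec:
  syncPolicy:
    automated:
      prune: false # Disable automatic Prune

Orphaned Resource Monitoring

ArgoCD can also detect "orphaned" resources that it does not manage:

# Enable orphaned resource monitoring in the AppProject
spec:
  orphanedResources:
    warn: true # Show a warning only
    # ignore:   # Resource patterns to ignore
    #   - group: ""
    #     kind: ConfigMap
    #     name: "auto-*"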
9. Retry Strategy and Backoff

Auto-Sync Retry

A failed Sync can be retried automatically:

spec:
  syncPolicy:
    automated:
      selfHeal: true
    retry:
      limit: 5 # Maximum number of retries
      backoff:
        duration: 5s # Initial wait time
        factor: 2 # Multiplier applied to the wait time
        maxDuration: 3m # Maximum wait time

Backoff Calculation Example

Retry 1: after 5s
Retry 2: after 10s (5s * 2)
Retry 3: after 20s (10s * 2)
Retry 4: after 40s (20s * 2)
Retry 5: after 80s (40s * 2), still below the 3m cap

Retry Trigger Conditions

A retry is triggered in the following situations:

Resource apply failure (API server error, validation failure, etc.)
Health check timeout (a resource does not become Healthy in time)
Hook execution failure (a PreSync/PostSync Job fails)
Transient network errors

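10. Sync Options

Application-Level Sync Options

spec:
  syncPolicy:
    automated:
      prune: true # Automatically prune resources deleted from Git
      selfHeal: true # Automatically revert manual changes
      allowEmpty: false # Whether to allow an empty manifest set
    syncOptions:
      - CreateNamespace=true # Create the namespace automatically
      - PrunePropagationPolicy=foreground # Deletion propagation policy
      - PruneLast=true # Prune after the other resources are synced
      - Replace=false # Whether to use replace instead of apply
      - ServerSideApply=true # Use Server-Side Apply
      - ApplyOutOfSyncOnly=true # Apply only OutOfSync resources
      - Validate=true # Validate manifests
      - RespectIgnoreDifferences=true # Honor ignoreDifferences settings

Resource-Level Sync Options

Options can also be set on individual resources with an annotation:

metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false,Replace=true

Server-Side Apply

Server-Side Apply has been generally available since Kubernetes 1.22, and ArgoCD supports it as a sync option from v2.5:

Benefits:
  - Field Ownership tracking
  - Conflict detection between multiple controllers
  - Merging is performed by the API server instead of the client
  - Large resources no longer hit the size limit of the last-applied-configuration annotation

Configuration:
  syncOptions:
    - ServerSideApply=true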

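11. Sync Windows

Sync Window Concept

An AppProject can define time windows that allow or deny synchronization:

spec:
  syncWindows:
    # Allow Sync only during weekday business hours
    - kind: allow
      schedule: '0 9 * * 1-5' # Mon-Fri 09:00
      duration: 9h # For 9 hours
      applications:
        - '*'
      namespaces:
        - 'production'
    # Deny Sync on weekends
    - kind: deny
      schedule: '0 0 * * 0,6' # Sat, Sun 00:00
      duration: 24h
      applications:
        - '*'
    # Window in which only manual Sync is allowed
    - kind: deny
      schedule: '0 18 * * 1-5' # Mon-Fri 18:00
      duration: 15h
      manualSync: true
      applications:
        - 'critical-*'

Window Types

Type	Description
allow	Sync is allowed only during the specified time
deny	Sync is denied during the specified time

Priority Rules

1. An active deny window takes priority over any allow window
2. When an application matches allow windows, syncs are denied whenever none of them is active
3. manualSync=true lets manual Syncs through a window that would otherwise block them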

12. Practical Sync Strategies

Safe Production Deployment Strategy

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 5m
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # Managed by HPA
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jqPathExpressions:
        - .status

A Complete Deployment Pipeline with Hooks

# Wave -2: PreSync - DB backup
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-2'
spec:
  template:
    spec:
      containers:
        - name: backup
          image: backup-tool:latest
          command: ['./backup.sh']
      restartPolicy: Never

---
# Wave -1: PreSync - DB migration
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '-1'
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migration:v2
          command: ['./migrate', 'up']
      restartPolicy: Never

---
# Wave 0: Sync - main application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  annotations:
    argocd.argoproj.io/sync-wave: '0'
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v2

---
# Wave 1: PostSync - smoke test
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: '1'
spec:
  template:
    spec:
      containers:
        - name: test
          image: curlimages/curl:latest
          command: ['curl', '-f', 'http://myapp:8080/health']
      restartPolicy: Never

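13. Summary

The key elements of the ArgoCD Sync engine are:

Phase: Structures deployment into PreSync, Sync, PostSync, and SyncFail stages
Hook: Runs custom tasks in each Phase as Jobs or Pods
Wave: Fine-grained control over the order in which resources are applied
Diff Engine: Normalized state comparison based on a 3-way merge
Health Check: Built-in checks plus custom Lua scripts for assessing resource health
Pruning: Safe cleanup of resources removed from Git
Retry: Automatic retry with exponential backoff
Sync Window: Time-based synchronization control

Combining these mechanisms appropriately lets you build a safe and predictable GitOps deployment pipeline.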

[ArgoCD] Sync Engine Analysis: Everything About Synchronization Mechanisms

1. Sync Engine Overview
- Sync State Machine
2. Sync Phases
3. Resource Hooks in Detail
4. Sync Waves and Ordering
5. Resource Tracking
6. Diff Engine Detailed Analysis
7. Health Assessment
- Built-in Health Checks
- Custom Lua Health Checks
8. Pruning Details
9. Retry Strategy and Backoff
- Auto-Sync Retry
- Backoff Calculation Example
10. Sync Options
- Application-Level Sync Options
- Server-Side Apply
11. Sync Windows
- Sync Window Concept
- Priority Rules
12. Production Sync Strategy
- Safe Production Deployment
13. Summary

1. Sync Engine Overview

The ArgoCD Sync engine is the core module that applies the desired state defined in Git repositories to Kubernetes clusters. Beyond simple kubectl apply, it provides sophisticated mechanisms including Hooks, Waves, Health Checks, and Retry logic.

Sync State Machine

Pending --> Running --> Succeeded
                |
                +--> Failed --> (Retry or Manual)

State	Description
Pending	Sync is queued
Running	Sync is executing
Succeeded	All resources synchronized successfully
Failed	Error occurred during sync

2. Sync Phases

ArgoCD divides a sync into multiple phases, each processing the hooks and resources assigned to it.

Phase Execution Order

PreSync --> Sync --> PostSync
              |
              +--> SyncFail (only on Sync failure)
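
The phase order in the diagram can be sketched as a small driver loop. This is an illustrative model only, not ArgoCD's implementation: `run_phases` and the `handlers` mapping are hypothetical names, and the sketch assumes that a failure in any phase counts as a sync failure that triggers SyncFail.

```python
from typing import Callable, Dict, List

# Phases executed in order while everything succeeds.
PHASE_ORDER = ["PreSync", "Sync", "PostSync"]


def run_phases(handlers: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each phase handler in order and return the phases that executed.

    A handler returns True on success. Phases without a handler are treated
    as trivially successful. On the first failure, the SyncFail handler runs
    and the remaining phases are skipped.
    """
    executed: List[str] = []
    for phase in PHASE_ORDER:
        executed.append(phase)
        succeeded = handlers.get(phase, lambda: True)()
        if not succeeded:
            executed.append("SyncFail")
            handlers.get("SyncFail", lambda: True)()
            break
    return executed
```

For example, passing a `Sync` handler that returns `False` yields the PreSync, Sync, SyncFail sequence, and PostSync never runs.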

PreSync Phase

PreSync runs before the main synchronization:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migration:latest
          command: ['./migrate', 'up']
      restartPolicy: Never

PreSync use cases:

Database schema migrations
Configuration pre-validation
External system health checks
Backup creation

Sync Phase

The main sync phase applies actual Kubernetes resources to the cluster:

1. Sort all Sync Phase resources by Wave order
2. For each Wave group:
   a. Apply in resource type order
   b. Perform kubectl apply equivalent for each resource
   c. Wait until all resources in the Wave become Healthy
3. Proceed to next Wave

PostSync Phase

PostSync runs only after successful main synchronization:

apiVersion: batch/v1
kind: Job
metadata:
  name: notification
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment successful"}'
      restartPolicy: Never

PostSync use cases:

Deployment completion notifications
Smoke test execution
CDN cache invalidation
External system sync triggers

SyncFail Phase

SyncFail runs only when the Sync fails:

apiVersion: batch/v1
kind: Job
metadata:
  name: failure-notification
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify-failure
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment FAILED"}'
      restartPolicy: Never

3. Resource Hooks in Detail

Hook Annotations

metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync|Sync|PostSync|SyncFail|Skip
    argocd.argoproj.io/hook-delete-policy: HookSucceeded|HookFailed|BeforeHookCreation

Hook Delete Policy

Policy	Description
HookSucceeded	Delete resource when hook succeeds
HookFailed	Delete resource when hook fails
BeforeHookCreation	Delete existing resource before creating hook on next sync

BeforeHookCreation is the default and most commonly used. It deletes previous hook resources before creating new ones on the next sync.

Hook Execution Mechanism

1. Application Controller starts Sync
2. Identify hook resources for the current Phase
3. Sort hook resources by Wave order
4. Apply each hook resource to cluster (Job, Pod, etc.)
5. Wait for hook completion (success/failure)
6. On success: proceed to next step; on failure: abort Sync or transition to SyncFail
7. Clean up hook resources per Delete Policy
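
The cleanup decision in step 7 can be modeled as a simple lookup. This is a minimal sketch, assuming the three policies from the table above and the BeforeHookCreation default; the function name and the `moment` labels are hypothetical and do not come from ArgoCD.

```python
from typing import Iterable

# Maps a point in the hook lifecycle (hypothetical labels) to the policy
# that requests deletion at that point.
POLICY_FOR_MOMENT = {
    "before_creation": "BeforeHookCreation",
    "succeeded": "HookSucceeded",
    "failed": "HookFailed",
}


def should_delete_hook(policies: Iterable[str], moment: str) -> bool:
    """Return True if the hook resource should be deleted at this moment.

    `policies` holds the values of the hook-delete-policy annotation.
    When none are given, BeforeHookCreation is assumed as the default.
    """
    effective = set(policies) or {"BeforeHookCreation"}
    return POLICY_FOR_MOMENT[moment] in effective
```

With no annotation, the previous hook is removed only right before the next one is created, which is why a finished hook Job stays visible in the cluster between syncs.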

4. Sync Waves and Ordering

Sync Wave Concept

Sync Waves provide fine-grained control over resource application order:

# Wave -1: Infrastructure resources (created first)
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '-1'

---
# Wave 0: Configuration resources (default)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: '0'

---
# Wave 1: Application resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '1'

---
# Wave 2: External access resources
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    argocd.argoproj.io/sync-wave: '2'

Wave Execution Logic

1. Group all resources by Wave number
2. Execute from lowest Wave first
3. Apply default resource type ordering within each Wave
4. Wait until all resources in current Wave are Healthy
5. Proceed to next Wave
6. Abort entire Sync if any Wave fails

Default Resource Type Order

Within the same Wave, resources are applied roughly in this order (the authoritative list lives in the ArgoCD source and can differ between versions):

Phase 1: Namespaces and base configs
  1. Namespace
  2. NetworkPolicy
  3. ResourceQuota
  4. LimitRange
  5. PodSecurityPolicy
  6. ServiceAccount
  7. Secret
  8. SecretList
  9. ConfigMap

Phase 2: RBAC
  10. ClusterRole
  11. ClusterRoleBinding
  12. Role
  13. RoleBinding

Phase 3: CRD
  14. CustomResourceDefinition

Phase 4: Storage and volumes
  15. PersistentVolume
  16. PersistentVolumeClaim
  17. StorageClass

Phase 5: Services
  18. Service
  19. Endpoints

Phase 6: Workloads
  20. DaemonSet
  21. Deployment
  22. ReplicaSet
  23. StatefulSet
  24. Job
  25. CronJob

Phase 7: Routing
  26. Ingress
  27. IngressClass
  28. APIService
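
The wave and kind ordering described above can be sketched as a sort followed by grouping. This is an illustration under stated assumptions: `KIND_ORDER` is an abbreviated subset of the list above, `plan_waves` is a hypothetical helper, and unknown kinds are simply placed last within their wave.

```python
from itertools import groupby
from typing import Dict, List, Tuple

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"

# Abbreviated subset of the kind order listed above (illustrative only).
KIND_ORDER = ["Namespace", "Secret", "ConfigMap", "Service", "Deployment", "Ingress"]


def wave_of(resource: Dict) -> int:
    """Read the sync-wave annotation; resources without it are in wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def kind_rank(resource: Dict) -> int:
    """Position of the kind in KIND_ORDER; unknown kinds sort last."""
    kind = resource["kind"]
    return KIND_ORDER.index(kind) if kind in KIND_ORDER else len(KIND_ORDER)


def plan_waves(resources: List[Dict]) -> List[Tuple[int, List[str]]]:
    """Return (wave, [kind/name, ...]) pairs in the order they would be applied."""
    ordered = sorted(
        resources,
        key=lambda r: (wave_of(r), kind_rank(r), r["metadata"]["name"]),
    )
    return [
        (wave, [f'{r["kind"]}/{r["metadata"]["name"]}' for r in group])
        for wave, group in groupby(ordered, key=wave_of)
    ]
```

A real sync would additionally wait for every resource in a wave to become Healthy before moving to the next pair in the returned plan.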

5. Resource Tracking

Tracking Methods

ArgoCD tracks the resources it manages with an annotation, a label, or both:

Annotation method (recommended; the default since ArgoCD 3.0):

metadata:
  annotations:
    argocd.argoproj.io/tracking-id: 'my-app:apps/Deployment:default/nginx'

Label method (legacy):

metadata:
  labels:
    app.kubernetes.io/instance: my-app

Tracking ID Structure

APP_NAME:GROUP/KIND:NAMESPACE/NAME

Examples:
  my-app:apps/Deployment:default/nginx
  my-app:/Service:default/nginx-svc
  my-app:networking.k8s.io/Ingress:default/nginx-ingress
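
To make the format concrete, here is a small sketch that builds and parses a tracking ID in the APP_NAME:GROUP/KIND:NAMESPACE/NAME layout shown above. The `TrackingId` type and both helper functions are hypothetical names for illustration; they are not part of ArgoCD's API.

```python
from typing import NamedTuple


class TrackingId(NamedTuple):
    app: str
    group: str       # empty string for the core API group
    kind: str
    namespace: str
    name: str


def build_tracking_id(app: str, group: str, kind: str, namespace: str, name: str) -> str:
    """Format the parts as APP_NAME:GROUP/KIND:NAMESPACE/NAME."""
    return f"{app}:{group}/{kind}:{namespace}/{name}"


def parse_tracking_id(value: str) -> TrackingId:
    """Split a tracking ID back into its five parts."""
    app, group_kind, namespace_name = value.split(":", 2)
    group, kind = group_kind.split("/", 1)
    namespace, name = namespace_name.split("/", 1)
    return TrackingId(app, group, kind, namespace, name)
```

Note how the core group produces an empty segment before the slash, which matches the `my-app:/Service:default/nginx-svc` example.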

Tracking Method Configuration

Configured in the argocd-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.resourceTrackingMethod: annotation # annotation | label | annotation+label

6. Diff Engine Detailed Analysis

3-Way Diff

ArgoCD compares three states:

1. Desired State: Manifests generated from Git (target state)
2. Live State: Actual state running in the cluster
3. Last Applied: The last applied configuration (recorded in the kubectl.kubernetes.io/last-applied-configuration annotation)

Structured Merge Diff

ArgoCD leverages the Structured Merge Diff library used by Kubernetes Server-Side Apply:

// Diff logic (simplified)
func diff(desired, live *unstructured.Unstructured) (*DiffResult, error) {
    // 1. Normalize
    normalizedDesired := normalize(desired)
    normalizedLive := normalize(live)

    // 2. Remove ignored fields
    removeIgnoredFields(normalizedDesired)
    removeIgnoredFields(normalizedLive)

    // 3. Structural comparison
    result := structuredMergeDiff(normalizedDesired, normalizedLive)

    return result, nil
}

Normalization Details

Normalization is essential for eliminating unnecessary diffs:

Fields removed:

metadata.resourceVersion
metadata.uid
metadata.generation
metadata.creationTimestamp
metadata.managedFields
status (for most resources)

Normalization rule examples:

When a container omits imagePullPolicy and the image tag is latest, Kubernetes sets Always automatically -> ignored in the diff
When a Service omits clusterIP, Kubernetes assigns one automatically -> ignored in the diff
Empty fields ("", [], null) are treated the same as unset fields
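
A minimal sketch of the normalization idea, assuming manifests are plain dictionaries: it strips the server-populated fields listed above and drops empty values so that a live object and its Git manifest compare as equal. The function names are hypothetical, and the defaulting rules (imagePullPolicy, clusterIP) are deliberately left out.

```python
import copy

# Server-populated metadata fields from the list above.
SERVER_METADATA_FIELDS = (
    "resourceVersion",
    "uid",
    "generation",
    "creationTimestamp",
    "managedFields",
)


def _drop_empty(value):
    """Recursively remove dict entries whose value is "", [], {} or None."""
    if isinstance(value, dict):
        cleaned = {key: _drop_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in ("", [], {}, None)}
    if isinstance(value, list):
        return [_drop_empty(item) for item in value]
    return value


def normalize(manifest: dict) -> dict:
    """Return a normalized copy of the manifest; the input is not modified."""
    result = copy.deepcopy(manifest)
    metadata = result.get("metadata", {})
    for field in SERVER_METADATA_FIELDS:
        metadata.pop(field, None)
    result.pop("status", None)
    return _drop_empty(result)
```

Comparing `normalize(live)` with `normalize(desired)` then ignores differences that only the API server introduced.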

Diff Customization

You can configure specific fields to be ignored in diffs:

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # Ignore replica count managed by HPA
    - group: ''
      kind: ConfigMap
      jqPathExpressions:
        - '.data["generated-field"]' # Ignore an auto-generated field
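
The effect of a jsonPointers entry can be sketched as "remove the addressed field from both sides, then compare". This is an illustrative model rather than ArgoCD's code: both function names are hypothetical, and the pointer handling covers dictionary keys only (no array indices).

```python
import copy
from typing import Iterable


def remove_json_pointer(obj: dict, pointer: str) -> dict:
    """Return a copy of obj without the field addressed by a JSON Pointer.

    Only object keys are supported. "~1" and "~0" are unescaped to "/" and
    "~" as described in RFC 6901. A pointer that does not resolve is a no-op.
    """
    result = copy.deepcopy(obj)
    tokens = [
        token.replace("~1", "/").replace("~0", "~")
        for token in pointer.lstrip("/").split("/")
    ]
    node = result
    for token in tokens[:-1]:
        node = node.get(token)
        if not isinstance(node, dict):
            return result
    node.pop(tokens[-1], None)
    return result


def differs_ignoring(desired: dict, live: dict, pointers: Iterable[str]) -> bool:
    """Compare two manifests after dropping every ignored pointer from both."""
    for pointer in pointers:
        desired = remove_json_pointer(desired, pointer)
        live = remove_json_pointer(live, pointer)
    return desired != live
```

With `/spec/replicas` ignored, a Deployment scaled by an HPA from 3 to 7 replicas no longer registers as different.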

Global settings in argocd-cm ConfigMap:

data:
  resource.customizations.ignoreDifferences.all: |
    managedFieldsManagers:
      - kube-controller-manager
      - kube-scheduler

7. Health Assessment

Built-in Health Checks

ArgoCD provides built-in health checks for key Kubernetes resources:

Deployment:

Healthy: All replicas Ready and update complete
Progressing: Rollout in progress (creating new ReplicaSet)
Degraded: Replicas failed to reach Ready state

StatefulSet:

Healthy: All replicas Ready and currentRevision == updateRevision
Progressing: Update in progress
Degraded: Replicas not Ready

Pod:

Healthy: Running state with all containers Ready
Progressing: Pending or ContainerCreating state
Degraded: CrashLoopBackOff, ImagePullBackOff, etc.

Job:

Healthy: Successfully completed (Completed)
Progressing: Running (Active)
Degraded: Failed
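
As an illustration of how such a check can be expressed, the sketch below classifies a Deployment from its status fields. It is a simplified model under stated assumptions, not ArgoCD's built-in check: the function name is hypothetical, the paused/suspended case is omitted, and Degraded is reported only when the rollout exceeded its progress deadline.

```python
def deployment_health(deployment: dict) -> str:
    """Classify a Deployment as Healthy, Progressing, or Degraded."""
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})
    desired = spec.get("replicas", 1)

    # The controller has not yet observed the latest spec.
    generation = deployment.get("metadata", {}).get("generation", 0)
    if generation > status.get("observedGeneration", 0):
        return "Progressing"

    # A rollout that exceeded its progress deadline is considered failed.
    for condition in status.get("conditions", []):
        if (
            condition.get("type") == "Progressing"
            and condition.get("reason") == "ProgressDeadlineExceeded"
        ):
            return "Degraded"

    updated = status.get("updatedReplicas", 0)
    if (
        updated < desired                               # new pods still being created
        or status.get("replicas", 0) > updated          # old pods still terminating
        or status.get("availableReplicas", 0) < updated # new pods not yet available
    ):
        return "Progressing"
    return "Healthy"
```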

Custom Lua Health Checks

When built-in checks are insufficient, write custom logic with Lua scripts:

# argocd-cm ConfigMap
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for _, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" then
            if condition.status == "True" then
              hs.status = "Healthy"
              hs.message = "Certificate is ready"
            else
              hs.status = "Degraded"
              hs.message = condition.message
            end
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs

8. Pruning Details

Prune Operation Flow

1. Generate complete resource list from Git manifests
2. Query ArgoCD-managed resources in cluster (tracking label/annotation)
3. Identify resources in cluster but not in Git (= Prune targets)
4. Delete resources according to deletion policy
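
Steps 1 to 3 amount to a set difference. The sketch below assumes each resource is identified by a (group, kind, namespace, name) tuple; `prune_targets` and the `protected` parameter (standing in for resources annotated with Prune=false) are hypothetical names for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

ResourceKey = Tuple[str, str, str, str]  # (group, kind, namespace, name)


def prune_targets(
    desired: Iterable[ResourceKey],
    live_tracked: Iterable[ResourceKey],
    protected: Iterable[ResourceKey] = frozenset(),
) -> List[ResourceKey]:
    """Return tracked live resources that are absent from Git and not protected."""
    return sorted(set(live_tracked) - set(desired) - set(protected))
```

Only resources that carry the application's tracking label or annotation should be passed as `live_tracked`; anything else in the namespace is out of scope for pruning.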

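Deletion Strategies

Pruned resources are deleted with the propagation policy selected by the PrunePropagationPolicy sync option:

Foreground deletion (default):

- Owned resources are deleted first, then the parent resource
- Deleting a Deployment removes its ReplicaSets and Pods before the Deployment itself disappears
- Ordering is enforced by the foregroundDeletion finalizer

Background deletion:

- The parent resource is deleted immediately
- The Kubernetes garbage collector removes owned resources afterwards

Orphan deletion:

- Only the parent resource is deleted; owned resources are left in place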

Prune Protection

Protection mechanisms to prevent unintended deletion:

# Add Prune prevention annotation to resource
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false

# Disable Prune at Application level
spec:
  syncPolicy:
    automated:
      prune: false

9. Retry Strategy and Backoff

Auto-Sync Retry

Sync failures can be automatically retried:

spec:
  syncPolicy:
    automated:
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Backoff Calculation Example

Retry 1: After 5s
Retry 2: After 10s (5s * 2)
Retry 3: After 20s (10s * 2)
Retry 4: After 40s (20s * 2)
Retry 5: After 80s (40s * 2), still below the 3m cap
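
The schedule above follows from multiplying the previous delay by `factor` and clamping it to `maxDuration`. A minimal sketch of that arithmetic, with durations in seconds and a hypothetical function name:

```python
from typing import List


def backoff_delays(duration_s: int, factor: int, max_duration_s: int, limit: int) -> List[int]:
    """Return the wait time in seconds before each retry, capped at max_duration_s."""
    delays = []
    delay = duration_s
    for _ in range(limit):
        delays.append(min(delay, max_duration_s))
        delay *= factor
    return delays
```

With `duration: 5s`, `factor: 2`, and `maxDuration: 3m`, the cap would first apply to a sixth retry (160s is still below 180s, a seventh would be clamped from 320s to 180s), so it never takes effect when `limit` is 5.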

10. Sync Options

Application-Level Sync Options

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
      - Replace=false
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
      - Validate=true
      - RespectIgnoreDifferences=true

Server-Side Apply

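Server-Side Apply has been generally available since Kubernetes 1.22, and ArgoCD supports it as a sync option from v2.5:

Benefits:
  - Field Ownership tracking
  - Conflict detection between multiple controllers
  - Merging is performed by the API server instead of the client
  - Large resources no longer hit the size limit of the last-applied-configuration annotation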

Configuration:
  syncOptions:
    - ServerSideApply=true

11. Sync Windows

Sync Window Concept

AppProjects can define time windows that allow or deny synchronization:

spec:
  syncWindows:
    # Allow Sync only during weekday business hours
    - kind: allow
      schedule: '0 9 * * 1-5'
      duration: 9h
      applications:
        - '*'
      namespaces:
        - 'production'
    # Deny Sync on weekends
    - kind: deny
      schedule: '0 0 * * 0,6'
      duration: 24h
      applications:
        - '*'
    # Allow only manual Sync during off-hours (automated Sync is denied)
    - kind: deny
      schedule: '0 18 * * 1-5'
      duration: 15h
      manualSync: true
      applications:
        - 'critical-*'

Priority Rules

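1. An active deny window takes priority over any allow window
2. When an application matches allow windows, syncs are denied whenever none of them is active
3. manualSync=true lets manual Syncs through a window that would otherwise block them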
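
The interaction between deny windows, allow windows, and manualSync can be sketched as a decision function. This is a simplified model, not ArgoCD's implementation: the `Window` type and `can_sync` are hypothetical, the cron schedule is reduced to a precomputed `active` flag, and requiring manualSync on every blocking window is a conservative assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    kind: str                  # "allow" or "deny"
    active: bool               # whether the schedule currently covers "now"
    manual_sync: bool = False  # value of the manualSync field


def can_sync(windows: List[Window], is_manual: bool) -> bool:
    """Decide whether a sync may run given the windows matching an application."""
    if not windows:
        return True

    # An active deny window blocks everything except permitted manual syncs.
    active_denies = [w for w in windows if w.kind == "deny" and w.active]
    if active_denies:
        return is_manual and all(w.manual_sync for w in active_denies)

    allows = [w for w in windows if w.kind == "allow"]
    if any(w.active for w in allows):
        return True

    # Allow windows exist but none is active: only permitted manual syncs pass.
    if allows:
        return is_manual and all(w.manual_sync for w in allows)
    return True
```

Under this model, a deny window with `manual_sync=True` blocks automated syncs while still letting an operator trigger one by hand.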

12. Production Sync Strategy

Safe Production Deployment

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 5m
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jqPathExpressions:
        - .status

13. Summary

Key elements of the ArgoCD Sync engine:

Phases: Structure deployment stages with PreSync, Sync, PostSync, SyncFail
Hooks: Execute custom tasks at each Phase via Jobs or Pods
Waves: Fine-grained control over resource application order
Diff Engine: Normalized state comparison based on 3-Way Merge
Health Check: Built-in + Lua custom health assessment
Pruning: Safe cleanup of resources removed from Git
Retry: Automatic retry with exponential backoff
Sync Window: Time-based synchronization control

Combining these mechanisms appropriately enables building safe and predictable GitOps deployment pipelines.