
[ArgoCD] Sync Engine Analysis: Everything About Synchronization Mechanisms


1. Sync Engine Overview

The ArgoCD Sync engine is the core module that applies the desired state defined in Git repositories to Kubernetes clusters. Beyond simple kubectl apply, it provides sophisticated mechanisms including Hooks, Waves, Health Checks, and Retry logic.

Sync State Machine

Pending --> Running --> Succeeded
                |
                +--> Failed --> (Retry or Manual)
State      Description
Pending    Sync is queued
Running    Sync is executing
Succeeded  All resources synchronized successfully
Failed     Error occurred during sync
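The state machine above can be encoded as a small lookup table. This is an illustrative simplification (the function and transition set are hypothetical, not the controller's exact logic): Failed loops back to Pending to model a retry.

```go
package main

import "fmt"

// Valid transitions of the (simplified) sync state machine.
var transitions = map[string][]string{
	"Pending":   {"Running"},
	"Running":   {"Succeeded", "Failed"},
	"Failed":    {"Pending"}, // retry re-queues the sync
	"Succeeded": {},          // terminal
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("Pending", "Running"))   // true
	fmt.Println(canTransition("Pending", "Succeeded")) // false: must pass through Running
}
```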

2. Sync Phases

ArgoCD divides sync into multiple phases, each handling specific resource types.

Phase Execution Order

PreSync --> Sync --> PostSync
              |
              +--> SyncFail (only on Sync failure)

PreSync Phase

PreSync runs before the main synchronization:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp/migration:latest
          command: ['./migrate', 'up']
      restartPolicy: Never

PreSync use cases:

  • Database schema migrations
  • Configuration pre-validation
  • External system health checks
  • Backup creation

Sync Phase

The main sync phase applies actual Kubernetes resources to the cluster:

1. Sort all Sync Phase resources by Wave order
2. For each Wave group:
   a. Apply in resource type order
   b. Perform kubectl apply equivalent for each resource
   c. Wait until all resources in the Wave become Healthy
3. Proceed to next Wave

PostSync Phase

PostSync runs only after successful main synchronization:

apiVersion: batch/v1
kind: Job
metadata:
  name: notification
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment successful"}'
      restartPolicy: Never

PostSync use cases:

  • Deployment completion notifications
  • Smoke test execution
  • CDN cache invalidation
  • External system sync triggers

SyncFail Phase

SyncFail runs only when the Sync fails:

apiVersion: batch/v1
kind: Job
metadata:
  name: failure-notification
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify-failure
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - https://hooks.slack.com/services/XXX
            - -d
            - '{"text":"Deployment FAILED"}'
      restartPolicy: Never

3. Resource Hooks in Detail

Hook Annotations

metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync|Sync|PostSync|SyncFail|Skip
    argocd.argoproj.io/hook-delete-policy: HookSucceeded|HookFailed|BeforeHookCreation

Hook Delete Policy

Policy              Description
HookSucceeded       Delete the hook resource after it succeeds
HookFailed          Delete the hook resource after it fails
BeforeHookCreation  Delete the existing hook resource before creating the new one on the next sync

BeforeHookCreation is the default and most commonly used. It deletes previous hook resources before creating new ones on the next sync.

Hook Execution Mechanism

1. Application Controller starts Sync
2. Identify hook resources for the current Phase
3. Sort hook resources by Wave order
4. Apply each hook resource to cluster (Job, Pod, etc.)
5. Wait for hook completion (success/failure)
6. On success: proceed to next step; on failure: abort Sync or transition to SyncFail
7. Clean up hook resources per Delete Policy
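The cleanup decision in step 7 can be sketched as follows. The policy names are the real annotation values; the function itself is a hypothetical simplification of the controller's behavior:

```go
package main

import "fmt"

// shouldDeleteHook decides, per the delete policies above, whether a
// finished hook resource should be cleaned up immediately.
// BeforeHookCreation defers deletion to the start of the next sync.
func shouldDeleteHook(policy string, succeeded bool) bool {
	switch policy {
	case "HookSucceeded":
		return succeeded
	case "HookFailed":
		return !succeeded
	default: // BeforeHookCreation: deleted before the next sync instead
		return false
	}
}

func main() {
	fmt.Println(shouldDeleteHook("HookSucceeded", true)) // true
}
```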

4. Sync Waves and Ordering

Sync Wave Concept

Sync Waves provide fine-grained control over resource application order:

# Wave -1: Infrastructure resources (created first)
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '-1'

---
# Wave 0: Configuration resources (default)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: '0'

---
# Wave 1: Application resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: '1'

---
# Wave 2: External access resources
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    argocd.argoproj.io/sync-wave: '2'

Wave Execution Logic

1. Group all resources by Wave number
2. Execute from lowest Wave first
3. Apply default resource type ordering within each Wave
4. Wait until all resources in current Wave are Healthy
5. Proceed to next Wave
6. Abort entire Sync if any Wave fails
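Steps 1-3 amount to a stable sort by (wave, kind priority). The sketch below uses the real sync-wave annotation key, but the resource struct, function names, and the abbreviated kindPriority map are illustrative stand-ins for ArgoCD's internal types and full default kind ordering:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// resource is a minimal stand-in for a parsed manifest.
type resource struct {
	Kind        string
	Name        string
	Annotations map[string]string
}

// An abbreviated subset of the default kind ordering;
// lower numbers are applied first within the same wave.
var kindPriority = map[string]int{
	"Namespace": 0, "ConfigMap": 1, "Service": 2,
	"Deployment": 3, "Ingress": 4,
}

// wave reads the sync-wave annotation, defaulting to 0.
func wave(r resource) int {
	if v, ok := r.Annotations["argocd.argoproj.io/sync-wave"]; ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return 0
}

// sortForSync orders resources by wave first, then by kind priority.
func sortForSync(rs []resource) {
	sort.SliceStable(rs, func(i, j int) bool {
		wi, wj := wave(rs[i]), wave(rs[j])
		if wi != wj {
			return wi < wj
		}
		return kindPriority[rs[i].Kind] < kindPriority[rs[j].Kind]
	})
}

func main() {
	rs := []resource{
		{Kind: "Ingress", Name: "web", Annotations: map[string]string{"argocd.argoproj.io/sync-wave": "2"}},
		{Kind: "Deployment", Name: "web"},
		{Kind: "Namespace", Name: "my-app", Annotations: map[string]string{"argocd.argoproj.io/sync-wave": "-1"}},
	}
	sortForSync(rs)
	for _, r := range rs {
		fmt.Println(r.Kind, r.Name) // Namespace, then Deployment, then Ingress
	}
}
```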

Default Resource Type Order

Within the same Wave, resources are applied in this order:

Phase 1: Namespaces and base configs
  1. Namespace
  2. NetworkPolicy
  3. ResourceQuota
  4. LimitRange
  5. PodSecurityPolicy
  6. ServiceAccount
  7. Secret
  8. SecretList
  9. ConfigMap

Phase 2: RBAC
  10. ClusterRole
  11. ClusterRoleBinding
  12. Role
  13. RoleBinding

Phase 3: CRD
  14. CustomResourceDefinition

Phase 4: Storage and volumes
  15. PersistentVolume
  16. PersistentVolumeClaim
  17. StorageClass

Phase 5: Services
  18. Service
  19. Endpoints

Phase 6: Workloads
  20. DaemonSet
  21. Deployment
  22. ReplicaSet
  23. StatefulSet
  24. Job
  25. CronJob

Phase 7: Routing
  26. Ingress
  27. IngressClass
  28. APIService

5. Resource Tracking

Tracking Methods

ArgoCD provides two methods for tracking managed resources:

Annotation method (default, recommended):

metadata:
  annotations:
    argocd.argoproj.io/tracking-id: 'my-app:apps/Deployment:default/nginx'

Label method (legacy):

metadata:
  labels:
    app.kubernetes.io/instance: my-app

Tracking ID Structure

APP_NAME:GROUP/KIND:NAMESPACE/NAME

Examples:
  my-app:apps/Deployment:default/nginx
  my-app:/Service:default/nginx-svc
  my-app:networking.k8s.io/Ingress:default/nginx-ingress
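The structure splits cleanly on the two ":" separators; a hypothetical parser (the type and function names are assumptions, not ArgoCD's API) might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// trackingID holds the components of a tracking ID
// in the form APP_NAME:GROUP/KIND:NAMESPACE/NAME.
type trackingID struct {
	App, Group, Kind, Namespace, Name string
}

// parseTrackingID splits a tracking ID into its parts. An empty GROUP
// before the "/" means the core API group, as in "my-app:/Service:...".
func parseTrackingID(s string) (trackingID, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 {
		return trackingID{}, fmt.Errorf("malformed tracking id: %q", s)
	}
	gk := strings.SplitN(parts[1], "/", 2)
	nn := strings.SplitN(parts[2], "/", 2)
	if len(gk) != 2 || len(nn) != 2 {
		return trackingID{}, fmt.Errorf("malformed tracking id: %q", s)
	}
	return trackingID{App: parts[0], Group: gk[0], Kind: gk[1], Namespace: nn[0], Name: nn[1]}, nil
}

func main() {
	id, _ := parseTrackingID("my-app:apps/Deployment:default/nginx")
	fmt.Printf("%+v\n", id)
}
```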

Tracking Method Configuration

Configured in the argocd-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.resourceTrackingMethod: annotation # annotation | label | annotation+label

6. Diff Engine Detailed Analysis

3-Way Diff

ArgoCD compares three states:

1. Desired State: Manifests generated from Git (target state)
2. Live State: Actual state running in the cluster
3. Last Applied: Last applied configuration (recorded in annotation)

Structured Merge Diff

ArgoCD leverages the Structured Merge Diff library used by Kubernetes Server-Side Apply:

// Diff logic (simplified)
func diff(desired, live *unstructured.Unstructured) (*DiffResult, error) {
    // 1. Normalize
    normalizedDesired := normalize(desired)
    normalizedLive := normalize(live)

    // 2. Remove ignored fields
    removeIgnoredFields(normalizedDesired)
    removeIgnoredFields(normalizedLive)

    // 3. Structural comparison
    result := structuredMergeDiff(normalizedDesired, normalizedLive)

    return result, nil
}

Normalization Details

Normalization is essential for eliminating unnecessary diffs:

Fields removed:

  • metadata.resourceVersion
  • metadata.uid
  • metadata.generation
  • metadata.creationTimestamp
  • metadata.managedFields
  • status (for most resources)

Normalization rule examples:

  • When imagePullPolicy is omitted and the image tag is latest, Kubernetes defaults the policy to Always; this defaulted value is ignored in the diff
  • When a Service omits clusterIP, Kubernetes auto-assigns one; the assigned value is ignored in the diff
  • Empty fields ("", [], null) are treated as equivalent to unset fields
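Stripping the server-populated metadata fields can be sketched on a raw unstructured map. This is a deliberate simplification: the real engine operates on typed structured-merge-diff values, and the function name is an assumption.

```go
package main

import "fmt"

// Server-populated metadata fields dropped before diffing,
// mirroring the "fields removed" list above.
var ignoredMetaFields = []string{
	"resourceVersion", "uid", "generation", "creationTimestamp", "managedFields",
}

// normalize strips ignored fields from an unstructured object so that
// only user-intended fields take part in the comparison.
func normalize(obj map[string]any) {
	if meta, ok := obj["metadata"].(map[string]any); ok {
		for _, f := range ignoredMetaFields {
			delete(meta, f)
		}
	}
	delete(obj, "status") // status is server-owned for most resources
}

func main() {
	live := map[string]any{
		"metadata": map[string]any{"name": "nginx", "uid": "abc-123"},
		"status":   map[string]any{"readyReplicas": 3},
	}
	normalize(live)
	fmt.Println(live) // only metadata.name survives
}
```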

Diff Customization

You can configure specific fields to be ignored in diffs:

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas # Ignore replica count managed by HPA
    - group: ''
      kind: ConfigMap
      jqPathExpressions:
        - .data.generated-field # Ignore auto-generated fields

Global settings in argocd-cm ConfigMap:

data:
  resource.customizations.ignoreDifferences.all: |
    managedFieldsManagers:
      - kube-controller-manager
      - kube-scheduler

7. Health Assessment

Built-in Health Checks

ArgoCD provides built-in health checks for key Kubernetes resources:

Deployment:

Healthy: All replicas Ready and update complete
Progressing: Rollout in progress (creating new ReplicaSet)
Degraded: Replicas failed to reach Ready state

StatefulSet:

Healthy: All replicas Ready and currentRevision == updateRevision
Progressing: Update in progress
Degraded: Replicas not Ready

Pod:

Healthy: Running state with all containers Ready
Progressing: Pending or ContainerCreating state
Degraded: CrashLoopBackOff, ImagePullBackOff, etc.

Job:

Healthy: Successfully completed (Completed)
Progressing: Running (Active)
Degraded: Failed
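As a rough sketch, the Deployment check reduces to comparing replica counts. This is a simplification of the built-in logic, which also inspects status conditions and progressDeadlineSeconds; the function signature is hypothetical.

```go
package main

import "fmt"

// deploymentHealth derives a health status from replica counts.
// deadlineExceeded models the rollout stalling past its progress deadline.
func deploymentHealth(desired, updated, ready int32, deadlineExceeded bool) string {
	if deadlineExceeded {
		return "Degraded" // replicas failed to become Ready in time
	}
	if updated < desired || ready < desired {
		return "Progressing" // new ReplicaSet still rolling out
	}
	return "Healthy"
}

func main() {
	fmt.Println(deploymentHealth(3, 3, 3, false)) // Healthy
	fmt.Println(deploymentHealth(3, 1, 1, false)) // Progressing
}
```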

Custom Lua Health Checks

When built-in checks are insufficient, write custom logic with Lua scripts:

# argocd-cm ConfigMap
data:
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for _, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" then
            if condition.status == "True" then
              hs.status = "Healthy"
              hs.message = "Certificate is ready"
            else
              hs.status = "Degraded"
              hs.message = condition.message
            end
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs

8. Pruning Details

Prune Operation Flow

1. Generate complete resource list from Git manifests
2. Query ArgoCD-managed resources in cluster (tracking label/annotation)
3. Identify resources in cluster but not in Git (= Prune targets)
4. Delete resources according to deletion policy
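Step 3 is a set difference between cluster state and Git state. A minimal sketch, keyed by tracking ID (the function and key format are illustrative assumptions):

```go
package main

import "fmt"

// pruneTargets returns the IDs present in the cluster but absent from
// Git — these are the candidates for deletion.
func pruneTargets(git, cluster map[string]bool) []string {
	var targets []string
	for id := range cluster {
		if !git[id] {
			targets = append(targets, id)
		}
	}
	return targets
}

func main() {
	git := map[string]bool{"apps/Deployment/web": true}
	cluster := map[string]bool{
		"apps/Deployment/web":    true,
		"v1/ConfigMap/old-config": true, // removed from Git, still in cluster
	}
	fmt.Println(pruneTargets(git, cluster)) // only the stale ConfigMap
}
```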

Deletion Strategies

Cascade deletion (default):

- Recursively deletes owned resources
- Deleting a Deployment also deletes its ReplicaSet and Pods
- Uses Kubernetes garbage collection mechanism

Foreground deletion:

- Owned resources deleted first, then parent resource
- Used when ordering must be guaranteed
- Order control via finalizers

Prune Protection

Protection mechanisms to prevent unintended deletion:

# Add Prune prevention annotation to resource
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false

# Disable Prune at Application level
spec:
  syncPolicy:
    automated:
      prune: false

9. Retry Strategy and Backoff

Auto-Sync Retry

Sync failures can be automatically retried:

spec:
  syncPolicy:
    automated:
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Backoff Calculation Example

Retry 1: After 5s
Retry 2: After 10s (5s * 2)
Retry 3: After 20s (10s * 2)
Retry 4: After 40s (20s * 2)
Retry 5: After 80s (40s * 2)

Each delay doubles, capped at maxDuration (3m). With limit: 5 the cap is never reached here; a hypothetical 7th retry (320s) would be clamped to 3m.

10. Sync Options

Application-Level Sync Options

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
      - Replace=false
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
      - Validate=true
      - RespectIgnoreDifferences=true

Server-Side Apply

Server-Side Apply is the recommended apply method for Kubernetes 1.22+:

Benefits:
  - Field Ownership tracking
  - Conflict prevention between multiple controllers
  - More accurate 3-way merge
  - Performance improvement for large resources

Configuration:
  syncOptions:
    - ServerSideApply=true

11. Sync Windows

Sync Window Concept

AppProjects can define time windows that allow or deny synchronization:

spec:
  syncWindows:
    # Allow Sync only during weekday business hours
    - kind: allow
      schedule: '0 9 * * 1-5'
      duration: 9h
      applications:
        - '*'
      namespaces:
        - 'production'
    # Deny Sync on weekends
    - kind: deny
      schedule: '0 0 * * 0,6'
      duration: 24h
      applications:
        - '*'
    # Allow only manual Sync during off-hours
    - kind: allow
      schedule: '0 18 * * 1-5'
      duration: 15h
      manualSync: true
      applications:
        - 'critical-*'

Priority Rules

1. deny takes priority over allow
2. More specific rules take priority at the same level
3. manualSync=true allows only manual Sync
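Rule 1 can be sketched as follows. This is a heavily simplified model (cron evaluation, application/namespace matching, and manualSync handling are omitted, and all names are hypothetical): an active deny window always wins, and once any allow windows exist, one of them must be active.

```go
package main

import "fmt"

// window is a simplified sync window: its kind and whether its cron
// schedule currently matches.
type window struct {
	Kind   string // "allow" or "deny"
	Active bool
}

// syncAllowed applies the priority rules: any active deny blocks sync;
// if allow windows are defined, at least one must be active.
// With no windows at all, sync is always allowed.
func syncAllowed(windows []window) bool {
	if len(windows) == 0 {
		return true
	}
	hasAllow, allowed := false, false
	for _, w := range windows {
		switch w.Kind {
		case "deny":
			if w.Active {
				return false // deny takes priority over allow
			}
		case "allow":
			hasAllow = true
			if w.Active {
				allowed = true
			}
		}
	}
	if !hasAllow {
		return true // only deny windows defined, none active
	}
	return allowed
}

func main() {
	// An active deny overrides a simultaneously active allow.
	fmt.Println(syncAllowed([]window{
		{Kind: "allow", Active: true},
		{Kind: "deny", Active: true},
	})) // false
}
```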

12. Production Sync Strategy

Safe Production Deployment

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
      - ServerSideApply=true
      - ApplyOutOfSyncOnly=true
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 5m
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jqPathExpressions:
        - .status

13. Summary

Key elements of the ArgoCD Sync engine:

  1. Phases: Structure deployment stages with PreSync, Sync, PostSync, SyncFail
  2. Hooks: Execute custom tasks at each Phase via Jobs or Pods
  3. Waves: Fine-grained control over resource application order
  4. Diff Engine: Normalized state comparison based on 3-Way Merge
  5. Health Check: Built-in + Lua custom health assessment
  6. Pruning: Safe cleanup of resources removed from Git
  7. Retry: Automatic retry with exponential backoff
  8. Sync Window: Time-based synchronization control

Combining these mechanisms appropriately enables building safe and predictable GitOps deployment pipelines.