Split View: Multi-Cloud 전략 & 마이그레이션 완전 가이드 2025: AWS/GCP/Azure 비교, 하이브리드 클라우드

Multi-Cloud 전략 & 마이그레이션 완전 가이드 2025: AWS/GCP/Azure 비교, 하이브리드 클라우드

1. 왜 Multi-Cloud인가?

클라우드 시장이 성숙해지면서 단일 클라우드 제공자에 모든 워크로드를 집중하는 전략이 점점 더 위험해지고 있습니다. 2025년 기준 Fortune 500 기업의 약 89%가 멀티클라우드 전략을 채택하고 있으며, 평균적으로 2.6개의 퍼블릭 클라우드를 사용합니다.

1.1 벤더 락인(Vendor Lock-in) 리스크

단일 클라우드에 의존할 때 발생하는 핵심 리스크:

가격 협상력 상실: 대안이 없으면 클라우드 제공자의 가격 인상에 종속
서비스 중단 리스크: 2024년 주요 클라우드 제공자별 평균 대형 장애 4.2건 발생
기술 종속: 독점 서비스(AWS Lambda, Azure Functions 등) 사용 시 이전 비용 기하급수적 증가
규제 변화: 데이터 주권법, 규제 변화 시 유연한 대응 불가

1.2 Multi-Cloud를 선택하는 4가지 이유

Best-of-Breed 전략: 각 클라우드의 강점을 활용합니다.

AWS: 가장 넓은 서비스 포트폴리오, 엔터프라이즈 생태계
GCP: 데이터 분석(BigQuery), AI/ML(Vertex AI), Kubernetes(GKE)
Azure: 엔터프라이즈 통합(Active Directory, Office 365), 하이브리드(Azure Arc)

규제 준수(Compliance): 산업/지역별 데이터 요구사항 충족

금융: 특정 데이터는 국내 리전에 저장 의무
의료: HIPAA 준수 가능한 서비스 선택
공공: 정부 전용 클라우드 리전 활용

재해 복구(DR): 클라우드 제공자 수준의 장애 대비

단일 클라우드 DR: 리전 간 복제 (같은 제공자 내)
멀티클라우드 DR: 제공자 간 복제 (AWS 장애 시 GCP로 페일오버)

비용 최적화: 워크로드별 최적 가격 선택

스팟/프리엠티블 인스턴스 가격 차이 활용
커밋 사용 할인(Reserved/Committed Use) 전략적 분배
데이터 이그레스 비용 비교 후 최적 배치

1.3 Multi-Cloud의 현실적 과제

멀티클라우드가 만능은 아닙니다. 반드시 고려해야 할 과제:

과제	설명	완화 전략
복잡성 증가	2-3배의 운영 오버헤드	IaC, 통합 관리 플랫폼
인력 부족	각 클라우드 전문가 필요	추상화 레이어, 교육 투자
네트워크 비용	클라우드 간 데이터 전송 비용	데이터 지역성 설계
보안 통합	서로 다른 IAM/보안 모델	Zero Trust, 통합 IdP
일관성 유지	서비스별 동작 차이	표준화된 추상화 레이어

2. AWS vs GCP vs Azure: 서비스 매핑

2.1 컴퓨팅 서비스 비교

카테고리	AWS	GCP	Azure
가상 머신	EC2	Compute Engine	Virtual Machines
컨테이너 오케스트레이션	EKS	GKE	AKS
서버리스 컨테이너	Fargate	Cloud Run	Container Apps
서버리스 함수	Lambda	Cloud Functions	Azure Functions
배치 처리	AWS Batch	Cloud Batch	Azure Batch
앱 플랫폼	Elastic Beanstalk	App Engine	App Service
VMware 통합	VMware Cloud on AWS	Google Cloud VMware Engine	Azure VMware Solution

2.2 스토리지 서비스 비교

카테고리	AWS	GCP	Azure
오브젝트 스토리지	S3	Cloud Storage	Blob Storage
블록 스토리지	EBS	Persistent Disk	Managed Disks
파일 스토리지	EFS	Filestore	Azure Files
아카이브	S3 Glacier	Archive Storage	Archive Storage
하이브리드 스토리지	Storage Gateway	Transfer Appliance	StorSimple

2.3 데이터베이스 서비스 비교

카테고리	AWS	GCP	Azure
관계형 DB	RDS, Aurora	Cloud SQL, AlloyDB	Azure SQL, MySQL/PostgreSQL
NoSQL 문서	DynamoDB	Firestore	Cosmos DB
인메모리	ElastiCache	Memorystore	Azure Cache for Redis
그래프 DB	Neptune	-	Cosmos DB (Gremlin)
시계열 DB	Timestream	-	Azure Data Explorer
글로벌 분산 DB	Aurora Global, DynamoDB Global Tables	Spanner	Cosmos DB

2.4 네트워킹 서비스 비교

카테고리	AWS	GCP	Azure
가상 네트워크	VPC	VPC	VNet
로드 밸런서	ALB/NLB/GLB	Cloud Load Balancing	Azure Load Balancer/App Gateway
CDN	CloudFront	Cloud CDN	Azure CDN/Front Door
DNS	Route 53	Cloud DNS	Azure DNS
전용 연결	Direct Connect	Cloud Interconnect	ExpressRoute
VPN	Site-to-Site VPN	Cloud VPN	VPN Gateway
서비스 메시	App Mesh	Traffic Director	-

2.5 AI/ML 서비스 비교

카테고리	AWS	GCP	Azure
ML 플랫폼	SageMaker	Vertex AI	Azure ML
LLM 서비스	Bedrock	Gemini API, Model Garden	Azure OpenAI Service
자연어 처리	Comprehend	Natural Language AI	Cognitive Services
이미지 분석	Rekognition	Vision AI	Computer Vision
음성 처리	Transcribe/Polly	Speech-to-Text/Text-to-Speech	Speech Services
추천 시스템	Personalize	Recommendations AI	Personalizer

2.6 데이터 분석 비교

카테고리	AWS	GCP	Azure
데이터 웨어하우스	Redshift	BigQuery	Synapse Analytics
스트림 처리	Kinesis	Dataflow	Stream Analytics
ETL/ELT	Glue	Dataflow, Dataproc	Data Factory
데이터 카탈로그	Glue Data Catalog	Data Catalog	Purview
BI 도구	QuickSight	Looker	Power BI

3. 멀티클라우드 아키텍처 패턴

3.1 Active-Active 패턴

두 개 이상의 클라우드에서 동시에 트래픽을 처리합니다.

                    ┌─────────────────────┐
                    │   Global DNS / LB   │
                    │  (Route 53 / CF)    │
                    └──────────┬──────────┘
                         ┌─────┴─────┐
                         │           │
                    ┌────▼────┐ ┌────▼────┐
                    │  AWS    │ │  GCP    │
                    │ Region  │ │ Region  │
                    │         │ │         │
                    │ ┌─────┐ │ │ ┌─────┐ │
                    │ │ K8s │ │ │ │ GKE │ │
                    │ │ EKS │ │ │ │     │ │
                    │ └──┬──┘ │ │ └──┬──┘ │
                    │    │    │ │    │    │
                    │ ┌──▼──┐ │ │ ┌──▼──┐ │
                    │ │ DB  │◄┼─┼─► DB  │ │
                    │ │ RDS │ │ │ │ SQL │ │
                    │ └─────┘ │ │ └─────┘ │
                    └─────────┘ └─────────┘

장점: 최대 가용성, 제로 다운타임 페일오버 단점: 데이터 동기화 복잡성, 높은 비용 적합한 경우: 미션 크리티컬 서비스, 글로벌 서비스

3.2 Active-Passive 패턴

주 클라우드에서 트래픽을 처리하고, 보조 클라우드는 DR 용도로 대기합니다.

                    ┌────────────────────┐
                    │    Global DNS      │
                    └────────┬───────────┘
                             │
                    ┌────────▼────────┐          ┌────────────────┐
                    │   AWS (Active)  │  복제 ──►│  Azure (Passive)│
                    │                 │          │                │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Workloads │  │          │  │ Standby   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Database  │──┼──────────┼─►│ Replica   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    └─────────────────┘          └────────────────┘

장점: 합리적인 비용, 클라우드 레벨 DR 단점: 페일오버 시 약간의 다운타임, 패시브 리소스 비용 적합한 경우: 높은 가용성 요구하지만 비용에 민감한 경우

3.3 Cloud Bursting 패턴

평소에는 온프레미스/주 클라우드에서 처리하고, 피크 시 보조 클라우드로 확장합니다.

# Kubernetes Federation - Cloud Bursting 예시
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: web-app
  namespace: production
spec:
  template:
    spec:
      replicas: 10
      containers:
        - name: web
          image: registry.example.com/web-app:v2.1
  placement:
    clusters:
      - name: on-prem-cluster
        weight: 70
      - name: aws-eks-cluster
        weight: 20
      - name: gcp-gke-cluster
        weight: 10
  overrides:
    - clusterName: aws-eks-cluster
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5

3.4 Arbitrage(중재) 패턴

워크로드 특성에 따라 가장 비용 효율적인 클라우드에 배치합니다.

# 클라우드 비용 비교 및 자동 배치 예시
class CloudArbitrage:
    def __init__(self):
        self.providers = {
            "aws": AWSProvider(),
            "gcp": GCPProvider(),
            "azure": AzureProvider()
        }

    def find_optimal_placement(self, workload):
        costs = {}
        for name, provider in self.providers.items():
            cost = provider.estimate_cost(
                cpu=workload.cpu_cores,
                memory_gb=workload.memory_gb,
                storage_gb=workload.storage_gb,
                gpu=workload.gpu_type,
                duration_hours=workload.expected_duration,
                region=workload.preferred_region
            )
            costs[name] = cost

        # 비용 대비 성능 점수 계산
        scores = {}
        for name, cost in costs.items():
            perf = self.providers[name].benchmark_score(workload.type)
            scores[name] = perf / cost  # 가성비

        best = max(scores, key=scores.get)
        return best, costs[best], scores[best]

    def auto_schedule(self, workloads):
        placements = []
        for wl in workloads:
            provider, cost, score = self.find_optimal_placement(wl)
            placements.append({
                "workload": wl.name,
                "provider": provider,
                "estimated_cost": cost,
                "efficiency_score": score
            })
        return placements

4. 하이브리드 클라우드 솔루션

4.1 Google Anthos

Anthos는 GKE를 기반으로 온프레미스, AWS, Azure에서도 동일한 Kubernetes 환경을 제공합니다.

# Anthos Config Management - 멀티 클러스터 설정
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  clusterName: production-cluster
  git:
    syncRepo: https://github.com/org/anthos-config
    syncBranch: main
    secretType: ssh
    policyDir: "policies"
  policyController:
    enabled: true
    templateLibraryInstalled: true
    referentialRulesEnabled: true
  hierarchyController:
    enabled: true
    enablePodTreeLabels: true
    enableHierarchicalResourceQuotas: true

Anthos 핵심 구성 요소:

Anthos on GKE: GCP 내 관리형 Kubernetes
Anthos on VMware: 온프레미스 vSphere 위에 GKE 실행
Anthos on AWS: AWS에서 Anthos 관리형 클러스터
Anthos on Azure: Azure에서 Anthos 관리형 클러스터
Anthos Config Management: GitOps 기반 멀티 클러스터 정책 관리
Anthos Service Mesh: Istio 기반 서비스 메시 통합 관리

4.2 Azure Arc

Azure Arc는 Azure 관리 플레인을 온프레미스, AWS, GCP 등 어디든 확장합니다.

# Azure Arc에 Kubernetes 클러스터 등록
az connectedk8s connect \
  --name production-eks \
  --resource-group multi-cloud-rg \
  --location eastus \
  --tags "environment=production" "cloud=aws"

# Arc 지원 데이터 서비스 배포
az arcdata dc create \
  --name arc-dc \
  --k8s-namespace arc \
  --connectivity-mode indirect \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class managed-premium \
  --profile-name azure-arc-kubeadm

# Arc 지원 SQL Managed Instance
az sql mi-arc create \
  --name sql-prod \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class-data managed-premium \
  --storage-class-logs managed-premium \
  --cores-limit 8 \
  --memory-limit 32Gi \
  --k8s-namespace arc

4.3 AWS EKS Anywhere

EKS Anywhere는 온프레미스에서 AWS EKS와 동일한 Kubernetes를 실행합니다.

# EKS Anywhere 클러스터 설정
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - "192.168.0.0/16"
    services:
      cidrBlocks:
        - "10.96.0.0/12"
  controlPlaneConfiguration:
    count: 3
    endpoint:
      host: "10.0.0.100"
    machineGroupRef:
      kind: VSphereMachineConfig
      name: cp-machines
  workerNodeGroupConfigurations:
    - count: 5
      machineGroupRef:
        kind: VSphereMachineConfig
        name: worker-machines
      name: md-0
  kubernetesVersion: "1.29"
  managementCluster:
    name: prod-cluster

4.4 Red Hat OpenShift (하이브리드)

OpenShift는 모든 주요 클라우드와 온프레미스에서 일관된 Kubernetes 플랫폼을 제공합니다.

# OpenShift 멀티클러스터 허브 설정
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  availabilityConfig: High
  enableClusterBackup: true
  overrides:
    components:
      - name: cluster-lifecycle
        enabled: true
      - name: cluster-backup
        enabled: true
      - name: multicluster-engine
        enabled: true
      - name: grc
        enabled: true
      - name: app-lifecycle
        enabled: true

5. 클라우드 마이그레이션 6R 전략

5.1 6R 개요와 의사결정 트리

마이그레이션 대상 워크로드를 분석할 때 6가지 전략(6R) 중 하나를 선택합니다.

워크로드 분석 시작
       │
       ▼
  비즈니스 가치 있는가? ─── No ──► Retire (폐기)
       │ Yes
       ▼
  변경 필요한가? ─── No ──► Retain (유지)
       │ Yes
       ▼
  SaaS로 대체 가능? ─── Yes ──► Repurchase (재구매)
       │ No
       ▼
  아키텍처 변경 필요? ─── No ──► Rehost (리호스트)
       │ Yes                      또는 Replatform
       ▼
  완전 재설계 필요? ─── No ──► Replatform (리플랫폼)
       │ Yes
       ▼
  Refactor (리팩터)

5.2 각 전략 상세

Rehost (Lift and Shift):

기존 애플리케이션을 그대로 클라우드 VM으로 이전
가장 빠르고 리스크가 낮음
클라우드 네이티브 이점은 제한적
AWS Application Migration Service, Azure Migrate, Google Migrate for Compute Engine 활용

Replatform (Lift and Reshape):

핵심 아키텍처는 유지하면서 클라우드 서비스 활용
예: 자체 MySQL을 RDS/Cloud SQL로 이전
중간 수준의 노력, 운영 이점 즉시 확보

Refactor (Re-architect):

클라우드 네이티브로 완전 재설계
마이크로서비스, 서버리스, 컨테이너화
가장 높은 노력, 가장 높은 클라우드 이점
장기적 비용 절감과 확장성

Repurchase (Replace):

SaaS 제품으로 교체
예: 자체 이메일 서버를 Office 365/Google Workspace로
운영 부담 완전 제거

Retire (폐기):

더 이상 필요 없는 워크로드 제거
평균적으로 전체의 10-20%가 폐기 대상

Retain (유지):

아직 마이그레이션할 준비가 안 된 워크로드
레거시 의존성, 규제 요구사항 등

5.3 마이그레이션 평가 스코어카드

# 마이그레이션 전략 결정 자동화 예시
class MigrationAssessor:
    def assess_workload(self, workload):
        score = {
            "business_value": self._rate_business_value(workload),
            "technical_complexity": self._rate_complexity(workload),
            "cloud_readiness": self._rate_cloud_readiness(workload),
            "data_sensitivity": self._rate_data_sensitivity(workload),
            "dependency_count": len(workload.dependencies),
            "team_skill_level": self._rate_team_skills(workload.team)
        }

        # 전략 추천 로직
        if score["business_value"] < 3:
            return "Retire"
        if score["cloud_readiness"] < 2:
            return "Retain"
        if workload.has_saas_alternative and score["technical_complexity"] > 7:
            return "Repurchase"
        if score["technical_complexity"] < 4 and score["cloud_readiness"] > 6:
            return "Rehost"
        if score["technical_complexity"] < 7:
            return "Replatform"
        return "Refactor"

    def generate_report(self, workloads):
        results = {}
        for wl in workloads:
            strategy = self.assess_workload(wl)
            if strategy not in results:
                results[strategy] = []
            results[strategy].append({
                "name": wl.name,
                "estimated_effort_weeks": self._estimate_effort(wl, strategy),
                "estimated_cost": self._estimate_cost(wl, strategy),
                "risk_level": self._assess_risk(wl, strategy)
            })
        return results

6. 마이그레이션 계획 수립

6.1 디스커버리와 의존성 매핑

마이그레이션 전 현재 환경을 정확히 파악하는 것이 핵심입니다.

# AWS Application Discovery Service 사용
aws discovery start-continuous-export

# 에이전트 기반 수집
aws discovery start-data-collection-by-agent-ids \
  --agent-ids agent-001 agent-002 agent-003

# 서버 의존성 맵 조회
aws discovery describe-agents \
  --filters name=hostName,values=web-server-*,condition=CONTAINS

# 수집된 데이터로 마이그레이션 계획 생성
aws migrationhub-strategy create-assessment \
  --s3bucket migration-data \
  --s3key discovery-export.csv

6.2 TCO (Total Cost of Ownership) 분석

# TCO 비교 분석 프레임워크
class TCOAnalysis:
    def calculate_on_prem_tco(self, infra):
        annual_costs = {
            "hardware": infra.server_count * 8000 / 3,  # 3년 감가상각
            "software_licenses": infra.license_costs,
            "datacenter": infra.rack_units * 1200,  # 전력, 냉각, 공간
            "network": infra.bandwidth_gbps * 500,
            "personnel": infra.fte_count * 120000,
            "maintenance": infra.server_count * 2400,
            "disaster_recovery": infra.dr_cost_annual,
            "security": infra.security_cost_annual,
            "compliance": infra.compliance_cost_annual
        }
        return sum(annual_costs.values()), annual_costs

    def calculate_cloud_tco(self, workloads, provider="aws"):
        annual_costs = {
            "compute": self._estimate_compute(workloads, provider),
            "storage": self._estimate_storage(workloads, provider),
            "network": self._estimate_network(workloads, provider),
            "managed_services": self._estimate_managed(workloads, provider),
            "personnel": workloads.cloud_fte * 130000,  # 클라우드 인력
            "migration_amortized": workloads.migration_cost / 3,
            "training": workloads.team_size * 5000,
            "tools": workloads.tool_licenses
        }
        return sum(annual_costs.values()), annual_costs

    def compare(self, infra, workloads):
        on_prem_total, on_prem_detail = self.calculate_on_prem_tco(infra)
        cloud_totals = {}
        for provider in ["aws", "gcp", "azure"]:
            total, detail = self.calculate_cloud_tco(workloads, provider)
            cloud_totals[provider] = {
                "total": total,
                "detail": detail,
                "savings_pct": (on_prem_total - total) / on_prem_total * 100
            }
        return {
            "on_prem": {"total": on_prem_total, "detail": on_prem_detail},
            "cloud": cloud_totals
        }

6.3 마이그레이션 웨이브 계획

대규모 마이그레이션은 웨이브(단계)로 나눠 실행합니다.

웨이브	대상	전략	기간	리스크
Wave 0	파일럿 (비핵심 앱 2-3개)	Rehost	4주	낮음
Wave 1	웹 프론트엔드, 정적 사이트	Rehost/Replatform	6주	낮음
Wave 2	API 서버, 마이크로서비스	Replatform/Refactor	8주	중간
Wave 3	데이터베이스, 스토리지	Replatform	6주	높음
Wave 4	레거시 모놀리스	Refactor	12주	높음
Wave 5	최종 전환, 정리	-	4주	중간

7. 데이터 마이그레이션 전략

7.1 온라인 vs 오프라인 마이그레이션

온라인 마이그레이션 (네트워크 전송):

100TB 이하 데이터에 적합
전용 연결(Direct Connect/ExpressRoute) 권장
증분 동기화 가능

오프라인 마이그레이션 (물리적 전송):

AWS Snowball / Snowball Edge: 최대 80TB/디바이스
AWS Snowmobile: 페타바이트 규모
Azure Data Box: 최대 100TB
Google Transfer Appliance: 최대 300TB

7.2 데이터베이스 마이그레이션

# AWS DMS (Database Migration Service) 태스크 설정 예시
# 소스: 온프레미스 Oracle, 타겟: Amazon Aurora PostgreSQL
Resources:
  DMSReplicationTask:
    Type: AWS::DMS::ReplicationTask
    Properties:
      MigrationType: full-load-and-cdc
      SourceEndpointArn: !Ref OracleSourceEndpoint
      TargetEndpointArn: !Ref AuroraTargetEndpoint
      ReplicationInstanceArn: !Ref DMSReplicationInstance
      TableMappings: |
        {
          "rules": [
            {
              "rule-type": "selection",
              "rule-id": "1",
              "rule-name": "select-all-tables",
              "object-locator": {
                "schema-name": "PROD_SCHEMA",
                "table-name": "%"
              },
              "rule-action": "include"
            },
            {
              "rule-type": "transformation",
              "rule-id": "2",
              "rule-name": "lowercase-schema",
              "rule-action": "convert-lowercase",
              "rule-target": "schema",
              "object-locator": {
                "schema-name": "PROD_SCHEMA"
              }
            }
          ]
        }
      ReplicationTaskSettings: |
        {
          "TargetMetadata": {
            "SupportLobs": true,
            "FullLobMode": false,
            "LobChunkSize": 64
          },
          "FullLoadSettings": {
            "TargetTablePrepMode": "DROP_AND_CREATE",
            "MaxFullLoadSubTasks": 8
          },
          "Logging": {
            "EnableLogging": true,
            "LogComponents": [
              { "Id": "SOURCE_UNLOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" },
              { "Id": "TARGET_LOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" }
            ]
          }
        }

7.3 스토리지 전송

# AWS에서 GCP로 스토리지 전송
gcloud transfer jobs create \
  --source-agent-pool=transfer-pool \
  --source-s3-bucket=my-aws-bucket \
  --source-s3-region=us-east-1 \
  --destination-gcs-bucket=my-gcp-bucket \
  --schedule-starts=2025-01-15T00:00:00Z \
  --schedule-repeats-every=24h \
  --include-prefixes="data/,backups/" \
  --exclude-prefixes="temp/,logs/"

8. 애플리케이션 마이그레이션 패턴

8.1 Strangler Fig 패턴

기존 모놀리스를 점진적으로 마이크로서비스로 교체합니다.

Phase 1: 프록시 레이어 추가
┌────────────────┐
│ API Gateway /  │
│ Load Balancer  │
└───────┬────────┘
        │
        ▼
┌────────────────┐
│   Monolith     │  ← 모든 트래픽
│  (온프레미스)   │
└────────────────┘

Phase 2: 일부 기능 마이그레이션
┌────────────────┐
│ API Gateway    │
└───┬────────┬───┘
    │        │
    ▼        ▼
┌──────┐ ┌──────────┐
│ New  │ │ Monolith │  ← 나머지 트래픽
│ Auth │ │          │
│(Cloud)│ │(온프레미스)│
└──────┘ └──────────┘

Phase 3: 대부분 마이그레이션 완료
┌────────────────┐
│ API Gateway    │
└┬──┬──┬──┬──┬───┘
 │  │  │  │  │
 ▼  ▼  ▼  ▼  ▼
┌─┐┌─┐┌─┐┌─┐┌────────┐
│A││B││C││D││Monolith│ ← 최소 트래픽
└─┘└─┘└─┘└─┘└────────┘
 (Cloud services)

8.2 Blue-Green 전환

# Kubernetes 기반 Blue-Green 전환 설정
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app-migration
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: web-app-active
      previewService: web-app-preview
      autoPromotionEnabled: false
      prePromotionAnalysis:
        templates:
          - templateName: migration-validation
        args:
          - name: service-name
            value: web-app-preview
      postPromotionAnalysis:
        templates:
          - templateName: post-migration-check
      scaleDownDelaySeconds: 3600
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: cloud-native
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v3.0-cloud
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"

9. 멀티클라우드 네트워킹

9.1 클라우드 간 연결 옵션

┌─────────────────────────────────────────────────────┐
│                 연결 옵션 비교                        │
├────────────────┬──────────┬──────────┬──────────────┤
│                │  대역폭   │ 지연시간  │  비용         │
├────────────────┼──────────┼──────────┼──────────────┤
│ Public Internet│ 가변적    │ 높음     │ 이그레스 비용  │
│ VPN (IPsec)    │ 1-3 Gbps │ 중간     │ 낮음          │
│ Dedicated Link │ 10-100Gb │ 낮음     │ 높음 (월정액)  │
│ Megaport/Equinix│ 유연     │ 낮음     │ 중간          │
└────────────────┴──────────┴──────────┴──────────────┘

9.2 Transit 아키텍처

# Terraform - AWS Transit Gateway와 VPN 설정
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Multi-cloud transit gateway"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  auto_accept_shared_attachments  = "enable"

  tags = {
    Name = "multi-cloud-tgw"
  }
}

resource "aws_vpn_connection" "to_gcp" {
  customer_gateway_id = aws_customer_gateway.gcp.id
  transit_gateway_id  = aws_ec2_transit_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false

  tunnel1_inside_cidr   = "169.254.10.0/30"
  tunnel2_inside_cidr   = "169.254.10.4/30"

  tags = {
    Name = "aws-to-gcp-vpn"
  }
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = var.gcp_vpn_gateway_ip
  type       = "ipsec.1"

  tags = {
    Name = "gcp-customer-gateway"
  }
}

9.3 서비스 메시 (멀티클라우드)

# Istio 멀티클러스터 설정 (Primary-Remote)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-primary
spec:
  values:
    global:
      meshID: multi-cloud-mesh
      multiCluster:
        clusterName: aws-cluster
      network: aws-network
    pilot:
      env:
        EXTERNAL_ISTIOD: "true"
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
  components:
    ingressGateways:
      - name: istio-eastwestgateway
        label:
          istio: eastwestgateway
          topology.istio.io/network: aws-network
        enabled: true
        k8s:
          env:
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: aws-network

10. 아이덴티티 페더레이션

10.1 멀티클라우드 IAM 전략

┌─────────────────────────────────────┐
│      중앙 Identity Provider         │
│    (Okta / Azure AD / Google)       │
└──────────┬──────────────────────────┘
           │ SAML / OIDC
     ┌─────┼──────┬───────┐
     ▼     ▼      ▼       ▼
  ┌─────┐┌─────┐┌──────┐┌─────────┐
  │ AWS ││ GCP ││Azure ││On-Prem  │
  │ IAM ││ IAM ││ AD   ││  LDAP   │
  └─────┘└─────┘└──────┘└─────────┘

10.2 OIDC 기반 클라우드 간 인증

# AWS에서 GCP 리소스 접근 (Workload Identity Federation)
import google.auth
from google.auth import impersonated_credentials
import boto3

class CrossCloudAuth:
    def get_gcp_credentials_from_aws(self):
        # AWS STS에서 현재 자격증명 확인
        sts = boto3.client("sts")
        aws_identity = sts.get_caller_identity()

        # GCP Workload Identity Federation 사용
        # AWS 자격증명으로 GCP 토큰 교환
        credentials, project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )

        # 서비스 계정 임퍼소네이션
        target_credentials = impersonated_credentials.Credentials(
            source_credentials=credentials,
            target_principal="cross-cloud@project.iam.gserviceaccount.com",
            target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
            lifetime=3600
        )

        return target_credentials

    def setup_workload_identity_pool(self):
        """GCP Workload Identity Pool 설정 (gcloud CLI)"""
        commands = [
            # 풀 생성
            "gcloud iam workload-identity-pools create aws-pool "
            "--location=global "
            "--description='AWS Workload Identity Pool'",

            # AWS 프로바이더 추가
            "gcloud iam workload-identity-pools providers create-aws aws-provider "
            "--location=global "
            "--workload-identity-pool=aws-pool "
            "--account-id=123456789012",

            # 서비스 계정 바인딩
            "gcloud iam service-accounts add-iam-policy-binding "
            "cross-cloud@project.iam.gserviceaccount.com "
            "--role=roles/iam.workloadIdentityUser "
            "--member='principalSet://iam.googleapis.com/"
            "projects/PROJECT_NUM/locations/global/"
            "workloadIdentityPools/aws-pool/attribute.aws_role/"
            "arn:aws:sts::123456789012:assumed-role/my-role'"
        ]
        return commands

11. 클라우드 네이티브 이식성

11.1 Kubernetes 기반 추상화

# 멀티클라우드 Kubernetes 배포 (Helm values)
# values-aws.yaml
cloud:
  provider: aws
  region: us-east-1
  storageClass: gp3
  ingressClass: alb
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

# values-gcp.yaml
cloud:
  provider: gcp
  region: us-central1
  storageClass: pd-ssd
  ingressClass: gce
  serviceAnnotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "backend-config"}'

# values-azure.yaml
cloud:
  provider: azure
  region: eastus
  storageClass: managed-premium
  ingressClass: azure-application-gateway
  serviceAnnotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "false"

11.2 Terraform 멀티클라우드 모듈

# modules/compute/main.tf - 클라우드 추상화 레이어
variable "cloud_provider" {
  type = string
  validation {
    condition     = contains(["aws", "gcp", "azure"], var.cloud_provider)
    error_message = "Supported providers: aws, gcp, azure"
  }
}

variable "instance_config" {
  type = object({
    name          = string
    cpu           = number
    memory_gb     = number
    disk_gb       = number
    os            = string
  })
}

# AWS 구현
module "aws_compute" {
  source = "./aws"
  count  = var.cloud_provider == "aws" ? 1 : 0

  instance_type = local.aws_instance_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  ami_id      = local.aws_ami_map[var.instance_config.os]
  volume_size = var.instance_config.disk_gb
  name        = var.instance_config.name
}

# GCP 구현
module "gcp_compute" {
  source = "./gcp"
  count  = var.cloud_provider == "gcp" ? 1 : 0

  machine_type = local.gcp_machine_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image      = local.gcp_image_map[var.instance_config.os]
  disk_size  = var.instance_config.disk_gb
  name       = var.instance_config.name
}

# Azure 구현
module "azure_compute" {
  source = "./azure"
  count  = var.cloud_provider == "azure" ? 1 : 0

  vm_size     = local.azure_vm_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image_ref   = local.azure_image_map[var.instance_config.os]
  disk_size   = var.instance_config.disk_gb
  name        = var.instance_config.name
}

output "instance_id" {
  value = coalesce(
    try(module.aws_compute[0].instance_id, ""),
    try(module.gcp_compute[0].instance_id, ""),
    try(module.azure_compute[0].instance_id, "")
  )
}

11.3 OCI 컨테이너 이미지 전략

# 멀티 스테이지 빌드 - 클라우드 독립적 이미지
FROM golang:1.22-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server

# 최종 이미지 - distroless (클라우드 무관)
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/server /server
COPY --from=builder /app/config /config

EXPOSE 8080
USER nonroot:nonroot
ENTRYPOINT ["/server"]

12. 멀티클라우드 재해 복구(DR)

12.1 DR 전략 비교

DR 전략	RTO	RPO	비용	복잡성
Backup and Restore	시간 단위	시간 단위	낮음	낮음
Pilot Light	10-30분	분 단위	중간	중간
Warm Standby	분 단위	초 단위	높음	높음
Active-Active	초 단위	거의 0	매우 높음	매우 높음

12.2 멀티클라우드 DR 구현

# 멀티클라우드 DR 오케스트레이터
class MultiCloudDR:
    def __init__(self):
        self.primary = AWSProvider(region="us-east-1")
        self.secondary = AzureProvider(region="eastus")
        self.health_checker = HealthChecker()

    def execute_failover(self):
        """Primary(AWS) 장애 시 Secondary(Azure)로 전환"""
        steps = [
            self._verify_secondary_health,
            self._promote_database_replica,
            self._update_dns_records,
            self._scale_up_secondary,
            self._verify_application_health,
            self._notify_stakeholders
        ]

        for step in steps:
            result = step()
            if not result.success:
                self._rollback_failover(result.step_index)
                raise FailoverError(f"Failed at step: {step.__name__}")

    def _promote_database_replica(self):
        """Azure SQL 읽기 복제본을 프라이머리로 승격"""
        self.secondary.promote_replica(
            server="dr-sql-server",
            database="production-db",
            failover_group="prod-failover-group"
        )

    def _update_dns_records(self):
        """Route 53 / Azure DNS 레코드 업데이트"""
        self.primary.update_dns(
            zone="example.com",
            record="api.example.com",
            target=self.secondary.get_endpoint(),
            ttl=60
        )

    def continuous_replication(self):
        """지속적 데이터 동기화"""
        replication_config = {
            "database": {
                "type": "async",
                "lag_threshold_seconds": 30,
                "source": "aws-rds-primary",
                "target": "azure-sql-replica"
            },
            "storage": {
                "type": "incremental",
                "interval_minutes": 15,
                "source": "s3://prod-bucket",
                "target": "azure://prod-container"
            },
            "secrets": {
                "type": "sync",
                "source": "aws-secrets-manager",
                "target": "azure-key-vault"
            }
        }
        return replication_config

13. 멀티클라우드 비용 관리

13.1 통합 비용 모니터링

# 멀티클라우드 비용 대시보드 수집기
class MultiCloudCostCollector:
    def collect_all_costs(self, period="monthly"):
        aws_costs = self._get_aws_costs(period)
        gcp_costs = self._get_gcp_costs(period)
        azure_costs = self._get_azure_costs(period)

        return {
            "total": aws_costs["total"] + gcp_costs["total"] + azure_costs["total"],
            "by_provider": {
                "aws": aws_costs,
                "gcp": gcp_costs,
                "azure": azure_costs
            },
            "by_service": self._aggregate_by_service(
                aws_costs, gcp_costs, azure_costs
            ),
            "by_team": self._aggregate_by_tag(
                "team", aws_costs, gcp_costs, azure_costs
            ),
            "anomalies": self._detect_anomalies(
                aws_costs, gcp_costs, azure_costs
            ),
            "recommendations": self._generate_optimization_recommendations(
                aws_costs, gcp_costs, azure_costs
            )
        }

    def _detect_anomalies(self, *provider_costs):
        """비용 이상 감지"""
        anomalies = []
        for costs in provider_costs:
            for service, cost in costs["by_service"].items():
                avg = cost.get("rolling_avg_30d", 0)
                current = cost.get("current", 0)
                if avg > 0 and current > avg * 1.5:
                    anomalies.append({
                        "service": service,
                        "provider": costs["provider"],
                        "current": current,
                        "average": avg,
                        "increase_pct": (current - avg) / avg * 100
                    })
        return anomalies

13.2 비용 최적화 전략

전략	절감률	적용 대상
Reserved/Committed Use	30-60%	안정적 워크로드
스팟/프리엠티블 인스턴스	60-90%	배치, 테스트
자동 스케일링	20-40%	변동 트래픽
적정 크기(Right-sizing)	15-35%	모든 워크로드
스토리지 티어링	40-70%	아카이브 데이터
네트워크 최적화	10-30%	크로스 클라우드 통신

14. 거버넌스와 컴플라이언스

14.1 멀티클라우드 거버넌스 프레임워크

# Open Policy Agent (OPA) - 멀티클라우드 정책
package multicloud.governance

# 모든 리소스에 필수 태그 요구
required_tags := ["environment", "team", "cost-center", "data-classification"]

deny[msg] {
  resource := input.resource
  tag := required_tags[_]
  not resource.tags[tag]
  msg := sprintf(
    "Resource %v is missing required tag: %v",
    [resource.name, tag]
  )
}

# 데이터 주권 - 특정 데이터는 특정 리전에만 저장
deny[msg] {
  resource := input.resource
  resource.tags["data-classification"] == "pii-kr"
  not startswith(resource.region, "ap-northeast-2")  # 서울
  not startswith(resource.region, "korea")
  msg := sprintf(
    "Korean PII data must be stored in Korea region. Resource %v in %v",
    [resource.name, resource.region]
  )
}

# 비용 한도 초과 방지
deny[msg] {
  resource := input.resource
  resource.type == "compute_instance"
  resource.monthly_cost > 5000
  not resource.tags["approved-high-cost"] == "true"
  msg := sprintf(
    "Instance %v exceeds $5000/month limit. Get approval first.",
    [resource.name]
  )
}

# 암호화 필수
deny[msg] {
  resource := input.resource
  resource.type == "storage_bucket"
  not resource.encryption.enabled
  msg := sprintf(
    "Storage %v must have encryption enabled",
    [resource.name]
  )
}

15. 실전 퀴즈

Q1. 멀티클라우드 아키텍처에서 Active-Active 패턴과 Active-Passive 패턴의 핵심 차이점은 무엇이며, 각각 어떤 상황에 적합한가요?

Active-Active: 두 클라우드가 동시에 트래픽을 처리합니다. 최대 가용성을 제공하지만, 데이터 동기화 복잡성과 비용이 높습니다. 미션 크리티컬 서비스, 글로벌 서비스에 적합합니다.

Active-Passive: 주 클라우드만 트래픽을 처리하고 보조 클라우드는 DR 용도로 대기합니다. 비용이 합리적이지만 페일오버 시 약간의 다운타임이 발생합니다. 높은 가용성은 필요하지만 비용에 민감한 경우 적합합니다.

핵심 차이는 동시 처리 여부와 **페일오버 시간(RTO)**입니다. Active-Active는 RTO가 거의 0이고, Active-Passive는 수분의 RTO를 가집니다.

Q2. 6R 마이그레이션 전략 중 Replatform과 Refactor의 차이를 설명하고, 각각의 적합한 사용 사례를 제시하세요.

Replatform: 핵심 아키텍처는 유지하면서 일부를 클라우드 관리형 서비스로 교체합니다. 예를 들어 자체 MySQL을 Amazon RDS로 이전하거나, 자체 Redis를 ElastiCache로 교체합니다. 중간 수준의 노력으로 운영 이점을 빠르게 얻을 수 있습니다.

Refactor: 클라우드 네이티브로 완전히 재설계합니다. 모놀리스를 마이크로서비스로 분해하거나, 서버리스 아키텍처로 전환합니다. 가장 높은 노력이 필요하지만 장기적으로 최대의 클라우드 이점을 얻습니다.

Replatform은 빠른 성과가 필요할 때, Refactor는 장기적 혁신이 목표일 때 적합합니다.

Q3. Workload Identity Federation이란 무엇이며, 왜 서비스 계정 키보다 안전한가요?

Workload Identity Federation은 외부 IdP(AWS IAM, Azure AD 등)의 자격증명을 GCP의 임시 토큰으로 교환하는 메커니즘입니다.

서비스 계정 키보다 안전한 이유:

키 관리 불필요: JSON 키 파일을 생성/배포/로테이션할 필요가 없음
임시 토큰: 교환된 토큰은 1시간 후 자동 만료
최소 권한: 특정 속성(역할, 태그 등) 기반으로 세밀한 접근 제어
감사 용이: 모든 토큰 교환이 Cloud Audit Logs에 기록
유출 리스크 감소: 장기 자격증명이 존재하지 않으므로 유출될 것이 없음

Q4. Strangler Fig 패턴으로 모놀리스를 마이크로서비스로 마이그레이션할 때의 핵심 단계와 주의사항을 설명하세요.

핵심 단계:

프록시/API Gateway 추가: 모든 트래픽이 게이트웨이를 거치도록 설정
기능 식별: 분리할 수 있는 바운디드 컨텍스트 식별
점진적 추출: 하나씩 마이크로서비스로 추출, 게이트웨이에서 라우팅 변경
데이터 분리: 공유 DB에서 서비스별 DB로 분리
모놀리스 축소: 모든 기능 추출 후 모놀리스 폐기

주의사항:

데이터베이스 분리가 가장 어려운 부분 - 트랜잭션 일관성 주의
한 번에 너무 많은 서비스를 추출하지 말 것
서비스 간 통신 패턴(동기/비동기) 신중히 결정
모니터링/관찰성을 먼저 구축한 후 마이그레이션 시작
롤백 계획 필수

Q5. 멀티클라우드 환경에서 데이터 이그레스(egress) 비용을 최적화하는 전략을 3가지 이상 제시하세요.

데이터 지역성(Data Locality) 설계: 처리 엔진을 데이터가 있는 클라우드에 배치. 데이터를 옮기지 말고 컴퓨팅을 옮김
CDN 활용: 클라우드 내부에서 외부로 나가는 트래픽을 CDN으로 캐싱하여 오리진 이그레스 감소
압축과 프로토콜 최적화: gRPC, Protobuf 등 효율적인 직렬화로 전송 데이터 크기 감소
전용 연결(Dedicated Interconnect): Direct Connect, ExpressRoute 등을 사용하면 이그레스 비용이 인터넷 대비 낮음
비동기 배치 전송: 실시간 전송 대신 데이터를 모아서 비피크 시간에 배치 전송
Private Peering: Megaport, Equinix Fabric 등 중립적 교환 포인트를 통해 클라우드 간 직접 피어링

참고 자료

Multi-Cloud Strategy & Migration Complete Guide 2025: AWS/GCP/Azure Comparison, Hybrid Cloud

1. Why Multi-Cloud?

As the cloud market matures, concentrating all workloads on a single cloud provider becomes increasingly risky. As of 2025, approximately 89% of Fortune 500 companies have adopted multi-cloud strategies, using an average of 2.6 public clouds.

1.1 Vendor Lock-in Risk

Key risks of depending on a single cloud:

Loss of pricing leverage: No alternative means being subject to provider price increases
Service disruption risk: Major cloud providers averaged 4.2 large-scale outages each in 2024
Technology dependency: Migration costs grow exponentially when using proprietary services (AWS Lambda, Azure Functions, etc.)
Regulatory changes: Inability to respond flexibly to data sovereignty laws and regulatory changes

1.2 Four Reasons to Choose Multi-Cloud

Best-of-Breed Strategy: Leverage each cloud's strengths.

AWS: Broadest service portfolio, enterprise ecosystem
GCP: Data analytics (BigQuery), AI/ML (Vertex AI), Kubernetes (GKE)
Azure: Enterprise integration (Active Directory, Office 365), hybrid (Azure Arc)

Compliance: Meet industry/regional data requirements

Finance: Mandatory storage of certain data in domestic regions
Healthcare: Select HIPAA-compliant services
Public sector: Government-specific cloud regions

Disaster Recovery (DR): Protection against provider-level failures

Single-cloud DR: Cross-region replication (within same provider)
Multi-cloud DR: Cross-provider replication (failover from AWS to GCP if AWS fails)

Cost Optimization: Choose optimal pricing per workload

Leverage spot/preemptible instance price differences
Strategically distribute committed use discounts (Reserved/Committed Use)
Compare data egress costs for optimal placement

1.3 Realistic Challenges of Multi-Cloud

Multi-cloud is not a silver bullet. Challenges you must consider:

Challenge	Description	Mitigation Strategy
Increased complexity	2-3x operational overhead	IaC, unified management platforms
Talent shortage	Experts needed for each cloud	Abstraction layers, training investment
Network costs	Cross-cloud data transfer costs	Data locality design
Security integration	Different IAM/security models	Zero Trust, unified IdP
Consistency	Behavioral differences across services	Standardized abstraction layers

2. AWS vs GCP vs Azure: Service Mapping

2.1 Compute Services Comparison

Category	AWS	GCP	Azure
Virtual Machines	EC2	Compute Engine	Virtual Machines
Container Orchestration	EKS	GKE	AKS
Serverless Containers	Fargate	Cloud Run	Container Apps
Serverless Functions	Lambda	Cloud Functions	Azure Functions
Batch Processing	AWS Batch	Cloud Batch	Azure Batch
App Platform	Elastic Beanstalk	App Engine	App Service
VMware Integration	VMware Cloud on AWS	Google Cloud VMware Engine	Azure VMware Solution

2.2 Storage Services Comparison

Category	AWS	GCP	Azure
Object Storage	S3	Cloud Storage	Blob Storage
Block Storage	EBS	Persistent Disk	Managed Disks
File Storage	EFS	Filestore	Azure Files
Archive	S3 Glacier	Archive Storage	Archive Storage
Hybrid Storage	Storage Gateway	Transfer Appliance	StorSimple

2.3 Database Services Comparison

Category	AWS	GCP	Azure
Relational DB	RDS, Aurora	Cloud SQL, AlloyDB	Azure SQL, MySQL/PostgreSQL
NoSQL Document	DynamoDB	Firestore	Cosmos DB
In-Memory	ElastiCache	Memorystore	Azure Cache for Redis
Graph DB	Neptune	-	Cosmos DB (Gremlin)
Time-Series DB	Timestream	-	Azure Data Explorer
Global Distributed DB	Aurora Global, DynamoDB Global Tables	Spanner	Cosmos DB

2.4 Networking Services Comparison

Category	AWS	GCP	Azure
Virtual Network	VPC	VPC	VNet
Load Balancer	ALB/NLB/GLB	Cloud Load Balancing	Azure Load Balancer/App Gateway
CDN	CloudFront	Cloud CDN	Azure CDN/Front Door
DNS	Route 53	Cloud DNS	Azure DNS
Dedicated Connection	Direct Connect	Cloud Interconnect	ExpressRoute
VPN	Site-to-Site VPN	Cloud VPN	VPN Gateway
Service Mesh	App Mesh	Traffic Director	-

2.5 AI/ML Services Comparison

Category	AWS	GCP	Azure
ML Platform	SageMaker	Vertex AI	Azure ML
LLM Service	Bedrock	Gemini API, Model Garden	Azure OpenAI Service
NLP	Comprehend	Natural Language AI	Cognitive Services
Image Analysis	Rekognition	Vision AI	Computer Vision
Speech	Transcribe/Polly	Speech-to-Text/Text-to-Speech	Speech Services
Recommendation	Personalize	Recommendations AI	Personalizer

2.6 Data Analytics Comparison

Category	AWS	GCP	Azure
Data Warehouse	Redshift	BigQuery	Synapse Analytics
Stream Processing	Kinesis	Dataflow	Stream Analytics
ETL/ELT	Glue	Dataflow, Dataproc	Data Factory
Data Catalog	Glue Data Catalog	Data Catalog	Purview
BI Tool	QuickSight	Looker	Power BI

3. Multi-Cloud Architecture Patterns

3.1 Active-Active Pattern

Process traffic simultaneously across two or more clouds.

                    ┌─────────────────────┐
                    │   Global DNS / LB   │
                    │  (Route 53 / CF)    │
                    └──────────┬──────────┘
                         ┌─────┴─────┐
                         │           │
                    ┌────▼────┐ ┌────▼────┐
                    │  AWS    │ │  GCP    │
                    │ Region  │ │ Region  │
                    │         │ │         │
                    │ ┌─────┐ │ │ ┌─────┐ │
                    │ │ K8s │ │ │ │ GKE │ │
                    │ │ EKS │ │ │ │     │ │
                    │ └──┬──┘ │ │ └──┬──┘ │
                    │    │    │ │    │    │
                    │ ┌──▼──┐ │ │ ┌──▼──┐ │
                    │ │ DB  │◄┼─┼─► DB  │ │
                    │ │ RDS │ │ │ │ SQL │ │
                    │ └─────┘ │ │ └─────┘ │
                    └─────────┘ └─────────┘

Pros: Maximum availability, zero-downtime failover Cons: Data synchronization complexity, high cost Best for: Mission-critical services, global services

3.2 Active-Passive Pattern

Primary cloud processes traffic while secondary cloud stands by for DR.

                    ┌────────────────────┐
                    │    Global DNS      │
                    └────────┬───────────┘
                             │
                    ┌────────▼────────┐          ┌────────────────┐
                    │   AWS (Active)  │  Repl ──►│  Azure (Passive)│
                    │                 │          │                │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Workloads │  │          │  │ Standby   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Database  │──┼──────────┼─►│ Replica   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    └─────────────────┘          └────────────────┘

Pros: Reasonable cost, cloud-level DR Cons: Some downtime during failover, passive resource cost Best for: High availability requirements with cost sensitivity

3.3 Cloud Bursting Pattern

Normally processes on-premises/primary cloud, bursts to secondary cloud during peak.

# Kubernetes Federation - Cloud Bursting Example
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: web-app
  namespace: production
spec:
  template:
    spec:
      replicas: 10
      containers:
        - name: web
          image: registry.example.com/web-app:v2.1
  placement:
    clusters:
      - name: on-prem-cluster
        weight: 70
      - name: aws-eks-cluster
        weight: 20
      - name: gcp-gke-cluster
        weight: 10
  overrides:
    - clusterName: aws-eks-cluster
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5

3.4 Arbitrage Pattern

Place workloads on the most cost-effective cloud based on characteristics.

# Cloud cost comparison and auto-placement example
class CloudArbitrage:
    def __init__(self):
        self.providers = {
            "aws": AWSProvider(),
            "gcp": GCPProvider(),
            "azure": AzureProvider()
        }

    def find_optimal_placement(self, workload):
        costs = {}
        for name, provider in self.providers.items():
            cost = provider.estimate_cost(
                cpu=workload.cpu_cores,
                memory_gb=workload.memory_gb,
                storage_gb=workload.storage_gb,
                gpu=workload.gpu_type,
                duration_hours=workload.expected_duration,
                region=workload.preferred_region
            )
            costs[name] = cost

        # Calculate performance-to-cost ratio
        scores = {}
        for name, cost in costs.items():
            perf = self.providers[name].benchmark_score(workload.type)
            scores[name] = perf / cost

        best = max(scores, key=scores.get)
        return best, costs[best], scores[best]

    def auto_schedule(self, workloads):
        placements = []
        for wl in workloads:
            provider, cost, score = self.find_optimal_placement(wl)
            placements.append({
                "workload": wl.name,
                "provider": provider,
                "estimated_cost": cost,
                "efficiency_score": score
            })
        return placements

4. Hybrid Cloud Solutions

4.1 Google Anthos

Anthos provides the same Kubernetes environment across on-premises, AWS, and Azure, based on GKE.

# Anthos Config Management - Multi-cluster Setup
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  clusterName: production-cluster
  git:
    syncRepo: https://github.com/org/anthos-config
    syncBranch: main
    secretType: ssh
    policyDir: "policies"
  policyController:
    enabled: true
    templateLibraryInstalled: true
    referentialRulesEnabled: true
  hierarchyController:
    enabled: true
    enablePodTreeLabels: true
    enableHierarchicalResourceQuotas: true

Anthos Core Components:

Anthos on GKE: Managed Kubernetes on GCP
Anthos on VMware: GKE running on on-premises vSphere
Anthos on AWS: Anthos managed clusters on AWS
Anthos on Azure: Anthos managed clusters on Azure
Anthos Config Management: GitOps-based multi-cluster policy management
Anthos Service Mesh: Unified Istio-based service mesh management

4.2 Azure Arc

Azure Arc extends the Azure management plane to on-premises, AWS, GCP, and beyond.

# Register Kubernetes cluster with Azure Arc
az connectedk8s connect \
  --name production-eks \
  --resource-group multi-cloud-rg \
  --location eastus \
  --tags "environment=production" "cloud=aws"

# Deploy Arc-enabled Data Services
az arcdata dc create \
  --name arc-dc \
  --k8s-namespace arc \
  --connectivity-mode indirect \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class managed-premium \
  --profile-name azure-arc-kubeadm

# Arc-enabled SQL Managed Instance
az sql mi-arc create \
  --name sql-prod \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class-data managed-premium \
  --storage-class-logs managed-premium \
  --cores-limit 8 \
  --memory-limit 32Gi \
  --k8s-namespace arc

4.3 AWS EKS Anywhere

EKS Anywhere runs the same Kubernetes as AWS EKS on-premises.

# EKS Anywhere Cluster Configuration
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - "192.168.0.0/16"
    services:
      cidrBlocks:
        - "10.96.0.0/12"
  controlPlaneConfiguration:
    count: 3
    endpoint:
      host: "10.0.0.100"
    machineGroupRef:
      kind: VSphereMachineConfig
      name: cp-machines
  workerNodeGroupConfigurations:
    - count: 5
      machineGroupRef:
        kind: VSphereMachineConfig
        name: worker-machines
      name: md-0
  kubernetesVersion: "1.29"
  managementCluster:
    name: prod-cluster

4.4 Red Hat OpenShift (Hybrid)

OpenShift provides a consistent Kubernetes platform across all major clouds and on-premises.

# OpenShift Multi-Cluster Hub Configuration
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  availabilityConfig: High
  enableClusterBackup: true
  overrides:
    components:
      - name: cluster-lifecycle
        enabled: true
      - name: cluster-backup
        enabled: true
      - name: multicluster-engine
        enabled: true
      - name: grc
        enabled: true
      - name: app-lifecycle
        enabled: true

5. Cloud Migration 6R Strategy

5.1 6R Overview and Decision Tree

When analyzing workloads for migration, choose one of six strategies (6R).

Start Workload Analysis
       |
       v
  Has business value? --- No ---> Retire
       | Yes
       v
  Needs changes? --- No ---> Retain
       | Yes
       v
  SaaS replacement available? --- Yes ---> Repurchase
       | No
       v
  Architecture change needed? --- No ---> Rehost
       | Yes                               or Replatform
       v
  Full redesign needed? --- No ---> Replatform
       | Yes
       v
  Refactor

5.2 Detailed Strategy Descriptions

Rehost (Lift and Shift):

Move existing applications as-is to cloud VMs
Fastest and lowest risk
Limited cloud-native benefits
Use AWS Application Migration Service, Azure Migrate, Google Migrate for Compute Engine

Replatform (Lift and Reshape):

Maintain core architecture while leveraging cloud services
Example: Migrate self-managed MySQL to RDS/Cloud SQL
Moderate effort, immediate operational benefits

Refactor (Re-architect):

Complete redesign for cloud-native
Microservices, serverless, containerization
Highest effort, highest cloud benefits
Long-term cost savings and scalability

Repurchase (Replace):

Replace with SaaS products
Example: Self-managed email server to Office 365/Google Workspace
Complete elimination of operational burden

Retire:

Remove workloads no longer needed
Typically 10-20% of total workloads are retirement candidates

Retain:

Workloads not yet ready for migration
Legacy dependencies, regulatory requirements, etc.

5.3 Migration Assessment Scorecard

# Migration strategy decision automation example
class MigrationAssessor:
    def assess_workload(self, workload):
        score = {
            "business_value": self._rate_business_value(workload),
            "technical_complexity": self._rate_complexity(workload),
            "cloud_readiness": self._rate_cloud_readiness(workload),
            "data_sensitivity": self._rate_data_sensitivity(workload),
            "dependency_count": len(workload.dependencies),
            "team_skill_level": self._rate_team_skills(workload.team)
        }

        # Strategy recommendation logic
        if score["business_value"] < 3:
            return "Retire"
        if score["cloud_readiness"] < 2:
            return "Retain"
        if workload.has_saas_alternative and score["technical_complexity"] > 7:
            return "Repurchase"
        if score["technical_complexity"] < 4 and score["cloud_readiness"] > 6:
            return "Rehost"
        if score["technical_complexity"] < 7:
            return "Replatform"
        return "Refactor"

    def generate_report(self, workloads):
        results = {}
        for wl in workloads:
            strategy = self.assess_workload(wl)
            if strategy not in results:
                results[strategy] = []
            results[strategy].append({
                "name": wl.name,
                "estimated_effort_weeks": self._estimate_effort(wl, strategy),
                "estimated_cost": self._estimate_cost(wl, strategy),
                "risk_level": self._assess_risk(wl, strategy)
            })
        return results

6. Migration Planning

6.1 Discovery and Dependency Mapping

Accurately understanding your current environment before migration is key.

# Using AWS Application Discovery Service
aws discovery start-continuous-export

# Agent-based collection
aws discovery start-data-collection-by-agent-ids \
  --agent-ids agent-001 agent-002 agent-003

# Query server dependency map
aws discovery describe-agents \
  --filters name=hostName,values=web-server-*,condition=CONTAINS

# Generate migration plan from collected data
aws migrationhub-strategy create-assessment \
  --s3bucket migration-data \
  --s3key discovery-export.csv

6.2 TCO (Total Cost of Ownership) Analysis

# TCO comparison analysis framework
class TCOAnalysis:
    def calculate_on_prem_tco(self, infra):
        annual_costs = {
            "hardware": infra.server_count * 8000 / 3,  # 3-year depreciation
            "software_licenses": infra.license_costs,
            "datacenter": infra.rack_units * 1200,  # Power, cooling, space
            "network": infra.bandwidth_gbps * 500,
            "personnel": infra.fte_count * 120000,
            "maintenance": infra.server_count * 2400,
            "disaster_recovery": infra.dr_cost_annual,
            "security": infra.security_cost_annual,
            "compliance": infra.compliance_cost_annual
        }
        return sum(annual_costs.values()), annual_costs

    def calculate_cloud_tco(self, workloads, provider="aws"):
        annual_costs = {
            "compute": self._estimate_compute(workloads, provider),
            "storage": self._estimate_storage(workloads, provider),
            "network": self._estimate_network(workloads, provider),
            "managed_services": self._estimate_managed(workloads, provider),
            "personnel": workloads.cloud_fte * 130000,
            "migration_amortized": workloads.migration_cost / 3,
            "training": workloads.team_size * 5000,
            "tools": workloads.tool_licenses
        }
        return sum(annual_costs.values()), annual_costs

    def compare(self, infra, workloads):
        on_prem_total, on_prem_detail = self.calculate_on_prem_tco(infra)
        cloud_totals = {}
        for provider in ["aws", "gcp", "azure"]:
            total, detail = self.calculate_cloud_tco(workloads, provider)
            cloud_totals[provider] = {
                "total": total,
                "detail": detail,
                "savings_pct": (on_prem_total - total) / on_prem_total * 100
            }
        return {
            "on_prem": {"total": on_prem_total, "detail": on_prem_detail},
            "cloud": cloud_totals
        }

6.3 Migration Wave Planning

Large-scale migrations are executed in waves (phases).

Wave	Target	Strategy	Duration	Risk
Wave 0	Pilot (2-3 non-critical apps)	Rehost	4 weeks	Low
Wave 1	Web frontends, static sites	Rehost/Replatform	6 weeks	Low
Wave 2	API servers, microservices	Replatform/Refactor	8 weeks	Medium
Wave 3	Databases, storage	Replatform	6 weeks	High
Wave 4	Legacy monoliths	Refactor	12 weeks	High
Wave 5	Final cutover, cleanup	-	4 weeks	Medium

7. Data Migration Strategies

7.1 Online vs Offline Migration

Online Migration (Network transfer):

Suitable for data under 100TB
Dedicated connections (Direct Connect/ExpressRoute) recommended
Incremental sync possible

Offline Migration (Physical transfer):

AWS Snowball / Snowball Edge: Up to 80TB/device
AWS Snowmobile: Petabyte scale
Azure Data Box: Up to 100TB
Google Transfer Appliance: Up to 300TB

7.2 Database Migration

# AWS DMS (Database Migration Service) Task Configuration
# Source: On-premises Oracle, Target: Amazon Aurora PostgreSQL
Resources:
  DMSReplicationTask:
    Type: AWS::DMS::ReplicationTask
    Properties:
      MigrationType: full-load-and-cdc
      SourceEndpointArn: !Ref OracleSourceEndpoint
      TargetEndpointArn: !Ref AuroraTargetEndpoint
      ReplicationInstanceArn: !Ref DMSReplicationInstance
      TableMappings: |
        {
          "rules": [
            {
              "rule-type": "selection",
              "rule-id": "1",
              "rule-name": "select-all-tables",
              "object-locator": {
                "schema-name": "PROD_SCHEMA",
                "table-name": "%"
              },
              "rule-action": "include"
            },
            {
              "rule-type": "transformation",
              "rule-id": "2",
              "rule-name": "lowercase-schema",
              "rule-action": "convert-lowercase",
              "rule-target": "schema",
              "object-locator": {
                "schema-name": "PROD_SCHEMA"
              }
            }
          ]
        }
      ReplicationTaskSettings: |
        {
          "TargetMetadata": {
            "SupportLobs": true,
            "FullLobMode": false,
            "LobChunkSize": 64
          },
          "FullLoadSettings": {
            "TargetTablePrepMode": "DROP_AND_CREATE",
            "MaxFullLoadSubTasks": 8
          },
          "Logging": {
            "EnableLogging": true,
            "LogComponents": [
              { "Id": "SOURCE_UNLOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" },
              { "Id": "TARGET_LOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" }
            ]
          }
        }

7.3 Storage Transfer

# Transfer storage from AWS to GCP
gcloud transfer jobs create \
  --source-agent-pool=transfer-pool \
  --source-s3-bucket=my-aws-bucket \
  --source-s3-region=us-east-1 \
  --destination-gcs-bucket=my-gcp-bucket \
  --schedule-starts=2025-01-15T00:00:00Z \
  --schedule-repeats-every=24h \
  --include-prefixes="data/,backups/" \
  --exclude-prefixes="temp/,logs/"

8. Application Migration Patterns

8.1 Strangler Fig Pattern

Gradually replace an existing monolith with microservices.

Phase 1: Add Proxy Layer
┌────────────────┐
│ API Gateway /  │
│ Load Balancer  │
└───────┬────────┘
        │
        v
┌────────────────┐
│   Monolith     │  <- All traffic
│  (On-Prem)     │
└────────────────┘

Phase 2: Migrate Some Functions
┌────────────────┐
│ API Gateway    │
└───┬────────┬───┘
    │        │
    v        v
┌──────┐ ┌──────────┐
│ New  │ │ Monolith │  <- Remaining traffic
│ Auth │ │          │
│(Cloud)│ │(On-Prem) │
└──────┘ └──────────┘

Phase 3: Migration Mostly Complete
┌────────────────┐
│ API Gateway    │
└┬──┬──┬──┬──┬───┘
 │  │  │  │  │
 v  v  v  v  v
┌─┐┌─┐┌─┐┌─┐┌────────┐
│A││B││C││D││Monolith│ <- Minimal traffic
└─┘└─┘└─┘└─┘└────────┘
 (Cloud services)

8.2 Blue-Green Cutover

# Kubernetes-based Blue-Green Cutover Configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app-migration
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: web-app-active
      previewService: web-app-preview
      autoPromotionEnabled: false
      prePromotionAnalysis:
        templates:
          - templateName: migration-validation
        args:
          - name: service-name
            value: web-app-preview
      postPromotionAnalysis:
        templates:
          - templateName: post-migration-check
      scaleDownDelaySeconds: 3600
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: cloud-native
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v3.0-cloud
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"

9. Multi-Cloud Networking

9.1 Cross-Cloud Connection Options

┌─────────────────────────────────────────────────────┐
│           Connection Options Comparison              │
├────────────────┬──────────┬──────────┬──────────────┤
│                │ Bandwidth│ Latency  │  Cost        │
├────────────────┼──────────┼──────────┼──────────────┤
│ Public Internet│ Variable │ High     │ Egress cost  │
│ VPN (IPsec)    │ 1-3 Gbps│ Medium   │ Low          │
│ Dedicated Link │ 10-100Gb│ Low      │ High (monthly)│
│ Megaport/Equinix│ Flexible│ Low      │ Medium       │
└────────────────┴──────────┴──────────┴──────────────┘

9.2 Transit Architecture

# Terraform - AWS Transit Gateway with VPN Setup
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Multi-cloud transit gateway"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  auto_accept_shared_attachments  = "enable"

  tags = {
    Name = "multi-cloud-tgw"
  }
}

resource "aws_vpn_connection" "to_gcp" {
  customer_gateway_id = aws_customer_gateway.gcp.id
  transit_gateway_id  = aws_ec2_transit_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false

  tunnel1_inside_cidr   = "169.254.10.0/30"
  tunnel2_inside_cidr   = "169.254.10.4/30"

  tags = {
    Name = "aws-to-gcp-vpn"
  }
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = var.gcp_vpn_gateway_ip
  type       = "ipsec.1"

  tags = {
    Name = "gcp-customer-gateway"
  }
}

9.3 Service Mesh (Multi-Cloud)

# Istio Multi-Cluster Setup (Primary-Remote)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-primary
spec:
  values:
    global:
      meshID: multi-cloud-mesh
      multiCluster:
        clusterName: aws-cluster
      network: aws-network
    pilot:
      env:
        EXTERNAL_ISTIOD: "true"
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
  components:
    ingressGateways:
      - name: istio-eastwestgateway
        label:
          istio: eastwestgateway
          topology.istio.io/network: aws-network
        enabled: true
        k8s:
          env:
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: aws-network

10. Identity Federation

10.1 Multi-Cloud IAM Strategy

┌─────────────────────────────────────┐
│     Central Identity Provider       │
│    (Okta / Azure AD / Google)       │
└──────────┬──────────────────────────┘
           │ SAML / OIDC
     ┌─────┼──────┬───────┐
     v     v      v       v
  ┌─────┐┌─────┐┌──────┐┌─────────┐
  │ AWS ││ GCP ││Azure ││On-Prem  │
  │ IAM ││ IAM ││ AD   ││  LDAP   │
  └─────┘└─────┘└──────┘└─────────┘

10.2 OIDC-based Cross-Cloud Authentication

# Access GCP resources from AWS (Workload Identity Federation)
import google.auth
from google.auth import impersonated_credentials
import boto3

class CrossCloudAuth:
    def get_gcp_credentials_from_aws(self):
        # Verify current credentials from AWS STS
        sts = boto3.client("sts")
        aws_identity = sts.get_caller_identity()

        # Use GCP Workload Identity Federation
        # Exchange AWS credentials for GCP token
        credentials, project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )

        # Service account impersonation
        target_credentials = impersonated_credentials.Credentials(
            source_credentials=credentials,
            target_principal="cross-cloud@project.iam.gserviceaccount.com",
            target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
            lifetime=3600
        )

        return target_credentials

    def setup_workload_identity_pool(self):
        """GCP Workload Identity Pool setup (gcloud CLI)"""
        commands = [
            # Create pool
            "gcloud iam workload-identity-pools create aws-pool "
            "--location=global "
            "--description='AWS Workload Identity Pool'",

            # Add AWS provider
            "gcloud iam workload-identity-pools providers create-aws aws-provider "
            "--location=global "
            "--workload-identity-pool=aws-pool "
            "--account-id=123456789012",

            # Bind service account
            "gcloud iam service-accounts add-iam-policy-binding "
            "cross-cloud@project.iam.gserviceaccount.com "
            "--role=roles/iam.workloadIdentityUser "
            "--member='principalSet://iam.googleapis.com/"
            "projects/PROJECT_NUM/locations/global/"
            "workloadIdentityPools/aws-pool/attribute.aws_role/"
            "arn:aws:sts::123456789012:assumed-role/my-role'"
        ]
        return commands

11. Cloud-Native Portability

11.1 Kubernetes-based Abstraction

# Multi-cloud Kubernetes Deployment (Helm values)
# values-aws.yaml
cloud:
  provider: aws
  region: us-east-1
  storageClass: gp3
  ingressClass: alb
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

# values-gcp.yaml
cloud:
  provider: gcp
  region: us-central1
  storageClass: pd-ssd
  ingressClass: gce
  serviceAnnotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "backend-config"}'

# values-azure.yaml
cloud:
  provider: azure
  region: eastus
  storageClass: managed-premium
  ingressClass: azure-application-gateway
  serviceAnnotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "false"

11.2 Terraform Multi-Cloud Modules

# modules/compute/main.tf - Cloud Abstraction Layer
variable "cloud_provider" {
  type = string
  validation {
    condition     = contains(["aws", "gcp", "azure"], var.cloud_provider)
    error_message = "Supported providers: aws, gcp, azure"
  }
}

variable "instance_config" {
  type = object({
    name          = string
    cpu           = number
    memory_gb     = number
    disk_gb       = number
    os            = string
  })
}

# AWS Implementation
module "aws_compute" {
  source = "./aws"
  count  = var.cloud_provider == "aws" ? 1 : 0

  instance_type = local.aws_instance_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  ami_id      = local.aws_ami_map[var.instance_config.os]
  volume_size = var.instance_config.disk_gb
  name        = var.instance_config.name
}

# GCP Implementation
module "gcp_compute" {
  source = "./gcp"
  count  = var.cloud_provider == "gcp" ? 1 : 0

  machine_type = local.gcp_machine_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image      = local.gcp_image_map[var.instance_config.os]
  disk_size  = var.instance_config.disk_gb
  name       = var.instance_config.name
}

# Azure Implementation
module "azure_compute" {
  source = "./azure"
  count  = var.cloud_provider == "azure" ? 1 : 0

  vm_size     = local.azure_vm_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image_ref   = local.azure_image_map[var.instance_config.os]
  disk_size   = var.instance_config.disk_gb
  name        = var.instance_config.name
}

output "instance_id" {
  value = coalesce(
    try(module.aws_compute[0].instance_id, ""),
    try(module.gcp_compute[0].instance_id, ""),
    try(module.azure_compute[0].instance_id, "")
  )
}

11.3 OCI Container Image Strategy

# Multi-stage Build - Cloud-independent Image
FROM golang:1.22-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server

# Final Image - distroless (cloud-agnostic)
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/server /server
COPY --from=builder /app/config /config

EXPOSE 8080
USER nonroot:nonroot
ENTRYPOINT ["/server"]

12. Multi-Cloud Disaster Recovery (DR)

12.1 DR Strategy Comparison

DR Strategy	RTO	RPO	Cost	Complexity
Backup and Restore	Hours	Hours	Low	Low
Pilot Light	10-30 min	Minutes	Medium	Medium
Warm Standby	Minutes	Seconds	High	High
Active-Active	Seconds	Near zero	Very high	Very high

12.2 Multi-Cloud DR Implementation

# Multi-Cloud DR Orchestrator
class MultiCloudDR:
    def __init__(self):
        self.primary = AWSProvider(region="us-east-1")
        self.secondary = AzureProvider(region="eastus")
        self.health_checker = HealthChecker()

    def execute_failover(self):
        """Failover to Secondary (Azure) when Primary (AWS) fails"""
        steps = [
            self._verify_secondary_health,
            self._promote_database_replica,
            self._update_dns_records,
            self._scale_up_secondary,
            self._verify_application_health,
            self._notify_stakeholders
        ]

        for step in steps:
            result = step()
            if not result.success:
                self._rollback_failover(result.step_index)
                raise FailoverError(f"Failed at step: {step.__name__}")

    def _promote_database_replica(self):
        """Promote Azure SQL read replica to primary"""
        self.secondary.promote_replica(
            server="dr-sql-server",
            database="production-db",
            failover_group="prod-failover-group"
        )

    def _update_dns_records(self):
        """Update Route 53 / Azure DNS records"""
        self.primary.update_dns(
            zone="example.com",
            record="api.example.com",
            target=self.secondary.get_endpoint(),
            ttl=60
        )

    def continuous_replication(self):
        """Continuous data synchronization"""
        replication_config = {
            "database": {
                "type": "async",
                "lag_threshold_seconds": 30,
                "source": "aws-rds-primary",
                "target": "azure-sql-replica"
            },
            "storage": {
                "type": "incremental",
                "interval_minutes": 15,
                "source": "s3://prod-bucket",
                "target": "azure://prod-container"
            },
            "secrets": {
                "type": "sync",
                "source": "aws-secrets-manager",
                "target": "azure-key-vault"
            }
        }
        return replication_config

13. Multi-Cloud Cost Management

13.1 Unified Cost Monitoring

# Multi-Cloud Cost Dashboard Collector
class MultiCloudCostCollector:
    def collect_all_costs(self, period="monthly"):
        aws_costs = self._get_aws_costs(period)
        gcp_costs = self._get_gcp_costs(period)
        azure_costs = self._get_azure_costs(period)

        return {
            "total": aws_costs["total"] + gcp_costs["total"] + azure_costs["total"],
            "by_provider": {
                "aws": aws_costs,
                "gcp": gcp_costs,
                "azure": azure_costs
            },
            "by_service": self._aggregate_by_service(
                aws_costs, gcp_costs, azure_costs
            ),
            "by_team": self._aggregate_by_tag(
                "team", aws_costs, gcp_costs, azure_costs
            ),
            "anomalies": self._detect_anomalies(
                aws_costs, gcp_costs, azure_costs
            ),
            "recommendations": self._generate_optimization_recommendations(
                aws_costs, gcp_costs, azure_costs
            )
        }

    def _detect_anomalies(self, *provider_costs):
        """Detect cost anomalies"""
        anomalies = []
        for costs in provider_costs:
            for service, cost in costs["by_service"].items():
                avg = cost.get("rolling_avg_30d", 0)
                current = cost.get("current", 0)
                if avg > 0 and current > avg * 1.5:
                    anomalies.append({
                        "service": service,
                        "provider": costs["provider"],
                        "current": current,
                        "average": avg,
                        "increase_pct": (current - avg) / avg * 100
                    })
        return anomalies

13.2 Cost Optimization Strategies

Strategy	Savings	Applicable To
Reserved/Committed Use	30-60%	Stable workloads
Spot/Preemptible Instances	60-90%	Batch, testing
Auto Scaling	20-40%	Variable traffic
Right-Sizing	15-35%	All workloads
Storage Tiering	40-70%	Archive data
Network Optimization	10-30%	Cross-cloud communication

14. Governance and Compliance

14.1 Multi-Cloud Governance Framework

# Open Policy Agent (OPA) - Multi-Cloud Policy
package multicloud.governance

# Require mandatory tags on all resources
required_tags := ["environment", "team", "cost-center", "data-classification"]

deny[msg] {
  resource := input.resource
  tag := required_tags[_]
  not resource.tags[tag]
  msg := sprintf(
    "Resource %v is missing required tag: %v",
    [resource.name, tag]
  )
}

# Data sovereignty - certain data only in specific regions
deny[msg] {
  resource := input.resource
  resource.tags["data-classification"] == "pii-eu"
  not startswith(resource.region, "eu-")
  not startswith(resource.region, "europe")
  msg := sprintf(
    "EU PII data must be stored in EU region. Resource %v in %v",
    [resource.name, resource.region]
  )
}

# Prevent cost limit overruns
deny[msg] {
  resource := input.resource
  resource.type == "compute_instance"
  resource.monthly_cost > 5000
  not resource.tags["approved-high-cost"] == "true"
  msg := sprintf(
    "Instance %v exceeds $5000/month limit. Get approval first.",
    [resource.name]
  )
}

# Mandatory encryption
deny[msg] {
  resource := input.resource
  resource.type == "storage_bucket"
  not resource.encryption.enabled
  msg := sprintf(
    "Storage %v must have encryption enabled",
    [resource.name]
  )
}

15. Practice Quiz

Q1. What are the key differences between Active-Active and Active-Passive patterns in multi-cloud architecture, and when is each appropriate?

Active-Active: Both clouds simultaneously process traffic. Provides maximum availability but has high data synchronization complexity and cost. Best for mission-critical and global services.

Active-Passive: Only the primary cloud processes traffic while the secondary stands by for DR. Cost is reasonable but some downtime occurs during failover. Best for high availability requirements with cost sensitivity.

The key difference is simultaneous processing and failover time (RTO). Active-Active has near-zero RTO, while Active-Passive has an RTO of several minutes.

Q2. Explain the difference between Replatform and Refactor in the 6R migration strategy and provide suitable use cases for each.

Replatform: Maintain core architecture while replacing parts with cloud-managed services. For example, migrating self-managed MySQL to Amazon RDS, or replacing self-managed Redis with ElastiCache. Moderate effort yields quick operational benefits.

Refactor: Complete redesign for cloud-native. Decompose monoliths into microservices or transition to serverless architecture. Requires the highest effort but delivers maximum cloud benefits long-term.

Replatform is suitable when quick wins are needed; Refactor when long-term innovation is the goal.

Q3. What is Workload Identity Federation, and why is it more secure than service account keys?

Workload Identity Federation is a mechanism that exchanges external IdP credentials (AWS IAM, Azure AD, etc.) for temporary GCP tokens.

Why it is more secure than service account keys:

No key management: No need to create/distribute/rotate JSON key files
Temporary tokens: Exchanged tokens auto-expire after 1 hour
Least privilege: Fine-grained access control based on specific attributes (roles, tags)
Easy auditing: All token exchanges recorded in Cloud Audit Logs
Reduced leak risk: No long-lived credentials exist to be leaked

Q4. Describe the key steps and considerations when migrating a monolith to microservices using the Strangler Fig pattern.

Key Steps:

Add proxy/API Gateway: Route all traffic through the gateway
Identify capabilities: Identify bounded contexts that can be separated
Incremental extraction: Extract one microservice at a time, change routing at the gateway
Data separation: Separate from shared DB to per-service databases
Shrink the monolith: Retire the monolith after all capabilities are extracted

Considerations:

Database separation is the hardest part -- watch for transactional consistency
Avoid extracting too many services at once
Carefully decide inter-service communication patterns (sync/async)
Build monitoring/observability before starting migration
Rollback plan is essential

Q5. Suggest at least 3 strategies to optimize data egress costs in a multi-cloud environment.

Data Locality Design: Place processing engines in the cloud where data resides. Move compute, not data
CDN Utilization: Cache outbound traffic using CDN to reduce origin egress
Compression and Protocol Optimization: Reduce transmitted data size using efficient serialization like gRPC and Protobuf
Dedicated Connections: Direct Connect, ExpressRoute, etc. offer lower egress costs compared to internet transfer
Async Batch Transfer: Instead of real-time transfer, batch data and send during off-peak hours
Private Peering: Direct peering between clouds through neutral exchange points like Megaport or Equinix Fabric

Multi-Cloud 전략 & 마이그레이션 완전 가이드 2025: AWS/GCP/Azure 비교, 하이브리드 클라우드

목차

1. 왜 Multi-Cloud인가?

1.1 벤더 락인(Vendor Lock-in) 리스크

1.2 Multi-Cloud를 선택하는 4가지 이유

1.3 Multi-Cloud의 현실적 과제

2. AWS vs GCP vs Azure: 서비스 매핑

2.1 컴퓨팅 서비스 비교

2.2 스토리지 서비스 비교

2.3 데이터베이스 서비스 비교

2.4 네트워킹 서비스 비교

2.5 AI/ML 서비스 비교

2.6 데이터 분석 비교

3. 멀티클라우드 아키텍처 패턴

3.1 Active-Active 패턴

3.2 Active-Passive 패턴

3.3 Cloud Bursting 패턴

3.4 Arbitrage(중재) 패턴

4. 하이브리드 클라우드 솔루션

4.1 Google Anthos

4.2 Azure Arc

4.3 AWS EKS Anywhere

4.4 Red Hat OpenShift (하이브리드)

5. 클라우드 마이그레이션 6R 전략

5.1 6R 개요와 의사결정 트리

5.2 각 전략 상세

5.3 마이그레이션 평가 스코어카드

6. 마이그레이션 계획 수립

6.1 디스커버리와 의존성 매핑

6.2 TCO (Total Cost of Ownership) 분석

6.3 마이그레이션 웨이브 계획

7. 데이터 마이그레이션 전략

7.1 온라인 vs 오프라인 마이그레이션

7.2 데이터베이스 마이그레이션

7.3 스토리지 전송

8. 애플리케이션 마이그레이션 패턴

8.1 Strangler Fig 패턴

8.2 Blue-Green 전환

9. 멀티클라우드 네트워킹

9.1 클라우드 간 연결 옵션

9.2 Transit 아키텍처

9.3 서비스 메시 (멀티클라우드)

10. 아이덴티티 페더레이션

10.1 멀티클라우드 IAM 전략

10.2 OIDC 기반 클라우드 간 인증

11. 클라우드 네이티브 이식성

11.1 Kubernetes 기반 추상화

11.2 Terraform 멀티클라우드 모듈

11.3 OCI 컨테이너 이미지 전략

12. 멀티클라우드 재해 복구(DR)

12.1 DR 전략 비교

12.2 멀티클라우드 DR 구현

13. 멀티클라우드 비용 관리

13.1 통합 비용 모니터링

13.2 비용 최적화 전략

14. 거버넌스와 컴플라이언스

14.1 멀티클라우드 거버넌스 프레임워크

15. 실전 퀴즈

참고 자료

Multi-Cloud Strategy & Migration Complete Guide 2025: AWS/GCP/Azure Comparison, Hybrid Cloud

Table of Contents

1. Why Multi-Cloud?

1.1 Vendor Lock-in Risk

1.2 Four Reasons to Choose Multi-Cloud

1.3 Realistic Challenges of Multi-Cloud

2. AWS vs GCP vs Azure: Service Mapping

2.1 Compute Services Comparison

2.2 Storage Services Comparison

2.3 Database Services Comparison

2.4 Networking Services Comparison

2.5 AI/ML Services Comparison

2.6 Data Analytics Comparison

3. Multi-Cloud Architecture Patterns

3.1 Active-Active Pattern

3.2 Active-Passive Pattern

3.3 Cloud Bursting Pattern

3.4 Arbitrage Pattern

4. Hybrid Cloud Solutions

4.1 Google Anthos

4.2 Azure Arc