Multi-Cloud Strategy & Migration Complete Guide 2025: AWS/GCP/Azure Comparison, Hybrid Cloud


1. Why Multi-Cloud?

As the cloud market matures, concentrating all workloads on a single cloud provider becomes increasingly risky. As of 2025, approximately 89% of Fortune 500 companies have adopted multi-cloud strategies, using an average of 2.6 public clouds.

1.1 Vendor Lock-in Risk

Key risks of depending on a single cloud:

  • Loss of pricing leverage: No alternative means being subject to provider price increases
  • Service disruption risk: Major cloud providers averaged 4.2 large-scale outages each in 2024
  • Technology dependency: Migration costs grow exponentially when using proprietary services (AWS Lambda, Azure Functions, etc.)
  • Regulatory changes: Inability to respond flexibly to data sovereignty laws and regulatory changes

1.2 Four Reasons to Choose Multi-Cloud

Best-of-Breed Strategy: Leverage each cloud's strengths.

  • AWS: Broadest service portfolio, enterprise ecosystem
  • GCP: Data analytics (BigQuery), AI/ML (Vertex AI), Kubernetes (GKE)
  • Azure: Enterprise integration (Active Directory, Office 365), hybrid (Azure Arc)

Compliance: Meet industry/regional data requirements

  • Finance: Mandatory storage of certain data in domestic regions
  • Healthcare: Select HIPAA-compliant services
  • Public sector: Government-specific cloud regions

Disaster Recovery (DR): Protection against provider-level failures

  • Single-cloud DR: Cross-region replication (within same provider)
  • Multi-cloud DR: Cross-provider replication (failover from AWS to GCP if AWS fails)

Cost Optimization: Choose optimal pricing per workload

  • Leverage spot/preemptible instance price differences
  • Strategically distribute committed use discounts (Reserved/Committed Use)
  • Compare data egress costs for optimal placement
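
To make the egress comparison concrete, here is a minimal Python sketch that picks the cheapest provider for a given monthly transfer volume. The per-GB rates are illustrative placeholders, not current list prices, which vary by region, tier, and destination:

```python
# Hypothetical per-GB egress rates (USD). Replace with real rate cards
# or pricing-API lookups before relying on the result.
EGRESS_PER_GB = {"aws": 0.09, "gcp": 0.12, "azure": 0.087}

def cheapest_egress(monthly_gb: float) -> tuple[str, float]:
    """Return the provider with the lowest estimated monthly egress bill."""
    costs = {p: rate * monthly_gb for p, rate in EGRESS_PER_GB.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

The same pattern extends to compute and storage dimensions; the point is that placement decisions become a data problem once the rates are tabulated.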

1.3 Realistic Challenges of Multi-Cloud

Multi-cloud is not a silver bullet. Challenges you must consider:

| Challenge | Description | Mitigation Strategy |
|---|---|---|
| Increased complexity | 2-3x operational overhead | IaC, unified management platforms |
| Talent shortage | Experts needed for each cloud | Abstraction layers, training investment |
| Network costs | Cross-cloud data transfer costs | Data locality design |
| Security integration | Different IAM/security models | Zero Trust, unified IdP |
| Consistency | Behavioral differences across services | Standardized abstraction layers |

2. AWS vs GCP vs Azure: Service Mapping

2.1 Compute Services Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| Virtual Machines | EC2 | Compute Engine | Virtual Machines |
| Container Orchestration | EKS | GKE | AKS |
| Serverless Containers | Fargate | Cloud Run | Container Apps |
| Serverless Functions | Lambda | Cloud Functions | Azure Functions |
| Batch Processing | AWS Batch | Cloud Batch | Azure Batch |
| App Platform | Elastic Beanstalk | App Engine | App Service |
| VMware Integration | VMware Cloud on AWS | Google Cloud VMware Engine | Azure VMware Solution |

2.2 Storage Services Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| Object Storage | S3 | Cloud Storage | Blob Storage |
| Block Storage | EBS | Persistent Disk | Managed Disks |
| File Storage | EFS | Filestore | Azure Files |
| Archive | S3 Glacier | Archive Storage | Archive Storage |
| Hybrid Storage | Storage Gateway | Transfer Appliance | StorSimple |

2.3 Database Services Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| Relational DB | RDS, Aurora | Cloud SQL, AlloyDB | Azure SQL, MySQL/PostgreSQL |
| NoSQL Document | DynamoDB | Firestore | Cosmos DB |
| In-Memory | ElastiCache | Memorystore | Azure Cache for Redis |
| Graph DB | Neptune | - | Cosmos DB (Gremlin) |
| Time-Series DB | Timestream | - | Azure Data Explorer |
| Global Distributed DB | Aurora Global, DynamoDB Global Tables | Spanner | Cosmos DB |

2.4 Networking Services Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| Virtual Network | VPC | VPC | VNet |
| Load Balancer | ALB/NLB/GLB | Cloud Load Balancing | Azure Load Balancer/App Gateway |
| CDN | CloudFront | Cloud CDN | Azure CDN/Front Door |
| DNS | Route 53 | Cloud DNS | Azure DNS |
| Dedicated Connection | Direct Connect | Cloud Interconnect | ExpressRoute |
| VPN | Site-to-Site VPN | Cloud VPN | VPN Gateway |
| Service Mesh | App Mesh | Traffic Director | - |

2.5 AI/ML Services Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| ML Platform | SageMaker | Vertex AI | Azure ML |
| LLM Service | Bedrock | Gemini API, Model Garden | Azure OpenAI Service |
| NLP | Comprehend | Natural Language AI | Cognitive Services |
| Image Analysis | Rekognition | Vision AI | Computer Vision |
| Speech | Transcribe/Polly | Speech-to-Text/Text-to-Speech | Speech Services |
| Recommendation | Personalize | Recommendations AI | Personalizer |

2.6 Data Analytics Comparison

| Category | AWS | GCP | Azure |
|---|---|---|---|
| Data Warehouse | Redshift | BigQuery | Synapse Analytics |
| Stream Processing | Kinesis | Dataflow | Stream Analytics |
| ETL/ELT | Glue | Dataflow, Dataproc | Data Factory |
| Data Catalog | Glue Data Catalog | Data Catalog | Purview |
| BI Tool | QuickSight | Looker | Power BI |

3. Multi-Cloud Architecture Patterns

3.1 Active-Active Pattern

Process traffic simultaneously across two or more clouds.

                    ┌─────────────────────┐
                    │   Global DNS / LB   │
                    │   (Route 53 / CF)   │
                    └──────────┬──────────┘
                         ┌─────┴─────┐
                         │           │
                    ┌────▼────┐ ┌────▼────┐
                    │   AWS   │ │   GCP   │
                    │ Region  │ │ Region  │
                    │ ┌─────┐ │ │ ┌─────┐ │
                    │ │ K8s │ │ │ │ GKE │ │
                    │ │ EKS │ │ │ │     │ │
                    │ └──┬──┘ │ │ └──┬──┘ │
                    │    │    │ │    │    │
                    │ ┌──▼──┐ │ │ ┌──▼──┐ │
                    │ │ DB  │◄┼─┼─► DB  │ │
                    │ │ RDS │ │ │ │ SQL │ │
                    │ └─────┘ │ │ └─────┘ │
                    └─────────┘ └─────────┘

Pros: Maximum availability, zero-downtime failover
Cons: Data synchronization complexity, high cost
Best for: Mission-critical services, global services

3.2 Active-Passive Pattern

Primary cloud processes traffic while secondary cloud stands by for DR.

                    ┌────────────────────┐
                    │     Global DNS     │
                    └────────┬───────────┘
                    ┌────────▼────────┐          ┌────────────────┐
                    │  AWS (Active)   │ Repl ──► │ Azure (Passive)│
                    │                 │          │                │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Workloads │  │          │  │ Standby   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    │  ┌───────────┐  │          │  ┌───────────┐ │
                    │  │ Database  │──┼──────────┼─►│ Replica   │ │
                    │  └───────────┘  │          │  └───────────┘ │
                    └─────────────────┘          └────────────────┘

Pros: Reasonable cost, cloud-level DR
Cons: Some downtime during failover, passive resource cost
Best for: High availability requirements with cost sensitivity

3.3 Cloud Bursting Pattern

Workloads normally run on-premises or in the primary cloud, and burst to a secondary cloud during peak load.

# Kubernetes Federation - Cloud Bursting Example
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: web-app
  namespace: production
spec:
  template:
    spec:
      replicas: 10
      containers:
        - name: web
          image: registry.example.com/web-app:v2.1
  placement:
    clusters:
      - name: on-prem-cluster
        weight: 70
      - name: aws-eks-cluster
        weight: 20
      - name: gcp-gke-cluster
        weight: 10
  overrides:
    - clusterName: aws-eks-cluster
      clusterOverrides:
        - path: "/spec/replicas"
          value: 5

3.4 Arbitrage Pattern

Place workloads on the most cost-effective cloud based on characteristics.

# Cloud cost comparison and auto-placement example
class CloudArbitrage:
    def __init__(self):
        self.providers = {
            "aws": AWSProvider(),
            "gcp": GCPProvider(),
            "azure": AzureProvider()
        }

    def find_optimal_placement(self, workload):
        costs = {}
        for name, provider in self.providers.items():
            cost = provider.estimate_cost(
                cpu=workload.cpu_cores,
                memory_gb=workload.memory_gb,
                storage_gb=workload.storage_gb,
                gpu=workload.gpu_type,
                duration_hours=workload.expected_duration,
                region=workload.preferred_region
            )
            costs[name] = cost

        # Calculate performance-to-cost ratio
        scores = {}
        for name, cost in costs.items():
            perf = self.providers[name].benchmark_score(workload.type)
            scores[name] = perf / cost

        best = max(scores, key=scores.get)
        return best, costs[best], scores[best]

    def auto_schedule(self, workloads):
        placements = []
        for wl in workloads:
            provider, cost, score = self.find_optimal_placement(wl)
            placements.append({
                "workload": wl.name,
                "provider": provider,
                "estimated_cost": cost,
                "efficiency_score": score
            })
        return placements

4. Hybrid Cloud Solutions

4.1 Google Anthos

Anthos provides the same Kubernetes environment across on-premises, AWS, and Azure, based on GKE.

# Anthos Config Management - Multi-cluster Setup
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  clusterName: production-cluster
  git:
    syncRepo: https://github.com/org/anthos-config
    syncBranch: main
    secretType: ssh
    policyDir: "policies"
  policyController:
    enabled: true
    templateLibraryInstalled: true
    referentialRulesEnabled: true
  hierarchyController:
    enabled: true
    enablePodTreeLabels: true
    enableHierarchicalResourceQuotas: true

Anthos Core Components:

  • Anthos on GKE: Managed Kubernetes on GCP
  • Anthos on VMware: GKE running on on-premises vSphere
  • Anthos on AWS: Anthos managed clusters on AWS
  • Anthos on Azure: Anthos managed clusters on Azure
  • Anthos Config Management: GitOps-based multi-cluster policy management
  • Anthos Service Mesh: Unified Istio-based service mesh management

4.2 Azure Arc

Azure Arc extends the Azure management plane to on-premises, AWS, GCP, and beyond.

# Register Kubernetes cluster with Azure Arc
az connectedk8s connect \
  --name production-eks \
  --resource-group multi-cloud-rg \
  --location eastus \
  --tags "environment=production" "cloud=aws"

# Deploy Arc-enabled Data Services
az arcdata dc create \
  --name arc-dc \
  --k8s-namespace arc \
  --connectivity-mode indirect \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class managed-premium \
  --profile-name azure-arc-kubeadm

# Arc-enabled SQL Managed Instance
az sql mi-arc create \
  --name sql-prod \
  --resource-group multi-cloud-rg \
  --location eastus \
  --storage-class-data managed-premium \
  --storage-class-logs managed-premium \
  --cores-limit 8 \
  --memory-limit 32Gi \
  --k8s-namespace arc

4.3 AWS EKS Anywhere

EKS Anywhere runs the same Kubernetes as AWS EKS on-premises.

# EKS Anywhere Cluster Configuration
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
        - "192.168.0.0/16"
    services:
      cidrBlocks:
        - "10.96.0.0/12"
  controlPlaneConfiguration:
    count: 3
    endpoint:
      host: "10.0.0.100"
    machineGroupRef:
      kind: VSphereMachineConfig
      name: cp-machines
  workerNodeGroupConfigurations:
    - count: 5
      machineGroupRef:
        kind: VSphereMachineConfig
        name: worker-machines
      name: md-0
  kubernetesVersion: "1.29"
  managementCluster:
    name: prod-cluster

4.4 Red Hat OpenShift (Hybrid)

OpenShift provides a consistent Kubernetes platform across all major clouds and on-premises.

# OpenShift Multi-Cluster Hub Configuration
apiVersion: operator.open-cluster-management.io/v1
kind: MultiClusterHub
metadata:
  name: multiclusterhub
  namespace: open-cluster-management
spec:
  availabilityConfig: High
  enableClusterBackup: true
  overrides:
    components:
      - name: cluster-lifecycle
        enabled: true
      - name: cluster-backup
        enabled: true
      - name: multicluster-engine
        enabled: true
      - name: grc
        enabled: true
      - name: app-lifecycle
        enabled: true

5. Cloud Migration 6R Strategy

5.1 6R Overview and Decision Tree

When analyzing workloads for migration, choose one of six strategies (6R).

Start Workload Analysis
       |
       v
  Has business value? --- No ---> Retire
       | Yes
       v
  Needs changes? --- No ---> Retain
       | Yes
       v
  SaaS replacement available? --- Yes ---> Repurchase
       | No
       v
  Architecture change needed? --- No ---> Rehost or Replatform
       | Yes
       v
  Full redesign needed? --- No ---> Replatform
       | Yes
       v
  Refactor

5.2 Detailed Strategy Descriptions

Rehost (Lift and Shift):

  • Move existing applications as-is to cloud VMs
  • Fastest and lowest risk
  • Limited cloud-native benefits
  • Use AWS Application Migration Service, Azure Migrate, Google Migrate for Compute Engine

Replatform (Lift and Reshape):

  • Maintain core architecture while leveraging cloud services
  • Example: Migrate self-managed MySQL to RDS/Cloud SQL
  • Moderate effort, immediate operational benefits

Refactor (Re-architect):

  • Complete redesign for cloud-native
  • Microservices, serverless, containerization
  • Highest effort, highest cloud benefits
  • Long-term cost savings and scalability

Repurchase (Replace):

  • Replace with SaaS products
  • Example: Self-managed email server to Office 365/Google Workspace
  • Complete elimination of operational burden

Retire:

  • Remove workloads no longer needed
  • Typically 10-20% of total workloads are retirement candidates

Retain:

  • Workloads not yet ready for migration
  • Legacy dependencies, regulatory requirements, etc.

5.3 Migration Assessment Scorecard

# Migration strategy decision automation example
class MigrationAssessor:
    def assess_workload(self, workload):
        score = {
            "business_value": self._rate_business_value(workload),
            "technical_complexity": self._rate_complexity(workload),
            "cloud_readiness": self._rate_cloud_readiness(workload),
            "data_sensitivity": self._rate_data_sensitivity(workload),
            "dependency_count": len(workload.dependencies),
            "team_skill_level": self._rate_team_skills(workload.team)
        }

        # Strategy recommendation logic
        if score["business_value"] < 3:
            return "Retire"
        if score["cloud_readiness"] < 2:
            return "Retain"
        if workload.has_saas_alternative and score["technical_complexity"] > 7:
            return "Repurchase"
        if score["technical_complexity"] < 4 and score["cloud_readiness"] > 6:
            return "Rehost"
        if score["technical_complexity"] < 7:
            return "Replatform"
        return "Refactor"

    def generate_report(self, workloads):
        results = {}
        for wl in workloads:
            strategy = self.assess_workload(wl)
            if strategy not in results:
                results[strategy] = []
            results[strategy].append({
                "name": wl.name,
                "estimated_effort_weeks": self._estimate_effort(wl, strategy),
                "estimated_cost": self._estimate_cost(wl, strategy),
                "risk_level": self._assess_risk(wl, strategy)
            })
        return results

6. Migration Planning

6.1 Discovery and Dependency Mapping

Accurately understanding your current environment before migration is key.

# Using AWS Application Discovery Service
aws discovery start-continuous-export

# Agent-based collection
aws discovery start-data-collection-by-agent-ids \
  --agent-ids agent-001 agent-002 agent-003

# Query server dependency map
aws discovery describe-agents \
  --filters name=hostName,values=web-server-*,condition=CONTAINS

# Generate migration plan from collected data
aws migrationhub-strategy create-assessment \
  --s3bucket migration-data \
  --s3key discovery-export.csv

6.2 TCO (Total Cost of Ownership) Analysis

# TCO comparison analysis framework
class TCOAnalysis:
    def calculate_on_prem_tco(self, infra):
        annual_costs = {
            "hardware": infra.server_count * 8000 / 3,  # 3-year depreciation
            "software_licenses": infra.license_costs,
            "datacenter": infra.rack_units * 1200,  # Power, cooling, space
            "network": infra.bandwidth_gbps * 500,
            "personnel": infra.fte_count * 120000,
            "maintenance": infra.server_count * 2400,
            "disaster_recovery": infra.dr_cost_annual,
            "security": infra.security_cost_annual,
            "compliance": infra.compliance_cost_annual
        }
        return sum(annual_costs.values()), annual_costs

    def calculate_cloud_tco(self, workloads, provider="aws"):
        annual_costs = {
            "compute": self._estimate_compute(workloads, provider),
            "storage": self._estimate_storage(workloads, provider),
            "network": self._estimate_network(workloads, provider),
            "managed_services": self._estimate_managed(workloads, provider),
            "personnel": workloads.cloud_fte * 130000,
            "migration_amortized": workloads.migration_cost / 3,
            "training": workloads.team_size * 5000,
            "tools": workloads.tool_licenses
        }
        return sum(annual_costs.values()), annual_costs

    def compare(self, infra, workloads):
        on_prem_total, on_prem_detail = self.calculate_on_prem_tco(infra)
        cloud_totals = {}
        for provider in ["aws", "gcp", "azure"]:
            total, detail = self.calculate_cloud_tco(workloads, provider)
            cloud_totals[provider] = {
                "total": total,
                "detail": detail,
                "savings_pct": (on_prem_total - total) / on_prem_total * 100
            }
        return {
            "on_prem": {"total": on_prem_total, "detail": on_prem_detail},
            "cloud": cloud_totals
        }

6.3 Migration Wave Planning

Large-scale migrations are executed in waves (phases).

| Wave | Target | Strategy | Duration | Risk |
|---|---|---|---|---|
| Wave 0 | Pilot (2-3 non-critical apps) | Rehost | 4 weeks | Low |
| Wave 1 | Web frontends, static sites | Rehost/Replatform | 6 weeks | Low |
| Wave 2 | API servers, microservices | Replatform/Refactor | 8 weeks | Medium |
| Wave 3 | Databases, storage | Replatform | 6 weeks | High |
| Wave 4 | Legacy monoliths | Refactor | 12 weeks | High |
| Wave 5 | Final cutover, cleanup | - | 4 weeks | Medium |
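
Wave assignment usually follows dependency order: a workload should not migrate before the systems it depends on. A minimal sketch of that grouping logic (workload names are hypothetical):

```python
def plan_waves(workloads: dict[str, list[str]]) -> list[list[str]]:
    """Group workloads into migration waves so each workload moves only
    after everything it depends on. `workloads` maps name -> dependencies."""
    waves, placed = [], set()
    remaining = dict(workloads)
    while remaining:
        # Anything whose dependencies have all migrated can go in this wave.
        ready = sorted(n for n, deps in remaining.items()
                       if all(d in placed for d in deps))
        if not ready:  # circular dependency: migrate the rest together
            ready = sorted(remaining)
        waves.append(ready)
        placed.update(ready)
        for n in ready:
            del remaining[n]
    return waves
```

Real plans also weigh risk and team capacity per wave, but dependency ordering is the hard constraint.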

7. Data Migration Strategies

7.1 Online vs Offline Migration

Online Migration (Network transfer):

  • Suitable for data under 100TB
  • Dedicated connections (Direct Connect/ExpressRoute) recommended
  • Incremental sync possible

Offline Migration (Physical transfer):

  • AWS Snowball / Snowball Edge: Up to 80TB/device
  • AWS Snowmobile: Petabyte scale
  • Azure Data Box: Up to 100TB
  • Google Transfer Appliance: Up to 300TB
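
A quick back-of-the-envelope calculation clarifies the online/offline cutoff: compare network transfer time against the turnaround of a shipped appliance. The utilization and turnaround figures below are assumptions, not vendor numbers:

```python
def online_transfer_days(data_tb: float, link_gbps: float,
                         utilization: float = 0.7) -> float:
    """Days to move data_tb over a link_gbps connection at the given
    average utilization (1 TB treated as 8e12 bits for simplicity)."""
    bits = data_tb * 8e12
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86400

def prefer_offline(data_tb: float, link_gbps: float,
                   appliance_turnaround_days: float = 7.0) -> bool:
    """Rule of thumb: if shipping a device round-trips faster, go offline."""
    return online_transfer_days(data_tb, link_gbps) > appliance_turnaround_days
```

For example, 100TB over a 10 Gbps link at 70% utilization takes a little over a day, while 1PB over 1 Gbps takes months; that gap is why petabyte-scale moves go on trucks.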

7.2 Database Migration

# AWS DMS (Database Migration Service) Task Configuration
# Source: On-premises Oracle, Target: Amazon Aurora PostgreSQL
Resources:
  DMSReplicationTask:
    Type: AWS::DMS::ReplicationTask
    Properties:
      MigrationType: full-load-and-cdc
      SourceEndpointArn: !Ref OracleSourceEndpoint
      TargetEndpointArn: !Ref AuroraTargetEndpoint
      ReplicationInstanceArn: !Ref DMSReplicationInstance
      TableMappings: |
        {
          "rules": [
            {
              "rule-type": "selection",
              "rule-id": "1",
              "rule-name": "select-all-tables",
              "object-locator": {
                "schema-name": "PROD_SCHEMA",
                "table-name": "%"
              },
              "rule-action": "include"
            },
            {
              "rule-type": "transformation",
              "rule-id": "2",
              "rule-name": "lowercase-schema",
              "rule-action": "convert-lowercase",
              "rule-target": "schema",
              "object-locator": {
                "schema-name": "PROD_SCHEMA"
              }
            }
          ]
        }
      ReplicationTaskSettings: |
        {
          "TargetMetadata": {
            "SupportLobs": true,
            "FullLobMode": false,
            "LobChunkSize": 64
          },
          "FullLoadSettings": {
            "TargetTablePrepMode": "DROP_AND_CREATE",
            "MaxFullLoadSubTasks": 8
          },
          "Logging": {
            "EnableLogging": true,
            "LogComponents": [
              { "Id": "SOURCE_UNLOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" },
              { "Id": "TARGET_LOAD", "Severity": "LOGGER_SEVERITY_DEFAULT" }
            ]
          }
        }

7.3 Storage Transfer

# Transfer objects from AWS S3 to GCP Cloud Storage (Storage Transfer Service)
gcloud transfer jobs create \
  s3://my-aws-bucket gs://my-gcp-bucket \
  --source-creds-file=aws-credentials.json \
  --schedule-starts=2025-01-15T00:00:00Z \
  --schedule-repeats-every=24h \
  --include-prefixes="data/,backups/" \
  --exclude-prefixes="temp/,logs/"

8. Application Migration Patterns

8.1 Strangler Fig Pattern

Gradually replace an existing monolith with microservices.

Phase 1: Add Proxy Layer
┌────────────────┐
│ API Gateway /  │
│ Load Balancer  │
└───────┬────────┘
        │
        v
┌────────────────┐
│    Monolith    │ <- All traffic
│   (On-Prem)    │
└────────────────┘

Phase 2: Migrate Some Functions
┌────────────────┐
│  API Gateway   │
└───┬────────┬───┘
    │        │
    v        v
┌───────┐ ┌──────────┐
│  New  │ │ Monolith │ <- Remaining traffic
│ Auth  │ │ (On-Prem)│
│(Cloud)│ │          │
└───────┘ └──────────┘

Phase 3: Migration Mostly Complete
┌────────────────┐
│  API Gateway   │
└┬──┬──┬──┬──┬───┘
 │  │  │  │  │
 v  v  v  v  v
┌─┐┌─┐┌─┐┌─┐┌────────┐
│A││B││C││D││Monolith│ <- Minimal traffic
└─┘└─┘└─┘└─┘└────────┘
 (Cloud services)
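
The phases above boil down to a routing table in front of the monolith that grows as functions are carved out. A minimal sketch (service names and path prefixes are hypothetical):

```python
# Path-prefix routing: which prefixes have been "strangled" out of the
# monolith into cloud services. Everything else still hits the monolith.
MIGRATED = {
    "/auth": "auth-service.cloud",
    "/billing": "billing-service.cloud",
}
MONOLITH = "monolith.on-prem"

def route(path: str) -> str:
    """Return the backend that should serve the given request path."""
    for prefix, backend in MIGRATED.items():
        if path.startswith(prefix):
            return backend
    return MONOLITH
```

In practice this table lives in the API gateway configuration; migrating one more function is just one more entry, which is what makes the pattern low-risk.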

8.2 Blue-Green Cutover

# Kubernetes-based Blue-Green Cutover Configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app-migration
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: web-app-active
      previewService: web-app-preview
      autoPromotionEnabled: false
      prePromotionAnalysis:
        templates:
          - templateName: migration-validation
        args:
          - name: service-name
            value: web-app-preview
      postPromotionAnalysis:
        templates:
          - templateName: post-migration-check
      scaleDownDelaySeconds: 3600
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: cloud-native
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v3.0-cloud
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"

9. Multi-Cloud Networking

9.1 Cross-Cloud Connection Options

┌──────────────────┬─────────────┬─────────┬────────────────┐
│ Connection Options Comparison                             │
├──────────────────┼─────────────┼─────────┼────────────────┤
│ Option           │ Bandwidth   │ Latency │ Cost           │
├──────────────────┼─────────────┼─────────┼────────────────┤
│ Public Internet  │ Variable    │ High    │ Egress cost    │
│ VPN (IPsec)      │ 1-3 Gbps    │ Medium  │ Low            │
│ Dedicated Link   │ 10-100 Gbps │ Low     │ High (monthly) │
│ Megaport/Equinix │ Flexible    │ Low     │ Medium         │
└──────────────────┴─────────────┴─────────┴────────────────┘
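
Given such a matrix, option selection can be automated as a constraint filter: keep what meets the bandwidth and latency requirements, then take the cheapest. The figures below mirror the comparison with illustrative cost ranks (1 = cheapest); real numbers depend on provider, region, and contract:

```python
# Illustrative catalog of cross-cloud connection options.
OPTIONS = [
    {"name": "Public Internet",  "gbps": 1.0,   "latency": "high",   "cost": 3},
    {"name": "VPN (IPsec)",      "gbps": 3.0,   "latency": "medium", "cost": 1},
    {"name": "Dedicated Link",   "gbps": 100.0, "latency": "low",    "cost": 3},
    {"name": "Megaport/Equinix", "gbps": 50.0,  "latency": "low",    "cost": 2},
]

def pick_connection(min_gbps: float, max_latency: str) -> str:
    """Cheapest option meeting the bandwidth and latency requirements."""
    rank = {"low": 0, "medium": 1, "high": 2}
    viable = [o for o in OPTIONS
              if o["gbps"] >= min_gbps and rank[o["latency"]] <= rank[max_latency]]
    return min(viable, key=lambda o: o["cost"])["name"]
```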

9.2 Transit Architecture

# Terraform - AWS Transit Gateway with VPN Setup
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Multi-cloud transit gateway"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  auto_accept_shared_attachments  = "enable"

  tags = {
    Name = "multi-cloud-tgw"
  }
}

resource "aws_vpn_connection" "to_gcp" {
  customer_gateway_id = aws_customer_gateway.gcp.id
  transit_gateway_id  = aws_ec2_transit_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = false

  tunnel1_inside_cidr   = "169.254.10.0/30"
  tunnel2_inside_cidr   = "169.254.10.4/30"

  tags = {
    Name = "aws-to-gcp-vpn"
  }
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = var.gcp_vpn_gateway_ip
  type       = "ipsec.1"

  tags = {
    Name = "gcp-customer-gateway"
  }
}

9.3 Service Mesh (Multi-Cloud)

# Istio Multi-Cluster Setup (Primary-Remote)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-primary
spec:
  values:
    global:
      meshID: multi-cloud-mesh
      multiCluster:
        clusterName: aws-cluster
      network: aws-network
    pilot:
      env:
        EXTERNAL_ISTIOD: "true"
  meshConfig:
    defaultConfig:
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
  components:
    ingressGateways:
      - name: istio-eastwestgateway
        label:
          istio: eastwestgateway
          topology.istio.io/network: aws-network
        enabled: true
        k8s:
          env:
            - name: ISTIO_META_REQUESTED_NETWORK_VIEW
              value: aws-network

10. Identity Federation

10.1 Multi-Cloud IAM Strategy

┌─────────────────────────────────────┐
│      Central Identity Provider      │
│      (Okta / Azure AD / Google)     │
└──────────┬──────────────────────────┘
           │ SAML / OIDC
     ┌─────┼──────┬───────┐
     v     v      v       v
  ┌─────┐┌─────┐┌──────┐┌─────────┐
  │ AWS ││ GCP ││Azure ││ On-Prem │
  │ IAM ││ IAM ││  AD  ││  LDAP   │
  └─────┘└─────┘└──────┘└─────────┘

10.2 OIDC-based Cross-Cloud Authentication

# Access GCP resources from AWS (Workload Identity Federation)
import google.auth
from google.auth import impersonated_credentials
import boto3

class CrossCloudAuth:
    def get_gcp_credentials_from_aws(self):
        # Verify current credentials from AWS STS
        sts = boto3.client("sts")
        aws_identity = sts.get_caller_identity()

        # Use GCP Workload Identity Federation
        # Exchange AWS credentials for GCP token
        credentials, project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )

        # Service account impersonation
        target_credentials = impersonated_credentials.Credentials(
            source_credentials=credentials,
            target_principal="cross-cloud@project.iam.gserviceaccount.com",
            target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
            lifetime=3600
        )

        return target_credentials

    def setup_workload_identity_pool(self):
        """GCP Workload Identity Pool setup (gcloud CLI)"""
        commands = [
            # Create pool
            "gcloud iam workload-identity-pools create aws-pool "
            "--location=global "
            "--description='AWS Workload Identity Pool'",

            # Add AWS provider
            "gcloud iam workload-identity-pools providers create-aws aws-provider "
            "--location=global "
            "--workload-identity-pool=aws-pool "
            "--account-id=123456789012",

            # Bind service account
            "gcloud iam service-accounts add-iam-policy-binding "
            "cross-cloud@project.iam.gserviceaccount.com "
            "--role=roles/iam.workloadIdentityUser "
            "--member='principalSet://iam.googleapis.com/"
            "projects/PROJECT_NUM/locations/global/"
            "workloadIdentityPools/aws-pool/attribute.aws_role/"
            "arn:aws:sts::123456789012:assumed-role/my-role'"
        ]
        return commands

11. Cloud-Native Portability

11.1 Kubernetes-based Abstraction

# Multi-cloud Kubernetes Deployment (Helm values)
# values-aws.yaml
cloud:
  provider: aws
  region: us-east-1
  storageClass: gp3
  ingressClass: alb
  serviceAnnotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

# values-gcp.yaml
cloud:
  provider: gcp
  region: us-central1
  storageClass: pd-ssd
  ingressClass: gce
  serviceAnnotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "backend-config"}'

# values-azure.yaml
cloud:
  provider: azure
  region: eastus
  storageClass: managed-premium
  ingressClass: azure-application-gateway
  serviceAnnotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "false"

11.2 Terraform Multi-Cloud Modules

# modules/compute/main.tf - Cloud Abstraction Layer
variable "cloud_provider" {
  type = string
  validation {
    condition     = contains(["aws", "gcp", "azure"], var.cloud_provider)
    error_message = "Supported providers: aws, gcp, azure"
  }
}

variable "instance_config" {
  type = object({
    name          = string
    cpu           = number
    memory_gb     = number
    disk_gb       = number
    os            = string
  })
}

# AWS Implementation
module "aws_compute" {
  source = "./aws"
  count  = var.cloud_provider == "aws" ? 1 : 0

  instance_type = local.aws_instance_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  ami_id      = local.aws_ami_map[var.instance_config.os]
  volume_size = var.instance_config.disk_gb
  name        = var.instance_config.name
}

# GCP Implementation
module "gcp_compute" {
  source = "./gcp"
  count  = var.cloud_provider == "gcp" ? 1 : 0

  machine_type = local.gcp_machine_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image      = local.gcp_image_map[var.instance_config.os]
  disk_size  = var.instance_config.disk_gb
  name       = var.instance_config.name
}

# Azure Implementation
module "azure_compute" {
  source = "./azure"
  count  = var.cloud_provider == "azure" ? 1 : 0

  vm_size     = local.azure_vm_map[
    "${var.instance_config.cpu}-${var.instance_config.memory_gb}"
  ]
  image_ref   = local.azure_image_map[var.instance_config.os]
  disk_size   = var.instance_config.disk_gb
  name        = var.instance_config.name
}

output "instance_id" {
  value = coalesce(
    try(module.aws_compute[0].instance_id, ""),
    try(module.gcp_compute[0].instance_id, ""),
    try(module.azure_compute[0].instance_id, "")
  )
}

11.3 OCI Container Image Strategy

# Multi-stage Build - Cloud-independent Image
FROM golang:1.22-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server

# Final Image - distroless (cloud-agnostic)
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/server /server
COPY --from=builder /app/config /config

EXPOSE 8080
USER nonroot:nonroot
ENTRYPOINT ["/server"]

12. Multi-Cloud Disaster Recovery (DR)

12.1 DR Strategy Comparison

| DR Strategy | RTO | RPO | Cost | Complexity |
|---|---|---|---|---|
| Backup and Restore | Hours | Hours | Low | Low |
| Pilot Light | 10-30 min | Minutes | Medium | Medium |
| Warm Standby | Minutes | Seconds | High | High |
| Active-Active | Seconds | Near zero | Very high | Very high |
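
The comparison maps directly to a selection rule: the tighter the RTO/RPO targets, the more expensive the required strategy. A sketch with illustrative thresholds (not vendor guidance):

```python
def choose_dr_strategy(rto_seconds: float, rpo_seconds: float) -> str:
    """Map RTO/RPO targets to the cheapest adequate DR strategy.
    Thresholds are illustrative cut-offs for the four common tiers."""
    if rto_seconds <= 60 and rpo_seconds <= 5:
        return "Active-Active"
    if rto_seconds <= 10 * 60:
        return "Warm Standby"
    if rto_seconds <= 60 * 60:
        return "Pilot Light"
    return "Backup and Restore"
```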

12.2 Multi-Cloud DR Implementation

# Multi-Cloud DR Orchestrator
class MultiCloudDR:
    def __init__(self):
        self.primary = AWSProvider(region="us-east-1")
        self.secondary = AzureProvider(region="eastus")
        self.health_checker = HealthChecker()

    def execute_failover(self):
        """Failover to Secondary (Azure) when Primary (AWS) fails"""
        steps = [
            self._verify_secondary_health,
            self._promote_database_replica,
            self._update_dns_records,
            self._scale_up_secondary,
            self._verify_application_health,
            self._notify_stakeholders
        ]

        for index, step in enumerate(steps):
            try:
                step()
            except Exception as exc:
                # Undo the steps completed so far, then surface the failure
                self._rollback_failover(index)
                raise FailoverError(f"Failed at step: {step.__name__}") from exc

    def _promote_database_replica(self):
        """Promote Azure SQL read replica to primary"""
        self.secondary.promote_replica(
            server="dr-sql-server",
            database="production-db",
            failover_group="prod-failover-group"
        )

    def _update_dns_records(self):
        """Update Route 53 / Azure DNS records"""
        self.primary.update_dns(
            zone="example.com",
            record="api.example.com",
            target=self.secondary.get_endpoint(),
            ttl=60
        )

    def continuous_replication(self):
        """Continuous data synchronization"""
        replication_config = {
            "database": {
                "type": "async",
                "lag_threshold_seconds": 30,
                "source": "aws-rds-primary",
                "target": "azure-sql-replica"
            },
            "storage": {
                "type": "incremental",
                "interval_minutes": 15,
                "source": "s3://prod-bucket",
                "target": "azure://prod-container"
            },
            "secrets": {
                "type": "sync",
                "source": "aws-secrets-manager",
                "target": "azure-key-vault"
            }
        }
        return replication_config
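A class like this still needs something to decide *when* to fail over. The sketch below is hypothetical wiring: `probe` and `failover` stand in for the `HealthChecker` and `MultiCloudDR.execute_failover` above. It shows one common guard, requiring several consecutive failed health probes before triggering, so a transient blip does not cause a full cross-cloud failover:

```python
import time

# Hypothetical watchdog: only fail over after `threshold` consecutive
# failed probes, to avoid flapping on transient errors.
def run_failover_watchdog(probe, failover, threshold=3, interval_s=0, max_checks=10):
    failures = 0
    for _ in range(max_checks):
        if probe():
            failures = 0          # a healthy probe resets the counter
        else:
            failures += 1
            if failures >= threshold:
                failover()        # e.g. MultiCloudDR().execute_failover()
                return True
        if interval_s:
            time.sleep(interval_s)
    return False

# Simulated probe sequence: healthy, healthy, then persistent failure.
results = iter([True, True, False, False, False, True])
triggered = run_failover_watchdog(lambda: next(results), lambda: None)
```

In production the interval and threshold become an RTO trade-off: three probes at 30-second intervals already adds ~90 seconds before failover even begins.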

13. Multi-Cloud Cost Management

13.1 Unified Cost Monitoring

# Multi-Cloud Cost Dashboard Collector
class MultiCloudCostCollector:
    def collect_all_costs(self, period="monthly"):
        aws_costs = self._get_aws_costs(period)
        gcp_costs = self._get_gcp_costs(period)
        azure_costs = self._get_azure_costs(period)

        return {
            "total": aws_costs["total"] + gcp_costs["total"] + azure_costs["total"],
            "by_provider": {
                "aws": aws_costs,
                "gcp": gcp_costs,
                "azure": azure_costs
            },
            "by_service": self._aggregate_by_service(
                aws_costs, gcp_costs, azure_costs
            ),
            "by_team": self._aggregate_by_tag(
                "team", aws_costs, gcp_costs, azure_costs
            ),
            "anomalies": self._detect_anomalies(
                aws_costs, gcp_costs, azure_costs
            ),
            "recommendations": self._generate_optimization_recommendations(
                aws_costs, gcp_costs, azure_costs
            )
        }

    def _detect_anomalies(self, *provider_costs):
        """Detect cost anomalies"""
        anomalies = []
        for costs in provider_costs:
            for service, cost in costs["by_service"].items():
                avg = cost.get("rolling_avg_30d", 0)
                current = cost.get("current", 0)
                if avg > 0 and current > avg * 1.5:
                    anomalies.append({
                        "service": service,
                        "provider": costs["provider"],
                        "current": current,
                        "average": avg,
                        "increase_pct": (current - avg) / avg * 100
                    })
        return anomalies
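To make the 1.5x rolling-average rule concrete, here is the same check applied inline to a small cost snapshot (the figures are made up for illustration):

```python
# Worked example of the 1.5x rolling-average rule used by _detect_anomalies.
costs = {
    "provider": "aws",
    "by_service": {
        "ec2": {"rolling_avg_30d": 1000.0, "current": 1200.0},  # +20%, below threshold
        "s3":  {"rolling_avg_30d": 200.0,  "current": 350.0},   # +75%, flagged
    },
}

anomalies = [
    {
        "service": svc,
        "increase_pct": (c["current"] - c["rolling_avg_30d"]) / c["rolling_avg_30d"] * 100,
    }
    for svc, c in costs["by_service"].items()
    if c["rolling_avg_30d"] > 0 and c["current"] > c["rolling_avg_30d"] * 1.5
]
print(anomalies)  # [{'service': 's3', 'increase_pct': 75.0}]
```

Note that a fixed multiplier is deliberately coarse: it catches runaway spend without paging anyone for ordinary month-to-month drift.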

13.2 Cost Optimization Strategies

Strategy                     Savings   Applicable To
Reserved/Committed Use       30-60%    Stable workloads
Spot/Preemptible Instances   60-90%    Batch, testing
Auto Scaling                 20-40%    Variable traffic
Right-Sizing                 15-35%    All workloads
Storage Tiering              40-70%    Archive data
Network Optimization         10-30%    Cross-cloud communication
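These strategies combine: a real fleet is rarely all reserved or all spot. A back-of-the-envelope blend, using mid-range discounts from the table as assumptions (the hourly rate and hour splits below are illustrative, not real pricing):

```python
# Hypothetical blended-rate estimate for a fleet split across pricing models.
on_demand_rate = 0.10  # assumed $/instance-hour

fleet = {
    "reserved":  {"hours": 7000, "discount": 0.45},  # stable baseline, ~45% off
    "spot":      {"hours": 2000, "discount": 0.75},  # batch jobs, ~75% off
    "on_demand": {"hours": 1000, "discount": 0.0},   # unpredictable spikes
}

cost = sum(v["hours"] * on_demand_rate * (1 - v["discount"]) for v in fleet.values())
baseline = sum(v["hours"] for v in fleet.values()) * on_demand_rate
savings_pct = (baseline - cost) / baseline * 100
```

Even with most capacity still priced conservatively, the blended bill lands well below pure on-demand, which is why commitment planning is usually the first optimization lever.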

14. Governance and Compliance

14.1 Multi-Cloud Governance Framework

# Open Policy Agent (OPA) - Multi-Cloud Policy
package multicloud.governance

# Require mandatory tags on all resources
required_tags := ["environment", "team", "cost-center", "data-classification"]

deny[msg] {
  resource := input.resource
  tag := required_tags[_]
  not resource.tags[tag]
  msg := sprintf(
    "Resource %v is missing required tag: %v",
    [resource.name, tag]
  )
}

# Data sovereignty - certain data only in specific regions
deny[msg] {
  resource := input.resource
  resource.tags["data-classification"] == "pii-eu"
  not startswith(resource.region, "eu-")
  not startswith(resource.region, "europe")
  msg := sprintf(
    "EU PII data must be stored in EU region. Resource %v in %v",
    [resource.name, resource.region]
  )
}

# Prevent cost limit overruns
deny[msg] {
  resource := input.resource
  resource.type == "compute_instance"
  resource.monthly_cost > 5000
  not resource.tags["approved-high-cost"] == "true"
  msg := sprintf(
    "Instance %v exceeds $5000/month limit. Get approval first.",
    [resource.name]
  )
}

# Mandatory encryption
deny[msg] {
  resource := input.resource
  resource.type == "storage_bucket"
  not resource.encryption.enabled
  msg := sprintf(
    "Storage %v must have encryption enabled",
    [resource.name]
  )
}
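In CI these rules would normally be evaluated with `opa eval` or conftest against Terraform plan output. As a language-neutral illustration, the required-tags rule can be mirrored as a pre-flight check in Python (a hypothetical helper, not a replacement for OPA):

```python
# Hypothetical CI pre-flight check mirroring the OPA required_tags rule above.
REQUIRED_TAGS = ["environment", "team", "cost-center", "data-classification"]

def missing_tag_violations(resource: dict) -> list:
    """Return one violation message per missing required tag."""
    return [
        f"Resource {resource['name']} is missing required tag: {tag}"
        for tag in REQUIRED_TAGS
        if tag not in resource.get("tags", {})
    ]

resource = {"name": "web-vm", "tags": {"environment": "prod", "team": "platform"}}
print(missing_tag_violations(resource))
# ['Resource web-vm is missing required tag: cost-center',
#  'Resource web-vm is missing required tag: data-classification']
```

Keeping the canonical policy in Rego and only mirroring it for fast local feedback avoids the two copies drifting into different rules.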

15. Practice Quiz

Q1. What are the key differences between Active-Active and Active-Passive patterns in multi-cloud architecture, and when is each appropriate?

Active-Active: Both clouds simultaneously process traffic. Provides maximum availability but has high data synchronization complexity and cost. Best for mission-critical and global services.

Active-Passive: Only the primary cloud processes traffic while the secondary stands by for DR. Cost is reasonable but some downtime occurs during failover. Best for high availability requirements with cost sensitivity.

The key difference is simultaneous processing and failover time (RTO). Active-Active has near-zero RTO, while Active-Passive has an RTO of several minutes.

Q2. Explain the difference between Replatform and Refactor in the 6R migration strategy and provide suitable use cases for each.

Replatform: Maintain core architecture while replacing parts with cloud-managed services. For example, migrating self-managed MySQL to Amazon RDS, or replacing self-managed Redis with ElastiCache. Moderate effort yields quick operational benefits.

Refactor: Complete redesign for cloud-native. Decompose monoliths into microservices or transition to serverless architecture. Requires the highest effort but delivers maximum cloud benefits long-term.

Replatform is suitable when quick wins are needed; Refactor when long-term innovation is the goal.

Q3. What is Workload Identity Federation, and why is it more secure than service account keys?

Workload Identity Federation is a mechanism that exchanges external IdP credentials (AWS IAM, Azure AD, etc.) for temporary GCP tokens.

Why it is more secure than service account keys:

  1. No key management: No need to create/distribute/rotate JSON key files
  2. Temporary tokens: Exchanged tokens auto-expire after 1 hour
  3. Least privilege: Fine-grained access control based on specific attributes (roles, tags)
  4. Easy auditing: All token exchanges recorded in Cloud Audit Logs
  5. Reduced leak risk: No long-lived credentials exist to be leaked

Q4. Describe the key steps and considerations when migrating a monolith to microservices using the Strangler Fig pattern.

Key Steps:

  1. Add proxy/API Gateway: Route all traffic through the gateway
  2. Identify capabilities: Identify bounded contexts that can be separated
  3. Incremental extraction: Extract one microservice at a time, change routing at the gateway
  4. Data separation: Separate from shared DB to per-service databases
  5. Shrink the monolith: Retire the monolith after all capabilities are extracted

Considerations:

  • Database separation is the hardest part -- watch for transactional consistency
  • Avoid extracting too many services at once
  • Carefully decide inter-service communication patterns (sync/async)
  • Build monitoring/observability before starting migration
  • Rollback plan is essential
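The incremental-extraction step above amounts to changing one routing entry at the gateway. A minimal sketch (service names and URLs are hypothetical):

```python
# Hypothetical Strangler Fig routing table: paths migrate one at a time
# from the monolith to extracted services by changing a single mapping.
routes = {
    "/orders":  "http://orders-svc",  # already extracted
    "/billing": "http://monolith",    # not yet extracted
    "/catalog": "http://monolith",
}

def resolve(path: str) -> str:
    """Longest-prefix match; unknown paths default to the monolith."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    return "http://monolith"

# Step 3 (incremental extraction) is a one-line change at the gateway:
routes["/billing"] = "http://billing-svc"
```

Defaulting unknown paths to the monolith is what makes the migration safe: anything not yet extracted keeps working without a routing entry.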

Q5. Suggest at least 3 strategies to optimize data egress costs in a multi-cloud environment.

  1. Data Locality Design: Place processing engines in the cloud where data resides. Move compute, not data
  2. CDN Utilization: Cache outbound traffic using CDN to reduce origin egress
  3. Compression and Protocol Optimization: Reduce transmitted data size using efficient serialization like gRPC and Protobuf
  4. Dedicated Connections: Direct Connect, ExpressRoute, etc. offer lower egress costs compared to internet transfer
  5. Async Batch Transfer: Instead of real-time transfer, batch data and send during off-peak hours
  6. Private Peering: Direct peering between clouds through neutral exchange points like Megaport or Equinix Fabric
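Strategy 3 is the easiest to quantify: repetitive JSON compresses dramatically before cross-cloud transfer. A quick sketch with the standard library (actual savings depend on how redundant your payloads are):

```python
import gzip
import json

# Compress a repetitive JSON payload before cross-cloud transfer.
# Egress is billed on bytes sent, so the ratio maps directly to cost.
records = [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(1000)]
raw = json.dumps(records).encode()
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)  # fraction of the original egress volume
```

Binary formats like Protobuf shrink the payload further still, but gzip on JSON is a zero-code-change win when both sides already speak HTTP with `Content-Encoding`.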

