Chaos and Order

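Introduction: Why Do You Need a Service Mesh?

As microservice architectures have become mainstream, environments where tens to hundreds of services communicate over the network are now common. The following problems come up repeatedly in this kind of service-to-service communication.

Security: If service-to-service traffic is not encrypted, it can be eavesdropped on even inside the internal network. Implementing TLS and managing certificates separately in every service is a heavy operational burden.

Lack of observability: As a request passes through multiple services, it is hard to tell where latency is introduced or which service is returning errors.

Difficult traffic control: Advanced traffic management such as canary deployments, A/B tests, and circuit breakers has to be implemented directly in application code.

A service mesh addresses these problems at the infrastructure layer. Security, observability, and traffic-control features can be added transparently at the network level with little or no change to application code (trace-header propagation, covered in section 7.2, is the main exception).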

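1. Service Mesh Architecture

A service mesh consists of two planes.

1.1 Data Plane

The data plane is the set of proxies that handle the actual service traffic. In the sidecar model, a proxy is deployed next to the application container in each Pod and intercepts all inbound and outbound traffic.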

┌─────────────────────────────────────────────┐
│                   Pod                        │
│  ┌─────────────┐    ┌─────────────────────┐ │
│  │  Application │◄──►│   Sidecar Proxy     │ │
│  │  Container   │    │  (Envoy/linkerd2)   │ │
│  └─────────────┘    └─────────────────────┘ │
└─────────────────────────────────────────────┘

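Main responsibilities of the sidecar proxy:

- Transparently intercepting all traffic (typically with iptables rules)
- Performing mTLS encryption and decryption
- Load balancing (round robin, least request, and so on)
- Collecting metrics and generating distributed-tracing spans and headers
- Applying retries, timeouts, and circuit breaking

1.2 Control Plane

The control plane centrally manages and configures the data-plane proxies.

Istio's control plane (Istiod):

# Main functions handled by Istiod
- Service discovery: syncs the service list from the Kubernetes API
- Configuration distribution: translates VirtualService, DestinationRule, and similar resources into Envoy configuration
- Certificate management: issues and renews mTLS certificates (built-in CA)
- Policy enforcement: distributes AuthorizationPolicy and PeerAuthentication

Linkerd's control plane:

# Linkerd control plane components
- destination: service discovery and policy distribution
- identity: issues mTLS certificates (based on the trust anchor)
- proxy-injector: automatically injects the sidecar when a Pod is created
- heartbeat: a CronJob that periodically reports anonymous usage data (it does not collect mesh telemetry)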

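2. Istio in Depth

2.1 Istio Architecture Overview

Istio is one of the most feature-rich service meshes. It was started by Google, IBM, and Lyft, and is now a CNCF graduated project.

# Install Istio (istioctl); the version is pinned so the directory name below matches
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.24.0 sh -
cd istio-1.24.0
export PATH=$PWD/bin:$PATH

# Profile-based install (the demo profile is intended for evaluation, not production)
istioctl install --set profile=demo -y

# Enable automatic sidecar injection for the namespace
kubectl label namespace default istio-injection=enabled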

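2.2 Envoy Sidecar Proxy

Istio's data plane uses the Envoy proxy. Envoy is a high-performance L4/L7 proxy written in C++ that provides the following features.

# Core Envoy features
- HTTP/1.1, HTTP/2, and gRPC support
- Automatic retries and circuit breaking
- Dynamic configuration updates (xDS API)
- Rich metrics and tracing
- WebAssembly (Wasm) extensions
- Hot restart (graceful restart)

Memory overhead is roughly 40-100MB per Pod, and each proxy hop typically adds tail latency on the order of a few milliseconds. Actual figures depend on the Istio version, mesh size, and configuration.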

2.3 VirtualService

VirtualService is the core Istio resource for defining traffic-routing rules.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
    - reviews
  http:
    # Canary release: 90% to v1, 10% to v2
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
      timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
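
To make the 90/10 split concrete, here is a minimal sketch of weighted destination selection. It is not how Envoy implements it internally; the function name `pick_subset` and the idea of passing in an explicit `roll` (instead of a random number) are assumptions made so the behavior is easy to verify.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_subset(routes, roll):
    """Return the subset whose cumulative weight range contains roll.

    routes: list of (subset_name, weight) pairs, as in the VirtualService above.
    roll:   a number in [0, total weight); a real proxy would draw it at random.
    """
    names = [name for name, _ in routes]
    bounds = list(accumulate(weight for _, weight in routes))
    if not 0 <= roll < bounds[-1]:
        raise ValueError("roll must be in [0, total weight)")
    return names[bisect_right(bounds, roll)]


routes = [("v1", 90), ("v2", 10)]

# Walk every point on the 0-99 scale once to see the resulting split.
counts = {"v1": 0, "v2": 0}
for roll in range(100):
    counts[pick_subset(routes, roll)] += 1
print(counts)  # {'v1': 90, 'v2': 10}
```

With a uniformly random roll per request, about 10% of requests land on v2, which is the share the canary receives.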

2.4 DestinationRule

DestinationRule defines the policies applied to traffic after routing has been decided.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    loadBalancer:
      simple: LEAST_REQUEST
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
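
The outlierDetection settings above can be read as a small state machine. The sketch below is a simplified model under stated assumptions: it only tracks consecutive 5xx responses and the ejection cap, and it ignores interval, baseEjectionTime, and re-admission of ejected hosts. The class name `OutlierDetector` is hypothetical and is not part of Istio or Envoy.

```python
class OutlierDetector:
    """Toy model of consecutive5xxErrors + maxEjectionPercent."""

    def __init__(self, hosts, consecutive_5xx=5, max_ejection_percent=50):
        self.streak = {host: 0 for host in hosts}
        self.ejected = set()
        self.threshold = consecutive_5xx
        # Upper bound on how many hosts may be ejected at the same time.
        self.max_ejected = len(hosts) * max_ejection_percent // 100

    def record(self, host, status):
        # A 5xx extends the failure streak; any other status resets it.
        self.streak[host] = self.streak[host] + 1 if status >= 500 else 0
        if (self.streak[host] >= self.threshold
                and host not in self.ejected
                and len(self.ejected) < self.max_ejected):
            self.ejected.add(host)

    def healthy_hosts(self):
        return [host for host in self.streak if host not in self.ejected]


detector = OutlierDetector(["a", "b", "c", "d"])
for _ in range(5):
    detector.record("a", 503)
print(detector.healthy_hosts())  # ['b', 'c', 'd']
```

The cap matters: with four hosts and maxEjectionPercent: 50, a third failing host would stay in rotation in this model, so the load balancer never runs out of endpoints.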

2.5 Gateway

An Istio Gateway manages traffic entering the mesh from outside.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: bookinfo-cert
      hosts:
        - "bookinfo.example.com"

2.6 PeerAuthentication

PeerAuthentication defines the mTLS policy for service-to-service traffic.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  # Apply STRICT mTLS mesh-wide (a policy in the root namespace applies to the whole mesh)
  mtls:
    mode: STRICT
---
# PERMISSIVE mode for one specific namespace only
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-compat
  namespace: legacy-apps
spec:
  mtls:
    mode: PERMISSIVE

2.7 AuthorizationPolicy

AuthorizationPolicy defines access control between services.

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-viewer
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/productpage"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/reviews/*"]
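
As a rough illustration of how the rule above is evaluated, the following sketch checks a request against one ALLOW rule. It is a simplification under stated assumptions: only the principal, method, and path fields are modeled, and the helper names `path_matches` and `is_allowed` are hypothetical. The wildcard handling follows Istio's documented exact, prefix ("abc*"), and suffix ("*abc") matching.

```python
def path_matches(pattern, path):
    """Exact match, or a single leading/trailing '*' wildcard."""
    if pattern == "*":
        return bool(path)
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return path.endswith(pattern[1:])
    return pattern == path


def is_allowed(rule, principal, method, path):
    """All three conditions of the rule must hold for the request to match."""
    return (principal in rule["principals"]
            and method in rule["methods"]
            and any(path_matches(p, path) for p in rule["paths"]))


rule = {
    "principals": ["cluster.local/ns/default/sa/productpage"],
    "methods": ["GET"],
    "paths": ["/reviews/*"],
}
caller = "cluster.local/ns/default/sa/productpage"
print(is_allowed(rule, caller, "GET", "/reviews/1"))   # True
print(is_allowed(rule, caller, "POST", "/reviews/1"))  # False
```

Once any ALLOW policy selects a workload, a request that matches none of its rules is denied, so the second call above would be rejected by the proxy.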

3. Linkerd in Depth

3.1 Linkerd Architecture Overview

Linkerd is a service mesh that aims to be lightweight and simple. It was developed by Buoyant and is a CNCF graduated project.

# Install the Linkerd CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$HOME/.linkerd2/bin:$PATH

# Pre-flight checks
linkerd check --pre

# Install
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# Verify
linkerd check

# Viz extension (dashboard + metrics)
linkerd viz install | kubectl apply -f -

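3.2 linkerd2-proxy: A Micro-Proxy Written in Rust

Linkerd's key differentiator is its data-plane proxy. linkerd2-proxy is written in Rust, which gives it the following advantages.

Performance comparison (linkerd2-proxy vs Envoy; approximate figures that vary by version and workload)
========================================
Memory usage: ~20MB vs ~50-100MB
Added P99 latency: ~1ms vs ~2-5ms
Binary size: ~13MB vs ~50MB
Security: Rust memory-safety guarantees
Feature scope: service-mesh-specific vs general-purpose proxy

linkerd2-proxy keeps its footprint small by implementing only what a service mesh needs. Because it is not a general-purpose proxy like Envoy, it lacks features such as Wasm extensions, but it performs very well on the core features.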

3.3 ServiceProfile

Linkerd's ServiceProfile defines per-service routing and observability settings.

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: webapp.default.svc.cluster.local
  namespace: default
spec:
  routes:
    - name: GET /api/users
      condition:
        method: GET
        pathRegex: /api/users
      responseClasses:
        - condition:
            status:
              min: 500
              max: 599
          isFailure: true
    - name: POST /api/orders
      condition:
        method: POST
        pathRegex: /api/orders
      isRetryable: true  # retrying a POST is only safe if the endpoint is idempotent
      timeout: 10s

3.4 TrafficSplit (SMI)

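Linkerd implements traffic splitting with the SMI (Service Mesh Interface) TrafficSplit resource. Since Linkerd 2.12 this requires the separate linkerd-smi extension, and newer releases also support weighted backends through Gateway API HTTPRoute, so check which TrafficSplit API version your installation serves.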

apiVersion: split.smi-spec.io/v1alpha4
kind: TrafficSplit
metadata:
  name: webapp-split
  namespace: default
spec:
  service: webapp
  backends:
    - service: webapp-v1
      weight: 900
    - service: webapp-v2
      weight: 100

3.5 Linkerd Multi-Cluster

Linkerd supports multi-cluster communication through its multicluster extension.

# Install the multi-cluster extension
linkerd multicluster install | kubectl apply -f -

# Link a remote cluster
linkerd multicluster link --cluster-name=west \
  --api-server-address="https://west.example.com:6443" | \
  kubectl apply -f -

# Check service mirroring
linkerd multicluster gateways

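4. Istio vs Linkerd: Detailed Comparison

Comparison	Istio	Linkerd
Data-plane proxy	Envoy (C++)	linkerd2-proxy (Rust)
Memory overhead (per Pod, approx.)	50-100MB	10-20MB
Added P99 latency (approx.)	2-5ms	0.5-1ms
Installation complexity	High (multiple profiles)	Low (a few CLI commands)
Number of CRDs	About 15 in recent releases (50+ in early releases)	10 or fewer
Learning curve	Steep	Gentle
Traffic management	Very rich (VirtualService)	Basic (ServiceProfile)
Security policy	Fine-grained access control (AuthorizationPolicy)	Default mTLS + Server/Authorization resources
Protocol support	HTTP, gRPC, TCP, WebSocket	HTTP, gRPC, TCP, WebSocket
Wasm extensions	Supported	Not supported
Multi-cluster	Supported (complex)	Supported (relatively simple)
Ambient Mesh	Supported (sidecar-less mode)	Not applicable
Gateway API	Full support	Partial support
Community size	Very large (CNCF graduated)	Large (CNCF graduated)
Operational complexity	High	Low
Best fit	Large scale, complex policy needs	Small to mid scale, simplicity preferred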

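Summary of Selection Criteria

Choose Istio when:

- You need fine-grained traffic management (weighted routing, fault injection, traffic mirroring)
- You need complex security policies (JWT validation, external authorization)
- You need Wasm-based extension plugins
- You want to use Ambient Mesh (sidecar-less mode)

Choose Linkerd when:

- You want to minimize resource overhead
- You want fast adoption and simple operations
- The core features (mTLS, metrics, retries) are enough
- Your operations team is small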

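5. mTLS (Mutual TLS)

5.1 How mTLS Works

In a service mesh, mTLS automatically encrypts and authenticates service-to-service traffic.

Service A (client)                Service B (server)
     │                                      │
     │── ClientHello ──────────────────────►│
     │◄─ ServerHello + server certificate ──│
     │── client certificate ───────────────►│
     │◄─ certificate verification done ─────│
     │                                      │
     │◄══════════ encrypted traffic ═══════►│

Difference from plain TLS: with mTLS, both sides present a certificate and verify the peer's. This lets the server confirm the client's identity as well.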
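
The difference comes down to whether the server demands a client certificate. As a minimal sketch using Python's standard ssl module (the function name `make_server_context` is an assumption for illustration; a mesh proxy does the equivalent on the application's behalf):

```python
import ssl


def make_server_context(require_client_cert):
    """Build a server-side TLS context for plain TLS or mutual TLS."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Plain TLS: the server never asks the client for a certificate.
    # Mutual TLS: the handshake fails unless the client presents a
    # certificate that chains to a CA the server trusts.
    ctx.verify_mode = ssl.CERT_REQUIRED if require_client_cert else ssl.CERT_NONE
    # A real server would also load its own certificate and the trusted CA,
    # for example (file names are hypothetical):
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations("ca.crt")
    return ctx


print(make_server_context(True).verify_mode == ssl.CERT_REQUIRED)  # True
```

In a mesh, the sidecar holds the certificate and performs this verification, so the application itself keeps speaking plaintext to its local proxy.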

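5.2 SPIFFE Identity

Istio uses SPIFFE (Secure Production Identity Framework For Everyone) IDs for workload identity. Linkerd likewise derives identity from the Kubernetes ServiceAccount, although its in-cluster identities have traditionally been written as DNS-like names rather than SPIFFE URIs.

SPIFFE ID format (as used by Istio):
spiffe://cluster.local/ns/NAMESPACE/sa/SERVICE_ACCOUNT

Examples:
spiffe://cluster.local/ns/production/sa/frontend
spiffe://cluster.local/ns/production/sa/backend-api

The SPIFFE ID maps to a Kubernetes ServiceAccount and is embedded in the workload certificate, which lets a Pod prove its identity at the network level.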
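
The ID format is regular enough to parse mechanically. A small sketch, assuming the Kubernetes-style layout shown above (the function name `parse_spiffe_id` is hypothetical):

```python
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id):
    """Split spiffe://TRUST_DOMAIN/ns/NAMESPACE/sa/SERVICE_ACCOUNT into its parts."""
    parsed = urlparse(spiffe_id)
    parts = parsed.path.strip("/").split("/")
    if (parsed.scheme != "spiffe" or len(parts) != 4
            or parts[0] != "ns" or parts[2] != "sa"):
        raise ValueError(f"not a Kubernetes-style SPIFFE ID: {spiffe_id}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }


print(parse_spiffe_id("spiffe://cluster.local/ns/production/sa/frontend"))
```

This is the same information an AuthorizationPolicy matches on when it lists a principal such as cluster.local/ns/production/sa/frontend.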

5.3 Automatic Certificate Rotation

# Istio: workload certificate lifetime settings (istiod environment variables)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      env:
        # Maximum lifetime istiod will grant for a workload certificate
        MAX_WORKLOAD_CERT_TTL: "48h"
        # Default workload certificate lifetime (24 hours is the default)
        DEFAULT_WORKLOAD_CERT_TTL: "24h"

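Certificate management in Linkerd:

# Create the trust anchor (10-year lifetime)
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure --not-after=87600h

# Create the issuer certificate (48-hour lifetime). Linkerd does not renew the
# issuer certificate by itself, so a lifetime this short assumes an external
# rotator such as cert-manager; otherwise choose a longer lifetime.
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after=48h --no-password --insecure \
  --ca ca.crt --ca-key ca.key

# Install using these certificates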
linkerd install \
  --identity-trust-anchors-file ca.crt \
  --identity-issuer-certificate-file issuer.crt \
  --identity-issuer-key-file issuer.key | kubectl apply -f -

6. Traffic Management

6.1 Canary Release

# Istio: gradual canary rollout
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - match:
        - headers:
            x-canary-user:
              exact: "true"
      route:
        - destination:
            host: reviews
            subset: v2
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 95
        - destination:
            host: reviews
            subset: v2
          weight: 5

Automated canary with Flagger:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: reviews
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews
  service:
    port: 9080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
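
The analysis block above drives a simple progression: each interval that passes the metric checks raises the canary weight by stepWeight until maxWeight, while accumulated failures up to threshold trigger a rollback. The sketch below models only that loop. It is a simplification, assuming failed checks accumulate across the run; `canary_weights` is a hypothetical helper, not part of Flagger.

```python
def canary_weights(step_weight, max_weight, threshold, checks):
    """Simulate the canary weight over a series of analysis intervals.

    checks: iterable of bools, one per interval (True = all metrics passed).
    Returns (weight history, outcome).
    """
    weight, failures, history = 0, 0, []
    for passed in checks:
        if passed:
            weight = min(weight + step_weight, max_weight)
            history.append(weight)
            if weight >= max_weight:
                return history, "promote"
        else:
            failures += 1
            if failures >= threshold:
                history.append(0)  # all traffic goes back to the primary
                return history, "rollback"
    return history, "in-progress"


# Values from the Canary above: stepWeight 10, maxWeight 50, threshold 5.
print(canary_weights(10, 50, 5, [True] * 5))
# ([10, 20, 30, 40, 50], 'promote')
```

With a 1m interval, a fully healthy rollout in this model reaches maxWeight after five intervals, which is a useful lower bound when estimating how long a release takes.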

6.2 Traffic Mirroring (Shadow Traffic)

Mirroring sends a copy of production traffic to the new version so its behavior can be verified under real conditions.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-mirror
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
      mirror:
        host: reviews
        subset: v2
      mirrorPercentage:
        value: 100.0

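Key properties of mirroring:

- Responses to mirrored traffic are discarded (no impact on the client)
- The Host/Authority header of mirrored requests gets a -shadow suffix
- The new version's performance and error rate can be checked against real traffic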

6.3 Fault Injection

Faults are injected deliberately to test the system's resilience.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-fault
spec:
  hosts:
    - ratings
  http:
    - fault:
        delay:
          percentage:
            value: 10
          fixedDelay: 5s
        abort:
          percentage:
            value: 5
          httpStatus: 503
      route:
        - destination:
            host: ratings
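
To reason about what callers of ratings will see, the fault settings above can be sketched as two independent decisions per request. This is an illustrative model, assuming the delay and abort percentages are sampled independently; `apply_fault` is a hypothetical function, and the explicit rolls stand in for the random draws a proxy would make.

```python
def apply_fault(delay_roll, abort_roll, delay_pct=10.0, abort_pct=5.0):
    """Decide the injected fault for one request.

    delay_roll, abort_roll: numbers in [0, 100), drawn at random in practice.
    Returns (delay_seconds, http_status), where http_status is None if the
    request is forwarded to the real service.
    """
    delay = 5.0 if delay_roll < delay_pct else 0.0   # fixedDelay: 5s
    status = 503 if abort_roll < abort_pct else None  # httpStatus: 503
    return delay, status


print(apply_fault(3, 50))   # (5.0, None): delayed, then served normally
print(apply_fault(50, 3))   # (0.0, 503):  aborted immediately
print(apply_fault(50, 50))  # (0.0, None): untouched
```

Under that independence assumption, roughly 10% of requests are delayed, 5% are aborted, and about 0.5% are both delayed and aborted.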

6.4 Circuit Breaking

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 50
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 100
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 30
      minHealthPercent: 70

6.5 Retries and Timeouts

# Istio
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-retry
spec:
  hosts:
    - reviews
  http:
    - timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: 5xx,reset,connect-failure,retriable-4xx
      route:
        - destination:
            host: reviews

---
# Linkerd ServiceProfile
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: reviews.default.svc.cluster.local
spec:
  routes:
    - name: GET /reviews
      condition:
        method: GET
        pathRegex: /reviews/.*
      isRetryable: true
      timeout: 10s
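
The overall timeout and the per-try timeout in the Istio VirtualService interact, and a little arithmetic shows how. The sketch below assumes that attempts counts retries after the first try and ignores the short backoff between retries; both helper names are hypothetical.

```python
def worst_case_seconds(timeout, retries, per_try_timeout):
    """Upper bound on how long the caller waits, ignoring retry backoff."""
    tries = 1 + retries  # the first attempt plus the allowed retries
    return min(timeout, tries * per_try_timeout)


def full_tries(timeout, retries, per_try_timeout):
    """How many tries can use their whole per-try timeout before the overall timeout hits."""
    return min(1 + retries, int(timeout // per_try_timeout))


# timeout: 10s, attempts: 3, perTryTimeout: 3s (section 6.5)
print(worst_case_seconds(10, 3, 3), full_tries(10, 3, 3))  # 10 3
# timeout: 5s, attempts: 3, perTryTimeout: 2s (section 2.3)
print(worst_case_seconds(5, 3, 2), full_tries(5, 3, 2))    # 5 2
```

In both configurations the overall timeout is the binding limit, so the last allowed retry is either cut short or never happens. If every retry should get its full per-try budget, the overall timeout needs to be at least (1 + attempts) × perTryTimeout.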

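7. Observability

7.1 Metrics (Prometheus)

A service mesh automatically collects metrics that cover most of the golden signals (saturation usually still comes from resource metrics).

Golden Signals
================================
1. Latency: time taken to serve a request
2. Traffic: requests per second
3. Errors: fraction of failed requests
4. Saturation: resource utilization

Key Istio metrics:
- istio_requests_total: total requests (by source, destination, and response code)
- istio_request_duration_milliseconds: request duration
- istio_request_bytes / istio_response_bytes: request/response size

Key Linkerd metrics:
- request_total: total requests
- response_latency_ms: response latency
- tcp_open_total: total TCP connections opened

# Prometheus scrape configuration (Istio)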
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'envoy-stats'
        metrics_path: /stats/prometheus
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true

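7.2 Distributed Tracing (Jaeger / Zipkin)

The mesh proxies generate spans and trace headers automatically, which makes it possible to follow a request's full path, provided the application forwards those headers.

# Istio telemetry configuration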
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
    - providers:
        - name: jaeger  # must match a tracing provider defined under meshConfig.extensionProviders
      randomSamplingPercentage: 10
      customTags:
        environment:
          literal:
            value: "production"

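Important: the application must propagate the following headers. The proxy generates them automatically, but copying them from inbound requests to outbound requests is the application's responsibility.

Trace headers to propagate: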
- x-request-id
- x-b3-traceid
- x-b3-spanid
- x-b3-parentspanid
- x-b3-sampled
- x-b3-flags
- traceparent (W3C Trace Context)
- tracestate
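
In application code, propagation usually means copying these headers from the incoming request onto every outgoing call made while handling it. A framework-agnostic sketch (the function name `propagation_headers` is hypothetical, and header lookup is treated as case-insensitive):

```python
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def propagation_headers(incoming):
    """Return the subset of incoming headers that must be forwarded downstream."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


incoming = {
    "X-Request-Id": "abc-123",
    "X-B3-TraceId": "463ac35c9f6413ad",
    "Content-Type": "application/json",
}
print(propagation_headers(incoming))
# {'x-request-id': 'abc-123', 'x-b3-traceid': '463ac35c9f6413ad'}
```

The returned dictionary can be merged into the headers of whatever HTTP client the service uses. Without this step each hop starts a new trace and the spans cannot be stitched together.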

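7.3 Kiali Dashboard

Kiali is an observability dashboard built specifically for Istio.

# Install Kiali
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/addons/kiali.yaml

# Open the dashboard
istioctl dashboard kiali

Key Kiali features:

- Service topology graph visualization
- Real-time traffic flow monitoring
- Istio configuration validation and error detection
- Distributed tracing integration
- Metrics-based health status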

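7.4 Grafana Dashboards

# Install Grafana with preconfigured dashboards
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/addons/grafana.yaml

# Open the dashboard
istioctl dashboard grafana

Main dashboards:

- Mesh Dashboard: overview of traffic across the whole mesh
- Service Dashboard: metrics for an individual service
- Workload Dashboard: detailed per-workload information
- Performance Dashboard: resource usage of the mesh itself (proxies and control plane)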

8. Kubernetes Gateway API

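8.1 What Is the Gateway API?

The Kubernetes Gateway API is the next-generation traffic-management standard that succeeds Ingress. Its role-oriented design cleanly separates the responsibilities of infrastructure providers, cluster operators, and application developers.

# GatewayClass: defined by the infrastructure provider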
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: istio
spec:
  controllerName: istio.io/gateway-controller
---
# Gateway: defined by the cluster operator
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  gatewayClassName: istio
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: bookinfo-tls
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              expose: "true"
---
# HTTPRoute: defined by the application developer
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bookinfo-route
spec:
  parentRefs:
    - name: bookinfo-gateway
  hostnames:
    - "bookinfo.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /reviews
      backendRefs:
        - name: reviews
          port: 9080
          weight: 90
        - name: reviews-v2
          port: 9080
          weight: 10

8.2 Istio Gateway vs Kubernetes Gateway API

Traditional Istio approach:
  Gateway + VirtualService + DestinationRule

Kubernetes Gateway API approach:
  GatewayClass + Gateway + HTTPRoute

Benefits:
  - Standardized API (portability across implementations)
  - Role-based access control
  - Better namespace isolation
  - The same API can be used with Istio, Linkerd, Cilium, and others

9. Ambient Mesh

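9.1 Limitations of Sidecars

Problems with the traditional sidecar approach:

- 50-100MB of additional memory per Pod
- An extra proxy hop on every request (latency)
- Pods must be restarted to inject or upgrade the sidecar
- Resource over-provisioning

9.2 Ambient Mesh Architecture

Istio's Ambient Mesh is a newer mode that implements the service mesh without sidecars.

Traditional sidecar mode: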
┌────────────┐    ┌────────────┐
│ App + Envoy│───►│ App + Envoy│
└────────────┘    └────────────┘

Ambient Mesh mode:
┌────────────┐    ┌────────────┐
│    App     │    │    App     │
└─────┬──────┘    └──────┬─────┘
      │                  │
┌─────┴──────────────────┴─────┐  ← ztunnel (one per node, L4)
└──────────────┬───────────────┘
               │
        ┌──────┴──────┐           ← waypoint proxy (optional, L7)
        │   Waypoint  │
        └─────────────┘

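ztunnel (Zero Trust Tunnel):

- Runs as a DaemonSet, one instance per node
- Handles L4 only: mTLS plus L4 authentication and authorization
- Written in Rust and very lightweight
- No Pod restart required

Waypoint Proxy:

- Deployed only where L7 features are needed
- Can be deployed per namespace or per service
- Envoy-based, providing the full L7 feature set

# Install Istio in ambient mode
istioctl install --set profile=ambient -y

# Add the namespace to the ambient mesh
kubectl label namespace default istio.io/dataplane-mode=ambient

# Deploy a waypoint proxy (when L7 features are needed)
istioctl waypoint apply --namespace default --name reviews-waypoint

# Traffic is sent through the waypoint only once the namespace (or a service) is labeled to use it
kubectl label namespace default istio.io/use-waypoint=reviews-waypoint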

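9.3 Benefits of Ambient Mesh

Resource savings comparison (100-Pod cluster, approximate):
========================================
             Sidecar mode           Ambient mode
Memory:      5-10GB extra           200-500MB extra
CPU:         Significant overhead   Minimal overhead
Operations:  Manage every sidecar   Manage only the ztunnel DaemonSet
Upgrades:    Pod restart required   ztunnel rolling update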
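
The sidecar figure in the table is simple multiplication, and the ambient figure can be compared against it directly. The sketch below only reuses numbers already given in this article (50-100MB per sidecar, 200-500MB total for ambient); the function name is hypothetical.

```python
def sidecar_total_mb(pods, per_pod_mb):
    """Total extra memory when every Pod carries its own proxy."""
    return pods * per_pod_mb


low = sidecar_total_mb(100, 50)    # 5000 MB, i.e. about 5GB
high = sidecar_total_mb(100, 100)  # 10000 MB, i.e. about 10GB

# Ambient totals taken from the table above (MB).
ambient_low, ambient_high = 200, 500

# Reduction factor: most conservative and most optimistic pairing.
print(low / ambient_high)   # 10.0
print(high / ambient_low)   # 50.0
```

So on these estimates ambient mode uses somewhere between 10x and 50x less proxy memory for the same 100 Pods. The key structural difference is that sidecar cost scales with the number of Pods while ztunnel cost scales with the number of nodes.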

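10. Security in Depth

10.1 RBAC (Role-Based Access Control)

# Namespace-level deny-all policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
# An ALLOW policy with an empty spec matches nothing, so every request is denied
spec: {}
---
# Allow only a specific service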
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
    - from:
        - source:
            namespaces: ["production"]
            principals: ["cluster.local/ns/production/sa/frontend"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/v1/*"]
      when:
        - key: request.headers[x-api-version]
          values: ["v1", "v2"]

10.2 JWT Validation

# RequestAuthentication: defines how JWTs are validated
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  jwtRules:
    - issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"
      forwardOriginalToken: true
      outputPayloadToHeader: "x-jwt-payload"
---
# Authorization based on JWT claims
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["https://auth.example.com/*"]
      when:
        - key: request.auth.claims[role]
          values: ["admin", "editor"]

10.3 External Authorization

# Integrating an external authorization service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ext-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: CUSTOM
  provider:
    name: "opa-ext-authz"
  rules:
    - to:
        - operation:
            paths: ["/admin/*"]

# Register the external authorization provider in MeshConfig
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    extensionProviders:
      - name: "opa-ext-authz"
        envoyExtAuthzGrpc:
          service: "opa.opa-system.svc.cluster.local"
          port: 9191
          includeRequestBodyInCheck:
            maxRequestBytes: 1024

11. Production Operations Best Practices

11.1 Setting Resource Limits

# Istio sidecar resource limits
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      concurrency: 2
  values:
    global:
      proxy:
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi

11.2 Gradual Rollout Strategy

# Step 1: PERMISSIVE mTLS (existing plaintext traffic is still accepted)
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: PERMISSIVE
EOF

# Step 2: monitor metrics (check the share of traffic that already uses mTLS)
# Look at the connection_security_policy label on the istio_requests_total metric

# Step 3: switch to STRICT mTLS
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT
EOF
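
Step 2 above amounts to computing what fraction of requests already arrive over mTLS before flipping to STRICT. A minimal sketch of that calculation, assuming the request counts have already been fetched from Prometheus and grouped by the connection_security_policy label (the function name and the sample numbers are hypothetical):

```python
def mtls_share(samples):
    """Fraction of requests whose connection_security_policy is mutual_tls.

    samples: iterable of (connection_security_policy, request_count) pairs.
    """
    samples = list(samples)
    total = sum(count for _, count in samples)
    if total == 0:
        return 0.0
    secured = sum(count for policy, count in samples if policy == "mutual_tls")
    return secured / total


# Hypothetical counts for one namespace.
print(mtls_share([("mutual_tls", 950), ("none", 50)]))  # 0.95
```

Switching to STRICT is only safe once this share reaches 1.0 for the workloads in scope. Anything lower means some client is still sending plaintext and would start failing.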

11.3 Debugging Tools

# Check Istio proxy sync status
istioctl proxy-status

# Dump the Envoy configuration
istioctl proxy-config all POD_NAME -o json

# Inspect routing rules
istioctl proxy-config route POD_NAME

# Inspect cluster configuration
istioctl proxy-config cluster POD_NAME

# Analyzer (detects configuration errors)
istioctl analyze --all-namespaces

# Linkerd diagnostics
linkerd check
linkerd diagnostics proxy-metrics po/POD_NAME
linkerd viz stat deploy
linkerd viz top deploy/webapp
linkerd viz tap deploy/webapp

11.4 Horizontal Pod Autoscaler Integration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: reviews-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: reviews
  minReplicas: 3
  maxReplicas: 20
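  # Assumes a custom-metrics adapter (for example Prometheus Adapter) exposes these per-Pod metrics.
  # The metric names below are illustrative and are not provided by Istio out of the box.
  metrics: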
    - type: Pods
      pods:
        metric:
          name: istio_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Pods
      pods:
        metric:
          name: istio_request_duration_milliseconds_p99
        target:
          type: AverageValue
          averageValue: "500"
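
To see how the autoscaler turns these targets into a replica count, here is a sketch of the standard HPA calculation, desired = ceil(current replicas × current value / target value), clamped to the min/max from the manifest. It ignores the tolerance band and stabilization windows, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=3, max_replicas=20):
    """Replica count the HPA would aim for, for a single metric."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas each seeing 150 req/s against a target of 100 req/s per Pod.
print(desired_replicas(4, 150, 100))  # 6
```

When several metrics are configured, as in the manifest above, the HPA computes this for each metric and uses the largest result.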

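11.5 Upgrade Strategy

# Istio canary upgrade
# 1. Install the new control-plane version (revision-based)
istioctl install --set revision=1-24-0

# 2. Switch namespaces over gradually by changing the label
#    (the istio-injection label is removed because it takes precedence over istio.io/rev)
kubectl label namespace default istio-injection- istio.io/rev=1-24-0 --overwrite

# 3. Restart Pods so they pick up the new proxy
kubectl rollout restart deployment -n default

# 4. Remove the previous version
istioctl uninstall --revision 1-23-0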

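12. When Not to Use a Service Mesh

A service mesh is powerful, but it is not the right fit for every environment.

Situations where you should avoid it:

- Few services: with five or fewer services, the complexity of a service mesh can outweigh its benefits.
- The team is not comfortable with Kubernetes: a service mesh is additional complexity layered on top of Kubernetes.
- Extremely constrained resources: when the memory/CPU overhead of sidecar proxies is hard to absorb.
- Extremely performance-sensitive workloads: environments such as HFT (high-frequency trading) where microsecond-level latency matters.

Alternatives to consider:

- Only mTLS is needed: cert-manager plus TLS in the services themselves
- Basic observability: direct instrumentation with OpenTelemetry
- Simple load balancing: Kubernetes Service (ClusterIP)
- Only ingress is needed: NGINX Ingress Controller or Traefik
- Network policy: Kubernetes NetworkPolicy or Cilium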

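Quiz

Q1: Explain the roles of the data plane and the control plane in a service mesh.

Data plane: the set of sidecar proxies that intercept and handle the actual service traffic. It performs mTLS encryption, load balancing, metrics collection, retries and timeouts, and so on. Istio uses Envoy; Linkerd uses linkerd2-proxy.

Control plane: centrally manages and configures the data-plane proxies. It is responsible for service discovery, certificate issuance, policy distribution, and so on. In Istio this is Istiod; in Linkerd it consists of the destination, identity, and proxy-injector components.

Q2: What is the key difference between mTLS and plain TLS?

In plain TLS, only the client verifies the server's certificate. In mTLS (mutual TLS), both sides present a certificate and verify the peer's.

- The client verifies the server's certificate (same as plain TLS)
- The server also verifies the client's certificate (the extra step in mTLS)
- This enables two-way identity verification between services
- The service identity is tied to the Kubernetes ServiceAccount (Istio encodes it as a SPIFFE ID)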

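Q3: Explain the problem Istio's Ambient Mesh solves and describe its architecture.

Problem it solves: the traditional sidecar approach costs roughly 50-100MB of memory per Pod, requires Pod restarts to inject the sidecar, and adds a proxy hop to every request.

Architecture:

- ztunnel: an L4 proxy that runs as a DaemonSet, one per node. It is written in Rust, is very lightweight, and handles only mTLS and L4 authentication/authorization.
- Waypoint Proxy: deployed optionally, only where L7 features are needed. It is Envoy-based and provides the full L7 feature set, including VirtualService-style traffic management.

For 100 Pods, estimated memory usage drops sharply, from 5-10GB (sidecars) to 200-500MB (ambient).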

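Q4: In which situations should you choose Istio, and in which Linkerd?

Reasons to choose Istio:

- Fine-grained traffic management is needed (weighted routing, fault injection, mirroring)
- Complex security policies (JWT validation, external authorization, RBAC)
- Wasm extension plugins are needed
- You want Ambient Mesh (sidecar-less mode)

Reasons to choose Linkerd:

- Minimal resource overhead (roughly 10-20MB per Pod)
- Fast adoption and simple operations
- The core features (mTLS, metrics, retries) are enough
- A small operations team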

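Q5: When should you not adopt a service mesh?

- Five or fewer services: the complexity outweighs the benefits
- The team is inexperienced with Kubernetes: a service mesh is an additional layer of complexity
- Extremely constrained resources: the sidecar memory/CPU overhead cannot be absorbed
- Extremely low-latency requirements: environments where microsecond-level latency matters (HFT and similar)

Alternatives: cert-manager (mTLS), OpenTelemetry (observability), Kubernetes NetworkPolicy (network security), NGINX Ingress (ingress)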