The Complete Guide to API Design & Microservices Architecture

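1. API Design Principles

1.1 Richardson Maturity Model

The REST maturity model proposed by Leonard Richardson classifies how RESTful an API is across four levels (Level 0 through Level 3).

Level	Name	Description
Level 0	The Swamp of POX	Single URI, single HTTP method (usually POST)
Level 1	Resources	Individual resource URIs, still a single method
Level 2	HTTP Verbs	Proper use of HTTP methods (GET, POST, PUT, DELETE)
Level 3	Hypermedia Controls	HATEOAS: responses include links to the next available actions

Most production APIs target Level 2. Level 3 (HATEOAS) is applied selectively, because its implementation cost often outweighs the benefit unless clients actually navigate by the links.

1.2 Resource Naming Conventions

Consistent resource naming is the cornerstone of good API design.

# Good examples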
GET    /api/v1/users
GET    /api/v1/users/123
GET    /api/v1/users/123/orders
POST   /api/v1/users
PUT    /api/v1/users/123
DELETE /api/v1/users/123

# Bad examples
GET    /api/v1/getUsers
POST   /api/v1/createUser
GET    /api/v1/user_list

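Core principles:

Use nouns: resources are expressed as nouns (users, orders, products)
Plurals: collections use plural forms (/users, /orders)
Lowercase + hyphens: use kebab-case (/order-items)
Hierarchical relationships: express them as nested resources (/users/123/orders)
Filtering via query parameters: /users?status=active&role=admin

1.3 HTTP Methods and Status Codes

GET     - Read (safe, idempotent)
POST    - Create (unsafe, non-idempotent)
PUT     - Full replacement (unsafe, idempotent)
PATCH   - Partial update (unsafe, not guaranteed idempotent)
DELETE  - Delete (unsafe, idempotent)
OPTIONS - Query supported methods; also used for CORS preflight (safe, idempotent)
HEAD    - Same as GET but returns headers only (safe, idempotent)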

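Key status code guide:

Code	Meaning	When to Use
200	OK	Successful GET, PUT, PATCH
201	Created	Successful POST that creates a resource (include a Location header)
204	No Content	Successful request with no response body (typically DELETE)
400	Bad Request	Malformed request
401	Unauthorized	Missing or invalid credentials (authentication failure)
403	Forbidden	Authenticated but not permitted
404	Not Found	Resource does not exist
409	Conflict	Request conflicts with the current resource state
422	Unprocessable Entity	Well-formed request that fails validation
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Unexpected server error

1.4 Request/Response Design

A consistent response format greatly improves the client development experience.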

{
  "data": {
    "id": "user_123",
    "type": "user",
    "attributes": {
      "name": "Hong Gil-dong",
      "email": "hong@example.com",
      "created_at": "2026-04-12T10:00:00Z"
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-12T10:00:00Z"
  }
}

Pagination response:

{
  "data": [],
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 150,
    "total_pages": 8,
    "next_cursor": "eyJpZCI6MTAwfQ=="
  }
}
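
The total_pages value above follows directly from total and per_page (150 items at 20 per page gives 8 pages). The following is a minimal TypeScript sketch of that arithmetic. The names buildPagination and offsetFor are illustrative and not part of any library, and the cursor field is left out because its encoding is not specified here.

```typescript
// Page-based pagination metadata matching the "pagination" object above.
interface Pagination {
  page: number;
  per_page: number;
  total: number;
  total_pages: number;
}

function buildPagination(page: number, perPage: number, total: number): Pagination {
  if (Math.floor(page) !== page || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (Math.floor(perPage) !== perPage || perPage < 1) {
    throw new RangeError("per_page must be an integer >= 1");
  }
  return {
    page: page,
    per_page: perPage,
    total: total,
    // Round up so a partially filled last page still counts as a page.
    total_pages: Math.ceil(total / perPage),
  };
}

// Number of rows an offset-based query would skip to reach the given page.
function offsetFor(page: number, perPage: number): number {
  return (page - 1) * perPage;
}
```

For example, buildPagination(1, 20, 150) reports 8 pages, and offsetFor(3, 20) skips the first 40 rows.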

Error response:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request data is invalid",
    "details": [
      {
        "field": "email",
        "message": "Not a valid email format"
      }
    ]
  }
}
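
One way to keep these envelopes consistent is to describe them as types that both server and client share. This is only a sketch under that assumption: the type names (ApiSuccess, ApiError, FieldError) and the helpers are hypothetical, and only the field names come from the JSON examples above.

```typescript
// Shapes mirroring the success and error envelopes shown above.
interface FieldError {
  field: string;
  message: string;
}

interface ApiError {
  error: { code: string; message: string; details: FieldError[] };
}

interface ApiSuccess<T> {
  data: T;
  meta: { request_id: string; timestamp: string };
}

type ApiResponse<T> = ApiSuccess<T> | ApiError;

// Builds the VALIDATION_ERROR body from a list of per-field problems.
function validationError(details: FieldError[]): ApiError {
  return {
    error: {
      code: "VALIDATION_ERROR",
      message: "The request data is invalid",
      details: details,
    },
  };
}

// Type guard so callers can branch on the envelope without casts.
function isError<T>(res: ApiResponse<T>): res is ApiError {
  return "error" in res;
}
```

A client can then write `if (isError(res)) { ... }` and get typed access to either `res.error.details` or `res.data`.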

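2. REST vs GraphQL vs gRPC

2.1 Comparison Table

Feature	REST	GraphQL	gRPC
Protocol	HTTP (any version)	Usually HTTP (transport-agnostic)	HTTP/2
Data Format	JSON/XML	JSON	Protocol Buffers
Schema	OpenAPI (optional)	SDL (required)	.proto (required)
Type Safety	Weak	Strong	Very strong
Over/Under-fetching	Present	Resolved	Reduced by purpose-built messages
Real-time	WebSocket	Subscription	Bidirectional streaming
Browser Support	Native	Native	Requires grpc-web
Performance	Moderate	Moderate	High
Learning Curve	Low	Medium	High

2.2 Best Use Cases for Each

REST works best for:

Public APIs (Open API)
Simple CRUD operations
Cases where caching matters (leveraging HTTP caching)
Calls made directly from the browser

GraphQL works best for:

Mobile apps (bandwidth optimization)
Complex data relationships
Multiple clients that need different data shapes
Fast frontend development cycles

gRPC works best for:

Internal communication between microservices
High performance requirements
Bidirectional streaming
Polyglot environments

2.3 GraphQL Example

# Schema definition (OrderStatus, OrderItem, and the input types are assumed to be defined elsewhere)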
type User {
  id: ID!
  name: String!
  email: String!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

type Query {
  user(id: ID!): User
  users(page: Int, limit: Int): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
}

# Client query - request only the fields it needs
query GetUserWithOrders {
  user(id: "123") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

2.4 gRPC Example

syntax = "proto3";

package ecommerce;

service UserService {
  rpc GetUser (GetUserRequest) returns (UserResponse);
  rpc ListUsers (ListUsersRequest) returns (stream UserResponse);
  rpc CreateUser (CreateUserRequest) returns (UserResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message UserResponse {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

message ListUsersRequest {
  int32 page = 1;
  int32 limit = 2;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

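3. API Versioning

3.1 Versioning Strategy Comparison

Strategy	Example	Pros	Cons
URI versioning	/api/v1/users	Intuitive, cache-friendly	URI pollution
Header versioning	Accept: application/vnd.api+json;version=1	Clean URIs	Harder to test
Query versioning	/api/users?version=1	Simple to implement	Complicates caching
Content negotiation	Accept: application/vnd.company.v1+json	Standards-compliant	Complex

URI versioning is the most widely used strategy and is recommended for its practicality and clarity.

3.2 Backward Compatibility Principles

Backward-compatible (non-breaking) changes:
  - Adding new endpoints
  - Adding new fields to responses
  - Adding optional request parameters
  - Adding new enum values (only if clients tolerate unknown values)

Backward-incompatible (breaking) changes:
  - Removing or renaming existing fields
  - Changing field types
  - Adding required parameters
  - Changing the response structure
  - Changing URL paths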
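
Adding an enum value is only non-breaking when clients tolerate values they do not recognize. The sketch below shows one way a client could do that, often called a tolerant reader. The status names PENDING, PAID, and SHIPPED are hypothetical examples, not values defined by this guide.

```typescript
// Statuses this client version knows about (hypothetical values).
const KNOWN_STATUSES = ["PENDING", "PAID", "SHIPPED"] as const;
type KnownStatus = (typeof KNOWN_STATUSES)[number];

// "UNKNOWN" absorbs any value the server adds after this client shipped.
type OrderStatus = KnownStatus | "UNKNOWN";

function parseOrderStatus(raw: string): OrderStatus {
  const known = (KNOWN_STATUSES as readonly string[]).indexOf(raw) >= 0;
  return known ? (raw as KnownStatus) : "UNKNOWN";
}
```

With this in place, a server that later starts returning a new status degrades the client to a generic "unknown" display instead of a parse failure.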

3.3 API Deprecation Strategy

Phase 1: Add Sunset and Deprecation headers
  Sunset: Fri, 01 Jan 2027 00:00:00 GMT
  Deprecation: true
  Link: <https://api.example.com/v2/docs>; rel="successor-version"

Phase 2: Include warnings in responses (6 months)
Phase 3: Gradually reduce rate limits (3 months)
Phase 4: Return 410 Gone

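4. Authentication and Authorization

4.1 Authentication Method Comparison

Method	Security Level	Use Case	Complexity
API Key	Low	Internal/partner APIs	Low
OAuth 2.0	High	Delegated user auth	High
JWT	Medium	Stateless authentication	Medium
mTLS	Very high	Service-to-service communication	High

4.2 OAuth 2.0 Flow

Authorization Code Flow (recommended for server-side apps):

1. Client --> Authorization server: request an authorization code
   GET /authorize?response_type=code
     &client_id=CLIENT_ID
     &redirect_uri=CALLBACK_URL
     &scope=read:user
     &state=RANDOM_STATE

2. User --> Authorization server: log in and grant consent

3. Authorization server --> Client: return the authorization code
   302 Redirect: CALLBACK_URL?code=AUTH_CODE&state=RANDOM_STATE
   (the client must check that state matches the value it sent)

4. Client --> Authorization server: exchange the code for tokens
   POST /token
     grant_type=authorization_code
     &code=AUTH_CODE
     &redirect_uri=CALLBACK_URL
     &client_id=CLIENT_ID
     &client_secret=CLIENT_SECRET

5. Authorization server --> Client: access token + refresh token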
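
Step 1 and the state check in step 3 can be sketched as two small functions. This is an illustration under stated assumptions: the endpoint and callback URLs are placeholders, the function names are hypothetical, and generating a cryptographically random state value is left to the caller.

```typescript
interface AuthorizeParams {
  authorizeEndpoint: string; // e.g. the authorization server's /authorize URL
  clientId: string;
  redirectUri: string;
  scope: string;
  state: string; // unguessable value the client stores before redirecting
}

// Builds the step 1 redirect URL with every parameter percent-encoded.
function buildAuthorizeUrl(p: AuthorizeParams): string {
  const query: Array<[string, string]> = [
    ["response_type", "code"],
    ["client_id", p.clientId],
    ["redirect_uri", p.redirectUri],
    ["scope", p.scope],
    ["state", p.state],
  ];
  const encoded = query.map(
    (pair) => encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]),
  );
  return p.authorizeEndpoint + "?" + encoded.join("&");
}

// Step 3: reject the callback unless state matches the value sent in step 1.
function stateMatches(sent: string, returned: string | undefined): boolean {
  return returned !== undefined && returned === sent;
}
```

Rejecting a callback whose state is missing or different is what protects the flow against cross-site request forgery.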

4.3 JWT Structure and Security Notes

{
  "header": {
    "alg": "RS256",
    "typ": "JWT",
    "kid": "key-id-001"
  },
  "payload": {
    "sub": "user_123",
    "iss": "auth.example.com",
    "aud": "api.example.com",
    "exp": 1744538100,
    "iat": 1744537200,
    "scope": "read:users write:orders",
    "roles": ["admin"]
  }
}

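JWT security checklist:

Prefer an asymmetric algorithm such as RS256 over HS256, so verifiers never hold the signing key
Set short expiration times (15 minutes or less)
Store refresh tokens server-side
Always validate the iss, aud, and exp claims
Reject the none algorithm
Pin the expected algorithm and resolve kid (Key ID) only against trusted keys to prevent key confusion attacks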
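
The claim checks in this list can be sketched as a plain function over an already-decoded token. This is deliberately not a complete verifier: signature verification must be done first with a real JWT library, and the sketch assumes aud is a single string (it may also be an array). All type and function names here are hypothetical.

```typescript
interface JwtHeader {
  alg: string;
  kid?: string;
}

interface JwtClaims {
  iss: string;
  aud: string; // assumption: single audience
  exp: number; // seconds since the Unix epoch
}

interface VerifierConfig {
  issuer: string;
  audience: string;
  algorithm: string; // the one algorithm this API accepts, e.g. "RS256"
  trustedKeyIds: string[];
}

// Returns a list of problems; an empty list means the checks passed.
// NOTE: this does NOT verify the signature.
function checkToken(
  header: JwtHeader,
  claims: JwtClaims,
  config: VerifierConfig,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];
  // Pinning the algorithm rejects "none" and any algorithm swap.
  if (header.alg !== config.algorithm) problems.push("unexpected alg: " + header.alg);
  if (header.kid === undefined || config.trustedKeyIds.indexOf(header.kid) < 0) {
    problems.push("unknown kid");
  }
  if (claims.iss !== config.issuer) problems.push("bad iss");
  if (claims.aud !== config.audience) problems.push("bad aud");
  if (claims.exp <= nowSeconds) problems.push("expired");
  return problems;
}
```

Comparing alg against a single configured value, rather than trusting the header, is what makes the "reject none" rule automatic.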

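4.4 mTLS (Mutual TLS)

Securing service-to-service communication:

1. Issue a unique X.509 certificate to each service
2. Verify certificates in both directions on every connection
3. Renew certificates automatically (cert-manager, etc.)

Pros:
  - Cryptographic proof of service identity
  - Transport-level encryption
  - Foundation for a Zero Trust architecture

Cons:
  - Certificate management complexity
  - Risk of outages when certificates expire
  - Harder debugging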

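5. Rate Limiting

5.1 Algorithm Comparison

Token Bucket:

Principle: tokens are added at a constant rate, and each request consumes a token
Pros: allows bursts while enforcing the average rate
Cons: per-client state (token count and last refill time) must be stored

Example:
  - Bucket size: 100 tokens
  - Refill rate: 10 tokens/second
  - 1 token consumed per request
  - Burst: up to 100 requests at once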
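
The example above can be sketched as a small class. This is a minimal single-process illustration, not a production limiter: the class name is hypothetical, the clock is passed in as a parameter so the behavior is easy to test, and a distributed setup would keep this state in a shared store such as Redis.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes tokens if the request is allowed.
  tryConsume(nowMs: number, cost: number = 1): boolean {
    // Refill lazily, based on time elapsed since the last call.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

With `new TokenBucket(100, 10, 0)`, the first 100 requests at time 0 pass and the next one is rejected. One second later, 10 more requests are allowed.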

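Sliding Window:

Principle: count requests within a sliding time window, weighting the previous fixed window by how much of it still overlaps
Pros: smooths out the boundary bursts of fixed windows
Cons: requires keeping the previous window's counter; the weighted count is an approximation

Example:
  Current time: 12:01:30 (30 seconds into the current window, so 50% of the previous window still overlaps)
  Previous window (12:00-12:01): 80 requests
  Current window (12:01-12:02): 20 requests
  Weighted sum: 80 * 0.5 + 20 = 60 (within the limit of 100)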
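
The weighted sum above can be written as a single function. This sketch covers the sliding window counter variant described here, with hypothetical function names, and assumes the caller tracks the two counters and the time elapsed in the current window.

```typescript
// Estimated number of requests inside the sliding window.
function slidingWindowEstimate(
  previousCount: number,
  currentCount: number,
  elapsedInCurrentMs: number,
  windowMs: number,
): number {
  // Fraction of the previous fixed window still covered by the sliding window.
  const overlap = 1 - elapsedInCurrentMs / windowMs;
  return previousCount * overlap + currentCount;
}

// A request is allowed while the estimate stays below the limit.
function isAllowed(
  previousCount: number,
  currentCount: number,
  elapsedInCurrentMs: number,
  windowMs: number,
  limit: number,
): boolean {
  const estimate = slidingWindowEstimate(
    previousCount,
    currentCount,
    elapsedInCurrentMs,
    windowMs,
  );
  return estimate < limit;
}
```

For the numbers in the example, `slidingWindowEstimate(80, 20, 30000, 60000)` gives 80 * 0.5 + 20 = 60.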

5.2 Rate Limit Response Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744540860
Retry-After: 60

5.3 Rate Limiting with an API Gateway

# Kong Gateway configuration example
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: redis
      redis_host: redis-cluster
      redis_port: 6379
      fault_tolerant: true
      hide_client_headers: false

6. Microservices Patterns

6.1 Service Decomposition - DDD Bounded Context

E-commerce domain analysis:

[Order Context]              [Product Context]
  - Order                    - Product
  - OrderItem                - Category
  - OrderStatus              - Inventory
  - Payment                  - Price

[User Context]               [Shipping Context]
  - User                     - Shipment
  - Address                  - Tracking
  - Authentication           - Carrier
  - Profile                  - Delivery

[Notification Context]       [Search Context]
  - Notification             - SearchIndex
  - Template                 - Filter
  - Channel                  - Ranking

Decomposition principles:

Decompose by business capability
Data ownership: each service owns its own database
Team autonomy: two-pizza team rule (6-8 people)
Deployment independence: deployable without changing other services
Loose coupling, high cohesion

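6.2 Service Communication - Synchronous vs Asynchronous

Synchronous communication (Request-Response):

  [API Gateway] --> [Order Service] --> [Payment Service]
                                    --> [Inventory Service]

  Pros: immediate response, simple to implement
  Cons: tight coupling, cascading failures, accumulated latency

Asynchronous communication (Event-Driven):

  [Order Service] --> [Message Broker] --> [Payment Service]
                                       --> [Inventory Service]
                                       --> [Notification Service]

  Pros: loose coupling, high resilience, scalability
  Cons: eventual consistency, harder debugging

Kafka vs RabbitMQ:

Feature	Apache Kafka	RabbitMQ
Model	Pub/Sub + log	Queue + exchange
Ordering	Guaranteed within a partition	Guaranteed within a queue
Throughput	Very high (on the order of millions/sec)	High (on the order of tens of thousands/sec)
Retention	Retained for a configured period	Deleted once acknowledged
Reprocessing	Possible by resetting offsets	No replay of acknowledged messages (use a DLQ for failures)
Use Case	Event streaming, logs	Task queues, RPC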

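6.3 API Gateway Pattern

Responsibilities:
  - Request routing
  - Authentication / authorization
  - Rate limiting
  - Load balancing
  - Request/response transformation
  - Circuit breaking
  - Monitoring / logging

Key solutions:
  - Kong: open source, plugin ecosystem
  - Envoy: high performance, L7 proxy
  - AWS API Gateway: managed, serverless
  - NGINX: lightweight, high performance
  - Traefik: automatic service discovery
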
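# Envoy routing configuration example
# (the user_service and order_service clusters must also be defined under static_resources.clusters)
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service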

6.4 Service Discovery

Client-side discovery:
  1. Service A --> Service registry: look up Service B's address
  2. Service A --> Service B: direct call
  Tools: Eureka, Consul

Server-side discovery:
  1. Service A --> Load balancer: request
  2. Load balancer --> Service registry: look up address
  3. Load balancer --> Service B: forward
  Tools: K8s Service + DNS, AWS ALB

Kubernetes DNS-based:
  Internal service DNS: service-name.namespace.svc.cluster.local
  Example: order-service.production.svc.cluster.local

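6.5 Circuit Breaker Pattern

State transitions:

  [Closed] --failure threshold exceeded--> [Open]
  [Open]   --wait duration elapsed--> [Half-Open]
  [Half-Open] --trial calls succeed--> [Closed]
  [Half-Open] --trial calls fail--> [Open]

// Resilience4j configuration example
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import java.util.function.Supplier;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)        // Open when 50% of calls fail
    .waitDurationInOpenState(
        Duration.ofSeconds(30))       // Move to Half-Open after 30s
    .slidingWindowSize(10)            // Evaluate the last 10 calls
    .minimumNumberOfCalls(5)          // Require at least 5 calls before evaluating
    .permittedNumberOfCallsInHalfOpenState(3)  // Allow 3 trial calls in Half-Open
    .build();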

CircuitBreaker circuitBreaker = CircuitBreaker.of(
    "paymentService", config);

Supplier<PaymentResponse> decoratedSupplier =
    CircuitBreaker.decorateSupplier(
        circuitBreaker,
        () -> paymentService.processPayment(request)
    );

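Fallback strategies:

Return a cached response: reuse a previous successful response
Return a default value: a predefined default response
Call an alternative service: use a backup service
Graceful degradation: respond with reduced functionality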

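7. Service Mesh

7.1 What Is a Service Mesh?

Service mesh architecture:

  [Service A] <--> [Sidecar Proxy] <--> [Sidecar Proxy] <--> [Service B]
                          |                      |
                          v                      v
                   [Control Plane (Istio/Linkerd)]
                          |
                   [Configuration, policy, certificate management]

Sidecar proxy responsibilities:
  - Traffic routing and load balancing
  - mTLS encryption
  - Circuit breaking
  - Retries and timeouts
  - Metrics collection
  - Distributed tracing

7.2 Istio vs Linkerd

Feature	Istio	Linkerd
Data plane	Envoy	linkerd2-proxy (Rust)
Resource usage	Higher	Lower
Features	Very rich	Focused on core features
Learning curve	Steep	Gentle
Governance	CNCF Graduated (started by Google, IBM, and Lyft)	CNCF Graduated (started by Buoyant)
Multi-cluster	Supported	Supported

7.3 Istio Traffic Management

# Canary deployment - traffic splitting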
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s

# Circuit breaking settings
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

8. Event-Driven Architecture

8.1 Event Sourcing

Traditional approach: store only the current state
  orders table: id=1, status=SHIPPED, total=50000

Event Sourcing: store every state change as an event
  events table:
    1. OrderCreated     (total=50000)
    2. PaymentReceived  (amount=50000)
    3. OrderConfirmed   ()
    4. ItemShipped      (tracking=KR123456)

Pros:
  - Complete audit log
  - Time travel (reconstruct the state at any point in time)
  - Build new views by replaying events
  - Easier debugging

Cons:
  - Event schema evolution must be managed
  - The event store grows over time
  - Eventual consistency
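
Replaying the four events above to rebuild the current state, or the state at an earlier point, can be sketched as a fold over the event list. The event names and fields come from the example; the OrderState shape and the intermediate status names (NEW, CREATED, PAID, CONFIRMED) are assumptions made for illustration.

```typescript
type OrderEvent =
  | { type: "OrderCreated"; total: number }
  | { type: "PaymentReceived"; amount: number }
  | { type: "OrderConfirmed" }
  | { type: "ItemShipped"; tracking: string };

interface OrderState {
  status: string;
  total: number;
  paid: number;
  tracking: string | null;
}

const emptyOrder: OrderState = { status: "NEW", total: 0, paid: 0, tracking: null };

// Pure transition function: previous state + one event -> next state.
function applyEvent(state: OrderState, event: OrderEvent): OrderState {
  switch (event.type) {
    case "OrderCreated":
      return { ...state, status: "CREATED", total: event.total };
    case "PaymentReceived":
      return { ...state, status: "PAID", paid: state.paid + event.amount };
    case "OrderConfirmed":
      return { ...state, status: "CONFIRMED" };
    case "ItemShipped":
      return { ...state, status: "SHIPPED", tracking: event.tracking };
  }
}

// Replays the first `upTo` events; omit it to get the current state.
function replay(events: OrderEvent[], upTo: number = events.length): OrderState {
  return events.slice(0, upTo).reduce(applyEvent, emptyOrder);
}
```

Replaying all four events yields status SHIPPED with total 50000, matching the row in the traditional orders table. Replaying only the first two shows the order as it stood right after payment.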

8.2 CQRS (Command Query Responsibility Segregation)

CQRS architecture:

  [Command] --> [Write Model] --> [Event Store]
                                       |
                                  [Event Bus]
                                       |
                              [Read Model Projection]
                                       |
                                [Query] <-- [Read DB]

Command (write side):
  - Executes domain logic
  - Publishes events
  - Normalized database

Query (read side):
  - Denormalized read-only views
  - Optimized for fast lookups
  - Can use different stores (Elasticsearch, Redis, etc.)

8.3 Saga Pattern - Distributed Transactions

In a microservices environment, 2PC (Two-Phase Commit) suffers from performance and availability problems. The Saga pattern instead manages a distributed transaction as a sequence of local transactions, each paired with a compensating transaction that undoes it on failure.

Choreography approach:

Order creation saga:

  [Order Service]       [Payment Service]    [Inventory Service]
       |                      |                     |
  OrderCreated -------->      |                     |
       |              PaymentProcessed -------->    |
       |                      |              InventoryReserved
       |                      |                     |
       |              <--- (on success) --->         |
  OrderConfirmed              |                     |

Compensating transactions (on failure):
  InventoryReserveFailed --> PaymentRefunded --> OrderCancelled

Orchestration approach:

  [Order Saga Orchestrator]
       |
       |--> 1. Order Service: create order
       |--> 2. Payment Service: process payment
       |--> 3. Inventory Service: reserve stock
       |--> 4. Shipping Service: create shipment
       |
  (on failure, compensate in reverse order)
       |--> 3c. Inventory: release stock
       |--> 2c. Payment: refund
       |--> 1c. Order: cancel order
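
The reverse-order compensation in the orchestration diagram can be sketched as a loop over steps. This is a simplified, synchronous illustration with hypothetical names. A real orchestrator would call remote services asynchronously, persist its progress, and retry compensations that fail.

```typescript
interface SagaStep {
  name: string;
  action: () => void; // throws on failure
  compensate: () => void; // undoes a completed action
}

interface SagaResult {
  ok: boolean;
  log: string[];
}

function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
      log.push("done:" + step.name);
    } catch (_err) {
      log.push("failed:" + step.name);
      // Undo only the steps that completed, most recent first.
      for (let i = completed.length - 1; i >= 0; i--) {
        const done = completed[i];
        if (done !== undefined) {
          done.compensate();
          log.push("compensated:" + done.name);
        }
      }
      return { ok: false, log: log };
    }
  }
  return { ok: true, log: log };
}
```

If the shipping step throws, the log shows inventory, payment, and order being compensated in that order, which corresponds to steps 3c, 2c, and 1c above.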

9. Distributed Tracing

9.1 Correlation ID Pattern

Request flow:

  [Client]
    X-Request-ID: req-abc-123
       |
  [API Gateway]
    X-Request-ID: req-abc-123
    X-Correlation-ID: corr-xyz-789
       |
  [Order Service] ----------> [Payment Service]
    trace_id: corr-xyz-789     trace_id: corr-xyz-789
    span_id: span-001          span_id: span-002
    parent_span_id: null        parent_span_id: span-001
       |
       ----------> [Inventory Service]
                    trace_id: corr-xyz-789
                    span_id: span-003
                    parent_span_id: span-001

9.2 Applying OpenTelemetry

// OpenTelemetry SDK initialization
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  // service.name is a resource attribute, so it is set once here rather than on every span
  serviceName: 'order-service',
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Manual span creation
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

// validateOrder and processPayment are application functions assumed to exist elsewhere
async function processOrder(orderId: string) {
  // startActiveSpan makes this span the parent of spans created inside the callback
  return tracer.startActiveSpan(
    'processOrder',
    { attributes: { 'order.id': orderId } },
    async (span) => {
      try {
        span.addEvent('Validating order');
        await validateOrder(orderId);

        span.addEvent('Processing payment');
        await processPayment(orderId);

        span.setStatus({ code: SpanStatusCode.OK });
      } catch (error) {
        span.recordException(error as Error);
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: String(error),
        });
        throw error;
      } finally {
        span.end();
      }
    },
  );
}

9.3 The Three Pillars of Observability

1. Logs:
   - Structured logs (JSON)
   - Log levels (DEBUG, INFO, WARN, ERROR)
   - Include the correlation ID
   - ELK Stack / Loki

2. Metrics:
   - RED metrics: Rate, Errors, Duration
   - USE metrics: Utilization, Saturation, Errors
   - Prometheus + Grafana

3. Traces:
   - Distributed request tracing
   - Span-based visualization
   - Jaeger / Zipkin / Tempo

10. In Practice: MSA Design for an E-commerce System

10.1 Overall Architecture

                    [CDN / CloudFront]
                           |
                    [API Gateway (Kong)]
                     /    |    |     \
                    /     |    |      \
  [User Service] [Product] [Order] [Payment]
       |          Service   Service   Service
       |            |         |         |
  [User DB]    [Product DB] [Order DB] [Payment DB]
  (PostgreSQL)  (PostgreSQL) (PostgreSQL) (PostgreSQL)
                    |         |
              [Search Service] [Notification]
                    |           Service
              [Elasticsearch]     |
                              [Kafka]
                                |
                         [Email/SMS/Push]

10.2 Technology Stack per Service

User Service:
  - Language: Go
  - DB: PostgreSQL
  - Cache: Redis (sessions)
  - Communication: REST + gRPC

Product Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Cache: Redis (product data)
  - Search: Elasticsearch
  - Communication: REST + gRPC

Order Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Messaging: Kafka (order events)
  - Communication: gRPC + Kafka

Payment Service:
  - Language: Go
  - DB: PostgreSQL
  - External: payment gateway (PG) integration
  - Communication: gRPC

Notification Service:
  - Language: Node.js
  - DB: MongoDB (templates)
  - Messaging: Kafka consumer
  - External: SendGrid, Firebase

10.3 Order Processing Flow

1. Client --> API Gateway: POST /api/v1/orders
2. API Gateway --> Order Service: create-order request
3. Order Service --> Product Service (gRPC): check stock
4. Order Service --> Kafka: publish OrderCreated event
5. Payment Service (consumer): process payment
6. Payment Service --> Kafka: publish PaymentCompleted event
7. Order Service (consumer): update order status
8. Notification Service (consumer): send order confirmation email
9. Product Service (consumer): deduct stock

10.4 Failure Handling Strategy

Circuit Breaker:
  - If the Payment service is down, only accept the order
  - Queue the payment in Kafka and process it later

Retry + Exponential Backoff:
  - 1st retry: 100ms
  - 2nd retry: 200ms
  - 3rd retry: 400ms
  - Max retries: 5
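
The schedule above doubles the delay on each retry. A minimal sketch of that calculation follows. The 100ms base and the limit of 5 retries come from the list; the 10-second cap and the function names are assumptions added for illustration. In practice, random jitter is usually added to each delay so that many clients do not retry at the same instant.

```typescript
// Delay before the given retry (1-based): base * 2^(attempt - 1), capped.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 100,
  capMs: number = 10000,
): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

// Full list of delays for a retry budget.
function backoffSchedule(maxRetries: number, baseMs: number = 100): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs));
  }
  return delays;
}
```

With the defaults, `backoffSchedule(5)` yields 100, 200, 400, 800, and 1600 milliseconds.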

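Bulkhead Pattern:
  - Isolate thread pools per service
  - Prevents one failing service from taking down the whole system

Dead Letter Queue:
  - Move messages that fail processing to a DLQ
  - Analyze and reprocess them manually
  - Configure alerts to notify the operations team

10.5 Deployment Strategy

# Kubernetes Deployment - canary deployment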
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-canary
  labels:
    app: order-service
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
        - name: order-service
          image: order-service:2.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

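Conclusion

API design and microservices architecture are core competencies in modern software development. The key points are:

API design is a contract - consistent naming, appropriate status codes, and clear error messages determine developer productivity
Technology choice depends on context - REST, GraphQL, and gRPC each have different strengths, and mixing them is common
Security from the start - authentication/authorization, mTLS, and rate limiting belong in the design phase, not as an afterthought
Decompose services carefully - base the split on DDD bounded contexts, neither too small nor too large
Failures will happen - build resilience with the Circuit Breaker, Retry, Bulkhead, and Saga patterns
Observability is operability - integrate logs, metrics, and traces to pinpoint problems quickly

Starting with a monolith and adopting microservices incrementally as real needs arise is the most prudent approach. Focus on delivering business value rather than technical elegance.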

The Complete Guide to API Design & Microservices Architecture

1. API Design Principles

1.1 Richardson Maturity Model

The REST maturity model proposed by Leonard Richardson classifies how RESTful an API is across four levels.

Level	Name	Description
Level 0	The Swamp of POX	Single URI, single HTTP method (usually POST)
Level 1	Resources	Individual resource URIs, still single method
Level 2	HTTP Verbs	Proper use of HTTP methods (GET, POST, PUT, DELETE)
Level 3	Hypermedia Controls	HATEOAS - responses include links to next actions

Most production APIs target Level 2, while Level 3 (HATEOAS) is applied selectively since its implementation complexity rarely justifies the benefits.

1.2 Resource Naming Conventions

Consistent resource naming is the cornerstone of good API design.

# Good examples
GET    /api/v1/users
GET    /api/v1/users/123
GET    /api/v1/users/123/orders
POST   /api/v1/users
PUT    /api/v1/users/123
DELETE /api/v1/users/123

# Bad examples
GET    /api/v1/getUsers
POST   /api/v1/createUser
GET    /api/v1/user_list

Core principles:

Use nouns: Resources are expressed as nouns (users, orders, products)
Plurals: Collections use plural forms (/users, /orders)
Lowercase + hyphens: Use kebab-case (/order-items)
Hierarchical relationships: Express with nested resources (/users/123/orders)
Filtering via query parameters: /users?status=active&role=admin

1.3 HTTP Methods and Status Codes

GET     - Read (safe, idempotent)
POST    - Create (unsafe, non-idempotent)
PUT     - Full update (unsafe, idempotent)
PATCH   - Partial update (unsafe, non-idempotent)
DELETE  - Delete (unsafe, idempotent)
OPTIONS - CORS preflight
HEAD    - Headers only

Key status code guide:

Code	Meaning	When to Use
200	OK	Successful GET, PUT, PATCH
201	Created	Successful POST (include Location header)
204	No Content	Successful DELETE
400	Bad Request	Invalid request format
401	Unauthorized	Authentication failure
403	Forbidden	Insufficient permissions
404	Not Found	Resource does not exist
409	Conflict	Resource conflict
422	Unprocessable Entity	Validation failure
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Server error

1.4 Request/Response Design

A consistent response format greatly improves the client development experience.

{
  "data": {
    "id": "user_123",
    "type": "user",
    "attributes": {
      "name": "John Doe",
      "email": "john@example.com",
      "created_at": "2026-04-12T10:00:00Z"
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-12T10:00:00Z"
  }
}

Pagination response:

{
  "data": [],
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 150,
    "total_pages": 8,
    "next_cursor": "eyJpZCI6MTAwfQ=="
  }
}

Error response:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request data is invalid",
    "details": [
      {
        "field": "email",
        "message": "Not a valid email format"
      }
    ]
  }
}

2. REST vs GraphQL vs gRPC

2.1 Comparison Table

Feature	REST	GraphQL	gRPC
Protocol	HTTP/1.1	HTTP/1.1	HTTP/2
Data Format	JSON/XML	JSON	Protocol Buffers
Schema	OpenAPI (optional)	SDL (required)	.proto (required)
Type Safety	Weak	Strong	Very Strong
Over/Under-fetching	Present	Resolved	Resolved
Real-time	WebSocket	Subscription	Bidirectional Streaming
Browser Support	Native	Native	Requires grpc-web
Performance	Moderate	Moderate	High
Learning Curve	Low	Medium	High

2.2 Best Use Cases for Each

REST works best when:

Building public APIs (Open API)
Simple CRUD operations
Caching is critical (leveraging HTTP caching)
Browser-direct API calls

GraphQL works best when:

Mobile apps (bandwidth optimization)
Complex data relationships
Different clients need different data shapes
Fast frontend development cycles

gRPC works best when:

Internal microservice communication
High performance is required
Bidirectional streaming is needed
Polyglot environments

2.3 GraphQL Example

# Schema definition
type User {
  id: ID!
  name: String!
  email: String!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

type Query {
  user(id: ID!): User
  users(page: Int, limit: Int): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
}

# Client query - request only needed fields
query GetUserWithOrders {
  user(id: "123") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

2.4 gRPC Example

syntax = "proto3";

package ecommerce;

service UserService {
  rpc GetUser (GetUserRequest) returns (UserResponse);
  rpc ListUsers (ListUsersRequest) returns (stream UserResponse);
  rpc CreateUser (CreateUserRequest) returns (UserResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message UserResponse {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

message ListUsersRequest {
  int32 page = 1;
  int32 limit = 2;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

3. API Versioning

3.1 Versioning Strategy Comparison

Strategy	Example	Pros	Cons
URI Versioning	/api/v1/users	Intuitive, caching-friendly	URI pollution
Header Versioning	Accept: application/vnd.api+json;version=1	Clean URIs	Hard to test
Query Versioning	/api/users?version=1	Simple implementation	Complex caching
Content Negotiation	Accept: application/vnd.company.v1+json	Standards-compliant	Complex

URI versioning is the most widely adopted, and is recommended for its practicality and clarity.

3.2 Backward Compatibility Principles

Backward-compatible changes (non-breaking):
  - Adding new endpoints
  - Adding new fields to responses
  - Adding optional request parameters
  - Adding new enum values (if clients handle unknown)

Breaking changes:
  - Removing or renaming existing fields
  - Changing field types
  - Adding required parameters
  - Changing response structure
  - Changing URL paths

3.3 API Deprecation Strategy

Phase 1: Add Sunset header
  Sunset: Sat, 01 Jan 2027 00:00:00 GMT
  Deprecation: true
  Link: <https://api.example.com/v2/docs>; rel="successor-version"

Phase 2: Include warnings in responses (6 months)
Phase 3: Gradually reduce rate limits (3 months)
Phase 4: Return 410 Gone

4. Authentication and Authorization

4.1 Authentication Method Comparison

Method	Security Level	Use Case	Complexity
API Key	Low	Internal/Partner APIs	Low
OAuth 2.0	High	Delegated user auth	High
JWT	Medium	Stateless auth	Medium
mTLS	Very High	Service-to-service	High

4.2 OAuth 2.0 Flow

Authorization Code Flow (recommended for server-side apps):

1. Client --> Auth Server: Request authorization code
   GET /authorize?response_type=code
     &client_id=CLIENT_ID
     &redirect_uri=CALLBACK_URL
     &scope=read:user
     &state=RANDOM_STATE

2. User --> Auth Server: Login and grant consent

3. Auth Server --> Client: Return authorization code
   302 Redirect: CALLBACK_URL?code=AUTH_CODE&state=RANDOM_STATE

4. Client --> Auth Server: Exchange for token
   POST /token
     grant_type=authorization_code
     &code=AUTH_CODE
     &client_id=CLIENT_ID
     &client_secret=CLIENT_SECRET

5. Auth Server --> Client: Access token + Refresh token

4.3 JWT Structure and Security

{
  "header": {
    "alg": "RS256",
    "typ": "JWT",
    "kid": "key-id-001"
  },
  "payload": {
    "sub": "user_123",
    "iss": "auth.example.com",
    "aud": "api.example.com",
    "exp": 1744540800,
    "iat": 1744537200,
    "scope": "read:users write:orders",
    "roles": ["admin"]
  }
}

JWT Security Checklist:

Prefer RS256 (asymmetric) algorithm over HS256
Set short expiration times (15 minutes or less)
Store refresh tokens server-side
Always validate iss, aud, and exp claims
Reject the none algorithm
Validate kid (Key ID) to prevent key confusion attacks

4.4 mTLS (Mutual TLS)

Service-to-service communication security:

1. Issue unique X.509 certificates to each service
2. Mutual certificate verification during communication
3. Automatic certificate renewal (cert-manager, etc.)

Pros:
  - Cryptographically prove service identity
  - Network-level encryption
  - Foundation for Zero Trust architecture

Cons:
  - Certificate management complexity
  - Service outage risk on certificate expiration
  - Debugging difficulty

5. Rate Limiting

5.1 Algorithm Comparison

Token Bucket:

Principle: Tokens are added at a constant rate; consumed per request
Pros: Allows burst traffic while maintaining average rate
Cons: Memory usage

Example:
  - Bucket size: 100 tokens
  - Refill rate: 10 tokens/second
  - 1 token consumed per request
  - Burst: up to 100 simultaneous requests

Sliding Window:

Principle: Count requests within a time window
Pros: Accurate limiting, solves window boundary issues
Cons: Requires storing previous window counters

Example:
  Current time: 12:01:30
  Previous window (12:00-12:01): 80 requests
  Current window (12:01-12:02): 20 requests
  Weighted sum: 80 * 0.5 + 20 = 60 (within limit of 100)

5.2 Rate Limit Response Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744540860
Retry-After: 60

5.3 Rate Limiting with API Gateway

# Kong Gateway configuration example
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: redis
      redis_host: redis-cluster
      redis_port: 6379
      fault_tolerant: true
      hide_client_headers: false

6. Microservices Patterns

6.1 Service Decomposition - DDD Bounded Context

E-commerce domain analysis:

[Order Context]              [Product Context]
  - Order                    - Product
  - OrderItem                - Category
  - OrderStatus              - Inventory
  - Payment                  - Price

[User Context]               [Shipping Context]
  - User                     - Shipment
  - Address                  - Tracking
  - Authentication           - Carrier
  - Profile                  - Delivery

[Notification Context]       [Search Context]
  - Notification             - SearchIndex
  - Template                 - Filter
  - Channel                  - Ranking

Decomposition principles:

Business Capability based decomposition
Data ownership: Each service owns its own database
Team autonomy: Two-pizza team rule (6-8 people)
Deployment independence: Deployable without changing other services
Loose coupling, high cohesion

6.2 Service Communication - Synchronous vs Asynchronous

Synchronous (Request-Response):

  [API Gateway] --> [Order Service] --> [Payment Service]
                                    --> [Inventory Service]

  Pros: Immediate response, simple implementation
  Cons: Tight coupling, cascading failures, latency accumulation

Asynchronous (Event-Driven):

  [Order Service] --> [Message Broker] --> [Payment Service]
                                       --> [Inventory Service]
                                       --> [Notification Service]

  Pros: Loose coupling, high resilience, scalability
  Cons: Eventual consistency, debugging difficulty

Kafka vs RabbitMQ:

Feature	Apache Kafka	RabbitMQ
Model	Pub/Sub + Log	Queue + Exchange
Ordering	Guaranteed within partition	Guaranteed within queue
Throughput	Very high (millions/sec)	High (tens of thousands/sec)
Retention	Retained for configured period	Deleted after consumption
Reprocessing	Possible via offset reset	Not possible (use DLQ)
Use Case	Event streaming, logs	Task queues, RPC

6.3 API Gateway Pattern

Responsibilities:
  - Request routing
  - Authentication / Authorization
  - Rate limiting
  - Load balancing
  - Request/Response transformation
  - Circuit breaking
  - Monitoring / Logging

Key solutions:
  - Kong: Open source, plugin ecosystem
  - Envoy: High performance, L7 proxy
  - AWS API Gateway: Managed, serverless
  - NGINX: Lightweight, high performance
  - Traefik: Automatic service discovery

# Envoy routing configuration example
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
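              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service
                # The router filter is required for requests to be forwarded
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  # user_service and order_service must also be defined under static_resources.clusters (omitted here)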
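The routing decision in the configuration above amounts to "first matching prefix wins". A minimal sketch of that lookup follows; `GatewayRoute` and `resolveCluster` are hypothetical names for illustration, not part of Envoy.

```typescript
interface GatewayRoute { prefix: string; cluster: string }

// Mirrors the two routes in the Envoy example. Order matters because the
// first route whose prefix matches is used.
const gatewayRoutes: GatewayRoute[] = [
  { prefix: "/api/v1/users", cluster: "user_service" },
  { prefix: "/api/v1/orders", cluster: "order_service" },
];

// Returns the target cluster, or undefined when no route matches
// (a gateway would typically answer 404 in that case).
function resolveCluster(path: string, table: GatewayRoute[]): string | undefined {
  return table.find((r) => path.startsWith(r.prefix))?.cluster;
}

const usersTarget = resolveCluster("/api/v1/users/123", gatewayRoutes);
const unknownTarget = resolveCluster("/health", gatewayRoutes);
```

Because the match is a plain string prefix, `/api/v1/usersettings` would also land on `user_service`. More specific routes therefore belong earlier in the list.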

6.4 Service Discovery

Client-side discovery:
  1. Service A --> Registry: Look up Service B address
  2. Service A --> Service B: Direct call
  Tools: Eureka, Consul

Server-side discovery:
  1. Service A --> Load Balancer: Request
  2. Load Balancer --> Registry: Look up address
  3. Load Balancer --> Service B: Forward
  Tools: K8s Service + DNS, AWS ALB

Kubernetes DNS-based:
  Internal DNS: service-name.namespace.svc.cluster.local
  Example: order-service.production.svc.cluster.local
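Client-side discovery can be sketched in a few lines. `ServiceRegistry` and `RoundRobinClient` are hypothetical in-memory classes standing in for a registry such as Eureka or Consul and for the client library that queries it. The addresses are made up.

```typescript
// Hypothetical in-memory registry; Eureka or Consul would play this role.
class ServiceRegistry {
  private instances = new Map<string, string[]>();

  register(service: string, address: string): void {
    const list = this.instances.get(service) ?? [];
    if (!list.includes(address)) list.push(address);
    this.instances.set(service, list);
  }

  deregister(service: string, address: string): void {
    const list = this.instances.get(service) ?? [];
    this.instances.set(service, list.filter((a) => a !== address));
  }

  lookup(service: string): string[] {
    return [...(this.instances.get(service) ?? [])];
  }
}

// Client-side discovery: the caller looks up addresses and balances load itself.
class RoundRobinClient {
  private counter = 0;
  private readonly registry: ServiceRegistry;
  private readonly service: string;

  constructor(registry: ServiceRegistry, service: string) {
    this.registry = registry;
    this.service = service;
  }

  pickAddress(): string {
    const addresses = this.registry.lookup(this.service);
    if (addresses.length === 0) {
      throw new Error(`no registered instances of ${this.service}`);
    }
    const address = addresses[this.counter % addresses.length];
    this.counter += 1;
    return address;
  }
}

const registry = new ServiceRegistry();
registry.register("order-service", "10.0.0.1:8080");
registry.register("order-service", "10.0.0.2:8080");

const orderClient = new RoundRobinClient(registry, "order-service");
const firstTarget = orderClient.pickAddress();   // "10.0.0.1:8080"
const secondTarget = orderClient.pickAddress();  // "10.0.0.2:8080"
```

With server-side discovery the same lookup and balancing logic lives in the load balancer, so the calling service only needs one stable address.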

6.5 Circuit Breaker Pattern

State transitions:

  [Closed] --failure threshold exceeded--> [Open]
  [Open]   --timeout elapsed--> [Half-Open]
  [Half-Open] --success--> [Closed]
  [Half-Open] --failure--> [Open]
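The four transitions above fit in a small state machine. This is a simplified sketch: `SimpleCircuitBreaker` is a hypothetical class that opens after a number of consecutive failures, whereas libraries such as Resilience4j evaluate a failure rate over a sliding window. The clock is injected so the timeout can be exercised without waiting.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class SimpleCircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly openTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, openTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.openTimeoutMs = openTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    // Open -> Half-Open once the timeout has elapsed.
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.openTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  call<T>(fn: () => T): T {
    if (this.getState() === "OPEN") {
      throw new Error("circuit open"); // fail fast without calling the dependency
    }
    try {
      const result = fn();
      // Any success resets the count; a Half-Open success closes the circuit.
      this.failures = 0;
      this.state = "CLOSED";
      return result;
    } catch (err) {
      this.failures += 1;
      // A Half-Open failure reopens immediately; otherwise open at the threshold.
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

let fakeNow = 0;
const breaker = new SimpleCircuitBreaker(2, 30_000, () => fakeNow);
const failing = (): string => { throw new Error("payment timeout"); };

for (let i = 0; i < 2; i++) {
  try { breaker.call(failing); } catch { /* expected */ }
}
const afterFailures = breaker.getState(); // "OPEN"
fakeNow = 30_000;
const afterTimeout = breaker.getState();  // "HALF_OPEN"
breaker.call(() => "paid");
const afterSuccess = breaker.getState();  // "CLOSED"
```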

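// Resilience4j configuration example
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import java.util.function.Supplier;
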
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)        // Open at 50% failure
    .waitDurationInOpenState(
        Duration.ofSeconds(30))       // Half-Open after 30s
    .slidingWindowSize(10)            // Based on last 10 calls
    .minimumNumberOfCalls(5)          // Evaluate after 5 calls
    .permittedNumberOfCallsInHalfOpenState(3)
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of(
    "paymentService", config);

Supplier<PaymentResponse> decoratedSupplier =
    CircuitBreaker.decorateSupplier(
        circuitBreaker,
        () -> paymentService.processPayment(request)
    );

Fallback strategies:

Return cached response: Use previous successful response
Return default value: Predefined default response
Call alternative service: Use backup service
Graceful degradation: Reduced functionality response

7. Service Mesh

7.1 What Is a Service Mesh?

Service mesh architecture:

  [Service A] <--> [Sidecar Proxy] <--> [Sidecar Proxy] <--> [Service B]
                          |                      |
                          v                      v
                   [Control Plane (Istio/Linkerd)]
                          |
                   [Config, Policy, Certificate Management]

Sidecar proxy responsibilities:
  - Traffic routing and load balancing
  - mTLS encryption
  - Circuit breaking
  - Retries and timeouts
  - Metrics collection
  - Distributed tracing

7.2 Istio vs Linkerd

Feature	Istio	Linkerd
Data Plane	Envoy	linkerd2-proxy (Rust)
Resource Usage	High	Low
Features	Very rich	Core features focused
Learning Curve	Steep	Gentle
Community	CNCF Graduated (originated at Google, IBM, and Lyft)	CNCF Graduated (created by Buoyant)
Multi-cluster	Supported	Supported

7.3 Istio Traffic Management

# Canary deployment - traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s
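The 90/10 split in the VirtualService above is a weighted random choice per request. The sketch below shows one way to implement such a choice. `pickSubset` is a hypothetical helper and not how the Envoy sidecar is implemented internally. The random number is passed in so the result is reproducible.

```typescript
interface WeightedDestination { subset: string; weight: number }

const canarySplit: WeightedDestination[] = [
  { subset: "v1", weight: 90 },
  { subset: "v2", weight: 10 },
];

// `roll` is a number in [0, 1), for example Math.random(). It is a parameter
// so the choice is deterministic in tests.
function pickSubset(destinations: WeightedDestination[], roll: number): string {
  const total = destinations.reduce((sum, d) => sum + d.weight, 0);
  let remaining = roll * total;
  for (const d of destinations) {
    if (remaining < d.weight) return d.subset;
    remaining -= d.weight;
  }
  // Only reached if rounding pushes `remaining` past the last bucket.
  return destinations[destinations.length - 1].subset;
}

const stableChoice = pickSubset(canarySplit, 0.5);   // "v1"
const canaryChoice = pickSubset(canarySplit, 0.95);  // "v2"
```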

# Circuit breaker configuration
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

8. Event-Driven Architecture

8.1 Event Sourcing

Traditional approach: Store only current state
  orders table: id=1, status=SHIPPED, total=50000

Event Sourcing: Store all state changes as events
  events table:
    1. OrderCreated     (total=50000)
    2. PaymentReceived  (amount=50000)
    3. OrderConfirmed   ()
    4. ItemShipped      (tracking=KR123456)

Pros:
  - Complete audit log
  - Time travel (reconstruct state at any point)
  - Replay events to create new views
  - Easier debugging

Cons:
  - Event schema evolution management
  - Event store size growth
  - Eventual consistency
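Reconstructing state from the event list above is a left fold over the events. The sketch below uses the four event names from the example. The `OrderState` shape and the status strings other than `SHIPPED` are assumptions made for illustration.

```typescript
type OrderEvent =
  | { type: "OrderCreated"; total: number }
  | { type: "PaymentReceived"; amount: number }
  | { type: "OrderConfirmed" }
  | { type: "ItemShipped"; tracking: string };

interface OrderState { status: string; total: number; paid: number; tracking?: string }

const initialOrder: OrderState = { status: "NEW", total: 0, paid: 0 };

// Pure function: current state + one event -> next state.
function applyEvent(state: OrderState, event: OrderEvent): OrderState {
  switch (event.type) {
    case "OrderCreated":
      return { ...state, status: "CREATED", total: event.total };
    case "PaymentReceived":
      return { ...state, status: "PAID", paid: state.paid + event.amount };
    case "OrderConfirmed":
      return { ...state, status: "CONFIRMED" };
    case "ItemShipped":
      return { ...state, status: "SHIPPED", tracking: event.tracking };
  }
}

const orderHistory: OrderEvent[] = [
  { type: "OrderCreated", total: 50000 },
  { type: "PaymentReceived", amount: 50000 },
  { type: "OrderConfirmed" },
  { type: "ItemShipped", tracking: "KR123456" },
];

// Current state: replay every event.
const currentOrder = orderHistory.reduce(applyEvent, initialOrder);
// "Time travel": replay only the first two events.
const orderAfterPayment = orderHistory.slice(0, 2).reduce(applyEvent, initialOrder);
```

`currentOrder` ends up with `status: "SHIPPED"` and `total: 50000`, matching the row the traditional approach would store, while `orderAfterPayment` shows the order as it stood after step 2.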

8.2 CQRS (Command Query Responsibility Segregation)

CQRS Architecture:

  [Command] --> [Write Model] --> [Event Store]
                                       |
                                  [Event Bus]
                                       |
                              [Read Model Projection]
                                       |
                                [Query] <-- [Read DB]

Command (Write):
  - Execute domain logic
  - Publish events
  - Normalized database

Query (Read):
  - Denormalized read-only views
  - Optimized for fast queries
  - Multiple storage options (ES, Redis, etc.)
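A minimal sketch of the split follows. All names here (`OrderPlaced`, `placeOrder`, `projectOrderPlaced`, `getCustomerSummary`) are hypothetical, and the projection is called directly where a real system would deliver the event over a bus, usually asynchronously.

```typescript
interface OrderPlaced { orderId: string; customerId: string; total: number }
interface CustomerSummary { orders: number; spent: number }

// Write side: append-only event store.
const orderEvents: OrderPlaced[] = [];
// Read side: denormalized view keyed by customer, shaped for one query.
const summaryByCustomer = new Map<string, CustomerSummary>();

// Projection: updates the read model from each event.
function projectOrderPlaced(event: OrderPlaced): void {
  const current = summaryByCustomer.get(event.customerId) ?? { orders: 0, spent: 0 };
  summaryByCustomer.set(event.customerId, {
    orders: current.orders + 1,
    spent: current.spent + event.total,
  });
}

// Command handler: runs domain validation, then records and publishes the event.
function placeOrder(orderId: string, customerId: string, total: number): void {
  if (total <= 0) throw new Error("total must be positive");
  const event: OrderPlaced = { orderId, customerId, total };
  orderEvents.push(event);
  projectOrderPlaced(event); // stands in for the event bus
}

// Query handler: reads the precomputed view and never touches the write model.
function getCustomerSummary(customerId: string): CustomerSummary {
  return summaryByCustomer.get(customerId) ?? { orders: 0, spent: 0 };
}

placeOrder("o-1", "c-1", 50000);
placeOrder("o-2", "c-1", 30000);
const customerSummary = getCustomerSummary("c-1"); // { orders: 2, spent: 80000 }
```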

8.3 Saga Pattern - Distributed Transactions

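In microservice environments, 2PC (Two-Phase Commit) is usually a poor fit: participants hold locks until the coordinator decides, which hurts throughput, and a coordinator failure can block every participant, which hurts availability. The Saga pattern instead manages a distributed transaction as a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.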

Choreography approach:

Order creation Saga:

  [Order Service]       [Payment Service]    [Inventory Service]
       |                      |                     |
  OrderCreated -------->      |                     |
       |              PaymentProcessed -------->    |
       |                      |              InventoryReserved
       |                      |                     |
       |              <--- (on success) --->        |
  OrderConfirmed              |                     |

Compensating transactions (on failure):
  InventoryReserveFailed --> PaymentRefunded --> OrderCancelled

Orchestration approach:

  [Order Saga Orchestrator]
       |
       |--> 1. Order Service: Create order
       |--> 2. Payment Service: Process payment
       |--> 3. Inventory Service: Reserve inventory
       |--> 4. Shipping Service: Create shipment
       |
  (On failure, compensate in reverse)
       |--> 3c. Inventory: Release stock
       |--> 2c. Payment: Issue refund
       |--> 1c. Order: Cancel order
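The orchestration flow above can be sketched as a loop that remembers completed steps and compensates them in reverse on failure. `SagaStep` and `runSaga` are hypothetical names, the steps are synchronous no-ops for brevity, and the sketch assumes compensations themselves succeed. A production orchestrator would persist progress and retry failed compensations.

```typescript
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Runs steps in order. On the first failure, undoes the completed steps in
// reverse order and reports false.
function runSaga(steps: SagaStep[], log: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      log.push(`done:${step.name}`);
      completed.push(step);
    } catch {
      log.push(`failed:${step.name}`);
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`compensated:${done.name}`);
      }
      return false;
    }
  }
  return true;
}

const sagaLog: string[] = [];
const noop = (): void => {};
const orderSaga: SagaStep[] = [
  { name: "order", action: noop, compensate: noop },
  { name: "payment", action: noop, compensate: noop },
  { name: "inventory", action: noop, compensate: noop },
  {
    name: "shipping",
    action: () => { throw new Error("carrier unavailable"); },
    compensate: noop,
  },
];

const sagaSucceeded = runSaga(orderSaga, sagaLog); // false
// sagaLog ends with compensated:inventory, compensated:payment, compensated:order
```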

9. Distributed Tracing

9.1 Correlation ID Pattern

Request flow:

  [Client]
    X-Request-ID: req-abc-123
       |
  [API Gateway]
    X-Request-ID: req-abc-123
    X-Correlation-ID: corr-xyz-789
       |
  [Order Service] ----------> [Payment Service]
    trace_id: corr-xyz-789     trace_id: corr-xyz-789
    span_id: span-001          span_id: span-002
    parent_span_id: null        parent_span_id: span-001
       |
       ----------> [Inventory Service]
                    trace_id: corr-xyz-789
                    span_id: span-003
                    parent_span_id: span-001
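Propagation comes down to two rules: mint a correlation ID at the edge if the request has none, and copy the tracing headers onto every outbound call. The sketch below assumes lowercase header names and an injected ID generator. `HeaderMap`, `ensureCorrelationId`, and `outboundHeaders` are hypothetical helpers, not part of any framework.

```typescript
type HeaderMap = Record<string, string>;

// At the edge (API gateway): reuse the caller's correlation ID or create one.
function ensureCorrelationId(incoming: HeaderMap, newId: () => string): HeaderMap {
  const existing = incoming["x-correlation-id"];
  return { ...incoming, "x-correlation-id": existing ?? newId() };
}

// In each service: forward only the tracing headers to downstream calls.
function outboundHeaders(incoming: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const key of ["x-request-id", "x-correlation-id"]) {
    const value = incoming[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

const atGateway = ensureCorrelationId(
  { "x-request-id": "req-abc-123", "content-type": "application/json" },
  () => "corr-xyz-789", // fixed ID for illustration; use a UUID generator in practice
);
const toPaymentService = outboundHeaders(atGateway);
```

`toPaymentService` carries `x-request-id` and `x-correlation-id` but not `content-type`, so every hop logs the same correlation ID without leaking unrelated headers.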

9.2 OpenTelemetry Integration

// OpenTelemetry SDK initialization
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Manual span creation
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

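async function processOrder(orderId: string) {
  // startActiveSpan makes this span the parent of spans created inside the callback.
  // service.name is a resource attribute configured on the SDK, so it is not repeated per span.
  return tracer.startActiveSpan(
    'processOrder',
    { attributes: { 'order.id': orderId } },
    async (span) => {
      try {
        span.addEvent('Validating order');
        await validateOrder(orderId);

        span.addEvent('Processing payment');
        await processPayment(orderId);

        span.setStatus({ code: SpanStatusCode.OK });
      } catch (error) {
        // Record the exception as a span event in addition to the error status
        span.recordException(error as Error);
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: String(error),
        });
        throw error;
      } finally {
        span.end();
      }
    },
  );
}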

9.3 Three Pillars of Observability

1. Logs:
   - Structured logging (JSON)
   - Log levels (DEBUG, INFO, WARN, ERROR)
   - Include Correlation ID
   - ELK Stack / Loki

2. Metrics:
   - RED metrics: Rate, Errors, Duration
   - USE metrics: Utilization, Saturation, Errors
   - Prometheus + Grafana

3. Traces:
   - Distributed request tracking
   - Span-based visualization
   - Jaeger / Zipkin / Tempo
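The logging points above (structured JSON, a level, a correlation ID) can be combined in one small formatter. `formatLog` and its field names are assumptions for illustration. Established logging libraries provide the same idea with more features.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Builds one JSON log line. The timestamp is injectable for reproducible output.
function formatLog(
  level: LogLevel,
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    correlation_id: correlationId,
    ...fields,
  });
}

const logLine = formatLog("INFO", "order created", "corr-xyz-789", { order_id: "o-1" });
```

Because every line is a JSON object carrying `correlation_id`, a log backend such as Loki or Elasticsearch can filter all services' output for one request with a single query.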

10. Practical Example: E-commerce MSA Design

10.1 Overall Architecture

                    [CDN / CloudFront]
                           |
                    [API Gateway (Kong)]
                     /    |    |     \
                    /     |    |      \
  [User Service] [Product] [Order] [Payment]
       |          Service   Service   Service
       |            |         |         |
  [User DB]    [Product DB] [Order DB] [Payment DB]
  (PostgreSQL)  (PostgreSQL) (PostgreSQL) (PostgreSQL)
                    |         |
              [Search Service] [Notification]
                    |           Service
              [Elasticsearch]     |
                              [Kafka]
                                |
                         [Email/SMS/Push]

10.2 Technology Stack per Service

User Service:
  - Language: Go
  - DB: PostgreSQL
  - Cache: Redis (sessions)
  - Communication: REST + gRPC

Product Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Cache: Redis (product info)
  - Search: Elasticsearch
  - Communication: REST + gRPC

Order Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Messaging: Kafka (order events)
  - Communication: gRPC + Kafka

Payment Service:
  - Language: Go
  - DB: PostgreSQL
  - External: Payment gateway integration
  - Communication: gRPC

Notification Service:
  - Language: Node.js
  - DB: MongoDB (templates)
  - Messaging: Kafka Consumer
  - External: SendGrid, Firebase

10.3 Order Processing Flow

1. Client --> API Gateway: POST /api/v1/orders
2. API Gateway --> Order Service: Create order request
3. Order Service --> Product Service (gRPC): Check inventory
4. Order Service --> Kafka: Publish OrderCreated event
5. Payment Service (Consumer): Process payment
6. Payment Service --> Kafka: Publish PaymentCompleted event
7. Order Service (Consumer): Update order status
8. Notification Service (Consumer): Send confirmation email
9. Product Service (Consumer): Decrement inventory

10.4 Failure Handling Strategies

Circuit Breaker:
  - Accept orders when Payment service is down
  - Queue payments in Kafka for later processing

Retry + Exponential Backoff:
  - 1st retry: 100ms
  - 2nd retry: 200ms
  - 3rd retry: 400ms
  - Maximum retries: 5
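The schedule above doubles the delay on each retry, which gives `100 * 2^(n-1)` milliseconds before the n-th retry. A minimal sketch follows. `backoffDelayMs` and `retryWithBackoff` are hypothetical helpers, the 10-second cap is an added assumption, and the sleep function is injected and synchronous only to keep the example short (real code would await a timer).

```typescript
// Delay before the n-th retry (1-based): base * 2^(n-1), capped at maxDelayMs.
function backoffDelayMs(retry: number, baseMs = 100, maxDelayMs = 10_000): number {
  return Math.min(baseMs * 2 ** (retry - 1), maxDelayMs);
}

// Calls fn once, then retries up to maxRetries times, sleeping before each retry.
// Rethrows the last error when every attempt fails.
function retryWithBackoff<T>(
  fn: () => T,
  maxRetries = 5,
  sleep: (ms: number) => void = () => {},
): T {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) sleep(backoffDelayMs(attempt));
    try {
      return fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const retrySchedule = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
// [100, 200, 400, 800, 1600]
```

Many implementations also add random jitter to each delay so that clients that failed together do not retry in lockstep.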

Bulkhead Pattern:
  - Isolate thread pools per service
  - Prevent one service failure from cascading

Dead Letter Queue:
  - Move failed messages to DLQ
  - Manual analysis and reprocessing
  - Alert ops team via notifications

10.5 Deployment Strategy

# Kubernetes Deployment - Canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-canary
  labels:
    app: order-service
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
        - name: order-service
          image: order-service:2.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

Conclusion

API design and microservices architecture are core competencies in modern software development. Here are the key takeaways:

API design is a contract - Consistent naming, appropriate status codes, and clear error messages determine developer productivity
Technology choice depends on context - REST, GraphQL, and gRPC each have distinct strengths, and mixing them is common
Security from the start - Auth, mTLS, and rate limiting should be considered at design time, not as an afterthought
Decompose services carefully - Based on DDD Bounded Contexts, neither too small nor too large
Failures will happen - Build resilience with Circuit Breaker, Retry, Bulkhead, and Saga patterns
Observability is operational capability - Integrate logs, metrics, and traces to quickly identify issues

For most teams, a pragmatic approach is to start with a monolith and adopt microservices incrementally as concrete needs arise, such as independent scaling or independent deployment. Focus on delivering business value rather than striving for technical elegance.

The Complete Guide to API Design & Microservices Architecture

Table of Contents

1. API Design Principles

1.1 Richardson Maturity Model

1.2 Resource Naming Conventions

1.3 HTTP Methods and Status Codes

1.4 Request/Response Design

2. REST vs GraphQL vs gRPC

2.1 Comparison Table

2.2 Best Use Cases for Each

2.3 GraphQL Example

2.4 gRPC Example

3. API Versioning

3.1 Versioning Strategy Comparison

3.2 Backward Compatibility Principles

3.3 API Deprecation Strategy

4. Authentication and Authorization

4.1 Authentication Method Comparison

4.2 OAuth 2.0 Flow

4.3 JWT Structure and Security

4.4 mTLS (Mutual TLS)

5. Rate Limiting

5.1 Algorithm Comparison

5.2 Rate Limit Response Headers

5.3 Rate Limiting with API Gateway

6. Microservices Patterns

6.1 Service Decomposition - DDD Bounded Context

6.2 Service Communication - Synchronous vs Asynchronous

6.3 API Gateway Pattern

6.4 Service Discovery