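The Complete Guide to API Design & Microservices Architecture

1. API Design Principles

1.1 Richardson Maturity Model

The REST maturity model proposed by Leonard Richardson classifies how RESTful an API is across four levels (Level 0 through Level 3).

Level | Name | Description
Level 0 | The Swamp of POX | Single URI, single HTTP method (usually POST)
Level 1 | Resources | Individual resource URIs, but still a single HTTP method
Level 2 | HTTP Verbs | Proper use of HTTP methods (GET, POST, PUT, DELETE)
Level 3 | Hypermedia Controls | HATEOAS: responses include links to the next available actions

Most production APIs target Level 2. Level 3 (HATEOAS) is applied selectively because its implementation complexity often outweighs the practical benefit.

1.2 Resource Naming Conventions

Consistent resource naming is the cornerstone of good API design.

# Good examples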
GET    /api/v1/users
GET    /api/v1/users/123
GET    /api/v1/users/123/orders
POST   /api/v1/users
PUT    /api/v1/users/123
DELETE /api/v1/users/123

# Bad examples (verbs in the path, inconsistent casing)
GET    /api/v1/getUsers
POST   /api/v1/createUser
GET    /api/v1/user_list

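Core principles:

  • Use nouns: resources are expressed as nouns (users, orders, products)
  • Plurals: collections use plural forms (/users, /orders)
  • Lowercase + hyphens: use kebab-case (/order-items)
  • Hierarchical relationships: express them as nested resources (/users/123/orders)
  • Filtering via query parameters: /users?status=active&role=admin

1.3 HTTP Methods and Status Codes

GET     - Read (safe, idempotent)
POST    - Create (unsafe, non-idempotent)
PUT     - Full replacement (unsafe, idempotent)
PATCH   - Partial update (unsafe, not guaranteed to be idempotent)
DELETE  - Delete (unsafe, idempotent)
OPTIONS - Describe supported methods (safe, idempotent; also used for CORS preflight)
HEAD    - Same as GET but returns headers only (safe, idempotent)

Key status code guide:

Code | Meaning | When to use
200 | OK | Successful GET, PUT, PATCH
201 | Created | Successful POST (include a Location header)
204 | No Content | Successful DELETE
400 | Bad Request | Malformed request
401 | Unauthorized | Authentication failed or missing
403 | Forbidden | Authenticated but not permitted
404 | Not Found | Resource does not exist
409 | Conflict | Resource conflict
422 | Unprocessable Entity | Validation failure
429 | Too Many Requests | Rate limit exceeded
500 | Internal Server Error | Server error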

1.4 Request/Response Design

A consistent response format greatly improves the client development experience.

{
  "data": {
    "id": "user_123",
    "type": "user",
    "attributes": {
      "name": "홍길동",
      "email": "hong@example.com",
      "created_at": "2026-04-12T10:00:00Z"
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-12T10:00:00Z"
  }
}

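Pagination response (shown with both page-based fields and a cursor for illustration; a real API would normally expose one style or the other):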

{
  "data": [],
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 150,
    "total_pages": 8,
    "next_cursor": "eyJpZCI6MTAwfQ=="
  }
}

Error response:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request data is invalid",
    "details": [
      {
        "field": "email",
        "message": "Not a valid email format"
      }
    ]
  }
}

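2. REST vs GraphQL vs gRPC

2.1 Comparison Table

Feature | REST | GraphQL | gRPC
Protocol | HTTP (any version, commonly HTTP/1.1) | Typically HTTP (the spec is transport-agnostic) | HTTP/2
Data format | JSON/XML | JSON | Protocol Buffers
Schema | OpenAPI (optional) | SDL (required) | .proto (required)
Type safety | Weak | Strong | Very strong
Over/under-fetching | Present | Resolved | Largely avoided (messages are defined per RPC)
Real-time | WebSocket | Subscription | Bidirectional streaming
Browser support | Native | Native | Requires grpc-web
Performance | Moderate | Moderate | High
Learning curve | Low | Medium | High

2.2 Best Use Cases for Each

REST works best for:

  • Public (open) APIs
  • Simple CRUD operations
  • Cases where caching matters (HTTP caching can be used directly)
  • Calls made directly from the browser

GraphQL works best for:

  • Mobile apps (bandwidth optimization)
  • Complex data relationships
  • Different clients that need different data shapes
  • Fast frontend development cycles

gRPC works best for:

  • Internal communication between microservices
  • Cases where high performance is required
  • Cases where bidirectional streaming is needed
  • Polyglot environments

2.3 GraphQL Example

# Schema definition (OrderStatus, OrderItem, CreateUserInput, and UpdateUserInput are assumed to be defined elsewhere)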
type User {
  id: ID!
  name: String!
  email: String!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

type Query {
  user(id: ID!): User
  users(page: Int, limit: Int): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
}

# Client query: request only the fields you need
query GetUserWithOrders {
  user(id: "123") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

2.4 gRPC Example

syntax = "proto3";

package ecommerce;

service UserService {
  rpc GetUser (GetUserRequest) returns (UserResponse);
  rpc ListUsers (ListUsersRequest) returns (stream UserResponse);
  rpc CreateUser (CreateUserRequest) returns (UserResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message UserResponse {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

message ListUsersRequest {
  int32 page = 1;
  int32 limit = 2;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

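3. API Versioning

3.1 Versioning Strategy Comparison

Strategy | Example | Pros | Cons
URI versioning | /api/v1/users | Intuitive, caching-friendly | URI pollution
Header versioning | Accept: application/vnd.api+json;version=1 | Clean URIs | Harder to test
Query versioning | /api/users?version=1 | Simple to implement | Complicates caching
Content negotiation | Accept: application/vnd.company.v1+json | Standards-compliant | Complex

URI versioning is the most widely adopted strategy and is recommended for its practicality and clarity.

3.2 Backward Compatibility Principles

Backward-compatible changes (non-breaking):
  - Adding new endpoints
  - Adding new fields to responses
  - Adding optional request parameters
  - Adding new enum values (only if clients handle unknown values)

Breaking changes:
  - Removing or renaming existing fields
  - Changing field types
  - Adding required parameters
  - Changing the response structure
  - Changing URL paths

3.3 API Deprecation Strategy

Phase 1: Add Sunset and Deprecation headers
  Sunset: Fri, 01 Jan 2027 00:00:00 GMT
  Deprecation: true
  Link: <https://api.example.com/v2/docs>; rel="successor-version"

Phase 2: Include warnings in responses (6 months)
Phase 3: Gradually reduce rate limits (3 months)
Phase 4: Return 410 Gone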

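4. Authentication and Authorization

4.1 Authentication Method Comparison

Method | Security level | Use case | Complexity
API Key | Low | Internal/partner APIs | Low
OAuth 2.0 | High | Delegated user authorization | High
JWT | Medium | Stateless authentication | Medium
mTLS | Very high | Service-to-service communication | High

4.2 OAuth 2.0 Flow

Authorization Code Flow (recommended for server-side apps):

1. Client --> Authorization server: request an authorization code
   GET /authorize?response_type=code
     &client_id=CLIENT_ID
     &redirect_uri=CALLBACK_URL
     &scope=read:user
     &state=RANDOM_STATE

2. User --> Authorization server: log in and grant consent

3. Authorization server --> Client: return the authorization code
   302 Redirect: CALLBACK_URL?code=AUTH_CODE&state=RANDOM_STATE
   (the client must check that state matches the value it sent)

4. Client --> Authorization server: exchange the code for tokens
   POST /token
     grant_type=authorization_code
     &code=AUTH_CODE
     &redirect_uri=CALLBACK_URL
     &client_id=CLIENT_ID
     &client_secret=CLIENT_SECRET

5. Authorization server --> Client: access token + refresh token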

4.3 JWT Structure and Security

{
  "header": {
    "alg": "RS256",
    "typ": "JWT",
    "kid": "key-id-001"
  },
  "payload": {
    "sub": "user_123",
    "iss": "auth.example.com",
    "aud": "api.example.com",
    "exp": 1744540800,
    "iat": 1744539900,
    "scope": "read:users write:orders",
    "roles": ["admin"]
  }
}

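JWT security checklist:

  • Prefer the RS256 (asymmetric) algorithm over HS256
  • Set short expiration times (15 minutes or less)
  • Store refresh tokens server-side
  • Always validate the iss, aud, and exp claims
  • Reject the none algorithm
  • Validate kid (Key ID) to prevent key confusion attacks

4.4 mTLS (Mutual TLS)

Securing service-to-service communication:

1. Issue a unique X.509 certificate to each service
2. Both sides verify the peer's certificate on every connection
3. Renew certificates automatically (cert-manager, etc.)

Pros:
  - Service identity is proven cryptographically
  - Traffic is encrypted at the network level
  - Foundation for a Zero Trust architecture

Cons:
  - Certificate management complexity
  - Risk of an outage when a certificate expires
  - Harder debugging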

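5. Rate Limiting

5.1 Algorithm Comparison

Token Bucket:

Principle: Tokens are added at a constant rate, and each request consumes a token
Pros: Allows bursts while holding the average rate
Cons: Per-client state (token count and last refill time) must be stored

Example:
  - Bucket size: 100 tokens
  - Refill rate: 10 tokens/second
  - 1 token consumed per request
  - Burst: up to 100 requests at once

Sliding Window:

Principle: Count requests within a moving time window
Pros: Accurate limiting, avoids the boundary spike of fixed windows
Cons: Requires keeping the previous window's counter

Example (sliding window counter, 1-minute window):
  Current time: 12:01:30
  Previous window (12:00-12:01): 80 requests
  Current window (12:01-12:02): 20 requests
  Weight of previous window: 1 - 30s/60s = 0.5
  Weighted sum: 80 * 0.5 + 20 = 60 (within the limit of 100)

5.2 Rate Limit Response Headers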

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744540860
Retry-After: 60

5.3 Rate Limiting with an API Gateway

# Kong Gateway configuration example
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: redis
      redis_host: redis-cluster
      redis_port: 6379
      fault_tolerant: true
      hide_client_headers: false

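6. Microservices Patterns

6.1 Service Decomposition - DDD Bounded Context

E-commerce domain analysis:

[Order Context]              [Product Context]
  - Order                    - Product
  - OrderItem                - Category
  - OrderStatus              - Inventory
  - Payment                  - Price

[User Context]               [Shipping Context]
  - User                     - Shipment
  - Address                  - Tracking
  - Authentication           - Carrier
  - Profile                  - Delivery

[Notification Context]       [Search Context]
  - Notification             - SearchIndex
  - Template                 - Filter
  - Channel                  - Ranking

Decomposition principles:

  • Decompose by business capability
  • Data ownership: each service owns its own database
  • Team autonomy: two-pizza team rule (6-8 people)
  • Deployment independence: deployable without changing other services
  • Loose coupling, high cohesion

6.2 Service Communication - Synchronous vs Asynchronous

Synchronous (Request-Response):

  [API Gateway] --> [Order Service] --> [Payment Service]
                                    --> [Inventory Service]

  Pros: Immediate response, simple implementation
  Cons: Tight coupling, cascading failures, latency accumulation

Asynchronous (Event-Driven):

  [Order Service] --> [Message Broker] --> [Payment Service]
                                       --> [Inventory Service]
                                       --> [Notification Service]

  Pros: Loose coupling, high resilience, scalability
  Cons: Eventual consistency, harder debugging

Kafka vs RabbitMQ:

Feature | Apache Kafka | RabbitMQ
Model | Pub/Sub + log | Queue + exchange
Ordering | Guaranteed within a partition | Guaranteed within a queue
Throughput | Very high (millions/sec) | High (tens of thousands/sec)
Retention | Retained for the configured period | Deleted once acknowledged (classic queues)
Reprocessing | Possible via offset reset | Not built in for classic queues (use a DLQ)
Use case | Event streaming, logs | Task queues, RPC

6.3 API Gateway Pattern

Responsibilities:
  - Request routing
  - Authentication / Authorization
  - Rate limiting
  - Load balancing
  - Request/response transformation
  - Circuit breaking
  - Monitoring / Logging

Key solutions:
  - Kong: open source, plugin ecosystem
  - Envoy: high performance, L7 proxy
  - AWS API Gateway: managed, serverless
  - NGINX: lightweight, high performance
  - Traefik: automatic service discovery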
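
# Envoy routing configuration example
# (clusters user_service and order_service are assumed to be defined under static_resources.clusters)
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service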

6.4 Service Discovery

Client-side discovery:
  1. Service A --> Service registry: look up Service B's address
  2. Service A --> Service B: direct call
  Tools: Eureka, Consul

Server-side discovery:
  1. Service A --> Load balancer: request
  2. Load balancer --> Service registry: look up the address
  3. Load balancer --> Service B: forward
  Tools: K8s Service + DNS, AWS ALB

Kubernetes DNS-based:
  Internal DNS name: service-name.namespace.svc.cluster.local
  Example: order-service.production.svc.cluster.local

6.5 Circuit Breaker Pattern

State transitions:

  [Closed] --failure threshold exceeded--> [Open]
  [Open]   --timeout elapsed--> [Half-Open]
  [Half-Open] --success--> [Closed]
  [Half-Open] --failure--> [Open]
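
// Resilience4j configuration example
// (PaymentResponse, paymentService, and request are assumed to be defined by the application)
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import java.util.function.Supplier;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)        // Open when 50% of calls fail
    .waitDurationInOpenState(
        Duration.ofSeconds(30))       // Move to Half-Open after 30s
    .slidingWindowSize(10)            // Evaluate the last 10 calls
    .minimumNumberOfCalls(5)          // Need at least 5 calls before evaluating
    .permittedNumberOfCallsInHalfOpenState(3) // Trial calls allowed in Half-Open
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of(
    "paymentService", config);

Supplier<PaymentResponse> decoratedSupplier =
    CircuitBreaker.decorateSupplier(
        circuitBreaker,
        () -> paymentService.processPayment(request)
    );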

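Fallback strategies:

  • Return a cached response: reuse the last successful response
  • Return a default value: a predefined default response
  • Call an alternative service: use a backup service
  • Graceful degradation: respond with reduced functionality

7. Service Mesh

7.1 What Is a Service Mesh?

Service mesh architecture:

  [Service A] <--> [Sidecar Proxy] <--> [Sidecar Proxy] <--> [Service B]
                          |                      |
                          v                      v
                   [Control Plane (Istio/Linkerd)]
                          |
                   [Configuration, policy, certificate management]

Responsibilities of the sidecar proxy:
  - Traffic routing and load balancing
  - mTLS encryption
  - Circuit breaking
  - Retries and timeouts
  - Metrics collection
  - Distributed tracing

7.2 Istio vs Linkerd

Feature | Istio | Linkerd
Data plane | Envoy | linkerd2-proxy (Rust)
Resource usage | High | Low
Features | Very rich | Focused on core features
Learning curve | Steep | Gentle
Community | CNCF Graduated (started by Google, IBM, and Lyft) | CNCF Graduated
Multi-cluster | Supported | Supported

7.3 Istio Traffic Management

# Canary deployment: traffic splitting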
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s
---
# Circuit breaking settings
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

8. Event-Driven Architecture

8.1 Event Sourcing

Traditional approach: store only the current state
  orders table: id=1, status=SHIPPED, total=50000

Event Sourcing: store every state change as an event
  events table:
    1. OrderCreated     (total=50000)
    2. PaymentReceived  (amount=50000)
    3. OrderConfirmed   ()
    4. ItemShipped      (tracking=KR123456)

Pros:
  - Complete audit log
  - Time travel (reconstruct the state at any point in time)
  - New views can be built by replaying events
  - Easier debugging

Cons:
  - Event schema evolution must be managed
  - The event store keeps growing
  - Eventual consistency
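
As a rough illustration of the "replay" idea, the sketch below rebuilds an order's state by folding over the four events listed above. The event names and payload fields come from that list; the `OrderState` shape and the intermediate status names (`NONE`, `CREATED`, `PAID`, `CONFIRMED`) are assumptions made for this example, and a real event store would also carry IDs, timestamps, and versions.

```typescript
// Events taken from the list above; payloads reduced to the fields shown there.
type OrderEvent =
  | { type: "OrderCreated"; total: number }
  | { type: "PaymentReceived"; amount: number }
  | { type: "OrderConfirmed" }
  | { type: "ItemShipped"; tracking: string };

// Hypothetical read-side state; status names other than SHIPPED are assumptions.
interface OrderState {
  status: "NONE" | "CREATED" | "PAID" | "CONFIRMED" | "SHIPPED";
  total: number;
  paid: number;
  tracking: string | null;
}

const INITIAL_STATE: OrderState = { status: "NONE", total: 0, paid: 0, tracking: null };

// Pure function: current state + one event -> next state.
function applyEvent(state: OrderState, event: OrderEvent): OrderState {
  switch (event.type) {
    case "OrderCreated":
      return { ...state, status: "CREATED", total: event.total };
    case "PaymentReceived":
      return { ...state, status: "PAID", paid: state.paid + event.amount };
    case "OrderConfirmed":
      return { ...state, status: "CONFIRMED" };
    case "ItemShipped":
      return { ...state, status: "SHIPPED", tracking: event.tracking };
  }
}

// Replaying a prefix of the log gives the state at that point ("time travel").
function replay(events: OrderEvent[]): OrderState {
  return events.reduce((state, event) => applyEvent(state, event), INITIAL_STATE);
}

const eventLog: OrderEvent[] = [
  { type: "OrderCreated", total: 50000 },
  { type: "PaymentReceived", amount: 50000 },
  { type: "OrderConfirmed" },
  { type: "ItemShipped", tracking: "KR123456" },
];

console.log(replay(eventLog).status);             // SHIPPED
console.log(replay(eventLog.slice(0, 2)).status); // PAID
```

Because `applyEvent` is pure, the same log can feed several projections, which is what makes building new views from old events cheap.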

8.2 CQRS (Command Query Responsibility Segregation)

CQRS architecture:

  [Command] --> [Write Model] --> [Event Store]
                                       |
                                  [Event Bus]
                                       |
                              [Read Model projection]
                                       |
                                [Query] <-- [Read DB]

Command (write):
  - Executes domain logic
  - Publishes events
  - Normalized database

Query (read):
  - Denormalized read-only views
  - Optimized for fast lookups
  - Can use a variety of stores (Elasticsearch, Redis, etc.)

8.3 Saga Pattern - Distributed Transactions

In a microservices environment, 2PC (Two-Phase Commit) suffers from performance and availability problems. The Saga pattern manages a distributed transaction as a sequence of local transactions, each with a compensating action that undoes it.

Choreography:

Order creation saga:

  [Order Service]       [Payment Service]    [Inventory Service]
       |                      |                     |
  OrderCreated -------->      |                     |
       |              PaymentProcessed -------->    |
       |                      |              InventoryReserved
       |                      |                     |
       |              <--- (on success) --->        |
  OrderConfirmed              |                     |

Compensating transactions (on failure):
  InventoryReserveFailed --> PaymentRefunded --> OrderCancelled

Orchestration:

  [Order Saga Orchestrator]
       |
       |--> 1. Order Service: create order
       |--> 2. Payment Service: process payment
       |--> 3. Inventory Service: reserve inventory
       |--> 4. Shipping Service: create shipment
       |
  (on failure, compensate in reverse order)
       |--> 3c. Inventory: release inventory
       |--> 2c. Payment: issue refund
       |--> 1c. Order: cancel order
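
The orchestration flow above can be sketched as a loop that runs steps in order and, on the first failure, runs the compensations of the already completed steps in reverse. This is a simplified, synchronous sketch: `SagaStep` and `runSaga` are hypothetical names, and a production orchestrator would be asynchronous, persist its progress, and retry compensations.

```typescript
// One saga step: a local transaction plus the action that undoes it.
interface SagaStep {
  name: string;
  action: () => void;      // throws on failure
  compensate: () => void;  // must be safe to run after action succeeded
}

interface SagaResult {
  ok: boolean;
  log: string[];
}

function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const completed: SagaStep[] = [];

  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      // Compensate completed steps in reverse order (last completed first).
      for (const done of [...completed].reverse()) {
        done.compensate();
        log.push(`compensated:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: inventory reservation fails, so payment and order are rolled back.
const noop = (): void => {};
const result = runSaga([
  { name: "order", action: noop, compensate: noop },
  { name: "payment", action: noop, compensate: noop },
  {
    name: "inventory",
    action: () => { throw new Error("out of stock"); },
    compensate: noop,
  },
  { name: "shipping", action: noop, compensate: noop },
]);
console.log(result.log.join(" -> "));
```

The failed step itself is not compensated, since its local transaction never committed; only the steps before it are undone.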

9. Distributed Tracing

9.1 Correlation ID Pattern

Request flow:

  [Client]
    X-Request-ID: req-abc-123
       |
  [API Gateway]
    X-Request-ID: req-abc-123
    X-Correlation-ID: corr-xyz-789
       |
  [Order Service] ----------> [Payment Service]
    trace_id: corr-xyz-789     trace_id: corr-xyz-789
    span_id: span-001          span_id: span-002
    parent_span_id: null        parent_span_id: span-001
       |
       ----------> [Inventory Service]
                    trace_id: corr-xyz-789
                    span_id: span-003
                    parent_span_id: span-001

9.2 Applying OpenTelemetry

// OpenTelemetry SDK initialization
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Creating a span manually
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

async function processOrder(orderId: string) {
  // service.name is a resource attribute (set once for the SDK, e.g. via
  // OTEL_SERVICE_NAME), so it is not repeated on each span.
  const span = tracer.startSpan('processOrder', {
    attributes: {
      'order.id': orderId,
    },
  });

  try {
    span.addEvent('Validating order');
    await validateOrder(orderId);

    span.addEvent('Processing payment');
    await processPayment(orderId);

    span.setStatus({ code: SpanStatusCode.OK });
  } catch (error) {
    span.setStatus({
      code: SpanStatusCode.ERROR,
      message: String(error),
    });
    throw error;
  } finally {
    span.end();
  }
}

9.3 The Three Pillars of Observability

1. Logs:
   - Structured logs (JSON)
   - Log levels (DEBUG, INFO, WARN, ERROR)
   - Include the Correlation ID
   - ELK Stack / Loki

2. Metrics:
   - RED metrics: Rate, Errors, Duration
   - USE metrics: Utilization, Saturation, Errors
   - Prometheus + Grafana

3. Traces:
   - Track requests across services
   - Span-based visualization
   - Jaeger / Zipkin / Tempo

10. In Practice: Designing an E-commerce System as Microservices

10.1 Overall Architecture

                    [CDN / CloudFront]
                           |
                    [API Gateway (Kong)]
                     /    |    |     \
                    /     |    |      \
  [User Service] [Product] [Order] [Payment]
       |          Service   Service   Service
       |            |         |         |
  [User DB]    [Product DB] [Order DB] [Payment DB]
  (PostgreSQL)  (PostgreSQL) (PostgreSQL) (PostgreSQL)
                    |         |
              [Search Service] [Notification]
                    |           Service
              [Elasticsearch]     |
                              [Kafka]
                                |
                         [Email/SMS/Push]

10.2 Technology Stack per Service

User Service:
  - Language: Go
  - DB: PostgreSQL
  - Cache: Redis (sessions)
  - Communication: REST + gRPC

Product Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Cache: Redis (product data)
  - Search: Elasticsearch
  - Communication: REST + gRPC

Order Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Messaging: Kafka (order events)
  - Communication: gRPC + Kafka

Payment Service:
  - Language: Go
  - DB: PostgreSQL
  - External: payment gateway (PG) provider integration
  - Communication: gRPC

Notification Service:
  - Language: Node.js
  - DB: MongoDB (templates)
  - Messaging: Kafka consumer
  - External: SendGrid, Firebase

10.3 Order Processing Flow

1. Client --> API Gateway: POST /api/v1/orders
2. API Gateway --> Order Service: create-order request
3. Order Service --> Product Service (gRPC): check stock
4. Order Service --> Kafka: publish OrderCreated event
5. Payment Service (consumer): process payment
6. Payment Service --> Kafka: publish PaymentCompleted event
7. Order Service (consumer): update order status
8. Notification Service (consumer): send order confirmation email
9. Product Service (consumer): deduct stock

10.4 Failure Handling Strategy

Circuit Breaker:
  - If the Payment service is down, only accept the order
  - Queue the payment in Kafka and process it later

Retry + Exponential Backoff:
  - 1st retry: after 100ms
  - 2nd retry: after 200ms
  - 3rd retry: after 400ms
  - Maximum retries: 5
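
The retry schedule above doubles the delay on each attempt, starting from 100ms. A minimal sketch of that calculation follows; `backoffDelayMs` and `retryWithBackoff` are hypothetical helper names, and `sleep` is injected (and synchronous) only so the example stays deterministic. Real code would normally await a timer and add random jitter.

```typescript
// Delay before retry number `attempt` (1-based): 100, 200, 400, 800, 1600 ms.
function backoffDelayMs(attempt: number, baseMs: number = 100): number {
  return baseMs * 2 ** (attempt - 1);
}

// Runs `action`, retrying up to `maxRetries` times with exponential backoff.
// The last error is rethrown once the retries are exhausted.
function retryWithBackoff<T>(
  action: () => T,
  maxRetries: number,
  sleep: (ms: number) => void,
): T {
  let attempt = 0;
  for (;;) {
    try {
      return action();
    } catch (error) {
      attempt += 1;
      if (attempt > maxRetries) throw error;
      sleep(backoffDelayMs(attempt));
    }
  }
}

// Usage: the call fails twice, then succeeds on the third try.
const recordedSleeps: number[] = [];
let callCount = 0;
const value = retryWithBackoff(
  () => {
    callCount += 1;
    if (callCount < 3) throw new Error("temporary failure");
    return "ok";
  },
  5,
  (ms) => { recordedSleeps.push(ms); },
);
console.log(value, recordedSleeps); // ok [ 100, 200 ]
```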

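Bulkhead pattern:
  - Isolate thread pools per service
  - Keep one service's failure from spreading to the whole system

Dead Letter Queue:
  - Move messages that failed processing to the DLQ
  - Analyze and reprocess them manually
  - Configure alerts to notify the operations team

10.5 Deployment Strategy

# Kubernetes Deployment: canary deployment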
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-canary
  labels:
    app: order-service
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
        - name: order-service
          image: order-service:2.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

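Wrapping Up

API design and microservices architecture are core competencies in modern software development. The key points are:

  1. API design is a contract: consistent naming, appropriate status codes, and clear error messages determine developer productivity
  2. Technology choice depends on context: REST, GraphQL, and gRPC each have different strengths, and mixing them is common
  3. Security from the start: authentication/authorization, mTLS, and rate limiting belong in the design phase, not in a later one
  4. Split services carefully: base boundaries on DDD Bounded Contexts, neither too small nor too large
  5. Failures will happen: build resilience with the Circuit Breaker, Retry, Bulkhead, and Saga patterns
  6. Observability is operational capability: integrate logs, metrics, and traces to find problems quickly

Starting with a monolith and adopting microservices incrementally as real needs arise is the most prudent approach. Focus on delivering business value rather than on technical elegance.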

The Complete Guide to API Design & Microservices Architecture

1. API Design Principles

1.1 Richardson Maturity Model

The REST maturity model proposed by Leonard Richardson classifies how RESTful an API is across four levels.

Level | Name | Description
Level 0 | The Swamp of POX | Single URI, single HTTP method (usually POST)
Level 1 | Resources | Individual resource URIs, still single method
Level 2 | HTTP Verbs | Proper use of HTTP methods (GET, POST, PUT, DELETE)
Level 3 | Hypermedia Controls | HATEOAS - responses include links to next actions

Most production APIs target Level 2, while Level 3 (HATEOAS) is applied selectively since its implementation complexity rarely justifies the benefits.

1.2 Resource Naming Conventions

Consistent resource naming is the cornerstone of good API design.

# Good examples
GET    /api/v1/users
GET    /api/v1/users/123
GET    /api/v1/users/123/orders
POST   /api/v1/users
PUT    /api/v1/users/123
DELETE /api/v1/users/123

# Bad examples
GET    /api/v1/getUsers
POST   /api/v1/createUser
GET    /api/v1/user_list

Core principles:

  • Use nouns: Resources are expressed as nouns (users, orders, products)
  • Plurals: Collections use plural forms (/users, /orders)
  • Lowercase + hyphens: Use kebab-case (/order-items)
  • Hierarchical relationships: Express with nested resources (/users/123/orders)
  • Filtering via query parameters: /users?status=active&role=admin

1.3 HTTP Methods and Status Codes

GET     - Read (safe, idempotent)
POST    - Create (unsafe, non-idempotent)
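PUT     - Full replacement (unsafe, idempotent)
PATCH   - Partial update (unsafe, not guaranteed to be idempotent)
DELETE  - Delete (unsafe, idempotent)
OPTIONS - Describe supported methods (safe, idempotent; also used for CORS preflight)
HEAD    - Same as GET but returns headers only (safe, idempotent)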

Key status code guide:

Code | Meaning | When to Use
200 | OK | Successful GET, PUT, PATCH
201 | Created | Successful POST (include Location header)
204 | No Content | Successful DELETE
400 | Bad Request | Invalid request format
401 | Unauthorized | Authentication failure
403 | Forbidden | Insufficient permissions
404 | Not Found | Resource does not exist
409 | Conflict | Resource conflict
422 | Unprocessable Entity | Validation failure
429 | Too Many Requests | Rate limit exceeded
500 | Internal Server Error | Server error

1.4 Request/Response Design

A consistent response format greatly improves the client development experience.

{
  "data": {
    "id": "user_123",
    "type": "user",
    "attributes": {
      "name": "John Doe",
      "email": "john@example.com",
      "created_at": "2026-04-12T10:00:00Z"
    }
  },
  "meta": {
    "request_id": "req_abc123",
    "timestamp": "2026-04-12T10:00:00Z"
  }
}

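Pagination response (shown with both page-based fields and a cursor for illustration; a real API would normally expose one style or the other):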

{
  "data": [],
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 150,
    "total_pages": 8,
    "next_cursor": "eyJpZCI6MTAwfQ=="
  }
}
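
The `next_cursor` value in the response above is base64 for `{"id":100}`. The sketch below shows one way such an opaque cursor could be produced and consumed. It assumes items are sorted by an ascending numeric `id`; `encodeCursor`, `decodeCursor`, and `paginate` are hypothetical helpers, not part of any framework. `btoa`/`atob` are the globals available in browsers and recent Node.js versions.

```typescript
// Hypothetical cursor shape: the ID of the last item on the previous page.
interface Cursor {
  id: number;
}

// Encode the cursor as base64 so clients treat it as an opaque token.
function encodeCursor(cursor: Cursor): string {
  return btoa(JSON.stringify(cursor));
}

function decodeCursor(token: string): Cursor {
  const parsed: unknown = JSON.parse(atob(token));
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { id?: unknown }).id !== "number"
  ) {
    throw new Error("invalid cursor");
  }
  return { id: (parsed as { id: number }).id };
}

// Returns the items after the cursor, plus the cursor for the following page.
function paginate<T extends { id: number }>(
  items: T[], // assumed to be sorted by ascending id
  perPage: number,
  token?: string,
): { data: T[]; next_cursor: string | null } {
  const afterId =
    token === undefined ? Number.NEGATIVE_INFINITY : decodeCursor(token).id;
  const remaining = items.filter((item) => item.id > afterId);
  const data = remaining.slice(0, perPage);
  const last = data[data.length - 1];
  const hasMore = remaining.length > perPage;
  return {
    data,
    next_cursor: hasMore && last !== undefined ? encodeCursor({ id: last.id }) : null,
  };
}

console.log(encodeCursor({ id: 100 })); // eyJpZCI6MTAwfQ==
```

A cursor stays stable when rows are inserted or deleted between requests, which is the usual reason to prefer it over page numbers for large or fast-changing collections.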

Error response:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request data is invalid",
    "details": [
      {
        "field": "email",
        "message": "Not a valid email format"
      }
    ]
  }
}
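
As an illustration of how the error envelope above might be produced, the sketch below validates a create-user payload and wraps the field errors in that shape with a 422 status. `validateCreateUser` and `toErrorResponse` are hypothetical helpers, and the email check is deliberately loose; it is not a full address validator.

```typescript
interface FieldError {
  field: string;
  message: string;
}

interface ErrorBody {
  error: { code: string; message: string; details: FieldError[] };
}

// Collects every field problem instead of stopping at the first one.
function validateCreateUser(input: { name?: unknown; email?: unknown }): FieldError[] {
  const details: FieldError[] = [];
  if (typeof input.name !== "string" || input.name.trim() === "") {
    details.push({ field: "name", message: "Name is required" });
  }
  // Loose check: exactly one "@" with non-empty text on both sides.
  if (typeof input.email !== "string" || !/^[^@\s]+@[^@\s]+$/.test(input.email)) {
    details.push({ field: "email", message: "Not a valid email format" });
  }
  return details;
}

// Wraps field errors in the envelope shown above, paired with HTTP 422.
function toErrorResponse(details: FieldError[]): { status: number; body: ErrorBody } {
  return {
    status: 422,
    body: {
      error: {
        code: "VALIDATION_ERROR",
        message: "The request data is invalid",
        details,
      },
    },
  };
}

const problems = validateCreateUser({ name: "John Doe", email: "not-an-email" });
if (problems.length > 0) {
  console.log(JSON.stringify(toErrorResponse(problems).body, null, 2));
}
```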

2. REST vs GraphQL vs gRPC

2.1 Comparison Table

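Feature | REST | GraphQL | gRPC
Protocol | HTTP (any version, commonly HTTP/1.1) | Typically HTTP (the spec is transport-agnostic) | HTTP/2
Data Format | JSON/XML | JSON | Protocol Buffers
Schema | OpenAPI (optional) | SDL (required) | .proto (required)
Type Safety | Weak | Strong | Very Strong
Over/Under-fetching | Present | Resolved | Largely avoided (messages are defined per RPC)
Real-time | WebSocket | Subscription | Bidirectional Streaming
Browser Support | Native | Native | Requires grpc-web
Performance | Moderate | Moderate | High
Learning Curve | Low | Medium | High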

2.2 Best Use Cases for Each

REST works best when:

  • Building public APIs (Open API)
  • Simple CRUD operations
  • Caching is critical (leveraging HTTP caching)
  • Browser-direct API calls

GraphQL works best when:

  • Mobile apps (bandwidth optimization)
  • Complex data relationships
  • Different clients need different data shapes
  • Fast frontend development cycles

gRPC works best when:

  • Internal microservice communication
  • High performance is required
  • Bidirectional streaming is needed
  • Polyglot environments

2.3 GraphQL Example

# Schema definition
type User {
  id: ID!
  name: String!
  email: String!
  orders: [Order!]!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

type Query {
  user(id: ID!): User
  users(page: Int, limit: Int): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
}

# Client query - request only needed fields
query GetUserWithOrders {
  user(id: "123") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

2.4 gRPC Example

syntax = "proto3";

package ecommerce;

service UserService {
  rpc GetUser (GetUserRequest) returns (UserResponse);
  rpc ListUsers (ListUsersRequest) returns (stream UserResponse);
  rpc CreateUser (CreateUserRequest) returns (UserResponse);
}

message GetUserRequest {
  string user_id = 1;
}

message UserResponse {
  string id = 1;
  string name = 2;
  string email = 3;
  int64 created_at = 4;
}

message ListUsersRequest {
  int32 page = 1;
  int32 limit = 2;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
}

3. API Versioning

3.1 Versioning Strategy Comparison

Strategy | Example | Pros | Cons
URI Versioning | /api/v1/users | Intuitive, caching-friendly | URI pollution
Header Versioning | Accept: application/vnd.api+json;version=1 | Clean URIs | Hard to test
Query Versioning | /api/users?version=1 | Simple implementation | Complex caching
Content Negotiation | Accept: application/vnd.company.v1+json | Standards-compliant | Complex

URI versioning is the most widely adopted, and is recommended for its practicality and clarity.

3.2 Backward Compatibility Principles

Backward-compatible changes (non-breaking):
  - Adding new endpoints
  - Adding new fields to responses
  - Adding optional request parameters
  - Adding new enum values (if clients handle unknown)

Breaking changes:
  - Removing or renaming existing fields
  - Changing field types
  - Adding required parameters
  - Changing response structure
  - Changing URL paths
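
Adding fields or enum values is only non-breaking if clients read responses tolerantly. The sketch below shows a client-side parser that ignores unknown fields and maps unrecognized enum values to `UNKNOWN`. The status values and the `OrderView` shape are assumptions made for this example.

```typescript
// Statuses this client version knows about (hypothetical values).
const KNOWN_STATUSES = ["PENDING", "PAID", "SHIPPED"] as const;
type KnownStatus = (typeof KNOWN_STATUSES)[number];
type OrderStatus = KnownStatus | "UNKNOWN";

// Map values added by newer API versions to UNKNOWN instead of failing.
function parseOrderStatus(raw: string): OrderStatus {
  return (KNOWN_STATUSES as readonly string[]).includes(raw)
    ? (raw as KnownStatus)
    : "UNKNOWN";
}

interface OrderView {
  id: string;
  status: OrderStatus;
}

// Pick only the fields this client needs; extra response fields are ignored.
function parseOrder(json: string): OrderView {
  const raw = JSON.parse(json) as Record<string, unknown>;
  return {
    id: String(raw["id"]),
    status: parseOrderStatus(String(raw["status"])),
  };
}

// A newer server added the REFUNDED status and a gift_wrap field.
const order = parseOrder('{"id":"order_1","status":"REFUNDED","gift_wrap":true}');
console.log(order); // { id: 'order_1', status: 'UNKNOWN' }
```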

3.3 API Deprecation Strategy

Phase 1: Add Sunset and Deprecation headers
  Sunset: Fri, 01 Jan 2027 00:00:00 GMT
  Deprecation: true
  Link: <https://api.example.com/v2/docs>; rel="successor-version"

Phase 2: Include warnings in responses (6 months)
Phase 3: Gradually reduce rate limits (3 months)
Phase 4: Return 410 Gone
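
A small sketch of how the Phase 1 headers and the Phase 4 cutoff might be generated in code. `deprecationHeaders` and `statusForDeprecatedEndpoint` are hypothetical helpers. `Date.prototype.toUTCString()` produces the HTTP date format used by `Sunset`, including the correct weekday.

```typescript
// Builds the Phase 1 response headers for a deprecated API version.
function deprecationHeaders(sunset: Date, successorDocsUrl: string): Record<string, string> {
  return {
    Sunset: sunset.toUTCString(),
    Deprecation: "true",
    Link: `<${successorDocsUrl}>; rel="successor-version"`,
  };
}

// Phase 4: once the sunset date has passed, answer with 410 Gone.
function statusForDeprecatedEndpoint(sunset: Date, now: Date): number {
  return now.getTime() >= sunset.getTime() ? 410 : 200;
}

const sunsetDate = new Date(Date.UTC(2027, 0, 1)); // 1 January 2027, 00:00 UTC
const headers = deprecationHeaders(sunsetDate, "https://api.example.com/v2/docs");
console.log(headers["Sunset"]); // Fri, 01 Jan 2027 00:00:00 GMT
```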

4. Authentication and Authorization

4.1 Authentication Method Comparison

Method | Security Level | Use Case | Complexity
API Key | Low | Internal/Partner APIs | Low
OAuth 2.0 | High | Delegated user auth | High
JWT | Medium | Stateless auth | Medium
mTLS | Very High | Service-to-service | High

4.2 OAuth 2.0 Flow

Authorization Code Flow (recommended for server-side apps):

1. Client --> Auth Server: Request authorization code
   GET /authorize?response_type=code
     &client_id=CLIENT_ID
     &redirect_uri=CALLBACK_URL
     &scope=read:user
     &state=RANDOM_STATE

2. User --> Auth Server: Login and grant consent

3. Auth Server --> Client: Return authorization code
   302 Redirect: CALLBACK_URL?code=AUTH_CODE&state=RANDOM_STATE

4. Client --> Auth Server: Exchange for token
   POST /token
     grant_type=authorization_code
     &code=AUTH_CODE
     &redirect_uri=CALLBACK_URL
     &client_id=CLIENT_ID
     &client_secret=CLIENT_SECRET

5. Auth Server --> Client: Access token + Refresh token
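
Steps 1, 3, and 4 of the flow above can be sketched with the standard `URL` and `URLSearchParams` APIs. The endpoint URLs and helper names below are placeholders for this example. The token request includes `redirect_uri` because RFC 6749 requires it whenever the authorization request carried one. Sending the request over HTTP and storing `state` per session are left out.

```typescript
interface AuthRequest {
  clientId: string;
  redirectUri: string;
  scope: string;
  state: string; // random, unguessable value bound to the user's session
}

// Step 1: the URL the user's browser is redirected to.
function buildAuthorizeUrl(authorizeEndpoint: string, req: AuthRequest): string {
  const url = new URL(authorizeEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", req.clientId);
  url.searchParams.set("redirect_uri", req.redirectUri);
  url.searchParams.set("scope", req.scope);
  url.searchParams.set("state", req.state);
  return url.toString();
}

// Step 3: read the callback and reject it if state does not match (CSRF defence).
function readAuthorizationCode(callbackUrl: string, expectedState: string): string {
  const params = new URL(callbackUrl).searchParams;
  if (params.get("state") !== expectedState) {
    throw new Error("state mismatch");
  }
  const code = params.get("code");
  if (code === null || code === "") {
    throw new Error("missing authorization code");
  }
  return code;
}

// Step 4: form-encoded body for POST /token.
function buildTokenRequestBody(
  code: string,
  redirectUri: string,
  clientId: string,
  clientSecret: string,
): string {
  return new URLSearchParams({
    grant_type: "authorization_code",
    code,
    redirect_uri: redirectUri,
    client_id: clientId,
    client_secret: clientSecret,
  }).toString();
}

const authorizeUrl = buildAuthorizeUrl("https://auth.example.com/authorize", {
  clientId: "CLIENT_ID",
  redirectUri: "https://app.example.com/callback",
  scope: "read:user",
  state: "RANDOM_STATE",
});
console.log(authorizeUrl);
```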

4.3 JWT Structure and Security

{
  "header": {
    "alg": "RS256",
    "typ": "JWT",
    "kid": "key-id-001"
  },
  "payload": {
    "sub": "user_123",
    "iss": "auth.example.com",
    "aud": "api.example.com",
    "exp": 1744540800,
    "iat": 1744539900,
    "scope": "read:users write:orders",
    "roles": ["admin"]
  }
}

JWT Security Checklist:

  • Prefer RS256 (asymmetric) algorithm over HS256
  • Set short expiration times (15 minutes or less)
  • Store refresh tokens server-side
  • Always validate iss, aud, and exp claims
  • Reject the none algorithm
  • Validate kid (Key ID) to prevent key confusion attacks
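
The claim-level items in the checklist can be expressed as a small pure function. This sketch assumes the signature has already been verified by a vetted JWT library. It only covers the checks on `alg`, `kid`, `iss`, `aud`, and `exp`; `checkJwt` and `ExpectedToken` are hypothetical names.

```typescript
interface JwtHeader {
  alg: string;
  kid?: string;
}

interface JwtClaims {
  iss: string;
  aud: string | string[];
  exp: number; // seconds since the Unix epoch
}

interface ExpectedToken {
  issuer: string;
  audience: string;
  allowedAlgs: string[]; // e.g. ["RS256"]; never include "none"
  knownKeyIds: string[]; // key IDs published by the trusted issuer
}

// Returns a list of problems; an empty list means the checks passed.
// Signature verification is NOT done here and must happen before this.
function checkJwt(
  header: JwtHeader,
  claims: JwtClaims,
  expected: ExpectedToken,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];
  if (!expected.allowedAlgs.includes(header.alg)) {
    problems.push(`algorithm ${header.alg} is not allowed`);
  }
  if (header.kid === undefined || !expected.knownKeyIds.includes(header.kid)) {
    problems.push("unknown kid");
  }
  if (claims.iss !== expected.issuer) {
    problems.push("iss mismatch");
  }
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(expected.audience)) {
    problems.push("aud mismatch");
  }
  if (claims.exp <= nowSeconds) {
    problems.push("token expired");
  }
  return problems;
}

const expected: ExpectedToken = {
  issuer: "auth.example.com",
  audience: "api.example.com",
  allowedAlgs: ["RS256"],
  knownKeyIds: ["key-id-001"],
};
const claims: JwtClaims = { iss: "auth.example.com", aud: "api.example.com", exp: 1744540800 };
console.log(checkJwt({ alg: "RS256", kid: "key-id-001" }, claims, expected, 1744540000)); // []
```

Using an explicit allow-list for `alg`, rather than trusting the header, is what rules out both `none` and an unexpected switch from RS256 to HS256.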

4.4 mTLS (Mutual TLS)

Service-to-service communication security:

1. Issue unique X.509 certificates to each service
2. Mutual certificate verification during communication
3. Automatic certificate renewal (cert-manager, etc.)

Pros:
  - Cryptographically prove service identity
  - Network-level encryption
  - Foundation for Zero Trust architecture

Cons:
  - Certificate management complexity
  - Service outage risk on certificate expiration
  - Debugging difficulty

5. Rate Limiting

5.1 Algorithm Comparison

Token Bucket:

Principle: Tokens are added at a constant rate; consumed per request
Pros: Allows burst traffic while maintaining average rate
Cons: Per-client state (token count and last refill time) must be stored

Example:
  - Bucket size: 100 tokens
  - Refill rate: 10 tokens/second
  - 1 token consumed per request
  - Burst: up to 100 simultaneous requests
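
A minimal in-memory token bucket matching the example numbers above (capacity 100, refill 10 tokens/second) could look like the sketch below. The clock is passed in explicitly so the behavior is deterministic. The class name and method are illustrative, and a multi-instance deployment would keep this state in a shared store such as Redis.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Refill according to elapsed time, then try to take one token.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: a burst of 100 passes, the 101st request is rejected,
// and one second later 10 more requests are allowed.
const bucket = new TokenBucket(100, 10, 0);
let acceptedInBurst = 0;
for (let i = 0; i < 101; i++) {
  if (bucket.tryConsume(0)) acceptedInBurst += 1;
}
console.log(acceptedInBurst); // 100
```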

Sliding Window:

Principle: Count requests within a time window
Pros: Accurate limiting, solves window boundary issues
Cons: Requires storing previous window counters

Example (sliding window counter, 1-minute window):
  Current time: 12:01:30
  Previous window (12:00-12:01): 80 requests
  Current window (12:01-12:02): 20 requests
  Weight of previous window: 1 - 30s/60s = 0.5
  Weighted sum: 80 * 0.5 + 20 = 60 (within limit of 100)
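
The weighted sum in the example is the sliding window counter approximation: the previous window's count is scaled by the fraction of that window still covered by the sliding interval. A sketch of the arithmetic, with hypothetical function names:

```typescript
// Estimated number of requests in the sliding window ending now.
function slidingWindowEstimate(
  previousCount: number,
  currentCount: number,
  elapsedInCurrentWindowMs: number,
  windowMs: number,
): number {
  // Fraction of the previous window that still overlaps the sliding window.
  const previousWeight = 1 - elapsedInCurrentWindowMs / windowMs;
  return previousCount * previousWeight + currentCount;
}

function isAllowed(estimate: number, limit: number): boolean {
  return estimate < limit;
}

// 12:01:30 is 30s into the current 60s window, so the previous window weighs 0.5.
const estimate = slidingWindowEstimate(80, 20, 30000, 60000);
console.log(estimate, isAllowed(estimate, 100)); // 60 true
```

The estimate assumes requests in the previous window were evenly spread, which is why this variant needs only two counters per client instead of a timestamp per request.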

5.2 Rate Limit Response Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1744540860
Retry-After: 60

5.3 Rate Limiting with API Gateway

# Kong Gateway configuration example
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: redis
      redis_host: redis-cluster
      redis_port: 6379
      fault_tolerant: true
      hide_client_headers: false

6. Microservices Patterns

6.1 Service Decomposition - DDD Bounded Context

E-commerce domain analysis:

[Order Context]              [Product Context]
  - Order                    - Product
  - OrderItem                - Category
  - OrderStatus              - Inventory
  - Payment                  - Price

[User Context]               [Shipping Context]
  - User                     - Shipment
  - Address                  - Tracking
  - Authentication           - Carrier
  - Profile                  - Delivery

[Notification Context]       [Search Context]
  - Notification             - SearchIndex
  - Template                 - Filter
  - Channel                  - Ranking

Decomposition principles:

  • Business Capability based decomposition
  • Data ownership: Each service owns its own database
  • Team autonomy: Two-pizza team rule (6-8 people)
  • Deployment independence: Deployable without changing other services
  • Loose coupling, high cohesion

6.2 Service Communication - Synchronous vs Asynchronous

Synchronous (Request-Response):

  [API Gateway] --> [Order Service] --> [Payment Service]
                                    --> [Inventory Service]

  Pros: Immediate response, simple implementation
  Cons: Tight coupling, cascading failures, latency accumulation

Asynchronous (Event-Driven):

  [Order Service] --> [Message Broker] --> [Payment Service]
                                       --> [Inventory Service]
                                       --> [Notification Service]

  Pros: Loose coupling, high resilience, scalability
  Cons: Eventual consistency, debugging difficulty

Kafka vs RabbitMQ:

Feature | Apache Kafka | RabbitMQ
Model | Pub/Sub + Log | Queue + Exchange
Ordering | Guaranteed within partition | Guaranteed within queue
Throughput | Very high (millions/sec) | High (tens of thousands/sec)
Retention | Retained for configured period | Deleted once acknowledged (classic queues)
Reprocessing | Possible via offset reset | Not built in for classic queues (use DLQ)
Use Case | Event streaming, logs | Task queues, RPC

6.3 API Gateway Pattern

Responsibilities:
  - Request routing
  - Authentication / Authorization
  - Rate limiting
  - Load balancing
  - Request/Response transformation
  - Circuit breaking
  - Monitoring / Logging

Key solutions:
  - Kong: Open source, plugin ecosystem
  - Envoy: High performance, L7 proxy
  - AWS API Gateway: Managed, serverless
  - NGINX: Lightweight, high performance
  - Traefik: Automatic service discovery
# Envoy routing configuration example
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
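            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                # The router filter is required for routes to be applied
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router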
                route_config:
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/v1/users"
                          route:
                            cluster: user_service
                        - match:
                            prefix: "/api/v1/orders"
                          route:
                            cluster: order_service
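
The routing rule in the Envoy example is first-match prefix routing: routes are checked in order and the first matching prefix decides the target cluster. A minimal sketch of that lookup follows, assuming a hypothetical `resolveCluster` helper and a route table that mirrors the two prefixes above. It illustrates the idea and is not how Envoy is implemented.

```typescript
// Hypothetical route table mirroring the two prefixes in the Envoy example.
interface PrefixRoute {
  prefix: string;
  cluster: string;
}

const gatewayRoutes: PrefixRoute[] = [
  { prefix: "/api/v1/users", cluster: "user_service" },
  { prefix: "/api/v1/orders", cluster: "order_service" },
];

// Return the cluster of the first route whose prefix matches, or null.
function resolveCluster(path: string, routes: PrefixRoute[]): string | null {
  for (const route of routes) {
    if (path.startsWith(route.prefix)) {
      return route.cluster;
    }
  }
  return null; // a real gateway would typically answer 404 here
}

const usersTarget = resolveCluster("/api/v1/users/123", gatewayRoutes);
const ordersTarget = resolveCluster("/api/v1/orders/42", gatewayRoutes);
const unknownTarget = resolveCluster("/api/v1/payments", gatewayRoutes);
```

Since the first match wins, more specific prefixes have to be listed before more general ones.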

6.4 Service Discovery

Client-side discovery:
  1. Service A --> Registry: Look up Service B address
  2. Service A --> Service B: Direct call
  Tools: Eureka, Consul

Server-side discovery:
  1. Service A --> Load Balancer: Request
  2. Load Balancer --> Registry: Look up address
  3. Load Balancer --> Service B: Forward
  Tools: K8s Service + DNS, AWS ALB

Kubernetes DNS-based:
  Internal DNS: service-name.namespace.svc.cluster.local
  Example: order-service.production.svc.cluster.local
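
Client-side discovery can be sketched as a registry lookup followed by a load-balancing choice made by the caller. The `ServiceRegistry` class and the addresses below are hypothetical and held in memory; a real client would query Eureka or Consul instead. The `clusterDnsName` helper only assembles the Kubernetes naming pattern shown above.

```typescript
// Hypothetical in-memory registry; real systems would query Eureka or Consul.
class ServiceRegistry {
  private readonly instances = new Map<string, string[]>();
  private readonly cursors = new Map<string, number>();

  register(service: string, address: string): void {
    const known = this.instances.get(service) ?? [];
    known.push(address);
    this.instances.set(service, known);
  }

  // Client-side load balancing: pick the next address round-robin.
  resolve(service: string): string {
    const known = this.instances.get(service) ?? [];
    const cursor = this.cursors.get(service) ?? 0;
    const address = known[cursor];
    if (address === undefined) {
      throw new Error(`no instances registered for ${service}`);
    }
    this.cursors.set(service, (cursor + 1) % known.length);
    return address;
  }
}

// Build the in-cluster DNS name used by Kubernetes Services.
function clusterDnsName(service: string, namespace: string): string {
  return `${service}.${namespace}.svc.cluster.local`;
}

const discoveryRegistry = new ServiceRegistry();
discoveryRegistry.register("order-service", "10.0.0.1:8080");
discoveryRegistry.register("order-service", "10.0.0.2:8080");

const resolvedAddresses = [
  discoveryRegistry.resolve("order-service"),
  discoveryRegistry.resolve("order-service"),
  discoveryRegistry.resolve("order-service"),
];
const orderServiceDns = clusterDnsName("order-service", "production");
```

With server-side discovery the same lookup happens inside the load balancer, so the calling service only needs one stable address.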

6.5 Circuit Breaker Pattern

State transitions:

  [Closed] --failure threshold exceeded--> [Open]
  [Open]   --timeout elapsed--> [Half-Open]
  [Half-Open] --success--> [Closed]
  [Half-Open] --failure--> [Open]
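
// Resilience4j configuration example
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import java.util.function.Supplier;
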
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)        // Open at 50% failure
    .waitDurationInOpenState(
        Duration.ofSeconds(30))       // Half-Open after 30s
    .slidingWindowSize(10)            // Based on last 10 calls
    .minimumNumberOfCalls(5)          // Evaluate after 5 calls
    .permittedNumberOfCallsInHalfOpenState(3)
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of(
    "paymentService", config);

Supplier<PaymentResponse> decoratedSupplier =
    CircuitBreaker.decorateSupplier(
        circuitBreaker,
        () -> paymentService.processPayment(request)
    );

Fallback strategies:

  • Return cached response: Use previous successful response
  • Return default value: Predefined default response
  • Call alternative service: Use backup service
  • Graceful degradation: Reduced functionality response
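
The state transitions and a fallback can be combined in a small state machine. The sketch below is a simplification and not Resilience4j's algorithm: it opens after a number of consecutive failures rather than on a failure rate over a sliding window. `SimpleCircuitBreaker` and the injectable `now` clock are hypothetical names introduced for illustration.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Simplified breaker: counts consecutive failures (an assumption of this sketch).
class SimpleCircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly openTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, openTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.openTimeoutMs = openTimeoutMs;
    this.now = now;
  }

  currentState(): BreakerState {
    // Open --timeout elapsed--> Half-Open
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.openTimeoutMs) {
      this.state = "HALF_OPEN";
    }
    return this.state;
  }

  call<T>(action: () => T, fallback: () => T): T {
    if (this.currentState() === "OPEN") {
      return fallback(); // fail fast without touching the dependency
    }
    try {
      const result = action();
      this.consecutiveFailures = 0;
      this.state = "CLOSED"; // Half-Open --success--> Closed
      return result;
    } catch {
      this.consecutiveFailures += 1;
      if (this.state === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "OPEN"; // threshold exceeded, or the trial call failed
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}

// Usage with a fake clock so the timeout can be stepped manually.
let fakeClockMs = 0;
const paymentBreaker = new SimpleCircuitBreaker(2, 1000, () => fakeClockMs);
const failingCall = (): string => { throw new Error("payment service down"); };
const cachedResponse = (): string => "cached";

const firstResult = paymentBreaker.call(failingCall, cachedResponse);
const secondResult = paymentBreaker.call(failingCall, cachedResponse);
const stateAfterFailures = paymentBreaker.currentState();
const shortCircuited = paymentBreaker.call(() => "fresh", cachedResponse);

fakeClockMs = 1000; // wait duration elapsed
const stateAfterTimeout = paymentBreaker.currentState();
const trialResult = paymentBreaker.call(() => "fresh", cachedResponse);
const stateAfterTrial = paymentBreaker.currentState();
```

The fallback here returns a cached value, which corresponds to the first strategy in the list. The other strategies would only change what the `fallback` function does.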

7. Service Mesh

7.1 What Is a Service Mesh?

Service mesh architecture:

  [Service A] <--> [Sidecar Proxy] <--> [Sidecar Proxy] <--> [Service B]
                          |                      |
                          v                      v
                   [Control Plane (Istio/Linkerd)]
                          |
                   [Config, Policy, Certificate Management]

Sidecar proxy responsibilities:
  - Traffic routing and load balancing
  - mTLS encryption
  - Circuit breaking
  - Retries and timeouts
  - Metrics collection
  - Distributed tracing

7.2 Istio vs Linkerd

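| Feature | Istio | Linkerd |
|---|---|---|
| Data Plane | Envoy | linkerd2-proxy (Rust) |
| Resource Usage | High | Low |
| Features | Very rich | Core features focused |
| Learning Curve | Steep | Gentle |
| Community | CNCF Graduated (started by Google, IBM, and Lyft) | CNCF Graduated (started by Buoyant) |
| Multi-cluster | Supported | Supported |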

7.3 Istio Traffic Management

# Canary deployment - traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s

# Circuit breaker configuration
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
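  host: order-service
  # Subsets referenced by the VirtualService routes; assumes pods carry a `version` label
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2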
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

8. Event-Driven Architecture

8.1 Event Sourcing

Traditional approach: Store only current state
  orders table: id=1, status=SHIPPED, total=50000

Event Sourcing: Store all state changes as events
  events table:
    1. OrderCreated     (total=50000)
    2. PaymentReceived  (amount=50000)
    3. OrderConfirmed   ()
    4. ItemShipped      (tracking=KR123456)

Pros:
  - Complete audit log
  - Time travel (reconstruct state at any point)
  - Replay events to create new views
  - Easier debugging

Cons:
  - Event schema evolution management
  - Event store size growth
  - Eventual consistency
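
Rebuilding state from events amounts to a fold over the event list. A minimal sketch using the four events from the example above follows. The intermediate status values (`CREATED`, `PAID`, `CONFIRMED`) and the function names are assumptions made for illustration; only the final `SHIPPED` state comes from the example.

```typescript
// Events from the example above; payload field names are assumptions.
type OrderEvent =
  | { type: "OrderCreated"; total: number }
  | { type: "PaymentReceived"; amount: number }
  | { type: "OrderConfirmed" }
  | { type: "ItemShipped"; tracking: string };

interface OrderState {
  status: "NEW" | "CREATED" | "PAID" | "CONFIRMED" | "SHIPPED";
  total: number;
  tracking: string | null;
}

const emptyOrder: OrderState = { status: "NEW", total: 0, tracking: null };

// Pure transition function: current state + one event -> next state.
function applyOrderEvent(state: OrderState, change: OrderEvent): OrderState {
  switch (change.type) {
    case "OrderCreated":
      return { ...state, status: "CREATED", total: change.total };
    case "PaymentReceived":
      return { ...state, status: "PAID" };
    case "OrderConfirmed":
      return { ...state, status: "CONFIRMED" };
    case "ItemShipped":
      return { ...state, status: "SHIPPED", tracking: change.tracking };
  }
}

// Replay the first `upTo` events; defaults to the full history.
function replayOrder(history: OrderEvent[], upTo: number = history.length): OrderState {
  return history.slice(0, upTo).reduce(applyOrderEvent, emptyOrder);
}

const orderHistory: OrderEvent[] = [
  { type: "OrderCreated", total: 50000 },
  { type: "PaymentReceived", amount: 50000 },
  { type: "OrderConfirmed" },
  { type: "ItemShipped", tracking: "KR123456" },
];

const latestOrder = replayOrder(orderHistory);      // current state
const orderAfterPayment = replayOrder(orderHistory, 2); // "time travel" to event 2
```

Replaying a prefix of the history is what makes the time-travel benefit possible, and folding the same events with a different transition function yields a new view.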

8.2 CQRS (Command Query Responsibility Segregation)

CQRS Architecture:

  [Command] --> [Write Model] --> [Event Store]
                                       |
                                  [Event Bus]
                                       |
                              [Read Model Projection]
                                       |
                                [Query] <-- [Read DB]

Command (Write):
  - Execute domain logic
  - Publish events
  - Normalized database

Query (Read):
  - Denormalized read-only views
  - Optimized for fast queries
  - Multiple storage options (ES, Redis, etc.)
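
The split between the two sides can be sketched with an in-memory write model that publishes events and a read model that only consumes them. Every name below (`OrderCommandSide`, `CustomerSpendView`, `OrderPlaced`) is hypothetical, and the listener list stands in for a real event bus.

```typescript
interface OrderPlaced {
  orderId: string;
  customerId: string;
  total: number;
}

interface CustomerSummary {
  orders: number;
  spent: number;
}

// Write side: enforces domain rules, stores events, publishes them.
class OrderCommandSide {
  readonly eventStore: OrderPlaced[] = [];
  private readonly listeners: Array<(placed: OrderPlaced) => void> = [];

  onEvent(listener: (placed: OrderPlaced) => void): void {
    this.listeners.push(listener);
  }

  placeOrder(orderId: string, customerId: string, total: number): void {
    if (total <= 0) {
      throw new Error("order total must be positive"); // domain rule
    }
    const placed: OrderPlaced = { orderId, customerId, total };
    this.eventStore.push(placed);
    for (const listener of this.listeners) {
      listener(placed); // stands in for the event bus
    }
  }
}

// Read side: denormalized per-customer view, updated only from events.
class CustomerSpendView {
  private readonly summaries = new Map<string, CustomerSummary>();

  project(placed: OrderPlaced): void {
    const current = this.summaries.get(placed.customerId) ?? { orders: 0, spent: 0 };
    this.summaries.set(placed.customerId, {
      orders: current.orders + 1,
      spent: current.spent + placed.total,
    });
  }

  summaryFor(customerId: string): CustomerSummary {
    return this.summaries.get(customerId) ?? { orders: 0, spent: 0 };
  }
}

const commandSide = new OrderCommandSide();
const spendView = new CustomerSpendView();
commandSide.onEvent((placed) => spendView.project(placed));

commandSide.placeOrder("order-1", "customer-1", 50000);
commandSide.placeOrder("order-2", "customer-1", 30000);
commandSide.placeOrder("order-3", "customer-2", 12000);

const customerOneSummary = spendView.summaryFor("customer-1");
```

Queries never touch the write model, so the read store can be swapped (for example to Redis or Elasticsearch) without changing command handling. In a real deployment the projection runs asynchronously, which is where eventual consistency comes from.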

8.3 Saga Pattern - Distributed Transactions

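In microservice environments, 2PC (Two-Phase Commit) scales poorly: participants hold locks until the coordinator decides, and a coordinator failure can block all of them. The Saga pattern instead manages a distributed transaction as a sequence of local transactions, each paired with a compensating transaction that undoes it if a later step fails.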

Choreography approach:

Order creation Saga:

  [Order Service]       [Payment Service]    [Inventory Service]
       |                      |                     |
  OrderCreated -------->      |                     |
       |              PaymentProcessed -------->    |
       |                      |              InventoryReserved
       |                      |                     |
       |              <--- (on success) --->        |
  OrderConfirmed              |                     |

Compensating transactions (on failure):
  InventoryReserveFailed --> PaymentRefunded --> OrderCancelled

Orchestration approach:

  [Order Saga Orchestrator]
       |
       |--> 1. Order Service: Create order
       |--> 2. Payment Service: Process payment
       |--> 3. Inventory Service: Reserve inventory
       |--> 4. Shipping Service: Create shipment
       |
  (On failure, compensate in reverse)
       |--> 3c. Inventory: Release stock
       |--> 2c. Payment: Issue refund
       |--> 1c. Order: Cancel order
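
The orchestration flow can be sketched as a loop that runs each step and, on failure, compensates the completed steps in reverse order. The sketch is synchronous and in-memory for brevity; `SagaStep` and `runSaga` are hypothetical names, and a real orchestrator would call the services asynchronously and persist its progress.

```typescript
interface SagaStep {
  label: string;
  action: () => void;     // local transaction; throws on failure
  compensate: () => void; // undoes a previously completed action
}

// Run steps in order; on failure, compensate completed steps in reverse.
function runSaga(steps: SagaStep[]): { completed: boolean; trail: string[] } {
  const trail: string[] = [];
  const finishedSteps: SagaStep[] = [];

  for (const step of steps) {
    try {
      step.action();
      finishedSteps.push(step);
      trail.push(`done:${step.label}`);
    } catch {
      trail.push(`failed:${step.label}`);
      for (const finished of finishedSteps.reverse()) {
        finished.compensate();
        trail.push(`compensated:${finished.label}`);
      }
      return { completed: false, trail };
    }
  }
  return { completed: true, trail };
}

const noop = (): void => {};
const orderSagaSteps: SagaStep[] = [
  { label: "order", action: noop, compensate: noop },
  { label: "payment", action: noop, compensate: noop },
  { label: "inventory", action: noop, compensate: noop },
  {
    label: "shipping",
    action: () => { throw new Error("carrier unavailable"); },
    compensate: noop,
  },
];

const sagaOutcome = runSaga(orderSagaSteps);
// sagaOutcome.trail:
// done:order, done:payment, done:inventory, failed:shipping,
// compensated:inventory, compensated:payment, compensated:order
```

The failed step itself is not compensated because its local transaction never committed; only steps 3, 2, and 1 are undone, matching 3c, 2c, and 1c above.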

9. Distributed Tracing

9.1 Correlation ID Pattern

Request flow:

  [Client]
    X-Request-ID: req-abc-123
       |
  [API Gateway]
    X-Request-ID: req-abc-123
    X-Correlation-ID: corr-xyz-789
       |
  [Order Service] ----------> [Payment Service]
    trace_id: corr-xyz-789     trace_id: corr-xyz-789
    span_id: span-001          span_id: span-002
    parent_span_id: null        parent_span_id: span-001
       |
       ----------> [Inventory Service]
                    trace_id: corr-xyz-789
                    span_id: span-003
                    parent_span_id: span-001
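
The propagation rule in the diagram is small: every span keeps the trace ID it received and records its caller's span ID as its parent. A minimal sketch follows, assuming hypothetical helpers (`correlationIdFor`, `startTrace`, `childOf`) and the IDs from the diagram. Real systems would normally rely on OpenTelemetry context propagation rather than hand-written code like this.

```typescript
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId: string | null;
}

// Reuse an incoming correlation ID, or generate one at the edge (gateway).
function correlationIdFor(
  incomingHeaders: Record<string, string>,
  generateId: () => string,
): string {
  return incomingHeaders["X-Correlation-ID"] ?? generateId();
}

// Root span: no parent.
function startTrace(correlationId: string, spanId: string): TraceContext {
  return { traceId: correlationId, spanId, parentSpanId: null };
}

// Child span: same trace, parent is the calling span.
function childOf(parentContext: TraceContext, spanId: string): TraceContext {
  return {
    traceId: parentContext.traceId,
    spanId,
    parentSpanId: parentContext.spanId,
  };
}

// The gateway receives only X-Request-ID, so it generates the correlation ID.
const gatewayCorrelationId = correlationIdFor(
  { "X-Request-ID": "req-abc-123" },
  () => "corr-xyz-789",
);

const orderSpan = startTrace(gatewayCorrelationId, "span-001");
const paymentSpan = childOf(orderSpan, "span-002");
const inventorySpan = childOf(orderSpan, "span-003");
```

Both downstream spans share `corr-xyz-789` and point at `span-001`, which is what lets a tracing backend reassemble the call tree.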

9.2 OpenTelemetry Integration

// OpenTelemetry SDK initialization
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});

sdk.start();

// Manual span creation
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

async function processOrder(orderId: string) {
  const span = tracer.startSpan('processOrder', {
    attributes: {
      'order.id': orderId,
      'service.name': 'order-service',
    },
  });

  try {
    span.addEvent('Validating order');
    await validateOrder(orderId);

    span.addEvent('Processing payment');
    await processPayment(orderId);

    span.setStatus({ code: SpanStatusCode.OK });
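  } catch (error) {
    // Attach the exception to the span before marking it as failed
    span.recordException(error instanceof Error ? error : String(error));
    span.setStatus({
      code: SpanStatusCode.ERROR,
      message: String(error),
    });
    throw error;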
  } finally {
    span.end();
  }
}

9.3 Three Pillars of Observability

1. Logs:
   - Structured logging (JSON)
   - Log levels (DEBUG, INFO, WARN, ERROR)
   - Include Correlation ID
   - ELK Stack / Loki

2. Metrics:
   - RED metrics: Rate, Errors, Duration
   - USE metrics: Utilization, Saturation, Errors
   - Prometheus + Grafana

3. Traces:
   - Distributed request tracking
   - Span-based visualization
   - Jaeger / Zipkin / Tempo
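
As a worked example of the RED metrics, the sketch below derives Rate, Errors, and Duration from a list of request samples. `RequestSample` and `redMetrics` are hypothetical names, and the mean duration is a simplification: Prometheus-style setups usually record histograms and report percentiles instead.

```typescript
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

interface RedSummary {
  ratePerSecond: number;  // Rate: requests per second
  errorRatio: number;     // Errors: fraction of failed requests
  meanDurationMs: number; // Duration: mean latency (simplification)
}

// Summarize the samples observed during a window of `windowSeconds`.
function redMetrics(samples: RequestSample[], windowSeconds: number): RedSummary {
  const total = samples.length;
  const failures = samples.filter((sample) => sample.failed).length;
  const durationSum = samples.reduce((sum, sample) => sum + sample.durationMs, 0);
  return {
    ratePerSecond: total / windowSeconds,
    errorRatio: total === 0 ? 0 : failures / total,
    meanDurationMs: total === 0 ? 0 : durationSum / total,
  };
}

// Four requests observed in a 2-second window, one of them failed.
const windowMetrics = redMetrics(
  [
    { durationMs: 100, failed: false },
    { durationMs: 200, failed: false },
    { durationMs: 300, failed: true },
    { durationMs: 400, failed: false },
  ],
  2,
);
// ratePerSecond = 4 / 2 = 2, errorRatio = 1 / 4 = 0.25, meanDurationMs = 1000 / 4 = 250
```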

10. Practical Example: E-commerce MSA Design

10.1 Overall Architecture

                    [CDN / CloudFront]
                           |
                    [API Gateway (Kong)]
                     /    |    |     \
                    /     |    |      \
  [User Service] [Product] [Order] [Payment]
       |          Service   Service   Service
       |            |         |         |
  [User DB]    [Product DB] [Order DB] [Payment DB]
  (PostgreSQL)  (PostgreSQL) (PostgreSQL) (PostgreSQL)
                    |         |
              [Search Service] [Notification]
                    |           Service
              [Elasticsearch]     |
                              [Kafka]
                                |
                         [Email/SMS/Push]

10.2 Technology Stack per Service

User Service:
  - Language: Go
  - DB: PostgreSQL
  - Cache: Redis (sessions)
  - Communication: REST + gRPC

Product Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Cache: Redis (product info)
  - Search: Elasticsearch
  - Communication: REST + gRPC

Order Service:
  - Language: Java (Spring Boot)
  - DB: PostgreSQL
  - Messaging: Kafka (order events)
  - Communication: gRPC + Kafka

Payment Service:
  - Language: Go
  - DB: PostgreSQL
  - External: Payment gateway integration
  - Communication: gRPC

Notification Service:
  - Language: Node.js
  - DB: MongoDB (templates)
  - Messaging: Kafka Consumer
  - External: SendGrid, Firebase

10.3 Order Processing Flow

1. Client --> API Gateway: POST /api/v1/orders
2. API Gateway --> Order Service: Create order request
3. Order Service --> Product Service (gRPC): Check inventory
4. Order Service --> Kafka: Publish OrderCreated event
5. Payment Service (Consumer): Process payment
6. Payment Service --> Kafka: Publish PaymentCompleted event
7. Order Service (Consumer): Update order status
8. Notification Service (Consumer): Send confirmation email
9. Product Service (Consumer): Decrement inventory

10.4 Failure Handling Strategies

Circuit Breaker:
  - Accept orders when Payment service is down
  - Queue payments in Kafka for later processing

Retry + Exponential Backoff:
  - 1st retry: 100ms
  - 2nd retry: 200ms
  - 3rd retry: 400ms
  - Maximum retries: 5
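
The delays above follow delay = base × 2^(retry − 1) with a 100ms base. A minimal sketch of that schedule follows; the `backoffDelayMs` helper is hypothetical, and the 4th and 5th values are extrapolated from the same doubling rule rather than stated in the list.

```typescript
// Delay before the given retry (1-based), or null once retries are exhausted.
function backoffDelayMs(
  retryNumber: number,
  baseMs: number = 100,
  maxRetries: number = 5,
): number | null {
  if (retryNumber < 1 || retryNumber > maxRetries) {
    return null; // give up: caller should surface the error or use a DLQ
  }
  return baseMs * 2 ** (retryNumber - 1);
}

const retrySchedule = [1, 2, 3, 4, 5].map((retry) => backoffDelayMs(retry));
// retrySchedule: 100, 200, 400, 800, 1600
const afterLastRetry = backoffDelayMs(6); // null: maximum of 5 retries reached
```

In practice a random jitter is often added to each delay so that many clients do not retry at the same instant; that is left out here to keep the arithmetic visible.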

Bulkhead Pattern:
  - Isolate thread pools per service
  - Prevent one service failure from cascading

Dead Letter Queue:
  - Move failed messages to DLQ
  - Manual analysis and reprocessing
  - Alert ops team via notifications

10.5 Deployment Strategy

# Kubernetes Deployment - Canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-canary
  labels:
    app: order-service
    version: v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
        - name: order-service
          image: order-service:2.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10

Conclusion

API design and microservices architecture are core competencies in modern software development. Here are the key takeaways:

  1. API design is a contract - Consistent naming, appropriate status codes, and clear error messages determine developer productivity
  2. Technology choice depends on context - REST, GraphQL, and gRPC each have distinct strengths, and mixing them is common
  3. Security from the start - Auth, mTLS, and rate limiting should be considered at design time, not as an afterthought
  4. Decompose services carefully - Based on DDD Bounded Contexts, neither too small nor too large
  5. Failures will happen - Build resilience with Circuit Breaker, Retry, Bulkhead, and Saga patterns
  6. Observability is operational capability - Integrate logs, metrics, and traces to quickly identify issues

The wisest approach is to start with a monolith and incrementally adopt microservices based on actual needs. Focus on delivering business value rather than striving for technical elegance.