플랫폼 아키텍처 의사결정 가이드: 모놀리스와 마이크로서비스 사이

아키텍처 결정은 왜 어려운가
아키텍처 스펙트럼: 이분법을 넘어서
의사결정 매트릭스
모듈러 모놀리스: 실무적으로 가장 간과되는 선택지
모듈 경계 위반 감지 자동화
Architecture Decision Record(ADR) 작성법
마이그레이션 전략: 점진적 분리
실전 트러블슈팅
참고 자료

아키텍처 결정은 왜 어려운가

"마이크로서비스로 가야 할까요, 모놀리스를 유지해야 할까요?"

이 질문은 기술적 질문처럼 보이지만, 실제로는 조직 구조, 팀 역량, 비즈니스 단계, 운영 성숙도를 모두 포함하는 경영 판단이다. 기술적으로 마이크로서비스가 "더 좋은" 아키텍처라고 해도, 3명으로 구성된 초기 스타트업이 12개의 서비스를 분리하면 개발 속도가 오히려 느려지고 운영 비용이 폭발한다.

Martin Fowler는 이를 "Microservice Premium"이라고 불렀다. 마이크로서비스는 일정 규모 이상에서는 생산성이 모놀리스를 앞서지만, 그 이하에서는 분산 시스템의 복잡도가 순수한 비용으로 작용한다.

이 글은 아키텍처 선택을 이분법이 아닌 스펙트럼으로 보고, 자신의 상황에 맞는 위치를 찾는 의사결정 프레임워크를 제시한다.

아키텍처 스펙트럼: 이분법을 넘어서

실무에서 선택지는 "모놀리스 vs 마이크로서비스"의 이분법이 아니다. 그 사이에 여러 단계가 있다.

[모놀리스] -----> [모듈러 모놀리스] -----> [매크로서비스] -----> [마이크로서비스]

  단일 배포 단위      모듈 경계 분리        2-5개 큰 서비스      도메인별 독립 서비스
  단일 DB             모듈별 스키마 분리     서비스별 DB           서비스별 DB
  팀 1개              팀 1-2개              팀 2-5개             팀 5+개
  운영 단순            운영 단순             운영 보통             운영 복잡

모놀리스: 모든 코드가 하나의 배포 단위. 내부 모듈 간 함수 호출. 하나의 DB 트랜잭션으로 일관성 보장.

모듈러 모놀리스: 배포는 하나지만, 내부적으로 모듈(패키지) 경계가 명확하게 분리된다. 모듈 간 통신은 명시적 인터페이스를 통해서만 가능하고, 직접적인 DB 테이블 참조를 금지한다. 나중에 분리가 필요하면 모듈 단위로 추출한다.

매크로서비스: 2-5개의 큰 서비스로 나눈다. "주문/결제 서비스"와 "사용자/인증 서비스"처럼 큰 도메인 단위로 분리. 마이크로서비스의 복잡도 없이 독립 배포의 이점을 얻는다.

마이크로서비스: 도메인 개념 하나가 하나의 서비스. 수십~수백 개의 서비스. 각 서비스가 독립 배포, 독립 DB, 독립 스케일링.

의사결정 매트릭스

각 축에 대해 자신의 상황을 점수화하면 적합한 위치가 보인다.

평가 축	모놀리스 (1점)	모듈러 모놀리스 (2점)	매크로서비스 (3점)	마이크로서비스 (4점)
팀 규모	1-5명	5-15명	15-40명	40명+
배포 빈도 요구	주 1-2회	일 1-2회	일 3-10회	일 10회+ 또는 서비스별 독립
도메인 변경 빈도	높음 (경계가 자주 변동)	중간	낮음 (경계가 안정적)	매우 낮음
운영 역량	서버 1-2대 운영	CI/CD 파이프라인 운영	컨테이너 오케스트레이션	service mesh, 분산 트레이싱
일관성 요건	강한 일관성 필수	모듈 간 eventual OK	eventual consistency	eventual consistency
확장성 요건	수직 확장으로 충분	수직 + 일부 수평	서비스별 수평 확장	서비스별 독립 확장 필수

총점 해석:

6-10점: 모놀리스 또는 모듈러 모놀리스
11-16점: 모듈러 모놀리스 또는 매크로서비스
17-24점: 매크로서비스 또는 마이크로서비스

모듈러 모놀리스: 실무적으로 가장 간과되는 선택지

모듈러 모놀리스는 "모놀리스의 운영 단순성"과 "마이크로서비스의 모듈 독립성"을 결합한다. 특히 팀이 5-15명이고, 도메인 경계가 아직 확정되지 않은 경우에 최적이다.

핵심 원칙은 세 가지다.

모듈 간 직접 DB 참조 금지: 다른 모듈의 테이블을 직접 JOIN하지 않는다.
명시적 인터페이스를 통한 통신: 모듈 간 호출은 public API(Python이면 facade 클래스, Java면 interface)를 통해서만 한다.
모듈별 스키마 분리: 같은 DB 서버 내에서 스키마(schema)를 분리하여 물리적 분리 없이 논리적 분리를 확보한다.

"""
모듈러 모놀리스의 모듈 간 통신 예시.

order 모듈이 payment 모듈에 접근할 때,
payment의 내부 구현이 아닌 public facade를 통한다.
"""

# === payment/facade.py (payment 모듈의 공개 인터페이스) ===
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaymentResult:
    reservation_id: str
    status: str
    amount: int
    currency: str


class PaymentFacade:
    """Payment 모듈의 공개 인터페이스.

    다른 모듈은 이 클래스를 통해서만 payment 기능에 접근한다.
    내부 구현(repository, service, domain model)에 직접 접근 금지.
    """

    def __init__(self, payment_service):
        self._service = payment_service

    def reserve_payment(
        self,
        customer_id: str,
        amount: int,
        currency: str = "KRW",
        idempotency_key: Optional[str] = None,
    ) -> PaymentResult:
        """결제를 예약한다. 실제 청구는 confirm 시점에 발생."""
        reservation = self._service.create_reservation(
            customer_id=customer_id,
            amount=amount,
            currency=currency,
            idempotency_key=idempotency_key,
        )
        return PaymentResult(
            reservation_id=reservation.id,
            status=reservation.status.value,
            amount=reservation.amount,
            currency=reservation.currency,
        )

    def confirm_payment(self, reservation_id: str) -> PaymentResult:
        """예약된 결제를 확정한다."""
        result = self._service.confirm(reservation_id)
        return PaymentResult(
            reservation_id=result.id,
            status=result.status.value,
            amount=result.amount,
            currency=result.currency,
        )

    def cancel_payment(self, reservation_id: str) -> None:
        """예약된 결제를 취소한다."""
        self._service.cancel(reservation_id)


# === order/service.py (order 모듈에서 payment facade 사용) ===

class OrderService:
    """주문 서비스.

    payment 모듈의 내부 구현에 의존하지 않고,
    PaymentFacade를 통해서만 결제 기능에 접근한다.
    """

    def __init__(self, order_repo, payment_facade: PaymentFacade):
        self.order_repo = order_repo
        self.payment = payment_facade

    def create_order(self, customer_id: str, items: list, total: int) -> dict:
        # 1. 주문 생성
        order = self.order_repo.create(
            customer_id=customer_id,
            items=items,
            total=total,
        )

        # 2. 결제 예약 (payment facade 통해)
        try:
            payment_result = self.payment.reserve_payment(
                customer_id=customer_id,
                amount=total,
                idempotency_key=f"order-{order.id}",
            )
            order.payment_reservation_id = payment_result.reservation_id
            self.order_repo.save(order)
        except Exception:
            order.status = "payment_failed"
            self.order_repo.save(order)
            raise

        return {"order_id": order.id, "status": order.status}

모듈 경계 위반 감지 자동화

모듈러 모놀리스의 가장 큰 위험은 시간이 지나면서 모듈 경계가 무너지는 것이다. "급하니까 직접 import하자"가 반복되면 다시 빅볼오브머드(big ball of mud)로 돌아간다. 이를 CI에서 자동으로 감지해야 한다.

"""
모듈 경계 위반 감지 스크립트.

각 모듈은 다른 모듈의 facade만 import할 수 있다.
내부 패키지(service, repository, domain)를 직접 import하면 위반이다.
"""
import ast
import sys
from pathlib import Path
from dataclasses import dataclass


@dataclass
class Violation:
    file: str
    line: int
    importing_module: str
    imported_module: str
    imported_path: str
    reason: str


# 모듈 목록과 허용된 공개 패키지
MODULE_CONFIG = {
    "order": {"public": ["order.facade"]},
    "payment": {"public": ["payment.facade"]},
    "inventory": {"public": ["inventory.facade"]},
    "shipping": {"public": ["shipping.facade"]},
    "user": {"public": ["user.facade"]},
}


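def get_module_name(file_path: str) -> str | None:
    """Returns the module that owns a file, or None if it is outside all modules.

    Assumes file_path is relative to the source root (as in the CI invocation
    below), so unrelated parent directories are not mistaken for modules.
    """
    # Walk directories from the outside in and take the first one that names a
    # module, so payment/order/x.py belongs to "payment", not "order".
    for part in Path(file_path).parts[:-1]:
        if part in MODULE_CONFIG:
            return part
    return None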


def check_file(file_path: str) -> list[Violation]:
    """단일 파일의 import를 검사하여 위반 목록을 반환한다."""
    violations = []
    source_module = get_module_name(file_path)
    if source_module is None:
        return violations

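    with open(file_path, encoding="utf-8") as f:
        try:
            tree = ast.parse(f.read(), filename=file_path)
        except SyntaxError:
            # Unparseable files are skipped rather than failing the whole run
            return violations

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                _check_import(file_path, source_module, alias.name, node.lineno, violations)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Check "module.name" for each imported name so that
            # "from payment import service" is caught as well as
            # "from payment.service import X"
            for alias in node.names:
                _check_import(
                    file_path, source_module,
                    f"{node.module}.{alias.name}", node.lineno, violations,
                )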

    return violations


def _check_import(
    file_path: str,
    source_module: str,
    imported_path: str,
    line: int,
    violations: list[Violation],
):
    """개별 import 구문이 모듈 경계를 위반하는지 검사한다."""
    for target_module, config in MODULE_CONFIG.items():
        if target_module == source_module:
            continue  # 같은 모듈 내부 import는 OK

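        if imported_path.startswith(target_module + "."):
            # Importing from another module -> allowed only if the path is a
            # public package itself or something inside it. A bare prefix match
            # would also accept e.g. "payment.facade_internal".
            is_public = any(
                imported_path == pub or imported_path.startswith(pub + ".")
                for pub in config["public"]
            )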
            if not is_public:
                violations.append(Violation(
                    file=file_path,
                    line=line,
                    importing_module=source_module,
                    imported_module=target_module,
                    imported_path=imported_path,
                    reason=f"Direct import of '{target_module}' internals. "
                           f"Use {config['public']} instead.",
                ))


def main():
    """프로젝트 전체 파일을 검사하고 위반을 보고한다."""
    project_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    all_violations = []

    for py_file in project_root.rglob("*.py"):
        violations = check_file(str(py_file))
        all_violations.extend(violations)

    if all_violations:
        print(f"\n{'='*60}")
        print(f"Module boundary violations found: {len(all_violations)}")
        print(f"{'='*60}\n")
        for v in all_violations:
            print(f"  {v.file}:{v.line}")
            print(f"    {v.importing_module} -> {v.imported_module} ({v.imported_path})")
            print(f"    {v.reason}\n")
        sys.exit(1)
    else:
        print("No module boundary violations found.")
        sys.exit(0)


if __name__ == "__main__":
    main()

이 스크립트를 CI에 추가하면 모듈 경계 위반이 merge 전에 차단된다.

# .github/workflows/boundary-check.yml
name: Module Boundary Check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: python scripts/check_module_boundaries.py src/

Architecture Decision Record(ADR) 작성법

아키텍처 결정은 반드시 기록해야 한다. 6개월 뒤에 "왜 이렇게 했지?"를 물었을 때 답할 수 있어야 한다. Michael Nygard가 제안한 ADR 형식을 기반으로 팀에 맞게 조정한다.

# ADR-007: 결제 모듈을 독립 서비스로 분리

## 상태: 승인됨 (2026-02-15)

## 맥락

현재 결제 모듈은 모놀리스 내부에서 facade를 통해 호출된다.
PCI-DSS 인증 범위를 축소하기 위해 결제 처리를 별도 서비스로 분리해야 한다.
또한 결제 서비스는 주문 서비스와 독립적으로 스케일링해야 한다
(Black Friday 기간 결제 트래픽이 10배 증가).

## 결정

결제 모듈을 독립 gRPC 서비스로 분리한다.

- 통신: gRPC (내부), REST (외부 PG 연동)
- DB: 별도 PostgreSQL 인스턴스
- 일관성: Saga 패턴 (orchestration 방식)
- 배포: 독립 Kubernetes deployment

## 결과

### 긍정적

- PCI-DSS 인증 범위가 결제 서비스로 한정됨
- 결제 서비스 독립 스케일링 가능
- 결제 로직 변경 시 주문 서비스 재배포 불필요

### 부정적

- 서비스 간 통신 지연 추가 (~5ms)
- Saga 보상 트랜잭션 구현 및 운영 필요
- 운영 복잡도 증가: 별도 모니터링, 배포 파이프라인

### 수치 근거

- 현재 결제 p99 지연: 150ms -> 분리 후 예상: 155ms (수용 가능)
- 현재 PCI 인증 범위: 전체 인프라 -> 분리 후: 결제 서비스만
- 인프라 비용 증가: 월 $200 (별도 DB + 서비스 인스턴스)

## 대안 검토

1. 모놀리스 유지 + PCI 범위 전체 인증: 비용과 감사 부담이 큼
2. 서버리스(Lambda)로 결제 분리: cold start 이슈로 p99 지연 불확실

마이그레이션 전략: 점진적 분리

모놀리스에서 마이크로서비스로의 전환은 big bang이 아니라 점진적으로 수행해야 한다. Strangler Fig 패턴을 기반으로 한 단계별 전략이다.

Phase 1: 모듈러 모놀리스화 (1-3개월)
  - 코드 내 모듈 경계 정립
  - facade 인터페이스 도입
  - 경계 위반 감지 CI 구축
  - DB 스키마를 모듈별로 논리 분리

Phase 2: 첫 번째 서비스 추출 (2-4개월)
  - 가장 독립적인 모듈 1개를 서비스로 분리
  - 통신 프로토콜 결정 (gRPC/REST/이벤트)
  - Saga 또는 이벤트 기반 일관성 구현
  - A/B 라우팅으로 신규 서비스와 기존 모듈 병행 운영

Phase 3: 안정화 및 확장 (3-6개월)
  - 첫 번째 서비스의 운영 안정성 확인
  - 모니터링, 알림, 런북 정비
  - 다음 분리 대상 선정 및 반복

Phase 4: 플랫폼 성숙 (지속적)
  - service mesh / API gateway 도입
  - 분산 트레이싱 체계 구축
  - 서비스 카탈로그 및 ownership 관리

각 phase에서의 핵심 게이트 조건은 다음과 같다.

"""
마이그레이션 단계 진행 가능 여부를 판단하는 게이트 체크.

각 phase 완료 후 다음 phase로 진행하기 전에
이 체크를 통과해야 한다.
"""
from dataclasses import dataclass


@dataclass
class PhaseGateCheck:
    name: str
    passed: bool
    detail: str


def check_phase1_gate() -> list[PhaseGateCheck]:
    """Phase 1 완료 게이트: 모듈러 모놀리스화가 충분히 되었는가."""
    return [
        PhaseGateCheck(
            name="module_boundaries_defined",
            passed=True,  # 실제로는 코드 분석 결과를 확인
            detail="All modules have facade interfaces",
        ),
        PhaseGateCheck(
            name="no_boundary_violations",
            passed=True,  # CI에서 위반 0건 확인
            detail="0 boundary violations in last 30 days",
        ),
        PhaseGateCheck(
            name="schema_separated",
            passed=True,  # DB 스키마 분리 확인
            detail="Each module uses its own schema prefix",
        ),
        PhaseGateCheck(
            name="integration_tests_exist",
            passed=True,
            detail="Module integration tests cover 85%+ of facade methods",
        ),
    ]


def check_phase2_gate() -> list[PhaseGateCheck]:
    """Phase 2 완료 게이트: 첫 서비스 추출이 안정적인가."""
    return [
        PhaseGateCheck(
            name="service_p99_latency",
            passed=True,   # 실제 메트릭 기반 판단
            detail="p99 latency < 200ms for 14 consecutive days",
        ),
        PhaseGateCheck(
            name="error_rate",
            passed=True,
            detail="Error rate < 0.1% for 14 consecutive days",
        ),
        PhaseGateCheck(
            name="saga_compensation_tested",
            passed=True,
            detail="Compensation scenarios tested in staging 3+ times",
        ),
        PhaseGateCheck(
            name="runbook_documented",
            passed=True,
            detail="Incident runbook reviewed by on-call team",
        ),
        PhaseGateCheck(
            name="rollback_verified",
            passed=True,
            detail="Rollback to monolith path verified in staging",
        ),
    ]


def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
    """게이트 통과 여부를 판단한다."""
    failed = [c for c in checks if not c.passed]
    if failed:
        details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
        return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
    return True, "Gate PASSED - all checks passed"

실전 트러블슈팅

분산 트레이싱 없이 마이크로서비스를 시작했다

증상: 사용자가 "주문이 안 된다"고 보고했는데, 어떤 서비스에서 문제가 발생했는지 알 수 없다. 각 서비스 로그를 하나하나 뒤져야 한다.

대응: OpenTelemetry를 도입한다. 각 서비스에서 trace context를 전파하고, Jaeger나 Grafana Tempo에서 전체 요청 흐름을 시각화한다. 이미 서비스가 운영 중이라면 sidecar 방식으로 점진적으로 적용한다.

서비스 경계를 잘못 잡아서 서비스 간 호출이 폭발

증상: 하나의 사용자 요청이 내부적으로 15개 서비스 간 30번의 호출을 발생시킨다. latency가 누적되고 장애 전파 범위가 넓다.

원인: 도메인 경계가 아닌 기술 레이어(프론트엔드 서비스, 데이터 서비스, 로깅 서비스)로 분리했거나, 너무 잘게 쪼갰다.

대응: (1) 서비스 간 호출 패턴을 분석하여 과도한 coupling이 있는 서비스를 합친다. (2) 분리 기준을 "기술 레이어"에서 "비즈니스 도메인"으로 재정립한다. (3) BFF(Backend For Frontend) 패턴으로 프론트엔드 요청을 집약한다.

공유 DB에서 벗어나지 못한다

증상: 서비스를 분리했지만 같은 DB를 공유한다. 한 서비스의 스키마 변경이 다른 서비스를 깨뜨린다. 사실상 "분산 모놀리스"다.

대응: (1) 먼저 DB 뷰(view)를 통해 다른 서비스의 테이블 접근을 간접화한다. (2) 이벤트 기반으로 데이터 동기화를 도입하여 직접 DB 조회를 제거한다. (3) 최종적으로 서비스별 DB를 물리적으로 분리한다. 이 과정은 반드시 점진적으로, 한 테이블씩 진행한다.

참고 자료

Martin Fowler, "MonolithFirst" -- martinfowler.com/bliki/MonolithFirst
Sam Newman, "Building Microservices", O'Reilly, 2nd Edition, 2021
Michael Nygard, "Documenting Architecture Decisions" -- cognitect.com/blog/2011/11/15/documenting-architecture-decisions
Chris Richardson, "Microservices Patterns", Manning, 2018
Martin Fowler, "StranglerFigApplication" -- martinfowler.com/bliki/StranglerFigApplication
Google Cloud Architecture Framework -- cloud.google.com/architecture/framework
OpenTelemetry Documentation -- opentelemetry.io/docs

퀴즈

"Microservice Premium"이란 무엇인가? 정답: ||마이크로서비스를 도입하면 분산 시스템의 복잡도(네트워크 통신, 서비스 디스커버리, 분산 트랜잭션, 모니터링 등)가 추가 비용으로 작용한다는 개념. 일정 규모 이상에서만 이 비용을 상쇄하는 이점이 생긴다.||
모듈러 모놀리스의 세 가지 핵심 원칙은? 정답: ||(1) 모듈 간 직접 DB 참조 금지, (2) 명시적 facade 인터페이스를 통한 모듈 간 통신, (3) 모듈별 DB 스키마 논리적 분리. 이를 통해 모놀리스의 운영 단순성과 마이크로서비스의 모듈 독립성을 결합한다.||
아키텍처 의사결정에서 팀 규모가 중요한 이유는? 정답: ||마이크로서비스는 서비스 소유권, 독립 배포, 운영 모니터링 등 서비스당 일정 수준의 인력이 필요하다. 소규모 팀(5명 이하)이 많은 서비스를 분리하면 한 사람이 여러 서비스를 소유하게 되어 운영 부담이 개발 이점을 상회한다.||
ADR(Architecture Decision Record)에 반드시 포함해야 하는 항목은? 정답: ||맥락(왜 이 결정이 필요한가), 결정(무엇을 선택했는가), 결과(긍정적/부정적 영향), 수치 근거(지연, 비용, 범위 등의 정량적 데이터), 검토한 대안(다른 선택지와 기각 사유).||
Strangler Fig 패턴의 핵심 전략은? 정답: ||기존 시스템을 한 번에 교체하는 것이 아니라, 새로운 시스템을 점진적으로 구축하면서 기존 시스템의 기능을 하나씩 새 시스템으로 라우팅한다. 모든 기능이 이전되면 기존 시스템을 제거한다.||
모듈 경계 위반을 CI에서 자동 감지하는 방법은? 정답: ||Python AST를 분석하여 각 모듈의 import를 검사한다. 다른 모듈의 내부 패키지(service, repository, domain)를 직접 import하면 위반으로 판별하고, 공개 facade만 허용한다. CI에서 이 스크립트를 PR마다 실행하여 위반 시 merge를 차단한다.||
서비스를 분리했지만 DB를 공유하는 "분산 모놀리스"를 해소하는 단계는? 정답: ||(1) DB 뷰를 통해 다른 서비스의 테이블 접근을 간접화, (2) 이벤트 기반 데이터 동기화 도입으로 직접 DB 조회 제거, (3) 서비스별 DB 물리적 분리. 반드시 한 테이블씩 점진적으로 진행한다.||
Phase 2 게이트에서 "rollback 경로 검증"이 필요한 이유는? 정답: ||새로 분리한 서비스에 문제가 발생했을 때, 기존 모놀리스 경로로 즉시 되돌릴 수 있어야 한다. 이 rollback 경로가 실제로 동작하는지 staging에서 미리 검증하지 않으면, 장애 시 복구 시간이 길어진다.||

Platform Architecture Decision Guide: Between Monolith and Microservices

Why Architecture Decisions Are Difficult
The Architecture Spectrum: Beyond the Binary
Decision Matrix
Modular Monolith: The Most Overlooked Choice in Practice
Automating Module Boundary Violation Detection
How to Write Architecture Decision Records (ADR)
Migration Strategy: Incremental Separation
Practical Troubleshooting
References

Why Architecture Decisions Are Difficult

"Should we go with microservices or maintain the monolith?"

This looks like a technical question, but it is really a management decision that spans organizational structure, team capabilities, business stage, and operational maturity. Even if microservices were the "better" architecture on technical grounds, a 3-person early-stage startup that splits its system into 12 services will slow development down and inflate operational costs.

Martin Fowler called this the "Microservice Premium." Microservices repay their overhead only once a system is large and complex enough; below that threshold, the complexity of running a distributed system is pure cost.

This article views architecture selection not as a binary choice but as a spectrum, and presents a decision-making framework for finding the right position for your situation.

The Architecture Spectrum: Beyond the Binary

In practice, the choices are not a binary "monolith vs microservices." There are several stages in between.

[Monolith] -----> [Modular Monolith] -----> [Macroservices] -----> [Microservices]

  Single deployment    Module boundary       2-5 large services    Domain-specific
  unit                 separation                                  independent services
  Single DB            Per-module schema     Per-service DB        Per-service DB
  1 team               1-2 teams             2-5 teams             5+ teams
  Simple ops           Simple ops            Moderate ops          Complex ops

Monolith: All code in a single deployment unit. Internal module communication via function calls. Consistency guaranteed by a single DB transaction.

Modular Monolith: Deployment is a single unit, but internally, module (package) boundaries are clearly separated. Communication between modules is only possible through explicit interfaces, and direct DB table references are prohibited. If separation is needed later, modules can be extracted as units.

Macroservices: Split into 2-5 large services. Separated by large domain units like "Order/Payment Service" and "User/Auth Service." Gains the benefits of independent deployment without the complexity of microservices.

Microservices: One domain concept per service. Tens to hundreds of services. Each service has independent deployment, independent DB, and independent scaling.

Decision Matrix

Score your situation on each of the six axes (1 to 4 points per axis, so totals range from 6 to 24); the total indicates where on the spectrum you fit.

Evaluation Axis	Monolith (1 pt)	Modular Monolith (2 pts)	Macroservices (3 pts)	Microservices (4 pts)
Team Size	1-5	5-15	15-40	40+
Deployment Frequency	1-2/week	1-2/day	3-10/day	10+/day or per-service independent
Domain Change Frequency	High (boundaries shift often)	Medium	Low (boundaries stable)	Very Low
Ops Capability	Operates 1-2 servers	Operates a CI/CD pipeline	Container orchestration	Service mesh, distributed tracing
Consistency Requirements	Strong consistency required	Inter-module eventual OK	Eventual consistency	Eventual consistency
Scalability Requirements	Vertical scaling sufficient	Vertical + some horizontal	Per-service horizontal	Per-service independent scaling required

Total Score Interpretation:

6-10 points: Monolith or Modular Monolith
11-16 points: Modular Monolith or Macroservices
17-24 points: Macroservices or Microservices
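
The scoring above is simple arithmetic, and a small helper can keep it consistent across teams. The following is a minimal sketch, assuming each axis is scored as an integer from 1 to 4; the axis keys and the `interpret` function name are illustrative and not part of any existing tool.

```python
# Hypothetical axis keys, one per row of the decision matrix.
AXES = (
    "team_size",
    "deployment_frequency",
    "domain_change_frequency",
    "ops_capability",
    "consistency_requirements",
    "scalability_requirements",
)

# Score bands copied from the interpretation above: (low, high, recommendation).
BANDS = (
    (6, 10, "Monolith or Modular Monolith"),
    (11, 16, "Modular Monolith or Macroservices"),
    (17, 24, "Macroservices or Microservices"),
)


def interpret(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the six axis scores (1-4 each) and map the total to a band."""
    missing = set(AXES) - scores.keys()
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    for axis in AXES:
        if not 1 <= scores[axis] <= 4:
            raise ValueError(f"{axis} must be between 1 and 4")
    total = sum(scores[axis] for axis in AXES)
    for low, high, label in BANDS:
        if low <= total <= high:
            return total, label
    raise AssertionError("unreachable: total is always between 6 and 24")


# Example: 8 engineers, daily deploys, shifting boundaries, CI/CD in place,
# strong consistency needed, mostly vertical scaling.
example = {
    "team_size": 2,
    "deployment_frequency": 2,
    "domain_change_frequency": 1,
    "ops_capability": 2,
    "consistency_requirements": 1,
    "scalability_requirements": 2,
}
print(interpret(example))
```

A total is only a starting point for discussion: two teams with the same total can differ sharply on a single axis, such as consistency requirements, that should dominate the decision.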

Modular Monolith: The Most Overlooked Choice in Practice

The Modular Monolith combines the operational simplicity of a monolith with the module independence of microservices. It fits best when the team has 5-15 people and domain boundaries are not yet settled.

The three core principles are:

No direct DB references between modules: Do not directly JOIN other module tables.
Communication through explicit interfaces: Inter-module calls are only through public APIs (facade classes in Python, interfaces in Java).
Per-module schema separation: Separate schemas within the same DB server to achieve logical separation without physical separation.

"""
Example of inter-module communication in a Modular Monolith.

When the order module accesses the payment module,
it goes through payment's public facade, not its internal implementation.
"""

# === payment/facade.py (payment module's public interface) ===
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaymentResult:
    reservation_id: str
    status: str
    amount: int
    currency: str


class PaymentFacade:
    """Payment module's public interface.

    Other modules access payment functionality only through this class.
    Direct access to internal implementation (repository, service, domain model) is prohibited.
    """

    def __init__(self, payment_service):
        self._service = payment_service

    def reserve_payment(
        self,
        customer_id: str,
        amount: int,
        currency: str = "KRW",
        idempotency_key: Optional[str] = None,
    ) -> PaymentResult:
        """Reserves a payment. Actual charge occurs at confirm time."""
        reservation = self._service.create_reservation(
            customer_id=customer_id,
            amount=amount,
            currency=currency,
            idempotency_key=idempotency_key,
        )
        return PaymentResult(
            reservation_id=reservation.id,
            status=reservation.status.value,
            amount=reservation.amount,
            currency=reservation.currency,
        )

    def confirm_payment(self, reservation_id: str) -> PaymentResult:
        """Confirms a reserved payment."""
        result = self._service.confirm(reservation_id)
        return PaymentResult(
            reservation_id=result.id,
            status=result.status.value,
            amount=result.amount,
            currency=result.currency,
        )

    def cancel_payment(self, reservation_id: str) -> None:
        """Cancels a reserved payment."""
        self._service.cancel(reservation_id)


# === order/service.py (using payment facade from order module) ===

class OrderService:
    """Order service.

    Does not depend on payment module's internal implementation,
    accesses payment functionality only through PaymentFacade.
    """

    def __init__(self, order_repo, payment_facade: PaymentFacade):
        self.order_repo = order_repo
        self.payment = payment_facade

    def create_order(self, customer_id: str, items: list, total: int) -> dict:
        # 1. Create order
        order = self.order_repo.create(
            customer_id=customer_id,
            items=items,
            total=total,
        )

        # 2. Reserve payment (through payment facade)
        try:
            payment_result = self.payment.reserve_payment(
                customer_id=customer_id,
                amount=total,
                idempotency_key=f"order-{order.id}",
            )
            order.payment_reservation_id = payment_result.reservation_id
            self.order_repo.save(order)
        except Exception:
            order.status = "payment_failed"
            self.order_repo.save(order)
            raise

        return {"order_id": order.id, "status": order.status}
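
One benefit of calling payment only through its public interface is that the order module can be tested without the payment internals. The sketch below illustrates this with `typing.Protocol`; the names `PaymentPort`, `FakePayment`, and `reserve_for_order` are hypothetical and introduced here only for illustration, and the facade signature is copied from the example above.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class PaymentResult:
    reservation_id: str
    status: str
    amount: int
    currency: str


class PaymentPort(Protocol):
    """The slice of the payment facade that the order module relies on."""

    def reserve_payment(
        self,
        customer_id: str,
        amount: int,
        currency: str = "KRW",
        idempotency_key: Optional[str] = None,
    ) -> PaymentResult: ...


class FakePayment:
    """In-memory stand-in that satisfies PaymentPort structurally."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.seen_keys: list[Optional[str]] = []

    def reserve_payment(self, customer_id, amount, currency="KRW", idempotency_key=None):
        self.seen_keys.append(idempotency_key)
        if self.fail:
            raise RuntimeError("card declined")
        return PaymentResult("resv-1", "reserved", amount, currency)


def reserve_for_order(payment: PaymentPort, order_id: str, customer_id: str, total: int) -> str:
    """Order-side logic depends only on the port, never on payment internals."""
    try:
        result = payment.reserve_payment(
            customer_id, total, idempotency_key=f"order-{order_id}"
        )
    except Exception:
        return "payment_failed"
    return result.status


fake = FakePayment()
print(reserve_for_order(fake, "42", "c1", 1000))
print(reserve_for_order(FakePayment(fail=True), "43", "c1", 1000))
```

Because the dependency is structural, the real facade and the fake can be swapped without either module importing the other's internals.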

Automating Module Boundary Violation Detection

The biggest risk of a Modular Monolith is module boundaries eroding over time. If "let's just import directly since it's urgent" is repeated, you return to a big ball of mud. This must be automatically detected in CI.

"""
Module boundary violation detection script.

Each module can only import the facade of other modules.
Directly importing internal packages (service, repository, domain) is a violation.
"""
import ast
import sys
from pathlib import Path
from dataclasses import dataclass


@dataclass
class Violation:
    file: str
    line: int
    importing_module: str
    imported_module: str
    imported_path: str
    reason: str


# Module list and allowed public packages
MODULE_CONFIG = {
    "order": {"public": ["order.facade"]},
    "payment": {"public": ["payment.facade"]},
    "inventory": {"public": ["inventory.facade"]},
    "shipping": {"public": ["shipping.facade"]},
    "user": {"public": ["user.facade"]},
}


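def get_module_name(file_path: str) -> str | None:
    """Returns the module that owns a file, or None if it is outside all modules.

    Assumes file_path is relative to the source root (as in the CI invocation
    below), so unrelated parent directories are not mistaken for modules.
    """
    # Walk directories from the outside in and take the first one that names a
    # module, so payment/order/x.py belongs to "payment", not "order".
    for part in Path(file_path).parts[:-1]:
        if part in MODULE_CONFIG:
            return part
    return None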


def check_file(file_path: str) -> list[Violation]:
    """Inspects imports in a single file and returns a list of violations."""
    violations = []
    source_module = get_module_name(file_path)
    if source_module is None:
        return violations

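    with open(file_path, encoding="utf-8") as f:
        try:
            tree = ast.parse(f.read(), filename=file_path)
        except SyntaxError:
            # Unparseable files are skipped rather than failing the whole run
            return violations

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                _check_import(file_path, source_module, alias.name, node.lineno, violations)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Check "module.name" for each imported name so that
            # "from payment import service" is caught as well as
            # "from payment.service import X"
            for alias in node.names:
                _check_import(
                    file_path, source_module,
                    f"{node.module}.{alias.name}", node.lineno, violations,
                )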

    return violations


def _check_import(
    file_path: str,
    source_module: str,
    imported_path: str,
    line: int,
    violations: list[Violation],
):
    """Checks whether an individual import statement violates module boundaries."""
    for target_module, config in MODULE_CONFIG.items():
        if target_module == source_module:
            continue  # Internal imports within the same module are OK

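        if imported_path.startswith(target_module + "."):
            # Importing from another module -> allowed only if the path is a
            # public package itself or something inside it. A bare prefix match
            # would also accept e.g. "payment.facade_internal".
            is_public = any(
                imported_path == pub or imported_path.startswith(pub + ".")
                for pub in config["public"]
            )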
            if not is_public:
                violations.append(Violation(
                    file=file_path,
                    line=line,
                    importing_module=source_module,
                    imported_module=target_module,
                    imported_path=imported_path,
                    reason=f"Direct import of '{target_module}' internals. "
                           f"Use {config['public']} instead.",
                ))


def main():
    """Inspects all files in the project and reports violations."""
    project_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    all_violations = []

    for py_file in project_root.rglob("*.py"):
        violations = check_file(str(py_file))
        all_violations.extend(violations)

    if all_violations:
        print(f"\n{'='*60}")
        print(f"Module boundary violations found: {len(all_violations)}")
        print(f"{'='*60}\n")
        for v in all_violations:
            print(f"  {v.file}:{v.line}")
            print(f"    {v.importing_module} -> {v.imported_module} ({v.imported_path})")
            print(f"    {v.reason}\n")
        sys.exit(1)
    else:
        print("No module boundary violations found.")
        sys.exit(0)


if __name__ == "__main__":
    main()

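Adding this script to CI fails the pull request check whenever a boundary violation is introduced. If the check is marked as required in the repository's branch protection settings, violations are blocked before merge.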

# .github/workflows/boundary-check.yml
name: Module Boundary Check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: python scripts/check_module_boundaries.py src/

How to Write Architecture Decision Records (ADR)

Architecture decisions must be documented, so that six months later you can still answer "why did we do it this way?" Start from the ADR format proposed by Michael Nygard and adapt it to your team.

# ADR-007: Separating Payment Module into an Independent Service

## Status: Approved (2026-02-15)

## Context

Currently, the payment module is called through a facade inside the monolith.
To reduce PCI-DSS certification scope, payment processing needs to be separated into a standalone service.
Additionally, the payment service needs to scale independently from the order service
(payment traffic increases 10x during Black Friday).

## Decision

Separate the payment module into an independent gRPC service.

- Communication: gRPC (internal), REST (integration with the external payment gateway, PG)
- DB: Separate PostgreSQL instance
- Consistency: Saga pattern (orchestration approach)
- Deployment: Independent Kubernetes deployment

## Consequences

### Positive

- PCI-DSS certification scope limited to payment service
- Payment service can scale independently
- No need to redeploy order service when payment logic changes

### Negative

- Additional inter-service communication latency (~5ms)
- Need to implement and operate Saga compensating transactions
- Increased operational complexity: separate monitoring, deployment pipeline

### Quantitative Basis

- Current payment p99 latency: 150ms -> Post-separation estimate: 155ms (acceptable)
- Current PCI certification scope: entire infrastructure -> Post-separation: payment service only
- Infrastructure cost increase: $200/month (separate DB + service instances)

## Alternatives Considered

1. Maintain monolith + certify entire PCI scope: High cost and audit burden
2. Separate payment as serverless (Lambda): p99 latency uncertain due to cold start issues
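
The ADR above chooses an orchestration-style Saga and lists compensating transactions as a cost. To make that cost concrete, here is a minimal in-process sketch of the idea: run steps in order and, when one fails, undo the completed steps in reverse. The `run_saga` function and the step names are illustrative assumptions; a production orchestrator would also persist saga state and handle compensations that themselves fail.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return False, log
        completed.append(step)
        log.append(f"{name}: ok")
    return True, log


def _fail_confirm() -> None:
    raise RuntimeError("payment gateway timeout")


ok, log = run_saga([
    ("create_order", lambda: None, lambda: None),
    ("reserve_payment", lambda: None, lambda: None),
    ("confirm_payment", _fail_confirm, lambda: None),
])
print(ok)
for line in log:
    print(line)
```

The step that failed is not compensated, only the steps that completed before it, which is why each action needs a well-defined "nothing happened" outcome on failure.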

Migration Strategy: Incremental Separation

The transition from monolith to microservices should be performed incrementally, not as a big bang. Here is a step-by-step strategy based on the Strangler Fig pattern.

Phase 1: Modular Monolith (1-3 months)
  - Establish module boundaries in code
  - Introduce facade interfaces
  - Build boundary violation detection CI
  - Logically separate DB schema per module

Phase 2: First Service Extraction (2-4 months)
  - Separate the most independent module as a service
  - Decide communication protocol (gRPC/REST/Events)
  - Implement Saga or event-based consistency
  - A/B routing for parallel operation of new service and existing module

Phase 3: Stabilization and Expansion (3-6 months)
  - Confirm operational stability of first service
  - Establish monitoring, alerting, runbooks
  - Select next separation target and repeat

Phase 4: Platform Maturity (ongoing)
  - Introduce service mesh / API gateway
  - Build distributed tracing system
  - Service catalog and ownership management
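
Phase 2 calls for running the new service and the existing module in parallel. One way to do that, sketched below under stated assumptions, is deterministic percentage routing with a fallback to the in-process module. The `in_rollout` and `route` names are hypothetical, and falling back after a failed call is only safe when the operation is idempotent (for example, protected by an idempotency key as in the payment facade).

```python
import hashlib
from typing import Callable


def in_rollout(key: str, percent: int) -> bool:
    """Deterministically place a key in a bucket 0-99 and compare with percent."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def route(
    key: str,
    percent: int,
    new_service: Callable[[str], str],
    legacy_module: Callable[[str], str],
) -> str:
    """Send `percent`% of keys to the new service, the rest to the legacy module."""
    if in_rollout(key, percent):
        try:
            return new_service(key)
        except Exception:
            # Rollback path: the existing in-process module still works.
            return legacy_module(key)
    return legacy_module(key)


print(route("order-1", 0, lambda k: "new", lambda k: "legacy"))
print(route("order-1", 100, lambda k: "new", lambda k: "legacy"))
```

Hashing the key, rather than picking randomly per request, keeps a given customer or order on the same path for the whole rollout, which makes incidents easier to reproduce.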

The key gate conditions at each phase are as follows.

"""
Gate check for determining whether migration phase can proceed.

After completing each phase, this check must pass
before proceeding to the next phase.
"""
from dataclasses import dataclass


@dataclass
class PhaseGateCheck:
    name: str
    passed: bool
    detail: str


def check_phase1_gate() -> list[PhaseGateCheck]:
    """Phase 1 completion gate: is the modular monolith conversion far enough along?

    The passed values are hard-coded placeholders; in practice each one
    should be computed from the evidence named in its comment.
    """
    return [
        PhaseGateCheck(
            name="module_boundaries_defined",
            passed=True,  # placeholder: derive from code analysis results
            detail="All modules have facade interfaces",
        ),
        PhaseGateCheck(
            name="no_boundary_violations",
            passed=True,  # placeholder: derive from CI history (0 violations)
            detail="0 boundary violations in last 30 days",
        ),
        PhaseGateCheck(
            name="schema_separated",
            passed=True,  # placeholder: derive from a DB schema inspection
            detail="Each module uses its own schema prefix",
        ),
        PhaseGateCheck(
            name="integration_tests_exist",
            passed=True,  # placeholder: derive from test coverage reports
            detail="Module integration tests cover 85%+ of facade methods",
        ),
    ]


def check_phase2_gate() -> list[PhaseGateCheck]:
    """Phase 2 completion gate: is the first extracted service stable?"""
    return [
        PhaseGateCheck(
            name="service_p99_latency",
            passed=True,  # placeholder: derive from production metrics
            detail="p99 latency < 200ms for 14 consecutive days",
        ),
        PhaseGateCheck(
            name="error_rate",
            passed=True,
            detail="Error rate < 0.1% for 14 consecutive days",
        ),
        PhaseGateCheck(
            name="saga_compensation_tested",
            passed=True,
            detail="Compensation scenarios tested in staging 3+ times",
        ),
        PhaseGateCheck(
            name="runbook_documented",
            passed=True,
            detail="Incident runbook reviewed by on-call team",
        ),
        PhaseGateCheck(
            name="rollback_verified",
            passed=True,
            detail="Rollback to monolith path verified in staging",
        ),
    ]


def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
    """Determines whether the gate passes."""
    failed = [c for c in checks if not c.passed]
    if failed:
        details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
        return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
    return True, "Gate PASSED - all checks passed"
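The `passed=True` values above are placeholders. As a minimal sketch of how one of them might be derived from real data, the helper below evaluates an "N consecutive days under a threshold" condition such as the p99 latency check. The function name `consecutive_days_under` and the sample values are hypothetical, and the sketch assumes one aggregated sample per day, ordered oldest first.

```python
from collections.abc import Sequence


def consecutive_days_under(
    daily_values: Sequence[float], threshold: float, required_days: int
) -> tuple[bool, str]:
    """Return (passed, detail) for an 'N consecutive days under threshold' gate.

    Only the most recent `required_days` samples are considered, so an old
    regression does not block the gate forever.
    """
    if len(daily_values) < required_days:
        return False, (
            f"only {len(daily_values)} of {required_days} days of data available"
        )
    recent = daily_values[-required_days:]
    worst = max(recent)
    passed = worst < threshold
    detail = (
        f"worst value {worst:g} over last {required_days} days "
        f"(threshold {threshold:g})"
    )
    return passed, detail


# Hypothetical daily p99 latency samples in milliseconds, oldest first.
p99_ms = [250.0] + [180.0] * 13 + [150.0]
passed, detail = consecutive_days_under(p99_ms, threshold=200.0, required_days=14)
```

The resulting `passed` and `detail` values could then be fed into a `PhaseGateCheck` in place of the hard-coded ones.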

Practical Troubleshooting

Started Microservices Without Distributed Tracing

Symptom: A user reports "orders aren't working," but you cannot tell which service has the problem. You have to dig through each service's logs one by one.

Response: Introduce OpenTelemetry. Propagate trace context between services and visualize the entire request flow in Jaeger or Grafana Tempo. If services are already running, roll it out incrementally using the sidecar approach.
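To make "propagating trace context" concrete, here is a stdlib-only sketch of the W3C `traceparent` header that OpenTelemetry propagates by default. This is not the OpenTelemetry API; in a real service the SDK's propagators handle this for you. The helper names (`new_traceparent`, `child_traceparent`, `outgoing_headers`) are hypothetical and only illustrate the idea: every hop keeps the same trace ID and mints a new span ID.

```python
import secrets


def new_traceparent() -> str:
    """Start a new trace: version 00, random trace ID and span ID, sampled flag 01."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID but mint a new span ID for the outgoing call."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict[str, str]) -> dict[str, str]:
    """Build headers for a downstream call, continuing the trace if one exists."""
    incoming = incoming_headers.get("traceparent")
    traceparent = child_traceparent(incoming) if incoming else new_traceparent()
    return {"traceparent": traceparent}


# An edge service starts the trace; a downstream service continues it.
edge_headers = outgoing_headers({})
downstream_headers = outgoing_headers(edge_headers)
```

Because the trace ID survives every hop, the tracing backend can stitch the spans from all services into one request timeline.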

Wrong Service Boundaries Cause Explosion of Inter-Service Calls

Symptom: A single user request internally triggers 30 calls across 15 services. Latency accumulates and the failure propagation scope is wide.

Cause: Services were split by technology layer (frontend service, data service, logging service) rather than by domain boundary, or were split too finely.

Response: (1) Analyze inter-service call patterns and merge services that are excessively coupled. (2) Change the separation criterion from "technology layer" to "business domain." (3) Aggregate frontend requests using the BFF (Backend For Frontend) pattern.
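Step (1) can be sketched as a small analysis over call records exported from tracing. The record shape `(request_id, caller, callee)`, the function name `chatty_pairs`, and the threshold are assumptions for illustration; a real analysis would read spans from your tracing backend.

```python
from collections import Counter
from collections.abc import Iterable


def chatty_pairs(
    calls: Iterable[tuple[str, str, str]], min_calls_per_request: float
) -> list[tuple[tuple[str, str], float]]:
    """Find service pairs that call each other heavily within a single request.

    Each record is (request_id, caller, callee). Direction is ignored, so
    A->B and B->A count toward the same pair. Returns pairs whose average
    number of calls per request is at least `min_calls_per_request`,
    most coupled first.
    """
    pair_counts: Counter[tuple[str, str]] = Counter()
    request_ids: set[str] = set()
    for request_id, caller, callee in calls:
        request_ids.add(request_id)
        pair_counts[tuple(sorted((caller, callee)))] += 1
    if not request_ids:
        return []
    total_requests = len(request_ids)
    averages = [
        (pair, count / total_requests) for pair, count in pair_counts.items()
    ]
    flagged = [item for item in averages if item[1] >= min_calls_per_request]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical call records for two user requests.
sample_calls = [
    ("req-1", "order", "pricing"),
    ("req-1", "order", "pricing"),
    ("req-1", "pricing", "order"),
    ("req-1", "order", "user"),
    ("req-2", "order", "pricing"),
    ("req-2", "order", "pricing"),
    ("req-2", "order", "pricing"),
]
merge_candidates = chatty_pairs(sample_calls, min_calls_per_request=2.0)
```

Pairs that surface here are only candidates: a high call count suggests that the two services share one domain, but the merge decision still needs a look at the business boundary.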

Cannot Break Free from Shared DB

Symptom: Services are separated but share the same DB. Schema changes in one service break the other. It is effectively a "distributed monolith."

Response: (1) First, route other services' table access through DB views instead of direct table reads. (2) Introduce event-based data synchronization to eliminate direct DB queries. (3) Finally, physically separate the DBs per service. Perform this process incrementally, one table at a time.
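Step (1) can be illustrated with an in-memory SQLite database. The table, view, and column names (`orders_order`, `shared_order_summary`, and so on) are hypothetical, and a production database would also use permissions to block direct table access. The point of the sketch is that the consumer query keeps working when the owning service restructures its table, because only the view definition changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Table owned by the (hypothetical) orders service.
    CREATE TABLE orders_order (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        total_cents INTEGER NOT NULL,
        internal_note TEXT
    );
    INSERT INTO orders_order VALUES (1, 42, 1999, 'manual review');

    -- Read contract for other services: exposes only stable columns.
    CREATE VIEW shared_order_summary AS
        SELECT id AS order_id, user_id, total_cents FROM orders_order;
    """
)


def order_totals_for_user(db: sqlite3.Connection, user_id: int) -> list[tuple[int, int]]:
    """Consumer-side query: touches only the view, never the owner's table."""
    rows = db.execute(
        "SELECT order_id, total_cents FROM shared_order_summary WHERE user_id = ?",
        (user_id,),
    )
    return rows.fetchall()


before = order_totals_for_user(conn, 42)

# The owning service restructures its table; only the view is redefined.
conn.executescript(
    """
    DROP VIEW shared_order_summary;
    CREATE TABLE orders_order_v2 (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO orders_order_v2 SELECT id, user_id, total_cents FROM orders_order;
    DROP TABLE orders_order;
    CREATE VIEW shared_order_summary AS
        SELECT id AS order_id, user_id, amount_cents AS total_cents
        FROM orders_order_v2;
    """
)

after = order_totals_for_user(conn, 42)
```

Once every consumer reads through the view, the view becomes the single place to later swap in event-synchronized data (step 2) before the physical split (step 3).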

References

Martin Fowler, "MonolithFirst" -- martinfowler.com/bliki/MonolithFirst
Sam Newman, "Building Microservices", O'Reilly, 2nd Edition, 2021
Michael Nygard, "Documenting Architecture Decisions" -- cognitect.com/blog/2011/11/15/documenting-architecture-decisions
Chris Richardson, "Microservices Patterns", Manning, 2018
Martin Fowler, "StranglerFigApplication" -- martinfowler.com/bliki/StranglerFigApplication
Google Cloud Architecture Framework -- cloud.google.com/architecture/framework
OpenTelemetry Documentation -- opentelemetry.io/docs

Quiz

What is "Microservice Premium"? Answer: ||The concept that adopting microservices adds the complexity of distributed systems (network communication, service discovery, distributed transactions, monitoring, etc.) as additional cost. Benefits that offset this cost only emerge above a certain scale.||
What are the three core principles of a Modular Monolith? Answer: ||(1) No direct DB references between modules, (2) Inter-module communication through explicit facade interfaces, (3) Logical DB schema separation per module. This combines the operational simplicity of a monolith with the module independence of microservices.||
Why is team size important in architecture decisions? Answer: ||Microservices require a certain level of staffing per service for service ownership, independent deployment, and operational monitoring. If a small team (5 or fewer) separates into many services, one person ends up owning multiple services and the operational burden outweighs development benefits.||
What items must be included in an ADR (Architecture Decision Record)? Answer: ||Context (why this decision is needed), Decision (what was chosen), Consequences (positive/negative impacts), Quantitative basis (latency, cost, scope, etc.), and Alternatives considered (other options and reasons for rejection).||
What is the core strategy of the Strangler Fig pattern? Answer: ||Rather than replacing the existing system all at once, gradually build a new system while routing functions from the old system to the new one, one by one. Once all functions have been migrated, remove the old system.||
How do you automatically detect module boundary violations in CI? Answer: ||Parse each module's source with the Python AST and inspect its imports. A direct import of another module's internal packages (service, repository, domain) is flagged as a violation; only imports of public facades are allowed. Running this script on every PR in CI blocks violations before merge.||
What are the steps to resolve a "distributed monolith" where services are separated but share a DB? Answer: ||(1) Route other services' table access through DB views, (2) Introduce event-based data synchronization to eliminate direct DB queries, (3) Physically separate the DBs per service. Proceed incrementally, one table at a time.||
Why is "rollback path verification" needed at the Phase 2 gate? Answer: ||When a newly separated service has problems, you must be able to revert immediately to the existing monolith path. If this rollback path has not been verified in staging beforehand, recovery during an incident takes longer.||