Authors
- Youngju Kim (@fjvbn20031)
- Why Architecture Decisions Are Difficult
- The Architecture Spectrum: Beyond the Binary
- Decision Matrix
- Modular Monolith: The Most Overlooked Choice in Practice
- Automating Module Boundary Violation Detection
- How to Write Architecture Decision Records (ADR)
- Migration Strategy: Incremental Separation
- Practical Troubleshooting
- References

Why Architecture Decisions Are Difficult
"Should we go with microservices or maintain the monolith?"
This question looks like a technical question, but in reality, it is a management decision that encompasses organizational structure, team capabilities, business stage, and operational maturity. Even if microservices are "better" architecturally, if a 3-person early-stage startup separates into 12 services, development speed actually slows down and operational costs explode.
Martin Fowler called this the "Microservice Premium." Microservices surpass monolith productivity above a certain scale, but below that threshold, the complexity of distributed systems acts as pure cost.
This article views architecture selection not as a binary choice but as a spectrum, and presents a decision-making framework for finding the right position for your situation.
The Architecture Spectrum: Beyond the Binary
In practice, the choices are not a binary "monolith vs microservices." There are several stages in between.
[Monolith] -> [Modular Monolith] -> [Macroservices] -> [Microservices]

| | Monolith | Modular Monolith | Macroservices | Microservices |
|---|---|---|---|---|
| Deployment | Single unit | Single unit with module boundaries | 2-5 large services | Domain-specific independent services |
| Database | Single DB | Per-module schema | Per-service DB | Per-service DB |
| Teams | 1 team | 1-2 teams | 2-5 teams | 5+ teams |
| Operations | Simple | Simple | Moderate | Complex |
Monolith: All code in a single deployment unit. Internal module communication via function calls. Consistency guaranteed by a single DB transaction.
Modular Monolith: Deployment is a single unit, but internally, module (package) boundaries are clearly separated. Communication between modules is only possible through explicit interfaces, and direct DB table references are prohibited. If separation is needed later, modules can be extracted as units.
Macroservices: Split into 2-5 large services. Separated by large domain units like "Order/Payment Service" and "User/Auth Service." Gains the benefits of independent deployment without the complexity of microservices.
Microservices: One domain concept per service. Tens to hundreds of services. Each service has independent deployment, independent DB, and independent scaling.
Decision Matrix
By scoring your situation on each axis, you can see the appropriate position.
| Evaluation Axis | Monolith (1 pt) | Modular Monolith (2 pts) | Macroservices (3 pts) | Microservices (4 pts) |
|---|---|---|---|---|
| Team Size | 1-5 | 5-15 | 15-40 | 40+ |
| Deployment Frequency | 1-2/week | 1-2/day | 3-10/day | 10+/day or per-service independent |
| Domain Change Frequency | High (boundaries shift often) | Medium | Low (boundaries stable) | Very Low |
| Ops Capability | 1-2 server ops | CI/CD pipeline ops | Container orchestration | Service mesh, distributed tracing |
| Consistency Requirements | Strong consistency required | Inter-module eventual OK | Eventual consistency | Eventual consistency |
| Scalability Requirements | Vertical scaling sufficient | Vertical + some horizontal | Per-service horizontal | Per-service independent scaling required |
Total Score Interpretation:
- 6-10 points: Monolith or Modular Monolith
- 11-16 points: Modular Monolith or Macroservices
- 17-24 points: Macroservices or Microservices
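The scoring above is mechanical enough to sketch as a small helper. The axis names and band thresholds below mirror the table and the score interpretation; the function itself is illustrative, not a published tool.

```python
# Sketch of the decision-matrix scoring above. Each of the six axes gets a
# score of 1-4 (Monolith..Microservices column); the band thresholds follow
# the "Total Score Interpretation" list. Axis key names are illustrative.

def recommend_architecture(scores: dict[str, int]) -> str:
    """Sum per-axis scores (six axes, 1-4 each) and map to a recommendation."""
    if len(scores) != 6 or not all(1 <= s <= 4 for s in scores.values()):
        raise ValueError("Expected six axes, each scored 1-4")
    total = sum(scores.values())
    if total <= 10:
        return "Monolith or Modular Monolith"
    if total <= 16:
        return "Modular Monolith or Macroservices"
    return "Macroservices or Microservices"

# Example: a ~10-person team with daily deploys and shifting domain boundaries.
example = {
    "team_size": 2,
    "deploy_frequency": 2,
    "domain_change": 1,
    "ops_capability": 2,
    "consistency": 2,
    "scalability": 2,
}
print(recommend_architecture(example))  # total 11 -> "Modular Monolith or Macroservices"
```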
Modular Monolith: The Most Overlooked Choice in Practice
The Modular Monolith combines the operational simplicity of a monolith with the module independence of microservices. It is especially well suited when the team is 5-15 people and domain boundaries are not yet finalized.
The three core principles are:
- No direct DB references between modules: Do not directly JOIN other module tables.
- Communication through explicit interfaces: Inter-module calls are only through public APIs (facade classes in Python, interfaces in Java).
- Per-module schema separation: Separate schemas within the same DB server to achieve logical separation without physical separation.
"""
Example of inter-module communication in a Modular Monolith.
When the order module accesses the payment module,
it goes through payment's public facade, not its internal implementation.
"""
# === payment/facade.py (payment module's public interface) ===
from dataclasses import dataclass
from typing import Optional
@dataclass
class PaymentResult:
reservation_id: str
status: str
amount: int
currency: str
class PaymentFacade:
"""Payment module's public interface.
Other modules access payment functionality only through this class.
Direct access to internal implementation (repository, service, domain model) is prohibited.
"""
def __init__(self, payment_service):
self._service = payment_service
def reserve_payment(
self,
customer_id: str,
amount: int,
currency: str = "KRW",
idempotency_key: Optional[str] = None,
) -> PaymentResult:
"""Reserves a payment. Actual charge occurs at confirm time."""
reservation = self._service.create_reservation(
customer_id=customer_id,
amount=amount,
currency=currency,
idempotency_key=idempotency_key,
)
return PaymentResult(
reservation_id=reservation.id,
status=reservation.status.value,
amount=reservation.amount,
currency=reservation.currency,
)
def confirm_payment(self, reservation_id: str) -> PaymentResult:
"""Confirms a reserved payment."""
result = self._service.confirm(reservation_id)
return PaymentResult(
reservation_id=result.id,
status=result.status.value,
amount=result.amount,
currency=result.currency,
)
def cancel_payment(self, reservation_id: str) -> None:
"""Cancels a reserved payment."""
self._service.cancel(reservation_id)
# === order/service.py (using payment facade from order module) ===
class OrderService:
"""Order service.
Does not depend on payment module's internal implementation,
accesses payment functionality only through PaymentFacade.
"""
def __init__(self, order_repo, payment_facade: PaymentFacade):
self.order_repo = order_repo
self.payment = payment_facade
def create_order(self, customer_id: str, items: list, total: int) -> dict:
# 1. Create order
order = self.order_repo.create(
customer_id=customer_id,
items=items,
total=total,
)
# 2. Reserve payment (through payment facade)
try:
payment_result = self.payment.reserve_payment(
customer_id=customer_id,
amount=total,
idempotency_key=f"order-{order.id}",
)
order.payment_reservation_id = payment_result.reservation_id
self.order_repo.save(order)
except Exception as e:
order.status = "payment_failed"
self.order_repo.save(order)
raise
return {"order_id": order.id, "status": order.status}
Automating Module Boundary Violation Detection
The biggest risk of a Modular Monolith is module boundaries eroding over time. If "let's just import directly since it's urgent" is repeated, you return to a big ball of mud. This must be automatically detected in CI.
"""
Module boundary violation detection script.
Each module can only import the facade of other modules.
Directly importing internal packages (service, repository, domain) is a violation.
"""
import ast
import sys
from pathlib import Path
from dataclasses import dataclass
@dataclass
class Violation:
file: str
line: int
importing_module: str
imported_module: str
imported_path: str
reason: str
# Module list and allowed public packages
MODULE_CONFIG = {
"order": {"public": ["order.facade"]},
"payment": {"public": ["payment.facade"]},
"inventory": {"public": ["inventory.facade"]},
"shipping": {"public": ["shipping.facade"]},
"user": {"public": ["user.facade"]},
}
def get_module_name(file_path: str) -> str | None:
"""Extracts module name from file path."""
parts = Path(file_path).parts
for module_name in MODULE_CONFIG:
if module_name in parts:
return module_name
return None
def check_file(file_path: str) -> list[Violation]:
"""Inspects imports in a single file and returns a list of violations."""
violations = []
source_module = get_module_name(file_path)
if source_module is None:
return violations
with open(file_path) as f:
try:
tree = ast.parse(f.read())
except SyntaxError:
return violations
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
_check_import(file_path, source_module, alias.name, node.lineno, violations)
elif isinstance(node, ast.ImportFrom) and node.module:
_check_import(file_path, source_module, node.module, node.lineno, violations)
return violations
def _check_import(
file_path: str,
source_module: str,
imported_path: str,
line: int,
violations: list[Violation],
):
"""Checks whether an individual import statement violates module boundaries."""
for target_module, config in MODULE_CONFIG.items():
if target_module == source_module:
continue # Internal imports within the same module are OK
if imported_path.startswith(target_module + "."):
# Importing from another module -> check if it's a public package
is_public = any(
imported_path.startswith(pub)
for pub in config["public"]
)
if not is_public:
violations.append(Violation(
file=file_path,
line=line,
importing_module=source_module,
imported_module=target_module,
imported_path=imported_path,
reason=f"Direct import of '{target_module}' internals. "
f"Use {config['public']} instead.",
))
def main():
"""Inspects all files in the project and reports violations."""
project_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
all_violations = []
for py_file in project_root.rglob("*.py"):
violations = check_file(str(py_file))
all_violations.extend(violations)
if all_violations:
print(f"\n{'='*60}")
print(f"Module boundary violations found: {len(all_violations)}")
print(f"{'='*60}\n")
for v in all_violations:
print(f" {v.file}:{v.line}")
print(f" {v.importing_module} -> {v.imported_module} ({v.imported_path})")
print(f" {v.reason}\n")
sys.exit(1)
else:
print("No module boundary violations found.")
sys.exit(0)
if __name__ == "__main__":
main()
Adding this script to CI blocks module boundary violations before merge.
```yaml
# .github/workflows/boundary-check.yml
name: Module Boundary Check
on: [pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: python scripts/check_module_boundaries.py src/
```
How to Write Architecture Decision Records (ADR)
Architecture decisions must be documented. You need to be able to answer "why did we do it this way?" six months later. Based on the ADR format proposed by Michael Nygard, adjust it to fit your team.
```markdown
# ADR-007: Separating Payment Module into an Independent Service

## Status: Approved (2026-02-15)

## Context
Currently, the payment module is called through a facade inside the monolith.
To reduce PCI-DSS certification scope, payment processing needs to be separated
into a standalone service. Additionally, the payment service needs to scale
independently from the order service (payment traffic increases 10x during
Black Friday).

## Decision
Separate the payment module into an independent gRPC service.
- Communication: gRPC (internal), REST (external PG integration)
- DB: Separate PostgreSQL instance
- Consistency: Saga pattern (orchestration approach)
- Deployment: Independent Kubernetes deployment

## Consequences

### Positive
- PCI-DSS certification scope limited to payment service
- Payment service can scale independently
- No need to redeploy order service when payment logic changes

### Negative
- Additional inter-service communication latency (~5ms)
- Need to implement and operate Saga compensating transactions
- Increased operational complexity: separate monitoring, deployment pipeline

### Quantitative Basis
- Current payment p99 latency: 150ms -> Post-separation estimate: 155ms (acceptable)
- Current PCI certification scope: entire infrastructure -> Post-separation: payment service only
- Infrastructure cost increase: $200/month (separate DB + service instances)

## Alternatives Considered
1. Maintain monolith + certify entire PCI scope: High cost and audit burden
2. Separate payment as serverless (Lambda): p99 latency uncertain due to cold start issues
```
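If ADRs live in the repository, the presence of the required sections can be enforced with a tiny lint step. The sketch below checks for the section headings used in this article's template; the script itself is illustrative, not a standard tool.

```python
# Illustrative ADR lint: verify an ADR file contains the sections from the
# template above. The section list follows this article's template; a real
# team would adjust it to its own ADR format.
REQUIRED_SECTIONS = [
    "## Status",
    "## Context",
    "## Decision",
    "## Consequences",
    "## Alternatives Considered",
]

def missing_sections(adr_text: str) -> list[str]:
    """Return the required section headings absent from the ADR text."""
    return [s for s in REQUIRED_SECTIONS if s not in adr_text]

adr = """# ADR-001: Example
## Status: Approved
## Context
...
## Decision
...
## Consequences
...
## Alternatives Considered
...
"""
print(missing_sections(adr))  # []
```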
Migration Strategy: Incremental Separation
The transition from monolith to microservices should be performed incrementally, not as a big bang. Here is a step-by-step strategy based on the Strangler Fig pattern.
Phase 1: Modular Monolith (1-3 months)
- Establish module boundaries in code
- Introduce facade interfaces
- Build boundary violation detection CI
- Logically separate DB schema per module
Phase 2: First Service Extraction (2-4 months)
- Separate the most independent module as a service
- Decide communication protocol (gRPC/REST/Events)
- Implement Saga or event-based consistency
- A/B routing for parallel operation of new service and existing module
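The "Saga or event-based consistency" item in Phase 2 can be illustrated with a minimal orchestration sketch: run steps in order, and on failure, run the compensations of the already-completed steps in reverse. Step names here are hypothetical; a production saga would also persist state and handle retries.

```python
# Minimal orchestration-style saga sketch. On a failing step, previously
# completed steps are compensated in reverse order. A real implementation
# would persist saga state and retry; this only shows the control flow.
from typing import Callable

class Saga:
    def __init__(self):
        self._steps: list[tuple[str, Callable[[], None], Callable[[], None]]] = []

    def step(self, name: str, action: Callable[[], None],
             compensate: Callable[[], None]) -> "Saga":
        self._steps.append((name, action, compensate))
        return self

    def run(self) -> list[str]:
        """Execute steps; on failure, compensate completed steps. Returns a log."""
        log: list[str] = []
        done: list[tuple[str, Callable[[], None]]] = []
        for name, action, compensate in self._steps:
            try:
                action()
                log.append(f"do:{name}")
                done.append((name, compensate))
            except Exception:
                log.append(f"fail:{name}")
                for cname, comp in reversed(done):
                    comp()
                    log.append(f"undo:{cname}")
                break
        return log

def declined():
    raise RuntimeError("payment declined")

log = (Saga()
       .step("create_order", lambda: None, lambda: None)
       .step("reserve_payment", declined, lambda: None)
       .run())
print(log)  # ['do:create_order', 'fail:reserve_payment', 'undo:create_order']
```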
Phase 3: Stabilization and Expansion (3-6 months)
- Confirm operational stability of first service
- Establish monitoring, alerting, runbooks
- Select next separation target and repeat
Phase 4: Platform Maturity (ongoing)
- Introduce service mesh / API gateway
- Build distributed tracing system
- Service catalog and ownership management
The key gate conditions at each phase are as follows.
"""
Gate check for determining whether migration phase can proceed.
After completing each phase, this check must pass
before proceeding to the next phase.
"""
from dataclasses import dataclass
@dataclass
class PhaseGateCheck:
name: str
passed: bool
detail: str
def check_phase1_gate() -> list[PhaseGateCheck]:
"""Phase 1 completion gate: Has modular monolith conversion been sufficient."""
return [
PhaseGateCheck(
name="module_boundaries_defined",
passed=True, # Actually checks code analysis results
detail="All modules have facade interfaces",
),
PhaseGateCheck(
name="no_boundary_violations",
passed=True, # Confirmed 0 violations in CI
detail="0 boundary violations in last 30 days",
),
PhaseGateCheck(
name="schema_separated",
passed=True, # Confirmed DB schema separation
detail="Each module uses its own schema prefix",
),
PhaseGateCheck(
name="integration_tests_exist",
passed=True,
detail="Module integration tests cover 85%+ of facade methods",
),
]
def check_phase2_gate() -> list[PhaseGateCheck]:
"""Phase 2 completion gate: Is the first service extraction stable."""
return [
PhaseGateCheck(
name="service_p99_latency",
passed=True, # Based on actual metrics
detail="p99 latency < 200ms for 14 consecutive days",
),
PhaseGateCheck(
name="error_rate",
passed=True,
detail="Error rate < 0.1% for 14 consecutive days",
),
PhaseGateCheck(
name="saga_compensation_tested",
passed=True,
detail="Compensation scenarios tested in staging 3+ times",
),
PhaseGateCheck(
name="runbook_documented",
passed=True,
detail="Incident runbook reviewed by on-call team",
),
PhaseGateCheck(
name="rollback_verified",
passed=True,
detail="Rollback to monolith path verified in staging",
),
]
def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
"""Determines whether the gate passes."""
failed = [c for c in checks if not c.passed]
if failed:
details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
return True, "Gate PASSED - all checks passed"
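A quick usage example of the gate evaluation. The two small definitions are repeated here verbatim so the snippet runs standalone; the check values are invented for illustration.

```python
# Standalone usage of the gate evaluation above. PhaseGateCheck and
# evaluate_gate are repeated verbatim so this snippet runs on its own;
# the two example checks are made up.
from dataclasses import dataclass

@dataclass
class PhaseGateCheck:
    name: str
    passed: bool
    detail: str

def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
    failed = [c for c in checks if not c.passed]
    if failed:
        details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
        return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
    return True, "Gate PASSED - all checks passed"

checks = [
    PhaseGateCheck("error_rate", True, "0.05% over 14 days"),
    PhaseGateCheck("rollback_verified", False, "not yet tested in staging"),
]
ok, message = evaluate_gate(checks)
print(ok)       # False
print(message)  # Gate BLOCKED - 1 checks failed: rollback_verified: not yet tested in staging
```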
Practical Troubleshooting
Started Microservices Without Distributed Tracing
Symptom: A user reports "orders aren't working," but you cannot tell which service has the problem. You have to dig through each service's logs one by one.
Response: Introduce OpenTelemetry. Propagate trace context across each service and visualize the entire request flow in Jaeger or Grafana Tempo. If services are already running, apply it incrementally using the sidecar approach.
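The core mechanism OpenTelemetry relies on is the W3C Trace Context `traceparent` header: every service keeps the incoming trace-id and mints a new span-id before calling downstream. The hand-rolled sketch below shows only that propagation idea; it is not the OpenTelemetry SDK, which handles this (plus export to Jaeger/Tempo) automatically.

```python
# Sketch of W3C Trace Context propagation, the mechanism underneath
# OpenTelemetry's cross-service tracing. Format: version-traceid-spanid-flags
# (00-<32 hex>-<16 hex>-<2 hex>). Hand-rolled for illustration only.
import secrets

def new_traceparent() -> str:
    """Start a trace: version 00, random trace-id and span-id, sampled flag 01."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"

def child_traceparent(parent: str) -> str:
    """Propagate downstream: keep the trace-id, mint a new span-id."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

# Simulate order-service calling payment-service:
incoming = new_traceparent()        # header the order service received
outgoing = child_traceparent(incoming)  # header it sends to payment

# Same trace-id ties both spans into one trace; span-id identifies each hop.
assert incoming.split("-")[1] == outgoing.split("-")[1]
assert incoming.split("-")[2] != outgoing.split("-")[2]
```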
Wrong Service Boundaries Cause Explosion of Inter-Service Calls
Symptom: A single user request internally triggers 30 calls across 15 services. Latency accumulates, and a failure in any one service propagates widely.
Cause: Separated by technology layer (frontend service, data service, logging service) rather than domain boundary, or split too finely.
Response: (1) Analyze inter-service call patterns and merge services with excessive coupling. (2) Re-establish the separation criterion from "technology layer" to "business domain." (3) Aggregate frontend requests using the BFF (Backend For Frontend) pattern.
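The coupling analysis in (1) can be approximated directly from access logs or trace data: count caller->callee edges and flag the chattiest pairs as merge candidates. The service names and threshold below are made up for illustration.

```python
# Count caller->callee edges from (caller, callee) call records to spot
# service pairs with excessive coupling -- candidates for merging or for
# re-drawing boundaries. Names and threshold are made up for illustration.
from collections import Counter

calls = [
    ("order", "payment"), ("order", "inventory"), ("order", "payment"),
    ("payment", "user"), ("order", "payment"), ("inventory", "order"),
]

edge_counts = Counter(calls)
CHATTY_THRESHOLD = 3  # calls per sampled window; tune for your traffic

chatty = [(pair, n) for pair, n in edge_counts.items() if n >= CHATTY_THRESHOLD]
print(chatty)  # [(('order', 'payment'), 3)]
```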
Cannot Break Free from Shared DB
Symptom: Services are separated but share the same DB. Schema changes in one service break the other. It is effectively a "distributed monolith."
Response: (1) First, route other services' table access through DB views to decouple them from the physical schema. (2) Introduce event-based data synchronization to eliminate direct cross-service DB queries. (3) Finally, physically separate DBs per service. Perform this process incrementally, one table at a time.
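Step (2) above can be sketched with an in-process event bus: instead of querying another service's table, a service maintains a local read model updated from published events. The bus here stands in for a real broker such as Kafka or RabbitMQ, and all names are hypothetical.

```python
# Sketch of event-based data synchronization: the order service keeps a
# local replica of user data updated from "user_updated" events, instead of
# querying the user service's table. The in-process bus stands in for a
# real message broker; names are hypothetical.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subs[topic]:
            handler(payload)

bus = EventBus()
order_local_users = {}  # order service's local read model

bus.subscribe("user_updated",
              lambda e: order_local_users.update({e["id"]: e["email"]}))

# The user service publishes changes instead of exposing its table:
bus.publish("user_updated", {"id": "u1", "email": "a@example.com"})
bus.publish("user_updated", {"id": "u1", "email": "b@example.com"})

print(order_local_users)  # {'u1': 'b@example.com'}
```

Once every consumer reads from its own replica, the shared table can be dropped from the other services' scope and the DB physically split.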
References
- Martin Fowler, "MonolithFirst" -- martinfowler.com/bliki/MonolithFirst
- Sam Newman, "Building Microservices", O'Reilly, 2nd Edition, 2021
- Michael Nygard, "Documenting Architecture Decisions" -- cognitect.com/blog/2011/11/15/documenting-architecture-decisions
- Chris Richardson, "Microservices Patterns", Manning, 2018
- Martin Fowler, "StranglerFigApplication" -- martinfowler.com/bliki/StranglerFigApplication
- Google Cloud Architecture Framework -- cloud.google.com/architecture/framework
- OpenTelemetry Documentation -- opentelemetry.io/docs
Quiz
- What is the "Microservice Premium"? Answer: ||The concept that adopting microservices adds the complexity of distributed systems (network communication, service discovery, distributed transactions, monitoring, etc.) as additional cost. Benefits that offset this cost only emerge above a certain scale.||
- What are the three core principles of a Modular Monolith? Answer: ||(1) No direct DB references between modules, (2) inter-module communication through explicit facade interfaces, (3) logical DB schema separation per module. This combines the operational simplicity of a monolith with the module independence of microservices.||
- Why is team size important in architecture decisions? Answer: ||Microservices require a certain level of staffing per service for service ownership, independent deployment, and operational monitoring. If a small team (5 or fewer) splits into many services, one person ends up owning multiple services and the operational burden outweighs the development benefits.||
- What items must an ADR (Architecture Decision Record) include? Answer: ||Context (why the decision is needed), Decision (what was chosen), Consequences (positive/negative impacts), quantitative basis (latency, cost, scope, etc.), and alternatives considered (other options and reasons for rejection).||
- What is the core strategy of the Strangler Fig pattern? Answer: ||Rather than replacing the existing system all at once, gradually build a new system while routing functions from the old system to the new one, one by one. Once all functions have been migrated, remove the old system.||
- How do you automatically detect module boundary violations in CI? Answer: ||Parse each file's Python AST and inspect its imports. If internal packages (service, repository, domain) of another module are imported directly, flag it as a violation, allowing only public facades. Running this script on every PR in CI blocks violations before merge.||
- What are the steps to resolve a "distributed monolith" where services are separated but share a DB? Answer: ||(1) Put other services' table access behind DB views, (2) introduce event-based data synchronization to eliminate direct DB queries, (3) physically separate DBs per service. Proceed incrementally, one table at a time.||
- Why is "rollback path verification" needed at the Phase 2 gate? Answer: ||When problems occur with a newly separated service, you must be able to immediately revert to the existing monolith path. If this rollback path is not verified in staging beforehand, recovery time during incidents grows.||