Authors
- Youngju Kim (@fjvbn20031)
- Why Architecture Decisions Are Difficult
- The Architecture Spectrum: Beyond the Binary
- Decision Matrix
- Modular Monolith: The Most Overlooked Choice in Practice
- Automating Module Boundary Violation Detection
- How to Write Architecture Decision Records (ADR)
- Migration Strategy: Incremental Separation
- Practical Troubleshooting
- References

Why Architecture Decisions Are Difficult
"Should we go with microservices or maintain the monolith?"
This question looks like a technical question, but in reality, it is a management decision that encompasses organizational structure, team capabilities, business stage, and operational maturity. Even if microservices are "better" architecturally, if a 3-person early-stage startup separates into 12 services, development speed actually slows down and operational costs explode.
Martin Fowler called this the "Microservice Premium." Microservices surpass monolith productivity above a certain scale, but below that threshold, the complexity of distributed systems acts as pure cost.
This article views architecture selection not as a binary choice but as a spectrum, and presents a decision-making framework for finding the right position for your situation.
The Architecture Spectrum: Beyond the Binary
In practice, the choices are not a binary "monolith vs microservices." There are several stages in between.
[Monolith] -> [Modular Monolith] -> [Macroservices] -> [Microservices]

| | Monolith | Modular Monolith | Macroservices | Microservices |
|---|---|---|---|---|
| Deployment | Single unit | Single unit with module boundaries | 2-5 large services | Domain-specific independent services |
| Database | Single DB | Per-module schema | Per-service DB | Per-service DB |
| Teams | 1 team | 1-2 teams | 2-5 teams | 5+ teams |
| Operations | Simple | Simple | Moderate | Complex |
Monolith: All code in a single deployment unit. Internal module communication via function calls. Consistency guaranteed by a single DB transaction.
Modular Monolith: Deployment is a single unit, but internally, module (package) boundaries are clearly separated. Communication between modules is only possible through explicit interfaces, and direct DB table references are prohibited. If separation is needed later, modules can be extracted as units.
Macroservices: Split into 2-5 large services. Separated by large domain units like "Order/Payment Service" and "User/Auth Service." Gains the benefits of independent deployment without the complexity of microservices.
Microservices: One domain concept per service. Tens to hundreds of services. Each service has independent deployment, independent DB, and independent scaling.
Decision Matrix
By scoring your situation on each axis, you can see the appropriate position.
| Evaluation Axis | Monolith (1 pt) | Modular Monolith (2 pts) | Macroservices (3 pts) | Microservices (4 pts) |
|---|---|---|---|---|
| Team Size | 1-5 | 5-15 | 15-40 | 40+ |
| Deployment Frequency | 1-2/week | 1-2/day | 3-10/day | 10+/day or per-service independent |
| Domain Change Frequency | High (boundaries shift often) | Medium | Low (boundaries stable) | Very Low |
| Ops Capability | 1-2 server ops | CI/CD pipeline ops | Container orchestration | Service mesh, distributed tracing |
| Consistency Requirements | Strong consistency required | Inter-module eventual OK | Eventual consistency | Eventual consistency |
| Scalability Requirements | Vertical scaling sufficient | Vertical + some horizontal | Per-service horizontal | Per-service independent scaling required |
Total Score Interpretation:
- 6-10 points: Monolith or Modular Monolith
- 11-16 points: Modular Monolith or Macroservices
- 17-24 points: Macroservices or Microservices
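The scoring above is mechanical enough to sketch as a small helper. The axis names and band thresholds below mirror the table and the score interpretation; the function itself is illustrative, not a published tool.

```python
# Sketch of the decision-matrix scoring above. Each of the six axes gets a
# score of 1-4 (Monolith..Microservices column); the band thresholds follow
# the "Total Score Interpretation" list. Axis key names are illustrative.

def recommend_architecture(scores: dict[str, int]) -> str:
    """Sum per-axis scores (six axes, 1-4 each) and map to a recommendation."""
    if len(scores) != 6 or not all(1 <= s <= 4 for s in scores.values()):
        raise ValueError("Expected six axes, each scored 1-4")
    total = sum(scores.values())
    if total <= 10:
        return "Monolith or Modular Monolith"
    if total <= 16:
        return "Modular Monolith or Macroservices"
    return "Macroservices or Microservices"

# Example: a ~10-person team with daily deploys and shifting domain boundaries.
example = {
    "team_size": 2,
    "deploy_frequency": 2,
    "domain_change": 1,
    "ops_capability": 2,
    "consistency": 2,
    "scalability": 2,
}
print(recommend_architecture(example))  # total 11 -> "Modular Monolith or Macroservices"
```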
Modular Monolith: The Most Overlooked Choice in Practice
The Modular Monolith combines the operational simplicity of a monolith with the module independence of microservices. It is especially well suited when the team is 5-15 people and domain boundaries are not yet finalized.
The three core principles are:
- No direct DB references between modules: Do not directly JOIN other module tables.
- Communication through explicit interfaces: Inter-module calls are only through public APIs (facade classes in Python, interfaces in Java).
- Per-module schema separation: Separate schemas within the same DB server to achieve logical separation without physical separation.
"""
Example of inter-module communication in a Modular Monolith.
When the order module accesses the payment module,
it goes through payment's public facade, not its internal implementation.
"""
# === payment/facade.py (payment module's public interface) ===
from dataclasses import dataclass
from typing import Optional
@dataclass
class PaymentResult:
reservation_id: str
status: str
amount: int
currency: str
class PaymentFacade:
"""Payment module's public interface.
Other modules access payment functionality only through this class.
Direct access to internal implementation (repository, service, domain model) is prohibited.
"""
def __init__(self, payment_service):
self._service = payment_service
def reserve_payment(
self,
customer_id: str,
amount: int,
currency: str = "KRW",
idempotency_key: Optional[str] = None,
) -> PaymentResult:
"""Reserves a payment. Actual charge occurs at confirm time."""
reservation = self._service.create_reservation(
customer_id=customer_id,
amount=amount,
currency=currency,
idempotency_key=idempotency_key,
)
return PaymentResult(
reservation_id=reservation.id,
status=reservation.status.value,
amount=reservation.amount,
currency=reservation.currency,
)
def confirm_payment(self, reservation_id: str) -> PaymentResult:
"""Confirms a reserved payment."""
result = self._service.confirm(reservation_id)
return PaymentResult(
reservation_id=result.id,
status=result.status.value,
amount=result.amount,
currency=result.currency,
)
def cancel_payment(self, reservation_id: str) -> None:
"""Cancels a reserved payment."""
self._service.cancel(reservation_id)
# === order/service.py (using payment facade from order module) ===
class OrderService:
"""Order service.
Does not depend on payment module's internal implementation,
accesses payment functionality only through PaymentFacade.
"""
def __init__(self, order_repo, payment_facade: PaymentFacade):
self.order_repo = order_repo
self.payment = payment_facade
def create_order(self, customer_id: str, items: list, total: int) -> dict:
# 1. Create order
order = self.order_repo.create(
customer_id=customer_id,
items=items,
total=total,
)
# 2. Reserve payment (through payment facade)
try:
payment_result = self.payment.reserve_payment(
customer_id=customer_id,
amount=total,
idempotency_key=f"order-{order.id}",
)
order.payment_reservation_id = payment_result.reservation_id
self.order_repo.save(order)
except Exception as e:
order.status = "payment_failed"
self.order_repo.save(order)
raise
return {"order_id": order.id, "status": order.status}
Automating Module Boundary Violation Detection
The biggest risk of a Modular Monolith is module boundaries eroding over time. If "let's just import directly since it's urgent" is repeated, you return to a big ball of mud. This must be automatically detected in CI.
"""
Module boundary violation detection script.
Each module can only import the facade of other modules.
Directly importing internal packages (service, repository, domain) is a violation.
"""
import ast
import sys
from pathlib import Path
from dataclasses import dataclass
@dataclass
class Violation:
file: str
line: int
importing_module: str
imported_module: str
imported_path: str
reason: str
# Module list and allowed public packages
MODULE_CONFIG = {
"order": {"public": ["order.facade"]},
"payment": {"public": ["payment.facade"]},
"inventory": {"public": ["inventory.facade"]},
"shipping": {"public": ["shipping.facade"]},
"user": {"public": ["user.facade"]},
}
def get_module_name(file_path: str) -> str | None:
"""Extracts module name from file path."""
parts = Path(file_path).parts
for module_name in MODULE_CONFIG:
if module_name in parts:
return module_name
return None
def check_file(file_path: str) -> list[Violation]:
"""Inspects imports in a single file and returns a list of violations."""
violations = []
source_module = get_module_name(file_path)
if source_module is None:
return violations
with open(file_path) as f:
try:
tree = ast.parse(f.read())
except SyntaxError:
return violations
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
_check_import(file_path, source_module, alias.name, node.lineno, violations)
elif isinstance(node, ast.ImportFrom) and node.module:
_check_import(file_path, source_module, node.module, node.lineno, violations)
return violations
def _check_import(
file_path: str,
source_module: str,
imported_path: str,
line: int,
violations: list[Violation],
):
"""Checks whether an individual import statement violates module boundaries."""
for target_module, config in MODULE_CONFIG.items():
if target_module == source_module:
continue # Internal imports within the same module are OK
if imported_path.startswith(target_module + "."):
# Importing from another module -> check if it's a public package
is_public = any(
imported_path.startswith(pub)
for pub in config["public"]
)
if not is_public:
violations.append(Violation(
file=file_path,
line=line,
importing_module=source_module,
imported_module=target_module,
imported_path=imported_path,
reason=f"Direct import of '{target_module}' internals. "
f"Use {config['public']} instead.",
))
def main():
"""Inspects all files in the project and reports violations."""
project_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
all_violations = []
for py_file in project_root.rglob("*.py"):
violations = check_file(str(py_file))
all_violations.extend(violations)
if all_violations:
print(f"\n{'='*60}")
print(f"Module boundary violations found: {len(all_violations)}")
print(f"{'='*60}\n")
for v in all_violations:
print(f" {v.file}:{v.line}")
print(f" {v.importing_module} -> {v.imported_module} ({v.imported_path})")
print(f" {v.reason}\n")
sys.exit(1)
else:
print("No module boundary violations found.")
sys.exit(0)
if __name__ == "__main__":
main()
Adding this script to CI blocks module boundary violations before merge.
```yaml
# .github/workflows/boundary-check.yml
name: Module Boundary Check
on: [pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: python scripts/check_module_boundaries.py src/
```
How to Write Architecture Decision Records (ADR)
Architecture decisions must be documented. You need to be able to answer "why did we do it this way?" six months later. Based on the ADR format proposed by Michael Nygard, adjust it to fit your team.
```markdown
# ADR-007: Separating Payment Module into an Independent Service

## Status: Approved (2026-02-15)

## Context
Currently, the payment module is called through a facade inside the monolith.
To reduce PCI-DSS certification scope, payment processing needs to be separated
into a standalone service. Additionally, the payment service needs to scale
independently from the order service (payment traffic increases 10x during
Black Friday).

## Decision
Separate the payment module into an independent gRPC service.
- Communication: gRPC (internal), REST (external PG integration)
- DB: Separate PostgreSQL instance
- Consistency: Saga pattern (orchestration approach)
- Deployment: Independent Kubernetes deployment

## Consequences

### Positive
- PCI-DSS certification scope limited to payment service
- Payment service can scale independently
- No need to redeploy order service when payment logic changes

### Negative
- Additional inter-service communication latency (~5ms)
- Need to implement and operate Saga compensating transactions
- Increased operational complexity: separate monitoring, deployment pipeline

### Quantitative Basis
- Current payment p99 latency: 150ms -> Post-separation estimate: 155ms (acceptable)
- Current PCI certification scope: entire infrastructure -> Post-separation: payment service only
- Infrastructure cost increase: $200/month (separate DB + service instances)

## Alternatives Considered
1. Maintain monolith + certify entire PCI scope: High cost and audit burden
2. Separate payment as serverless (Lambda): p99 latency uncertain due to cold start issues
```
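If ADRs live in the repository, the presence of the required sections can be enforced with a tiny lint step. The sketch below checks for the section headings used in this article's template; the script itself is illustrative, not a standard tool.

```python
# Illustrative ADR lint: verify an ADR file contains the sections from the
# template above. The section list follows this article's template; a real
# team would adjust it to its own ADR format.
REQUIRED_SECTIONS = [
    "## Status",
    "## Context",
    "## Decision",
    "## Consequences",
    "## Alternatives Considered",
]

def missing_sections(adr_text: str) -> list[str]:
    """Return the required section headings absent from the ADR text."""
    return [s for s in REQUIRED_SECTIONS if s not in adr_text]

adr = """# ADR-001: Example
## Status: Approved
## Context
...
## Decision
...
## Consequences
...
## Alternatives Considered
...
"""
print(missing_sections(adr))  # []
```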
Migration Strategy: Incremental Separation
The transition from monolith to microservices should be performed incrementally, not as a big bang. Here is a step-by-step strategy based on the Strangler Fig pattern.
Phase 1: Modular Monolith (1-3 months)
- Establish module boundaries in code
- Introduce facade interfaces
- Build boundary violation detection CI
- Logically separate DB schema per module
Phase 2: First Service Extraction (2-4 months)
- Separate the most independent module as a service
- Decide communication protocol (gRPC/REST/Events)
- Implement Saga or event-based consistency
- A/B routing for parallel operation of new service and existing module
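The "Saga or event-based consistency" item in Phase 2 can be illustrated with a minimal orchestration sketch: run steps in order, and on failure, run the compensations of the already-completed steps in reverse. Step names here are hypothetical; a production saga would also persist state and handle retries.

```python
# Minimal orchestration-style saga sketch. On a failing step, previously
# completed steps are compensated in reverse order. A real implementation
# would persist saga state and retry; this only shows the control flow.
from typing import Callable

class Saga:
    def __init__(self):
        self._steps: list[tuple[str, Callable[[], None], Callable[[], None]]] = []

    def step(self, name: str, action: Callable[[], None],
             compensate: Callable[[], None]) -> "Saga":
        self._steps.append((name, action, compensate))
        return self

    def run(self) -> list[str]:
        """Execute steps; on failure, compensate completed steps. Returns a log."""
        log: list[str] = []
        done: list[tuple[str, Callable[[], None]]] = []
        for name, action, compensate in self._steps:
            try:
                action()
                log.append(f"do:{name}")
                done.append((name, compensate))
            except Exception:
                log.append(f"fail:{name}")
                for cname, comp in reversed(done):
                    comp()
                    log.append(f"undo:{cname}")
                break
        return log

def declined():
    raise RuntimeError("payment declined")

log = (Saga()
       .step("create_order", lambda: None, lambda: None)
       .step("reserve_payment", declined, lambda: None)
       .run())
print(log)  # ['do:create_order', 'fail:reserve_payment', 'undo:create_order']
```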
Phase 3: Stabilization and Expansion (3-6 months)
- Confirm operational stability of first service
- Establish monitoring, alerting, runbooks
- Select next separation target and repeat
Phase 4: Platform Maturity (ongoing)
- Introduce service mesh / API gateway
- Build distributed tracing system
- Service catalog and ownership management
The key gate conditions at each phase are as follows.
"""
Gate check for determining whether migration phase can proceed.
After completing each phase, this check must pass
before proceeding to the next phase.
"""
from dataclasses import dataclass
@dataclass
class PhaseGateCheck:
name: str
passed: bool
detail: str
def check_phase1_gate() -> list[PhaseGateCheck]:
"""Phase 1 completion gate: Has modular monolith conversion been sufficient."""
return [
PhaseGateCheck(
name="module_boundaries_defined",
passed=True, # Actually checks code analysis results
detail="All modules have facade interfaces",
),
PhaseGateCheck(
name="no_boundary_violations",
passed=True, # Confirmed 0 violations in CI
detail="0 boundary violations in last 30 days",
),
PhaseGateCheck(
name="schema_separated",
passed=True, # Confirmed DB schema separation
detail="Each module uses its own schema prefix",
),
PhaseGateCheck(
name="integration_tests_exist",
passed=True,
detail="Module integration tests cover 85%+ of facade methods",
),
]
def check_phase2_gate() -> list[PhaseGateCheck]:
"""Phase 2 completion gate: Is the first service extraction stable."""
return [
PhaseGateCheck(
name="service_p99_latency",
passed=True, # Based on actual metrics
detail="p99 latency < 200ms for 14 consecutive days",
),
PhaseGateCheck(
name="error_rate",
passed=True,
detail="Error rate < 0.1% for 14 consecutive days",
),
PhaseGateCheck(
name="saga_compensation_tested",
passed=True,
detail="Compensation scenarios tested in staging 3+ times",
),
PhaseGateCheck(
name="runbook_documented",
passed=True,
detail="Incident runbook reviewed by on-call team",
),
PhaseGateCheck(
name="rollback_verified",
passed=True,
detail="Rollback to monolith path verified in staging",
),
]
def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
"""Determines whether the gate passes."""
failed = [c for c in checks if not c.passed]
if failed:
details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
return True, "Gate PASSED - all checks passed"
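A quick usage example of the gate evaluation. The two small definitions are repeated here verbatim so the snippet runs standalone; the check values are invented for illustration.

```python
# Standalone usage of the gate evaluation above. PhaseGateCheck and
# evaluate_gate are repeated verbatim so this snippet runs on its own;
# the two example checks are made up.
from dataclasses import dataclass

@dataclass
class PhaseGateCheck:
    name: str
    passed: bool
    detail: str

def evaluate_gate(checks: list[PhaseGateCheck]) -> tuple[bool, str]:
    failed = [c for c in checks if not c.passed]
    if failed:
        details = "; ".join(f"{c.name}: {c.detail}" for c in failed)
        return False, f"Gate BLOCKED - {len(failed)} checks failed: {details}"
    return True, "Gate PASSED - all checks passed"

checks = [
    PhaseGateCheck("error_rate", True, "0.05% over 14 days"),
    PhaseGateCheck("rollback_verified", False, "not yet tested in staging"),
]
ok, message = evaluate_gate(checks)
print(ok)       # False
print(message)  # Gate BLOCKED - 1 checks failed: rollback_verified: not yet tested in staging
```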
Practical Troubleshooting
Started Microservices Without Distributed Tracing
Symptom: A user reports "orders aren't working," but you cannot tell which service has the problem. You have to dig through each service's logs one by one.
Response: Introduce OpenTelemetry. Propagate trace context across each service and visualize the entire request flow in Jaeger or Grafana Tempo. If services are already running, apply it incrementally using the sidecar approach.
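The core mechanism OpenTelemetry relies on is the W3C Trace Context `traceparent` header: every service keeps the incoming trace-id and mints a new span-id before calling downstream. The hand-rolled sketch below shows only that propagation idea; it is not the OpenTelemetry SDK, which handles this (plus export to Jaeger/Tempo) automatically.

```python
# Sketch of W3C Trace Context propagation, the mechanism underneath
# OpenTelemetry's cross-service tracing. Format: version-traceid-spanid-flags
# (00-<32 hex>-<16 hex>-<2 hex>). Hand-rolled for illustration only.
import secrets

def new_traceparent() -> str:
    """Start a trace: version 00, random trace-id and span-id, sampled flag 01."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"

def child_traceparent(parent: str) -> str:
    """Propagate downstream: keep the trace-id, mint a new span-id."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

# Simulate order-service calling payment-service:
incoming = new_traceparent()        # header the order service received
outgoing = child_traceparent(incoming)  # header it sends to payment

# Same trace-id ties both spans into one trace; span-id identifies each hop.
assert incoming.split("-")[1] == outgoing.split("-")[1]
assert incoming.split("-")[2] != outgoing.split("-")[2]
```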
Wrong Service Boundaries Cause Explosion of Inter-Service Calls
Symptom: A single user request internally triggers 30 calls across 15 services. Latency accumulates, and a failure in any one service propagates widely.
Cause: Separated by technology layer (frontend service, data service, logging service) rather than domain boundary, or split too finely.
Response: (1) Analyze inter-service call patterns and merge services with excessive coupling. (2) Re-establish the separation criterion from "technology layer" to "business domain." (3) Aggregate frontend requests using the BFF (Backend For Frontend) pattern.
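The coupling analysis in (1) can be approximated directly from access logs or trace data: count caller->callee edges and flag the chattiest pairs as merge candidates. The service names and threshold below are made up for illustration.

```python
# Count caller->callee edges from (caller, callee) call records to spot
# service pairs with excessive coupling -- candidates for merging or for
# re-drawing boundaries. Names and threshold are made up for illustration.
from collections import Counter

calls = [
    ("order", "payment"), ("order", "inventory"), ("order", "payment"),
    ("payment", "user"), ("order", "payment"), ("inventory", "order"),
]

edge_counts = Counter(calls)
CHATTY_THRESHOLD = 3  # calls per sampled window; tune for your traffic

chatty = [(pair, n) for pair, n in edge_counts.items() if n >= CHATTY_THRESHOLD]
print(chatty)  # [(('order', 'payment'), 3)]
```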
Cannot Break Free from Shared DB
Symptom: Services are separated but share the same DB. Schema changes in one service break the other. It is effectively a "distributed monolith."
Response: (1) First, route other services' table access through DB views to decouple them from the physical schema. (2) Introduce event-based data synchronization to eliminate direct cross-service DB queries. (3) Finally, physically separate DBs per service. Perform this process incrementally, one table at a time.
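Step (2) above can be sketched with an in-process event bus: instead of querying another service's table, a service maintains a local read model updated from published events. The bus here stands in for a real broker such as Kafka or RabbitMQ, and all names are hypothetical.

```python
# Sketch of event-based data synchronization: the order service keeps a
# local replica of user data updated from "user_updated" events, instead of
# querying the user service's table. The in-process bus stands in for a
# real message broker; names are hypothetical.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subs[topic]:
            handler(payload)

bus = EventBus()
order_local_users = {}  # order service's local read model

bus.subscribe("user_updated",
              lambda e: order_local_users.update({e["id"]: e["email"]}))

# The user service publishes changes instead of exposing its table:
bus.publish("user_updated", {"id": "u1", "email": "a@example.com"})
bus.publish("user_updated", {"id": "u1", "email": "b@example.com"})

print(order_local_users)  # {'u1': 'b@example.com'}
```

Once every consumer reads from its own replica, the shared table can be dropped from the other services' scope and the DB physically split.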
References
- Martin Fowler, "MonolithFirst" -- martinfowler.com/bliki/MonolithFirst
- Sam Newman, "Building Microservices", O'Reilly, 2nd Edition, 2021
- Michael Nygard, "Documenting Architecture Decisions" -- cognitect.com/blog/2011/11/15/documenting-architecture-decisions
- Chris Richardson, "Microservices Patterns", Manning, 2018
- Martin Fowler, "StranglerFigApplication" -- martinfowler.com/bliki/StranglerFigApplication
- Google Cloud Architecture Framework -- cloud.google.com/architecture/framework
- OpenTelemetry Documentation -- opentelemetry.io/docs
Quiz
- What is the "Microservice Premium"? Answer: ||The concept that adopting microservices adds the complexity of distributed systems (network communication, service discovery, distributed transactions, monitoring, etc.) as additional cost. Benefits that offset this cost only emerge above a certain scale.||
- What are the three core principles of a Modular Monolith? Answer: ||(1) No direct DB references between modules, (2) inter-module communication through explicit facade interfaces, (3) logical DB schema separation per module. This combines the operational simplicity of a monolith with the module independence of microservices.||
- Why is team size important in architecture decisions? Answer: ||Microservices require a certain level of staffing per service for service ownership, independent deployment, and operational monitoring. If a small team (5 or fewer) splits into many services, one person ends up owning multiple services and the operational burden outweighs the development benefits.||
- What items must an ADR (Architecture Decision Record) include? Answer: ||Context (why the decision is needed), Decision (what was chosen), Consequences (positive/negative impacts), quantitative basis (latency, cost, scope, etc.), and alternatives considered (other options and reasons for rejection).||
- What is the core strategy of the Strangler Fig pattern? Answer: ||Rather than replacing the existing system all at once, gradually build a new system while routing functions from the old system to the new one, one by one. Once all functions have been migrated, remove the old system.||
- How do you automatically detect module boundary violations in CI? Answer: ||Parse each file's Python AST and inspect its imports. If internal packages (service, repository, domain) of another module are imported directly, flag it as a violation, allowing only public facades. Running this script on every PR in CI blocks violations before merge.||
- What are the steps to resolve a "distributed monolith" where services are separated but share a DB? Answer: ||(1) Put other services' table access behind DB views, (2) introduce event-based data synchronization to eliminate direct DB queries, (3) physically separate DBs per service. Proceed incrementally, one table at a time.||
- Why is "rollback path verification" needed at the Phase 2 gate? Answer: ||When problems occur with a newly separated service, you must be able to immediately revert to the existing monolith path. If this rollback path is not verified in staging beforehand, recovery time during incidents grows.||