Mastering Software Architecture & Design Patterns: From SOLID to Clean Architecture and AI System Design

Introduction

Software architecture is the skeleton of a system. Well-designed architecture is flexible to change, easy to test, and produces code that the entire team can understand. In the AI era, new design challenges have emerged: LLM services, RAG pipelines, and Agent systems. This guide walks through everything from classical design principles to modern AI system architecture.


1. SOLID Principles

SOLID is a set of five object-oriented design principles compiled by Robert C. Martin. They form the foundation for building maintainable and extensible software.

1.1 Single Responsibility Principle (SRP)

A class should have one, and only one, reason to change.

# BAD: multiple responsibilities in one class
class UserManager:
    def create_user(self, data): ...
    def send_welcome_email(self, user): ...
    def save_to_database(self, user): ...

# GOOD: each responsibility in its own class
class UserRepository:
    def save(self, user): ...

class EmailService:
    def send_welcome(self, user): ...

class UserFactory:
    def create(self, data): ...
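Once the responsibilities are split, a thin coordinator composes them without owning any of them. The sketch below is illustrative only: RegistrationService, InMemoryUserRepository, FakeEmailService, and the dict-based user are hypothetical stand-ins for the three classes above.

```python
from dataclasses import dataclass, field


class UserFactory:
    # Only reason to change: how a user record is built
    def create(self, data: dict) -> dict:
        return {"name": data["name"], "email": data["email"].lower()}


@dataclass
class InMemoryUserRepository:
    # Only reason to change: how users are stored
    users: list = field(default_factory=list)

    def save(self, user: dict) -> None:
        self.users.append(user)


@dataclass
class FakeEmailService:
    # Only reason to change: how emails are delivered (here: recorded, not sent)
    sent: list = field(default_factory=list)

    def send_welcome(self, user: dict) -> None:
        self.sent.append(user["email"])


class RegistrationService:
    # Hypothetical coordinator: orchestrates the steps but implements none of them
    def __init__(self, factory, repo, email):
        self.factory = factory
        self.repo = repo
        self.email = email

    def register(self, data: dict) -> dict:
        user = self.factory.create(data)
        self.repo.save(user)
        self.email.send_welcome(user)
        return user


repo, email = InMemoryUserRepository(), FakeEmailService()
service = RegistrationService(UserFactory(), repo, email)
service.register({"name": "Ada", "email": "Ada@Example.com"})
print(repo.users)  # [{'name': 'Ada', 'email': 'ada@example.com'}]
print(email.sent)  # ['ada@example.com']
```

Changing the storage or the email transport now touches exactly one class.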

1.2 Open/Closed Principle (OCP)

Software entities should be open for extension but closed for modification.

from abc import ABC, abstractmethod

class Discount(ABC):
    @abstractmethod
    def apply(self, price: float) -> float: ...

class NoDiscount(Discount):
    def apply(self, price: float) -> float:
        return price

class PercentDiscount(Discount):
    def __init__(self, percent: float):
        self.percent = percent
    def apply(self, price: float) -> float:
        return price * (1 - self.percent / 100)

class VIPDiscount(Discount):
    def apply(self, price: float) -> float:
        return price * 0.7

# New discount policies can be added without modifying existing code
class Order:
    def __init__(self, discount: Discount):
        self.discount = discount

    def final_price(self, base: float) -> float:
        return self.discount.apply(base)
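To see "closed for modification" in action, here is a sketch that adds a new policy purely by extension. FixedAmountDiscount is a hypothetical example; Discount and Order are repeated from above so the block runs on its own.

```python
from abc import ABC, abstractmethod


class Discount(ABC):
    @abstractmethod
    def apply(self, price: float) -> float: ...


class Order:
    # Unchanged: Order only knows the Discount abstraction
    def __init__(self, discount: Discount):
        self.discount = discount

    def final_price(self, base: float) -> float:
        return self.discount.apply(base)


class FixedAmountDiscount(Discount):
    # New policy added as a new class; no existing class was edited
    def __init__(self, amount: float):
        self.amount = amount

    def apply(self, price: float) -> float:
        return max(price - self.amount, 0.0)  # never below zero


print(Order(FixedAmountDiscount(15)).final_price(100.0))   # 85.0
print(Order(FixedAmountDiscount(150)).final_price(100.0))  # 0.0
```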

1.3 Liskov Substitution Principle (LSP)

Subtypes must be substitutable for their base types without altering the correctness of the program.

class Bird:
    def fly(self) -> str:
        return "Flying"

# LSP violation: Penguin inherits Bird but cannot fly
class Penguin(Bird):
    def fly(self):
        raise NotImplementedError("Penguins cannot fly")

# GOOD: separate interfaces based on capabilities
class FlyingBird(ABC):
    @abstractmethod
    def fly(self) -> str: ...

class SwimmingBird(ABC):
    @abstractmethod
    def swim(self) -> str: ...

class Eagle(FlyingBird):
    def fly(self) -> str:
        return "Eagle soars through the sky"

class Penguin(SwimmingBird):
    def swim(self) -> str:
        return "Penguin swims gracefully"

1.4 Interface Segregation Principle (ISP)

Clients should not be forced to depend on interfaces they do not use.

# BAD: one bloated interface
class Machine(ABC):
    @abstractmethod
    def print(self): ...
    @abstractmethod
    def scan(self): ...
    @abstractmethod
    def fax(self): ...

# GOOD: split into smaller, focused interfaces
class Printable(ABC):
    @abstractmethod
    def print(self): ...

class Scannable(ABC):
    @abstractmethod
    def scan(self): ...

class MultiFunctionPrinter(Printable, Scannable):
    def print(self): print("Printing...")
    def scan(self): print("Scanning...")

class SimplePrinter(Printable):
    def print(self): print("Simple print...")
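In Python, typing.Protocol offers another way to express small interfaces: a class satisfies the protocol structurally, without inheriting from it. This is shown as an alternative to the ABC approach above, not a replacement; LegacyPrinter, Scanner, and print_report are hypothetical names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Printable(Protocol):
    def print(self) -> str: ...


class LegacyPrinter:
    # Does not inherit from Printable, but has a matching print() method
    def print(self) -> str:
        return "Legacy print..."


class Scanner:
    def scan(self) -> str:
        return "Scanning..."


def print_report(device: Printable) -> str:
    # The client depends only on the one method it actually uses
    return device.print()


print(print_report(LegacyPrinter()))           # Legacy print...
print(isinstance(LegacyPrinter(), Printable))  # True
print(isinstance(Scanner(), Printable))        # False
```

Note that runtime_checkable isinstance checks only confirm the method exists, not its signature; a static type checker does the fuller check.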

1.5 Dependency Inversion Principle (DIP)

High-level modules should not depend on low-level modules. Both should depend on abstractions.

# BAD: high-level depends directly on low-level
class MySQLDatabase:
    def query(self, sql: str): ...

class UserService:
    def __init__(self):
        self.db = MySQLDatabase()  # depends on a concrete class

# GOOD: depend on abstractions (dependency injection)
class DatabasePort(ABC):
    @abstractmethod
    def find_user(self, user_id: str) -> dict: ...

class MySQLAdapter(DatabasePort):
    def find_user(self, user_id: str) -> dict:
        # MySQL implementation
        return {}

class MongoAdapter(DatabasePort):
    def find_user(self, user_id: str) -> dict:
        # MongoDB implementation
        return {}

class UserService:
    def __init__(self, db: DatabasePort):
        self.db = db  # depends on the abstraction

# Usage
service = UserService(db=MySQLAdapter())
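Because UserService depends only on DatabasePort, a test can inject an in-memory adapter instead of a real database. A minimal sketch; InMemoryAdapter and get_display_name are hypothetical additions for illustration.

```python
from abc import ABC, abstractmethod


class DatabasePort(ABC):
    @abstractmethod
    def find_user(self, user_id: str) -> dict: ...


class InMemoryAdapter(DatabasePort):
    # Test double: satisfies the port with a plain dict
    def __init__(self, users: dict[str, dict]):
        self.users = users

    def find_user(self, user_id: str) -> dict:
        return self.users.get(user_id, {})


class UserService:
    def __init__(self, db: DatabasePort):
        self.db = db

    def get_display_name(self, user_id: str) -> str:
        # Business logic: unaware of which adapter is plugged in
        user = self.db.find_user(user_id)
        return user.get("name", "Unknown user")


service = UserService(db=InMemoryAdapter({"u1": {"name": "Ada"}}))
print(service.get_display_name("u1"))  # Ada
print(service.get_display_name("u2"))  # Unknown user
```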

2. GoF Design Patterns

2.1 Factory Pattern

Encapsulates object creation logic.

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"OpenAI response: {prompt}"

class AnthropicProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"Anthropic response: {prompt}"

class LLMFactory:
    _registry = {
        "openai": OpenAIProvider,
        "anthropic": AnthropicProvider,
    }

    @classmethod
    def create(cls, provider: str) -> LLMProvider:
        klass = cls._registry.get(provider)
        if not klass:
            raise ValueError(f"Unknown provider: {provider}")
        return klass()
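If the factory itself should stay closed for modification, providers can register themselves with a class decorator instead of being listed in _registry by hand. This is a sketch: the register helper and EchoProvider are hypothetical, and the provider returns a string rather than calling a real API.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class LLMFactory:
    _registry: dict[str, type[LLMProvider]] = {}

    @classmethod
    def register(cls, name: str):
        # Class decorator that adds a provider to the registry under `name`
        def decorator(klass: type[LLMProvider]) -> type[LLMProvider]:
            cls._registry[name] = klass
            return klass
        return decorator

    @classmethod
    def create(cls, provider: str) -> LLMProvider:
        klass = cls._registry.get(provider)
        if klass is None:
            raise ValueError(f"Unknown provider: {provider}")
        return klass()


@LLMFactory.register("echo")
class EchoProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


print(LLMFactory.create("echo").complete("hi"))  # echo: hi
```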

2.2 Singleton Pattern

Ensures a class has only one instance and provides a global access point to it.

import threading

class ConfigManager:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                    cls._instance._config = {}
        return cls._instance

    def set(self, key: str, value):
        self._config[key] = value

    def get(self, key: str):
        return self._config.get(key)

2.3 Observer Pattern

Automatically notifies multiple subscribers when an object's state changes.

class EventBus:
    def __init__(self):
        self._subscribers: dict[str, list] = {}

    def subscribe(self, event: str, handler):
        self._subscribers.setdefault(event, []).append(handler)

    def publish(self, event: str, data=None):
        for handler in self._subscribers.get(event, []):
            handler(data)

# Usage
bus = EventBus()
bus.subscribe("user.created", lambda d: print(f"Send welcome email: {d}"))
bus.subscribe("user.created", lambda d: print(f"Log analytics event: {d}"))
bus.publish("user.created", {"id": "u1", "email": "user@example.com"})
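The EventBus above routes events by topic name, which puts it close to the Pub/Sub style discussed in quiz Q2. For comparison, here is a minimal sketch of the classic GoF form, where the subject keeps direct references to its observers and notifies them when its own state changes; StockTicker and PriceLog are hypothetical names.

```python
class Subject:
    def __init__(self):
        self._observers = []  # direct references, no broker in between

    def attach(self, observer) -> None:
        self._observers.append(observer)

    def detach(self, observer) -> None:
        self._observers.remove(observer)

    def notify(self) -> None:
        for observer in self._observers:
            observer.update(self)


class StockTicker(Subject):
    def __init__(self):
        super().__init__()
        self._price = 0.0

    @property
    def price(self) -> float:
        return self._price

    @price.setter
    def price(self, value: float) -> None:
        self._price = value
        self.notify()  # a state change triggers notification


class PriceLog:
    def __init__(self):
        self.seen = []

    def update(self, subject: StockTicker) -> None:
        self.seen.append(subject.price)  # observer pulls state from the subject


ticker, log = StockTicker(), PriceLog()
ticker.attach(log)
ticker.price = 101.5
ticker.price = 99.0
print(log.seen)  # [101.5, 99.0]
```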

2.4 Strategy Pattern

Encapsulates algorithms and makes them interchangeable at runtime.

class SortStrategy(ABC):
    @abstractmethod
    def sort(self, data: list) -> list: ...

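class QuickSort(SortStrategy):
    def sort(self, data: list) -> list:
        # Pivot on the first element, partition the rest, recurse on both sides
        if len(data) <= 1:
            return list(data)
        pivot, *rest = data
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return self.sort(smaller) + [pivot] + self.sort(larger)

class MergeSort(SortStrategy):
    def sort(self, data: list) -> list:
        # Split in half, sort each half, then merge the two sorted halves
        if len(data) <= 1:
            return list(data)
        mid = len(data) // 2
        left, right = self.sort(data[:mid]), self.sort(data[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]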

class DataProcessor:
    def __init__(self, strategy: SortStrategy):
        self._strategy = strategy

    def set_strategy(self, strategy: SortStrategy):
        self._strategy = strategy

    def process(self, data: list) -> list:
        return self._strategy.sort(data)
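Because Python functions are first-class objects, a strategy does not always need a class hierarchy: any callable with the right signature can be swapped in. A sketch of that lighter variant; ascending, by_length, and this callable-based DataProcessor are illustrative, and the class-based version above remains the better fit when strategies carry state or configuration.

```python
from typing import Callable

SortFn = Callable[[list], list]


def ascending(data: list) -> list:
    return sorted(data)


def by_length(data: list) -> list:
    # Stable sort on len(); ties keep their original order
    return sorted(data, key=len)


class DataProcessor:
    def __init__(self, strategy: SortFn):
        self._strategy = strategy

    def set_strategy(self, strategy: SortFn) -> None:
        self._strategy = strategy

    def process(self, data: list) -> list:
        return self._strategy(data)


words = ["ccc", "a", "bb", "d"]
processor = DataProcessor(ascending)
print(processor.process(words))  # ['a', 'bb', 'ccc', 'd']
processor.set_strategy(by_length)
print(processor.process(words))  # ['a', 'd', 'bb', 'ccc']
```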

2.5 Decorator Pattern

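Adds new responsibilities to objects dynamically. Python's decorator syntax (@name) applies the same idea to functions: each wrapper below adds behavior (retries, timing) without modifying the function it wraps.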

import time
import functools

def retry(max_attempts: int = 3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: re-raise the last error
                    time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        return wrapper
    return decorator

def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to measuring durations
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper

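# Decorators apply bottom-up: timed wraps call_llm_api, then retry wraps the
# timed version, so each attempt is timed separately
@retry(max_attempts=3)
@timed
def call_llm_api(prompt: str) -> str:
    return "response"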
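The GoF Decorator pattern itself wraps objects that share an interface, so wrappers can be stacked at runtime. A minimal sketch, assuming the complete(prompt) interface from the Factory example; FakeProvider, ProviderDecorator, CachingProvider, and LoggingProvider are hypothetical names.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class FakeProvider(LLMProvider):
    def __init__(self):
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"response to {prompt}"


class ProviderDecorator(LLMProvider):
    # Holds a wrapped provider and exposes the same interface
    def __init__(self, inner: LLMProvider):
        self.inner = inner

    def complete(self, prompt: str) -> str:
        return self.inner.complete(prompt)


class CachingProvider(ProviderDecorator):
    def __init__(self, inner: LLMProvider):
        super().__init__(inner)
        self._cache: dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        if prompt not in self._cache:
            self._cache[prompt] = super().complete(prompt)
        return self._cache[prompt]


class LoggingProvider(ProviderDecorator):
    def __init__(self, inner: LLMProvider):
        super().__init__(inner)
        self.log: list[str] = []

    def complete(self, prompt: str) -> str:
        self.log.append(prompt)
        return super().complete(prompt)


base = FakeProvider()
provider = LoggingProvider(CachingProvider(base))  # wrappers stack at runtime
provider.complete("hello")
provider.complete("hello")
print(provider.log, base.calls)  # ['hello', 'hello'] 1
```

Both calls are logged, but only the first reaches the underlying provider because the cache answers the second.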

2.6 Command Pattern

Encapsulates requests as objects, enabling undo/redo and queuing.

class Command(ABC):
    @abstractmethod
    def execute(self): ...
    @abstractmethod
    def undo(self): ...

class CreatePostCommand(Command):
    def __init__(self, repo, post_data: dict):
        self.repo = repo
        self.post_data = post_data
        self.created_id = None

    def execute(self):
        self.created_id = self.repo.create(self.post_data)

    def undo(self):
        # Compare against None so a valid id of 0 can still be undone
        if self.created_id is not None:
            self.repo.delete(self.created_id)

class CommandHistory:
    def __init__(self):
        self._history: list[Command] = []

    def execute(self, cmd: Command):
        cmd.execute()
        self._history.append(cmd)

    def undo_last(self):
        if self._history:
            self._history.pop().undo()
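A short usage sketch of undo, assuming a hypothetical in-memory PostRepository with the create/delete methods the command expects; Command, CreatePostCommand, and CommandHistory are repeated so the block runs on its own.

```python
import itertools
from abc import ABC, abstractmethod


class Command(ABC):
    @abstractmethod
    def execute(self): ...
    @abstractmethod
    def undo(self): ...


class PostRepository:
    # Hypothetical in-memory repository with auto-incrementing ids
    def __init__(self):
        self.posts: dict[int, dict] = {}
        self._ids = itertools.count(1)

    def create(self, data: dict) -> int:
        post_id = next(self._ids)
        self.posts[post_id] = data
        return post_id

    def delete(self, post_id: int) -> None:
        del self.posts[post_id]


class CreatePostCommand(Command):
    def __init__(self, repo, post_data: dict):
        self.repo = repo
        self.post_data = post_data
        self.created_id = None

    def execute(self):
        self.created_id = self.repo.create(self.post_data)

    def undo(self):
        if self.created_id is not None:
            self.repo.delete(self.created_id)


class CommandHistory:
    def __init__(self):
        self._history: list[Command] = []

    def execute(self, cmd: Command):
        cmd.execute()
        self._history.append(cmd)

    def undo_last(self):
        if self._history:
            self._history.pop().undo()


repo, history = PostRepository(), CommandHistory()
history.execute(CreatePostCommand(repo, {"title": "First"}))
history.execute(CreatePostCommand(repo, {"title": "Second"}))
history.undo_last()  # removes "Second"
print(repo.posts)  # {1: {'title': 'First'}}
```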

3. Architecture Patterns

3.1 Clean Architecture

Dependencies always point inward (toward the domain).

[Diagram: concentric layers, from outermost to innermost: Outer Layer → Interface Adapters → Use Cases → Domain Entities]

# Domain Entity (innermost layer)
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Article:
    id: str
    title: str
    content: str
    author_id: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    published: bool = False

    def publish(self):
        # Domain rule: an article needs a title and content before publishing
        if not self.title or not self.content:
            raise ValueError("Title and content are required")
        self.published = True

# Use Case (orchestrates domain logic)
class CreateArticleUseCase:
    def __init__(self, repo: "ArticleRepository", event_bus: EventBus):
        self.repo = repo
        self.event_bus = event_bus

    def execute(self, title: str, content: str, author_id: str) -> Article:
        article = Article(
            id=str(uuid.uuid4()),
            title=title,
            content=content,
            author_id=author_id,
        )
        article.publish()
        self.repo.save(article)
        self.event_bus.publish("article.created", {"id": article.id})
        return article

# Interface Adapter (translates external requests into use-case calls)
class ArticleController:
    def __init__(self, use_case: CreateArticleUseCase):
        self.use_case = use_case

    def handle_create(self, request: dict) -> dict:
        article = self.use_case.execute(
            title=request["title"],
            content=request["content"],
            author_id=request["author_id"],
        )
        return {"id": article.id, "title": article.title}

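3.2 Hexagonal Architecture (Ports and Adapters)

The application core defines ports (interfaces) for what it needs from the outside world, and adapters implement those ports for specific technologies, so the core never depends on a particular database or messaging SDK.
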
# Port (interface definition)
class ArticleRepository(ABC):
    @abstractmethod
    def save(self, article: Article): ...
    @abstractmethod
    def find_by_id(self, id: str) -> Article: ...

class NotificationPort(ABC):
    @abstractmethod
    def notify(self, message: str): ...

# Adapter (connects to external systems)
class SQLiteArticleRepository(ArticleRepository):
    def save(self, article: Article):
        pass  # SQLite save logic
    def find_by_id(self, id: str) -> Article:
        pass  # SQLite query logic

class SlackNotificationAdapter(NotificationPort):
    def notify(self, message: str):
        pass  # Slack API call
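Swapping adapters is the point of this layout. As a sketch, an in-memory adapter can implement the same ArticleRepository port for tests. The trimmed Article dataclass, InMemoryArticleRepository, and rename_article are simplified, hypothetical stand-ins, and returning None for a missing id is an assumption the port above leaves unspecified.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    id: str
    title: str


class ArticleRepository(ABC):
    @abstractmethod
    def save(self, article: Article) -> None: ...
    @abstractmethod
    def find_by_id(self, id: str) -> Optional[Article]: ...


class InMemoryArticleRepository(ArticleRepository):
    # Adapter for tests: same port, no database
    def __init__(self):
        self._rows: dict[str, Article] = {}

    def save(self, article: Article) -> None:
        self._rows[article.id] = article

    def find_by_id(self, id: str) -> Optional[Article]:
        return self._rows.get(id)


def rename_article(repo: ArticleRepository, article_id: str, title: str) -> Article:
    # Core logic talks only to the port, never to a concrete adapter
    article = repo.find_by_id(article_id)
    if article is None:
        raise KeyError(article_id)
    article.title = title
    repo.save(article)
    return article


repo = InMemoryArticleRepository()
repo.save(Article(id="a1", title="Draft"))
print(rename_article(repo, "a1", "Final").title)  # Final
```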

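3.3 CQRS and Event Sourcing

CQRS (Command Query Responsibility Segregation) separates the write model, which handles commands that change state, from the read model, which serves queries. Event Sourcing stores every state change as an immutable event and derives the current state by replaying those events.
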
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Command Side
@dataclass
class CreateOrderCommand:
    order_id: str
    user_id: str
    items: list[dict]

# Query Side (separate read model)
@dataclass
class OrderSummaryView:
    order_id: str
    total_price: float
    item_count: int

# Event Sourcing
@dataclass
class OrderCreatedEvent:
    order_id: str
    user_id: str
    items: list[dict]
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class OrderAggregate:
    def __init__(self):
        self.events: list = []
        self.state = {}

    def create(self, cmd: CreateOrderCommand):
        event = OrderCreatedEvent(
            order_id=cmd.order_id,
            user_id=cmd.user_id,
            items=cmd.items,
        )
        self._apply(event)
        self.events.append(event)

    def _apply(self, event: OrderCreatedEvent):
        self.state["id"] = event.order_id
        self.state["items"] = event.items
        self.state["total"] = sum(
            i.get("price", 0) * i.get("qty", 1) for i in event.items
        )
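Two pieces the example leaves implicit are replaying stored events to rebuild state and projecting events into the query-side read model. A minimal sketch of both, using simplified copies of OrderCreatedEvent and OrderSummaryView (timestamp omitted); rehydrate and project_summary are hypothetical names, and counting item_count as total units (sum of qty) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OrderCreatedEvent:
    order_id: str
    user_id: str
    items: list[dict]


@dataclass
class OrderSummaryView:
    order_id: str
    total_price: float
    item_count: int


def rehydrate(events: list) -> dict:
    # Event sourcing: current state is a fold over the stored event history
    state: dict = {}
    for event in events:
        if isinstance(event, OrderCreatedEvent):
            state["id"] = event.order_id
            state["items"] = event.items
            state["total"] = sum(i.get("price", 0) * i.get("qty", 1) for i in event.items)
    return state


def project_summary(event: OrderCreatedEvent) -> OrderSummaryView:
    # Query side: a denormalized view built from the same event
    return OrderSummaryView(
        order_id=event.order_id,
        total_price=sum(i.get("price", 0) * i.get("qty", 1) for i in event.items),
        item_count=sum(i.get("qty", 1) for i in event.items),  # units, not line items
    )


history = [OrderCreatedEvent("o1", "u1", [{"price": 10.0, "qty": 2}, {"price": 5.0}])]
print(rehydrate(history)["total"])  # 25.0
print(project_summary(history[0]))
# OrderSummaryView(order_id='o1', total_price=25.0, item_count=3)
```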

4. Microservices Patterns

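4.1 API Gateway Pattern

An API Gateway is the single entry point for clients: it routes each request to the right backend service and is a natural place for cross-cutting concerns such as authentication, rate limiting, and error translation.
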
from fastapi import FastAPI, HTTPException
import httpx

app = FastAPI(title="API Gateway")

SERVICE_MAP = {
    "users": "http://user-service:8001",
    "orders": "http://order-service:8002",
    "products": "http://product-service:8003",
}

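@app.get("/api/users/{user_id}")
async def proxy_user(user_id: str):
    # One client per request keeps the example short; a long-lived shared
    # AsyncClient would reuse connections across requests
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.get(f"{SERVICE_MAP['users']}/users/{user_id}")
    if resp.status_code == 404:
        raise HTTPException(status_code=404, detail="User not found")
    if resp.is_error:
        # Any other upstream failure surfaces as 502 Bad Gateway
        raise HTTPException(status_code=502, detail="User service error")
    return resp.json()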
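The route above hard-codes the users service. A gateway more commonly resolves the upstream service from the path prefix; here is a framework-free sketch of that lookup, assuming the same /api/&lt;service&gt;/... to &lt;service-url&gt;/&lt;service&gt;/... convention that proxy_user follows. resolve_upstream is a hypothetical helper.

```python
from typing import Optional

SERVICE_MAP = {
    "users": "http://user-service:8001",
    "orders": "http://order-service:8002",
    "products": "http://product-service:8003",
}


def resolve_upstream(path: str) -> Optional[str]:
    # "/api/orders/42" -> "http://order-service:8002/orders/42"
    parts = path.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] != "api":
        return None
    service = parts[1]
    base = SERVICE_MAP.get(service)
    if base is None:
        return None  # unknown service: the gateway would answer 404
    rest = parts[2] if len(parts) == 3 else ""
    return f"{base}/{service}/{rest}".rstrip("/")


print(resolve_upstream("/api/orders/42"))  # http://order-service:8002/orders/42
print(resolve_upstream("/api/products"))   # http://product-service:8003/products
print(resolve_upstream("/api/billing/1"))  # None
```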

4.2 Saga Pattern (Distributed Transactions)

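The Saga pattern maintains data consistency across microservices through a sequence of local transactions; if a step fails, compensating transactions undo the steps that already completed. The result is eventual consistency rather than the atomicity of a single database transaction.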

class OrderSaga:
    def __init__(self, order_service, payment_service, inventory_service):
        self.order_svc = order_service
        self.payment_svc = payment_service
        self.inventory_svc = inventory_service

    async def execute(self, order_data: dict):
        order_id = None
        payment_id = None
        try:
            # Step 1: Create order
            order_id = await self.order_svc.create(order_data)
            # Step 2: Process payment
            payment_id = await self.payment_svc.charge(order_data["amount"])
            # Step 3: Reserve inventory
            await self.inventory_svc.reserve(order_data["items"])
            return {"status": "success", "order_id": order_id}
        except Exception:
            # Compensating transactions, run in reverse order of the completed steps
            if payment_id:
                await self.payment_svc.refund(payment_id)
            if order_id:
                await self.order_svc.cancel(order_id)
            raise
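The OrderSaga above is an orchestration-style saga with its steps hard-coded. A generic orchestrator can instead hold (action, compensation) pairs and undo only the completed steps, newest first, when one fails. The sketch below uses asyncio with stub steps; SagaOrchestrator and the step helper are hypothetical, and a production version would also need to persist progress and handle compensations that themselves fail.

```python
import asyncio
from typing import Awaitable, Callable

AsyncFn = Callable[[], Awaitable[None]]
Step = tuple[AsyncFn, AsyncFn]  # (action, compensation)


class SagaOrchestrator:
    def __init__(self, steps: list[Step]):
        self.steps = steps

    async def run(self) -> None:
        completed: list[Step] = []
        try:
            for action, compensation in self.steps:
                await action()
                completed.append((action, compensation))
        except Exception:
            # Undo only the steps that succeeded, newest first
            for _, compensation in reversed(completed):
                await compensation()
            raise


log: list[str] = []


def step(name: str, fail: bool = False) -> AsyncFn:
    async def _run() -> None:
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    return _run


saga = SagaOrchestrator([
    (step("create_order"), step("cancel_order")),
    (step("charge_payment"), step("refund_payment")),
    (step("reserve_inventory", fail=True), step("release_inventory")),
])

try:
    asyncio.run(saga.run())
except RuntimeError:
    pass
print(log)  # ['create_order', 'charge_payment', 'refund_payment', 'cancel_order']
```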

5. Clean Code Principles

5.1 Meaningful Naming

# BAD
def calc(d, r):
    return d * (1 - r / 100)

# GOOD
def calculate_discounted_price(original_price: float, discount_rate_percent: float) -> float:
    return original_price * (1 - discount_rate_percent / 100)

5.2 Function Design — Small and Single Purpose

# BAD: too many responsibilities
def process_user_registration(email, password, name, send_email=True):
    if "@" not in email:
        raise ValueError("Invalid email format")
    hashed = hash(password)
    user = {"email": email, "password": hashed, "name": name}
    db.save(user)
    if send_email:
        mailer.send(email, "Welcome!")
    return user

# GOOD: separated functions
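import hashlib
import os

def validate_email(email: str) -> None:
    if "@" not in email:
        raise ValueError("Invalid email format")

def hash_password(raw: str) -> str:
    # Salted, deliberately slow key derivation; a bare SHA-256 digest is too
    # fast to resist offline brute-force attacks on leaked hashes
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", raw.encode(), salt, 600_000)
    return f"{salt.hex()}${digest.hex()}"

def register_user(email: str, password: str, name: str) -> dict:
    # Persistence and the welcome email are now the caller's job
    # (e.g. a repository and an email service), not this function's
    validate_email(email)
    return {"email": email, "password": hash_password(password), "name": name}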

5.3 Code Smells and Refactoring

Common code smells and their solutions:

  • Long Method: Extract into smaller functions (Extract Method)
  • Large Class: Split by responsibility (Extract Class)
  • Feature Envy: Move method closer to the data it uses (Move Method)
  • Magic Numbers: Replace with named constants (Replace Magic Number with Symbolic Constant)
  • Duplicate Code: Extract to a shared function (Extract Function)
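As a small example of Replace Magic Number with Symbolic Constant: the VIPDiscount class in section 1.2 multiplies by a bare 0.7. A sketch of the before and after; the function names and VIP_PRICE_MULTIPLIER are hypothetical.

```python
# Before: the reader has to guess what 0.7 means
def vip_price_before(price: float) -> float:
    return price * 0.7


# After: the number has a name and a single place to change it
VIP_PRICE_MULTIPLIER = 0.7  # VIP customers pay 70% of the base price


def vip_price(price: float) -> float:
    return price * VIP_PRICE_MULTIPLIER
```

Behavior is unchanged; only the intent became explicit.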

6. AI System Architecture

6.1 RAG (Retrieval-Augmented Generation) Architecture

from dataclasses import dataclass

@dataclass
class RAGConfig:
    embedding_model: str = "text-embedding-3-small"
    llm_model: str = "gpt-4o"
    top_k: int = 5
    chunk_size: int = 512

class RAGPipeline:
    def __init__(self, config: RAGConfig, vector_store, llm_client):
        self.config = config
        self.vector_store = vector_store
        self.llm = llm_client

    def retrieve(self, query: str) -> list[str]:
        # 1. Embed the query
        query_vector = self.llm.embed(query)
        # 2. Search for similar documents
        docs = self.vector_store.search(query_vector, top_k=self.config.top_k)
        return [d["content"] for d in docs]

    def generate(self, query: str, context: list[str]) -> str:
        context_text = "\n\n".join(context)
        prompt = f"""Answer the question using the following context.

Context:
{context_text}

Question: {query}
Answer:"""
        return self.llm.complete(prompt)

    def query(self, user_question: str) -> str:
        context = self.retrieve(user_question)
        return self.generate(user_question, context)
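RAGConfig.chunk_size is declared but never used, and vector_store is described only by its search(query_vector, top_k) call. A self-contained sketch of both pieces: a fixed-size chunker with overlap and an in-memory store ranked by cosine similarity. Chunking by characters rather than tokens, the overlap parameter, and InMemoryVectorStore are all assumptions, and the toy 2-D vectors stand in for real embeddings.

```python
import math


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    # Fixed-size character windows; consecutive chunks share `overlap` characters
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    def __init__(self):
        self._rows: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], content: str) -> None:
        self._rows.append((vector, content))

    def search(self, query_vector: list[float], top_k: int = 5) -> list[dict]:
        # Brute-force scan; real vector stores use approximate nearest-neighbor indexes
        scored = [
            {"content": content, "score": cosine(query_vector, vec)}
            for vec, content in self._rows
        ]
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add([1.0, 0.0], "Clean Architecture keeps dependencies pointing inward.")
store.add([0.0, 1.0], "Sagas undo completed steps with compensating transactions.")
print(store.search([0.9, 0.1], top_k=1)[0]["content"])
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The returned dicts carry a "content" key, which is what RAGPipeline.retrieve reads.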

6.2 Agent System Design

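from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]  # takes a text input, returns a text observation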

class ReActAgent:
    """AI Agent using the Reasoning + Acting pattern"""

    def __init__(self, llm, tools: list[Tool]):
        self.llm = llm
        self.tools = {t.name: t for t in tools}

    def _build_system_prompt(self) -> str:
        tool_desc = "\n".join(
            f"- {t.name}: {t.description}" for t in self.tools.values()
        )
        return f"""You are an AI Agent that solves tasks using tools.
Available tools:
{tool_desc}

Format:
Thought: [analyze the current situation]
Action: [tool name to use]
Action Input: [input for the tool]
Observation: [result of the tool execution]
... (repeat)
Final Answer: [final answer]"""

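    def _parse_field(self, response: str, name: str) -> str:
        # Text after "Name:" on the first line starting with it, or "" if absent
        for line in response.splitlines():
            if line.startswith(f"{name}:"):
                return line[len(name) + 1:].strip()
        return ""

    def run(self, task: str, max_steps: int = 10) -> str:
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": task},
        ]
        for _ in range(max_steps):
            response = self.llm.chat(messages)
            if "Final Answer:" in response:
                return response.split("Final Answer:")[-1].strip()
            messages.append({"role": "assistant", "content": response})
            tool_name = self._parse_field(response, "Action")
            tool = self.tools.get(tool_name)
            if tool is None:
                # Report the problem back so the model can correct itself
                observation = f"Unknown or missing tool: {tool_name!r}"
            else:
                observation = tool.func(self._parse_field(response, "Action Input"))
            messages.append({"role": "user", "content": f"Observation: {observation}"})
        return "Max steps exceeded"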
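Tool execution can also be routed through the Command pattern from section 2.6, so every tool call becomes an object that can be logged, queued, or replayed, and new tools can be added without touching the executor. A sketch under stated assumptions: ToolCall, ToolExecutor, and the word_count tool are hypothetical, and tools take and return plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    # Command object: which tool to run, with what input, and its result
    tool_name: str
    tool_input: str
    result: Optional[str] = None


@dataclass
class ToolExecutor:
    tools: dict[str, Callable[[str], str]]
    history: list[ToolCall] = field(default_factory=list)

    def execute(self, call: ToolCall) -> str:
        tool = self.tools.get(call.tool_name)
        call.result = tool(call.tool_input) if tool else f"Unknown tool: {call.tool_name}"
        self.history.append(call)  # audit log and replay source
        return call.result


def word_count(text: str) -> str:
    return str(len(text.split()))


executor = ToolExecutor(tools={"word_count": word_count, "upper": str.upper})
print(executor.execute(ToolCall("word_count", "ports and adapters")))  # 3
print(executor.execute(ToolCall("upper", "saga")))                     # SAGA
print([c.tool_name for c in executor.history])  # ['word_count', 'upper']
```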

7. Testing Strategy

7.1 The Test Pyramid

       [E2E Tests]           <- slow and costly, keep few
     [Integration Tests]     <- verify service contracts
  [Unit Tests]               <- fast and cheap, keep many

7.2 TDD Example (Red-Green-Refactor)

import pytest

# 1. RED: write a failing test first
def test_calculate_discounted_price_basic():
    assert calculate_discounted_price(100.0, 20.0) == pytest.approx(80.0)  # float-safe comparison

def test_calculate_discounted_price_zero_discount():
    assert calculate_discounted_price(100.0, 0.0) == 100.0

def test_calculate_discounted_price_full_discount():
    assert calculate_discounted_price(100.0, 100.0) == 0.0

def test_calculate_discounted_price_invalid_rate():
    with pytest.raises(ValueError):
        calculate_discounted_price(100.0, -10.0)

# 2. GREEN: minimal implementation to pass the tests
def calculate_discounted_price(price: float, discount_rate: float) -> float:
    if discount_rate < 0 or discount_rate > 100:
        raise ValueError("Discount rate must be between 0 and 100")
    return price * (1 - discount_rate / 100)

# 3. REFACTOR: improve quality while keeping tests green
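What the REFACTOR step might produce: the bounds get names and validation moves into its own function, while the tests above still pass unchanged. A sketch; the constants and the _validate_discount_rate helper are hypothetical names.

```python
MIN_DISCOUNT_RATE = 0.0
MAX_DISCOUNT_RATE = 100.0


def _validate_discount_rate(rate: float) -> None:
    # Same comparisons as the GREEN version, now with named bounds
    if rate < MIN_DISCOUNT_RATE or rate > MAX_DISCOUNT_RATE:
        raise ValueError(
            f"Discount rate must be between {MIN_DISCOUNT_RATE:g} and {MAX_DISCOUNT_RATE:g}"
        )


def calculate_discounted_price(price: float, discount_rate: float) -> float:
    # Only the structure changed; the arithmetic is identical
    _validate_discount_rate(discount_rate)
    return price * (1 - discount_rate / 100)
```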

7.3 Mocking Strategy

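import pytest
from unittest.mock import MagicMock

# A minimal UserService matching what these tests assume (illustrative):
# the repository and email service are injected, as in the DIP section
class UserService:
    def __init__(self, repo, email_svc):
        self.repo = repo
        self.email_svc = email_svc

    def register(self, email: str, name: str) -> dict:
        user = self.repo.save({"email": email, "name": name})
        self.email_svc.send_welcome(email)  # only reached if save() succeeded
        return user

class TestUserService:
    def test_create_user_sends_email(self):
        mock_repo = MagicMock()
        mock_email = MagicMock()
        mock_repo.save.return_value = {"id": "u1"}

        service = UserService(repo=mock_repo, email_svc=mock_email)
        service.register("test@example.com", "John Doe")

        mock_repo.save.assert_called_once()
        mock_email.send_welcome.assert_called_once_with("test@example.com")

    def test_create_user_handles_db_error(self):
        mock_repo = MagicMock()
        mock_repo.save.side_effect = ConnectionError("DB connection failed")
        mock_email = MagicMock()

        service = UserService(repo=mock_repo, email_svc=mock_email)
        with pytest.raises(ConnectionError):
            service.register("test@example.com", "John Doe")

        # No welcome email for a user that was never saved
        mock_email.send_welcome.assert_not_called()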

Quiz

Q1. Why should high-level modules not directly depend on low-level modules in the Dependency Inversion Principle?

Answer: Because changes in low-level modules propagate up to high-level modules, increasing the cost of change across the entire system.

Explanation: When business logic (high-level) depends directly on infrastructure (low-level like a database or external API), swapping MySQL for PostgreSQL forces changes in the business logic code. Depending on an abstraction (interface) means only the low-level adapter needs to change while the high-level code stays untouched. This also makes it easy to inject mocks during testing.

Q2. What is the difference between the Observer pattern and the Pub/Sub pattern?

Answer: In Observer, Subject and Observer have a direct reference relationship. In Pub/Sub, a message broker (event bus) sits between Publisher and Subscriber, fully decoupling them.

Explanation: In the Observer pattern, the Subject maintains a list of Observers directly, so they must live in the same process. In Pub/Sub, a broker like Kafka or RabbitMQ enables Publisher and Subscriber to communicate without knowing each other. Pub/Sub is better suited for asynchronous communication in microservices.

Q3. What are the benefits and complexity trade-offs of separating Commands and Queries in CQRS?

Answer: Read and write workloads can scale independently and read models can be optimized, but the downsides are eventual consistency delays and increased code complexity.

Explanation: Commands (writes) require strong consistency, while Queries (reads) need performance optimization. Separating them allows multiple read replicas or denormalized read models tailored to each view. However, when read models are updated asynchronously (typical when CQRS is combined with event sourcing), reads can briefly return stale data, and the extra models and synchronization logic significantly increase overall system complexity.

Q4. When is the Saga pattern necessary in a microservices architecture?

Answer: When a business transaction spans multiple microservices and you need data consistency in a distributed environment without using Two-Phase Commit (2PC).

Explanation: When order, payment, and shipping are separate services, they cannot share a single database transaction. Saga handles each step as a local transaction and rolls back completed steps using compensating transactions if a failure occurs. It can be implemented as Choreography (event-driven) or Orchestration (central coordinator).

Q5. Why should you write only the minimum code necessary during the Green phase of TDD's Red-Green-Refactor cycle?

Answer: Writing only enough code to pass the currently failing test confirms that the test genuinely drives the behavior, and it prevents over-engineering (YAGNI violations).

Explanation: Enforcing minimal implementation confirms that tests are working as specifications. If you implement everything upfront, you cannot be sure that tests pass for the right reasons. During the Refactor phase, tests serve as a safety net so refactoring can proceed safely without breaking behavior.


Conclusion

Software architecture is not a one-time lesson. SOLID principles apply from the smallest function design, and Clean Architecture forms the backbone of systems a team will maintain for years. These principles apply equally to AI systems: RAG pipelines designed with ports and adapters make swapping vector databases painless, and Agent systems that use the Command pattern for tool execution become highly extensible.

Good architecture makes change fearless. Apply the patterns you learned today to your real projects, one at a time.