The Complete Guide to Serverless Architecture Patterns 2025: Lambda, Step Functions, Event Sourcing, and Cost Optimization

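1. What Is Serverless?

Serverless is a computing model in which you do not manage servers yourself; the cloud provider abstracts the infrastructure away. Developers focus on business logic while the provider handles provisioning, scaling, and patching.

1.1 The Four Principles of Serverless

Principle	Description	Example
No server management	No need to deal with the OS, patching, or scaling	Lambda, Cloud Functions
Automatic scaling	Scales with traffic from zero to thousands of instances	From 0 to 100,000 requests per second
Pay per use	No charge for idle time	Billed in 1 ms increments (Lambda)
Event-driven	Requests and events trigger functions	HTTP, S3, SQS, schedules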

1.2 A Brief History of Serverless Computing

2014: AWS Lambda announced (the first widely adopted FaaS)
2016: Azure Functions, Google Cloud Functions; AWS Step Functions and SAM announced
2018: Lambda Layers, ALB support
2019: Provisioned Concurrency, RDS Proxy (preview), EventBridge
2020: Lambda container image support, 10 GB maximum memory, 1 ms billing
2021: Graviton2 (ARM64) support
2022: Lambda function URLs, Lambda SnapStart (Java), Step Functions Distributed Map
2023: Lambda response streaming, advanced logging controls, Step Functions improvements
2024: Lambda performance optimizations, broader ARM64 support
2025: Continued Step Functions Distributed Map enhancements

1.3 Serverless Services on the Major Clouds

Category	AWS	Azure	GCP
FaaS	Lambda	Functions	Cloud Functions
Workflow	Step Functions	Durable Functions	Workflows
API	API Gateway	API Management	API Gateway
Messaging	SQS/SNS	Service Bus	Pub/Sub
Streaming	Kinesis	Event Hubs	Dataflow
DB	DynamoDB	Cosmos DB	Firestore
Storage	S3	Blob Storage	Cloud Storage
Event bus	EventBridge	Event Grid	Eventarc

2. Lambda Design Patterns

How you structure Lambda functions has a large effect on maintainability, performance, and cost.

2.1 Single Purpose Function

Each Lambda performs exactly one task. This is the most commonly recommended pattern.

# order_create.py - handles order creation only
import json
import os
from datetime import datetime, timezone
from decimal import Decimal

import boto3

# Created once per execution environment and reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['ORDERS_TABLE'])
sns = boto3.client('sns')

def handler(event, context):
    # DynamoDB rejects Python floats, so parse JSON decimals as Decimal
    body = json.loads(event['body'], parse_float=Decimal)

    order = {
        'orderId': context.aws_request_id,
        'userId': body['userId'],
        'items': body['items'],
        'total': calculate_total(body['items']),
        'status': 'CREATED',
        'createdAt': datetime.now(timezone.utc).isoformat()
    }

    table.put_item(Item=order)

    # default=float converts Decimal values back to JSON numbers
    message = json.dumps(order, default=float)

    # Publish the event
    sns.publish(
        TopicArn=os.environ['ORDER_TOPIC'],
        Message=message,
        MessageAttributes={
            'eventType': {
                'DataType': 'String',
                'StringValue': 'OrderCreated'
            }
        }
    )

    return {
        'statusCode': 201,
        'body': message
    }

def calculate_total(items):
    return sum(item['price'] * item['quantity'] for item in items)

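Pros:

Small functions keep cold starts short
Each function can be deployed independently
IAM permissions can be kept minimal
Debugging is easier

Cons:

The number of functions can grow large
Sharing common code requires Layers or a shared package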

2.2 Monolithic Lambda (Lambda-lith)

A single Lambda handles many routes, typically through a framework such as Express or FastAPI.

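// app.ts - monolithic Lambda
import express from 'express';
import serverless from 'serverless-http';
// getOrders, createOrder, getOrder, updateOrderStatus, and cancelOrder
// are assumed to be defined elsewhere in the application

const app = express();
app.use(express.json());

// One Lambda handles multiple routes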
app.get('/orders', async (req, res) => {
  const orders = await getOrders(req.query);
  res.json(orders);
});

app.post('/orders', async (req, res) => {
  const order = await createOrder(req.body);
  res.status(201).json(order);
});

app.get('/orders/:id', async (req, res) => {
  const order = await getOrder(req.params.id);
  if (!order) return res.status(404).json({ error: 'Not found' });
  res.json(order);
});

app.put('/orders/:id/status', async (req, res) => {
  const order = await updateOrderStatus(req.params.id, req.body.status);
  res.json(order);
});

app.delete('/orders/:id', async (req, res) => {
  await cancelOrder(req.params.id);
  res.status(204).send();
});

export const handler = serverless(app);

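Pros:

Existing web framework code can be migrated largely as-is
Managing a small number of functions is simple
Local development is convenient

Cons:

The larger package makes cold starts slower
IAM permissions become broader than any single route needs
A problem in one route can affect all of them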

2.3 Fan-out / Fan-in Pattern

A single event triggers several Lambdas in parallel, and their results are then aggregated.

# serverless.yml - fan-out architecture
service: order-processing

provider:
  name: aws
  runtime: nodejs20.x

functions:
  orderReceiver:
    handler: src/receiver.handler
    events:
      - http:
          path: /orders
          method: post
    environment:
      FAN_OUT_TOPIC: !Ref OrderFanOutTopic

  inventoryCheck:
    handler: src/inventory.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          topicName: order-fan-out  # needed by the Serverless Framework when arn is a !Ref
          filterPolicy:
            eventType:
              - OrderCreated

  paymentProcess:
    handler: src/payment.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          topicName: order-fan-out
          filterPolicy:
            eventType:
              - OrderCreated

  notificationSend:
    handler: src/notification.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          topicName: order-fan-out
          filterPolicy:
            eventType:
              - OrderCreated

resources:
  Resources:
    OrderFanOutTopic:
      Type: AWS::SNS::Topic
      Properties:
        TopicName: order-fan-out
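
The configuration above covers only the fan-out half. A minimal sketch of the fan-in half follows, assuming each branch reports its result under a shared order ID; the branch names and the in-memory dictionary are illustrative stand-ins (a real deployment would more likely keep this state in DynamoDB or use a Step Functions Parallel state).

```python
from dataclasses import dataclass, field

# Hypothetical branch names, one per subscriber in the fan-out configuration
EXPECTED_BRANCHES = frozenset({"inventory", "payment", "notification"})


@dataclass
class FanInAggregator:
    """Collects branch results per order (in-memory stand-in for a shared store)."""
    results: dict = field(default_factory=dict)

    def record(self, order_id: str, branch: str, result: dict) -> bool:
        """Store one branch result; return True once every branch has reported."""
        if branch not in EXPECTED_BRANCHES:
            raise ValueError(f"unknown branch: {branch}")
        per_order = self.results.setdefault(order_id, {})
        # Overwriting makes duplicate deliveries harmless (SNS is at-least-once)
        per_order[branch] = result
        return EXPECTED_BRANCHES.issubset(per_order)


if __name__ == "__main__":
    agg = FanInAggregator()
    for branch in ("inventory", "payment", "notification"):
        done = agg.record("order-1", branch, {"ok": True})
        print(branch, "-> all branches reported:", done)
```

Keying results by branch name rather than appending to a list is what keeps the aggregation idempotent when a message is delivered twice.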

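2.4 Lambda Design Pattern Comparison

Pattern	Function count	Cold start	Deployment unit	Recommended for
Single purpose	Many	Fast	Per function	Microservices
Lambda-lith	Few	Slow	Whole app	Migration
Fan-out	Medium	Fast	Per function	Parallel processing
Lambda Layer	Medium	Moderate	Layer + function	Sharing common code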

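3. Step Functions: Workflow Orchestration

Step Functions is AWS's serverless workflow service. It lets you define complex business logic as a state machine that can also be inspected visually.

3.1 Standard vs Express Workflow

Feature	Standard	Express
Max duration	1 year	5 minutes
Execution semantics	Exactly-once	At-least-once (asynchronous) / at-most-once (synchronous)
Pricing	Per state transition	Per request, duration, and memory
Execution history	Retained for 90 days	CloudWatch Logs
Max throughput	Over 2,000 execution starts per second	Over 100,000 execution starts per second
Use case	Long-running workflows	High-volume, short-lived processing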

3.2 State Types

{
  "Comment": "Order processing workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:validate-order",
      "Next": "CheckInventory",
      "Retry": [
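        {
          "ErrorEquals": [
            "Lambda.ServiceException",
            "Lambda.AWSLambdaException",
            "Lambda.SdkClientException",
            "Lambda.TooManyRequestsException"
          ],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }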
      ],
      "Catch": [
        {
          "ErrorEquals": ["ValidationError"],
          "Next": "OrderFailed"
        }
      ]
    },
    "CheckInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:check-inventory",
      "Next": "ProcessPaymentOrWait"
    },
    "ProcessPaymentOrWait": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.inventoryAvailable",
          "BooleanEquals": true,
          "Next": "ProcessPayment"
        }
      ],
      "Default": "WaitForInventory"
    },
    "WaitForInventory": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "CheckInventory"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:process-payment",
      "Next": "ParallelFulfillment"
    },
    "ParallelFulfillment": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "UpdateDatabase",
          "States": {
            "UpdateDatabase": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:update-db",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendNotification",
          "States": {
            "SendNotification": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:send-notification",
              "End": true
            }
          }
        },
        {
          "StartAt": "InitiateShipping",
          "States": {
            "InitiateShipping": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:initiate-shipping",
              "End": true
            }
          }
        }
      ],
      "Next": "OrderCompleted"
    },
    "OrderCompleted": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order validation or processing failed"
    }
  }
}
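
The Retry block on ValidateOrder uses IntervalSeconds, MaxAttempts, and BackoffRate. The sketch below works out the wait before each retry those three fields imply; it is a simplification that ignores the optional MaxDelaySeconds and JitterStrategy fields, and the function name is illustrative.

```python
def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> list[float]:
    """Seconds waited before each retry: the first wait is interval_seconds,
    and every later wait is the previous one multiplied by backoff_rate."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


if __name__ == "__main__":
    # Values taken from the ValidateOrder state above
    delays = retry_delays(interval_seconds=2, max_attempts=3, backoff_rate=2.0)
    print(delays)       # [2.0, 4.0, 8.0]
    print(sum(delays))  # 14.0 seconds of waiting before the error is surfaced
```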

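3.3 Step Functions State Type Summary

State type	Purpose	Description
Task	Run work	Calls Lambda, DynamoDB, SQS, and other services
Choice	Conditional branching	if/else logic
Parallel	Parallel execution	Runs multiple branches concurrently
Map	Iteration	Processes each element of an array
Wait	Delay	Waits for a duration or until a timestamp
Pass	Data transformation	Passes input to output, optionally transforming it
Succeed	Successful end	Ends the workflow successfully
Fail	Failed end	Ends the workflow with a failure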

3.4 Callback Pattern (Human Approval Workflow)

Step Functions supports a callback pattern in which a workflow pauses until an external system responds.

# callback_handler.py - Lambdas for a human-approval workflow
import json
from urllib.parse import quote

import boto3

sfn = boto3.client('stepfunctions')
ses = boto3.client('ses')

def request_approval(event, context):
    """Invoked by Step Functions with a task token in the input"""
    task_token = event['taskToken']
    order = event['order']

    # Task tokens can contain characters such as '+' and '/', so URL-encode them
    encoded_token = quote(task_token, safe='')

    # Send an email containing the approval links
    approval_url = f"https://api.example.com/approve?token={encoded_token}"
    reject_url = f"https://api.example.com/reject?token={encoded_token}"

    ses.send_email(
        Source='noreply@example.com',
        Destination={'ToAddresses': ['manager@example.com']},
        Message={
            'Subject': {'Data': f"Order approval request: {order['orderId']}"},
            'Body': {
                'Html': {
                    'Data': f"""
                    <h2>Order approval request</h2>
                    <p>Order ID: {order['orderId']}</p>
                    <p>Amount: {order['total']} KRW</p>
                    <a href="{approval_url}">Approve</a> |
                    <a href="{reject_url}">Reject</a>
                    """
                }
            }
        }
    )

def handle_approval(event, context):
    """Handle the approve/reject callback"""
    # API Gateway has already URL-decoded the query string parameters
    params = event['queryStringParameters']
    task_token = params['token']

    if 'approve' in event['path']:
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        sfn.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Processed'})
    }
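
For request_approval to receive event['taskToken'], the calling Task state has to use the .waitForTaskToken integration and pass the token from the context object. The sketch below builds such a state as a Python dictionary; the function name request-approval, the next state ProcessPayment, and the one-day timeout are assumptions made for illustration.

```python
import json


def approval_task_state(function_name: str, next_state: str, timeout_seconds: int = 86400) -> dict:
    """Build a Task state that pauses the workflow until
    SendTaskSuccess or SendTaskFailure is called with the task token."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix selects the callback integration pattern
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                # $$.Task.Token is read from the Step Functions context object
                "taskToken.$": "$$.Task.Token",
                "order.$": "$.order",
            },
        },
        # Without a timeout, an unanswered email would leave the execution waiting
        "TimeoutSeconds": timeout_seconds,
        "Next": next_state,
    }


if __name__ == "__main__":
    state = approval_task_state("request-approval", "ProcessPayment")
    print(json.dumps({"WaitForApproval": state}, indent=2))
```

The Payload keys become the Lambda event, which is why the handler can read event['taskToken'] and event['order'] directly.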

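4. Event-Driven Architecture Patterns

4.1 Event Sourcing with Lambda

# event_store.py
import json
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource('dynamodb')
# Assumed key schema: partition key aggregateId, sort key version
event_store = dynamodb.Table('EventStore')
sns = boto3.client('sns')

def append_event(aggregate_id, event_type, data, version):
    """Store an event, then publish it"""
    event = {
        'aggregateId': aggregate_id,
        'version': version,
        'eventType': event_type,
        'data': data,
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'metadata': {
            'correlationId': data.get('correlationId', ''),
            'causationId': data.get('causationId', '')
        }
    }

    # Optimistic locking: the write fails if an item with this
    # (aggregateId, version) key already exists
    event_store.put_item(
        Item=event,
        ConditionExpression='attribute_not_exists(#v)',
        ExpressionAttributeNames={'#v': 'version'}
    )

    # Publish the event. This call is separate from the write above, so a failure
    # here leaves a stored but unpublished event; publishing from DynamoDB Streams
    # is one way to close that gap.
    sns.publish(
        TopicArn='arn:aws:sns:ap-northeast-2:123456789:domain-events',
        Message=json.dumps(event),
        MessageAttributes={
            'eventType': {
                'DataType': 'String',
                'StringValue': event_type
            }
        }
    )

    return event

def replay_events(aggregate_id):
    """Return every event of an aggregate in version order"""
    params = {
        'KeyConditionExpression': 'aggregateId = :aid',
        'ExpressionAttributeValues': {':aid': aggregate_id},
        'ScanIndexForward': True  # ascending by sort key (version)
    }
    items = []
    while True:
        response = event_store.query(**params)
        items.extend(response['Items'])
        # A single query returns at most 1 MB, so follow LastEvaluatedKey
        if 'LastEvaluatedKey' not in response:
            return items
        params['ExclusiveStartKey'] = response['LastEvaluatedKey']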
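
Replaying events only becomes useful once they are folded back into current state. The sketch below shows that fold for an order aggregate; the event types OrderPaid and OrderCancelled and the shape of the data field are assumptions, since the original defines only the event envelope.

```python
from functools import reduce


def apply_event(state: dict, event: dict) -> dict:
    """Return the next state after applying one event (pure function, no I/O)."""
    event_type, data = event["eventType"], event["data"]
    if event_type == "OrderCreated":
        return {"orderId": event["aggregateId"], "status": "CREATED", "total": data["total"]}
    if event_type == "OrderPaid":
        return {**state, "status": "PAID"}
    if event_type == "OrderCancelled":
        return {**state, "status": "CANCELLED"}
    return state  # unknown event types are ignored rather than failing the replay


def rebuild_state(events: list[dict]) -> dict:
    """Fold events, ordered by version, into the aggregate's current state."""
    ordered = sorted(events, key=lambda e: e["version"])
    return reduce(apply_event, ordered, {})


if __name__ == "__main__":
    history = [
        {"aggregateId": "order-1", "version": 2, "eventType": "OrderPaid", "data": {}},
        {"aggregateId": "order-1", "version": 1, "eventType": "OrderCreated", "data": {"total": 2000}},
    ]
    print(rebuild_state(history))
```

Because apply_event has no side effects, the same history always rebuilds the same state, which is the property that makes replay safe.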

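4.2 Saga Pattern with Step Functions

The Saga pattern manages a distributed transaction as a sequence of local steps, each with a compensating action. Here it is implemented with Step Functions.

{
  "Comment": "Order saga with compensating transactions",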
  "StartAt": "ReserveInventory",
  "States": {
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:reserve-inventory",
      "Next": "ProcessPayment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "InventoryReservationFailed"
      }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:process-payment",
      "Next": "ConfirmOrder",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "RollbackInventory"
      }]
    },
    "ConfirmOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:confirm-order",
      "Next": "SagaCompleted",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "RollbackPayment"
      }]
    },
    "RollbackPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:rollback-payment",
      "Next": "RollbackInventory"
    },
    "RollbackInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:123456789:function:rollback-inventory",
      "Next": "SagaFailed"
    },
    "InventoryReservationFailed": {
      "Type": "Fail",
      "Error": "InventoryReservationFailed",
      "Cause": "Could not reserve inventory"
    },
    "SagaCompleted": {
      "Type": "Succeed"
    },
    "SagaFailed": {
      "Type": "Fail",
      "Error": "SagaFailed",
      "Cause": "Order saga failed, all compensations executed"
    }
  }
}

4.3 Choreography vs Orchestration

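Feature	Choreography (events)	Orchestration (Step Functions)
Coupling	Loose	Centralized
Visibility	Requires distributed tracing	Visible in the state machine
Complexity	Event flow is hard to follow	Workflow definition is explicit
Error handling	Each service handles errors independently	Central retries and compensation
Best for	Simple event flows	Complex business logic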

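5. Cold Start Deep Dive

Cold starts are one of the biggest technical challenges in serverless. A cold start is the extra latency incurred when a Lambda function begins running in a new execution environment.

5.1 What Causes a Cold Start

Request arrives
  |
  v
[Execution environment available?] --No--> [Cold start path]
  |                                          |
  Yes                                   1. Provision execution environment
  |                                     2. Download code (S3)
  v                                     3. Initialize runtime
[Warm start]                            4. Run code outside the handler
  |                                     5. Run the handler
  v                                          |
[Run handler]                                v
  |                                     [Return response]
  v
[Return response]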
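
Steps 4 and 5 in the flow are the ones application code controls: work done outside the handler runs once per execution environment, while the handler runs on every invocation. A minimal sketch of that split follows, using a dictionary as a stand-in for an expensive client; all names are illustrative.

```python
import time

_client = None    # stand-in for an expensive resource such as an SDK client
_init_count = 0   # how many times the expensive setup actually ran


def get_client() -> dict:
    """Create the client on first use and reuse it afterwards."""
    global _client, _init_count
    if _client is None:
        _init_count += 1
        _client = {"created_at": time.monotonic()}
    return _client


def handler(event, context):
    """Per-invocation work only; setup cost is paid once per environment."""
    get_client()
    return {"statusCode": 200, "initCount": _init_count}


if __name__ == "__main__":
    print(handler({}, None))  # first call performs the setup
    print(handler({}, None))  # later calls in the same environment reuse it
```

Lazy creation, as here, defers the cost to the first request that needs the client; eager module-level creation moves it into the init phase instead. Which is preferable depends on whether every invocation uses the resource.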

5.2 Cold Start Times by Runtime

The figures below are typical ranges; actual values depend on package size, memory setting, and region.

Runtime	Average cold start	P99 cold start	Package size impact
Python 3.12	150-300ms	500-800ms	Low
Node.js 20	150-350ms	500-900ms	Medium
Go (provided.al2023)	50-100ms	150-300ms	Very low
Rust (provided.al2023)	30-80ms	100-250ms	Very low
Java 21	800-3000ms	3000-8000ms	High
Java 21 (SnapStart)	100-200ms	300-500ms	Medium
.NET 8 (AOT)	200-400ms	600-1000ms	Medium

5.3 Cold Start Optimization Strategies

# Optimized Lambda function structure
import json
import os

# Initialize outside the handler (reused across warm invocations)
# 1. Create clients and connections at module level
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

# 2. Remove unnecessary imports
# BAD: import pandas  (increases package size and init time)
# GOOD: import only what you need

# 3. Tune SDK settings
from botocore.config import Config
config = Config(
    connect_timeout=5,
    read_timeout=5,
    retries={'max_attempts': 2}
)
s3 = boto3.client('s3', config=config)

def handler(event, context):
    """Keep the handler as light as possible"""
    order_id = event['pathParameters']['orderId']

    response = table.get_item(Key={'orderId': order_id})
    item = response.get('Item')

    if not item:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}

    # DynamoDB returns numbers as Decimal, which json cannot serialize by default
    return {'statusCode': 200, 'body': json.dumps(item, default=str)}

5.4 Provisioned Concurrency

# SAM template - Provisioned Concurrency configuration
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      MemorySize: 512
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10

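  # Utilization-based auto scaling of provisioned concurrency
  ScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: MyFunctionAliaslive  # alias resource generated by AutoPublishAlias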
    Properties:
      MaxCapacity: 100
      MinCapacity: 5
      ResourceId: !Sub function:${MyFunction}:live
      ScalableDimension: lambda:function:ProvisionedConcurrency
      ServiceNamespace: lambda

  ScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: UtilizationScaling
      PolicyType: TargetTrackingScaling
      ScalableTargetId: !Ref ScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 0.7
        PredefinedMetricSpecification:
          PredefinedMetricType: LambdaProvisionedConcurrencyUtilization

5.5 Java SnapStart

// Java Lambda optimized for SnapStart
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import org.crac.Core;
import org.crac.Resource;

public class OrderHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>,
                                     Resource {

    private final DynamoDbClient dynamoDb;
    private final ObjectMapper objectMapper;

    public OrderHandler() {
        // SnapStart: this initialization is captured in the snapshot
        this.dynamoDb = DynamoDbClient.create();
        this.objectMapper = new ObjectMapper();
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Before the snapshot: close connections that must not be frozen
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // After restore: re-establish connections
        // Restore uniqueness (for example, reseed random number generators)
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent event, Context context) {
        // Business logic
        return new APIGatewayProxyResponseEvent()
            .withStatusCode(200)
            .withBody("{\"message\": \"OK\"}");
    }
}

6. API Patterns

6.1 REST API with API Gateway

# SAM template - REST API
Resources:
  OrdersApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        DefaultAuthorizer: CognitoAuthorizer
        Authorizers:
          CognitoAuthorizer:
            UserPoolArn: !GetAtt UserPool.Arn
      # Throttling
      MethodSettings:
        - HttpMethod: '*'
          ResourcePath: '/*'
          ThrottlingBurstLimit: 100
          ThrottlingRateLimit: 50
      # CORS
      Cors:
        AllowMethods: "'GET,POST,PUT,DELETE,OPTIONS'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'https://example.com'"

  GetOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/orders/get.handler
      Runtime: nodejs20.x
      Events:
        GetOrder:
          Type: Api
          Properties:
            RestApiId: !Ref OrdersApi
            Path: /orders/{orderId}
            Method: get

  CreateOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/orders/create.handler
      Runtime: nodejs20.x
      Events:
        CreateOrder:
          Type: Api
          Properties:
            RestApiId: !Ref OrdersApi
            Path: /orders
            Method: post

6.2 GraphQL with AppSync

# AppSync schema
type Order {
  orderId: ID!
  userId: String!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  createdAt: AWSDateTime!
}

type OrderItem {
  productId: String!
  name: String!
  quantity: Int!
  price: Float!
}

enum OrderStatus {
  CREATED
  PAID
  SHIPPED
  DELIVERED
  CANCELLED
}

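# The connection and input types below are assumed shapes; the original schema
# references them without defining them
type OrderConnection {
  items: [Order!]!
  nextToken: String
}

input CreateOrderItemInput {
  productId: String!
  name: String!
  quantity: Int!
  price: Float!
}

input CreateOrderInput {
  userId: String!
  items: [CreateOrderItemInput!]!
}

type Query {
  getOrder(orderId: ID!): Order
  listOrders(userId: String!, limit: Int, nextToken: String): OrderConnection!
}

type Mutation {
  createOrder(input: CreateOrderInput!): Order!
  updateOrderStatus(orderId: ID!, status: OrderStatus!): Order!
}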

type Subscription {
  onOrderStatusChanged(orderId: ID!): Order
    @aws_subscribe(mutations: ["updateOrderStatus"])
}

6.3 WebSocket with API Gateway

# websocket_handler.py
import json
import boto3
import os

dynamodb = boto3.resource('dynamodb')
connections_table = dynamodb.Table(os.environ['CONNECTIONS_TABLE'])

def connect(event, context):
    """Handle a WebSocket connection"""
    connection_id = event['requestContext']['connectionId']
    user_id = event['requestContext']['authorizer']['userId']

    connections_table.put_item(Item={
        'connectionId': connection_id,
        'userId': user_id
    })

    return {'statusCode': 200}

def disconnect(event, context):
    """Handle a WebSocket disconnection"""
    connection_id = event['requestContext']['connectionId']
    connections_table.delete_item(Key={'connectionId': connection_id})
    return {'statusCode': 200}

def send_message(event, context):
    """Send a message"""
    domain = event['requestContext']['domainName']
    stage = event['requestContext']['stage']
    body = json.loads(event['body'])

    apigw = boto3.client(
        'apigatewaymanagementapi',
        endpoint_url=f'https://{domain}/{stage}'
    )

    # Broadcast to every connection. scan() returns a single page of up to 1 MB;
    # larger tables need to follow LastEvaluatedKey.
    connections = connections_table.scan()['Items']
    for conn in connections:
        try:
            apigw.post_to_connection(
                ConnectionId=conn['connectionId'],
                Data=json.dumps(body['message']).encode()
            )
        except apigw.exceptions.GoneException:
            connections_table.delete_item(
                Key={'connectionId': conn['connectionId']}
            )

    return {'statusCode': 200}

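7. Data Patterns

7.1 DynamoDB Single-Table Design

# DynamoDB Single Table Design
# Store multiple entity types in one table using PK/SK patterns
import os

import boto3

# TABLE_NAME is an assumed environment variable naming the single table
table = boto3.resource('dynamodb').Table(os.environ['TABLE_NAME'])

ENTITY_PATTERNS = {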
    'User': {
        'PK': 'USER#user_id',
        'SK': 'PROFILE'
    },
    'Order': {
        'PK': 'USER#user_id',
        'SK': 'ORDER#order_id'
    },
    'OrderItem': {
        'PK': 'ORDER#order_id',
        'SK': 'ITEM#item_id'
    },
    'Product': {
        'PK': 'PRODUCT#product_id',
        'SK': 'DETAIL'
    }
}

# Queries per access pattern
def get_user_with_orders(user_id):
    """Fetch a user and their orders in a single query"""
    response = table.query(
        KeyConditionExpression='PK = :pk',
        ExpressionAttributeValues={':pk': f'USER#{user_id}'}
    )
    user = None
    orders = []
    for item in response['Items']:
        if item['SK'] == 'PROFILE':
            user = item
        elif item['SK'].startswith('ORDER#'):
            orders.append(item)
    return {'user': user, 'orders': orders}

def get_order_details(order_id):
    """Fetch an order's items in a single query"""
    response = table.query(
        KeyConditionExpression='PK = :pk',
        ExpressionAttributeValues={':pk': f'ORDER#{order_id}'}
    )
    return response['Items']

7.2 Aurora Serverless v2

# Aurora Serverless v2 + Lambda
Resources:
  AuroraCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      EngineVersion: '15.4'
      ServerlessV2ScalingConfiguration:
        MinCapacity: 0.5
        MaxCapacity: 16
      EnableHttpEndpoint: true  # enable the Data API

  AuroraInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBClusterIdentifier: !Ref AuroraCluster
      DBInstanceClass: db.serverless
      Engine: aurora-postgresql

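  # Manage connections with RDS Proxy
  RDSProxy:
    Type: AWS::RDS::DBProxy
    Properties:
      DBProxyName: orders-proxy
      EngineFamily: POSTGRESQL
      # RoleArn is required; ProxyRole, like DBSecret and the subnets, is assumed to be defined elsewhere
      RoleArn: !GetAtt ProxyRole.Arn
      Auth:
        - AuthScheme: SECRETS
          SecretArn: !Ref DBSecret
          IAMAuth: REQUIRED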
      VpcSubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2

7.3 S3 Event Processing Pipeline

# S3 event -> Lambda -> DynamoDB pipeline
import csv
import io
from datetime import datetime, timezone
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProcessedData')

def process_csv_upload(event, context):
    """Process CSV files uploaded to S3"""
    processed = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])

        # Read the file from S3 (loads the whole object into memory)
        response = s3.get_object(Bucket=bucket, Key=key)
        content = response['Body'].read().decode('utf-8')

        # Parse the CSV and batch-write the rows
        reader = csv.DictReader(io.StringIO(content))
        processed_at = datetime.now(timezone.utc).isoformat()

        with table.batch_writer() as batch:
            for row in reader:
                batch.put_item(Item={
                    'id': row['id'],
                    'data': row,
                    'sourceFile': key,
                    'processedAt': processed_at
                })
        processed.append(key)

    return {
        'statusCode': 200,
        'processedFiles': processed
    }

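8. Messaging Service Selection Guide

8.1 Service Comparison

Feature	SQS	SNS	EventBridge	Kinesis
Pattern	Queue (1:1)	Pub/Sub (1:N)	Event bus (N:N)	Streaming
Ordering	FIFO only	FIFO only	None	Within a shard
Max message size	256KB	256KB	256KB	1MB
Reprocessing	DLQ	DLQ	Archive/replay	Retention period
Filtering	None	Message attributes	Event patterns	None
Latency	ms	ms	ms	ms
Throughput	Nearly unlimited (standard)	Nearly unlimited (standard)	Thousands per second	1MB/s write per shard
Cost	Per request	Per publish	Per event	Per shard-hour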

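8.2 Decision Tree

Messaging selection flow:
1. Do you need real-time streaming?
   -> Yes: Kinesis Data Streams
   -> No: continue

2. Must messages reach multiple consumers at once?
   -> Yes: continue
   -> No: SQS (simple queue)

3. Do you need complex event routing/filtering?
   -> Yes: EventBridge
   -> No: SNS

4. Do you need event replay?
   -> Yes: EventBridge (archive) or Kinesis (retention)
   -> No: SNS/SQS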
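
The flow above can be read as a small function. The sketch below is one such reading, not an official rule: it treats a replay requirement as a reason to prefer EventBridge once streaming has been ruled out, and the parameter names are illustrative.

```python
def choose_messaging(
    needs_streaming: bool,
    multiple_consumers: bool,
    complex_routing: bool,
    needs_replay: bool = False,
) -> str:
    """Walk the decision flow top to bottom and return the first matching service."""
    if needs_streaming:
        return "Kinesis Data Streams"
    if not multiple_consumers:
        return "SQS"
    if complex_routing or needs_replay:
        # EventBridge covers both content-based routing and archive/replay
        return "EventBridge"
    return "SNS"


if __name__ == "__main__":
    print(choose_messaging(needs_streaming=False, multiple_consumers=True, complex_routing=False))
    print(choose_messaging(needs_streaming=False, multiple_consumers=True, complex_routing=True))
```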

8.3 EventBridge Pattern Matching

{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderCreated"],
  "detail": {
    "total": [{"numeric": [">=", 10000]}],
    "status": ["CREATED"],
    "items": {
      "category": ["electronics", "premium"]
    }
  }
}
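
To make the semantics of the pattern above concrete, here is a simplified matcher written for illustration. It is not EventBridge's implementation: it supports only exact-value lists, the numeric operator, and nested objects, and it treats an array of objects as matching when any single element satisfies the sub-pattern.

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq, ">=": operator.ge, ">": operator.gt}


def _numeric_matches(value, spec: list) -> bool:
    """spec is a flat list of operator/operand pairs, e.g. [">=", 10000]."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return all(_OPS[op](value, operand) for op, operand in zip(spec[0::2], spec[1::2]))


def _leaf_matches(value, allowed: list) -> bool:
    """A leaf matches when the value satisfies any one entry of the list."""
    for rule in allowed:
        if isinstance(rule, dict) and "numeric" in rule:
            if _numeric_matches(value, rule["numeric"]):
                return True
        elif value == rule:
            return True
    return False


def matches(pattern: dict, event) -> bool:
    """Every key in the pattern must be present in the event and match."""
    if isinstance(event, list):
        return any(matches(pattern, element) for element in event)
    if not isinstance(event, dict):
        return False
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not matches(expected, actual):
                return False
        else:
            values = actual if isinstance(actual, list) else [actual]
            if not any(_leaf_matches(v, expected) for v in values):
                return False
    return True


if __name__ == "__main__":
    pattern = {
        "source": ["com.myapp.orders"],
        "detail-type": ["OrderCreated"],
        "detail": {
            "total": [{"numeric": [">=", 10000]}],
            "status": ["CREATED"],
            "items": {"category": ["electronics", "premium"]},
        },
    }
    event = {
        "source": "com.myapp.orders",
        "detail-type": "OrderCreated",
        "detail": {
            "total": 25000,
            "status": "CREATED",
            "items": [{"category": "books"}, {"category": "electronics"}],
        },
    }
    print(matches(pattern, event))
```

Fields that the pattern does not mention are ignored, so an event may carry any amount of extra data and still match.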

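9. Serverless Containers

9.1 Lambda vs Fargate vs Cloud Run

Feature	Lambda	Fargate	Cloud Run
Max execution time	15 minutes	Unlimited	60 minutes
Max memory	10GB	120GB	32GB
vCPU	Up to 6	Up to 16	Up to 8
Scale to zero	Yes	No (minimum 1 task)	Yes
Cold start	Yes	No (always running)	Yes
Pricing model	Duration + memory	vCPU + memory time	Duration + memory
Container image	Up to 10GB	Unlimited	Up to 32GB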

9.2 Lambda Container Image

# Dockerfile - Lambda container image
FROM public.ecr.aws/lambda/python:3.12

# Install dependencies
COPY requirements.txt ${LAMBDA_TASK_ROOT}
RUN pip install -r requirements.txt

# Application code
COPY app/ ${LAMBDA_TASK_ROOT}/app/

# Set the Lambda handler
CMD ["app.main.handler"]

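# app/main.py
import json

import joblib
import numpy as np  # large dependencies are fine in a container image

# Load a trained model once per cold start. 'model.joblib' is a hypothetical
# path to a fitted scikit-learn estimator bundled into the image; an unfitted
# RandomForestClassifier() would raise NotFittedError on predict().
model = joblib.load('model.joblib')

def handler(event, context):
    """ML inference Lambda"""
    features = np.array(event['features']).reshape(1, -1)
    prediction = model.predict(features)

    return {
        'statusCode': 200,
        'body': json.dumps({
            'prediction': prediction.tolist()
        })
    }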

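10. Cost Optimization

10.1 Lambda Cost Structure

Lambda cost = request cost + duration cost

Request cost:
  - First 1 million requests per month are free
  - About 0.20 USD per additional 1 million requests

Duration cost (x86; rates vary by region):
  - 128MB: 0.0000000021 USD / ms
  - 512MB: 0.0000000083 USD / ms
  - 1024MB: 0.0000000167 USD / ms
  - 1769MB (1 vCPU): 0.0000000289 USD / ms
  - 10240MB: 0.0000001667 USD / ms

ARM64 (Graviton2) cost:
  - About 20% cheaper than x86
  - Performance is comparable or better for many workloads

Provisioned Concurrency additional cost:
  - Provisioned capacity: 0.0000041667 USD / GB-second
  - Duration: 0.0000000097 USD / GB-ms (lower than the on-demand duration rate)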
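
The formula above can be turned into a rough monthly estimate. The sketch below assumes the x86 rate of 0.0000166667 USD per GB-second (which matches the per-millisecond figures listed), applies the ARM64 discount as a flat 20%, and ignores the free compute allowance and Provisioned Concurrency, so treat the result as an approximation.

```python
PRICE_PER_GB_SECOND = 0.0000166667   # assumed x86 on-demand rate
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_REQUESTS_PER_MONTH = 1_000_000


def monthly_lambda_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    arm64: bool = False,
) -> float:
    """Estimate monthly cost in USD as request cost plus duration cost."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * PRICE_PER_GB_SECOND
    if arm64:
        duration_cost *= 0.8  # the roughly 20% discount, applied as a flat factor
    billable_requests = max(invocations - FREE_REQUESTS_PER_MONTH, 0)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return round(duration_cost + request_cost, 2)


if __name__ == "__main__":
    # 10 million invocations per month, 200 ms average, 512 MB
    print(monthly_lambda_cost(10_000_000, 200, 512))              # 18.47
    print(monthly_lambda_cost(10_000_000, 200, 512, arm64=True))  # 15.13
```

In this example the duration cost (about 16.67 USD) dominates the request cost (1.80 USD), which is why memory and duration tuning usually matter more than request count.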

10.2 Memory Optimization (Power Tuning)

# Using AWS Lambda Power Tuning
# A Step Functions-based tool that finds the optimal memory setting
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:ap-northeast-2:123456789:stateMachine:powerTuning \
  --input '{
    "lambdaARN": "arn:aws:lambda:ap-northeast-2:123456789:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1769, 3008],
    "num": 50,
    "payload": "{\"test\": true}",
    "parallelInvocation": true,
    "strategy": "cost"
  }'

Memory (MB)	Average duration	Cost per 1,000 invocations	Optimal
128	2500ms	0.0053 USD	-
256	1200ms	0.0051 USD	-
512	600ms	0.0050 USD	Lowest cost
1024	350ms	0.0058 USD	-
1769	200ms	0.0058 USD	Best performance
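
The cost figures in the table are consistent with cost per 1,000 invocations rather than per single call. The sketch below recomputes them from the measured durations, assuming 0.0000166667 USD per GB-second, and picks the cheapest and fastest settings; at that exact rate 256MB and 512MB cost the same, so ties are broken in favor of the faster configuration.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 on-demand rate

# Memory (MB) -> average duration (ms), taken from the table
MEASUREMENTS = {128: 2500, 256: 1200, 512: 600, 1024: 350, 1769: 200}


def cost_per_1000(memory_mb: int, duration_ms: float) -> float:
    """Duration cost in USD for 1,000 invocations at the given setting."""
    gb_seconds = 1000 * (memory_mb / 1024) * (duration_ms / 1000)
    return round(gb_seconds * PRICE_PER_GB_SECOND, 6)


def cheapest(measurements: dict[int, float]) -> int:
    """Lowest cost wins; equal costs are resolved by the shorter duration."""
    return min(measurements, key=lambda mb: (cost_per_1000(mb, measurements[mb]), measurements[mb]))


def fastest(measurements: dict[int, float]) -> int:
    return min(measurements, key=measurements.get)


if __name__ == "__main__":
    for mb, ms in MEASUREMENTS.items():
        print(f"{mb:>5} MB  {ms:>5} ms  {cost_per_1000(mb, ms):.4f} USD per 1,000 invocations")
    print("cheapest:", cheapest(MEASUREMENTS), "MB")
    print("fastest:", fastest(MEASUREMENTS), "MB")
```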

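10.3 Cost Reduction Checklist

Switch to ARM64 (Graviton2) - about 20% cheaper at comparable performance for many workloads
Run memory Power Tuning - avoids under- and over-provisioning
Set appropriate timeouts - prevents runaway executions
Configure a DLQ - stops failed events from being retried repeatedly
Use Reserved Concurrency - caps excessive scaling
Use Lambda Layers - keeps deployment packages small (layers still count toward the unzipped size limit, so the cold start benefit is limited)
Use EventBridge schedules - the successor to CloudWatch Events for scheduled invocations
Use S3 Intelligent-Tiering - optimizes storage class automatically based on access patterns
Use DynamoDB On-Demand - suited to unpredictable traffic
Enable API Gateway caching - reduces Lambda invocations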

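11. Serverless vs Container Decision Framework

11.1 Comparison Matrix

Criterion	Serverless (Lambda)	Container (ECS/K8s)
Execution time	Up to 15 minutes	Unlimited
Scaling speed	Seconds	Minutes
Minimum cost	0 (when idle)	Always a baseline cost
Max throughput	Bounded by concurrency limits	Scales with the number of tasks/Pods
State management	Stateless	Can be stateful
Warm-up	Cold starts	Always running
Vendor lock-in	High	Medium
Operational burden	Very low	High
Debugging	Harder	Easier
Networking	Limited	Full control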

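11.2 Decision Flow

Identify the workload type:
1. Runs for 15 minutes or longer? -> Container
2. Steady traffic (hundreds of requests per second or more)? -> Container (more cost-efficient)
3. Intermittent traffic? -> Serverless
4. Needs a GPU? -> Container
5. Needs a special runtime? -> Container
6. Rapid prototyping? -> Serverless
7. Long-lived WebSocket connections? -> Container
8. Batch processing (large data)? -> Step Functions + Lambda or Container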
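
As a rough illustration, the checklist can be encoded as a function that collects the reasons pointing toward containers. The 200 requests-per-second threshold is an assumed stand-in for "hundreds of requests per second", and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    max_runtime_minutes: float
    steady_requests_per_second: float = 0.0
    needs_gpu: bool = False
    needs_special_runtime: bool = False
    long_lived_websockets: bool = False


def recommend(w: Workload, steady_rps_threshold: float = 200.0) -> tuple[str, list[str]]:
    """Return a recommendation together with the reasons that drove it."""
    reasons = []
    if w.max_runtime_minutes >= 15:
        reasons.append("runs 15 minutes or longer")
    if w.steady_requests_per_second >= steady_rps_threshold:
        reasons.append("sustained high traffic")
    if w.needs_gpu:
        reasons.append("needs a GPU")
    if w.needs_special_runtime:
        reasons.append("needs a special runtime")
    if w.long_lived_websockets:
        reasons.append("long-lived WebSocket connections")
    if reasons:
        return "Container", reasons
    return "Serverless", ["intermittent or short-lived workload"]


if __name__ == "__main__":
    print(recommend(Workload(max_runtime_minutes=1)))
    print(recommend(Workload(max_runtime_minutes=30, needs_gpu=True)))
```

Returning the reasons alongside the verdict keeps the decision auditable when several criteria apply at once.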

12. Monitoring and Observability

12.1 Lambda Powertools

# Lambda Powertools - structured logging, tracing, and metrics
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

logger = Logger()
tracer = Tracer()
metrics = Metrics()
app = APIGatewayRestResolver()

@app.get("/orders/<order_id>")
@tracer.capture_method
def get_order(order_id: str):
    logger.info("Fetching order", extra={"order_id": order_id})

    order = fetch_order(order_id)

    metrics.add_metric(name="OrderFetched", unit=MetricUnit.Count, value=1)
    metrics.add_dimension(name="Environment", value="production")

    return {"order": order}

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context):
    return app.resolve(event, context)

12.2 X-Ray Distributed Tracing

# Trace outbound calls with the X-Ray SDK
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Automatically trace calls made through supported libraries, including the AWS SDK
patch_all()

@xray_recorder.capture('process_order')
def process_order(order):
    # Create a subsegment
    subsegment = xray_recorder.begin_subsegment('validate')
    try:
        validate_order(order)
        subsegment.put_annotation('valid', True)
    except Exception as e:
        subsegment.put_annotation('valid', False)
        subsegment.add_exception(e)
        raise
    finally:
        xray_recorder.end_subsegment()

    # DynamoDB call (traced automatically)
    save_order(order)

    # SNS publish (traced automatically)
    publish_event(order)

12.3 CloudWatch Alarm Configuration

Resources:
  # Lambda error alarm (more than 5 errors in each of two 5-minute periods)
  LambdaErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-high-error-rate
      MetricName: Errors
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 5
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

  # Lambda throttle alarm
  LambdaThrottleAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-throttled
      MetricName: Throttles
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 0
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

  # Concurrency alarm (the threshold of 800 assumes a concurrency limit of 1,000)
  ConcurrencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-high-concurrency
      MetricName: ConcurrentExecutions
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 3
      Threshold: 800
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

13. Testing Strategy

13.1 Local Testing with SAM CLI

# Invoke a Lambda locally with the SAM CLI
sam local invoke MyFunction \
  --event events/api-gateway.json \
  --env-vars env.json

# Run a local API server
sam local start-api --port 3000

# Use together with DynamoDB Local (run the container in the background)
docker run -d -p 8000:8000 amazon/dynamodb-local
sam local invoke MyFunction --docker-network host

13.2 Integration Tests

# test_integration.py
import boto3
import pytest
import json
import time

STACK_NAME = 'my-serverless-app'
API_URL = None

@pytest.fixture(scope='session', autouse=True)
def setup():
    """Read the API URL from the CloudFormation stack outputs"""
    global API_URL
    cfn = boto3.client('cloudformation')
    response = cfn.describe_stacks(StackName=STACK_NAME)
    outputs = response['Stacks'][0]['Outputs']
    API_URL = next(o['OutputValue'] for o in outputs if o['OutputKey'] == 'ApiUrl')

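def test_create_order():
    """Integration test for order creation"""
    # get_test_token() is assumed to be a helper defined elsewhere that returns a valid token
    import requests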

    response = requests.post(
        f'{API_URL}/orders',
        json={
            'userId': 'test-user',
            'items': [
                {'productId': 'p1', 'name': 'Widget', 'quantity': 2, 'price': 1000}
            ]
        },
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )

    assert response.status_code == 201
    data = response.json()
    assert 'orderId' in data
    assert data['status'] == 'CREATED'
    assert data['total'] == 2000

def test_get_order():
    """Integration test for order retrieval"""
    import requests

    # Create an order first
    create_response = requests.post(
        f'{API_URL}/orders',
        json={
            'userId': 'test-user',
            'items': [{'productId': 'p1', 'name': 'Widget', 'quantity': 1, 'price': 500}]
        },
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )
    order_id = create_response.json()['orderId']

    # Fetch it back
    response = requests.get(
        f'{API_URL}/orders/{order_id}',
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )

    assert response.status_code == 200
    assert response.json()['orderId'] == order_id

13.3 Unit Tests (Mocking)

# test_unit.py
import json
import os
from unittest.mock import MagicMock

import boto3
from moto import mock_aws  # moto 5+; moto 4 and earlier expose mock_dynamodb / mock_sns instead

@mock_aws
def test_create_order_handler():
    """Unit test for the Lambda handler"""
    # The handler creates its boto3 clients at import time, so a region must be set first
    os.environ['AWS_DEFAULT_REGION'] = 'ap-northeast-2'

    # Create the DynamoDB table
    dynamodb = boto3.resource('dynamodb', region_name='ap-northeast-2')
    dynamodb.create_table(
        TableName='orders',
        KeySchema=[{'AttributeName': 'orderId', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'orderId', 'AttributeType': 'S'}],
        BillingMode='PAY_PER_REQUEST'
    )

    # Create the SNS topic
    sns = boto3.client('sns', region_name='ap-northeast-2')
    topic = sns.create_topic(Name='order-events')

    os.environ['ORDERS_TABLE'] = 'orders'
    os.environ['ORDER_TOPIC'] = topic['TopicArn']

    # Import only after the mocks and environment variables are in place
    from src.orders.create import handler

    event = {
        'body': json.dumps({
            'userId': 'user123',
            'items': [{'productId': 'p1', 'name': 'Test', 'quantity': 1, 'price': 1000}]
        })
    }

    context = MagicMock()
    context.aws_request_id = 'test-request-id'

    response = handler(event, context)

    assert response['statusCode'] == 201
    body = json.loads(response['body'])
    assert body['userId'] == 'user123'
    assert body['total'] == 1000

14. Real-World Architecture Examples

14.1 E-commerce Order System

Client
  |
  v
[API Gateway] --> [Lambda: Create Order]
                     |
                     v
                  [DynamoDB: Store Order]
                     |
                     v
                  [EventBridge: Publish OrderCreated]
                     |
          +----------+----------+
          |          |          |
          v          v          v
     [Lambda:    [Lambda:    [Lambda:
      Inventory]  Payment]    Notify]
          |          |
          v          v
     [DynamoDB]  [Stripe API]
          |
          v
     [Step Functions: Shipping Workflow]
          |
          v
     [Lambda: Update Shipment Tracking]
          |
          v
     [WebSocket -> Real-time client notification]

14.2 Media Processing Pipeline

[S3: Original Upload]
  |
  v
[EventBridge: S3 Event]
  |
  v
[Step Functions: Media Pipeline]
  |
  +-> [Lambda: Extract Metadata]
  |
  +-> [Lambda: Generate Thumbnails]
  |
  +-> [Lambda: Start Video Transcoding]
  |      |
  |      v
  |   [MediaConvert]
  |      |
  |      v
  |   [Lambda: Handle Transcoding Completion]
  |
  +-> [Lambda: AI Tagging (Rekognition)]
  |
  v
[DynamoDB: Store Metadata]
  |
  v
[CloudFront: CDN Delivery]

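15. Quiz

Q1. Which runtime has the longest Lambda cold start?

Answer: Java (without SnapStart)

Because of JVM initialization, class loading, and JIT compilation, Java cold starts can range from roughly 800 ms to 8 seconds. SnapStart can cut that to around 100-200 ms. Rust and Go compile to native binaries and typically start in 30-100 ms.

Q2. What are the main differences between Step Functions Standard and Express?

Answer:

Standard: runs for up to 1 year, exactly-once execution, billed per state transition, execution history retained for 90 days
Express: runs for up to 5 minutes, at-least-once execution, billed by duration and memory, can handle more than 100,000 executions per second

Standard suits long-running business workflows; Express suits high-volume, short-lived data processing.

Q3. What is the difference between Provisioned Concurrency and Reserved Concurrency?

Answer:

Provisioned Concurrency: pre-initializes Lambda execution environments to eliminate cold starts. Incurs additional cost
Reserved Concurrency: caps the maximum number of concurrent executions for a specific function and sets that capacity aside for it. No additional cost. It keeps one function from exhausting the concurrency that other functions need

Provisioned Concurrency is about guaranteed performance; Reserved Concurrency is about resource isolation.

Q4. What are the pros and cons of DynamoDB single-table design?

Answer:

Pros:

Multiple entity types can be fetched with a single query (low latency)
Simpler table management
Lower transaction costs

Cons:

Access patterns must be known up front
Schema changes are difficult
Steep learning curve
Data migrations are complex

Q5. When should you not choose Serverless?

Answer:

Long-running jobs that exceed the 15-minute execution limit
ML training that requires GPUs
Sustained high traffic where containers are more cost-effective
Workloads where the function itself must hold long-lived connections such as WebSocket (the API Gateway WebSocket API in section 6.3 keeps the connection outside Lambda)
Workloads that require very low latency (cold starts are unacceptable)
Workloads that require complex network configuration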

16. References

AWS Lambda official documentation - https://docs.aws.amazon.com/lambda/
AWS Step Functions Developer Guide - https://docs.aws.amazon.com/step-functions/
Serverless Application Model (SAM) - https://docs.aws.amazon.com/serverless-application-model/
Lambda Powertools for Python - https://docs.powertools.aws.dev/lambda/python/
DynamoDB single-table design - https://www.alexdebrie.com/posts/dynamodb-single-table/
AWS Well-Architected Serverless Lens - https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/
Lambda Power Tuning - https://github.com/alexcasalboni/aws-lambda-power-tuning
Serverless Land - https://serverlessland.com/
EventBridge patterns - https://docs.aws.amazon.com/eventbridge/latest/userguide/
Aurora Serverless v2 - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html
API Gateway REST API - https://docs.aws.amazon.com/apigateway/latest/developerguide/
X-Ray distributed tracing - https://docs.aws.amazon.com/xray/latest/devguide/
Serverless Framework - https://www.serverless.com/framework/docs/

Serverless Architecture Patterns Complete Guide 2025: Lambda, Step Functions, Event Sourcing, Cost Optimization

1. What is Serverless

Serverless is a computing model where the cloud provider fully abstracts away infrastructure, eliminating the need for direct server management. Developers focus solely on business logic while the cloud handles provisioning, scaling, and patching.

1.1 The 4 Principles of Serverless

Principle	Description	Examples
No Server Management	No OS patching, scaling concerns	Lambda, Cloud Functions
Auto Scaling	Scales from 0 to thousands of instances	0 to 100K requests/sec
Pay-per-Use	No cost during idle time	Billed per 1 ms (Lambda)
Event-Driven	Requests/events trigger functions	HTTP, S3, SQS, Schedules

1.2 History of Serverless Computing

2014: AWS Lambda launched (first major FaaS offering)
2016: Azure Functions, Google Cloud Functions, AWS Step Functions, SAM
2018: Lambda Layers, ALB support
2019: EventBridge, Provisioned Concurrency, RDS Proxy (preview)
2020: Lambda Container Images, Lambda max memory raised to 10GB
2021: Graviton2 (ARM64) support
2022: Lambda Function URLs, Lambda SnapStart (Java)
2023: Lambda response streaming, Lambda Advanced Logging, Step Functions improvements
2024: Lambda performance optimizations, ARM64 full support
2025: Step Functions Distributed Map enhancements

1.3 Cloud Provider Serverless Services

Category	AWS	Azure	GCP
FaaS	Lambda	Functions	Cloud Functions
Workflows	Step Functions	Durable Functions	Workflows
API	API Gateway	API Management	API Gateway
Messaging	SQS/SNS	Service Bus	Pub/Sub
Streaming	Kinesis	Event Hubs	Dataflow
Database	DynamoDB	Cosmos DB	Firestore
Storage	S3	Blob Storage	Cloud Storage
Event Bus	EventBridge	Event Grid	Eventarc

2. Lambda Design Patterns

How you structure your Lambda functions dramatically impacts maintainability, performance, and cost.

2.1 Single Purpose Function

Each Lambda performs exactly one task. This is the most recommended pattern.

# order_create.py - handles only order creation
import json
import boto3
import os
from datetime import datetime

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['ORDERS_TABLE'])
sns = boto3.client('sns')

def handler(event, context):
    body = json.loads(event['body'])

    order = {
        'orderId': context.aws_request_id,
        'userId': body['userId'],
        'items': body['items'],
        'total': calculate_total(body['items']),
        'status': 'CREATED',
        'createdAt': datetime.utcnow().isoformat()
    }

    table.put_item(Item=order)

    # Publish event
    sns.publish(
        TopicArn=os.environ['ORDER_TOPIC'],
        Message=json.dumps(order),
        MessageAttributes={
            'eventType': {
                'DataType': 'String',
                'StringValue': 'OrderCreated'
            }
        }
    )

    return {
        'statusCode': 201,
        'body': json.dumps(order)
    }

def calculate_total(items):
    return sum(item['price'] * item['quantity'] for item in items)

Pros:

Small function size means faster cold starts
Independent deployment
Minimal IAM permissions (least privilege)
Easier debugging

Cons:

Many functions to manage
Need Layers for shared code
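One practical detail for the order handler above: the boto3 DynamoDB resource rejects Python floats and expects Decimal, so a price such as 19.99 would make put_item raise a TypeError. A minimal sketch of parsing the request body so that fractional prices arrive as Decimal; parse_order_body is a hypothetical helper name, not part of the original handler:

```python
import json
from decimal import Decimal

def parse_order_body(raw_body):
    """Parse an API Gateway request body, turning JSON floats into Decimal."""
    return json.loads(raw_body, parse_float=Decimal)

def calculate_total(items):
    # Decimal * int stays Decimal, so the total is safe to store in DynamoDB
    return sum(item["price"] * item["quantity"] for item in items)

body = parse_order_body('{"items": [{"price": 19.99, "quantity": 3}]}')
total = calculate_total(body["items"])
```

Because json.dumps cannot serialize Decimal by default, a response built from such an order would also need something like json.dumps(order, default=str).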

2.2 Monolithic Lambda (Lambda-lith)

A single Lambda handles multiple routes using frameworks like Express or FastAPI.

// app.ts - Monolithic Lambda
import express from 'express';
import serverless from 'serverless-http';

const app = express();
app.use(express.json());

// Multiple routes in a single Lambda
app.get('/orders', async (req, res) => {
  const orders = await getOrders(req.query);
  res.json(orders);
});

app.post('/orders', async (req, res) => {
  const order = await createOrder(req.body);
  res.status(201).json(order);
});

app.get('/orders/:id', async (req, res) => {
  const order = await getOrder(req.params.id);
  if (!order) return res.status(404).json({ error: 'Not found' });
  res.json(order);
});

app.put('/orders/:id/status', async (req, res) => {
  const order = await updateOrderStatus(req.params.id, req.body.status);
  res.json(order);
});

app.delete('/orders/:id', async (req, res) => {
  await cancelOrder(req.params.id);
  res.status(204).send();
});

export const handler = serverless(app);

Pros:

Migrate existing web framework code directly
Simpler function count management
Convenient local development

Cons:

Large package size means slower cold starts
Overly broad IAM permissions
One route's issue affects everything

2.3 Fan-out / Fan-in Pattern

A single event triggers multiple Lambdas simultaneously, then results are aggregated.

# serverless.yml - Fan-out architecture
service: order-processing

provider:
  name: aws
  runtime: nodejs20.x

functions:
  orderReceiver:
    handler: src/receiver.handler
    events:
      - http:
          path: /orders
          method: post
    environment:
      FAN_OUT_TOPIC: !Ref OrderFanOutTopic

  inventoryCheck:
    handler: src/inventory.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          filterPolicy:
            eventType:
              - OrderCreated

  paymentProcess:
    handler: src/payment.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          filterPolicy:
            eventType:
              - OrderCreated

  notificationSend:
    handler: src/notification.handler
    events:
      - sns:
          arn: !Ref OrderFanOutTopic
          filterPolicy:
            eventType:
              - OrderCreated

resources:
  Resources:
    OrderFanOutTopic:
      Type: AWS::SNS::Topic
      Properties:
        TopicName: order-fan-out
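The configuration above covers the fan-out half; the fan-in half (waiting for every branch and merging the results) is not shown. The following is a local simulation only, assuming three hypothetical worker functions and using threads in place of SNS-triggered Lambdas. In AWS the aggregation would more likely live in a Step Functions Parallel state or a shared store.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three subscriber Lambdas
def check_inventory(order):
    return {"step": "inventory", "ok": all(i["quantity"] > 0 for i in order["items"])}

def process_payment(order):
    return {"step": "payment", "ok": order["total"] > 0}

def send_notification(order):
    return {"step": "notification", "ok": True}

def fan_out_fan_in(order, workers):
    """Run every worker on the same order in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, order) for worker in workers]  # fan-out
        results = [future.result() for future in futures]             # fan-in
    return {
        "orderId": order["orderId"],
        "steps": {r["step"]: r["ok"] for r in results},
        "allOk": all(r["ok"] for r in results),
    }

order = {"orderId": "o-1", "total": 2000, "items": [{"quantity": 2}]}
summary = fan_out_fan_in(order, [check_inventory, process_payment, send_notification])
```

The merged summary is what a downstream step would inspect before marking the order as fully processed.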

2.4 Lambda Design Pattern Comparison

Pattern	Function Count	Cold Start	Deploy Unit	Recommended For
Single Purpose	Many	Fast	Individual	Microservices
Lambda-lith	Few	Slow	Monolithic	Migration
Fan-out	Medium	Fast	Individual	Parallel processing
Lambda Layer	Medium	Moderate	Layer + Function	Shared code

3. Step Functions: Workflow Orchestration

Step Functions is AWS's serverless workflow service. Complex business logic is defined as a state machine, which the console also renders as a visual graph.

3.1 Standard vs Express Workflow

Feature	Standard	Express
Max Execution	1 year	5 minutes
Execution Guarantee	Exactly-once	At-least-once
Pricing	Per state transition	Execution duration + memory
Execution History	90 days retention	CloudWatch Logs
Max Throughput	2,000 transitions/sec	100,000+/sec
Use Case	Long-running workflows	High-volume fast processing

3.2 State Types

{
  "Comment": "Order Processing Workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:validate-order",
      "Next": "CheckInventory",
      "Retry": [
        {
          "ErrorEquals": ["ServiceException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["ValidationError"],
          "Next": "OrderFailed"
        }
      ]
    },
    "CheckInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:check-inventory",
      "Next": "ProcessPaymentOrWait"
    },
    "ProcessPaymentOrWait": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.inventoryAvailable",
          "BooleanEquals": true,
          "Next": "ProcessPayment"
        }
      ],
      "Default": "WaitForInventory"
    },
    "WaitForInventory": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "CheckInventory"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:process-payment",
      "Next": "ParallelFulfillment"
    },
    "ParallelFulfillment": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "UpdateDatabase",
          "States": {
            "UpdateDatabase": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789:function:update-db",
              "End": true
            }
          }
        },
        {
          "StartAt": "SendNotification",
          "States": {
            "SendNotification": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789:function:send-notification",
              "End": true
            }
          }
        },
        {
          "StartAt": "InitiateShipping",
          "States": {
            "InitiateShipping": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789:function:initiate-shipping",
              "End": true
            }
          }
        }
      ],
      "Next": "OrderCompleted"
    },
    "OrderCompleted": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order validation or processing failed"
    }
  }
}
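The Retry block on ValidateOrder waits IntervalSeconds before the first retry and multiplies the wait by BackoffRate for each later attempt. A small sketch of that schedule, ignoring the optional MaxDelaySeconds and JitterStrategy fields:

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Return the wait in seconds before each retry attempt."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]

# Values taken from the ValidateOrder state above
delays = retry_delays(interval_seconds=2, max_attempts=3, backoff_rate=2.0)
```

With these settings the three retries are spaced 2, 4, and 8 seconds apart, so a persistently failing task reaches its Catch handlers after about 14 seconds of waiting plus execution time.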

3.3 State Type Summary

State Type	Purpose	Description
Task	Execute work	Invoke Lambda, DynamoDB, SQS, etc.
Choice	Conditional branching	if/else logic
Parallel	Parallel execution	Run multiple branches concurrently
Map	Iterative processing	Process each element in an array
Wait	Pause	Wait for specified time or timestamp
Pass	Data transformation	Transform input and pass through
Succeed	Success termination	Workflow completed successfully
Fail	Failure termination	Workflow failed

3.4 Callback Pattern (Human Approval Workflow)

Step Functions supports callback patterns that wait for external system responses.

# callback_handler.py - Lambda waiting for human approval
import json
import boto3
from urllib.parse import quote

sfn = boto3.client('stepfunctions')
ses = boto3.client('ses')

def request_approval(event, context):
    """Invoked by a Task state that uses .waitForTaskToken and passes the token in its payload"""
    task_token = event['taskToken']
    order = event['order']

    # Task tokens can contain characters such as '+', '/' and '=', so URL-encode them
    encoded_token = quote(task_token, safe='')

    # Send email with approval links
    approval_url = f"https://api.example.com/approve?token={encoded_token}"
    reject_url = f"https://api.example.com/reject?token={encoded_token}"

    ses.send_email(
        Source='noreply@example.com',
        Destination={'ToAddresses': ['manager@example.com']},
        Message={
            'Subject': {'Data': f"Order Approval Request: {order['orderId']}"},
            'Body': {
                'Html': {
                    'Data': f"""
                    <h2>Order Approval Request</h2>
                    <p>Order ID: {order['orderId']}</p>
                    <p>Amount: {order['total']}</p>
                    <a href="{approval_url}">Approve</a> |
                    <a href="{reject_url}">Reject</a>
                    """
                }
            }
        }
    )

def handle_approval(event, context):
    """Process approval/rejection callback"""
    params = event['queryStringParameters']
    task_token = params['token']

    if 'approve' in event['path']:
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        sfn.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Processed'})
    }
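For request_approval to receive a task token, the calling Task state has to use the .waitForTaskToken integration and inject the token from the context object. A sketch of such a state, built as a Python dict so it can be dumped to JSON; the function name, next state, and timeout are illustrative assumptions:

```python
import json

def approval_task_state(function_name, next_state, timeout_seconds=86400):
    """Build a Task state that pauses until SendTaskSuccess/SendTaskFailure is called."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                "taskToken.$": "$$.Task.Token",  # token from the context object
                "order.$": "$.order",            # order from the state input
            },
        },
        "TimeoutSeconds": timeout_seconds,  # fail the state if nobody responds in time
        "Next": next_state,
    }

# "request-approval" and "ProcessPayment" are hypothetical names for this sketch
state = approval_task_state("request-approval", "ProcessPayment")
print(json.dumps(state, indent=2))
```

Setting a timeout matters here: without one, an approval email that nobody answers keeps a Standard execution open until its one-year limit.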

4. Event-Driven Architecture Patterns

4.1 Event Sourcing with Lambda

# event_store.py
import json
import boto3
from datetime import datetime

dynamodb = boto3.resource('dynamodb')
event_store = dynamodb.Table('EventStore')
sns = boto3.client('sns')

def append_event(aggregate_id, event_type, data, version):
    """Store and publish event"""
    event = {
        'aggregateId': aggregate_id,
        'version': version,
        'eventType': event_type,
        'data': data,
        'timestamp': datetime.utcnow().isoformat(),
        'metadata': {
            'correlationId': data.get('correlationId', ''),
            'causationId': data.get('causationId', '')
        }
    }

    # Optimistic locking: fails if version already exists
    event_store.put_item(
        Item=event,
        ConditionExpression='attribute_not_exists(version)'
    )

    # Publish event
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789:domain-events',
        Message=json.dumps(event),
        MessageAttributes={
            'eventType': {
                'DataType': 'String',
                'StringValue': event_type
            }
        }
    )

    return event

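def replay_events(aggregate_id):
    """Replay all events for an aggregate, following pagination"""
    query_kwargs = {
        'KeyConditionExpression': 'aggregateId = :aid',
        'ExpressionAttributeValues': {':aid': aggregate_id},
        'ScanIndexForward': True,  # Ascending sort-key order (chronological when the sort key is version)
    }
    events = []
    while True:
        response = event_store.query(**query_kwargs)
        events.extend(response['Items'])
        # A single query returns at most 1 MB; continue from the last evaluated key
        if 'LastEvaluatedKey' not in response:
            return events
        query_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']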
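Replaying events is only useful if they are folded back into the aggregate's current state. A minimal sketch of that fold; OrderPaid and OrderCancelled are hypothetical event types added for illustration, and the event shape follows append_event above:

```python
def apply_event(state, event):
    """Return the next state after applying one event."""
    kind, data = event["eventType"], event["data"]
    if kind == "OrderCreated":
        return {"orderId": event["aggregateId"], "status": "CREATED", "total": data["total"]}
    if kind == "OrderPaid":
        return {**state, "status": "PAID"}
    if kind == "OrderCancelled":
        return {**state, "status": "CANCELLED"}
    return state  # ignore event types this projection does not know

def rebuild_state(events):
    """Fold events in version order to reconstruct the aggregate."""
    state = {}
    for event in sorted(events, key=lambda e: e["version"]):
        state = apply_event(state, event)
    return state

events = [
    {"aggregateId": "o-1", "version": 2, "eventType": "OrderPaid", "data": {}},
    {"aggregateId": "o-1", "version": 1, "eventType": "OrderCreated", "data": {"total": 2000}},
]
current = rebuild_state(events)
```

Sorting by version makes the fold independent of the order in which events were fetched.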

4.2 Saga Pattern with Step Functions

Implement the Saga pattern for distributed transactions using Step Functions.

{
  "Comment": "Order Saga - with compensating transactions",
  "StartAt": "ReserveInventory",
  "States": {
    "ReserveInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:reserve-inventory",
      "Next": "ProcessPayment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "InventoryReservationFailed"
      }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:process-payment",
      "Next": "ConfirmOrder",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "RollbackInventory"
      }]
    },
    "ConfirmOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:confirm-order",
      "Next": "SagaCompleted",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "RollbackPayment"
      }]
    },
    "RollbackPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:rollback-payment",
      "Next": "RollbackInventory"
    },
    "RollbackInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789:function:rollback-inventory",
      "Next": "SagaFailed"
    },
    "InventoryReservationFailed": {
      "Type": "Fail",
      "Error": "InventoryReservationFailed",
      "Cause": "Could not reserve inventory"
    },
    "SagaCompleted": {
      "Type": "Succeed"
    },
    "SagaFailed": {
      "Type": "Fail",
      "Error": "SagaFailed",
      "Cause": "Order saga failed, all compensations executed"
    }
  }
}
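The compensation flow in this state machine can be tried locally before wiring it to Lambda. The sketch below is a plain-Python simulation of the same idea, not Step Functions code: each step pairs an action with a compensation, and a failure undoes the completed steps in reverse order. The step functions are hypothetical stand-ins.

```python
def run_saga(steps, order):
    """steps: list of (action, compensation) pairs, executed in order."""
    completed = []
    for action, compensation in steps:
        try:
            action(order)
        except Exception as exc:
            # Undo the steps that already succeeded, most recent first
            for undo in reversed(completed):
                undo(order)
            return {"status": "FAILED", "reason": str(exc)}
        completed.append(compensation)
    return {"status": "COMPLETED"}

log = []

def reserve_inventory(order):
    log.append("reserve-inventory")

def rollback_inventory(order):
    log.append("rollback-inventory")

def process_payment(order):
    raise RuntimeError("card declined")  # simulate a payment failure

def rollback_payment(order):
    log.append("rollback-payment")

result = run_saga(
    [(reserve_inventory, rollback_inventory), (process_payment, rollback_payment)],
    {"orderId": "o-1"},
)
```

As in the state machine, a payment failure triggers only the inventory rollback, because the payment step never completed.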

4.3 Choreography vs Orchestration

Aspect	Choreography (Events)	Orchestration (Step Functions)
Coupling	Loose	Centralized
Visibility	Requires distributed tracing	Visible in state machine
Complexity	Hard to trace event flows	Clear workflow definition
Error Handling	Each service handles independently	Central retry/compensation
Best For	Simple event flows	Complex business logic

5. Cold Start Deep Dive

Cold start is one of serverless computing's biggest technical challenges. It is the latency incurred when a Lambda function starts in a new execution environment.

5.1 Cold Start Causes

Request arrives
  |
  v
[Execution env exists?] --No--> [Cold Start Path]
  |                                  |
  Yes                           1. Provision execution env
  |                             2. Download code (S3)
  v                             3. Initialize runtime
[Warm Start]                    4. Run init code outside handler
  |                             5. Execute handler
  v                                  |
[Execute handler]                    v
  |                             [Return response]
  v
[Return response]
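Because module-level code runs once per execution environment (step 4 of the cold start path), a module-level flag is enough to tell cold invocations from warm ones. A minimal sketch, useful for logging or custom metrics:

```python
_is_cold = True  # set once when the execution environment is initialized

def handler(event, context):
    global _is_cold
    cold, _is_cold = _is_cold, False  # only the first invocation sees True
    return {"coldStart": cold}

# Simulate two invocations in the same execution environment
first = handler({}, None)
second = handler({}, None)
```

The first call reports a cold start and every later call in the same environment reports a warm one.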

5.2 Cold Start Times by Runtime

Runtime	Avg Cold Start	P99 Cold Start	Package Size Impact
Python 3.12	150-300ms	500-800ms	Low
Node.js 20	150-350ms	500-900ms	Medium
Go (provided.al2023)	50-100ms	150-300ms	Very Low
Rust (provided.al2023)	30-80ms	100-250ms	Very Low
Java 21	800-3000ms	3000-8000ms	High
Java 21 (SnapStart)	100-200ms	300-500ms	Medium
.NET 8 (AOT)	200-400ms	600-1000ms	Medium

5.3 Cold Start Optimization Strategies

# Optimized Lambda function structure
import json
import os

# Initialize outside handler (reused across invocations)
# 1. Initialize connections globally
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

# 2. Remove unnecessary imports
# BAD: import pandas  (increases package size)
# GOOD: import only what you need

# 3. Optimize SDK configuration
from botocore.config import Config
config = Config(
    connect_timeout=5,
    read_timeout=5,
    retries={'max_attempts': 2}
)
s3 = boto3.client('s3', config=config)

def handler(event, context):
    """Keep the handler as lightweight as possible"""
    order_id = event['pathParameters']['orderId']

    response = table.get_item(Key={'orderId': order_id})
    item = response.get('Item')

    if not item:
        return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}

    # DynamoDB returns numbers as Decimal, which json.dumps cannot serialize by default
    return {'statusCode': 200, 'body': json.dumps(item, default=str)}

5.4 Provisioned Concurrency

# SAM template - Provisioned Concurrency configuration
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      MemorySize: 512
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10

  # Utilization-based auto scaling (target tracking)
  ScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 100
      MinCapacity: 5
      ResourceId: !Sub function:${MyFunction}:live
      ScalableDimension: lambda:function:ProvisionedConcurrency
      ServiceNamespace: lambda

  ScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: UtilizationScaling
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 0.7
        PredefinedMetricSpecification:
          PredefinedMetricType: LambdaProvisionedConcurrencyUtilization

5.5 Java SnapStart

// SnapStart-optimized Java Lambda
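import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.fasterxml.jackson.databind.ObjectMapper;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import org.crac.Core;
import org.crac.Resource;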

public class OrderHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>,
                                     Resource {

    private final DynamoDbClient dynamoDb;
    private final ObjectMapper objectMapper;

    public OrderHandler() {
        // SnapStart: this initialization code is included in the snapshot
        this.dynamoDb = DynamoDbClient.create();
        this.objectMapper = new ObjectMapper();
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Before snapshot: clean up connections
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // After restore: re-establish connections
        // Ensure uniqueness (reset random seeds, etc.)
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent event, Context context) {
        // Business logic
        return new APIGatewayProxyResponseEvent()
            .withStatusCode(200)
            .withBody("{\"message\": \"OK\"}");
    }
}

6. API Patterns

6.1 REST API with API Gateway

# SAM template - REST API
Resources:
  OrdersApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        DefaultAuthorizer: CognitoAuthorizer
        Authorizers:
          CognitoAuthorizer:
            UserPoolArn: !GetAtt UserPool.Arn
      MethodSettings:
        - HttpMethod: '*'
          ResourcePath: '/*'
          ThrottlingBurstLimit: 100
          ThrottlingRateLimit: 50
      Cors:
        AllowMethods: "'GET,POST,PUT,DELETE,OPTIONS'"
        AllowHeaders: "'Content-Type,Authorization'"
        AllowOrigin: "'https://example.com'"

  GetOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/orders/get.handler
      Runtime: nodejs20.x
      Events:
        GetOrder:
          Type: Api
          Properties:
            RestApiId: !Ref OrdersApi
            Path: /orders/{orderId}
            Method: get

6.2 GraphQL with AppSync

# AppSync Schema
type Order {
  orderId: ID!
  userId: String!
  items: [OrderItem!]!
  total: Float!
  status: OrderStatus!
  createdAt: AWSDateTime!
}

type OrderItem {
  productId: String!
  name: String!
  quantity: Int!
  price: Float!
}

enum OrderStatus {
  CREATED
  PAID
  SHIPPED
  DELIVERED
  CANCELLED
}

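# OrderConnection and CreateOrderInput are referenced below but were not defined;
# the shapes here are assumed for illustration.
type OrderConnection {
  items: [Order!]!
  nextToken: String
}

input CreateOrderInput {
  userId: String!
}

type Query {
  getOrder(orderId: ID!): Order
  listOrders(userId: String!, limit: Int, nextToken: String): OrderConnection!
}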

type Mutation {
  createOrder(input: CreateOrderInput!): Order!
  updateOrderStatus(orderId: ID!, status: OrderStatus!): Order!
}

type Subscription {
  onOrderStatusChanged(orderId: ID!): Order
    @aws_subscribe(mutations: ["updateOrderStatus"])
}

6.3 WebSocket with API Gateway

# websocket_handler.py
import json
import boto3
import os

dynamodb = boto3.resource('dynamodb')
connections_table = dynamodb.Table(os.environ['CONNECTIONS_TABLE'])

def connect(event, context):
    """WebSocket connection"""
    connection_id = event['requestContext']['connectionId']
    user_id = event['requestContext']['authorizer']['userId']

    connections_table.put_item(Item={
        'connectionId': connection_id,
        'userId': user_id
    })

    return {'statusCode': 200}

def disconnect(event, context):
    """WebSocket disconnection"""
    connection_id = event['requestContext']['connectionId']
    connections_table.delete_item(Key={'connectionId': connection_id})
    return {'statusCode': 200}

def send_message(event, context):
    """Send message"""
    domain = event['requestContext']['domainName']
    stage = event['requestContext']['stage']
    body = json.loads(event['body'])

    apigw = boto3.client(
        'apigatewaymanagementapi',
        endpoint_url=f'https://{domain}/{stage}'
    )

    # Broadcast to all connections
    connections = connections_table.scan()['Items']
    for conn in connections:
        try:
            apigw.post_to_connection(
                ConnectionId=conn['connectionId'],
                Data=json.dumps(body['message']).encode()
            )
        except apigw.exceptions.GoneException:
            connections_table.delete_item(
                Key={'connectionId': conn['connectionId']}
            )

    return {'statusCode': 200}
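The broadcast above calls scan() once, but a single scan returns at most 1 MB of data, so larger connection tables need pagination. A sketch of a generator that follows LastEvaluatedKey; FakeTable is an in-memory stand-in used only to exercise the loop and is not part of boto3:

```python
def iter_scan(table, **scan_kwargs):
    """Yield every item from Table.scan(), following pagination."""
    while True:
        page = table.scan(**scan_kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        scan_kwargs["ExclusiveStartKey"] = last_key

class FakeTable:
    """In-memory stand-in for a boto3 Table that returns one page per call."""
    def __init__(self, pages):
        self._pages = pages

    def scan(self, ExclusiveStartKey=None, **_):
        index = 0 if ExclusiveStartKey is None else ExclusiveStartKey["page"]
        page = {"Items": self._pages[index]}
        if index + 1 < len(self._pages):
            page["LastEvaluatedKey"] = {"page": index + 1}
        return page

table = FakeTable([[{"connectionId": "a"}], [{"connectionId": "b"}, {"connectionId": "c"}]])
ids = [item["connectionId"] for item in iter_scan(table)]
```

In send_message, the loop would then read `for conn in iter_scan(connections_table):` instead of indexing the first page.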

7. Data Patterns

7.1 DynamoDB Single-Table Design

# DynamoDB Single Table Design
# Store multiple entities in one table using PK/SK patterns

ENTITY_PATTERNS = {
    'User': {
        'PK': 'USER#user_id',
        'SK': 'PROFILE'
    },
    'Order': {
        'PK': 'USER#user_id',
        'SK': 'ORDER#order_id'
    },
    'OrderItem': {
        'PK': 'ORDER#order_id',
        'SK': 'ITEM#item_id'
    },
    'Product': {
        'PK': 'PRODUCT#product_id',
        'SK': 'DETAIL'
    }
}

# Queries by access pattern
def get_user_with_orders(user_id):
    """Fetch user and order list in a single query"""
    response = table.query(
        KeyConditionExpression='PK = :pk',
        ExpressionAttributeValues={':pk': f'USER#{user_id}'}
    )
    user = None
    orders = []
    for item in response['Items']:
        if item['SK'] == 'PROFILE':
            user = item
        elif item['SK'].startswith('ORDER#'):
            orders.append(item)
    return {'user': user, 'orders': orders}

def get_order_details(order_id):
    """Fetch order details and items in a single query"""
    response = table.query(
        KeyConditionExpression='PK = :pk',
        ExpressionAttributeValues={':pk': f'ORDER#{order_id}'}
    )
    return response['Items']
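The same key layout also supports narrower access patterns. When only a user's orders are needed, adding a begins_with condition on the sort key skips the PROFILE item instead of filtering it out in Python. A sketch that builds the query arguments; orders_query_kwargs is a hypothetical helper name:

```python
def orders_query_kwargs(user_id):
    """Arguments for table.query() that return only ORDER# items of one user."""
    return {
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": f"USER#{user_id}",
            ":sk": "ORDER#",
        },
    }

kwargs = orders_query_kwargs("u-42")
# Usage with a boto3 Table resource: table.query(**kwargs)
```

Because the condition is part of the key expression rather than a filter, DynamoDB reads only the matching items.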

7.2 Aurora Serverless v2

# Aurora Serverless v2 + Lambda
Resources:
  AuroraCluster:
    Type: AWS::RDS::DBCluster
    Properties:
      Engine: aurora-postgresql
      EngineVersion: '15.4'
      ServerlessV2ScalingConfiguration:
        MinCapacity: 0.5
        MaxCapacity: 16
      EnableHttpEndpoint: true  # Enable Data API

  AuroraInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBClusterIdentifier: !Ref AuroraCluster
      DBInstanceClass: db.serverless
      Engine: aurora-postgresql

  # Connection management with RDS Proxy
  RDSProxy:
    Type: AWS::RDS::DBProxy
    Properties:
      DBProxyName: orders-proxy
      EngineFamily: POSTGRESQL
      Auth:
        - AuthScheme: SECRETS
          SecretArn: !Ref DBSecret
          IAMAuth: REQUIRED
      VpcSubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2

7.3 S3 Event Processing Pipeline

# S3 Event -> Lambda -> DynamoDB pipeline
import boto3
import csv
import io
from datetime import datetime, timezone
from urllib.parse import unquote_plus

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProcessedData')

def process_csv_upload(event, context):
    """Process CSV files uploaded to S3"""
    processed = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record['s3']['object']['key'])

        # Read file from S3
        response = s3.get_object(Bucket=bucket, Key=key)
        content = response['Body'].read().decode('utf-8')

        # Parse CSV and batch write
        reader = csv.DictReader(io.StringIO(content))
        processed_at = datetime.now(timezone.utc).isoformat()

        with table.batch_writer() as batch:
            for row in reader:
                batch.put_item(Item={
                    'id': row['id'],
                    'data': row,
                    'sourceFile': key,
                    'processedAt': processed_at  # timestamp, not remaining execution time
                })
        processed.append(key)

    return {
        'statusCode': 200,
        'processedFiles': processed
    }

8. Messaging Service Selection Guide

Feature	SQS	SNS	EventBridge	Kinesis
Pattern	Queue (1:1)	Pub/Sub (1:N)	Event Bus (N:N)	Streaming
Ordering	FIFO only	FIFO only	None	Within partition
Max Message	256KB	256KB	256KB	1MB
Reprocessing	DLQ	DLQ	Archive/Replay	Retention period
Filtering	None	Message attributes	Event patterns	None
Latency	ms	ms	ms	ms
Throughput	Unlimited	Unlimited	Thousands/sec	1MB/s per shard
Pricing	Per request	Per publish	Per event	Per shard hour

8.2 Decision Tree

Messaging selection flow:
1. Need real-time streaming?
   -> Yes: Kinesis Data Streams
   -> No: continue

2. Deliver to multiple consumers simultaneously?
   -> Yes: continue
   -> No: SQS (simple queue)

3. Complex event routing/filtering?
   -> Yes: EventBridge
   -> No: SNS

4. Need event replay?
   -> Yes: EventBridge (archive) or Kinesis (retention)
   -> No: SNS/SQS
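The decision flow above can be written as a small function, which makes the precedence of the questions explicit. This is one reading of the tree, with the replay question folded into the EventBridge branch for non-streaming workloads; the parameter names are assumptions of this sketch.

```python
def choose_messaging_service(streaming, multiple_consumers, complex_routing, replay=False):
    """Walk the decision tree top to bottom and return the first match."""
    if streaming:
        return "Kinesis Data Streams"
    if not multiple_consumers:
        return "SQS"
    if complex_routing or replay:
        return "EventBridge"
    return "SNS"

choice = choose_messaging_service(
    streaming=False, multiple_consumers=True, complex_routing=False
)
```

A single-consumer workload lands on SQS regardless of the routing answers, because step 2 is evaluated before step 3.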

8.3 EventBridge Pattern Matching

{
  "source": ["com.myapp.orders"],
  "detail-type": ["OrderCreated"],
  "detail": {
    "total": [{"numeric": [">=", 10000]}],
    "status": ["CREATED"],
    "items": {
      "category": ["electronics", "premium"]
    }
  }
}
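To reason about which events this pattern accepts, it helps to evaluate it by hand. The matcher below is a simplified illustration written for this guide, not EventBridge's implementation: it supports only exact-value lists and the numeric operator, and its handling of arrays may differ from the service in edge cases.

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
        ">": operator.gt, ">=": operator.ge}

def _leaf_matches(conditions, value):
    """True if any condition accepts the value (or any element of a list value)."""
    values = value if isinstance(value, list) else [value]
    for cond in conditions:
        for v in values:
            if isinstance(cond, dict) and "numeric" in cond:
                spec = cond["numeric"]
                pairs = zip(spec[0::2], spec[1::2])  # (operator, bound) pairs
                if isinstance(v, (int, float)) and all(_OPS[op](v, bound) for op, bound in pairs):
                    return True
            elif not isinstance(cond, dict) and cond == v:
                return True
    return False

def matches(pattern, event):
    """Every key in the pattern must be present in the event and match."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(isinstance(c, dict) and matches(expected, c) for c in candidates):
                return False
        elif not _leaf_matches(expected, actual):
            return False
    return True

pattern = {
    "source": ["com.myapp.orders"],
    "detail-type": ["OrderCreated"],
    "detail": {
        "total": [{"numeric": [">=", 10000]}],
        "status": ["CREATED"],
        "items": {"category": ["electronics", "premium"]},
    },
}
event = {
    "source": "com.myapp.orders",
    "detail-type": "OrderCreated",
    "detail": {"total": 15000, "status": "CREATED",
               "items": [{"category": "books"}, {"category": "electronics"}]},
}
is_match = matches(pattern, event)
```

All listed fields must match, while each field accepts any one of its listed values, so an order below 10000 or without an electronics or premium item is not routed by this rule.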

9. Serverless Containers

9.1 Lambda vs Fargate vs Cloud Run

Feature	Lambda	Fargate	Cloud Run
Max Execution	15 min	Unlimited	60 min
Max Memory	10GB	120GB	32GB
vCPU	Up to 6	Up to 16	Up to 8
Scale to Zero	Yes	No (min 1 task)	Yes
Cold Start	Yes	None (always running)	Yes
Pricing	Duration + memory	vCPU + memory hours	Duration + memory
Container Image	Up to 10GB	Unlimited	Up to 32GB

9.2 Lambda Container Image

# Dockerfile - Lambda container image
FROM public.ecr.aws/lambda/python:3.12

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Application code
COPY app/ ./app/

# Specify Lambda handler
CMD ["app.main.handler"]

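# app/main.py
import json
import os

import joblib
import numpy as np  # Large dependencies OK (container image)

# Load a trained model once per cold start. An unfitted RandomForestClassifier()
# would raise NotFittedError on predict(). MODEL_PATH is an assumed location;
# the model file must be copied into the image, and scikit-learn must still be
# installed so the model can be unpickled.
MODEL_PATH = os.environ.get('MODEL_PATH', 'app/model.joblib')
model = joblib.load(MODEL_PATH)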

def handler(event, context):
    """ML inference Lambda"""
    features = np.array(event['features']).reshape(1, -1)
    prediction = model.predict(features)

    return {
        'statusCode': 200,
        'body': json.dumps({
            'prediction': prediction.tolist()
        })
    }

10. Cost Optimization

10.1 Lambda Cost Structure

Lambda Cost = Request Cost + Execution Duration Cost

Request Cost:
  - 1M free requests/month
  - ~$0.20 per 1M requests after

Execution Duration Cost (x86):
  - 128MB: $0.0000000021 / ms
  - 512MB: $0.0000000083 / ms
  - 1024MB: $0.0000000167 / ms
  - 1769MB (1 vCPU): $0.0000000288 / ms
  - 10240MB: $0.0000001667 / ms

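ARM64 (Graviton2) Pricing:
  - ~20% lower duration price than x86 ($0.0000133334 vs $0.0000166667 per GB-second)
  - Comparable or better performance for many workloads (benchmark before switching)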

Provisioned Concurrency Additional Cost:
  - Provisioning: $0.0000041667 / GB-second
  - Execution: ~$0.0000000097 / GB-ms ($0.0000097222 / GB-second, cheaper than on-demand)
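
As a worked example of the formula above, the sketch below estimates a monthly on-demand bill. The function name is hypothetical, the prices are approximate us-east-1 list prices and may change, and the compute free tier is ignored for simplicity.

```python
# Approximate us-east-1 list prices (assumptions; check the current pricing page)
REQUEST_PRICE_PER_MILLION = 0.20
GB_SECOND_PRICE = {"x86_64": 0.0000166667, "arm64": 0.0000133334}

def monthly_lambda_cost(invocations, avg_duration_ms, memory_mb,
                        arch="x86_64", free_requests=1_000_000):
    """Request cost + duration cost in USD, rounded to cents."""
    # Only requests beyond the monthly free allowance are billed
    billable_requests = max(invocations - free_requests, 0)
    request_cost = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    # Duration is billed in GB-seconds: seconds of runtime x memory in GB
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_cost = gb_seconds * GB_SECOND_PRICE[arch]
    return round(request_cost + duration_cost, 2)
```

With 10 million invocations per month at 200ms and 512MB, this gives about $18.47 on x86 and about $15.13 on ARM64, so most of the bill is duration rather than requests.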

10.2 Memory Optimization (Power Tuning)

# Using AWS Lambda Power Tuning
# Step Functions-based tool that finds optimal memory
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuning \
  --input '{
    "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1769, 3008],
    "num": 50,
    "payload": "{\"test\": true}",
    "parallelInvocation": true,
    "strategy": "cost"
  }'

Memory (MB)	Avg Duration	Cost per 1,000 Invocations (approx.)	Optimal?
128	2500ms	$0.0053	-
256	1200ms	$0.0051	-
512	600ms	$0.0050	Cost optimal
1024	350ms	$0.0058	-
1769	200ms	$0.0058	Performance optimal
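
The selection logic behind such a table can be sketched in a few lines. This is not the Power Tuning tool's own code: the helper names are hypothetical, the price is the x86 figure from 10.1, and costs computed from it may differ from the rounded table values in the last digit. When two settings cost the same, the sketch prefers the faster one.

```python
GB_SECOND_PRICE_X86 = 0.0000166667  # assumed us-east-1 on-demand price

def cost_per_1k(memory_mb, avg_duration_ms):
    """Duration cost in USD for 1,000 invocations at this memory setting."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000)
    return gb_seconds * GB_SECOND_PRICE_X86 * 1000

def summarize(measurements):
    """measurements maps memory (MB) to average duration (ms)."""
    costs = {mb: cost_per_1k(mb, ms) for mb, ms in measurements.items()}
    # Lowest cost wins; ties (after rounding) go to the shorter duration
    cheapest = min(costs, key=lambda mb: (round(costs[mb], 6), measurements[mb]))
    fastest = min(measurements, key=measurements.get)
    return cheapest, fastest, costs

# Durations taken from the example table
MEASURED = {128: 2500, 256: 1200, 512: 600, 1024: 350, 1769: 200}
cheapest, fastest, costs = summarize(MEASURED)
```

For the durations in the table, `cheapest` is 512 and `fastest` is 1769, which matches the "Cost optimal" and "Performance optimal" rows.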

10.3 Cost Reduction Checklist

Switch to ARM64 (Graviton2) - ~20% lower duration price, comparable performance for many workloads
Memory Power Tuning - Avoid over/under-provisioning
Set appropriate timeouts - Prevent runaway executions
Configure DLQ - Capture events that still fail after retries instead of reprocessing them repeatedly
Reserved Concurrency - Limit excessive scaling
Trim deployment packages (share common dependencies via Lambda Layers) - Less code to load on cold start
EventBridge Scheduler - Optimized alternative to CloudWatch Events
S3 Intelligent-Tiering - Auto-optimize based on access patterns
DynamoDB On-Demand - Best for unpredictable traffic
API Gateway Caching - Reduce Lambda invocations

11. Serverless vs Container Decision Framework

11.1 Comparison Matrix

Criteria	Serverless (Lambda)	Containers (ECS/K8s)
Execution Time	Max 15 min	Unlimited
Scaling Speed	Seconds	Minutes
Minimum Cost	$0 (when idle)	Always baseline cost
Max Throughput	Concurrency limited	Unlimited with pods
State Management	Stateless	Stateful possible
Warm-up	Cold starts present	Always running
Vendor Lock-in	High	Medium
Operational Burden	Very Low	High
Debugging	Harder	Easier
Networking	Limited	Full control

11.2 Decision Flow

Workload type assessment:
1. Execution time over 15 minutes? -> Containers
2. Constant traffic (hundreds of req/sec)? -> Containers (cost efficient)
3. Intermittent traffic? -> Serverless
4. GPU required? -> Containers
5. Special runtime needed? -> Containers
6. Rapid prototyping? -> Serverless
7. Long-lived WebSocket connections? -> Containers
8. Batch processing (large data)? -> Step Functions + Lambda or Containers
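
Step 2 of the flow is ultimately a break-even calculation. The sketch below estimates the sustained request rate at which Lambda's monthly cost reaches a fixed container bill. The function names are hypothetical, the Lambda prices are approximate us-east-1 x86 list prices, the free tier is ignored, and the container cost is an input you supply from your own estimate.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600

def lambda_cost_at_rate(req_per_sec, avg_duration_ms, memory_mb,
                        gb_second_price=0.0000166667,
                        price_per_million_requests=0.20):
    """Monthly Lambda cost in USD at a constant request rate."""
    invocations = req_per_sec * SECONDS_PER_MONTH
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return request_cost + gb_seconds * gb_second_price

def break_even_req_per_sec(container_monthly_cost, avg_duration_ms, memory_mb):
    """Request rate above which the fixed container bill is cheaper."""
    # Lambda cost grows linearly with the rate, so divide by the cost of 1 req/sec
    cost_per_rps = lambda_cost_at_rate(1, avg_duration_ms, memory_mb)
    return container_monthly_cost / cost_per_rps
```

As an illustration, with 100ms invocations at 512MB and a hypothetical $150/month container setup, the break-even point is roughly 56 requests per second sustained around the clock. Intermittent traffic well below that rate favors Lambda.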

12. Monitoring and Observability

12.1 Lambda Powertools

# Lambda Powertools - structured logging, tracing, metrics
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

logger = Logger()
tracer = Tracer()
metrics = Metrics()
app = APIGatewayRestResolver()

@app.get("/orders/<order_id>")
@tracer.capture_method
def get_order(order_id: str):
    logger.info("Fetching order", extra={"order_id": order_id})

    order = fetch_order(order_id)

    metrics.add_metric(name="OrderFetched", unit=MetricUnit.Count, value=1)
    metrics.add_dimension(name="Environment", value="production")

    return {"order": order}

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context):
    return app.resolve(event, context)

12.2 X-Ray Distributed Tracing

# X-Ray SDK for tracing external calls
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Automatically trace all AWS SDK calls
patch_all()

@xray_recorder.capture('process_order')
def process_order(order):
    # Create a subsegment; the context manager closes it and records
    # any exception that propagates out of the block
    with xray_recorder.in_subsegment('validate') as subsegment:
        try:
            validate_order(order)
            subsegment.put_annotation('valid', True)
        except Exception:
            subsegment.put_annotation('valid', False)
            raise

    # DynamoDB call (auto-traced)
    save_order(order)

    # SNS publish (auto-traced)
    publish_event(order)

12.3 CloudWatch Alarm Configuration

Resources:
  # Lambda error alarm: more than 5 errors in each of 2 consecutive 5-minute periods
  LambdaErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-high-error-rate
      MetricName: Errors
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 5
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

  # Lambda throttle alarm
  LambdaThrottleAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-throttled
      MetricName: Throttles
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 0
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

  # Concurrency utilization alarm
  ConcurrencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-high-concurrency
      MetricName: ConcurrentExecutions
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 3
      Threshold: 800
      ComparisonOperator: GreaterThanThreshold
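
How `Period`, `EvaluationPeriods`, and `Threshold` interact can be illustrated with a small simulation. This is a simplified model, not CloudWatch's implementation: the hypothetical `alarm_state` function assumes one datapoint per period, no missing data, and no `DatapointsToAlarm` setting, so every evaluated period must breach the threshold.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Evaluate a GreaterThanThreshold alarm over the most recent periods.

    datapoints: one aggregated value per period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # All of the last N periods must be strictly above the threshold
    return "ALARM" if all(value > threshold for value in recent) else "OK"
```

With the error alarm's settings (threshold 5, two periods), error sums of 7 and then 9 put the alarm into `ALARM`, while 7 followed by 3 keeps it `OK`. The throttle alarm (threshold 0, one period) fires on a single throttled invocation.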

13. Testing Strategies

13.1 Local Testing with SAM CLI

# Run Lambda locally with SAM CLI
sam local invoke MyFunction \
  --event events/api-gateway.json \
  --env-vars env.json

# Run local API server
sam local start-api --port 3000

# Use with DynamoDB Local
docker run -p 8000:8000 amazon/dynamodb-local
sam local invoke --docker-network host

13.2 Integration Tests

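# test_integration.py
import os

import boto3
import pytest
import requests

STACK_NAME = 'my-serverless-app'
API_URL = None

def get_test_token():
    """Return a bearer token for the test user.

    Hypothetical helper: reads TEST_AUTH_TOKEN from the environment.
    Replace it with your identity provider's login flow.
    """
    return os.environ['TEST_AUTH_TOKEN']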

@pytest.fixture(scope='session', autouse=True)
def setup():
    """Get API URL from CloudFormation stack"""
    global API_URL
    cfn = boto3.client('cloudformation')
    response = cfn.describe_stacks(StackName=STACK_NAME)
    outputs = response['Stacks'][0]['Outputs']
    API_URL = next(o['OutputValue'] for o in outputs if o['OutputKey'] == 'ApiUrl')

def test_create_order():
    """Order creation integration test"""
    import requests

    response = requests.post(
        f'{API_URL}/orders',
        json={
            'userId': 'test-user',
            'items': [
                {'productId': 'p1', 'name': 'Widget', 'quantity': 2, 'price': 1000}
            ]
        },
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )

    assert response.status_code == 201
    data = response.json()
    assert 'orderId' in data
    assert data['status'] == 'CREATED'
    assert data['total'] == 2000

def test_get_order():
    """Order retrieval integration test"""
    import requests

    # Create order first
    create_response = requests.post(
        f'{API_URL}/orders',
        json={
            'userId': 'test-user',
            'items': [{'productId': 'p1', 'name': 'Widget', 'quantity': 1, 'price': 500}]
        },
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )
    order_id = create_response.json()['orderId']

    # Fetch
    response = requests.get(
        f'{API_URL}/orders/{order_id}',
        headers={'Authorization': f'Bearer {get_test_token()}'}
    )

    assert response.status_code == 200
    assert response.json()['orderId'] == order_id

13.3 Unit Tests (Mocking)

# test_unit.py
import json
import os
from unittest.mock import MagicMock

import boto3
from moto import mock_aws  # moto >= 5; moto 4.x used mock_dynamodb / mock_sns

@mock_aws
def test_create_order_handler():
    """Lambda handler unit test"""
    # The handler module creates boto3 clients at import time, so set a region first
    os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

    # Create DynamoDB table
    dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
    table = dynamodb.create_table(
        TableName='orders',
        KeySchema=[{'AttributeName': 'orderId', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'orderId', 'AttributeType': 'S'}],
        BillingMode='PAY_PER_REQUEST'
    )

    # Create SNS topic
    sns = boto3.client('sns', region_name='us-east-1')
    topic = sns.create_topic(Name='order-events')

    os.environ['ORDERS_TABLE'] = 'orders'
    os.environ['ORDER_TOPIC'] = topic['TopicArn']

    # Import only after the mocks and environment variables are in place
    from src.orders.create import handler

    event = {
        'body': json.dumps({
            'userId': 'user123',
            'items': [{'productId': 'p1', 'name': 'Test', 'quantity': 1, 'price': 1000}]
        })
    }

    context = MagicMock()
    context.aws_request_id = 'test-request-id'

    response = handler(event, context)

    assert response['statusCode'] == 201
    body = json.loads(response['body'])
    assert body['userId'] == 'user123'
    assert body['total'] == 1000

14. Real-World Architecture Examples

14.1 E-commerce Order System

Client
  |
  v
[API Gateway] --> [Lambda: Create Order]
                     |
                     v
                  [DynamoDB: Store Order]
                     |
                     v
                  [EventBridge: Publish OrderCreated]
                     |
          +----------+----------+
          |          |          |
          v          v          v
     [Lambda:    [Lambda:    [Lambda:
      Inventory]  Payment]    Notify]
          |          |
          v          v
     [DynamoDB]  [Stripe API]
          |
          v
     [Step Functions: Shipping Workflow]
          |
          v
     [Lambda: Update Tracking]
          |
          v
     [WebSocket -> Client Real-time Notification]

14.2 Media Processing Pipeline

[S3: Original Upload]
  |
  v
[EventBridge: S3 Event]
  |
  v
[Step Functions: Media Pipeline]
  |
  +-> [Lambda: Extract Metadata]
  |
  +-> [Lambda: Generate Thumbnails]
  |
  +-> [Lambda: Start Video Transcoding]
  |      |
  |      v
  |   [MediaConvert]
  |      |
  |      v
  |   [Lambda: Handle Transcoding Complete]
  |
  +-> [Lambda: AI Tagging (Rekognition)]
  |
  v
[DynamoDB: Store Metadata]
  |
  v
[CloudFront: CDN Distribution]

15. Quiz

Q1. Which Lambda runtime has the longest cold start time?

Answer: Java (without SnapStart)

Java cold starts can range from 800ms to 8 seconds due to JVM initialization, class loading, and JIT compilation. With SnapStart, this drops dramatically to 100-200ms. Rust and Go compile to native binaries, achieving cold starts of 30-100ms.

Q2. What are the key differences between Step Functions Standard and Express?

Answer:

Standard: Max 1 year execution, Exactly-once, priced per state transition, 90-day execution history
Express: Max 5 minutes execution, At-least-once, priced by execution time/memory, can process 100,000+ events per second

Standard is ideal for long-running business workflows, while Express suits high-volume, fast data processing.

Q3. What is the difference between Provisioned Concurrency and Reserved Concurrency?

Answer:

Provisioned Concurrency: Pre-initializes Lambda instances to eliminate cold starts. Incurs additional cost
Reserved Concurrency: Limits the maximum concurrent executions for a specific function. No additional cost. Purpose is resource isolation from other functions

Provisioned is for performance guarantees, Reserved is for resource isolation.

Q4. What are the pros and cons of DynamoDB single-table design?

Answer:

Pros:

Fetch multiple entities in a single query (low latency)
Simple table management
Lower transaction costs

Cons:

Access patterns must be known in advance
Schema changes are difficult
Steep learning curve
Data migration is complex

Q5. When should you NOT choose Serverless?

Answer:

Long-running tasks exceeding 15 minutes
GPU-intensive ML training
Constant high traffic where containers are more cost-effective
Long-lived connections like WebSockets
Ultra-low latency requirements (cannot tolerate cold starts)
Complex network configurations needed

16. References

AWS Lambda Documentation - https://docs.aws.amazon.com/lambda/
AWS Step Functions Developer Guide - https://docs.aws.amazon.com/step-functions/
Serverless Application Model (SAM) - https://docs.aws.amazon.com/serverless-application-model/
Lambda Powertools for Python - https://docs.powertools.aws.dev/lambda/python/
DynamoDB Single-Table Design - https://www.alexdebrie.com/posts/dynamodb-single-table/
AWS Well-Architected Serverless Lens - https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/
Lambda Power Tuning - https://github.com/alexcasalboni/aws-lambda-power-tuning
Serverless Land - https://serverlessland.com/
EventBridge Patterns - https://docs.aws.amazon.com/eventbridge/latest/userguide/
Aurora Serverless v2 - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html
API Gateway REST API - https://docs.aws.amazon.com/apigateway/latest/developerguide/
X-Ray Distributed Tracing - https://docs.aws.amazon.com/xray/latest/devguide/
Serverless Framework - https://www.serverless.com/framework/docs/

Serverless Architecture Patterns Complete Guide 2025: Lambda, Step Functions, Event Sourcing, Cost Optimization

Table of Contents

1. What is Serverless

1.1 The 4 Principles of Serverless

1.2 History of Serverless Computing

1.3 Cloud Provider Serverless Services

2. Lambda Design Patterns

2.1 Single Purpose Function

2.2 Monolithic Lambda (Lambda-lith)

2.3 Fan-out / Fan-in Pattern

2.4 Lambda Design Pattern Comparison

3. Step Functions: Workflow Orchestration

3.1 Standard vs Express Workflow

3.2 State Types

3.3 State Type Summary

3.4 Callback Pattern (Human Approval Workflow)

4. Event-Driven Architecture Patterns

4.1 Event Sourcing with Lambda

4.2 Saga Pattern with Step Functions