IaC Patterns & Best Practices 2025: Terraform Modules, Pulumi, Crossplane, State Management
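1. The IaC Landscape 2025: Tool Comparison and Selection Criteria
Infrastructure as Code (IaC) is a core DevOps practice in which infrastructure is defined as code and kept under version control. As of 2025 the IaC ecosystem contains several mature tools, each with distinct strengths.
1.1 Major IaC Tool Comparison
| Tool | Language | State Management | Cloud Support | Key Features |
|---|---|---|---|---|
| Terraform/OpenTofu | HCL | Remote state file | Multi-cloud | Largest ecosystem, rich provider catalog |
| Pulumi | TS/Python/Go/C# | Pulumi Cloud/self-managed backend | Multi-cloud | General-purpose programming languages |
| CDKTF | TS/Python/Go/C# | Terraform backend | Multi-cloud | CDK constructs + Terraform providers |
| AWS CDK | TS/Python/Go/C# | CloudFormation | AWS only | AWS native, L2/L3 constructs |
| Crossplane | YAML (K8s CRD) | K8s etcd | Multi-cloud | K8s native, GitOps friendly |
| Ansible | YAML | No state file (procedural) | Multi-cloud | Configuration management + provisioning |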
1.2 Declarative vs Imperative IaC
Declarative IaC:
┌──────────────────────────────────────────────────┐
│ "I want 3 EC2 instances"                         │
│ → The tool diffs it against the current state    │
│   and computes the required changes              │
│ → Terraform, Pulumi, CloudFormation              │
└──────────────────────────────────────────────────┘
Imperative IaC:
┌──────────────────────────────────────────────────┐
│ "Create an EC2 instance, attach a security group"│
│ → Executes the commands in order                 │
│ → Ansible, Shell Scripts                         │
└──────────────────────────────────────────────────┘
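To make the declarative model concrete, here is a minimal TypeScript sketch of the diff step. Given a desired set of resources and the current set, it classifies each resource as create, update, delete, or no-op. The resource IDs and the flat string-attribute shape are hypothetical simplifications. Real tools also resolve dependencies, provider schemas, and computed attributes.

```typescript
// Minimal sketch of a declarative "plan" step (hypothetical resource shapes).
type Attrs = Record<string, string>;
type ResourceSet = Record<string, Attrs>; // resource ID -> attributes
type PlannedOp = "create" | "update" | "delete" | "no-op";

function sameAttrs(a: Attrs, b: Attrs): boolean {
  const keys = Object.keys(a).concat(Object.keys(b));
  return keys.every((k) => a[k] === b[k]);
}

// Compare desired resources with what currently exists and decide what to do.
function computePlan(desired: ResourceSet, current: ResourceSet): Record<string, PlannedOp> {
  const ops: Record<string, PlannedOp> = {};
  Object.keys(desired).forEach((id) => {
    const existing = current[id];
    if (existing === undefined) {
      ops[id] = "create";
    } else {
      ops[id] = sameAttrs(existing, desired[id]) ? "no-op" : "update";
    }
  });
  // Anything that exists but is no longer declared gets deleted.
  Object.keys(current).forEach((id) => {
    if (!(id in desired)) ops[id] = "delete";
  });
  return ops;
}

// "I want 3 EC2 instances": web-1 matches, web-2 has drifted,
// web-3 is missing, and old-1 is no longer declared.
const desired: ResourceSet = {
  "web-1": { type: "t3.micro" },
  "web-2": { type: "t3.micro" },
  "web-3": { type: "t3.micro" },
};
const current: ResourceSet = {
  "web-1": { type: "t3.micro" },
  "web-2": { type: "t3.small" },
  "old-1": { type: "t3.micro" },
};
console.log(computePlan(desired, current));
```

An imperative script would need explicit steps for each of these four cases. The declarative tool derives them from the difference between the two sets.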
1.3 OpenTofu vs Terraform
After HashiCorp moved Terraform to the Business Source License (BSL) in 2023, the community forked it as OpenTofu, which is now a Linux Foundation project.
# OpenTofu and Terraform share the same HCL syntax
# tofu init (OpenTofu CLI) and terraform init behave the same way
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "ap-northeast-2"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
2. Terraform Module Design Patterns
Terraform modules are the core unit of reusable infrastructure code. Good module design largely determines how consistent and productive infrastructure work is across an organization.
2.1 Flat Modules vs Nested Modules
Flat module structure (recommended):
modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── versions.tf
├── ecs-cluster/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── versions.tf
└── rds/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── versions.tf
Nested module structure (adds complexity):
modules/
└── platform/
    ├── main.tf          # calls the vpc, ecs, and rds modules
    ├── modules/
    │   ├── vpc/
    │   ├── ecs-cluster/
    │   └── rds/
    └── outputs.tf
2.2 Composition Pattern
Module composition builds larger infrastructure by combining small modules and wiring one module's outputs into another module's inputs.
# environments/prod/main.tf
# Composition pattern: combine small modules
module "vpc" {
  source             = "../../modules/vpc"
  name               = "prod-vpc"
  cidr_block         = "10.0.0.0/16"
  azs                = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"]
  private_subnets    = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets     = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway = true
  single_nat_gateway = false # production: one NAT gateway per AZ
}
module "ecs_cluster" {
  source             = "../../modules/ecs-cluster"
  name               = "prod-cluster"
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.private_subnet_ids
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}
module "rds" {
  source             = "../../modules/rds"
  name               = "prod-db"
  engine             = "aurora-postgresql"
  engine_version     = "15.4"
  instance_class     = "db.r6g.xlarge"
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.database_subnet_ids
  security_group_ids = [module.ecs_cluster.security_group_id]
}
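Terraform derives the apply order from references such as module.vpc.vpc_id rather than from the order of blocks in the file. The sketch below approximates that idea with a topological sort (Kahn's algorithm) over the three modules. It is an assumption-level simplification: Terraform actually builds its graph from individual resources and runs independent nodes in parallel.

```typescript
// Hypothetical dependency map mirroring the module references above:
// ecs_cluster reads module.vpc outputs; rds reads vpc and ecs_cluster outputs.
const deps: Record<string, string[]> = {
  vpc: [],
  ecs_cluster: ["vpc"],
  rds: ["vpc", "ecs_cluster"],
};

// Kahn's algorithm: repeatedly take nodes whose dependencies are all done.
function applyOrder(graph: Record<string, string[]>): string[] {
  const remaining: Record<string, number> = {};
  Object.keys(graph).forEach((n) => {
    remaining[n] = graph[n].length;
  });
  const order: string[] = [];
  const ready = Object.keys(graph).filter((n) => remaining[n] === 0);
  while (ready.length > 0) {
    const done = ready.shift() as string;
    order.push(done);
    // Every node that depends on `done` has one fewer unmet dependency.
    Object.keys(graph).forEach((m) => {
      if (graph[m].indexOf(done) !== -1) {
        remaining[m] -= 1;
        if (remaining[m] === 0) ready.push(m);
      }
    });
  }
  if (order.length !== Object.keys(graph).length) throw new Error("dependency cycle");
  return order;
}

console.log(applyOrder(deps)); // [ 'vpc', 'ecs_cluster', 'rds' ]
```

Destroy runs the same graph in reverse, which is why rds would be removed before the VPC it depends on.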
2.3 Factory Pattern
The factory pattern instantiates the same module several times from a single map, using for_each.
# Service factory pattern
variable "services" {
  type = map(object({
    cpu               = number
    memory            = number
    port              = number
    count             = number
    health_check_path = string
  }))
  default = {
    "api-gateway" = {
      cpu               = 512
      memory            = 1024
      port              = 8080
      count             = 3
      health_check_path = "/health"
    }
    "user-service" = {
      cpu               = 256
      memory            = 512
      port              = 8081
      count             = 2
      health_check_path = "/actuator/health"
    }
    "order-service" = {
      cpu               = 512
      memory            = 1024
      port              = 8082
      count             = 2
      health_check_path = "/health"
    }
  }
}
module "ecs_services" {
  source            = "../../modules/ecs-service"
  for_each          = var.services
  name              = each.key
  cluster_id        = module.ecs_cluster.cluster_id
  cpu               = each.value.cpu
  memory            = each.value.memory
  container_port    = each.value.port
  desired_count     = each.value.count
  health_check_path = each.value.health_check_path
  subnet_ids        = module.vpc.private_subnet_ids
}
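A practical reason to drive a factory with for_each over a map, rather than count over a list, is address stability. With for_each, instances are addressed by map key (module.ecs_services["user-service"]). With count, they are addressed by position. The TypeScript sketch below models only the address-to-service binding and shows which addresses change when "user-service" is removed. It is a simplified illustration, not Terraform's planner.

```typescript
// Maps each instance address suffix to the service it configures.
// keyed = true mimics for_each (["name"]); keyed = false mimics count ([index]).
function bindInstances(names: string[], keyed: boolean): Record<string, string> {
  const out: Record<string, string> = {};
  names.forEach((n, i) => {
    out[keyed ? `["${n}"]` : `[${i}]`] = n;
  });
  return out;
}

// Addresses whose bound service changes (including ones that disappear)
// would be updated, replaced, or destroyed by the next plan.
function affectedAddresses(before: Record<string, string>, after: Record<string, string>): string[] {
  const all = Object.keys(before).concat(Object.keys(after).filter((a) => !(a in before)));
  return all.filter((a) => before[a] !== after[a]);
}

const beforeServices = ["api-gateway", "user-service", "order-service"];
const afterServices = ["api-gateway", "order-service"]; // user-service removed

console.log(affectedAddresses(bindInstances(beforeServices, true), bindInstances(afterServices, true)));
// [ '["user-service"]' ]
console.log(affectedAddresses(bindInstances(beforeServices, false), bindInstances(afterServices, false)));
// [ '[1]', '[2]' ]
```

With keys, only the removed service is touched. With indexes, order-service shifts from [2] to [1], so an unrelated service would be rewritten as well.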
2.4 Wrapper Module Pattern
A wrapper module wraps a community module so that organizational standards are always applied.
# modules/org-s3-bucket/main.tf
# Wrapper module that enforces org standards
module "s3_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 4.0"
  bucket  = var.bucket_name
  # Org standard: always encrypt
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = var.kms_key_id
      }
    }
  }
  # Org standard: block public access
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
  # Org standard: enable versioning
  versioning = {
    enabled = true
  }
  # Org standard: tags (keys listed here override the same keys in var.tags)
  tags = merge(var.tags, {
    ManagedBy   = "terraform"
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  })
}
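The tag block relies on how Terraform's merge() resolves duplicate keys: a later argument wins. Because the org-standard map comes second, callers cannot override ManagedBy, Team, Environment, or CostCenter through var.tags. A minimal TypeScript sketch of that precedence rule follows. The tag values are hypothetical.

```typescript
type Tags = Record<string, string>;

// Mirrors Terraform's merge(): later arguments win on duplicate keys.
function mergeLikeTerraform(maps: Tags[]): Tags {
  const out: Tags = {};
  maps.forEach((m) => {
    Object.keys(m).forEach((k) => {
      out[k] = m[k];
    });
  });
  return out;
}

// Caller-supplied tags (var.tags) try to set ManagedBy themselves.
const callerTags: Tags = { Owner: "alice", ManagedBy: "manual" };
// Org-standard tags come second, as in merge(var.tags, {...}).
const orgTags: Tags = { ManagedBy: "terraform", Team: "payments", Environment: "prod", CostCenter: "cc-1234" };

const effectiveTags = mergeLikeTerraform([callerTags, orgTags]);
console.log(effectiveTags["ManagedBy"]); // terraform
console.log(effectiveTags["Owner"]);     // alice
```

Swapping the argument order would let callers override org tags, so the order in the wrapper is a deliberate design choice.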
3. Terraform Best Practices
3.1 Remote State Management
# Bootstrap the infrastructure that stores Terraform state
# bootstrap/main.tf
data "aws_caller_identity" "current" {}
resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state-${data.aws_caller_identity.current.account_id}"
  lifecycle {
    prevent_destroy = true
  }
}
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
  tags = {
    Name      = "Terraform State Lock Table"
    ManagedBy = "terraform"
  }
}
3.2 Workspaces vs Directory Strategy
Directory strategy (recommended):
infrastructure/
├── modules/              # reusable modules
│   ├── vpc/
│   ├── ecs/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf    # dev-only state file
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── global/               # shared resources (IAM, Route53)
    ├── iam/
    └── dns/
Workspace strategy (use with care):
- Same code, separate state per workspace → hard to manage once environments diverge
- terraform workspace select dev
- terraform workspace select prod
- All state files live in the same backend → hard to separate access permissions
3.3 Provider Version Management
# versions.tf
terraform {
  required_version = ">= 1.6.0, < 2.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.30" # allows >= 5.30 and < 6.0 (e.g. 5.31, 5.45); 6.0 is excluded
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.24, < 3.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}
# Always commit the .terraform.lock.hcl file to Git
# Run terraform init -upgrade to move providers to the newest versions the constraints allow
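The ~> (pessimistic) operator is easy to misread: only the rightmost component written in the constraint may increase. Below is a minimal TypeScript sketch of that rule for plain numeric MAJOR.MINOR[.PATCH] versions. Pre-release suffixes and the other constraint operators are deliberately out of scope.

```typescript
function parseVersion(v: string): number[] {
  return v.split(".").map((part) => {
    if (!/^\d+$/.test(part)) throw new Error(`unsupported version component: ${part}`);
    return parseInt(part, 10);
  });
}

// Compare component-wise, treating missing components as 0.
function compareVersions(a: number[], b: number[]): number {
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const diff = (a[i] || 0) - (b[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// "~> X.Y"   means >= X.Y   and < (X+1).0
// "~> X.Y.Z" means >= X.Y.Z and < X.(Y+1).0
function satisfiesPessimistic(version: string, constraint: string): boolean {
  const lower = parseVersion(constraint);
  if (lower.length < 2) throw new Error("expected at least MAJOR.MINOR");
  const upper = lower.slice(0, -1);
  upper[upper.length - 1] += 1;
  const v = parseVersion(version);
  return compareVersions(v, lower) >= 0 && compareVersions(v, upper) < 0;
}

console.log(satisfiesPessimistic("5.45.2", "5.30"));   // true
console.log(satisfiesPessimistic("6.0.0", "5.30"));    // false
console.log(satisfiesPessimistic("5.31.0", "5.30.1")); // false
```

To pin to a single minor series such as 5.30.x, the constraint needs three components ("~> 5.30.0").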
3.4 Variable Validation and Type Safety
variable "environment" {
  type        = string
  description = "Deployment environment"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of dev, staging, prod."
  }
}
variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  validation {
    condition     = can(regex("^(t3|m6i|c6i|r6i)\\.", var.instance_type))
    error_message = "Allowed instance families: t3, m6i, c6i, r6i."
  }
}
variable "cidr_block" {
  type        = string
  description = "VPC CIDR block"
  validation {
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "Enter a valid CIDR block."
  }
}
# Complex (object) type variable
variable "scaling_config" {
  type = object({
    min_size     = number
    max_size     = number
    desired_size = number
  })
  validation {
    condition     = var.scaling_config.min_size <= var.scaling_config.desired_size
    error_message = "min_size must be less than or equal to desired_size."
  }
  validation {
    condition     = var.scaling_config.desired_size <= var.scaling_config.max_size
    error_message = "desired_size must be less than or equal to max_size."
  }
}
4. Pulumi Deep Dive: IaC in Programming Languages
Pulumi defines infrastructure in general-purpose languages such as TypeScript, Python, Go, and C#, so conditionals, loops, functions, and classes are all available for building abstractions.
4.1 Basic Structure of a Pulumi TypeScript Program
// index.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const config = new pulumi.Config();
const environment = config.require("environment");
const vpcCidr = config.get("vpcCidr") || "10.0.0.0/16";
// Create the VPC
const vpc = new aws.ec2.Vpc("main-vpc", {
  cidrBlock: vpcCidr,
  enableDnsHostnames: true,
  enableDnsSupport: true,
  tags: {
    Name: `${environment}-vpc`,
    Environment: environment,
    ManagedBy: "pulumi",
  },
});
// Public subnets, built with an ordinary array map() instead of a DSL loop construct
// (the 10.0.x.0/24 CIDRs assume the default 10.0.0.0/16 VPC range)
const azs = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"];
const publicSubnets = azs.map((az, index) => {
  return new aws.ec2.Subnet(`public-subnet-${index}`, {
    vpcId: vpc.id,
    cidrBlock: `10.0.${index + 1}.0/24`,
    availabilityZone: az,
    mapPublicIpOnLaunch: true,
    tags: {
      Name: `${environment}-public-${az}`,
      Type: "public",
    },
  });
});
// Stack outputs
export const vpcId = vpc.id;
export const publicSubnetIds = publicSubnets.map(s => s.id);
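4.2 Pulumi Component Resources
// components/ecs-service.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
interface EcsServiceArgs {
  clusterArn: pulumi.Input<string>;
  vpcId: pulumi.Input<string>;          // for a security group / target group (omitted here)
  subnetIds: pulumi.Input<string>[];
  containerImage: string;
  cpu: number;
  memory: number;
  port: number;
  desiredCount: number;
  healthCheckPath: string;              // for an ALB health check (omitted here)
}
export class EcsService extends pulumi.ComponentResource {
  public readonly serviceName: pulumi.Output<string>;
  public readonly taskDefinitionArn: pulumi.Output<string>;
  constructor(
    name: string,
    args: EcsServiceArgs,
    opts?: pulumi.ComponentResourceOptions
  ) {
    super("custom:app:EcsService", name, {}, opts);
    const logGroup = new aws.cloudwatch.LogGroup(`${name}-logs`, {
      retentionInDays: 30,
      tags: { Service: name },
    }, { parent: this });
    const assumeRolePolicy = JSON.stringify({
      Version: "2012-10-17",
      Statement: [{
        Action: "sts:AssumeRole",
        Effect: "Allow",
        Principal: { Service: "ecs-tasks.amazonaws.com" },
      }],
    });
    // Execution role: used by ECS itself to pull the image and write logs
    const executionRole = new aws.iam.Role(`${name}-exec-role`, { assumeRolePolicy }, { parent: this });
    new aws.iam.RolePolicyAttachment(`${name}-exec-policy`, {
      role: executionRole.name,
      policyArn: "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
    }, { parent: this });
    // Task role: assumed by the application code running in the container
    const taskRole = new aws.iam.Role(`${name}-task-role`, { assumeRolePolicy }, { parent: this });
    const taskDefinition = new aws.ecs.TaskDefinition(`${name}-task`, {
      family: name,
      cpu: args.cpu.toString(),
      memory: args.memory.toString(),
      networkMode: "awsvpc",
      requiresCompatibilities: ["FARGATE"],
      executionRoleArn: executionRole.arn,
      taskRoleArn: taskRole.arn,
      // logGroup.name is an Output, so build the JSON inside apply()
      // instead of calling JSON.stringify on the Output itself
      containerDefinitions: logGroup.name.apply((logGroupName) => JSON.stringify([{
        name: name,
        image: args.containerImage,
        cpu: args.cpu,
        memory: args.memory,
        portMappings: [{ containerPort: args.port }],
        logConfiguration: {
          logDriver: "awslogs",
          options: {
            "awslogs-group": logGroupName,
            "awslogs-region": "ap-northeast-2",
            "awslogs-stream-prefix": "ecs",
          },
        },
      }])),
    }, { parent: this });
    const service = new aws.ecs.Service(`${name}-svc`, {
      cluster: args.clusterArn,
      taskDefinition: taskDefinition.arn,
      desiredCount: args.desiredCount,
      launchType: "FARGATE",
      networkConfiguration: { subnets: args.subnetIds, assignPublicIp: false },
    }, { parent: this });
    this.serviceName = service.name;
    this.taskDefinitionArn = taskDefinition.arn;
    this.registerOutputs({
      serviceName: this.serviceName,
      taskDefinitionArn: this.taskDefinitionArn,
    });
  }
}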
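4.3 Pulumi Stack References
// Sharing data between stacks
// --- infrastructure/index.ts: export the VPC details
// (privateSubnets is assumed to be created like publicSubnets in 4.1)
export const vpcId = vpc.id;
export const privateSubnetIds = privateSubnets.map(s => s.id);
// --- application/index.ts: read them from the infrastructure stack
import * as pulumi from "@pulumi/pulumi";
// Stack reference format: <organization>/<project>/<stack>
const infraStack = new pulumi.StackReference("org/infrastructure/prod");
const vpcId = infraStack.getOutput("vpcId");
const subnetIds = infraStack.getOutput("privateSubnetIds");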
4.4 Pulumi Policy as Code (CrossGuard)
// policy-pack/index.ts
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";
import * as aws from "@pulumi/aws";
new PolicyPack("aws-security", {
  policies: [
    {
      name: "s3-no-public-read",
      description: "S3 buckets must not allow public reads",
      enforcementLevel: "mandatory",
      validateResource: validateResourceOfType(aws.s3.BucketAclV2, (acl, args, reportViolation) => {
        if (acl.acl === "public-read" || acl.acl === "public-read-write") {
          reportViolation("A public ACL is set on the S3 bucket.");
        }
      }),
    },
    {
      name: "ec2-require-tags",
      description: "EC2 instances must carry the required tags",
      enforcementLevel: "mandatory",
      validateResource: validateResourceOfType(aws.ec2.Instance, (instance, args, reportViolation) => {
        const requiredTags = ["Name", "Environment", "Team", "CostCenter"];
        const tags = instance.tags || {};
        for (const tag of requiredTags) {
          if (!(tag in tags)) {
            reportViolation(`Required tag '${tag}' is missing.`);
          }
        }
      }),
    },
  ],
});
5. Crossplane: Kubernetes-Native IaC
Crossplane manages cloud resources through Kubernetes CRDs (Custom Resource Definitions), so AWS, GCP, and Azure resources can be created and managed with kubectl.
5.1 Crossplane Architecture
Crossplane architecture:
┌────────────────────────────────────────────────────┐
│ K8s Cluster                                        │
│ ┌────────────────────────────────────────────────┐ │
│ │ Crossplane Core                                │ │
│ │ ┌───────────┐  ┌─────────────┐  ┌───────────┐  │ │
│ │ │ Composite │  │ Composition │  │    XRD    │  │ │
│ │ │ Resource  │  │             │  │           │  │ │
│ │ └─────┬─────┘  └──────┬──────┘  └─────┬─────┘  │ │
│ │       └───────────────┼───────────────┘        │ │
│ └───────────────────────┼────────────────────────┘ │
│                         ▼                          │
│ ┌────────────────────────────────────────────────┐ │
│ │ Providers                                      │ │
│ │ ┌───────────┐  ┌─────────────┐  ┌───────────┐  │ │
│ │ │ AWS Prov. │  │  GCP Prov.  │  │ Azure P.  │  │ │
│ │ └─────┬─────┘  └──────┬──────┘  └─────┬─────┘  │ │
│ └───────┼───────────────┼───────────────┼────────┘ │
└─────────┼───────────────┼───────────────┼──────────┘
          ▼               ▼               ▼
      AWS Cloud       GCP Cloud      Azure Cloud
5.2 Defining an XRD (Composite Resource Definition)
# xrd.yaml - API schema defined by the platform team
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgresql", "mysql"]
                  description: "Database engine"
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                  description: "Instance size"
                region:
                  type: string
                  default: "ap-northeast-2"
              required:
                - engine
                - size
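5.3 Defining a Composition
# composition.yaml - maps the XRD onto concrete cloud resources
# Uses the classic "resources" (Patch & Transform) mode; newer Crossplane releases
# favor mode: Pipeline with function-patch-and-transform
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.aws.platform.example.com
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "15"
            instanceClass: db.t3.medium
            allocatedStorage: 20
            publiclyAccessible: false
            skipFinalSnapshot: true
            # Use the subnet group composed below
            dbSubnetGroupNameSelector:
              matchControllerRef: true
      patches:
        # RDS expects "postgres"/"mysql", not the claim's "postgresql"
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.engine"
          toFieldPath: "spec.forProvider.engine"
          transforms:
            - type: map
              map:
                postgresql: postgres
                mysql: mysql
        # Choose an engine version that matches the selected engine
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.engine"
          toFieldPath: "spec.forProvider.engineVersion"
          transforms:
            - type: map
              map:
                postgresql: "15"
                mysql: "8.0"
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.size"
          toFieldPath: "spec.forProvider.instanceClass"
          transforms:
            - type: map
              map:
                small: db.t3.medium
                medium: db.r6g.large
                large: db.r6g.xlarge
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.region"
          toFieldPath: "spec.forProvider.region"
    - name: subnet-group
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: SubnetGroup
        spec:
          forProvider:
            description: "Crossplane managed subnet group"
            # subnetIds (or a subnet selector) must also be supplied; omitted here
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.region"
          toFieldPath: "spec.forProvider.region"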
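The patches above copy fields from the composite resource into the composed resource's base, optionally passing them through a map transform. Below is a minimal TypeScript model of that mechanism, limited to FromCompositeFieldPath with an optional map. Crossplane's real patch engine supports many more patch types, transforms, and policies. This sketch simply skips a patch whose source field is absent. The engine map (postgresql → postgres) appears because RDS identifies PostgreSQL as postgres.

```typescript
type Json = { [key: string]: unknown };

interface FieldPatch {
  fromFieldPath: string;
  toFieldPath: string;
  map?: { [from: string]: string }; // optional "map" transform
}

function getPath(obj: Json, path: string): unknown {
  let cur: unknown = obj;
  for (const part of path.split(".")) {
    if (typeof cur !== "object" || cur === null) return undefined;
    cur = (cur as Json)[part];
  }
  return cur;
}

function setPath(obj: Json, path: string, value: unknown): void {
  const parts = path.split(".");
  let cur = obj;
  for (const part of parts.slice(0, parts.length - 1)) {
    const next = cur[part];
    if (typeof next !== "object" || next === null) cur[part] = {};
    cur = cur[part] as Json;
  }
  cur[parts[parts.length - 1]] = value;
}

// Deep-copy the base resource, then apply each patch in order.
function renderResource(composite: Json, base: Json, patches: FieldPatch[]): Json {
  const rendered = JSON.parse(JSON.stringify(base)) as Json;
  patches.forEach((p) => {
    let value = getPath(composite, p.fromFieldPath);
    if (value === undefined) return; // source field absent: skip this patch
    if (p.map) {
      const key = String(value);
      if (!(key in p.map)) throw new Error(`no map entry for "${key}"`);
      value = p.map[key];
    }
    setPath(rendered, p.toFieldPath, value);
  });
  return rendered;
}

const composite: Json = { spec: { engine: "postgresql", size: "medium", region: "ap-northeast-2" } };
const rdsBase: Json = { spec: { forProvider: { engine: "postgres", engineVersion: "15", instanceClass: "db.t3.medium" } } };

const rendered = renderResource(composite, rdsBase, [
  { fromFieldPath: "spec.engine", toFieldPath: "spec.forProvider.engine", map: { postgresql: "postgres", mysql: "mysql" } },
  { fromFieldPath: "spec.size", toFieldPath: "spec.forProvider.instanceClass", map: { small: "db.t3.medium", medium: "db.r6g.large", large: "db.r6g.xlarge" } },
  { fromFieldPath: "spec.region", toFieldPath: "spec.forProvider.region" },
]);
console.log(JSON.stringify(getPath(rendered, "spec.forProvider")));
// {"engine":"postgres","engineVersion":"15","instanceClass":"db.r6g.large","region":"ap-northeast-2"}
```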
5.4 Claim-Based Provisioning
# claim.yaml - the resource a developer requests
apiVersion: platform.example.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: orders-team
spec:
  engine: postgresql
  size: medium
  region: ap-northeast-2
# What the developer runs
kubectl apply -f claim.yaml
kubectl get databases -n orders-team
# NAME        SYNCED   READY   CONNECTION-SECRET   AGE
# orders-db   True     True                        5m
6. State Management in Depth
6.1 State Backend Comparison
S3 + DynamoDB (AWS):
┌──────────────────┐   ┌──────────────────┐
│ S3 Bucket        │   │ DynamoDB         │
│ (state storage)  │   │ (locking)        │
│ versioning ON    │   │ table keyed by   │
│ encryption ON    │   │ LockID hash key  │
└──────────────────┘   └──────────────────┘
GCS (GCP):
┌─────────────────────────────────┐
│ GCS Bucket                      │
│ (state storage + built-in lock) │
│ versioning ON, encryption ON    │
└─────────────────────────────────┘
Terraform Cloud / Spacelift:
┌─────────────────────────────────┐
│ Managed state storage           │
│ Built-in locking, state history │
│ Access control, audit logs      │
└─────────────────────────────────┘
6.2 State Locking and Concurrency
# Force-release a stuck state lock (first make sure no other run is in progress)
terraform force-unlock LOCK_ID
# Inspect the state
terraform state list
terraform state show aws_instance.web
# Remove a resource from state without destroying it
terraform state rm aws_instance.legacy
# Bring an existing resource into state
terraform import aws_instance.web i-1234567890abcdef0
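Conceptually, the S3 backend's DynamoDB table acts as a mutex keyed by LockID. A run writes a lock item only if none exists for that state, and removes it when done. force-unlock deletes the item by lock ID whether or not its holder is still running. The class below models that idea in memory. It is a conceptual sketch rather than the backend's implementation, and the state key and lock IDs are hypothetical.

```typescript
// Conceptual model of a state-lock table: one lock item per state key.
interface LockInfo {
  id: string;        // lock ID, the value force-unlock expects
  who: string;
  operation: string;
}

class InMemoryLockTable {
  private locks: { [stateKey: string]: LockInfo } = {};

  // Conditional put: succeeds only if no lock exists for this state key.
  tryAcquire(stateKey: string, info: LockInfo): boolean {
    if (stateKey in this.locks) return false;
    this.locks[stateKey] = info;
    return true;
  }

  holder(stateKey: string): LockInfo | undefined {
    return this.locks[stateKey];
  }

  // Release by lock ID; with force-unlock the caller must be sure the holder is gone.
  unlock(stateKey: string, lockId: string): void {
    const held = this.locks[stateKey];
    if (held === undefined || held.id !== lockId) {
      throw new Error(`lock ${lockId} is not held on ${stateKey}`);
    }
    delete this.locks[stateKey];
  }
}

const lockTable = new InMemoryLockTable();
const stateKey = "my-terraform-state/prod/terraform.tfstate";
const ciRun: LockInfo = { id: "lock-a", who: "ci-runner", operation: "apply" };
const localRun: LockInfo = { id: "lock-b", who: "alice-laptop", operation: "plan" };

console.log(lockTable.tryAcquire(stateKey, ciRun));    // true
console.log(lockTable.tryAcquire(stateKey, localRun)); // false: CI holds the lock
lockTable.unlock(stateKey, "lock-a");
console.log(lockTable.tryAcquire(stateKey, localRun)); // true
```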
6.3 State Migration
# Move resources to a new address with moved blocks (Terraform 1.1+)
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
moved {
  from = aws_security_group.web_sg
  to   = module.networking.aws_security_group.web_sg
}
# Bring existing resources under management with import blocks (Terraform 1.5+)
import {
  to = aws_instance.legacy_server
  id = "i-0abc123def456789"
}
import {
  to = aws_s3_bucket.existing_bucket
  id = "my-existing-bucket-name"
}
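The effect of moved blocks is easiest to see on the state itself. The TypeScript model below treats state as a map from resource address to remote object ID. It relabels addresses the way a moved block does, then compares the pending creates and destroys against the refactored configuration. The IDs are hypothetical. Refusing to overwrite an existing destination is this sketch's own simplification, and Terraform's handling of edge cases such as chained moves is more involved.

```typescript
// Terraform state modeled as: resource address -> remote object ID (hypothetical IDs).
type StateMap = Record<string, string>;
interface MovedBlock { from: string; to: string }

// Rename addresses in state, the way a moved block re-labels an existing object.
function applyMoved(state: StateMap, moves: MovedBlock[]): StateMap {
  const next: StateMap = {};
  Object.keys(state).forEach((addr) => {
    next[addr] = state[addr];
  });
  moves.forEach(({ from, to }) => {
    if (!(from in next)) return; // nothing recorded at the old address
    if (to in next) throw new Error(`cannot move ${from}: ${to} already exists`);
    next[to] = next[from];
    delete next[from];
  });
  return next;
}

// Which addresses would be created or destroyed for the given configuration?
function pendingChanges(state: StateMap, configAddrs: string[]): { create: string[]; destroy: string[] } {
  return {
    create: configAddrs.filter((a) => !(a in state)),
    destroy: Object.keys(state).filter((a) => configAddrs.indexOf(a) === -1),
  };
}

const stateBefore: StateMap = {
  "aws_instance.web": "i-0aaa",
  "aws_security_group.web_sg": "sg-0bbb",
};
const refactoredConfig = [
  "module.compute.aws_instance.web",
  "module.networking.aws_security_group.web_sg",
];
const moves: MovedBlock[] = [
  { from: "aws_instance.web", to: "module.compute.aws_instance.web" },
  { from: "aws_security_group.web_sg", to: "module.networking.aws_security_group.web_sg" },
];

// Without moved blocks: 2 creates + 2 destroys. With them: nothing to do.
console.log(pendingChanges(stateBefore, refactoredConfig));
console.log(pendingChanges(applyMoved(stateBefore, moves), refactoredConfig));
```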
7. IaC Testing Strategy
7.1 Terratest
// test/vpc_test.go
package test
import (
	"testing"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)
func TestVpcModule(t *testing.T) {
	t.Parallel()
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"name":       "test-vpc",
			"cidr_block": "10.0.0.0/16",
			"azs":        []string{"us-east-1a", "us-east-1b"},
		},
		NoColor: true,
	}
	// Always tear down the test infrastructure, even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)
	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
	privateSubnetIds := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
	assert.Equal(t, 2, len(privateSubnetIds))
}
7.2 Checkov Static Analysis
# Run Checkov
checkov -d . --framework terraform
# Skip specific checks
checkov -d . --skip-check CKV_AWS_18,CKV_AWS_21
# Write a JSON report
checkov -d . -o json > checkov-report.json
# Load custom policies from a directory
checkov -d . --external-checks-dir ./custom-policies
# Example custom policy
# custom-policies/custom_policy.yaml
metadata:
  id: "CUSTOM_001"
  name: "Ensure S3 bucket has lifecycle policy"
  category: "general"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_s3_bucket"
  attribute: "lifecycle_rule"
  operator: "exists"
7.3 OPA Conftest
# policy/terraform.rego
package terraform
import rego.v1
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  resource.change.after != null # skip resources that are being destroyed
  not resource.change.after.tags.Environment
  msg := sprintf("EC2 instance '%s' has no Environment tag", [resource.address])
}
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg := sprintf("S3 bucket '%s' is set to public-read", [resource.address])
}
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.type == "ingress"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  resource.change.after.from_port == 22
  msg := sprintf("'%s' opens SSH (port 22) to 0.0.0.0/0", [resource.address])
}
# Run Conftest against the JSON plan
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
# Conftest evaluates the "main" package by default, so name the package explicitly
conftest test tfplan.json -p policy/ --namespace terraform
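To see what the Rego rules match, the TypeScript sketch below applies equivalent checks to the JSON plan structure (resource_changes[].address, .type, .change.after). It reimplements only the EC2 tag rule and the SSH rule. The sample plan is hypothetical and heavily trimmed compared with real terraform show -json output.

```typescript
interface ResourceChange {
  address: string;
  type: string;
  change: { actions: string[]; after: { [attr: string]: unknown } | null };
}

function denyMessages(resourceChanges: ResourceChange[]): string[] {
  const messages: string[] = [];
  resourceChanges.forEach((rc) => {
    const after = rc.change.after;
    if (after === null) return; // deletions have no "after" values to check
    if (rc.type === "aws_instance") {
      const tags = (after["tags"] || {}) as { [k: string]: string };
      if (!tags["Environment"]) messages.push(`EC2 instance '${rc.address}' has no Environment tag`);
    }
    if (rc.type === "aws_security_group_rule" && after["type"] === "ingress" && after["from_port"] === 22) {
      const cidrs = (after["cidr_blocks"] || []) as string[];
      if (cidrs.indexOf("0.0.0.0/0") !== -1) messages.push(`'${rc.address}' opens SSH (port 22) to 0.0.0.0/0`);
    }
  });
  return messages;
}

const samplePlan: ResourceChange[] = [
  { address: "aws_instance.web", type: "aws_instance", change: { actions: ["create"], after: { tags: { Name: "web" } } } },
  { address: "aws_instance.old", type: "aws_instance", change: { actions: ["delete"], after: null } },
  {
    address: "aws_security_group_rule.ssh",
    type: "aws_security_group_rule",
    change: { actions: ["create"], after: { type: "ingress", from_port: 22, to_port: 22, cidr_blocks: ["0.0.0.0/0"] } },
  },
];
denyMessages(samplePlan).forEach((m) => console.log(m));
```

The deleted instance is skipped because its after value is null. Without that guard, every destroyed instance would be reported as missing its tag.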
7.4 tfsec Security Scanning
# Run tfsec
tfsec .
# Sample finding
# Result: CRITICAL - aws_security_group.web
# Description: Security group rule allows ingress from 0.0.0.0/0 to port 22
# Impact: Unrestricted SSH access
# Resolution: Restrict SSH access to known IP ranges
# Inline tfsec ignore (place the comment on or directly above the flagged line)
resource "aws_security_group_rule" "allow_ssh" {
  type              = "ingress"
  security_group_id = aws_security_group.web.id
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  #tfsec:ignore:aws-vpc-no-public-ingress-sgr
  cidr_blocks       = ["10.0.0.0/8"] # internal network only
}
8. Monorepo vs Polyrepo Strategy
8.1 Monorepo Structure
infrastructure/                  # monorepo
├── .github/
│   └── workflows/
│       ├── terraform-plan.yml   # plan on pull requests
│       └── terraform-apply.yml  # apply after merge
├── modules/                     # shared modules
│   ├── vpc/
│   ├── ecs/
│   ├── rds/
│   └── monitoring/
├── environments/
│   ├── shared/                  # shared resources (IAM, DNS)
│   │   ├── iam/
│   │   └── route53/
│   ├── dev/
│   │   ├── main.tf
│   │   └── terragrunt.hcl
│   ├── staging/
│   │   ├── main.tf
│   │   └── terragrunt.hcl
│   └── prod/
│       ├── main.tf
│       └── terragrunt.hcl
├── terragrunt.hcl               # root configuration
└── Makefile
8.2 Staying DRY with Terragrunt
# environments/prod/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "../../modules/vpc"
}
inputs = {
  environment        = "prod"
  cidr_block         = "10.0.0.0/16"
  azs                = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"]
  enable_nat_gateway = true
  single_nat_gateway = false
}
# Root terragrunt.hcl
remote_state {
  backend = "s3"
  # Generate the backend block in each module so modules need no backend config of their own
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "myorg-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "ap-northeast-2"
  default_tags {
    tags = {
      ManagedBy   = "terragrunt"
      Environment = "${basename(get_terragrunt_dir())}"
    }
  }
}
EOF
}
8.3 CI/CD Pipeline
# .github/workflows/terraform-plan.yml
name: Terraform Plan
on:
  pull_request:
    paths:
      - 'environments/**'
      - 'modules/**'
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      # JSON array of the filter names that matched, e.g. ["dev","prod"]
      directories: ${{ steps.changes.outputs.changes }}
    steps:
      - uses: actions/checkout@v4
      - id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            dev:
              - 'environments/dev/**'
              - 'modules/**'
            staging:
              - 'environments/staging/**'
              - 'modules/**'
            prod:
              - 'environments/prod/**'
              - 'modules/**'
  plan:
    needs: detect-changes
    # Skip the job when no environment is affected
    if: needs.detect-changes.outputs.directories != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Plan only the environments whose files changed
        directory: ${{ fromJSON(needs.detect-changes.outputs.directories) }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"
      - name: Terraform Init
        working-directory: environments/${{ matrix.directory }}
        run: terraform init -no-color
      - name: Terraform Plan
        working-directory: environments/${{ matrix.directory }}
        run: terraform plan -no-color -out=tfplan
      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: environments/${{ matrix.directory }}
          framework: terraform
      - name: Set up Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: |
          infracost breakdown \
            --path environments/${{ matrix.directory }} \
            --format json \
            --out-file /tmp/infracost.json
9. GitOps for IaC
9.1 Atlantis
# atlantis.yaml
version: 3
automerge: false
parallel_plan: true
parallel_apply: true
projects:
  - name: dev-vpc
    dir: environments/dev
    workspace: default
    terraform_version: v1.7.0
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/vpc/**"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: prod-vpc
    dir: environments/prod
    workspace: default
    terraform_version: v1.7.0
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/vpc/**"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
      - undiverged
# Atlantis server startup (each flag can also be set as an ATLANTIS_* environment
# variable, which keeps the GitHub token off the command line)
# atlantis server \
#   --atlantis-url="https://atlantis.example.com" \
#   --gh-user="atlantis-bot" \
#   --gh-token="ghp_xxx" \
#   --repo-allowlist="github.com/myorg/*"
9.2 Spacelift Configuration
# .spacelift/config.yml
version: "1"
stacks:
  prod-infra:
    space: production
    project_root: environments/prod
    terraform_version: "1.7.0"
    autodeploy: false
    administrative: false
    labels:
      - "env:prod"
      - "team:platform"
    policies:
      - name: plan-approval
        type: APPROVAL
        body: |
          package spacelift
          approve {
            count(input.reviews.current.approvals) >= 2
          }
      - name: drift-detection
        type: TRIGGER
        body: |
          package spacelift
          trigger["drift-check"] {
            input.run.type == "DRIFT_DETECTION"
            input.run.drift == true
          }
    drift_detection:
      enabled: true
      schedule:
        - "0 */6 * * *" # check for drift every 6 hours
      reconcile: false # do not remediate automatically
9.3 Env0 Configuration
# env0.yml
version: 2
deploy:
steps:
terraformVersion: "1.7.0"
init:
commands:
- terraform init -no-color
plan:
commands:
- terraform plan -no-color -out=tfplan
- checkov -d . --framework terraform --output json > checkov.json || true
- infracost breakdown --path . --format json --out-file infracost.json || true
apply:
commands:
- terraform apply -no-color tfplan
10. Drift Detection and Remediation
10.1 Drift Detection Strategy
# Terraform's built-in drift check
terraform plan -detailed-exitcode
# Exit codes:
# 0 - no changes
# 1 - error
# 2 - changes present (drift detected)
#!/bin/bash
# Automated drift detection script
set -e
ENVIRONMENTS=("dev" "staging" "prod")
for env in "${ENVIRONMENTS[@]}"; do
  echo "=== Checking drift for $env ==="
  pushd "environments/$env" > /dev/null
  terraform init -no-color > /dev/null
  # Capture the real exit code; "|| EXIT_CODE=$?" also keeps set -e from aborting on 1 or 2
  EXIT_CODE=0
  terraform plan -detailed-exitcode -no-color > "/tmp/drift-${env}.txt" 2>&1 || EXIT_CODE=$?
  if [ "$EXIT_CODE" -eq 2 ]; then
    echo "DRIFT DETECTED in $env"
    # Send a Slack notification
    curl -X POST "$SLACK_WEBHOOK" \
      -H 'Content-Type: application/json' \
      -d "{\"text\": \"Drift detected in ${env} environment\"}"
  elif [ "$EXIT_CODE" -eq 1 ]; then
    echo "terraform plan failed for $env (see /tmp/drift-${env}.txt)" >&2
  fi
  popd > /dev/null
done
10.2 Infracost Cost Estimation
# Basic Infracost usage
infracost breakdown --path .
# Show the cost difference in a PR (baseline JSON produced earlier by infracost breakdown --format json)
infracost diff \
  --path . \
  --compare-to infracost-base.json \
  --format json \
  --out-file infracost-diff.json
# Infracost config file
# infracost.yml
version: 0.1
projects:
  - path: environments/prod
    terraform_var_files:
      - terraform.tfvars
    usage_file: infracost-usage.yml
11. Secrets Management in IaC
11.1 HashiCorp Vault Integration
# Vault provider
provider "vault" {
  address = "https://vault.example.com"
}
# Read a secret from Vault
# Values read through data sources are written to the Terraform state in plain text,
# so the state backend must be encrypted and access-controlled
data "vault_generic_secret" "db_credentials" {
  path = "secret/data/prod/database"
}
resource "aws_db_instance" "main" {
  engine            = "postgres"
  engine_version    = "15"
  instance_class    = "db.r6g.large"
  allocated_storage = 20
  username          = data.vault_generic_secret.db_credentials.data["username"]
  password          = data.vault_generic_secret.db_credentials.data["password"]
}
11.2 SOPS (Secrets OPerationS)
# Encrypt with SOPS (explicit age recipient)
sops --encrypt --age age1xxxxx secrets.yaml > secrets.enc.yaml
# SOPS configuration
# .sops.yaml
creation_rules:
  - path_regex: environments/prod/.*\.enc\.yaml
    age: >-
      age1xxx,age1yyy
    # Matches any key ending in password/secret/key/token (e.g. db_password)
    encrypted_regex: "^.*(password|secret|key|token)$"
  - path_regex: environments/dev/.*\.enc\.yaml
    age: >-
      age1zzz
    encrypted_regex: "^.*(password|secret|key|token)$"
# Using SOPS from Terraform (carlpett/sops provider)
data "sops_file" "secrets" {
  source_file = "secrets.enc.yaml"
}
resource "aws_secretsmanager_secret" "db" {
  name = "prod/db-password"
}
resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = data.sops_file.secrets.data["db_password"]
}
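SOPS encrypts only the values whose key names match encrypted_regex, so anchoring decides whether a key such as db_password, which the Terraform example reads, actually gets encrypted. The sketch below compares an exact-match pattern with a suffix pattern. It uses JavaScript regexes as a stand-in for SOPS's Go regexes, which behave the same for these simple patterns. The key names are hypothetical.

```typescript
const exactKeys = /^(password|secret|key|token)$/;
const suffixKeys = /^.*(password|secret|key|token)$/;
const sampleKeys = ["password", "db_password", "api_key", "username", "keyboard_layout"];

// Keys whose values would be encrypted under each pattern.
console.log(sampleKeys.filter((k) => exactKeys.test(k)));  // [ 'password' ]
console.log(sampleKeys.filter((k) => suffixKeys.test(k))); // [ 'password', 'db_password', 'api_key' ]
```

With the exact-match pattern, db_password would be stored in plain text. Checking the regex against real key names before committing is cheap insurance.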
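12. Quiz
Use the following quiz to check your understanding of the IaC patterns covered in this article.
Q1. Terraform Module Design
Question: What is the main purpose of the Terraform wrapper module pattern?
Answer: To wrap a community module so that organizational standards (encryption, tagging, access control) are always enforced.
A wrapper module calls the community module internally and applies the settings the organization requires (S3 encryption, public access blocking, tagging policy) as fixed defaults. Developers who use only the wrapper module automatically comply with the security and governance policies.
Q2. Pulumi vs Terraform
Question: What is Pulumi's biggest advantage over Terraform?
Answer: Because it uses general-purpose programming languages such as TypeScript, Python, and Go, every language feature (conditionals, loops, abstractions) is available in infrastructure code.
HCL is a declarative DSL, which limits how much complex logic it can express. With Pulumi you keep your existing IDE, debugger, test frameworks, and package manager, which raises developer productivity.
Q3. Crossplane Architecture
Question: Explain the roles of the XRD, the Composition, and the Claim in Crossplane.
Answer:
- XRD (Composite Resource Definition): the API schema defined by the platform team. It declares which fields and types are exposed to developers.
- Composition: the mapping from an XRD to concrete resources. A single XRD can have several Compositions (one for AWS, one for GCP, and so on).
- Claim: a namespace-scoped resource that a developer requests. The developer writes a short YAML manifest that follows the XRD schema, and the Composition creates the actual cloud resources.
Q4. State Management
Question: What is the difference between the moved block and the import block in Terraform?
Answer:
- moved block: changes the address of a resource that is already in state. During module refactoring, resources can be relocated without being destroyed and recreated.
- import block: brings a resource that exists in the cloud but not in the state file under Terraform management. Since Terraform 1.5, imports can be declared in code.
Q5. GitOps for IaC
Question: How do Atlantis and Spacelift differ in their approach to drift detection?
Answer:
- Atlantis: focuses on the PR-based workflow and has no built-in drift detection. A separate cron script or CI pipeline must run terraform plan -detailed-exitcode.
- Spacelift: provides built-in drift detection. It runs plans on a schedule (for example, every 6 hours) and, when drift is found, can send notifications or optionally reconcile automatically. Drift responses can also be codified as policies.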
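This article covered the main IaC tools (Terraform, Pulumi, Crossplane), module design patterns (composition, factory, wrapper), state management, testing, GitOps, and drift detection. Running IaC successfully comes down to choosing tools and patterns that fit your organization's size and requirements, and building testing and security scanning into CI/CD.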
IaC Patterns & Best Practices 2025: Terraform Modules, Pulumi, Crossplane, State Management
1. The IaC Landscape 2025: Tool Comparison and Selection Criteria
Infrastructure as Code (IaC) is a core DevOps practice that defines and version-controls infrastructure through code. In 2025, the IaC ecosystem features a diverse set of tools, each with unique strengths.
1.1 Major IaC Tool Comparison
| Tool | Language | State Management | Cloud Support | Key Features |
|---|---|---|---|---|
| Terraform/OpenTofu | HCL | Remote state file | Multi-cloud | Largest ecosystem, rich providers |
| Pulumi | TS/Python/Go/C# | Pulumi Cloud/self-hosted | Multi-cloud | General-purpose programming languages |
| CDKTF | TS/Python/Go/C# | Terraform backend | Multi-cloud | CDK syntax + Terraform providers |
| AWS CDK | TS/Python/Go/C# | CloudFormation | AWS only | AWS native, L2/L3 constructs |
| Crossplane | YAML (K8s CRD) | K8s etcd | Multi-cloud | K8s native, GitOps friendly |
| Ansible | YAML | Stateless (procedural) | Multi-cloud | Configuration management + provisioning |
1.2 Declarative vs Imperative IaC
Declarative IaC:
┌─────────────────────────────────────────────────┐
│ "I want 3 EC2 instances" │
│ → Tool compares current state, calculates diff │
│ → Terraform, Pulumi, CloudFormation │
└─────────────────────────────────────────────────┘
Imperative IaC:
┌─────────────────────────────────────────────────┐
│ "Create an EC2 instance, attach security group" │
│ → Executes commands in order │
│ → Ansible, Shell Scripts │
└─────────────────────────────────────────────────┘
1.3 OpenTofu vs Terraform
After HashiCorp's license change to BSL in 2023, OpenTofu was born as a Linux Foundation project.
# OpenTofu and Terraform use identical HCL syntax
# tofu init (OpenTofu CLI) and terraform init work the same way
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "ap-northeast-2"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
2. Terraform Module Design Patterns
Terraform modules are the core unit of reusable infrastructure code. Proper module design determines infrastructure consistency and productivity across your organization.
2.1 Flat Modules vs Nested Modules
Flat Module Structure (Recommended):
modules/
├── vpc/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ └── versions.tf
├── ecs-cluster/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ └── versions.tf
└── rds/
├── main.tf
├── variables.tf
├── outputs.tf
└── versions.tf
Nested Module Structure (Increased Complexity):
modules/
└── platform/
├── main.tf # Calls vpc, ecs, rds modules
├── modules/
│ ├── vpc/
│ ├── ecs-cluster/
│ └── rds/
└── outputs.tf
2.2 Composition Pattern
Module composition assembles small modules to build larger infrastructure.
# environments/prod/main.tf
# Composition pattern: combining small modules
module "vpc" {
source = "../../modules/vpc"
name = "prod-vpc"
cidr_block = "10.0.0.0/16"
azs = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = false # Production: NAT per AZ
}
module "ecs_cluster" {
source = "../../modules/ecs-cluster"
name = "prod-cluster"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnet_ids
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}
module "rds" {
source = "../../modules/rds"
name = "prod-db"
engine = "aurora-postgresql"
engine_version = "15.4"
instance_class = "db.r6g.xlarge"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.database_subnet_ids
security_group_ids = [module.ecs_cluster.security_group_id]
}
2.3 Factory Pattern
The factory pattern instantiates the same module multiple times.
# Service Factory Pattern
variable "services" {
type = map(object({
cpu = number
memory = number
port = number
count = number
health_check_path = string
}))
default = {
"api-gateway" = {
cpu = 512
memory = 1024
port = 8080
count = 3
health_check_path = "/health"
}
"user-service" = {
cpu = 256
memory = 512
port = 8081
count = 2
health_check_path = "/actuator/health"
}
"order-service" = {
cpu = 512
memory = 1024
port = 8082
count = 2
health_check_path = "/health"
}
}
}
module "ecs_services" {
source = "../../modules/ecs-service"
for_each = var.services
name = each.key
cluster_id = module.ecs_cluster.cluster_id
cpu = each.value.cpu
memory = each.value.memory
container_port = each.value.port
desired_count = each.value.count
health_check_path = each.value.health_check_path
subnet_ids = module.vpc.private_subnet_ids
}
2.4 Wrapper Module Pattern
Wrapper modules enforce organizational standards by wrapping community modules.
# modules/org-s3-bucket/main.tf
# Wrapper module enforcing org standards
module "s3_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 4.0"
  bucket  = var.bucket_name
  # Org standard: always encrypt
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = var.kms_key_id
      }
    }
  }
  # Org standard: block public access
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
  # Org standard: enable versioning
  versioning = {
    enabled = true
  }
  # Org standard: tags
  tags = merge(var.tags, {
    ManagedBy   = "terraform"
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  })
}
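Terraform's merge() gives precedence to later arguments. Because the org-standard map comes last, callers can add their own tags but cannot override ManagedBy, Team, Environment, or CostCenter. This TypeScript sketch reproduces that precedence with object spread. It is an analogy, not Terraform code, and the input values are made up.

```typescript
type Tags = Record<string, string>;

// Mirrors merge(var.tags, { ...org standards... }): keys listed later win.
function orgTags(callerTags: Tags, team: string, environment: string, costCenter: string): Tags {
  return {
    ...callerTags,
    ManagedBy: "terraform",
    Team: team,
    Environment: environment,
    CostCenter: costCenter,
  };
}

const tags = orgTags(
  { Project: "checkout", ManagedBy: "manual" }, // caller tries to override ManagedBy
  "payments",
  "prod",
  "CC-1234", // hypothetical cost center
);
console.log(tags);
```

Swapping the argument order (org map first, var.tags last) would let callers override the standards, so the ordering itself is part of the policy.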
3. Terraform Best Practices
3.1 Remote State Management
# State management infrastructure bootstrap
# bootstrap/main.tf
# Provides account_id so the bucket name is globally unique
data "aws_caller_identity" "current" {}
resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state-${data.aws_caller_identity.current.account_id}"
  lifecycle {
    prevent_destroy = true
  }
}
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
  tags = {
    Name      = "Terraform State Lock Table"
    ManagedBy = "terraform"
  }
}
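The DynamoDB table serializes runs. Roughly speaking, the S3 backend writes a lock item keyed by LockID (derived from the state file's path) with a conditional write that fails if the item already exists. A second run therefore cannot proceed until the first releases the lock or someone runs terraform force-unlock. The in-memory TypeScript model below imitates only that conditional-write behaviour. The class and method names are hypothetical, and no AWS SDK is involved.

```typescript
interface LockInfo {
  id: string;
  who: string;
  operation: string;
}

// Hypothetical in-memory stand-in for the lock table: a conditional put means "insert only if absent".
class LockTable {
  private items = new Map<string, LockInfo>();

  tryAcquire(lockId: string, info: LockInfo): boolean {
    if (this.items.has(lockId)) {
      return false; // condition failed: another run holds the lock
    }
    this.items.set(lockId, info);
    return true;
  }

  release(lockId: string, id: string): boolean {
    const current = this.items.get(lockId);
    if (!current || current.id !== id) {
      return false; // only the holder (or a force-unlock) may release
    }
    this.items.delete(lockId);
    return true;
  }

  holder(lockId: string): LockInfo | undefined {
    return this.items.get(lockId);
  }
}

const table = new LockTable();
const stateKey = "myorg-terraform-state-123456789012/prod/terraform.tfstate"; // hypothetical path
console.log(table.tryAcquire(stateKey, { id: "a1", who: "alice", operation: "apply" })); // true
console.log(table.tryAcquire(stateKey, { id: "b2", who: "bob", operation: "plan" }));    // false
console.log(table.release(stateKey, "a1"));                                               // true
```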
3.2 Workspaces vs Directory Strategy
Directory Strategy (Recommended):
infrastructure/
├── modules/ # Shared modules
│ ├── vpc/
│ ├── ecs/
│ └── rds/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf # Dev-specific state
│ ├── staging/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ └── prod/
│ ├── main.tf
│ ├── variables.tf
│ ├── terraform.tfvars
│ └── backend.tf
└── global/ # Shared resources (IAM, Route53)
├── iam/
└── dns/
Workspace Strategy (Use With Caution):
- Same code, separate state per workspace (switched with terraform workspace select dev / terraform workspace select prod). This becomes hard to manage once environments diverge.
- All workspaces store their state files in the same backend, which makes it difficult to isolate access permissions per environment.
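With the S3 backend, the default workspace uses the configured key as-is. Every other workspace is stored under workspace_key_prefix (default env:) in the same bucket. A small sketch of that object layout, assuming the default prefix, a hypothetical key of terraform.tfstate, and the bucket name used in section 1.3:

```typescript
// Sketch of S3 backend object keys per workspace (default workspace_key_prefix is "env:").
function stateObjectKey(key: string, workspace: string, workspaceKeyPrefix = "env:"): string {
  return workspace === "default" ? key : `${workspaceKeyPrefix}/${workspace}/${key}`;
}

const bucket = "my-terraform-state";
for (const ws of ["default", "dev", "prod"]) {
  // Every workspace lands in the same bucket; prefix-scoped IAM is possible but easy to get wrong.
  console.log(`s3://${bucket}/${stateObjectKey("terraform.tfstate", ws)}`);
}
```

The directory strategy avoids this by giving each environment its own backend.tf, which can point at a different key, bucket, or even account.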
3.3 Provider Version Pinning
# versions.tf
terraform {
  required_version = ">= 1.6.0, < 2.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.30" # Allows >= 5.30 and < 6.0 (use "~> 5.30.0" to stay on 5.30.x)
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.24, < 3.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}
# Always commit .terraform.lock.hcl to Git
# Use terraform init -upgrade to update providers within these constraints
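The ~> operator lets only the rightmost specified version component increase. So ~> 5.30 means >= 5.30.0 and < 6.0.0, while ~> 5.30.0 means >= 5.30.0 and < 5.31.0. A minimal TypeScript check of that rule follows. It assumes plain numeric major.minor[.patch] versions (no pre-release suffixes) and is not Terraform's own version library.

```typescript
type Version = [number, number, number];

function parse(v: string): { parts: Version; specified: number } {
  const nums = v.trim().split(".").map(Number);
  if (nums.length > 3 || nums.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return { parts: [nums[0], nums[1] ?? 0, nums[2] ?? 0], specified: nums.length };
}

function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// "~> X.Y" => >= X.Y.0 and < (X+1).0.0 ; "~> X.Y.Z" => >= X.Y.Z and < X.(Y+1).0
function satisfiesPessimistic(constraint: string, candidate: string): boolean {
  const { parts: lower, specified } = parse(constraint);
  if (specified < 2) {
    throw new Error("this sketch needs at least major.minor after ~>");
  }
  const version = parse(candidate).parts;
  // Bump the second-to-last specified component to get the exclusive upper bound.
  const bumpIndex = specified - 2;
  const upper: Version = [0, 0, 0];
  for (let i = 0; i < bumpIndex; i++) upper[i] = lower[i];
  upper[bumpIndex] = lower[bumpIndex] + 1;
  return compare(version, lower) >= 0 && compare(version, upper) < 0;
}

console.log(satisfiesPessimistic("5.30", "5.45.1"));   // true
console.log(satisfiesPessimistic("5.30", "6.0.0"));    // false
console.log(satisfiesPessimistic("5.30.0", "5.31.0")); // false
```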
3.4 Variable Validation and Type Safety
variable "environment" {
  type        = string
  description = "Deployment environment"
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}
variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  validation {
    condition     = can(regex("^(t3|m6i|c6i|r6i)\\.", var.instance_type))
    error_message = "Allowed instance families: t3, m6i, c6i, r6i."
  }
}
variable "cidr_block" {
  type        = string
  description = "VPC CIDR block"
  validation {
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "Please enter a valid CIDR block."
  }
}
# Complex type variables
variable "scaling_config" {
  type = object({
    min_size     = number
    max_size     = number
    desired_size = number
  })
  validation {
    condition     = var.scaling_config.min_size <= var.scaling_config.desired_size
    error_message = "The min_size must be less than or equal to desired_size."
  }
  validation {
    condition     = var.scaling_config.desired_size <= var.scaling_config.max_size
    error_message = "The desired_size must be less than or equal to max_size."
  }
}
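It can be useful to unit-test the same rules outside Terraform, for example in a pre-commit hook that checks tfvars files. The TypeScript below is a hypothetical helper, not something Terraform runs. It re-expresses the instance_type regex, an IPv4-only approximation of can(cidrhost(var.cidr_block, 0)) (the real function also accepts IPv6), and the two scaling_config ordering rules.

```typescript
const INSTANCE_FAMILY = /^(t3|m6i|c6i|r6i)\./;

function isValidInstanceType(t: string): boolean {
  return INSTANCE_FAMILY.test(t);
}

// IPv4-only approximation of can(cidrhost(cidr, 0)); host bits may be set, as with cidrhost.
function isValidIpv4Cidr(cidr: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\/(\d{1,2})$/.exec(cidr);
  if (!match) return false;
  const octets = match.slice(1, 5).map(Number);
  const prefix = Number(match[5]);
  return octets.every((o) => o <= 255) && prefix <= 32;
}

interface ScalingConfig {
  min_size: number;
  max_size: number;
  desired_size: number;
}

// Same two checks as the validation blocks on scaling_config.
function scalingErrors(c: ScalingConfig): string[] {
  const errors: string[] = [];
  if (!(c.min_size <= c.desired_size)) errors.push("min_size must be <= desired_size");
  if (!(c.desired_size <= c.max_size)) errors.push("desired_size must be <= max_size");
  return errors;
}

console.log(isValidInstanceType("m6i.large"), isValidInstanceType("m5.large")); // true false
console.log(isValidIpv4Cidr("10.0.0.0/16"), isValidIpv4Cidr("10.0.0.0/33"));   // true false
console.log(scalingErrors({ min_size: 3, max_size: 5, desired_size: 2 }));
```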
4. Pulumi Deep Dive: IaC with Programming Languages
Pulumi defines infrastructure using general-purpose programming languages like TypeScript, Python, Go, and C#. You can leverage all language features including conditionals, loops, and abstractions.
4.1 Pulumi TypeScript Basic Structure
// index.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const config = new pulumi.Config();
const environment = config.require("environment");
const vpcCidr = config.get("vpcCidr") || "10.0.0.0/16";
// Create VPC
const vpc = new aws.ec2.Vpc("main-vpc", {
  cidrBlock: vpcCidr,
  enableDnsHostnames: true,
  enableDnsSupport: true,
  tags: {
    Name: `${environment}-vpc`,
    Environment: environment,
    ManagedBy: "pulumi",
  },
});
// Public subnets (leveraging programming language loops)
const azs = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"];
const publicSubnets = azs.map((az, index) => {
  return new aws.ec2.Subnet(`public-subnet-${index}`, {
    vpcId: vpc.id,
    cidrBlock: `10.0.${index + 1}.0/24`, // Assumes the default 10.0.0.0/16 VPC CIDR
    availabilityZone: az,
    mapPublicIpOnLaunch: true,
    tags: {
      Name: `${environment}-public-${az}`,
      Type: "public",
    },
  });
});
// Exports
export const vpcId = vpc.id;
export const publicSubnetIds = publicSubnets.map(s => s.id);
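The subnet CIDRs above are hard-coded under 10.0.0.0/16 even though vpcCidr is configurable. A helper in the spirit of Terraform's cidrsubnet() could derive them from the VPC block instead. The cidrSubnet function below is a hypothetical IPv4-only utility, not part of @pulumi/aws, and the 10.20.0.0/16 input is just an example value.

```typescript
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

// IPv4-only analogue of Terraform's cidrsubnet(prefix, newbits, netnum).
function cidrSubnet(base: string, newBits: number, netNum: number): string {
  const [addr, lenText] = base.split("/");
  const baseLen = Number(lenText);
  const newLen = baseLen + newBits;
  if (!Number.isInteger(baseLen) || newLen > 32 || netNum < 0 || netNum >= 2 ** newBits) {
    throw new Error(`cannot carve subnet ${netNum} of /${newLen} from ${base}`);
  }
  const blockSize = 2 ** (32 - baseLen);
  const network = ipToInt(addr) - (ipToInt(addr) % blockSize); // clear any host bits
  return `${intToIp(network + netNum * 2 ** (32 - newLen))}/${newLen}`;
}

const vpcCidr = "10.20.0.0/16";
const azs = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"];
// index + 1 keeps the numbering of the hard-coded example (x.x.1.0/24, x.x.2.0/24, ...).
const publicCidrs = azs.map((_, index) => cidrSubnet(vpcCidr, 8, index + 1));
console.log(publicCidrs);
```

In the Pulumi program, the subnet's cidrBlock would then become cidrSubnet(vpcCidr, 8, index + 1), so changing the vpcCidr config value moves every subnet with it.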
4.2 Pulumi Component Resources
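// components/ecs-service.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
interface EcsServiceArgs {
  clusterArn: pulumi.Input<string>;
  vpcId: pulumi.Input<string>;       // For a load balancer target group (not created here)
  subnetIds: pulumi.Input<string>[];
  containerImage: string;
  cpu: number;
  memory: number;
  port: number;
  desiredCount: number;
  healthCheckPath: string;           // For a load balancer health check (not created here)
}
export class EcsService extends pulumi.ComponentResource {
  public readonly serviceName: pulumi.Output<string>;
  public readonly taskDefinitionArn: pulumi.Output<string>;
  constructor(
    name: string,
    args: EcsServiceArgs,
    opts?: pulumi.ComponentResourceOptions
  ) {
    super("custom:app:EcsService", name, {}, opts);
    const logGroup = new aws.cloudwatch.LogGroup(`${name}-logs`, {
      retentionInDays: 30,
      tags: { Service: name },
    }, { parent: this });
    const assumeRolePolicy = JSON.stringify({
      Version: "2012-10-17",
      Statement: [{
        Action: "sts:AssumeRole",
        Effect: "Allow",
        Principal: { Service: "ecs-tasks.amazonaws.com" },
      }],
    });
    // Execution role: used by the ECS agent to pull images and write logs
    const executionRole = new aws.iam.Role(`${name}-exec-role`, {
      assumeRolePolicy,
    }, { parent: this });
    new aws.iam.RolePolicyAttachment(`${name}-exec-policy`, {
      role: executionRole.name,
      policyArn: "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
    }, { parent: this });
    // Task role: permissions for the application code itself
    const taskRole = new aws.iam.Role(`${name}-task-role`, {
      assumeRolePolicy,
    }, { parent: this });
    const taskDefinition = new aws.ecs.TaskDefinition(`${name}-task`, {
      family: name,
      cpu: args.cpu.toString(),
      memory: args.memory.toString(),
      networkMode: "awsvpc",
      requiresCompatibilities: ["FARGATE"],
      executionRoleArn: executionRole.arn,
      taskRoleArn: taskRole.arn,
      // logGroup.name is an Output, so build the JSON inside apply() rather than calling JSON.stringify on it
      containerDefinitions: logGroup.name.apply(logGroupName => JSON.stringify([{
        name: name,
        image: args.containerImage,
        cpu: args.cpu,
        memory: args.memory,
        portMappings: [{ containerPort: args.port }],
        logConfiguration: {
          logDriver: "awslogs",
          options: {
            "awslogs-group": logGroupName,
            "awslogs-region": "ap-northeast-2",
            "awslogs-stream-prefix": "ecs",
          },
        },
      }])),
    }, { parent: this });
    const service = new aws.ecs.Service(`${name}-svc`, {
      cluster: args.clusterArn,
      taskDefinition: taskDefinition.arn,
      desiredCount: args.desiredCount,
      launchType: "FARGATE",
      networkConfiguration: {
        subnets: args.subnetIds,
        assignPublicIp: false,
      },
    }, { parent: this });
    this.serviceName = service.name;
    this.taskDefinitionArn = taskDefinition.arn;
    this.registerOutputs({
      serviceName: this.serviceName,
      taskDefinitionArn: this.taskDefinitionArn,
    });
  }
}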
4.3 Pulumi Stack References
// Sharing data between stacks
// infrastructure/index.ts exports VPC info
// (assumes privateSubnets is created the same way as publicSubnets)
export const vpcId = vpc.id;
export const privateSubnetIds = privateSubnets.map(s => s.id);
// application/index.ts references it
import * as pulumi from "@pulumi/pulumi";
const infraStack = new pulumi.StackReference("org/infrastructure/prod"); // <organization>/<project>/<stack>
const vpcId = infraStack.getOutput("vpcId");
const subnetIds = infraStack.getOutput("privateSubnetIds");
4.4 Pulumi Policy as Code (CrossGuard)
// policy-pack/index.ts
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";
import * as aws from "@pulumi/aws";
new PolicyPack("aws-security", {
  policies: [
    {
      name: "s3-no-public-read",
      description: "S3 buckets must not allow public read access",
      enforcementLevel: "mandatory",
      validateResource: validateResourceOfType(aws.s3.BucketAclV2, (acl, args, reportViolation) => {
        if (acl.acl === "public-read" || acl.acl === "public-read-write") {
          reportViolation("S3 bucket has a public ACL configured.");
        }
      }),
    },
    {
      name: "ec2-require-tags",
      description: "EC2 instances must have required tags",
      enforcementLevel: "mandatory",
      validateResource: validateResourceOfType(aws.ec2.Instance, (instance, args, reportViolation) => {
        const requiredTags = ["Name", "Environment", "Team", "CostCenter"];
        const tags = instance.tags || {};
        for (const tag of requiredTags) {
          if (!(tag in tags)) {
            reportViolation(`Required tag '${tag}' is missing.`);
          }
        }
      }),
    },
  ],
});
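Keeping policy logic in plain functions makes it testable without running pulumi preview. Below is a hypothetical refactoring of the ec2-require-tags check into a pure function. The validateResourceOfType callback could call it and pass each returned message to reportViolation. The function name and the test inputs are illustrative.

```typescript
const REQUIRED_TAGS = ["Name", "Environment", "Team", "CostCenter"] as const;

// Returns one violation message per missing tag; undefined tags means every tag is missing.
function missingTagViolations(tags: Record<string, string> | undefined): string[] {
  const present = tags ?? {};
  return REQUIRED_TAGS.filter((tag) => !(tag in present)).map(
    (tag) => `Required tag '${tag}' is missing.`,
  );
}

console.log(missingTagViolations({ Name: "web", Environment: "prod" }));
console.log(missingTagViolations(undefined).length);
```

Inside the policy, the callback body would shrink to missingTagViolations(instance.tags).forEach(reportViolation).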
5. Crossplane: Kubernetes-Native IaC
Crossplane uses Kubernetes CRDs (Custom Resource Definitions) to manage cloud resources. You can create and manage AWS, GCP, and Azure resources using kubectl.
5.1 Crossplane Architecture
Crossplane Architecture:
┌──────────────────────────────────────────────────┐
│ K8s Cluster                                      │
│ ┌──────────────────────────────────────────────┐ │
│ │ Crossplane Core                              │ │
│ │ ┌───────────┐  ┌─────────────┐  ┌─────────┐  │ │
│ │ │ Composite │  │ Composition │  │   XRD   │  │ │
│ │ │ Resource  │  │             │  │         │  │ │
│ │ └─────┬─────┘  └──────┬──────┘  └────┬────┘  │ │
│ │       └───────────────┼──────────────┘       │ │
│ └───────────────────────┼──────────────────────┘ │
│                         │                        │
│ ┌───────────────────────┴──────────────────────┐ │
│ │ Providers                                    │ │
│ │ ┌──────────┐   ┌──────────┐   ┌──────────┐   │ │
│ │ │ AWS Prov.│   │ GCP Prov.│   │ Azure P. │   │ │
│ │ └────┬─────┘   └────┬─────┘   └────┬─────┘   │ │
│ └──────┼──────────────┼──────────────┼─────────┘ │
└────────┼──────────────┼──────────────┼───────────┘
         ▼              ▼              ▼
     AWS Cloud      GCP Cloud     Azure Cloud
5.2 XRD (Composite Resource Definition)
# xrd.yaml - API schema defined by platform team
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XDatabase
    plural: xdatabases
  claimNames:
    kind: Database
    plural: databases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgresql", "mysql"]
                  description: "Database engine"
                size:
                  type: string
                  enum: ["small", "medium", "large"]
                  description: "Instance size"
                region:
                  type: string
                  default: "ap-northeast-2"
              required:
                - engine
                - size
5.3 Composition Definition
# composition.yaml - Actual resource mapping
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xdatabases.aws.platform.example.com
  labels:
    provider: aws
spec:
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: XDatabase
  # Classic patch-and-transform "resources" mode; newer Crossplane releases favor pipeline mode with functions
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "15"  # PostgreSQL version; a MySQL claim would need its own value
            instanceClass: db.t3.medium
            allocatedStorage: 20
            publiclyAccessible: false
            skipFinalSnapshot: true
            # Credentials and subnet group reference omitted for brevity
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.engine"
          toFieldPath: "spec.forProvider.engine"
          transforms:
            - type: map
              map:
                postgresql: postgres  # RDS expects "postgres", not "postgresql"
                mysql: mysql
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.size"
          toFieldPath: "spec.forProvider.instanceClass"
          transforms:
            - type: map
              map:
                small: db.t3.medium
                medium: db.r6g.large
                large: db.r6g.xlarge
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.region"
          toFieldPath: "spec.forProvider.region"
    - name: subnet-group
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: SubnetGroup
        spec:
          forProvider:
            description: "Crossplane managed subnet group"
            # subnetIds omitted for brevity
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: "spec.region"
          toFieldPath: "spec.forProvider.region"
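To see what the patches produce, here is a toy TypeScript model of FromCompositeFieldPath with map transforms, applied to a claim spec like orders-db. It only mimics the field copying and value mapping described in the Composition. Crossplane's real implementation handles nested field paths, patch policies, and error reporting on its own terms. The engine mapping (postgresql to postgres) is an assumption reflecting RDS engine names.

```typescript
type Spec = Record<string, string>;

interface Patch {
  from: string;                 // field under the composite resource's spec
  to: string;                   // field under the managed resource's spec.forProvider
  map?: Record<string, string>; // optional "type: map" transform
}

const patches: Patch[] = [
  { from: "engine", to: "engine", map: { postgresql: "postgres", mysql: "mysql" } }, // assumed mapping
  { from: "size", to: "instanceClass", map: { small: "db.t3.medium", medium: "db.r6g.large", large: "db.r6g.xlarge" } },
  { from: "region", to: "region" },
];

function render(base: Spec, compositeSpec: Spec, rules: Patch[]): Spec {
  const forProvider: Spec = { ...base };
  for (const p of rules) {
    const value = compositeSpec[p.from];
    if (value === undefined) continue; // unset source field: keep the base value
    if (p.map && !(value in p.map)) {
      throw new Error(`no map entry for ${p.from}=${value}`);
    }
    forProvider[p.to] = p.map ? p.map[value] : value;
  }
  return forProvider;
}

const base: Spec = { engine: "postgres", instanceClass: "db.t3.medium" };
console.log(render(base, { engine: "postgresql", size: "medium", region: "ap-northeast-2" }, patches));
```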
5.4 Claim-Based Provisioning
# claim.yaml - Resource request by developers
apiVersion: platform.example.com/v1alpha1
kind: Database
metadata:
  name: orders-db
  namespace: orders-team
spec:
  engine: postgresql
  size: medium
  region: ap-northeast-2
# Developer experience
kubectl apply -f claim.yaml
kubectl get databases -n orders-team
# Example output (ENGINE and SIZE columns assume additionalPrinterColumns in the XRD)
# NAME        ENGINE       SIZE     READY   AGE
# orders-db   postgresql   medium   True    5m
6. State Management Deep Dive
6.1 State Backend Comparison
S3 + DynamoDB (AWS):
┌─────────────┐ ┌──────────────┐
│ S3 Bucket │ │ DynamoDB │
│ (State) │ │ (Locking) │
│ Versioning │ │ LockID hash │
│ Encryption │ │ key table │
└─────────────┘ └──────────────┘
GCS (GCP):
┌─────────────────────────────┐
│ GCS Bucket │
│ (State + built-in locking) │
│ Versioning ON, Encryption ON│
└─────────────────────────────┘
Terraform Cloud / Spacelift:
┌─────────────────────────────┐
│ Managed state storage │
│ Built-in locking, history │
│ Access control, audit logs │
└─────────────────────────────┘
6.2 State Locking and Concurrency
# Force-unlock state (caution: verify no other operations running)
terraform force-unlock LOCK_ID
# Query state
terraform state list
terraform state show aws_instance.web
# Remove resource from state (without destroying)
terraform state rm aws_instance.legacy
# Import existing resource into state
terraform import aws_instance.web i-1234567890abcdef0
6.3 State Migration
# moved block for resource relocation (Terraform 1.1+)
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
moved {
  from = aws_security_group.web_sg
  to   = module.networking.aws_security_group.web_sg
}
# import block for existing resources (Terraform 1.5+)
import {
  to = aws_instance.legacy_server
  id = "i-0abc123def456789"
}
import {
  to = aws_s3_bucket.existing_bucket
  id = "my-existing-bucket-name"
}
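Conceptually, a moved block renames an address that is already in state, while an import block adds a new address bound to an existing cloud object ID. The toy model below represents state as a Map from address to remote ID, which is not Terraform's real state format, and applies both kinds of block from this section. The instance and security group IDs for the moved resources are made up.

```typescript
type State = Map<string, string>; // resource address -> remote object ID

interface MovedBlock { from: string; to: string; }
interface ImportBlock { to: string; id: string; }

function applyMoves(state: State, moves: MovedBlock[]): State {
  const next = new Map(state);
  for (const { from, to } of moves) {
    const id = next.get(from);
    if (id === undefined) continue; // already moved (or never existed): nothing to do
    if (next.has(to)) throw new Error(`${to} already exists in state`);
    next.delete(from);
    next.set(to, id); // same remote object, new address: no destroy/create
  }
  return next;
}

function applyImports(state: State, imports: ImportBlock[]): State {
  const next = new Map(state);
  for (const { to, id } of imports) {
    if (!next.has(to)) next.set(to, id); // simplified: skip addresses already managed
  }
  return next;
}

let state: State = new Map([
  ["aws_instance.web", "i-0aaa111"],            // hypothetical IDs
  ["aws_security_group.web_sg", "sg-0bbb222"],
]);
state = applyMoves(state, [
  { from: "aws_instance.web", to: "module.compute.aws_instance.web" },
  { from: "aws_security_group.web_sg", to: "module.networking.aws_security_group.web_sg" },
]);
state = applyImports(state, [{ to: "aws_instance.legacy_server", id: "i-0abc123def456789" }]);
console.log(Array.from(state.keys()));
```

Because a processed move is skipped on the next run, moved blocks can safely stay in the code for a while after the refactor.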
7. IaC Testing Strategy
7.1 Terratest
// test/vpc_test.go
package test
import (
	"testing"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)
func TestVpcModule(t *testing.T) {
	t.Parallel()
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"name":       "test-vpc",
			"cidr_block": "10.0.0.0/16",
			"azs":        []string{"us-east-1a", "us-east-1b"},
		},
		NoColor: true,
	}
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)
	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
	privateSubnetIds := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
	assert.Equal(t, 2, len(privateSubnetIds))
}
7.2 Checkov Static Analysis
# Run Checkov
checkov -d . --framework terraform
# Skip specific checks
checkov -d . --skip-check CKV_AWS_18,CKV_AWS_21
# Generate JSON report
checkov -d . -o json > checkov-report.json
# Load custom YAML policies from a directory
checkov -d . --external-checks-dir ./custom-policies
# custom-policies/custom_policy.yaml
# Note: AWS provider v4+ moves lifecycle rules to aws_s3_bucket_lifecycle_configuration
metadata:
  id: "CUSTOM_001"
  name: "Ensure S3 bucket has lifecycle policy"
  category: "general"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_s3_bucket"
  attribute: "lifecycle_rule"
  operator: "exists"
7.3 OPA Conftest
# policy/terraform.rego
package terraform
import rego.v1
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  not resource.change.after.tags.Environment
  msg := sprintf("EC2 instance '%s' is missing Environment tag", [resource.address])
}
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  resource.change.after.acl == "public-read"
  msg := sprintf("S3 bucket '%s' has public-read ACL", [resource.address])
}
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.type == "ingress"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  # Also catch port ranges that include 22, not only from_port == 22
  resource.change.after.from_port <= 22
  resource.change.after.to_port >= 22
  msg := sprintf("Security group rule '%s' opens SSH port 22 to 0.0.0.0/0", [resource.address])
}
# Run Conftest (the rules live in package terraform; conftest evaluates package main by default)
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
conftest test tfplan.json -p policy/ --namespace terraform
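Because Conftest evaluates the JSON produced by terraform show -json, the same checks can be prototyped against a hand-written fragment of that structure. The TypeScript rendering of the three checks below is only an illustration; Rego remains the source of truth. Its SSH check treats any ingress range covering port 22 as open. The sample plan is fabricated and trimmed to the fields the checks read.

```typescript
interface ResourceChange {
  address: string;
  type: string;
  change: { after: Record<string, any> | null };
}

function deny(plan: { resource_changes: ResourceChange[] }): string[] {
  const msgs: string[] = [];
  for (const rc of plan.resource_changes) {
    const after = rc.change.after;
    if (after === null) continue; // resource is being deleted
    if (rc.type === "aws_instance" && !after.tags?.Environment) {
      msgs.push(`EC2 instance '${rc.address}' is missing Environment tag`);
    }
    if (rc.type === "aws_s3_bucket" && after.acl === "public-read") {
      msgs.push(`S3 bucket '${rc.address}' has public-read ACL`);
    }
    if (
      rc.type === "aws_security_group_rule" &&
      after.type === "ingress" &&
      (after.cidr_blocks ?? []).includes("0.0.0.0/0") &&
      after.from_port <= 22 &&
      after.to_port >= 22
    ) {
      msgs.push(`Security group rule '${rc.address}' opens SSH port 22 to 0.0.0.0/0`);
    }
  }
  return msgs;
}

const samplePlan: { resource_changes: ResourceChange[] } = {
  resource_changes: [
    { address: "aws_instance.web", type: "aws_instance", change: { after: { tags: { Name: "web" } } } },
    {
      address: "aws_security_group_rule.all",
      type: "aws_security_group_rule",
      change: { after: { type: "ingress", from_port: 0, to_port: 65535, cidr_blocks: ["0.0.0.0/0"] } },
    },
  ],
};
console.log(deny(samplePlan));
```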
7.4 tfsec Security Scanning
# Run tfsec (Aqua Security recommends migrating to Trivy: trivy config .)
tfsec .
# Example output:
# Result: CRITICAL - aws_security_group.web
# Description: Security group rule allows ingress from 0.0.0.0/0 to port 22
# Impact: Unrestricted SSH access
# Resolution: Restrict SSH access to known IP ranges
# tfsec inline ignore
resource "aws_security_group_rule" "allow_ssh" {
  #tfsec:ignore:aws-vpc-no-public-ingress-sgr
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/8"] # Internal network only
  security_group_id = aws_security_group.web.id
}
8. Monorepo vs Polyrepo Strategy
8.1 Monorepo Structure
infrastructure/ # Monorepo
├── .github/
│ └── workflows/
│ ├── terraform-plan.yml # Plan on PR
│ └── terraform-apply.yml # Apply after merge
├── modules/ # Shared modules
│ ├── vpc/
│ ├── ecs/
│ ├── rds/
│ └── monitoring/
├── environments/
│ ├── shared/ # Shared resources (IAM, DNS)
│ │ ├── iam/
│ │ └── route53/
│ ├── dev/
│ │ ├── main.tf
│ │ └── terragrunt.hcl
│ ├── staging/
│ │ ├── main.tf
│ │ └── terragrunt.hcl
│ └── prod/
│ ├── main.tf
│ └── terragrunt.hcl
├── terragrunt.hcl # Root config
└── Makefile
8.2 DRY with Terragrunt
# environments/prod/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
terraform {
  source = "../../modules/vpc"
}
inputs = {
  environment        = "prod"
  cidr_block         = "10.0.0.0/16"
  azs                = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c"]
  enable_nat_gateway = true
  single_nat_gateway = false
}
# Root terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    bucket         = "myorg-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "ap-northeast-2"
  default_tags {
    tags = {
      ManagedBy   = "terragrunt"
      Environment = "${basename(get_terragrunt_dir())}"
    }
  }
}
EOF
}
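path_relative_to_include() returns the child directory's path relative to the root config that it includes. Each environment therefore gets its own state key without repeating the backend settings. Below is a rough TypeScript equivalent for POSIX-style paths. It is a hypothetical helper that ignores symlinks and Windows paths, and the checkout location is made up.

```typescript
// Relative path from the root config's directory to the child config's directory.
function relativeToInclude(rootDir: string, childDir: string): string {
  const root = rootDir.split("/").filter(Boolean);
  const child = childDir.split("/").filter(Boolean);
  if (!root.every((seg, i) => child[i] === seg)) {
    throw new Error(`${childDir} is not under ${rootDir}`);
  }
  const rel = child.slice(root.length).join("/");
  return rel === "" ? "." : rel;
}

// Mirrors key = "${path_relative_to_include()}/terraform.tfstate" in the root config.
function stateKey(rootDir: string, childDir: string): string {
  return `${relativeToInclude(rootDir, childDir)}/terraform.tfstate`;
}

const repoRoot = "/work/infrastructure"; // hypothetical checkout location
for (const env of ["dev", "staging", "prod"]) {
  console.log(stateKey(repoRoot, `${repoRoot}/environments/${env}`));
}
```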
8.3 CI/CD Pipeline
# .github/workflows/terraform-plan.yml
name: Terraform Plan
on:
  pull_request:
    paths:
      - 'environments/**'
      - 'modules/**'
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      # JSON array of the filter names that matched, e.g. ["dev","prod"]
      directories: ${{ steps.changes.outputs.changes }}
    steps:
      - uses: actions/checkout@v4
      - id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            dev:
              - 'environments/dev/**'
              - 'modules/**'
            staging:
              - 'environments/staging/**'
              - 'modules/**'
            prod:
              - 'environments/prod/**'
              - 'modules/**'
  plan:
    needs: detect-changes
    if: needs.detect-changes.outputs.directories != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Plan only the environments whose filters matched
        directory: ${{ fromJSON(needs.detect-changes.outputs.directories) }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"
      # Cloud credentials (e.g. OIDC role assumption) omitted for brevity
      - name: Terraform Init
        working-directory: environments/${{ matrix.directory }}
        run: terraform init -no-color
      - name: Terraform Plan
        working-directory: environments/${{ matrix.directory }}
        run: terraform plan -no-color -out=tfplan
      - name: Checkov Scan
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: environments/${{ matrix.directory }}
          framework: terraform
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - name: Infracost Breakdown
        run: |
          infracost breakdown \
            --path environments/${{ matrix.directory }} \
            --format json \
            --out-file /tmp/infracost.json
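The detect-changes job exists to decide which environment directories need a plan, based on the changed file paths. The TypeScript below models that mapping with simple prefix matching rather than dorny/paths-filter's glob engine, which is an intentional simplification. Note that any change under modules/ selects every environment, which is exactly what the filters encode.

```typescript
// Filter name -> path prefixes, mirroring the filters block (a trailing /** is treated as a prefix).
const filters: Record<string, string[]> = {
  dev: ["environments/dev/", "modules/"],
  staging: ["environments/staging/", "modules/"],
  prod: ["environments/prod/", "modules/"],
};

function changedFilters(changedFiles: string[]): string[] {
  return Object.entries(filters)
    .filter(([, prefixes]) => changedFiles.some((f) => prefixes.some((p) => f.startsWith(p))))
    .map(([name]) => name);
}

console.log(changedFilters(["environments/staging/main.tf"]));
console.log(changedFilters(["modules/vpc/main.tf"]));
console.log(JSON.stringify(changedFilters(["README.md"]))); // "[]": nothing to plan
```

A shared-module change fanning out to all three environments is usually desirable, since every environment that consumes the module could be affected.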
9. GitOps for IaC
9.1 Atlantis
# atlantis.yaml
# Project-level apply_requirements must be permitted by the server-side repo config (allowed_overrides)
version: 3
automerge: false
parallel_plan: true
parallel_apply: true
projects:
  - name: dev-vpc
    dir: environments/dev
    workspace: default
    terraform_version: v1.7.0
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/vpc/**"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
  - name: prod-vpc
    dir: environments/prod
    workspace: default
    terraform_version: v1.7.0
    autoplan:
      when_modified:
        - "*.tf"
        - "../../modules/vpc/**"
      enabled: true
    apply_requirements:
      - approved
      - mergeable
      - undiverged
9.2 Spacelift Configuration
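# Spacelift Terraform provider (spacelift-io/spacelift): stack, approval policy, drift detection
# (.spacelift/config.yml only holds runtime settings such as project_root and terraform_version)
resource "spacelift_stack" "prod_infra" {
  name              = "prod-infra"
  space_id          = "production"     # Space ID (assumed)
  repository        = "infrastructure" # Repository name (assumed)
  branch            = "main"
  project_root      = "environments/prod"
  terraform_version = "1.7.0"
  autodeploy        = false
  administrative    = false
  labels            = ["env:prod", "team:platform"]
}
resource "spacelift_policy" "plan_approval" {
  name = "plan-approval"
  type = "APPROVAL"
  body = <<-EOT
    package spacelift
    approve {
      count(input.reviews.current.approvals) >= 2
    }
  EOT
}
resource "spacelift_policy_attachment" "plan_approval" {
  policy_id = spacelift_policy.plan_approval.id
  stack_id  = spacelift_stack.prod_infra.id
}
resource "spacelift_drift_detection" "prod_infra" {
  stack_id  = spacelift_stack.prod_infra.id
  schedule  = ["0 */6 * * *"] # Drift detection every 6 hours
  reconcile = false           # Do not auto-remediate
}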
10. Drift Detection and Remediation
10.1 Drift Detection Strategy
# Terraform built-in drift detection
terraform plan -detailed-exitcode
# Exit codes:
# 0 - No changes
# 1 - Error
# 2 - Changes detected (drift found)
#!/bin/bash
# Automated drift detection script
set -euo pipefail
ENVIRONMENTS=("dev" "staging" "prod")
for env in "${ENVIRONMENTS[@]}"; do
  echo "=== Checking drift for $env ==="
  pushd "environments/$env" > /dev/null
  terraform init -no-color > /dev/null
  # Capture the exit code explicitly: inside "if ! cmd; then", $? is always 0
  EXIT_CODE=0
  terraform plan -detailed-exitcode -no-color > "/tmp/drift-${env}.txt" 2>&1 || EXIT_CODE=$?
  if [ "$EXIT_CODE" -eq 2 ]; then
    echo "DRIFT DETECTED in $env"
    curl -X POST "$SLACK_WEBHOOK" \
      -H 'Content-Type: application/json' \
      -d "{\"text\": \"Drift detected in ${env} environment\"}"
  elif [ "$EXIT_CODE" -ne 0 ]; then
    echo "terraform plan failed in $env (see /tmp/drift-${env}.txt)" >&2
  fi
  popd > /dev/null
done
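The exit-code contract is easy to mishandle in shell scripts, because an error (1) and drift (2) are both non-zero. For reference, here is a minimal TypeScript classification of -detailed-exitcode results per environment. The results object is sample data, not real output.

```typescript
type DriftStatus = "clean" | "error" | "drift";

// terraform plan -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
function classify(exitCode: number): DriftStatus {
  switch (exitCode) {
    case 0:
      return "clean";
    case 2:
      return "drift";
    default:
      return "error"; // 1, or anything unexpected
  }
}

function summarize(results: Record<string, number>): Record<DriftStatus, string[]> {
  const summary: Record<DriftStatus, string[]> = { clean: [], error: [], drift: [] };
  for (const [env, code] of Object.entries(results)) {
    summary[classify(code)].push(env);
  }
  return summary;
}

const summary = summarize({ dev: 0, staging: 2, prod: 1 });
console.log(summary);
// Only "drift" environments should trigger a drift alert; "error" needs separate handling.
```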
10.2 Infracost Cost Estimation
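# Basic Infracost usage
infracost breakdown --path .
# Generate a baseline (e.g. on the main branch) to compare against
infracost breakdown --path . --format json --out-file infracost-base.json
# Show cost diff in PRs
infracost diff \
  --path . \
  --compare-to infracost-base.json \
  --format json \
  --out-file infracost-diff.json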
11. Secrets Management in IaC
11.1 HashiCorp Vault Integration
# Vault provider
provider "vault" {
  address = "https://vault.example.com"
}
# Read secrets from Vault (KV v2 paths include /data/)
# Note: values read through data sources are stored in plaintext in the Terraform state
data "vault_generic_secret" "db_credentials" {
  path = "secret/data/prod/database"
}
resource "aws_db_instance" "main" {
  engine            = "postgres"
  engine_version    = "15"
  instance_class    = "db.r6g.large"
  allocated_storage = 100 # Required for non-Aurora engines (size is illustrative)
  username          = data.vault_generic_secret.db_credentials.data["username"]
  password          = data.vault_generic_secret.db_credentials.data["password"]
}
11.2 SOPS (Secrets OPerationS)
# Encrypt with SOPS
sops --encrypt --age age1xxxxx secrets.yaml > secrets.enc.yaml
# .sops.yaml
creation_rules:
- path_regex: environments/prod/.*\.enc\.yaml
age: >-
age1xxx,age2xxx
encrypted_regex: "^(password|secret|key|token)$"
- path_regex: environments/dev/.*\.enc\.yaml
age: >-
age3xxx
encrypted_regex: "^(password|secret|key|token)$"
# Using SOPS in Terraform
data "sops_file" "secrets" {
source_file = "secrets.enc.yaml"
}
resource "aws_secretsmanager_secret_version" "db_password" {
secret_id = aws_secretsmanager_secret.db.id
secret_string = data.sops_file.secrets.data["db_password"]
}
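SOPS matches encrypted_regex against key names. An anchored pattern such as ^(password|secret|key|token)$ therefore matches only keys named exactly password, secret, key, or token, and a key like db_password would stay in plaintext. The TypeScript check below compares that pattern with a suffix-matching alternative. The alternative is a suggestion, not a SOPS default. SOPS uses Go's regexp engine, but for patterns this simple JavaScript gives the same results.

```typescript
const anchored = /^(password|secret|key|token)$/;
const suffix = /^.*(password|secret|key|token)$/; // suggested alternative

const keys = ["password", "db_password", "api_key", "token", "hostname"];

for (const re of [anchored, suffix]) {
  const encrypted = keys.filter((k) => re.test(k));
  console.log(re.source, "->", encrypted);
}
```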
12. Quiz
Test your understanding of IaC patterns covered in this article.
Q1. Terraform Module Design
Question: What is the primary purpose of the Terraform wrapper module pattern?
Answer: To enforce organizational standards (encryption, tagging, access control) by wrapping community modules.
Wrapper modules internally call community modules while applying organization-required settings (S3 encryption, public access blocking, tag policies) as defaults. Developers using wrapper modules automatically comply with security and governance policies.
Q2. Pulumi vs Terraform
Question: What is the biggest advantage of Pulumi over Terraform?
Answer: Pulumi uses general-purpose programming languages like TypeScript, Python, and Go, enabling the use of all language features including conditionals, loops, and abstractions for infrastructure code.
HCL is a declarative DSL with limitations for complex logic. Pulumi allows developers to use existing IDEs, debuggers, test frameworks, and package managers, leading to higher developer productivity.
Q3. Crossplane Architecture
Question: Explain the roles of XRD, Composition, and Claim in Crossplane.
Answer:
- XRD (Composite Resource Definition): API schema defined by the platform team. Defines fields and types exposed to developers.
- Composition: Actual resource mapping for an XRD. Multiple Compositions (for AWS, GCP, etc.) can be linked to a single XRD.
- Claim: Namespace-level resource request by developers. Writing simple YAML matching the XRD schema triggers the Composition to create actual cloud resources.
Q4. State Management
Question: What is the difference between moved blocks and import blocks in Terraform state?
Answer:
- moved block: Changes the address of a resource already in state. Enables moving resources during module refactoring without delete/recreate.
- import block: Brings resources that exist in the cloud but are not in state under Terraform management. Since Terraform 1.5, imports can be declared declaratively in code.
Q5. GitOps for IaC
Question: Explain the difference in drift detection between Atlantis and Spacelift.
Answer:
- Atlantis: Focuses on PR-based workflows. No built-in drift detection; requires separate cron scripts or CI pipelines running terraform plan -detailed-exitcode.
- Spacelift: Built-in drift detection. Automatically runs plans on a schedule (e.g., every 6 hours) and can notify on drift or auto-remediate (reconcile). Drift response can be codified through policies.
13. References
- HashiCorp Terraform Documentation - https://developer.hashicorp.com/terraform/docs
- Pulumi Documentation - https://www.pulumi.com/docs/
- Crossplane Documentation - https://docs.crossplane.io/
- OpenTofu Documentation - https://opentofu.org/docs/
- Terragrunt Documentation - https://terragrunt.gruntwork.io/docs/
- Terratest - https://terratest.gruntwork.io/
- Checkov by Bridgecrew - https://www.checkov.io/
- Infracost Documentation - https://www.infracost.io/docs/
- Atlantis Documentation - https://www.runatlantis.io/docs/
- Spacelift Documentation - https://docs.spacelift.io/
- SOPS (Secrets OPerationS) - https://github.com/getsops/sops
- OPA Conftest - https://www.conftest.dev/
- tfsec by Aqua Security - https://aquasecurity.github.io/tfsec/
This article comprehensively covered major IaC tools (Terraform, Pulumi, Crossplane), design patterns (composition, factory, wrapper), state management, testing, GitOps, and drift detection. Choosing the right tools and patterns for your organization's scale and requirements, and integrating testing and security scanning into CI/CD, are the keys to successful IaC operations.