Terraform & IaC Complete Guide — Everything About Managing Infrastructure as Code

1. What Is Infrastructure as Code

Why Manage Infrastructure as Code

Traditional infrastructure management required administrators to manually log into consoles to provision servers and configure networks. This approach has several problems.

  • Not reproducible: Recreating the same environment is difficult.
  • No change tracking: You cannot tell who changed what and when.
  • Scaling limits: You can manually manage 10 servers, but 100 or more is practically impossible.
  • Environment drift: Subtle differences creep in between development, staging, and production environments (Configuration Drift).

Infrastructure as Code (IaC) is a methodology where you define the desired state of your infrastructure in code files, and tools automatically realize that state. Because it is code, you can version-control it with Git, review changes, and deploy through CI/CD pipelines.

IaC Tool Comparison

| Tool | Language | Approach | State Management | Cloud Support |
| --- | --- | --- | --- | --- |
| Terraform | HCL | Declarative | Self-managed state file | Multi-cloud |
| CloudFormation | JSON/YAML | Declarative | AWS-managed | AWS only |
| Pulumi | TypeScript/Python/Go | Imperative + Declarative | Pulumi Cloud | Multi-cloud |
| AWS CDK | TypeScript/Python/Go | Imperative | CloudFormation stacks | AWS only |
| Crossplane | YAML | Declarative (K8s CRD) | K8s etcd | Multi-cloud |

Terraform uses a dedicated declarative language called HCL and has the broadest provider ecosystem. It can manage AWS, GCP, Azure, as well as SaaS providers like Datadog, PagerDuty, and GitHub.

CloudFormation is an AWS-native tool with the tightest integration with AWS services. New AWS services and features generally receive first-party CloudFormation support earlier than third-party tools can add them.

Pulumi uses general-purpose programming languages, so you can leverage IDE autocompletion, type checking, and unit testing as-is.

AWS CDK adds an abstraction layer on top of CloudFormation, using L2/L3 Constructs to express complex patterns concisely.

Declarative vs Imperative

IaC tools fall into two main approaches.

Declarative: You define the desired end state, and the tool calculates the difference from the current state and applies changes. Terraform and CloudFormation use this approach.

Imperative: You describe the steps to execute in order. Shell scripts and (partially) Ansible follow this approach. Pulumi sits in between: you write imperative code in a general-purpose language, but the Pulumi engine evaluates it into a declarative desired state.

Terraform adopts the declarative approach: you define the "what" of your infrastructure, and Terraform figures out the "how."


2. Terraform Fundamentals

Provider

A Provider is a plugin that allows Terraform to communicate with a specific infrastructure platform. Beyond cloud providers like AWS, GCP, and Azure, there are thousands of providers for Kubernetes, Helm, Datadog, GitHub, and more.

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "ap-northeast-2"

  default_tags {
    tags = {
      Environment = "production"
      ManagedBy   = "terraform"
    }
  }
}

The required_providers block specifies the provider source and version constraints. ~> 5.0 means "use the latest version in the 5.x range but do not allow 6.0 or higher."
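The constraint operators can be read as shorthand for version ranges. A sketch of the common forms (the versions shown are illustrative):

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 5.0" allows any 5.x release: equivalent to ">= 5.0.0, < 6.0.0"
      version = "~> 5.0"

      # Other common forms:
      # version = "~> 5.31.0"        # patch updates only: >= 5.31.0, < 5.32.0
      # version = ">= 5.0, < 5.40"   # explicit range
      # version = "= 5.31.0"         # exact pin
    }
  }
}
```

Pinning at least the major version in shared code prevents a provider upgrade from unexpectedly changing plan behavior for other team members.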

Resource

A Resource declares an infrastructure object to be managed by Terraform. Each resource is uniquely identified by the combination of its type and local name.

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "main-vpc"
  }
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index + 1}"
  }
}

Resources reference each other using the pattern resource_type.local_name.attribute. In the example above, aws_vpc.main.id references the ID of the VPC resource.

Data Source

A Data Source reads information about resources that already exist outside of Terraform. It is read-only and does not modify infrastructure.

data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

Variable and Output

Variables are a module's input parameters, and Outputs are a module's return values.

# variables.tf
variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.medium"
}

variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}
# outputs.tf
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.main.id
}

output "alb_dns_name" {
  description = "DNS name of the ALB"
  value       = aws_lb.main.dns_name
}

3. Advanced HCL Syntax

Block Structure

The basic structure of HCL (HashiCorp Configuration Language) is the block, which consists of a type, labels, and a body.

# block_type "label1" "label2" {
#   attribute = value
# }

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type

  root_block_device {
    volume_size = 20
    volume_type = "gp3"
  }
}

Type System

HCL supports a rich type system.

# Primitive types
variable "name" {
  type = string
}

variable "port" {
  type = number
}

variable "enabled" {
  type = bool
}

# Collection types
variable "availability_zones" {
  type = list(string)
}

variable "instance_tags" {
  type = map(string)
}

variable "allowed_ports" {
  type = set(number)
}

# Structural type
variable "database_config" {
  type = object({
    engine         = string
    engine_version = string
    instance_class = string
    allocated_storage = number
    multi_az       = bool
  })
}

Conditionals

Conditional expressions use the ternary operator syntax.

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = var.environment == "prod" ? "t3.large" : "t3.micro"

  monitoring = var.environment == "prod" ? true : false
}
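The same ternary pattern combined with count is the standard idiom for creating a resource only in certain environments. A sketch (the bastion instance and its arguments are illustrative):

```hcl
# Create a bastion host only outside production
resource "aws_instance" "bastion" {
  count = var.environment == "prod" ? 0 : 1

  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
}

# count makes the resource a list, so references need an index:
#   aws_instance.bastion[0].public_ip
# or, when the resource may not exist:
#   one(aws_instance.bastion[*].public_ip)
```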

Loops - count and for_each

count is suitable for numeric iteration.

resource "aws_subnet" "private" {
  count = 3

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}

for_each is suitable for map- or set-based iteration. Instances are addressed by key rather than by position, so removing an item from the middle does not shift the addresses of the remaining resources (which with count would force them to be destroyed and recreated).

variable "subnets" {
  type = map(object({
    cidr_block        = string
    availability_zone = string
    public            = bool
  }))
}

resource "aws_subnet" "this" {
  for_each = var.subnets

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr_block
  availability_zone = each.value.availability_zone

  map_public_ip_on_launch = each.value.public

  tags = {
    Name = each.key
  }
}

for Expressions

locals {
  # List transformation
  subnet_ids = [for s in aws_subnet.this : s.id]

  # Conditional filtering
  public_subnet_ids = [for k, s in aws_subnet.this : s.id if s.map_public_ip_on_launch]

  # Map transformation
  subnet_id_map = { for k, s in aws_subnet.this : k => s.id }
}
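for_each-style iteration also works inside a resource via dynamic blocks, which generate repeated nested blocks from a collection. A sketch that builds security group ingress rules from a variable (the variable name is illustrative):

```hcl
variable "ingress_ports" {
  type    = list(number)
  default = [80, 443]
}

resource "aws_security_group" "web" {
  name_prefix = "web-"
  vpc_id      = aws_vpc.main.id

  # One ingress block is generated per element of var.ingress_ports
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```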

Local Values

Local values define commonly reused values in a single place within a module.

locals {
  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
    Team        = "platform"
  }

  name_prefix = "${var.project_name}-${var.environment}"

  is_production = var.environment == "prod"
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = local.is_production ? "t3.large" : "t3.micro"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-web"
    Role = "webserver"
  })
}

4. State Management

What Is terraform.tfstate

Terraform records the current state of managed infrastructure in a State file. The plan command compares the state in the State file with the state declared in code to calculate changes.

State files can contain sensitive information (passwords, keys, etc.), so storing them on the local filesystem is risky.

Remote Backend (S3 + DynamoDB)

In team environments, you should use a remote backend to centrally manage State. On AWS, the S3 + DynamoDB combination is the standard.

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/vpc/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

You must enable versioning and server-side encryption on the S3 bucket.

resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state-bucket"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

State Locking

The DynamoDB table provides State locking. If two people run terraform apply at the same time, the first acquires the lock and the second fails immediately with a lock acquisition error (it will wait only if -lock-timeout is set).

If the lock is not released (due to an abnormal process termination, etc.), you can forcefully release it with terraform force-unlock LOCK_ID. However, you must confirm that no one else is actually working.

State Management Commands

# List resources in state
terraform state list

# Show details for a specific resource
terraform state show aws_vpc.main

# Rename a resource (when you renamed it in code)
terraform state mv aws_vpc.main aws_vpc.primary

# Remove a resource from state (actual infrastructure remains)
terraform state rm aws_instance.temp

# Import existing infrastructure into state
terraform import aws_vpc.existing vpc-0123456789abcdef0
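Since Terraform 1.5, imports can also be declared in code with an import block, and since 1.1 a moved block replaces most uses of terraform state mv; both then go through the normal plan/apply review instead of mutating state directly. A sketch:

```hcl
# Equivalent to: terraform import aws_vpc.existing vpc-0123456789abcdef0
import {
  to = aws_vpc.existing
  id = "vpc-0123456789abcdef0"
}

# Equivalent to: terraform state mv aws_vpc.main aws_vpc.primary
moved {
  from = aws_vpc.main
  to   = aws_vpc.primary
}
```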

5. Modules

What Are Modules

A module is a package of related resources bundled together. Modules increase code reusability, separate concerns, and enable sharing standardized infrastructure patterns across teams.

Module Directory Structure

modules/
  vpc/
    main.tf
    variables.tf
    outputs.tf
    README.md
  ec2/
    main.tf
    variables.tf
    outputs.tf
  rds/
    main.tf
    variables.tf
    outputs.tf

Writing a Module

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-vpc"
  })
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-igw"
  })
}

resource "aws_subnet" "public" {
  count = length(var.public_subnet_cidrs)

  vpc_id            = aws_vpc.this.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-public-${count.index + 1}"
    Tier = "public"
  })
}

resource "aws_subnet" "private" {
  count = length(var.private_subnet_cidrs)

  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(var.tags, {
    Name = "${var.name_prefix}-private-${count.index + 1}"
    Tier = "private"
  })
}
# modules/vpc/variables.tf
variable "name_prefix" {
  description = "Resource name prefix"
  type        = string
}

variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "public_subnet_cidrs" {
  description = "List of public subnet CIDRs"
  type        = list(string)
}

variable "private_subnet_cidrs" {
  description = "List of private subnet CIDRs"
  type        = list(string)
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
}

variable "tags" {
  description = "Common tags"
  type        = map(string)
  default     = {}
}
# modules/vpc/outputs.tf
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.this.id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

Calling a Module

module "vpc" {
  source = "./modules/vpc"

  name_prefix          = "myapp-prod"
  cidr_block           = "10.0.0.0/16"
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.11.0/24", "10.0.12.0/24"]
  availability_zones   = ["ap-northeast-2a", "ap-northeast-2c"]

  tags = local.common_tags
}

# Referencing module outputs
resource "aws_instance" "web" {
  subnet_id = module.vpc.public_subnet_ids[0]
  # ...
}

Terraform Registry Modules

The Terraform Registry hosts community and HashiCorp-verified modules.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-northeast-2a", "ap-northeast-2c"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.11.0/24", "10.0.12.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false
  enable_dns_hostnames = true
}

6. Terragrunt

What Is Terragrunt

Terragrunt is a wrapper tool for Terraform that applies the DRY (Don't Repeat Yourself) principle to eliminate configuration duplication. It is particularly useful when managing multiple environments (dev, staging, prod).

Directory Structure

infrastructure/
  terragrunt.hcl              # Root configuration
  environments/
    dev/
      terragrunt.hcl          # Dev environment common settings
      vpc/
        terragrunt.hcl
      ec2/
        terragrunt.hcl
      rds/
        terragrunt.hcl
    staging/
      terragrunt.hcl
      vpc/
        terragrunt.hcl
      ec2/
        terragrunt.hcl
    prod/
      terragrunt.hcl
      vpc/
        terragrunt.hcl
      ec2/
        terragrunt.hcl
      rds/
        terragrunt.hcl
  modules/
    vpc/
    ec2/
    rds/

Root Configuration

# infrastructure/terragrunt.hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-2"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "ap-northeast-2"
}
EOF
}

Environment-Specific Configuration

# environments/prod/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/vpc"
}

inputs = {
  name_prefix          = "myapp-prod"
  cidr_block           = "10.0.0.0/16"
  public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnet_cidrs = ["10.0.11.0/24", "10.0.12.0/24"]
  availability_zones   = ["ap-northeast-2a", "ap-northeast-2c"]
}

Dependency Management

Terragrunt can explicitly declare dependencies between modules.

# environments/prod/ec2/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/ec2"
}

dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id    = dependency.vpc.outputs.vpc_id
  subnet_id = dependency.vpc.outputs.public_subnet_ids[0]
}

Running terragrunt run-all apply creates the VPC first and the EC2 instance after, following the dependency order.
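One practical detail: before the vpc module has ever been applied, its outputs do not exist, so terragrunt run-all plan on dependents would fail. Terragrunt's mock_outputs supplies placeholder values for planning (the mock values here are illustrative):

```hcl
dependency "vpc" {
  config_path = "../vpc"

  # Used only when ../vpc has no real outputs yet (e.g. the first run-all plan)
  mock_outputs = {
    vpc_id            = "vpc-mock"
    public_subnet_ids = ["subnet-mock-1", "subnet-mock-2"]
  }
  mock_outputs_allowed_terraform_commands = ["plan", "validate"]
}
```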


7. Workflow

Core Commands

# Download provider plugins and initialize
terraform init

# Preview changes (no actual changes made)
terraform plan

# Apply changes
terraform apply

# Apply only a specific resource
terraform apply -target=aws_vpc.main

# Destroy all infrastructure
terraform destroy

# Format code
terraform fmt -recursive

# Validate configuration
terraform validate

# Generate dependency graph
terraform graph | dot -Tpng > graph.png

Saving Plan Files

# Save the plan result to a file
terraform plan -out=tfplan

# Apply the saved plan exactly (no additional confirmation)
terraform apply tfplan

Using a saved plan ensures that apply executes exactly the planned changes: the plan file embeds the configuration, so code edits made after the plan step are ignored, and if the remote state has changed in the meantime Terraform rejects the stale plan instead of silently applying something different.

CI/CD Integration

Here is an example Terraform CI/CD pipeline using GitHub Actions.

name: Terraform CI/CD

on:
  pull_request:
    paths:
      - 'infrastructure/**'
  push:
    branches:
      - main
    paths:
      - 'infrastructure/**'

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        working-directory: infrastructure
        run: terraform init

      - name: Terraform Format Check
        working-directory: infrastructure
        run: terraform fmt -check -recursive

      - name: Terraform Validate
        working-directory: infrastructure
        run: terraform validate

      - name: Terraform Plan
        working-directory: infrastructure
        run: terraform plan -no-color -out=tfplan

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            const output = `#### Terraform Plan
            \`\`\`
            Plan output here
            \`\`\`
            `;

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        working-directory: infrastructure
        run: terraform init

      - name: Terraform Apply
        working-directory: infrastructure
        run: terraform apply -auto-approve

8. Real-World AWS Example - VPC + EC2 + RDS + ALB

Overall Architecture

This example builds the following infrastructure:

  • VPC (2 public subnets + 2 private subnets)
  • Application Load Balancer (ALB)
  • EC2 Instances (Auto Scaling Group)
  • RDS PostgreSQL (Multi-AZ)
  • Security Group chain

VPC and Networking

# vpc.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.0"

  name = "${local.name_prefix}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-northeast-2a", "ap-northeast-2c"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.11.0/24", "10.0.12.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  public_subnet_tags = {
    Tier = "public"
  }

  private_subnet_tags = {
    Tier = "private"
  }

  tags = local.common_tags
}

Security Groups

# security_groups.tf
resource "aws_security_group" "alb" {
  name_prefix = "${local.name_prefix}-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-alb-sg"
  })

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "app" {
  name_prefix = "${local.name_prefix}-app-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-app-sg"
  })

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_security_group" "rds" {
  name_prefix = "${local.name_prefix}-rds-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-rds-sg"
  })

  lifecycle {
    create_before_destroy = true
  }
}

ALB

# alb.tf
resource "aws_lb" "main" {
  name               = "${local.name_prefix}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  tags = local.common_tags
}

resource "aws_lb_target_group" "app" {
  name     = "${local.name_prefix}-app-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = module.vpc.vpc_id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
  }

  tags = local.common_tags
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
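In production the port-80 listener usually just redirects to HTTPS, with a separate listener terminating TLS. A sketch that would replace the forward-only HTTP listener above, assuming an ACM certificate ARN is available as var.certificate_arn (not defined in this example):

```hcl
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.certificate_arn # assumed to exist

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
```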

Auto Scaling Group

# asg.tf
resource "aws_launch_template" "app" {
  name_prefix   = "${local.name_prefix}-app-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type

  vpc_security_group_ids = [aws_security_group.app.id]

  user_data = base64encode(<<-SCRIPT
    #!/bin/bash
    yum update -y
    yum install -y docker
    systemctl start docker
    systemctl enable docker
    docker run -d -p 8080:8080 myapp:latest
    SCRIPT
  )

  tag_specifications {
    resource_type = "instance"
    tags = merge(local.common_tags, {
      Name = "${local.name_prefix}-app"
    })
  }
}

resource "aws_autoscaling_group" "app" {
  name                = "${local.name_prefix}-app-asg"
  desired_capacity    = 2
  max_size            = 4
  min_size            = 1
  target_group_arns   = [aws_lb_target_group.app.arn]
  vpc_zone_identifier = module.vpc.private_subnets

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${local.name_prefix}-app"
    propagate_at_launch = true
  }
}

RDS

# rds.tf
resource "aws_db_subnet_group" "main" {
  name       = "${local.name_prefix}-db-subnet"
  subnet_ids = module.vpc.private_subnets

  tags = local.common_tags
}

resource "aws_db_instance" "main" {
  identifier     = "${local.name_prefix}-postgres"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.medium"

  allocated_storage     = 20
  max_allocated_storage = 100
  storage_encrypted     = true

  db_name  = "myapp"
  username = "admin"
  password = var.db_password

  multi_az               = true
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  skip_final_snapshot     = false
  final_snapshot_identifier = "${local.name_prefix}-final-snapshot"

  tags = local.common_tags
}

9. Security

Sensitive Variables

Variables marked as sensitive are masked in plan/apply output.

variable "db_password" {
  type      = string
  sensitive = true
}

output "db_endpoint" {
  value = aws_db_instance.main.endpoint
}

output "db_password" {
  value     = var.db_password
  sensitive = true
}

Managing Sensitive Values

  1. Environment variables: Inject via the TF_VAR_db_password environment variable.
  2. terraform.tfvars file: Add to .gitignore to exclude from version control.
  3. Vault integration: Dynamically fetch secrets from HashiCorp Vault.

# Fetch DB password from Vault
data "vault_generic_secret" "db" {
  path = "secret/data/production/database"
}

resource "aws_db_instance" "main" {
  password = data.vault_generic_secret.db.data["password"]
  # ...
}

  4. AWS Secrets Manager integration: Read the secret with a data source.

data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
  # ...
}

Policy Enforcement - OPA (Open Policy Agent)

OPA can be used to enforce policies on Terraform plans.

# policy/terraform.rego
package terraform

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  not resource.change.after.server_side_encryption_configuration
  msg := "S3 buckets must have encryption configured."
}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  resource.change.after.from_port == 22
  msg := "SSH (port 22) cannot be opened to the entire internet (0.0.0.0/0)."
}
# Output plan as JSON
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json

# Run policy check with OPA
opa eval --data policy/ --input tfplan.json "data.terraform.deny"

Static Analysis with tfsec / Trivy

# Security scan with tfsec
tfsec .

# IaC scan with Trivy
trivy config .

10. Best Practices

Directory Structure

project/
  environments/
    dev/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    staging/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    prod/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
  modules/
    vpc/
    ec2/
    rds/
    alb/
  global/
    iam/
    route53/

Naming Conventions

  • Resource names: Use snake_case (aws_security_group.web_server)
  • Variable names: Use snake_case (instance_type, db_password)
  • File names: Separate by resource type (vpc.tf, ec2.tf, rds.tf)
  • Tag values: environment-service-role pattern (prod-myapp-web)

Code Review Checklist

  1. Security: Are secrets hardcoded? Are Security Groups overly permissive?
  2. Cost: Are instance types appropriate? Are there unused resources?
  3. Availability: Is Multi-AZ applied? Is Auto Scaling configured?
  4. State management: Is the State key appropriate? Is the remote backend configured?
  5. Modularization: Can duplicate code be extracted into modules?
  6. Tagging: Do all resources have required tags?

Common Mistakes and Solutions

| Mistake | Solution |
| --- | --- |
| Committing the State file to Git | Add it to .gitignore and use a remote backend |
| Hardcoded secrets | Use sensitive variables, Vault, or Secrets Manager |
| Deleting middle items from count-created resources | Use for_each instead |
| Not pinning provider versions | Specify versions in required_providers |
| Applying without planning | Require plan review in CI/CD |
| Writing all code in a single file with no modules | Separate into modules, split files by concern |

Conclusion

Terraform and IaC are the cornerstones of modern cloud infrastructure management. Defining infrastructure as code makes it reproducible, change-trackable, and improvable through code review. Here are the key principles:

  1. Define infrastructure with declarative code and version-control it with Git.
  2. Use remote backends to safely manage State, and locking to prevent concurrent modifications.
  3. Use modules to reuse code and establish team standards.
  4. CI/CD pipelines enable plan review and automated deployment.
  5. Policy enforcement automates security and compliance.

Use the examples in this guide as a starting point and adapt them to fit your own project.