The Complete Guide to Software Testing Strategies — From Unit Tests to Chaos Engineering

Introduction: Why Test at All?

The cost of software bugs increases exponentially the later they are discovered. If fixing a defect during development costs one unit, the same fix costs roughly ten during QA and a hundred or more in production. Research from the IBM Systems Sciences Institute found that production defects can cost up to 100 times more to fix than those caught during design.

Testing is not simply about catching bugs. It improves code design, provides a safety net for refactoring, and serves as a communication tool across teams. Well-written tests are living documentation and system specifications.

The ROI of Testing

Some teams view the time spent writing tests as a cost. In reality, the opposite is true.

Reduced debugging time: In a codebase with strong test coverage, root cause analysis is fast. You simply look at which test failed.

Refactoring confidence: Modifying legacy code without tests is like performing surgery blindfolded. With tests in place, you can restructure code boldly.

Faster deployments: An automated test suite dramatically reduces manual QA time. Quality is verified automatically in CI/CD pipelines.

Documentation effect: Test code serves as usage examples and behavioral specifications. It is the fastest way for new team members to understand the system.


1. The Test Pyramid

The test pyramid, proposed by Mike Cohn, visually represents the recommended proportion of each test type.

Pyramid Structure

        /  E2E   \          ~10%  Slow but validates user perspective
       / Integration \      ~20%  Verifies component interactions
      /    Unit       \    ~70%  Fast and isolated unit verification

Unit tests (70%): At the base of the pyramid, these verify individual functions, methods, and classes in isolation. They run fast and are easy to write.

Integration tests (20%): In the middle layer, these verify that multiple components work together correctly. They test API endpoints, database queries, and external service integrations.

E2E tests (10%): At the top, these validate entire user scenarios from end to end. They test full workflows through browser automation.

The Inverted Pyramid Anti-Pattern

Many teams fall into the inverted pyramid (ice cream cone) pattern, relying excessively on E2E tests.

      \      E2E      /    ~70%  Slow and unstable
       \ Integration /     ~20%
        \   Unit   /       ~10%  Nearly absent

The problems with this pattern are clear:

  • E2E tests are slow, lengthening the feedback loop
  • Flaky tests occur frequently
  • Root cause analysis is difficult
  • Maintenance costs grow exponentially

The correct approach is to build a solid foundation from the bottom of the pyramid. Protect core logic with unit tests, verify integration points with integration tests, and confirm only critical user journeys with E2E tests.


2. Unit Tests

Unit tests verify the smallest units of code in isolation.

The FIRST Principles for Good Unit Tests

Fast: Tests should execute in milliseconds. Slow unit tests discourage developers from running them frequently.

Isolated: Tests must not depend on other tests or external systems (databases, networks, file systems).

Repeatable: They must return the same result regardless of environment or how many times they run.

Self-validating: Results are automatically determined as pass or fail. No manual inspection required.

Timely: Written before or immediately after the production code.
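As a concrete illustration of these principles, a minimal pytest-style sketch — the `apply_discount` function is a hypothetical example invented here, not something from this guide:

```python
# A FIRST-compliant unit test: a pure function plus a plain assert-based test.
# `apply_discount` is a hypothetical example function for illustration only.

def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by percent, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # Fast: pure computation, no I/O.
    # Isolated: no database, network, or shared state.
    # Repeatable: the same inputs always produce the same output.
    # Self-validating: plain asserts decide pass or fail.
    assert apply_discount(100.0, 25) == 75.0
    assert apply_discount(100.0, 0) == 100.0
```

Because the test touches nothing outside its own scope, it satisfies Fast, Isolated, Repeatable, and Self-validating by construction; Timely is a matter of when you write it.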

Mocking Strategy

Mocking isolates external dependencies. However, excessive mocking couples tests to implementation details.

# Python - pytest + unittest.mock
from unittest.mock import Mock
import pytest

class PaymentService:
    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, amount, card_token):
        if amount <= 0:
            raise ValueError("Amount must be positive")
        return self.gateway.process_payment(amount, card_token)

class TestPaymentService:
    def test_charge_positive_amount(self):
        # Arrange
        mock_gateway = Mock()
        mock_gateway.process_payment.return_value = "txn_123"
        service = PaymentService(mock_gateway)

        # Act
        result = service.charge(100, "card_abc")

        # Assert
        assert result == "txn_123"
        mock_gateway.process_payment.assert_called_once_with(100, "card_abc")

    def test_charge_negative_amount_raises(self):
        mock_gateway = Mock()
        service = PaymentService(mock_gateway)

        with pytest.raises(ValueError, match="Amount must be positive"):
            service.charge(-50, "card_abc")

        mock_gateway.process_payment.assert_not_called()

// JavaScript - Jest
describe('PaymentService', () => {
  it('should charge positive amount', async () => {
    const mockGateway = {
      processPayment: jest.fn().mockResolvedValue('txn_123'),
    };
    const service = new PaymentService(mockGateway);

    const result = await service.charge(100, 'card_abc');

    expect(result).toBe('txn_123');
    expect(mockGateway.processPayment).toHaveBeenCalledWith(100, 'card_abc');
  });

  it('should reject negative amount', async () => {
    const mockGateway = { processPayment: jest.fn() };
    const service = new PaymentService(mockGateway);

    await expect(service.charge(-50, 'card_abc'))
      .rejects.toThrow('Amount must be positive');
    expect(mockGateway.processPayment).not.toHaveBeenCalled();
  });
});

The Coverage Trap

Aiming for 100% code coverage is a misguided goal. What matters is test quality, not the coverage number.

Meaningful coverage: Validates business logic, boundary conditions, and error handling.

Meaningless coverage: Testing getters/setters, simple delegation methods, and framework code is a waste of time.

A practical target is 80%+ for core business logic and 60-80% overall. Tests with meaningful assertions are far more valuable than chasing 100% coverage.
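To make the distinction concrete, a small sketch (the functions and names are invented for illustration): both tests below raise the coverage number, but only the first guards behavior the business cares about.

```python
# Coverage quality over quantity: both tests raise line coverage,
# but only the boundary test protects real behavior.

def shipping_fee(total: float) -> float:
    """Free shipping at or above 50.00, otherwise a flat 5.00 fee."""
    return 0.0 if total >= 50.0 else 5.0

# Meaningful: exercises the boundary condition on either side of 50.00.
def test_shipping_boundary():
    assert shipping_fee(49.99) == 5.0
    assert shipping_fee(50.0) == 0.0

class User:
    def __init__(self, name: str):
        self._name = name

    @property
    def name(self) -> str:
        return self._name

# Meaningless: merely restates that a trivial getter returns what it stores.
def test_getter():
    assert User("Alice").name == "Alice"
```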


3. Integration Tests

Integration tests verify that multiple components work together correctly. They test interactions with real databases, message queues, and external APIs.

API Testing

REST API testing is a classic example of integration testing.

# Python - FastAPI + pytest + httpx
import pytest
from httpx import AsyncClient, ASGITransport
from app.main import app

@pytest.mark.asyncio
async def test_create_user():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post("/users", json={
            "name": "Alice",
            "email": "alice@example.com"
        })

    assert response.status_code == 201
    data = response.json()
    assert data["name"] == "Alice"
    assert "id" in data

@pytest.mark.asyncio
async def test_create_user_duplicate_email():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        await client.post("/users", json={
            "name": "Alice",
            "email": "alice@example.com"
        })
        response = await client.post("/users", json={
            "name": "Bob",
            "email": "alice@example.com"
        })

    assert response.status_code == 409

Testcontainers

Testcontainers is a library that uses Docker containers to provide real databases, Redis, Kafka, and other infrastructure for testing. Because tests run against real services instead of mocks, they are more reliable.

// Java - Testcontainers + JUnit 5
@Testcontainers
@SpringBootTest
class UserRepositoryIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16")
        .withDatabaseName("testdb")
        .withUsername("test")
        .withPassword("test");

    @DynamicPropertySource
    static void configureProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private UserRepository userRepository;

    @Test
    void shouldSaveAndFindUser() {
        User user = new User("Alice", "alice@example.com");
        userRepository.save(user);

        Optional<User> found = userRepository.findByEmail("alice@example.com");

        assertThat(found).isPresent();
        assertThat(found.get().getName()).isEqualTo("Alice");
    }

    @Test
    void shouldReturnEmptyForNonExistentUser() {
        Optional<User> found = userRepository.findByEmail("nobody@example.com");
        assertThat(found).isEmpty();
    }
}

# Python - testcontainers
from testcontainers.postgres import PostgresContainer
import psycopg2

def test_user_crud():
    with PostgresContainer("postgres:16") as postgres:
        conn = psycopg2.connect(
            host=postgres.get_container_host_ip(),
            port=postgres.get_exposed_port(5432),
            user=postgres.username,
            password=postgres.password,
            dbname=postgres.dbname,
        )
        cursor = conn.cursor()

        cursor.execute("""
            CREATE TABLE users (
                id SERIAL PRIMARY KEY,
                name VARCHAR(100),
                email VARCHAR(100) UNIQUE
            )
        """)
        cursor.execute(
            "INSERT INTO users (name, email) VALUES (%s, %s) RETURNING id",
            ("Alice", "alice@example.com")
        )
        user_id = cursor.fetchone()[0]

        cursor.execute("SELECT name FROM users WHERE id = %s", (user_id,))
        assert cursor.fetchone()[0] == "Alice"

        conn.commit()
        conn.close()

Database Testing Strategies

Key considerations for database testing:

Test isolation: Each test must be independent. Use transaction rollbacks or data initialization before each test.

Migration verification: Verify that schema changes are compatible with existing data.

Seed data management: Manage baseline data for tests systematically. Use Factory patterns or Fixtures.


4. E2E Tests

E2E (End-to-End) tests verify the entire system from the actual user's perspective.

Playwright

Playwright is a browser automation framework developed by Microsoft, supporting Chromium, Firefox, and WebKit.

// Playwright test example
import { test, expect } from '@playwright/test';

test.describe('User login flow', () => {
  test('login with valid credentials', async ({ page }) => {
    await page.goto('/login');

    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('SecurePass123');
    await page.getByRole('button', { name: 'Sign In' }).click();

    await expect(page).toHaveURL('/dashboard');
    await expect(page.getByText('Welcome')).toBeVisible();
  });

  test('login fails with wrong password', async ({ page }) => {
    await page.goto('/login');

    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('wrong');
    await page.getByRole('button', { name: 'Sign In' }).click();

    await expect(page.getByText('Invalid email or password')).toBeVisible();
    await expect(page).toHaveURL('/login');
  });
});

Cypress

Cypress is a JavaScript E2E testing framework known for its excellent developer experience.

// Cypress test example
describe('Shopping Cart', () => {
  beforeEach(() => {
    cy.visit('/products');
  });

  it('adds product to cart', () => {
    cy.get('[data-testid="product-card"]').first().within(() => {
      cy.get('[data-testid="add-to-cart"]').click();
    });

    cy.get('[data-testid="cart-badge"]').should('have.text', '1');
    cy.get('[data-testid="cart-icon"]').click();
    cy.get('[data-testid="cart-item"]').should('have.length', 1);
  });

  it('updates quantity in cart', () => {
    cy.get('[data-testid="product-card"]').first()
      .find('[data-testid="add-to-cart"]').click();
    cy.get('[data-testid="cart-icon"]').click();

    cy.get('[data-testid="quantity-increase"]').click();
    cy.get('[data-testid="quantity-input"]').should('have.value', '2');
  });
});

Dealing with Flaky Tests

Flaky tests pass sometimes and fail other times with the same code. They are especially common in E2E tests.

Causes and solutions:

  • Timing issues: Use explicit waits instead of sleep. Wait for specific conditions, not arbitrary durations
  • Test interdependence: Make each test independent. Remove shared state
  • Network instability: Mock API responses or add retry logic
  • Dynamic data: Fix test data or use pattern matching

// Playwright - anti-flaky patterns
test('displays list after data loads', async ({ page }) => {
  await page.goto('/users');

  // Bad: fixed time wait
  // await page.waitForTimeout(3000);

  // Good: wait for specific element to appear
  await page.waitForSelector('[data-testid="user-list"]');
  const users = page.getByTestId('user-item');
  await expect(users).toHaveCount(10);
});

5. TDD (Test-Driven Development)

TDD, systematized by Kent Beck, is a development methodology where you write the test first, then write the code that makes the test pass.

The Red-Green-Refactor Cycle

Red (Fail): Write a test for functionality that does not yet exist. It will naturally fail.

Green (Pass): Write the minimal code to make the test pass. Elegance is not a concern yet.

Refactor (Improve): Improve the code while keeping the tests green. Remove duplication, improve naming, and restructure.

Practical Example: Password Validator

# Step 1: Red - write a failing test
def test_password_minimum_length():
    validator = PasswordValidator()
    assert not validator.validate("short")

def test_password_requires_uppercase():
    validator = PasswordValidator()
    assert not validator.validate("longpassword1")

def test_password_requires_number():
    validator = PasswordValidator()
    assert not validator.validate("LongPassword")

def test_valid_password():
    validator = PasswordValidator()
    assert validator.validate("SecurePass123")

# Step 2: Green - write code to pass the tests
class PasswordValidator:
    def validate(self, password: str) -> bool:
        if len(password) < 8:
            return False
        if not any(c.isupper() for c in password):
            return False
        if not any(c.isdigit() for c in password):
            return False
        return True

# Step 3: Refactor - improve the code
from dataclasses import dataclass
from typing import List, Callable

@dataclass
class ValidationRule:
    check: Callable[[str], bool]
    message: str

class PasswordValidator:
    def __init__(self):
        self.rules: List[ValidationRule] = [
            ValidationRule(
                check=lambda p: len(p) >= 8,
                message="Password must be at least 8 characters"
            ),
            ValidationRule(
                check=lambda p: any(c.isupper() for c in p),
                message="Password must contain uppercase letter"
            ),
            ValidationRule(
                check=lambda p: any(c.isdigit() for c in p),
                message="Password must contain a digit"
            ),
        ]

    def validate(self, password: str) -> bool:
        return all(rule.check(password) for rule in self.rules)

    def get_errors(self, password: str) -> List[str]:
        return [
            rule.message
            for rule in self.rules
            if not rule.check(password)
        ]

Core TDD Principles

  • Fail only one test at a time: Never write multiple tests at once
  • Start with the simplest implementation: Hard-coding is fine at first
  • Let tests drive the code: Never write production code without a test
  • Take small steps: Small iterations are safer than big leaps

6. BDD (Behavior-Driven Development)

BDD, proposed by Dan North, describes business requirements in near-natural language and transforms them into executable tests.

The Given-When-Then Pattern

BDD scenarios are described in three parts:

Given (precondition): Describes the test's starting state.

When (action): Describes the behavior being tested.

Then (outcome): Describes the expected result.

Cucumber Example

# features/login.feature
Feature: User Login
  Users can log into the system with valid credentials

  Scenario: Successful login
    Given a user registered with email "user@example.com"
    And the password is "SecurePass123"
    When the user logs in with valid credentials on the login page
    Then the user is redirected to the dashboard page
    And a "Welcome" message is displayed

  Scenario: Failed login with wrong password
    Given a user registered with email "user@example.com"
    When the user logs in with incorrect password "wrongpass"
    Then the user remains on the login page
    And an error message is displayed

  Scenario Outline: Password policy validation
    When registration is attempted with password "<password>"
    Then the result is "<result>"

    Examples:
      | password       | result  |
      | short          | failure |
      | NoDigitHere    | failure |
      | secure123Pass  | success |

# step_definitions/login_steps.py
from behave import given, when, then

@given('a user registered with email "{email}"')
def step_user_exists(context, email):
    context.user = create_test_user(email=email)

@when('the user logs in with valid credentials on the login page')
def step_login_with_valid_credentials(context):
    context.response = context.client.post('/login', json={
        'email': context.user.email,
        'password': context.user.password,
    })

@then('the user is redirected to the dashboard page')
def step_redirected_to_dashboard(context):
    assert context.response.headers['Location'] == '/dashboard'

The Value of BDD

BDD serves as a communication tool between technical and business teams. Gherkin-format scenarios are readable even by non-developers. This reduces misunderstanding of requirements and provides clear acceptance criteria.


7. Performance Testing

Performance testing verifies how a system behaves under load.

Types of Performance Tests

Load Test: Verifies that the system operates normally under expected traffic levels.

Stress Test: Gradually increases load to find the system's breaking point.

Spike Test: Verifies how the system reacts to sudden traffic surges.

Soak Test: Applies sustained load over an extended period to detect issues like memory leaks.

Load Testing with k6

k6 is a modern load testing tool maintained by Grafana Labs. Tests are written in JavaScript and run from the CLI.

// load-test.js - k6 load test
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const responseTime = new Trend('response_time');

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up to 100 users over 2 min
    { duration: '5m', target: 100 },   // Hold 100 users for 5 min
    { duration: '2m', target: 200 },   // Ramp up to 200 users over 2 min
    { duration: '5m', target: 200 },   // Hold 200 users for 5 min
    { duration: '2m', target: 0 },     // Ramp down to 0 over 2 min
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],   // 95% of requests under 500ms
    errors: ['rate<0.01'],              // Error rate below 1%
  },
};

export default function () {
  const loginRes = http.post('https://api.example.com/login', JSON.stringify({
    username: 'testuser',
    password: 'testpass',
  }), {
    headers: { 'Content-Type': 'application/json' },
  });

  const success = check(loginRes, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'has auth token': (r) => r.json('token') !== undefined,
  });

  errorRate.add(!success);
  responseTime.add(loginRes.timings.duration);

  if (loginRes.status === 200) {
    const token = loginRes.json('token');
    const profileRes = http.get('https://api.example.com/profile', {
      headers: { Authorization: `Bearer ${token}` },
    });

    check(profileRes, {
      'profile status is 200': (r) => r.status === 200,
    });
  }

  sleep(1);
}
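The `p(95)<500` threshold above can also be checked offline against collected samples; a minimal Python sketch using the nearest-rank percentile (the durations are made-up sample data in milliseconds):

```python
# Offline evaluation of a k6-style latency threshold such as p(95)<500.
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    s = sorted(samples)
    k = max(0, math.ceil(pct / 100 * len(s)) - 1)
    return s[k]

# Made-up response times with one slow outlier:
durations_ms = [120, 180, 210, 250, 300, 320, 350, 400, 430, 470,
                480, 490, 495, 510, 900]

p95 = percentile(durations_ms, 95)
threshold_ok = p95 < 500
print(f"p95 = {p95} ms, threshold p(95)<500 "
      f"{'passed' if threshold_ok else 'failed'}")
```

A single outlier in a small sample can flip the result, which is one reason load tests run long enough to collect many observations before thresholds are judged.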

JMeter

Apache JMeter is a performance testing tool with a long history. Its GUI-based test plan editor makes it accessible to non-developers. However, compared to k6, it consumes more resources and scripting can be cumbersome.

k6 vs JMeter comparison:

Aspect                 | k6                      | JMeter
-----------------------|-------------------------|----------------------
Scripting language     | JavaScript              | XML (GUI)
Resource efficiency    | Go-based, lightweight   | Java-based, heavy
CI/CD integration      | CLI-based, easy         | Possible but complex
Protocol support       | HTTP, WebSocket, gRPC   | Very broad
Distributed execution  | k6 Cloud                | Requires setup
Learning curve         | Low                     | Medium

8. Security Testing

Security testing is the process of proactively discovering and addressing application vulnerabilities.

OWASP ZAP

OWASP ZAP (Zed Attack Proxy) is an open-source web application security scanner. It can be integrated into CI/CD pipelines for automated security scanning.

# Running ZAP scan in GitHub Actions
name: Security Scan
on:
  pull_request:
    branches: [main]

jobs:
  zap-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Start Application
        run: docker compose up -d

      - name: ZAP Baseline Scan
        uses: zaproxy/action-baseline@v0.12.0
        with:
          target: 'http://localhost:3000'
          rules_file_name: '.zap-rules.tsv'
          cmd_options: '-a -j -l WARN'

      - name: Upload Report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: zap-report
          path: report_html.html

Fuzzing

Fuzzing injects random or semi-random input into a program to trigger unexpected behavior or crashes.

# Python - Property-based testing with Hypothesis
from hypothesis import given, strategies as st
import json

# Assumed implementation under test (not defined in this guide):
def calculate_discount(price: int, percentage: int) -> float:
    return price * (100 - percentage) / 100

@given(st.text())
def test_json_roundtrip(s):
    """Any string should survive a JSON encode/decode round trip"""
    encoded = json.dumps(s)
    decoded = json.loads(encoded)
    assert decoded == s

@given(st.integers(min_value=0, max_value=1000))
def test_discount_never_exceeds_price(price):
    """The discounted price must stay between 0 and the original price"""
    discounted = calculate_discount(price, percentage=50)
    assert discounted >= 0
    assert discounted <= price

Dependency Vulnerability Scanning

Detecting known vulnerabilities in third-party libraries is a critical part of security testing.

# GitHub Actions - dependency vulnerability scanning
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    scan-type: 'fs'
    scan-ref: '.'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: npm audit
  run: npm audit --audit-level=high

- name: Python safety check
  run: pip install safety && safety check

9. Chaos Engineering

Chaos engineering is an experimental methodology that intentionally injects failures into production environments to verify system resilience. Netflix pioneered the practice, and it is now adopted by many large-scale services.

Principles of Chaos Engineering

  1. Establish a hypothesis about steady state: Define what normal system behavior looks like
  2. Simulate real-world events: Inject server failures, network latency, disk exhaustion, and more
  3. Experiment in production: True resilience can only be verified in real environments, not staging
  4. Minimize the blast radius: Limit the impact of experiments on users
  5. Automate: Run experiments continuously
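Step 1 can be made concrete as an automated steady-state check that runs before and during an experiment; the metrics and limits below are invented for illustration.

```python
# Automated steady-state hypothesis check for a chaos experiment.
# The bounds (1% error rate, 500 ms p95) are example values, not standards.

def steady_state_holds(error_rates, p95_latencies_ms,
                       max_error_rate=0.01, max_p95_ms=500):
    """Return True if every observation stays within the hypothesized
    steady-state bounds during the experiment window."""
    return (all(r <= max_error_rate for r in error_rates)
            and all(l <= max_p95_ms for l in p95_latencies_ms))

# Hypothetical observations collected before and while instances were killed:
before = steady_state_holds([0.001, 0.002], [210, 230])
during = steady_state_holds([0.003, 0.020], [240, 610])  # degraded

print("steady state before:", before)
print("steady state during:", during)  # violated -> abort and investigate
```

Wiring such a check into the experiment's abort condition is one way to implement principle 4 (minimize the blast radius): the experiment halts as soon as the hypothesis is falsified.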

Chaos Monkey

Chaos Monkey, developed by Netflix, randomly terminates production instances. This verifies that the service continues to operate normally even when individual instances go down.

Netflix also maintains the broader Simian Army toolkit, which includes Latency Monkey (injecting network delays), Conformity Monkey (terminating non-standard instances), and Chaos Gorilla (simulating entire availability zone failures).

LitmusChaos

LitmusChaos is a chaos engineering platform designed for Kubernetes environments. As a CNCF project, it is widely used in cloud-native ecosystems.

# LitmusChaos - Pod deletion experiment
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-delete-chaos
  namespace: production
spec:
  appinfo:
    appns: 'production'
    applabel: 'app=payment-service'
    appkind: 'deployment'
  engineState: 'active'
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
        probe:
          - name: payment-health-check
            type: httpProbe
            httpProbe/inputs:
              url: 'http://payment-service:8080/health'
              method:
                get:
                  criteria: '=='
                  responseCode: '200'
            mode: Continuous
            runProperties:
              probeTimeout: 5
              interval: 2
              retry: 3

# LitmusChaos - Network latency experiment
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: network-delay-chaos
spec:
  appinfo:
    appns: 'production'
    applabel: 'app=order-service'
    appkind: 'deployment'
  engineState: 'active'
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-network-latency
      spec:
        components:
          env:
            - name: NETWORK_LATENCY
              value: '2000'
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            - name: NETWORK_INTERFACE
              value: 'eth0'

Game Day

A Game Day is a structured exercise where the entire team participates in executing failure scenarios and practicing their response.

Game Day process:

  1. Plan: Select the failure scenario to test (e.g., primary database failure)
  2. Hypothesize: Define expected system behavior during the failure
  3. Execute: Inject the failure and observe the system's response
  4. Observe: Monitor dashboards, alerts, and logs in real time
  5. Analyze: Compare the hypothesis with actual results and identify improvements
  6. Improve: Fix discovered weaknesses and plan the next Game Day

Example Game Day scenarios:

  • Does the primary database failover complete within 30 seconds?
  • When one service goes down, do dependent services fall back appropriately?
  • If the CDN fails, can the origin servers handle the traffic?
  • Does automatic rollback work when a deployment needs to be reverted?

10. Test Automation and CI/CD Integration

All testing strategies realize their value when integrated into CI/CD pipelines.

Test Pipeline Design

# GitHub Actions - full test pipeline
name: Test Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Unit Tests
        run: |
          npm ci
          npm run test:unit -- --coverage
      - name: Upload Coverage
        uses: codecov/codecov-action@v4

  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4
      - name: Run Integration Tests
        run: |
          npm ci
          npm run test:integration
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/testdb

  e2e-tests:
    runs-on: ubuntu-latest
    needs: integration-tests
    steps:
      - uses: actions/checkout@v4
      - name: Install Playwright
        run: npx playwright install --with-deps
      - name: Run E2E Tests
        run: npm run test:e2e
      - name: Upload Test Report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/

  security-scan:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - name: Run Security Scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          severity: 'CRITICAL,HIGH'

  performance-test:
    runs-on: ubuntu-latest
    needs: integration-tests
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Run k6 Load Test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: tests/performance/load-test.js

Testing Strategy Matrix

Test Type         | When to Run     | Frequency     | Duration        | Blocking
------------------|-----------------|---------------|-----------------|----------------------
Unit tests        | Every PR/commit | Very frequent | Seconds-minutes | Must block
Integration tests | Every PR        | Frequent      | Minutes         | Must block
E2E tests         | PR, merge       | Daily         | Minutes-hours   | Block critical flows
Performance tests | Post-merge      | Weekly        | Hours           | On threshold breach
Security tests    | PR, nightly     | Daily         | Minutes         | Block CRITICAL
Chaos tests       | Game Day        | Monthly       | Hours           | Non-blocking (observe)

Conclusion: Building a Testing Culture Is What Matters Most

Tools and techniques are important, but the most critical factor is building a culture where testing is considered obvious.

Do not treat tests as technical debt. The time spent writing tests is an investment, not a cost.

Make tests a mandatory part of code reviews. New features without tests should not pass review.

Never leave failing tests unaddressed. Broken tests left in place erode trust in the test suite.

Invest in testing infrastructure. Allocate time and resources to CI/CD pipelines, test environments, and test data management.

Testing is not the last line of defense for software quality but the first line of attack. A well-crafted testing strategy increases development speed, reduces bugs, and builds team confidence. Start with unit tests and gradually expand your testing scope.
