Split View: [컴파일러] 14. 데이터 흐름 분석 프레임워크

[컴파일러] 14. 데이터 흐름 분석 프레임워크

개요

데이터 흐름 분석(Data-Flow Analysis)은 컴파일러 최적화의 핵심 기반 기술입니다. 프로그램의 각 지점에서 특정 성질에 대한 정보를 수집하여, 안전한 최적화가 가능한지 판단합니다.

이번 글에서는 데이터 흐름 분석의 기본 개념, 세 가지 대표적인 분석(도달 정의, 활성 변수, 가용 수식), 반복 알고리즘, 그리고 격자 이론의 기초를 다룹니다.

1. 데이터 흐름 분석의 기본 개념

1.1 분석의 목표

프로그램의 각 지점(point)에서 다음과 같은 질문에 답합니다:

변수 x의 값은 어디서 정의되었는가?
변수 x의 값이 이후에 사용되는가?
수식 a + b가 이미 계산되어 사용 가능한가?

1.2 프로그램 지점

기본 블록의 각 명령어 앞과 뒤를 프로그램 지점이라 합니다.

// 블록 B:
//  [지점 1] d1: x = a + b [지점 2]
//  [지점 2] d2: y = x * c [지점 3]
//
// IN[B] = 지점 1에서의 정보
// OUT[B] = 지점 3에서의 정보

1.3 전달 함수 (Transfer Function)

각 기본 블록 B에 대해 전달 함수 f_B는 입력 정보를 출력 정보로 변환합니다:

OUT[B] = f_B(IN[B])

대부분의 데이터 흐름 분석에서 전달 함수는 gen/kill 형태입니다:

OUT[B] = gen[B] U (IN[B] - kill[B])

gen[B]: 블록 B에서 새로 생성되는 정보
kill[B]: 블록 B에서 무효화되는 정보

1.4 합류 연산 (Meet/Join Operation)

한 블록에 여러 선행 블록이 있을 때, 각 선행 블록의 정보를 합치는 연산입니다.

// 합집합(Union): 어느 경로에서든 성립하면 포함
IN[B] = U OUT[P]  (P는 B의 모든 선행자)

// 교집합(Intersection): 모든 경로에서 성립해야 포함
IN[B] = n OUT[P]  (P는 B의 모든 선행자)

2. 도달 정의 (Reaching Definitions)

2.1 개념

정의 d가 프로그램 지점 p에 도달한다 = d에서 p까지의 경로가 존재하며, 그 경로에서 같은 변수가 다시 정의되지 않음.

2.2 gen/kill 집합

명령어 d: x = y op z에 대해:
  gen = 정의 d 자체
  kill = x에 대한 모든 다른 정의

블록 B 전체의 gen/kill:

// 블록 B에 d1, d2, ..., dk 명령어가 순서대로 있을 때
// 뒤의 명령어가 앞의 명령어를 덮어쓸 수 있음

gen[B] = gen(dk) U (gen(dk-1) - kill(dk)) U ...
kill[B] = kill(d1) U kill(d2) U ... U kill(dk)

2.3 데이터 흐름 방정식

OUT[B] = gen[B] U (IN[B] - kill[B])
IN[B]  = U OUT[P]  (P는 B의 모든 선행자)

합류 연산이 합집합인 이유: 어떤 경로를 통해서든 정의가 도달하면 그 값을 사용할 가능성이 있으므로, 보수적으로 모두 포함해야 합니다.

2.4 예시

// 흐름 그래프:
//   B1 -> B2 -> B3
//          ^     |
//          +-----+

// B1: d1: x = 5        gen={d1}, kill={d2,d4}
// B2: d2: x = x + 1    gen={d2}, kill={d1,d4}
// B3: d3: y = x         gen={d3}, kill={}
//     d4: x = y + 1     gen={d4}, kill={d1,d2}

// gen[B3] = {d4} U ({d3} - {d1,d2}) = {d3, d4}
// kill[B3] = {d1, d2}

반복 계산:

초기값: OUT[B] = {} (모든 B에 대해)

반복 1:
  IN[B1] = {}
  OUT[B1] = {d1} U ({} - {d2,d4}) = {d1}

  IN[B2] = OUT[B1] U OUT[B3] = {d1} U {} = {d1}
  OUT[B2] = {d2} U ({d1} - {d1,d4}) = {d2}

  IN[B3] = OUT[B2] = {d2}
  OUT[B3] = {d3,d4} U ({d2} - {d1,d2}) = {d3,d4}

반복 2:
  IN[B2] = OUT[B1] U OUT[B3] = {d1} U {d3,d4} = {d1,d3,d4}
  OUT[B2] = {d2} U ({d1,d3,d4} - {d1,d4}) = {d2,d3}

  IN[B3] = OUT[B2] = {d2,d3}
  OUT[B3] = {d3,d4} U ({d2,d3} - {d1,d2}) = {d3,d4}

반복 3:
  IN[B2] = {d1} U {d3,d4} = {d1,d3,d4}  (변화 없음)
  -> 수렴!

3. 활성 변수 분석 (Live-Variable Analysis)

3.1 개념

변수 x가 프로그램 지점 p에서 **활성(live)**이다 = p에서 시작하는 어떤 경로에서 x가 재정의되기 전에 사용됨.

3.2 역방향 분석

활성 변수 분석은 역방향(backward) 분석입니다. 프로그램의 끝에서 시작으로 거슬러 올라갑니다.

// 전달 함수 (역방향)
IN[B]  = use[B] U (OUT[B] - def[B])
OUT[B] = U IN[S]  (S는 B의 모든 후속자)

use[B]: 블록 B에서 정의되기 전에 사용되는 변수
def[B]: 블록 B에서 정의되는 변수

3.3 use/def 집합 계산

// 블록 B:
//   x = y + z    // use: {y, z}, def: {x}
//   a = x + 1    // use: {x}, def: {a}  (x는 위에서 정의되었으므로 use[B]에 포함 안됨)

// use[B] = {y, z}  (정의 전에 사용된 변수)
// def[B] = {x, a}  (정의된 변수)

3.4 예시

// B1: a = 1; b = 2
// B2: c = a + b; d = c - a
// B3: d = b + d; (출구)

// use[B1] = {}, def[B1] = {a, b}
// use[B2] = {a, b}, def[B2] = {c, d}
// use[B3] = {b, d}, def[B3] = {d}

// 초기값: IN[B] = {} (모든 B에 대해)

반복 1: (역방향: B3 -> B2 -> B1)
  OUT[B3] = {}  (출구 블록)
  IN[B3] = {b, d} U ({} - {d}) = {b, d}

  OUT[B2] = IN[B3] = {b, d}
  IN[B2] = {a, b} U ({b, d} - {c, d}) = {a, b}

  OUT[B1] = IN[B2] = {a, b}
  IN[B1] = {} U ({a, b} - {a, b}) = {}

  -> 수렴! (변화 없음)

3.5 활용

죽은 코드 제거: 대입 x = expr에서 OUT 지점에서 x가 활성이 아니면 죽은 코드
레지스터 할당: 동시에 활성인 변수들이 간섭 그래프의 간선을 형성

4. 가용 수식 (Available Expressions)

4.1 개념

수식 x op y가 프로그램 지점 p에서 가용하다 = p에 도달하는 모든 경로에서 x op y가 계산되었고, 그 이후로 x와 y가 재정의되지 않음.

4.2 데이터 흐름 방정식

OUT[B]   = e_gen[B] U (IN[B] - e_kill[B])
IN[B]    = n OUT[P]  (P는 B의 모든 선행자)
IN[ENTRY] = {}

합류 연산이 교집합인 이유: 모든 경로에서 계산되어야 안전하게 재사용할 수 있습니다.

e_gen[B]: 블록 B에서 계산되고, 이후에 피연산자가 재정의되지 않는 수식
e_kill[B]: 블록 B에서 변수 x가 정의될 때, x를 피연산자로 포함하는 모든 수식

4.3 예시

// B1: t1 = a + b; c = t1
//     e_gen[B1] = {a+b}
//     e_kill[B1] = {c를 포함하는 수식}

// B2: t2 = a + b; d = t2 - c
//     e_gen[B2] = {a+b, t2-c}
//     e_kill[B2] = {d를 포함하는 수식}

// B3: a = 1
//     e_gen[B3] = {}
//     e_kill[B3] = {a를 포함하는 모든 수식} = {a+b}

4.4 초기값의 중요성

가용 수식 분석에서는 초기값 설정이 중요합니다:

OUT[ENTRY] = {}     // 진입점에서는 아무 수식도 가용하지 않음
OUT[B] = U          // 다른 블록은 전체 집합(모든 수식)으로 초기화
                    // (교집합이므로 큰 값에서 시작해야 올바르게 수렴)

5. 반복 알고리즘

5.1 일반적인 반복 알고리즘

알고리즘: 순방향 데이터 흐름 분석

입력: 흐름 그래프, gen/kill 집합, 합류 연산
출력: 각 블록의 IN/OUT 집합

1. 초기화
   OUT[ENTRY] = 초기값
   for (모든 블록 B != ENTRY)
       OUT[B] = 초기값

2. 반복
   changed = true
   while (changed):
       changed = false
       for (모든 블록 B != ENTRY):
           IN[B] = meet(OUT[P] for all predecessors P)
           oldOUT = OUT[B]
           OUT[B] = gen[B] U (IN[B] - kill[B])
           if OUT[B] != oldOUT:
               changed = true

5.2 수렴 보장

반복 알고리즘이 반드시 종료하는 이유:

gen/kill 전달 함수는 **단조(monotone)**합니다 - 입력이 커지면 출력도 커짐(또는 같음)
데이터 흐름 값의 집합은 유한합니다
합류 연산(합집합 또는 교집합)도 단조합니다

따라서 각 반복에서 OUT 값은 한 방향으로만 변화하며, 유한한 범위 내에서 결국 안정 상태에 도달합니다.

5.3 수렴 속도

일반적으로 반복 횟수는 흐름 그래프의 **깊이(depth)**에 비례합니다. 깊이는 루프의 중첩 깊이와 관련됩니다. 대부분의 프로그램에서 2-3번의 반복으로 수렴합니다.

6. 세 가지 분석 비교

특성	도달 정의	활성 변수	가용 수식
방향	순방향	역방향	순방향
합류 연산	합집합	합집합	교집합
전달 함수	gen U (IN - kill)	use U (OUT - def)	e_gen U (IN - e_kill)
초기값	공집합	공집합	전체집합(Entry 제외)
활용	상수 전파	죽은 코드 제거	공통 부분식 제거

7. 격자 이론의 기초

7.1 반순서 집합 (Partially Ordered Set)

데이터 흐름 분석의 수학적 기반은 격자 이론(Lattice Theory)입니다.

반순서 집합은 집합 S와 관계 <=로 구성되며, 다음을 만족합니다:

반사성: a <= a
반대칭성: a <= b이고 b <= a이면 a = b
추이성: a <= b이고 b <= c이면 a <= c

7.2 격자 (Lattice)

반순서 집합에서 임의의 두 원소 a, b에 대해:

최소 상한(join): a와 b 모두보다 크거나 같은 원소 중 가장 작은 것
최대 하한(meet): a와 b 모두보다 작거나 같은 원소 중 가장 큰 것

이 두 가지가 항상 존재하면 격자라 합니다.

7.3 데이터 흐름 분석과 격자

// 도달 정의 분석의 격자
// 원소: 정의들의 부분집합 (2^D)
// 순서: 집합 포함관계 (부분집합)
// meet 연산: 합집합 (U)
// bottom: 공집합 ({})
// top: 전체집합 (D)

// 가용 수식 분석의 격자
// 원소: 수식들의 부분집합 (2^E)
// 순서: 역방향 집합 포함관계 (상위집합)
// meet 연산: 교집합
// bottom: 전체집합 (E)
// top: 공집합 ({})

7.4 고정점 정리

Tarski의 고정점 정리: 완전 격자 위에서 단조 함수 f는 최소 고정점을 가집니다. 반복 알고리즘은 이 최소 고정점에 수렴합니다.

이것이 반복 알고리즘의 정확성(correctness)과 종료성(termination)을 보장하는 수학적 기반입니다.

7.5 안전성과 정밀성

// 보수적(conservative) 분석:
// - 안전한 쪽으로 근사
// - 최적화 기회를 놓칠 수 있지만, 잘못된 최적화는 하지 않음

// 도달 정의: 실제보다 더 많은 정의가 도달한다고 분석
//   -> 상수 전파를 적용하지 못하는 경우가 생길 수 있음 (안전)

// 가용 수식: 실제보다 더 적은 수식이 가용하다고 분석
//   -> 공통 부분식 제거를 적용하지 못하는 경우가 생김 (안전)

8. 실용적 고려사항

8.1 비트 벡터 표현

데이터 흐름 값을 비트 벡터로 표현하면 효율적입니다:

// 정의 d1, d2, d3, d4가 있을 때
// gen[B] = {d1, d3} -> 비트 벡터: 1010
// kill[B] = {d2}    -> 비트 벡터: 0100

// 전달 함수 계산:
// OUT = gen | (IN & ~kill)  (비트 연산으로 한번에 처리)

비트 연산은 하드웨어 수준에서 매우 빠르므로 대규모 프로그램에서도 효율적입니다.

8.2 워크리스트 알고리즘

단순 반복 대신 워크리스트(worklist) 알고리즘을 사용하면 더 효율적입니다:

알고리즘: 워크리스트 기반 반복

1. 모든 블록을 워크리스트에 추가
2. 워크리스트가 비어있지 않은 동안:
   a. 블록 B를 워크리스트에서 꺼냄
   b. OUT[B]를 다시 계산
   c. OUT[B]가 변경되었으면 B의 후속자들을 워크리스트에 추가

변경된 블록의 영향을 받는 블록만 다시 계산하므로 불필요한 연산을 줄입니다.

정리

개념	설명
전달 함수	기본 블록의 입력 정보를 출력 정보로 변환
gen/kill 집합	블록에서 생성/무효화되는 데이터 흐름 정보
합류 연산	여러 경로의 정보를 합치는 연산 (합집합 또는 교집합)
도달 정의	변수의 정의가 특정 지점에 도달하는지 분석
활성 변수	변수의 값이 이후에 사용되는지 분석
가용 수식	수식이 이미 계산되어 재사용 가능한지 분석
반복 알고리즘	데이터 흐름 방정식을 고정점까지 반복 계산
격자 이론	데이터 흐름 분석의 수학적 기반

데이터 흐름 분석은 컴파일러 최적화의 이론적 기반으로서, 최적화의 안전성을 보장합니다. 격자 이론에 의한 수학적 근거가 있으므로, 반복 알고리즘은 반드시 수렴하며 보수적으로 안전한 결과를 제공합니다. 다음 글에서는 이 분석 기법을 활용한 루프 최적화를 자세히 다루겠습니다.

[Compiler] 14. Data-Flow Analysis Framework

Overview

Data-Flow Analysis is a core foundational technology for compiler optimization. It collects information about specific properties at each point in a program to determine whether safe optimizations are possible.

In this article, we cover the basic concepts of data-flow analysis, three representative analyses (reaching definitions, live variables, available expressions), iterative algorithms, and the basics of lattice theory.

1. Basic Concepts of Data-Flow Analysis

1.1 Goal of Analysis

At each point in a program, we answer questions such as:

Where was the value of variable x defined?
Is the value of variable x used later?
Has the expression a + b already been computed and is it available?

1.2 Program Points

The positions before and after each instruction in a basic block are called program points.

// Block B:
//  [point 1] d1: x = a + b [point 2]
//  [point 2] d2: y = x * c [point 3]
//
// IN[B] = information at point 1
// OUT[B] = information at point 3

1.3 Transfer Function

For each basic block B, the transfer function f_B transforms input information into output information:

OUT[B] = f_B(IN[B])

In most data-flow analyses, the transfer function takes the gen/kill form:

OUT[B] = gen[B] U (IN[B] - kill[B])

gen[B]: Information newly generated in block B
kill[B]: Information invalidated in block B

1.4 Meet/Join Operation

When a block has multiple predecessors, the meet operation combines the information from each predecessor.

// Union: Include if it holds on any path
IN[B] = U OUT[P]  (P is all predecessors of B)

// Intersection: Include only if it holds on all paths
IN[B] = n OUT[P]  (P is all predecessors of B)

2. Reaching Definitions

2.1 Concept

Definition d reaches program point p = a path exists from d to p, and the same variable is not redefined along that path.

2.2 gen/kill Sets

For instruction d: x = y op z:
  gen = definition d itself
  kill = all other definitions of x

gen/kill for the entire block B:

// When block B has instructions d1, d2, ..., dk in order
// Later instructions can overwrite earlier ones

gen[B] = gen(dk) U (gen(dk-1) - kill(dk)) U ...
kill[B] = kill(d1) U kill(d2) U ... U kill(dk)

2.3 Data-Flow Equations

OUT[B] = gen[B] U (IN[B] - kill[B])
IN[B]  = U OUT[P]  (P is all predecessors of B)

The meet operation is union because: if a definition reaches through any path, the value might be used, so all must be conservatively included.

2.4 Example

// Flow graph:
//   B1 -> B2 -> B3
//          ^     |
//          +-----+

// B1: d1: x = 5        gen={d1}, kill={d2,d4}
// B2: d2: x = x + 1    gen={d2}, kill={d1,d4}
// B3: d3: y = x         gen={d3}, kill={}
//     d4: x = y + 1     gen={d4}, kill={d1,d2}

// gen[B3] = {d4} U ({d3} - {d1,d2}) = {d3, d4}
// kill[B3] = {d1, d2}

Iterative computation:

Initial values: OUT[B] = {} (for all B)

Iteration 1:
  IN[B1] = {}
  OUT[B1] = {d1} U ({} - {d2,d4}) = {d1}

  IN[B2] = OUT[B1] U OUT[B3] = {d1} U {} = {d1}
  OUT[B2] = {d2} U ({d1} - {d1,d4}) = {d2}

  IN[B3] = OUT[B2] = {d2}
  OUT[B3] = {d3,d4} U ({d2} - {d1,d2}) = {d3,d4}

Iteration 2:
  IN[B2] = OUT[B1] U OUT[B3] = {d1} U {d3,d4} = {d1,d3,d4}
  OUT[B2] = {d2} U ({d1,d3,d4} - {d1,d4}) = {d2,d3}

  IN[B3] = OUT[B2] = {d2,d3}
  OUT[B3] = {d3,d4} U ({d2,d3} - {d1,d2}) = {d3,d4}

Iteration 3:
  IN[B2] = {d1} U {d3,d4} = {d1,d3,d4}  (no change)
  -> Converged!

3. Live-Variable Analysis

3.1 Concept

Variable x is live at program point p = on some path starting from p, x is used before being redefined.

3.2 Backward Analysis

Live variable analysis is a backward analysis. It proceeds from the end of the program toward the beginning.

// Transfer function (backward)
IN[B]  = use[B] U (OUT[B] - def[B])
OUT[B] = U IN[S]  (S is all successors of B)

use[B]: Variables used before being defined in block B
def[B]: Variables defined in block B

3.3 Computing use/def Sets

// Block B:
//   x = y + z    // use: {y, z}, def: {x}
//   a = x + 1    // use: {x}, def: {a}  (x was defined above, so not in use[B])

// use[B] = {y, z}  (variables used before being defined)
// def[B] = {x, a}  (variables defined)

3.4 Example

// B1: a = 1; b = 2
// B2: c = a + b; d = c - a
// B3: d = b + d; (exit)

// use[B1] = {}, def[B1] = {a, b}
// use[B2] = {a, b}, def[B2] = {c, d}
// use[B3] = {b, d}, def[B3] = {d}

// Initial values: IN[B] = {} (for all B)

Iteration 1: (backward: B3 -> B2 -> B1)
  OUT[B3] = {}  (exit block)
  IN[B3] = {b, d} U ({} - {d}) = {b, d}

  OUT[B2] = IN[B3] = {b, d}
  IN[B2] = {a, b} U ({b, d} - {c, d}) = {a, b}

  OUT[B1] = IN[B2] = {a, b}
  IN[B1] = {} U ({a, b} - {a, b}) = {}

  -> Converged! (no change)

3.5 Applications

Dead code elimination: If x is not live at the OUT point of assignment x = expr, it is dead code
Register allocation: Variables that are simultaneously live form edges in the interference graph

4. Available Expressions

4.1 Concept

Expression x op y is available at program point p = on all paths reaching p, x op y has been computed and neither x nor y has been redefined since.

4.2 Data-Flow Equations

OUT[B]   = e_gen[B] U (IN[B] - e_kill[B])
IN[B]    = n OUT[P]  (P is all predecessors of B)
IN[ENTRY] = {}

The meet operation is intersection because: the expression must have been computed on all paths to be safely reused.

e_gen[B]: Expressions computed in block B whose operands are not subsequently redefined
e_kill[B]: When variable x is defined in block B, all expressions containing x as an operand

4.3 Example

// B1: t1 = a + b; c = t1
//     e_gen[B1] = {a+b}
//     e_kill[B1] = {expressions containing c}

// B2: t2 = a + b; d = t2 - c
//     e_gen[B2] = {a+b, t2-c}
//     e_kill[B2] = {expressions containing d}

// B3: a = 1
//     e_gen[B3] = {}
//     e_kill[B3] = {all expressions containing a} = {a+b}

4.4 Importance of Initial Values

Initial value settings are important in available expressions analysis:

OUT[ENTRY] = {}     // No expressions are available at the entry point
OUT[B] = U          // Other blocks are initialized to the universal set (all expressions)
                    // (Since intersection is used, we must start from a large value to converge correctly)

5. Iterative Algorithm

5.1 General Iterative Algorithm

Algorithm: Forward Data-Flow Analysis

Input: Flow graph, gen/kill sets, meet operation
Output: IN/OUT sets for each block

1. Initialization
   OUT[ENTRY] = initial value
   for (all blocks B != ENTRY)
       OUT[B] = initial value

2. Iteration
   changed = true
   while (changed):
       changed = false
       for (all blocks B != ENTRY):
           IN[B] = meet(OUT[P] for all predecessors P)
           oldOUT = OUT[B]
           OUT[B] = gen[B] U (IN[B] - kill[B])
           if OUT[B] != oldOUT:
               changed = true

5.2 Convergence Guarantee

Why the iterative algorithm always terminates:

gen/kill transfer functions are monotone - as input grows, output grows (or stays the same)
The set of data-flow values is finite
Meet operations (union or intersection) are also monotone

Therefore, OUT values change in only one direction at each iteration and eventually reach a stable state within the finite range.

5.3 Convergence Speed

In general, the number of iterations is proportional to the depth of the flow graph. Depth is related to the nesting depth of loops. Most programs converge in 2-3 iterations.

6. Comparison of Three Analyses

Property	Reaching Definitions	Live Variables	Available Expressions
Direction	Forward	Backward	Forward
Meet operation	Union	Union	Intersection
Transfer function	gen U (IN - kill)	use U (OUT - def)	e_gen U (IN - e_kill)
Initial value	Empty set	Empty set	Universal set (except Entry)
Application	Constant propagation	Dead code elimination	Common subexpression elimination

7. Basics of Lattice Theory

7.1 Partially Ordered Set

The mathematical foundation of data-flow analysis is Lattice Theory.

A partially ordered set consists of a set S and a relation <=, satisfying:

Reflexivity: a <= a
Antisymmetry: if a <= b and b <= a then a = b
Transitivity: if a <= b and b <= c then a <= c

7.2 Lattice

In a partially ordered set, for any two elements a and b:

Least upper bound (join): The smallest element that is greater than or equal to both a and b
Greatest lower bound (meet): The largest element that is less than or equal to both a and b

If both always exist, it is called a lattice.

7.3 Data-Flow Analysis and Lattices

// Lattice for reaching definitions analysis
// Elements: subsets of definitions (2^D)
// Order: set inclusion (subset)
// Meet operation: union (U)
// Bottom: empty set ({})
// Top: universal set (D)

// Lattice for available expressions analysis
// Elements: subsets of expressions (2^E)
// Order: reverse set inclusion (superset)
// Meet operation: intersection
// Bottom: universal set (E)
// Top: empty set ({})

7.4 Fixed-Point Theorem

Tarski's Fixed-Point Theorem: A monotone function f on a complete lattice has a least fixed point. The iterative algorithm converges to this least fixed point.

This is the mathematical foundation that guarantees the correctness and termination of the iterative algorithm.

7.5 Safety and Precision

// Conservative analysis:
// - Approximates toward the safe side
// - May miss optimization opportunities, but never performs incorrect optimizations

// Reaching definitions: analyzes that more definitions reach than actually do
//   -> May not be able to apply constant propagation in some cases (safe)

// Available expressions: analyzes that fewer expressions are available than actually are
//   -> May not be able to apply common subexpression elimination in some cases (safe)

8. Practical Considerations

8.1 Bit Vector Representation

Representing data-flow values as bit vectors is efficient:

// Given definitions d1, d2, d3, d4
// gen[B] = {d1, d3} -> bit vector: 1010
// kill[B] = {d2}    -> bit vector: 0100

// Transfer function computation:
// OUT = gen | (IN & ~kill)  (processed at once with bitwise operations)

Bitwise operations are very fast at the hardware level, making them efficient even for large programs.

8.2 Worklist Algorithm

Using a worklist algorithm instead of simple iteration is more efficient:

Algorithm: Worklist-Based Iteration

1. Add all blocks to the worklist
2. While the worklist is not empty:
   a. Remove block B from the worklist
   b. Recompute OUT[B]
   c. If OUT[B] has changed, add B's successors to the worklist

By recomputing only blocks affected by changes, unnecessary computations are reduced.

Summary

Concept	Description
Transfer function	Transforms input information to output information for a basic block
gen/kill sets	Data-flow information generated/invalidated in a block
Meet operation	Operation combining information from multiple paths (union or intersection)
Reaching definitions	Analyzes whether a variable's definition reaches a specific point
Live variables	Analyzes whether a variable's value is used later
Available expressions	Analyzes whether an expression has been computed and is reusable
Iterative algorithm	Iteratively computes data-flow equations until a fixed point
Lattice theory	Mathematical foundation of data-flow analysis

Data-flow analysis serves as the theoretical foundation for compiler optimization, guaranteeing the safety of optimizations. Since it has mathematical backing from lattice theory, the iterative algorithm is guaranteed to converge and provides conservatively safe results. In the next article, we will examine loop optimization in detail using these analysis techniques.