Split View: [컴파일러] 08. 구문 지시 번역 - SDD와 SDT

[컴파일러] 08. 구문 지시 번역 - SDD와 SDT

구문 지시 번역 - SDD와 SDT

구문 지시 번역(Syntax-Directed Translation)은 문법의 각 생성 규칙에 **속성(attribute)**과 **의미 규칙(semantic rule)**을 부착하여, 파싱과 동시에 번역을 수행하는 기법입니다.

1. 구문 지시 정의(SDD) 개요

**구문 지시 정의(Syntax-Directed Definition, SDD)**는 문법에 속성과 의미 규칙을 추가한 것입니다.

속성의 종류

합성 속성(Synthesized Attribute): 자식 노드의 속성 값으로부터 아래에서 위로 계산됩니다.
상속 속성(Inherited Attribute): 부모나 형제 노드의 속성 값으로부터 위에서 아래로 계산됩니다.

합성 속성 예제: 산술식 평가

생성 규칙              의미 규칙
------------------------------------------------------
L -> E '\n'           L.val = E.val
E -> E1 + T           E.val = E1.val + T.val
E -> T                E.val = T.val
T -> T1 * F           T.val = T1.val * F.val
T -> F                T.val = F.val
F -> ( E )            F.val = E.val
F -> digit            F.val = digit.lexval

입력 3 * 5 + 4에 대한 주석 달린 파스 트리를 살펴보겠습니다.

              L (val = 19)
              |
           E (val = 19)
          / | \
    E(15)   +   T(4)
    |           |
  T(15)       F(4)
  / | \        |
T(3) * F(5)  digit(4)
|       |
F(3)  digit(5)
|
digit(3)

상속 속성 예제: 타입 선언

생성 규칙              의미 규칙
------------------------------------------------------
D -> T L              L.inh = T.type
T -> int              T.type = integer
T -> float            T.type = float
L -> L1 , id          L1.inh = L.inh
                      addtype(id.entry, L.inh)
L -> id               addtype(id.entry, L.inh)

입력 float id1, id2, id3에 대한 평가를 보겠습니다.

       D
      / \
     T    L (inh = float)
     |   / | \
   float L  , id3  -> addtype(id3, float)
        / | \
       L  , id2    -> addtype(id2, float)
       |
      id1          -> addtype(id1, float)

L.inh 속성이 부모에서 자식으로 전파되어 모든 식별자에 float 타입을 설정합니다.

2. 의존 그래프(Dependency Graph)

속성들 사이의 계산 순서를 결정하기 위해 의존 그래프를 구성합니다.

의존 그래프의 구성

각 속성 인스턴스에 대해 노드를 만듭니다.
속성 b의 계산에 속성 a가 필요하면, a에서 b로 간선을 추가합니다.

예: E -> E1 + T  에서  E.val = E1.val + T.val

  E1.val ----+
              +--> E.val
  T.val  ----+

평가 순서

의존 그래프에 **순환(cycle)**이 없으면, **위상 정렬(topological sort)**을 통해 속성을 평가할 수 있습니다.

위상 정렬 결과 (가능한 평가 순서):
  digit(3).lexval -> F.val -> T.val ->
  digit(5).lexval -> F.val -> T1.val ->
  digit(4).lexval -> F.val -> T2.val ->
  E1.val -> E.val -> L.val

순환이 있으면 속성을 평가할 수 없으며, 이는 SDD의 설계 오류입니다.

3. S-속성 정의와 L-속성 정의

S-속성 정의(S-Attributed Definition)

합성 속성만 사용하는 SDD입니다.

특징:
- 합성 속성만 사용
- 파스 트리를 후위 순회(post-order)로 평가 가능
- Bottom-Up 파싱 (LR 파서)과 자연스럽게 결합

S-속성 정의는 가장 단순하고 구현이 쉽습니다. LR 파서의 축약 동작 시 의미 규칙을 실행하면 됩니다.

L-속성 정의(L-Attributed Definition)

속성 계산에서 왼쪽에서 오른쪽 순서가 보장되는 SDD입니다.

조건: A -> X1 X2 ... Xn 에서
  Xi의 상속 속성은 다음에만 의존할 수 있음:
  1. A의 상속 속성
  2. X1, X2, ..., Xi-1의 속성 (Xi 왼쪽의 기호들)
  3. Xi 자신의 속성 (단, 순환이 없어야 함)

L-속성 정의의 포함 관계는 다음과 같습니다.

S-속성 정의 ⊂ L-속성 정의

모든 S-속성 정의는 L-속성 정의입니다.
L-속성 정의는 Top-Down 파싱과 자연스럽게 결합합니다.

4. 구문 지시 번역 체계(SDT)

**구문 지시 번역 체계(Syntax-Directed Translation Scheme, SDT)**는 생성 규칙의 본문 사이에 **의미 동작(semantic action)**을 삽입한 것입니다.

SDT 표기

의미 동작을 중괄호로 감싸서 생성 규칙 본문 중간에 배치합니다.

// S-속성 SDD를 SDT로 변환 (동작이 규칙 끝에)
E -> E1 + T  { E.val = E1.val + T.val }
E -> T       { E.val = T.val }

// L-속성 SDD를 SDT로 변환 (동작이 중간에)
D -> T { L.inh = T.type } L
T -> int  { T.type = integer }
T -> float { T.type = float }

5. Bottom-Up 파서에서의 SDT 구현

LR 파서에서 S-속성 정의를 구현하는 방법입니다.

파싱 스택에 속성값 저장

파싱 스택: 상태와 함께 속성값을 저장

+------+------+------+------+------+------+
| s0   | s1   | s2   | s3   | s4   | s5   |
| -    | E    | +    | T    | *    | F    |
| -    | 15   | -    | 4    | -    | 2    |
+------+------+------+------+------+------+
         val          val          val

축약 시 의미 동작 실행

규칙: T -> T1 * F  { T.val = T1.val * F.val }

축약 전 스택:  ... | T(3) | * | F(5)
축약 후 스택:  ... | T(15)
  -> 계산: 3 * 5 = 15

Yacc에서의 구현

expr : expr '+' term   { $$ = $1 + $3; }
     | term            { $$ = $1; }
     ;

$$는 좌변 논터미널의 속성, $1, $2, $3은 우변 기호의 속성입니다.

6. Top-Down 파서에서의 SDT 구현

재귀 하강 파서에서 L-속성 정의를 구현하는 방법입니다.

상속 속성은 매개변수로, 합성 속성은 반환값으로

// D -> T L
// T.type은 합성 속성, L.inh는 상속 속성
void D() {
    String type = T();    // T의 합성 속성을 반환받음
    L(type);              // L의 상속 속성을 매개변수로 전달
}

String T() {
    if (lookahead == INT) {
        match(INT);
        return "integer";
    } else if (lookahead == FLOAT) {
        match(FLOAT);
        return "float";
    }
    throw new SyntaxError("Expected type");
}

void L(String inheritedType) {
    if (lookahead == ID) {
        addType(currentToken, inheritedType);
        match(ID);
        if (lookahead == COMMA) {
            match(COMMA);
            L(inheritedType);  // 상속 속성 전달
        }
    }
}

7. SDT의 활용 예제

7.1 중위를 후위로 변환

E -> T R
R -> + T { print('+') } R | - T { print('-') } R | epsilon
T -> num { print(num.val) }

입력 9 - 5 + 2에 대한 동작을 추적합니다.

E -> T R
  T: print(9)     -> 출력: 9
  R -> - T R
    T: print(5)   -> 출력: 5
    print('-')     -> 출력: -
    R -> + T R
      T: print(2) -> 출력: 2
      print('+')   -> 출력: +
      R -> epsilon

최종 출력: 9 5 - 2 +

7.2 배열 타입 생성

// 문법: int [2][3] 같은 배열 타입 선언 처리
B -> int                 { B.type = integer }
   | float               { B.type = float }

C -> [ num ] C1          { C.type = array(num.val, C1.type) }
   | epsilon             { C.type = B.type }

T -> B C                 { T.type = C.type }

int [2][3]에 대한 평가를 살펴보겠습니다.

B.type = integer
C -> [2] C1
  C1 -> [3] C2
    C2 -> epsilon
    C2.type = integer        (B.type 상속)
  C1.type = array(3, integer)
C.type = array(2, array(3, integer))
T.type = array(2, array(3, integer))

8. 마커 논터미널과 동작 변환

Bottom-Up 파서에서 중간 동작을 처리하려면, 동작을 생성 규칙 끝으로 이동해야 합니다. 이를 위해 **마커 논터미널(marker nonterminal)**을 사용합니다.

마커 논터미널 변환

// 원래 SDT
B -> a { action1 } b { action2 }

// 마커 논터미널을 사용한 변환
B -> a M b { action2 }
M -> epsilon { action1 }

마커 논터미널 M은 빈 문자열로 축약되면서 action1을 실행합니다.

예제: 타입 선언에서의 마커 사용

// 원래 SDT (L-속성)
D -> T { L.inh = T.type } L

// 마커 논터미널 변환
D -> T M L
M -> epsilon  { /* T.type을 스택에서 복사하여 L.inh로 사용 */ }

9. 속성 문법의 응용

구문 지시 번역은 컴파일러의 여러 단계에서 활용됩니다.

적용 분야

단계                    속성의 활용
------------------------------------------------------
어휘 분석              토큰의 속성값 (정수값, 문자열값)
구문 분석              파스 트리/AST 구성
타입 검사              타입 속성의 합성과 검증
중간 코드 생성          3주소 코드의 합성
코드 최적화             데이터 흐름 속성
코드 생성              레지스터 할당, 명령어 선택

타입 검사의 SDD 예시

E -> E1 + E2   { E.type = if E1.type == integer and E2.type == integer
                          then integer
                          else if E1.type == float or E2.type == float
                          then float
                          else type_error }

E -> num       { E.type = integer }
E -> num.num   { E.type = float }
E -> id        { E.type = lookup(id.entry) }

10. 구현 전략 요약

어떤 SDD 유형을 선택할까?

+------------------+-------------------+--------------------+
| 특성              | S-속성 정의        | L-속성 정의         |
+------------------+-------------------+--------------------+
| 속성 종류         | 합성만            | 합성 + 상속         |
| 평가 방향         | 아래 -> 위        | 왼쪽 -> 오른쪽      |
| 파서 종류         | Bottom-Up (LR)   | Top-Down (LL)      |
| 구현 복잡도        | 단순             | 중간               |
| 표현력            | 제한적           | 풍부               |
+------------------+-------------------+--------------------+

실제 컴파일러의 접근법

대부분의 현대 컴파일러는 파싱 단계에서 AST를 구성하고, 이후 AST를 여러 번 순회하면서 속성을 계산합니다. 이 방식은 SDD의 제약에서 자유로우면서도 체계적인 구현이 가능합니다.

1단계: 파싱 -> AST 구성 (합성 속성 사용)
2단계: AST 순회 -> 심볼 테이블 구축 (상속 속성 흉내)
3단계: AST 순회 -> 타입 검사
4단계: AST 순회 -> 중간 코드 생성

정리

개념	핵심 내용
합성 속성	자식에서 부모로 전파, 아래에서 위로 계산
상속 속성	부모/형제에서 자식으로 전파, 위에서 아래로 계산
의존 그래프	속성 간 계산 순서를 결정하는 방향 그래프
S-속성 정의	합성 속성만 사용, LR 파서와 자연스럽게 결합
L-속성 정의	왼쪽에서 오른쪽 평가 보장, LL 파서와 결합
SDT	생성 규칙 본문에 의미 동작을 삽입
마커 논터미널	중간 동작을 규칙 끝으로 이동하기 위한 기법

구문 지시 번역은 파싱 결과에 의미를 부여하는 핵심 기법입니다. 다음 글에서는 이를 활용하여 중간 코드를 생성하는 방법을 다루겠습니다.

[Compiler] 08. Syntax-Directed Translation - SDD and SDT

Syntax-Directed Translation - SDD and SDT

Syntax-Directed Translation is a technique that attaches attributes and semantic rules to each production rule of the grammar, performing translation simultaneously with parsing.

1. Syntax-Directed Definition (SDD) Overview

A Syntax-Directed Definition (SDD) adds attributes and semantic rules to a grammar.

Types of Attributes

Synthesized Attribute: Computed bottom-up from child node attribute values.
Inherited Attribute: Computed top-down from parent or sibling node attribute values.

Synthesized Attribute Example: Arithmetic Expression Evaluation

Production              Semantic Rule
------------------------------------------------------
L -> E '\n'           L.val = E.val
E -> E1 + T           E.val = E1.val + T.val
E -> T                E.val = T.val
T -> T1 * F           T.val = T1.val * F.val
T -> F                T.val = F.val
F -> ( E )            F.val = E.val
F -> digit            F.val = digit.lexval

Let us examine the annotated parse tree for input 3 * 5 + 4:

              L (val = 19)
              |
           E (val = 19)
          / | \
    E(15)   +   T(4)
    |           |
  T(15)       F(4)
  / | \        |
T(3) * F(5)  digit(4)
|       |
F(3)  digit(5)
|
digit(3)

Inherited Attribute Example: Type Declarations

Production              Semantic Rule
------------------------------------------------------
D -> T L              L.inh = T.type
T -> int              T.type = integer
T -> float            T.type = float
L -> L1 , id          L1.inh = L.inh
                      addtype(id.entry, L.inh)
L -> id               addtype(id.entry, L.inh)

Let us examine the evaluation for input float id1, id2, id3:

       D
      / \
     T    L (inh = float)
     |   / | \
   float L  , id3  -> addtype(id3, float)
        / | \
       L  , id2    -> addtype(id2, float)
       |
      id1          -> addtype(id1, float)

The L.inh attribute propagates from parent to child, setting the float type for all identifiers.

2. Dependency Graph

A dependency graph is constructed to determine the computation order among attributes.

Constructing a Dependency Graph

Create a node for each attribute instance.
If attribute a is needed to compute attribute b, add an edge from a to b.

Example: E -> E1 + T  where  E.val = E1.val + T.val

  E1.val ----+
              +--> E.val
  T.val  ----+

Evaluation Order

If the dependency graph has no cycles, attributes can be evaluated through topological sort.

Topological sort result (possible evaluation order):
  digit(3).lexval -> F.val -> T.val ->
  digit(5).lexval -> F.val -> T1.val ->
  digit(4).lexval -> F.val -> T2.val ->
  E1.val -> E.val -> L.val

If there are cycles, attributes cannot be evaluated, which indicates a design error in the SDD.

3. S-Attributed and L-Attributed Definitions

S-Attributed Definition

An SDD that uses only synthesized attributes.

Characteristics:
- Uses only synthesized attributes
- Can be evaluated with post-order traversal of the parse tree
- Naturally combines with Bottom-Up parsing (LR parser)

S-attributed definitions are the simplest and easiest to implement. Semantic rules are executed during the reduce action of an LR parser.

L-Attributed Definition

An SDD where left-to-right order is guaranteed in attribute computation.

Condition: For A -> X1 X2 ... Xn,
  Inherited attributes of Xi can only depend on:
  1. Inherited attributes of A
  2. Attributes of X1, X2, ..., Xi-1 (symbols to the left of Xi)
  3. Attributes of Xi itself (as long as there are no cycles)

The containment relationship of L-attributed definitions is:

S-attributed definitions < L-attributed definitions

All S-attributed definitions are L-attributed definitions.
L-attributed definitions naturally combine with Top-Down parsing.

4. Syntax-Directed Translation Scheme (SDT)

A Syntax-Directed Translation Scheme (SDT) inserts semantic actions within the body of production rules.

SDT Notation

Semantic actions are enclosed in curly braces and placed within the production rule body.

// Converting S-attributed SDD to SDT (actions at end of rule)
E -> E1 + T  { E.val = E1.val + T.val }
E -> T       { E.val = T.val }

// Converting L-attributed SDD to SDT (actions in the middle)
D -> T { L.inh = T.type } L
T -> int  { T.type = integer }
T -> float { T.type = float }

5. Implementing SDT in Bottom-Up Parsers

How to implement S-attributed definitions in an LR parser.

Storing Attribute Values in the Parsing Stack

Parsing stack: stores attribute values along with states

+------+------+------+------+------+------+
| s0   | s1   | s2   | s3   | s4   | s5   |
| -    | E    | +    | T    | *    | F    |
| -    | 15   | -    | 4    | -    | 2    |
+------+------+------+------+------+------+
         val          val          val

Executing Semantic Actions on Reduction

Rule: T -> T1 * F  { T.val = T1.val * F.val }

Stack before reduction:  ... | T(3) | * | F(5)
Stack after reduction:   ... | T(15)
  -> Computation: 3 * 5 = 15

Implementation in Yacc

expr : expr '+' term   { $$ = $1 + $3; }
     | term            { $$ = $1; }
     ;

$$ is the attribute of the left-side nonterminal, and $1, $2, $3 are the attributes of right-side symbols.

6. Implementing SDT in Top-Down Parsers

How to implement L-attributed definitions in a recursive descent parser.

Inherited Attributes as Parameters, Synthesized Attributes as Return Values

// D -> T L
// T.type is a synthesized attribute, L.inh is an inherited attribute
void D() {
    String type = T();    // Receive T's synthesized attribute as return
    L(type);              // Pass L's inherited attribute as parameter
}

String T() {
    if (lookahead == INT) {
        match(INT);
        return "integer";
    } else if (lookahead == FLOAT) {
        match(FLOAT);
        return "float";
    }
    throw new SyntaxError("Expected type");
}

void L(String inheritedType) {
    if (lookahead == ID) {
        addType(currentToken, inheritedType);
        match(ID);
        if (lookahead == COMMA) {
            match(COMMA);
            L(inheritedType);  // Pass inherited attribute
        }
    }
}

7. SDT Application Examples

7.1 Infix to Postfix Conversion

E -> T R
R -> + T { print('+') } R | - T { print('-') } R | epsilon
T -> num { print(num.val) }

Tracing the actions for input 9 - 5 + 2:

E -> T R
  T: print(9)     -> output: 9
  R -> - T R
    T: print(5)   -> output: 5
    print('-')     -> output: -
    R -> + T R
      T: print(2) -> output: 2
      print('+')   -> output: +
      R -> epsilon

Final output: 9 5 - 2 +

7.2 Array Type Construction

// Grammar: handling array type declarations like int [2][3]
B -> int                 { B.type = integer }
   | float               { B.type = float }

C -> [ num ] C1          { C.type = array(num.val, C1.type) }
   | epsilon             { C.type = B.type }

T -> B C                 { T.type = C.type }

Evaluating int [2][3]:

B.type = integer
C -> [2] C1
  C1 -> [3] C2
    C2 -> epsilon
    C2.type = integer        (inherited from B.type)
  C1.type = array(3, integer)
C.type = array(2, array(3, integer))
T.type = array(2, array(3, integer))

8. Marker Nonterminals and Action Transformation

To handle intermediate actions in Bottom-Up parsers, actions must be moved to the end of production rules. Marker nonterminals are used for this purpose.

Marker Nonterminal Transformation

// Original SDT
B -> a { action1 } b { action2 }

// Transformation using marker nonterminal
B -> a M b { action2 }
M -> epsilon { action1 }

Marker nonterminal M reduces to the empty string while executing action1.

Example: Marker Usage in Type Declarations

// Original SDT (L-attributed)
D -> T { L.inh = T.type } L

// Marker nonterminal transformation
D -> T M L
M -> epsilon  { /* copy T.type from stack to use as L.inh */ }

9. Applications of Attribute Grammars

Syntax-directed translation is utilized in various phases of a compiler.

Application Areas

Phase                    Use of Attributes
------------------------------------------------------
Lexical Analysis        Token attribute values (integer, string)
Syntax Analysis         Parse tree / AST construction
Type Checking           Synthesis and verification of type attributes
Intermediate Code Gen.  Synthesis of three-address code
Code Optimization       Data flow attributes
Code Generation         Register allocation, instruction selection

Type Checking SDD Example

E -> E1 + E2   { E.type = if E1.type == integer and E2.type == integer
                          then integer
                          else if E1.type == float or E2.type == float
                          then float
                          else type_error }

E -> num       { E.type = integer }
E -> num.num   { E.type = float }
E -> id        { E.type = lookup(id.entry) }

10. Implementation Strategy Summary

Which SDD Type to Choose?

+------------------+-------------------+--------------------+
| Characteristic   | S-Attributed Def. | L-Attributed Def.  |
+------------------+-------------------+--------------------+
| Attribute Types  | Synthesized only  | Synthesized + Inh. |
| Evaluation Dir.  | Bottom -> Up      | Left -> Right      |
| Parser Type      | Bottom-Up (LR)   | Top-Down (LL)      |
| Complexity       | Simple            | Moderate           |
| Expressiveness   | Limited           | Rich               |
+------------------+-------------------+--------------------+

Approach in Real Compilers

Most modern compilers construct an AST during the parsing phase and then compute attributes through multiple traversals of the AST. This approach is free from SDD constraints while enabling systematic implementation.

Phase 1: Parsing -> AST construction (using synthesized attributes)
Phase 2: AST traversal -> Symbol table construction (simulating inherited attributes)
Phase 3: AST traversal -> Type checking
Phase 4: AST traversal -> Intermediate code generation

Summary

Concept	Key Content
Synthesized Attribute	Propagated child to parent, computed bottom-up
Inherited Attribute	Propagated parent/sibling to child, computed top-down
Dependency Graph	Directed graph determining attribute computation order
S-Attributed Def.	Uses only synthesized attributes, combines with LR
L-Attributed Def.	Guarantees left-to-right evaluation, combines with LL
SDT	Inserts semantic actions in production rule body
Marker Nonterminal	Technique to move intermediate actions to rule end

Syntax-directed translation is the core technique for assigning meaning to parsing results. In the next article, we will cover how to generate intermediate code using this technique.