Split View: [컴파일러] 05. 유한 오토마타와 어휘 분석기 구현

[컴파일러] 05. 유한 오토마타와 어휘 분석기 구현

유한 오토마타와 어휘 분석기 구현

정규 표현식은 토큰의 패턴을 명세하는 도구이고, 유한 오토마타는 이를 인식하는 실행 모델입니다. 이 글에서는 NFA, DFA, 그리고 이들 간의 변환을 다룹니다.

1. 유한 오토마타(Finite Automata) 개요

유한 오토마타는 문자열을 인식하는 수학적 모델입니다. 두 가지 종류가 있습니다.

NFA (Nondeterministic Finite Automaton): 비결정적 유한 오토마타
DFA (Deterministic Finite Automaton): 결정적 유한 오토마타

둘 다 정규 언어를 인식하며, 표현력은 동일합니다. 하지만 구현 특성이 다릅니다.

정규 표현식 <=> NFA <=> DFA

- NFA: 구성이 간단하지만 실행이 복잡
- DFA: 구성이 복잡하지만 실행이 빠름

2. NFA (비결정적 유한 오토마타)

NFA는 5-튜플로 정의됩니다: (S, Sigma, delta, s0, F)

S: 상태의 유한 집합
Sigma: 입력 알파벳
delta: 전이 함수 (상태와 입력에 대해 상태의 집합을 반환)
s0: 시작 상태
F: 수락 상태의 집합

NFA의 핵심 특징

하나의 입력에 대해 여러 전이가 가능합니다.
입실론(epsilon) 전이: 입력 없이도 상태를 전이할 수 있습니다.

NFA 예제: `(a|b)*abb`

         a
    +--------+
    |        |
    v    a   |    b         b
--> 0 -----> 1 -----> 2 -----> ((3))
    |
    +--b--+
    |     |
    +<----+

전이 테이블

상태  |  a      |  b
-----+--------+--------
  0  | {0, 1} | {0}
  1  | -      | {2}
  2  | -      | {3}
  3  | -      | -

상태 0에서 입력 a를 읽으면 상태 0 또는 상태 1로 갈 수 있습니다. 이것이 비결정적이라는 의미입니다.

NFA의 문자열 수락

NFA는 입력 문자열에 대해 시작 상태에서 수락 상태로 가는 경로가 하나라도 존재하면 해당 문자열을 수락합니다.

입력: "aabb"

가능한 경로:
  0 -a-> 0 -a-> 0 -b-> 0 -b-> 0  (수락 상태 아님)
  0 -a-> 0 -a-> 1 -b-> 2 -b-> 3  (수락 상태!) -> 수락
  0 -a-> 1 (다음 a에 대한 전이 없음) -> 실패

3. DFA (결정적 유한 오토마타)

DFA는 NFA의 특수한 경우로, 다음과 같은 제약이 있습니다.

각 상태에서 각 입력 기호에 대해 정확히 하나의 전이만 있습니다.
입실론 전이가 없습니다.

DFA 예제: `(a|b)*abb`

         b
    +--------+
    |        |
    v    a   |    b         b
--> 0 -----> 1 -----> 2 -----> ((3))
    ^        |        |
    |   b    |   a    |   a
    +--------+        |
    |                 |
    +-----------------+

전이 테이블

상태  |  a  |  b
-----+-----+-----
  0  |  1  |  0
  1  |  1  |  2
  2  |  1  |  3
  3  |  1  |  0

DFA는 각 상태에서 각 입력에 대해 전이가 유일합니다. 따라서 실행이 간단하고 빠릅니다.

DFA 시뮬레이션 알고리즘

public boolean simulate(DFA dfa, String input) {
    int state = dfa.startState;

    for (char ch : input.toCharArray()) {
        state = dfa.transition[state][ch];
        if (state == DFA.ERROR) {
            return false;  // 에러 상태
        }
    }

    return dfa.isAccepting(state);
}

DFA의 시뮬레이션은 입력 길이에 비례하는 O(n) 시간이 걸립니다.

4. 정규 표현식에서 NFA로: Thompson 구성

**Thompson 구성(Thompson's Construction)**은 정규 표현식을 NFA로 변환하는 체계적인 방법입니다.

기본 구성 요소

1. 단일 문자 a:
   (i) --a--> ((f))

2. 입실론(epsilon):
   (i) --epsilon--> ((f))

합집합: r | s

              epsilon     NFA(r)     epsilon
          +----------> (i_r) ...> (f_r) ----------+
          |                                        |
   (i) ---+                                        +---> ((f))
          |                                        |
          +----------> (i_s) ...> (f_s) ----------+
              epsilon     NFA(s)     epsilon

접합: rs

   (i_r) --> NFA(r) --> (f_r/i_s) --> NFA(s) --> ((f_s))

r의 수락 상태와 s의 시작 상태를 합칩니다.

클로저: r*

                    epsilon
              +-------------------+
              |                   |
   (i) --epsilon--> (i_r) --> NFA(r) --> (f_r) --epsilon--> ((f))
     |                                                        ^
     +-------------------epsilon------------------------------+

Thompson 구성 예제: `(a|b)*abb`

단계별로 구성하면 다음과 같습니다.

1단계: a의 NFA
   (2) --a--> (3)

2단계: b의 NFA
   (4) --b--> (5)

3단계: a|b의 NFA
   (1) --eps--> (2) --a--> (3) --eps--> (6)
   (1) --eps--> (4) --b--> (5) --eps--> (6)

4단계: (a|b)* 의 NFA
   (0) --eps--> (1) --...--> (6) --eps--> (7)
   (0) --eps--> (7)
   (6) --eps--> (1)

5단계: (a|b)*abb 완성
   ... --eps--> (7) --a--> (8) --b--> (9) --b--> ((10))

Thompson 구성의 특징은 다음과 같습니다.

NFA의 상태 수는 정규 표현식 길이의 최대 2배입니다.
각 상태에서 나가는 전이는 최대 2개입니다.
시작 상태와 수락 상태가 각각 하나입니다.

5. NFA에서 DFA로: 부분 집합 구성

**부분 집합 구성(Subset Construction)**은 NFA를 등가의 DFA로 변환하는 알고리즘입니다.

핵심 아이디어

DFA의 각 상태는 NFA 상태들의 집합입니다. NFA가 동시에 있을 수 있는 모든 상태를 하나의 DFA 상태로 표현합니다.

필요한 연산

epsilon-closure(s): 상태 s에서 epsilon 전이만으로 도달 가능한 모든 상태의 집합
epsilon-closure(T): 상태 집합 T의 각 상태에 대한 epsilon-closure의 합집합
move(T, a): 상태 집합 T에서 입력 a에 대한 전이로 도달 가능한 상태의 집합

알고리즘

Dstates에 epsilon-closure(s0)를 추가하고 "미처리"로 표시

while (미처리 상태 T가 Dstates에 존재) do
    T를 "처리됨"으로 표시
    for (각 입력 기호 a) do
        U = epsilon-closure(move(T, a))
        if (U가 Dstates에 없으면)
            U를 Dstates에 추가하고 "미처리"로 표시
        Dtran[T, a] = U
    end for
end while

부분 집합 구성 예제

NFA 상태: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
NFA 시작: 0, NFA 수락: 10

DFA 상태 A = epsilon-closure(0) = {0, 1, 2, 4, 7}
  move(A, a) = {3, 8}
  epsilon-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
  move(A, b) = {5}
  epsilon-closure({5}) = {1, 2, 4, 5, 6, 7} = C

DFA 상태 B = {1, 2, 3, 4, 6, 7, 8}
  move(B, a) = {3, 8}  -> B
  move(B, b) = {5, 9}
  epsilon-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D

DFA 상태 C = {1, 2, 4, 5, 6, 7}
  move(C, a) = {3, 8}  -> B
  move(C, b) = {5}     -> C

DFA 상태 D = {1, 2, 4, 5, 6, 7, 9}
  move(D, a) = {3, 8}  -> B
  move(D, b) = {5, 10}
  epsilon-closure({5, 10}) = {1, 2, 4, 5, 6, 7, 10} = E

DFA 상태 E = {1, 2, 4, 5, 6, 7, 10}  (10 포함 -> 수락 상태)
  move(E, a) = {3, 8}  -> B
  move(E, b) = {5}     -> C

결과 DFA 전이 테이블:

상태  |  a  |  b  | 수락?
-----+-----+-----+------
  A  |  B  |  C  |  X
  B  |  B  |  D  |  X
  C  |  B  |  C  |  X
  D  |  B  |  E  |  X
  E  |  B  |  C  |  O

6. DFA 최소화

부분 집합 구성으로 만들어진 DFA에는 불필요한 상태가 있을 수 있습니다. DFA 최소화는 등가인 최소 상태 DFA를 구합니다.

Hopcroft 알고리즘

1. 초기 분할: {수락 상태}, {비수락 상태}

2. 반복: 각 그룹을 검사하여, 같은 그룹 내에서
   어떤 입력에 대해 다른 그룹으로 전이하는 상태들을 분리

3. 더 이상 분리할 수 없을 때 종료

최소화 예제

초기 분할:
  G1 = {A, B, C, D}  (비수락)
  G2 = {E}           (수락)

입력 'b'에 대한 검사:
  A -> C (G1), B -> D (G1), C -> C (G1), D -> E (G2)
  D는 다른 그룹으로 가므로 분리!

새 분할:
  G1 = {A, B, C}
  G3 = {D}
  G2 = {E}

입력 'b'에 대한 검사:
  A -> C (G1), B -> D (G3), C -> C (G1)
  B를 분리!

최종 분할:
  {A, C}, {B}, {D}, {E}

7. Lex 도구를 활용한 어휘 분석기 생성

Lex(또는 Flex)는 정규 표현식으로 토큰 패턴을 기술하면 어휘 분석기를 자동으로 생성해주는 도구입니다.

Lex 프로그램 구조

선언부 (declarations)
%%
번역 규칙 (translation rules)
%%
보조 함수 (auxiliary functions)

Lex 프로그램 예제

%{
#include <stdio.h>
#include "y.tab.h"
%}

digit    [0-9]
letter   [a-zA-Z]

%%

if          { return IF; }
else        { return ELSE; }
while       { return WHILE; }

{letter}({letter}|{digit})*  { return ID; }
{digit}+                      { return NUM; }

"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return DIVIDE; }
"="         { return ASSIGN; }
"=="        { return EQ; }
"<"         { return LT; }
"<="        { return LE; }

"("         { return LPAREN; }
")"         { return RPAREN; }
";"         { return SEMI; }

[ \t\n]+    { /* 공백 무시 */ }
.           { printf("Unexpected: %s\n", yytext); }

%%

Lex의 동작 원리

Lex 명세 (.l 파일)
       |
       v
  +----------+
  |   Lex    |  (Lex 컴파일러)
  +----------+
       |
       v
  lex.yy.c (C 소스 코드)
       |
       v
  +----------+
  | C 컴파일러 |
  +----------+
       |
       v
  어휘 분석기 (실행 파일)

Lex의 규칙 충돌 해결

여러 패턴이 동시에 매치될 때의 규칙입니다.

최장 일치(Longest Match): 가장 길게 매치되는 패턴을 선택합니다.
- 입력이 ifx이면 키워드 if가 아닌 식별자 ifx로 인식합니다.
규칙 우선순위(Rule Priority): 같은 길이면 먼저 나온 규칙을 선택합니다.
- if는 식별자 패턴보다 키워드 규칙이 먼저 나오므로 키워드로 인식합니다.

8. 정규 표현식에서 DFA로 직접 변환

Thompson 구성과 부분 집합 구성을 거치지 않고, 정규 표현식에서 DFA를 직접 구성하는 방법도 있습니다.

주요 함수

정규 표현식의 구문 트리 노드 n에 대해 다음을 계산합니다.

nullable(n): 노드 n이 빈 문자열을 생성할 수 있는지 여부
firstpos(n): n에서 생성되는 문자열의 첫 번째 위치가 될 수 있는 위치의 집합
lastpos(n): n에서 생성되는 문자열의 마지막 위치가 될 수 있는 위치의 집합
followpos(p): 위치 p 다음에 올 수 있는 위치의 집합

nullable 규칙:
  epsilon 노드     -> true
  위치 i 노드       -> false
  cat(c1, c2) 노드 -> nullable(c1) AND nullable(c2)
  or(c1, c2) 노드  -> nullable(c1) OR nullable(c2)
  star(c1) 노드    -> true

이 방법은 중간 단계인 NFA를 거치지 않으므로, 상태 수가 적은 효율적인 DFA를 바로 얻을 수 있습니다.

정리

개념	핵심 내용
NFA	비결정적, 입실론 전이 가능, 구성이 간단
DFA	결정적, 각 입력에 전이 하나, 실행이 빠름 (O(n))
Thompson 구성	정규 표현식을 NFA로 체계적 변환
부분 집합 구성	NFA를 DFA로 변환 (상태 폭발 가능)
DFA 최소화	등가인 최소 상태 DFA 구성
Lex	정규 표현식으로 어휘 분석기를 자동 생성
최장 일치	여러 패턴 매치 시 가장 긴 것 선택

유한 오토마타는 어휘 분석의 이론적 기반이자 실제 구현의 핵심입니다. 다음 글에서는 구문 분석 단계로 넘어가 Top-Down 파싱과 LL 파서를 살펴보겠습니다.

[Compiler] 05. Finite Automata and Lexical Analyzer Implementation

Finite Automata and Lexical Analyzer Implementation

Regular expressions are tools for specifying token patterns, and finite automata are execution models for recognizing them. This article covers NFA, DFA, and the conversions between them.

1. Finite Automata Overview

Finite automata are mathematical models for recognizing strings. There are two types:

NFA (Nondeterministic Finite Automaton)
DFA (Deterministic Finite Automaton)

Both recognize regular languages and have the same expressive power. However, their implementation characteristics differ.

Regular Expression <=> NFA <=> DFA

- NFA: Simple to construct but complex to execute
- DFA: Complex to construct but fast to execute

2. NFA (Nondeterministic Finite Automaton)

An NFA is defined as a 5-tuple: (S, Sigma, delta, s0, F)

S: A finite set of states
Sigma: Input alphabet
delta: Transition function (returns a set of states for a given state and input)
s0: Start state
F: Set of accepting states

Key Characteristics of NFA

Multiple transitions are possible for a single input.
Epsilon transitions: States can transition without consuming input.

NFA Example: `(a|b)*abb`

         a
    +--------+
    |        |
    v    a   |    b         b
--> 0 -----> 1 -----> 2 -----> ((3))
    |
    +--b--+
    |     |
    +<----+

Transition Table

State |  a      |  b
-----+--------+--------
  0  | {0, 1} | {0}
  1  | -      | {2}
  2  | -      | {3}
  3  | -      | -

From state 0, reading input a can go to either state 0 or state 1. This is what nondeterministic means.

String Acceptance by NFA

An NFA accepts an input string if at least one path exists from the start state to an accepting state.

Input: "aabb"

Possible paths:
  0 -a-> 0 -a-> 0 -b-> 0 -b-> 0  (not accepting state)
  0 -a-> 0 -a-> 1 -b-> 2 -b-> 3  (accepting state!) -> Accept
  0 -a-> 1 (no transition for next a) -> Fail

3. DFA (Deterministic Finite Automaton)

A DFA is a special case of NFA with the following constraints:

For each state and each input symbol, there is exactly one transition.
No epsilon transitions.

DFA Example: `(a|b)*abb`

         b
    +--------+
    |        |
    v    a   |    b         b
--> 0 -----> 1 -----> 2 -----> ((3))
    ^        |        |
    |   b    |   a    |   a
    +--------+        |
    |                 |
    +-----------------+

Transition Table

State |  a  |  b
-----+-----+-----
  0  |  1  |  0
  1  |  1  |  2
  2  |  1  |  3
  3  |  1  |  0

In a DFA, the transition for each state and each input is unique. This makes execution simple and fast.

DFA Simulation Algorithm

public boolean simulate(DFA dfa, String input) {
    int state = dfa.startState;

    for (char ch : input.toCharArray()) {
        state = dfa.transition[state][ch];
        if (state == DFA.ERROR) {
            return false;  // Error state
        }
    }

    return dfa.isAccepting(state);
}

DFA simulation takes O(n) time, proportional to input length.

4. Regular Expression to NFA: Thompson Construction

Thompson's Construction is a systematic method for converting a regular expression into an NFA.

Basic Building Blocks

1. Single character a:
   (i) --a--> ((f))

2. Epsilon:
   (i) --epsilon--> ((f))

Union: r | s

              epsilon     NFA(r)     epsilon
          +----------> (i_r) ...> (f_r) ----------+
          |                                        |
   (i) ---+                                        +---> ((f))
          |                                        |
          +----------> (i_s) ...> (f_s) ----------+
              epsilon     NFA(s)     epsilon

Concatenation: rs

   (i_r) --> NFA(r) --> (f_r/i_s) --> NFA(s) --> ((f_s))

The accepting state of r and the start state of s are merged.

Closure: r*

                    epsilon
              +-------------------+
              |                   |
   (i) --epsilon--> (i_r) --> NFA(r) --> (f_r) --epsilon--> ((f))
     |                                                        ^
     +-------------------epsilon------------------------------+

Thompson Construction Example: `(a|b)*abb`

Step-by-step construction:

Step 1: NFA for a
   (2) --a--> (3)

Step 2: NFA for b
   (4) --b--> (5)

Step 3: NFA for a|b
   (1) --eps--> (2) --a--> (3) --eps--> (6)
   (1) --eps--> (4) --b--> (5) --eps--> (6)

Step 4: NFA for (a|b)*
   (0) --eps--> (1) --...--> (6) --eps--> (7)
   (0) --eps--> (7)
   (6) --eps--> (1)

Step 5: Complete (a|b)*abb
   ... --eps--> (7) --a--> (8) --b--> (9) --b--> ((10))

The characteristics of Thompson construction are:

The number of NFA states is at most twice the length of the regular expression.
Each state has at most 2 outgoing transitions.
There is exactly one start state and exactly one accepting state.

5. NFA to DFA: Subset Construction

Subset Construction is an algorithm that converts an NFA to an equivalent DFA.

Core Idea

Each DFA state is a set of NFA states. All states the NFA can be in simultaneously are represented as a single DFA state.

Required Operations

epsilon-closure(s): Set of all states reachable from state s via epsilon transitions only
epsilon-closure(T): Union of epsilon-closure for each state in state set T
move(T, a): Set of states reachable from state set T via transitions on input a

Algorithm

Add epsilon-closure(s0) to Dstates and mark as "unprocessed"

while (unprocessed state T exists in Dstates) do
    Mark T as "processed"
    for (each input symbol a) do
        U = epsilon-closure(move(T, a))
        if (U is not in Dstates)
            Add U to Dstates and mark as "unprocessed"
        Dtran[T, a] = U
    end for
end while

Subset Construction Example

NFA states: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
NFA start: 0, NFA accept: 10

DFA state A = epsilon-closure(0) = {0, 1, 2, 4, 7}
  move(A, a) = {3, 8}
  epsilon-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
  move(A, b) = {5}
  epsilon-closure({5}) = {1, 2, 4, 5, 6, 7} = C

DFA state B = {1, 2, 3, 4, 6, 7, 8}
  move(B, a) = {3, 8}  -> B
  move(B, b) = {5, 9}
  epsilon-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D

DFA state C = {1, 2, 4, 5, 6, 7}
  move(C, a) = {3, 8}  -> B
  move(C, b) = {5}     -> C

DFA state D = {1, 2, 4, 5, 6, 7, 9}
  move(D, a) = {3, 8}  -> B
  move(D, b) = {5, 10}
  epsilon-closure({5, 10}) = {1, 2, 4, 5, 6, 7, 10} = E

DFA state E = {1, 2, 4, 5, 6, 7, 10}  (contains 10 -> accepting state)
  move(E, a) = {3, 8}  -> B
  move(E, b) = {5}     -> C

Resulting DFA transition table:

State |  a  |  b  | Accept?
-----+-----+-----+--------
  A  |  B  |  C  |  X
  B  |  B  |  D  |  X
  C  |  B  |  C  |  X
  D  |  B  |  E  |  X
  E  |  B  |  C  |  O

6. DFA Minimization

The DFA produced by subset construction may have unnecessary states. DFA minimization finds the equivalent DFA with the minimum number of states.

Hopcroft Algorithm

1. Initial partition: {accepting states}, {non-accepting states}

2. Repeat: examine each group, and within the same group,
   separate states that transition to different groups for some input

3. Terminate when no more splits are possible

Minimization Example

Initial partition:
  G1 = {A, B, C, D}  (non-accepting)
  G2 = {E}           (accepting)

Checking input 'b':
  A -> C (G1), B -> D (G1), C -> C (G1), D -> E (G2)
  D goes to a different group, so split!

New partition:
  G1 = {A, B, C}
  G3 = {D}
  G2 = {E}

Checking input 'b':
  A -> C (G1), B -> D (G3), C -> C (G1)
  Split B!

Final partition:
  {A, C}, {B}, {D}, {E}

7. Lexical Analyzer Generation Using the Lex Tool

Lex (or Flex) is a tool that automatically generates a lexical analyzer when token patterns are described using regular expressions.

Lex Program Structure

Declarations
%%
Translation Rules
%%
Auxiliary Functions

Lex Program Example

%{
#include <stdio.h>
#include "y.tab.h"
%}

digit    [0-9]
letter   [a-zA-Z]

%%

if          { return IF; }
else        { return ELSE; }
while       { return WHILE; }

{letter}({letter}|{digit})*  { return ID; }
{digit}+                      { return NUM; }

"+"         { return PLUS; }
"-"         { return MINUS; }
"*"         { return TIMES; }
"/"         { return DIVIDE; }
"="         { return ASSIGN; }
"=="        { return EQ; }
"<"         { return LT; }
"<="        { return LE; }

"("         { return LPAREN; }
")"         { return RPAREN; }
";"         { return SEMI; }

[ \t\n]+    { /* ignore whitespace */ }
.           { printf("Unexpected: %s\n", yytext); }

%%

How Lex Works

Lex Specification (.l file)
       |
       v
  +----------+
  |   Lex    |  (Lex Compiler)
  +----------+
       |
       v
  lex.yy.c (C Source Code)
       |
       v
  +----------+
  | C Compiler|
  +----------+
       |
       v
  Lexical Analyzer (Executable)

Lex Rule Conflict Resolution

Rules when multiple patterns match simultaneously:

Longest Match: Selects the pattern with the longest match.
- If the input is ifx, it is recognized as the identifier ifx, not the keyword if.
Rule Priority: If lengths are equal, the rule listed first is selected.
- if is recognized as a keyword because the keyword rule appears before the identifier pattern.

8. Direct Conversion from Regular Expression to DFA

There is also a method to construct a DFA directly from a regular expression, bypassing Thompson construction and subset construction.

Key Functions

For node n of the regular expression's syntax tree, the following are computed:

nullable(n): Whether node n can generate the empty string
firstpos(n): Set of positions that can be the first position of strings generated by n
lastpos(n): Set of positions that can be the last position of strings generated by n
followpos(p): Set of positions that can follow position p

nullable rules:
  epsilon node     -> true
  position i node  -> false
  cat(c1, c2) node -> nullable(c1) AND nullable(c2)
  or(c1, c2) node  -> nullable(c1) OR nullable(c2)
  star(c1) node    -> true

This method bypasses the intermediate NFA step, yielding an efficient DFA with fewer states directly.

Summary

Concept	Key Content
NFA	Nondeterministic, epsilon transitions, simple to construct
DFA	Deterministic, one transition per input, fast execution (O(n))
Thompson Construction	Systematic conversion of regex to NFA
Subset Construction	Conversion of NFA to DFA (possible state explosion)
DFA Minimization	Constructing the equivalent minimum-state DFA
Lex	Automatic lexical analyzer generation from regex
Longest Match	Selects the longest pattern when multiple match

Finite automata are both the theoretical foundation and the core implementation of lexical analysis. In the next article, we will move to the syntax analysis phase and explore Top-Down parsing and LL parsers.

[컴파일러] 05. 유한 오토마타와 어휘 분석기 구현

유한 오토마타와 어휘 분석기 구현

1. 유한 오토마타(Finite Automata) 개요

2. NFA (비결정적 유한 오토마타)

NFA의 핵심 특징

NFA 예제: (a|b)*abb

전이 테이블

NFA의 문자열 수락

3. DFA (결정적 유한 오토마타)

DFA 예제: (a|b)*abb

전이 테이블

DFA 시뮬레이션 알고리즘

4. 정규 표현식에서 NFA로: Thompson 구성

기본 구성 요소

합집합: r | s

접합: rs

클로저: r*

Thompson 구성 예제: (a|b)*abb

5. NFA에서 DFA로: 부분 집합 구성

핵심 아이디어

필요한 연산

알고리즘

부분 집합 구성 예제

6. DFA 최소화

Hopcroft 알고리즘

최소화 예제

7. Lex 도구를 활용한 어휘 분석기 생성

Lex 프로그램 구조

Lex 프로그램 예제

Lex의 동작 원리

Lex의 규칙 충돌 해결

8. 정규 표현식에서 DFA로 직접 변환

주요 함수

정리

[Compiler] 05. Finite Automata and Lexical Analyzer Implementation

Finite Automata and Lexical Analyzer Implementation

1. Finite Automata Overview

2. NFA (Nondeterministic Finite Automaton)

Key Characteristics of NFA

NFA Example: (a|b)*abb

Transition Table

String Acceptance by NFA

3. DFA (Deterministic Finite Automaton)

DFA Example: (a|b)*abb

Transition Table

DFA Simulation Algorithm

4. Regular Expression to NFA: Thompson Construction

Basic Building Blocks

Union: r | s

Concatenation: rs

Closure: r*

Thompson Construction Example: (a|b)*abb

5. NFA to DFA: Subset Construction

Core Idea

Required Operations

Algorithm

Subset Construction Example

6. DFA Minimization

Hopcroft Algorithm

Minimization Example

7. Lexical Analyzer Generation Using the Lex Tool

Lex Program Structure

Lex Program Example

How Lex Works

Lex Rule Conflict Resolution

8. Direct Conversion from Regular Expression to DFA

Key Functions

Summary

NFA 예제: `(a|b)*abb`

DFA 예제: `(a|b)*abb`

Thompson 구성 예제: `(a|b)*abb`

NFA Example: `(a|b)*abb`

DFA Example: `(a|b)*abb`

Thompson Construction Example: `(a|b)*abb`