Skip to content

Split View: 컴파일러 & 인터프리터 설계 완전 가이드 2025: Lexer, Parser, AST, 코드 생성

✨ Learn with Quiz
|

컴파일러 & 인터프리터 설계 완전 가이드 2025: Lexer, Parser, AST, 코드 생성

목차

1. 왜 컴파일러를 배워야 하는가

컴파일러 이론은 컴퓨터 과학의 핵심 분야 중 하나입니다. 컴파일러를 이해하면 프로그래밍 언어의 동작 원리를 깊이 파악할 수 있으며, 실무에서도 다양한 곳에 활용됩니다.

1.1 실무에서의 활용

활용 분야예시
DSL 설계설정 언어, 쿼리 언어, 템플릿 엔진
트랜스파일러TypeScript, Babel, PostCSS
린터/포매터ESLint, Prettier, rustfmt
IDE 기능자동 완성, 리팩토링, 구문 강조
코드 분석정적 분석, 보안 스캐너, 의존성 분석
빌드 도구webpack, esbuild, SWC, Vite

1.2 컴파일레이션 파이프라인 개요

소스 코드 → 컴파일레이션 파이프라인:

┌──────────┐    ┌──────────┐    ┌──────────┐
Source   │───>Lexer   │───>ParserCode(Tokenizer)│    │          │
└──────────┘    └──────────┘    └──────────┘
                                      v
┌──────────┐    ┌──────────┐    ┌──────────┐
Code<───│Optimizer │<───│ Semantic│Generator │    │          │    │ Analysis└──────────┘    └──────────┘    └──────────┘
      v
┌──────────┐
TargetCode└──────────┘

각 단계의 역할:
1. Lexer: 소스 코드 -> 토큰 스트림
2. Parser: 토큰 스트림 -> AST (추상 구문 트리)
3. Semantic Analysis: 타입 검사, 스코프 해석
4. Optimizer: IR 수준에서 최적화
5. Code Generator: 타겟 코드 생성 (기계어, 바이트코드, WASM)

2. Lexer(렉서) 구현

Lexer(토크나이저)는 소스 코드 문자열을 토큰(Token)의 스트림으로 변환합니다.

2.1 토큰 타입 정의

// 토큰 타입 열거형
enum TokenType {
  // 리터럴
  Number = 'Number',
  String = 'String',
  Boolean = 'Boolean',

  // 식별자 & 키워드
  Identifier = 'Identifier',
  Let = 'Let',
  Const = 'Const',
  Function = 'Function',
  If = 'If',
  Else = 'Else',
  Return = 'Return',
  While = 'While',
  For = 'For',

  // 연산자
  Plus = 'Plus',           // +
  Minus = 'Minus',         // -
  Star = 'Star',           // *
  Slash = 'Slash',         // /
  Percent = 'Percent',     // %
  Assign = 'Assign',       // =
  Equal = 'Equal',         // ==
  NotEqual = 'NotEqual',   // !=
  LessThan = 'LessThan',  // <
  GreaterThan = 'GreaterThan', // >
  And = 'And',             // &&
  Or = 'Or',               // ||
  Not = 'Not',             // !

  // 구분자
  LeftParen = 'LeftParen',     // (
  RightParen = 'RightParen',   // )
  LeftBrace = 'LeftBrace',     // {  (코드 블록 안에서만)
  RightBrace = 'RightBrace',   // }  (코드 블록 안에서만)
  LeftBracket = 'LeftBracket', // [
  RightBracket = 'RightBracket', // ]
  Comma = 'Comma',
  Semicolon = 'Semicolon',
  Colon = 'Colon',
  Dot = 'Dot',
  Arrow = 'Arrow',         // =>

  // 특수
  EOF = 'EOF',
}

// 토큰 인터페이스
interface Token {
  type: TokenType;
  value: string;
  line: number;
  column: number;
}

2.2 Lexer 클래스 구현

class Lexer {
  private source: string;
  private pos: number = 0;
  private line: number = 1;
  private column: number = 1;
  private tokens: Token[] = [];

  // 키워드 맵
  private static keywords: Map<string, TokenType> = new Map([
    ['let', TokenType.Let],
    ['const', TokenType.Const],
    ['function', TokenType.Function],
    ['if', TokenType.If],
    ['else', TokenType.Else],
    ['return', TokenType.Return],
    ['while', TokenType.While],
    ['for', TokenType.For],
    ['true', TokenType.Boolean],
    ['false', TokenType.Boolean],
  ]);

  constructor(source: string) {
    this.source = source;
  }

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      this.skipWhitespace();
      this.skipComments();

      if (this.pos >= this.source.length) break;

      const char = this.source[this.pos];

      if (this.isDigit(char)) {
        this.readNumber();
      } else if (this.isAlpha(char) || char === '_') {
        this.readIdentifier();
      } else if (char === '"' || char === "'") {
        this.readString(char);
      } else {
        this.readOperator();
      }
    }

    this.tokens.push(this.makeToken(TokenType.EOF, ''));
    return this.tokens;
  }

  private readNumber(): void {
    const start = this.pos;
    let isFloat = false;

    while (this.pos < this.source.length && this.isDigit(this.current())) {
      this.advance();
    }

    // 소수점 처리
    if (this.current() === '.' && this.isDigit(this.peek())) {
      isFloat = true;
      this.advance(); // '.' 건너뛰기
      while (this.pos < this.source.length && this.isDigit(this.current())) {
        this.advance();
      }
    }

    const value = this.source.slice(start, this.pos);
    this.tokens.push(this.makeToken(TokenType.Number, value));
  }

  private readIdentifier(): void {
    const start = this.pos;

    while (
      this.pos < this.source.length &&
      (this.isAlphaNumeric(this.current()) || this.current() === '_')
    ) {
      this.advance();
    }

    const value = this.source.slice(start, this.pos);
    const type = Lexer.keywords.get(value) || TokenType.Identifier;
    this.tokens.push(this.makeToken(type, value));
  }

  private readString(quote: string): void {
    this.advance(); // 시작 따옴표 건너뛰기
    const start = this.pos;
    let value = '';

    while (this.pos < this.source.length && this.current() !== quote) {
      if (this.current() === '\\') {
        this.advance();
        switch (this.current()) {
          case 'n': value += '\n'; break;
          case 't': value += '\t'; break;
          case '\\': value += '\\'; break;
          default: value += this.current();
        }
      } else {
        value += this.current();
      }
      this.advance();
    }

    if (this.current() !== quote) {
      throw new Error(
        'Unterminated string at line ' + this.line
      );
    }
    this.advance(); // 닫는 따옴표 건너뛰기

    this.tokens.push(this.makeToken(TokenType.String, value));
  }

  private readOperator(): void {
    const char = this.current();
    const next = this.peek();

    // 2문자 연산자 확인
    const twoChar = char + (next || '');
    const twoCharOps: Record<string, TokenType> = {
      '==': TokenType.Equal,
      '!=': TokenType.NotEqual,
      '&&': TokenType.And,
      '||': TokenType.Or,
      '=>': TokenType.Arrow,
    };

    if (twoCharOps[twoChar]) {
      this.tokens.push(this.makeToken(twoCharOps[twoChar], twoChar));
      this.advance();
      this.advance();
      return;
    }

    // 1문자 연산자
    const oneCharOps: Record<string, TokenType> = {
      '+': TokenType.Plus,
      '-': TokenType.Minus,
      '*': TokenType.Star,
      '/': TokenType.Slash,
      '%': TokenType.Percent,
      '=': TokenType.Assign,
      '<': TokenType.LessThan,
      '>': TokenType.GreaterThan,
      '!': TokenType.Not,
      '(': TokenType.LeftParen,
      ')': TokenType.RightParen,
      '[': TokenType.LeftBracket,
      ']': TokenType.RightBracket,
      ',': TokenType.Comma,
      ';': TokenType.Semicolon,
      ':': TokenType.Colon,
      '.': TokenType.Dot,
    };

    // 중괄호 처리
    if (char === '{') {
      this.tokens.push(this.makeToken(TokenType.LeftBrace, char));
      this.advance();
      return;
    }
    if (char === '}') {
      this.tokens.push(this.makeToken(TokenType.RightBrace, char));
      this.advance();
      return;
    }

    if (oneCharOps[char]) {
      this.tokens.push(this.makeToken(oneCharOps[char], char));
      this.advance();
    } else {
      throw new Error(
        'Unexpected character: ' + char + ' at line ' + this.line
      );
    }
  }

  // 유틸리티 메서드들
  private current(): string { return this.source[this.pos]; }
  private peek(): string { return this.source[this.pos + 1]; }
  private advance(): void {
    if (this.current() === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.pos++;
  }
  private isDigit(c: string): boolean { return /[0-9]/.test(c); }
  private isAlpha(c: string): boolean { return /[a-zA-Z_]/.test(c); }
  private isAlphaNumeric(c: string): boolean { return /[a-zA-Z0-9_]/.test(c); }
  private skipWhitespace(): void {
    while (this.pos < this.source.length && /\s/.test(this.current())) {
      this.advance();
    }
  }
  private skipComments(): void {
    if (this.current() === '/' && this.peek() === '/') {
      while (this.pos < this.source.length && this.current() !== '\n') {
        this.advance();
      }
    }
  }
  private makeToken(type: TokenType, value: string): Token {
    return { type, value, line: this.line, column: this.column };
  }
}

2.3 에러 복구(Error Recovery)

// 에러 복구가 가능한 Lexer
class RobustLexer extends Lexer {
  private errors: LexerError[] = [];

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      try {
        this.skipWhitespace();
        if (this.pos >= this.source.length) break;
        // ... 일반 토큰화 로직
      } catch (error) {
        // 에러 기록 후 다음 유효한 위치로 이동
        this.errors.push({
          message: error.message,
          line: this.line,
          column: this.column,
        });
        this.synchronize(); // 다음 유효 토큰 시작 위치까지 건너뛰기
      }
    }
    return this.tokens;
  }

  private synchronize(): void {
    // 다음 공백이나 구분자까지 건너뛰기
    while (
      this.pos < this.source.length &&
      !/[\s;,(){}[\]]/.test(this.current())
    ) {
      this.advance();
    }
  }
}

3. Parser(파서) 구현

Parser는 토큰 스트림을 AST(추상 구문 트리)로 변환합니다.

3.1 파서 유형 비교

파서 유형 비교:
┌─────────────────┬────────────┬────────────┬──────────────┐
│ 유형            │ 구현 난이도│ 표현력     │ 대표 도구    │
├─────────────────┼────────────┼────────────┼──────────────┤
│ 재귀 하강       │ 쉬움      │ LL(k)GCC, GoPratt Parser    │ 보통      │ 우선순위   │ V8, RustPEG             │ 보통      │ 무제한 예측│ pest, PEG.jsParser Combinator│ 보통     │ 유연       │ nom, parsec  │
LR/LALR         │ 어려움    │ LR(1)      │ yacc, bison  │
Earley          │ 보통      │ 모든 CFG   │ nearley      │
└─────────────────┴────────────┴────────────┴──────────────┘

3.2 재귀 하강 파서(Recursive Descent Parser)

// AST 노드 타입 정의
type ASTNode =
  | NumberLiteral
  | StringLiteral
  | BooleanLiteral
  | Identifier
  | BinaryExpression
  | UnaryExpression
  | CallExpression
  | VariableDeclaration
  | FunctionDeclaration
  | IfStatement
  | WhileStatement
  | ReturnStatement
  | BlockStatement
  | Program;

interface NumberLiteral {
  type: 'NumberLiteral';
  value: number;
}

interface BinaryExpression {
  type: 'BinaryExpression';
  operator: string;
  left: ASTNode;
  right: ASTNode;
}

interface VariableDeclaration {
  type: 'VariableDeclaration';
  kind: 'let' | 'const';
  name: string;
  initializer: ASTNode | null;
}

interface FunctionDeclaration {
  type: 'FunctionDeclaration';
  name: string;
  params: string[];
  body: BlockStatement;
}

interface IfStatement {
  type: 'IfStatement';
  condition: ASTNode;
  consequent: BlockStatement;
  alternate: BlockStatement | IfStatement | null;
}

interface Program {
  type: 'Program';
  body: ASTNode[];
}

// 재귀 하강 파서
class RecursiveDescentParser {
  private tokens: Token[];
  private pos: number = 0;

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  parse(): Program {
    const body: ASTNode[] = [];

    while (!this.isAtEnd()) {
      body.push(this.parseStatement());
    }

    return { type: 'Program', body };
  }

  private parseStatement(): ASTNode {
    const token = this.current();

    switch (token.type) {
      case TokenType.Let:
      case TokenType.Const:
        return this.parseVariableDeclaration();
      case TokenType.Function:
        return this.parseFunctionDeclaration();
      case TokenType.If:
        return this.parseIfStatement();
      case TokenType.While:
        return this.parseWhileStatement();
      case TokenType.Return:
        return this.parseReturnStatement();
      case TokenType.LeftBrace:
        return this.parseBlockStatement();
      default:
        return this.parseExpressionStatement();
    }
  }

  private parseVariableDeclaration(): VariableDeclaration {
    const kind = this.consume().value as 'let' | 'const';
    const name = this.expect(TokenType.Identifier).value;

    let initializer: ASTNode | null = null;
    if (this.match(TokenType.Assign)) {
      initializer = this.parseExpression();
    }

    this.expect(TokenType.Semicolon);

    return { type: 'VariableDeclaration', kind, name, initializer };
  }

  private parseFunctionDeclaration(): FunctionDeclaration {
    this.expect(TokenType.Function);
    const name = this.expect(TokenType.Identifier).value;

    this.expect(TokenType.LeftParen);
    const params: string[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        params.push(this.expect(TokenType.Identifier).value);
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);
    const body = this.parseBlockStatement();

    return { type: 'FunctionDeclaration', name, params, body };
  }

  private parseIfStatement(): IfStatement {
    this.expect(TokenType.If);
    this.expect(TokenType.LeftParen);
    const condition = this.parseExpression();
    this.expect(TokenType.RightParen);

    const consequent = this.parseBlockStatement();

    let alternate: BlockStatement | IfStatement | null = null;
    if (this.match(TokenType.Else)) {
      if (this.check(TokenType.If)) {
        alternate = this.parseIfStatement();
      } else {
        alternate = this.parseBlockStatement();
      }
    }

    return { type: 'IfStatement', condition, consequent, alternate };
  }

  // 유틸리티 메서드
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error(
      'Expected ' + type + ' but got ' + this.current().type +
      ' at line ' + this.current().line
    );
  }
}

3.3 Pratt Parser (연산자 우선순위 파서)

Pratt Parser는 연산자 우선순위를 우아하게 처리하는 파서입니다. V8(JavaScript), Rust 컴파일러 등 많은 현대 언어 구현에서 사용됩니다.

// 연산자 우선순위 정의
enum Precedence {
  None = 0,
  Assignment = 1,  // =
  Or = 2,          // ||
  And = 3,         // &&
  Equality = 4,    // == !=
  Comparison = 5,  // < > (이하/이상 포함)
  Term = 6,        // + -
  Factor = 7,      // * / %
  Unary = 8,       // ! -
  Call = 9,        // ()
  Member = 10,     // .
}

// Prefix 파서 함수 타입 (숫자, 식별자, 괄호, 단항 연산자)
type PrefixParseFn = () => ASTNode;

// Infix 파서 함수 타입 (이항 연산자, 함수 호출)
type InfixParseFn = (left: ASTNode) => ASTNode;

class PrattParser {
  private tokens: Token[];
  private pos: number = 0;

  // Prefix 파서 등록
  private prefixParsers: Map<TokenType, PrefixParseFn> = new Map();
  // Infix 파서 등록
  private infixParsers: Map<TokenType, InfixParseFn> = new Map();
  // 우선순위 등록
  private precedences: Map<TokenType, Precedence> = new Map();

  constructor(tokens: Token[]) {
    this.tokens = tokens;

    // Prefix 파서 등록
    this.registerPrefix(TokenType.Number, this.parseNumber.bind(this));
    this.registerPrefix(TokenType.String, this.parseString.bind(this));
    this.registerPrefix(TokenType.Boolean, this.parseBoolean.bind(this));
    this.registerPrefix(TokenType.Identifier, this.parseIdentifier.bind(this));
    this.registerPrefix(TokenType.LeftParen, this.parseGrouping.bind(this));
    this.registerPrefix(TokenType.Minus, this.parseUnary.bind(this));
    this.registerPrefix(TokenType.Not, this.parseUnary.bind(this));

    // Infix 파서 등록
    this.registerInfix(TokenType.Plus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Minus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Star, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Slash, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Equal, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.NotEqual, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LessThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.GreaterThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.And, Precedence.And, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Or, Precedence.Or, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LeftParen, Precedence.Call, this.parseCall.bind(this));
  }

  // 핵심: Pratt 파싱 알고리즘
  parseExpression(precedence: Precedence = Precedence.None): ASTNode {
    const token = this.current();
    const prefixFn = this.prefixParsers.get(token.type);

    if (!prefixFn) {
      throw new Error('No prefix parser for ' + token.type);
    }

    let left = prefixFn();

    // 현재 우선순위보다 높은 infix 연산자가 있으면 계속 파싱
    while (
      !this.isAtEnd() &&
      precedence < this.getPrecedence(this.current().type)
    ) {
      const infixFn = this.infixParsers.get(this.current().type);
      if (!infixFn) break;
      left = infixFn(left);
    }

    return left;
  }

  // 이항 연산자 파싱
  private parseBinary(left: ASTNode): ASTNode {
    const operator = this.consume();
    const precedence = this.precedences.get(operator.type) || Precedence.None;
    const right = this.parseExpression(precedence); // 왼쪽 결합

    return {
      type: 'BinaryExpression',
      operator: operator.value,
      left,
      right,
    };
  }

  // 함수 호출 파싱
  private parseCall(left: ASTNode): ASTNode {
    this.consume(); // '(' 소비
    const args: ASTNode[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        args.push(this.parseExpression());
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);

    return {
      type: 'CallExpression',
      callee: left,
      arguments: args,
    } as CallExpression;
  }

  // 유틸리티 메서드들
  private registerPrefix(type: TokenType, fn: PrefixParseFn): void {
    this.prefixParsers.set(type, fn);
  }

  private registerInfix(type: TokenType, prec: Precedence, fn: InfixParseFn): void {
    this.infixParsers.set(type, fn);
    this.precedences.set(type, prec);
  }

  private getPrecedence(type: TokenType): Precedence {
    return this.precedences.get(type) || Precedence.None;
  }

  private parseNumber(): ASTNode {
    const token = this.consume();
    return { type: 'NumberLiteral', value: parseFloat(token.value) };
  }

  private parseString(): ASTNode {
    const token = this.consume();
    return { type: 'StringLiteral', value: token.value } as StringLiteral;
  }

  private parseBoolean(): ASTNode {
    const token = this.consume();
    return { type: 'BooleanLiteral', value: token.value === 'true' } as BooleanLiteral;
  }

  private parseIdentifier(): ASTNode {
    const token = this.consume();
    return { type: 'Identifier', name: token.value } as Identifier;
  }

  private parseGrouping(): ASTNode {
    this.consume(); // '(' 소비
    const expr = this.parseExpression();
    this.expect(TokenType.RightParen);
    return expr;
  }

  private parseUnary(): ASTNode {
    const operator = this.consume();
    const operand = this.parseExpression(Precedence.Unary);
    return {
      type: 'UnaryExpression',
      operator: operator.value,
      operand,
    } as UnaryExpression;
  }

  // 공통 유틸리티
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error('Expected ' + type + ' but got ' + this.current().type);
  }
}

4. AST 설계와 Visitor 패턴

4.1 AST 노드 설계 원칙

// 각 AST 노드는 위치 정보를 포함
interface ASTNodeBase {
  type: string;
  loc: SourceLocation;
}

interface SourceLocation {
  start: Position;
  end: Position;
  source?: string;
}

interface Position {
  line: number;
  column: number;
  offset: number;
}

4.2 Visitor 패턴

// Visitor 인터페이스
interface ASTVisitor<T> {
  visitProgram(node: Program): T;
  visitNumberLiteral(node: NumberLiteral): T;
  visitStringLiteral(node: StringLiteral): T;
  visitBinaryExpression(node: BinaryExpression): T;
  visitUnaryExpression(node: UnaryExpression): T;
  visitIdentifier(node: Identifier): T;
  visitVariableDeclaration(node: VariableDeclaration): T;
  visitFunctionDeclaration(node: FunctionDeclaration): T;
  visitCallExpression(node: CallExpression): T;
  visitIfStatement(node: IfStatement): T;
  visitReturnStatement(node: ReturnStatement): T;
  visitBlockStatement(node: BlockStatement): T;
}

// AST 프린터 (디버깅용)
class ASTPrinter implements ASTVisitor<string> {
  private indent: number = 0;

  visitProgram(node: Program): string {
    return node.body.map((stmt) => this.visit(stmt)).join('\n');
  }

  visitBinaryExpression(node: BinaryExpression): string {
    return (
      '(' +
      this.visit(node.left) +
      ' ' + node.operator + ' ' +
      this.visit(node.right) +
      ')'
    );
  }

  visitNumberLiteral(node: NumberLiteral): string {
    return node.value.toString();
  }

  visitIdentifier(node: Identifier): string {
    return node.name;
  }

  // 범용 visit 디스패처
  visit(node: ASTNode): string {
    switch (node.type) {
      case 'Program': return this.visitProgram(node as Program);
      case 'NumberLiteral': return this.visitNumberLiteral(node as NumberLiteral);
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      case 'Identifier': return this.visitIdentifier(node as Identifier);
      // ... 나머지 노드 타입
      default: return 'Unknown: ' + node.type;
    }
  }
}

5. 의미 분석 (Semantic Analysis)

5.1 심볼 테이블과 스코프

// 심볼 정보
interface Symbol {
  name: string;
  type: DataType;
  kind: 'variable' | 'function' | 'parameter';
  mutable: boolean;
  scope: Scope;
}

// 스코프 (렉시컬 스코프 체인)
class Scope {
  private symbols: Map<string, Symbol> = new Map();
  private parent: Scope | null;

  constructor(parent: Scope | null = null) {
    this.parent = parent;
  }

  define(symbol: Symbol): void {
    if (this.symbols.has(symbol.name)) {
      throw new Error('Symbol already defined: ' + symbol.name);
    }
    this.symbols.set(symbol.name, symbol);
  }

  resolve(name: string): Symbol | null {
    if (this.symbols.has(name)) {
      return this.symbols.get(name)!;
    }
    // 부모 스코프에서 검색
    if (this.parent) {
      return this.parent.resolve(name);
    }
    return null;
  }

  createChild(): Scope {
    return new Scope(this);
  }
}

5.2 타입 검사

// 타입 시스템
type DataType =
  | { kind: 'number' }
  | { kind: 'string' }
  | { kind: 'boolean' }
  | { kind: 'void' }
  | { kind: 'function'; params: DataType[]; returnType: DataType }
  | { kind: 'array'; elementType: DataType };

class TypeChecker implements ASTVisitor<DataType> {
  private currentScope: Scope = new Scope();

  visitBinaryExpression(node: BinaryExpression): DataType {
    const leftType = this.visit(node.left);
    const rightType = this.visit(node.right);

    // 산술 연산자
    if (['+', '-', '*', '/', '%'].includes(node.operator)) {
      if (node.operator === '+') {
        // 문자열 연결 허용
        if (leftType.kind === 'string' || rightType.kind === 'string') {
          return { kind: 'string' };
        }
      }

      if (leftType.kind !== 'number' || rightType.kind !== 'number') {
        throw new TypeError(
          'Operator ' + node.operator +
          ' requires number operands, got ' +
          leftType.kind + ' and ' + rightType.kind
        );
      }
      return { kind: 'number' };
    }

    // 비교 연산자
    if (['==', '!=', '<', '>'].includes(node.operator)) {
      if (leftType.kind !== rightType.kind) {
        throw new TypeError(
          'Cannot compare ' + leftType.kind + ' with ' + rightType.kind
        );
      }
      return { kind: 'boolean' };
    }

    // 논리 연산자
    if (['&&', '||'].includes(node.operator)) {
      if (leftType.kind !== 'boolean' || rightType.kind !== 'boolean') {
        throw new TypeError('Logical operators require boolean operands');
      }
      return { kind: 'boolean' };
    }

    throw new TypeError('Unknown operator: ' + node.operator);
  }

  visitVariableDeclaration(node: VariableDeclaration): DataType {
    const initType = node.initializer ? this.visit(node.initializer) : { kind: 'void' as const };

    this.currentScope.define({
      name: node.name,
      type: initType,
      kind: 'variable',
      mutable: node.kind === 'let',
      scope: this.currentScope,
    });

    return { kind: 'void' };
  }

  // ... 나머지 visit 메서드들
  visit(node: ASTNode): DataType {
    // 각 노드 타입에 맞는 메서드 디스패치
    switch (node.type) {
      case 'NumberLiteral': return { kind: 'number' };
      case 'StringLiteral': return { kind: 'string' };
      case 'BooleanLiteral': return { kind: 'boolean' };
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      case 'VariableDeclaration': return this.visitVariableDeclaration(node as VariableDeclaration);
      default: return { kind: 'void' };
    }
  }
}

6. 중간 표현 (IR - Intermediate Representation)

6.1 Three-Address Code

// 3주소 코드(Three-Address Code)
// 각 명령어가 최대 3개의 주소(피연산자)를 가짐

type IRInstruction =
  | { op: 'CONST'; dest: string; value: number }
  | { op: 'ADD' | 'SUB' | 'MUL' | 'DIV'; dest: string; left: string; right: string }
  | { op: 'COPY'; dest: string; src: string }
  | { op: 'LABEL'; name: string }
  | { op: 'JUMP'; target: string }
  | { op: 'JUMP_IF_FALSE'; condition: string; target: string }
  | { op: 'CALL'; dest: string; fn: string; args: string[] }
  | { op: 'RETURN'; value: string | null }
  | { op: 'PARAM'; value: string };

// AST -> IR 변환
class IRGenerator implements ASTVisitor<string> {
  private instructions: IRInstruction[] = [];
  private tempCounter: number = 0;
  private labelCounter: number = 0;

  private newTemp(): string {
    return '_t' + (this.tempCounter++);
  }

  private newLabel(): string {
    return '_L' + (this.labelCounter++);
  }

  private emit(instruction: IRInstruction): void {
    this.instructions.push(instruction);
  }

  visitBinaryExpression(node: BinaryExpression): string {
    const left = this.visit(node.left);
    const right = this.visit(node.right);
    const dest = this.newTemp();

    const opMap: Record<string, string> = {
      '+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV',
    };

    this.emit({
      op: opMap[node.operator] as 'ADD',
      dest,
      left,
      right,
    });

    return dest;
  }

  visitIfStatement(node: IfStatement): string {
    const condition = this.visit(node.condition);
    const elseLabel = this.newLabel();
    const endLabel = this.newLabel();

    this.emit({ op: 'JUMP_IF_FALSE', condition, target: elseLabel });

    // then 블록
    this.visit(node.consequent as ASTNode);
    this.emit({ op: 'JUMP', target: endLabel });

    // else 블록
    this.emit({ op: 'LABEL', name: elseLabel });
    if (node.alternate) {
      this.visit(node.alternate as ASTNode);
    }

    this.emit({ op: 'LABEL', name: endLabel });
    return '';
  }

  // ... 나머지 visit 메서드들
  visit(node: ASTNode): string {
    switch (node.type) {
      case 'NumberLiteral': {
        const dest = this.newTemp();
        this.emit({ op: 'CONST', dest, value: (node as NumberLiteral).value });
        return dest;
      }
      case 'BinaryExpression':
        return this.visitBinaryExpression(node as BinaryExpression);
      case 'IfStatement':
        return this.visitIfStatement(node as IfStatement);
      default:
        return '';
    }
  }

  getInstructions(): IRInstruction[] {
    return this.instructions;
  }
}

6.2 SSA 형태와 LLVM IR

LLVM IR의 기본 구조:

; 함수 정의
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

; if-else 구조
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %then, label %else

then:
  ret i32 %a

else:
  ret i32 %b
}

; 루프 구조
define i32 @sum_to_n(i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %next_i, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %next_sum, %loop ]
  %next_sum = add i32 %sum, %i
  %next_i = add i32 %i, 1
  %cond = icmp slt i32 %next_i, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret i32 %next_sum
}

SSA(Static Single Assignment) 특징:
- 각 변수는 정확히 한 번만 할당됨
- phi 노드로 제어 흐름 합류 지점 처리
- 최적화 패스를 적용하기 용이

7. 최적화 패스 (Optimization Passes)

7.1 상수 폴딩 (Constant Folding)

// 상수 폴딩: 컴파일 시점에 상수 표현식 계산
class ConstantFolding {
  optimize(node: ASTNode): ASTNode {
    if (node.type === 'BinaryExpression') {
      const binary = node as BinaryExpression;
      const left = this.optimize(binary.left);
      const right = this.optimize(binary.right);

      // 양쪽 모두 상수이면 미리 계산
      if (left.type === 'NumberLiteral' && right.type === 'NumberLiteral') {
        const leftVal = (left as NumberLiteral).value;
        const rightVal = (right as NumberLiteral).value;

        let result: number;
        switch (binary.operator) {
          case '+': result = leftVal + rightVal; break;
          case '-': result = leftVal - rightVal; break;
          case '*': result = leftVal * rightVal; break;
          case '/': result = leftVal / rightVal; break;
          default: return { ...binary, left, right };
        }

        return { type: 'NumberLiteral', value: result };
      }

      return { ...binary, left, right };
    }

    return node;
  }
}

// 예: 2 + 3 * 4 -> 14

7.2 죽은 코드 제거 (Dead Code Elimination)

// 죽은 코드 제거
class DeadCodeElimination {
  optimize(program: Program): Program {
    const usedVariables = this.collectUsedVariables(program);

    const optimizedBody = program.body.filter((stmt) => {
      if (stmt.type === 'VariableDeclaration') {
        const decl = stmt as VariableDeclaration;
        // 사용되지 않고 사이드 이펙트가 없으면 제거
        if (!usedVariables.has(decl.name) && !this.hasSideEffects(decl.initializer)) {
          return false;
        }
      }
      return true;
    });

    return { ...program, body: optimizedBody };
  }

  private collectUsedVariables(program: Program): Set<string> {
    const used = new Set<string>();

    function walk(node: ASTNode): void {
      if (node.type === 'Identifier') {
        used.add((node as Identifier).name);
      }
      // 자식 노드 순회
      for (const child of getChildren(node)) {
        walk(child);
      }
    }

    program.body.forEach(walk);
    return used;
  }

  private hasSideEffects(node: ASTNode | null): boolean {
    if (!node) return false;
    return node.type === 'CallExpression';
  }
}

7.3 인라이닝 (Function Inlining)

// 함수 인라이닝: 작은 함수의 본문을 호출 위치에 삽입
class FunctionInliner {
  private functions: Map<string, FunctionDeclaration> = new Map();

  optimize(program: Program): Program {
    // 1단계: 함수 수집
    for (const stmt of program.body) {
      if (stmt.type === 'FunctionDeclaration') {
        const fn = stmt as FunctionDeclaration;
        this.functions.set(fn.name, fn);
      }
    }

    // 2단계: 인라이닝 가능한 호출 찾아서 대체
    return this.transformProgram(program);
  }

  private shouldInline(fn: FunctionDeclaration): boolean {
    // 함수 본문이 3문장 이하이고 재귀가 아니면 인라이닝
    const bodySize = fn.body.body.length;
    const isRecursive = this.containsCallTo(fn.body, fn.name);
    return bodySize <= 3 && !isRecursive;
  }

  private containsCallTo(node: ASTNode, name: string): boolean {
    if (node.type === 'CallExpression') {
      const call = node as CallExpression;
      if (call.callee.type === 'Identifier' && (call.callee as Identifier).name === name) {
        return true;
      }
    }
    return false;
  }

  private transformProgram(program: Program): Program {
    // 인라이닝 변환 로직
    return program; // 간략화
  }
}

7.4 루프 최적화

루프 최적화 기법들:

1. 루프 불변 코드 이동 (LICM - Loop Invariant Code Motion)
   // Before                    // After
   for (i = 0; i < n; i++)    x = a + b;  // 루프 밖으로 이동
     arr[i] = a + b + i;      for (i = 0; i < n; i++)
                                  arr[i] = x + i;

2. 루프 풀기 (Loop Unrolling)
   // Before                    // After
   for (i = 0; i < 4; i++)    arr[0] = 0;
     arr[i] = 0;               arr[1] = 0;
                                arr[2] = 0;
                                arr[3] = 0;

3. 강도 축소 (Strength Reduction)
   // Before                    // After
   for (i = 0; i < n; i++)    t = 0;
     arr[i] = i * 4;           for (i = 0; i < n; i++)
                                  arr[i] = t;
                                  t = t + 4;  // 곱셈 -> 덧셈

8. 코드 생성 (Code Generation)

8.1 WebAssembly 코드 생성

// AST -> WebAssembly Text Format (WAT) 코드 생성
class WasmCodeGenerator {
  private output: string[] = [];

  generate(program: Program): string {
    this.output.push('(module');

    for (const stmt of program.body) {
      if (stmt.type === 'FunctionDeclaration') {
        this.generateFunction(stmt as FunctionDeclaration);
      }
    }

    this.output.push(')');
    return this.output.join('\n');
  }

  private generateFunction(node: FunctionDeclaration): void {
    const params = node.params.map((p) => '(param $' + p + ' i32)').join(' ');

    this.output.push('  (func $' + node.name + ' ' + params + ' (result i32)');

    // 로컬 변수 선언
    this.output.push('    (local $__result i32)');

    // 함수 본문 생성
    for (const stmt of node.body.body) {
      this.generateStatement(stmt);
    }

    this.output.push('  )');
    this.output.push('  (export "' + node.name + '" (func $' + node.name + '))');
  }

  private generateExpression(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral':
        this.output.push('    i32.const ' + (node as NumberLiteral).value);
        break;

      case 'Identifier':
        this.output.push('    local.get $' + (node as Identifier).name);
        break;

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.generateExpression(binary.left);
        this.generateExpression(binary.right);

        const opMap: Record<string, string> = {
          '+': 'i32.add',
          '-': 'i32.sub',
          '*': 'i32.mul',
          '/': 'i32.div_s',
          '<': 'i32.lt_s',
          '>': 'i32.gt_s',
          '==': 'i32.eq',
          '!=': 'i32.ne',
        };

        this.output.push('    ' + opMap[binary.operator]);
        break;
      }
    }
  }

  private generateStatement(node: ASTNode): void {
    // 문장 코드 생성
    if (node.type === 'ReturnStatement') {
      const ret = node as ReturnStatement;
      if (ret.value) {
        this.generateExpression(ret.value);
      }
      this.output.push('    return');
    }
  }
}

8.2 바이트코드 생성

// 스택 기반 바이트코드
enum OpCode {
  CONST = 0x01,      // 상수 푸시
  ADD = 0x02,        // 덧셈
  SUB = 0x03,        // 뺄셈
  MUL = 0x04,        // 곱셈
  DIV = 0x05,        // 나눗셈
  LOAD = 0x10,       // 변수 로드
  STORE = 0x11,      // 변수 저장
  JUMP = 0x20,       // 무조건 점프
  JUMP_IF_FALSE = 0x21, // 조건부 점프
  CALL = 0x30,       // 함수 호출
  RETURN = 0x31,     // 리턴
  PRINT = 0x40,      // 출력
  HALT = 0xFF,       // 정지
}

class BytecodeCompiler {
  private bytecode: number[] = [];
  private constants: any[] = [];
  private locals: Map<string, number> = new Map();

  compile(program: Program): Uint8Array {
    for (const stmt of program.body) {
      this.compileNode(stmt);
    }
    this.emit(OpCode.HALT);

    return new Uint8Array(this.bytecode);
  }

  private compileNode(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral': {
        const index = this.addConstant((node as NumberLiteral).value);
        this.emit(OpCode.CONST);
        this.emit(index);
        break;
      }

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.compileNode(binary.left);
        this.compileNode(binary.right);

        switch (binary.operator) {
          case '+': this.emit(OpCode.ADD); break;
          case '-': this.emit(OpCode.SUB); break;
          case '*': this.emit(OpCode.MUL); break;
          case '/': this.emit(OpCode.DIV); break;
        }
        break;
      }

      case 'VariableDeclaration': {
        const decl = node as VariableDeclaration;
        const slot = this.locals.size;
        this.locals.set(decl.name, slot);

        if (decl.initializer) {
          this.compileNode(decl.initializer);
          this.emit(OpCode.STORE);
          this.emit(slot);
        }
        break;
      }

      case 'Identifier': {
        const id = node as Identifier;
        const slot = this.locals.get(id.name);
        if (slot === undefined) {
          throw new Error('Undefined variable: ' + id.name);
        }
        this.emit(OpCode.LOAD);
        this.emit(slot);
        break;
      }
    }
  }

  private emit(byte: number): void {
    this.bytecode.push(byte);
  }

  private addConstant(value: any): number {
    this.constants.push(value);
    return this.constants.length - 1;
  }
}

9. 인터프리터 설계

9.1 Tree-Walking 인터프리터

// Tree-Walking 인터프리터
class TreeWalkInterpreter {
  private environment: Environment;

  constructor() {
    this.environment = new Environment();
    // 내장 함수 등록
    this.environment.define('print', (args: any[]) => {
      console.log(...args);
      return null;
    });
  }

  interpret(program: Program): any {
    let result: any = null;
    for (const stmt of program.body) {
      result = this.execute(stmt);
    }
    return result;
  }

  private execute(node: ASTNode): any {
    switch (node.type) {
      case 'NumberLiteral':
        return (node as NumberLiteral).value;

      case 'StringLiteral':
        return (node as StringLiteral).value;

      case 'BooleanLiteral':
        return (node as BooleanLiteral).value;

      case 'Identifier':
        return this.environment.get((node as Identifier).name);

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        const left = this.execute(binary.left);
        const right = this.execute(binary.right);

        switch (binary.operator) {
          case '+': return left + right;
          case '-': return left - right;
          case '*': return left * right;
          case '/': return left / right;
          case '%': return left % right;
          case '==': return left === right;
          case '!=': return left !== right;
          case '<': return left < right;
          case '>': return left > right;
          case '&&': return left && right;
          case '||': return left || right;
          default: throw new Error('Unknown operator: ' + binary.operator);
        }
      }

      case 'VariableDeclaration': {
        const decl = node as VariableDeclaration;
        const value = decl.initializer ? this.execute(decl.initializer) : null;
        this.environment.define(decl.name, value);
        return null;
      }

      case 'FunctionDeclaration': {
        const fn = node as FunctionDeclaration;
        const closure = this.environment;

        const callable = (args: any[]) => {
          const fnEnv = new Environment(closure);
          fn.params.forEach((param, i) => {
            fnEnv.define(param, args[i]);
          });

          const prevEnv = this.environment;
          this.environment = fnEnv;

          try {
            for (const stmt of fn.body.body) {
              const result = this.execute(stmt);
              if (result && result.__return__) {
                return result.value;
              }
            }
          } finally {
            this.environment = prevEnv;
          }

          return null;
        };

        this.environment.define(fn.name, callable);
        return null;
      }

      case 'CallExpression': {
        const call = node as CallExpression;
        const fn = this.execute(call.callee);
        const args = call.arguments.map((arg: ASTNode) => this.execute(arg));
        return fn(args);
      }

      case 'IfStatement': {
        const ifStmt = node as IfStatement;
        const condition = this.execute(ifStmt.condition);

        if (condition) {
          return this.executeBlock(ifStmt.consequent);
        } else if (ifStmt.alternate) {
          if (ifStmt.alternate.type === 'BlockStatement') {
            return this.executeBlock(ifStmt.alternate as BlockStatement);
          }
          return this.execute(ifStmt.alternate);
        }
        return null;
      }

      case 'ReturnStatement': {
        const ret = node as ReturnStatement;
        return {
          __return__: true,
          value: ret.value ? this.execute(ret.value) : null,
        };
      }

      case 'BlockStatement':
        return this.executeBlock(node as BlockStatement);

      default:
        throw new Error('Unknown node type: ' + node.type);
    }
  }

  private executeBlock(block: BlockStatement): any {
    const prevEnv = this.environment;
    this.environment = new Environment(prevEnv);

    try {
      for (const stmt of block.body) {
        const result = this.execute(stmt);
        if (result && result.__return__) return result;
      }
    } finally {
      this.environment = prevEnv;
    }

    return null;
  }
}

// 환경(Environment) - 변수 바인딩 관리
class Environment {
  private bindings: Map<string, any> = new Map();
  private parent: Environment | null;

  constructor(parent: Environment | null = null) {
    this.parent = parent;
  }

  define(name: string, value: any): void {
    this.bindings.set(name, value);
  }

  get(name: string): any {
    if (this.bindings.has(name)) {
      return this.bindings.get(name);
    }
    if (this.parent) return this.parent.get(name);
    throw new Error('Undefined variable: ' + name);
  }

  set(name: string, value: any): void {
    if (this.bindings.has(name)) {
      this.bindings.set(name, value);
      return;
    }
    if (this.parent) { this.parent.set(name, value); return; }
    throw new Error('Undefined variable: ' + name);
  }
}

9.2 바이트코드 VM

// 스택 기반 바이트코드 VM
class VirtualMachine {
  private stack: any[] = [];
  private locals: any[] = new Array(256).fill(null);
  private ip: number = 0; // 명령어 포인터

  execute(bytecode: Uint8Array, constants: any[]): any {
    while (this.ip < bytecode.length) {
      const opcode = bytecode[this.ip++];

      switch (opcode) {
        case OpCode.CONST: {
          const index = bytecode[this.ip++];
          this.stack.push(constants[index]);
          break;
        }

        case OpCode.ADD: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a + b);
          break;
        }

        case OpCode.SUB: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a - b);
          break;
        }

        case OpCode.MUL: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a * b);
          break;
        }

        case OpCode.DIV: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a / b);
          break;
        }

        case OpCode.LOAD: {
          const slot = bytecode[this.ip++];
          this.stack.push(this.locals[slot]);
          break;
        }

        case OpCode.STORE: {
          const slot = bytecode[this.ip++];
          this.locals[slot] = this.stack.pop();
          break;
        }

        case OpCode.JUMP: {
          const target = bytecode[this.ip++];
          this.ip = target;
          break;
        }

        case OpCode.JUMP_IF_FALSE: {
          const target = bytecode[this.ip++];
          const condition = this.stack.pop();
          if (!condition) {
            this.ip = target;
          }
          break;
        }

        case OpCode.PRINT: {
          console.log(this.stack.pop());
          break;
        }

        case OpCode.HALT:
          return this.stack.length > 0 ? this.stack.pop() : null;

        default:
          throw new Error('Unknown opcode: 0x' + opcode.toString(16));
      }
    }

    return this.stack.length > 0 ? this.stack.pop() : null;
  }
}

10. 가비지 컬렉션 (Garbage Collection)

10.1 GC 알고리즘 비교

GC 알고리즘 비교:
┌──────────────────┬────────────┬────────────┬──────────────┐
│ 알고리즘         │ 처리량     │ 일시정지   │ 메모리 오버헤드│
├──────────────────┼────────────┼────────────┼──────────────┤
│ 참조 카운팅      │ 높음      │ 짧음      │ 객체당 카운터│
Mark-Sweep       │ 높음      │ 길 수 있음│ 마크 비트    │
Mark-Compact     │ 보통      │ 길 수 있음│ 마크 비트    │
세대별(Generational)│ 매우 높음│ 대부분 짧음│ 카드 테이블  │
동시(Concurrent) │ 높음      │ 매우 짧음 │ Write Barrier│
└──────────────────┴────────────┴────────────┴──────────────┘

10.2 Mark-Sweep 구현

// 간단한 Mark-Sweep GC
class GarbageCollector {
  private heap: GCObject[] = [];
  private roots: Set<GCObject> = new Set();

  allocate(value: any): GCObject {
    const obj: GCObject = {
      value,
      marked: false,
      references: [],
    };
    this.heap.push(obj);

    // 힙이 너무 크면 GC 실행
    if (this.heap.length > 1000) {
      this.collect();
    }

    return obj;
  }

  addRoot(obj: GCObject): void {
    this.roots.add(obj);
  }

  removeRoot(obj: GCObject): void {
    this.roots.delete(obj);
  }

  collect(): void {
    // Mark 단계
    this.markAll();

    // Sweep 단계
    this.sweep();
  }

  private markAll(): void {
    for (const root of this.roots) {
      this.mark(root);
    }
  }

  private mark(obj: GCObject): void {
    if (obj.marked) return; // 이미 마킹됨 (순환 참조 방지)
    obj.marked = true;

    // 참조하는 객체들도 마킹
    for (const ref of obj.references) {
      this.mark(ref);
    }
  }

  private sweep(): void {
    this.heap = this.heap.filter((obj) => {
      if (obj.marked) {
        obj.marked = false; // 다음 GC를 위해 리셋
        return true;
      }
      // 마킹되지 않은 객체는 해제
      return false;
    });
  }
}

interface GCObject {
  value: any;
  marked: boolean;
  references: GCObject[];
}

11. 미니 언어 실전 구현

11.1 전체 파이프라인 통합

// 미니 언어 "MiniLang" 전체 파이프라인
class MiniLang {
  // 소스 코드 -> 실행 결과
  static run(source: string): any {
    // 1. 토큰화
    const lexer = new Lexer(source);
    const tokens = lexer.tokenize();

    // 2. 파싱
    const parser = new RecursiveDescentParser(tokens);
    const ast = parser.parse();

    // 3. 타입 검사
    const typeChecker = new TypeChecker();
    // typeChecker.check(ast); // 선택적

    // 4. 최적화
    const optimizer = new ConstantFolding();
    // const optimizedAst = optimizer.optimize(ast);

    // 5. 실행
    const interpreter = new TreeWalkInterpreter();
    return interpreter.interpret(ast);
  }
}

// 사용 예시
const source = `
let x = 10;
let y = 20;

function add(a, b) {
  return a + b;
}

function factorial(n) {
  if (n == 0) {
    return 1;
  }
  return n * factorial(n - 1);
}

let result = add(x, y);
print(result);
print(factorial(5));
`;

MiniLang.run(source);
// 출력: 30
// 출력: 120

12. 현대 컴파일러 프로젝트

12.1 주요 프로젝트 아키텍처

현대 컴파일러/트랜스파일러 비교:

Rust Compiler (rustc):
  Source -> HIR -> MIR -> LLVM IR -> 기계어
  특징: 소유권 시스템, 빌림 검사, 단형화(monomorphization)

V8 TurboFan (JavaScript):
  Source -> AST -> Bytecode(Ignition) -> 최적 코드(TurboFan)
  특징: 인라인 캐시, 투기적 최적화, Deoptimization

TypeScript Compiler (tsc):
  Source -> Scanner -> Parser -> AST -> Binder -> Checker -> Emitter
  특징: 구조적 타입 시스템, 타입 추론, .d.ts 생성

SWC (Speedy Web Compiler):
  Source -> Lexer -> Parser -> AST -> Transform -> Codegen
  특징: Rust로 작성, Babel 대비 20-70배 빠름

esbuild:
  Source -> Lexer -> Parser -> AST -> Linker -> Codegen
  특징: Go로 작성, 극도로 빠른 번들링

Zig Compiler:
  Source -> AST -> ZIR -> AIR -> 기계어/LLVM IR
  특징: comptime 실행, 증분 컴파일

12.2 학습 리소스 로드맵

컴파일러 학습 로드맵:

초급:
├── Crafting Interpreters (Robert Nystrom)
- Tree-Walking 인터프리터 (Java)
- 바이트코드 VM (C)
├── Writing An Interpreter In Go (Thorsten Ball)
└── 미니 Lisp 인터프리터 구현

중급:
├── Engineering a Compiler (Cooper & Torczon)
- 데이터 흐름 분석
- 레지스터 할당
├── LLVM Tutorial: Kaleidoscope
- LLVM IR 생성
- JIT 컴파일
└── WebAssembly 컴파일러 구현

고급:
├── Compilers: Principles, Techniques, and Tools (Dragon Book)
├── Advanced Compiler Design and Implementation (Muchnick)
├── SSA-based Compiler Design (Springer)
└── 실제 오픈소스 컴파일러 기여
    - Rust, V8, LLVM, GCC

13. 퀴즈

아래 퀴즈로 학습 내용을 점검해 보세요.

Q1. Lexer와 Parser의 역할 차이를 설명하고, 각각의 입출력은 무엇인가?

A1.

Lexer (토크나이저):

  • 역할: 소스 코드 문자열을 토큰(Token)의 스트림으로 변환
  • 입력: 소스 코드 문자열 (예: let x = 10 + 20;)
  • 출력: 토큰 배열 (예: [LET, IDENT("x"), ASSIGN, NUM(10), PLUS, NUM(20), SEMICOLON])
  • 공백, 주석 등을 제거하고 키워드, 식별자, 리터럴, 연산자를 구분합니다.

Parser (구문 분석기):

  • 역할: 토큰 스트림을 AST(추상 구문 트리)로 변환
  • 입력: 토큰 배열
  • 출력: AST (트리 구조)
  • 문법 규칙에 따라 토큰 간의 관계와 구조를 파악합니다. 구문 오류도 이 단계에서 검출됩니다.
Q2. Pratt Parser가 재귀 하강 파서보다 연산자 우선순위 처리에 유리한 이유는?

A2. 재귀 하강 파서에서 연산자 우선순위를 처리하려면 각 우선순위 레벨마다 별도의 파싱 함수를 만들어야 합니다. 예를 들어 parseAddition(), parseMultiplication(), parseComparison() 등 우선순위가 늘어날수록 함수가 계속 추가됩니다.

Pratt Parser의 장점:

  1. 단일 함수로 모든 우선순위를 처리 (parseExpression(precedence))
  2. 우선순위를 숫자 값으로 표현하여 비교가 간단
  3. 새로운 연산자를 추가할 때 테이블에 등록만 하면 됨
  4. Prefix와 Infix를 독립적으로 등록하여 같은 기호의 다른 의미를 처리 가능 (예: -가 단항과 이항 모두)

이 방식은 V8(JavaScript), Rust 컴파일러 등 실전에서 널리 사용됩니다.

Q3. SSA(Static Single Assignment) 형태가 컴파일러 최적화에 유리한 이유는?

A3. SSA 형태에서는 각 변수가 정확히 한 번만 할당됩니다. 이로 인해:

  1. def-use 분석이 단순: 변수의 정의 지점이 하나이므로 사용처를 바로 추적 가능
  2. 상수 전파가 쉬움: 변수가 한 번만 할당되므로 상수 값을 안전하게 전파
  3. 죽은 코드 제거가 정확: 사용되지 않는 정의를 정확히 식별
  4. 레지스터 할당 최적화: 변수의 수명이 명확하여 효율적 할당 가능
  5. 제어 흐름 합류: phi 노드로 다른 경로에서 오는 값을 명확히 표현

LLVM IR, GCC의 GIMPLE 등 현대 컴파일러 인프라 대부분이 SSA를 기반으로 합니다.

Q4. Tree-Walking 인터프리터와 바이트코드 VM 인터프리터의 장단점은?

A4.

Tree-Walking 인터프리터:

  • 장점: 구현이 간단하고 직관적, AST를 직접 순회하므로 디버깅 용이
  • 단점: 노드 순회 오버헤드로 인해 느림, 캐시 효율이 낮음, 재귀 호출 스택 소모

바이트코드 VM 인터프리터:

  • 장점: Tree-Walking 대비 3-10배 빠름, 컴팩트한 바이트코드로 캐시 효율적, 직렬화 가능(바이트코드 저장)
  • 단점: 구현이 복잡(컴파일러 + VM 모두 필요), 디버깅 정보 유지가 어려움

일반적으로 프로토타이핑에는 Tree-Walking, 프로덕션에는 바이트코드 VM을 사용합니다. CPython, Java JVM, Lua 등이 바이트코드 VM 방식을 사용합니다.

Q5. Mark-Sweep GC의 동작 원리와 세대별(Generational) GC가 이를 개선하는 방법은?

A5.

Mark-Sweep GC:

  1. Mark 단계: 루트(전역 변수, 스택)에서 시작하여 도달 가능한 모든 객체를 마킹
  2. Sweep 단계: 힙 전체를 스캔하여 마킹되지 않은 객체를 해제
  • 단점: 전체 힙을 스캔해야 하므로 힙이 클수록 일시정지가 길어짐

세대별(Generational) GC의 개선:

  • "대부분의 객체는 젊은 세대에서 죽는다"는 관찰에 기반
  • Young Generation: 자주, 빠르게 수집 (Minor GC)
  • Old Generation: 드물게 수집 (Major GC)
  • Young에서 살아남은 객체만 Old로 승격(promotion)
  • Write Barrier로 세대 간 참조를 추적
  • V8, JVM, Go 등 대부분의 현대 런타임이 세대별 GC를 사용합니다.

14. 참고 자료

  1. Crafting Interpreters - Robert Nystrom의 무료 온라인 컴파일러 교재
  2. Writing An Interpreter In Go - Thorsten Ball의 Go 인터프리터 구현서
  3. LLVM Tutorial: Kaleidoscope - LLVM 공식 튜토리얼
  4. WebAssembly Specification - WebAssembly 공식 스펙
  5. Simple but Powerful Pratt Parsing - Pratt Parser 해설
  6. The Rust Reference - Rust 언어 레퍼런스
  7. V8 Blog - TurboFan - V8 TurboFan JIT 컴파일러
  8. Engineering a Compiler - 컴파일러 공학 교재
  9. PEG.js - PEG 파서 생성기
  10. Tree-sitter - 증분 파싱 라이브러리
  11. Binaryen - WebAssembly 컴파일러 인프라
  12. The GC Handbook - 가비지 컬렉션 핸드북
  13. SWC Project - Rust 기반 TypeScript/JavaScript 컴파일러

Compiler & Interpreter Design Complete Guide 2025: Lexer, Parser, AST, Code Generation

Table of Contents

1. Why Learn Compilers

Compiler theory is one of the core fields of computer science. Understanding compilers gives you deep insight into how programming languages work and has numerous practical applications.

1.1 Practical Applications

Application AreaExamples
DSL DesignConfiguration languages, query languages, template engines
TranspilersTypeScript, Babel, PostCSS
Linters/FormattersESLint, Prettier, rustfmt
IDE FeaturesAutocomplete, refactoring, syntax highlighting
Code AnalysisStatic analysis, security scanners, dependency analysis
Build Toolswebpack, esbuild, SWC, Vite

1.2 Compilation Pipeline Overview

Source Code -> Compilation Pipeline:

┌──────────┐    ┌──────────┐    ┌──────────┐
Source   │───>Lexer   │───>ParserCode(Tokenizer)│    │          │
└──────────┘    └──────────┘    └──────────┘
                                      v
┌──────────┐    ┌──────────┐    ┌──────────┐
Code<───│Optimizer │<───│ Semantic│Generator │    │          │    │ Analysis└──────────┘    └──────────┘    └──────────┘
      v
┌──────────┐
TargetCode└──────────┘

Role of each stage:
1. Lexer: Source code -> Token stream
2. Parser: Token stream -> AST (Abstract Syntax Tree)
3. Semantic Analysis: Type checking, scope resolution
4. Optimizer: Optimization at IR level
5. Code Generator: Generate target code (machine code, bytecode, WASM, etc.)

2. Lexer Implementation

The Lexer (Tokenizer) transforms source code strings into a stream of Tokens.

2.1 Token Type Definition

// Token type enumeration
enum TokenType {
  // Literals
  Number = 'Number',
  String = 'String',
  Boolean = 'Boolean',

  // Identifiers & Keywords
  Identifier = 'Identifier',
  Let = 'Let',
  Const = 'Const',
  Function = 'Function',
  If = 'If',
  Else = 'Else',
  Return = 'Return',
  While = 'While',
  For = 'For',

  // Operators
  Plus = 'Plus',           // +
  Minus = 'Minus',         // -
  Star = 'Star',           // *
  Slash = 'Slash',         // /
  Percent = 'Percent',     // %
  Assign = 'Assign',       // =
  Equal = 'Equal',         // ==
  NotEqual = 'NotEqual',   // !=
  LessThan = 'LessThan',  // <
  GreaterThan = 'GreaterThan', // >
  And = 'And',             // &&
  Or = 'Or',               // ||
  Not = 'Not',             // !

  // Delimiters
  LeftParen = 'LeftParen',     // (
  RightParen = 'RightParen',   // )
  LeftBrace = 'LeftBrace',     // { (inside code blocks only)
  RightBrace = 'RightBrace',   // } (inside code blocks only)
  LeftBracket = 'LeftBracket', // [
  RightBracket = 'RightBracket', // ]
  Comma = 'Comma',
  Semicolon = 'Semicolon',
  Colon = 'Colon',
  Dot = 'Dot',
  Arrow = 'Arrow',         // =>

  // Special
  EOF = 'EOF',
}

// Token interface
interface Token {
  type: TokenType;
  value: string;
  line: number;
  column: number;
}

2.2 Lexer Class Implementation

class Lexer {
  private source: string;
  private pos: number = 0;
  private line: number = 1;
  private column: number = 1;
  private tokens: Token[] = [];

  // Keyword map
  private static keywords: Map<string, TokenType> = new Map([
    ['let', TokenType.Let],
    ['const', TokenType.Const],
    ['function', TokenType.Function],
    ['if', TokenType.If],
    ['else', TokenType.Else],
    ['return', TokenType.Return],
    ['while', TokenType.While],
    ['for', TokenType.For],
    ['true', TokenType.Boolean],
    ['false', TokenType.Boolean],
  ]);

  constructor(source: string) {
    this.source = source;
  }

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      this.skipWhitespace();
      this.skipComments();

      if (this.pos >= this.source.length) break;

      const char = this.source[this.pos];

      if (this.isDigit(char)) {
        this.readNumber();
      } else if (this.isAlpha(char) || char === '_') {
        this.readIdentifier();
      } else if (char === '"' || char === "'") {
        this.readString(char);
      } else {
        this.readOperator();
      }
    }

    this.tokens.push(this.makeToken(TokenType.EOF, ''));
    return this.tokens;
  }

  private readNumber(): void {
    const start = this.pos;
    let isFloat = false;

    while (this.pos < this.source.length && this.isDigit(this.current())) {
      this.advance();
    }

    // Handle decimal point
    if (this.current() === '.' && this.isDigit(this.peek())) {
      isFloat = true;
      this.advance(); // Skip '.'
      while (this.pos < this.source.length && this.isDigit(this.current())) {
        this.advance();
      }
    }

    const value = this.source.slice(start, this.pos);
    this.tokens.push(this.makeToken(TokenType.Number, value));
  }

  private readIdentifier(): void {
    const start = this.pos;

    while (
      this.pos < this.source.length &&
      (this.isAlphaNumeric(this.current()) || this.current() === '_')
    ) {
      this.advance();
    }

    const value = this.source.slice(start, this.pos);
    const type = Lexer.keywords.get(value) || TokenType.Identifier;
    this.tokens.push(this.makeToken(type, value));
  }

  private readString(quote: string): void {
    this.advance(); // Skip opening quote
    const start = this.pos;
    let value = '';

    while (this.pos < this.source.length && this.current() !== quote) {
      if (this.current() === '\\') {
        this.advance();
        switch (this.current()) {
          case 'n': value += '\n'; break;
          case 't': value += '\t'; break;
          case '\\': value += '\\'; break;
          default: value += this.current();
        }
      } else {
        value += this.current();
      }
      this.advance();
    }

    if (this.current() !== quote) {
      throw new Error(
        'Unterminated string at line ' + this.line
      );
    }
    this.advance(); // Skip closing quote

    this.tokens.push(this.makeToken(TokenType.String, value));
  }

  private readOperator(): void {
    const char = this.current();
    const next = this.peek();

    // Check two-character operators
    const twoChar = char + (next || '');
    const twoCharOps: Record<string, TokenType> = {
      '==': TokenType.Equal,
      '!=': TokenType.NotEqual,
      '&&': TokenType.And,
      '||': TokenType.Or,
      '=>': TokenType.Arrow,
    };

    if (twoCharOps[twoChar]) {
      this.tokens.push(this.makeToken(twoCharOps[twoChar], twoChar));
      this.advance();
      this.advance();
      return;
    }

    // Single-character operators
    const oneCharOps: Record<string, TokenType> = {
      '+': TokenType.Plus,
      '-': TokenType.Minus,
      '*': TokenType.Star,
      '/': TokenType.Slash,
      '%': TokenType.Percent,
      '=': TokenType.Assign,
      '<': TokenType.LessThan,
      '>': TokenType.GreaterThan,
      '!': TokenType.Not,
      '(': TokenType.LeftParen,
      ')': TokenType.RightParen,
      '[': TokenType.LeftBracket,
      ']': TokenType.RightBracket,
      ',': TokenType.Comma,
      ';': TokenType.Semicolon,
      ':': TokenType.Colon,
      '.': TokenType.Dot,
    };

    // Handle braces
    if (char === '{') {
      this.tokens.push(this.makeToken(TokenType.LeftBrace, char));
      this.advance();
      return;
    }
    if (char === '}') {
      this.tokens.push(this.makeToken(TokenType.RightBrace, char));
      this.advance();
      return;
    }

    if (oneCharOps[char]) {
      this.tokens.push(this.makeToken(oneCharOps[char], char));
      this.advance();
    } else {
      throw new Error(
        'Unexpected character: ' + char + ' at line ' + this.line
      );
    }
  }

  // Utility methods
  private current(): string { return this.source[this.pos]; }
  private peek(): string { return this.source[this.pos + 1]; }
  private advance(): void {
    if (this.current() === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.pos++;
  }
  private isDigit(c: string): boolean { return /[0-9]/.test(c); }
  private isAlpha(c: string): boolean { return /[a-zA-Z_]/.test(c); }
  private isAlphaNumeric(c: string): boolean { return /[a-zA-Z0-9_]/.test(c); }
  private skipWhitespace(): void {
    while (this.pos < this.source.length && /\s/.test(this.current())) {
      this.advance();
    }
  }
  private skipComments(): void {
    if (this.current() === '/' && this.peek() === '/') {
      while (this.pos < this.source.length && this.current() !== '\n') {
        this.advance();
      }
    }
  }
  private makeToken(type: TokenType, value: string): Token {
    return { type, value, line: this.line, column: this.column };
  }
}

2.3 Error Recovery

// Lexer with error recovery
class RobustLexer extends Lexer {
  private errors: LexerError[] = [];

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      try {
        this.skipWhitespace();
        if (this.pos >= this.source.length) break;
        // ... normal tokenization logic
      } catch (error) {
        // Record error and advance to next valid position
        this.errors.push({
          message: error.message,
          line: this.line,
          column: this.column,
        });
        this.synchronize(); // Skip to next valid token start
      }
    }
    return this.tokens;
  }

  private synchronize(): void {
    // Skip to next whitespace or delimiter
    while (
      this.pos < this.source.length &&
      !/[\s;,(){}[\]]/.test(this.current())
    ) {
      this.advance();
    }
  }
}

3. Parser Implementation

The Parser transforms a token stream into an AST (Abstract Syntax Tree).

3.1 Parser Type Comparison

Parser Type Comparison:
┌─────────────────────┬────────────┬────────────┬──────────────┐
TypeDifficultyExpressiveNotable Tool├─────────────────────┼────────────┼────────────┼──────────────┤
Recursive DescentEasyLL(k)GCC, GoPratt ParserMediumPrecedenceV8, RustPEGMediumUnlimited  │ pest, PEG.jsParser CombinatorsMediumFlexible   │ nom, parsec  │
LR/LALRHardLR(1)      │ yacc, bison  │
EarleyMediumAll CFGs   │ nearley      │
└─────────────────────┴────────────┴────────────┴──────────────┘

3.2 Recursive Descent Parser

// AST Node type definition
type ASTNode =
  | NumberLiteral
  | StringLiteral
  | BooleanLiteral
  | Identifier
  | BinaryExpression
  | UnaryExpression
  | CallExpression
  | VariableDeclaration
  | FunctionDeclaration
  | IfStatement
  | WhileStatement
  | ReturnStatement
  | BlockStatement
  | Program;

interface NumberLiteral {
  type: 'NumberLiteral';
  value: number;
}

interface BinaryExpression {
  type: 'BinaryExpression';
  operator: string;
  left: ASTNode;
  right: ASTNode;
}

interface VariableDeclaration {
  type: 'VariableDeclaration';
  kind: 'let' | 'const';
  name: string;
  initializer: ASTNode | null;
}

interface FunctionDeclaration {
  type: 'FunctionDeclaration';
  name: string;
  params: string[];
  body: BlockStatement;
}

interface IfStatement {
  type: 'IfStatement';
  condition: ASTNode;
  consequent: BlockStatement;
  alternate: BlockStatement | IfStatement | null;
}

interface Program {
  type: 'Program';
  body: ASTNode[];
}

// Recursive Descent Parser
class RecursiveDescentParser {
  private tokens: Token[];
  private pos: number = 0;

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  parse(): Program {
    const body: ASTNode[] = [];

    while (!this.isAtEnd()) {
      body.push(this.parseStatement());
    }

    return { type: 'Program', body };
  }

  private parseStatement(): ASTNode {
    const token = this.current();

    switch (token.type) {
      case TokenType.Let:
      case TokenType.Const:
        return this.parseVariableDeclaration();
      case TokenType.Function:
        return this.parseFunctionDeclaration();
      case TokenType.If:
        return this.parseIfStatement();
      case TokenType.While:
        return this.parseWhileStatement();
      case TokenType.Return:
        return this.parseReturnStatement();
      case TokenType.LeftBrace:
        return this.parseBlockStatement();
      default:
        return this.parseExpressionStatement();
    }
  }

  private parseVariableDeclaration(): VariableDeclaration {
    const kind = this.consume().value as 'let' | 'const';
    const name = this.expect(TokenType.Identifier).value;

    let initializer: ASTNode | null = null;
    if (this.match(TokenType.Assign)) {
      initializer = this.parseExpression();
    }

    this.expect(TokenType.Semicolon);

    return { type: 'VariableDeclaration', kind, name, initializer };
  }

  private parseFunctionDeclaration(): FunctionDeclaration {
    this.expect(TokenType.Function);
    const name = this.expect(TokenType.Identifier).value;

    this.expect(TokenType.LeftParen);
    const params: string[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        params.push(this.expect(TokenType.Identifier).value);
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);
    const body = this.parseBlockStatement();

    return { type: 'FunctionDeclaration', name, params, body };
  }

  private parseIfStatement(): IfStatement {
    this.expect(TokenType.If);
    this.expect(TokenType.LeftParen);
    const condition = this.parseExpression();
    this.expect(TokenType.RightParen);

    const consequent = this.parseBlockStatement();

    let alternate: BlockStatement | IfStatement | null = null;
    if (this.match(TokenType.Else)) {
      if (this.check(TokenType.If)) {
        alternate = this.parseIfStatement();
      } else {
        alternate = this.parseBlockStatement();
      }
    }

    return { type: 'IfStatement', condition, consequent, alternate };
  }

  // Utility methods
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error(
      'Expected ' + type + ' but got ' + this.current().type +
      ' at line ' + this.current().line
    );
  }
}

3.3 Pratt Parser (Operator Precedence Parser)

The Pratt Parser elegantly handles operator precedence. It is used in many modern language implementations including V8 (JavaScript) and the Rust compiler.

// Operator precedence definition
enum Precedence {
  None = 0,
  Assignment = 1,  // =
  Or = 2,          // ||
  And = 3,         // &&
  Equality = 4,    // == !=
  Comparison = 5,  // < > (including less-than-or-equal, greater-than-or-equal)
  Term = 6,        // + -
  Factor = 7,      // * / %
  Unary = 8,       // ! -
  Call = 9,        // ()
  Member = 10,     // .
}

// Prefix parser function type (numbers, identifiers, parentheses, unary operators)
type PrefixParseFn = () => ASTNode;

// Infix parser function type (binary operators, function calls)
type InfixParseFn = (left: ASTNode) => ASTNode;

class PrattParser {
  private tokens: Token[];
  private pos: number = 0;

  // Prefix parser registry
  private prefixParsers: Map<TokenType, PrefixParseFn> = new Map();
  // Infix parser registry
  private infixParsers: Map<TokenType, InfixParseFn> = new Map();
  // Precedence registry
  private precedences: Map<TokenType, Precedence> = new Map();

  constructor(tokens: Token[]) {
    this.tokens = tokens;

    // Register prefix parsers
    this.registerPrefix(TokenType.Number, this.parseNumber.bind(this));
    this.registerPrefix(TokenType.String, this.parseString.bind(this));
    this.registerPrefix(TokenType.Boolean, this.parseBoolean.bind(this));
    this.registerPrefix(TokenType.Identifier, this.parseIdentifier.bind(this));
    this.registerPrefix(TokenType.LeftParen, this.parseGrouping.bind(this));
    this.registerPrefix(TokenType.Minus, this.parseUnary.bind(this));
    this.registerPrefix(TokenType.Not, this.parseUnary.bind(this));

    // Register infix parsers
    this.registerInfix(TokenType.Plus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Minus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Star, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Slash, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Equal, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.NotEqual, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LessThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.GreaterThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.And, Precedence.And, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Or, Precedence.Or, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LeftParen, Precedence.Call, this.parseCall.bind(this));
  }

  // Core: Pratt parsing algorithm
  parseExpression(precedence: Precedence = Precedence.None): ASTNode {
    const token = this.current();
    const prefixFn = this.prefixParsers.get(token.type);

    if (!prefixFn) {
      throw new Error('No prefix parser for ' + token.type);
    }

    let left = prefixFn();

    // Continue parsing if higher precedence infix operator exists
    while (
      !this.isAtEnd() &&
      precedence < this.getPrecedence(this.current().type)
    ) {
      const infixFn = this.infixParsers.get(this.current().type);
      if (!infixFn) break;
      left = infixFn(left);
    }

    return left;
  }

  // Binary operator parsing
  private parseBinary(left: ASTNode): ASTNode {
    const operator = this.consume();
    const precedence = this.precedences.get(operator.type) || Precedence.None;
    const right = this.parseExpression(precedence); // Left-associative

    return {
      type: 'BinaryExpression',
      operator: operator.value,
      left,
      right,
    };
  }

  // Function call parsing
  private parseCall(left: ASTNode): ASTNode {
    this.consume(); // Consume '('
    const args: ASTNode[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        args.push(this.parseExpression());
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);

    return {
      type: 'CallExpression',
      callee: left,
      arguments: args,
    } as CallExpression;
  }

  // Utility methods
  private registerPrefix(type: TokenType, fn: PrefixParseFn): void {
    this.prefixParsers.set(type, fn);
  }

  private registerInfix(type: TokenType, prec: Precedence, fn: InfixParseFn): void {
    this.infixParsers.set(type, fn);
    this.precedences.set(type, prec);
  }

  private getPrecedence(type: TokenType): Precedence {
    return this.precedences.get(type) || Precedence.None;
  }

  private parseNumber(): ASTNode {
    const token = this.consume();
    return { type: 'NumberLiteral', value: parseFloat(token.value) };
  }

  private parseString(): ASTNode {
    const token = this.consume();
    return { type: 'StringLiteral', value: token.value } as StringLiteral;
  }

  private parseBoolean(): ASTNode {
    const token = this.consume();
    return { type: 'BooleanLiteral', value: token.value === 'true' } as BooleanLiteral;
  }

  private parseIdentifier(): ASTNode {
    const token = this.consume();
    return { type: 'Identifier', name: token.value } as Identifier;
  }

  private parseGrouping(): ASTNode {
    this.consume(); // Consume '('
    const expr = this.parseExpression();
    this.expect(TokenType.RightParen);
    return expr;
  }

  private parseUnary(): ASTNode {
    const operator = this.consume();
    const operand = this.parseExpression(Precedence.Unary);
    return {
      type: 'UnaryExpression',
      operator: operator.value,
      operand,
    } as UnaryExpression;
  }

  // Common utilities
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error('Expected ' + type + ' but got ' + this.current().type);
  }
}

4. AST Design and Visitor Pattern

4.1 AST Node Design Principles

// Each AST node includes location information
interface ASTNodeBase {
  type: string;
  loc: SourceLocation;
}

interface SourceLocation {
  start: Position;
  end: Position;
  source?: string;
}

interface Position {
  line: number;
  column: number;
  offset: number;
}

4.2 Visitor Pattern

// Visitor interface
interface ASTVisitor<T> {
  visitProgram(node: Program): T;
  visitNumberLiteral(node: NumberLiteral): T;
  visitStringLiteral(node: StringLiteral): T;
  visitBinaryExpression(node: BinaryExpression): T;
  visitUnaryExpression(node: UnaryExpression): T;
  visitIdentifier(node: Identifier): T;
  visitVariableDeclaration(node: VariableDeclaration): T;
  visitFunctionDeclaration(node: FunctionDeclaration): T;
  visitCallExpression(node: CallExpression): T;
  visitIfStatement(node: IfStatement): T;
  visitReturnStatement(node: ReturnStatement): T;
  visitBlockStatement(node: BlockStatement): T;
}

// AST Printer (for debugging)
class ASTPrinter implements ASTVisitor<string> {
  private indent: number = 0;

  visitProgram(node: Program): string {
    return node.body.map((stmt) => this.visit(stmt)).join('\n');
  }

  visitBinaryExpression(node: BinaryExpression): string {
    return (
      '(' +
      this.visit(node.left) +
      ' ' + node.operator + ' ' +
      this.visit(node.right) +
      ')'
    );
  }

  visitNumberLiteral(node: NumberLiteral): string {
    return node.value.toString();
  }

  visitIdentifier(node: Identifier): string {
    return node.name;
  }

  // Generic visit dispatcher
  visit(node: ASTNode): string {
    switch (node.type) {
      case 'Program': return this.visitProgram(node as Program);
      case 'NumberLiteral': return this.visitNumberLiteral(node as NumberLiteral);
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      case 'Identifier': return this.visitIdentifier(node as Identifier);
      default: return 'Unknown: ' + node.type;
    }
  }
}

5. Semantic Analysis

5.1 Symbol Table and Scope

// Symbol information
interface Symbol {
  name: string;
  type: DataType;
  kind: 'variable' | 'function' | 'parameter';
  mutable: boolean;
  scope: Scope;
}

// Scope (lexical scope chain)
class Scope {
  private symbols: Map<string, Symbol> = new Map();
  private parent: Scope | null;

  constructor(parent: Scope | null = null) {
    this.parent = parent;
  }

  define(symbol: Symbol): void {
    if (this.symbols.has(symbol.name)) {
      throw new Error('Symbol already defined: ' + symbol.name);
    }
    this.symbols.set(symbol.name, symbol);
  }

  resolve(name: string): Symbol | null {
    if (this.symbols.has(name)) {
      return this.symbols.get(name)!;
    }
    // Search in parent scope
    if (this.parent) {
      return this.parent.resolve(name);
    }
    return null;
  }

  createChild(): Scope {
    return new Scope(this);
  }
}

5.2 Type Checking

// Type system
type DataType =
  | { kind: 'number' }
  | { kind: 'string' }
  | { kind: 'boolean' }
  | { kind: 'void' }
  | { kind: 'function'; params: DataType[]; returnType: DataType }
  | { kind: 'array'; elementType: DataType };

class TypeChecker implements ASTVisitor<DataType> {
  private currentScope: Scope = new Scope();

  visitBinaryExpression(node: BinaryExpression): DataType {
    const leftType = this.visit(node.left);
    const rightType = this.visit(node.right);

    // Arithmetic operators
    if (['+', '-', '*', '/', '%'].includes(node.operator)) {
      if (node.operator === '+') {
        // Allow string concatenation
        if (leftType.kind === 'string' || rightType.kind === 'string') {
          return { kind: 'string' };
        }
      }

      if (leftType.kind !== 'number' || rightType.kind !== 'number') {
        throw new TypeError(
          'Operator ' + node.operator +
          ' requires number operands, got ' +
          leftType.kind + ' and ' + rightType.kind
        );
      }
      return { kind: 'number' };
    }

    // Comparison operators
    if (['==', '!=', '<', '>'].includes(node.operator)) {
      if (leftType.kind !== rightType.kind) {
        throw new TypeError(
          'Cannot compare ' + leftType.kind + ' with ' + rightType.kind
        );
      }
      return { kind: 'boolean' };
    }

    // Logical operators
    if (['&&', '||'].includes(node.operator)) {
      if (leftType.kind !== 'boolean' || rightType.kind !== 'boolean') {
        throw new TypeError('Logical operators require boolean operands');
      }
      return { kind: 'boolean' };
    }

    throw new TypeError('Unknown operator: ' + node.operator);
  }

  visit(node: ASTNode): DataType {
    switch (node.type) {
      case 'NumberLiteral': return { kind: 'number' };
      case 'StringLiteral': return { kind: 'string' };
      case 'BooleanLiteral': return { kind: 'boolean' };
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      default: return { kind: 'void' };
    }
  }
}

6. Intermediate Representation (IR)

6.1 Three-Address Code

// Three-Address Code
// Each instruction has at most 3 addresses (operands)

type IRInstruction =
  | { op: 'CONST'; dest: string; value: number }
  | { op: 'ADD' | 'SUB' | 'MUL' | 'DIV'; dest: string; left: string; right: string }
  | { op: 'COPY'; dest: string; src: string }
  | { op: 'LABEL'; name: string }
  | { op: 'JUMP'; target: string }
  | { op: 'JUMP_IF_FALSE'; condition: string; target: string }
  | { op: 'CALL'; dest: string; fn: string; args: string[] }
  | { op: 'RETURN'; value: string | null };

// AST -> IR conversion
class IRGenerator implements ASTVisitor<string> {
  private instructions: IRInstruction[] = [];
  private tempCounter: number = 0;
  private labelCounter: number = 0;

  private newTemp(): string {
    return '_t' + (this.tempCounter++);
  }

  private newLabel(): string {
    return '_L' + (this.labelCounter++);
  }

  private emit(instruction: IRInstruction): void {
    this.instructions.push(instruction);
  }

  visitBinaryExpression(node: BinaryExpression): string {
    const left = this.visit(node.left);
    const right = this.visit(node.right);
    const dest = this.newTemp();

    const opMap: Record<string, string> = {
      '+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV',
    };

    this.emit({
      op: opMap[node.operator] as 'ADD',
      dest,
      left,
      right,
    });

    return dest;
  }

  visitIfStatement(node: IfStatement): string {
    const condition = this.visit(node.condition);
    const elseLabel = this.newLabel();
    const endLabel = this.newLabel();

    this.emit({ op: 'JUMP_IF_FALSE', condition, target: elseLabel });

    // then block
    this.visit(node.consequent as ASTNode);
    this.emit({ op: 'JUMP', target: endLabel });

    // else block
    this.emit({ op: 'LABEL', name: elseLabel });
    if (node.alternate) {
      this.visit(node.alternate as ASTNode);
    }

    this.emit({ op: 'LABEL', name: endLabel });
    return '';
  }

  visit(node: ASTNode): string {
    switch (node.type) {
      case 'NumberLiteral': {
        const dest = this.newTemp();
        this.emit({ op: 'CONST', dest, value: (node as NumberLiteral).value });
        return dest;
      }
      case 'BinaryExpression':
        return this.visitBinaryExpression(node as BinaryExpression);
      case 'IfStatement':
        return this.visitIfStatement(node as IfStatement);
      default:
        return '';
    }
  }

  getInstructions(): IRInstruction[] {
    return this.instructions;
  }
}

6.2 SSA Form and LLVM IR

LLVM IR Basic Structure:

; Function definition
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

; if-else structure
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %then, label %else

then:
  ret i32 %a

else:
  ret i32 %b
}

; Loop structure
define i32 @sum_to_n(i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %next_i, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %next_sum, %loop ]
  %next_sum = add i32 %sum, %i
  %next_i = add i32 %i, 1
  %cond = icmp slt i32 %next_i, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret i32 %next_sum
}

SSA (Static Single Assignment) Characteristics:
- Each variable is assigned exactly once
- Phi nodes handle control flow join points
- Facilitates optimization passes

7. Optimization Passes

7.1 Constant Folding

// Constant Folding: Evaluate constant expressions at compile time
class ConstantFolding {
  optimize(node: ASTNode): ASTNode {
    if (node.type === 'BinaryExpression') {
      const binary = node as BinaryExpression;
      const left = this.optimize(binary.left);
      const right = this.optimize(binary.right);

      // If both sides are constants, compute at compile time
      if (left.type === 'NumberLiteral' && right.type === 'NumberLiteral') {
        const leftVal = (left as NumberLiteral).value;
        const rightVal = (right as NumberLiteral).value;

        let result: number;
        switch (binary.operator) {
          case '+': result = leftVal + rightVal; break;
          case '-': result = leftVal - rightVal; break;
          case '*': result = leftVal * rightVal; break;
          case '/': result = leftVal / rightVal; break;
          default: return { ...binary, left, right };
        }

        return { type: 'NumberLiteral', value: result };
      }

      return { ...binary, left, right };
    }

    return node;
  }
}

// Example: 2 + 3 * 4 -> 14

7.2 Dead Code Elimination

// Dead Code Elimination
class DeadCodeElimination {
  optimize(program: Program): Program {
    const usedVariables = this.collectUsedVariables(program);

    const optimizedBody = program.body.filter((stmt) => {
      if (stmt.type === 'VariableDeclaration') {
        const decl = stmt as VariableDeclaration;
        // Remove if unused and has no side effects
        if (!usedVariables.has(decl.name) && !this.hasSideEffects(decl.initializer)) {
          return false;
        }
      }
      return true;
    });

    return { ...program, body: optimizedBody };
  }

  private collectUsedVariables(program: Program): Set<string> {
    const used = new Set<string>();

    function walk(node: ASTNode): void {
      if (node.type === 'Identifier') {
        used.add((node as Identifier).name);
      }
      for (const child of getChildren(node)) {
        walk(child);
      }
    }

    program.body.forEach(walk);
    return used;
  }

  private hasSideEffects(node: ASTNode | null): boolean {
    if (!node) return false;
    return node.type === 'CallExpression';
  }
}

7.3 Loop Optimization

Loop Optimization Techniques:

1. Loop Invariant Code Motion (LICM)
   // Before                    // After
   for (i = 0; i < n; i++)    x = a + b;  // Moved outside loop
     arr[i] = a + b + i;      for (i = 0; i < n; i++)
                                  arr[i] = x + i;

2. Loop Unrolling
   // Before                    // After
   for (i = 0; i < 4; i++)    arr[0] = 0;
     arr[i] = 0;               arr[1] = 0;
                                arr[2] = 0;
                                arr[3] = 0;

3. Strength Reduction
   // Before                    // After
   for (i = 0; i < n; i++)    t = 0;
     arr[i] = i * 4;           for (i = 0; i < n; i++)
                                  arr[i] = t;
                                  t = t + 4;  // Multiply -> Addition

8. Code Generation

8.1 WebAssembly Code Generation

// AST -> WebAssembly Text Format (WAT) code generation
class WasmCodeGenerator {
  private output: string[] = [];

  generate(program: Program): string {
    this.output.push('(module');

    for (const stmt of program.body) {
      if (stmt.type === 'FunctionDeclaration') {
        this.generateFunction(stmt as FunctionDeclaration);
      }
    }

    this.output.push(')');
    return this.output.join('\n');
  }

  private generateFunction(node: FunctionDeclaration): void {
    const params = node.params.map((p) => '(param $' + p + ' i32)').join(' ');

    this.output.push('  (func $' + node.name + ' ' + params + ' (result i32)');
    this.output.push('    (local $__result i32)');

    for (const stmt of node.body.body) {
      this.generateStatement(stmt);
    }

    this.output.push('  )');
    this.output.push('  (export "' + node.name + '" (func $' + node.name + '))');
  }

  private generateExpression(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral':
        this.output.push('    i32.const ' + (node as NumberLiteral).value);
        break;

      case 'Identifier':
        this.output.push('    local.get $' + (node as Identifier).name);
        break;

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.generateExpression(binary.left);
        this.generateExpression(binary.right);

        const opMap: Record<string, string> = {
          '+': 'i32.add',
          '-': 'i32.sub',
          '*': 'i32.mul',
          '/': 'i32.div_s',
          '<': 'i32.lt_s',
          '>': 'i32.gt_s',
          '==': 'i32.eq',
          '!=': 'i32.ne',
        };

        this.output.push('    ' + opMap[binary.operator]);
        break;
      }
    }
  }

  private generateStatement(node: ASTNode): void {
    if (node.type === 'ReturnStatement') {
      const ret = node as ReturnStatement;
      if (ret.value) {
        this.generateExpression(ret.value);
      }
      this.output.push('    return');
    }
  }
}

8.2 Bytecode Generation

// Stack-based bytecode
enum OpCode {
  CONST = 0x01,        // Push constant
  ADD = 0x02,          // Addition
  SUB = 0x03,          // Subtraction
  MUL = 0x04,          // Multiplication
  DIV = 0x05,          // Division
  LOAD = 0x10,         // Load variable
  STORE = 0x11,        // Store variable
  JUMP = 0x20,         // Unconditional jump
  JUMP_IF_FALSE = 0x21,// Conditional jump
  CALL = 0x30,         // Function call
  RETURN = 0x31,       // Return
  PRINT = 0x40,        // Print
  HALT = 0xFF,         // Halt
}

class BytecodeCompiler {
  private bytecode: number[] = [];
  private constants: any[] = [];
  private locals: Map<string, number> = new Map();

  compile(program: Program): Uint8Array {
    for (const stmt of program.body) {
      this.compileNode(stmt);
    }
    this.emit(OpCode.HALT);

    return new Uint8Array(this.bytecode);
  }

  private compileNode(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral': {
        const index = this.addConstant((node as NumberLiteral).value);
        this.emit(OpCode.CONST);
        this.emit(index);
        break;
      }

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.compileNode(binary.left);
        this.compileNode(binary.right);

        switch (binary.operator) {
          case '+': this.emit(OpCode.ADD); break;
          case '-': this.emit(OpCode.SUB); break;
          case '*': this.emit(OpCode.MUL); break;
          case '/': this.emit(OpCode.DIV); break;
        }
        break;
      }

      case 'VariableDeclaration': {
        const decl = node as VariableDeclaration;
        const slot = this.locals.size;
        this.locals.set(decl.name, slot);

        if (decl.initializer) {
          this.compileNode(decl.initializer);
          this.emit(OpCode.STORE);
          this.emit(slot);
        }
        break;
      }

      case 'Identifier': {
        const id = node as Identifier;
        const slot = this.locals.get(id.name);
        if (slot === undefined) {
          throw new Error('Undefined variable: ' + id.name);
        }
        this.emit(OpCode.LOAD);
        this.emit(slot);
        break;
      }
    }
  }

  private emit(byte: number): void {
    this.bytecode.push(byte);
  }

  private addConstant(value: any): number {
    this.constants.push(value);
    return this.constants.length - 1;
  }
}

9. Interpreter Design

9.1 Tree-Walking Interpreter

// Tree-Walking Interpreter
class TreeWalkInterpreter {
  private environment: Environment;

  constructor() {
    this.environment = new Environment();
    this.environment.define('print', (args: any[]) => {
      console.log(...args);
      return null;
    });
  }

  interpret(program: Program): any {
    let result: any = null;
    for (const stmt of program.body) {
      result = this.execute(stmt);
    }
    return result;
  }

  private execute(node: ASTNode): any {
    switch (node.type) {
      case 'NumberLiteral':
        return (node as NumberLiteral).value;

      case 'StringLiteral':
        return (node as StringLiteral).value;

      case 'BooleanLiteral':
        return (node as BooleanLiteral).value;

      case 'Identifier':
        return this.environment.get((node as Identifier).name);

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        const left = this.execute(binary.left);
        const right = this.execute(binary.right);

        switch (binary.operator) {
          case '+': return left + right;
          case '-': return left - right;
          case '*': return left * right;
          case '/': return left / right;
          case '%': return left % right;
          case '==': return left === right;
          case '!=': return left !== right;
          case '<': return left < right;
          case '>': return left > right;
          case '&&': return left && right;
          case '||': return left || right;
          default: throw new Error('Unknown operator: ' + binary.operator);
        }
      }

      case 'FunctionDeclaration': {
        const fn = node as FunctionDeclaration;
        const closure = this.environment;

        const callable = (args: any[]) => {
          const fnEnv = new Environment(closure);
          fn.params.forEach((param, i) => {
            fnEnv.define(param, args[i]);
          });

          const prevEnv = this.environment;
          this.environment = fnEnv;

          try {
            for (const stmt of fn.body.body) {
              const result = this.execute(stmt);
              if (result && result.__return__) {
                return result.value;
              }
            }
          } finally {
            this.environment = prevEnv;
          }

          return null;
        };

        this.environment.define(fn.name, callable);
        return null;
      }

      case 'CallExpression': {
        const call = node as CallExpression;
        const fn = this.execute(call.callee);
        const args = call.arguments.map((arg: ASTNode) => this.execute(arg));
        return fn(args);
      }

      case 'IfStatement': {
        const ifStmt = node as IfStatement;
        const condition = this.execute(ifStmt.condition);

        if (condition) {
          return this.executeBlock(ifStmt.consequent);
        } else if (ifStmt.alternate) {
          if (ifStmt.alternate.type === 'BlockStatement') {
            return this.executeBlock(ifStmt.alternate as BlockStatement);
          }
          return this.execute(ifStmt.alternate);
        }
        return null;
      }

      case 'ReturnStatement': {
        const ret = node as ReturnStatement;
        return {
          __return__: true,
          value: ret.value ? this.execute(ret.value) : null,
        };
      }

      default:
        throw new Error('Unknown node type: ' + node.type);
    }
  }

  private executeBlock(block: BlockStatement): any {
    const prevEnv = this.environment;
    this.environment = new Environment(prevEnv);

    try {
      for (const stmt of block.body) {
        const result = this.execute(stmt);
        if (result && result.__return__) return result;
      }
    } finally {
      this.environment = prevEnv;
    }

    return null;
  }
}

// Environment - manages variable bindings
class Environment {
  private bindings: Map<string, any> = new Map();
  private parent: Environment | null;

  constructor(parent: Environment | null = null) {
    this.parent = parent;
  }

  define(name: string, value: any): void {
    this.bindings.set(name, value);
  }

  get(name: string): any {
    if (this.bindings.has(name)) {
      return this.bindings.get(name);
    }
    if (this.parent) return this.parent.get(name);
    throw new Error('Undefined variable: ' + name);
  }

  set(name: string, value: any): void {
    if (this.bindings.has(name)) {
      this.bindings.set(name, value);
      return;
    }
    if (this.parent) { this.parent.set(name, value); return; }
    throw new Error('Undefined variable: ' + name);
  }
}

9.2 Bytecode VM

// Stack-based Bytecode VM
class VirtualMachine {
  private stack: any[] = [];
  private locals: any[] = new Array(256).fill(null);
  private ip: number = 0; // Instruction pointer

  execute(bytecode: Uint8Array, constants: any[]): any {
    while (this.ip < bytecode.length) {
      const opcode = bytecode[this.ip++];

      switch (opcode) {
        case OpCode.CONST: {
          const index = bytecode[this.ip++];
          this.stack.push(constants[index]);
          break;
        }

        case OpCode.ADD: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a + b);
          break;
        }

        case OpCode.SUB: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a - b);
          break;
        }

        case OpCode.MUL: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a * b);
          break;
        }

        case OpCode.DIV: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a / b);
          break;
        }

        case OpCode.LOAD: {
          const slot = bytecode[this.ip++];
          this.stack.push(this.locals[slot]);
          break;
        }

        case OpCode.STORE: {
          const slot = bytecode[this.ip++];
          this.locals[slot] = this.stack.pop();
          break;
        }

        case OpCode.JUMP: {
          const target = bytecode[this.ip++];
          this.ip = target;
          break;
        }

        case OpCode.JUMP_IF_FALSE: {
          const target = bytecode[this.ip++];
          const condition = this.stack.pop();
          if (!condition) {
            this.ip = target;
          }
          break;
        }

        case OpCode.PRINT: {
          console.log(this.stack.pop());
          break;
        }

        case OpCode.HALT:
          return this.stack.length > 0 ? this.stack.pop() : null;

        default:
          throw new Error('Unknown opcode: 0x' + opcode.toString(16));
      }
    }

    return this.stack.length > 0 ? this.stack.pop() : null;
  }
}

10. Garbage Collection

10.1 GC Algorithm Comparison

GC Algorithm Comparison:
┌──────────────────────┬────────────┬─────────────┬────────────────┐
AlgorithmThroughputPause TimeMemory Overhead│
├──────────────────────┼────────────┼─────────────┼────────────────┤
Reference CountingHighShortCounter/object │
Mark-SweepHighCan be long │ Mark bits      │
Mark-CompactMediumCan be long │ Mark bits      │
GenerationalVery HighMostly short│ Card table     │
ConcurrentHighVery short  │ Write barriers │
└──────────────────────┴────────────┴─────────────┴────────────────┘

10.2 Mark-Sweep Implementation

// Simple Mark-Sweep GC
class GarbageCollector {
  private heap: GCObject[] = [];
  private roots: Set<GCObject> = new Set();

  allocate(value: any): GCObject {
    const obj: GCObject = {
      value,
      marked: false,
      references: [],
    };
    this.heap.push(obj);

    // Run GC if heap is too large
    if (this.heap.length > 1000) {
      this.collect();
    }

    return obj;
  }

  addRoot(obj: GCObject): void {
    this.roots.add(obj);
  }

  collect(): void {
    // Mark phase
    this.markAll();

    // Sweep phase
    this.sweep();
  }

  private markAll(): void {
    for (const root of this.roots) {
      this.mark(root);
    }
  }

  private mark(obj: GCObject): void {
    if (obj.marked) return; // Already marked (cycle prevention)
    obj.marked = true;

    // Mark referenced objects
    for (const ref of obj.references) {
      this.mark(ref);
    }
  }

  private sweep(): void {
    this.heap = this.heap.filter((obj) => {
      if (obj.marked) {
        obj.marked = false; // Reset for next GC
        return true;
      }
      // Unmarked objects are freed
      return false;
    });
  }
}

interface GCObject {
  value: any;
  marked: boolean;
  references: GCObject[];
}

11. Building a Mini Language

11.1 Full Pipeline Integration

// Mini Language "MiniLang" full pipeline
class MiniLang {
  // Source code -> Execution result
  static run(source: string): any {
    // 1. Tokenize
    const lexer = new Lexer(source);
    const tokens = lexer.tokenize();

    // 2. Parse
    const parser = new RecursiveDescentParser(tokens);
    const ast = parser.parse();

    // 3. Type check
    const typeChecker = new TypeChecker();
    // typeChecker.check(ast); // Optional

    // 4. Optimize
    const optimizer = new ConstantFolding();
    // const optimizedAst = optimizer.optimize(ast);

    // 5. Execute
    const interpreter = new TreeWalkInterpreter();
    return interpreter.interpret(ast);
  }
}

// Usage example
const source = `
let x = 10;
let y = 20;

function add(a, b) {
  return a + b;
}

function factorial(n) {
  if (n == 0) {
    return 1;
  }
  return n * factorial(n - 1);
}

let result = add(x, y);
print(result);
print(factorial(5));
`;

MiniLang.run(source);
// Output: 30
// Output: 120

12. Modern Compiler Projects

12.1 Major Project Architectures

Modern Compiler/Transpiler Comparison:

Rust Compiler (rustc):
  Source -> HIR -> MIR -> LLVM IR -> Machine Code
  Features: Ownership system, borrow checker, monomorphization

V8 TurboFan (JavaScript):
  Source -> AST -> Bytecode(Ignition) -> Optimized Code(TurboFan)
  Features: Inline caches, speculative optimization, deoptimization

TypeScript Compiler (tsc):
  Source -> Scanner -> Parser -> AST -> Binder -> Checker -> Emitter
  Features: Structural type system, type inference, .d.ts generation

SWC (Speedy Web Compiler):
  Source -> Lexer -> Parser -> AST -> Transform -> Codegen
  Features: Written in Rust, 20-70x faster than Babel

esbuild:
  Source -> Lexer -> Parser -> AST -> Linker -> Codegen
  Features: Written in Go, extremely fast bundling

Zig Compiler:
  Source -> AST -> ZIR -> AIR -> Machine Code/LLVM IR
  Features: comptime execution, incremental compilation

12.2 Learning Resource Roadmap

Compiler Learning Roadmap:

Beginner:
├── Crafting Interpreters (Robert Nystrom)
- Tree-Walking interpreter (Java)
- Bytecode VM (C)
├── Writing An Interpreter In Go (Thorsten Ball)
└── Mini Lisp interpreter implementation

Intermediate:
├── Engineering a Compiler (Cooper & Torczon)
- Data flow analysis
- Register allocation
├── LLVM Tutorial: Kaleidoscope
- LLVM IR generation
- JIT compilation
└── WebAssembly compiler implementation

Advanced:
├── Compilers: Principles, Techniques, and Tools (Dragon Book)
├── Advanced Compiler Design and Implementation (Muchnick)
├── SSA-based Compiler Design (Springer)
└── Contributing to real open-source compilers
    - Rust, V8, LLVM, GCC, etc.

13. Quiz

Test your knowledge with the quizzes below.

Q1. Explain the difference between Lexer and Parser roles, and what are their respective inputs and outputs?

A1.

Lexer (Tokenizer):

  • Role: Transform source code strings into a stream of Tokens
  • Input: Source code string (e.g., let x = 10 + 20;)
  • Output: Token array (e.g., [LET, IDENT("x"), ASSIGN, NUM(10), PLUS, NUM(20), SEMICOLON])
  • Removes whitespace, comments and distinguishes keywords, identifiers, literals, operators.

Parser (Syntax Analyzer):

  • Role: Transform token stream into AST (Abstract Syntax Tree)
  • Input: Token array
  • Output: AST (tree structure)
  • Determines relationships and structure between tokens according to grammar rules. Syntax errors are also detected at this stage.
Q2. Why is the Pratt Parser advantageous over recursive descent for handling operator precedence?

A2. In a recursive descent parser, handling operator precedence requires creating separate parsing functions for each precedence level. For example, parseAddition(), parseMultiplication(), parseComparison() -- functions keep growing as precedence levels increase.

Pratt Parser advantages:

  1. Single function handles all precedence levels (parseExpression(precedence))
  2. Precedence is expressed as numeric values making comparison simple
  3. Adding new operators only requires registering in a table
  4. Prefix and Infix are registered independently, allowing the same symbol to have different meanings (e.g., - as both unary and binary)

This approach is widely used in practice by V8 (JavaScript), the Rust compiler, and others.

Q3. Why is SSA (Static Single Assignment) form beneficial for compiler optimizations?

A3. In SSA form, each variable is assigned exactly once. This provides:

  1. Simplified def-use analysis: With only one definition point, tracking uses is straightforward
  2. Easier constant propagation: Safely propagate constant values since variables are assigned once
  3. Accurate dead code elimination: Precisely identify unused definitions
  4. Better register allocation: Clear variable lifetimes enable efficient allocation
  5. Control flow merging: Phi nodes clearly express values coming from different paths

Most modern compiler infrastructure including LLVM IR and GCC's GIMPLE are SSA-based.

Q4. What are the pros and cons of Tree-Walking interpreters vs Bytecode VM interpreters?

A4.

Tree-Walking Interpreter:

  • Pros: Simple and intuitive implementation, easy debugging with direct AST traversal
  • Cons: Slow due to node traversal overhead, poor cache efficiency, recursive call stack consumption

Bytecode VM Interpreter:

  • Pros: 3-10x faster than tree-walking, compact bytecode is cache-friendly, serializable (bytecode can be saved)
  • Cons: Complex implementation (requires both compiler and VM), harder to maintain debug info

Generally, tree-walking is used for prototyping and bytecode VM for production. CPython, Java JVM, and Lua all use the bytecode VM approach.

Q5. How does Mark-Sweep GC work, and how does Generational GC improve upon it?

A5.

Mark-Sweep GC:

  1. Mark phase: Starting from roots (globals, stack), mark all reachable objects
  2. Sweep phase: Scan entire heap and free unmarked objects
  • Downside: Must scan the entire heap, so pauses get longer as the heap grows

Generational GC improvements:

  • Based on the observation that "most objects die young"
  • Young Generation: Collected frequently and quickly (Minor GC)
  • Old Generation: Collected infrequently (Major GC)
  • Only objects surviving Young are promoted to Old
  • Write barriers track inter-generational references
  • V8, JVM, Go and most modern runtimes use generational GC.

14. References

  1. Crafting Interpreters - Robert Nystrom's free online compiler textbook
  2. Writing An Interpreter In Go - Thorsten Ball's Go interpreter book
  3. LLVM Tutorial: Kaleidoscope - LLVM official tutorial
  4. WebAssembly Specification - WebAssembly official spec
  5. Simple but Powerful Pratt Parsing - Pratt Parser explanation
  6. The Rust Reference - Rust language reference
  7. V8 Blog - TurboFan - V8 TurboFan JIT compiler
  8. Engineering a Compiler - Compiler engineering textbook
  9. PEG.js - PEG parser generator
  10. Tree-sitter - Incremental parsing library
  11. Binaryen - WebAssembly compiler infrastructure
  12. The GC Handbook - Garbage Collection handbook
  13. SWC Project - Rust-based TypeScript/JavaScript compiler