Skip to content
Published on

Compiler & Interpreter Design Complete Guide 2025: Lexer, Parser, AST, Code Generation

Authors

Table of Contents

1. Why Learn Compilers

Compiler theory is one of the core fields of computer science. Understanding compilers gives you deep insight into how programming languages work and has numerous practical applications.

1.1 Practical Applications

Application AreaExamples
DSL DesignConfiguration languages, query languages, template engines
TranspilersTypeScript, Babel, PostCSS
Linters/FormattersESLint, Prettier, rustfmt
IDE FeaturesAutocomplete, refactoring, syntax highlighting
Code AnalysisStatic analysis, security scanners, dependency analysis
Build Toolswebpack, esbuild, SWC, Vite

1.2 Compilation Pipeline Overview

Source Code -> Compilation Pipeline:

┌──────────┐    ┌──────────┐    ┌──────────┐
Source   │───>Lexer   │───>ParserCode(Tokenizer)│    │          │
└──────────┘    └──────────┘    └──────────┘
                                      v
┌──────────┐    ┌──────────┐    ┌──────────┐
Code<───│Optimizer │<───│ Semantic│Generator │    │          │    │ Analysis└──────────┘    └──────────┘    └──────────┘
      v
┌──────────┐
TargetCode└──────────┘

Role of each stage:
1. Lexer: Source code -> Token stream
2. Parser: Token stream -> AST (Abstract Syntax Tree)
3. Semantic Analysis: Type checking, scope resolution
4. Optimizer: Optimization at IR level
5. Code Generator: Generate target code (machine code, bytecode, WASM, etc.)

2. Lexer Implementation

The Lexer (Tokenizer) transforms source code strings into a stream of Tokens.

2.1 Token Type Definition

// Token type enumeration
enum TokenType {
  // Literals
  Number = 'Number',
  String = 'String',
  Boolean = 'Boolean',

  // Identifiers & Keywords
  Identifier = 'Identifier',
  Let = 'Let',
  Const = 'Const',
  Function = 'Function',
  If = 'If',
  Else = 'Else',
  Return = 'Return',
  While = 'While',
  For = 'For',

  // Operators
  Plus = 'Plus',           // +
  Minus = 'Minus',         // -
  Star = 'Star',           // *
  Slash = 'Slash',         // /
  Percent = 'Percent',     // %
  Assign = 'Assign',       // =
  Equal = 'Equal',         // ==
  NotEqual = 'NotEqual',   // !=
  LessThan = 'LessThan',  // <
  GreaterThan = 'GreaterThan', // >
  And = 'And',             // &&
  Or = 'Or',               // ||
  Not = 'Not',             // !

  // Delimiters
  LeftParen = 'LeftParen',     // (
  RightParen = 'RightParen',   // )
  LeftBrace = 'LeftBrace',     // { (inside code blocks only)
  RightBrace = 'RightBrace',   // } (inside code blocks only)
  LeftBracket = 'LeftBracket', // [
  RightBracket = 'RightBracket', // ]
  Comma = 'Comma',
  Semicolon = 'Semicolon',
  Colon = 'Colon',
  Dot = 'Dot',
  Arrow = 'Arrow',         // =>

  // Special
  EOF = 'EOF',
}

// Token interface
interface Token {
  type: TokenType;
  value: string;
  line: number;
  column: number;
}

2.2 Lexer Class Implementation

class Lexer {
  private source: string;
  private pos: number = 0;
  private line: number = 1;
  private column: number = 1;
  private tokens: Token[] = [];

  // Keyword map
  private static keywords: Map<string, TokenType> = new Map([
    ['let', TokenType.Let],
    ['const', TokenType.Const],
    ['function', TokenType.Function],
    ['if', TokenType.If],
    ['else', TokenType.Else],
    ['return', TokenType.Return],
    ['while', TokenType.While],
    ['for', TokenType.For],
    ['true', TokenType.Boolean],
    ['false', TokenType.Boolean],
  ]);

  constructor(source: string) {
    this.source = source;
  }

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      this.skipWhitespace();
      this.skipComments();

      if (this.pos >= this.source.length) break;

      const char = this.source[this.pos];

      if (this.isDigit(char)) {
        this.readNumber();
      } else if (this.isAlpha(char) || char === '_') {
        this.readIdentifier();
      } else if (char === '"' || char === "'") {
        this.readString(char);
      } else {
        this.readOperator();
      }
    }

    this.tokens.push(this.makeToken(TokenType.EOF, ''));
    return this.tokens;
  }

  private readNumber(): void {
    const start = this.pos;
    let isFloat = false;

    while (this.pos < this.source.length && this.isDigit(this.current())) {
      this.advance();
    }

    // Handle decimal point
    if (this.current() === '.' && this.isDigit(this.peek())) {
      isFloat = true;
      this.advance(); // Skip '.'
      while (this.pos < this.source.length && this.isDigit(this.current())) {
        this.advance();
      }
    }

    const value = this.source.slice(start, this.pos);
    this.tokens.push(this.makeToken(TokenType.Number, value));
  }

  private readIdentifier(): void {
    const start = this.pos;

    while (
      this.pos < this.source.length &&
      (this.isAlphaNumeric(this.current()) || this.current() === '_')
    ) {
      this.advance();
    }

    const value = this.source.slice(start, this.pos);
    const type = Lexer.keywords.get(value) || TokenType.Identifier;
    this.tokens.push(this.makeToken(type, value));
  }

  private readString(quote: string): void {
    this.advance(); // Skip opening quote
    const start = this.pos;
    let value = '';

    while (this.pos < this.source.length && this.current() !== quote) {
      if (this.current() === '\\') {
        this.advance();
        switch (this.current()) {
          case 'n': value += '\n'; break;
          case 't': value += '\t'; break;
          case '\\': value += '\\'; break;
          default: value += this.current();
        }
      } else {
        value += this.current();
      }
      this.advance();
    }

    if (this.current() !== quote) {
      throw new Error(
        'Unterminated string at line ' + this.line
      );
    }
    this.advance(); // Skip closing quote

    this.tokens.push(this.makeToken(TokenType.String, value));
  }

  private readOperator(): void {
    const char = this.current();
    const next = this.peek();

    // Check two-character operators
    const twoChar = char + (next || '');
    const twoCharOps: Record<string, TokenType> = {
      '==': TokenType.Equal,
      '!=': TokenType.NotEqual,
      '&&': TokenType.And,
      '||': TokenType.Or,
      '=>': TokenType.Arrow,
    };

    if (twoCharOps[twoChar]) {
      this.tokens.push(this.makeToken(twoCharOps[twoChar], twoChar));
      this.advance();
      this.advance();
      return;
    }

    // Single-character operators
    const oneCharOps: Record<string, TokenType> = {
      '+': TokenType.Plus,
      '-': TokenType.Minus,
      '*': TokenType.Star,
      '/': TokenType.Slash,
      '%': TokenType.Percent,
      '=': TokenType.Assign,
      '<': TokenType.LessThan,
      '>': TokenType.GreaterThan,
      '!': TokenType.Not,
      '(': TokenType.LeftParen,
      ')': TokenType.RightParen,
      '[': TokenType.LeftBracket,
      ']': TokenType.RightBracket,
      ',': TokenType.Comma,
      ';': TokenType.Semicolon,
      ':': TokenType.Colon,
      '.': TokenType.Dot,
    };

    // Handle braces
    if (char === '{') {
      this.tokens.push(this.makeToken(TokenType.LeftBrace, char));
      this.advance();
      return;
    }
    if (char === '}') {
      this.tokens.push(this.makeToken(TokenType.RightBrace, char));
      this.advance();
      return;
    }

    if (oneCharOps[char]) {
      this.tokens.push(this.makeToken(oneCharOps[char], char));
      this.advance();
    } else {
      throw new Error(
        'Unexpected character: ' + char + ' at line ' + this.line
      );
    }
  }

  // Utility methods
  private current(): string { return this.source[this.pos]; }
  private peek(): string { return this.source[this.pos + 1]; }
  private advance(): void {
    if (this.current() === '\n') {
      this.line++;
      this.column = 1;
    } else {
      this.column++;
    }
    this.pos++;
  }
  private isDigit(c: string): boolean { return /[0-9]/.test(c); }
  private isAlpha(c: string): boolean { return /[a-zA-Z_]/.test(c); }
  private isAlphaNumeric(c: string): boolean { return /[a-zA-Z0-9_]/.test(c); }
  private skipWhitespace(): void {
    while (this.pos < this.source.length && /\s/.test(this.current())) {
      this.advance();
    }
  }
  private skipComments(): void {
    if (this.current() === '/' && this.peek() === '/') {
      while (this.pos < this.source.length && this.current() !== '\n') {
        this.advance();
      }
    }
  }
  private makeToken(type: TokenType, value: string): Token {
    return { type, value, line: this.line, column: this.column };
  }
}

2.3 Error Recovery

// Lexer with error recovery
class RobustLexer extends Lexer {
  private errors: LexerError[] = [];

  tokenize(): Token[] {
    while (this.pos < this.source.length) {
      try {
        this.skipWhitespace();
        if (this.pos >= this.source.length) break;
        // ... normal tokenization logic
      } catch (error) {
        // Record error and advance to next valid position
        this.errors.push({
          message: error.message,
          line: this.line,
          column: this.column,
        });
        this.synchronize(); // Skip to next valid token start
      }
    }
    return this.tokens;
  }

  private synchronize(): void {
    // Skip to next whitespace or delimiter
    while (
      this.pos < this.source.length &&
      !/[\s;,(){}[\]]/.test(this.current())
    ) {
      this.advance();
    }
  }
}

3. Parser Implementation

The Parser transforms a token stream into an AST (Abstract Syntax Tree).

3.1 Parser Type Comparison

Parser Type Comparison:
┌─────────────────────┬────────────┬────────────┬──────────────┐
TypeDifficultyExpressiveNotable Tool├─────────────────────┼────────────┼────────────┼──────────────┤
Recursive DescentEasyLL(k)GCC, GoPratt ParserMediumPrecedenceV8, RustPEGMediumUnlimited  │ pest, PEG.jsParser CombinatorsMediumFlexible   │ nom, parsec  │
LR/LALRHardLR(1)      │ yacc, bison  │
EarleyMediumAll CFGs   │ nearley      │
└─────────────────────┴────────────┴────────────┴──────────────┘

3.2 Recursive Descent Parser

// AST Node type definition
type ASTNode =
  | NumberLiteral
  | StringLiteral
  | BooleanLiteral
  | Identifier
  | BinaryExpression
  | UnaryExpression
  | CallExpression
  | VariableDeclaration
  | FunctionDeclaration
  | IfStatement
  | WhileStatement
  | ReturnStatement
  | BlockStatement
  | Program;

interface NumberLiteral {
  type: 'NumberLiteral';
  value: number;
}

interface BinaryExpression {
  type: 'BinaryExpression';
  operator: string;
  left: ASTNode;
  right: ASTNode;
}

interface VariableDeclaration {
  type: 'VariableDeclaration';
  kind: 'let' | 'const';
  name: string;
  initializer: ASTNode | null;
}

interface FunctionDeclaration {
  type: 'FunctionDeclaration';
  name: string;
  params: string[];
  body: BlockStatement;
}

interface IfStatement {
  type: 'IfStatement';
  condition: ASTNode;
  consequent: BlockStatement;
  alternate: BlockStatement | IfStatement | null;
}

interface Program {
  type: 'Program';
  body: ASTNode[];
}

// Recursive Descent Parser
class RecursiveDescentParser {
  private tokens: Token[];
  private pos: number = 0;

  constructor(tokens: Token[]) {
    this.tokens = tokens;
  }

  parse(): Program {
    const body: ASTNode[] = [];

    while (!this.isAtEnd()) {
      body.push(this.parseStatement());
    }

    return { type: 'Program', body };
  }

  private parseStatement(): ASTNode {
    const token = this.current();

    switch (token.type) {
      case TokenType.Let:
      case TokenType.Const:
        return this.parseVariableDeclaration();
      case TokenType.Function:
        return this.parseFunctionDeclaration();
      case TokenType.If:
        return this.parseIfStatement();
      case TokenType.While:
        return this.parseWhileStatement();
      case TokenType.Return:
        return this.parseReturnStatement();
      case TokenType.LeftBrace:
        return this.parseBlockStatement();
      default:
        return this.parseExpressionStatement();
    }
  }

  private parseVariableDeclaration(): VariableDeclaration {
    const kind = this.consume().value as 'let' | 'const';
    const name = this.expect(TokenType.Identifier).value;

    let initializer: ASTNode | null = null;
    if (this.match(TokenType.Assign)) {
      initializer = this.parseExpression();
    }

    this.expect(TokenType.Semicolon);

    return { type: 'VariableDeclaration', kind, name, initializer };
  }

  private parseFunctionDeclaration(): FunctionDeclaration {
    this.expect(TokenType.Function);
    const name = this.expect(TokenType.Identifier).value;

    this.expect(TokenType.LeftParen);
    const params: string[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        params.push(this.expect(TokenType.Identifier).value);
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);
    const body = this.parseBlockStatement();

    return { type: 'FunctionDeclaration', name, params, body };
  }

  private parseIfStatement(): IfStatement {
    this.expect(TokenType.If);
    this.expect(TokenType.LeftParen);
    const condition = this.parseExpression();
    this.expect(TokenType.RightParen);

    const consequent = this.parseBlockStatement();

    let alternate: BlockStatement | IfStatement | null = null;
    if (this.match(TokenType.Else)) {
      if (this.check(TokenType.If)) {
        alternate = this.parseIfStatement();
      } else {
        alternate = this.parseBlockStatement();
      }
    }

    return { type: 'IfStatement', condition, consequent, alternate };
  }

  // Utility methods
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error(
      'Expected ' + type + ' but got ' + this.current().type +
      ' at line ' + this.current().line
    );
  }
}

3.3 Pratt Parser (Operator Precedence Parser)

The Pratt Parser elegantly handles operator precedence. It is used in many modern language implementations including V8 (JavaScript) and the Rust compiler.

// Operator precedence definition
enum Precedence {
  None = 0,
  Assignment = 1,  // =
  Or = 2,          // ||
  And = 3,         // &&
  Equality = 4,    // == !=
  Comparison = 5,  // < > (including less-than-or-equal, greater-than-or-equal)
  Term = 6,        // + -
  Factor = 7,      // * / %
  Unary = 8,       // ! -
  Call = 9,        // ()
  Member = 10,     // .
}

// Prefix parser function type (numbers, identifiers, parentheses, unary operators)
type PrefixParseFn = () => ASTNode;

// Infix parser function type (binary operators, function calls)
type InfixParseFn = (left: ASTNode) => ASTNode;

class PrattParser {
  private tokens: Token[];
  private pos: number = 0;

  // Prefix parser registry
  private prefixParsers: Map<TokenType, PrefixParseFn> = new Map();
  // Infix parser registry
  private infixParsers: Map<TokenType, InfixParseFn> = new Map();
  // Precedence registry
  private precedences: Map<TokenType, Precedence> = new Map();

  constructor(tokens: Token[]) {
    this.tokens = tokens;

    // Register prefix parsers
    this.registerPrefix(TokenType.Number, this.parseNumber.bind(this));
    this.registerPrefix(TokenType.String, this.parseString.bind(this));
    this.registerPrefix(TokenType.Boolean, this.parseBoolean.bind(this));
    this.registerPrefix(TokenType.Identifier, this.parseIdentifier.bind(this));
    this.registerPrefix(TokenType.LeftParen, this.parseGrouping.bind(this));
    this.registerPrefix(TokenType.Minus, this.parseUnary.bind(this));
    this.registerPrefix(TokenType.Not, this.parseUnary.bind(this));

    // Register infix parsers
    this.registerInfix(TokenType.Plus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Minus, Precedence.Term, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Star, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Slash, Precedence.Factor, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Equal, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.NotEqual, Precedence.Equality, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LessThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.GreaterThan, Precedence.Comparison, this.parseBinary.bind(this));
    this.registerInfix(TokenType.And, Precedence.And, this.parseBinary.bind(this));
    this.registerInfix(TokenType.Or, Precedence.Or, this.parseBinary.bind(this));
    this.registerInfix(TokenType.LeftParen, Precedence.Call, this.parseCall.bind(this));
  }

  // Core: Pratt parsing algorithm
  parseExpression(precedence: Precedence = Precedence.None): ASTNode {
    const token = this.current();
    const prefixFn = this.prefixParsers.get(token.type);

    if (!prefixFn) {
      throw new Error('No prefix parser for ' + token.type);
    }

    let left = prefixFn();

    // Continue parsing if higher precedence infix operator exists
    while (
      !this.isAtEnd() &&
      precedence < this.getPrecedence(this.current().type)
    ) {
      const infixFn = this.infixParsers.get(this.current().type);
      if (!infixFn) break;
      left = infixFn(left);
    }

    return left;
  }

  // Binary operator parsing
  private parseBinary(left: ASTNode): ASTNode {
    const operator = this.consume();
    const precedence = this.precedences.get(operator.type) || Precedence.None;
    const right = this.parseExpression(precedence); // Left-associative

    return {
      type: 'BinaryExpression',
      operator: operator.value,
      left,
      right,
    };
  }

  // Function call parsing
  private parseCall(left: ASTNode): ASTNode {
    this.consume(); // Consume '('
    const args: ASTNode[] = [];

    if (!this.check(TokenType.RightParen)) {
      do {
        args.push(this.parseExpression());
      } while (this.match(TokenType.Comma));
    }

    this.expect(TokenType.RightParen);

    return {
      type: 'CallExpression',
      callee: left,
      arguments: args,
    } as CallExpression;
  }

  // Utility methods
  private registerPrefix(type: TokenType, fn: PrefixParseFn): void {
    this.prefixParsers.set(type, fn);
  }

  private registerInfix(type: TokenType, prec: Precedence, fn: InfixParseFn): void {
    this.infixParsers.set(type, fn);
    this.precedences.set(type, prec);
  }

  private getPrecedence(type: TokenType): Precedence {
    return this.precedences.get(type) || Precedence.None;
  }

  private parseNumber(): ASTNode {
    const token = this.consume();
    return { type: 'NumberLiteral', value: parseFloat(token.value) };
  }

  private parseString(): ASTNode {
    const token = this.consume();
    return { type: 'StringLiteral', value: token.value } as StringLiteral;
  }

  private parseBoolean(): ASTNode {
    const token = this.consume();
    return { type: 'BooleanLiteral', value: token.value === 'true' } as BooleanLiteral;
  }

  private parseIdentifier(): ASTNode {
    const token = this.consume();
    return { type: 'Identifier', name: token.value } as Identifier;
  }

  private parseGrouping(): ASTNode {
    this.consume(); // Consume '('
    const expr = this.parseExpression();
    this.expect(TokenType.RightParen);
    return expr;
  }

  private parseUnary(): ASTNode {
    const operator = this.consume();
    const operand = this.parseExpression(Precedence.Unary);
    return {
      type: 'UnaryExpression',
      operator: operator.value,
      operand,
    } as UnaryExpression;
  }

  // Common utilities
  private current(): Token { return this.tokens[this.pos]; }
  private consume(): Token { return this.tokens[this.pos++]; }
  private isAtEnd(): boolean { return this.current().type === TokenType.EOF; }
  private check(type: TokenType): boolean { return this.current().type === type; }
  private match(type: TokenType): boolean {
    if (this.check(type)) { this.consume(); return true; }
    return false;
  }
  private expect(type: TokenType): Token {
    if (this.check(type)) return this.consume();
    throw new Error('Expected ' + type + ' but got ' + this.current().type);
  }
}

4. AST Design and Visitor Pattern

4.1 AST Node Design Principles

// Each AST node includes location information
interface ASTNodeBase {
  type: string;
  loc: SourceLocation;
}

interface SourceLocation {
  start: Position;
  end: Position;
  source?: string;
}

interface Position {
  line: number;
  column: number;
  offset: number;
}

4.2 Visitor Pattern

// Visitor interface
interface ASTVisitor<T> {
  visitProgram(node: Program): T;
  visitNumberLiteral(node: NumberLiteral): T;
  visitStringLiteral(node: StringLiteral): T;
  visitBinaryExpression(node: BinaryExpression): T;
  visitUnaryExpression(node: UnaryExpression): T;
  visitIdentifier(node: Identifier): T;
  visitVariableDeclaration(node: VariableDeclaration): T;
  visitFunctionDeclaration(node: FunctionDeclaration): T;
  visitCallExpression(node: CallExpression): T;
  visitIfStatement(node: IfStatement): T;
  visitReturnStatement(node: ReturnStatement): T;
  visitBlockStatement(node: BlockStatement): T;
}

// AST Printer (for debugging)
class ASTPrinter implements ASTVisitor<string> {
  private indent: number = 0;

  visitProgram(node: Program): string {
    return node.body.map((stmt) => this.visit(stmt)).join('\n');
  }

  visitBinaryExpression(node: BinaryExpression): string {
    return (
      '(' +
      this.visit(node.left) +
      ' ' + node.operator + ' ' +
      this.visit(node.right) +
      ')'
    );
  }

  visitNumberLiteral(node: NumberLiteral): string {
    return node.value.toString();
  }

  visitIdentifier(node: Identifier): string {
    return node.name;
  }

  // Generic visit dispatcher
  visit(node: ASTNode): string {
    switch (node.type) {
      case 'Program': return this.visitProgram(node as Program);
      case 'NumberLiteral': return this.visitNumberLiteral(node as NumberLiteral);
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      case 'Identifier': return this.visitIdentifier(node as Identifier);
      default: return 'Unknown: ' + node.type;
    }
  }
}

5. Semantic Analysis

5.1 Symbol Table and Scope

// Symbol information
interface Symbol {
  name: string;
  type: DataType;
  kind: 'variable' | 'function' | 'parameter';
  mutable: boolean;
  scope: Scope;
}

// Scope (lexical scope chain)
class Scope {
  private symbols: Map<string, Symbol> = new Map();
  private parent: Scope | null;

  constructor(parent: Scope | null = null) {
    this.parent = parent;
  }

  define(symbol: Symbol): void {
    if (this.symbols.has(symbol.name)) {
      throw new Error('Symbol already defined: ' + symbol.name);
    }
    this.symbols.set(symbol.name, symbol);
  }

  resolve(name: string): Symbol | null {
    if (this.symbols.has(name)) {
      return this.symbols.get(name)!;
    }
    // Search in parent scope
    if (this.parent) {
      return this.parent.resolve(name);
    }
    return null;
  }

  createChild(): Scope {
    return new Scope(this);
  }
}

5.2 Type Checking

// Type system
type DataType =
  | { kind: 'number' }
  | { kind: 'string' }
  | { kind: 'boolean' }
  | { kind: 'void' }
  | { kind: 'function'; params: DataType[]; returnType: DataType }
  | { kind: 'array'; elementType: DataType };

class TypeChecker implements ASTVisitor<DataType> {
  private currentScope: Scope = new Scope();

  visitBinaryExpression(node: BinaryExpression): DataType {
    const leftType = this.visit(node.left);
    const rightType = this.visit(node.right);

    // Arithmetic operators
    if (['+', '-', '*', '/', '%'].includes(node.operator)) {
      if (node.operator === '+') {
        // Allow string concatenation
        if (leftType.kind === 'string' || rightType.kind === 'string') {
          return { kind: 'string' };
        }
      }

      if (leftType.kind !== 'number' || rightType.kind !== 'number') {
        throw new TypeError(
          'Operator ' + node.operator +
          ' requires number operands, got ' +
          leftType.kind + ' and ' + rightType.kind
        );
      }
      return { kind: 'number' };
    }

    // Comparison operators
    if (['==', '!=', '<', '>'].includes(node.operator)) {
      if (leftType.kind !== rightType.kind) {
        throw new TypeError(
          'Cannot compare ' + leftType.kind + ' with ' + rightType.kind
        );
      }
      return { kind: 'boolean' };
    }

    // Logical operators
    if (['&&', '||'].includes(node.operator)) {
      if (leftType.kind !== 'boolean' || rightType.kind !== 'boolean') {
        throw new TypeError('Logical operators require boolean operands');
      }
      return { kind: 'boolean' };
    }

    throw new TypeError('Unknown operator: ' + node.operator);
  }

  visit(node: ASTNode): DataType {
    switch (node.type) {
      case 'NumberLiteral': return { kind: 'number' };
      case 'StringLiteral': return { kind: 'string' };
      case 'BooleanLiteral': return { kind: 'boolean' };
      case 'BinaryExpression': return this.visitBinaryExpression(node as BinaryExpression);
      default: return { kind: 'void' };
    }
  }
}

6. Intermediate Representation (IR)

6.1 Three-Address Code

// Three-Address Code
// Each instruction has at most 3 addresses (operands)

type IRInstruction =
  | { op: 'CONST'; dest: string; value: number }
  | { op: 'ADD' | 'SUB' | 'MUL' | 'DIV'; dest: string; left: string; right: string }
  | { op: 'COPY'; dest: string; src: string }
  | { op: 'LABEL'; name: string }
  | { op: 'JUMP'; target: string }
  | { op: 'JUMP_IF_FALSE'; condition: string; target: string }
  | { op: 'CALL'; dest: string; fn: string; args: string[] }
  | { op: 'RETURN'; value: string | null };

// AST -> IR conversion
class IRGenerator implements ASTVisitor<string> {
  private instructions: IRInstruction[] = [];
  private tempCounter: number = 0;
  private labelCounter: number = 0;

  private newTemp(): string {
    return '_t' + (this.tempCounter++);
  }

  private newLabel(): string {
    return '_L' + (this.labelCounter++);
  }

  private emit(instruction: IRInstruction): void {
    this.instructions.push(instruction);
  }

  visitBinaryExpression(node: BinaryExpression): string {
    const left = this.visit(node.left);
    const right = this.visit(node.right);
    const dest = this.newTemp();

    const opMap: Record<string, string> = {
      '+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV',
    };

    this.emit({
      op: opMap[node.operator] as 'ADD',
      dest,
      left,
      right,
    });

    return dest;
  }

  visitIfStatement(node: IfStatement): string {
    const condition = this.visit(node.condition);
    const elseLabel = this.newLabel();
    const endLabel = this.newLabel();

    this.emit({ op: 'JUMP_IF_FALSE', condition, target: elseLabel });

    // then block
    this.visit(node.consequent as ASTNode);
    this.emit({ op: 'JUMP', target: endLabel });

    // else block
    this.emit({ op: 'LABEL', name: elseLabel });
    if (node.alternate) {
      this.visit(node.alternate as ASTNode);
    }

    this.emit({ op: 'LABEL', name: endLabel });
    return '';
  }

  visit(node: ASTNode): string {
    switch (node.type) {
      case 'NumberLiteral': {
        const dest = this.newTemp();
        this.emit({ op: 'CONST', dest, value: (node as NumberLiteral).value });
        return dest;
      }
      case 'BinaryExpression':
        return this.visitBinaryExpression(node as BinaryExpression);
      case 'IfStatement':
        return this.visitIfStatement(node as IfStatement);
      default:
        return '';
    }
  }

  getInstructions(): IRInstruction[] {
    return this.instructions;
  }
}

6.2 SSA Form and LLVM IR

LLVM IR Basic Structure:

; Function definition
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}

; if-else structure
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %then, label %else

then:
  ret i32 %a

else:
  ret i32 %b
}

; Loop structure
define i32 @sum_to_n(i32 %n) {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %next_i, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %next_sum, %loop ]
  %next_sum = add i32 %sum, %i
  %next_i = add i32 %i, 1
  %cond = icmp slt i32 %next_i, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret i32 %next_sum
}

SSA (Static Single Assignment) Characteristics:
- Each variable is assigned exactly once
- Phi nodes handle control flow join points
- Facilitates optimization passes

7. Optimization Passes

7.1 Constant Folding

// Constant Folding: Evaluate constant expressions at compile time
class ConstantFolding {
  optimize(node: ASTNode): ASTNode {
    if (node.type === 'BinaryExpression') {
      const binary = node as BinaryExpression;
      const left = this.optimize(binary.left);
      const right = this.optimize(binary.right);

      // If both sides are constants, compute at compile time
      if (left.type === 'NumberLiteral' && right.type === 'NumberLiteral') {
        const leftVal = (left as NumberLiteral).value;
        const rightVal = (right as NumberLiteral).value;

        let result: number;
        switch (binary.operator) {
          case '+': result = leftVal + rightVal; break;
          case '-': result = leftVal - rightVal; break;
          case '*': result = leftVal * rightVal; break;
          case '/': result = leftVal / rightVal; break;
          default: return { ...binary, left, right };
        }

        return { type: 'NumberLiteral', value: result };
      }

      return { ...binary, left, right };
    }

    return node;
  }
}

// Example: 2 + 3 * 4 -> 14

7.2 Dead Code Elimination

// Dead Code Elimination
class DeadCodeElimination {
  optimize(program: Program): Program {
    const usedVariables = this.collectUsedVariables(program);

    const optimizedBody = program.body.filter((stmt) => {
      if (stmt.type === 'VariableDeclaration') {
        const decl = stmt as VariableDeclaration;
        // Remove if unused and has no side effects
        if (!usedVariables.has(decl.name) && !this.hasSideEffects(decl.initializer)) {
          return false;
        }
      }
      return true;
    });

    return { ...program, body: optimizedBody };
  }

  private collectUsedVariables(program: Program): Set<string> {
    const used = new Set<string>();

    function walk(node: ASTNode): void {
      if (node.type === 'Identifier') {
        used.add((node as Identifier).name);
      }
      for (const child of getChildren(node)) {
        walk(child);
      }
    }

    program.body.forEach(walk);
    return used;
  }

  private hasSideEffects(node: ASTNode | null): boolean {
    if (!node) return false;
    return node.type === 'CallExpression';
  }
}

7.3 Loop Optimization

Loop Optimization Techniques:

1. Loop Invariant Code Motion (LICM)
   // Before                    // After
   for (i = 0; i < n; i++)    x = a + b;  // Moved outside loop
     arr[i] = a + b + i;      for (i = 0; i < n; i++)
                                  arr[i] = x + i;

2. Loop Unrolling
   // Before                    // After
   for (i = 0; i < 4; i++)    arr[0] = 0;
     arr[i] = 0;               arr[1] = 0;
                                arr[2] = 0;
                                arr[3] = 0;

3. Strength Reduction
   // Before                    // After
   for (i = 0; i < n; i++)    t = 0;
     arr[i] = i * 4;           for (i = 0; i < n; i++)
                                  arr[i] = t;
                                  t = t + 4;  // Multiply -> Addition

8. Code Generation

8.1 WebAssembly Code Generation

// AST -> WebAssembly Text Format (WAT) code generation
class WasmCodeGenerator {
  private output: string[] = [];

  generate(program: Program): string {
    this.output.push('(module');

    for (const stmt of program.body) {
      if (stmt.type === 'FunctionDeclaration') {
        this.generateFunction(stmt as FunctionDeclaration);
      }
    }

    this.output.push(')');
    return this.output.join('\n');
  }

  private generateFunction(node: FunctionDeclaration): void {
    const params = node.params.map((p) => '(param $' + p + ' i32)').join(' ');

    this.output.push('  (func $' + node.name + ' ' + params + ' (result i32)');
    this.output.push('    (local $__result i32)');

    for (const stmt of node.body.body) {
      this.generateStatement(stmt);
    }

    this.output.push('  )');
    this.output.push('  (export "' + node.name + '" (func $' + node.name + '))');
  }

  private generateExpression(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral':
        this.output.push('    i32.const ' + (node as NumberLiteral).value);
        break;

      case 'Identifier':
        this.output.push('    local.get $' + (node as Identifier).name);
        break;

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.generateExpression(binary.left);
        this.generateExpression(binary.right);

        const opMap: Record<string, string> = {
          '+': 'i32.add',
          '-': 'i32.sub',
          '*': 'i32.mul',
          '/': 'i32.div_s',
          '<': 'i32.lt_s',
          '>': 'i32.gt_s',
          '==': 'i32.eq',
          '!=': 'i32.ne',
        };

        this.output.push('    ' + opMap[binary.operator]);
        break;
      }
    }
  }

  private generateStatement(node: ASTNode): void {
    if (node.type === 'ReturnStatement') {
      const ret = node as ReturnStatement;
      if (ret.value) {
        this.generateExpression(ret.value);
      }
      this.output.push('    return');
    }
  }
}

8.2 Bytecode Generation

// Stack-based bytecode
enum OpCode {
  CONST = 0x01,        // Push constant
  ADD = 0x02,          // Addition
  SUB = 0x03,          // Subtraction
  MUL = 0x04,          // Multiplication
  DIV = 0x05,          // Division
  LOAD = 0x10,         // Load variable
  STORE = 0x11,        // Store variable
  JUMP = 0x20,         // Unconditional jump
  JUMP_IF_FALSE = 0x21,// Conditional jump
  CALL = 0x30,         // Function call
  RETURN = 0x31,       // Return
  PRINT = 0x40,        // Print
  HALT = 0xFF,         // Halt
}

class BytecodeCompiler {
  private bytecode: number[] = [];
  private constants: any[] = [];
  private locals: Map<string, number> = new Map();

  compile(program: Program): Uint8Array {
    for (const stmt of program.body) {
      this.compileNode(stmt);
    }
    this.emit(OpCode.HALT);

    return new Uint8Array(this.bytecode);
  }

  private compileNode(node: ASTNode): void {
    switch (node.type) {
      case 'NumberLiteral': {
        const index = this.addConstant((node as NumberLiteral).value);
        this.emit(OpCode.CONST);
        this.emit(index);
        break;
      }

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        this.compileNode(binary.left);
        this.compileNode(binary.right);

        switch (binary.operator) {
          case '+': this.emit(OpCode.ADD); break;
          case '-': this.emit(OpCode.SUB); break;
          case '*': this.emit(OpCode.MUL); break;
          case '/': this.emit(OpCode.DIV); break;
        }
        break;
      }

      case 'VariableDeclaration': {
        const decl = node as VariableDeclaration;
        const slot = this.locals.size;
        this.locals.set(decl.name, slot);

        if (decl.initializer) {
          this.compileNode(decl.initializer);
          this.emit(OpCode.STORE);
          this.emit(slot);
        }
        break;
      }

      case 'Identifier': {
        const id = node as Identifier;
        const slot = this.locals.get(id.name);
        if (slot === undefined) {
          throw new Error('Undefined variable: ' + id.name);
        }
        this.emit(OpCode.LOAD);
        this.emit(slot);
        break;
      }
    }
  }

  private emit(byte: number): void {
    this.bytecode.push(byte);
  }

  private addConstant(value: any): number {
    this.constants.push(value);
    return this.constants.length - 1;
  }
}

9. Interpreter Design

9.1 Tree-Walking Interpreter

// Tree-Walking Interpreter
class TreeWalkInterpreter {
  private environment: Environment;

  constructor() {
    this.environment = new Environment();
    this.environment.define('print', (args: any[]) => {
      console.log(...args);
      return null;
    });
  }

  interpret(program: Program): any {
    let result: any = null;
    for (const stmt of program.body) {
      result = this.execute(stmt);
    }
    return result;
  }

  private execute(node: ASTNode): any {
    switch (node.type) {
      case 'NumberLiteral':
        return (node as NumberLiteral).value;

      case 'StringLiteral':
        return (node as StringLiteral).value;

      case 'BooleanLiteral':
        return (node as BooleanLiteral).value;

      case 'Identifier':
        return this.environment.get((node as Identifier).name);

      case 'BinaryExpression': {
        const binary = node as BinaryExpression;
        const left = this.execute(binary.left);
        const right = this.execute(binary.right);

        switch (binary.operator) {
          case '+': return left + right;
          case '-': return left - right;
          case '*': return left * right;
          case '/': return left / right;
          case '%': return left % right;
          case '==': return left === right;
          case '!=': return left !== right;
          case '<': return left < right;
          case '>': return left > right;
          case '&&': return left && right;
          case '||': return left || right;
          default: throw new Error('Unknown operator: ' + binary.operator);
        }
      }

      case 'FunctionDeclaration': {
        const fn = node as FunctionDeclaration;
        const closure = this.environment;

        const callable = (args: any[]) => {
          const fnEnv = new Environment(closure);
          fn.params.forEach((param, i) => {
            fnEnv.define(param, args[i]);
          });

          const prevEnv = this.environment;
          this.environment = fnEnv;

          try {
            for (const stmt of fn.body.body) {
              const result = this.execute(stmt);
              if (result && result.__return__) {
                return result.value;
              }
            }
          } finally {
            this.environment = prevEnv;
          }

          return null;
        };

        this.environment.define(fn.name, callable);
        return null;
      }

      case 'CallExpression': {
        const call = node as CallExpression;
        const fn = this.execute(call.callee);
        const args = call.arguments.map((arg: ASTNode) => this.execute(arg));
        return fn(args);
      }

      case 'IfStatement': {
        const ifStmt = node as IfStatement;
        const condition = this.execute(ifStmt.condition);

        if (condition) {
          return this.executeBlock(ifStmt.consequent);
        } else if (ifStmt.alternate) {
          if (ifStmt.alternate.type === 'BlockStatement') {
            return this.executeBlock(ifStmt.alternate as BlockStatement);
          }
          return this.execute(ifStmt.alternate);
        }
        return null;
      }

      case 'ReturnStatement': {
        const ret = node as ReturnStatement;
        return {
          __return__: true,
          value: ret.value ? this.execute(ret.value) : null,
        };
      }

      default:
        throw new Error('Unknown node type: ' + node.type);
    }
  }

  private executeBlock(block: BlockStatement): any {
    const prevEnv = this.environment;
    this.environment = new Environment(prevEnv);

    try {
      for (const stmt of block.body) {
        const result = this.execute(stmt);
        if (result && result.__return__) return result;
      }
    } finally {
      this.environment = prevEnv;
    }

    return null;
  }
}

// Environment - manages variable bindings
class Environment {
  private bindings: Map<string, any> = new Map();
  private parent: Environment | null;

  constructor(parent: Environment | null = null) {
    this.parent = parent;
  }

  define(name: string, value: any): void {
    this.bindings.set(name, value);
  }

  get(name: string): any {
    if (this.bindings.has(name)) {
      return this.bindings.get(name);
    }
    if (this.parent) return this.parent.get(name);
    throw new Error('Undefined variable: ' + name);
  }

  set(name: string, value: any): void {
    if (this.bindings.has(name)) {
      this.bindings.set(name, value);
      return;
    }
    if (this.parent) { this.parent.set(name, value); return; }
    throw new Error('Undefined variable: ' + name);
  }
}

9.2 Bytecode VM

// Stack-based Bytecode VM
class VirtualMachine {
  private stack: any[] = [];
  private locals: any[] = new Array(256).fill(null);
  private ip: number = 0; // Instruction pointer

  execute(bytecode: Uint8Array, constants: any[]): any {
    while (this.ip < bytecode.length) {
      const opcode = bytecode[this.ip++];

      switch (opcode) {
        case OpCode.CONST: {
          const index = bytecode[this.ip++];
          this.stack.push(constants[index]);
          break;
        }

        case OpCode.ADD: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a + b);
          break;
        }

        case OpCode.SUB: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a - b);
          break;
        }

        case OpCode.MUL: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a * b);
          break;
        }

        case OpCode.DIV: {
          const b = this.stack.pop();
          const a = this.stack.pop();
          this.stack.push(a / b);
          break;
        }

        case OpCode.LOAD: {
          const slot = bytecode[this.ip++];
          this.stack.push(this.locals[slot]);
          break;
        }

        case OpCode.STORE: {
          const slot = bytecode[this.ip++];
          this.locals[slot] = this.stack.pop();
          break;
        }

        case OpCode.JUMP: {
          const target = bytecode[this.ip++];
          this.ip = target;
          break;
        }

        case OpCode.JUMP_IF_FALSE: {
          const target = bytecode[this.ip++];
          const condition = this.stack.pop();
          if (!condition) {
            this.ip = target;
          }
          break;
        }

        case OpCode.PRINT: {
          console.log(this.stack.pop());
          break;
        }

        case OpCode.HALT:
          return this.stack.length > 0 ? this.stack.pop() : null;

        default:
          throw new Error('Unknown opcode: 0x' + opcode.toString(16));
      }
    }

    return this.stack.length > 0 ? this.stack.pop() : null;
  }
}

10. Garbage Collection

10.1 GC Algorithm Comparison

GC Algorithm Comparison:
┌──────────────────────┬────────────┬─────────────┬────────────────┐
AlgorithmThroughputPause TimeMemory Overhead│
├──────────────────────┼────────────┼─────────────┼────────────────┤
Reference CountingHighShortCounter/object │
Mark-SweepHighCan be long │ Mark bits      │
Mark-CompactMediumCan be long │ Mark bits      │
GenerationalVery HighMostly short│ Card table     │
ConcurrentHighVery short  │ Write barriers │
└──────────────────────┴────────────┴─────────────┴────────────────┘

10.2 Mark-Sweep Implementation

// Simple Mark-Sweep GC
class GarbageCollector {
  private heap: GCObject[] = [];
  private roots: Set<GCObject> = new Set();

  allocate(value: any): GCObject {
    const obj: GCObject = {
      value,
      marked: false,
      references: [],
    };
    this.heap.push(obj);

    // Run GC if heap is too large
    if (this.heap.length > 1000) {
      this.collect();
    }

    return obj;
  }

  addRoot(obj: GCObject): void {
    this.roots.add(obj);
  }

  collect(): void {
    // Mark phase
    this.markAll();

    // Sweep phase
    this.sweep();
  }

  private markAll(): void {
    for (const root of this.roots) {
      this.mark(root);
    }
  }

  private mark(obj: GCObject): void {
    if (obj.marked) return; // Already marked (cycle prevention)
    obj.marked = true;

    // Mark referenced objects
    for (const ref of obj.references) {
      this.mark(ref);
    }
  }

  private sweep(): void {
    this.heap = this.heap.filter((obj) => {
      if (obj.marked) {
        obj.marked = false; // Reset for next GC
        return true;
      }
      // Unmarked objects are freed
      return false;
    });
  }
}

interface GCObject {
  value: any;
  marked: boolean;
  references: GCObject[];
}

11. Building a Mini Language

11.1 Full Pipeline Integration

// Mini Language "MiniLang" full pipeline
class MiniLang {
  // Source code -> Execution result
  static run(source: string): any {
    // 1. Tokenize
    const lexer = new Lexer(source);
    const tokens = lexer.tokenize();

    // 2. Parse
    const parser = new RecursiveDescentParser(tokens);
    const ast = parser.parse();

    // 3. Type check
    const typeChecker = new TypeChecker();
    // typeChecker.check(ast); // Optional

    // 4. Optimize
    const optimizer = new ConstantFolding();
    // const optimizedAst = optimizer.optimize(ast);

    // 5. Execute
    const interpreter = new TreeWalkInterpreter();
    return interpreter.interpret(ast);
  }
}

// Usage example
const source = `
let x = 10;
let y = 20;

function add(a, b) {
  return a + b;
}

function factorial(n) {
  if (n == 0) {
    return 1;
  }
  return n * factorial(n - 1);
}

let result = add(x, y);
print(result);
print(factorial(5));
`;

MiniLang.run(source);
// Output: 30
// Output: 120

12. Modern Compiler Projects

12.1 Major Project Architectures

Modern Compiler/Transpiler Comparison:

Rust Compiler (rustc):
  Source -> HIR -> MIR -> LLVM IR -> Machine Code
  Features: Ownership system, borrow checker, monomorphization

V8 TurboFan (JavaScript):
  Source -> AST -> Bytecode(Ignition) -> Optimized Code(TurboFan)
  Features: Inline caches, speculative optimization, deoptimization

TypeScript Compiler (tsc):
  Source -> Scanner -> Parser -> AST -> Binder -> Checker -> Emitter
  Features: Structural type system, type inference, .d.ts generation

SWC (Speedy Web Compiler):
  Source -> Lexer -> Parser -> AST -> Transform -> Codegen
  Features: Written in Rust, 20-70x faster than Babel

esbuild:
  Source -> Lexer -> Parser -> AST -> Linker -> Codegen
  Features: Written in Go, extremely fast bundling

Zig Compiler:
  Source -> AST -> ZIR -> AIR -> Machine Code/LLVM IR
  Features: comptime execution, incremental compilation

12.2 Learning Resource Roadmap

Compiler Learning Roadmap:

Beginner:
├── Crafting Interpreters (Robert Nystrom)
- Tree-Walking interpreter (Java)
- Bytecode VM (C)
├── Writing An Interpreter In Go (Thorsten Ball)
└── Mini Lisp interpreter implementation

Intermediate:
├── Engineering a Compiler (Cooper & Torczon)
- Data flow analysis
- Register allocation
├── LLVM Tutorial: Kaleidoscope
- LLVM IR generation
- JIT compilation
└── WebAssembly compiler implementation

Advanced:
├── Compilers: Principles, Techniques, and Tools (Dragon Book)
├── Advanced Compiler Design and Implementation (Muchnick)
├── SSA-based Compiler Design (Springer)
└── Contributing to real open-source compilers
    - Rust, V8, LLVM, GCC, etc.

13. Quiz

Test your knowledge with the quizzes below.

Q1. Explain the difference between Lexer and Parser roles, and what are their respective inputs and outputs?

A1.

Lexer (Tokenizer):

  • Role: Transform source code strings into a stream of Tokens
  • Input: Source code string (e.g., let x = 10 + 20;)
  • Output: Token array (e.g., [LET, IDENT("x"), ASSIGN, NUM(10), PLUS, NUM(20), SEMICOLON])
  • Removes whitespace, comments and distinguishes keywords, identifiers, literals, operators.

Parser (Syntax Analyzer):

  • Role: Transform token stream into AST (Abstract Syntax Tree)
  • Input: Token array
  • Output: AST (tree structure)
  • Determines relationships and structure between tokens according to grammar rules. Syntax errors are also detected at this stage.
Q2. Why is the Pratt Parser advantageous over recursive descent for handling operator precedence?

A2. In a recursive descent parser, handling operator precedence requires creating separate parsing functions for each precedence level. For example, parseAddition(), parseMultiplication(), parseComparison() -- functions keep growing as precedence levels increase.

Pratt Parser advantages:

  1. Single function handles all precedence levels (parseExpression(precedence))
  2. Precedence is expressed as numeric values making comparison simple
  3. Adding new operators only requires registering in a table
  4. Prefix and Infix are registered independently, allowing the same symbol to have different meanings (e.g., - as both unary and binary)

This approach is widely used in practice by V8 (JavaScript), the Rust compiler, and others.

Q3. Why is SSA (Static Single Assignment) form beneficial for compiler optimizations?

A3. In SSA form, each variable is assigned exactly once. This provides:

  1. Simplified def-use analysis: With only one definition point, tracking uses is straightforward
  2. Easier constant propagation: Safely propagate constant values since variables are assigned once
  3. Accurate dead code elimination: Precisely identify unused definitions
  4. Better register allocation: Clear variable lifetimes enable efficient allocation
  5. Control flow merging: Phi nodes clearly express values coming from different paths

Most modern compiler infrastructure including LLVM IR and GCC's GIMPLE are SSA-based.

Q4. What are the pros and cons of Tree-Walking interpreters vs Bytecode VM interpreters?

A4.

Tree-Walking Interpreter:

  • Pros: Simple and intuitive implementation, easy debugging with direct AST traversal
  • Cons: Slow due to node traversal overhead, poor cache efficiency, recursive call stack consumption

Bytecode VM Interpreter:

  • Pros: 3-10x faster than tree-walking, compact bytecode is cache-friendly, serializable (bytecode can be saved)
  • Cons: Complex implementation (requires both compiler and VM), harder to maintain debug info

Generally, tree-walking is used for prototyping and bytecode VM for production. CPython, Java JVM, and Lua all use the bytecode VM approach.

Q5. How does Mark-Sweep GC work, and how does Generational GC improve upon it?

A5.

Mark-Sweep GC:

  1. Mark phase: Starting from roots (globals, stack), mark all reachable objects
  2. Sweep phase: Scan entire heap and free unmarked objects
  • Downside: Must scan the entire heap, so pauses get longer as the heap grows

Generational GC improvements:

  • Based on the observation that "most objects die young"
  • Young Generation: Collected frequently and quickly (Minor GC)
  • Old Generation: Collected infrequently (Major GC)
  • Only objects surviving Young are promoted to Old
  • Write barriers track inter-generational references
  • V8, JVM, Go and most modern runtimes use generational GC.

14. References

  1. Crafting Interpreters - Robert Nystrom's free online compiler textbook
  2. Writing An Interpreter In Go - Thorsten Ball's Go interpreter book
  3. LLVM Tutorial: Kaleidoscope - LLVM official tutorial
  4. WebAssembly Specification - WebAssembly official spec
  5. Simple but Powerful Pratt Parsing - Pratt Parser explanation
  6. The Rust Reference - Rust language reference
  7. V8 Blog - TurboFan - V8 TurboFan JIT compiler
  8. Engineering a Compiler - Compiler engineering textbook
  9. PEG.js - PEG parser generator
  10. Tree-sitter - Incremental parsing library
  11. Binaryen - WebAssembly compiler infrastructure
  12. The GC Handbook - Garbage Collection handbook
  13. SWC Project - Rust-based TypeScript/JavaScript compiler