Split View: [컴파일러] 02. 프로그래밍 언어 기초와 컴파일러 기술 응용

[컴파일러] 02. 프로그래밍 언어 기초와 컴파일러 기술 응용

프로그래밍 언어 기초와 컴파일러 기술 응용

컴파일러를 제대로 이해하려면 프로그래밍 언어 자체에 대한 기초 지식이 필요합니다. 이 글에서는 언어의 발전 과정부터 핵심 개념, 그리고 컴파일러 기술이 컴파일 이외의 분야에 어떻게 활용되는지를 다룹니다.

1. 프로그래밍 언어의 발전

1세대: 기계어 (1940년대)

컴퓨터가 직접 실행할 수 있는 이진 코드입니다. 사람이 직접 작성하기 매우 어렵습니다.

10110000 01100001

2세대: 어셈블리어 (1950년대)

기계어를 기호(mnemonic)로 표현한 저급 언어입니다.

MOV AL, 61h
ADD AL, BL

3세대: 고급 언어 (1960년대~)

Fortran, COBOL, C, Java 등 사람이 이해하기 쉬운 언어가 등장했습니다.

int result = a + b;

4세대 이상

SQL 같은 도메인 특화 언어(DSL)와 Python, JavaScript 같은 스크립트 언어가 포함됩니다.

언어 설계와 컴파일러의 상호 작용

프로그래밍 언어의 설계와 컴파일러 기술은 서로 영향을 주고받습니다.

Fortran: 효율적인 수치 연산 컴파일이 목표였습니다.
Java: 바이트코드와 JIT 컴파일러를 통한 이식성을 강조했습니다.
Rust: 컴파일 타임 소유권 검사로 메모리 안전성을 보장합니다.

2. 정적(Static) vs 동적(Dynamic) 구분

컴파일러 이론에서 정적이란 컴파일 시간에 결정되는 것을, 동적이란 실행 시간에 결정되는 것을 의미합니다.

정적 타입 언어

변수의 타입이 컴파일 시간에 결정됩니다.

int x = 10;         // x는 컴파일 시 int로 확정
String s = "hello"; // s는 컴파일 시 String으로 확정
// x = "world";     // 컴파일 에러!

장점: 타입 오류를 실행 전에 발견할 수 있으며, 최적화가 용이합니다.

동적 타입 언어

변수의 타입이 실행 시간에 결정됩니다.

x = 10        # x는 현재 int
x = "hello"   # x가 str로 변경 (에러 없음)

장점: 유연하고 빠른 개발이 가능합니다.

정적/동적 비교

            컴파일 시간          실행 시간
           (Static)            (Dynamic)
타입 결정:  Java, C, Rust       Python, JS, Ruby
바인딩:    정적 바인딩           동적 바인딩
디스패치:  정적 디스패치          동적 디스패치 (가상 함수)
메모리:    스택 할당 결정        힙 할당 결정

3. 스코프 규칙(Scope Rules)

**스코프(scope)**란 프로그램에서 변수 선언이 유효한 범위를 말합니다.

정적 스코프(Static/Lexical Scope)

변수의 스코프가 소스 코드의 구조에 의해 결정됩니다. 대부분의 현대 언어(C, Java, Python 등)가 정적 스코프를 사용합니다.

int x = 10;

void foo() {
    printf("%d", x);  // 항상 전역 변수 x = 10 참조
}

void bar() {
    int x = 20;
    foo();  // foo 안에서 x는 여전히 10
}

정적 스코프에서는 함수가 정의된 위치에서 변수를 찾습니다.

동적 스코프(Dynamic Scope)

변수의 스코프가 실행 시간의 호출 순서에 의해 결정됩니다. 일부 Lisp 방언과 Bash 스크립트가 동적 스코프를 사용합니다.

#!/bin/bash
x=10

foo() {
    echo $x  # 호출 시점의 x 값을 참조
}

bar() {
    local x=20
    foo   # 동적 스코프이므로 x = 20 출력
}

bar  # 출력: 20

동적 스코프에서는 함수가 호출된 위치에서 변수를 찾습니다.

블록 구조와 스코프

C 계열 언어에서 블록({...})은 새로운 스코프를 만듭니다.

void example() {
    int x = 1;
    {
        int x = 2;  // 내부 블록의 x (외부 x를 가림)
        printf("%d\n", x);  // 2 출력
    }
    printf("%d\n", x);  // 1 출력 (외부 x)
}

컴파일러는 **스코프 체인(scope chain)**을 통해 변수 이름을 해결합니다. 현재 스코프에서 찾지 못하면 바깥 스코프를 차례로 탐색합니다.

4. 매개변수 전달 방식(Parameter Passing)

함수에 인자를 전달하는 방식은 언어마다 다릅니다.

값에 의한 호출(Call by Value)

인자의 값을 복사하여 함수에 전달합니다.

void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b는 복사본이므로 원본에 영향 없음
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 1, y = 2 (변경 안 됨)
}

참조에 의한 호출(Call by Reference)

인자의 메모리 주소를 전달합니다.

void swap(int &a, int &b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b는 원본의 별칭이므로 원본이 변경됨
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 2, y = 1 (변경됨)
}

이름에 의한 호출(Call by Name)

인자의 표현식 자체를 전달하며, 사용할 때마다 재평가합니다. Algol 60에서 사용했고 현대 언어에서는 거의 사용하지 않지만, Scala의 => T 문법이 유사한 개념입니다.

def condition(test: => Boolean, msg: => String): Unit = {
    if (test) println(msg)  // msg는 test가 true일 때만 평가됨
}

매개변수 전달 방식 비교

+------------------+----------+----------+------------------+
| 방식              | 복사 비용 | 원본 수정 | 대표 언어         |
+------------------+----------+----------+------------------+
| Call by Value    | 있음     | 불가     | C, Java(기본타입) |
| Call by Reference| 없음     | 가능     | C++, C#(ref)     |
| Call by Name     | 없음     | 가능     | Algol 60, Scala  |
+------------------+----------+----------+------------------+

5. 환경(Environment)과 상태(State)

컴파일러 이론에서 중요한 두 가지 매핑이 있습니다.

이름(name) --[환경]--> 저장 위치(location) --[상태]--> 값(value)

예:
  "x"  --[환경]--> 0x7fff1234  --[상태]--> 42

환경(Environment): 이름을 저장 위치에 매핑합니다. 변수 선언 시 생성됩니다.
상태(State): 저장 위치를 값에 매핑합니다. 대입문 실행 시 변경됩니다.

이 구분은 포인터와 참조를 이해하는 데 중요합니다. 같은 이름이 다른 환경에서 다른 위치를 가리킬 수 있고(스코프), 같은 위치의 값은 프로그램 실행 중 바뀔 수 있습니다(상태 변경).

6. 컴파일러 기술의 응용 분야

컴파일러 기술은 프로그래밍 언어 번역 외에도 다양한 분야에서 활용됩니다.

IDE 도구

현대 IDE(IntelliJ, VS Code 등)는 컴파일러 기술을 사용합니다.

구문 강조(Syntax Highlighting): 어휘 분석 기술을 활용합니다.
코드 자동 완성: 구문 분석과 타입 분석 결과를 이용합니다.
리팩토링: 구문 트리 변환 기술을 적용합니다.
실시간 에러 표시: 증분 파싱(incremental parsing) 기술을 사용합니다.

보안 분석

정적 분석(Static Analysis): 코드를 실행하지 않고 잠재적 취약점을 탐지합니다.
버퍼 오버플로 탐지: 데이터 흐름 분석을 통해 버퍼 크기 초과를 검사합니다.
오염 분석(Taint Analysis): 외부 입력이 위험한 함수에 도달하는지 추적합니다.

[보안 정적 분석의 흐름]
  소스 코드 --> [구문 분석] --> [데이터 흐름 분석] --> [취약점 보고]

프로그램 검증

컴파일러의 의미 분석 기술을 확장하여 프로그램이 명세를 충족하는지 검증합니다.

모델 체킹(Model Checking): 상태 공간 탐색을 통한 속성 검증
추상 해석(Abstract Interpretation): 프로그램의 근사적 의미 분석
의존 타입(Dependent Types): Agda, Idris 같은 언어에서 타입으로 속성을 증명

기타 응용 분야

자연어 처리(NLP): 문법 분석에 구문 분석 기법을 활용합니다.
하드웨어 합성: VHDL, Verilog 컴파일러가 하드웨어 회로를 생성합니다.
데이터베이스 쿼리: SQL 파서와 쿼리 최적화에 컴파일러 기술을 사용합니다.
도메인 특화 언어(DSL): 컴파일러 프론트엔드 기술로 커스텀 언어를 구축합니다.

정리

개념	핵심 내용
정적 vs 동적	컴파일 시간에 결정되면 정적, 실행 시간에 결정되면 동적
정적 스코프	코드의 구조(정의 위치)로 변수를 해결
동적 스코프	실행 시간 호출 순서로 변수를 해결
값 전달	값 복사, 원본 불변
참조 전달	주소 전달, 원본 변경 가능
환경 vs 상태	이름을 위치로, 위치를 값으로 매핑
컴파일러 응용	IDE, 보안 분석, 프로그램 검증, DSL 등

프로그래밍 언어의 기본 개념을 이해하는 것은 컴파일러의 각 단계가 왜 필요한지를 이해하는 데 필수적입니다. 다음 글에서는 간단한 구문 지시 번역기를 직접 만들어 보면서 컴파일러의 핵심 원리를 체험해 보겠습니다.

[Compiler] 02. Programming Language Basics and Compiler Technology Applications

Programming Language Basics and Compiler Technology Applications

To properly understand compilers, foundational knowledge of programming languages themselves is necessary. This article covers the evolution of languages, core concepts, and how compiler technology is utilized in fields beyond compilation.

1. Evolution of Programming Languages

1st Generation: Machine Language (1940s)

Binary code that computers can directly execute. Extremely difficult for humans to write directly.

10110000 01100001

2nd Generation: Assembly Language (1950s)

A low-level language that represents machine code with mnemonics.

MOV AL, 61h
ADD AL, BL

3rd Generation: High-Level Languages (1960s~)

Languages like Fortran, COBOL, C, and Java appeared, which are easier for humans to understand.

int result = a + b;

4th Generation and Beyond

Includes domain-specific languages (DSLs) like SQL and scripting languages like Python and JavaScript.

Interaction Between Language Design and Compilers

Programming language design and compiler technology influence each other.

Fortran: Aimed at efficient compilation of numerical computations.
Java: Emphasized portability through bytecode and JIT compilation.
Rust: Guarantees memory safety through compile-time ownership checking.

2. Static vs Dynamic Distinction

In compiler theory, static means determined at compile time, and dynamic means determined at runtime.

Statically Typed Languages

Variable types are determined at compile time.

int x = 10;         // x is fixed as int at compile time
String s = "hello"; // s is fixed as String at compile time
// x = "world";     // Compile error!

Advantages: Type errors can be detected before execution, and optimization is easier.

Dynamically Typed Languages

Variable types are determined at runtime.

x = 10        # x is currently int
x = "hello"   # x changes to str (no error)

Advantages: Flexible and enables rapid development.

Static/Dynamic Comparison

            Compile Time          Runtime
           (Static)              (Dynamic)
Type:      Java, C, Rust         Python, JS, Ruby
Binding:   Static binding        Dynamic binding
Dispatch:  Static dispatch       Dynamic dispatch (virtual functions)
Memory:    Stack allocation      Heap allocation

3. Scope Rules

A scope is the region of a program where a variable declaration is valid.

Static Scope (Lexical Scope)

The scope of a variable is determined by the structure of the source code. Most modern languages (C, Java, Python, etc.) use static scope.

int x = 10;

void foo() {
    printf("%d", x);  // Always refers to global variable x = 10
}

void bar() {
    int x = 20;
    foo();  // Inside foo, x is still 10
}

In static scope, functions look up variables at the location where they are defined.

Dynamic Scope

The scope of a variable is determined by the runtime call sequence. Some Lisp dialects and Bash scripts use dynamic scope.

#!/bin/bash
x=10

foo() {
    echo $x  # Refers to the value of x at the time of the call
}

bar() {
    local x=20
    foo   # Dynamic scope, so outputs x = 20
}

bar  # Output: 20

In dynamic scope, functions look up variables at the location where they are called.

Block Structure and Scope

In C-family languages, blocks ({...}) create new scopes.

void example() {
    int x = 1;
    {
        int x = 2;  // Inner block's x (shadows outer x)
        printf("%d\n", x);  // Prints 2
    }
    printf("%d\n", x);  // Prints 1 (outer x)
}

The compiler resolves variable names through a scope chain. If a variable is not found in the current scope, it searches the enclosing scopes in order.

4. Parameter Passing Mechanisms

The way arguments are passed to functions differs by language.

Call by Value

A copy of the value of the argument is passed to the function.

void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b are copies, so the originals are unaffected
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 1, y = 2 (unchanged)
}

Call by Reference

The memory address of the argument is passed.

void swap(int &a, int &b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b are aliases for the originals, so originals are modified
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 2, y = 1 (changed)
}

Call by Name

The expression itself of the argument is passed and re-evaluated each time it is used. Used in Algol 60 and rarely in modern languages, though Scala's => T syntax is a similar concept.

def condition(test: => Boolean, msg: => String): Unit = {
    if (test) println(msg)  // msg is evaluated only when test is true
}

Parameter Passing Comparison

+------------------+----------+----------+------------------+
| Method           | Copy Cost| Modifies | Representative   |
|                  |          | Original | Languages        |
+------------------+----------+----------+------------------+
| Call by Value    | Yes      | No       | C, Java(primitives)|
| Call by Reference| No       | Yes      | C++, C#(ref)     |
| Call by Name     | No       | Yes      | Algol 60, Scala  |
+------------------+----------+----------+------------------+

5. Environment and State

There are two important mappings in compiler theory.

name --[environment]--> storage location --[state]--> value

Example:
  "x"  --[environment]--> 0x7fff1234  --[state]--> 42

Environment: Maps names to storage locations. Created when variables are declared.
State: Maps storage locations to values. Changed when assignment statements are executed.

This distinction is important for understanding pointers and references. The same name can refer to different locations in different environments (scope), and the value at the same location can change during program execution (state change).

6. Applications of Compiler Technology

Compiler technology is utilized in various fields beyond programming language translation.

IDE Tools

Modern IDEs (IntelliJ, VS Code, etc.) use compiler technology.

Syntax Highlighting: Leverages lexical analysis techniques.
Code Auto-completion: Uses syntax analysis and type analysis results.
Refactoring: Applies syntax tree transformation techniques.
Real-time Error Display: Uses incremental parsing techniques.

Security Analysis

Static Analysis: Detects potential vulnerabilities without executing code.
Buffer Overflow Detection: Checks for buffer size overflows through data flow analysis.
Taint Analysis: Tracks whether external input reaches dangerous functions.

[Security Static Analysis Flow]
  Source Code --> [Syntax Analysis] --> [Data Flow Analysis] --> [Vulnerability Report]

Program Verification

Extends the compiler's semantic analysis techniques to verify that programs meet specifications.

Model Checking: Property verification through state space exploration
Abstract Interpretation: Approximate semantic analysis of programs
Dependent Types: Proving properties through types in languages like Agda and Idris

Other Application Areas

Natural Language Processing (NLP): Uses parsing techniques for grammar analysis.
Hardware Synthesis: VHDL and Verilog compilers generate hardware circuits.
Database Queries: Uses compiler technology for SQL parsers and query optimization.
Domain-Specific Languages (DSL): Builds custom languages using compiler front-end technology.

Summary

Concept	Key Content
Static vs Dynamic	Static if determined at compile time, dynamic if at runtime
Static Scope	Resolves variables by code structure (definition location)
Dynamic Scope	Resolves variables by runtime call sequence
Call by Value	Copies value, original unchanged
Call by Reference	Passes address, original can be modified
Environment vs State	Maps names to locations, locations to values
Compiler Applications	IDE, security analysis, program verification, DSL, etc.

Understanding the basic concepts of programming languages is essential for understanding why each phase of a compiler is necessary. In the next article, we will build a simple syntax-directed translator to experience the core principles of compilers firsthand.