Skip to content
Published on

[Compiler] 02. Programming Language Basics and Compiler Technology Applications

Authors

Programming Language Basics and Compiler Technology Applications

To properly understand compilers, foundational knowledge of programming languages themselves is necessary. This article covers the evolution of languages, core concepts, and how compiler technology is utilized in fields beyond compilation.


1. Evolution of Programming Languages

1st Generation: Machine Language (1940s)

Binary code that computers can directly execute. Extremely difficult for humans to write directly.

10110000 01100001

2nd Generation: Assembly Language (1950s)

A low-level language that represents machine code with mnemonics.

MOV AL, 61h
ADD AL, BL

3rd Generation: High-Level Languages (1960s~)

Languages like Fortran, COBOL, C, and Java appeared, which are easier for humans to understand.

int result = a + b;

4th Generation and Beyond

Includes domain-specific languages (DSLs) like SQL and scripting languages like Python and JavaScript.

Interaction Between Language Design and Compilers

Programming language design and compiler technology influence each other.

  • Fortran: Aimed at efficient compilation of numerical computations.
  • Java: Emphasized portability through bytecode and JIT compilation.
  • Rust: Guarantees memory safety through compile-time ownership checking.

2. Static vs Dynamic Distinction

In compiler theory, static means determined at compile time, and dynamic means determined at runtime.

Statically Typed Languages

Variable types are determined at compile time.

int x = 10;         // x is fixed as int at compile time
String s = "hello"; // s is fixed as String at compile time
// x = "world";     // Compile error!

Advantages: Type errors can be detected before execution, and optimization is easier.

Dynamically Typed Languages

Variable types are determined at runtime.

x = 10        # x is currently int
x = "hello"   # x changes to str (no error)

Advantages: Flexible and enables rapid development.

Static/Dynamic Comparison

            Compile Time          Runtime
           (Static)              (Dynamic)
Type:      Java, C, Rust         Python, JS, Ruby
Binding:   Static binding        Dynamic binding
Dispatch:  Static dispatch       Dynamic dispatch (virtual functions)
Memory:    Stack allocation      Heap allocation

3. Scope Rules

A scope is the region of a program where a variable declaration is valid.

Static Scope (Lexical Scope)

The scope of a variable is determined by the structure of the source code. Most modern languages (C, Java, Python, etc.) use static scope.

int x = 10;

void foo() {
    printf("%d", x);  // Always refers to global variable x = 10
}

void bar() {
    int x = 20;
    foo();  // Inside foo, x is still 10
}

In static scope, functions look up variables at the location where they are defined.

Dynamic Scope

The scope of a variable is determined by the runtime call sequence. Some Lisp dialects and Bash scripts use dynamic scope.

#!/bin/bash
x=10

foo() {
    echo $x  # Refers to the value of x at the time of the call
}

bar() {
    local x=20
    foo   # Dynamic scope, so outputs x = 20
}

bar  # Output: 20

In dynamic scope, functions look up variables at the location where they are called.

Block Structure and Scope

In C-family languages, blocks ({...}) create new scopes.

void example() {
    int x = 1;
    {
        int x = 2;  // Inner block's x (shadows outer x)
        printf("%d\n", x);  // Prints 2
    }
    printf("%d\n", x);  // Prints 1 (outer x)
}

The compiler resolves variable names through a scope chain. If a variable is not found in the current scope, it searches the enclosing scopes in order.


4. Parameter Passing Mechanisms

The way arguments are passed to functions differs by language.

Call by Value

A copy of the value of the argument is passed to the function.

void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b are copies, so the originals are unaffected
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 1, y = 2 (unchanged)
}

Call by Reference

The memory address of the argument is passed.

void swap(int &a, int &b) {
    int temp = a;
    a = b;
    b = temp;
    // a, b are aliases for the originals, so originals are modified
}

int main() {
    int x = 1, y = 2;
    swap(x, y);
    // x = 2, y = 1 (changed)
}

Call by Name

The expression itself of the argument is passed and re-evaluated each time it is used. Used in Algol 60 and rarely in modern languages, though Scala's => T syntax is a similar concept.

def condition(test: => Boolean, msg: => String): Unit = {
    if (test) println(msg)  // msg is evaluated only when test is true
}

Parameter Passing Comparison

+------------------+----------+----------+------------------+
| Method           | Copy Cost| Modifies | Representative   |
|                  |          | Original | Languages        |
+------------------+----------+----------+------------------+
| Call by Value    | Yes      | No       | C, Java(primitives)|
| Call by Reference| No       | Yes      | C++, C#(ref)     |
| Call by Name     | No       | Yes      | Algol 60, Scala  |
+------------------+----------+----------+------------------+

5. Environment and State

There are two important mappings in compiler theory.

name --[environment]--> storage location --[state]--> value

Example:
  "x"  --[environment]--> 0x7fff1234  --[state]--> 42
  • Environment: Maps names to storage locations. Created when variables are declared.
  • State: Maps storage locations to values. Changed when assignment statements are executed.

This distinction is important for understanding pointers and references. The same name can refer to different locations in different environments (scope), and the value at the same location can change during program execution (state change).


6. Applications of Compiler Technology

Compiler technology is utilized in various fields beyond programming language translation.

IDE Tools

Modern IDEs (IntelliJ, VS Code, etc.) use compiler technology.

  • Syntax Highlighting: Leverages lexical analysis techniques.
  • Code Auto-completion: Uses syntax analysis and type analysis results.
  • Refactoring: Applies syntax tree transformation techniques.
  • Real-time Error Display: Uses incremental parsing techniques.

Security Analysis

  • Static Analysis: Detects potential vulnerabilities without executing code.
  • Buffer Overflow Detection: Checks for buffer size overflows through data flow analysis.
  • Taint Analysis: Tracks whether external input reaches dangerous functions.
[Security Static Analysis Flow]
  Source Code --> [Syntax Analysis] --> [Data Flow Analysis] --> [Vulnerability Report]

Program Verification

Extends the compiler's semantic analysis techniques to verify that programs meet specifications.

  • Model Checking: Property verification through state space exploration
  • Abstract Interpretation: Approximate semantic analysis of programs
  • Dependent Types: Proving properties through types in languages like Agda and Idris

Other Application Areas

  • Natural Language Processing (NLP): Uses parsing techniques for grammar analysis.
  • Hardware Synthesis: VHDL and Verilog compilers generate hardware circuits.
  • Database Queries: Uses compiler technology for SQL parsers and query optimization.
  • Domain-Specific Languages (DSL): Builds custom languages using compiler front-end technology.

Summary

ConceptKey Content
Static vs DynamicStatic if determined at compile time, dynamic if at runtime
Static ScopeResolves variables by code structure (definition location)
Dynamic ScopeResolves variables by runtime call sequence
Call by ValueCopies value, original unchanged
Call by ReferencePasses address, original can be modified
Environment vs StateMaps names to locations, locations to values
Compiler ApplicationsIDE, security analysis, program verification, DSL, etc.

Understanding the basic concepts of programming languages is essential for understanding why each phase of a compiler is necessary. In the next article, we will build a simple syntax-directed translator to experience the core principles of compilers firsthand.