- Published on
[Compiler] 02. Programming Language Basics and Compiler Technology Applications
- Authors

- Name
- Youngju Kim
- @fjvbn20031
Programming Language Basics and Compiler Technology Applications
To properly understand compilers, foundational knowledge of programming languages themselves is necessary. This article covers the evolution of languages, core concepts, and how compiler technology is utilized in fields beyond compilation.
1. Evolution of Programming Languages
1st Generation: Machine Language (1940s)
Binary code that computers can directly execute. Extremely difficult for humans to write directly.
10110000 01100001
2nd Generation: Assembly Language (1950s)
A low-level language that represents machine code with mnemonics.
MOV AL, 61h
ADD AL, BL
3rd Generation: High-Level Languages (1960s~)
Languages like Fortran, COBOL, C, and Java appeared, which are easier for humans to understand.
int result = a + b;
4th Generation and Beyond
Includes domain-specific languages (DSLs) like SQL and scripting languages like Python and JavaScript.
Interaction Between Language Design and Compilers
Programming language design and compiler technology influence each other.
- Fortran: Aimed at efficient compilation of numerical computations.
- Java: Emphasized portability through bytecode and JIT compilation.
- Rust: Guarantees memory safety through compile-time ownership checking.
2. Static vs Dynamic Distinction
In compiler theory, static means determined at compile time, and dynamic means determined at runtime.
Statically Typed Languages
Variable types are determined at compile time.
int x = 10; // x is fixed as int at compile time
String s = "hello"; // s is fixed as String at compile time
// x = "world"; // Compile error!
Advantages: Type errors can be detected before execution, and optimization is easier.
Dynamically Typed Languages
Variable types are determined at runtime.
x = 10 # x is currently int
x = "hello" # x changes to str (no error)
Advantages: Flexible and enables rapid development.
Static/Dynamic Comparison
Compile Time Runtime
(Static) (Dynamic)
Type: Java, C, Rust Python, JS, Ruby
Binding: Static binding Dynamic binding
Dispatch: Static dispatch Dynamic dispatch (virtual functions)
Memory: Stack allocation Heap allocation
3. Scope Rules
A scope is the region of a program where a variable declaration is valid.
Static Scope (Lexical Scope)
The scope of a variable is determined by the structure of the source code. Most modern languages (C, Java, Python, etc.) use static scope.
int x = 10;
void foo() {
printf("%d", x); // Always refers to global variable x = 10
}
void bar() {
int x = 20;
foo(); // Inside foo, x is still 10
}
In static scope, functions look up variables at the location where they are defined.
Dynamic Scope
The scope of a variable is determined by the runtime call sequence. Some Lisp dialects and Bash scripts use dynamic scope.
#!/bin/bash
x=10
foo() {
echo $x # Refers to the value of x at the time of the call
}
bar() {
local x=20
foo # Dynamic scope, so outputs x = 20
}
bar # Output: 20
In dynamic scope, functions look up variables at the location where they are called.
Block Structure and Scope
In C-family languages, blocks ({...}) create new scopes.
void example() {
int x = 1;
{
int x = 2; // Inner block's x (shadows outer x)
printf("%d\n", x); // Prints 2
}
printf("%d\n", x); // Prints 1 (outer x)
}
The compiler resolves variable names through a scope chain. If a variable is not found in the current scope, it searches the enclosing scopes in order.
4. Parameter Passing Mechanisms
The way arguments are passed to functions differs by language.
Call by Value
A copy of the value of the argument is passed to the function.
void swap(int a, int b) {
int temp = a;
a = b;
b = temp;
// a, b are copies, so the originals are unaffected
}
int main() {
int x = 1, y = 2;
swap(x, y);
// x = 1, y = 2 (unchanged)
}
Call by Reference
The memory address of the argument is passed.
void swap(int &a, int &b) {
int temp = a;
a = b;
b = temp;
// a, b are aliases for the originals, so originals are modified
}
int main() {
int x = 1, y = 2;
swap(x, y);
// x = 2, y = 1 (changed)
}
Call by Name
The expression itself of the argument is passed and re-evaluated each time it is used. Used in Algol 60 and rarely in modern languages, though Scala's => T syntax is a similar concept.
def condition(test: => Boolean, msg: => String): Unit = {
if (test) println(msg) // msg is evaluated only when test is true
}
Parameter Passing Comparison
+------------------+----------+----------+------------------+
| Method | Copy Cost| Modifies | Representative |
| | | Original | Languages |
+------------------+----------+----------+------------------+
| Call by Value | Yes | No | C, Java(primitives)|
| Call by Reference| No | Yes | C++, C#(ref) |
| Call by Name | No | Yes | Algol 60, Scala |
+------------------+----------+----------+------------------+
5. Environment and State
There are two important mappings in compiler theory.
name --[environment]--> storage location --[state]--> value
Example:
"x" --[environment]--> 0x7fff1234 --[state]--> 42
- Environment: Maps names to storage locations. Created when variables are declared.
- State: Maps storage locations to values. Changed when assignment statements are executed.
This distinction is important for understanding pointers and references. The same name can refer to different locations in different environments (scope), and the value at the same location can change during program execution (state change).
6. Applications of Compiler Technology
Compiler technology is utilized in various fields beyond programming language translation.
IDE Tools
Modern IDEs (IntelliJ, VS Code, etc.) use compiler technology.
- Syntax Highlighting: Leverages lexical analysis techniques.
- Code Auto-completion: Uses syntax analysis and type analysis results.
- Refactoring: Applies syntax tree transformation techniques.
- Real-time Error Display: Uses incremental parsing techniques.
Security Analysis
- Static Analysis: Detects potential vulnerabilities without executing code.
- Buffer Overflow Detection: Checks for buffer size overflows through data flow analysis.
- Taint Analysis: Tracks whether external input reaches dangerous functions.
[Security Static Analysis Flow]
Source Code --> [Syntax Analysis] --> [Data Flow Analysis] --> [Vulnerability Report]
Program Verification
Extends the compiler's semantic analysis techniques to verify that programs meet specifications.
- Model Checking: Property verification through state space exploration
- Abstract Interpretation: Approximate semantic analysis of programs
- Dependent Types: Proving properties through types in languages like Agda and Idris
Other Application Areas
- Natural Language Processing (NLP): Uses parsing techniques for grammar analysis.
- Hardware Synthesis: VHDL and Verilog compilers generate hardware circuits.
- Database Queries: Uses compiler technology for SQL parsers and query optimization.
- Domain-Specific Languages (DSL): Builds custom languages using compiler front-end technology.
Summary
| Concept | Key Content |
|---|---|
| Static vs Dynamic | Static if determined at compile time, dynamic if at runtime |
| Static Scope | Resolves variables by code structure (definition location) |
| Dynamic Scope | Resolves variables by runtime call sequence |
| Call by Value | Copies value, original unchanged |
| Call by Reference | Passes address, original can be modified |
| Environment vs State | Maps names to locations, locations to values |
| Compiler Applications | IDE, security analysis, program verification, DSL, etc. |
Understanding the basic concepts of programming languages is essential for understanding why each phase of a compiler is necessary. In the next article, we will build a simple syntax-directed translator to experience the core principles of compilers firsthand.