Authors

- Youngju Kim (@fjvbn20031)
Overview
Compiler technology continues to evolve with the development of programming languages, changes in hardware, and the emergence of new application domains. In this article, we examine modern compiler architectures, major compiler toolchains, and the application of compiler technology in various fields including security, AI/ML, and the web.
1. LLVM Architecture
1.1 What is LLVM
LLVM is a compiler infrastructure project that provides a modularized compiler toolchain. It was originally an acronym for "Low Level Virtual Machine," but is now used as the name for the entire project.
1.2 Three-Phase Architecture
The core design philosophy of LLVM is the separation of the compiler into three phases: frontend, middle end, and backend.
Source code --> Frontend --> LLVM IR --> Middle End (optimization) --> LLVM IR --> Backend --> Machine code

C/C++   --> Clang  -->                                            --> x86
Rust    --> rustc  --> Common IR --> Common Opt --> Common IR     --> ARM
Swift   --> swiftc -->                                            --> RISC-V
Fortran --> flang  -->                                            --> WebAssembly
Advantages of this structure:
- To support a new language, only a frontend needs to be implemented
- To support a new architecture, only a backend needs to be implemented
- Optimizations are shared across all languages and architectures
1.3 LLVM IR
LLVM IR is an SSA-based low-level intermediate representation.
; LLVM IR example: sum of two numbers
define i32 @add(i32 %a, i32 %b) {
entry:
  %result = add i32 %a, %b
  ret i32 %result
}

; Conditional example
define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  br i1 %cmp, label %then, label %else
then:
  br label %merge
else:
  br label %merge
merge:
  %result = phi i32 [ %a, %then ], [ %b, %else ]
  ret i32 %result
}
Three representation forms of LLVM IR:
- Text form (.ll files): Human-readable form
- Bitcode (.bc files): Efficient serialized form
- In-memory representation: C++ objects used inside the compiler
1.4 LLVM Optimization Passes
LLVM optimizations are organized in units called passes.
Key optimization passes:
- mem2reg: Promote memory accesses to registers (SSA construction)
- instcombine: Instruction combination optimization
- gvn: Global value numbering
- licm: Loop-invariant code motion
- indvars: Induction variable simplification
- loop-unroll: Loop unrolling
- inline: Function inlining
- sccp: Sparse conditional constant propagation
- dce: Dead code elimination
- simplifycfg: Control flow simplification
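As a rough illustration of several of these passes working together, consider the small function below (an invented example). At -O2, sccp and the loop passes can evaluate the loop at compile time, and dce strips what remains, so the body typically collapses to returning a constant; running `clang++ -O2 -S -emit-llvm` on it makes this visible in the emitted IR.

```cpp
// The sum 0 + 1 + ... + 9 is fully computable at compile time, so at -O2
// the loop is evaluated away and the body becomes equivalent to "ret i32 45".
int sum_to_ten() {
    int total = 0;
    for (int i = 0; i < 10; ++i)
        total += i;  // sccp + loop passes fold this entire loop
    return total;
}
```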
Pass configuration by optimization level:
-O0: No optimization (for debugging)
-O1: Basic optimization (fast compilation)
-O2: Standard optimization (recommended for most cases)
-O3: Aggressive optimization (allows code size increase)
-Os: Code size optimization
-Oz: Extreme size optimization
2. GCC vs LLVM vs Clang
2.1 GCC (GNU Compiler Collection)
History: Started by Richard Stallman in 1987
Languages: C, C++, Fortran, Go, Ada, etc.
Features:
- Nearly 40 years of history, extensive architecture support
- Two-stage IR structure: GIMPLE (high-level IR) -> RTL (low-level IR)
- Powerful optimization (especially Fortran)
- GPL license
2.2 LLVM/Clang
History: Started by Chris Lattner in 2003 (UIUC)
Clang: LLVM's C/C++/Objective-C frontend
Features:
- Modular library design
- Better error messages
- Faster compilation speed
- Apache 2.0 license (favorable for commercial use)
- Easy IDE integration (libclang, clangd)
2.3 Comparison
| Aspect | GCC | LLVM/Clang |
|---|---|---|
| Error messages | Basic | Detailed and friendly |
| Compilation speed | Average | Fast |
| Code quality | Excellent | Excellent |
| Architecture support | Very broad | Broad (expanding) |
| Extensibility | Difficult | Easy (library) |
| Static analysis | Basic | Powerful (Clang Static Analyzer) |
| License | GPL | Apache 2.0 |
Practical choices:
- Embedded/legacy systems: GCC (broad architecture support)
- iOS/macOS development: Clang (Apple's official compiler)
- Static analysis/tool development: LLVM (modular library)
- High-performance computing: Use both (decide by benchmarks)
3. JIT Compilation
3.1 AOT vs JIT
AOT (Ahead-Of-Time) compilation:
Source code -> [Compile] -> Machine code -> [Execute]
Examples: C/C++ (gcc, clang), Rust, Go
JIT (Just-In-Time) compilation:
Source code -> [Start with interpreter] -> [Detect hotspots] -> [Compile during execution] -> [Switch to optimized code]
Examples: Java (HotSpot), JavaScript (V8), .NET (RyuJIT)
3.2 Advantages of JIT
// 1. Profile-Guided Optimization (PGO)
// Optimization decisions based on information collected during execution
if (type == "string") { // True 95% of the time
// JIT: Optimize this branch (inline cache)
}
// 2. Speculative Optimization
// Run fast code as long as assumptions hold
// If assumptions fail, deoptimize and fall back to interpreter
// 3. Adaptive Optimization
// Optimize only hot code, run cold code in interpreter
// Balance between compilation time and execution time
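The inline-cache idea behind point 1 can be sketched in a few lines. This is a toy model with invented names (Shape, CallSiteCache), not V8's or HotSpot's actual data structures: a call site remembers the last object layout it observed and takes a fast path as long as that observation keeps holding.

```cpp
// Toy monomorphic inline cache (invented names, illustrative only).
// Each object layout ("hidden class") has an id and fixed slot offsets;
// a call site caches the last layout it saw and fast-paths repeats.
struct Shape {
    int id;        // identity of the hidden class
    int x_offset;  // slot index of property "x" under this layout
};

struct CallSiteCache {
    int cached_shape_id = -1;  // -1: cache empty
    int cached_offset = 0;
};

int load_x(const Shape& shape, const int* slots, CallSiteCache& ic) {
    if (shape.id == ic.cached_shape_id)
        return slots[ic.cached_offset];  // fast path: monomorphic cache hit
    ic.cached_shape_id = shape.id;       // slow path: full lookup, then cache
    ic.cached_offset = shape.x_offset;
    return slots[shape.x_offset];
}
```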
3.3 Major JIT Engines
Java HotSpot JVM:
Execution flow:
1. Start with bytecode interpreter
2. When call count exceeds threshold, C1 compiler (fast compilation, simple optimization)
3. If executed more, C2 compiler (slow compilation, aggressive optimization)
Tiered Compilation:
Level 0: Interpreter
Level 1: C1 (no profiling)
Level 2: C1 (limited profiling)
Level 3: C1 (full profiling)
Level 4: C2 (optimized code)
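A toy model of this tier-up policy can make the idea concrete. The thresholds below are invented for illustration; HotSpot's real heuristics also weigh loop back-edges, profiling data, and code cache pressure.

```cpp
// Toy tiered-compilation policy: a per-function invocation counter decides
// which "tier" executes the call (thresholds are made up for illustration).
enum class Tier { Interpreter, C1, C2 };

struct FunctionState {
    int invocations = 0;
    Tier tier = Tier::Interpreter;
};

// Returns the tier that executes this call, bumping the hotness counter.
Tier invoke(FunctionState& f) {
    ++f.invocations;
    if (f.invocations > 10000)      // very hot: optimizing compiler
        f.tier = Tier::C2;
    else if (f.invocations > 100)   // warm: fast baseline compiler
        f.tier = Tier::C1;
    return f.tier;
}
```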
JavaScript V8:
Execution flow:
1. Parser: JavaScript -> AST
2. Ignition (interpreter): AST -> bytecode execution
3. Sparkplug (baseline JIT): Fast machine code generation
4. Maglev (mid-tier JIT): Mid-level optimization
5. TurboFan (optimizing JIT): Aggressive optimization
Key techniques:
- Hidden Classes: Give structure to dynamically typed objects
- Inline Caching: Optimize property access
- Deoptimization: Fall back to interpreter when speculation fails
4. Modern Language Features and Compilation Challenges
4.1 Generics
// Generic implementation strategies:
// 1. Monomorphization - Rust, C++
// Generate separate code for each concrete type
// Pros: Fast execution (inlining, specialized optimization)
// Cons: Code size increase (code bloat)
// 2. Type Erasure - Java, Kotlin
// Remove generic type information at compile time, process as Object
// Pros: Small code size
// Cons: Boxing/unboxing overhead, loss of runtime type information
// 3. Dictionary Passing - Haskell
// Pass type class method tables as arguments
// Pros: Small code size
// Cons: Indirect call overhead
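C++ templates make strategy 1 concrete: each instantiation produces its own specialized code, which is exactly what rustc does for generic functions.

```cpp
// Monomorphization via C++ templates: the compiler emits a separate,
// fully specialized function for each concrete type instantiated below.
template <typename T>
T add(T a, T b) { return a + b; }

// add<int> and add<double> end up as two distinct functions in the binary:
// fast, inlinable specialized code, paid for with duplication (code bloat).
```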
4.2 Closures
// Closure: A function that captures free variables
// Compilation handling:
// 1. Store captured variables in a struct (environment)
// 2. Closure = function pointer + environment pointer
// Example (pseudocode):
// Original:
//   fn make_adder(x):
//     return fn(y): x + y
//
// After compilation:
//   struct Env { int x; }
//   int closure_fn(Env* env, int y) { return env->x + y; }
//   Closure make_adder(int x) {
//     Env* env = alloc(Env);
//     env->x = x;
//     return (closure_fn, env);
//   }
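The same lowering written out as real C++ (the names Env, Closure, and make_adder mirror the pseudocode; this hand-rolls what a compiler would emit, storing the environment by value to avoid heap allocation in the sketch):

```cpp
// A closure lowered by hand: code pointer + captured environment.
struct Env { int x; };

struct Closure {
    int (*fn)(const Env*, int);  // code pointer
    Env env;                     // captured environment (by value here)
};

static int closure_fn(const Env* env, int y) { return env->x + y; }

Closure make_adder(int x) {
    return Closure{closure_fn, Env{x}};  // the (function, environment) pair
}

int call(const Closure& c, int y) { return c.fn(&c.env, y); }
```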
4.3 Pattern Matching
// Pattern matching compilation strategies:
// 1. Decision Tree
// Test each pattern sequentially
// Construct tree to minimize number of checks
// 2. Backtracking Automaton
// Match multiple patterns simultaneously
// Memory-efficient but complex to implement
// Example:
// match value {
// (0, y) => ...,
// (x, 0) => ...,
// (x, y) => ...,
// }
//
// Decision tree:
//   value.0 == 0?
//     yes -> Pattern 1 (y = value.1)
//     no  -> value.1 == 0?
//       yes -> Pattern 2 (x = value.0)
//       no  -> Pattern 3 (x = value.0, y = value.1)
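The decision tree can be written out as hand-coded branches. This is a sketch of what a pattern-match compiler might emit; the returned numbers identify the three arms of the match example.

```cpp
#include <utility>

// Decision tree for matching a pair against (0, y), (x, 0), (x, y):
// at most two tests decide which arm fires.
int match_arm(std::pair<int, int> value) {
    if (value.first == 0)
        return 1;  // arm 1: (0, y)
    if (value.second == 0)
        return 2;  // arm 2: (x, 0)
    return 3;      // arm 3: (x, y)
}
```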
5. Compiler Technology in Security
5.1 Static Analysis
Uses compiler technology to find bugs without executing code.
Key static analysis tools:
- Clang Static Analyzer: Path-sensitive analysis, memory bug detection
- Coverity: Commercial static analysis tool
- Infer (Meta): Memory safety checking on large codebases
- CodeQL (GitHub): Query-based code analysis
Detectable issues:
- Null pointer dereference
- Buffer overflow
- Memory leak
- Use-after-free
- Data races
5.2 Sanitizers
Insert checking code at compile time to detect bugs at runtime.
Key sanitizers (LLVM/GCC supported):
AddressSanitizer (ASan):
- Detects memory access errors (buffer overflow, use-after-free)
- About 2x performance overhead
- Compile: clang -fsanitize=address
MemorySanitizer (MSan):
- Detects uninitialized memory reads
- About 3x performance overhead
ThreadSanitizer (TSan):
- Detects data races
- About 5-15x performance overhead
UndefinedBehaviorSanitizer (UBSan):
- Detects undefined behavior (integer overflow, invalid shifts, etc.)
- Minimal performance overhead
How ASan works:
// Original code:
int a[10];
a[15] = 42; // Buffer overflow!
// Code inserted by ASan (conceptual):
// 1. Set up "red zones" around memory
// 2. Check boundaries before every memory access
// 3. Report error if access touches red zone
// Runtime output:
// ERROR: AddressSanitizer: stack-buffer-overflow
// WRITE of size 4 at address ...
// [Stack trace]
5.3 Control Flow Integrity (CFI)
Verifies targets of indirect branches to prevent code reuse attacks (ROP, JOP).
// Control Flow Integrity (LLVM CFI):
// Verifies that calls through function pointers target only valid targets
// clang -fsanitize=cfi
// Verifies the signature of target functions at indirect calls
6. Compilers for AI/ML
6.1 Why Deep Learning Compilers Are Needed
Traditional approach:
PyTorch/TensorFlow -> Framework runtime -> cuDNN/MKL -> GPU/CPU
Compiler approach:
Model definition -> Graph IR -> Optimization -> Code generation -> GPU/CPU/TPU/NPU
Advantages:
- Automate hardware-specific optimization
- Rapid support for new hardware
- Minimize memory access through operator fusion
6.2 XLA (Accelerated Linear Algebra)
A deep learning compiler developed by Google, used in TensorFlow and JAX.
XLA optimizations:
1. Operation Fusion
- Combine multiple element-wise operations into a single kernel
- Eliminate memory allocation for intermediate tensors
2. Layout Optimization
- Optimize tensor memory layout for hardware
3. Constant Folding
- Pre-compute tensor operations determinable at compile time
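Operation fusion (item 1) can be illustrated by hand: instead of materializing an intermediate tensor for a + b and then multiplying by c in a second pass, a fused kernel performs both element-wise operations in one sweep over memory. A minimal sketch with invented names:

```cpp
#include <cstddef>
#include <vector>

// Hand-fused kernel for (a + b) * c: one loop, one read/write pass,
// and no intermediate buffer for a + b. (Real compilers generate and
// fuse such kernels automatically; the memory-traffic saving is the idea.)
std::vector<float> fused_add_mul(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = (a[i] + b[i]) * c[i];  // both element-wise ops fused
    return out;
}
```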
6.3 TVM (Tensor Virtual Machine)
Apache TVM is an open-source deep learning compiler targeting diverse hardware.
TVM stack:
Frontend: Import PyTorch, TensorFlow, ONNX models
|
Relay IR: High-level graph representation
|
Relay optimization: Graph-level optimization (fusion, quantization, etc.)
|
Tensor IR (TIR): Low-level tensor operation representation
|
AutoTVM/Ansor: Automatic performance tuning (schedule search)
|
Code generation: CUDA, OpenCL, Metal, LLVM, etc.
6.4 Other ML Compilers
- MLIR (Multi-Level IR): Multi-level IR framework in the LLVM project
Unified compiler infrastructure supporting various abstraction levels
- Triton: Python-based language/compiler for GPU kernel writing
Backend for PyTorch 2.0's torch.compile
- IREE: Compiler for deploying ML models to embedded/mobile environments
- StableHLO: Portable serialization format for ML models
7. WebAssembly Compilation
7.1 WebAssembly (Wasm) Overview
WebAssembly is a binary instruction format that runs at near-native speed in web browsers.
Features:
- Stack-based virtual machine
- Static type system
- Memory safety (linear memory model)
- Can be used as compilation target from various languages
- Expanding use beyond browsers to servers and embedded
7.2 Compilation Pipeline to Wasm
C/C++ -> Emscripten -> LLVM -> Wasm backend -> .wasm
Rust -> rustc -> LLVM -> Wasm backend -> .wasm
Go -> TinyGo -> LLVM -> Wasm backend -> .wasm
Kotlin -> Kotlin/Wasm -> .wasm
7.3 Wasm Text Format Example
;; WAT (WebAssembly Text Format) example: Fibonacci
(module
  (func $fib (param $n i32) (result i32)
    (if (i32.lt_s (local.get $n) (i32.const 2))
      (then (return (local.get $n)))
    )
    (i32.add
      (call $fib (i32.sub (local.get $n) (i32.const 1)))
      (call $fib (i32.sub (local.get $n) (i32.const 2)))
    )
  )
  (export "fib" (func $fib))
)
7.4 Wasm Optimization Challenges
Key challenges:
1. GC integration: Reference types and GC support (Wasm GC proposal)
2. SIMD: Vector operation support (128-bit SIMD implemented)
3. Threads: Shared memory and atomic operations
4. Exception handling: Zero-cost exception handling
5. Tail calls: Support for functional languages
Optimization tools:
- Binaryen: Wasm-specific optimization (wasm-opt)
- Dead code elimination
- Function inlining
- Constant folding
- Code size optimization
7.5 WASI (WebAssembly System Interface)
A system interface for running Wasm outside the browser.
Application areas:
- Serverless computing: Cloudflare Workers, Fastly Compute
- Container alternative: Docker + Wasm
- Plugin systems: Safe extension execution environments
- Embedded systems: Resource-constrained environments
8. Future Outlook
8.1 Development Directions of Compiler Technology
1. AI-based Compiler Optimization
- Reinforcement learning for optimization pass ordering
- Neural network-based register allocation
- LLM-based code analysis and optimization suggestions
2. Domain-Specific Compilers (DSL Compilers)
- Compilers optimized for specific fields (ML, graphs, databases)
- Halide (image processing), GraphIt (graph algorithms)
3. Verified Compilers
- CompCert: Mathematically verified C compiler
- Formally prove correctness of optimizations
4. Heterogeneous Computing Support
- Compilers that manage CPU + GPU + FPGA + NPU together
- Standards like SYCL, OneAPI (Intel)
5. Security-Embedded Compilers
- Spread of memory-safe languages (Rust)
- Utilization of hardware security features (ARM MTE, Intel CET)
Summary
| Concept | Description |
|---|---|
| LLVM | Modular compiler infrastructure, three-phase architecture |
| Clang | LLVM-based C/C++ frontend |
| JIT compilation | Compiling hot code to machine code during execution |
| Monomorphization | Generating code for each concrete type from generics |
| Static analysis | Finding code bugs without execution |
| Sanitizers | Inserting runtime checking code to detect bugs at runtime |
| XLA | Google's ML compiler (TensorFlow/JAX) |
| TVM | Open-source ML compiler targeting diverse hardware |
| WebAssembly | Portable binary format for web browsers |
| WASI | System interface standard for Wasm |
Compiler technology has moved beyond simply translating source code into machine code and now plays a core role in fields as diverse as security, AI/ML, and the web. Its importance is only growing with the emergence of heterogeneous hardware and the explosion of AI workloads. The fundamental principles covered in this series -- from lexical analysis to code optimization -- form the foundation for all of these modern applications.