Compiler and Linker Internals — From #include to an Executable: ELF, Symbols, Static/Dynamic Linking, Relocation, LTO (2025)
Author: Youngju Kim (@fjvbn20031)
0. Before We Start — The Mystery of "gcc hello.c -o hello"
#include <stdio.h>
int main(void) {
printf("Hello\n");
return 0;
}
This 5-line source becomes an executable with a single gcc hello.c -o hello. It feels so natural that we write it off as "the compiler handles it." But let's throw some odd questions at it:
- Where does the actual code for printf live? Inside my program? Or fetched at runtime?
- Why does hello.c first become hello.o and then hello?
- In C++, why don't the overloads void f(int) and void f(double) collide as names?
- Why does deleting libc.so.6 take down the entire Linux system?
- Why does strip make the binary smaller but break the debugger?
This article digs into the full journey from source to running program for C/C++ code, along with the internals. The target is Linux + ELF + GCC/Clang, but the principles apply similarly on Windows (PE/COFF) and macOS (Mach-O).
1. The Four Phases of a Build — Preprocess, Compile, Assemble, Link
1.1 The Stages Hidden Behind One gcc Command
hello.c --[cpp: preprocess]--> hello.i (expanded C source)
hello.i --[cc1: compile]-----> hello.s (assembly)
hello.s --[as: assemble]-----> hello.o (object file)
hello.o --[ld: link]---------> hello (executable)
Each stage is independently runnable. To actually see the intermediate outputs:
gcc -E hello.c -o hello.i # preprocess only
gcc -S hello.i -o hello.s # compile only
gcc -c hello.s -o hello.o # assemble only
gcc hello.o -o hello # link only
1.2 The Preprocessor — A Simple Text Substitution Engine
cpp (the C preprocessor) surprisingly knows nothing about C. It just substitutes text:
- #include <stdio.h> - insert the contents of stdio.h in place.
- #define PI 3.14 - replace every PI with 3.14.
- #ifdef DEBUG - conditionally include/exclude text.
This is the root of C macros being dangerous. For example:
#define SQUARE(x) x*x
SQUARE(1+2) // -> 1+2*1+2 = 5 (intended 9)
This is one reason C++ prefers constexpr, inline functions, and templates over macros. But the preprocessor is still essential for header guards (#ifndef HEADER_H), platform branches (#ifdef _WIN32), conditional logging, and the like.
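The SQUARE pitfall above is usually repaired by parenthesizing every use of the parameter and the whole body, or by switching to a real function. A minimal sketch; the names SQUARE_BAD, SQUARE_OK, and square_fn are made up for this illustration:

```c
#include <stdio.h>

/* Unsafe: the argument text is pasted in without parentheses. */
#define SQUARE_BAD(x) x*x

/* Safer: parenthesize each use of the parameter and the whole body. */
#define SQUARE_OK(x) ((x)*(x))

/* Safest: a real function evaluates its argument exactly once. */
static inline int square_fn(int x) { return x * x; }

/* Prints the three results side by side for the argument 1+2. */
void show_square_results(void) {
    printf("bad=%d ok=%d fn=%d\n",
           SQUARE_BAD(1+2), SQUARE_OK(1+2), square_fn(1+2));
}
```

Even SQUARE_OK still evaluates its argument twice, so SQUARE_OK(i++) remains a bug; only the function form avoids that.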
Look at gcc -E output and the original 5-line hello.c expands into hundreds of lines, because stdio.h transitively pulls in a chain of system headers.
1.3 The Compiler — Where Real "Understanding" Begins
What the compiler proper (Clang, GCC's cc1) does, from frontend to backend:
hello.i
-> Lexer (tokenize)
-> Parser (build AST)
-> Semantic Analyzer (type check, name resolution)
-> IR generation (GIMPLE / LLVM IR)
-> Optimization passes (dozens)
-> Backend (emit target-CPU assembly)
hello.s
LLVM IR example (compiling hello.c):
@.str = private constant [7 x i8] c"Hello\0A\00"
define i32 @main() {
%1 = call i32 @printf(i8* getelementptr ([7 x i8], [7 x i8]* @.str, i32 0, i32 0))
ret i32 0
}
declare i32 @printf(i8*, ...)
Interesting bits at the IR level:
- @printf is only declared, with no definition - a symbol to be resolved later at link time.
- @.str is a global for a constant string. It will land in the .rodata section of hello.o.
1.4 The Assembler — Turning Text Into Binary
hello.s (x86-64 example):
.section .rodata
.LC0:
.string "Hello"
.text
.globl main
main:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
popq %rbp
ret
as (the GNU assembler) converts this into machine-code bytes and packs them into an ELF object file (hello.o). What's interesting:
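- call puts@PLT is left as a relocation entry that says "fill in the real address of puts here." The actual address is unknown.
- leaq .LC0(%rip) references .LC0, a symbol inside this file. Because .LC0 lives in a different section (.rodata) whose final distance from .text is not yet known, the assembler records a relocation for it as well.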
1.5 The Linker — Combining Pieces Into a Whole Executable
Core responsibilities of the linker:
- Merge multiple .o files: concatenate sections of the same kind (.text, .data).
- Symbol resolution: connect external symbols like printf to their actual definitions.
- Relocation: once final addresses are fixed, fill in the address fields of instructions.
- Executable header generation: produce an ELF header the kernel can read.
We'll cover why each of these matters in detail from the next chapter.
2. ELF — The Lingua Franca of Linux Binaries
2.1 Two Views of ELF
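Linux moved from a.out to ELF (Executable and Linkable Format) in the mid-1990s, and in 1999 the 86open project chose ELF as the standard binary format for Unix-like systems on x86. It is now the de facto standard across Unix-like systems. The same file was designed to be seen in two ways: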
+------------------+------------------+
| ELF Header | ELF Header |
+------------------+------------------+
| Program Header | |
| (execution view) | |
+------------------+ Sections: |
| | .text |
| Segments: | .rodata |
| LOAD (r-x) | .data |
| LOAD (rw-) | .bss |
| INTERP | .symtab |
| DYNAMIC | .strtab |
| | .rela.text |
+------------------+------------------+
| Section Header | Section Header |
| (linking view) | (linking view) |
+------------------+------------------+
runtime view link-time view
- Section: what the linker deals with. Finely split into .text, .data, .bss, .symtab, etc.
- Segment (Program Header): what the kernel deals with. Groups sections by permissions (r-x, rw-) and maps them into memory.
2.2 Key Sections
| Section | Content | Memory permissions |
|---|---|---|
| .text | Executable code | r-x |
| .rodata | Read-only constants (string literals, etc.) | r-- |
| .data | Initialized global/static variables | rw- |
| .bss | Zero-initialized global/static variables | rw- |
| .symtab | Symbol table (function/variable name -> address) | (file only) |
| .strtab | String pool of symbol names | (file only) |
| .rela.text | Relocation entries for .text | (file only) |
| .dynsym | Dynamic symbols (used by the dynamic linker) | r-- |
| .plt, .got | Jump tables for dynamic linking | r-x / rw- |
2.3 The Magic of .bss — Why Zero-Initialized Variables Don't Take File Space
int zeros[1000000]; // 4MB
int ones[1000000] = {1, ...}; // 4MB
- ones lands in .data - the executable grows by 4MB.
- zeros lands in .bss - the executable size barely changes.
Reason: .bss records "just the size" in the file. When the kernel loads the program, it backs that range with zero-filled anonymous memory (initially the shared zero page). Storing 4MB of zeros on disk would be a waste.
On first write, the kernel allocates a real page for the touched address. Note that in C a global with no initializer is zero-initialized as well, so it lands in .bss too: it costs no file space, and no physical memory until it is first written.
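The difference is easy to observe with a small translation unit. This is a minimal sketch; N and sum_ints are names chosen for the illustration:

```c
#include <stddef.h>

#define N 1000000

/* No initializer: zero-initialized, so the compiler places it in .bss.
   Only its size is recorded in the object file. */
int zeros[N];

/* Non-zero initializer: the whole array (a 1 followed by zeros)
   is emitted into .data and takes about 4MB on disk. */
int ones[N] = {1};

/* Sums an array; reading untouched .bss memory simply yields zeros. */
long sum_ints(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Compiling this with gcc -c and running size on the object file should show the 4MB for ones in the data column and the 4MB for zeros in the bss column, while the .o file itself grows only for ones.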
2.4 Seeing it Live — The World of readelf
$ readelf -h hello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 ...
Class: ELF64
Data: 2's complement, little endian
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Entry point address: 0x401050
$ readelf -S hello # section headers
$ readelf -l hello # program headers (segments)
$ readelf -s hello # symbol table
$ objdump -d hello # disassembly
Run these on a small program like hello and the structure of ELF comes into focus. It is one of the best ways to learn systems programming.
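The header that readelf -h prints is just a fixed-size struct at the start of the file, so it can also be read by hand. The sketch below assumes a 64-bit Linux system with <elf.h>; the function names read_elf64_header and print_elf64_summary are made up for this example:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the ELF64 header of `path` into *out.
   Returns 0 on success, -1 if the file cannot be read or is not ELF64. */
int read_elf64_header(const char *path, Elf64_Ehdr *out) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (got != sizeof *out)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)   /* 7f 'E' 'L' 'F' */
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Prints a few of the fields that `readelf -h` shows. */
void print_elf64_summary(const Elf64_Ehdr *h) {
    printf("Type:        %u (%s)\n", (unsigned)h->e_type,
           h->e_type == ET_EXEC ? "EXEC" :
           h->e_type == ET_DYN  ? "DYN"  :
           h->e_type == ET_REL  ? "REL"  : "other");
    printf("Machine:     %u\n", (unsigned)h->e_machine);
    printf("Entry point: 0x%llx\n", (unsigned long long)h->e_entry);
    printf("Sections:    %u\n", (unsigned)h->e_shnum);
    printf("Segments:    %u\n", (unsigned)h->e_phnum);
}
```

Calling read_elf64_header on hello and then print_elf64_summary should agree with readelf -h. On toolchains that build position-independent executables by default, the type is DYN rather than EXEC.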
3. Symbols — The Link Between Names and Addresses
3.1 What Is a Symbol
A symbol is a "name -> address" mapping. At compile time, the real addresses of functions and globals are not yet fixed, but the names exist. The linker merges all .o files and assigns addresses.
Symbol table of hello.o:
$ readelf -s hello.o
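Num: Value Size Type Bind Vis Ndx Name
5: 0000000000000000 22 FUNC GLOBAL DEFAULT 1 main
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
- main: defined in this file (Bind=GLOBAL, Ndx 1 = .text, Size=22).
- printf: undefined (UND). Must be found externally at link time. (With GCC this entry usually reads puts instead, because the compiler rewrites printf("Hello\n") into puts("Hello"), as the assembly in 1.4 shows.)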
3.2 Binding — Local, Global, Weak
| Binding | Meaning |
|---|---|
| LOCAL | File-internal (static functions/variables). Not visible to other files. |
| GLOBAL | Externally exposed. Multiple files can link against this symbol. |
| WEAK | A default that gets overridden by any GLOBAL ("strong") definition. |
The WEAK trick: a library provides the default implementation as WEAK, so users can override it simply by supplying an ordinary GLOBAL definition of the same name. glibc has historically used weak symbols this way for pthread functions such as pthread_mutex_lock.
// lib.c - default implementation (inside the library)
__attribute__((weak)) void my_log(const char* msg) {
    fprintf(stderr, "LOG: %s\n", msg);
}
// app.c - user override (an ordinary GLOBAL definition wins at link time)
void my_log(const char* msg) {
    write_to_elasticsearch(msg);
}
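Weak binding also works in the other direction: a weak reference to a symbol that nobody defines resolves to a null address instead of failing the link. The sketch below shows both uses in one file, assuming GCC or Clang (the weak attribute is a compiler extension); log_level, optional_hook, and run_with_hook are hypothetical names:

```c
#include <stddef.h>

/* Weak default: the linker uses this unless another object file
   provides a non-weak (GLOBAL) definition of log_level. */
__attribute__((weak)) int log_level(void) {
    return 1;   /* library default */
}

/* Weak undefined reference: if no object defines optional_hook,
   its address resolves to NULL instead of causing a link error. */
extern int optional_hook(int) __attribute__((weak));

/* Calls the hook when it was linked in, otherwise returns the input. */
int run_with_hook(int value) {
    if (optional_hook != NULL)
        return optional_hook(value);
    return value;
}
```

Linking a second object file that defines int log_level(void) or int optional_hook(int) without the attribute would silently replace the default or activate the hook, with no "multiple definition" error.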
3.3 Duplicate Symbols — The "multiple definition of foo" Error
If the same GLOBAL symbol exists in multiple .o files, the linker errors out:
/usr/bin/ld: b.o: multiple definition of `foo'; a.o: first defined here
Most common causes:
- Putting a function definition (not just a declaration) in a header that multiple .c files include. Use static inline, or keep only the declaration in the header and put the definition in one .c.
- Putting a global variable definition (int counter = 0;) in a header. Use extern int counter; to declare it in the header and define it in exactly one .c.
C++ inline functions get special treatment - the linker tolerates duplicates and keeps only one (COMDAT sections).
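The two fixes can be sketched as a header plus the single source file that owns the definitions. Both are shown in one listing here for brevity; the file names counter.h and counter.c and the functions next_id and twice are hypothetical:

```c
/* ---- counter.h (would be included by many .c files) ---- */
#ifndef COUNTER_H
#define COUNTER_H

extern int counter;            /* declaration only: no storage here */
int next_id(void);             /* declaration only */

/* static inline: each .c gets a private copy, so no GLOBAL duplicate */
static inline int twice(int x) { return 2 * x; }

#endif /* COUNTER_H */

/* ---- counter.c (the single file that owns the definitions) ---- */
int counter = 0;               /* the one and only definition */

int next_id(void) { return ++counter; }
```

Every other .c file includes counter.h and gets only declarations, so the linker sees exactly one GLOBAL counter and one GLOBAL next_id.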
4. Relocation — A World Where Addresses Are Fixed Late
4.1 Why Relocation Is Needed
We saw call puts@PLT in main of hello.o (GCC rewrote printf("Hello\n") into the cheaper puts("Hello")). This call instruction uses a relative address (x86-64 call rel32). That is, "how far to jump from the address of the next instruction." The problem:
- At hello.o assembly time, we don't know puts's final address.
- We don't even know main's final address (another .o might come before it).
The assembler fills the instruction's address field with 0 and records a relocation entry in .rela.text saying "write (puts's address - current address) here":
$ readelf -r hello.o
Relocation section '.rela.text':
Offset Info Type Sym.Value Sym. Name + Addend
0000000c ... R_X86_64_PC32 0 .rodata - 4
00000015 ... R_X86_64_PLT32 0 puts - 4
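For both PC32 and PLT32, the psABI defines the stored value as S + A - P: symbol address plus addend minus the address of the field being patched. The addend is -4 because the CPU measures the displacement from the end of the 4-byte field, not from its start. A worked sketch follows; the function names are made up and any addresses passed in are hypothetical:

```c
#include <stdint.h>

/* Value the linker stores for R_X86_64_PC32 / R_X86_64_PLT32: S + A - P.
   S = final address of the symbol (or of its PLT entry),
   A = addend from the relocation entry (-4 in the listing above),
   P = final address of the 4-byte field being patched. */
int32_t pc32_reloc_value(uint64_t S, int64_t A, uint64_t P) {
    return (int32_t)(S + (uint64_t)A - P);
}

/* What the CPU does with `call rel32`: the target is the address of the
   next instruction (field address + 4) plus the stored displacement. */
uint64_t call_target(uint64_t P, int32_t rel32) {
    return P + 4 + (uint64_t)(int64_t)rel32;
}
```

Suppose the PLT entry ends up at 0x401030 and the field at 0x40113a. The linker then stores 0x401030 - 4 - 0x40113a = -270, and the CPU computes 0x40113a + 4 - 270 = 0x401030, which is exactly the PLT entry.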
4.2 The Variety of Relocation Types
x86-64 alone has dozens of relocation types:
| Type | Meaning |
|---|---|
| R_X86_64_64 | Absolute 64-bit address |
| R_X86_64_PC32 | PC-relative 32-bit (call, jmp) |
| R_X86_64_PLT32 | Call via PLT |
| R_X86_64_GOTPCREL | Reference relative to GOT entry |
| R_X86_64_TPOFF32 | TLS (thread-local storage) |
Each tells the linker "how to compute the value." This is a core part of the ABI.
4.3 -fPIC — Position Independent Code
A shared library (.so) may be loaded at a different address in each process. So it can't use absolute addresses.
When compiled with -fPIC (Position Independent Code):
- Global variable access goes through the GOT (Global Offset Table).
- Function calls go through the PLT (Procedure Linkage Table).
This is how libc can be loaded into memory once and shared across every process. A major memory-saver.
5. Static Linking vs Dynamic Linking
5.1 Static Linking — Everything Inside My Binary
gcc hello.c -o hello -static
- libc's printf code is copied into the executable.
- Executable size: 10KB -> 800KB.
- Pros: runs on systems without libc. No version-compatibility issues.
- Cons: big. Each process carries its own copy of libc in memory.
The Structure of a Static Library
libfoo.a is actually just an archive:
$ ar t /usr/lib/x86_64-linux-gnu/libc.a | head
init-first.o
libc-start.o
sysdep.o
...
A bundle of .o files made with ar (archiver). The linker pulls in only the .o files whose symbols my program requires (lazy linking).
"Requires" is decided at the granularity of whole .o members: if one function in a .o is needed, every other function in that .o gets dragged in too. This is why -ffunction-sections -fdata-sections -Wl,--gc-sections exists: each function and data item gets its own section, and the linker discards the sections nothing references.
5.2 Dynamic Linking — Load on Demand
gcc hello.c -o hello # default: dynamic linking
- The executable only records a reference like "load libc.so.6 and pull printf from there."
- At run time the dynamic linker (/lib64/ld-linux-x86-64.so.2) maps libc into memory and resolves symbols.
The INTERP Section — Name of the Dynamic Linker
$ readelf -l hello | grep -A1 INTERP
INTERP ...
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
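The executable names an "interpreter" that executes it. On execve the kernel maps the executable, maps ld.so as well, and hands control to ld.so first; ld.so then loads the shared libraries, applies relocations, and finally jumps to my program's entry point.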
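The interpreter path is an ordinary string stored in the file and pointed to by a PT_INTERP program header, so it can be extracted without readelf. This is a minimal sketch assuming a 64-bit Linux system with <elf.h>; read_interp is a made-up helper name:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP string of the ELF64 file `path` into buf.
   Returns 0 if found, 1 if the file has no PT_INTERP (e.g. static),
   -1 on I/O or format errors. */
int read_interp(const char *path, char *buf, size_t buflen) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    if (buflen == 0 ||
        fread(&eh, 1, sizeof eh, f) != sizeof eh ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    rc = 1;   /* valid ELF64, but no PT_INTERP seen yet */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(&ph, 1, sizeof ph, f) != sizeof ph) {
            rc = -1;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz includes the terminating NUL of the path string */
        size_t n = ph.p_filesz < buflen ? (size_t)ph.p_filesz : buflen - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, n, f) != n) {
            rc = -1;
            break;
        }
        buf[n] = '\0';
        rc = 0;
        break;
    }
done:
    fclose(f);
    return rc;
}
```

For a dynamically linked hello on x86-64 Linux this should yield the same path that readelf -l reports; a statically linked binary has no PT_INTERP and the function returns 1.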
5.3 The Secret of ldd
$ ldd hello
linux-vdso.so.1 (0x00007ffd...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
/lib64/ld-linux-x86-64.so.2 (0x00007f...)
ldd shows "which shared libraries this program requires." Internally, however, it sets LD_TRACE_LOADED_OBJECTS=1 and lets the dynamic linker load the program, printing the resolved map instead of running main. Because a crafted binary can name its own interpreter, running ldd on an untrusted binary is dangerous - use objdump -p or readelf -d instead.
5.4 The Magic of GOT/PLT
Flow of a dynamic call:
main calls printf
->
call printf@plt (jump to the PLT entry)
->
PLT[printf]:
jmp *GOT[printf] ; read the address from GOT and jump
; initially GOT is set to "call the resolver"
-> (on first call)
resolver invokes ld.so -> finds printf's real address -> updates GOT[printf]
->
subsequent calls jump directly through GOT[printf] (lazy binding)
Lazy binding: resolve only on the first call. Unused functions never get resolved. Faster startup.
LD_BIND_NOW=1: resolve every symbol at startup. No per-call resolution delay, but slower startup. It is also the binding mode that Full RELRO relies on, since a fully resolved GOT can then be made read-only.
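The lazy-binding flow can be imitated in plain C with a function pointer that starts out pointing at a resolver and is patched on first use. This is only a simulation of the idea, not how ld.so is implemented; all names in it are invented for the illustration:

```c
typedef int (*fn_t)(int);

/* Stands in for the real function living in a shared library. */
static int real_square(int x) { return x * x; }

int resolve_calls = 0;                 /* how often the "resolver" ran */
static int resolver_stub(int x);

/* Stands in for GOT[square]: initially points at the resolver. */
static fn_t got_square = resolver_stub;

/* Stands in for ld.so's resolver: look up the symbol, patch the slot,
   then forward the original call. */
static int resolver_stub(int x) {
    resolve_calls++;
    got_square = real_square;
    return got_square(x);
}

/* Stands in for the PLT entry: always an indirect jump through the slot. */
int square_plt(int x) { return got_square(x); }
```

The first square_plt call goes through resolver_stub and bumps resolve_calls to 1; every later call jumps straight to real_square. The sketch also shows why the slot must stay writable for lazy binding to work.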
5.5 RELRO — A Small Security Revolution
The GOT is a classic target of memory-corruption exploits: as long as it stays writable, an attacker who can write to memory can plant a malicious address in it and hijack the next call through that slot.
Partial RELRO: .got.plt stays writable (needed for lazy binding); only .got is read-only.
Full RELRO (-Wl,-z,relro,-z,now): resolve every symbol at startup, then make .got.plt read-only too. The binding behavior matches LD_BIND_NOW=1.
Many modern distros build with Full RELRO by default.
6. Versioning of Shared Libraries — The Numbers After .so
libc.so.6 -> libc-2.35.so
libc.so -> libc.so.6
Three tiers:
- libc-2.35.so: the actual file, a specific version.
- libc.so.6: the SONAME, the ABI version. This is what gets recorded in the executable at link time.
- libc.so: the linker name, a symlink for developer convenience (for glibc itself it is a small linker script rather than a symlink). gcc -lc looks this up.
6.1 Symbol Versioning — Same Name, Different Versions
In glibc, the same function can exist in multiple versions:
$ objdump -T /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
... DF .text memcpy@GLIBC_2.2.5
... DF .text memcpy@@GLIBC_2.14
In 2011, an optimized memcpy broke programs such as Adobe Flash that relied on undefined behavior (copying between overlapping regions). glibc 2.14 therefore introduced a new symbol version: old binaries keep using @GLIBC_2.2.5 with the old behavior, while freshly compiled code binds to the default @@GLIBC_2.14 - both coexisting inside the same libc.so.6.
This is why 10-year-old Linux binaries still run.
6.2 SO Version Incompatibility — "GLIBC_2.35 not found"
./myapp: /lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.35 not found
A common sight when running a binary built on Ubuntu 22.04 on Ubuntu 20.04. Fixes:
- Build on the old system: reproduce the old distro in Docker.
- Static linking: no compatibility issue (but PAM, NSS, etc. still need dynamic loading).
- Rust/Go: statically link their own runtime - mitigates this problem.
7. C++ Name Mangling — The Secret of Overloading
7.1 Why Mangling Is Needed
In C, function name = symbol name:
void foo(int) { } // symbol: foo
In C++, multiple overloads share the same name:
void foo(int);
void foo(double);
void foo(int, int);
If they shared one symbol, the linker couldn't tell them apart. Solution: name mangling.
void foo(int) -> _Z3fooi
void foo(double) -> _Z3food
void foo(int, int) -> _Z3fooii
7.2 Itanium ABI Mangling Rules
The Itanium C++ ABI mangling used by GCC/Clang:
_Z : prefix (mangling)
N ... E : namespace
3foo : name "foo" (length 3)
i : int
d : double
Pi : pointer to int
PKc : pointer to const char
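For free functions at global scope these rules reduce to simple string concatenation, which a few lines of C can reproduce. This is a deliberately tiny sketch: it ignores namespaces, templates, and substitutions, and mangle_simple is a made-up helper, not part of any toolchain:

```c
#include <stdio.h>
#include <string.h>

/* Builds the Itanium-style mangled name of a free function at global scope:
   "_Z" + <length of name> + <name> + <parameter codes>.
   `params` is given already encoded, e.g. "i", "d", "ii", "PKc";
   an empty parameter list is encoded as "v".
   Returns 0 on success, -1 if `out` is too small. */
int mangle_simple(const char *name, const char *params,
                  char *out, size_t outlen) {
    if (params[0] == '\0')
        params = "v";
    int n = snprintf(out, outlen, "_Z%zu%s%s", strlen(name), name, params);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Feeding it ("foo", "i"), ("foo", "d"), and ("foo", "ii") reproduces the three symbols from 7.1, which can be cross-checked with c++filt.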
A complex example:
namespace ns {
class C {
void method(const std::string& s, int n);
};
}
// mangled symbol:
_ZN2ns1C6methodERKSsi
7.3 c++filt — Make It Human-Readable
$ c++filt _ZN2ns1C6methodERKSsi
ns::C::method(std::string const&, int)
When a debugger, stack trace, or linker error shows a mangled name, decode it with c++filt.
7.4 extern "C" — Disable Mangling
To use a C library from C++, turn off mangling:
extern "C" {
#include <stdio.h>
}
extern "C" void my_c_api(int x); // symbol: my_c_api (no mangling)
Essential for plugin architectures and dynamic loading (dlsym).
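A header meant to be consumed from both C and C++ usually wraps its declarations in a guard on the __cplusplus macro, so the C compiler never sees the extern "C" syntax. The file names and the function body below are hypothetical, shown in one listing for brevity:

```c
/* ---- my_c_api.h: usable from both C and C++ ---- */
#ifndef MY_C_API_H
#define MY_C_API_H

#ifdef __cplusplus
extern "C" {          /* seen only by C++ compilers: disables mangling */
#endif

int my_c_api(int x);  /* symbol is my_c_api in both languages */

#ifdef __cplusplus
}
#endif

#endif /* MY_C_API_H */

/* ---- my_c_api.c: compiled as C ---- */
int my_c_api(int x) { return x + 1; }
```

With this guard in place a C++ caller emits a reference to my_c_api rather than _Z8my_c_apii, which matches what the C object file defines and what dlsym expects as a lookup key.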
7.5 The Nightmare of ABI Compatibility
In C++, not only names but also class layouts are part of the ABI:
- Change the layout of std::string and binaries compiled with the old and new compilers can no longer mix.
- The famous "dual ABI" event happened in GCC 5 when std::string and std::list were moved to a C++11-compatible layout.
- Selectable via _GLIBCXX_USE_CXX11_ABI=0/1, but mixing them produces subtle crashes.
Takeaway: when shipping a C++ library, exposing a plain C API, or at least hiding class layouts behind opaque pointers, is safer. GTK exposes a C API, and Qt hides its class layouts behind d-pointers (the pimpl idiom).
8. LTO (Link-Time Optimization) — Whole-Program Optimization
8.1 The Limits of a Translation Unit
In the traditional model, the compiler sees only one file (translation unit):
// foo.c
int add(int a, int b) { return a + b; }
// bar.c
extern int add(int, int);
int main() { return add(2, 3); }
The compiler can't know add "always returns the constant 5" - so replacing the call with the constant 5 is impossible.
8.2 The LTO Fix
The -flto option: at compile time, store the IR (LLVM bitcode or GCC GIMPLE) inside .o as well. At link time, collect all this IR and re-optimize it.
- Cross-module inlining: inline add into main -> constant folding -> return 5.
- Dead code elimination: remove functions nobody calls.
- Devirtualization (C++): convert virtual calls to direct calls.
8.3 Thin LTO — Better Scalability
Classic LTO loads all IR at once - large projects need tens of GB of RAM.
Thin LTO (LLVM): compile each module independently but exchange summaries. Perform only the cross-module optimizations that matter. Firefox and Chrome use this.
8.4 The Cost of LTO
- Longer build times: the link stage gets heavy.
- Harder debugging: inlined functions make stack traces tricky.
- Incremental-build complexity: a single-file edit triggers a long relink.
Still, release builds often see a performance lift in the 5-15% range, depending on the workload, so it is usually worth it.
9. PGO — Profile-Guided Optimization
9.1 "Make the Hot Path Fast"
Static analysis alone can't tell "which arm of an if-else runs more often." PGO:
- Instrumented build: compile with -fprofile-generate - inserts code to record how many times each branch/loop runs.
- Run a representative workload: run with actual production scenarios. Counters are written to *.gcda files.
- Optimized build: recompile with -fprofile-use - counter-driven optimizations:
  - Split hot code into .text.hot for i-cache efficiency.
  - Move cold code elsewhere.
  - Put frequently taken branches on the fallthrough path.
  - Prioritize inlining frequently called functions.
9.2 Impact in Practice
Chrome, Firefox, and Clang itself are built with PGO. Typically 10-30% performance improvement. Especially effective for interpreter loops such as JITs and VMs.
10. Rust, Go — A Different Path From Classical Linking
10.1 Rust — Monomorphization and LTO
Rust monomorphizes generics (instantiate-on-use):
fn foo<T>(x: T) { ... }
foo::<i32>(1);
foo::<String>("a".into());
// the compiler emits foo twice, for i32 and for String -> distinct symbols
Similar to C++ templates. Result:
- No runtime cost.
- Binary size blow-up (the same function copied N times).
- Longer compile times.
Rust defaults to statically linking Rust crates (libstd included), although on glibc targets the resulting binary still links libc dynamically. To produce a .so you must explicitly set crate-type = ["cdylib"].
10.2 Go — A Single Binary That Includes the Runtime
Go defaults to static linking and bundles the runtime (GC, scheduler) too:
- "The simplest hello world" is 2MB+.
- But deployment ends with one line: scp binary user@server:/usr/local/bin/ - Go's killer feature since before Docker.
- Using cgo switches to dynamic linking (needs libc) - a common cause of Docker distroless breakage.
10.3 Zig — The Gospel of Cross-Compilation
Zig ships libc source inside its toolchain:
zig cc -target x86_64-linux-gnu hello.c -o hello # for Linux
zig cc -target aarch64-macos hello.c -o hello # for ARM Mac
Go made cross-compilation easy but cgo made it complicated. Zig solves cross-compiling C too. Adopted by Bun, Uber, and others.
11. Collection of Practical Pitfalls
11.1 "undefined reference to func" — The Most Common Linker Error
Five causes:
- Missing .o or .a: forgetting foo.o in gcc main.o foo.o -o main.
- Missing library name: in gcc main.c -o main -lm, without -lm (math library), sin/cos stay unresolved.
- Library order: in gcc main.o -lfoo -lbar, if bar depends on foo, the correct order is -lbar -lfoo. GNU ld scans static archives left-to-right only once.
- C++ mangling of C functions: including the header without extern "C".
- Symbol not exported: missing __declspec(dllexport) in Windows DLLs, or hiding everything with -fvisibility=hidden.
11.2 The Traps of strip
strip hello # remove all symbols -> no crash stack traces
strip --strip-debug hello # remove only debug info -> appropriate for prod
objcopy --only-keep-debug hello hello.debug
objcopy --strip-debug hello
objcopy --add-gnu-debuglink=hello.debug hello # keep them separate
Typically production gets stripped, debug symbols are stored separately, and on a crash you combine them under gdb.
11.3 rpath vs runpath
To make the executable find libfoo.so in /opt/myapp/lib:
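- -Wl,-rpath=/opt/myapp/lib: hard-coded into the binary. The $ORIGIN variable designates "the directory of the executable." Depending on the linker's defaults the path is stored as DT_RPATH (searched before LD_LIBRARY_PATH) or as DT_RUNPATH (searched after it); -Wl,--disable-new-dtags and -Wl,--enable-new-dtags select one or the other.
- LD_LIBRARY_PATH: env var. Ignored for setuid binaries for security.
- /etc/ld.so.conf.d/: system-wide.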
$ORIGIN/../lib is especially useful: a portable deployment loads libraries from lib/ next to bin/.
11.4 LD_PRELOAD — Great and Dangerous
LD_PRELOAD=./mymalloc.so ./myapp
- Intercept every
malloccall with my implementation. - jemalloc, tcmalloc are injected into existing binaries this way.
- The underlying mechanism of performance profilers (memory leak detectors).
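- Security risk: for setuid binaries the kernel flags the process for secure execution, and the dynamic linker then refuses untrusted LD_PRELOAD entries so they cannot hijack the program.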
12. Practical Checklist
Debugging:
- readelf -a binary - scan the whole structure.
- objdump -d binary - disassembly.
- nm binary - symbol list. nm -D for dynamic symbols.
- ldd binary - dependency libraries (only on trusted binaries).
- strace -e openat binary - which files it opens at runtime.
Performance builds:
- -O2 or -O3 + -flto + -fno-plt + -Wl,-O1,--as-needed.
- For hotspots that need profile guidance, add -fprofile-generate/use.
Security builds:
- -fstack-protector-strong - stack canary.
- -D_FORTIFY_SOURCE=2 - buffer-overflow guard.
- -fPIE -pie - position-independent executable.
- -Wl,-z,relro,-z,now - Full RELRO.
- -Wl,-z,noexecstack - NX bit.
Shrinking:
- -ffunction-sections -fdata-sections -Wl,--gc-sections - drop unused functions/data.
- strip --strip-debug.
- -Os or -Oz - size-first optimization.
13. Closing — The 40 Years Hidden Behind gcc hello.c
During the fraction of a second in which hello.c becomes hello:
- The descendant of the 1970s Unix v6 cc + as + ld pipeline runs.
- The ELF format, defined for SVR4 (developed by AT&T together with Sun in the late 1980s), writes the headers.
- Decades of symbol versioning accumulated in glibc 2.x since the late 1990s get resolved.
- The IR proposed by LLVM in the early 2000s becomes the intermediate representation (if you use clang).
- Thin LTO and PGO from the 2010s accelerate release builds.
- In the 2020s, Zig and Rust are rewriting the common sense of cross-compilation.
A single line of gcc hello.c operates at the intersection of all that history. Today's executable stands on yesterday's wisdom.
In the next post I plan to dig into how the OS kernel actually loads this binary into memory - the execve syscall, page table creation, mmap, the layout of a process's address space, vdso, and the fast path of system calls. The journey continues even after the executable is loaded into memory.