- Author: Youngju Kim (@fjvbn20031)
File-System Internals
After examining the basic principles of file system implementation, we now take a deeper look at the internal structures and performance optimization techniques of real file systems. This article covers modern file systems such as ext4 and APFS, as well as network file systems including NFS.
1. File System Mounting
The operating system must mount a file system before using it. Mounting connects a file system to a specific point (mount point) in the directory tree.
Before mount: After mount:
/ (root) / (root)
╱ ╲ ╱ ╲
home mnt home mnt
│ │ (empty) │ │
user1 user1 ┌──────────────┐
│ USB drive │
│ (ext4) │
│ photos/ │
│ docs/ │
└──────────────┘
Accessible via /mnt/photos
# Example mount commands
# Attach a device to a mount point
mount /dev/sdb1 /mnt
# Specify a file system type explicitly
mount -t ext4 /dev/sdb1 /mnt
# Mount read-only
mount -o ro /dev/sdb1 /mnt
# List current mounts
mount | column -t
# Unmount
umount /mnt
/etc/fstab File
Defines file systems to be automatically mounted at boot.
# /etc/fstab example
# device     mount point  type   options  dump  pass
/dev/sda1 / ext4 defaults 0 1
/dev/sda2 /home ext4 defaults 0 2
/dev/sdb1 /data xfs noatime 0 2
tmpfs /tmp tmpfs size=2G 0 0
2. Partitions and Block Groups
Disk Partition Structure
┌──────────────────────────────────────────────────┐
│ Physical Disk │
│ │
│ ┌─────┬────────────┬────────────┬────────────┐ │
│ │ MBR │ Partition 1│ Partition 2│ Partition 3│ │
│ │ │ (ext4) │ (swap) │ (ext4) │ │
│ │ │ / │ │ /home │ │
│ └─────┴────────────┴────────────┴────────────┘ │
│ │
│ GPT (GUID Partition Table): │
│ ┌─────┬──────┬──────┬──────┬───────┬─────┐ │
│ │ MBR │ GPT │Part 1│Part 2│ ... │ GPT │ │
│ │prot.│header│ │ │ │bkup │ │
│ └─────┴──────┴──────┴──────┴───────┴─────┘ │
└──────────────────────────────────────────────────┘
Block Groups (ext4)
ext4 divides the disk into multiple block groups for management.
┌────────────────────────────────────────────────────┐
│ ext4 File System Layout │
│ │
│ ┌──────┬──────────┬──────────┬──────────┬────────┐ │
│ │Group │ Group 0 │ Group 1 │ Group 2 │ ... │ │
│ │ 0 │ │ │ │ │ │
│ └──────┴──────────┴──────────┴──────────┴────────┘ │
│ │
│ Inside each block group: │
│ ┌────────┬────────┬────────┬────────┬────────────┐ │
│ │ Super │ Group │ Block │ inode │ Data │ │
│ │ Block │ Desc. │ Bitmap │ Table │ Blocks │ │
│ │ (bkup) │ Table │+ inode │ │ │ │
│ │ │ │ Bitmap │ │ │ │
│ └────────┴────────┴────────┴────────┴────────────┘ │
└────────────────────────────────────────────────────┘
Benefits of block groups:
- Locality: Places related data close together to minimize disk seeks
- Parallelism: Operations on different block groups can be performed in parallel
- Reliability: Superblock and group descriptor backups enable recovery
3. ext4 File System Internals
ext4 is the most widely used file system on Linux.
ext4 Key Features
| Property | Value |
|---|---|
| Max file size | 16 TiB |
| Max volume size | 1 EiB |
| Max file name length | 255 bytes |
| Max directory depth | Unlimited |
| Journaling | Supported (ordered) |
| Extents | Supported |
Extent-Based Allocation
ext4 uses extents instead of traditional indirect block pointers.
Traditional (ext2/ext3): ext4 Extents:
inode → 12 direct pointers inode → extent tree
→ single indirect
→ double indirect extent: (start block, length)
→ triple indirect one extent represents contiguous blocks
Ex: 100 contiguous blocks Ex: 100 contiguous blocks
→ 100 pointers needed → 1 extent (start=50, len=100)
// ext4 extent structure (simplified)
struct ext4_extent {
    uint32_t ee_block;    // first logical block covered by this extent
    uint16_t ee_len;      // number of blocks (max 32768)
    uint16_t ee_start_hi; // high 16 bits of the physical block
    uint32_t ee_start_lo; // low 32 bits of the physical block
};
// extent tree header
struct ext4_extent_header {
    uint16_t eh_magic;    // magic number (0xF30A)
    uint16_t eh_entries;  // number of valid entries
    uint16_t eh_max;      // maximum number of entries
    uint16_t eh_depth;    // tree depth (0 = leaf)
    uint32_t eh_generation;
};
ext4 Delayed Allocation
Traditional approach: Delayed allocation:
write() call write() call
│ │
├→ Allocate blocks (immediately) ├→ Store data in page cache
├→ Write to disk │ (blocks not yet allocated)
└→ Done │
├→ At flush time
│ ├→ Allocate contiguous blocks at once
│ └→ Write to disk
└→ Done
Advantage: Increased chance of contiguous allocation, better performance
4. Apple File System (APFS)
APFS is Apple's file system developed to replace HFS+.
APFS Key Features
┌──────────────────────────────────────────┐
│ APFS Container │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Volume 1 │ │ Volume 2 │ │ Volume 3 │ │
│ │ (macOS) │ │ (Data) │ │ (Preboot)│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Space sharing: Volumes dynamically │
│ share free space within the container │
└──────────────────────────────────────────┘
- Copy-on-Write (CoW): Preserves the original and writes to a new location when modifying data
- Snapshots: Point-in-time recovery using CoW
- Space Sharing: Multiple volumes within a single container share space dynamically
- Encryption: Native encryption at file/volume level
- Clones: Instant file/directory copy (copies only metadata, actual copy on modification)
Copy-on-Write Operation
Initial state:
Block A → [Original Data]
On modification (CoW):
Block A → [Original Data] (preserved)
Block B → [Modified Data] (written to new location)
Metadata pointer updated to point to Block B
With a snapshot:
Snapshot → Block A (retains original reference)
Current → Block B (modified version reference)
5. Performance Optimization
Buffer Cache
Application read() request
│
▼
┌──────────────────┐
│ Buffer Cache │
│ (Kernel Memory) │
│ │
│ [Block 42: data] │ ← Cache hit: return immediately
│ [Block 88: data] │
│ [Block 15: data] │
│ [Block 7: data] │
└────────┬─────────┘
│ Cache miss
▼
┌─────────┐
│ Disk │ → Read and store in cache
└─────────┘
Read-Ahead
When a sequential access pattern is detected, the next blocks are read into the cache in advance.
Request pattern: Block 1 → Block 2 → Block 3 → ...
When Block 3 is requested:
Kernel detects sequential pattern → also reads blocks 4, 5, 6, 7
┌────┬────┬────┬────┬────┬────┬────┐
│ B1 │ B2 │ B3 │ B4 │ B5 │ B6 │ B7 │
│req │req │req │read│read│read│read│
│done│done│ret │ahd │ahd │ahd │ahd │
└────┴────┴────┴────┴────┴────┴────┘
Free-Behind
During sequential reads, evicts already-used pages from the cache to free space.
Sequential read direction →
Cache state:
[B1] [B2] [B3] [B4] [B5] ← Normal LRU
↑ Old pages remain in cache
[--] [--] [B3] [B4] [B5] ← Free-Behind applied
↑ Released after read → space for new data
Unified Buffer Cache
Past (separate): Present (unified):
┌──────────┐ ┌──────────┐ ┌──────────────────────┐
│ Page │ │ Buffer │ │ Unified Page Cache │
│ Cache │ │ Cache │ │ │
│(for mmap)│ │(for read)│ │ Both mmap and │
│ │ │ │ │ read/write use the │
└──────────┘ └──────────┘ │ same cache │
Same data may be └──────────────────────┘
cached twice No duplication,
consistency guaranteed
6. NFS (Network File System)
NFS is a distributed file system protocol for accessing remote file systems over a network.
NFS Architecture
Client Server
┌──────────────────┐ ┌──────────────────┐
│ User Process │ │ │
│ open("/mnt/f") │ │ │
├──────────────────┤ │ │
│ VFS │ │ │
├──────────────────┤ ├──────────────────┤
│ NFS Client │ │ NFS Server │
│ (kernel module) │ ← RPC/XDR →│ Daemon (nfsd) │
├──────────────────┤ ├──────────────────┤
│ Network │ │ Local │
│ (TCP/UDP) │ │ File System │
└──────────────────┘ └──────────────────┘
NFS Mount
# Configure the shared directory on the NFS server
# /etc/exports
/data 192.168.1.0/24(rw,sync,no_subtree_check)
# Start the NFS service
sudo systemctl start nfs-server
# Mount from the client
sudo mount -t nfs server:/data /mnt/nfs
# Add to /etc/fstab for automatic mounting at boot
# server:/data /mnt/nfs nfs defaults 0 0
NFS Stateless Design
NFS v3 was designed as a stateless protocol.
Stateless advantages:
- No client reconnection needed after server restart
- Server doesn't manage per-client state, good scalability
Stateless implementation:
- File handle identifies files
- Each request is independent (includes all needed info like offset)
- Caching handled on the client side
NFS v4 changes:
- Switched to stateful protocol
- Supports advanced features like file locking, delegation
- Enhanced security (Kerberos authentication)
7. Summary
- Mounting: The process of attaching a file system to the directory tree
- Block Groups: Structure that improves data locality and reduces seek time
- ext4: Performance and reliability through extents, delayed allocation, and journaling
- APFS: Modern features including Copy-on-Write, snapshots, and space sharing
- Performance Optimization: Buffer cache, read-ahead, free-behind, unified page cache
- NFS: RPC-based network file system, v3 stateless / v4 stateful
Quiz: File-System Internals
Q1. Why are ext4 extents better than traditional indirect block pointers?
A1. Extents represent contiguous physical blocks as (start block, length), so instead of thousands of individual block pointers for large files, they can be managed with a small number of extents. This reduces metadata overhead and improves performance by reading contiguous blocks in a single I/O operation.
Q2. What are the advantages and disadvantages of Copy-on-Write (CoW)?
A2. Advantages: Original data is preserved, making snapshot implementation easy and maintaining consistency on crashes. File cloning is also O(1) fast. Disadvantages: Modifications always write to new locations, which can turn sequential writes into random writes and may cause fragmentation.
Q3. Why was NFS v3 stateless, and why did v4 switch to stateful?
A3. The stateless design of v3 makes server restart after crashes simple and provides good scalability. However, it makes advanced features like file locking and cache consistency difficult to implement. v4 maintains state to support richer features such as file locking, delegation, and security.