[OS Concepts] 15. File-System Internals


After examining the basic principles of file system implementation, we now take a deeper look at the internal structures and performance optimization techniques of real file systems. This article covers modern file systems such as ext4 and APFS, as well as network file systems including NFS.


1. File System Mounting

The operating system must mount a file system before using it. Mounting connects a file system to a specific point (mount point) in the directory tree.

Before mount:                  After mount:
     / (root)                      / (root)
    ╱    ╲                        ╱    ╲
  home   mnt                   home    mnt
   │    (empty)                  │      │
 user1                         user1  ┌──────────────┐
                                      │  USB drive   │
                                      │   (ext4)     │
                                      │  photos/     │
                                      │  docs/       │
                                      └──────────────┘
                               Accessible via /mnt/photos
# Mount command examples
# Attach a device to a mount point
mount /dev/sdb1 /mnt

# Specify a file system type explicitly
mount -t ext4 /dev/sdb1 /mnt

# Read-only mount
mount -o ro /dev/sdb1 /mnt

# List current mounts
mount | column -t

# Unmount
umount /mnt

/etc/fstab File

Defines file systems to be automatically mounted at boot.

# Example /etc/fstab
# device          mount point  type   options          dump pass
/dev/sda1         /            ext4   defaults         0    1
/dev/sda2         /home        ext4   defaults         0    2
/dev/sdb1         /data        xfs    noatime          0    2
tmpfs             /tmp         tmpfs  size=2G          0    0

2. Partitions and Block Groups

Disk Partition Structure

┌──────────────────────────────────────────────────┐
│                  Physical Disk                   │
│                                                  │
│  MBR (Master Boot Record):                       │
│  ┌─────┬────────────┬────────────┬────────────┐  │
│  │ MBR │Partition 1 │Partition 2 │Partition 3 │  │
│  │     │  (ext4)    │  (swap)    │  (ext4)    │  │
│  │     │ /          │            │ /home      │  │
│  └─────┴────────────┴────────────┴────────────┘  │
│                                                  │
│  GPT (GUID Partition Table):                     │
│  ┌─────┬──────┬──────┬──────┬───────┬─────┐      │
│  │ MBR │ GPT  │Part 1│Part 2│  ...  │ GPT │      │
│  │prot.│header│      │      │       │bkup │      │
│  └─────┴──────┴──────┴──────┴───────┴─────┘      │
└──────────────────────────────────────────────────┘

Block Groups (ext4)

ext4 divides the disk into multiple block groups for management.

┌────────────────────────────────────────────────────┐
│              ext4 File System Layout               │
│                                                    │
│ ┌──────────┬──────────┬──────────┬──────────┬────┐ │
│ │ Group 0  │ Group 1  │ Group 2  │ Group 3  │ .. │ │
│ └──────────┴──────────┴──────────┴──────────┴────┘ │
│                                                    │
│ Inside each block group:                           │
│ ┌────────┬────────┬────────┬────────┬────────────┐ │
│ │ Super- │ Group  │ Block  │ inode  │ Data       │ │
│ │ block  │ Desc.  │ Bitmap │ Table  │ Blocks     │ │
│ │ (bkup) │ Table  │+ inode │        │            │ │
│ │        │        │ Bitmap │        │            │ │
│ └────────┴────────┴────────┴────────┴────────────┘ │
└────────────────────────────────────────────────────┘

Benefits of block groups:

  • Locality: Places related data close together to minimize disk seeks
  • Parallelism: Operations on different block groups can be performed in parallel
  • Reliability: Superblock and group descriptor backups enable recovery

3. ext4 File System Internals

ext4 is the most widely used file system on Linux.

ext4 Key Features

Property              Value
--------------------  ----------------------------------
Max file size         16 TiB
Max volume size       1 EiB
Max file name length  255 bytes
Max directory depth   Unlimited
Journaling            Supported (ordered mode by default)
Extents               Supported

Extent-Based Allocation

ext4 uses extents instead of traditional indirect block pointers.

Traditional (ext2/ext3):          ext4 Extents:
inode → 12 direct pointers        inode → extent tree
      → single indirect
      → double indirect           extent: (start block, length)
      → triple indirect           one extent represents contiguous blocks

Ex: 100 contiguous blocks         Ex: 100 contiguous blocks
→ 100 pointers needed             → 1 extent (start=50, len=100)
// ext4 extent structure (simplified)
struct ext4_extent {
    uint32_t ee_block;     // logical block number
    uint16_t ee_len;       // number of blocks (max 32768)
    uint16_t ee_start_hi;  // high 16 bits of the physical block
    uint32_t ee_start_lo;  // low 32 bits of the physical block
};

// Extent tree header
struct ext4_extent_header {
    uint16_t eh_magic;     // magic number (0xF30A)
    uint16_t eh_entries;   // number of valid entries
    uint16_t eh_max;       // maximum number of entries
    uint16_t eh_depth;     // tree depth (0 = leaf)
    uint32_t eh_generation;
};

ext4 Delayed Allocation

Traditional approach:                Delayed allocation:
write() call                         write() call
  │                                    │
  ├→ Allocate blocks (immediately)     ├→ Store data in page cache
  ├→ Write to disk                     │    (blocks not yet allocated)
  └→ Done                              ├→ At flush time
                                       │   ├→ Allocate contiguous blocks at once
                                       │   └→ Write to disk
                                       └→ Done

Advantage: Increased chance of contiguous allocation, better performance

4. Apple File System (APFS)

APFS is Apple's file system developed to replace HFS+.

APFS Key Features

┌──────────────────────────────────────────┐
│              APFS Container              │
│                                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐  │
│  │ Volume 1 │ │ Volume 2 │ │ Volume 3 │  │
│  │ (macOS)  │ │  (Data)  │ │(Preboot) │  │
│  └──────────┘ └──────────┘ └──────────┘  │
│                                          │
│  Space sharing: Volumes dynamically      │
│  share free space within the container   │
└──────────────────────────────────────────┘
  • Copy-on-Write (CoW): Preserves the original and writes to a new location when modifying data
  • Snapshots: Point-in-time recovery using CoW
  • Space Sharing: Multiple volumes within a single container share space dynamically
  • Encryption: Native encryption at file/volume level
  • Clones: Instant file/directory copy (copies only metadata, actual copy on modification)

Copy-on-Write Operation

Initial state:
Block A: [Original Data]

On modification (CoW):
Block A: [Original Data]  (preserved)
Block B: [Modified Data]  (written to new location)
Metadata pointer updated to point to Block B

With a snapshot:
Snapshot → Block A (retains original reference)
Current  → Block B (modified version reference)

5. Performance Optimization

Buffer Cache

Application read() request
          │
          ▼
┌──────────────────────────────┐
│ Buffer Cache (Kernel Memory) │
│  [Block 42: data]            │ ← Cache hit: return immediately
│  [Block 88: data]            │
│  [Block 15: data]            │
│  [Block 7:  data]            │
└──────────────┬───────────────┘
               │ Cache miss
               ▼
          ┌─────────┐
          │  Disk   │ → Read and store in cache
          └─────────┘

Read-Ahead

When a sequential access pattern is detected, the next blocks are read into the cache in advance.

Request pattern: Block 1 → Block 2 → Block 3 → ...

When Block 3 is requested:
Kernel detects sequential pattern → also reads blocks 4, 5, 6, 7

┌────┬────┬────┬────┬────┬────┬────┐
│ B1 │ B2 │ B3 │ B4 │ B5 │ B6 │ B7 │
│req │req │req │read│read│read│read│
│done│done│ret │ahd │ahd │ahd │ahd │
└────┴────┴────┴────┴────┴────┴────┘

Free-Behind

During sequential reads, evicts already-used pages from the cache to free space.

Sequential read direction →

Cache state:
[B1] [B2] [B3] [B4] [B5]   ← Normal LRU:
                             old pages remain in cache

[--] [--] [B3] [B4] [B5]   ← Free-Behind applied:
                             released after read → space for new data

Unified Buffer Cache

Past (separate):                 Present (unified):
┌──────────┐ ┌──────────┐      ┌──────────────────────┐
│  Page    │ │ Buffer   │      │  Unified Page Cache  │
│  Cache   │ │ Cache    │      │                      │
│(for mmap)│ │(for read)│      │  Both mmap and       │
│          │ │          │      │  read/write use the  │
└──────────┘ └──────────┘      │  same cache          │
 Same data may be              └──────────────────────┘
 cached twice                   No duplication,
                                consistency guaranteed

6. NFS (Network File System)

NFS is a distributed file system protocol for accessing remote file systems over a network.

NFS Architecture

Client                              Server
┌──────────────────┐            ┌──────────────────┐
│   User Process   │            │                  │
│  open("/mnt/f")  │            │                  │
├──────────────────┤            │                  │
│       VFS        │            │                  │
├──────────────────┤            ├──────────────────┤
│    NFS Client    │ ← RPC/XDR →│    NFS Server    │
│                  │            │  Daemon (nfsd)   │
├──────────────────┤            ├──────────────────┤
│     Network      │            │      Local       │
│    (TCP/UDP)     │            │   File System    │
└──────────────────┘            └──────────────────┘

NFS Mount

# Configure the shared directory on the NFS server
# /etc/exports
/data  192.168.1.0/24(rw,sync,no_subtree_check)

# Start the NFS service
sudo systemctl start nfs-server

# Mount from the client
sudo mount -t nfs server:/data /mnt/nfs

# Add to /etc/fstab for automatic mounting at boot
# server:/data  /mnt/nfs  nfs  defaults  0  0

NFS Stateless Design

NFS v3 was designed as a stateless protocol.

Stateless advantages:
- No client reconnection needed after server restart
- Server doesn't manage per-client state, good scalability

Stateless implementation:
- File handle identifies files
- Each request is independent (includes all needed info like offset)
- Caching handled on the client side

NFS v4 changes:
- Switched to stateful protocol
- Supports advanced features like file locking, delegation
- Enhanced security (Kerberos authentication)

7. Summary

  • Mounting: The process of attaching a file system to the directory tree
  • Block Groups: Structure that improves data locality and reduces seek time
  • ext4: Performance and reliability through extents, delayed allocation, and journaling
  • APFS: Modern features including Copy-on-Write, snapshots, and space sharing
  • Performance Optimization: Buffer cache, read-ahead, free-behind, unified page cache
  • NFS: RPC-based network file system, v3 stateless / v4 stateful

Quiz: File-System Internals

Q1. Why are ext4 extents better than traditional indirect block pointers?

A1. Extents represent contiguous physical blocks as (start block, length), so instead of thousands of individual block pointers for large files, they can be managed with a small number of extents. This reduces metadata overhead and improves performance by reading contiguous blocks in a single I/O operation.

Q2. What are the advantages and disadvantages of Copy-on-Write (CoW)?

A2. Advantages: Original data is preserved, making snapshot implementation easy and maintaining consistency on crashes. File cloning is also O(1) fast. Disadvantages: Modifications always write to new locations, which can turn sequential writes into random writes and may cause fragmentation.

Q3. Why was NFS v3 stateless, and why did v4 switch to stateful?

A3. The stateless design of v3 makes server restart after crashes simple and provides good scalability. However, it makes advanced features like file locking and cache consistency difficult to implement. v4 maintains state to support richer features such as file locking, delegation, and security.