- Authors

- Name
- Youngju Kim
- @fjvbn20031
File-System Implementation
After examining the file system's user interface, we now look at how the operating system implements the file system behind that interface. This article covers on-disk data structures, in-memory data structures, allocation methods, free space management, and more.
1. File System Layer Hierarchy
┌──────────────────────────────────────────┐
│ Application (User Space) │
├──────────────────────────────────────────┤
│ Logical File System │
│ (File name → inode translation, │
│ protection checks) │
├──────────────────────────────────────────┤
│ File Organization Module │
│ (Logical block → physical block mapping)│
├──────────────────────────────────────────┤
│ Basic File System │
│ (Issues block I/O requests) │
├──────────────────────────────────────────┤
│ I/O Control │
│ (Device drivers, interrupt handlers) │
├──────────────────────────────────────────┤
│ Storage Device (Disk/SSD) │
└──────────────────────────────────────────┘
2. On-Disk Structures
File System Disk Layout
┌────────┬───────────┬──────────┬──────────┬─────────────────┐
│ Boot │ Super │ Free │ inode │ Data Blocks │
│ Block │ Block │ Space │ Table │ │
│ │ │ Mgmt │ │ │
│ Boot │ Meta │ Bitmap/ │ File │ Actual file │
│ loader │ data │ list │ meta │ data │
└────────┴───────────┴──────────┴──────────┴─────────────────┘
| Structure | Role |
|---|---|
| Boot Block | Code needed for booting (boot loader) |
| Super Block | Overall file system info (block count, inode count, etc.) |
| Free Space Mgmt | Tracks available blocks (bitmap or linked list) |
| inode Table | Metadata and block pointers for each file |
| Data Blocks | Actual file/directory data |
inode Structure
┌─────────────────────────────────────┐
│ inode Structure │
├─────────────────────────────────────┤
│ File type, permissions (mode) │
│ Owner (uid, gid) │
│ Size │
│ Timestamps (atime, mtime, ctime) │
│ Link count │
│ Block count │
├─────────────────────────────────────┤
│ Direct block pointers [0-11] → data │
│ Single indirect pointer → ptr block │
│ Double indirect pointer → ptr^2 │
│ Triple indirect pointer → ptr^3 │
└─────────────────────────────────────┘
// 간단화된 inode 구조체 (ext2 기반)
struct inode {
uint16_t mode; // 파일 유형 + 권한
uint16_t uid; // 소유자 ID
uint32_t size; // 파일 크기
uint32_t atime; // 접근 시간
uint32_t mtime; // 수정 시간
uint32_t ctime; // 상태 변경 시간
uint16_t gid; // 그룹 ID
uint16_t links_count; // 하드 링크 수
uint32_t blocks; // 할당된 블록 수
uint32_t direct[12]; // 직접 블록 포인터
uint32_t single_indirect; // 단일 간접 포인터
uint32_t double_indirect; // 이중 간접 포인터
uint32_t triple_indirect; // 삼중 간접 포인터
};
3. In-Memory Structures
The file system caches frequently used information in memory for performance.
┌───────────────────────────────────────────────┐
│ Kernel Memory │
│ │
│ ┌─────────────────┐ ┌───────────────────┐ │
│ │ Mount Table │ │ Directory Cache │ │
│ │ (FS list) │ │ (recent paths) │ │
│ └─────────────────┘ └───────────────────┘ │
│ │
│ ┌─────────────────┐ ┌───────────────────┐ │
│ │ System-wide │ │ Per-process │ │
│ │ Open File Table │ │ Open File Table │ │
│ └─────────────────┘ └───────────────────┘ │
│ │
│ ┌─────────────────┐ │
│ │ Buffer Cache │ ← Disk block cache │
│ └─────────────────┘ │
└───────────────────────────────────────────────┘
4. VFS (Virtual File System)
VFS is an abstraction layer that unifies multiple file system types under a single interface.
User Process
open() read() write()
│
┌──────────┴──────────┐
│ VFS │
│ (Virtual File │
│ System) │
│ vnode interface │
└──┬──────┬──────┬─────┘
│ │ │
┌──┴──┐┌──┴──┐┌──┴──┐
│ext4 ││NFS ││FAT │
│ ││ ││ │
└──┬──┘└──┬──┘└──┬──┘
│ │ │
Local Network USB
disk server drive
VFS Key Objects
// VFS 추상화의 주요 객체 (Linux 기반 의사 코드)
// 1. 슈퍼블록 객체: 파일 시스템 인스턴스 정보
struct super_block {
struct list_head s_list; // 슈퍼블록 목록
dev_t s_dev; // 장치 식별자
unsigned long s_blocksize; // 블록 크기
struct super_operations *s_op; // 연산 테이블
// ...
};
// 2. inode 객체: 개별 파일 정보
struct inode {
umode_t i_mode; // 파일 유형 + 권한
uid_t i_uid; // 소유자
loff_t i_size; // 파일 크기
struct inode_operations *i_op; // inode 연산
struct file_operations *i_fop;// 파일 연산
// ...
};
// 3. dentry 객체: 디렉토리 엔트리 (경로명 컴포넌트)
struct dentry {
struct inode *d_inode; // 관련 inode
struct dentry *d_parent; // 부모 디렉토리
struct qstr d_name; // 엔트리 이름
struct dentry_operations *d_op;
// ...
};
// 4. file 객체: 열린 파일 인스턴스
struct file {
struct dentry *f_dentry; // 관련 dentry
loff_t f_pos; // 현재 오프셋
unsigned int f_flags; // 열기 플래그
struct file_operations *f_op; // 파일 연산
// ...
};
5. Directory Implementation
Linear List
Directory Block:
┌──────────┬────────┬──────────┬────────┬──────────┬────────┐
│ "file1" │ inode │ "file2" │ inode │ "file3" │ inode │
│ │ 23 │ │ 47 │ │ 12 │
└──────────┴────────┴──────────┴────────┴──────────┴────────┘
Search: O(n) - sequential scan from beginning
Advantage: Simple implementation
Disadvantage: Slow for large directories
Hash Table
Hash function: hash("filename") → bucket index
Bucket 0: ["file_a", inode 23] → ["file_d", inode 89] → NULL
Bucket 1: ["file_b", inode 47] → NULL
Bucket 2: NULL
Bucket 3: ["file_c", inode 12] → ["file_e", inode 56] → NULL
...
Search: O(1) average (linked list traversal on collision)
Advantage: Fast lookup
Disadvantage: Fixed size, collision management required
6. Allocation Methods
Determines how file data blocks are arranged on disk.
Contiguous Allocation
Directory Entry:
┌────────┬───────────┬──────┐
│ Name │ Start Blk │ Len │
├────────┼───────────┼──────┤
│ file_a │ 2 │ 3 │
│ file_b │ 7 │ 2 │
│ file_c │ 14 │ 4 │
└────────┴───────────┴──────┘
Disk Blocks:
0 1 [2 3 4] 5 6 [7 8] 9 10 11 12 13 [14 15 16 17]
file_a file_b file_c
Advantages: Efficient for both sequential and direct access, minimizes disk seeks Disadvantages: External fragmentation, difficult to change file size
Linked Allocation
Directory:
┌────────┬──────┬──────┐
│ Name │Start │ End │
├────────┼──────┼──────┤
│ file_a │ 9 │ 1 │
└────────┴──────┴──────┘
Disk Blocks (each block contains next block pointer):
Block 9 → Block 16 → Block 25 → Block 1 → NULL
[data|16] [data|25] [data|1] [data|-1]
Advantages: No external fragmentation, easy to resize files Disadvantages: Inefficient direct access (requires sequential traversal), pointer space overhead
FAT (File Allocation Table)
FAT Table (cached in memory):
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ ← Block number
├─────┼─────┼─────┼─────┼─────┼─────┼─────┤
│ 3 │ EOF │ 5 │ 4 │ EOF │ 1 │ FREE│ ← Next block
└─────┴─────┴─────┴─────┴─────┴─────┴─────┘
File A: start=0 → 0→3→4→EOF (blocks 0, 3, 4)
File B: start=2 → 2→5→1→EOF (blocks 2, 5, 1)
Indexed Allocation
Directory:
┌────────┬─────────────┐
│ Name │ Index Block │
├────────┼─────────────┤
│ file_a │ 19 │
└────────┴─────────────┘
Index Block 19: Data Blocks:
┌─────────────┐ Block 9: [data...]
│ 9 │ ────→ Block 16: [data...]
│ 16 │ ────→ Block 25: [data...]
│ 25 │ ────→ Block 1: [data...]
│ 1 │ ────→
│ -1 (end) │
└─────────────┘
Advantages: Efficient direct access, no external fragmentation Disadvantages: Index block overhead, inefficient for small files
Allocation Method Comparison
| Method | Sequential | Direct | Fragmentation | File Growth |
|---|---|---|---|---|
| Contiguous | Very fast | Fast | External | Difficult |
| Linked | Moderate | Slow | None | Easy |
| Indexed | Moderate | Fast | None | Moderate |
7. Free Space Management
Bitmap (Bit Vector)
Block number: 0 1 2 3 4 5 6 7 8 9 10 11 12
Bitmap: 1 1 0 1 1 0 0 1 0 0 0 1 0
1 = in use, 0 = free
Finding free blocks: search for first 0 bit in bitmap
Advantage: Easy to find contiguous free space
Disadvantage: Bitmap grows large for big disks
(1TB disk, 4KB blocks → 32MB bitmap)
// 비트맵에서 빈 블록 찾기
int find_free_block(uint32_t *bitmap, int total_blocks) {
int words = total_blocks / 32;
for (int i = 0; i < words; i++) {
if (bitmap[i] != 0xFFFFFFFF) { // 모두 사용 중이 아니면
// 첫 번째 0 비트 위치 찾기
for (int j = 0; j < 32; j++) {
if (!(bitmap[i] & (1 << j))) {
return i * 32 + j;
}
}
}
}
return -1; // 빈 블록 없음
}
Linked List
Free Space Head → Block 2 → Block 5 → Block 6 → Block 8 → NULL
Each free block points to the next free block
Advantage: Simple implementation
Disadvantage: Hard to find contiguous free space, traversal cost
TRIM Command
Notifies the SSD of deleted blocks in advance to optimize garbage collection.
Without TRIM: With TRIM:
1. File delete → meta only 1. File delete → meta remove
2. SSD unaware block is free 2. TRIM command notifies SSD
3. On later write: 3. SSD erases block in advance
Read → Erase → Write needed 4. On later write: write directly
(write perf degradation) (write perf maintained)
8. Journaling
A mechanism to maintain file system consistency in case of system crashes.
Journaling Operation Process
Normal write (no journaling):
1. Write data blocks
2. Update inode
3. Update bitmap
→ Crash at step 2 causes inconsistency!
Journaled write:
┌──────────────────────────────────────┐
│ Journal Area (Log) │
│ │
│ 1. Transaction Begin (TxB) │
│ 2. Record metadata changes │
│ 3. Transaction Commit (TxE) │
│ │
│ → Apply to actual location after │
│ commit │
│ → Delete journal after apply │
└──────────────────────────────────────┘
Journaling Modes
| Mode | Journal Contents | Safety | Perf |
|---|---|---|---|
| Journal | Metadata + Data | Highest | Slow |
| Ordered | Metadata only (data written first) | High | Medium |
| Writeback | Metadata only (no ordering guarantee) | Medium | Fast |
// 저널링 트랜잭션 의사 코드
void journaled_write(struct transaction *tx) {
// 1. 저널에 트랜잭션 시작 기록
journal_write_begin(tx);
// 2. 변경할 메타데이터를 저널에 기록
journal_write_metadata(tx, inode_data);
journal_write_metadata(tx, bitmap_data);
// 3. 저널에 트랜잭션 커밋 기록
journal_write_commit(tx);
// 4. 실제 파일 시스템 위치에 적용 (체크포인트)
write_to_actual_location(inode_data);
write_to_actual_location(bitmap_data);
// 5. 저널에서 트랜잭션 제거
journal_free(tx);
}
// 복구 시: 커밋된 트랜잭션은 재실행, 미커밋은 무시
void recover_from_crash() {
for_each_transaction(journal) {
if (transaction_committed(tx))
replay_transaction(tx); // 재실행
else
discard_transaction(tx); // 폐기
}
}
9. Summary
- File System Layers: Logical file system → File organization module → Basic file system → I/O control
- Disk Structures: Boot block, super block, inode table, data blocks
- VFS: Abstracts multiple file systems under a single interface
- Allocation Methods: Contiguous (fast but fragmentation), linked (flexible but slow direct access), indexed (balanced)
- Free Space Management: Bitmap is most common. TRIM optimizes SSDs
- Journaling: Ensures file system consistency through transaction logs
Quiz: File-System Implementation
Q1. What is the role of VFS?
A1. VFS (Virtual File System) abstracts different file systems such as ext4, NFS, and FAT under a unified interface (open, read, write, etc.). This allows applications to access files using the same system calls regardless of the underlying file system type.
Q2. What are the trade-offs between contiguous allocation and indexed allocation?
A2. Contiguous allocation is fast for both sequential and direct access but suffers from external fragmentation and makes file resizing difficult. Indexed allocation has no external fragmentation and makes file growth easy, but requires additional space and I/O for the index block.
Q3. How does the ordered journaling mode work?
A3. In ordered mode, only metadata is journaled, but data blocks are written to disk before the metadata. This prevents a situation where metadata points to data that hasn't been written yet after a crash. It offers better performance than journal mode while maintaining data consistency.