[OS Concepts] 19. Networks and Distributed Systems

Networks and Distributed Systems

In modern computing environments, multiple networked computers work together to get a job done. This article covers networking fundamentals, the concepts and challenges of distributed systems, distributed file systems, MapReduce, cloud storage, and distributed coordination mechanisms.

1. Network Types

┌───────────────────────────────────────────────┐
│         Network Classification by Scale        │
│                                               │
│  PAN (Personal Area Network):                 │
│  ┌───────────────────┐                        │
│  │ Bluetooth, NFC     │  ~ meters             │
│  └───────────────────┘                        │
│                                               │
│  LAN (Local Area Network):                    │
│  ┌───────────────────┐                        │
│  │ Ethernet, Wi-Fi   │  Building/campus       │
│  │ 1~10 Gbps         │                        │
│  └───────────────────┘                        │
│                                               │
│  MAN (Metropolitan Area Network):             │
│  ┌───────────────────┐                        │
│  │ City-scale network│                        │
│  └───────────────────┘                        │
│                                               │
│  WAN (Wide Area Network):                     │
│  ┌───────────────────┐                        │
│  │ Cross-country/     │  Internet             │
│  │ continent          │                        │
│  └───────────────────┘                        │
└───────────────────────────────────────────────┘

2. TCP/IP Model

┌─────────────────────────────────────────────┐
│           TCP/IP 4-Layer Model              │
│                                             │
│  ┌─────────────────┐                        │
│  │ Application     │  HTTP, FTP, DNS, SSH   │
│  │                 │  Socket API            │
│  ├─────────────────┤                        │
│  │ Transport       │  TCP (reliable, ordered)│
│  │                 │  UDP (fast, connless)   │
│  ├─────────────────┤                        │
│  │ Internet        │  IP (addressing,routing)│
│  │                 │  ICMP, ARP             │
│  ├─────────────────┤                        │
│  │ Link            │  Ethernet, Wi-Fi       │
│  │                 │  MAC addresses         │
│  └─────────────────┘                        │
└─────────────────────────────────────────────┘

TCP vs UDP

Property	TCP	UDP
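Connection	Connection-oriented (3-way handshake)	Connectionless
Reliability	Guaranteed (retransmission, in-order delivery)	Not guaranteed (datagrams may be lost, duplicated, or reordered)
Speed	Higher latency and overhead (handshake, acknowledgments)	Lower latency and overhead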
Use cases	Web, email, file transfer	Streaming, DNS, games
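
To make the "connectionless" row concrete, here is a minimal Java sketch that sends a single UDP datagram over the loopback interface. The class name `UdpLoopbackDemo` and the helper `sendAndReceive` are made up for this illustration; only the standard `java.net` classes are used. Note that there is no handshake and no accept step: the sender simply addresses each datagram individually.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one UDP datagram sent and received on loopback.
final class UdpLoopbackDemo {

    // Sends `message` as one datagram and returns the text that arrived.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so never block forever.
            receiver.setSoTimeout(2000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // No connect/accept: the datagram carries its own destination.
            sender.send(new DatagramPacket(
                payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buf = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            receiver.receive(packet);
            return new String(packet.getData(), packet.getOffset(),
                              packet.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("Hello over UDP"));
    }
}
```

On a real network the `receive` call could time out because the datagram was dropped; handling that (retry, ignore, or report) is left to the application, which is exactly the work TCP does on the application's behalf.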

Socket Programming Example

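// TCP server example (C): accepts one client, reads one message, replies
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void) {
    int server_fd, client_fd;
    struct sockaddr_in server_addr, client_addr;
    socklen_t client_len = sizeof(client_addr);
    char buffer[1024];

    // 1. Create a TCP (stream) socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) {
        perror("socket");
        return 1;
    }

    // 2. Bind the socket to port 8080 on all local interfaces
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    server_addr.sin_port = htons(8080);
    if (bind(server_fd, (struct sockaddr *)&server_addr,
             sizeof(server_addr)) < 0) {
        perror("bind");
        close(server_fd);
        return 1;
    }

    // 3. Start listening (up to 5 pending connections)
    if (listen(server_fd, 5) < 0) {
        perror("listen");
        close(server_fd);
        return 1;
    }
    printf("Server listening (port 8080)...\n");

    // 4. Accept one connection (blocks until a client connects)
    client_fd = accept(server_fd,
                       (struct sockaddr *)&client_addr,
                       &client_len);
    if (client_fd < 0) {
        perror("accept");
        close(server_fd);
        return 1;
    }

    // 5. Receive data; recv returns -1 on error, so check before indexing
    ssize_t bytes = recv(client_fd, buffer, sizeof(buffer) - 1, 0);
    if (bytes >= 0) {
        buffer[bytes] = '\0';
        printf("Received: %s\n", buffer);

        // 6. Send the response
        const char *response = "Hello from server!";
        send(client_fd, response, strlen(response), 0);
    } else {
        perror("recv");
    }

    // 7. Close the connection
    close(client_fd);
    close(server_fd);
    return 0;
}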

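// TCP client example (C): connects to port 8080, sends one message, prints the reply
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void) {
    int sock;
    struct sockaddr_in server_addr;
    char buffer[1024];

    // 1. Create a TCP (stream) socket
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket");
        return 1;
    }

    // 2. Connect to the server at 127.0.0.1:8080
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &server_addr.sin_addr);
    if (connect(sock, (struct sockaddr *)&server_addr,
                sizeof(server_addr)) < 0) {
        perror("connect");
        close(sock);
        return 1;
    }

    // 3. Send a message
    const char *message = "Hello from client!";
    send(sock, message, strlen(message), 0);

    // 4. Receive the reply; recv returns -1 on error, so check before indexing
    ssize_t bytes = recv(sock, buffer, sizeof(buffer) - 1, 0);
    if (bytes >= 0) {
        buffer[bytes] = '\0';
        printf("Server response: %s\n", buffer);
    } else {
        perror("recv");
    }

    // 5. Close the connection
    close(sock);
    return 0;
}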

3. Distributed Systems

A distributed system is a set of independent computers that cooperate over a network and appear to their users as a single system.

Advantages and Challenges

Advantages:                        Challenges:
┌──────────────────────────┐      ┌──────────────────────────┐
│ Resource sharing          │      │ Network failures          │
│ - Share storage, compute │      │ - Partial failure handling│
│                          │      │                          │
│ Scalability              │      │ Consistency               │
│ - Horizontal scaling     │      │ - Data replica consistency│
│                          │      │                          │
│ Availability             │      │ Security                  │
│ - Auto-recovery on       │      │ - Network communication  │
│   node failure           │      │   security               │
│                          │      │                          │
│ Performance              │      │ Clock synchronization    │
│ - Parallel processing    │      │ - Event ordering in      │
│   throughput             │      │   distributed env        │
└──────────────────────────┘      └──────────────────────────┘

CAP Theorem

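The CAP theorem states that a distributed system cannot guarantee Consistency, Availability, and Partition Tolerance all at the same time. In practice this means that while the network is partitioned, a system has to give up either consistency or availability.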

        Consistency (C)
        ╱       ╲
       ╱   CA    ╲
      ╱ (single   ╲
     ╱   server)    ╲
    ╱                 ╲
Availability (A) ── Partition Tolerance (P)
         AP / CP

Since network partitions are inevitable:
- CP system: Consistency first (e.g., ZooKeeper, HBase)
  → Rejects some requests during partition
- AP system: Availability first (e.g., Cassandra, DynamoDB)
  → Allows inconsistency during partition, converges later
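
The CP/AP difference can be sketched with a toy model of two replicas holding a single value. Everything here is hypothetical (the `CapSketch` class, the `partitioned` flag, and the "highest version wins" reconciliation are assumptions made for illustration, not how ZooKeeper or Cassandra are implemented), but it shows the two behaviors listed above: a CP replica rejects writes while partitioned, an AP replica accepts them and converges later.

```java
// Toy model of the CP vs AP choice during a partition (illustration only).
final class CapSketch {
    enum Mode { CP, AP }

    // One replica of a single-value register.
    static final class Replica {
        final Mode mode;
        boolean partitioned = false; // true = cannot reach the peer
        String value = "v0";
        long version = 0;

        Replica(Mode mode) { this.mode = mode; }

        // Returns true if the write was accepted.
        boolean write(String newValue, Replica peer) {
            if (partitioned) {
                if (mode == Mode.CP) {
                    return false;        // CP: refuse rather than diverge
                }
                value = newValue;        // AP: accept locally, replicas diverge
                version++;
                return true;
            }
            value = newValue;            // healthy network: replicate synchronously
            version++;
            peer.value = value;
            peer.version = version;
            return true;
        }

        // After the partition heals: the higher version wins
        // (one simple policy among many possible ones).
        void reconcile(Replica peer) {
            Replica newer = version >= peer.version ? this : peer;
            String winningValue = newer.value;
            long winningVersion = newer.version;
            value = winningValue;
            peer.value = winningValue;
            version = winningVersion;
            peer.version = winningVersion;
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica(Mode.AP);
        Replica b = new Replica(Mode.AP);
        a.partitioned = true;
        b.partitioned = true;
        a.write("x", b);
        System.out.println("during partition: a=" + a.value + ", b=" + b.value);
        a.partitioned = false;
        b.partitioned = false;
        a.reconcile(b);
        System.out.println("after healing:    a=" + a.value + ", b=" + b.value);
    }
}
```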

4. Distributed File Systems

NFS (Network File System)

Client A ──→ ┐
Client B ──→ ├→ NFS Server ──→ Local Disk
Client C ──→ ┘      │
                    └→ /exports/shared/
                        ├── project/
                        ├── data/
                        └── config/

Features:
- Access remote files as local (transparency)
- Largely POSIX-compatible interface (the usual open/read/write calls work on remote files)
- Client-side caching for performance

GFS (Google File System)

Google's distributed file system, designed to store very large files on clusters of commodity machines.

┌──────────────────────────────────────────┐
│            GFS Architecture              │
│                                          │
│  ┌───────────────┐                       │
│  │  GFS Master   │  ← Metadata mgmt     │
│  │ (cf. NameNode)│  (file → chunk map)  │
│  └───────┬───────┘                       │
│          │                               │
│  ┌───────┼───────┬───────┐               │
│  ▼       ▼       ▼       ▼               │
│ ┌────┐ ┌────┐ ┌────┐ ┌────┐             │
│ │Chunk│ │Chunk│ │Chunk│ │Chunk│           │
│ │Srv 1│ │Srv 2│ │Srv 3│ │Srv 4│           │
│ └────┘ └────┘ └────┘ └────┘             │
│                                          │
│ Files → split into 64MB chunks           │
│ Each chunk replicated to 3 servers       │
└──────────────────────────────────────────┘
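
Because files are split into fixed-size chunks, a byte offset within a file maps to a chunk by simple integer arithmetic. The sketch below assumes the 64 MB chunk size from the diagram; the class and method names (`ChunkMath`, `chunkIndex`, and so on) are invented for this example. In GFS, the client computes the chunk index itself and then asks the master which chunk servers hold that chunk.

```java
// Offset-to-chunk arithmetic for fixed-size chunks (illustrative names).
final class ChunkMath {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB, as in the diagram

    // Index of the chunk that contains the given byte offset (0-based).
    static long chunkIndex(long byteOffset) {
        return byteOffset / CHUNK_SIZE;
    }

    // Position of the byte inside its chunk.
    static long offsetInChunk(long byteOffset) {
        return byteOffset % CHUNK_SIZE;
    }

    // Number of chunks needed for a file of the given size (rounds up).
    static long chunkCount(long fileSize) {
        return (fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long offset = 200 * mb;
        System.out.println("chunk index:     " + chunkIndex(offset));
        System.out.println("offset in chunk: " + offsetInChunk(offset) / mb + " MB");
        System.out.println("chunks for file: " + chunkCount(200 * mb));
    }
}
```

For example, byte offset 200 MB falls into chunk 3 (the fourth chunk, since chunks 0 to 2 cover the first 192 MB) at 8 MB into that chunk, and a 200 MB file needs 4 chunks.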

HDFS (Hadoop Distributed File System)

An open-source distributed file system modeled on the design of GFS and optimized for large-scale batch data processing.

┌─────────────────────────────────────────┐
│            HDFS Architecture            │
│                                         │
│  ┌────────────┐                         │
│  │ NameNode   │ ← Metadata (in memory) │
│  │            │   Filename→block map    │
│  │            │   Block→DataNode map    │
│  └─────┬──────┘                         │
│        │ Heartbeat + Block Report       │
│  ┌─────┼──────┬──────────┐              │
│  ▼     ▼      ▼          ▼              │
│ ┌────┐┌────┐┌────┐    ┌────┐            │
│ │DN 1││DN 2││DN 3│    │DN 4│            │
│ │Blk1││Blk1││Blk2│    │Blk2│            │
│ │Blk3││Blk2││Blk3│    │Blk1│            │
│ └────┘└────┘└────┘    └────┘            │
│                                         │
│ Default block size: 128MB               │
│ Replication factor: 3 (default)         │
└─────────────────────────────────────────┘
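
The heartbeat and block report arrows in the diagram are what let the NameNode notice missing replicas. The following is a simplified sketch of that bookkeeping, not Hadoop code: `BlockReportSketch`, its methods, and the string IDs are hypothetical, and the block placement is copied from the diagram above. It counts live replicas per block and lists the blocks that fall below the replication factor of 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified NameNode-style bookkeeping over block reports (illustration only).
final class BlockReportSketch {
    static final int REPLICATION_FACTOR = 3;

    // Block placement taken from the diagram: DataNode -> blocks it reported.
    static Map<String, List<String>> diagramReports() {
        return Map.of(
            "DN1", List.of("Blk1", "Blk3"),
            "DN2", List.of("Blk1", "Blk2"),
            "DN3", List.of("Blk2", "Blk3"),
            "DN4", List.of("Blk2", "Blk1"));
    }

    // Counts replicas per block, considering only DataNodes that still send heartbeats.
    static Map<String, Integer> replicaCounts(Map<String, List<String>> blockReports,
                                              Set<String> liveNodes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> report : blockReports.entrySet()) {
            if (!liveNodes.contains(report.getKey())) {
                continue; // no heartbeat: treat this node's replicas as lost
            }
            for (String block : report.getValue()) {
                counts.merge(block, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Blocks with fewer live replicas than the replication factor, sorted by name.
    // Blocks with no live replica at all do not show up in this sketch.
    static List<String> underReplicated(Map<String, List<String>> blockReports,
                                        Set<String> liveNodes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : replicaCounts(blockReports, liveNodes).entrySet()) {
            if (e.getValue() < REPLICATION_FACTOR) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(underReplicated(diagramReports(), Set.of("DN1", "DN2", "DN3", "DN4")));
        System.out.println(underReplicated(diagramReports(), Set.of("DN1", "DN2", "DN3")));
    }
}
```

With all four DataNodes alive, only Blk3 is under-replicated (the diagram shows it on DN1 and DN3 only). If DN4 stops sending heartbeats, Blk1 and Blk2 each drop to two replicas as well, and a real NameNode would schedule new copies on other DataNodes.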

// HDFS Java API example: write a file, read it back, then delete it
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FSDataInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Reads the cluster address from the Hadoop configuration on the classpath
        Configuration conf = new Configuration();
        Path path = new Path("/user/data/output.txt");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Write the file as plain UTF-8 text. (writeUTF() would prepend a
            // 2-byte length header that a line-based reader prints as garbage.)
            try (FSDataOutputStream out = fs.create(path)) {
                out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back line by line
            try (FSDataInputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }

            // Delete the file (false = not recursive)
            fs.delete(path, false);
        }
    }
}

5. MapReduce

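A programming model for processing large data sets in parallel across a cluster: a Map function turns each input record into key-value pairs, and a Reduce function aggregates all values that share a key.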

Input data (3 splits):

Split 1: "hello world hello"
Split 2: "world foo hello"
Split 3: "bar foo hello"

     │           │           │
     ▼           ▼           ▼
  ┌──────┐   ┌──────┐   ┌──────┐
  │Map 1 │   │Map 2 │   │Map 3 │   ← Map phase
  └──┬───┘   └──┬───┘   └──┬───┘
     │           │           │
  (hello,1)   (world,1)   (bar,1)
  (world,1)   (foo,1)     (foo,1)
  (hello,1)   (hello,1)   (hello,1)
     │           │           │
     └─────┬─────┘───────────┘
           │
     Shuffle & Sort (group by key)
           │
  ┌────────┴────────────────────────┐
  │ bar:    [1]                     │
  │ foo:    [1, 1]                  │
  │ hello:  [1, 1, 1, 1]           │
  │ world:  [1, 1]                 │
  └────────┬────────────────────────┘
           │
     ┌─────┴──────┐
     ▼            ▼
  ┌──────┐   ┌──────┐
  │Red. 1│   │Red. 2│   ← Reduce phase
  └──┬───┘   └──┬───┘
     │           │
  bar:1       hello:4
  foo:2       world:2
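
The data flow in the diagram can be reproduced in a few lines of plain Java without Hadoop. This is a single-process simulation under that assumption: `WordCountSim` and its methods are names chosen for this sketch, and the "shuffle" is just an in-memory sorted map rather than a network transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory simulation of the word-count data flow (not Hadoop code).
final class WordCountSim {

    // Map phase: one split in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : split.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                pairs.add(Map.entry(w, 1));
            }
        }
        return pairs;
    }

    // Shuffle & sort: group all values by key, keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: sum the values of one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce over all splits.
    static SortedMap<String, Integer> run(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(map(split));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapOutputs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The three splits from the diagram
        System.out.println(run(List.of(
            "hello world hello", "world foo hello", "bar foo hello")));
        // prints {bar=1, foo=2, hello=4, world=2}
    }
}
```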

// WordCount MapReduce example (Java)
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import java.io.IOException;

// Mapper and Reducer are nested classes so the example compiles as one file
public class WordCount {

    // Mapper: emits a (word, 1) pair for every word in the input line
    public static class WordCountMapper
        extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (w.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                word.set(w);
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the values that share the same key
    public static class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context)
            throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

6. Cloud Storage

Services for storing and managing large-scale data in cloud environments.

┌────────────────────────────────────────────────┐
│       Cloud Storage Service Comparison          │
│                                                │
│  AWS:                                          │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐          │
│  │ S3   │ │ EBS  │ │ EFS  │ │Glacier│         │
│  │Object│ │Block │ │File  │ │Archive│         │
│  └──────┘ └──────┘ └──────┘ └──────┘          │
│                                                │
│  GCP:                                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐       │
│  │ Cloud    │ │Persistent│ │ Filestore│       │
│  │ Storage  │ │ Disk     │ │          │       │
│  └──────────┘ └──────────┘ └──────────┘       │
│                                                │
│  Storage Classes (S3):                         │
│  Standard → IA → Glacier → Deep Archive       │
│  (Hot)     (Warm) (Cold)   (Archive)          │
│  High access ───────────→ Low access          │
│  High storage cost ──────→ Low storage cost   │
└────────────────────────────────────────────────┘

7. Distributed Coordination

Consensus Algorithm - Raft

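Raft is a consensus algorithm that lets the nodes of a distributed system agree on the same sequence of state changes, even when some nodes fail. One node acts as the leader and sends periodic heartbeats. If a follower stops receiving them, it times out, becomes a candidate, and takes over as the new leader once a majority of the nodes (counting its own vote) have voted for it.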

Raft Leader Election:

Node A (Follower)  Node B (Leader)  Node C (Follower)
    │                   │                │
    │  ←── Heartbeat ── │ ── Heartbeat → │
    │                   │                │
    │                   X (failure)      │
    │                   │                │
    │  (timeout)        │                │
    │  → Candidate!     │                │
    │                   │                │
    │ ── RequestVote ────────────────→   │
    │ ←──────────────── Vote ────────   │
    │                                    │
    │  → New Leader!                     │
    │                                    │
    │ ── AppendEntries (Heartbeat) ──→   │
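
The voting step in the diagram can be sketched as follows. This is a deliberately reduced model, not a full Raft implementation: the `RaftVoteSketch` class and its members are invented for the example, and the check that a candidate's log is at least as up to date as the voter's is left out. It does keep two core rules: a node grants at most one vote per term, and a candidate needs a majority of the whole cluster.

```java
import java.util.List;

// Reduced model of Raft leader election (log comparison omitted).
final class RaftVoteSketch {

    static final class Node {
        final String id;
        long currentTerm = 0;
        String votedFor = null; // candidate this node voted for in currentTerm

        Node(String id) { this.id = id; }

        // Handles a RequestVote call; grants at most one vote per term.
        boolean handleRequestVote(long candidateTerm, String candidateId) {
            if (candidateTerm < currentTerm) {
                return false;               // stale candidate
            }
            if (candidateTerm > currentTerm) {
                currentTerm = candidateTerm; // newer term: forget the old vote
                votedFor = null;
            }
            if (votedFor == null || votedFor.equals(candidateId)) {
                votedFor = candidateId;
                return true;
            }
            return false;                    // already voted for someone else
        }
    }

    // The candidate starts a new term, votes for itself, and asks the peers it
    // can reach. Returns true if it collected a majority of the full cluster.
    static boolean runElection(Node candidate, List<Node> reachablePeers, int clusterSize) {
        candidate.currentTerm++;
        candidate.votedFor = candidate.id;
        int votes = 1; // own vote
        for (Node peer : reachablePeers) {
            if (peer.handleRequestVote(candidate.currentTerm, candidate.id)) {
                votes++;
            }
        }
        return votes >= clusterSize / 2 + 1;
    }

    public static void main(String[] args) {
        // Scenario from the diagram: B has failed, A asks C for its vote.
        Node a = new Node("A");
        Node c = new Node("C");
        System.out.println("A elected: " + runElection(a, List.of(c), 3));
    }
}
```

In the three-node scenario above, A's own vote plus C's vote gives 2 of 3, which is a majority. If A could reach nobody, it would stay at 1 of 3 and could not become leader, which is what prevents two leaders on opposite sides of a partition.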

ZooKeeper

A coordination service for distributed applications, commonly used for shared configuration, distributed locks, and group membership.

┌──────────────────────────────────────┐
│        ZooKeeper Ensemble            │
│                                      │
│  ┌────────┐ ┌────────┐ ┌────────┐   │
│  │Server 1│ │Server 2│ │Server 3│   │
│  │(Leader)│ │(Follow.)│ │(Follow.)│  │
│  └────────┘ └────────┘ └────────┘   │
│                                      │
│  ZNode tree structure:              │
│  /                                   │
│  ├── /config                         │
│  │   ├── /config/database            │
│  │   └── /config/cache               │
│  ├── /locks                          │
│  │   └── /locks/resource1            │
│  └── /members                        │
│      ├── /members/node1              │
│      └── /members/node2              │
└──────────────────────────────────────┘

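// ZooKeeper distributed lock example (Java, simplified sketch)
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import java.util.Collections;
import java.util.List;

public class DistributedLock {
    private final ZooKeeper zk;
    private final String lockPath;
    private String myNode; // full path of the znode created by this client

    public DistributedLock(ZooKeeper zk, String resource) {
        this.zk = zk;
        this.lockPath = "/locks/" + resource;
    }

    public void lock() throws KeeperException, InterruptedException {
        // Create an ephemeral sequential znode under the lock path.
        // The parent znode (lockPath) must already exist.
        myNode = zk.create(
            lockPath + "/lock-",
            new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL
        );
        String myName = myNode.substring(myNode.lastIndexOf('/') + 1);

        while (true) {
            // The znode with the smallest sequence number owns the lock
            List<String> children = zk.getChildren(lockPath, false);
            String smallest = Collections.min(children);

            if (myName.equals(smallest)) {
                return; // lock acquired
            }
            // Simplification: poll. A production lock would instead set a
            // watch on the znode just before its own and wait for its removal.
            Thread.sleep(100);
        }
    }

    public void unlock() throws KeeperException, InterruptedException {
        // Delete only this client's znode, never the shared parent
        if (myNode != null) {
            zk.delete(myNode, -1);
            myNode = null;
        }
    }
}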
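
The lock example polls the child list every 100 ms. The usual refinement is for each waiting client to watch only the znode immediately ahead of its own, so that releasing the lock wakes exactly one waiter. The helper below sketches just that ordering logic on plain strings, without a ZooKeeper connection; `LockQueueSketch` and `nodeToWatch` are hypothetical names. It relies on the fact that ZooKeeper appends a zero-padded 10-digit counter to sequential znode names, so sorting the names lexicographically matches creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Pure ordering logic for a ZooKeeper-style lock queue (no ZooKeeper calls).
final class LockQueueSketch {

    // Returns the child name this client should watch, or empty if the
    // client's own znode is first in line and therefore holds the lock.
    static Optional<String> nodeToWatch(List<String> children, String myName) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // zero-padded suffixes sort in creation order
        int myIndex = sorted.indexOf(myName);
        if (myIndex < 0) {
            throw new IllegalArgumentException("own znode not found: " + myName);
        }
        return myIndex == 0
            ? Optional.empty()
            : Optional.of(sorted.get(myIndex - 1));
    }

    public static void main(String[] args) {
        List<String> children = List.of(
            "lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(nodeToWatch(children, "lock-0000000001"));
        System.out.println(nodeToWatch(children, "lock-0000000003"));
    }
}
```

Here the client that owns `lock-0000000001` holds the lock, while the owner of `lock-0000000003` would watch `lock-0000000002` and re-check the queue when that znode disappears.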

8. Summary

Networks: TCP/IP 4-layer model, TCP (reliability) vs UDP (speed)
Distributed Systems: Benefits of resource sharing and scalability, CAP theorem trade-offs
Distributed File Systems: NFS (traditional), GFS/HDFS (large-scale data)
MapReduce: Map (distributed processing) + Reduce (aggregation) for parallel data processing
Cloud Storage: Object/block/file storage, cost optimization through tiering
Distributed Coordination: Raft consensus algorithm, ZooKeeper coordination service

Quiz: Networks and Distributed Systems

Q1. What is the difference between CP and AP systems in the CAP theorem?

A1. CP systems (e.g., ZooKeeper) maintain consistency during network partitions by rejecting some requests, sacrificing availability. AP systems (e.g., Cassandra) respond to all requests even during partitions, but data across nodes may be temporarily inconsistent, eventually converging (eventual consistency).

Q2. Why does HDFS replicate data blocks to 3 copies?

A2. Triple replication balances fault tolerance against storage cost and performance. A block stays readable as long as at least one of its three replicas is on a live DataNode, and read requests can be spread across the replicas to increase throughput. HDFS's default placement policy also spreads the replicas across more than one rack to guard against rack-level failures.

Q3. What is the role of the Shuffle phase in MapReduce?

A3. The Shuffle phase sorts and groups Map output by key and delivers all values with the same key to a single Reducer. Because this moves intermediate data across the network, it is often the most expensive phase of a MapReduce job and a key target for performance optimization.