Distributed Storage 2026 Deep Dive: MinIO, SeaweedFS, Ceph, Garage, JuiceFS, OpenEBS, Longhorn, Rook, R2, S3 Express


Prologue — In 2026, "S3 Is the New Disk"

Through the 2010s, the storage answer was simple. **Block (SAN, iSCSI), file (NFS, SMB), object (S3, Swift).** Three clean categories, each owning different workloads.

In 2026, the landscape looks completely different.

- **Object storage has become disk.** AWS shipped S3 Express One Zone (single AZ, single-digit ms latency), and S3 Tables made Iceberg a first-class citizen — making "run a database on top of S3" a serious option. Crunchy Bridge for Analytics keeps Postgres data on S3, and DuckDB treats parquet-on-S3 as a primary interface.

- **Block storage is abstracted through CSI.** On Kubernetes, "mount EBS" really means "the EBS CSI driver provisions a PV," and OpenEBS / Longhorn / Rook now occupy that slot for non-cloud workloads.

- **Egress fees became the new lock-in.** Cloudflare R2 weaponized zero-egress to threaten S3. Backblaze B2 and Wasabi play the same card. AWS responded by effectively freezing S3 Standard pricing and differentiating with Express One Zone and Storage Lens.

At the center of all of this sits **MinIO** — a single binary, the S3 API, the same code everywhere. MinIO has effectively become the reference implementation of the S3 protocol.

This article maps the entire 2026 distributed storage stack — MinIO, SeaweedFS, Ceph, Garage, JuiceFS, OpenEBS, Longhorn, Rook, Portworx, Lightbits, DRBD, GlusterFS, HDFS, R2, B2, Wasabi, Storj, S3 Express, S3 Tables, NHN Cloud, Naver Cloud, KT Cloud, Sakura, IIJ GIO — and the local filesystems beneath: ZFS, btrfs, XFS, ext4, bcachefs, F2FS.

1. The Distributed Storage Map — Object, Block, File

Before looking at any tool, you have to see where each category fits.

| Model | Interface | Representative systems | Consistency | Workloads |
|---|---|---|---|---|
| Object | HTTP REST (S3 API) | MinIO, Ceph RGW, SeaweedFS, Garage, R2, B2, Wasabi, S3 | Strong read-after-write | Photos, backups, data lakes, logs |
| Block | iSCSI, NVMe-oF, CSI | Ceph RBD, OpenEBS Mayastor, Longhorn, Portworx, Lightbits | Strong | Databases, K8s PVs, container root |
| File (POSIX) | NFS, CephFS, FUSE | CephFS, JuiceFS, GlusterFS, LizardFS, S3FS | POSIX | Legacy apps, shared workspaces, HPC |
| Distributed FS (legacy) | HDFS API, MapReduce | HDFS, Hadoop, Alluxio | Write-once, single writer | Big data (declining) |

One-line takeaway: **"Object storage is the new disk. Block and file hide behind the CSI abstraction."**

The deeper insight: object storage is no longer just "where photos go." When S3 Express One Zone brought single-digit-millisecond latency in late 2023, database engines started seriously evaluating patterns like "write the WAL to S3 and query directly on top." Neon, Crunchy Bridge, DuckLake, and Apache Iceberg all push in that direction.

2. MinIO — The Single-Binary S3 Standard

MinIO is a single-binary object store written in Go. It started in 2014 and by 2026 has effectively become the reference implementation of the S3 protocol. The same code runs on Kubernetes, on bare metal, and on a Raspberry Pi cluster.

Key traits: broad S3 API compatibility (most clients migrate by changing only the endpoint and credentials), built-in erasure coding (for example, 4 data + 2 parity shards per stripe), a single binary with no external dependencies (installation takes well under a minute), and the AGPL v3 license (with a commercial track via MinIO Enterprise).
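To make the "4 data + 2 parity" trade-off concrete, here is a minimal Python sketch of the capacity arithmetic behind any k+m erasure-coded layout. The `ErasureSet` class is hypothetical and only models the math; it is not MinIO's API, and it ignores write-quorum rules and metadata overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureSet:
    data_shards: int    # k: shards that hold object data
    parity_shards: int  # m: shards that hold parity

    @property
    def total_shards(self) -> int:
        return self.data_shards + self.parity_shards

    @property
    def storage_efficiency(self) -> float:
        """Fraction of raw capacity that ends up usable."""
        return self.data_shards / self.total_shards

    @property
    def tolerated_failures(self) -> int:
        """Shards that can be lost while objects stay readable."""
        return self.parity_shards

    def usable_capacity(self, raw_per_drive_tib: float) -> float:
        """Usable TiB when every shard sits on a drive of the given size."""
        return raw_per_drive_tib * self.data_shards


ec = ErasureSet(data_shards=4, parity_shards=2)
print(f"efficiency: {ec.storage_efficiency:.0%}")
print(f"usable from 6 x 1 TiB drives: {ec.usable_capacity(1.0):.1f} TiB")
print(f"drive failures tolerated for reads: {ec.tolerated_failures}")
```

Compared with 3x replication, which keeps one third of raw capacity, a 4+2 layout keeps two thirds while still surviving two lost shards.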

Let's look at everything from a docker run to a Kubernetes Operator.

```bash
# MinIO container: single node, 4 disks
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e "MINIO_ROOT_USER=admin" \
  -e "MINIO_ROOT_PASSWORD=changeme123" \
  -v /mnt/data1:/data1 -v /mnt/data2:/data2 \
  -v /mnt/data3:/data3 -v /mnt/data4:/data4 \
  quay.io/minio/minio server /data{1...4} --console-address ":9001"

# Use mc (MinIO Client) to make a bucket
mc alias set local http://localhost:9000 admin changeme123
mc mb local/backups
mc cp ./backup.tar.gz local/backups/
mc ls local/backups
```

On Kubernetes you use the MinIO Operator. Below is the core of a distributed 4-node, 16-disk cluster manifest.

```yaml
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: storage-prod
  namespace: minio
spec:
  pools:
    - servers: 4
      volumesPerServer: 4
      name: pool-0
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Ti
          storageClassName: local-nvme
  mountPath: /export
  requestAutoCert: true
  features:
    bucketDNS: true
    domains:
      console: console.minio.example.com
      minio:
        - s3.minio.example.com
```

MinIO's real strength is simplicity. Unlike Ceph, you don't manage separate OSD, MON, MGR, and MDS daemons. That's why "just use MinIO" became the default when a new S3 backend is needed in 2026.

3. SeaweedFS — Object Storage Inspired by Facebook Haystack

SeaweedFS is an object store inspired by Facebook's Haystack paper. It's in the same category as MinIO but with a different design philosophy.

- **Master/Volume separation** — Volume Servers optimized for small files.

- **WebDAV, S3, and FUSE gateways** — POSIX, S3, and WebDAV simultaneously.

- **Tiered storage** — Hot on SSD, cold automatically migrated to HDD or S3.

- **Hybrid Erasure Coding + Replication** — Replicate hot data, EC the cold.

```bash
# Single node: Master + Volume + Filer + S3 gateway in one
weed server -dir=/data -master.port=9333 -volume.port=8080 \
  -filer -s3 -s3.port=8333

# Add a Volume node on a separate machine
weed volume -dir=/mnt/disk1 -mserver=master.example.com:9333 \
  -port=8080 -max=100

# Upload via S3 API (AWS CLI as-is)
aws --endpoint-url http://localhost:8333 s3 cp ./video.mp4 s3://media/

# FUSE mount: looks like POSIX
weed mount -filer=filer.example.com:8888 -dir=/mnt/seaweed -filer.path=/
```

SeaweedFS shines for workloads with many small files (image thumbnails, log chunks). Volume Servers pack files into a single large container file, so inode pressure disappears.
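The packing idea can be sketched in a few lines of Python. This is an illustration of the Haystack-style layout only, not SeaweedFS code: the `Volume` class and its in-memory `BytesIO` "file" are hypothetical stand-ins for a large on-disk volume file and its index.

```python
import io
from typing import Dict, Tuple


class Volume:
    """Append-only container file plus an in-memory index (hypothetical sketch)."""

    def __init__(self) -> None:
        self._blob = io.BytesIO()                     # stands in for one large file on disk
        self._index: Dict[str, Tuple[int, int]] = {}  # file id -> (offset, size)

    def put(self, file_id: str, payload: bytes) -> None:
        offset = self._blob.seek(0, io.SEEK_END)      # always append at the end
        self._blob.write(payload)
        self._index[file_id] = (offset, len(payload))

    def get(self, file_id: str) -> bytes:
        offset, size = self._index[file_id]           # one lookup, no directory walk
        self._blob.seek(offset)
        return self._blob.read(size)


vol = Volume()
vol.put("thumb-1", b"\x89PNG-small")
vol.put("thumb-2", b"\xff\xd8JPEG-small")
print(vol.get("thumb-2"))
```

Thousands of small objects end up costing the host filesystem a single inode, and a read is one index lookup plus one seek.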

4. Ceph — Still the Enterprise On-Prem Standard

Ceph started as Sage Weil's 2007 PhD thesis and remains the default answer for enterprise on-prem storage in 2026. A single system delivers object, block, and file.

- **RGW (Rados Gateway)** — S3/Swift API.

- **RBD (Rados Block Device)** — distributed block device, mounted directly by QEMU/KVM.

- **CephFS** — POSIX distributed filesystem.

- **CRUSH algorithm** — data placement computed without a central metadata server.
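The point of computed placement is that every client can work out where an object lives from the object name and the cluster map alone. The Python sketch below uses rendezvous (highest-random-weight) hashing as a deliberately simpler stand-in; it is not CRUSH, which additionally handles device weights and failure-domain hierarchies. The `place` function and the node names are illustrative assumptions.

```python
import hashlib
from typing import List


def place(object_name: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` nodes for an object using rendezvous (HRW) hashing."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every client ranks nodes the same way, so no lookup service is needed.
    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
print(place("rbd_data.1234", nodes))
```

Removing a node that was not selected leaves the placement unchanged, which is the property that keeps rebalancing traffic bounded when the cluster changes.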

Core truth: Ceph is powerful but operationally heavy. You manage OSD, MON, MGR, MDS, and RGW daemons. That is why `cephadm` became the standard deployment tool starting with the Octopus release in 2020, and why **Rook** took over on Kubernetes.

```bash
# Bootstrap with cephadm
cephadm bootstrap --mon-ip 10.0.0.10 \
  --initial-dashboard-user admin \
  --initial-dashboard-password changeme123

# Add OSDs: all available disks on each node
ceph orch host add node2 10.0.0.11
ceph orch host add node3 10.0.0.12
ceph orch apply osd --all-available-devices

# Deploy RGW (S3 API)
ceph orch apply rgw default --placement="3 node1 node2 node3"

# Create a pool + user
ceph osd pool create rgw-data 256 256 erasure
radosgw-admin user create --uid=appuser --display-name="App User"

# Create + map an RBD image (the pool must exist and be initialized first)
ceph osd pool create rbd-pool 128 128
rbd pool init rbd-pool
rbd create app-disk --size 10G --pool rbd-pool
rbd map app-disk --pool rbd-pool   # prints the device, e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
```

5. Rook — The Ceph Operator on Kubernetes

Rook is a CNCF Graduated project that wraps Ceph in a Kubernetes Operator. One-line summary: "The standard way to run Ceph on K8s." Rook reads a CephCluster CR (custom resource) and creates the MON, MGR, and OSD pods (each OSD runs as its own Deployment, not as a DaemonSet). Object storage (RGW) and CephFS, including its MDS pods, are declared through separate CRs.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.0
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
  dashboard:
    enabled: true
    ssl: true
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "1"
---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: cephfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: data
      replicated:
        size: 3
  preservePoolsOnDelete: true
  metadataServer:
    activeCount: 2
    activeStandby: true
```

When you need a distributed file system on Kubernetes in 2026, Rook is the most battle-tested option. The catch: you inherit Ceph's underlying complexity.

6. Garage — Deuxfleurs' Lightweight S3

Garage is an S3-compatible object store built by the Deuxfleurs co-op in France. Their motivation: "Ceph is heavy and MinIO distributed mode is heavy. We need something lighter."

Features: written in Rust (memory-safe, low resource use), runs on as few as 3 nodes (works fine on a Raspberry Pi cluster at home), S3-compatible (most S3 SDKs work as-is), CRDT-based metadata (simple partition recovery), AGPL v3.

```bash
# Bootstrap a 3-node cluster
garage node id # prints the node ID
garage layout assign -z dc1 -c 1T <node-id-1>
garage layout assign -z dc1 -c 1T <node-id-2>
garage layout assign -z dc2 -c 1T <node-id-3>
garage layout apply --version 1

# Issue an S3 key
garage key create app-key
garage bucket create photos
garage bucket allow --read --write photos --key app-key

# Use AWS CLI as-is (the documented flag is --endpoint-url)
aws --endpoint-url http://garage.example.com:3900 \
  s3 cp ./photo.jpg s3://photos/
```

Garage's differentiator is optimization for "geographically distributed small clusters." DC1-DC2-DC3 are designed to keep working even when each is on a residential link. That's why the self-hosting community adopted it quickly.

7. JuiceFS, S3FS, Goofys — POSIX over Object Storage

JuiceFS layers a POSIX filesystem over object storage (S3, MinIO, R2, etc.). Metadata lives in Redis, MySQL, or TiKV; data lives in S3.

JuiceFS fits "large file, read-heavy" workloads particularly well — ML training dataset sharing, media editing workflows, backups. Because metadata sits in Redis/TiKV, metadata performance is far better than the underlying object store can offer alone.
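The metadata/data split can be sketched with two dictionaries. This is a toy model under stated assumptions, not JuiceFS internals: `metadata` stands in for Redis/MySQL/TiKV, `objects` stands in for an S3 bucket, the key scheme is invented, and the 4-byte `BLOCK_SIZE` is tiny on purpose.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # bytes, tiny on purpose; real systems use MiB-scale blocks

metadata: Dict[str, List[str]] = {}   # stands in for Redis/MySQL/TiKV
objects: Dict[str, bytes] = {}        # stands in for an S3 bucket


def write_file(path: str, data: bytes) -> None:
    keys = []
    for i in range(0, len(data), BLOCK_SIZE):
        key = f"blocks/{path.strip('/')}/{i // BLOCK_SIZE}"
        objects[key] = data[i:i + BLOCK_SIZE]   # data goes to the object store
        keys.append(key)
    metadata[path] = keys                       # layout goes to the metadata engine


def read_file(path: str) -> bytes:
    return b"".join(objects[key] for key in metadata[path])


def rename(old: str, new: str) -> None:
    # Metadata-only operation: no object is copied or rewritten.
    metadata[new] = metadata.pop(old)


write_file("/a.txt", b"hello world")
rename("/a.txt", "/b.txt")
print(read_file("/b.txt"))
```

Because `rename` touches only the metadata engine, it stays cheap and can be made atomic there, which plain S3 cannot offer (a rename on S3 is a copy plus a delete).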

Adjacent options like `s3fs-fuse`, `goofys`, `geesefs`, and `rclone mount` exist, but among the tools listed here JuiceFS is the one designed for close-to-full POSIX semantics (hard links, atomic rename, fcntl locks). The others implement POSIX only partially, which makes them fine for backup, log collection, and media streaming workloads but unsuitable as database data directories.

```bash
# Format with Redis metadata + S3 data backend
juicefs format \
  --storage s3 \
  --bucket https://my-bucket.s3.us-east-1.amazonaws.com \
  --access-key AKIA... \
  --secret-key SECRET... \
  redis://meta.example.com:6379/1 \
  my-jfs

# Mount
juicefs mount redis://meta.example.com:6379/1 /mnt/jfs

# Use like any filesystem
cp -r /home/user/dataset /mnt/jfs/datasets/
ls -la /mnt/jfs/datasets/

# Quick alternative: mount R2 with rclone
rclone mount r2:my-bucket /mnt/r2 \
  --vfs-cache-mode writes \
  --dir-cache-time 1m \
  --buffer-size 32M
```

8. OpenEBS — Kubernetes-Native Container Attached Storage

OpenEBS positions itself as "Container Attached Storage" for Kubernetes. It abstracts node-local storage through a set of engines and exposes them as PVs.

| Engine | Trait | Use case |
|---|---|---|
| Mayastor | NVMe-oF + SPDK for high performance | DBs, high-IOPS workloads |
| cStor | ZFS-based snapshot/replication | General stateful apps |
| Jiva | Lightweight, Longhorn-like | Dev/test |
| NDM | Node Device Manager: disk discovery | Base for every engine |
| LocalPV | hostpath/lvm/zfs local PV | Single-node, fast IO |

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-3-replicas
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "3"
  protocol: nvmf
  ioTimeout: "30"
  fsType: xfs
allowVolumeExpansion: true
reclaimPolicy: Delete
```

Mayastor uses SPDK (Storage Performance Development Kit) for kernel-bypass IO. As a result, it aims to preserve most of raw NVMe SSD performance at the PV layer; figures above 90% are commonly cited, but measure on your own hardware. The downside: configuring NVMe-oF targets and initiators is complex.

9. Longhorn — Rancher's Distributed Block Storage

Longhorn is a CNCF Incubating, Kubernetes-native distributed block storage system built by Rancher Labs. It continues to be actively developed under SUSE.

- **iSCSI-based** — broad compatibility.

- **Per-volume controller** — each volume gets its own controller pod for isolation.

- **Backups to S3/NFS** — snapshots + external backup as first-class features.

- **DR volume** — asynchronous replication to a different cluster.

```yaml
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: app-data
  namespace: longhorn-system
spec:
  size: "10Gi"
  numberOfReplicas: 3
  frontend: blockdev
  staleReplicaTimeout: 30
  dataLocality: best-effort
  accessMode: rwo
```

Operationally Longhorn is simpler than OpenEBS Mayastor, though Mayastor wins on raw performance. So a common pattern is "Mayastor for DBs, Longhorn for everything else."

10. Portworx, Lightbits, DRBD, and Legacy Options

In the enterprise, these names also enter the shortlist.

- **Portworx (acquired by Pure Storage)** — PX-Store provides K8s-native block and file; PX-Backup handles K8s backup. Commercial but K8s-friendly operationally.

- **Lightbits Labs** — NVMe-oF appliance. Targets EBS-class performance on bare-metal Kubernetes.

- **DRBD (LINBIT)** — "Linux RAID over network" since 1999. Still alive in 2026 as an HA option for stateful workloads.

- **GlusterFS (legacy)** — Red Hat declared EOL, but it still runs in some legacy environments.

- **Hadoop HDFS (legacy)** — once the big-data standard, now squeezed out by object storage + Iceberg.

- **LizardFS** — a MooseFS fork. POSIX distributed FS but with a small community.

Core truth: for a new workload starting in 2026, GlusterFS and HDFS are off the table. They live in "keep the existing system alive" territory.

11. AWS S3 — Standard, Express One Zone, S3 Tables

AWS S3 launched in 2006 and remains the de facto standard for object storage in 2026. Three additions define how it is used now.

1. **S3 Express One Zone** (announced in late 2023): single AZ, directory buckets, single-digit ms latency. Storage costs several times more than Standard, in exchange for latency that AWS describes as up to 10x lower. Used for databases, gaming, real-time analytics.

2. **S3 Tables** (announced in late 2024): Iceberg tables hosted directly by S3. AWS handles compaction and snapshot expiry. Athena, EMR, and Redshift Spectrum query directly.

3. **S3 Storage Lens** (available since 2020): account-wide storage usage dashboards. Priority one for cost optimization.

```bash
# Standard bucket
aws s3 mb s3://my-archive-bucket --region us-east-1

# Express One Zone directory bucket (name format: <base-name>--<az-id>--x-s3)
aws s3api create-bucket \
  --bucket fast-bucket--use1-az4--x-s3 \
  --region us-east-1 \
  --create-bucket-configuration '{"Location":{"Type":"AvailabilityZone","Name":"use1-az4"},"Bucket":{"Type":"Directory","DataRedundancy":"SingleAvailabilityZone"}}'

# S3 Tables (Iceberg); both calls take the table bucket ARN
aws s3tables create-namespace --table-bucket-arn arn:aws:s3tables:... --namespace analytics
aws s3tables create-table --table-bucket-arn arn:aws:s3tables:... --namespace analytics --name events --format ICEBERG

# Enable Object Lock for immutable backups
aws s3api create-bucket --bucket immutable-backups \
  --object-lock-enabled-for-bucket
aws s3api put-object-retention \
  --bucket immutable-backups \
  --key backup-2026-05-19.tar \
  --retention 'Mode=COMPLIANCE,RetainUntilDate=2026-08-19T00:00:00Z'
```

The decisive shift that makes S3 "disk" is the Express One Zone + S3 Tables combination. Crunchy Bridge for Analytics queries data that lives on S3 from Postgres, DuckDB treats parquet-on-S3 as a first-class interface, and Apache Iceberg + S3 Tables has become the new standard for data lakes.

12. Cloudflare R2 — The Zero-Egress Era

Cloudflare R2 went GA in 2022 and plunged a simple knife into the object storage market — zero egress. It is S3-API-compatible, and getting your data out costs zero dollars per GB.

R2's core value is egress cost. For download-heavy workloads — video hosting, photo galleries, ML model distribution — 70-90% savings over S3 are common. R2's limits are around multi-region consistency and a subset of S3 features (some Object Lock modes, certain Versioning semantics). These have steadily narrowed in 2025-2026.
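How large the saving is depends almost entirely on the ratio of egress to stored data. The Python sketch below works through that arithmetic. The per-GB prices are assumed list prices, not figures from this article, and request charges are ignored, so check the current pricing pages before relying on the output.

```python
def monthly_cost(stored_gb: float, egress_gb: float,
                 storage_price: float, egress_price: float) -> float:
    """Storage plus egress only; request charges are left out on purpose."""
    return stored_gb * storage_price + egress_gb * egress_price


# Assumed list prices in USD per GB; verify against current pricing pages.
S3_STORAGE, S3_EGRESS = 0.023, 0.09
R2_STORAGE, R2_EGRESS = 0.015, 0.0

for stored, egress in [(10_000, 1_000), (10_000, 50_000)]:
    s3 = monthly_cost(stored, egress, S3_STORAGE, S3_EGRESS)
    r2 = monthly_cost(stored, egress, R2_STORAGE, R2_EGRESS)
    print(f"stored={stored} GB egress={egress} GB  "
          f"S3=${s3:,.0f}  R2=${r2:,.0f}  savings={1 - r2 / s3:.0%}")
```

With little egress the gap comes mostly from the storage price. Once egress is several times the stored volume, the egress term dominates the S3 bill and the saving grows accordingly.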

```bash
# Create an R2 bucket with wrangler
wrangler r2 bucket create my-bucket

# Use a normal S3 client via the R2 endpoint (the documented flag is --endpoint-url)
aws --endpoint-url https://<account-id>.r2.cloudflarestorage.com \
  s3 cp ./video.mp4 s3://my-bucket/

# Sync S3 -> R2 with rclone
rclone sync s3:my-bucket r2:my-bucket \
  --transfers 32 --checkers 64 --fast-list
```

13. Backblaze B2, Wasabi, Storj, Filebase, Tigris

R2 cleared the road, but several similar object stores existed before it.

- **Backblaze B2**: launched in 2015 at $0.005/GB per month and still among the cheapest per GB (the list price is now around $6/TB per month). Egress through Cloudflare is zero (Bandwidth Alliance). B2 Live Read is strong for large-media streaming.

- **Wasabi Hot Storage** — zero egress, zero API call charges. 90-day minimum storage commitment.

- **Storj DCS** — decentralized storage. Data is erasure-coded into chunks distributed across 80 nodes.

- **Filebase** — IPFS/Sia-style decentralized storage hidden behind an S3 API.

- **Tigris Data**: S3-compatible serverless object storage with global distribution.

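Selection guide: for download-light backups, B2 is usually the cheapest per GB. For download-heavy workloads, pick R2. Storj suits data that tolerates a decentralized, erasure-coded placement model. If strict S3 API fidelity is required, Wasabi is a common pick, but test the exact features you depend on (Object Lock, versioning, multipart uploads) against each provider before committing.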

14. GCP Cloud Storage and Azure Blob — The Hyperscaler Answers

GCP Cloud Storage and Azure Blob share the same category as S3 but make different design choices.

**GCP Cloud Storage:**

- Single global namespace (region/dual-region/multi-region share the same URL pattern).

- Standard / Nearline / Coldline / Archive — 4 tiers.

- Both XML API (S3-compatible) and JSON API.

- Lifecycle rules for automatic tier migration.

**Azure Blob:**

- Hot / Cool / Cold / Archive — 4 tiers.

- Three blob types: Block, Append, Page.

- ADLS Gen2 (Hierarchical Namespace) provides HDFS semantics.

- AzCopy for large-scale migration.

Key differentiation: GCS shines with the global namespace, and Azure Blob shines because ADLS Gen2 has effectively replaced HDFS — Synapse and Databricks treat ADLS Gen2 as a first-class interface.

15. Korean and Japanese Cloud Object Storage

Regional clouds matter too.

**Korea:**

- **NHN Cloud Object Storage** — S3-compatible, multi-region, strong on gaming/media workloads.

- **Naver Cloud Object Storage** — S3-compatible, fits environments with Korean data-sovereignty requirements.

- **KT Cloud Object Storage** — S3-compatible, public-cloud certified.

**Japan:**

- **Sakura Internet Object Storage** — S3-compatible, Tokyo/Ishikari regions.

- **IIJ GIO Object Storage** — enterprise SLAs, Japanese data centers.

- **NTT Communications, Fujitsu Cloud** — telecom-affiliated clouds.

If data-sovereignty rules apply (Korean PIPA, Japanese APPI), regional clouds become primary candidates. Thanks to S3 API compatibility, migration usually only requires changing the endpoint in client code.

16. Local Filesystems — ZFS, btrfs, XFS, ext4, bcachefs, F2FS

Before laying down distributed storage, the local filesystem underneath also matters. The candidate list as of May 2026:

| FS | Trait | Use case |
|---|---|---|
| ext4 | The standard, most battle-tested | Default for Linux servers |
| XFS | Strong with large files and high parallelism | Ceph OSD, large file servers |
| ZFS (OpenZFS) | Snapshots, compression, dedup, checksums | NAS, DBs, OpenEBS cStor backend |
| btrfs | ZFS-like, mainline kernel | Synology NAS, some Fedora |
| bcachefs | Joined mainline in 2024 | Next-gen CoW, early production |
| F2FS | Flash-optimized | Android, mobile |
| LizardFS POSIX | Distributed FS with POSIX | Some media workflows |

ZFS's strength is essentially free snapshots. In 2024 bcachefs landed in the mainline Linux kernel, implementing ZFS-like features (snapshots, checksums, compression, tiered cache) under GPL. As of 2026 it's still "early production" — but it has a strong chance of challenging btrfs over the next 5 years.

```bash
# Create a mirrored ZFS pool + snapshots + remote send
zpool create tank mirror /dev/sdb /dev/sdc
zfs set compression=zstd tank
zfs snapshot tank@daily-$(date +%F)
# @yesterday and @today are placeholders for two existing snapshot names
zfs send -i tank@yesterday tank@today | ssh backup-host zfs receive tank/replica

# Put metadata on NVMe with a special vdev
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
# Create the dataset before enabling dedup on it
zfs create tank/backups
zfs set dedup=on tank/backups
```

17. CSI — How Kubernetes Talks to Storage

Every storage integration in Kubernetes runs over **CSI (Container Storage Interface)**, introduced in 2017, generally available since Kubernetes 1.13, and effectively the single storage interface for K8s in 2026.

A CSI driver typically consists of these containers:

- **csi-provisioner** — creates PVs in response to PVCs.

- **csi-attacher** — handles VolumeAttachment.

- **csi-resizer** — online volume expansion.

- **csi-snapshotter** — snapshots and restore.

- **node-driver-registrar** — node registration.

- **driver** — the provider-specific logic (e.g., ebs.csi.aws.com, rbd.csi.ceph.com).

| Provider | CSI driver |
|---|---|
| AWS | ebs.csi.aws.com, efs.csi.aws.com, s3.csi.aws.com |
| GCP | pd.csi.storage.gke.io, filestore.csi.storage.gke.io |
| Azure | disk.csi.azure.com, file.csi.azure.com, blob.csi.azure.com |
| Ceph | rbd.csi.ceph.com, cephfs.csi.ceph.com |
| OpenEBS | io.openebs.csi-mayastor, local.csi.openebs.io (LVM), zfs.csi.openebs.io |
| Longhorn | driver.longhorn.io |
| MinIO | directpv-min-io (DirectPV, local volumes for MinIO) |

Most K8s clusters use a cloud provider's CSI as the primary, then layer OpenEBS, Longhorn, or Rook to turn bare-metal local disks into PVs.

18. Object Lock, WORM, and the Immutable Backup Paradigm

After ransomware attacks exploded in the mid-2020s, the backup default shifted to "immutable." Object Lock and WORM (Write Once Read Many) became central.

- **AWS S3 Object Lock** — Governance and Compliance modes. Compliance mode is unremovable even by the root user.

- **MinIO Object Lock** — S3-compatible.

- **Ceph RGW Object Lock**: implements the S3 Object Lock API, with support that predates the Reef release.

- **R2 Object Lock** — GA in 2024.

Backup vendors built on this paradigm: Veeam, Rubrik, Cohesity, Druva. Their standard workflow is "write immutable backups to MinIO or S3." Auto-tiering — hot data on NVMe, cold on HDD/tape — became part of the same standard pattern.
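The difference between the two retention modes comes down to who may delete a locked object version before its retention date. The Python sketch below is a simplified decision function: `Retention` and `may_delete_version` are hypothetical names, and legal holds and versioning details are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Retention:
    mode: str                 # "GOVERNANCE" or "COMPLIANCE"
    retain_until: datetime


def may_delete_version(ret: Retention, now: datetime,
                       has_bypass_permission: bool = False) -> bool:
    """Decide whether a locked object version may be deleted (simplified)."""
    if now >= ret.retain_until:
        return True                       # retention period has expired
    if ret.mode == "GOVERNANCE":
        return has_bypass_permission      # privileged callers can override
    return False                          # COMPLIANCE: nobody can, until expiry


lock = Retention("COMPLIANCE", datetime(2026, 8, 19, tzinfo=timezone.utc))
june = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(may_delete_version(lock, june, has_bypass_permission=True))
```

This is why Compliance mode is the usual choice for ransomware defense: stolen administrator credentials are not enough to shorten the retention window.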

19. Tiering, Dedup, Lifecycle, and Multi-Cloud Replication

Automation patterns to reduce storage cost and increase data resilience.

1. **Lifecycle rules** — move to cold tier after N days, to archive after M days, delete after K days. The S3 Lifecycle, GCS Lifecycle, and Azure Blob Management Policy all share this shape.

2. **Data tiering** — hot on NVMe, cold on HDD/tape. ZFS special vdev, OpenZFS ARC/L2ARC, SeaweedFS tiered storage embody this pattern.

3. **Deduplication** — ZFS dedup, Veeam built-in dedup, NetApp ONTAP volume dedup. For backup workloads, 5-30x reductions are common.

4. **Multi-cloud replication**: MinIO `mc mirror` and bucket replication, rclone sync, AWS S3 Replication (Cross-Region, Cross-Account), GCS Object Replication, R2 to S3 via Workers.

Core truth: savings in 2026 come from "how do you automatically move cold data to the cold tier." Leaving everything in S3 Standard is expensive; putting everything in Glacier slows recovery. And cross-region replication has steep transfer costs — watch for cost traps.
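The "N days, M days, K days" shape that the lifecycle systems share can be written as one small function. This Python sketch is provider-neutral; the function name, tier labels, and default thresholds are illustrative assumptions, not any provider's configuration schema.

```python
from typing import Optional


def lifecycle_action(age_days: int, cold_after: int = 30,
                     archive_after: int = 90,
                     delete_after: Optional[int] = 365) -> str:
    """Return the tier an object of this age should be in, or 'delete'."""
    if delete_after is not None and age_days >= delete_after:
        return "delete"
    if age_days >= archive_after:
        return "archive"
    if age_days >= cold_after:
        return "cold"
    return "hot"


for age in (5, 45, 120, 400):
    print(age, lifecycle_action(age))
```

Running a function like this over an inventory of object ages is a quick way to estimate how much data a proposed rule set would move before enabling it.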

20. "S3 Is the New Disk" — Iceberg, DuckDB, Postgres on S3

The biggest paradigm shift between 2024 and 2026. The age of databases running directly on S3.

- **Apache Iceberg + S3 Tables** — a transactional table format that lives on S3. Athena, EMR, Trino, Spark, Snowflake query it directly.

- **DuckDB + parquet on S3** — `read_parquet('s3://...')` is first-class. Local analytics see S3 data directly.

- **Crunchy Bridge for Analytics** — Postgres data on S3, queried via FDW (Foreign Data Wrapper).

- **Neon, DuckLake** — keep WAL/checkpoint on S3 to implement "stateless DB."

- **MotherDuck** — DuckDB as a SaaS, hybrid local/cloud.

In DuckDB, after `INSTALL httpfs; LOAD httpfs;` and setting `s3_region` plus the access keys, a query like `read_parquet('s3://bucket/events/year=2026/*.parquet')` works as a first-class citizen. No ETL — DuckDB analyzes the parquet sitting on S3 directly.

The consequence: the boundary between "the storage layer of a traditional DBMS" and "object storage" is dissolving. NVMe Flash Array vendors (Pure Storage, NetApp ONTAP, Dell PowerStore) are responding by putting "an S3 object interface in front of our NVMe arrays."

21. NVMe Flash Array Vendors — Pure, NetApp, Dell PowerStore

On the enterprise hardware side, the 2026 answer is the **NVMe Flash Array**.

- **Pure Storage FlashArray //X and //XL** — all NVMe, inline dedup/compression. Acquired Portworx to unify with K8s storage.

- **NetApp ONTAP AFF (All-Flash FAS)**: exposes a built-in S3 object interface (generally available since ONTAP 9.8). SnapMirror and SnapVault remain differentiators.

- **Dell PowerStore** — Dell's NVMe-oF array. Container Storage Module exposes K8s CSI.

- **HPE Alletra** — successor to HPE Nimble Storage. AI-driven predictive analytics.

- **VAST Data** — DASE (Disaggregated Shared-Everything) architecture. Strong for AI training datasets.

Core truth: enterprise storage in 2026 must pair "one of Pure/NetApp/Dell/HPE" with "K8s CSI" and "an S3 object interface" to make the shortlist.

22. Operations Checklist — Before Deploying Distributed Storage

What to check before standing it up.

- **Define RPO/RTO** — agree on acceptable data loss window (RPO) and recovery time (RTO) up front.

- **3-2-1 backup rule** — 3 copies, 2 different media, 1 offsite.

- **Enable Object Lock** — backup buckets must be immutable.

- **Encryption** — both at rest (SSE-S3/SSE-KMS) and in transit (TLS).

- **Monitoring** — Prometheus + Grafana on IOPS, throughput, latency, error rate.

- **Backup recovery drill** — at least once a quarter. Don't confuse "having a backup" with "having proven recovery."

- **Cost alerts** — sudden egress spikes are usually a security signal.

- **CSI driver version management** — verify compatibility on each K8s minor upgrade.

- **Back up metadata separately** — JuiceFS and Ceph metadata must be backed up apart from the data.

Core lesson: distributed storage failures usually stem not from a disk failure but from "no metadata backup" or "we never actually tested recovery."
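Parts of the checklist can be checked mechanically. The Python sketch below validates a backup plan against the 3-2-1 rule plus the immutability item. `BackupCopy` and `check_3_2_1` are hypothetical names, and the sketch assumes the production copy counts as one of the three.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    location: str     # e.g. "primary-dc", "aws-us-east-1"
    medium: str       # e.g. "nvme", "object-storage", "tape"
    offsite: bool
    immutable: bool = False


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return a list of violated rules; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable (Object Lock) copy")
    return problems


plan = [
    BackupCopy("primary-dc", "nvme", offsite=False),
    BackupCopy("primary-dc", "object-storage", offsite=False),
    BackupCopy("cloud-bucket", "object-storage", offsite=True, immutable=True),
]
print(check_3_2_1(plan) or "plan passes")
```

A check like this only validates the plan on paper; the quarterly recovery drill is still what proves that the copies can actually be restored.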

23. Decision Matrix — What to Use When

| Situation | Recommendation | Why |
|---|---|---|
| First on-prem S3 backend | MinIO | Single binary, 30-second install |
| Large enterprise, object + block + file | Ceph + Rook | Proven, unified stack |
| Small cluster, self-hosting | Garage | Rust, works on home links |
| Many small files | SeaweedFS | Haystack model, no inode pressure |
| ML training dataset sharing | JuiceFS | POSIX over S3 |
| High-performance K8s PV for DBs | OpenEBS Mayastor | NVMe-oF + SPDK |
| General K8s stateful apps | Longhorn | Simpler operations |
| Egress-heavy workloads | Cloudflare R2 | Zero egress |
| Cold backup storage | Backblaze B2 | Cheapest per GB |
| Data sovereignty (Korea) | NHN Cloud / Naver Cloud | Local data centers |
| Data sovereignty (Japan) | Sakura / IIJ GIO | Japanese data centers |
| Data lake (Iceberg) | AWS S3 + S3 Tables | Direct Athena/EMR integration |
| Ultra-low-latency object | AWS S3 Express One Zone | Single-digit ms |
| Immutable backup | MinIO/S3 + Object Lock | Ransomware defense |
| Multi-cloud replication | rclone / MinIO mirror | Avoid vendor lock-in |

24. Looking Past 2026

- **Object storage as disk, accelerating** — once Iceberg, DuckLake, and Neon standardize, "the DB is an S3 client" becomes the default mental model.

- **The end of the egress war** — Cloudflare R2 will keep eating S3 share, and AWS will eventually play some form of egress price cut.

- **Consolidation of K8s storage** — two of OpenEBS, Longhorn, and Rook will likely merge or converge on a single standard.

- **Mainstream NVMe-oF** — Mayastor and Lightbits pull NVMe-oF into the standard answer for "RDMA-class performance on K8s."

- **bcachefs's rise**: 5 years post-mainline, it will threaten btrfs. ZFS will remain strong but out of tree, because its CDDL license is generally considered incompatible with the kernel's GPL.

- **AI workload storage demands** — training dataset sharing (JuiceFS, Alluxio), checkpoint storage (S3 Express), model distribution (R2) emerge as distinct categories.

The most important question when picking distributed storage in 2026 is no longer "which backend?" It is **"which API surface do you preserve, and which cost curve do you ride?"** And for most teams, the answer is the S3 API and CSI.

References

- MinIO documentation — [https://min.io/docs](https://min.io/docs)

- SeaweedFS — [https://seaweedfs.github.io](https://seaweedfs.github.io)

- Ceph — [https://docs.ceph.com](https://docs.ceph.com), [https://ceph.io](https://ceph.io)

- Garage by Deuxfleurs — [https://garagehq.deuxfleurs.fr](https://garagehq.deuxfleurs.fr)

- JuiceFS — [https://juicefs.com](https://juicefs.com), [https://github.com/juicedata/juicefs](https://github.com/juicedata/juicefs)

- OpenEBS — [https://openebs.io](https://openebs.io), [https://github.com/openebs/openebs](https://github.com/openebs/openebs)

- Longhorn — [https://longhorn.io](https://longhorn.io)

- Rook — [https://rook.io](https://rook.io), [https://github.com/rook/rook](https://github.com/rook/rook)

- Portworx (Pure Storage) — [https://portworx.com](https://portworx.com)

- Lightbits Labs — [https://www.lightbitslabs.com](https://www.lightbitslabs.com)

- DRBD (LINBIT) — [https://linbit.com/drbd](https://linbit.com/drbd)

- Cloudflare R2 — [https://www.cloudflare.com/products/r2](https://www.cloudflare.com/products/r2)

- Backblaze B2 — [https://www.backblaze.com/cloud-storage](https://www.backblaze.com/cloud-storage)

- Wasabi Hot Cloud Storage — [https://wasabi.com](https://wasabi.com)

- Storj DCS — [https://www.storj.io](https://www.storj.io)

- Filebase — [https://filebase.com](https://filebase.com)

- Tigris Data — [https://www.tigrisdata.com](https://www.tigrisdata.com)

- AWS S3 — [https://aws.amazon.com/s3](https://aws.amazon.com/s3), S3 Express One Zone, S3 Tables

- GCP Cloud Storage — [https://cloud.google.com/storage](https://cloud.google.com/storage)

- Azure Blob Storage — [https://learn.microsoft.com/azure/storage/blobs](https://learn.microsoft.com/azure/storage/blobs)

- NHN Cloud Object Storage — [https://www.nhncloud.com/kr/service/storage/object-storage](https://www.nhncloud.com/kr/service/storage/object-storage)

- Naver Cloud Object Storage — [https://www.ncloud.com](https://www.ncloud.com)

- KT Cloud Object Storage — [https://cloud.kt.com](https://cloud.kt.com)

- Sakura Internet Object Storage — [https://manual.sakura.ad.jp/cloud/objectstorage](https://manual.sakura.ad.jp/cloud/objectstorage)

- IIJ GIO Object Storage — [https://www.iij.ad.jp/biz/gio](https://www.iij.ad.jp/biz/gio)

- OpenZFS — [https://openzfs.github.io](https://openzfs.github.io)

- bcachefs — [https://bcachefs.org](https://bcachefs.org)

- Kubernetes CSI — [https://kubernetes-csi.github.io/docs](https://kubernetes-csi.github.io/docs)

- Apache Iceberg — [https://iceberg.apache.org](https://iceberg.apache.org)

- DuckDB — [https://duckdb.org](https://duckdb.org)

- Veeam — [https://www.veeam.com](https://www.veeam.com), Rubrik — [https://www.rubrik.com](https://www.rubrik.com), Cohesity — [https://www.cohesity.com](https://www.cohesity.com), Druva — [https://www.druva.com](https://www.druva.com)
