Chaos and Order

https://www.youngju.dev/blog
Slowly and correctly. AI Researcher & DevOps Engineer Youngju's tech blog. GPU/CUDA, LLM, MLOps, Kubernetes AI workloads, distributed training, and data engineering.
ko
fjvbn2003@gmail.com (Youngju Kim)
Sat, 16 May 2026 00:00:00 GMT

Computer Vision Frameworks 2026 - OpenCV 4, MediaPipe, Detectron2, YOLO v11, MMDetection, SAM 2, Grounding DINO Deep Dive
https://www.youngju.dev/blog/culture/2026-05-16-computer-vision-frameworks-2026-opencv-4-mediapipe-detectron2-yolo-v11-mmdetection-sam-2-grounding-dino-deep-dive.en
The 2026 computer vision stack is no longer about "touching pixels". OpenCV 4.10 has made ONNX inference table stakes, MediaPipe Studio reduces mobile real-time pipelines to one line, Ultralytics' YOLO v11 bundles NAS, segmentation, and pose estimation into a single model, SAM 2 tracks video masks in real time, and Grounding DINO 1.6 plus Florence-2 have made "drawing boxes from text" the standard for open-vocabulary detection. This article walks through the 2026 CV stack end to end in one breath: OpenCV, MediaPipe, Detectron2, the YOLO family, MMDetection, SAM 2, Grounding DINO, Florence-2, YOLO-World, VLMs (GPT-4o, Claude 3.5, Gemini 2.0, Qwen2-VL, InternVL 2.5), 3D vision (DUSt3R, MASt3R, VGGT), Depth Anything v3, DINOv3, and embedded inference (ONNX Runtime, TensorRT, OpenVINO, CoreML).
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
computer-vision, opencv, mediapipe, detectron2, yolo, mmdetection, sam, grounding-dino, vlm, segmentation, english

Computer Vision Frameworks 2026 Complete Guide - OpenCV 4, MediaPipe, Detectron2, YOLO v11, MMDetection, SAM 2, Grounding DINO: A Thorough Explanation
https://www.youngju.dev/blog/culture/2026-05-16-computer-vision-frameworks-2026-opencv-4-mediapipe-detectron2-yolo-v11-mmdetection-sam-2-grounding-dino-deep-dive.ja
The 2026 computer vision stack is no longer "the job of touching pixels". OpenCV 4.10 has adopted ONNX inference as a basic feature, MediaPipe Studio has shrunk mobile real-time pipelines to one line, Ultralytics' YOLO v11 bundles NAS, segmentation, and pose estimation into one model, SAM 2 tracks video masks in real time, and Grounding DINO 1.6 and Florence-2 have made open-vocabulary detection that "draws boxes from text" the standard. This article organizes the entire 2026 CV stack in one breath: OpenCV, MediaPipe, Detectron2, the YOLO family, MMDetection, SAM 2, Grounding DINO, Florence-2, YOLO-World, VLMs (GPT-4o, Claude 3.5, Gemini 2.0, Qwen2-VL, InternVL 2.5), 3D vision (DUSt3R, MASt3R, VGGT), Depth Anything v3, DINOv3, and embedded inference (ONNX Runtime, TensorRT, OpenVINO, CoreML).
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
computer-vision, opencv, mediapipe, detectron2, yolo, mmdetection, sam, grounding-dino, vlm, segmentation, japanese

Computer Vision Frameworks 2026 Complete Guide - OpenCV 4, MediaPipe, Detectron2, YOLO v11, MMDetection, SAM 2, Grounding DINO In-Depth Analysis
https://www.youngju.dev/blog/culture/2026-05-16-computer-vision-frameworks-2026-opencv-4-mediapipe-detectron2-yolo-v11-mmdetection-sam-2-grounding-dino-deep-dive
The 2026 computer vision stack is no longer "a matter of touching pixels". OpenCV 4.10 has taken on ONNX inference as a fundamental, MediaPipe Studio has cut mobile real-time pipelines down to one line, Ultralytics' YOLO v11 bundles NAS, segmentation, and pose estimation into one model, SAM 2 tracks video masks in real time, and Grounding DINO 1.6 and Florence-2 have made open-vocabulary detection that "draws boxes from text" the standard.
This article covers the entire 2026 computer vision stack in one breath: OpenCV, MediaPipe, Detectron2, the YOLO family, MMDetection, SAM 2, Grounding DINO, Florence-2, YOLO-World, VLMs (GPT-4o, Claude 3.5, Gemini 2.0, Qwen2-VL, InternVL 2.5), 3D vision (DUSt3R, MASt3R, VGGT), Depth Anything v3, DINOv3, and embedded inference (ONNX Runtime, TensorRT, OpenVINO, CoreML).
Sat, 16 May 2026 00:00:00 GMT
fjvbn2003@gmail.com (Youngju Kim)
computer-vision, opencv, mediapipe, detectron2, yolo, mmdetection, sam, grounding-dino, vlm, segmentation