
Robotics & AI Complete Guide: From ROS2 to Humanoid Robots

1. Robotics Fundamentals

Degrees of Freedom (DOF)

A robot's degrees of freedom refers to the number of independent axes along which it can move. A 6-DOF robot arm has the minimum number of degrees needed to represent any arbitrary position and orientation in 3D space. A full-body humanoid robot including fingers can have 50 or more DOF.

DOF   | Expressible Motion               | Examples
1-DOF | Single-axis rotation/translation | Conveyor belt
3-DOF | 3D position only                 | Delta robot
6-DOF | Full position + orientation      | Industrial robot arm
7-DOF | With redundancy                  | KUKA iiwa, Franka Panda

Coordinate Transforms: Homogeneous Transformation Matrix

To represent the position and orientation of a rigid body in 3D space, robotics uses the Homogeneous Transformation Matrix — a 4x4 matrix that encodes both rotation (R) and translation (t) simultaneously.

T = \begin{bmatrix} R_{3\times3} & t_{3\times1} \\ 0_{1\times3} & 1 \end{bmatrix}

Transformations between multiple coordinate frames are chained via matrix multiplication:

T_{base \to end} = T_{base \to 1} \cdot T_{1 \to 2} \cdots T_{n-1 \to end}
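
The chaining above can be sketched in a few lines of NumPy (the 90° rotation and the 1m/2m translations are arbitrary illustration values):

```python
import numpy as np

def make_transform(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Frame 1: rotate 90 degrees about z, then translate 1m along base x
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
T_base_1 = make_transform(Rz, [1.0, 0.0, 0.0])

# Frame 2: pure translation of 2m along frame 1's x axis
T_1_2 = make_transform(np.eye(3), [2.0, 0.0, 0.0])

# Chained transform: base -> 2
T_base_2 = T_base_1 @ T_1_2

# The origin of frame 2, expressed in the base frame
p = T_base_2 @ np.array([0.0, 0.0, 0.0, 1.0])
print(p[:3])  # → [1. 2. 0.]: the 2m step along frame 1's x lands on base y
```

Note that the order matters: matrix multiplication is non-commutative, exactly as frame composition is.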

Forward Kinematics

Forward kinematics computes the position and orientation of the end-effector given the joint angles. Each joint's transformation matrix is multiplied in sequence.

import numpy as np

def dh_transform(a, d, alpha, theta):
    """Compute a Denavit-Hartenberg transformation matrix."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct,    -st*ca,  st*sa,  a*ct],
        [st,     ct*ca, -ct*sa,  a*st],
        [0,      sa,     ca,     d   ],
        [0,      0,      0,      1   ]
    ])

def forward_kinematics(joint_angles, dh_params):
    """
    FK: joint angles -> end-effector pose
    dh_params: [(a, d, alpha, theta_offset), ...]
    """
    T = np.eye(4)
    for i, theta in enumerate(joint_angles):
        a, d, alpha, theta_off = dh_params[i]
        T = T @ dh_transform(a, d, alpha, theta + theta_off)
    return T

# 2-DOF planar arm example (link lengths 1.0m each)
dh_params_2dof = [
    (1.0, 0, 0, 0),
    (1.0, 0, 0, 0),
]
joint_angles = [np.pi/4, np.pi/4]
T_end = forward_kinematics(joint_angles, dh_params_2dof)
print("End-effector position:", T_end[:3, 3])

Inverse Kinematics

Inverse kinematics computes the required joint angles given a desired end-effector pose. There are analytical approaches (closed-form IK) and numerical approaches (Jacobian-based iterative IK).

The core of numerical IK is iterative computation using the Jacobian matrix J:

\Delta \theta = J^+ \cdot \Delta x

where J^+ is the pseudoinverse of the Jacobian. Near singularities, the Jacobian becomes ill-conditioned and the numerical solution can become unstable.
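
A minimal sketch of this iteration for a 2-link planar arm (link lengths, gains, and the damping value are illustrative; the damped pseudoinverse keeps the update stable near singularities):

```python
import numpy as np

L1, L2 = 1.0, 1.0  # link lengths in meters (assumed)

def fk(theta):
    """End-effector (x, y) of a 2-link planar arm."""
    t1, t2 = theta
    return np.array([
        L1*np.cos(t1) + L2*np.cos(t1 + t2),
        L1*np.sin(t1) + L2*np.sin(t1 + t2),
    ])

def jacobian(theta):
    """Analytic 2x2 Jacobian d(x,y)/d(theta1,theta2)."""
    t1, t2 = theta
    return np.array([
        [-L1*np.sin(t1) - L2*np.sin(t1+t2), -L2*np.sin(t1+t2)],
        [ L1*np.cos(t1) + L2*np.cos(t1+t2),  L2*np.cos(t1+t2)],
    ])

def ik(target, theta0, iters=200, damping=0.05):
    """Damped least-squares IK: theta += J^T (J J^T + lambda^2 I)^-1 dx."""
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        dx = target - fk(theta)
        if np.linalg.norm(dx) < 1e-6:
            break
        J = jacobian(theta)
        JJt = J @ J.T + damping**2 * np.eye(2)
        theta += J.T @ np.linalg.solve(JJt, dx)
    return theta

theta = ik(np.array([1.0, 1.0]), [0.1, 0.1])
print(fk(theta))  # should be close to (1.0, 1.0)
```

Which of the two elbow-up/elbow-down solutions the iteration reaches depends on the initial guess theta0, illustrating the multiple-solutions issue discussed above.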

Denavit-Hartenberg Parameters

The DH convention provides a systematic method to assign coordinate frames to each link of a robot. Each joint is described by four parameters: link length a, link twist α, link offset d, and joint angle θ. This compact representation makes it straightforward to implement FK for any serial manipulator.


2. ROS2 (Robot Operating System 2)

ROS1 vs ROS2: Key Differences

ROS2 was redesigned from scratch to overcome ROS1's limitations, adopting DDS (Data Distribution Service) as its communication middleware.

Feature           | ROS1             | ROS2
Communication     | Custom TCP/UDP   | DDS (OMG standard)
Real-time support | Poor             | Supported (real-time executors)
Python version    | Python 2/3       | Python 3 only
Security          | None             | SROS2 (DDS Security)
Multi-robot       | Difficult        | Easy with namespaces
Lifecycle nodes   | None             | LifecycleNode supported
Master node       | roscore required | Decentralized, no roscore

Core Concepts: Nodes, Topics, Services, Actions

  • Node: The fundamental unit of computation. A single process can host one node or several (node composition).
  • Topic: Asynchronous publish/subscribe communication (fire-and-forget).
  • Service: Synchronous request/response communication (blocking call).
  • Action: Asynchronous communication for long-running tasks, with feedback and preemption support.

ROS2 Python Publisher/Subscriber Example

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class TalkerNode(Node):
    def __init__(self):
        super().__init__('talker')
        self.publisher = self.create_publisher(String, 'chatter', 10)
        self.timer = self.create_timer(0.5, self.publish_msg)
        self.count = 0

    def publish_msg(self):
        msg = String()
        msg.data = f'Hello ROS2: {self.count}'
        self.publisher.publish(msg)
        self.get_logger().info(f'Published: {msg.data}')
        self.count += 1

class ListenerNode(Node):
    def __init__(self):
        super().__init__('listener')
        self.subscription = self.create_subscription(
            String, 'chatter', self.callback, 10
        )

    def callback(self, msg):
        self.get_logger().info(f'Received: {msg.data}')

def main():
    rclpy.init()
    node = TalkerNode()  # run ListenerNode the same way in a second process
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Launch File Example

# launch/my_robot.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package='my_robot',
            executable='talker',
            name='talker_node',
            output='screen',
            parameters=[{'use_sim_time': False}],
        ),
        Node(
            package='my_robot',
            executable='listener',
            name='listener_node',
            output='screen',
        ),
    ])

Run with: ros2 launch my_robot my_robot.launch.py


3. Sensor Processing

LiDAR: Point Cloud Processing

LiDAR (Light Detection And Ranging) generates 3D point clouds using laser pulses. It is essential for autonomous navigation and 3D mapping.

import open3d as o3d
import numpy as np

def process_pointcloud(pcd_path):
    # Load point cloud
    pcd = o3d.io.read_point_cloud(pcd_path)

    # Statistical outlier removal
    pcd_clean, _ = pcd.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=2.0
    )

    # Voxel downsampling (reduce resolution)
    pcd_down = pcd_clean.voxel_down_sample(voxel_size=0.05)

    # Normal estimation
    pcd_down.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(
            radius=0.1, max_nn=30
        )
    )

    # RANSAC plane segmentation (ground removal)
    plane_model, inliers = pcd_down.segment_plane(
        distance_threshold=0.01,
        ransac_n=3,
        num_iterations=1000
    )
    [a, b, c, d] = plane_model
    print(f"Ground plane: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0")

    # Extract non-ground objects
    objects_pcd = pcd_down.select_by_index(inliers, invert=True)
    return objects_pcd

IMU and the Kalman Filter

An IMU (Inertial Measurement Unit) contains accelerometers and gyroscopes to estimate orientation and velocity. The Kalman Filter reduces sensor noise and provides accurate state estimation.

Two stages of the Kalman Filter:

  1. Predict: Estimate the current state using the motion model
  2. Update: Correct the prediction using sensor measurements

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(z_k - H\hat{x}_{k|k-1})

where K_k is the Kalman Gain and z_k is the measurement. The EKF (Extended Kalman Filter) handles nonlinear systems via first-order Taylor linearization.
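
The predict/update cycle can be sketched for a 1D constant-state model (the process and measurement noise variances here are arbitrary illustration values):

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.1**2):
    """1D Kalman filter for a constant state observed with noise.
    q: process noise variance, r: measurement noise variance."""
    x, p = 0.0, 1.0  # initial state estimate and covariance
    estimates = []
    for z in measurements:
        # Predict: the state model is constant, so only covariance grows
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain
        k = p / (p + r)          # Kalman gain K_k (H = 1 here)
        x = x + k * (z - x)      # correct with the innovation z - H x
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 1.25
zs = true_value + 0.1 * rng.standard_normal(200)
est = kalman_1d(zs)
print(est[-1])  # converges toward 1.25, with far less jitter than the raw zs
```

The same structure generalizes to IMU fusion: the predict step integrates gyroscope/accelerometer readings through a motion model, and the update step corrects with slower but drift-free measurements.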

Camera: RGB-D and Stereo Vision

RGB-D cameras (e.g., Intel RealSense, Microsoft Azure Kinect) provide color plus depth per pixel. Stereo cameras compute depth via triangulation. Both are used for obstacle detection, semantic mapping, and grasping.
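
For a rectified stereo pair, triangulation reduces to Z = f·B/d (focal length f in pixels, baseline B in meters, disparity d in pixels). A quick sketch, with made-up focal length and baseline values:

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth from disparity for a rectified stereo pair: Z = f * B / d."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    # Clamp tiny disparities to avoid division by zero (far-away points)
    return focal_px * baseline_m / np.maximum(disparity_px, 1e-6)

f = 700.0  # focal length in pixels (assumed)
B = 0.12   # baseline in meters (assumed)
print(stereo_depth([84.0, 42.0, 21.0], f, B))  # → [1. 2. 4.] meters
```

Note the inverse relationship: halving the disparity doubles the depth, which is why stereo depth accuracy degrades quadratically with range.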


4. SLAM (Simultaneous Localization and Mapping)

What is SLAM?

SLAM is the problem of building a map of an unknown environment while simultaneously estimating the robot's location within it. It is a foundational challenge in robotics navigation.

Key SLAM challenges:

  • Data Association: Matching current observations to known landmarks
  • Loop Closure: Recognizing a previously visited place to correct accumulated drift
  • Map Representation: Landmark maps, occupancy grids, topological maps, etc.

EKF-SLAM

EKF-SLAM uses the Extended Kalman Filter, linearizing the nonlinear robot motion and observation models via first-order Taylor expansion.

State vector: robot pose + all landmark positions

x = [x_r, y_r, \theta_r, x_{l1}, y_{l1}, x_{l2}, y_{l2}, \ldots]^T

Computational complexity is O(n^2) in the number of landmarks n, making it unsuitable for large-scale environments.

Graph-Based SLAM

Modern SLAM systems typically use a graph-based formulation. Poses are nodes in a graph, and sensor constraints are edges. Loop closure adds edges between distant nodes, and the entire graph is optimized using nonlinear least squares (e.g., g2o, GTSAM).
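
As a toy illustration of the graph formulation, a 1D pose graph with odometry edges and one loop-closure edge reduces to linear least squares (the measurements are made up; real systems optimize nonlinear SE(2)/SE(3) constraints with g2o or GTSAM):

```python
import numpy as np

# 1D pose graph: poses x0..x3, with x0 anchored at 0.
# Each edge (i, j, m) is a constraint x_j - x_i ≈ m.
edges = [
    (0, 1, 1.05),   # odometry
    (1, 2, 1.02),   # odometry
    (2, 3, 0.97),   # odometry (dead reckoning puts x3 at 3.04)
    (3, 0, -2.90),  # loop closure: the robot measures x3 at 2.90m from start
]

# Build the linear system A x = b over the unknowns [x1, x2, x3]
A = np.zeros((len(edges), 3))
b = np.zeros(len(edges))
for row, (i, j, m) in enumerate(edges):
    if i > 0:
        A[row, i - 1] = -1.0
    if j > 0:
        A[row, j - 1] = 1.0
    b[row] = m

# The least-squares solution spreads the accumulated drift across all edges
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # optimized poses x1, x2, x3
```

The optimized x3 lands between the dead-reckoned 3.04 and the loop-closure 2.90, with the residual distributed over every edge — exactly the drift correction Loop Closure provides.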

Real-World SLAM Libraries

Library      | Method          | Sensors      | Notes
Cartographer | Graph SLAM      | LiDAR, IMU   | Developed by Google, ROS2 support
ORB-SLAM3    | Feature-based   | Camera, IMU  | Real-time mono/stereo/RGB-D
LIO-SAM      | Tightly-coupled | LiDAR + IMU  | Large outdoor environments
RTAB-Map     | Graph SLAM      | RGB-D, LiDAR | Real-time 3D mapping
# Install Cartographer for ROS2 Humble
sudo apt install ros-humble-cartographer-ros

# Run Cartographer with TurtleBot3
ros2 launch turtlebot3_cartographer cartographer.launch.py

5. Motion Planning

Planning Algorithm Overview

Motion planning finds a collision-free path from a start configuration to a goal in the robot's configuration space (C-space).

  • Grid-based: A*, Dijkstra — optimal on discrete grids, but scale poorly to high DOF
  • Sampling-based: RRT, PRM — efficient in high-dimensional C-spaces
  • Optimization-based: CHOMP, STOMP — optimize a cost functional over a trajectory

RRT* Implementation

RRT* (Rapidly-exploring Random Tree Star) provides asymptotic optimality guarantees on top of RRT.

import numpy as np

def rrt_star(start, goal, obstacles, max_iter=1000, step=0.5, goal_radius=0.5):
    """
    Simple 2D RRT* implementation.
    start, goal: np.array([x, y])
    obstacles: [(cx, cy, radius), ...]
    """
    nodes = [start]
    parents = [-1]
    costs = [0.0]
    rewire_radius = 1.5 * step  # neighborhood for parent selection and rewiring

    def is_collision(p1, p2):
        """Check if segment p1-p2 intersects any circular obstacle."""
        for (cx, cy, r) in obstacles:
            center = np.array([cx, cy])
            d = p2 - p1
            t = np.clip(np.dot(center - p1, d) / (np.dot(d, d) + 1e-9), 0, 1)
            closest = p1 + t * d
            if np.linalg.norm(closest - center) < r:
                return True
        return False

    for _ in range(max_iter):
        # Goal-biased sampling
        rand = goal if np.random.random() < 0.05 else np.random.uniform(-10, 10, 2)

        dists = [np.linalg.norm(n - rand) for n in nodes]
        nearest_idx = int(np.argmin(dists))
        nearest = nodes[nearest_idx]

        direction = rand - nearest
        norm = np.linalg.norm(direction)
        if norm < 1e-9:
            continue
        new_node = nearest + step * direction / norm

        if is_collision(nearest, new_node):
            continue

        # RRT* step 1: choose the lowest-cost parent among nearby nodes
        near_idxs = [i for i, n in enumerate(nodes)
                     if np.linalg.norm(n - new_node) < rewire_radius]
        best_idx = nearest_idx
        best_cost = costs[nearest_idx] + np.linalg.norm(new_node - nearest)
        for i in near_idxs:
            c = costs[i] + np.linalg.norm(new_node - nodes[i])
            if c < best_cost and not is_collision(nodes[i], new_node):
                best_idx, best_cost = i, c

        nodes.append(new_node)
        parents.append(best_idx)
        costs.append(best_cost)
        new_idx = len(nodes) - 1

        # RRT* step 2: rewire nearby nodes through new_node if that lowers their cost
        for i in near_idxs:
            c = best_cost + np.linalg.norm(nodes[i] - new_node)
            if c < costs[i] and not is_collision(new_node, nodes[i]):
                parents[i] = new_idx
                costs[i] = c

        if np.linalg.norm(new_node - goal) < goal_radius:
            print(f"Goal reached! Total nodes: {len(nodes)}")
            return nodes, parents

    return nodes, parents

# Example usage
start = np.array([0.0, 0.0])
goal = np.array([9.0, 9.0])
obstacles = [(5.0, 5.0, 1.5), (3.0, 7.0, 1.0)]
nodes, parents = rrt_star(start, goal, obstacles)

MoveIt2 for Robot Arm Planning

MoveIt2 is the standard framework for robot arm motion planning in ROS2. It wraps multiple planning libraries (OMPL, STOMP, CHOMP) under a unified API.

# MoveIt2 Python API example
import rclpy
from moveit.planning import MoveItPy

rclpy.init()
moveit = MoveItPy(node_name='moveit_py_planning')
panda_arm = moveit.get_planning_component('panda_arm')

# Set goal pose
from geometry_msgs.msg import PoseStamped
goal_pose = PoseStamped()
goal_pose.header.frame_id = 'panda_link0'
goal_pose.pose.position.x = 0.5
goal_pose.pose.position.y = 0.0
goal_pose.pose.position.z = 0.5
goal_pose.pose.orientation.w = 1.0

panda_arm.set_goal_state(pose_stamped_msg=goal_pose, pose_link='panda_hand')

# Plan and execute
plan_result = panda_arm.plan()
if plan_result:
    moveit.execute(plan_result.trajectory)

6. Computer Vision for Robotics

Object Detection with YOLO

Real-time object detection is essential for robots operating in dynamic environments. The YOLO (You Only Look Once) family remains the go-to choice for edge deployment.

from ultralytics import YOLO
import cv2

# Load YOLOv11 nano
model = YOLO('yolo11n.pt')

def detect_objects(frame):
    """Detect objects in a camera frame."""
    results = model(frame, conf=0.5)
    annotated = results[0].plot()

    for box in results[0].boxes:
        cls_id = int(box.cls[0])
        label = model.names[cls_id]
        conf = float(box.conf[0])
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        print(f"{label}: {conf:.2f} at ({x1},{y1})-({x2},{y2})")

    return annotated

# Live webcam detection
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    annotated_frame = detect_objects(frame)
    cv2.imshow('Robot Vision', annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

6D Pose Estimation

To grasp an object, a robot needs to estimate its 6D pose — 3D position and 3D orientation. The PnP (Perspective-n-Point) algorithm solves this given 2D-3D point correspondences.

import cv2
import numpy as np

def estimate_pose(object_points, image_points, camera_matrix, dist_coeffs):
    """
    6D pose estimation using PnP.
    object_points: 3D points in object coordinate frame
    image_points: Corresponding 2D image coordinates
    """
    success, rvec, tvec = cv2.solvePnP(
        object_points, image_points,
        camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE
    )
    if success:
        R, _ = cv2.Rodrigues(rvec)
        print(f"Position: {tvec.flatten()}")
        print(f"Rotation matrix:\n{R}")
        return R, tvec
    return None, None

7. Reinforcement Learning for Robot Control

Simulation Environments

Simulator | Developer       | Strengths                    | Use Case
MuJoCo    | Google DeepMind | Accurate physics, fast       | Continuous control research
PyBullet  | Erwin Coumans   | Open-source, Python-friendly | Prototyping
Isaac Gym | NVIDIA          | Massively parallel GPU sim   | Large-scale RL
CARLA     | Intel/UPV       | AD-specific sensors          | Autonomous driving

SAC for Continuous Control

SAC (Soft Actor-Critic) is an off-policy RL algorithm that incorporates entropy regularization, balancing exploration and exploitation automatically.
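
Concretely, SAC maximizes expected return plus a weighted policy-entropy term, where α is the entropy coefficient (auto-tuned when ent_coef='auto' below):

J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]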

import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Train SAC on MuJoCo Ant
env = gym.make('Ant-v4', render_mode=None)
eval_env = gym.make('Ant-v4', render_mode=None)

eval_callback = EvalCallback(
    eval_env,
    best_model_save_path='./logs/best_model',
    log_path='./logs/',
    eval_freq=10000,
    deterministic=True,
    render=False
)

model = SAC(
    'MlpPolicy',
    env,
    verbose=1,
    learning_rate=3e-4,
    buffer_size=1_000_000,
    batch_size=256,
    ent_coef='auto',  # auto-tune entropy coefficient
    gamma=0.99,
    tau=0.005,
    tensorboard_log='./tb_logs/'
)

model.learn(total_timesteps=1_000_000, callback=eval_callback)
model.save('ant_sac_final')

# Inference / deployment
obs, _ = env.reset()
total_reward = 0.0
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        print(f"Episode ended. Total reward: {total_reward:.2f}")
        obs, _ = env.reset()
        total_reward = 0.0

Sim-to-Real Transfer

Bridging the Reality Gap between simulation and the real world is one of the central challenges in robot learning.

  1. Domain Randomization: Randomly vary physical parameters (friction, mass, joint stiffness) in simulation so the policy learns to cope with a wide distribution of environments.
  2. Domain Adaptation: Fine-tune the policy using a small amount of real-world data.
  3. System Identification: Precisely measure physical parameters of the real robot and incorporate them in the simulator.
  4. Randomized Rendering: For vision-based policies, randomize textures, lighting, and camera properties.
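Domain randomization in practice boils down to resampling physical parameters at every episode reset. The sketch below is simulator-agnostic: the parameter names and ranges are illustrative, not tied to any particular engine.

```python
import numpy as np

def sample_physics_params(rng):
    """Sample one randomized set of physical parameters for an episode.
    Names and ranges are illustrative, not tied to a specific simulator."""
    return {
        "friction_coeff":      rng.uniform(0.5, 1.5),    # ground friction
        "link_mass_scale":     rng.uniform(0.8, 1.2),    # +/-20% mass error
        "joint_damping_scale": rng.uniform(0.9, 1.1),
        "action_delay_steps":  int(rng.integers(0, 3)),  # actuator latency
        "obs_noise_std":       rng.uniform(0.0, 0.02),   # sensor noise level
    }

rng = np.random.default_rng(0)
params = sample_physics_params(rng)
print(params)
```

At each episode reset, the sampled values would be written into the simulator (e.g., by editing the MJCF/URDF model or through the simulator's parameter API), so every rollout trains the policy on a slightly different "world".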

8. Autonomous Driving

SAE Autonomy Levels

| Level | Name | Description | Examples |
| --- | --- | --- | --- |
| L0 | No Automation | Driver fully in control | Standard vehicles |
| L1 | Driver Assistance | Single feature assist | ACC, LKAS |
| L2 | Partial Automation | Combined features | Tesla Autopilot |
| L3 | Conditional Automation | Self-driving in specific conditions | Mercedes Drive Pilot |
| L4 | High Automation | Full autonomy in geofenced areas | Waymo One |
| L5 | Full Automation | All conditions, no driver | Not yet realized |

Perception-Prediction-Planning-Control Pipeline

class AutonomousDrivingSystem:
    """Standard autonomous driving pipeline."""

    def __init__(self, perception, prediction, planning, control):
        self.perception = perception
        self.prediction = prediction
        self.planning = planning
        self.control = control
        self.goal = None

    def run_step(self, sensor_data):
        # 1. Perception: detect and track objects, localize ego vehicle
        detected_objects = self.perception.detect(sensor_data)
        ego_state = self.perception.localize(sensor_data)

        # 2. Prediction: forecast future trajectories of other agents
        predicted_trajectories = self.prediction.predict(
            detected_objects, horizon=3.0  # 3-second prediction horizon
        )

        # 3. Planning: generate a safe, comfortable trajectory to goal
        trajectory = self.planning.plan(
            ego_state, predicted_trajectories, goal=self.goal
        )

        # 4. Control: track the planned trajectory
        steering, throttle, brake = self.control.compute(
            ego_state, trajectory
        )

        return steering, throttle, brake

Sensor Fusion: LiDAR + Camera

Modern autonomous vehicles fuse LiDAR point clouds and camera images. LiDAR provides accurate depth; cameras provide rich semantic information. Common approaches include:

  • Early fusion: Project LiDAR points onto image, create a unified representation
  • Late fusion: Run separate detectors, combine outputs with Kalman filtering
  • Deep fusion: End-to-end networks that take both modalities as input
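The core geometric step in early fusion is projecting LiDAR points through the camera's pinhole intrinsics. The sketch below assumes the points have already been transformed into the camera frame (the extrinsic LiDAR-to-camera calibration is applied beforehand); the intrinsic matrix values are illustrative.

```python
import numpy as np

def project_lidar_to_image(points_cam, K, image_size):
    """points_cam: (N, 3) LiDAR points already in the camera frame.
    Returns (M, 2) pixel coordinates of points that fall inside the image."""
    pts = points_cam[points_cam[:, 2] > 1e-6]  # drop points behind the camera
    uv = (K @ pts.T).T                         # pinhole projection to homogeneous pixels
    uv = uv[:, :2] / uv[:, 2:3]                # perspective divide
    w, h = image_size
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[ok]

# Illustrative intrinsics (fx = fy = 600, principal point at image center)
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
points = np.array([[0.0, 0.0, 10.0],   # straight ahead -> image center
                   [1.0, 0.0, 10.0],   # 1 m to the right at 10 m depth
                   [0.0, 0.0, -5.0]])  # behind the camera -> discarded
pixels = project_lidar_to_image(points, K, (640, 480))
print(pixels)  # -> (320, 240) and (380, 240)
```

Each surviving pixel can then be tagged with its LiDAR depth (or, conversely, each point painted with the image's semantic label) to build the unified representation.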

9. Humanoid Robots in 2026

The humanoid robot market is growing rapidly in 2026. Here is a snapshot of the major players.

Humanoid Robot Comparison

| Robot | Company | Height / Weight | Key Features |
| --- | --- | --- | --- |
| Atlas (Electric) | Boston Dynamics | 1.5m / 89kg | Electric actuators, advanced mobility |
| Optimus Gen 3 | Tesla | 1.73m / 57kg | 22-DOF hands, FSD AI stack |
| Figure 02 | Figure AI | 1.7m / 60kg | OpenAI collaboration, voice understanding |
| Unitree H1 / G1 | Unitree Robotics | 1.8m / 1.3m | Cost-effective, open-source friendly |
| 1X NEO Beta | 1X Technologies | 1.65m / 30kg | Lightweight, soft actuators |

Foundation Models for Robotics

RT-2 (Robotics Transformer 2), developed by Google DeepMind, is a vision-language-action (VLA) model that directly applies knowledge learned from web-scale data to robot control. It understands natural language commands and generalizes to novel objects.

OpenVLA is an open-source VLA model that researchers can fine-tune on their own robot data.

# OpenVLA inference example (conceptual)
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

model_name = "openvla/openvla-7b"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    trust_remote_code=True
)

# Generate robot action from vision + language
image = get_camera_image()   # placeholder: grab a frame from the robot's wrist camera
instruction = "pick up the red block and place it in the box"

inputs = processor(instruction, image).to('cuda:0', dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key='bridge_orig', do_sample=False)
# action: 7-DoF delta [dx, dy, dz, droll, dpitch, dyaw, gripper]
robot.execute_action(action)  # placeholder: forward the action to the robot driver

Embedded AI and On-Device Inference

Humanoid robots must make decisions in real time without relying on the cloud. Key hardware platforms include NVIDIA Jetson AGX Orin (275 TOPS) and Qualcomm Robotics RB5.

Key techniques for efficient on-device inference:

  • Quantization: INT8 / FP16 reduces model size and latency
  • Pruning: Remove redundant weights to reduce computation
  • TensorRT / ONNX Runtime: Hardware-optimized inference engines
  • Distillation: Train a smaller student model from a large teacher
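To make the quantization bullet concrete, here is the arithmetic of symmetric per-tensor INT8 quantization in plain NumPy. This is a sketch of the idea only; production engines such as TensorRT perform this (plus calibration and per-channel scaling) internally.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    scale = np.max(np.abs(w)) / 127.0  # map the largest-magnitude weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximate FP32 tensor from the INT8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)  # toy FP32 layer

q, scale = quantize_int8(w)
err = np.max(np.abs(dequantize(q, scale) - w))  # round-trip error <= scale / 2
print(f"max abs error: {err:.6f}, scale: {scale:.6f}")
```

The INT8 tensor is 4x smaller than FP32 (1 byte vs. 4 per weight), and integer matrix multiplies run much faster on NPUs and Jetson-class hardware, at the cost of the small rounding error shown above.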

10. Quiz: Robotics & AI Core Concepts

Q1. Why is Loop Closure critical in SLAM?

Answer: It allows the robot to recognize a previously visited place and correct the accumulated pose estimation error.

Explanation: During extended navigation, small sensor and odometry errors accumulate over time, causing the robot's estimated trajectory to drift significantly from reality. When a loop closure is detected (e.g., through feature matching or place recognition), a graph optimization step (using tools like g2o or GTSAM) redistributes this accumulated error across all past poses, dramatically improving map consistency.

Q2. What are the primary advantages of ROS2 using DDS over ROS1?

Answer: Standardized middleware, real-time support, flexible multi-robot networking, no single point of failure (no roscore), and built-in security (SROS2).

Explanation: ROS1 depended on a single roscore process, which was a single point of failure. ROS2's DDS-based architecture is fully decentralized — nodes discover each other automatically without a broker. QoS (Quality of Service) policies allow fine-grained control over communication reliability, latency, and history. SROS2 adds TLS-based encryption and authentication.

Q3. What role does the entropy term play in the SAC algorithm?

Answer: It encourages the policy to remain stochastic (exploratory) by including entropy as part of the reward signal, preventing premature convergence to suboptimal deterministic behaviors.

Explanation: SAC's objective is to maximize the sum of expected return and policy entropy. A higher entropy coefficient (alpha) encourages more exploration. When set to 'auto', SAC adjusts alpha dynamically to maintain a target entropy level. This mechanism helps escape local optima and produces more robust policies that generalize better to unseen situations.

Q4. Why is inverse kinematics fundamentally harder than forward kinematics?

Answer: IK is a nonlinear system of equations that may have no solution, a unique solution, or infinitely many solutions, whereas FK always produces a unique result.

Explanation: FK is a straightforward composition of transformation matrices — given joint angles, there is exactly one end-effector pose. IK inverts this mapping: for a 7-DOF redundant manipulator, infinitely many joint configurations reach the same end-effector pose (kinematic redundancy). At workspace boundaries, no solution may exist. Near singularities, the Jacobian becomes rank-deficient, causing numerical instability in iterative solvers.

Q5. Why does Domain Randomization work for Sim-to-Real transfer?

Answer: By training across a wide distribution of simulated physical parameters, the policy learns to be robust to the variability found in the real world.

Explanation: A policy trained with fixed simulation parameters overfits to those specific conditions and fails when physical properties in reality differ (different friction, actuator delays, sensor noise). Domain Randomization treats the real world as just another sample from the training distribution. OpenAI's Dexterous Hand and NVIDIA's quadruped locomotion work are landmark demonstrations of this approach, achieving remarkable real-world performance purely through massively randomized simulation.


References