- Authors
  - Youngju Kim (@fjvbn20031)
- 1. Robotics Fundamentals
- 2. ROS2 (Robot Operating System 2)
- 3. Sensor Processing
- 4. SLAM (Simultaneous Localization and Mapping)
- 5. Motion Planning
- 6. Computer Vision for Robotics
- 7. Reinforcement Learning for Robot Control
- 8. Autonomous Driving
- 9. Humanoid Robots in 2026
- 10. Quiz: Robotics & AI Core Concepts
- References
1. Robotics Fundamentals
Degrees of Freedom (DOF)
A robot's degrees of freedom refers to the number of independent axes along which it can move. A 6-DOF robot arm has the minimum number of degrees needed to represent any arbitrary position and orientation in 3D space. A full-body humanoid robot including fingers can have 50 or more DOF.
| DOF | Expressible Motion | Examples |
|---|---|---|
| 1-DOF | Single-axis rotation/translation | Conveyor belt |
| 3-DOF | 3D position only | Delta robot |
| 6-DOF | Full position + orientation | Industrial robot arm |
| 7-DOF | With redundancy | KUKA iiwa, Franka Panda |
Coordinate Transforms: Homogeneous Transformation Matrix
To represent the position and orientation of a rigid body in 3D space, robotics uses the Homogeneous Transformation Matrix: a 4x4 matrix that encodes both rotation (R, 3x3) and translation (t, 3x1) simultaneously:

T = [ R  t ]
    [ 0  1 ]

Transformations between multiple coordinate frames are chained via matrix multiplication, e.g. T_world→tool = T_world→base · T_base→tool.
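As a minimal sketch of that chaining (the frame names and values are illustrative), two homogeneous transforms can be composed in NumPy:

```python
import numpy as np

def make_transform(yaw, t):
    """4x4 homogeneous transform: rotation about z by `yaw`, translation `t`."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    T[:3, 3] = t
    return T

# T_world_base: robot base in the world frame
# T_base_tool:  tool in the base frame (illustrative values)
T_world_base = make_transform(np.pi / 2, [1.0, 0.0, 0.0])
T_base_tool = make_transform(0.0, [0.5, 0.0, 0.2])

# Chaining: tool pose expressed in the world frame
T_world_tool = T_world_base @ T_base_tool
print("Tool position in world:", T_world_tool[:3, 3])
```

Because each factor is rigid, the product is again a valid homogeneous transform (orthonormal rotation block, bottom row [0, 0, 0, 1]).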
Forward Kinematics
Forward kinematics computes the position and orientation of the end-effector given the joint angles. Each joint's transformation matrix is multiplied in sequence.
import numpy as np

def dh_transform(a, d, alpha, theta):
    """Compute a Denavit-Hartenberg transformation matrix."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st*ca,  st*sa, a*ct],
        [st,  ct*ca, -ct*sa, a*st],
        [ 0,  sa,     ca,    d   ],
        [ 0,  0,      0,     1   ]
    ])

def forward_kinematics(joint_angles, dh_params):
    """
    FK: joint angles -> end-effector pose
    dh_params: [(a, d, alpha, theta_offset), ...]
    """
    T = np.eye(4)
    for i, theta in enumerate(joint_angles):
        a, d, alpha, theta_off = dh_params[i]
        T = T @ dh_transform(a, d, alpha, theta + theta_off)
    return T

# 2-DOF planar arm example (link lengths 1.0m each)
dh_params_2dof = [
    (1.0, 0, 0, 0),
    (1.0, 0, 0, 0),
]
joint_angles = [np.pi/4, np.pi/4]
T_end = forward_kinematics(joint_angles, dh_params_2dof)
print("End-effector position:", T_end[:3, 3])
Inverse Kinematics
Inverse kinematics computes the required joint angles given a desired end-effector pose. There are analytical approaches (closed-form IK) and numerical approaches (Jacobian-based iterative IK).
The core of numerical IK is iterative computation using the Jacobian matrix J(θ):

Δθ = J⁺(θ) · Δx

where J⁺ is the pseudoinverse of the Jacobian and Δx is the end-effector pose error. Near singularities, the Jacobian becomes ill-conditioned and the numerical solution can become unstable.
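The Jacobian-based iteration can be sketched for a planar 2-link arm (link lengths, start guess, and step size are illustrative):

```python
import numpy as np

def fk_2link(q, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar 2-link arm."""
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def jacobian_2link(q, l1=1.0, l2=1.0):
    """Analytic Jacobian of fk_2link with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik_jacobian(target, q0, alpha=0.5, iters=200, tol=1e-6):
    """Iterative IK: q <- q + alpha * pinv(J) @ (target - fk(q))."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fk_2link(q)
        if np.linalg.norm(err) < tol:
            break
        q = q + alpha * np.linalg.pinv(jacobian_2link(q)) @ err
    return q

q_sol = ik_jacobian(np.array([1.0, 1.0]), q0=[0.3, 0.3])
print("Joint solution:", q_sol, "reaches", fk_2link(q_sol))
```

The damping factor alpha trades convergence speed for stability; near a singularity a damped-least-squares pseudoinverse is the usual safeguard.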
Denavit-Hartenberg Parameters
The DH convention provides a systematic method to assign coordinate frames to each link of a robot. Each joint is described by four parameters: link length a, link twist α, link offset d, and joint angle θ. This compact representation makes it straightforward to implement FK for any serial manipulator.
2. ROS2 (Robot Operating System 2)
ROS1 vs ROS2: Key Differences
ROS2 was redesigned from scratch to overcome ROS1's limitations, adopting DDS (Data Distribution Service) as its communication middleware.
| Feature | ROS1 | ROS2 |
|---|---|---|
| Communication | Custom TCP/UDP | DDS (OMG standard) |
| Real-time support | Poor | Designed for real-time use (with an RTOS and DDS QoS) |
| Python version | Python 2/3 | Python 3 only |
| Security | None | SROS2 (TLS/DDS) |
| Multi-robot | Difficult | Easy with namespaces |
| Lifecycle nodes | None | LifecycleNode supported |
| Middleware | roscore required | Decentralized, no roscore |
Core Concepts: Nodes, Topics, Services, Actions
- Node: The fundamental unit of computation. A single process can host one or more nodes.
- Topic: Asynchronous publish/subscribe communication (fire-and-forget).
- Service: Synchronous request/response communication (blocking call).
- Action: Asynchronous communication for long-running tasks, with feedback and preemption support.
ROS2 Python Publisher/Subscriber Example
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class TalkerNode(Node):
    def __init__(self):
        super().__init__('talker')
        self.publisher = self.create_publisher(String, 'chatter', 10)
        self.timer = self.create_timer(0.5, self.publish_msg)
        self.count = 0

    def publish_msg(self):
        msg = String()
        msg.data = f'Hello ROS2: {self.count}'
        self.publisher.publish(msg)
        self.get_logger().info(f'Published: {msg.data}')
        self.count += 1

class ListenerNode(Node):
    def __init__(self):
        super().__init__('listener')
        self.subscription = self.create_subscription(
            String, 'chatter', self.callback, 10
        )

    def callback(self, msg):
        self.get_logger().info(f'Received: {msg.data}')

def main():
    # Spins the talker; run ListenerNode in a separate process,
    # or add both nodes to a MultiThreadedExecutor.
    rclpy.init()
    node = TalkerNode()
    rclpy.spin(node)
    rclpy.shutdown()

if __name__ == '__main__':
    main()
Launch File Example
# launch/my_robot.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    return LaunchDescription([
        Node(
            package='my_robot',
            executable='talker',
            name='talker_node',
            output='screen',
            parameters=[{'use_sim_time': False}],
        ),
        Node(
            package='my_robot',
            executable='listener',
            name='listener_node',
            output='screen',
        ),
    ])
Run with: ros2 launch my_robot my_robot.launch.py
3. Sensor Processing
LiDAR: Point Cloud Processing
LiDAR (Light Detection And Ranging) generates 3D point clouds using laser pulses. It is essential for autonomous navigation and 3D mapping.
import open3d as o3d
import numpy as np

def process_pointcloud(pcd_path):
    # Load point cloud
    pcd = o3d.io.read_point_cloud(pcd_path)

    # Statistical outlier removal
    pcd_clean, _ = pcd.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=2.0
    )

    # Voxel downsampling (reduce resolution)
    pcd_down = pcd_clean.voxel_down_sample(voxel_size=0.05)

    # Normal estimation
    pcd_down.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(
            radius=0.1, max_nn=30
        )
    )

    # RANSAC plane segmentation (ground removal)
    plane_model, inliers = pcd_down.segment_plane(
        distance_threshold=0.01,
        ransac_n=3,
        num_iterations=1000
    )
    [a, b, c, d] = plane_model
    print(f"Ground plane: {a:.2f}x + {b:.2f}y + {c:.2f}z + {d:.2f} = 0")

    # Extract non-ground objects
    objects_pcd = pcd_down.select_by_index(inliers, invert=True)
    return objects_pcd
IMU and the Kalman Filter
An IMU (Inertial Measurement Unit) contains accelerometers and gyroscopes to estimate orientation and velocity. The Kalman Filter reduces sensor noise and provides accurate state estimation.
Two stages of the Kalman Filter:
- Predict: Estimate the current state using the motion model
- Update: Correct the prediction using sensor measurements
The update step combines prediction and measurement as x_k = x_k⁻ + K_k (z_k − H x_k⁻), where K_k is the Kalman gain and z_k is the measurement. The EKF (Extended Kalman Filter) handles nonlinear systems via first-order Taylor linearization.
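The predict/update loop above can be sketched for a scalar state (the noise variances and data here are illustrative):

```python
import numpy as np

def kalman_1d(zs, q=1e-4, r=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant state.
    q: process noise variance, r: measurement noise variance."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: state assumed constant, uncertainty grows by q
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Noisy measurements of a true value of 1.0
rng = np.random.default_rng(0)
zs = 1.0 + rng.normal(0.0, 0.1, size=200)
est = kalman_1d(zs)
print("Last raw measurement:", zs[-1], "filtered:", est[-1])
```

The filtered trace is visibly smoother than the raw measurements while still tracking the true value; the same structure, with vectors and matrices in place of scalars, is what runs inside IMU fusion stacks.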
Camera: RGB-D and Stereo Vision
RGB-D cameras (e.g., Intel RealSense, Microsoft Azure Kinect) provide color plus depth per pixel. Stereo cameras compute depth via triangulation. Both are used for obstacle detection, semantic mapping, and grasping.
4. SLAM (Simultaneous Localization and Mapping)
What is SLAM?
SLAM is the problem of building a map of an unknown environment while simultaneously estimating the robot's location within it. It is a foundational challenge in robotics navigation.
Key SLAM challenges:
- Data Association: Matching current observations to known landmarks
- Loop Closure: Recognizing a previously visited place to correct accumulated drift
- Map Representation: Landmark maps, occupancy grids, topological maps, etc.
EKF-SLAM
EKF-SLAM uses the Extended Kalman Filter, linearizing the nonlinear robot motion and observation models via first-order Taylor expansion.
State vector: robot pose + all landmark positions
Computational complexity is O(n²) in the number of landmarks n, making it unsuitable for large-scale environments.
Graph-Based SLAM
Modern SLAM systems typically use a graph-based formulation. Poses are nodes in a graph, and sensor constraints are edges. Loop closure adds edges between distant nodes, and the entire graph is optimized using nonlinear least squares (e.g., g2o, GTSAM).
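A toy illustration of the idea, not a production solver: a 1D pose graph where odometry edges disagree with a loop-closure edge, optimized by linear least squares (all measurement values are made up):

```python
import numpy as np

# Unknown poses x1, x2, x3 (x0 is fixed at 0 as the anchor).
# Each edge (i, j, z) says: x_j - x_i should equal measurement z.
edges = [
    (0, 1, 1.10),  # odometry
    (1, 2, 0.90),  # odometry
    (2, 3, 1.05),  # odometry
    (0, 3, 3.00),  # loop closure: pose 3 observed from pose 0
]

# Build the linear system A x = b over the unknowns [x1, x2, x3]
A = np.zeros((len(edges), 3))
b = np.zeros(len(edges))
for row, (i, j, z) in enumerate(edges):
    if j > 0:
        A[row, j - 1] = 1.0
    if i > 0:
        A[row, i - 1] = -1.0
    b[row] = z

# Least squares redistributes the loop-closure residual over all poses
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print("Optimized poses:", x)
```

The odometry alone puts pose 3 at 3.05, the loop closure says 3.00; optimization settles in between (3.0125), spreading the error across every edge. Real systems do the same with nonlinear SE(2)/SE(3) constraints and solvers like g2o or GTSAM.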
Real-World SLAM Libraries
| Library | Method | Sensors | Notes |
|---|---|---|---|
| Cartographer | Graph SLAM | LiDAR, IMU | Developed by Google, ROS2 support |
| ORB-SLAM3 | Feature-based | Camera, IMU | Real-time mono/stereo/RGB-D |
| LIO-SAM | Tightly-coupled | LiDAR + IMU | Large outdoor environments |
| RTAB-Map | Graph SLAM | RGB-D, LiDAR | Real-time 3D mapping |
# Install Cartographer for ROS2 Humble
sudo apt install ros-humble-cartographer-ros
# Run Cartographer with TurtleBot3
ros2 launch turtlebot3_cartographer cartographer.launch.py
5. Motion Planning
Planning Algorithm Overview
Motion planning finds a collision-free path from a start configuration to a goal in the robot's configuration space (C-space).
- Grid-based: A*, Dijkstra — optimal on discrete grids, but scale poorly to high DOF
- Sampling-based: RRT, PRM — efficient in high-dimensional C-spaces
- Optimization-based: CHOMP, STOMP — optimize a cost functional over a trajectory
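To make the grid-based family concrete, here is a minimal A* sketch on a small occupancy grid (the map and unit step costs are illustrative):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    counter = 0                        # tie-breaker for the heap
    open_set = [(h(start), counter, start, None)]
    came_from = {}
    g_cost = {start: 0}
    while open_set:
        _, _, cur, parent = heapq.heappop(open_set)
        if cur in came_from:           # already expanded via a better path
            continue
        came_from[cur] = parent
        if cur == goal:                # reconstruct path by walking parents
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0):
                ng = g_cost[cur] + 1
                if ng < g_cost.get(nxt, float('inf')):
                    g_cost[nxt] = ng
                    counter += 1
                    heapq.heappush(open_set, (ng + h(nxt), counter, nxt, cur))
    return None  # no path exists

grid = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print("Path:", path)
```

The Manhattan heuristic is admissible on a 4-connected grid, so the returned path is optimal; the same algorithm on a 7-DOF arm's discretized C-space would blow up combinatorially, which is exactly why sampling-based planners exist.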
RRT* Implementation
RRT* (Rapidly-exploring Random Tree Star) provides asymptotic optimality guarantees on top of RRT.
import numpy as np

def rrt_star(start, goal, obstacles, max_iter=1000, step=0.5,
             goal_radius=0.5, near_radius=1.5):
    """
    Simple 2D RRT* implementation.
    start, goal: np.array([x, y])
    obstacles: [(cx, cy, radius), ...]
    """
    nodes = [start]
    parents = [-1]
    costs = [0.0]

    def is_collision(p1, p2):
        """Check if segment p1-p2 intersects any circular obstacle."""
        for (cx, cy, r) in obstacles:
            center = np.array([cx, cy])
            d = p2 - p1
            t = np.clip(np.dot(center - p1, d) / (np.dot(d, d) + 1e-9), 0, 1)
            closest = p1 + t * d
            if np.linalg.norm(closest - center) < r:
                return True
        return False

    for _ in range(max_iter):
        # Goal-biased sampling: 5% of samples are the goal itself
        rand = goal if np.random.random() < 0.05 else np.random.uniform(-10, 10, 2)
        dists = [np.linalg.norm(np.array(n) - rand) for n in nodes]
        nearest_idx = int(np.argmin(dists))
        nearest = np.array(nodes[nearest_idx])
        direction = rand - nearest
        norm = np.linalg.norm(direction)
        if norm < 1e-9:
            continue
        new_node = nearest + step * direction / norm
        if is_collision(nearest, new_node):
            continue

        # RRT* step 1: choose the lowest-cost parent among nearby nodes
        near_idxs = [i for i, n in enumerate(nodes)
                     if np.linalg.norm(np.array(n) - new_node) < near_radius]
        best_idx = nearest_idx
        best_cost = costs[nearest_idx] + np.linalg.norm(new_node - nearest)
        for i in near_idxs:
            n = np.array(nodes[i])
            c = costs[i] + np.linalg.norm(new_node - n)
            if c < best_cost and not is_collision(n, new_node):
                best_idx, best_cost = i, c
        nodes.append(new_node)
        parents.append(best_idx)
        costs.append(best_cost)
        new_idx = len(nodes) - 1

        # RRT* step 2: rewire nearby nodes through new_node if cheaper
        for i in near_idxs:
            n = np.array(nodes[i])
            c = best_cost + np.linalg.norm(n - new_node)
            if c < costs[i] and not is_collision(new_node, n):
                parents[i] = new_idx
                costs[i] = c

        if np.linalg.norm(new_node - goal) < goal_radius:
            print(f"Goal reached! Total nodes: {len(nodes)}")
            return nodes, parents
    return nodes, parents

# Example usage
start = np.array([0.0, 0.0])
goal = np.array([9.0, 9.0])
obstacles = [(5.0, 5.0, 1.5), (3.0, 7.0, 1.0)]
nodes, parents = rrt_star(start, goal, obstacles)
MoveIt2 for Robot Arm Planning
MoveIt2 is the standard framework for robot arm motion planning in ROS2. It wraps multiple planning libraries (OMPL, STOMP, CHOMP) under a unified API.
# MoveIt2 Python API example
import rclpy
from moveit.planning import MoveItPy
from geometry_msgs.msg import PoseStamped

rclpy.init()
moveit = MoveItPy(node_name='moveit_py_planning')
panda_arm = moveit.get_planning_component('panda_arm')

# Set goal pose
goal_pose = PoseStamped()
goal_pose.header.frame_id = 'panda_link0'
goal_pose.pose.position.x = 0.5
goal_pose.pose.position.y = 0.0
goal_pose.pose.position.z = 0.5
goal_pose.pose.orientation.w = 1.0
panda_arm.set_goal_state(pose_stamped_msg=goal_pose, pose_link='panda_hand')

# Plan and execute
plan_result = panda_arm.plan()
if plan_result:
    moveit.execute(plan_result.trajectory)
6. Computer Vision for Robotics
Object Detection with YOLO
Real-time object detection is essential for robots operating in dynamic environments. The YOLO (You Only Look Once) family remains the go-to choice for edge deployment.
from ultralytics import YOLO
import cv2

# Load YOLOv11 nano
model = YOLO('yolo11n.pt')

def detect_objects(frame):
    """Detect objects in a camera frame."""
    results = model(frame, conf=0.5)
    annotated = results[0].plot()
    for box in results[0].boxes:
        cls_id = int(box.cls[0])
        label = model.names[cls_id]
        conf = float(box.conf[0])
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        print(f"{label}: {conf:.2f} at ({x1},{y1})-({x2},{y2})")
    return annotated

# Live webcam detection
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    annotated_frame = detect_objects(frame)
    cv2.imshow('Robot Vision', annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
6D Pose Estimation
To grasp an object, a robot needs to estimate its 6D pose — 3D position and 3D orientation. The PnP (Perspective-n-Point) algorithm solves this given 2D-3D point correspondences.
import cv2
import numpy as np

def estimate_pose(object_points, image_points, camera_matrix, dist_coeffs):
    """
    6D pose estimation using PnP.
    object_points: 3D points in the object coordinate frame (Nx3)
    image_points: Corresponding 2D image coordinates (Nx2)
    """
    success, rvec, tvec = cv2.solvePnP(
        object_points, image_points,
        camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE
    )
    if success:
        R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
        print(f"Position: {tvec.flatten()}")
        print(f"Rotation matrix:\n{R}")
        return R, tvec
    return None, None
7. Reinforcement Learning for Robot Control
Simulation Environments
| Simulator | Developer | Strengths | Use Case |
|---|---|---|---|
| MuJoCo | Google DeepMind | Accurate physics, fast | Continuous control research |
| PyBullet | Erwin Coumans | Open-source, Python-friendly | Prototyping |
| Isaac Gym | NVIDIA | Massively parallel GPU sim | Large-scale RL |
| CARLA | CVC (Barcelona) / Intel | AD-specific sensors | Autonomous driving |
SAC for Continuous Control
SAC (Soft Actor-Critic) is an off-policy RL algorithm that incorporates entropy regularization, balancing exploration and exploitation automatically.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Train SAC on MuJoCo Ant
env = gym.make('Ant-v4', render_mode=None)
eval_env = gym.make('Ant-v4', render_mode=None)

eval_callback = EvalCallback(
    eval_env,
    best_model_save_path='./logs/best_model',
    log_path='./logs/',
    eval_freq=10000,
    deterministic=True,
    render=False
)

model = SAC(
    'MlpPolicy',
    env,
    verbose=1,
    learning_rate=3e-4,
    buffer_size=1_000_000,
    batch_size=256,
    ent_coef='auto',  # auto-tune entropy coefficient
    gamma=0.99,
    tau=0.005,
    tensorboard_log='./tb_logs/'
)
model.learn(total_timesteps=1_000_000, callback=eval_callback)
model.save('ant_sac_final')

# Inference / deployment (gymnasium step returns terminated and truncated)
obs, _ = env.reset()
total_reward = 0.0
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        print(f"Episode ended. Total reward: {total_reward:.2f}")
        obs, _ = env.reset()
        total_reward = 0.0
Sim-to-Real Transfer
Bridging the Reality Gap between simulation and the real world is one of the central challenges in robot learning.
- Domain Randomization: Randomly vary physical parameters (friction, mass, joint stiffness) in simulation so the policy learns to cope with a wide distribution of environments.
- Domain Adaptation: Fine-tune the policy using a small amount of real-world data.
- System Identification: Precisely measure physical parameters of the real robot and incorporate them in the simulator.
- Randomized Rendering: For vision-based policies, randomize textures, lighting, and camera properties.
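Domain Randomization can be as simple as resampling the simulator's physical parameters at every episode reset. A minimal sketch; the parameter names and ranges below are purely illustrative, not tied to any particular simulator:

```python
import random

def sample_dynamics(rng=None):
    """Draw one randomized set of physics parameters for an episode."""
    rng = rng or random.Random()
    return {
        'friction': rng.uniform(0.5, 1.5),
        'mass_scale': rng.uniform(0.8, 1.2),      # multiply nominal link masses
        'motor_delay_s': rng.uniform(0.0, 0.02),  # actuation latency
        'obs_noise_std': rng.uniform(0.0, 0.05),  # additive sensor noise
    }

# At each episode reset: apply the sampled parameters to the simulator,
# then roll out the policy as usual.
params = sample_dynamics(random.Random(42))
print(params)
```

Because the real robot's (unknown) parameters are likely to fall inside these training ranges, the deployed policy treats reality as just one more sampled environment.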
8. Autonomous Driving
SAE Autonomy Levels
| Level | Name | Description | Examples |
|---|---|---|---|
| L0 | No Automation | Driver fully in control | Standard vehicles |
| L1 | Driver Assistance | Single feature assist | ACC, LKAS |
| L2 | Partial Automation | Combined features | Tesla Autopilot |
| L3 | Conditional Automation | Self-driving in specific conditions | Mercedes Drive Pilot |
| L4 | High Automation | Full autonomy in geofenced areas | Waymo One |
| L5 | Full Automation | All conditions, no driver | Not yet realized |
Perception-Prediction-Planning-Control Pipeline
class AutonomousDrivingSystem:
    """Standard autonomous driving pipeline."""

    def __init__(self, perception, prediction, planning, control):
        self.perception = perception
        self.prediction = prediction
        self.planning = planning
        self.control = control
        self.goal = None

    def run_step(self, sensor_data):
        # 1. Perception: detect and track objects, localize ego vehicle
        detected_objects = self.perception.detect(sensor_data)
        ego_state = self.perception.localize(sensor_data)

        # 2. Prediction: forecast future trajectories of other agents
        predicted_trajectories = self.prediction.predict(
            detected_objects, horizon=3.0  # 3-second prediction horizon
        )

        # 3. Planning: generate a safe, comfortable trajectory to goal
        trajectory = self.planning.plan(
            ego_state, predicted_trajectories, goal=self.goal
        )

        # 4. Control: track the planned trajectory
        steering, throttle, brake = self.control.compute(
            ego_state, trajectory
        )
        return steering, throttle, brake
Sensor Fusion: LiDAR + Camera
Modern autonomous vehicles fuse LiDAR point clouds and camera images. LiDAR provides accurate depth; cameras provide rich semantic information. Common approaches include:
- Early fusion: Project LiDAR points onto image, create a unified representation
- Late fusion: Run separate detectors, combine outputs with Kalman filtering
- Deep fusion: End-to-end networks that take both modalities as input
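Early fusion can be sketched as projecting LiDAR points into the image using the camera intrinsics K and the LiDAR-to-camera extrinsic (the matrix values and points below are illustrative):

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates.
    T_cam_lidar: 4x4 extrinsic, K: 3x3 camera intrinsics."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # into the camera frame
    keep = pts_cam[:, 2] > 0.1                   # drop points behind the camera
    pts_cam = pts_cam[keep]
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                  # perspective divide
    return uv, pts_cam[:, 2]                     # pixel coords + depths

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],   # straight ahead of the camera
                   [1.0, 0.0, 10.0],   # 1 m to the right
                   [0.0, 0.0, -5.0]])  # behind the camera, filtered out
uv, depth = project_lidar_to_image(points, np.eye(4), K)
print("Pixels:", uv, "depths:", depth)
```

Each surviving pixel now carries a metric depth, which is the "unified representation" that early-fusion detectors consume; the identity extrinsic here stands in for a real calibrated transform.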
9. Humanoid Robots in 2026
The humanoid robot market is growing rapidly in 2026. Here is a snapshot of the major players.
Humanoid Robot Comparison
| Robot | Company | Height / Weight | Key Features |
|---|---|---|---|
| Atlas (Electric) | Boston Dynamics | 1.5m / 89kg | Electric actuators, advanced mobility |
| Optimus Gen 3 | Tesla | 1.73m / 57kg | 22-DOF hands, FSD AI stack |
| Figure 02 | Figure AI | 1.7m / 60kg | OpenAI collaboration, voice understanding |
| Unitree H1 / G1 | Unitree Robotics | 1.8m (H1) / 1.3m (G1) | Cost-effective, open-source friendly |
| 1X NEO Beta | 1X Technologies | 1.65m / 30kg | Lightweight, soft actuators |
Foundation Models for Robotics
RT-2 (Robotics Transformer 2), developed by Google DeepMind, is a vision-language-action (VLA) model that directly applies knowledge learned from web-scale data to robot control. It understands natural language commands and generalizes to novel objects.
OpenVLA is an open-source VLA model that researchers can fine-tune on their own robot data.
# OpenVLA inference example (conceptual)
from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

model_name = "openvla/openvla-7b"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    trust_remote_code=True
)

# Generate robot action from vision + language
image = get_camera_image()  # placeholder: robot wrist camera image
instruction = "pick up the red block and place it in the box"
inputs = processor(instruction, image).to('cuda:0', dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key='bridge_orig', do_sample=False)

# action: [dx, dy, dz, droll, dpitch, dyaw, gripper]
robot.execute_action(action)  # placeholder robot interface
Embedded AI and On-Device Inference
Humanoid robots must make decisions in real time without relying on the cloud. Key hardware platforms include NVIDIA Jetson AGX Orin (275 TOPS) and Qualcomm Robotics RB5.
Key techniques for efficient on-device inference:
- Quantization: INT8 / FP16 reduces model size and latency
- Pruning: Remove redundant weights to reduce computation
- TensorRT / ONNX Runtime: Hardware-optimized inference engines
- Distillation: Train a smaller student model from a large teacher
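Quantization's size/accuracy trade-off is easy to see in a few lines of NumPy. This is a symmetric per-tensor INT8 sketch, not any particular toolkit's scheme:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map max|w| to 127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# 4x smaller than float32; worst-case error is about half a quantization step
print("max error:", np.max(np.abs(w - w_hat)), "step:", scale)
```

Production pipelines (TensorRT, ONNX Runtime) add per-channel scales and calibration data on top of this idea, but the storage and error arithmetic is the same.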
10. Quiz: Robotics & AI Core Concepts
Q1. Why is Loop Closure critical in SLAM?
Answer: It allows the robot to recognize a previously visited place and correct the accumulated pose estimation error.
Explanation: During extended navigation, small sensor and odometry errors accumulate over time, causing the robot's estimated trajectory to drift significantly from reality. When a loop closure is detected (e.g., through feature matching or place recognition), a graph optimization step (using tools like g2o or GTSAM) redistributes this accumulated error across all past poses, dramatically improving map consistency.
Q2. What are the primary advantages of ROS2 using DDS over ROS1?
Answer: Standardized middleware, real-time support, flexible multi-robot networking, no single point of failure (no roscore), and built-in security (SROS2).
Explanation: ROS1 depended on a single roscore process, which was a single point of failure. ROS2's DDS-based architecture is fully decentralized — nodes discover each other automatically without a broker. QoS (Quality of Service) policies allow fine-grained control over communication reliability, latency, and history. SROS2 adds TLS-based encryption and authentication.
Q3. What role does the entropy term play in the SAC algorithm?
Answer: It encourages the policy to remain stochastic (exploratory) by including entropy as part of the reward signal, preventing premature convergence to suboptimal deterministic behaviors.
Explanation: SAC's objective is to maximize the sum of expected return and policy entropy. A higher entropy coefficient (alpha) encourages more exploration. When set to 'auto', SAC adjusts alpha dynamically to maintain a target entropy level. This mechanism helps escape local optima and produces more robust policies that generalize better to unseen situations.
Q4. Why is inverse kinematics fundamentally harder than forward kinematics?
Answer: IK is a nonlinear system of equations that may have no solution, a unique solution, or infinitely many solutions, whereas FK always produces a unique result.
Explanation: FK is a straightforward composition of transformation matrices — given joint angles, there is exactly one end-effector pose. IK inverts this mapping: for a 7-DOF redundant manipulator, infinitely many joint configurations reach the same end-effector pose (kinematic redundancy). At workspace boundaries, no solution may exist. Near singularities, the Jacobian becomes rank-deficient, causing numerical instability in iterative solvers.
Q5. Why does Domain Randomization work for Sim-to-Real transfer?
Answer: By training across a wide distribution of simulated physical parameters, the policy learns to be robust to the variability found in the real world.
Explanation: A policy trained with fixed simulation parameters overfits to those specific conditions and fails when physical properties in reality differ (different friction, actuator delays, sensor noise). Domain Randomization treats the real world as just another sample from the training distribution. OpenAI's Dexterous Hand and NVIDIA's quadruped locomotion work are landmark demonstrations of this approach, achieving remarkable real-world performance purely through massively randomized simulation.