Prologue — The robotics "ChatGPT moment" turned out to be real

When ChatGPT shipped in November 2022, many robotics researchers asked the same question: **"When does our ChatGPT moment arrive?"** For robots to do in the physical world what LLMs did with text, four axes had to align — data, simulation, hardware, and models.

As of May 2026, that alignment is mostly complete.

- **Data**: Hugging Face opened the LeRobot dataset hub in 2024, and thousands of robot datasets have been uploaded since. The Open X-Embodiment collaboration has gathered 22 robot embodiments, 527 skills, and over 1 million trajectories.

- **Simulation**: NVIDIA Isaac Sim 5.0 delivers GPU-accelerated physics and rendering on Omniverse. MuJoCo became fully open source after Google DeepMind's acquisition. Gazebo Harmonic took over from Classic.

- **Hardware**: Humanoids like Tesla Optimus, Figure 02, 1X NEO Beta, Unitree G1, Apptronik Apollo, and Sanctuary Phoenix are entering mass production in the 10K~20K USD price band.

- **Models**: Google RT-2 (2023), OpenVLA (2024), Physical Intelligence π0 (2024), NVIDIA GR00T N1 (2025) — **VLA (Vision-Language-Action)** has emerged as a new model category.

On top of all this sits **ROS 2**. ROS 1 effectively ended when Noetic hit EOL in May 2025, and ROS 2 has settled into a yearly release cadence with an LTS every other year: Jazzy Jalisco (2024.05, LTS), Kilted Kaiju (2025.05), and Lyrical Luth (2026.05, imminent).

This article maps the entire 2026 robotics stack — ROS 2, Nav2, MoveIt 2, Gazebo, Isaac Sim, MuJoCo, LeRobot, GR00T, VLAs, Foxglove — in one breath.

1. The 2026 robotics landscape — humanoid boom meets LLM+robot

First the big picture. Four major currents are running through robotics in 2026, all at once.

1.1 Humanoid boom

If 2024~2025 was the "year of demos," 2026 is the "year of mass production."

- **Tesla Optimus** — Beta launched late 2025, external sales targeted for Q3 2026. Runs on Tesla's own FSD chip.

- **Figure 02** — Partnership with OpenAI; 100 units deployed at BMW's Spartanburg plant.

- **1X NEO Beta** — Announced 2024, home shipments starting 2026. Price around 20K USD.

- **Boston Dynamics Atlas (Electric)** — Hydraulic Atlas retired April 2024; electric Atlas debuted. Now owned by Hyundai, trained on Hyundai Motor Group's data and assembly lines.

- **Unitree G1** — Chinese Unitree, 16K USD, open API.

- **Apptronik Apollo** — Deployed at Mercedes-Benz factories.

- **Sanctuary Phoenix** — 7th-generation industrial humanoid from Canada.

1.2 Simulation-first learning (Sim-to-Real)

Humanoids' biggest problem is **data scarcity**. Humans see and hear petabytes of data over a lifetime; humanoids have a few terabytes at most. Enter **simulation-first learning**.

NVIDIA's slogan is "**3 Computers**": a training computer (DGX), a simulation computer (Omniverse/Isaac Sim), and an execution computer (Jetson Thor). Run thousands of environments in parallel in Isaac Sim, train policies via reinforcement learning, then transfer those policies to real robots (sim-to-real).

1.3 VLA — the robot GPT moment

The first signal came in July 2023 with Google DeepMind's **RT-2 (Robotic Transformer 2)**. They took a VLM (PaLI-X) and fine-tuned it to emit action tokens. The result generalized natural-language commands like "pick up the extinct animal" to objects it had never seen.

A Stanford-led team released **OpenVLA (7B)** as open source in June 2024. In October 2024, Physical Intelligence unveiled **π0 (pi-zero)**, a flow-matching VLA of roughly 3B parameters, showing dexterity demos like folding shirts and washing dishes. NVIDIA's **GR00T N1 (2025)** is a roughly 2B-parameter model dedicated to humanoids.

1.4 Data democratization — Hugging Face LeRobot

In 2024, Hugging Face launched **LeRobot**, moving robotics ML out of a "secret garden" (much like NLP before ChatGPT) and toward GitHub-level accessibility. Pre-trained models, datasets, tutorials, and low-cost learning arm kits like SO-100/SO-101 (around 200 USD) come in one bundle.

2026 robotics stack

    [ Model      ] RT-2 / OpenVLA / π0 / GR00T N1 / Helix
    [ Learning   ] LeRobot / Open X-Embodiment / RoboHive
    [ Simulator  ] Isaac Sim 5.0 / MuJoCo / Gazebo Harmonic / Webots
    [ Middleware ] ROS 2 Jazzy/Kilted/Lyrical + Nav2 + MoveIt 2
    [ Hardware   ] Optimus / Figure / 1X / Atlas / G1 / Apollo / Phoenix
    [ Visualize  ] Foxglove Studio / RViz2 / PlotJuggler

We'll walk through each layer.

2. ROS 2 Jazzy → Kilted → Lyrical: one release per year, an LTS every other year

2.1 From ROS 1 to ROS 2 — why move

ROS 1 started at Willow Garage in 2007 and became the de facto standard in academia and research. But it had limits.

- **Single master (roscore) dependency**: if the master dies, nodes can no longer discover each other or set up new connections.

- **Lack of real-time** — robotics control needs millisecond determinism; ROS 1 didn't guarantee it.

- **No security** — anyone could publish/subscribe to any topic.

- **Weak Windows / embedded support**.

ROS 2 was redesigned from scratch to address these.

- **DDS (Data Distribution Service)** as middleware — industrial communications standard, no master.

- **QoS (Quality of Service)** policies — fine-grained control over Reliability, Durability, History.

- **DDS-Security** — authentication, encryption, access control.

- **rclcpp/rclpy** — C++17 and Python 3 based.

When ROS 1 Noetic hit EOL in May 2025, the ROS 1 era officially ended.
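
The QoS policies listed above follow a "request vs. offered" rule: a subscription only connects if the publisher offers at least the guarantee the subscriber requests. The sketch below models that rule for Reliability and Durability in plain Python. The `QoS` class and `compatible` function are hypothetical names for illustration, not part of rclpy.

```python
from dataclasses import dataclass

# Each policy is ordered from weakest to strongest guarantee.
RELIABILITY = ["best_effort", "reliable"]
DURABILITY = ["volatile", "transient_local"]


@dataclass(frozen=True)
class QoS:
    """Hypothetical stand-in for a ROS 2 QoS profile (two policies only)."""
    reliability: str = "reliable"
    durability: str = "volatile"


def compatible(offered: QoS, requested: QoS) -> bool:
    """Return True if the publisher offers at least what the subscriber requests."""
    return (
        RELIABILITY.index(offered.reliability) >= RELIABILITY.index(requested.reliability)
        and DURABILITY.index(offered.durability) >= DURABILITY.index(requested.durability)
    )


# A best-effort sensor publisher cannot satisfy a subscriber that demands reliability.
sensor_pub = QoS(reliability="best_effort")
print(compatible(sensor_pub, QoS(reliability="reliable")))
# A reliable publisher can serve a best-effort subscriber.
print(compatible(QoS(), QoS(reliability="best_effort")))
```

A mismatch of this kind is a common reason a subscriber silently receives nothing, so it is worth checking both ends when a topic looks empty.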

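2.2 ROS 2 release cadence: every May

ROS 2 ships a new distribution every May, around World Turtle Day (May 23); in even years that lands shortly after the Ubuntu LTS release. Even-year releases are 5-year LTS; odd-year releases are non-LTS and supported for about a year and a half.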

| Release | Released | EOL | Ubuntu | Notes |
|---|---|---|---|---|
| Foxy Fitzroy | 2020.06 | 2023.06 | 20.04 | LTS (3-year) |
| Galactic Geochelone | 2021.05 | 2022.12 | 20.04 | Non-LTS |
| Humble Hawksbill | 2022.05 | 2027.05 | 22.04 | LTS, most widely deployed |
| Iron Irwini | 2023.05 | 2024.12 | 22.04 | Non-LTS |
| Jazzy Jalisco | 2024.05 | 2029.05 | 24.04 | **LTS** |
| Kilted Kaiju | 2025.05 | 2026.12 | 24.04 | Non-LTS |

As of May 2026, **Jazzy (production)** and **Kilted (latest)** are the main targets, with Lyrical Luth about to land. New projects should start on Jazzy.

2.3 ROS 2 core concepts — a 5-minute refresher

    # Python rclpy node example
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String


    class HelloPublisher(Node):
        def __init__(self):
            super().__init__('hello_publisher')
            # Queue depth 10; a timer fires tick() once per second.
            self.publisher = self.create_publisher(String, 'greetings', 10)
            self.timer = self.create_timer(1.0, self.tick)

        def tick(self):
            msg = String()
            msg.data = 'Hello, ROS 2 Jazzy'
            self.publisher.publish(msg)
            self.get_logger().info(f'Published: {msg.data}')


    def main():
        rclpy.init()
        node = HelloPublisher()
        rclpy.spin(node)
        node.destroy_node()
        rclpy.shutdown()


    if __name__ == '__main__':
        main()

Four core communication patterns:

1. **Topic** — pub/sub, async streams (sensor data, images).

2. **Service** — synchronous RPC, short request/response (parameter queries).

3. **Action** — long-running tasks with feedback and cancellation (navigation, pickup).

4. **Parameter** — node settings, can be changed at runtime.

2.4 Build system — colcon

ROS 2 uses `colcon` for builds. A typical workspace pattern:

    # Create workspace
    mkdir -p ros2_ws/src
    cd ros2_ws

    # Clone packages into src
    cd src
    git clone https://github.com/ros-planning/navigation2.git -b jazzy
    cd ..

    # Install dependencies
    rosdep install --from-paths src --ignore-src -r -y

    # Build (parallel, symlink for fast dev iteration)
    colcon build --symlink-install --parallel-workers $(nproc)

    # Source the environment
    source install/setup.bash

2.5 What changed — Jazzy → Kilted → Lyrical migration notes

Major changes since Jazzy:

- **Iceoryx2 middleware** — Eclipse Iceoryx2 is stable. Microsecond IPC via shared memory.

- **Zenoh RMW** — Eclipse Zenoh officially adopted as a ROS 2 middleware. Strong for multi-robot and cloud gateways.

- **rcl_logging improvements** — Structured logging, JSON output.

- **Python 3.12 default**.

- **Gazebo Sim integration** — `ros_gz` package hit 1.0.

3. Nav2 (Steve Macenski) — the navigation stack

3.1 What is Nav2

**Nav2 (Navigation 2)** is the ROS 2 redesign of ROS 1's `move_base` navigation stack. It is maintained by **Steve Macenski** (formerly Samsung Research America), who now consults through Open Navigation LLC.

Nav2 bundles together:

- **Global planner** — start-to-goal pathfinding (Navfn, SmacPlanner, ThetaStar).

- **Local planner (Controller)** — path following with obstacle avoidance (DWB, RPP, MPPI).

- **Recovery Behaviors** — back up, spin when stuck.

- **BT Navigator** — Behavior Tree controlling the whole flow.

- **Costmap 2D** — global and local cost maps.

- **Lifecycle nodes** — configure/activate/deactivate state machine.
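
The lifecycle state machine in the last bullet can be sketched in a few lines of plain Python. This is a simplified model of the managed-node primary states, not the rclpy lifecycle API. `LifecycleSketch` and `bring_up` are hypothetical names, and the "configure everything, then activate everything" order is an assumption about how a lifecycle manager typically brings a stack up.

```python
# (current state, transition) -> next state, for the primary lifecycle states.
TRANSITIONS = {
    ("unconfigured", "configure"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "cleanup"): "unconfigured",
    ("unconfigured", "shutdown"): "finalized",
    ("inactive", "shutdown"): "finalized",
    ("active", "shutdown"): "finalized",
}


class LifecycleSketch:
    """Hypothetical stand-in for a managed node; tracks only its state."""

    def __init__(self, name):
        self.name = name
        self.state = "unconfigured"

    def trigger(self, transition):
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"{self.name}: cannot '{transition}' from '{self.state}'")
        self.state = TRANSITIONS[key]
        return self.state


def bring_up(nodes):
    """Configure every node first, then activate them in the same order."""
    for node in nodes:
        node.trigger("configure")
    for node in nodes:
        node.trigger("activate")


stack = [LifecycleSketch(n) for n in ("map_server", "planner_server", "controller_server")]
bring_up(stack)
print([node.state for node in stack])
```

The point of the explicit states is that nothing starts publishing or consuming data until every server has finished loading its configuration.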

3.2 Behavior Trees for navigation flow

Nav2's signature feature is the **Behavior Tree (BT)**. The whole navigation flow is defined in XML.

BT win: reconfigure behavior by editing XML, no code changes. Visualizes nicely for debugging (Groot, BehaviorTree.CPP tools).
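
To make the tick semantics concrete, here is a minimal behavior-tree sketch in plain Python. It is not BehaviorTree.CPP or Nav2 code: the `Action`, `Sequence`, and `Fallback` classes are hypothetical, the node names only echo typical navigation steps, and the control nodes are stateless (real implementations also handle RUNNING children and remember progress between ticks).

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"


class Action:
    """Leaf node that returns a fixed status and records that it was ticked."""

    def __init__(self, name, status, log):
        self.name, self.status, self.log = name, status, log

    def tick(self):
        self.log.append(self.name)
        return self.status


class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS."""

    def __init__(self, *children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:
                return status
        return SUCCESS


class Fallback:
    """Ticks children in order; stops at the first child that is not FAILURE."""

    def __init__(self, *children):
        self.children = children

    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:
                return status
        return FAILURE


log = []
navigate = Fallback(
    Sequence(Action("ComputePath", FAILURE, log), Action("FollowPath", SUCCESS, log)),
    Sequence(Action("ClearCostmap", SUCCESS, log), Action("Spin", SUCCESS, log)),
)
print(navigate.tick(), log)
```

Because planning fails here, `FollowPath` is never ticked and the fallback branch runs the recovery actions instead, which is the pattern a navigation tree encodes declaratively in XML.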

3.3 MPPI Controller — the 2024 standard

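Nav2 has several local controllers, but the de facto standard since 2024 is the **MPPI (Model Predictive Path Integral) Controller**. It is a sampling-based MPC: each control cycle it rolls out thousands of perturbed trajectories, scores them with the critics, and blends them into the next control sequence with a cost-weighted (softmax) average, so low-cost trajectories dominate.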

    # nav2_params.yaml (MPPI section)
    controller_server:
      ros__parameters:
        controller_plugins: ["FollowPath"]
        FollowPath:
          plugin: "nav2_mppi_controller::MPPIController"
          time_steps: 56
          model_dt: 0.05
          batch_size: 2000
          vx_std: 0.2
          vy_std: 0.2
          wz_std: 0.4
          vx_max: 0.5
          vx_min: -0.35
          vy_max: 0.5
          wz_max: 1.9
          iteration_count: 1
          prune_distance: 1.7
          transform_tolerance: 0.1
          temperature: 0.3
          gamma: 0.015
          motion_model: "DiffDrive"
          visualize: false
          critics:
            - "ConstraintCritic"
            - "ObstaclesCritic"
            - "GoalCritic"
            - "GoalAngleCritic"
            - "PathAlignCritic"
            - "PathFollowCritic"
            - "PathAngleCritic"
            - "PreferForwardCritic"

MPPI handles avoidance and cornering more naturally than DWB or Regulated Pure Pursuit, and its vectorized CPU implementation runs in real time without a GPU. Downside: many parameters to tune.
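
The core MPPI update is short enough to sketch in plain Python. This is a toy 1D version, assuming a point robot that integrates velocity commands and a single "distance to goal" cost. `rollout_cost` and `mppi_step` are hypothetical names, and Nav2's real controller adds motion models, many critics, and vectorized math. The temperature is set lower than the YAML's 0.3 because the toy cost scale is small.

```python
import math
import random


def rollout_cost(controls, goal=1.0, dt=0.05):
    """Integrate 1D velocity commands and score squared distance to the goal."""
    x = 0.0
    for v in controls:
        x += v * dt
    return (x - goal) ** 2


def mppi_step(nominal, cost_fn, batch_size=500, std=0.2, temperature=0.02, rng=None):
    """One MPPI iteration: perturb, score, softmax-weight, and average."""
    rng = rng or random.Random(0)
    samples = [[u + rng.gauss(0.0, std) for u in nominal] for _ in range(batch_size)]
    costs = [cost_fn(s) for s in samples]
    best = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    # The new control sequence is the weighted average of the sampled sequences.
    return [
        sum(w * s[t] for w, s in zip(weights, samples)) / total
        for t in range(len(nominal))
    ]


rng = random.Random(42)
plan = [0.0] * 20  # 20 time steps, starting from "stand still"
initial_cost = rollout_cost(plan)
for _ in range(30):
    plan = mppi_step(plan, rollout_cost, rng=rng)
final_cost = rollout_cost(plan)
print(initial_cost, final_cost)
```

`temperature` controls how sharply the average favors the best samples: lower values behave more like "pick the best rollout", higher values average more broadly and give smoother commands.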

3.4 SLAM — two common companions

For unmapped environments, run SLAM alongside Nav2.

- **SLAM Toolbox** (Steve Macenski) — 2D LiDAR based, lifelong mapping + localization. Most widely used.

- **RTAB-Map** — RGB-D camera based 3D SLAM. ROS 2 supported.

    # Run SLAM Toolbox async mode (mapping)
    ros2 launch slam_toolbox online_async_launch.py \
      slam_params_file:=./config/mapper_params_online_async.yaml \
      use_sim_time:=true

    # Save the map
    ros2 service call /slam_toolbox/save_map slam_toolbox/srv/SaveMap "{name: {data: 'my_map'}}"

4. MoveIt 2 — motion planning

4.1 What MoveIt does

**MoveIt** is a motion-planning framework for robot arms (manipulators). Input: "send the end-effector to (x, y, z, roll, pitch, yaw)". Output: a joint trajectory that reaches that pose without collisions.

Internally:

1. **IK (Inverse Kinematics)** — solve joint angles for the target pose.

2. **Collision checking** — self-collision plus environment collisions.

3. **Trajectory planning** — OMPL / STOMP / CHOMP / Pilz algorithms.

4. **Trajectory execution** — send to a controller on top of `ros2_control`.
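
Step 1 is easiest to see on a toy arm. The sketch below solves inverse kinematics in closed form for a planar two-link arm and checks the answer with forward kinematics. The link lengths and function names are assumptions for illustration; MoveIt itself delegates IK to solver plugins that handle full 6-DOF or 7-DOF arms.

```python
import math


def two_link_ik(x, y, l1=0.4, l2=0.3):
    """Return (shoulder, elbow) angles in radians that place the tip at (x, y)."""
    d2 = x * x + y * y
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    elbow = math.acos(cos_elbow)  # one of the two mirror-image solutions
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def two_link_fk(shoulder, elbow, l1=0.4, l2=0.3):
    """Forward kinematics: joint angles back to the tip position."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


angles = two_link_ik(0.5, 0.2)
print(angles, two_link_fk(*angles))
```

Even this tiny arm has two valid solutions (elbow on either side) and unreachable targets, which is why a planner still needs collision checking and trajectory planning on top of IK.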

4.2 MoveIt Setup Assistant

The starting point for adding a new robot to MoveIt is the **Setup Assistant** GUI.

ros2 launch moveit_setup_assistant setup_assistant.launch.py

Load a URDF / Xacro and the GUI lets you:

- Auto-generate the Self-Collision Matrix.

- Define Planning Groups (e.g., arm + gripper).

- Register pre-defined poses (home, ready).

- Specify end effectors.

- Configure controllers and sensors.

The result is a `moveit_config` package.

4.3 MoveIt 2 example — pick and place

    # Python MoveIt 2 (moveit_py)
    import rclpy
    from moveit.planning import MoveItPy
    from geometry_msgs.msg import PoseStamped


    def main():
        rclpy.init()
        moveit = MoveItPy(node_name="moveit_py_demo")
        arm = moveit.get_planning_component("manipulator")

        # Target pose for the end effector, expressed in the base frame
        target_pose = PoseStamped()
        target_pose.header.frame_id = "base_link"
        target_pose.pose.position.x = 0.4
        target_pose.pose.position.y = 0.1
        target_pose.pose.position.z = 0.5
        target_pose.pose.orientation.w = 1.0

        arm.set_start_state_to_current_state()
        arm.set_goal_state(pose_stamped_msg=target_pose, pose_link="tool0")

        # Plan, then execute only if planning succeeded
        plan_result = arm.plan()
        if plan_result:
            robot_trajectory = plan_result.trajectory
            moveit.execute(robot_trajectory, controllers=[])
        else:
            print("Planning failed")

        moveit.shutdown()
        rclpy.shutdown()


    if __name__ == "__main__":
        main()

The C++ API most often goes through `MoveGroupInterface`. Both APIs drive the same planning pipeline (OMPL by default).

4.4 OMPL vs STOMP vs Pilz

|---|---|---|---|

Start with RRTConnect by default; use Pilz for repetitive industrial motions.

4.5 Connection to ros2_control

MoveIt 2 only plans — execution is handled by `ros2_control`. The standard interface between them is the **JointTrajectoryController**.

    # controller_manager config
    controller_manager:
      ros__parameters:
        update_rate: 100
        joint_trajectory_controller:
          type: joint_trajectory_controller/JointTrajectoryController

    joint_trajectory_controller:
      ros__parameters:
        joints:
          - shoulder_pan_joint
          - shoulder_lift_joint
          - elbow_joint
          - wrist_1_joint
          - wrist_2_joint
          - wrist_3_joint
        command_interfaces:
          - position
        state_interfaces:
          - position
          - velocity
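
What the controller does with a planned trajectory can be sketched as sampling it at the update rate. The code below is a simplified illustration, assuming linear interpolation between waypoints given as `(time_from_start, joint_positions)` pairs. `sample_trajectory` is a hypothetical helper, and the real JointTrajectoryController can use higher-order splines when velocities or accelerations are available.

```python
def sample_trajectory(points, t):
    """Linearly interpolate joint positions at time t (seconds from start).

    points: list of (time_from_start_s, [joint positions]) sorted by time.
    """
    if t <= points[0][0]:
        return list(points[0][1])
    for (t0, q0), (t1, q1) in zip(points, points[1:]):
        if t <= t1:
            alpha = (t - t0) / (t1 - t0)
            return [a + alpha * (b - a) for a, b in zip(q0, q1)]
    return list(points[-1][1])  # hold the final waypoint after the end


# Two-joint toy trajectory: three waypoints over two seconds.
trajectory = [
    (0.0, [0.0, 0.0]),
    (1.0, [1.0, -0.5]),
    (2.0, [1.0, 0.5]),
]

update_rate = 100  # Hz, matching the controller_manager config above
setpoints = [sample_trajectory(trajectory, k / update_rate) for k in range(2 * update_rate + 1)]
print(len(setpoints), setpoints[50], setpoints[-1])
```

At 100 Hz the hardware interface receives a new position setpoint every 10 ms, so a sparse plan of a few waypoints becomes a dense command stream.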

5. Gazebo Harmonic — the standard simulator

5.1 Death of Gazebo Classic, birth of New Gazebo

What ROS users casually call "**Gazebo**" is actually two different pieces of software.

- **Gazebo Classic** (gazebo7~gazebo11) — Started 2002, EOL January 2025.

- **New Gazebo** (Citadel, Edifice, Fortress, Garden, Harmonic) — A rewrite from Ignition Robotics. In 2022, the project dropped the Ignition brand and reclaimed the Gazebo name.

As of 2026, new projects unconditionally pick **New Gazebo Harmonic (2023.09 LTS, supported through 2028)**.

5.2 Gazebo Harmonic highlights

- **gz-sim** — simulation engine, ECS (Entity-Component-System) architecture.

- **DART** as the default physics engine (Bullet and TPE selectable through gz-physics plugins).

- **OGRE 2** renderer — Metal / Vulkan / OpenGL.

- **SDFormat 1.10** — world and robot description format.

- **GUI** — Qt-based, pluggable.

- **ROS 2 integration** — `ros_gz_bridge` for topic mapping.

5.3 SDF world example

<?xml version="1.0"?>

5.4 ros_gz_bridge to connect ROS 2 topics

Gazebo topics use Gazebo Transport (a separate middleware); ROS 2 uses DDS. `ros_gz_bridge` connects them.

    # Map /cmd_vel bidirectionally; /odom and /scan flow from Gazebo to ROS 2 only
    ros2 run ros_gz_bridge parameter_bridge \
      /cmd_vel@geometry_msgs/msg/Twist@gz.msgs.Twist \
      /odom@nav_msgs/msg/Odometry[gz.msgs.Odometry \
      /scan@sensor_msgs/msg/LaserScan[gz.msgs.LaserScan
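
The bridge arguments above pack four things into one string: topic, ROS type, direction, and Gazebo type. A small parser makes the convention explicit: `@` means bidirectional, `[` means Gazebo to ROS, and `]` means ROS to Gazebo. `BridgeSpec` and `parse_bridge_arg` are hypothetical helpers for illustration, not part of `ros_gz_bridge`.

```python
from typing import NamedTuple

# Separator between the ROS type and the Gazebo type encodes the direction.
DIRECTIONS = {"@": "bidirectional", "[": "gz_to_ros", "]": "ros_to_gz"}


class BridgeSpec(NamedTuple):
    topic: str
    ros_type: str
    gz_type: str
    direction: str


def parse_bridge_arg(arg: str) -> BridgeSpec:
    """Split 'topic@ros_type<sep>gz_type' where <sep> is one of @, [ or ]."""
    topic, sep, rest = arg.partition("@")
    if not sep:
        raise ValueError(f"missing '@' after topic name in {arg!r}")
    for i, ch in enumerate(rest):
        if ch in DIRECTIONS:
            return BridgeSpec(topic, rest[:i], rest[i + 1:], DIRECTIONS[ch])
    raise ValueError(f"missing direction separator in {arg!r}")


for arg in (
    "/cmd_vel@geometry_msgs/msg/Twist@gz.msgs.Twist",
    "/odom@nav_msgs/msg/Odometry[gz.msgs.Odometry",
):
    print(parse_bridge_arg(arg))
```

Sensor topics are usually bridged one way (Gazebo to ROS) to avoid echoing data back into the simulator, while command topics like `/cmd_vel` only need the ROS to Gazebo direction in many setups.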

5.5 Limits — why Isaac Sim came alongside

Gazebo Harmonic is excellent, but has limits.

- **GPU utilization** — Rendering uses the GPU, but physics is still CPU. Hard to run dozens or hundreds of parallel environments.

- **Photorealistic rendering** — Insufficient for vision-training synthetic data.

- **Soft body / fluids** — Limited.

Isaac Sim fills this gap.

6. Isaac Sim 5.0 (NVIDIA Omniverse) + Isaac ROS

6.1 What sets Isaac Sim apart

NVIDIA **Isaac Sim** is a robot simulator built on Omniverse. **Isaac Sim 5.0** shipped in 2025, and its core features are:

- **PhysX 5** — GPU-accelerated physics, thousands of parallel environments.

- **RTX path tracing** — photorealistic rendering, optimal for ML synthetic data.

- **OpenUSD** scene graph — the standard pushed by Pixar, Apple, and Adobe together.

- **Isaac Lab** — reinforcement learning framework, Gym-compatible.

- **GR00T integration** — humanoid foundation model data generation.

6.2 Isaac Lab reinforcement learning code (simple example)

    # Building a Cartpole RL env with Isaac Lab (conceptual sketch, not runnable as-is)
    # Isaac Lab 2.x uses the `isaaclab` namespace (formerly `omni.isaac.lab`).
    from isaaclab.envs import ManagerBasedRLEnv, ManagerBasedRLEnvCfg
    from isaaclab.scene import InteractiveSceneCfg
    from isaaclab.assets import ArticulationCfg


    class CartpoleSceneCfg(InteractiveSceneCfg):
        # Simplified: real configs use @configclass and a spawn config for the USD asset.
        cartpole = ArticulationCfg(
            prim_path="/World/envs/env_.*/Cartpole",
            usd_path="omniverse://localhost/NVIDIA/Assets/Robots/Cartpole/cartpole.usd",
        )


    env_cfg = ManagerBasedRLEnvCfg(
        scene=CartpoleSceneCfg(num_envs=4096, env_spacing=4.0),
        sim={"dt": 1 / 120, "device": "cuda:0"},
    )
    env = ManagerBasedRLEnv(cfg=env_cfg)

    # Train with PPO (skrl); models, memory, and the training call are elided placeholders
    from skrl.agents.torch.ppo import PPO

    agent = PPO(models=..., memory=..., cfg={"learning_epochs": 8})
    agent.train(env, timesteps=1_000_000)

Run 4096 parallel environments on a single RTX 4090 and a policy converges in a few days. Pulling this off in Gazebo would take dozens of CPU servers.

6.3 Isaac ROS — NVIDIA's ROS 2 accelerators

**Isaac ROS** is a separate suite from Isaac Sim — NVIDIA's collection of CUDA-accelerated ROS 2 packages.

| Package | Function | Acceleration |
|---|---|---|
| `isaac_ros_visual_slam` | RGB-D visual SLAM | GPU |
| `isaac_ros_nvblox` | 3D reconstruction / occupancy | GPU |
| `isaac_ros_apriltag` | AprilTag detection | GPU |
| `isaac_ros_centerpose` | 6DOF pose estimation | TensorRT |
| `isaac_ros_dnn_image_encoder` | camera → ML preprocessing | GPU |

Optimized for Jetson Orin / Thor boards, the same perception pipeline runs 10~30x faster than on CPU.

6.4 OpenUSD comes to robotics

**OpenUSD (Universal Scene Description)** was created by Pixar; since 2023 the Alliance for OpenUSD (AOUSD) has been steering it toward formal standardization. It started in film and games, but NVIDIA is pushing it as the standard for robotics.

- Represents scenes, assets, materials, and physics as **a single graph**.

- Layering (base + variant) — multiple teams can work on the same scene concurrently.

- Python API.

- Isaac Sim, Blender, Maya, Houdini all support it.

URDF / SDF are fine for defining a single robot, but weak for large environments like factories or cities. USD likely replaces them there.

7. MuJoCo (DeepMind) — physics simulation

7.1 The MuJoCo story — Roboti LLC → DeepMind acquisition → open source

**MuJoCo (Multi-Joint dynamics with Contact)** is a physics engine Emo Todorov built in the early 2010s. Originally paid commercial software (Roboti LLC), it powered some OpenAI Gym environments (Humanoid, HalfCheetah). Student licenses existed but commercial ones were expensive.

In October 2021, DeepMind acquired MuJoCo and made it free to use; the source followed in May 2022 under **Apache 2.0**, making it fully open source. That move dramatically lowered the barrier for reinforcement learning and robotics research.

7.2 MuJoCo strengths

- **Contact dynamics** — Friction and impact are more accurate and stable than in other engines.

- **Speed** — Optimized for parallel simulation, with a GPU backend (MJX).

- **Determinism** — Same input, same output (critical for RL reproducibility).

- **MJCF** — XML-based model format, more concise than URDF.

7.3 MJCF example (humanoid)

7.4 MJX — JAX backend for GPU acceleration

Shipped with MuJoCo 3.0 in late 2023, **MJX** is MuJoCo on JAX: it simulates thousands of environments on GPU / TPU at once. Same category as Brax (Google) or IsaacGym.

    import jax
    import mujoco
    from mujoco import mjx

    mj_model = mujoco.MjModel.from_xml_path("humanoid.xml")
    mjx_model = mjx.put_model(mj_model)  # copy the model to the accelerator

    # Simulate 4096 envs in parallel
    batch_size = 4096
    keys = jax.random.split(jax.random.PRNGKey(0), batch_size)


    @jax.jit
    @jax.vmap
    def reset(key):
        # The key is unused here, so vmap broadcasts one default state per env.
        data = mjx.make_data(mjx_model)
        return data


    @jax.jit
    @jax.vmap
    def step(data, action):
        data = data.replace(ctrl=action)
        data = mjx.step(mjx_model, data)
        return data


    batch_data = reset(keys)
    actions = jax.numpy.zeros((batch_size, mj_model.nu))
    for _ in range(1000):
        batch_data = step(batch_data, actions)

7.5 MuJoCo Menagerie — official robot collection

The **MuJoCo Menagerie** repo maintained by DeepMind has validated MJCFs for widely used robots.

- Unitree H1, G1, Go2

- Boston Dynamics Spot

- Franka Emika Panda

- KUKA iiwa14

- Shadow Dexterous Hand

- ANYbotics ANYmal C

- Universal Robots UR5e / UR10e

Drop them in for RL or imitation-learning simulations directly.

8. Webots (Cyberbotics) — the lightweight alternative

8.1 Webots' niche

**Webots** is a robot simulator built since 1996 by Swiss company Cyberbotics. Open-sourced (Apache 2.0) in 2018.

Why use Webots when Gazebo, Isaac Sim, and MuJoCo exist?

- **Lightweight** — Gazebo Harmonic is a pain to install; Webots is a single package and done.

- **Education** — Standard in competitions like RoboCup, DARPA Subterranean, DJI RoboMaster.

- **Rich built-in robots** — Around 100 robot models bundled (e-puck, NAO, Pioneer, TurtleBot).

- **C / C++ / Python / Java / MATLAB / ROS / ROS 2** all supported.

8.2 Webots controller example

    # Webots Python controller
    from controller import Robot

    TIME_STEP = 32
    MAX_SPEED = 6.28

    robot = Robot()

    # Wheel motors in velocity-control mode (infinite position target)
    left_motor = robot.getDevice("left wheel motor")
    right_motor = robot.getDevice("right wheel motor")
    left_motor.setPosition(float('inf'))
    right_motor.setPosition(float('inf'))

    # Distance sensors
    ds_left = robot.getDevice("ds_left")
    ds_right = robot.getDevice("ds_right")
    ds_left.enable(TIME_STEP)
    ds_right.enable(TIME_STEP)

    while robot.step(TIME_STEP) != -1:
        left = ds_left.getValue()
        right = ds_right.getValue()

        if left < 100:
            left_motor.setVelocity(-MAX_SPEED * 0.5)
            right_motor.setVelocity(MAX_SPEED * 0.5)
        elif right < 100:
            left_motor.setVelocity(MAX_SPEED * 0.5)
            right_motor.setVelocity(-MAX_SPEED * 0.5)
        else:
            left_motor.setVelocity(MAX_SPEED)
            right_motor.setVelocity(MAX_SPEED)

8.3 Which simulator for which job

| Use case | Recommended |
|---|---|
| Learning, tutorials, education | Webots |
| Industrial mobile robots (ROS 2 integration) | Gazebo Harmonic |
| RL, humanoids, synthetic data | Isaac Sim 5.0 |
| Manipulation, dexterity, contact-rich tasks | MuJoCo |
| Fast parallel RL | MJX or Isaac Lab |
| Multi-robot, drone swarms | Gazebo + Zenoh |

9. Hugging Face LeRobot (2024): democratizing robotics ML

9.1 Why LeRobot emerged

In 2024, Hugging Face launched **LeRobot**. Catchphrase: "**Robotics for everyone**". The core idea is to bundle these together:

- **Pre-trained models** — π0, ACT, Diffusion Policy, TDMPC, etc.

- **Dataset hub** — Open X-Embodiment, BridgeData V2, DROID in hf-datasets format.

- **Affordable hardware kits** — SO-100 (around 200 USD), SO-101, Koch v1.1, Stanford Mobile Aloha clones.

- **Tutorials** — Run a full imitation-learning cycle in Colab in 30 minutes.

The point is **full integration with PyTorch + the Hugging Face ecosystem**. Same play Transformers ran on NLP, now for robotics.

9.2 LeRobot data pipeline

    # Load a LeRobot dataset
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

    dataset = LeRobotDataset("lerobot/aloha_static_coffee")
    print(dataset.meta.info)

    # Grab one frame (indexing returns a single timestep, not a whole episode)
    frame = dataset[0]
    print(frame["observation.images.cam_high"].shape)
    print(frame["action"].shape)

9.3 ACT imitation learning

ACT (Action Chunking with Transformers) is the imitation-learning baseline that came out of Stanford's ALOHA project. It is trainable directly inside LeRobot.

    # Flag names differ between LeRobot releases; adjust to the version you have installed.

    # Collect data with an SO-100 arm
    python lerobot/scripts/control_robot.py record \
      --robot.type=so100 \
      --control.type=record \
      --control.fps=30 \
      --control.repo_id=youngju/so100_pick_and_place \
      --control.num_episodes=50 \
      --control.warmup_time_s=5 \
      --control.episode_time_s=30 \
      --control.push_to_hub=true

    # Train the ACT policy
    python lerobot/scripts/train.py \
      policy=act \
      dataset_repo_id=youngju/so100_pick_and_place \
      env=so100 \
      training.offline_steps=80_000 \
      training.batch_size=32 \
      training.lr=1e-4

    # Evaluate on the real robot
    python lerobot/scripts/eval.py \
      -p youngju/so100_pick_and_place_act \
      eval.n_episodes=10

9.4 SO-100 — robot learning that starts at 200 USD

**SO-100** (originally designed by The Robot Studio) is a 5DOF desktop arm with a gripper whose parts cost around 200~300 USD. It is built from 3D-printed parts plus Feetech servos; the related Koch v1.1 arm uses Dynamixel (Robotis) servos instead. LeRobot ships SO-100 / SO-101 drivers out of the box, so the data collection / training / execution cycle runs at home.

That's LeRobot's real impact. Just as NLP was expensive and gated before the ChatGPT API, robotics research only happened on top of a 50K USD industrial arm. Now that floor is 300 USD.

10. NVIDIA GR00T — humanoid foundation model

10.1 Project GR00T begins

Jensen Huang announced **Project GR00T (Generalist Robot 00 Technology)** at GTC March 2024. In one line: "**GPT for humanoids**".

The project bundles three things.

1. **GR00T N1** (2025.03): a VLA of roughly 2B parameters, the first with publicly released weights.

2. **Isaac Lab + Cosmos** — simulation and synthetic-data generation.

3. **Jetson Thor** — humanoid-onboard inference SoC, shipped 2025.

10.2 GR00T N1 architecture

GR00T N1's core is a **System 1 / System 2 dual mode** (borrowed directly from Kahneman).

- **System 1 (fast)**: Diffusion Transformer, action output at around 120Hz. Reflexive motor control.

- **System 2 (slow)**: VLM (Eagle-2), reasoning and planning at around 10Hz. Environment understanding.

The two connect through token cross-attention. Same as how a human catches a ball — slow planning (System 2: "ball falling → reach hand") and fine-grained finger adjustment (System 1) happen simultaneously.
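
The dual-rate idea can be sketched as a single loop in which the slow module refreshes a plan only every few ticks while the fast module acts on every tick. Everything here is a stand-in: `slow_planner`, `fast_controller`, and `run_dual_rate` are hypothetical names, and the rates are illustrative parameters rather than measured GR00T figures.

```python
def slow_planner(tick):
    """Stand-in for the slow VLM: returns a 'plan' tagged with the tick it was made at."""
    return {"plan_made_at": tick}


def fast_controller(latent, tick):
    """Stand-in for the fast action head: acts on the most recent plan."""
    return (tick, latent["plan_made_at"])


def run_dual_rate(total_ticks, fast_hz, slow_hz):
    """Run the fast loop for total_ticks, replanning once every fast_hz // slow_hz ticks."""
    period = fast_hz // slow_hz  # fast ticks per slow update
    latent, replans, actions = None, 0, []
    for tick in range(total_ticks):
        if tick % period == 0:
            latent = slow_planner(tick)
            replans += 1
        actions.append(fast_controller(latent, tick))
    return replans, actions


# One simulated second with a 120 Hz action loop and a 10 Hz planner.
replans, actions = run_dual_rate(total_ticks=120, fast_hz=120, slow_hz=10)
print(replans, len(actions), actions[13])
```

Between replans the controller keeps acting on a slightly stale plan, which is the trade-off this architecture accepts in exchange for running a large VLM at all.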

10.3 GR00T data pyramid

NVIDIA's data strategy:

    Real robot demonstrations (tens to hundreds of hours)   <- top: least data, most expensive
    Tele-op (VR headsets such as Apple Vision Pro)
    Simulation (Isaac Sim parallel envs)
    Cosmos synthetic video (millions of hours)
    Internet video / documents                              <- base: most data, cheapest

Moving up the pyramid, quantity shrinks and quality rises. GR00T pre-trains on internet video, augments with Cosmos synthetic data, fine-tunes via RL in Isaac Sim, and finishes with real demonstrations.

10.4 GR00T inference code (conceptual)

    # GR00T inference (conceptual interface; `GR00TPolicy`, `robot`, and
    # `simulation_steps` are placeholders, not a published API)
    from gr00t import GR00TPolicy

    policy = GR00TPolicy.from_pretrained("nvidia/gr00t-n1")

    observation = {
        "video.image_left": ...,    # left camera
        "video.image_right": ...,   # right camera
        "state.joint_pos": ...,     # 28 DOF humanoid
        "language.instruction": "Pick up the red block and place it on the shelf",
    }

    # System 2 plans at a low rate while System 1 emits actions at a high rate
    for t in range(simulation_steps):
        action = policy.predict(observation)  # 28D action vector
        new_observation = robot.step(action)
        observation = new_observation

10.5 Other humanoid foundation models

- **Figure Helix** (2025.02) — Figure AI's in-house model after leaving OpenAI partnership.

- **1X World Model** — Video-prediction model for the home robot.

- **Tesla Optimus FSD** — Self-driving stack ported to humanoids.

- **Physical Intelligence π0**: followed by the next-generation π0.5 in 2025.

11. VLA models — RT-2, OpenVLA, π0

11.1 What is a VLA

A **VLA (Vision-Language-Action)** model takes an image (V) plus a language instruction (L) and outputs action (A) tokens. The next-token prediction pattern from LLMs, applied to actions.

    [ Image frames ]    -> vision encoder --+
    [ "Pick up apple" ] -> tokenizer -------+-> Transformer -> [ action tokens ]
    [ Robot state ]     -> projector -------+                          |
                                                            [ dx, dy, dz, gripper ]

11.2 RT-2 (2023.07) — first signal

Google DeepMind's **RT-2** was a VLM (PaLI-X 55B, PaLM-E 12B) fine-tuned with RT-1 robot data. The trick: encode actions as tokens and add them to the LLM vocabulary.

- Input: camera image + "Move the empty can to the apple"

- Output: `<action>+13 +127 -3 +1 0 0 0 1</action>` (6DOF delta plus gripper)

What stunned people: it generalized to never-seen objects (extinct animal cards) and never-seen commands ("move the puppy picture").
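
The "actions as tokens" trick comes down to discretizing each continuous action dimension into integer bins. The sketch below assumes 256 uniform bins per dimension, as described for RT-2. The value ranges and the `encode` / `decode` helpers are hypothetical, chosen only to show the round trip.

```python
N_BINS = 256


def encode(value, low, high, n_bins=N_BINS):
    """Map a continuous value in [low, high] to an integer token in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)
    return min(int((clipped - low) / (high - low) * n_bins), n_bins - 1)


def decode(token, low, high, n_bins=N_BINS):
    """Map a token back to the center of its bin."""
    width = (high - low) / n_bins
    return low + (token + 0.5) * width


# Hypothetical action: end-effector deltas in meters plus a gripper command in [0, 1].
action = {"dx": 0.03, "dy": -0.10, "dz": 0.0, "gripper": 1.0}
ranges = {"dx": (-0.1, 0.1), "dy": (-0.1, 0.1), "dz": (-0.1, 0.1), "gripper": (0.0, 1.0)}

tokens = {k: encode(v, *ranges[k]) for k, v in action.items()}
recovered = {k: decode(t, *ranges[k]) for k, t in tokens.items()}
print(tokens)
print(recovered)
```

The price of this scheme is quantization error of at most half a bin width per dimension, which is small next to typical actuation noise but is one reason later models moved to continuous action heads.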

11.3 OpenVLA (2024.06) — first open-source VLA

Stanford, UC Berkeley, and Toyota Research released **OpenVLA (7B)**, based on the Prismatic VLM.

- Trained on **Open X-Embodiment** 970K trajectories.

- 14 days of training on 64 A100s.

- Weights, code, data all public.

- LoRA-finetunable to new robots and tasks.

    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor
    from PIL import Image

    # Load the processor and the 7B model in bfloat16 on the first GPU
    processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
    vla = AutoModelForVision2Seq.from_pretrained(
        "openvla/openvla-7b",
        attn_implementation="flash_attention_2",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    ).to("cuda:0")

    image = Image.open("camera_view.png")
    prompt = "In: What action should the robot take to pick up the cup?\nOut:"

    # Predict one 7-DoF action, un-normalized with the BridgeData statistics
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
    print(action)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]

11.4 π0 (Physical Intelligence, 2024.10)

**Physical Intelligence (PI)** was founded by Sergey Levine, Chelsea Finn, and others. **π0 (pi-zero)** is a flow-matching VLA of roughly 3B parameters.

The core innovation is **action chunking + flow matching**:

- Generate a chunk of 50 future actions per inference (about one second of motion at 50Hz).

- Flow matching instead of diffusion for smooth action distributions.

- Dexterity demos: folding clothes, washing dishes, packing boxes.
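
Flow-matching inference amounts to integrating a velocity field from noise to an action chunk in a handful of Euler steps. The toy below assumes the target chunk is known, so the straight-line velocity can be written in closed form. In a real model a neural network conditioned on images, language, and robot state predicts that velocity instead. `velocity` and `sample_chunk` are hypothetical names.

```python
import random


def velocity(x, t, target):
    """Toy velocity field for a straight-line flow from the current point to a known target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_chunk(target_chunk, n_steps=10, rng=None):
    """Start from Gaussian noise and Euler-integrate the flow from t=0 to t=1."""
    rng = rng or random.Random(0)
    flat_target = [a for step in target_chunk for a in step]
    x = [rng.gauss(0.0, 1.0) for _ in flat_target]  # pure noise at t=0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t, flat_target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    dim = len(target_chunk[0])
    return [x[i:i + dim] for i in range(0, len(x), dim)]


# A 50-step chunk of 2D actions, shape [50, 2].
target = [[0.01 * i, -0.01 * i] for i in range(50)]
chunk = sample_chunk(target)
print(len(chunk), len(chunk[0]), chunk[-1])
```

The whole chunk is denoised jointly rather than one action at a time, which is what keeps consecutive actions smooth and lets the robot execute many steps per model call.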

    # π0 inference pseudocode (`Pi0Policy`, `img1`, `img2`, `robot_state`,
    # and `robot` are placeholders)
    from openpi.models.pi0 import Pi0Policy

    policy = Pi0Policy.from_checkpoint("pi0_base")

    action_chunk = policy.sample_action(
        images={"primary": img1, "wrist": img2},
        instruction="Fold the blue shirt",
        state=robot_state,
    )

    # action_chunk: shape [50, action_dim]
    for action in action_chunk:
        robot.execute(action)

In April 2025 PI released π0.5 and π1; some weights are on Hugging Face.

11.5 VLA model comparison

|---|---|---|---|---|

11.6 Limits and what comes next

VLAs don't solve everything. As of 2026:

1. **Long-horizon tasks** — Difficult beyond 30 seconds. Planning (System 2) is weak.

2. **Data efficiency** — New tasks need tens to hundreds of demonstrations. One-shot is far off.

3. **Safety guarantees** — Formally certifying neural policy safety is hard.

4. **Hardware-software coupling** — The same model rarely transfers well between robots.

12. Foxglove Studio — data visualization

12.1 RViz2's limits, Foxglove's arrival

ROS 2's default visualization is **RViz2**. Powerful enough, but limited.

- ROS 2 dependent — Hard to inspect non-ROS systems (MAVLink, custom protocols).

- Weak log (rosbag) analysis.

- Hard to collaborate or share (only screenshots and recordings).

- No web or tablet support.

**Foxglove Studio** is a multi-protocol visualization tool from a company founded by ex-Cruise engineers in 2021.

- **Desktop + web + cloud** supported simultaneously.

- **ROS 1, ROS 2, MCAP, MAVLink, MQTT, raw WebSocket** all supported.

- **3D, Plot, Image, State Transitions, Map, Tab** and many other panels.

- **MCAP** has become the de facto log format (the default rosbag2 storage backend, replacing SQLite3).

12.2 MCAP — the new log standard

**MCAP (Message Capture)** is a log container format from Foxglove. Since ROS 2 Iron, rosbag2's default storage has been MCAP, so Jazzy records to it out of the box.

Features:

- Indexed for fast random access.

- Embeds message definitions, so it's self-describing.

- Supports ROS, protobuf, JSON, flatbuffers.

- Python / Go / Rust SDKs.

    # Record rosbag2 as MCAP (default in Jazzy)
    ros2 bag record -a -s mcap -o my_session

    # Inspect MCAP info
    mcap info my_session/my_session_0.mcap

    # Convert
    mcap convert old.bag new.mcap

12.3 Viewing data in Foxglove

Live ROS 2 topics in desktop app

- Launch Foxglove Studio

- + Add connection -> ROS 2 -> auto-discover

Or drag-drop an MCAP file in the web app

https://app.foxglove.dev

Collaboration wins: teammates open the same MCAP and see the same panel layout at the same timestamp. A pull request can link to "the bug at t=23.4s".

12.4 PlotJuggler — signal analysis

For numerical (sensor / controller) time-series analysis, **PlotJuggler** is the most convenient tool.

sudo apt install ros-jazzy-plotjuggler-ros

ros2 run plotjuggler plotjuggler

Drag-drop CSV, rosbag2, MCAP, or live topics to plot. Easy formula-based channel creation and filtering.

13. Korea and Japan robotics — from industrial powerhouses to humanoids

13.1 Korea — Hyundai's Boston Dynamics bet

In June 2021, Hyundai Motor Group completed its purchase of a controlling stake (about 80%) in **Boston Dynamics** from SoftBank Group, in a deal that valued the company at about 1.1 billion USD. Many called it expensive at the time, but with the humanoid boom the bet's meaning has crystallized.

- **Atlas (Electric)** — Hydraulic Atlas retired April 2024, new electric Atlas unveiled. Learning vehicle assembly at Hyundai's Metaplant in Georgia.

- **Spot** — Quadruped, used for security and inspection at Hyundai and Kia factories.

- **Stretch** — Warehouse robot for box handling.

Other Korean players:

- **Doosan Robotics** — Collaborative robots (cobot) M-series and H-series. Famous for applications like Korean chicken-frying robots.

- **Rainbow Robotics** — Spinoff from KAIST's HUBO project; Samsung Electronics first invested in 2023 and moved to become the largest shareholder at the end of 2024. Developing the RB-Y1 humanoid.

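- **Robotis** — Dynamixel smart actuators. A de facto standard in OpenManipulator, TurtleBot3, and robotics education kits worldwide. LeRobot supports them as well: the Koch arm uses Dynamixel, while the SO-100 uses Feetech servos.

- **Yujin Robot** — Mobile robots and educational platforms, such as the Kobuki base.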

13.2 Japan — long-standing industrial robotics powerhouse

Japan has been the world's leading supplier of industrial robots since the 1980s. In 2026 that position has not changed.

**Japan's industrial robot Big Four** (2026):

- **FANUC** — Yellow industrial arms, the standard on automotive and semiconductor lines.

- **Yaskawa Motoman** — Welding and assembly.

- **Kawasaki Robotics** — A division of Kawasaki Heavy Industries, better known to consumers for its motorcycles.

- **Mitsubishi Electric** — MELFA series.

**Service and humanoid side**:

- **Sony AIBO** — First released 1999, revived in 2018 (ERS-1000). Icon of home robot pets.

- **SoftBank Pepper** — Launched 2014, production stopped 2021. In 2026 only seen in museums.

- **Honda ASIMO** — Development ended 2018. The end of the first humanoid generation.

- **Toyota T-HR3 / Human Support Robot (HSR)** — Research and elder-care robots.

- **Kawasaki Kaleido** — Humanoid first shown publicly in 2017, with updated generations since.

- **Preferred Networks** — AI / robotics startup, partnered with FANUC.

13.3 Different textures of the two countries

| Aspect | Korea | Japan |
|---|---|---|
| **Strengths** | Humanoid (BD), cobot (Doosan), motors (Robotis) | Industrial arms (FANUC, Yaskawa), precision motors, automotive |
| **Weaknesses** | Late entry to industrial standard market | Slow on humanoid / AI integration |
| **R&D** | Government-led, big-corp centered | Corporate-led, craftsman culture |
| **AI integration** | Fast (rapid adoption of LLM, VLA) | Conservative |

An interesting 2026 trend: the **Hyundai-Boston Dynamics-Hyundai Mobis-Hyundai Wia** axis has become almost the only non-US bloc that can compete head-on with US Big Tech (Tesla, Figure). Japan remains strong in industrial robotics but is in catch-up mode in the humanoid / VLA race.

13.4 Learning resources

- Korea — KAIST and SNU robotics labs, Hyundai Robotics Academy, Woowa Brothers robot team.

- Japan — Preferred Networks challenges, Toyota Research Institute, AIST (National Institute of Advanced Industrial Science and Technology).

14. References

ROS 2 official / Nav2 / MoveIt

- ROS 2 docs — https://docs.ros.org/

- ROS 2 Jazzy release — https://docs.ros.org/en/jazzy/

- ROS 2 Kilted release — https://docs.ros.org/en/kilted/

- ROS 2 distribution schedule — https://docs.ros.org/en/rolling/Releases.html

- Nav2 (Navigation 2) — https://docs.nav2.org/

- Nav2 GitHub — https://github.com/ros-navigation/navigation2

- Open Navigation (Steve Macenski) — https://www.opennav.org/

- MoveIt 2 — https://moveit.ai/

- MoveIt 2 tutorials — https://moveit.picknik.ai/

- ros2_control — https://control.ros.org/

- SLAM Toolbox — https://github.com/SteveMacenski/slam_toolbox

Simulation

- Gazebo (New) — https://gazebosim.org/

- Gazebo Harmonic release — https://gazebosim.org/docs/harmonic/

- Gazebo Classic EOL — https://gazebosim.org/docs/latest/getstarted/

- NVIDIA Isaac Sim — https://developer.nvidia.com/isaac/sim

- Isaac Lab — https://isaac-sim.github.io/IsaacLab/

- Isaac ROS — https://nvidia-isaac-ros.github.io/

- MuJoCo — https://mujoco.org/

- MuJoCo GitHub — https://github.com/google-deepmind/mujoco

- MuJoCo Menagerie — https://github.com/google-deepmind/mujoco_menagerie

- MJX docs — https://mujoco.readthedocs.io/en/stable/mjx.html

- Webots — https://cyberbotics.com/

- OpenUSD — https://openusd.org/

VLA / foundation models

- RT-2 paper — https://robotics-transformer2.github.io/

- OpenVLA — https://openvla.github.io/

- OpenVLA GitHub — https://github.com/openvla/openvla

- Physical Intelligence — https://www.physicalintelligence.company/

- π0 blog — https://www.physicalintelligence.company/blog/pi0

- NVIDIA GR00T — https://developer.nvidia.com/project-gr00t

- GR00T N1 announcement — https://blogs.nvidia.com/blog/gr00t-n1-foundation-model-humanoids/

- Hugging Face LeRobot — https://github.com/huggingface/lerobot

- LeRobot blog — https://huggingface.co/blog/lerobot

- Open X-Embodiment — https://robotics-transformer-x.github.io/

Visualization / data

- Foxglove Studio — https://foxglove.dev/

- MCAP format — https://mcap.dev/

- PlotJuggler — https://github.com/facontidavide/PlotJuggler

- rosbag2 — https://github.com/ros2/rosbag2

Humanoid platforms

- Boston Dynamics — https://bostondynamics.com/

- Tesla Optimus — https://www.tesla.com/optimus

- Figure AI — https://www.figure.ai/

- 1X Technologies — https://www.1x.tech/

- Unitree — https://www.unitree.com/

- Apptronik Apollo — https://apptronik.com/

- Sanctuary AI — https://sanctuary.ai/

Korea / Japan

- Doosan Robotics — https://www.doosanrobotics.com/

- Robotis — https://www.robotis.com/

- Rainbow Robotics — https://www.rainbow-robotics.com/

- FANUC — https://www.fanuc.com/

- Yaskawa Motoman — https://www.yaskawa.com/

- Kawasaki Robotics — https://kawasakirobotics.com/

- Toyota Research Institute — https://www.tri.global/

- Preferred Networks — https://www.preferred.jp/

Closing — robotics is really changing

In 2020, robotics development was "ROS 1 + Gazebo Classic + hand-written controllers + a PhD". The entry barrier was a PhD.

In 2026, robotics development is "**ROS 2 Jazzy + Isaac Sim / MuJoCo + a VLA pulled from LeRobot + a 300 USD SO-100**". The entry barrier is a laptop and curiosity.

Three forces driving the change:

1. **Simulation maturity** — Isaac Sim, MuJoCo, and Gazebo Harmonic deliver industrial-grade quality for free.

2. **VLAs arriving** — Perception, planning, and control once hand-engineered now compress into a single model.

3. **Democratization** — LeRobot bundles data, models, and hardware in one open package.

Whether all this actually puts a humanoid in our living rooms, or whether it becomes another "self-driving in 5 years" overhype, will be answered in 2027~2028. But one thing is clear — **in 2026 we wait for that answer with a tool pile that bears no resemblance to what we had a year ago.**

Install ROS 2. Run one MuJoCo simulation. Follow one LeRobot ACT tutorial. That's the lightest way to put a foot into 2026 robotics.