Autonomous driving and robotics systems are not built on a single technology — they are a convergence of dozens of disciplines. The entire pipeline, from receiving raw sensor data to perceiving the environment, planning a path, and controlling the vehicle, involves C++, GPU programming, deep learning, sensor fusion, simulation, and cloud infrastructure.
This post provides a practitioner-oriented overview of the 13 core technical domains every autonomous driving and robotics engineer should know.
┌────────────────────────────────────────────────────────────────┐
│ Autonomous Driving Tech Stack Architecture │
│ │
│ ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────┐ │
│ │ Sensing │ │ Perception│ │ Decision │ │ Control │ │
│ │ GPS/IMU │→│ CV/DL │→│ Planning │→│ Control │ │
│ │ Camera │ │ Sensor │ │ Prediction│ │ CAN/Ethernet │ │
│ │ LiDAR │ │ Fusion │ │ │ │ │ │
│ │ │ │ VLM/VLA │ │ │ │ │ │
│ └──────────┘ └───────────┘ └───────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Infra Layer: C++ | ROS2 | CUDA | TensorRT | Cloud/MLOps │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Validation Layer: SIL/HIL | Sim (CARLA/Isaac) | VR/AR    │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
Robotics demands deterministic execution, zero-overhead abstraction, and direct hardware access. Modern C++ delivers all three while dramatically improving code safety and expressiveness. From ROS2 nodes to CUDA kernels and real-time control loops, every performance-critical piece of code is written in C++.
2.2 Key Features by Standard
| Feature (C++17) | Robotics Use Case |
|---|---|
| std::optional / std::variant | Representing sensor state ("value present/absent") |
| Structured bindings | auto [x, y, z] = getPosition(); |
| if constexpr | Compile-time branching in sensor abstraction layers |
| std::filesystem | Log management, map file loading |
| Parallel STL (std::execution::par) | Parallel point cloud processing |
C++20 (Current Robotics Standard)
```cpp
// C++20 concepts: constrain sensor types at compile time
template<typename T>
concept Sensor = requires(T s) {
    { s.read() } -> std::convertible_to<SensorData>;
    { s.calibrate() } -> std::same_as<bool>;
};

// C++20 ranges: lazy, composable point cloud filtering
auto obstacles = pointCloud
    | std::views::filter(isAboveGround)
    | std::views::transform(toWorldFrame)
    | std::views::take(maxObstacles);
```
- Concepts: Template parameter constraints for compile-time type safety
- Ranges: Composable lazy data transformations
- Coroutines: Asynchronous I/O on embedded platforms
- std::jthread: Threads with cooperative cancellation
- std::expected<T, E> (C++23): Error handling without exceptions (exceptions are typically forbidden in real-time code)
- std::mdspan (C++23): Multidimensional array views for image/tensor data (zero-copy)
- std::print (C++23): Type-safe formatted output
✗ Dynamic memory allocation on hot paths → ✓ std::pmr allocators or pre-allocated pools
✗ Exceptions in real-time control loops → ✓ std::expected or error codes
✗ Mutex-based communication → ✓ std::atomic, lock-free data structures
✗ Default scheduling → ✓ SCHED_FIFO / SCHED_RR (POSIX)
ROS2 is an open-source middleware framework for building robot applications. It is a complete rewrite of ROS1, designed to support real-time operation, multi-robot systems, and production-grade deployment. The latest LTS release is ROS2 Jazzy Jalisco (May 2024).
| Aspect | ROS1 | ROS2 |
|---|---|---|
| Discovery | Centralized (roscore) | Decentralized (DDS discovery) |
| Middleware | Custom TCPROS/UDPROS | DDS/RTPS standard |
| Real-time | Not supported | First-class support via DDS QoS |
| Security | None | DDS-SROS2 (authentication, encryption, ACL) |
| Multi-robot | Complex namespace workarounds | Native multi-domain support |
| Lifecycle | None | Managed Node (configure, activate, deactivate) |
| OS Support | Linux only (official) | Linux, macOS, Windows, RTOS |
| Build System | catkin | colcon + ament |
ROS2 communicates through the Data Distribution Service (DDS) standard.
| DDS Implementation | Characteristics |
|---|---|
| Eclipse Cyclone DDS | Lightweight, high-performance |
| eProsima Fast DDS | Feature-rich, widely adopted (Jazzy default) |
| RTI Connext DDS | Enterprise-grade, safety-certified |
Key QoS Profiles: Reliability (Best-Effort vs Reliable), Durability (Volatile vs Transient-Local), History Depth, Deadline, Liveliness
| Concept | Description | Example |
|---|---|---|
| Node | Modular process unit | Perception node, planning node, control node |
| Topic | Pub/Sub channel | Sensor data streams |
| Service | Synchronous Request/Reply | "Trigger calibration" |
| Action | Async long-running task + feedback | "Navigate to waypoint" |
| Executor | Callback execution policy | SingleThreaded, MultiThreaded |
| Component Node | Dynamically loadable shared library | Zero-copy intra-process communication |
| Lifecycle Node | Deterministic start/stop state machine | configure → activate → deactivate |
The dominant paradigm in 2024-2026 is projecting multi-camera views into a unified BEV feature space.
Front Camera ──┐
Left Camera ──┤
Right Camera ──┼──→ [BEV Feature Space] ──→ 3D Detection
Rear Camera ──┤ Lane Detection
Side Cameras ──┘ Occupancy Prediction
| Model | Method | Performance (nuScenes NDS) |
|---|---|---|
| BEVFormer | Deformable Attention + Spatiotemporal Transformer | 56.9% |
| BEVDet/BEVDepth | Explicit depth prediction for 2D→3D lifting | - |
| LSS | Per-pixel depth distribution prediction | - |
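The 2D→3D lifting in the table above can be illustrated, in heavily simplified form, by flat-ground back-projection: intersect a pixel ray with the ground plane given known pinhole intrinsics and camera height. Real BEV models (LSS, BEVDepth) predict a per-pixel depth distribution instead of assuming a flat ground; the intrinsics and pixel below are invented for illustration.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Back-project pixel (u, v) onto a flat ground plane.

    Camera frame convention: x right, y down, z forward.
    Returns (lateral_x, forward_z) in meters, or None if the
    ray does not hit the ground (pixel at or above the horizon).
    """
    dy = (v - cy) / fy          # ray slope toward the ground
    if dy <= 0:
        return None             # at or above the horizon
    t = cam_height / dy         # scale at which the ray reaches y = cam_height
    x = t * (u - cx) / fx
    z = t * 1.0
    return x, z

# Hypothetical 1000 px focal length, 1920x1080 image, camera mounted 1.5 m up
x, z = pixel_to_ground(u=1160, v=840, fx=1000, fy=1000,
                       cx=960, cy=540, cam_height=1.5)
print(round(x, 2), round(z, 2))  # → 1.0 5.0 (lateral 1 m, forward 5 m)
```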
| Stage | Technique | Representative Models |
|---|---|---|
| 2D Object Detection | Real-time detection | YOLOv8, YOLOv9, RT-DETR |
| 3D Object Detection | Camera-based 3D | DETR3D, PETR, StreamPETR |
| Lane Detection | Parametric/anchor-based | CLRNet, LaneATT, TopoNet |
| Depth Estimation | Monocular/multi-view | MiDaS, Depth Anything V2 |
| Occupancy Prediction | 3D voxel grid | SurroundOcc, Occ3D |
| Traffic Sign/Signal | Infrastructure classification | Dedicated classifiers |
Perception Evolution:
CNN (2011-2016) → RNN+GAN (2016-2018) → BEV (2018-2020)
→ Transformer+BEV (2020-present) → Occupancy (2022-present) → End-to-End VLA (2024-present)
- UniAD (CVPR 2023 Best Paper): Perception + prediction + planning in a single network
- VAD: End-to-end driving based on vectorized scene representations
- DriveTransformer (ICLR 2025): Efficient parallel end-to-end architecture
Vision-Language-Action (VLA) models are foundation models that take visual input (camera images) and language commands and directly output robot actions. They serve as a bridge connecting internet-scale vision-language pretraining with robotic control.
| Model | Organization | Year | Key Features |
|---|---|---|---|
| PaLM-E | Google | 2023 | 562B multimodal model, visual tokens embedded into LLM |
| RT-2 | DeepMind | 2023 | First VLA, discretized action tokens, Chain-of-Thought reasoning |
| Octo | UC Berkeley | 2024 | Open-source generalist policy, Open X-Embodiment training, Diffusion head |
| OpenVLA | Stanford | Jun 2024 | 7B parameters, Llama 2 + DINOv2 + SigLIP, LoRA fine-tuning support |
| pi0 | Physical Intelligence | Late 2024 | ~3.3B, continuous action output via Flow Matching |
| Helix | Figure AI | Feb 2025 | First full-body humanoid VLA (arms, hands, torso, head, fingers) |
| GR00T N1 | NVIDIA | Mar 2025 | Humanoid foundation model, Isaac Sim integration |
Action Output Comparison:
RT-2 Approach (Action Tokenization):
"move arm" → LLM → [token256] [token128] [token064] → Discrete actions
pi0 Approach (Flow Matching):
"move arm" → VLM → Flow Expert → Continuous vector field → Smooth actions
- Action Tokenization: Discretizing continuous actions into vocabulary tokens (RT-2)
- Flow Matching: Generating continuous actions via learned vector fields (pi0)
- Cross-Embodiment Transfer: Training on multiple robot types for generalization
- Open X-Embodiment: 21+ institutions, 1M+ episodes collaborative dataset
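The RT-2-style action tokenization above can be sketched as uniform binning of each continuous action dimension into a fixed vocabulary (256 bins, as in RT-2); the action ranges and values here are invented for illustration.

```python
def tokenize(action, low, high, n_bins=256):
    """Map a continuous action in [low, high] to a discrete token id."""
    frac = (action - low) / (high - low)
    return min(n_bins - 1, max(0, int(frac * n_bins)))

def detokenize(token, low, high, n_bins=256):
    """Map a token id back to the center of its bin."""
    return low + (token + 0.5) / n_bins * (high - low)

# Example: one action dimension (say, gripper x-velocity) in [-1, 1]
tok = tokenize(0.25, -1.0, 1.0)
recovered = detokenize(tok, -1.0, 1.0)
print(tok, round(recovered, 4))  # → 160 0.2539
```

The round trip loses at most half a bin width, which is why pi0-style continuous outputs (Flow Matching) produce smoother trajectories than token decoding.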
6. CUDA and Parallel Programming
Autonomous vehicles must process multiple camera streams, LiDAR point clouds, and radar signals simultaneously while running several neural networks within 100ms. CPUs alone simply cannot keep up.
┌─────────────────────────────────────────────┐
│ CUDA Memory Hierarchy │
│ │
│ Registers (per thread) │
│ ↓ │
│ Shared Memory (per block, ~48-164KB) │
│ ↓ │
│ L2 Cache │
│ ↓ │
│ Global Memory (VRAM) │
│ │
│ Thread → Warp(32) → Block(max 1024) → Grid │
└─────────────────────────────────────────────┘
| Concept | Description |
|---|---|
| Kernel | Function executed in parallel by thousands of GPU threads |
| Warp | 32 threads executing synchronously in SIMT fashion |
| Stream | Concurrent kernel execution and compute/memory transfer overlap |
| Coalesced Access | Adjacent threads accessing adjacent memory for maximum bandwidth |
| Shared Memory | User-managed scratchpad for intra-block data reuse |
| Pinned Memory | Asynchronous CPU-GPU transfer via DMA |
| Application | Specific Tasks |
|---|---|
| Point cloud processing | Voxelization, ground removal, clustering |
| Image preprocessing | Distortion correction, resizing, color space conversion, normalization |
| Neural network inference | Convolution, attention, normalization kernels (cuDNN, cuBLAS) |
| Post-processing | NMS, BEV grid generation |
| Sensor synchronization | Multi-sensor stream timestamp alignment |
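Of the post-processing steps above, NMS is the one most often hand-written as a CUDA kernel. A minimal CPU reference in Python (greedy IoU suppression on axis-aligned boxes, with made-up boxes and scores) shows the logic the kernel parallelizes:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:  # highest score first
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: box 1 overlaps box 0 too much
```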
| Platform | Performance | Use Case |
|---|---|---|
| Orin SoC | 254 TOPS INT8 | Current L2+ through L4 |
| Thor (next-gen) | 2,000 TOPS | L4 central computing |
cuDNN (deep learning), cuBLAS (linear algebra), Thrust (parallel STL), CUB (block/device primitives), NCCL (multi-GPU communication), cuPCL (point clouds)
TensorRT is NVIDIA's high-performance deep learning inference SDK. It optimizes PyTorch/TensorFlow/ONNX models through graph optimization, automatic kernel tuning, precision calibration, and memory management, typically achieving 2x to 10x speedup.
Before optimization: Conv → BatchNorm → ReLU (3 kernel launches)
After optimization: Conv+BN+ReLU (1 kernel launch)
Impact:
- Up to 80% reduction in kernel launch overhead
- Up to 50% reduction in memory bandwidth
- ~30% throughput improvement
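Conv+BN fusion is not an approximation: the frozen BatchNorm affine transform folds exactly into the convolution's weights and bias. A scalar sketch (one weight standing in for a full kernel, statistics invented):

```python
import math

# Frozen BatchNorm parameters (illustrative values)
gamma, beta = 1.2, 0.1        # learned scale and shift
mean, var, eps = 0.5, 4.0, 1e-5

w, b = 2.0, 0.3               # conv weight and bias (scalar stand-in)

def conv_bn(x):
    """Unfused: conv followed by batchnorm (two 'kernel launches')."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Fold BN into the conv:
#   w' = gamma * w / sqrt(var + eps)
#   b' = gamma * (b - mean) / sqrt(var + eps) + beta
scale = gamma / math.sqrt(var + eps)
w_fused, b_fused = w * scale, (b - mean) * scale + beta

def conv_fused(x):
    """Fused: a single affine op, numerically identical."""
    return w_fused * x + b_fused

x = 3.7
print(abs(conv_bn(x) - conv_fused(x)) < 1e-9)  # → True
```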
| Conversion | Throughput Gain | Accuracy Loss | Calibration Required |
|---|---|---|---|
| FP32 → FP16 | 2x | Negligible | No |
| FP32 → INT8 | 4x | Less than 1% (with proper calibration) | Yes (500-1000 samples) |
| FP32 → FP8 | Optimal (Hopper/Blackwell) | Minimal | Yes |
- PTQ (Post-Training Quantization): No retraining needed, quantization with calibration data only
- QAT (Quantization-Aware Training): Simulates quantization during training for higher accuracy
PyTorch Model
→ ONNX Export (torch.onnx.export)
→ TensorRT Builder (trtexec or Python API)
→ Graph optimization + layer fusion
→ Precision calibration (INT8/FP8)
→ Automatic kernel tuning
→ Serialized engine (.engine file)
→ TensorRT Runtime (inference)
| Tool | Use Case |
|---|---|
| trtexec | CLI build and benchmarking |
| TensorRT Python/C++ API | Programmatic control |
| Torch-TensorRT | Native PyTorch integration |
| ONNX-TensorRT | Direct ONNX model optimization |
| Triton Inference Server | Model serving with TensorRT backend |
A BEVFormer-class model requires 50+ TFLOPS at FP32, which is impractical on an in-vehicle SoC. Model optimization can achieve a 4x to 16x reduction in compute while retaining over 95% of the original accuracy.
A technique that reduces the numerical precision of weights and activations.
| Method | Retraining | Accuracy | Best For |
|---|---|---|---|
| PTQ | Not required (calibration only) | Slightly lower | Fast deployment, quantization-robust models |
| QAT | Required (fake quantization) | Higher than PTQ | Production models, accuracy-critical tasks |
Precision Levels:
| Precision | Compression | Accuracy Loss |
|---|---|---|
| FP16 | 2x | Negligible |
| INT8 | 4x | Less than 1% |
| INT4 (AWQ, GPTQ) | 8x | Minor |
| FP8 (H100/H200) | Optimal | Minimal |
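The INT8 row above corresponds to mapping FP32 values onto 256 integer levels. A minimal symmetric per-tensor quantizer sketch (the scale factor is exactly what PTQ calibration estimates from sample data; the weights are made up):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: x ≈ scale * q."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0            # what calibration estimates
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [scale * qi for qi in q]

weights = [0.9, -1.27, 0.31]
q, scale = quantize_int8(weights)
print(q)                               # → [90, -127, 31]
print([round(a, 4) for a in dequantize(q, scale)])
```

The reconstruction error per weight is at most half the scale, which is why accuracy loss stays under 1% when the calibration data captures the true activation range.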
A technique that removes unnecessary weights, neurons, or channels.
| Type | Method | Pros | Cons |
|---|---|---|---|
| Unstructured | Zeroing individual weights | 90%+ sparsity achievable | Requires specialized hardware (2:4 sparsity) |
| Structured | Removing entire channels/heads/layers | Direct FLOPs reduction, general-purpose hardware | Lower compression ratio than unstructured |
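Unstructured magnitude pruning, in its simplest form, zeroes the globally smallest-magnitude fraction of weights. A sketch with invented weights:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # indices of the n_prune smallest |w|
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
print(magnitude_prune(w, sparsity=0.5))  # → [0.8, 0.0, 0.3, -0.9, 0.0, 0.0]
```

Structured pruning differs only in granularity: the unit dropped is an entire channel or attention head rather than an individual weight, so the resulting tensor is genuinely smaller.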
Transfers knowledge from a large "teacher" model to a smaller "student" model.
- Logit Distillation: Student mimics the teacher's output probability distribution
- Feature Distillation: Student mimics the teacher's intermediate representations
- QAD (Quantization-Aware Distillation): Distills from the teacher while compensating for the student's quantization error
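Logit distillation trains the student on the teacher's temperature-softened output distribution; the standard loss is KL divergence between softened logits, scaled by T², following Hinton et al. (2015). A sketch with made-up logits:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)   # soft targets
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

teacher = [4.0, 1.0, 0.2]            # confident teacher
aligned = [3.8, 1.1, 0.3]            # student close to the teacher
wrong   = [0.2, 1.0, 4.0]            # student disagrees
print(distill_loss(teacher, aligned) < distill_loss(teacher, wrong))  # → True
```

The high temperature exposes the teacher's "dark knowledge" (relative probabilities among wrong classes), which hard labels discard.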
8.5 Industry-Standard Pipeline (2025)
Large Teacher (FP32)
→ Knowledge Distillation → Smaller Student
→ Structured Pruning → Remove channels/heads
→ QAT Fine-tuning → INT8/FP8
→ TensorRT Export → Fused and optimized engine
- NVIDIA Model Optimizer (ModelOpt): Unified API for quantization, pruning, distillation, and sparsity
- PyTorch: torch.quantization, torch.ao.quantization
- Hugging Face Optimum: Transformer model optimization
| Sensor | Strengths | Weaknesses |
|---|---|---|
| Camera | Rich semantic information, low cost | No direct depth measurement, light-sensitive |
| LiDAR | Precise 3D point clouds | Expensive, sparse at long range |
| Radar | All-weather operation | Low angular resolution |
| GPS | Global positioning | Meter-level error, unreliable in tunnels/urban canyons |
| IMU | High-frequency motion data | Drift over time |
Fusion compensates for each sensor's weaknesses through complementary strengths.
| Level | Method | Example |
|---|---|---|
| Early (Data) | Combine raw data, then extract features | Painting camera RGB onto LiDAR points |
| Mid (Feature) | Merge NN features from each sensor in a shared space | BEVFusion, TransFusion |
| Late (Decision) | Independent detection, then rule/learning-based merge | Ensemble voting |
Dominant trend in 2025: Unified BEV + Token-Level Cross-Modal Attention
Predict: x̂ₖ|ₖ₋₁ = F·x̂ₖ₋₁ + B·uₖ
Pₖ|ₖ₋₁ = F·Pₖ₋₁·Fᵀ + Q
Update: Kₖ = Pₖ|ₖ₋₁·Hᵀ·(H·Pₖ|ₖ₋₁·Hᵀ + R)⁻¹
x̂ₖ = x̂ₖ|ₖ₋₁ + Kₖ·(zₖ - H·x̂ₖ|ₖ₋₁)
| Filter | Characteristics | Best For |
|---|---|---|
| KF | Linear systems, Gaussian noise | Simple GPS+Odometry |
| EKF | Jacobian-based nonlinear linearization | GPS+IMU fusion standard |
| UKF | Sigma points (no Jacobian needed) | Highly nonlinear systems |
| Particle Filter | Non-parametric, multimodal distributions | Urban GPS ambiguity |
State Vector (typical EKF): [x, y, z, roll, pitch, yaw, vx, vy, vz, ax, ay, az]
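For a one-dimensional constant state (F = H = 1, B = 0), the predict/update equations above collapse to a few lines. The noise parameters and measurements below are invented for illustration:

```python
def kalman_1d(z_measurements, q=0.01, r=1.0, x0=0.0, p0=100.0):
    """Scalar Kalman filter: F = H = 1, process noise q, sensor noise r.

    p0 is set large so the first measurement dominates the prior.
    """
    x, p = x0, p0
    estimates = []
    for z in z_measurements:
        # Predict: state is unchanged, uncertainty grows by process noise
        p = p + q
        # Update: Kalman gain blends prediction with the new measurement
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy sensor readings of a true value of 5.0
est = kalman_1d([5.3, 4.8, 5.1, 4.9, 5.2])
print(round(est[-1], 3))  # settles near 5.0 as the gain shrinks
```

The EKF used for GPS+IMU fusion applies the same two-step loop to the 12-state vector above, with F and H replaced by Jacobians of the nonlinear motion and measurement models.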
| Type | Description | Tools |
|---|---|---|
| Extrinsic | Rotation + translation between sensors | Kalibr (ETH Zurich), checkerboard-based |
| Intrinsic | Internal sensor parameters (focal length, distortion coefficients) | OpenCV calibrateCamera |
| Temporal | Time offset between sensors | PTP, GPS PPS, signal correlation |
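The signal-correlation approach in the Temporal row works by recording the same motion (e.g. yaw rate) on both sensors and finding the sample shift that maximizes their correlation. An illustrative sketch, assuming equal sample rates and synthetic signals:

```python
def estimate_offset(ref, delayed, max_shift=10):
    """Return the shift (in samples) that best aligns `delayed` to `ref`."""
    def corr(shift):
        pairs = [(ref[i], delayed[i + shift])
                 for i in range(len(ref))
                 if 0 <= i + shift < len(delayed)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_shift, max_shift + 1), key=corr)

# A motion spike seen by the reference sensor at sample 20,
# and by the other sensor at sample 23 (3 samples of lag)
ref = [0.0] * 50
ref[20] = 1.0
delayed = [0.0] * 50
delayed[23] = 1.0
print(estimate_offset(ref, delayed))  # → 3
```

In practice the offset is refined to sub-sample precision by interpolating the correlation peak, or avoided entirely with hardware sync (PTP, GPS PPS).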
Statistically proving safety through physical road testing alone would require on the order of 11 billion miles of driving (RAND Corporation estimate). SIL/HIL simulation can cover millions of virtual miles per day.
┌──────────────────────────────────────────────────┐
│ SIL Environment │
│ │
│ [Perception Algorithms] ←→ [Sensor Simulation] │
│ [Planning Algorithms] ←→ [Scenario Engine] │
│ [Control Algorithms] ←→ [Vehicle Dynamics Model]│
│ │
│ Execution: Host PC (x86) │
│ Physical Hardware: None │
│ Iteration Speed: Seconds to minutes │
│ CI/CD Integration: Yes (cloud parallelization) │
└──────────────────────────────────────────────────┘
Advantages: No hardware cost, fully reproducible, CI/CD integration, cluster parallelization
┌──────────────────────────────────────────────────┐
│ HIL Environment │
│ │
│ [Actual ECU (DUT)] ←→ [HIL Simulator] │
│ ├ Vehicle dynamics model │
│ ├ Sensor signal injection (HDMI/ETH)│
│ ├ Bus simulation (CAN/ETH) │
│ └ Fault injection │
│ │
│ Execution: Real target hardware (Orin, EyeQ, etc.)│
│ Real-time: Hardware clock rate │
│ ISO 26262: Required for functional safety certification│
└──────────────────────────────────────────────────┘
MIL (Model-in-the-Loop) — MATLAB/Simulink prototyping
→ SIL — Host PC + simulation environment
→ PIL (Processor-in-the-Loop) — Target processor compilation, host execution
→ HIL — Target ECU + simulation environment
→ VIL (Vehicle-in-the-Loop) — Real vehicle + scenario injection
→ Road Testing — Real vehicle + real environment
| Tool | Use Case |
|---|---|
| dSPACE SCALEXIO | HIL simulation |
| NI PXI | PXI-based HIL |
| Vector CANoe | Bus simulation |
| Applied Intuition HIL Sim | ADAS/AD HIL platform |
| IPG CarMaker | SIL/HIL vehicle dynamics |
| Feature | CARLA | Isaac Sim | LGSVL | CarSim | Simulink |
|---|---|---|---|---|---|
| Open source | Yes | Yes | Yes* | No | No |
| Engine | Unreal | Omniverse | Unity | Proprietary | Proprietary |
| Sensor simulation | High | Very high | High | Low | Medium |
| Vehicle dynamics | Medium | Medium | Medium | Very high | High |
| ROS2 support | Yes | Yes | Yes | Bridge | Toolbox |
| Synthetic data | Yes | Best | Yes | No | Limited |
| ML training | API | Isaac Lab (RL) | API | No | RL Toolbox |
| Active dev (2025) | Yes | Yes | No* | Yes | Yes |
*LGSVL was discontinued by LG
```shell
# Pull and run the CARLA server (GPU required)
docker pull carlasim/carla:0.9.15
docker run --privileged --gpus all --net=host \
  carlasim/carla:0.9.15 /bin/bash ./CarlaUE4.sh

# Install the Python client API
pip install carla
```
```python
import carla

# Connect to the CARLA server started above
client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a vehicle at the first predefined spawn point
blueprint = world.get_blueprint_library().find('vehicle.tesla.model3')
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

# Attach an RGB camera sensor to the vehicle
camera_bp = world.get_blueprint_library().find('sensor.camera.rgb')
camera = world.spawn_actor(camera_bp, carla.Transform(), attach_to=vehicle)
```
- Omniverse (USD)-based, photorealistic RGB, depth, and segmentation masks via RTX renderer
- PhysX GPU-accelerated physics engine
- NuRec neural rendering to minimize the sim-to-real gap
- Isaac Lab (RL training), Replicator (synthetic data), Cosmos (generative AI environments)
┌─────────────────────────────────────────────────────────────────┐
│ Full Autonomous Driving Stack │
│ │
│ 1. Sensing Sensor drivers, time sync, logging │
│ ↓ │
│ 2. Localization HD Map matching, V-SLAM, LiDAR SLAM, GNSS/IMU│
│ ↓ → 6-DOF vehicle pose (100+ Hz) │
│ 3. Perception 3D detection, tracking, semantic seg, Occupancy│
│ ↓ → 3D bounding boxes, track IDs, semantic map│
│ 4. Prediction Agent future trajectory prediction (3-8s) │
│ ↓ → Multi-modal trajectories per agent │
│ 5. Planning Route planning, behavior planning, motion planning│
│ ↓ → Trajectory (pose + velocity sequence) │
│ 6. Control Lateral (steering) + longitudinal (accel/brake)│
│ ↓ → CAN commands (steer-by-wire, brake-by-wire)│
└─────────────────────────────────────────────────────────────────┘
| Approach | Pros | Cons |
|---|---|---|
| Modular | Clear interfaces, easy testing, interpretable | Error propagation, inter-module information loss |
| End-to-End | Global optimization, information preservation | Hard to interpret, difficult safety verification |
| Hybrid | Learned perception + rule-based safety; current industry mainstream | Integration complexity between learned and rule-based components |
| Stack | Description |
|---|---|
| Autoware | World's leading open-source AD stack, ROS2-based, fully modular |
| Apollo (Baidu) | Comprehensive AD platform, deployed in robotaxi operations |
13. VR/AR and Digital Twins
| Area | Description |
|---|---|
| Digital Twin | Virtual replica of physical robot/environment, real-time sync |
| Teleoperation | Remote robot control via VR (surgery, hazardous environments, space) |
| Data Collection | Human demonstrations in VR as robot policy training data |
| Simulation Visualization | Developers immerse in the robot's world for debugging |
- NVIDIA Omniverse: USD-based, real-time rendering, physics simulation, multi-user collaboration
- Unity + ROS: ROS-Unity integration via Unity Robotics Hub
- WebXR + rosbridge: Browser-based VR robot control
Autonomous vehicles generate 1-5 TB of data per hour, and training perception models requires thousands of GPU-hours. Cloud is not optional; it is essential infrastructure.
Vehicle (Edge)
→ Upload raw logs via cellular/WiFi
→ Object Storage (S3/GCS/Azure Blob)
→ Data catalog & indexing (scenario mining)
→ Auto-annotation (pre-labeling with existing models)
→ Human annotation (verification, corner cases)
→ Dataset versioning (DVC, LakeFS)
→ Training cluster
→ Model registry
→ Validation pipeline (offline metrics, SIL)
→ OTA deployment
| Technology | Role |
|---|---|
| Apache Kafka | Real-time streaming (telemetry, OTA, vehicle comms) |
| Apache Flink | Stream processing (real-time scenario detection) |
| Apache Spark | Large-scale batch data transformation |
| Apache Airflow | ML pipeline workflow orchestration |
| MCAP | Multimodal log data format (successor to rosbag) |
- A/B Partitioning: Update the inactive partition, switch on reboot
- Delta Updates: Transmit only changed bytes (100-500 MB instead of 10+ GB)
- Staged Rollout: 1% → monitor → gradual expansion
- Rollback: Revert to the previous version on anomaly detection
- Cryptographic signing, apply only in safe state, ISO 24089 standard
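Staged rollout is commonly implemented with deterministic hash bucketing, so each vehicle's cohort assignment is stable across checks and cohorts only ever expand as the percentage grows. A sketch (device IDs and percentages invented):

```python
import hashlib

def in_rollout(device_id: str, rollout_pct: int) -> bool:
    """Deterministically assign a device to a 0-99 bucket.

    Devices in buckets below rollout_pct receive the update. The same
    device always lands in the same bucket, so raising rollout_pct from
    1 to 10 to 50 only adds devices, never swaps them.
    """
    digest = hashlib.sha256(device_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

fleet = [f"vin-{i:05d}" for i in range(1000)]
for pct in (1, 10, 50):
    n = sum(in_rollout(vin, pct) for vin in fleet)
    print(pct, n)   # roughly pct% of the fleet at each stage
```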
Model deployment → Real-world driving data collection → Automatic failure case mining
→ Additional annotation → Retraining → SIL validation → A/B testing → Full deployment
→ [Repeat]
- NVIDIA CUDA Programming Guide
- NVIDIA TensorRT Documentation
- ROS2 Jazzy Documentation
- CARLA Documentation
- NVIDIA Isaac Sim Documentation
- Autoware Documentation
- Li, Z., et al. (2022). "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers". ECCV 2022.
- Hu, Y., et al. (2023). "Planning-Oriented Autonomous Driving (UniAD)". CVPR 2023 Best Paper.
- Brohan, A., et al. (2023). "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control". arxiv.org/abs/2307.15818
- Black, K., et al. (2024). "pi0: A Vision-Language-Action Flow Model for General Robot Control". arxiv.org/abs/2410.24164
- Octo Model Team, et al. (2024). "Octo: An Open-Source Generalist Robot Policy". octo-models.github.io
- Kim, M., et al. (2024). "OpenVLA: An Open-Source Vision-Language-Action Model". arxiv.org/abs/2406.09246
- carla-simulator/carla
- autowarefoundation/autoware
- ApolloAuto/apollo
- openvla/openvla
- octo-models/octo
- OpenDriveLab/UniAD
- NVIDIA/TensorRT-Model-Optimizer
Blog Posts and Tutorials
- NVIDIA: How DRIVE AGX Achieves Fast Perception
- NVIDIA: Top 5 AI Model Optimization Techniques
- Multi-Sensor Fusion Survey (MDPI)
- VLA Models Overview (DigitalOcean)
- NetApp: Data Pipeline for Autonomous Driving