
The Complete Autonomous Driving & Robotics Tech Stack: From C++, ROS2, CUDA, TensorRT to VLM/VLA, Simulation, and Beyond


1. Overview

Autonomous driving and robotics systems are not built on a single technology — they are a convergence of dozens of disciplines. The entire pipeline, from receiving raw sensor data to perceiving the environment, planning a path, and controlling the vehicle, involves C++, GPU programming, deep learning, sensor fusion, simulation, and cloud infrastructure.

This post provides a practitioner-oriented overview of the 13 core technical domains every autonomous driving and robotics engineer should know.

Autonomous Driving Tech Stack Architecture

  Sensing        →   Perception       →   Decision      →   Control
  (GPS/IMU,          (CV/DL,              (Planning,        (Control,
   Camera,            Sensor Fusion,       Prediction)       CAN/Ethernet)
   LiDAR)             VLM/VLA)

  Infra Layer:      C++ | ROS2 | CUDA | TensorRT | Cloud/MLOps
  Validation Layer: SIL/HIL | Simulation (CARLA/Isaac) | VR/AR

2. Modern C++ for Robotics (C++17/20/23)

2.1 Why C++?

Robotics demands deterministic execution, zero-overhead abstraction, and direct hardware access. Modern C++ delivers all three while dramatically improving code safety and expressiveness. From ROS2 nodes to CUDA kernels and real-time control loops, every performance-critical piece of code is written in C++.

2.2 Key Features by Standard

C++17 (Robotics Baseline)

| Feature | Robotics Use Case |
|---|---|
| std::optional / std::variant | Representing sensor state ("value present/absent") |
| Structured bindings | auto [x, y, z] = getPosition(); |
| if constexpr | Compile-time branching in sensor abstraction layers |
| std::filesystem | Log management, map file loading |
| Parallel STL (std::execution::par) | Parallel point cloud processing |

C++20 (Current Robotics Standard)

// Concepts: Type-safe sensor interfaces
template<typename T>
concept Sensor = requires(T s) {
    { s.read() } -> std::convertible_to<SensorData>;
    { s.calibrate() } -> std::same_as<bool>;
};

// Ranges: Sensor data pipeline
auto obstacles = pointCloud
    | std::views::filter(isAboveGround)
    | std::views::transform(toWorldFrame)
    | std::views::take(maxObstacles);

// Coroutines: Cooperative multitasking without RTOS overhead
// Asynchronous I/O via co_await, co_yield
  • Concepts: Template parameter constraints for compile-time type safety
  • Ranges: Composable lazy data transformations
  • Coroutines: Asynchronous I/O on embedded platforms
  • std::jthread: Threads with cooperative cancellation

C++23 (Adoption in Progress for Robotics)

  • std::expected<T, E>: Error handling without exceptions (exceptions are forbidden in real-time code)
  • std::mdspan: Multidimensional array views for image/tensor data (zero-copy)
  • std::print: Type-safe formatted output

2.3 Real-Time Programming Considerations

| Avoid | Use Instead |
|---|---|
| Dynamic memory allocation on hot paths | std::pmr allocators or pre-allocated pools |
| Exceptions in real-time control loops | std::expected or error codes |
| Mutex-based communication | std::atomic, lock-free data structures |
| Default scheduling | SCHED_FIFO / SCHED_RR (POSIX) |
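The first row of the table can be sketched with C++17 facilities. This is a minimal illustration, not production code: the `PointXYZ` type and the 64 KB arena size are assumptions, and a real node would size the pool from worst-case load.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

struct PointXYZ { float x, y, z; };

// Process one scan using a stack-backed monotonic pool: no global-heap
// allocation on the hot path (the pool falls back upstream only if exhausted).
std::size_t countAboveGround(const std::vector<PointXYZ>& scan, float groundZ) {
    std::array<std::byte, 64 * 1024> arena{};                       // pre-allocated pool
    std::pmr::monotonic_buffer_resource pool{arena.data(), arena.size()};
    std::pmr::vector<PointXYZ> filtered{&pool};                     // allocates from the arena
    filtered.reserve(1024);
    for (const auto& p : scan)
        if (p.z > groundZ) filtered.push_back(p);
    return filtered.size();                                         // pool released wholesale on return
}
```

A monotonic resource never frees individual objects, which makes allocation a pointer bump — a good fit for per-cycle scratch data in a control loop.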

2.4 Learning Resources


3. ROS / ROS2 (Robot Operating System)

3.1 What Is ROS2?

ROS2 is an open-source middleware for building robot applications. It is a complete rewrite of ROS1, designed to support real-time operation, multi-robot systems, and production-grade deployments. The latest LTS release is ROS2 Jazzy Jalisco (May 2024).

3.2 ROS1 vs ROS2

| Aspect | ROS1 | ROS2 |
|---|---|---|
| Discovery | Centralized (roscore) | Decentralized (DDS discovery) |
| Middleware | Custom TCPROS/UDPROS | DDS/RTPS standard |
| Real-time | Not supported | First-class support via DDS QoS |
| Security | None | SROS2 on DDS-Security (authentication, encryption, access control) |
| Multi-robot | Complex namespace workarounds | Native multi-domain support |
| Lifecycle | None | Managed nodes (configure, activate, deactivate) |
| OS support | Linux only (official) | Linux, macOS, Windows, RTOS |
| Build system | catkin | colcon + ament |

3.3 DDS Middleware Layer

ROS2 communicates through the Data Distribution Service (DDS) standard.

| DDS Implementation | Characteristics |
|---|---|
| Eclipse Cyclone DDS | Lightweight, high performance (Jazzy default) |
| eProsima Fast DDS | Feature-rich, widely adopted |
| RTI Connext DDS | Enterprise-grade, safety-certified |

Key QoS Profiles: Reliability (Best-Effort vs Reliable), Durability (Volatile vs Transient-Local), History Depth, Deadline, Liveliness

3.4 Core Concepts

| Concept | Description | Example |
|---|---|---|
| Node | Modular process unit | Perception node, planning node, control node |
| Topic | Pub/sub channel | Sensor data streams |
| Service | Synchronous request/reply | "Trigger calibration" |
| Action | Async long-running task + feedback | "Navigate to waypoint" |
| Executor | Callback execution policy | SingleThreaded, MultiThreaded |
| Component node | Dynamically loadable shared library | Zero-copy intra-process communication |
| Lifecycle node | Deterministic start/stop state machine | configure → activate → deactivate |
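The essence of a lifecycle node is a small deterministic state machine. The sketch below is an illustration in plain C++, not the `rclcpp_lifecycle::LifecycleNode` API itself; real code would register transition callbacks with the ROS2 lifecycle manager.

```cpp
#include <stdexcept>

enum class State { Unconfigured, Inactive, Active, Finalized };

// Minimal managed-node state machine: transitions are only legal from
// specific states, which is what makes startup/shutdown deterministic.
class LifecycleNodeSketch {
    State state_ = State::Unconfigured;
public:
    State state() const { return state_; }
    void configure() {   // allocate resources, declare parameters
        if (state_ != State::Unconfigured) throw std::logic_error("configure: bad state");
        state_ = State::Inactive;
    }
    void activate() {    // start publishing/processing
        if (state_ != State::Inactive) throw std::logic_error("activate: bad state");
        state_ = State::Active;
    }
    void deactivate() {  // stop processing, keep resources
        if (state_ != State::Active) throw std::logic_error("deactivate: bad state");
        state_ = State::Inactive;
    }
};
```

The payoff is orchestration: a supervisor can bring a whole pipeline of nodes through configure → activate in a known order and roll it back cleanly.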

3.5 Learning Resources


4. Computer Vision for Autonomous Driving

4.1 Core Paradigm: BEV (Bird's-Eye-View) Representation

The dominant paradigm in 2024-2026 is projecting multi-camera views into a unified BEV feature space.

Front Camera ──┐
Left Camera  ──┤
Right Camera ──┼──→ [BEV Feature Space] ──→ 3D Detection
Rear Camera  ──┤                            Lane Detection
Side Cameras ──┘                            Occupancy Prediction

| Model | Method | Performance (nuScenes NDS) |
|---|---|---|
| BEVFormer | Deformable attention + spatiotemporal transformer | 56.9% |
| BEVDet/BEVDepth | Explicit depth prediction for 2D→3D lifting | - |
| LSS | Per-pixel depth distribution prediction | - |
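Regardless of how features are lifted, every BEV head ends with the same indexing step: mapping an ego-frame position into a discrete grid cell. A minimal sketch (the grid bounds and cell size are illustrative assumptions, not from any particular model):

```cpp
#include <optional>
#include <utility>

// Ego-centric BEV grid: x_min..x_max and y_min..y_max in metres, square cells.
struct BevGrid {
    float x_min, x_max, y_min, y_max;
    float cell;  // cell size in metres
    // Returns (row, col) for a point in the ego frame, or nullopt if outside.
    std::optional<std::pair<int, int>> cellOf(float x, float y) const {
        if (x < x_min || x >= x_max || y < y_min || y >= y_max) return std::nullopt;
        return std::make_pair(static_cast<int>((x - x_min) / cell),
                              static_cast<int>((y - y_min) / cell));
    }
};
```

With a 100 m × 100 m range and 0.5 m cells this yields a 200 × 200 grid, which is the order of magnitude typical BEV heads operate on.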

4.2 Perception Pipeline

| Stage | Technique | Representative Models |
|---|---|---|
| 2D object detection | Real-time detection | YOLOv8, YOLOv9, RT-DETR |
| 3D object detection | Camera-based 3D | DETR3D, PETR, StreamPETR |
| Lane detection | Parametric/anchor-based | CLRNet, LaneATT, TopoNet |
| Depth estimation | Monocular/multi-view | MiDaS, Depth Anything V2 |
| Occupancy prediction | 3D voxel grid | SurroundOcc, Occ3D |
| Traffic sign/signal | Infrastructure classification | Dedicated classifiers |

4.3 End-to-End Perception-Planning Integration

Perception evolution:

CNN (2011-2016) → RNN+GAN (2016-2018) → BEV (2018-2020) → Transformer+BEV (2020-present) → Occupancy (2022-present) → End-to-End VLA (2024-present)
  • UniAD (CVPR 2023 Best Paper): Perception + prediction + planning in a single network
  • VAD: End-to-end driving based on vectorized scene representations
  • DriveTransformer (ICLR 2025): Efficient parallel end-to-end architecture

4.4 Learning Resources


5. VLM/VLA Models (Vision-Language-Action)

5.1 What Is VLA?

Vision-Language-Action (VLA) models are foundation models that take visual input (camera images) and language commands and directly output robot actions. They serve as a bridge connecting internet-scale vision-language pretraining with robotic control.

5.2 Key Models Timeline

| Model | Organization | Year | Key Features |
|---|---|---|---|
| PaLM-E | Google | 2023 | 562B multimodal model, visual tokens embedded into the LLM |
| RT-2 | Google DeepMind | 2023 | First VLA, discretized action tokens, chain-of-thought reasoning |
| Octo | UC Berkeley | 2024 | Open-source generalist policy, Open X-Embodiment training, diffusion head |
| OpenVLA | Stanford | 2024.06 | 7B parameters, Llama 2 + DINOv2 + SigLIP, LoRA fine-tuning support |
| pi0 | Physical Intelligence | Late 2024 | ~3.3B parameters, continuous action output via flow matching |
| Helix | Figure AI | 2025.02 | First full-body humanoid VLA (arms, hands, torso, head, fingers) |
| GR00T N1 | NVIDIA | 2025.03 | Humanoid foundation model, Isaac Sim integration |
5.3 Core Concepts

Action Output Comparison:

RT-2 approach (action tokenization):
  "move arm" → LLM → [token256] [token128] [token064] → discrete actions

pi0 approach (flow matching):
  "move arm" → VLM → flow expert → continuous vector field → smooth actions
  • Action Tokenization: Discretizing continuous actions into vocabulary tokens (RT-2)
  • Flow Matching: Generating continuous actions via learned vector fields (pi0)
  • Cross-Embodiment Transfer: Training on multiple robot types for generalization
  • Open X-Embodiment: 21+ institutions, 1M+ episodes collaborative dataset

5.4 Learning Resources


6. CUDA and Parallel Programming

6.1 Why GPUs?

Autonomous vehicles must process multiple camera streams, LiDAR point clouds, and radar signals simultaneously while running several neural networks within 100ms. CPUs alone simply cannot keep up.

6.2 CUDA Programming Model

CUDA memory hierarchy (fastest to slowest):

  Registers (per thread)
    ↓
  Shared Memory (per block, ~48-164 KB)
    ↓
  L2 Cache
    ↓
  Global Memory (VRAM)

Execution hierarchy: Thread → Warp (32 threads) → Block (max 1024 threads) → Grid
| Concept | Description |
|---|---|
| Kernel | Function executed in parallel by thousands of GPU threads |
| Warp | 32 threads executing in lockstep (SIMT) |
| Stream | Concurrent kernel execution and compute/transfer overlap |
| Coalesced access | Adjacent threads accessing adjacent memory for maximum bandwidth |
| Shared memory | User-managed scratchpad for intra-block data reuse |
| Pinned memory | Page-locked host memory enabling asynchronous CPU-GPU DMA transfers |
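Coalescing is mostly a data-layout decision made before any kernel runs. The sketch below shows the usual fix in plain C++ (illustrative type names): converting an array-of-structs point cloud, where thread `i` reading `x[i]` would stride by the struct size, into a struct-of-arrays layout where those reads are contiguous and coalesce into few memory transactions.

```cpp
#include <vector>

// Array-of-structs: consecutive x values sit 16 bytes apart.
struct PointAoS { float x, y, z, intensity; };

// Struct-of-arrays: each field is contiguous, so per-thread reads coalesce.
struct PointsSoA {
    std::vector<float> x, y, z, intensity;
};

PointsSoA toSoA(const std::vector<PointAoS>& in) {
    PointsSoA out;
    out.x.reserve(in.size()); out.y.reserve(in.size());
    out.z.reserve(in.size()); out.intensity.reserve(in.size());
    for (const auto& p : in) {
        out.x.push_back(p.x); out.y.push_back(p.y);
        out.z.push_back(p.z); out.intensity.push_back(p.intensity);
    }
    return out;
}
```

The transform itself is trivial; the point is that GPU-side voxelization or filtering kernels read the SoA buffers with unit stride.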

6.3 CUDA Applications in Autonomous Driving

| Application | Specific Tasks |
|---|---|
| Point cloud processing | Voxelization, ground removal, clustering |
| Image preprocessing | Distortion correction, resizing, color space conversion, normalization |
| Neural network inference | Convolution, attention, normalization kernels (cuDNN, cuBLAS) |
| Post-processing | NMS, BEV grid generation |
| Sensor synchronization | Multi-sensor stream timestamp alignment |

6.4 NVIDIA Autonomous Driving Platforms

| Platform | Performance | Use Case |
|---|---|---|
| Orin SoC | 254 TOPS (INT8) | Current L2+ through L4 |
| Thor (next-gen) | 2,000 TOPS | L4 central computing |

6.5 Ecosystem Libraries

cuDNN (deep learning), cuBLAS (linear algebra), Thrust (parallel STL), CUB (block/device primitives), NCCL (multi-GPU communication), cuPCL (point clouds)

6.6 Learning Resources


7. TensorRT

7.1 What Is TensorRT?

NVIDIA's high-performance deep learning inference SDK. It optimizes PyTorch/TensorFlow/ONNX models through graph optimization, automatic kernel tuning, precision calibration, and memory management — typically achieving 2x to 10x speedup.

7.2 Core Optimization Techniques

Layer/Kernel Fusion

Before optimization: Conv → BatchNorm → ReLU (3 kernel launches)
After optimization:  fused Conv+BN+ReLU (1 kernel launch)

Impact: up to 80% reduction in kernel launch overhead,
        up to 50% reduction in memory bandwidth,
        ~30% throughput improvement

Precision Calibration

| Conversion | Throughput Gain | Accuracy Loss | Calibration Required |
|---|---|---|---|
| FP32 → FP16 | 2x | Negligible | No |
| FP32 → INT8 | 4x | Less than 1% (with proper calibration) | Yes (500-1000 samples) |
| FP32 → FP8 | Best on Hopper/Blackwell | Minimal | Yes |

  • PTQ (Post-Training Quantization): no retraining needed; quantization with calibration data only
  • QAT (Quantization-Aware Training): simulates quantization during training for higher accuracy
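The calibration step of PTQ reduces to choosing a scale from representative activation samples. The sketch below shows symmetric max-calibration, the simplest scheme; the helper names are ours, not TensorRT's API, and TensorRT also offers entropy and percentile calibrators.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick a scale so that [-amax, amax] maps onto the INT8 range [-127, 127].
float calibrateScale(const std::vector<float>& samples) {
    float amax = 0.f;
    for (float v : samples) amax = std::max(amax, std::fabs(v));
    return amax / 127.f;
}

int8_t quantize(float v, float scale) {
    int q = static_cast<int>(std::lround(v / scale));
    return static_cast<int8_t>(std::clamp(q, -127, 127));   // saturate outliers
}

float dequantize(int8_t q, float scale) { return q * scale; }
```

Quantization error is bounded by half a step (scale / 2), which is why a calibration set that captures the true activation range matters more than its size.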

Deployment Workflow

  1. PyTorch model
  2. ONNX export (torch.onnx.export)
  3. TensorRT builder (trtexec or Python API): graph optimization + layer fusion → precision calibration (INT8/FP8) → automatic kernel tuning
  4. Serialized engine (.engine file)
  5. TensorRT runtime (inference)

7.3 Integration Options

| Tool | Use Case |
|---|---|
| trtexec | CLI build and benchmarking |
| TensorRT Python/C++ API | Programmatic control |
| Torch-TensorRT | Native PyTorch integration |
| ONNX-TensorRT | Direct ONNX model optimization |
| Triton Inference Server | Model serving with a TensorRT backend |

7.4 Learning Resources


8. Model Optimization (Quantization, Pruning, Distillation)

8.1 Why Is This Necessary?

A BEVFormer-scale model demands on the order of 50 TFLOPs of compute at FP32 — impractical on an in-vehicle SoC. Model optimization can deliver a 4x to 16x reduction while retaining over 95% of the original accuracy.

8.2 Quantization

A technique that reduces the numerical precision of weights and activations.

| Method | Retraining | Accuracy | Best For |
|---|---|---|---|
| PTQ | Not required (calibration only) | Slightly lower | Fast deployment, quantization-robust models |
| QAT | Required (fake quantization) | Higher than PTQ | Production models, accuracy-critical tasks |

Precision Levels:

| Precision | Compression | Accuracy Loss |
|---|---|---|
| FP16 | 2x | Negligible |
| INT8 | 4x | Less than 1% |
| INT4 (AWQ, GPTQ) | 8x | Minor |
| FP8 (H100/H200) | 4x (hardware-native) | Minimal |

8.3 Pruning

A technique that removes unnecessary weights, neurons, or channels.

| Type | Method | Pros | Cons |
|---|---|---|---|
| Unstructured | Zeroing individual weights | 90%+ sparsity achievable | Speedup requires specialized hardware support (e.g., 2:4 sparsity) |
| Structured | Removing entire channels/heads/layers | Direct FLOPs reduction on general-purpose hardware | Lower compression ratio than unstructured |
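The core of structured channel pruning is a ranking step. A minimal sketch (criterion and names are illustrative; magnitude/L1 ranking is one common heuristic among several): score each output channel by the L1 norm of its weights and keep the top-k, after which a real pipeline rebuilds the layer and fine-tunes.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// channelWeights[i] holds the flattened weights of output channel i.
// Returns the indices of the `keep` most important channels, in original order.
std::vector<std::size_t> channelsToKeep(
    const std::vector<std::vector<float>>& channelWeights, std::size_t keep) {
    std::vector<std::size_t> idx(channelWeights.size());
    std::iota(idx.begin(), idx.end(), 0);
    auto l1 = [&](std::size_t i) {
        float s = 0.f;
        for (float w : channelWeights[i]) s += std::fabs(w);
        return s;
    };
    // Rank by importance, truncate, then restore original channel order.
    std::sort(idx.begin(), idx.end(), [&](auto a, auto b) { return l1(a) > l1(b); });
    idx.resize(std::min(keep, idx.size()));
    std::sort(idx.begin(), idx.end());
    return idx;
}
```

Because whole channels are dropped, the resulting tensor shapes shrink directly — no sparse-hardware support needed, which is the structured-pruning advantage from the table above.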

8.4 Knowledge Distillation

Transfers knowledge from a large "teacher" model to a smaller "student" model.

  • Logit Distillation: Student mimics the teacher's output probability distribution
  • Feature Distillation: Student mimics the teacher's intermediate representations
  • QAD (Quantization-Aware Distillation): Student mimics the teacher while compensating for quantization error
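Logit distillation has a compact formula: soften both sets of logits with a temperature T and minimize KL(teacher ‖ student), scaled by T². The loss itself is standard; the code around it is our sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Temperature-softened softmax (max-subtraction for numerical stability).
std::vector<double> softmax(const std::vector<double>& logits, double T) {
    double mx = logits[0];
    for (double l : logits) mx = std::max(mx, l);
    std::vector<double> p(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - mx) / T);
        sum += p[i];
    }
    for (double& v : p) v /= sum;
    return p;
}

// KL(teacher ‖ student) on softened distributions; T^2 keeps gradient
// magnitudes comparable across temperatures (Hinton et al.'s convention).
double distillLoss(const std::vector<double>& teacherLogits,
                   const std::vector<double>& studentLogits, double T) {
    auto pt = softmax(teacherLogits, T), ps = softmax(studentLogits, T);
    double kl = 0.0;
    for (std::size_t i = 0; i < pt.size(); ++i) kl += pt[i] * std::log(pt[i] / ps[i]);
    return kl * T * T;
}
```

A higher T exposes the teacher's "dark knowledge" — the relative probabilities of wrong classes — which is precisely what the student learns from.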

8.5 Industry-Standard Pipeline (2025)

  1. Large teacher (FP32)
  2. Knowledge distillation → smaller student
  3. Structured pruning → remove channels/heads
  4. QAT fine-tuning → INT8/FP8
  5. TensorRT export → fused and optimized engine

8.6 Tools

  • NVIDIA Model Optimizer (ModelOpt): Unified API for quantization, pruning, distillation, and sparsity
  • PyTorch: torch.quantization, torch.ao.quantization
  • Hugging Face Optimum: Transformer model optimization

9. Sensor Fusion (GPS, IMU, Camera, LiDAR)

9.1 Why Fusion?

| Sensor | Strengths | Weaknesses |
|---|---|---|
| Camera | Rich semantic information, low cost | No direct depth measurement, light-sensitive |
| LiDAR | Precise 3D point clouds | Expensive, sparse at long range |
| Radar | All-weather operation | Low angular resolution |
| GPS | Global positioning | Meter-level error, unreliable in tunnels/urban canyons |
| IMU | High-frequency motion data | Drift over time |

Fusion compensates for each sensor's weaknesses through complementary strengths.

9.2 Fusion Architectures

| Level | Method | Example |
|---|---|---|
| Early (data) | Combine raw data, then extract features | Painting camera RGB onto LiDAR points |
| Mid (feature) | Merge NN features from each sensor in a shared space | BEVFusion, TransFusion |
| Late (decision) | Independent detection, then rule/learning-based merge | Ensemble voting |

Dominant trend in 2025: Unified BEV + Token-Level Cross-Modal Attention

9.3 Classical State Estimation

Kalman Filter (KF)

Predict:  x̂ₖ|ₖ₋₁ = F·x̂ₖ₋₁ + B·uₖ
          Pₖ|ₖ₋₁ = F·Pₖ₋₁·Fᵀ + Q

Update:   Kₖ = Pₖ|ₖ₋₁·Hᵀ·(H·Pₖ|ₖ₋₁·Hᵀ + R)⁻¹
          x̂ₖ = x̂ₖ|ₖ₋₁ + Kₖ·(zₖ − H·x̂ₖ|ₖ₋₁)
          Pₖ = (I − Kₖ·H)·Pₖ|ₖ₋₁
| Filter | Characteristics | Best For |
|---|---|---|
| KF | Linear systems, Gaussian noise | Simple GPS + odometry |
| EKF | Jacobian-based nonlinear linearization | Standard for GPS+IMU fusion |
| UKF | Sigma points (no Jacobians needed) | Highly nonlinear systems |
| Particle filter | Non-parametric, multimodal distributions | Urban GPS ambiguity |

State Vector (typical EKF): [x, y, z, roll, pitch, yaw, vx, vy, vz, ax, ay, az]
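For a scalar state with F = H = 1 and B = 0, the predict/update equations above collapse to a few lines. The noise values here are illustrative; a real GPS+IMU EKF carries the full 12-state vector and its Jacobians.

```cpp
// 1-D Kalman filter: estimate a constant-ish quantity from noisy measurements.
struct Kalman1D {
    double x = 0.0;   // state estimate x̂
    double P = 1.0;   // estimate variance
    double Q = 0.01;  // process noise (assumed)
    double R = 1.0;   // measurement noise (assumed)

    void predict() { P += Q; }            // x̂ₖ|ₖ₋₁ = x̂ₖ₋₁ ; Pₖ|ₖ₋₁ = P + Q
    void update(double z) {
        double K = P / (P + R);           // Kalman gain
        x += K * (z - x);                 // correct with the innovation z − x̂
        P *= (1.0 - K);                   // shrink uncertainty
    }
};
```

Feeding it a stream of GPS-like measurements shows the key behavior: the gain K settles to a steady value balancing trust in the model (Q) against trust in the sensor (R).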

9.4 Sensor Calibration

| Type | Description | Tools |
|---|---|---|
| Extrinsic | Rotation + translation between sensors | Kalibr (ETH Zurich), checkerboard targets |
| Intrinsic | Internal sensor parameters (focal length, distortion coefficients) | OpenCV calibrateCamera |
| Temporal | Time offset between sensors | PTP, GPS PPS, signal correlation |
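What an extrinsic calibration buys you is a rigid transform p_cam = R·p_lidar + t that moves points between sensor frames. A minimal sketch with fixed-size arrays (the R and t values in the usage are illustrative; tools like Kalibr estimate them):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;   // row-major rotation matrix

// Apply the LiDAR→camera extrinsic: p_cam = R · p_lidar + t.
Vec3 lidarToCamera(const Mat3& R, const Vec3& t, const Vec3& p) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = R[i][0] * p[0] + R[i][1] * p[1] + R[i][2] * p[2] + t[i];
    return out;
}
```

In production this transform is applied to millions of points per second (often on the GPU), so even a few millimetres or milliradians of calibration error visibly smears fused detections.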

9.5 Learning Resources


10. SIL/HIL Testing

10.1 Why Is This Necessary?

Statistically proving safety through physical road testing alone would require on the order of 11 billion miles of driving (a widely cited RAND estimate). SIL/HIL pipelines can cover millions of virtual miles per day.

10.2 SIL (Software-in-the-Loop)

SIL Environment:

  [Perception Algorithms] ←→ [Sensor Simulation]
  [Planning Algorithms]   ←→ [Scenario Engine]
  [Control Algorithms]    ←→ [Vehicle Dynamics Model]

  Execution: Host PC (x86)
  Physical hardware: None
  Iteration speed: Seconds to minutes
  CI/CD integration: Yes (cloud parallelization)

Advantages: No hardware cost, fully reproducible, CI/CD integration, cluster parallelization

10.3 HIL (Hardware-in-the-Loop)

HIL Environment:

  [Actual ECU (DUT)] ←→ [HIL Simulator]
                          ├ Vehicle dynamics model
                          ├ Sensor signal injection (HDMI/ETH)
                          ├ Bus simulation (CAN/ETH)
                          └ Fault injection

  Execution: Real target hardware (Orin, EyeQ, etc.)
  Real-time: Hardware clock rate
  ISO 26262: Required for functional safety certification

10.4 V-Model Test Pyramid

  • MIL (Model-in-the-Loop): MATLAB/Simulink prototyping
  • SIL: Host PC + simulation environment
  • PIL (Processor-in-the-Loop): Code cross-compiled for and run on the target processor, with the plant model on the host
  • HIL: Target ECU + simulation environment
  • VIL (Vehicle-in-the-Loop): Real vehicle + injected scenarios
  • Road testing: Real vehicle + real environment

10.5 Industry Tools

| Tool | Use Case |
|---|---|
| dSPACE SCALEXIO | HIL simulation |
| NI PXI | PXI-based HIL |
| Vector CANoe | Bus simulation |
| Applied Intuition HIL Sim | ADAS/AD HIL platform |
| IPG CarMaker | SIL/HIL vehicle dynamics |

11. Simulation Software

11.1 Major Simulator Comparison

| Feature | CARLA | Isaac Sim | LGSVL | CarSim | Simulink |
|---|---|---|---|---|---|
| Open source | Yes | Yes | Yes* | No | No |
| Engine | Unreal | Omniverse | Unity | Proprietary | Proprietary |
| Sensor simulation | High | Very high | High | Low | Medium |
| Vehicle dynamics | Medium | Medium | Medium | Very high | High |
| ROS2 support | Yes | Yes | Yes | Bridge | Toolbox |
| Synthetic data | Yes | Best | Yes | No | Limited |
| ML training | API | Isaac Lab (RL) | API | No | RL Toolbox |
| Active dev (2025) | Yes | Yes | No* | Yes | Yes |

*LGSVL was discontinued by LG

11.2 CARLA (Open Source, Unreal Engine)

# Shell: run CARLA via Docker and install the Python client
docker pull carlasim/carla:0.9.15
docker run --privileged --gpus all --net=host \
    carlasim/carla:0.9.15 /bin/bash ./CarlaUE4.sh
pip install carla

# Python: control scenarios via the API
import carla

client = carla.Client('localhost', 2000)
world = client.get_world()

# Spawn a vehicle
blueprint = world.get_blueprint_library().find('vehicle.tesla.model3')
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

# Attach a camera sensor
camera_bp = world.get_blueprint_library().find('sensor.camera.rgb')
camera = world.spawn_actor(camera_bp, carla.Transform(), attach_to=vehicle)

11.3 NVIDIA Isaac Sim

  • Omniverse (USD)-based, photorealistic RGB, depth, and segmentation masks via RTX renderer
  • PhysX GPU-accelerated physics engine
  • NuRec neural rendering to minimize the sim-to-real gap
  • Isaac Lab (RL training), Replicator (synthetic data), Cosmos (generative AI environments)

11.4 Learning Resources


12. Full Autonomous Driving Stack

12.1 Modular Stack Architecture

Full Autonomous Driving Stack:

  1. Sensing        Sensor drivers, time sync, logging
        ↓
  2. Localization   HD map matching, V-SLAM, LiDAR SLAM, GNSS/IMU
        ↓             → 6-DOF vehicle pose (100+ Hz)
  3. Perception     3D detection, tracking, semantic segmentation, occupancy
        ↓             → 3D bounding boxes, track IDs, semantic map
  4. Prediction     Agent future trajectory prediction (3-8 s)
        ↓             → Multi-modal trajectories per agent
  5. Planning       Route planning, behavior planning, motion planning
        ↓             → Trajectory (pose + velocity sequence)
  6. Control        Lateral (steering) + longitudinal (accel/brake)
                      → CAN commands (steer-by-wire, brake-by-wire)
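The simplest realization of stage 6's longitudinal side is a PID loop from speed error to an accel/brake command. A sketch with illustrative gains (production controllers add feed-forward, anti-windup, and comfort limits):

```cpp
#include <algorithm>

// PID speed controller: output in [-1, 1] ≈ full brake .. full throttle,
// which a real stack would translate into CAN actuator messages.
struct PidController {
    double kp, ki, kd;
    double integral = 0.0, prevError = 0.0;

    double step(double targetSpeed, double measuredSpeed, double dt) {
        double e = targetSpeed - measuredSpeed;
        integral += e * dt;                     // accumulate steady-state error
        double deriv = (e - prevError) / dt;    // damp overshoot
        prevError = e;
        return std::clamp(kp * e + ki * integral + kd * deriv, -1.0, 1.0);
    }
};
```

Lateral control follows the same structure but usually wraps a geometric law (pure pursuit, Stanley) or MPC around the planned trajectory instead of a scalar setpoint.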

12.2 End-to-End vs Modular

| Approach | Pros | Cons |
|---|---|---|
| Modular | Clear interfaces, easy testing, interpretable | Error propagation, inter-module information loss |
| End-to-end | Global optimization, information preservation | Hard to interpret, difficult safety verification |
| Hybrid | Learned perception + rule-based safety; current industry mainstream | - |

12.3 Open-Source Stacks

| Stack | Description |
|---|---|
| Autoware | World's leading open-source AD stack, ROS2-based, fully modular |
| Apollo (Baidu) | Comprehensive AD platform, deployed in robotaxi operations |

13. VR/AR and Digital Twins

13.1 Application Areas

| Area | Description |
|---|---|
| Digital twin | Virtual replica of a physical robot/environment with real-time sync |
| Teleoperation | Remote robot control via VR (surgery, hazardous environments, space) |
| Data collection | Human demonstrations in VR as robot policy training data |
| Simulation visualization | Developers immerse themselves in the robot's world for debugging |

13.2 Key Platforms

  • NVIDIA Omniverse: USD-based, real-time rendering, physics simulation, multi-user collaboration
  • Unity + ROS: ROS-Unity integration via Unity Robotics Hub
  • WebXR + rosbridge: Browser-based VR robot control

14. Cloud Technologies

14.1 Why Cloud?

Autonomous vehicles generate 1-5 TB of data per hour. Training perception models requires thousands of GPU-hours. Cloud is not optional — it is essential infrastructure.

14.2 Data Pipeline

  1. Vehicle (edge)
  2. Upload raw logs via cellular/WiFi
  3. Object storage (S3/GCS/Azure Blob)
  4. Data catalog & indexing (scenario mining)
  5. Auto-annotation (pre-labeling with existing models)
  6. Human annotation (verification, corner cases)
  7. Dataset versioning (DVC, LakeFS)
  8. Training cluster
  9. Model registry
  10. Validation pipeline (offline metrics, SIL)
  11. OTA deployment

14.3 Key Technologies

| Technology | Role |
|---|---|
| Apache Kafka | Real-time streaming (telemetry, OTA, vehicle comms) |
| Apache Flink | Stream processing (real-time scenario detection) |
| Apache Spark | Large-scale batch data transformation |
| Apache Airflow | ML pipeline workflow orchestration |
| MCAP | Multimodal log data format (successor to rosbag) |

14.4 OTA (Over-the-Air) Updates

  • A/B partitioning: update the inactive partition → switch on reboot
  • Delta updates: transmit only changed bytes (100-500 MB vs 10+ GB)
  • Staged rollout: 1% → monitor → gradual expansion
  • Rollback: revert to the previous version on anomaly detection
  • Security: cryptographic signing, updates applied only in a safe vehicle state, ISO 24089 compliance

14.5 Data Flywheel

Model deployment → real-world driving data collection → automatic failure-case mining → additional annotation → retraining → SIL validation → A/B testing → full deployment → [repeat]

15. Learning Roadmap

15.1 Fundamentals (1-3 Months)

| Order | Topic | Recommended Resources |
|---|---|---|
| 1 | Modern C++ (17/20) | Programming with C++20 |
| 2 | ROS2 basics | ROS2 Jazzy tutorials |
| 3 | Linux/POSIX systems programming | APUE (Advanced Programming in the UNIX Environment) |
| 4 | Computer vision fundamentals | CS231n (Stanford) |

15.2 Intermediate (3-6 Months)

| Order | Topic | Recommended Resources |
|---|---|---|
| 5 | CUDA programming | CUDA C Programming Guide |
| 6 | Sensor fusion (KF, EKF) | Probabilistic Robotics (Thrun) |
| 7 | AD perception (BEV, 3D detection) | BEVFormer paper |
| 8 | TensorRT optimization & deployment | TensorRT documentation |

15.3 Advanced (6-12 Months)

| Order | Topic | Recommended Resources |
|---|---|---|
| 9 | Full AD stack | Autoware documentation |
| 10 | VLM/VLA models | VLA survey |
| 11 | Simulation (CARLA) | CARLA tutorials |
| 12 | SIL/HIL testing | Hands-on projects |
| 13 | Cloud MLOps | Practical experience |

16. References

Official Documentation

  1. NVIDIA CUDA Programming Guide
  2. NVIDIA TensorRT Documentation
  3. ROS2 Jazzy Documentation
  4. CARLA Documentation
  5. NVIDIA Isaac Sim Documentation
  6. Autoware Documentation

Key Papers

  1. Li, Z., et al. (2022). "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers". ECCV 2022.
  2. Hu, Y., et al. (2023). "Planning-Oriented Autonomous Driving (UniAD)". CVPR 2023 Best Paper.
  3. Brohan, A., et al. (2023). "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control". arxiv.org/abs/2307.15818
  4. Black, K., et al. (2024). "pi0: A Vision-Language-Action Flow Model for General Robot Control". arxiv.org/abs/2410.24164
  5. Team, O., et al. (2024). "Octo: An Open-Source Generalist Robot Policy". octo-models.github.io
  6. Kim, M., et al. (2024). "OpenVLA: An Open-Source Vision-Language-Action Model". arxiv.org/abs/2406.09246

GitHub Repositories

  1. carla-simulator/carla
  2. autowarefoundation/autoware
  3. ApolloAuto/apollo
  4. openvla/openvla
  5. octo-models/octo
  6. OpenDriveLab/UniAD
  7. NVIDIA/TensorRT-Model-Optimizer

Blog Posts and Tutorials

  1. NVIDIA: How DRIVE AGX Achieves Fast Perception
  2. NVIDIA: Top 5 AI Model Optimization Techniques
  3. Multi-Sensor Fusion Survey (MDPI)
  4. VLA Models Overview (DigitalOcean)
  5. NetApp: Data Pipeline for Autonomous Driving