Robotics

LiDAR and Point Clouds

2005. DARPA Grand Challenge. Stanford's Stanley wins: 5 LiDAR units on the roof, 132 miles across the Mojave without a driver. Two years later the Velodyne HDL-64E arrives: 64 laser beams, 1.3 million points per second. The beginning of an era. LiDAR turned autonomous driving from a lab experiment into reality.

Waymo One: 5 LiDAR units designed in-house + real-time segmentation at 100 km/h - over 20 million autonomous miles on public roads across the US
Boston Dynamics Spot: 5 stereo depth cameras + LiDAR for indoor SLAM in industrial facilities and mines
Apple iPhone 12 Pro: solid-state LiDAR scanner for AR depth sensing and room scanning - 40 million devices with LiDAR in a pocket
HERE Technologies: mobile LiDAR mapping vans - 3D HD maps for 110 countries powering autonomous driving

Historical context

In 2005, Sebastian Thrun's Stanford team won the DARPA Grand Challenge - the first time any vehicle completed the 132-mile Mojave Desert course autonomously. Five LiDAR units on the roof of Stanford's Stanley were an unconventional choice: most teams bet on cameras and radar. Velodyne, founded by David Hall in 1983 as an audio company, built its first LiDAR for the 2005 challenge; the 64-beam HDL-64E grew out of that work. Stanley's victory defined the architecture of autonomous vehicles for the next 15 years. In 2007, the CMU team (Boss) won the DARPA Urban Challenge - this time in urban conditions with other vehicles.

LiDAR: working principle and sensor types

**LiDAR (Light Detection And Ranging) is not one technology but a family of sensors with different physics and architectures.**

**TOF vs FMCW:** Time of Flight (TOF) measures time `t` between sending and receiving a pulse. Distance `d = c * t / 2`, where `c` is the speed of light. FMCW (Frequency Modulated Continuous Wave) emits a continuous wave with frequency modulation and computes distance from the frequency shift of the echo. FMCW gives object velocity (Doppler) without additional measurements and handles bright sunlight better.
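The two ranging formulas above can be checked numerically. The following is a minimal sketch: the function names and the example values are hypothetical, and the FMCW formula assumes a linear chirp and a static target (for a moving target the Doppler shift mixes into the beat frequency, which is why real sensors use up and down chirps to separate the two).

```python
C = 299_792_458.0  # speed of light in m/s


def tof_distance(round_trip_s: float) -> float:
    """TOF: the pulse travels to the target and back, so d = c * t / 2."""
    return C * round_trip_s / 2.0


def fmcw_distance(beat_hz: float, chirp_s: float, bandwidth_hz: float) -> float:
    """FMCW, linear chirp, static target.

    The echo is delayed by 2d/c, which shows up as a beat frequency
    f_beat = 2 * d * B / (c * T). Solving for d gives the line below.
    """
    return C * beat_hz * chirp_s / (2.0 * bandwidth_hz)


def doppler_velocity(doppler_hz: float, wavelength_m: float) -> float:
    """Radial velocity from the Doppler shift: v = f_d * wavelength / 2."""
    return doppler_hz * wavelength_m / 2.0


if __name__ == "__main__":
    # Hypothetical measurements, chosen only for illustration.
    print(f"TOF, 200 ns round trip: {tof_distance(200e-9):.2f} m")
    print(f"FMCW, 1 MHz beat, 10 us chirp, 1 GHz sweep: "
          f"{fmcw_distance(1e6, 10e-6, 1e9):.3f} m")
```

Note the factor of 2 in every formula: the light covers the sensor-to-target distance twice.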

| Type | Example | Beams | Points/sec | Price (USD) | Notes |
|---|---|---|---|---|---|
| Mechanical (rotating) | Velodyne HDL-64E | 64 | 1.3M | $75K+ | 360° FOV, battle-tested |
| Mechanical (rotating) | Ouster OS1-128 | 128 | 2.6M | $8K | Compact, ROS 2 ready |
| Solid-state (hybrid, prism scan) | Livox Mid-360 | non-repetitive pattern | 240K | $600 | No spinning head, long life |
| Solid-state (scanning) | Luminar Iris | scan pattern | 300K | $500 (auto) | Range 250 m, 1550 nm TOF |

A LiDAR TOF sensor sends a pulse. The echo returns after 100 ns (nanoseconds). What is the distance to the object? (speed of light = 3*10^8 m/s)

Point cloud structure and processing

**A point cloud is not a 3D model. It is a set of points in space with no connections between them.** Each point is a tuple `(x, y, z, intensity, ring, timestamp)`. No normals, no triangles, no topology. PointNet has no idea the points are neighbors. Algorithms must build connections explicitly - via k-NN, voxel grid, or a KD-tree.
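To make "building connections explicitly" concrete, here is a minimal NumPy sketch of two of the structures mentioned above: a voxel grid (used for downsampling) and a k-NN lookup. The function names are hypothetical, the input is assumed to be an `(N, 3)` array of `x, y, z` only, and the k-NN is brute force, so it is suitable only for small clouds. A KD-tree from a library such as Open3D or PCL would replace it in practice.

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points that fall into the same voxel with their centroid."""
    # Integer voxel index of every point.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel; `inverse` maps each point to its voxel's row.
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # shape differs between NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    return sums / counts[:, None]


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of every point (brute force, O(N^2))."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(0.0, 1.0, size=(1000, 3))
    small = voxel_downsample(cloud, voxel_size=0.1)
    print(cloud.shape, "->", small.shape)
    print(knn_indices(cloud[:5], k=2))
```

Every point inside a voxel is merged into a single centroid, so the voxel size sets the smallest detail the downsampled cloud can still represent.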

**Coordinate frames:** in robotics it is critical to distinguish frames. `sensor_frame` - origin at LiDAR center. `base_link` - origin at robot center. `map` - global frame. Transformations between frames are stored in the TF2 tree in ROS 2. Frame confusion is a frequent source of navigation bugs.
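The chain `sensor_frame` -> `base_link` -> `map` can be sketched with plain 4x4 homogeneous matrices. This example does not use ROS or TF2; in a real ROS 2 node the two transforms would come from TF2 lookups. The mounting offset and robot pose below are made-up values for illustration, and the helper names are hypothetical.

```python
import numpy as np


def make_transform(yaw_rad: float, translation) -> np.ndarray:
    """4x4 homogeneous transform: rotation about z by yaw, then translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]


# Assumed values: LiDAR mounted 0.2 m forward and 0.5 m above the robot center.
T_base_sensor = make_transform(0.0, [0.2, 0.0, 0.5])
# Assumed values: robot at (10, 5) in the map, rotated 90 degrees.
T_map_base = make_transform(np.pi / 2, [10.0, 5.0, 0.0])

# Composition order matters: the rightmost transform is applied first.
T_map_sensor = T_map_base @ T_base_sensor

point_in_sensor = np.array([[1.0, 0.0, 0.0]])  # 1 m straight ahead of the LiDAR
print(transform_points(T_map_sensor, point_in_sensor))
```

Naming each matrix `T_target_source` makes frame mix-ups easier to spot: adjacent frame names must match when matrices are multiplied.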

**Open3D vs PCL:** Open3D (Python/C++) is the modern choice for prototyping and ML integration. PCL (C++) is the industry standard with a rich algorithm library and better performance for embedded. Both are available in ROS 2: `pcl_ros` for PCL, `open3d_ros_helper` for Open3D.

Voxel downsampling with voxel_size=0.1 m is applied to a point cloud of 1 million points. What happens to the geometric accuracy of small objects?

Registration: ICP, NDT and LiDAR SLAM

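**Waymo builds HD maps with 5 cm accuracy - the backbone of their fleet across 5 cities.** Registration is the task of aligning two point clouds in a common frame. ICP (Iterative Closest Point) repeats four steps until convergence: find correspondences between points -> compute the rigid transform that minimizes their distance -> apply it -> re-match. ICP converges only locally, so it needs a reasonable initial guess. NDT (Normal Distributions Transform) instead models the target cloud as per-voxel Gaussians and maximizes the likelihood of the source points under them. LiDAR SLAM extends registration to online map building with simultaneous localization.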
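The ICP loop described above can be sketched in a few dozen lines of NumPy. This is a minimal point-to-point variant with brute-force nearest-neighbor search, not the implementation used by any particular library or company. The function names, the `max_corr_dist` parameter, and the definition of `fitness` (fraction of source points that have a target point within `max_corr_dist` after alignment) are assumptions made for this sketch.

```python
import numpy as np


def best_fit_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (Kabsch/SVD) so that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # reject reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c


def nearest(src: np.ndarray, dst: np.ndarray):
    """Brute-force nearest neighbor in dst for every point of src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])


def icp(src, dst, max_iter=50, tol=1e-8, max_corr_dist=1.0):
    """Point-to-point ICP. Returns R, t, fitness, inlier RMSE."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur, prev_err = src.copy(), np.inf
    for _ in range(max_iter):
        idx, dist = nearest(cur, dst)                 # 1. correspondences
        mask = dist < max_corr_dist
        if mask.sum() < 3:
            break
        R, t = best_fit_transform(cur[mask], dst[idx[mask]])  # 2. transform
        cur = cur @ R.T + t                           # 3. apply
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist[mask].mean()
        if abs(prev_err - err) < tol:                 # 4. repeat until stable
            break
        prev_err = err
    _, dist = nearest(cur, dst)
    inliers = dist < max_corr_dist
    rmse = np.sqrt((dist[inliers] ** 2).mean()) if inliers.any() else np.inf
    return R_total, t_total, inliers.mean(), rmse
```

Reporting fitness and RMSE together is a deliberate choice here: a low RMSE over only a small fraction of matched points describes a partial overlap, not necessarily a good alignment.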

**LOAM and LIO-SAM:** LOAM (Lidar Odometry and Mapping) extracts edge and planar features from the point cloud instead of matching the full cloud, which makes it roughly 10x faster than full ICP. LIO-SAM adds tightly-coupled IMU measurements and optional GPS factors for global correction. Both are widely used baselines for LiDAR odometry and map building.
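The edge/planar split in LOAM is based on a local smoothness (curvature) score computed along each scan ring. The following is a simplified sketch of that idea: it scores each point by the norm of the summed difference vectors to its ring neighbors, normalized by neighbor count and range. The window size, feature counts, and function names are assumptions, and the full algorithm's extra steps (per-sector selection, rejecting occluded or grazing-angle points) are omitted.

```python
import numpy as np


def loam_curvature(ring: np.ndarray, half_window: int = 5) -> np.ndarray:
    """Smoothness score for each point of one scan ring, ordered by azimuth.

    `ring` is an (N, 3) array. Points too close to either end of the
    ring get NaN because their neighborhood is incomplete.
    """
    n = len(ring)
    curv = np.full(n, np.nan)
    for i in range(half_window, n - half_window):
        neighbors = np.vstack(
            [ring[i - half_window:i], ring[i + 1:i + half_window + 1]]
        )
        # On a flat surface the difference vectors cancel out; at a
        # corner they add up.
        diff_sum = (ring[i] - neighbors).sum(axis=0)
        curv[i] = np.linalg.norm(diff_sum) / (
            len(neighbors) * np.linalg.norm(ring[i])
        )
    return curv


def split_features(curv: np.ndarray, n_edge: int = 2, n_planar: int = 4):
    """Highest-curvature points become edges, lowest become planar points."""
    valid = np.flatnonzero(~np.isnan(curv))
    order = valid[np.argsort(curv[valid])]
    return order[-n_edge:], order[:n_planar]
```

Matching only these few feature points against the map, instead of every point, is what reduces the registration cost.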

ICP converged to a transform with fitness=0.3 (30% correspondences). Should this result be trusted?

Segmentation: from RANSAC to PointPillars

**Tesla uses cameras only. Waymo uses 5 LiDAR + 29 cameras + 6 radars. The difference: Tesla reconstructs 3D from 2D pixels. Waymo gets 3D directly. Two philosophical camps in autonomous driving.** Point cloud segmentation splits the cloud into semantic classes: ground, road, pedestrian, car. Each approach offers a different trade-off between speed, accuracy, and robustness.

**SemanticKITTI and IoU:** the standard benchmark is SemanticKITTI (KITTI sequences with per-point labels). The metric is mean IoU across 19 classes. RandLA-Net (2020): random sampling + Local Feature Aggregation, 53.9 mIoU on SemanticKITTI (its 77.4 mIoU figure is from Semantic3D). PointPillars (object detection): 62.7 AP on KITTI Car. When evaluating a custom model, use IoU per class, not accuracy, because of class imbalance (ground >> pedestrians).
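A small example shows why accuracy misleads under class imbalance. This sketch computes per-class IoU from a confusion matrix with NumPy. The function name and the two-class toy labels are hypothetical, and it is not the official SemanticKITTI evaluation script.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """IoU = TP / (TP + FP + FN) for each class; NaN if the class never occurs."""
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)


if __name__ == "__main__":
    # Toy labels: 95 ground points (class 0), 5 pedestrian points (class 1).
    target = np.array([0] * 95 + [1] * 5)
    pred = np.zeros(100, dtype=int)  # a model that predicts "ground" everywhere

    ious = per_class_iou(pred, target, num_classes=2)
    print("accuracy:", (pred == target).mean())
    print("IoU per class:", ious)
    print("mIoU:", np.nanmean(ious))
```

In this toy case the model never detects a pedestrian, yet its accuracy stays high because ground points dominate; the per-class IoU exposes the failure.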

**LiDAR goes blind in snow:** raindrops and snowflakes reflect the laser like solid objects - false points everywhere. Cameras lose clarity in rain. Radar works in any weather but has low resolution. A robust system fuses all three. That is why Waymo does not drop LiDAR despite the cost.

Myth: deep learning for point clouds requires a fixed number of input points.

Reality: PointNet++ and modern architectures (RandLA-Net, PointPillars) natively handle variable-size point clouds via symmetric aggregation operations.

PointNet uses per-point MLP + global max pooling - a symmetric function invariant to point order and count. Max pooling over N points produces a fixed-size vector regardless of N. PointPillars divides space into pillars and aggregates points inside each one - the number of points per pillar varies, but the result is always the same size. The fixed-count constraint only appears when explicit padding/truncation is added for batch processing.
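The per-point MLP plus global max pooling idea can be demonstrated without a deep learning framework. In this sketch the weights are random and untrained, the layer sizes are arbitrary, and the names are hypothetical. It only illustrates that the output size does not depend on the number of points and that the result does not depend on their order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random, untrained weights for a two-layer shared MLP (3 -> 16 -> 32).
W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 32))


def global_feature(points: np.ndarray) -> np.ndarray:
    """Apply the same MLP to every point, then max-pool over the point axis."""
    h = np.maximum(points @ W1, 0.0)   # shared layer 1 + ReLU, shape (N, 16)
    h = np.maximum(h @ W2, 0.0)        # shared layer 2 + ReLU, shape (N, 32)
    return h.max(axis=0)               # symmetric aggregation, shape (32,)


if __name__ == "__main__":
    small = rng.normal(size=(100, 3))
    large = rng.normal(size=(5000, 3))
    print(global_feature(small).shape, global_feature(large).shape)

    shuffled = small[rng.permutation(len(small))]
    print(np.allclose(global_feature(small), global_feature(shuffled)))
```

Batching is where the fixed-count constraint enters: stacking clouds of different sizes into one tensor requires padding or sampling, which is a framework convenience rather than a property of the architecture.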

RANSAC plane fitting for ground removal uses distance_threshold=0.2 m. On a flat road results are good. What happens on a 15-degree slope?
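For reference when reasoning about the question above, here is a minimal sketch of RANSAC plane fitting with NumPy. It fits an unconstrained plane (no assumption that the ground is horizontal) and measures distance perpendicular to that plane. The function name, iteration count, and seed are assumptions of this sketch, and library implementations may add constraints or refinements that are not shown here.

```python
import numpy as np


def ransac_plane(points: np.ndarray, distance_threshold: float = 0.2,
                 iterations: int = 200, seed: int = 0):
    """Fit a plane n.x + d = 0 by RANSAC. Returns ((n, d), inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_plane = None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Three random points define a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, skip it
            continue
        normal = normal / norm
        d = -normal @ p0
        # Perpendicular distance of every point to the candidate plane.
        inliers = np.abs(points @ normal + d) < distance_threshold
        if inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic scene: noisy flat ground plus a small raised obstacle.
    ground = np.column_stack([
        rng.uniform(-5, 5, 300), rng.uniform(-5, 5, 300),
        rng.uniform(-0.02, 0.02, 300),
    ])
    obstacle = np.column_stack([
        rng.uniform(4, 5, 30), rng.uniform(-1, 1, 30), rng.uniform(1, 2, 30),
    ])
    cloud = np.vstack([ground, obstacle])
    plane, mask = ransac_plane(cloud)
    non_ground = cloud[~mask]  # what remains after ground removal
    print(len(cloud), "->", len(non_ground))
```

The threshold defines a slab of fixed thickness around one plane, so the result depends on how well a single plane describes the actual ground surface in the scan.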