Robotics

Sensor Fusion

April 1970. Apollo 13. An oxygen tank explodes 200,000 miles from Earth. The AGC navigation computer falls back to inertial-only mode - gyroscopes and accelerometers, no external signals. After 200 hours, accumulated gyroscope drift would have introduced an error of tens of kilometers. The astronauts survived in part because 9 years earlier Rudolf Kalman had found the mathematically optimal way to combine noisy sensor data with an imperfect physical model. Today that same algorithm runs inside Tesla, SpaceX, and every smartphone GPS chip. No single sensor can trust itself alone - which is exactly why sensor fusion became the foundation of all autonomous systems.

  • **Tesla Autopilot** - 8-camera + IMU fusion at 36 fps; EKF for real-time object tracking across all camera views
  • **Waymo One** - 29 cameras + 5 LIDARs + 6 radars in a unified KF stack; 20 million km of autonomous driving by 2024
  • **SpaceX Falcon 9** - Kalman filter for first-stage landing: IMU + optical sensors + GPS, achieving 10 cm accuracy on the landing pad
  • **Smartphone GPS** - every GPS chip fuses signals from 4-12 satellites via weighted least squares (simplified KF), producing the smooth dot on the map
  • **Da Vinci surgical robots** - encoder + force sensor fusion for sub-millimeter precision that cancels surgeon hand tremor

Kalman Filter: optimal statistics under the hood

April 1970. Apollo 13. The AGC navigation computer runs only on inertial sensors - accelerometers and gyroscopes. No external signals. After 200 hours of flight, accumulated gyroscope drift would introduce an error of dozens of kilometers. The astronauts survived not because the sensors were precise - but because 9 years earlier Rudolf Kalman published a paper that NASA embedded into the Apollo program. That paper described how to combine imperfect measurements with an imperfect physical model and get a better answer than any single source could provide.

The Kalman filter is not a filter in the DSP sense. It is a recursive Bayesian estimator for linear systems with Gaussian noise. Each step consists of two phases: prediction (what the motion model expects) and correction (how the measurement deviates from the prediction). Formally:

Where $\hat{x}$ is the state vector (position, velocity), $P$ is the covariance matrix (uncertainty), $F$ is the state transition matrix, $Q$ is the process noise covariance. After prediction, measurement $z_k$ arrives - and the correction step begins:

$K_k$ is the Kalman gain - the "trust weight" given to the measurement. When the sensor is noisy (large $R_k$), $K_k$ shrinks and the filter trusts the model. When the model is imprecise (large $Q_k$), $K_k$ grows and the filter trusts the sensor. This is the automatically computed optimal balance within the class of linear unbiased estimators.

The Kalman filter is the MMSE (Minimum Mean Square Error) estimator for linear Gaussian systems. Tesla Autopilot uses its variants for object tracking. SpaceX uses it for rocket landing. Every GPS chip in every smartphone uses it to smooth position output. One mathematical framework, thousands of applications.

What happens to Kalman gain $K_k$ when the measurement noise $R_k$ is very large?

Extended Kalman Filter: handling the nonlinear world

The classical Kalman filter handles only linear systems. But the real world is uncooperative: GPS provides coordinates in a spherical system, cameras project 3D space through a nonlinear transformation, wheel odometry relates encoder ticks to position through trigonometry. A vehicle with IMU and GPS is a nonlinear system all the way down. The Extended Kalman Filter (EKF) is the engineering solution: linearize the nonlinearity at each step using the Jacobian matrix.

Instead of a fixed linear matrix $F$, EKF computes the Jacobian of the nonlinear transition function $f$ evaluated at the current state estimate:

This is first-order Taylor expansion. It works well when the function is smooth and the state changes gradually. Waymo uses EKF for fusing LIDAR + IMU data in real time - 100 Hz update rate, 64-beam LIDAR, 6-axis IMU, everything merged into a single state vector. When the vehicle turns sharply, the Jacobian is recomputed every step - which is why EKF is more expensive than the linear Kalman filter.

EKF is the workhorse of autonomous systems. Its weakness: linearization introduces errors under strong nonlinearity (aggressive maneuvers, large attitude changes). The Unscented Kalman Filter (UKF) addresses this by propagating a set of "sigma points" through the nonlinear function instead of a derivative. More accurate, but roughly twice the computational cost.

Why does EKF outperform the classical Kalman filter for a wheeled robot?

IMU + GPS + LIDAR: fusion in production systems

Tesla Vision Stack 2023: 8 cameras, accelerometer, gyroscope - no LIDAR (Tesla dropped ultrasonic sensors in 2022). Waymo One: 29 cameras + 5 LIDARs + 6 radars + IMU. Different fusion architectures - but both do the same thing: merge 10+ data streams with different update rates, latencies, and noise profiles into a single 3D world state vector.

The practical pipeline looks like this: IMU runs at 200-1000 Hz, providing high-frequency acceleration and angular velocity. GPS updates at 10 Hz with 3-5 meter accuracy on open sky - and drops to zero in tunnels. LIDAR outputs 10-20 Hz with 100K+ points per scan. Cameras run at 30-60 Hz, delivering 2D pixels that require geometric calibration to convert into 3D.

IMU serves as the primary prediction source - high frequency, low latency. GPS and LIDAR act as correction sources at lower rates. When GPS signal is lost (tunnel, urban canyon), the EKF continues operating on IMU and LIDAR alone - this is called dead reckoning. Error accumulates, but slowly. When GPS returns, the correction step applies a jump that gets smoothed by the integrator.

LIDAR + camera fusion is its own story. LIDAR provides precise depth but no color and no semantic meaning. A camera provides color and semantics (this is a pedestrian, this is a stop sign) but struggles with depth estimation. Fusion: for each LIDAR point, find the corresponding camera pixel using the extrinsic calibration matrix. The result is a "colored point cloud" - 3D position + RGB + semantic class. This is how Waymo builds an HD map in real time.

The bottleneck in sensor fusion is calibration. The LIDAR-camera extrinsic matrix must be accurate to fractions of a degree. A 0.5-degree offset at 50 meters translates to a 43 cm positional error. This is why every autonomous vehicle manufacturer uses dedicated calibration rigs and recalibrates after every service event.

More sensors always means better accuracy - just average all the data together

Averaging works only when sensors are independent and have identical noise models. Kalman-based fusion accounts for the covariance of each source and computes the mathematically optimal weight - this is a rigorous result, not a heuristic.

Averaging a precise IMU with a noisy GPS signal equally produces a result worse than the IMU alone. The Kalman filter automatically reduces the weight of the noisy source through the measurement noise matrix $R$ - and the math guarantees this is optimal.

Why is IMU used as the primary source for the prediction phase, while GPS only handles corrections?

Key ideas

  • **Kalman filter** - the optimal MMSE estimator for linear Gaussian systems: two steps predict-update, with Kalman gain $K_k$ automatically balancing trust between model and sensor
  • **Extended Kalman Filter** - linearizes nonlinear functions via the Jacobian at the current estimate; first-order Taylor approximation, works well for smooth dynamics
  • **Sensor fusion in practice** - IMU as high-frequency predictor (200+ Hz), GPS/LIDAR as correctors; when GPS is lost, the system continues via dead reckoning on IMU
  • **LIDAR + camera fusion** - 3D position from LIDAR + semantics from camera through the extrinsic calibration matrix; accuracy depends on sub-degree calibration
  • **Averaging is not fusion**: the optimal weight for each source is determined by its noise covariance matrix $R$, not by heuristics

Related topics

Sensor fusion builds on probability theory and leads toward higher-level perception tasks:

  • SLAM — Simultaneous Localization and Mapping is built on top of EKF fusion
  • LiDAR and Point Clouds — Raw LIDAR data is the primary input for LIDAR+camera fusion pipelines
  • Normal Distribution — Gaussian noise model is the mathematical core of the Kalman filter
  • Autonomous Vehicles — Production use case: IMU + GPS + LIDAR + camera fusion in a single stack

Вопросы для размышления

  • Why did Waymo keep LIDAR despite its USD 7500+ unit cost, rather than going camera-only like Tesla?
  • The Kalman filter assumes Gaussian noise. What happens to accuracy when noise is non-Gaussian - for example, GPS multipath reflection from a building that suddenly shifts the measured position by 15 meters?
  • Particle filters are an alternative to EKF for arbitrary (non-Gaussian) distributions. When does it make sense to use one over EKF, given that particle filters require 1000+ particles and are orders of magnitude more expensive?

Связанные уроки

  • prob-11-normal — Gaussian noise model is the mathematical foundation of Kalman filter
  • prob-08-variance — Covariance is the language of uncertainty in Kalman filter
  • rob-07 — SLAM is built on top of sensor fusion as its foundation
  • rob-13 — Autonomous vehicles are the primary production use case for EKF and fusion
  • dsp-01 — Signals and noise - DSP basis for understanding sensor measurements
  • prob-04-bayes
Sensor Fusion

0

1

Sign In