AR/VR

3D Reconstruction for XR

Architects pay thousands of dollars for professional 3D scanners that take an hour to model a building. In 2024, an iPhone with LiDAR does the same in 2 minutes from a pocket. Neural methods add photorealistic textures without any special hardware. 3D reconstruction has stopped being an industry-only capability.

**Luma AI + Vision Pro:** shoot an apartment on iPhone, get a 3D Gaussian Splatting model, show buyers in VR - this is already available
**Apple Measure:** LiDAR in iPhone instantly places a virtual tape measure on real surfaces with ~1 cm accuracy
**Surgical planning:** MRI data + neural reconstruction builds an interactive 3D organ model for Vision Pro before a procedure

Depth Sensing: how a device measures distances

A phone is placed on a table. An AR app opens, and a virtual glass of water is placed on that table - it does not fall through the surface and does not float in midair. The device knows where the table is. This is the result of **depth sensing** - measuring the distance to every point in the field of view.

Three main approaches to depth measurement exist, each with different underlying physics:

In practice, devices combine methods. iPhone uses LiDAR for a coarse depth estimate (fast, energy-efficient) and stereo vision for detail refinement. Vision Pro with six cameras builds a depth map from multiple angles simultaneously, eliminating occlusions - areas one camera cannot see but another can.

**Confidence map:** ARKit returns a depth frame as both a matrix of distances and a confidence matrix (.low, .medium, .high) for each pixel. Pixels with low confidence (shiny surfaces, transparent objects) need special handling.

LiDAR works well in darkness, while structured light degrades in bright sunlight. Why?

Mesh Reconstruction: from points to surface

A depth sensor produces a **point cloud** - millions of three-dimensional coordinates with no connections between them. This is like knowing the position of every atom in a sculpture without knowing what it is made of. **Mesh reconstruction** is the algorithm that builds a surface (a triangle mesh) from this cloud of points.

ARKit provides **SceneReconstructionProvider** (visionOS) and **ARMeshAnchor** (iOS), delivering a ready-made room mesh without requiring a manual algorithm implementation. The mesh updates as the user moves and new sensor data arrives.

**Why mesh matters for XR:** a mesh enables correct **occlusion** handling - a virtual object behind a real sofa must be hidden by the sofa. Without a mesh, AR objects are always drawn on top of the entire real scene.

A virtual ball rolls behind a real chair in an AR app. What technology makes the ball correctly disappear behind the chair?

LiDAR: a laser scanner on the phone

In 2003, the Mars rover Spirit used LiDAR for surface navigation. The equipment weighed several kilograms. In 2020, Apple placed LiDAR in the iPad Pro and iPhone 12 Pro - a coin-sized component in a pocket. This transformed mobile AR.

Apple's mobile LiDAR is a **SPAD sensor** (Single-Photon Avalanche Diode) paired with an infrared laser. It emits laser pulses and measures the return time of reflected photons. Up to 2.5 million point measurements per second, covering a range from 0.5 cm to 5 meters.

LiDAR has a fundamental limitation: **mirrors and transparent surfaces**. A mirror deflects the beam sideways; transparent glass lets it pass through. In both cases the sensor receives no valid return. This is a physics problem with no pure software solution without additional sensors.

ARKit with LiDAR builds a room mesh almost instantly. However, a glass table appears as a 'hole' in the mesh. Why?

Neural 3D Reconstruction: networks that build worlds

NeRF (Neural Radiance Fields) appeared in 2020 and caused an upheaval: a few dozen photos of an object from different angles, and a neural network synthesizes a complete 3D scene including lighting and materials. No LiDAR. No special cameras. Just ordinary photographs.

Original NeRF trained in hours and rendered in seconds - unacceptable for XR. **3D Gaussian Splatting** (2023) is the next step: instead of a neural network, the scene is represented as a set of 3D Gaussians rendered via rasterization rather than ray marching. Speed: real time (60+ FPS) at quality comparable to NeRF.

A practical XR workflow already exists: the **Luma AI** app on iPhone captures a scene as video, uploads it to a server, receives a 3D Gaussian Splatting model, and displays it on Vision Pro. The entire process takes about 20 minutes. This is the first example of neural 3D reconstruction in a mass consumer product.

**Limitation of all Neural 3D methods:** dynamic objects (people, cars) reconstruct poorly - the network optimizes for a static scene. Dynamic NeRF and D-3DGS address this partially, but require significantly more compute.

Neural 3D Reconstruction will replace LiDAR in XR devices

They solve different problems. LiDAR provides accurate geometry in real time for tracking and physics. Neural 3D builds photorealistic models over minutes but is not suitable for real-time sensor tracking

LiDAR operates at 30+ FPS with millisecond latency. NeRF/3DGS requires minutes of training on a captured scene. XR needs both: LiDAR for spatial tracking, Neural 3D for content creation

NeRF trains for hours and renders in seconds. 3D Gaussian Splatting renders in real time. What is the key change in how the scene is represented?