AR/VR

ARKit and ARCore

In 2017 Apple released ARKit - and within weeks the App Store was flooded with thousands of AR apps. Developers did not write computer vision algorithms from scratch: the framework hid years of SLAM, VIO, and neural network research behind clean APIs. But understanding what happens under the hood is critical - otherwise it is unclear why AR loses tracking during fast movement or why objects drift on a surface.

  • **IKEA Place** - plane detection for furniture placement in a room
  • **Snapchat and Instagram masks** - real-time face tracking with blend shapes
  • **Measure (built into iOS)** - AR ruler built on plane detection and anchors
  • **Google Maps Live View** - world anchors for AR street navigation

Plane Detection: how the phone sees surfaces

The IKEA Place app lets users place a virtual sofa in a room to see how it looks. But how does the phone know where the floor is? The camera sees only a pixel stream - no explicit geometry. **Plane detection** addresses this: an algorithm that reconstructs scene geometry from a moving camera.

ARKit and ARCore use **Visual-Inertial Odometry (VIO)**: camera data is fused with gyroscope and accelerometer readings. The algorithm tracks feature points across frames, computes their displacement, and builds a sparse point cloud. Clusters of coplanar points are merged into a plane.

Planes are refined over time: when first detected they are small and imprecise. After a few seconds of camera movement ARKit expands and levels them - this fires the `didUpdate node for anchor` callback. Apps typically render a mesh overlay on planes to signal the user that a surface has been recognized.

FeatureARKit (iOS)ARCore (Android)
Horizontal planesYesYes
Vertical planesYes (A12+)Yes
Angled planesYes (ARKit 3.5+)Limited
Depth dataLiDAR (iPhone 12 Pro+)ToF sensor (select Android)

Plane detection in ARKit/ARCore is primarily based on:

World Anchors: objects that stick to the world

Once a surface is detected, a virtual object needs to stay in place as the camera moves. This is the job of **anchors** - attachment points in world space. An anchor stores a position and orientation in the session coordinate system. When ARKit refines its world map it updates all anchor transforms, keeping objects in place.

**Persistent anchors** allow saving positions across sessions. ARKit serializes the ARWorldMap - a map containing a point cloud and anchors - to disk. On the next launch in the same space the map is loaded and objects reappear at their original positions. This is how collaborative AR apps work: multiple devices share one map.

ARCore uses **Cloud Anchors** for cross-device sharing: an anchor is uploaded to Google's servers, other devices download it and localize against it. This is the foundation of multiplayer AR experiences.

Why does ARKit update an anchor's transform after it is created?

Light Estimation: virtual lighting in a real room

A virtual cube inserted into a real room looks fake if lit differently from surrounding objects. An AR object on the sunny side of a room should be bright; in shadow it should be dark. **Light estimation** analyzes camera frames and recovers the lighting parameters of the real scene.

ARKit offers two levels: basic (intensity + colorTemperature) and advanced (**Environment Lighting** via spherical harmonics). Spherical harmonics encode the distribution of light across all directions - similar to an HDR panorama compressed into a few coefficients.

On LiDAR devices (iPhone 12 Pro+) ARKit also estimates the direction of the light source and casts correct shadows from virtual objects onto real surfaces - this is called **Scene Geometry** + ray casting for shadows.

Spherical harmonics in light estimation are used to:

Face Tracking: ARFaceAnchor and blend shapes

Snapchat masks, Animoji, virtual eyewear try-on - all rely on face tracking. ARKit uses the front-facing TrueDepth camera (infrared projector + IR camera) to build an accurate 3D face model in real time. On Android, ARCore Face Mesh builds the model from the RGB camera using an ML model.

ARKit returns an **ARFaceAnchor** containing a 3D face mesh (1220 vertices) and **blendShapes**: a dictionary of 52 coefficients, each describing one element of facial expression. The `jawOpen` coefficient represents how open the mouth is (0 = closed, 1 = fully open). These coefficients are what Animoji uses to drive 3D character animation.

  • **jawOpen, jawLeft, jawRight** - lower jaw movement
  • **mouthSmileLeft/Right** - smile
  • **eyeBlinkLeft/Right** - blink
  • **browOuterUpLeft/Right** - raised eyebrows
  • **tongueOut** - tongue (ARKit 3+)

Face tracking requires the front-facing TrueDepth camera (Face ID iPhones). On devices without TrueDepth ARKit falls back to a limited mode using the standard front camera, with fewer blend shapes available.

Face tracking is available on any iPhone with ARKit

Full face tracking with 52 blend shapes requires the TrueDepth camera (Face ID). Without it only a limited RGB-camera mode is available with fewer blend shapes

TrueDepth projects 30,000 infrared dots and reads them back - this enables accurate 3D face geometry. An RGB camera only analyzes texture, which is less precise

Blend shapes in ARFaceAnchor are:

ARKit and ARCore

  • Plane detection reconstructs scene geometry via VIO: camera feature points + IMU; planes are refined continuously over time
  • Anchors fix virtual objects in world coordinates; ARKit corrects their transforms as the map is refined so objects stay put
  • Light estimation from basic (intensity/temperature) to spherical harmonics lets virtual objects reflect real environment lighting
  • Face tracking returns a 3D face mesh and 52 blend shapes for driving character expressions in real time

Related topics

AR frameworks are the foundation on which higher-level capabilities are built.

  • Web AR and WebXR — Next step: AR without app installation
  • VR: performance optimization — Previous lesson: frame budget and foveated rendering

Вопросы для размышления

  • Why does plane detection lose accuracy during fast camera movement and how does this affect anchor behavior?
  • In what scenarios are ARCore Cloud Anchors preferable to local ARWorldMap from ARKit?
  • How does light estimation via spherical harmonics relate to image-based lighting techniques in traditional 3D graphics?

Связанные уроки

  • ml-01-intro
ARKit and ARCore

0

1

Sign In