Multiplayer XR
In 2021, Microsoft won a US Army contract worth up to USD 21.88 billion over ten years to supply more than 120,000 HoloLens-based headsets. The primary use case was not gaming but collaborative tactical planning: multiple soldiers seeing the same AR map of the terrain with live data overlaid. That is multiplayer XR at its most serious - not entertainment, but coordinating people in a shared virtual space.
- **Meta Horizon Workrooms** supports up to 16 avatars working at the same virtual table, viewing shared documents and whiteboards, with positional audio - a colleague's voice comes from their side of the table
- **Niantic Lightship** (Pokemon GO, Ingress) builds a global visual anchor map from data contributed by 100+ million devices - every player scanning their environment improves the shared map for co-location
- **Apple Vision Pro** Spatial Personas: two users see three-dimensional avatars of each other in a shared space with real-time synchronization of head and hand movements
Shared Spaces
The hardest problem in multiplayer XR is not networking or rendering - it is agreeing on a shared reality. When two people in different cities stand at the same virtual table, their headsets must reconcile coordinate systems to centimeter precision - otherwise a virtual hand passes through another participant's shoulder. A shared space is a common three-dimensional environment where multiple users coexist simultaneously, see the same objects at the same positions, and interact with shared state.
There are two fundamentally different types of shared space. **Remote shared** - users are in different locations, coordinates are synchronized through a server, and each user has their own physical environment. **Co-located** - users are physically in the same room, so their headsets must find a common coordinate system via cloud anchors or direct peer-to-peer alignment. Co-location is the technically harder case, because every headset builds its own origin independently and all of them must agree on where the real room is.
What is the key technical difference between a remote shared space and a co-located shared space?
XR Networking
XR imposes extreme networking requirements: head and hand positions update 90 times per second, and latency above 50 ms becomes physically perceptible - users experience motion sickness. Standard HTTP REST is unusable for multiplayer XR - specialized protocols with minimal overhead and client-side position prediction are required. In 2023, Meta Horizon Worlds supported up to 32 simultaneous avatars - each transmitting 6DoF position, 26 hand joints, and voice data every frame.
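To see why a request/response protocol cannot carry this load, it helps to work through the arithmetic. The sketch below is a rough estimate only: it assumes uncompressed 32-bit floats, a 7-float pose (position plus quaternion) for the head and for every joint, 26 joints on each of two hands, and no voice or packet headers. None of these sizes come from a real wire format.

```python
# Back-of-the-envelope tracking bandwidth, uncompressed.
# Every size below is an illustrative assumption, not a measured wire format.

FLOAT_BYTES = 4
POSE_FLOATS = 7      # position (x, y, z) + rotation quaternion (x, y, z, w)
HAND_JOINTS = 26     # joints per hand, as in the text
HANDS = 2            # assumption: both hands are tracked
UPDATE_HZ = 90       # updates per second, as in the text
AVATARS = 32         # simultaneous avatars, as in the text


def pose_bytes_per_update() -> int:
    """Bytes for one head pose plus every hand-joint pose, sent raw."""
    poses = 1 + HAND_JOINTS * HANDS
    return poses * POSE_FLOATS * FLOAT_BYTES


def upstream_kbps() -> float:
    """Kilobits per second one client sends, before headers and voice."""
    return pose_bytes_per_update() * UPDATE_HZ * 8 / 1000


def relay_fanout_mbps() -> float:
    """Megabits per second a relay forwards if every avatar's stream is
    sent to every other avatar (no interest management, no compression)."""
    return upstream_kbps() * AVATARS * (AVATARS - 1) / 1000


if __name__ == "__main__":
    print(f"{pose_bytes_per_update()} bytes per update")
    print(f"{upstream_kbps():.2f} kbit/s per client")
    print(f"{relay_fanout_mbps():.2f} Mbit/s relay fan-out")
```

Under these assumptions each client sends 1,484 bytes per update, about 1.07 Mbit/s, and a naive relay forwards roughly 1.06 Gbit/s for 32 avatars. Real systems cut this sharply with quantization, delta encoding, and sending updates only to users who need them.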
Key XR networking techniques: **Dead reckoning** - predicting position from velocity and acceleration to hide network latency. **Interpolation** - smooth blending between received state snapshots. **State Authority** - only one client (the object owner) writes authoritative state; others read. **Interest management** - send updates only about objects visible to each specific user (network-level LOD).
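Three of these techniques fit in a few lines each. The following is a minimal sketch, not a production netcode layer: `Snapshot`, `dead_reckon`, `interpolate`, and `visible_to` are hypothetical names, positions are plain tuples in meters, and interest management is reduced to a simple distance check rather than a true visibility test.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Snapshot:
    t: float          # sender timestamp, seconds
    position: Vec3    # meters
    velocity: Vec3    # meters per second


def dead_reckon(last: Snapshot, now: float,
                acceleration: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """Extrapolate past the last snapshot: p + v*dt + 0.5*a*dt^2."""
    dt = now - last.t
    return tuple(p + v * dt + 0.5 * a * dt * dt
                 for p, v, a in zip(last.position, last.velocity, acceleration))


def interpolate(a: Snapshot, b: Snapshot, render_t: float) -> Vec3:
    """Linearly blend between two received snapshots.
    render_t is clamped to the interval [a.t, b.t]."""
    span = b.t - a.t
    alpha = 0.0 if span <= 0 else min(1.0, max(0.0, (render_t - a.t) / span))
    return tuple(pa + (pb - pa) * alpha
                 for pa, pb in zip(a.position, b.position))


def visible_to(viewer: Vec3, objects: dict[str, Vec3], radius: float) -> list[str]:
    """Interest management, simplified: ids of objects within `radius`
    meters of the viewer. Only these would receive network updates."""
    r2 = radius * radius
    return sorted(oid for oid, pos in objects.items()
                  if sum((p - v) ** 2 for p, v in zip(pos, viewer)) <= r2)
```

Dead reckoning and interpolation are complementary: interpolation renders slightly in the past between two known states, while dead reckoning guesses ahead when the next state has not arrived yet. Which one a client uses for a given object is a design choice that trades added delay against prediction error.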
Why is dead reckoning needed in XR multiplayer?
Co-location
Co-location - multiple people in the same room seeing the same AR objects at precise real-world positions - is what transforms AR from a personal gadget into a collaborative tool. IKEA tested this in 2022: designers and customers simultaneously viewed furniture in a real room through different headsets and smartphones. The core challenge: every headset initializes in its own coordinate system - a mechanism is required for all devices to agree on a shared origin.
Co-location alignment methods: **QR marker** - all devices scan the same physical tag; its position becomes the origin (simple, but requires a physical marker). **LiDAR mesh matching** - comparing 3D room scans between devices (marker-free, computationally expensive). **Cloud Anchors** - the first device uploads feature points to the cloud; others find the same physical object via the server (Google ARCore Cloud Anchors, which also run on iOS alongside ARKit). **UWB ranging** - ultra-wideband signals for precise inter-device distance measurement (roughly 10 cm accuracy, requires dedicated hardware).
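Whichever method is used, the result is the same kind of data: each device learns the pose of a common reference (a marker or anchor) in its own local frame, and that pose is enough to convert points between devices. The sketch below illustrates the math with the marker case. It is a simplified illustration, assuming perfect, noise-free marker detection; `Pose`, `yaw_pose`, and the conversion helpers are hypothetical names, not part of any XR SDK.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Pose:
    """Rigid transform: x_parent = rotation @ x_child + translation."""
    rotation: tuple      # 3x3 rotation matrix, row-major
    translation: tuple   # (x, y, z) in meters

    def apply(self, p):
        return tuple(
            sum(self.rotation[i][j] * p[j] for j in range(3)) + self.translation[i]
            for i in range(3))

    def inverse(self):
        # The inverse of a rigid transform is (R^T, -R^T t).
        rt = tuple(tuple(self.rotation[j][i] for j in range(3)) for i in range(3))
        t = tuple(-sum(rt[i][j] * self.translation[j] for j in range(3))
                  for i in range(3))
        return Pose(rt, t)


def yaw_pose(yaw_deg, translation):
    """Pose rotated about the vertical (y) axis; handy for examples."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return Pose(((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c)), tuple(translation))


def local_to_shared(marker_in_local: Pose, point_local):
    """Express a device-local point in the marker (shared) frame."""
    return marker_in_local.inverse().apply(point_local)


def shared_to_local(marker_in_local: Pose, point_shared):
    """Express a shared-frame point in a device's local frame."""
    return marker_in_local.apply(point_shared)


def convert_between_devices(marker_in_src: Pose, marker_in_dst: Pose, point_src):
    """Map a point from one device's frame to another's via the marker."""
    return shared_to_local(marker_in_dst, local_to_shared(marker_in_src, point_src))
```

For example, if device A sees the marker at `yaw_pose(0, (1, 0, 2))` and device B sees it at `yaw_pose(90, (0, 0, 3))`, an object A places at `(1.5, 0, 2)` sits at `(0.5, 0, 0)` in the shared frame and at approximately `(0, 0, 2.5)` in B's frame. In practice detection noise means real systems average several observations or refine the alignment over time.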
Which co-location alignment method requires neither physical markers nor specialized hardware?
Cloud Anchors
A Cloud Anchor is a persistent binding to a physical location stored in the cloud: not a GPS coordinate, but a detailed map of the visual features (a feature map) of a specific surface or object. When a device scans a room, the cloud service (for example, Google's ARCore Cloud Anchors) compares the observed features against the stored map and computes the device's precise position and orientation relative to the anchor. Persistence means the anchor survives app restarts - a user can return the next day and see AR objects in the same locations.
**Cloud Anchor lifecycle**: Hosting (first device creates anchor, uploads feature map) -> Storage (cloud server retains for 1-365 days) -> Resolving (other devices locate the anchor and receive a transform in their own coordinate space). Accuracy: Google claims under 1 cm with good lighting and sufficient surface texture variety. On homogeneous surfaces (white walls) accuracy degrades severely.
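The lifecycle can be sketched as a small state model. This is a hypothetical in-memory stand-in, not the ARCore API: `AnchorStore`, `host`, and `resolve` are illustrative names, the feature map is treated as an opaque blob, and the actual visual matching that a real service performs during resolving is omitted. Time is passed in explicitly so that expiry is easy to reason about.

```python
from dataclasses import dataclass
import uuid

SECONDS_PER_DAY = 86_400


@dataclass
class HostedAnchor:
    anchor_id: str
    feature_map: bytes   # opaque visual-feature blob from the hosting device
    expires_at: float    # Unix time after which the anchor is dropped


class AnchorStore:
    """Hypothetical in-memory model of hosting, storage, and resolving."""

    def __init__(self):
        self._anchors = {}

    def host(self, feature_map: bytes, ttl_days: int, now: float) -> str:
        """Hosting: store the uploaded feature map and return an id to share."""
        if not 1 <= ttl_days <= 365:
            raise ValueError("ttl_days must be between 1 and 365")
        anchor_id = uuid.uuid4().hex
        self._anchors[anchor_id] = HostedAnchor(
            anchor_id, feature_map, now + ttl_days * SECONDS_PER_DAY)
        return anchor_id

    def resolve(self, anchor_id: str, now: float):
        """Resolving: return the stored anchor, or None if unknown or expired.
        A real service would match the caller's camera features against
        feature_map and return a transform in the caller's coordinate space."""
        anchor = self._anchors.get(anchor_id)
        if anchor is None:
            return None
        if now >= anchor.expires_at:
            del self._anchors[anchor_id]   # storage period is over
            return None
        return anchor
```

A typical flow is that the first device calls `host`, shares the returned id over the session's normal network channel, and every other device calls `resolve` with that id before placing shared content.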
Myth: Cloud Anchors use GPS to georeference AR objects.
Reality: Cloud Anchors store detailed visual feature maps of specific physical surfaces - this is computer vision, not GPS.
GPS provides 3-5 meter accuracy - unacceptable for AR objects. Feature maps enable centimeter-level precision, but only when the camera sees sufficiently textured surfaces.
When does Cloud Anchor accuracy drop significantly?
Key Ideas
- **Shared space** demands not just network sync but coordinate system alignment - especially hard for co-located scenarios where every headset builds its own origin independently
- **XR networking** operates under strict latency constraints: above 50 ms causes motion sickness, so dead reckoning and interpolation are used to hide network delay, and the State Authority pattern keeps each object's state consistent by giving it a single writer
- **Cloud Anchors** bridge sessions and devices: a feature map of a physical place stored in the cloud lets AR objects persist between sessions and be visible across different devices
Related Topics
Multiplayer XR builds on tracking and haptics foundations:
- Haptics and Multimodal Input — In multiplayer, haptic feedback must sync with other participants' actions - an additional latency dimension to manage
- XR Tracking — Tracking accuracy of each headset directly determines the precision of co-location alignment
Questions for Reflection
- With 32 avatars in a shared XR space, each sending 90 position updates per second - how would you scale server infrastructure, and at what user count does it make sense to switch from a full mesh topology to a star topology?
- Co-location via Cloud Anchors only works with good lighting and textured surfaces - how would you design a fallback for dark rooms or plain white walls in an enterprise application?
- If XR collaboration replaces physical meetings, how do you create a sense of genuine presence - the feeling that others are actually 'there' rather than just avatars on a screen?