Computational Geometry

Geometric Deep Learning

AlphaFold2 predicts protein structures with accuracy that, for many proteins, is competitive with experimental methods such as X-ray crystallography. Under the hood: Invariant Point Attention - an attention module that is invariant to global rotations and translations (the group SE(3)) by construction. Without such geometric priors the task becomes much harder: the network would have to learn from data that a protein's orientation in space must not affect the predicted structure. Geometric deep learning is ML with symmetry built into the architecture.

  • **Autonomous driving**: PointNet-style architectures for real-time lidar point cloud processing
  • **Isomorphic Labs (spun out of DeepMind)**: geometric deep learning for protein-ligand binding prediction and drug discovery
  • **3D Gaussian Splatting (2023)**: explicit 3D Gaussians for photorealistic real-time rendering, roughly two orders of magnitude faster than NeRF-style ray marching

Geometric Priors and Symmetries

CNNs for images: shifting an object 5 pixels to the right shifts the feature maps by the same amount (**translation equivariance**), and pooling makes the final prediction approximately **translation-invariant**. This architectural inductive bias lets a 50-million-parameter network generalize across millions of images. But what if the data is a molecule? A molecule is the same after any rotation and translation in space, and a standard CNN has no built-in notion of 3D rotation. **Geometric deep learning** builds neural networks with the right symmetries.

**SE(3)-equivariance**: SE(3) is the group of 3D rotations $R$ and translations $t$; E(3) additionally includes reflections. A function $f$ that maps points to points is SE(3)-equivariant if $f(Rx + t) = Rf(x) + t$. For molecules: force prediction is equivariant as a vector quantity - rotate the molecule and the forces rotate with it, $F(Rx + t) = RF(x)$ (forces do not shift under translation). Energy prediction is invariant: $E(Rx + t) = E(x)$. **E(3)-equivariant GNNs** (SE(3)-Transformers, 2020; EGNN, 2021) satisfy these identities by construction rather than learning them from data.
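The two identities can be checked numerically. The sketch below is only an illustration: the toy `energy` (a function of pairwise distances) and its analytic `forces` are assumptions made for this example, not a model from the text. Any energy built from distances is invariant, and its negative gradient rotates with the input.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix (orthogonal, det = +1) from a QR factorization."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis so that det = +1
    return q

def energy(x):
    """Toy energy: sum over pairs of (distance - 1)^2. Depends only on distances."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(x), k=1)
    return np.sum((dist[iu] - 1.0) ** 2)

def forces(x):
    """Analytic forces F_i = -dE/dx_i for the toy energy above."""
    diff = x[:, None, :] - x[None, :, :]   # diff[i, j] = x_i - x_j
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)            # avoids 0/0; the diagonal term becomes 0
    coef = 2.0 * (dist - 1.0) / dist
    return -np.sum(coef[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                # 5 "atoms" in 3D
R, t = random_rotation(rng), rng.normal(size=3)
x_moved = x @ R.T + t                      # apply x -> Rx + t to every atom

print(np.isclose(energy(x_moved), energy(x)))          # invariance:   E(Rx + t) = E(x)
print(np.allclose(forces(x_moved), forces(x) @ R.T))   # equivariance: F(Rx + t) = R F(x)
```

An equivariant architecture guarantees both checks for every input and every choice of weights; an unconstrained network can only approximate them after training with augmentation.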

**AlphaFold2** uses Invariant Point Attention (IPA) in its structure module for 3D protein structure prediction. IPA is invariant to global rotations and translations of the protein, so the prediction cannot depend on how the input happens to be oriented. Without this prior, the model would have to spend data and capacity learning that orientation carries no information.

A neural network predicts molecular energy. What property must the architecture have?

GNN: Message Passing Framework

Most popular graph neural networks - GraphSAGE, GAT, GIN - can be written as special cases of one framework, the **Message Passing Neural Network (MPNN, Gilmer et al., 2017)**, with three steps per iteration. (1) **Message**: each node $v$ receives a message from every neighbor $u$: $m_{uv} = M(h_u, h_v, e_{uv})$. (2) **Aggregate**: the messages are combined with a permutation-invariant operation, for example a sum: $m_v = \sum_{u \in N(v)} m_{uv}$. (3) **Update**: the node state is updated: $h_v' = U(h_v, m_v)$. After $K$ iterations, each node's state depends on its $K$-hop neighborhood.
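A minimal NumPy sketch of one message-passing iteration follows. The concrete choices are assumptions for illustration only: $M$ is a single linear map with ReLU, $U$ is a single linear map with tanh, the same weights are reused in every iteration, and the names `mpnn_layer`, `W_msg`, `W_upd` are hypothetical. The demo perturbs node 0 on a path graph and shows how far the change travels.

```python
import numpy as np

def mpnn_layer(h, edges, e_feat, W_msg, W_upd):
    """One message-passing iteration with sum aggregation.

    h: (n, d) node states; edges: list of directed pairs (u, v) meaning u -> v;
    e_feat: (num_edges, d_e) edge features.
    """
    m = np.zeros((h.shape[0], W_msg.shape[1]))
    for k, (u, v) in enumerate(edges):
        # (1) Message m_uv = M(h_u, h_v, e_uv)
        m_uv = np.maximum(np.concatenate([h[u], h[v], e_feat[k]]) @ W_msg, 0.0)
        # (2) Aggregate: m_v = sum of incoming messages
        m[v] += m_uv
    # (3) Update: h_v' = U(h_v, m_v)
    return np.tanh(np.concatenate([h, m], axis=1) @ W_upd)

rng = np.random.default_rng(0)
d, d_e, d_m = 4, 2, 8
# Path graph 0 - 1 - 2 - 3, stored as directed edges in both directions
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = rng.normal(size=(4, d))
e_feat = rng.normal(size=(len(edges), d_e))
W_msg = 0.3 * rng.normal(size=(2 * d + d_e, d_m))
W_upd = 0.3 * rng.normal(size=(d + d_m, d))

def run(h0, num_iters):
    for _ in range(num_iters):
        h0 = mpnn_layer(h0, edges, e_feat, W_msg, W_upd)
    return h0

h_pert = h.copy()
h_pert[0] += 1.0  # perturb node 0 only
for K in (1, 2, 3):
    changed = ~np.all(np.isclose(run(h, K), run(h_pert, K)), axis=1)
    print(K, changed)  # which nodes "see" the perturbation after K iterations
```

Node 3 is three hops from node 0, so its state cannot change before the third iteration. This is the K-hop receptive field in practice.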

**Graph Attention Network (GAT, Veličković et al., 2018)**: a learned attention coefficient $\alpha_{uv}$ weights the messages from different neighbors. The same idea underlies graph transformers, which apply attention to arbitrary graphs. **GIN (Graph Isomorphism Network, Xu et al., 2019)**: shows that a GNN with injective sum aggregation (not mean or max) is as expressive as the 1-dimensional Weisfeiler-Lehman test, which is the upper bound on graph distinguishability for message-passing GNNs.
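The difference between aggregators shows up on very small examples. In the sketch below (the feature vectors `a`, `b` and the two neighborhoods are made up for illustration), two nodes have different multisets of neighbor features. Mean and max map both multisets to the same vector, while sum keeps them apart.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

nbrs_1 = np.stack([a, b])        # neighbor multiset {a, b}
nbrs_2 = np.stack([a, a, b, b])  # neighbor multiset {a, a, b, b}

for name, agg in [("mean", np.mean), ("max", np.max), ("sum", np.sum)]:
    out_1, out_2 = agg(nbrs_1, axis=0), agg(nbrs_2, axis=0)
    print(name, out_1, out_2, "distinguishes:", not np.allclose(out_1, out_2))
# mean: [0.5 0.5] vs [0.5 0.5] -> cannot distinguish
# max:  [1. 1.]   vs [1. 1.]   -> cannot distinguish
# sum:  [1. 1.]   vs [2. 2.]   -> distinguishes
```

Mean captures the distribution of neighbor features and max captures the set of distinct features; only sum also retains multiplicities, which is the property the GIN analysis relies on.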

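**GNNs at industrial scale**: a social graph such as Facebook's has about 3 billion users, far too many nodes for full-graph training passes. GraphSAGE (Hamilton, Ying, Leskovec, 2017) was designed for inductive learning on large graphs: instead of processing the whole graph at once, it samples a fixed-size k-hop neighborhood for each node and learns aggregators that transfer to unseen nodes. Pinterest's PinSage recommender builds on this approach, and Uber Eats has described a GraphSAGE-based recommender.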
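The sampling step can be sketched in a few lines. This is a simplified illustration, not the reference implementation: the function name `sample_neighborhood` and the toy adjacency list are hypothetical, and neighbors are drawn without replacement, whereas a real implementation may sample with replacement to get fixed-size tensors.

```python
import random

def sample_neighborhood(adj, seeds, fanouts, rng):
    """Layer-wise neighbor sampling.

    adj: dict node -> list of neighbors; seeds: nodes in the mini-batch;
    fanouts: max neighbors to keep per node at each hop, e.g. [2, 2].
    Returns layers, where layers[0] = seeds and layers[k] = nodes sampled at hop k.
    """
    layers = [list(seeds)]
    frontier = list(seeds)
    for fanout in fanouts:
        nxt = []
        for v in frontier:
            nbrs = adj.get(v, [])
            nxt.extend(rng.sample(nbrs, min(fanout, len(nbrs))))
        layers.append(nxt)
        frontier = nxt
    return layers

# Toy graph: node 0 is a hub with 6 neighbors, each of which links back to 0 and to 7
adj = {0: [1, 2, 3, 4, 5, 6], 7: [1, 2, 3, 4, 5, 6]}
for v in range(1, 7):
    adj[v] = [0, 7]

layers = sample_neighborhood(adj, seeds=[0], fanouts=[2, 2], rng=random.Random(0))
print([len(layer) for layer in layers])  # at most 1, 2, 4 nodes per hop
```

The cost per seed node is bounded by the product of the fanouts and does not depend on the size of the graph or the degree of hub nodes.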

GNN with mean aggregation vs GNN with sum aggregation differ in discriminative power. Which is stronger and why?

PointNet and Point Cloud Processing

An automotive lidar produces hundreds of thousands of points per second or more. The sensor does not give a pixel grid - only a point cloud of 3D coordinates in arbitrary order. A standard CNN does not apply: there is no grid. Sorting the points by coordinates gives a canonical order, but that order changes under rotation and is unstable under small perturbations. **PointNet (Qi et al., 2017)** solves this with a compact trick: process each point independently with a shared MLP, then apply max-pooling - a permutation-invariant aggregator.
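A stripped-down version of this idea fits in a few lines of NumPy. The sketch assumes a two-layer shared MLP with random weights and omits the input and feature transform networks of the full PointNet; the name `pointnet_global_feature` is hypothetical.

```python
import numpy as np

def pointnet_global_feature(points, W1, b1, W2, b2):
    """Shared per-point MLP followed by a max-pool over the point dimension."""
    h = np.maximum(points @ W1 + b1, 0.0)  # same weights applied to every point
    h = np.maximum(h @ W2 + b2, 0.0)
    return h.max(axis=0)                   # symmetric function: order does not matter

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 32)), rng.normal(size=32)

cloud = rng.normal(size=(100, 3))          # 100 points, arbitrary order
shuffled = cloud[rng.permutation(len(cloud))]
with_duplicate = np.vstack([cloud, cloud[:1]])

g = pointnet_global_feature(cloud, W1, b1, W2, b2)
print(np.allclose(g, pointnet_global_feature(shuffled, W1, b1, W2, b2)))        # same feature
print(np.allclose(g, pointnet_global_feature(with_duplicate, W1, b1, W2, b2)))  # same feature
```

Because the maximum is taken per feature channel over all points, reordering the rows or repeating a point leaves the global feature unchanged, and clouds with different numbers of points map to vectors of the same size.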

The key property of max-pooling is that it is a symmetric function: for any point set $\{p_1, ..., p_n\}$, the result does not depend on the order of the points. **PointNet++** adds hierarchical sampling: FPS (Farthest Point Sampling) selects k representative points, a ball query gathers the local neighborhood of each one, and every neighborhood is processed by a small PointNet with shared weights.
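The two grouping steps can be sketched as follows. This is a plain O(nk) illustration with hypothetical function names; unlike the PointNet++ implementation, the ball query here returns every point inside the radius instead of capping the group at a fixed size.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices so that each new point is farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def ball_query(points, centroids, radius):
    """For each centroid, return the indices of all points within `radius`."""
    d = np.linalg.norm(points[None, :, :] - centroids[:, None, :], axis=-1)  # (k, n)
    return [np.flatnonzero(row <= radius) for row in d]

points = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
idx = farthest_point_sampling(points, k=3)
groups = ball_query(points, points[idx], radius=1.5)
print(idx)                          # [0 3 2]
print([g.tolist() for g in groups]) # [[0, 1], [3], [1, 2]]
```

FPS covers the cloud more evenly than uniform random sampling, which matters when point density varies, as it does with distance from a lidar sensor.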

**PointNet in autonomous driving**: lidar-based 3D object detectors such as Frustum PointNets (Qi et al., 2018) and PointPillars (Lang et al., 2019) use PointNet-style encoders and are evaluated on the KITTI dataset; PointPillars was designed for real-time inference. On the ModelNet40 classification benchmark, PointNet++ with hierarchical sampling improves accuracy over base PointNet from roughly 89% to roughly 91%.

Why does max-pooling in PointNet provide permutation invariance?

Applications of Geometric DL

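**Molecular modeling**: SchNet (Schütt et al., 2017) and DimeNet (Klicpera et al., 2020) are GNNs that are invariant to rotations and translations; PaiNN and EGNN are equivariant. All of them predict molecular properties from 3D structure. DFT (density functional theory) can take minutes to hours to compute the energy of one molecule; a trained GNN predicts it in milliseconds, with errors around 1 kcal/mol or below on standard benchmarks. This makes virtual screening of very large libraries of drug candidates feasible.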
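The sketch below shows how an EGNN-style layer stays equivariant: messages depend only on node features and squared distances (both invariant), and coordinates are updated along the difference vectors $x_i - x_j$ (which rotate with the input). It is a simplified illustration, with assumptions: single linear maps with tanh stand in for the MLPs of the published model, the graph is fully connected, and the names `egnn_layer`, `We`, `Wx`, `Wh` are hypothetical.

```python
import numpy as np

def egnn_layer(h, x, We, Wx, Wh):
    """One simplified EGNN-style layer on a fully connected graph.

    h: (n, d) invariant node features; x: (n, 3) coordinates.
    Returns updated (h, x).
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]               # (n, n, 3), rotates with the input
    d2 = np.sum(diff ** 2, axis=-1, keepdims=True)     # (n, n, 1), invariant
    hi = np.broadcast_to(h[:, None, :], (n, n, d))
    hj = np.broadcast_to(h[None, :, :], (n, n, d))
    m = np.tanh(np.concatenate([hi, hj, d2], axis=-1) @ We)  # invariant messages m_ij
    m = m * (1.0 - np.eye(n))[:, :, None]              # drop self-messages (j == i)
    w = np.tanh(m @ Wx)                                # (n, n, 1) invariant scalar per pair
    x_new = x + np.sum(diff * w, axis=1) / (n - 1)     # equivariant coordinate update
    h_new = np.tanh(np.concatenate([h, m.sum(axis=1)], axis=-1) @ Wh)
    return h_new, x_new

rng = np.random.default_rng(0)
n, d, d_m = 6, 4, 8
h, x = rng.normal(size=(n, d)), rng.normal(size=(n, 3))
We = 0.3 * rng.normal(size=(2 * d + 1, d_m))
Wx = 0.3 * rng.normal(size=(d_m, 1))
Wh = 0.3 * rng.normal(size=(d + d_m, d))

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))           # random orthogonal matrix
t = rng.normal(size=3)

h1, x1 = egnn_layer(h, x, We, Wx, Wh)
h2, x2 = egnn_layer(h, x @ R.T + t, We, Wx, Wh)
print(np.allclose(h2, h1))             # features are invariant
print(np.allclose(x2, x1 @ R.T + t))   # coordinates are equivariant
```

The checks hold for any weights, trained or not. An invariant energy can then be read out from `h`, and the orthogonal matrix `R` may include a reflection, which is why such layers are usually described as E(3)-equivariant.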

**3D reconstruction**: NeRF (Neural Radiance Field, 2020) represents a scene as an implicit MLP function that maps a 3D position and a viewing direction to a color and a volume density: $f(x, y, z, \theta, \phi) \to (RGB, \sigma)$. Gaussian Splatting (2023) replaces the implicit representation with explicit 3D Gaussians optimized by gradient descent, speeding up rendering by roughly two orders of magnitude at comparable quality. Both embed spatial priors into the scene representation, although only NeRF is a neural network in the strict sense.
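To turn the outputs $(RGB, \sigma)$ into a pixel, NeRF samples points along a camera ray and composites them with the standard volume rendering quadrature: $\alpha_i = 1 - e^{-\sigma_i \delta_i}$, $T_i = \prod_{j<i}(1 - \alpha_j)$, $C = \sum_i T_i \alpha_i c_i$. The sketch below implements only this compositing step; the densities and colors are hand-picked values standing in for the output of the MLP, and `render_ray` is a hypothetical name.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite samples along one ray.

    sigmas: (n,) densities; colors: (n, 3) RGB per sample;
    deltas: (n,) distances between consecutive samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # Transmittance: fraction of light that reaches sample i without being absorbed
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas
    return weights @ colors, weights

sigmas = np.array([0.0, 1e3, 5.0])           # empty space, an opaque surface, something behind it
colors = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
deltas = np.full(3, 0.1)

pixel, weights = render_ray(sigmas, colors, deltas)
print(pixel)    # dominated by the green sample at the opaque surface
print(weights)  # the sample behind the surface gets almost no weight
```

Every operation here is differentiable, which is what allows the MLP to be trained from posed 2D images alone.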

**Isomorphic Labs (spun out of DeepMind, 2021)**: the company applies geometric deep learning to drug discovery, including the prediction of protein-ligand binding. Geometric DL is not an academic niche.

Misconception: Geometric DL is a specialized niche unrelated to standard deep learning.

Reality: Transformers and attention are a special case of message passing on a complete graph; geometric DL generalizes standard DL to arbitrary symmetries and data structures.

Bronstein et al. (2021), 'Geometric Deep Learning', show that CNNs, GNNs, transformers, and RNNs can all be described through symmetry groups. This is not a niche but a unifying structure.
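The claim that attention is message passing on a complete graph can be verified directly. The sketch below computes single-head self-attention twice: once in the usual matrix form, and once as explicit messages sent from every node $u$ to every node $v$. The weight names and shapes are illustrative assumptions; masking, multiple heads, and positional encodings are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def attention_message_passing(X, Wq, Wk, Wv):
    """The same computation as message / aggregate steps on a complete graph."""
    n = len(X)
    out = np.zeros((n, Wv.shape[1]))
    for v in range(n):                        # receiving node
        q = X[v] @ Wq
        scores = np.array([q @ (X[u] @ Wk) for u in range(n)]) / np.sqrt(Wk.shape[1])
        alpha = softmax(scores)               # attention over all nodes, including v itself
        for u in range(n):                    # message from u to v, weighted by alpha_uv
            out[v] += alpha[u] * (X[u] @ Wv)  # sum aggregation
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens = 5 nodes
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(np.allclose(attention_matrix(X, Wq, Wk, Wv),
                  attention_message_passing(X, Wq, Wk, Wv)))
```

Restricting the inner loop to the neighbors of $v$ instead of all nodes gives attention-based message passing on a sparse graph, in the spirit of GAT.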

NeRF represents a 3D scene as an MLP. What geometric property does this embed in the architecture?

Related Topics

Geometric DL unifies computational geometry and deep learning through symmetries.

  • Convolutional Neural Networks — CNN is geometric DL with translational symmetry; GNN generalizes to arbitrary structures
  • Graphs and Algorithms — GNN operates on graphs; understanding graph structure determines architecture choice

Key Ideas

  • **Geometric priors**: embedding symmetries (SE(3), invariance/equivariance) into the architecture is a strong, often decisive prior for physical tasks
  • **MPNN framework**: message, aggregate, update - unification of all GNNs (GAT, GIN, GraphSAGE)
  • **PointNet**: per-point MLP + max-pooling = invariance to point order and count
  • **Molecular modeling**: SchNet (invariant) and EGNN (equivariant) approximate DFT energies in milliseconds instead of hours, a speedup of up to about 10^6x
  • **Transformer as GNN**: attention on a complete graph is a special case of message passing

Questions for Reflection

  • PointNet is invariant to point permutation via max-pooling. How would one add scale invariance to a point cloud?
  • GNN power is bounded by the Weisfeiler-Lehman test. Do architectures exist that distinguish more graph classes - and at what cost?
  • Gaussian Splatting replaces implicit NeRF with explicit Gaussians. How does this choice of representation relate to inductive bias in ML?

Related Lessons

  • cgeom-07 — 3D convex hulls - base structures for point cloud processing
  • ml-29-cnn — GNN generalizes convolutional networks to irregular geometric structures
  • ml-31-transformers — Attention in transformers is a special case of GNN message passing
  • ds-16-graphs-intro — Graph is the fundamental data structure for GNN
  • alg-14-dijkstra — Dijkstra and message passing both propagate information through a graph