Computer Graphics
Animation: Skeletal Rigs, Inverse Kinematics, and Blend Shapes
The Last of Us Part II has 10 000 unique animations and 300 blend shapes per character face. None of it is pre-rendered. Every frame is computed in real-time at 30 Hz on a PlayStation 4 GPU. Skeletal rigs and blend shapes are why that is possible - they compress a character's entire motion space into a 200 KB data structure that the GPU evaluates in microseconds.
- Unreal Engine FBIK: full-body inverse kinematics for foot placement on stairs and uneven terrain - runs at 60 Hz without precomputation
- ARKit 52 blend shapes: the iPhone depth camera regresses FACS-inspired expression weights from the face point cloud at 60 Hz for FaceTime Animoji
- Fortnite: shared skeleton across all character skins - different meshes, same rig, zero per-skin animation data
- VRChat: user-uploaded avatars use Unity Humanoid rig standard - 17 required joints + optional blend shapes, auto-remapped by runtime
Skeletal Hierarchy: Bind Pose, Joint Transforms, and Skinning
Every human character in a AAA game - The Last of Us, Cyberpunk 2077, Red Dead Redemption - is animated by the same technique invented at University of Utah in the 1970s: a **skeletal rig**. A hierarchy of joints, each with a local transform, deforming a mesh through weighted influence. One skeleton, infinite poses.
A **skeleton** is a tree of joints. Each joint stores a local transform (translation, rotation, scale) relative to its parent. The **bind pose** (often a T-pose or A-pose) is the reference configuration in which the mesh was attached to the skeleton. Animating means setting joint rotations and computing the resulting **world transform** for every joint by walking the tree from the root down: `world = parentWorld * local`, so each parent must be resolved before its children.
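The root-to-leaf accumulation can be sketched in a few lines. This is a minimal illustration, not engine code: it assumes a 2D skeleton with 3x3 homogeneous matrices to keep the arithmetic short, and the names `local_matrix`, `world_transforms`, and the three-joint arm are hypothetical.

```python
import math

def local_matrix(angle_deg, tx, ty):
    """3x3 homogeneous 2D transform: rotation by angle_deg plus offset (tx, ty)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def world_transforms(parents, local_mats):
    """parents[i] is the parent index of joint i (-1 for the root).
    Assumes joints are ordered so every parent precedes its children."""
    world = []
    for i, parent in enumerate(parents):
        if parent < 0:
            world.append(local_mats[i])          # root: world == local
        else:
            world.append(matmul(world[parent], local_mats[i]))
    return world

# Hypothetical arm: shoulder -> elbow -> wrist, each bone 1 unit long.
parents = [-1, 0, 1]
local_mats = [
    local_matrix(90, 0.0, 0.0),  # shoulder at the origin, rotated 90 degrees
    local_matrix(0, 1.0, 0.0),   # elbow 1 unit along the shoulder's x axis
    local_matrix(0, 1.0, 0.0),   # wrist 1 unit along the elbow's x axis
]
world = world_transforms(parents, local_mats)
wrist = (world[2][0][2], world[2][1][2])  # approximately (0, 2)
```

Rotating only the shoulder moves the elbow and wrist with it, because their world transforms are built on top of the shoulder's.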
**Linear Blend Skinning (LBS)** is the GPU standard: each vertex stores indices and weights for up to 4 joints, with the weights summing to 1. The skinned position is the weighted sum of the vertex transformed by each influencing joint's skinning matrix. LBS has one well-known artifact: the **candy wrapper** collapse at wrists and forearms, where one joint twists around the bone axis relative to its neighbor. Averaging the two transformed positions pulls vertices toward the bone, so the mesh pinches. Dual Quaternion Skinning (DQS) avoids this collapse at modest extra GPU cost.
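A small numeric sketch makes the collapse visible. It assumes we look at a cross-section of the forearm perpendicular to the bone, so a twist about the bone axis becomes a plain 2D rotation. The function names are illustrative only.

```python
import math

def rotate(angle_deg, p):
    """Rotate 2D point p about the origin (the bone axis seen end-on)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def linear_blend_skin(vertex, twist_angles, weights):
    """LBS: weighted sum of the vertex as transformed by each joint."""
    x = y = 0.0
    for angle, w in zip(twist_angles, weights):
        px, py = rotate(angle, vertex)
        x += w * px
        y += w * py
    return (x, y)

vertex = (1.0, 0.0)   # skin vertex 1 unit away from the bone axis
weights = [0.5, 0.5]  # influenced equally by elbow and wrist

# Wrist twisted 90 degrees relative to the elbow: the radius shrinks.
r90 = math.hypot(*linear_blend_skin(vertex, [0, 90], weights))

# Wrist twisted 180 degrees: the vertex lands on the bone axis.
r180 = math.hypot(*linear_blend_skin(vertex, [0, 180], weights))
```

Here `r90` is about 0.707 and `r180` is effectively 0, which is the pinched "wrapper" shape: the blended vertex no longer stays at its original distance from the bone.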
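The **inverse bind matrix** is computed once at rig export time: `inverseBindMatrix = worldTransform_at_bind_pose.inverse()`. Mesh vertices are stored in model space as they sit in the bind pose; the inverse bind matrix moves a vertex into the joint's local space, and the joint's current world transform then carries it to the posed position. Without it, the skinning matrix would transform vertices from the wrong reference frame. The formula `skinMatrix = worldTransform * inverseBindMatrix` cancels the bind pose, leaving only the delta transformation - the actual pose change.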
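The cancellation can be checked numerically. The sketch below is a simplified 2D version that assumes joints carry only rotation and translation (no scale), which lets the inverse be written in closed form. All names and the elbow example are hypothetical.

```python
import math

def transform(angle_deg, tx, ty):
    """3x3 homogeneous 2D rigid transform (rotation + translation)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rigid_inverse(m):
    """Inverse of a rigid transform: transpose the rotation, map back the offset."""
    r00, r01, tx = m[0]
    r10, r11, ty = m[1]
    return [[r00, r10, -(r00 * tx + r10 * ty)],
            [r01, r11, -(r01 * tx + r11 * ty)],
            [0.0, 0.0, 1.0]]

def apply(m, p):
    """Transform 2D point p by matrix m."""
    return (m[0][0] * p[0] + m[0][1] * p[1] + m[0][2],
            m[1][0] * p[0] + m[1][1] * p[1] + m[1][2])

# Elbow joint sits at (1, 0) in the bind pose, unrotated.
bind_world = transform(0, 1.0, 0.0)
inverse_bind = rigid_inverse(bind_world)   # computed once, at export time

vertex = (1.5, 0.0)  # forearm vertex, stored in model space at bind pose

# In the bind pose the skin matrix is the identity: the vertex stays put.
rest = apply(matmul(bind_world, inverse_bind), vertex)       # (1.5, 0.0)

# Bend the elbow 90 degrees: the vertex swings around the joint at (1, 0).
posed_world = transform(90, 1.0, 0.0)
posed = apply(matmul(posed_world, inverse_bind), vertex)     # about (1.0, 0.5)
```

Applying `posed_world` directly to the model-space vertex would rotate it about the wrong pivot and then add the joint offset a second time, which is the "wrong reference frame" problem.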
Why does the skinning matrix formula multiply worldTransform by the inverseBindMatrix?
Inverse Kinematics: FABRIK and Constraint Solving
**Forward Kinematics (FK)**: given joint rotations, compute end-effector position. Animators set rotations, the engine computes where the hand ends up. **Inverse Kinematics (IK)**: given where the hand must be (a target position), compute the joint rotations. FK is direct and cheap. IK is a constrained optimization problem, and one of the most widely used real-time solvers is **FABRIK**: Forward And Backward Reaching Inverse Kinematics.
FABRIK, introduced by Aristidou and Lasenby (2011), solves multi-joint IK chains without Jacobian matrices or matrix inversions. It runs two passes per iteration. **Forward pass**: set the end joint to the target, then walk toward the root, pulling each parent joint toward its child while preserving bone length. This usually drags the root off its anchor. **Backward pass**: move the root back to its original position, then walk toward the end joint, placing each child at bone length from its parent along the line toward the child's current position. Repeat until the end joint is within tolerance of the target.
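The two passes translate almost directly into code. The following is a minimal 2D sketch of the unconstrained algorithm, without joint limits or pole vectors. The function names, the default tolerance, and the iteration cap are assumptions chosen for illustration.

```python
import math

def place(anchor, toward, length):
    """Point at distance `length` from `anchor`, in the direction of `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        return (anchor[0] + length, anchor[1])  # arbitrary direction if coincident
    t = length / d
    return (anchor[0] + (toward[0] - anchor[0]) * t,
            anchor[1] + (toward[1] - anchor[1]) * t)

def fabrik(joints, target, tolerance=1e-3, max_iterations=20):
    """joints: list of (x, y) positions from root to end effector."""
    joints = [tuple(p) for p in joints]
    lengths = [math.dist(joints[i], joints[i + 1])
               for i in range(len(joints) - 1)]
    root = joints[0]

    # Unreachable target: stretch the chain in a straight line toward it.
    if math.dist(root, target) > sum(lengths):
        for i, length in enumerate(lengths):
            joints[i + 1] = place(joints[i], target, length)
        return joints

    for _ in range(max_iterations):
        if math.dist(joints[-1], target) <= tolerance:
            break
        # Forward pass: pin the end effector to the target, walk to the root.
        joints[-1] = tuple(target)
        for i in range(len(joints) - 2, -1, -1):
            joints[i] = place(joints[i + 1], joints[i], lengths[i])
        # Backward pass: pin the root again, walk to the end effector.
        joints[0] = root
        for i in range(len(joints) - 1):
            joints[i + 1] = place(joints[i], joints[i + 1], lengths[i])
    return joints

# Two-bone arm lying along the x axis, reaching for a point above the elbow.
solved = fabrik([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], target=(1.0, 1.0))
```

The solver returns joint positions, not rotations. An engine would still convert each solved bone direction back into a joint rotation before skinning.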
FABRIK converges in 3-10 iterations for typical game chains, making it suitable for real-time use at 60+ Hz. Game engines add **constraints** on top: joint angle limits (a knee only bends one way) and pole vectors (the elbow points in a chosen direction). Secondary motion (a small oscillation after movement settles) is often layered on afterwards. Unreal's Full Body IK (FBIK) tackles the full-body version of the problem, solving for several effectors across the whole skeleton at once.
What does the backward pass in FABRIK accomplish?
Blend Shapes, Morph Targets, and Animation State Machines
Facial animation is hard to do convincingly with a skeleton alone. A human face has 43 muscles producing thousands of distinct expressions, and skin slides and bulges in ways that a handful of joints capture poorly. The industry solution is **blend shapes** (also called morph targets): pre-sculpted per-vertex offsets from the neutral pose. A smile is a delta-mesh. A raised eyebrow is another. The final face is a weighted sum of deltas added to the base mesh.
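The weighted sum is short enough to write out. This sketch assumes each blend shape is stored as one offset per vertex, in the same vertex order as the base mesh. The shape names and the two-vertex "mouth" are made up for illustration.

```python
def apply_blend_shapes(base, deltas, weights):
    """base: list of (x, y, z) vertices in the neutral pose.
    deltas: shape name -> list of per-vertex (dx, dy, dz) offsets.
    weights: shape name -> weight, typically in [0, 1]."""
    result = [list(v) for v in base]
    for name, weight in weights.items():
        if weight == 0.0:
            continue  # inactive shapes cost nothing
        for vertex, delta in zip(result, deltas[name]):
            for axis in range(3):
                vertex[axis] += weight * delta[axis]
    return [tuple(v) for v in result]

# Hypothetical mesh: just the two mouth corners.
base = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "smile":    [(-0.2, 0.3, 0.0), (0.2, 0.3, 0.0)],   # corners move out and up
    "jaw_open": [(0.0, -0.1, 0.0), (0.0, -0.1, 0.0)],  # corners drop slightly
}

# Half a smile with the jaw fully open.
face = apply_blend_shapes(base, deltas, {"smile": 0.5, "jaw_open": 1.0})
```

Because the shapes are additive, any combination of weights yields a valid mesh, which is what lets a tracker or an animator drive each expression independently.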
**Animation State Machines (ASM)** manage transitions between animations. A character has states: Idle, Walk, Run, Jump, Attack. Each state plays a clip. Transitions are triggered by game events (speed > 2.0 -> transition to Run) with a blend duration (cross-fade 0.15s). Unreal's AnimGraph and Unity's Animator use this model. **Blend trees** extend it: a 2D blend space where movement speed and direction smoothly interpolate between 9 directional walk/run clips.
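A toy version of the state-and-cross-fade model is shown below. It is a sketch under simplifying assumptions: only one transition can be in flight at a time, the cross-fade is linear, and the class and parameter names are invented for this example, not taken from Unreal or Unity.

```python
class AnimationStateMachine:
    def __init__(self, initial, transitions, blend_duration=0.15):
        self.current = initial
        self.previous = None              # state being faded out, if any
        self.transitions = transitions    # state -> [(condition, next_state)]
        self.blend_duration = blend_duration
        self.blend_time = 0.0

    def update(self, dt, params):
        """Advance by dt seconds and return {state: clip weight}."""
        if self.previous is None:
            # Not blending: check whether any transition condition fires.
            for condition, target in self.transitions.get(self.current, []):
                if condition(params):
                    self.previous, self.current = self.current, target
                    self.blend_time = 0.0
                    break
        else:
            # Mid cross-fade: advance the blend, finish when time is up.
            self.blend_time += dt
            if self.blend_time >= self.blend_duration:
                self.previous = None
        return self.weights()

    def weights(self):
        if self.previous is None:
            return {self.current: 1.0}
        t = self.blend_time / self.blend_duration
        return {self.previous: 1.0 - t, self.current: t}

transitions = {
    "Walk": [(lambda p: p["speed"] > 2.0, "Run")],
    "Run":  [(lambda p: p["speed"] <= 2.0, "Walk")],
}
asm = AnimationStateMachine("Walk", transitions, blend_duration=0.15)

# Speed jumps to 3.0: the machine enters Run and fades Walk out over 0.15 s.
frames = [asm.update(0.05, {"speed": 3.0}) for _ in range(6)]
```

Each frame's weights sum to 1, so the pose sampled from the Walk clip and the pose sampled from the Run clip can be interpolated joint by joint. A production system would also handle interrupting a transition that is already in progress.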
**ARKit Face Tracking** uses 52 blend shape coefficients loosely modeled on the action units of FACS (Facial Action Coding System). An iPhone depth camera captures face depth in real time, and ARKit regresses the 52 blend shape weights from the point cloud every frame at 60 Hz. The same 52 weights drive a 3D avatar in Apple Clips, FaceTime Animoji, and third-party apps via ARFaceAnchor.
Misconception: IK completely replaces FK in game animation - animators set targets and the solver handles everything
Reality: IK and FK are layered: FK drives the primary animation clip (locomotion, attacks), IK is applied on top for contact correction (feet on uneven terrain, hand reaching a surface)
Pure IK is unstable - small target movements cause large joint angle changes (Jacobian singularities). FK clips give the animation the right feel and timing. IK then corrects contact points without changing the overall motion. Unreal's Control Rig layers both: FK from Sequencer, IK from runtime contact solvers.
Why are blend shapes preferred over skeletal joints for facial animation?
Key ideas
- Skeleton: joint tree with local transforms; world transform = parent's world transform * local transform, accumulated from the root down
- Skinning matrix = worldTransform * inverseBindMatrix - cancels bind pose to give only the delta
- FABRIK: forward-backward geometric IK, converges in 3-10 iterations, no Jacobians, real-time safe
- Blend shapes: per-vertex delta meshes weighted and summed - the standard approach for facial animation (ARKit uses 52 FACS-inspired shapes)
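- Animation state machines: states play clips, transitions cross-fade between them, blend trees interpolate clips by parameters such as speed and direction
- FK and IK are layered: FK clips drive the primary motion, IK is applied on top for contact correction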
Related topics
Skeletal animation sits at the center of the real-time character rendering pipeline.
- Post-Processing Pipeline — Animated scene output passes through HDR tone mapping and bloom before display
- Particle Systems — Particle emitters attach to skeleton joints - fire, smoke, and sparks originate from animated bones
Questions for reflection
- How would a game engine efficiently update only the dirty subtree of a 200-joint skeleton when only the right arm moves?
- Design a blend tree for a 2D movement space (speed 0-10, direction -180 to +180 degrees) - what clips are needed and how are weights interpolated?
- Why does Dual Quaternion Skinning fix the candy-wrapper artifact that Linear Blend Skinning produces at wrist joints?
Related lessons
- cg-16 — Post-processing and HDR pipeline wraps the animated scene output
- cg-18 — Particle systems attach to skeletal joints for VFX like fire from a torch
- alg-12-bfs — Joint hierarchy traversal is DFS on a tree - same algorithm, richer transform payloads
- la-06-transformations