Knowledge Graphs
Knowledge Graph Embeddings
Google Knowledge Graph contains 70 billion facts - and is still incomplete. Does Wikidata know all children of a given scientist? All employers of every researcher? Completing knowledge graphs through ML prediction of missing links is a task where 2013-2019 produced several elegant geometric solutions.
- **Google/Bing Knowledge Panels:** predicting related facts in search results
- **Recommendation systems:** user-item-category graphs for interest prediction
- **Drug discovery:** predicting drug-protein-disease interactions
TransE: relation as translation
Google Knowledge Graph contains 70 billion facts of the form (subject, relation, object). The graph is incomplete - many facts are missing. **KG Embeddings** map entities and relations to a vector space so that missing triples can be predicted from geometric patterns.
**TransE** (Bordes et al., 2013) captures a simple geometric idea: if triple (h, r, t) holds, the head vector plus the relation vector should be close to the tail. The relation acts as a translation (shift) in embedding space.
- **Strengths:** simple, trains fast, good for 1-to-1 and antisymmetric relations
- **Weaknesses:** cannot model symmetry, reflexivity, or multi-hop composition
- **Complexity:** O(n_e + n_r) parameters, O(1) scoring
TransE cannot correctly model the symmetric relation 'is_friend_of'. Why?
DistMult: diagonal bilinear model
**DistMult** (Yang et al., 2015) replaces addition with multiplication: the relation vector r acts as a diagonal matrix, scoring the compatibility of head and tail. Formally: score(h, r, t) = h^T diag(r) t = sum(h_i * r_i * t_i).
**Intuition:** h * r * t is high when h and t are aligned in dimensions amplified by r. If r_i > 0, same-sign h_i and t_i are preferred; if r_i < 0, opposite signs are preferred.
| Relation pattern | Example | TransE | DistMult |
|---|---|---|---|
| Antisymmetric | parent_of | yes | no |
| Symmetric | married_to | no | yes |
| Inverse | bought / sold | yes | no |
| 1-to-N | works_at | weak | partial |
Why is score(h,r,t) = score(t,r,h) in DistMult?
RotatE: relation as rotation in the complex plane
**RotatE** (Sun et al., 2019) combines the best of both worlds: entities are complex vectors, and the relation r acts as an element-wise rotation: t_i = h_i * r_i where |r_i| = 1 (enforced as r_i = e^{i*theta_i}).
What does the constraint |r_i| = 1 mean in RotatE?
ComplEx: complex embeddings for antisymmetry
**ComplEx** (Trouillon et al., 2016) uses complex vectors for both entities and relations, but without the |r_i| = 1 constraint. Scoring: Re(sum h_i * r_i * conj(t_i)) - the real part of a trilinear form with complex conjugation.
| Model | Parameters | Symmetry | Antisymmetry | Inverse | Composition | FB15k MRR |
|---|---|---|---|---|---|---|
| TransE | 2n+m | no | yes | yes | yes | 0.294 |
| DistMult | 2n+m | yes | no | no | no | 0.241 |
| ComplEx | 4n+2m | yes | yes | yes | no | 0.347 |
| RotatE | 4n+m | yes | yes | yes | yes | 0.338 |
**Evaluation metrics:** MRR (Mean Reciprocal Rank) - average 1/rank of the true entity across all test triples. Hits@k - fraction where the true entity is in the top-k. Standard benchmarks: FB15k-237, WN18RR (WordNet).
Why can DistMult not model antisymmetric relations while ComplEx can?
Progress in KG embedding models
- **TransE (2013):** h + r ≈ t, simple but cannot model symmetry
- **DistMult (2015):** h diag(r) t, symmetric by design, weak on antisymmetry
- **RotatE (2019):** r as rotation e^{i*theta}, handles all patterns except composition
- **ComplEx (2016):** Re(h * r * conj(t)), combines symmetry and antisymmetry
Related topics
KG Embeddings bridge symbolic AI (knowledge graphs) with neural methods.
- GNN on Knowledge Graphs — Next step: graph neural networks instead of direct embedding lookup
- Word Embeddings (Word2Vec, GloVe) — Same principle: semantics through geometry of vector space
Вопросы для размышления
- What relation patterns from real-world domains (e.g. a medical knowledge graph) are not captured by any of the four models covered here?
- How can the Hits@10 metric mislead when comparing models on graphs with very different structural properties?
- Why is link prediction in knowledge graphs fundamentally harder than predicting missing edges in plain unlabeled graphs?