Knowledge Graphs

Knowledge Graph Embeddings

Google Knowledge Graph contains 70 billion facts - and is still incomplete. Does Wikidata know all children of a given scientist? All employers of every researcher? Completing knowledge graphs through ML prediction of missing links is a task where 2013-2019 produced several elegant geometric solutions.

**Google/Bing Knowledge Panels:** predicting related facts in search results
**Recommendation systems:** user-item-category graphs for interest prediction
**Drug discovery:** predicting drug-protein-disease interactions

TransE: relation as translation

Google Knowledge Graph contains 70 billion facts of the form (subject, relation, object). The graph is incomplete - many facts are missing. **KG Embeddings** map entities and relations to a vector space so that missing triples can be predicted from geometric patterns.

**TransE** (Bordes et al., 2013) captures a simple geometric idea: if triple (h, r, t) holds, the head vector plus the relation vector should be close to the tail. The relation acts as a translation (shift) in embedding space.

**Strengths:** simple, trains fast, good for 1-to-1 and antisymmetric relations
**Weaknesses:** cannot model symmetry, reflexivity, or multi-hop composition
**Complexity:** O(n_e + n_r) parameters, O(1) scoring

TransE cannot correctly model the symmetric relation 'is_friend_of'. Why?

DistMult: diagonal bilinear model

**DistMult** (Yang et al., 2015) replaces addition with multiplication: the relation vector r acts as a diagonal matrix, scoring the compatibility of head and tail. Formally: score(h, r, t) = h^T diag(r) t = sum(h_i * r_i * t_i).

**Intuition:** h * r * t is high when h and t are aligned in dimensions amplified by r. If r_i > 0, same-sign h_i and t_i are preferred; if r_i < 0, opposite signs are preferred.

Relation pattern	Example	TransE	DistMult
Antisymmetric	parent_of	yes	no
Symmetric	married_to	no	yes
Inverse	bought / sold	yes	no
1-to-N	works_at	weak	partial

Why is score(h,r,t) = score(t,r,h) in DistMult?

RotatE: relation as rotation in the complex plane

**RotatE** (Sun et al., 2019) combines the best of both worlds: entities are complex vectors, and the relation r acts as an element-wise rotation: t_i = h_i * r_i where |r_i| = 1 (enforced as r_i = e^{i*theta_i}).

What does the constraint |r_i| = 1 mean in RotatE?

ComplEx: complex embeddings for antisymmetry

**ComplEx** (Trouillon et al., 2016) uses complex vectors for both entities and relations, but without the |r_i| = 1 constraint. Scoring: Re(sum h_i * r_i * conj(t_i)) - the real part of a trilinear form with complex conjugation.

Model	Parameters	Symmetry	Antisymmetry	Inverse	Composition	FB15k MRR
TransE	2n+m	no	yes	yes	yes	0.294
DistMult	2n+m	yes	no	no	no	0.241
ComplEx	4n+2m	yes	yes	yes	no	0.347
RotatE	4n+m	yes	yes	yes	yes	0.338

**Evaluation metrics:** MRR (Mean Reciprocal Rank) - average 1/rank of the true entity across all test triples. Hits@k - fraction where the true entity is in the top-k. Standard benchmarks: FB15k-237, WN18RR (WordNet).

Why can DistMult not model antisymmetric relations while ComplEx can?

Progress in KG embedding models

**TransE (2013):** h + r ≈ t, simple but cannot model symmetry
**DistMult (2015):** h diag(r) t, symmetric by design, weak on antisymmetry
**RotatE (2019):** r as rotation e^{i*theta}, handles all patterns except composition
**ComplEx (2016):** Re(h * r * conj(t)), combines symmetry and antisymmetry

Вопросы для размышления

What relation patterns from real-world domains (e.g. a medical knowledge graph) are not captured by any of the four models covered here?
How can the Hits@10 metric mislead when comparing models on graphs with very different structural properties?
Why is link prediction in knowledge graphs fundamentally harder than predicting missing edges in plain unlabeled graphs?

Связанные уроки

la-15-svd