Knowledge Graphs

Question Answering over Knowledge Graphs

Google processes an estimated 8.5 billion queries per day. Many of them are factual: "When was Einstein born?", "What is the capital of Australia?". The direct answer boxes in search results are powered by KGQA systems that retrieve answers from a knowledge graph in milliseconds, with no web page scanning required.

**Google Knowledge Graph:** direct factual answers in the search results panel - KGQA in production
**Amazon Alexa / Apple Siri:** voice assistants convert speech into KG queries to return precise facts
**Medical systems:** questions about drug interactions and disease symptoms over medical KGs such as UMLS and DrugBank

KGQA: Problem Definition

A user asks: "Who directed the film featuring the actor born in the same country as Nolan?" Wikidata contains the answer - the challenge is converting that sentence into a structured query automatically. This is **KGQA** (Knowledge Graph Question Answering): retrieving answers from a knowledge graph given natural-language questions.

KGQA decomposes into two core subtasks: **Entity Linking** (recognise "Nolan" in the text and map it to wd:Q25191) and **Relation Detection** (interpret "directed" as wdt:P57, "born" as wdt:P19). An error in either subtask breaks the full pipeline.
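The two subtasks can be sketched with plain lookup tables. This is a minimal illustration, not a production linker: `ENTITY_INDEX`, `RELATION_INDEX`, and the three function names are hypothetical, and a real system would replace the dictionaries with trained models. Only the identifiers mentioned above (wd:Q25191, wdt:P57, wdt:P19) are used.

```python
# Toy lookup tables (assumptions for illustration); real systems use
# a trained entity linker and a relation classifier instead.
ENTITY_INDEX = {"nolan": "wd:Q25191"}       # surface form -> Wikidata item
RELATION_INDEX = {"directed": "wdt:P57",    # cue word -> Wikidata property
                  "born": "wdt:P19"}


def _tokens(question):
    """Lowercase the question and split it into whitespace tokens."""
    return question.lower().replace("?", "").split()


def link_entities(question):
    """Entity Linking: return URIs for every known surface form."""
    return [ENTITY_INDEX[t] for t in _tokens(question) if t in ENTITY_INDEX]


def detect_relations(question):
    """Relation Detection: return property IDs for every known cue word."""
    return [RELATION_INDEX[t] for t in _tokens(question) if t in RELATION_INDEX]


def build_query(entity, relation):
    """Compose a single-hop SPARQL query: ?answer --relation--> entity."""
    return f"SELECT ?answer WHERE {{ ?answer {relation} {entity} . }}"


question = "Which films were directed by Nolan?"
entity = link_entities(question)[0]
relation = detect_relations(question)[0]
print(build_query(entity, relation))
```

If either lookup returns the wrong identifier, the query is still well-formed but returns the wrong answers, which is why errors in either subtask propagate to the final result.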

Standard benchmark datasets: **WebQuestions** (5,810 questions collected via the Google Suggest API, answers from Freebase), **SimpleQuestions** (108K single-hop questions), **WebQuestionsSP** (a WebQuestions subset annotated with gold SPARQL), **HotpotQA** (multi-hop, requires reasoning over several entities; built over Wikipedia text rather than a KG).

**Metric:** the primary KGQA metric is **exact match accuracy** - the fraction of questions answered correctly. For questions with multiple valid answers, F1 is used. State-of-the-art on WebQuestions reached approximately 85% F1 by 2024.
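Both metrics are easy to state per question when answers are treated as sets. The sketch below assumes set-valued predictions and gold answers; the function names are illustrative, and benchmark scripts may differ in details such as answer normalisation or how scores are averaged.

```python
def exact_match(predicted, gold):
    """1.0 if the predicted answer set equals the gold set, else 0.0."""
    return 1.0 if set(predicted) == set(gold) else 0.0


def answer_f1(predicted, gold):
    """Set-based F1 between predicted and gold answers for one question."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        # Convention assumed here: two empty sets count as a perfect match.
        return 1.0 if predicted == gold else 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two of three gold answers found, no wrong answers:
# precision = 1.0, recall = 2/3, F1 = 0.8
print(answer_f1({"a", "b"}, {"a", "b", "c"}))
```

The dataset-level score is then the mean of the per-question values.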

Semantic Parsing: SPARQL Generation

**Semantic Parsing** is the classical KGQA approach: translate a natural-language question into a formal query (SPARQL, logical form, or lambda calculus). A model is trained to learn the mapping from text to structured query.

Modern systems use **BERT/T5** as the question encoder and an autoregressive decoder to generate SPARQL token by token. The central problem is **Entity Linking**: entity URIs (wd:Q584365) cannot be part of the model vocabulary directly. The solution: link mentions to URIs first, then substitute them into a generation template.
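The link-then-substitute idea can be sketched as follows. Mentions are replaced with placeholder tokens before the question reaches the model, and the URIs are written back into whatever query the decoder produces. The helper names and the `<e0>` placeholder format are assumptions for illustration, and the decoder output is hard-coded here rather than generated by a model.

```python
def mask_mentions(question, links):
    """Replace each linked mention with a placeholder such as <e0>.

    `links` maps a mention string to its URI (the entity linker's output).
    Returns the masked question and a placeholder -> URI table.
    """
    masked = question
    slots = {}
    for i, (mention, uri) in enumerate(links.items()):
        placeholder = f"<e{i}>"
        masked = masked.replace(mention, placeholder)
        slots[placeholder] = uri
    return masked, slots


def fill_slots(query_template, slots):
    """Write the URIs back into the query generated by the decoder."""
    for placeholder, uri in slots.items():
        query_template = query_template.replace(placeholder, uri)
    return query_template


masked, slots = mask_mentions("Which films did Nolan direct?",
                              {"Nolan": "wd:Q25191"})
print(masked)  # the model only ever sees the placeholder

# Stand-in for the seq2seq output; a trained decoder would produce this.
generated = "SELECT ?film WHERE { ?film wdt:P57 <e0> . }"
print(fill_slots(generated, slots))
```

With this design the vocabulary needs only a handful of placeholder tokens, regardless of how many entities the graph contains.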

**Constrained SPARQL decoding:** to guarantee syntactically valid SPARQL, constrained beam search masks out tokens that the grammar does not allow at each decoding step. Without this, the model may produce malformed queries that cannot be executed.
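The masking step can be illustrated with a simplified greedy decoder (beam width 1) rather than full beam search. The per-step scores stand in for model logits and the per-step allowed sets stand in for a SPARQL grammar; both are made-up inputs, and the sketch assumes every allowed set contains at least one token from the vocabulary.

```python
import math


def constrained_greedy_decode(step_scores, allowed_by_step):
    """At each step, pick the best-scoring token among those the grammar allows.

    step_scores:     list of {token: score} dicts, one per decoding step
    allowed_by_step: list of sets of tokens that are valid at each step
    """
    output = []
    for scores, allowed in zip(step_scores, allowed_by_step):
        # Invalid tokens get -inf so they can never win the argmax.
        masked = {tok: (s if tok in allowed else -math.inf)
                  for tok, s in scores.items()}
        output.append(max(masked, key=masked.get))
    return output


# Made-up scores where the unconstrained argmax would be malformed.
scores = [
    {"SELECT": 0.2, "WHERE": 0.9, "?x": 0.1},
    {"SELECT": 0.1, "WHERE": 0.3, "?x": 0.5},
    {"SELECT": 0.6, "WHERE": 0.4, "?x": 0.0},
]
allowed = [{"SELECT"}, {"?x"}, {"WHERE"}]

unconstrained = [max(s, key=s.get) for s in scores]
print(unconstrained)                               # starts with WHERE: invalid
print(constrained_greedy_decode(scores, allowed))  # follows the skeleton
```

A real implementation would derive the allowed set from the decoder's current grammar state instead of from the step index, and would keep several hypotheses alive per step.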

Why can entity URIs (wd:Q583...) not be included directly in the seq2seq model vocabulary for SPARQL generation?

Embedding-based QA

An alternative to semantic parsing: work in embedding space rather than generating a query. Questions and candidate answers are projected into a shared vector space. The correct answer is the candidate whose vector is closest to the question vector.

Answer embeddings come from **KG embeddings** (TransE, RotatE, ComplEx) - each KG entity already has a learned vector. In TransE, the query vector is composed by addition: `entity_embed(Tolstoy) + relation_embed(P19)` should land near the embedding of Tolstoy's place of birth.
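A minimal sketch of TransE-style answer selection, using hand-picked 2-D vectors so the arithmetic is easy to follow. The vectors and the `transe_answer` helper are assumptions for illustration; learned embeddings have hundreds of dimensions and come from training on the graph.

```python
import math

# Hand-picked toy vectors (not learned); chosen so that
# Tolstoy + P19 lands close to his birthplace.
ENTITY_VECS = {
    "Tolstoy": (1.0, 0.0),
    "Yasnaya Polyana": (1.0, 1.0),
    "Moscow": (3.0, 2.0),
}
RELATION_VECS = {"P19": (0.0, 0.9)}  # place of birth


def transe_answer(topic, relation):
    """Return the entity whose vector is nearest to topic + relation."""
    head = ENTITY_VECS[topic]
    rel = RELATION_VECS[relation]
    query = tuple(h + r for h, r in zip(head, rel))
    candidates = [e for e in ENTITY_VECS if e != topic]
    return min(candidates, key=lambda e: math.dist(query, ENTITY_VECS[e]))


print(transe_answer("Tolstoy", "P19"))
```

Here the query vector is (1.0, 0.9), which is 0.1 away from "Yasnaya Polyana" and much farther from "Moscow", so the nearest-neighbour search picks the former.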

**Advantage of Embedding QA:** it works even on incomplete graphs - embeddings generalise the structure and can answer questions about implicit facts. **Drawback:** harder to explain why a specific answer was returned.

In TransE-based QA the answer vector is computed as topic_entity + relation. What does this mean geometrically?

Multi-hop Reasoning

"Who is the president of the country where the author of Crime and Punishment was born?" requires four steps: find the author (Dostoevsky), find the birthplace (Moscow), find the country (Russia), find the head of state. This is **multi-hop reasoning** - the answer requires chaining through several entities.
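The chain of hops can be sketched over a toy triple store. The relation names, the `follow_path` helper, and the placeholder value for the final hop are all assumptions for illustration; each (entity, relation) pair is assumed to have exactly one object, which real graphs do not guarantee.

```python
# Toy KG: (subject, relation) -> object. The final value is a placeholder,
# since the real answer depends on the date of the query.
KG = {
    ("Crime and Punishment", "author"): "Fyodor Dostoevsky",
    ("Fyodor Dostoevsky", "place_of_birth"): "Moscow",
    ("Moscow", "country"): "Russia",
    ("Russia", "head_of_state"): "<head of state of Russia>",
}


def follow_path(start, relations, kg):
    """Follow a sequence of relations from a start entity, one hop at a time.

    Returns None as soon as a hop is missing from the graph.
    """
    node = start
    for relation in relations:
        node = kg.get((node, relation))
        if node is None:
            return None
    return node


path = ["author", "place_of_birth", "country", "head_of_state"]
print(follow_path("Crime and Punishment", path[:3], KG))  # after three hops
print(follow_path("Crime and Punishment", path, KG))      # full chain
```

A single missing or wrong edge anywhere along the path makes the whole chain fail, which is what makes multi-hop questions harder than single-hop ones.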

**GraftNet** builds a subgraph around the topic entities (up to K hops), then runs a GNN to aggregate information and classifies each node as "answer / not answer". This enables reasoning over graph structure without explicit SPARQL generation.
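The "up to K hops" neighbourhood can be sketched with a breadth-first search. This covers only the subgraph extraction step under simplifying assumptions (edges treated as undirected, no pruning); it does not reproduce GraftNet's actual retrieval procedure or its GNN classifier, and `k_hop_subgraph` is an illustrative name.

```python
from collections import deque


def k_hop_subgraph(edges, seeds, k):
    """Return the set of nodes within k undirected hops of any seed entity.

    edges: iterable of (head, relation, tail) triples
    seeds: topic entities found in the question
    """
    neighbours = {}
    for head, _relation, tail in edges:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)

    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbours.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


edges = [("A", "r", "B"), ("B", "r", "C"), ("C", "r", "D")]
print(sorted(k_hop_subgraph(edges, ["A"], 2)))
```

The nodes returned here are the candidates that a GNN would then score as "answer / not answer".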

**LLM + KG hybrids** (2023-2024): large language models reason well but hallucinate facts. Knowledge graphs are precise but cannot reason in natural language. Hybrid systems use LLMs for question decomposition and step planning, and KGs as a source of verified facts at each step.

**HotpotQA** is the standard multi-hop benchmark requiring supporting-fact explanations. SOTA in 2024 is around 75% F1; humans score ~91%. Part of the gap reflects questions requiring common-sense reasoning that is not captured as explicit facts, whether in the source text or in a graph.

Why do single-hop embedding QA systems fail on multi-hop questions?

Question Answering over KG

KGQA = Entity Linking + Relation Detection + Query/Answer Generation. An error in any component breaks the result
Semantic Parsing: NL -> SPARQL via seq2seq (T5/BERT + decoder). Precise and explainable, but fragile to language variation
Embedding QA: questions and answers share a vector space. topic_entity + relation ≈ answer_entity (TransE)
Multi-hop: GraftNet/PullNet build subgraph + GNN. LLM+KG hybrids use LLM for planning and KG for fact verification

Related lessons

KGQA combines several threads from the knowledge graph course:

SPARQL and Cypher — The query language generated by the Semantic Parsing approach
KG Embeddings: TransE and RotatE — Vector representations of entities and relations used in Embedding QA
GNN on Knowledge Graphs — GNN architectures behind GraftNet and multi-hop reasoning

Questions for reflection

For which question types is Semantic Parsing more reliable than Embedding QA, and when is the reverse true?
How do LLM+KG hybrid systems address the hallucination problem of language models on factual questions?
Why does accuracy on HotpotQA remain well below human performance even when all the required facts are available to the system?
