Natural Language Processing
Sentiment Analysis
Prerequisites
- Text classification: pipeline, evaluation metrics, fine-tuning a transformer encoder
- Named Entity Recognition: span extraction from text (the basis for Aspect Term Extraction)
Pang, Lee, and Vaithyanathan and the birth of sentiment classification
In 2002, Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan published 'Thumbs up? Sentiment Classification using Machine Learning Techniques', one of the first papers to frame sentiment analysis as supervised classification over movie reviews. They showed that Naive Bayes, maximum entropy, and SVMs on bag-of-words features clearly beat hand-built sentiment word lists, but also that sentiment was harder than topic classification because of irony and context. In 2008, Pang and Lee wrote the survey 'Opinion Mining and Sentiment Analysis', which established the field as its own discipline. The next leap came with the Stanford Sentiment Treebank (Socher et al., 2013): phrase-level sentiment labels over parse trees, released together with a recursive neural model that captured how polarity composes, such as 'not' flipping the polarity of 'good'. These three milestones trace the path from bag-of-words to ABSA and neural sentiment models. The irony and compositionality problems flagged back in 2002 remain open and still shape how modern sentiment analysis is structured.
"Great product!" - positive? What if it is sarcasm? What if the battery is great but the camera is terrible? What if the speaker's face contradicts the words? Document-level sentiment fails on all three cases. This lesson covers the tools that handle them.
- **Amazon Reviews:** ABSA extracts aspect-level ratings (quality, price, shipping) - the basis of structured review summaries shown under each product
- **Brand Monitoring (Twitter):** irony and stance detection eliminate false-positive alerts generated by sarcastic praise
- **Call Centers (Cisco, Salesforce):** real-time acoustic sentiment on both agent and customer audio channels
Aspect-Based Sentiment Analysis
The review "Battery life is excellent but the camera is disappointing" contains **two** sentiments in one sentence. A document-level classifier returns "neutral" - losing both signals entirely. **Aspect-Based Sentiment Analysis (ABSA)** addresses this by decomposing opinion into aspect terms and their associated polarities.
ABSA decomposes into three subtasks: **ATE** (Aspect Term Extraction) locates aspect words ("battery", "camera"); **ASC** (Aspect Sentiment Classification) assigns polarity per aspect; **APC** (Aspect-oriented Polarity) solves both jointly, a setting often called end-to-end ABSA elsewhere. Modern end-to-end models (BERT fine-tuned on SemEval, GRACE) treat APC as span extraction plus classification.
**Benchmarks:** SemEval-2014 Task 4 (restaurants, laptops), SemEval-2016 Task 5. **Metric:** aspect-level F1, macro-averaged across categories. **Baseline:** BERT + CRF for ATE; BERT + linear head for ASC. State-of-the-art reaches ~90 F1 on the restaurant domain.
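The macro-averaged F1 mentioned above can be sketched in a few lines. This is a minimal illustration, assuming that "categories" means the polarity labels and that gold and predicted labels are aligned one per aspect; the function name `macro_f1` and the toy labels are hypothetical, not part of any benchmark's official scorer.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (macro average)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy per-aspect polarity labels (illustrative only).
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
score = macro_f1(gold, pred)
```

Because every label contributes equally, a rare class such as "neutral" weighs as much as a frequent one, which is the usual reason for choosing macro over micro averaging on imbalanced sentiment data.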
Review: "The food was amazing but the service ruined the evening". Document-level sentiment returns neutral. ABSA returns two aspect-level results: food is positive, service is negative.
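For the restaurant review above, the ATE and ASC steps can be sketched as follows. This is a toy illustration, assuming ATE is framed as BIO sequence tagging (a common setup for BERT + CRF taggers); the tag names `B-ASP`/`I-ASP`, the hand-written tags, and the polarity list are hypothetical stand-ins for real model outputs.

```python
def decode_bio(tokens, tags):
    """Collect aspect terms from BIO tags (B-ASP starts a span, I-ASP continues it)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ASP":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ASP" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "The food was amazing but the service ruined the evening".split()
# Hypothetical tagger output: one tag per token.
tags = ["O", "B-ASP", "O", "O", "O", "O", "B-ASP", "O", "O", "O"]
aspects = decode_bio(tokens, tags)

# Hypothetical ASC output: one polarity per extracted aspect, in order.
polarities = ["positive", "negative"]
result = dict(zip(aspects, polarities))
```

The final dictionary maps each aspect term to its own polarity, which is the structure a review summary needs and which a single document-level label cannot provide.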
Stance Detection
The tweet "Vaccines save lives" carries positive sentiment, but its stance depends entirely on context: a vaccine advocate writes it in earnest, an opponent quotes it sarcastically. Sentiment answers "what tone?"; **Stance Detection** answers "what position does the author take toward a specific target?"
Stance labels: **FAVOR** (supports the target), **AGAINST** (opposes), **NONE** (neither). The target is fixed in advance, and the same text can carry opposite stances toward different targets. Highly negative text can still express FAVOR: this independence from sentiment is the defining property of stance.
**Stance vs Sentiment:** "Sadly, I agree with candidate X" = negative sentiment, FAVOR stance. **Datasets:** SemEval-2016 Task 6, RumourEval (stance toward rumours in social media threads), VaccinStance. Cross-target stance generalization - predicting stance on unseen targets - remains an active research area.
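One way to make the target dependence concrete is to treat every training example as a (text, target) pair with stance and sentiment stored as separate fields. The sketch below is illustrative only: the class names, the `[SEP]`-joined input format, and the second example's target and label are assumptions, not the format of any specific dataset.

```python
from dataclasses import dataclass
from enum import Enum

class Stance(Enum):
    FAVOR = "FAVOR"
    AGAINST = "AGAINST"
    NONE = "NONE"

@dataclass(frozen=True)
class StanceExample:
    text: str
    target: str
    stance: Stance
    sentiment: str  # annotated independently of stance

def encoder_input(example, sep="[SEP]"):
    """Join target and text so a classifier sees which target is being judged."""
    return f"{example.target} {sep} {example.text}"

text = "Unfortunately, Candidate X is the least bad option - so yes, voting for them"
examples = [
    StanceExample(text, "Candidate X", Stance.FAVOR, "negative"),
    # Hypothetical second target: same text, same sentiment, different stance.
    StanceExample(text, "Candidate Y", Stance.AGAINST, "negative"),
]
inputs = [encoder_input(e) for e in examples]
```

Keeping the target in the input is what lets one model produce different labels for the same sentence; a sentiment classifier, which sees only the text, cannot make that distinction.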
Text: "Unfortunately, Candidate X is the least bad option - so yes, voting for them". Sentiment is negative. Stance toward Candidate X is FAVOR: the author reluctantly supports the target.
Irony and Sarcasm in NLP
"Oh great, another flight delay" is literally positive, pragmatically negative. Irony inverts the literal meaning, and this inversion is the primary failure mode of surface-level sentiment classifiers. On Twitter data, neglecting irony detection degrades sentiment accuracy by 20-30 percentage points.
**Subtypes:** irony (mild mismatch between literal and intended meaning), sarcasm (irony with hostile intent), overstatement, understatement. Key surface signals: context contrast (negative event paired with positive phrasing), punctuation (exclamation mark in a negative situation), lexical markers ("great", "wonderful", "totally" in implausible positive contexts).
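The surface signals listed above can be turned into simple features. The sketch below is a toy heuristic, not a real irony detector: both word lists are small hypothetical lexicons chosen for illustration, and in practice such features would at most supplement a fine-tuned model.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE_MARKERS = {"great", "wonderful", "totally", "love", "perfect"}
NEGATIVE_EVENTS = {"delay", "delayed", "cancelled", "broken", "crash", "lost"}

def irony_signals(text):
    """Return surface cues: lexical markers, context contrast, punctuation."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    has_positive = bool(words & POSITIVE_MARKERS)
    has_negative = bool(words & NEGATIVE_EVENTS)
    return {
        "positive_marker": has_positive,
        "negative_event": has_negative,
        # Positive phrasing paired with a negative event.
        "context_contrast": has_positive and has_negative,
        "exclamation": "!" in text,
    }

delay = irony_signals("Oh great, another flight delay")
praise = irony_signals("Great product!")
```

The flight-delay sentence triggers the contrast feature while the plain praise does not, although without wider context even "Great product!" could still be sarcastic, which is why lexical cues alone are insufficient.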
**Datasets:** SemEval-2018 Task 3 (English Twitter irony: binary detection plus a 4-class subtask that separates irony types from non-ironic tweets), Reddit Sarcasm Corpus (1M examples). **State of the art:** RoBERTa fine-tuned reaches F1 ~0.72 on SemEval-2018 Task 3 - the task remains open. Adding audio features (intonation) yields +5-8% F1 on spoken irony.
Why is irony detection critical for brand monitoring systems on Twitter?
Multimodal Sentiment
Consider a video review where someone says "the product is excellent" in a flat monotone with a tense expression. A text classifier returns positive. A multimodal model also factors in acoustic features (monotone pitch signals possible sarcasm) and facial action units (tension signals dissatisfaction), and may return negative or ironic. **Multimodal sentiment** fuses text, audio, and visual signals into a single prediction.
**Three modalities:** Text (tokens, syntax via BERT), Audio (pitch, tempo, pauses, intensity via openSMILE or wav2vec2), Visual (facial action units via OpenFace, or learned image features from a ResNet). Standard benchmarks: CMU-MOSI, CMU-MOSEI (video reviews), IEMOCAP (dyadic dialogues). Fusion strategies: early (concatenate features before classification), late (average per-modality predictions), or cross-modal attention - the current best approach.
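The difference between early and late fusion can be sketched with plain Python lists. This is a minimal illustration under strong assumptions: each modality is reduced to a one-dimensional hypothetical feature, the classifier is a single logistic unit, and the weights and per-modality probabilities are made up rather than learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def early_fusion(text, audio, visual, weights, bias=0.0):
    """Concatenate modality features, then apply one classifier to the joint vector."""
    fused = text + audio + visual  # list concatenation
    score = sum(f * w for f, w in zip(fused, weights)) + bias
    return sigmoid(score)

def late_fusion(probabilities):
    """Average the positive-class probabilities of separate per-modality models."""
    return sum(probabilities) / len(probabilities)

# Hypothetical features: positive words, flat tone, tense face.
text_feat, audio_feat, visual_feat = [0.8], [-0.6], [-0.7]
p_early = early_fusion(text_feat, audio_feat, visual_feat, weights=[1.0, 1.5, 1.5])

# Hypothetical outputs of three independent unimodal classifiers.
p_late = late_fusion([0.9, 0.2, 0.1])
```

Early fusion lets one classifier weigh the modalities against each other in a shared feature space, while late fusion only combines decisions that were each made in isolation.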
**Industry applications:** Google uses multimodal signals for YouTube Content Policy enforcement. Cisco and Salesforce deploy acoustic sentiment in real-time call center analytics. CMU-MOSI benchmark: best text-only MAE = 0.92, best multimodal = 0.71 (lower is better).
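To make the MAE comparison concrete, the sketch below computes mean absolute error on toy values and the relative error reduction implied by the two CMU-MOSI figures quoted above. The function names and the toy scores are hypothetical; only 0.92 and 0.71 come from the text.

```python
def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted sentiment scores."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)

def relative_reduction(baseline, improved):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline - improved) / baseline

# Toy continuous sentiment scores (illustrative only).
toy_mae = mean_absolute_error([1.0, -2.0], [0.5, -1.0])

# Figures quoted in the text: text-only 0.92, multimodal 0.71.
gain = relative_reduction(0.92, 0.71)
```

With the quoted figures the reduction works out to roughly 23% of the text-only error.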
**Misconception:** multimodal sentiment is just averaging predictions from separate text, audio, and video models.
**Reality:** cross-modal or early fusion generally outperforms late averaging because modalities interact rather than vote independently.
**Why:** when tone contradicts words, that contradiction is itself the signal (irony, deception). Interaction-aware fusion captures this; late averaging loses it by treating each modality in isolation.
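The interaction between modalities can be sketched as scaled dot-product attention in which text positions act as queries over audio frames. This is a simplified illustration, assuming both modalities are already projected to the same dimension; it omits the learned projections and multiple heads a real model would use, and all names and numbers are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(queries, keys, values):
    """Each query (e.g. a text token) takes a weighted mix of another modality's values."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# One text query attending over two audio frames with identical keys.
text_queries = [[1.0, 0.0]]
audio_keys = [[1.0, 0.0], [1.0, 0.0]]
audio_values = [[0.0, 2.0], [4.0, 6.0]]
attended = cross_modal_attention(text_queries, audio_keys, audio_values)
```

Because the weights depend on how each text query matches each audio frame, the output for a word changes with the tone it was spoken in, which is the interaction that averaging separate predictions cannot express.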
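Cross-modal attention in multimodal sentiment serves to let each modality attend to the others, so that interactions between them, such as tone contradicting words, inform the prediction.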
Sentiment Analysis: the full picture
- ABSA: extract aspect terms and assign per-aspect polarity, not a single document-level label
- Stance Detection: author position toward a specific target - orthogonal to overall sentiment polarity
- Irony detection: the intended meaning inverts the literal text - irony makes up ~10-15% of Twitter content and systematically biases sentiment estimates if left undetected
- Multimodal sentiment: text + audio + video via cross-modal attention, roughly 23% lower MAE than text-only on CMU-MOSI (0.92 vs 0.71)
Related topics
Sentiment analysis builds on text classification and embeddings, and feeds into opinion-aware dialogue systems.
- Text Classification and BERT fine-tuning — Core tool for all sentiment subtasks
- Word Embeddings and contextual representations — Embeddings encode emotional valence, forming the feature base for sentiment models
- Dialogue Systems and Opinion Mining — Sentiment in conversation - input for emotion-aware response generation
Questions for reflection
- When does a product review require both stance detection and ABSA simultaneously? Describe a concrete scenario where combining both changes a downstream business decision.
- Multimodal sentiment is more accurate but far harder to deploy than text-only. At what accuracy gain threshold does the additional engineering cost become justified?
- Irony detection models trained on Twitter generalize poorly to other domains. What properties of Twitter irony make it particularly domain-specific?
Related lessons
- nlp-07 — Text classification is the core tool for sentiment subtasks
- nlp-04 — Embeddings encode emotional valence as features
- nlp-21 — Multimodal sentiment combines text, audio and video cues
- ml-05-evaluation — Imbalanced sentiment classes need careful metric choice
- ml-01-intro