Analysis of sentiment syntagma using dependency tree
Serge B. Potemkin, Moscow State University, potemkin@philol.msu.ru



Terms
- Sentiment: a thought, view, or attitude, especially one based mainly on emotion instead of reason
- Sentiment analysis (opinion mining): the use of natural language processing (NLP) and computational techniques to extract or classify sentiment from (unstructured) text

What for?
- Consumer information: product reviews, consumer attitudes, trends
- Politics: politicians want to know voters' views; voters want to know politicians' intentions and who else supports them
- Social: find like-minded individuals or communities
- Financial: predict market trends given the current opinions

Features
Which features to use?
- Words (unigrams)
- Phrases/n-grams
- Sentences
How to interpret features for sentiment detection?
- Bag of words
- Annotated lexicons (WordNet, SentiWordNet)
- Syntactic patterns
- Paragraph structure

Challenges
- Harder than topical classification, for which bag-of-words features perform well
- Must consider other features due to:
  - Ambiguity of sentiment expression: irony, expression of sentiment using neutral words, and many others
  - Domain/context dependence: words and phrases can mean different things in different contexts and domains
  - Effect of syntax on semantics

Formal description
The semantic orientation of a sentence is expressed by a ternary predicate O(subject, object, sentiment), where sentiment ∈ {bad, neutral, good}. That is, the subject of assessment considers the object of assessment to be good or bad (neutral = not a sentiment).

Sentiment expression in NL
Predicate O may be expressed explicitly, as in "Vania likes Masha". Only surface syntactic analysis is needed to determine its semantic orientation (SO): Vania (subj) likes (sentiment) Masha (obj).
The common case is quite different: in "Vania suffers from Masha's absence", both "suffer" and "absence" are negative, but the sense is equivalent to that of the first example.
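
The explicit case can be illustrated with a minimal sketch. The verb lexicon `VERB_SENTIMENT` and the helper `explicit_orientation` are hypothetical names introduced here for illustration and do not come from the slides. The sketch assumes the subject, verb and object have already been identified by surface syntactic analysis.

```python
from typing import NamedTuple

# Hypothetical toy lexicon of sentiment-bearing verbs (an assumption, not from the slides).
VERB_SENTIMENT = {"likes": "good", "loves": "good", "hates": "bad"}


class Orientation(NamedTuple):
    """The ternary predicate O(subject, object, sentiment)."""
    subject: str
    object: str
    sentiment: str  # one of "bad", "neutral", "good"


def explicit_orientation(subj: str, verb: str, obj: str) -> Orientation:
    """Map a subject-verb-object triple to O(subject, object, sentiment).

    Verbs missing from the toy lexicon are treated as neutral (no sentiment).
    """
    return Orientation(subj, obj, VERB_SENTIMENT.get(verb, "neutral"))


print(explicit_orientation("Vania", "likes", "Masha"))
```

A lookup of this kind cannot handle "Vania suffers from Masha's absence", which is why the rest of the talk turns to the orientation of syntagmas.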

Bag of words vs. syntagma analysis
- Bag of words (counting the positive and negative words) gives good results for large texts
- Syntagma: a phrase forming a syntactic unit, e.g. modifier (X) + key word (Y), such as adjective+noun or adverb+verb
- Signature of a syntagma: SO = sgn(X, Y, neg/0/pos)
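
For contrast with syntagma analysis, here is a minimal bag-of-words sketch. The word lists `POSITIVE` and `NEGATIVE` and the function `bag_of_words_so` are illustrative assumptions, not the lexicon used in the talk.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "pleasant", "fine", "happy"}
NEGATIVE = {"bad", "banal", "uninteresting", "rubbish"}


def bag_of_words_so(text: str) -> str:
    """Return 'pos', 'neg' or '0' from the counts of positive and negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "0"


print(bag_of_words_so("a pleasant book with a happy end"))
print(bag_of_words_so("fine rubbish"))  # one positive and one negative word cancel out
```

On a short phrase such as "fine rubbish" the counts cancel, which illustrates why the syntagma as a whole needs its own signature.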

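SO calculus
∀X,Y.[sgn(X,Y,pos) ← dep(mod,X,Y), sgn(X,pos), sgn(Y,pos)]. (a)
i.e. if X and Y are both positive, then X+Y is positive
∀X,Y,Z.[sgn(X,Y,Z) ← dep(mod,X,Y), sgn(X,0), sgn(Y,Z)]. (b)
i.e. if X is neutral, then X+Y takes the orientation of Y
∀X,Y,Z.[sgn(X,Y,Z) ← dep(mod,X,Y), sgn(X,Z), sgn(Y,0)]. (c)
i.e. if Y is neutral, then X+Y takes the orientation of X (e.g. X pos., Y neut. gives X+Y pos.)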
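
Rules (a) to (c) can be sketched as a small function. The name `syntagma_so` and the choice to return `None` when no rule applies are assumptions made for this sketch. It also assumes that the dependency dep(mod, X, Y) has already been established by the parser.

```python
from typing import Optional


def syntagma_so(mod_so: str, key_so: str) -> Optional[str]:
    """Apply rules (a)-(c) to a modifier X and key word Y.

    mod_so and key_so are 'neg', '0' or 'pos'.
    Returns the orientation of X+Y, or None when no rule applies.
    """
    if mod_so == "pos" and key_so == "pos":  # rule (a)
        return "pos"
    if mod_so == "0":                        # rule (b): sgn(X,0), sgn(Y,Z) gives Z
        return key_so
    if key_so == "0":                        # rule (c): sgn(X,Z), sgn(Y,0) gives Z
        return mod_so
    return None                              # neg+neg and mixed cases are not covered


print(syntagma_so("pos", "0"))
print(syntagma_so("neg", "neg"))
```

The uncovered combinations (opposite orientations, or two negatives) are the cases the rules leave open.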

Different orientation of syntagma constituent words
- sgn(безумная, радость, pos) = sgn(mad, happiness, pos)
- sgn(бешеный, успех, pos) = sgn(furious, success, pos)
- sgn(солидный, ущерб, neg) = sgn(considerable, damage, neg)
- sgn(хороший, нагоняй, neg) = sgn(good, scolding, neg)
[Kustova, 1]

Ambiguous cases
sgn(худой, мир, ?), sgn(добрая, война, ?), i.e. sgn(bad, peace, ?), sgn(good, war, ?)
The expression "a bad peace is better than a good war" establishes an order relation "better" between its two attributive constructions, but one can assume that both are bad, i.e. sgn(bad, peace, neg), sgn(good, war, neg). In some other context, "good war" could be perceived as a positive phenomenon.

Double negative
The logical rule of double negation,
* ∀X,Y,Z.[sgn(X,Y,pos) ← dep(mod,X,Y), sgn(X,neg), sgn(Y,neg)].
fails in NL:
- weak opponent, impotent aggressor, toothless criticism (neut.)
- bitter sorrow, blatant outrage, brutal torture (neg.)
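
A short check makes the failure concrete. The sketch assumes, as the slide implies, that both words in each pair are individually negative. The table `OBSERVED` records the orientations given on the slide, and `naive_double_negation` is a hypothetical name for the starred rule.

```python
def naive_double_negation(mod_so: str, key_so: str):
    """The starred rule: negative modifier + negative key word gives 'pos'."""
    if mod_so == "neg" and key_so == "neg":
        return "pos"
    return None


# Orientations of the syntagmas as listed on the slide.
OBSERVED = {
    ("weak", "opponent"): "0",
    ("impotent", "aggressor"): "0",
    ("toothless", "criticism"): "0",
    ("bitter", "sorrow"): "neg",
    ("blatant", "outrage"): "neg",
    ("brutal", "torture"): "neg",
}

# Assumption: every modifier and key word above is negative on its own.
mismatches = [
    pair for pair, observed in OBSERVED.items()
    if naive_double_negation("neg", "neg") != observed
]
print(len(mismatches), "of", len(OBSERVED), "examples contradict the rule")
```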

Syntagma evaluation
Methods:
- expert evaluation by several independent experts [Osgood,2], who are asked to mark up the SO of isolated words and syntagmas, assigning each a label {pos/0/neg}
- corpus techniques applied to a sentiment-annotated corpus [Zagibalov,3]
- SentiWordNet

SentiWordNet
- Based on WordNet synsets
- Ternary classifier: positive, negative, and neutral scores for each synset
- Provides a means of gauging sentiment for a text

SentiWordNet: Construction
- Created training sets of synsets, L_p and L_n
  - Start with a small number of synsets with fundamentally positive or negative semantics, e.g. nice and nasty
  - Use WordNet relations (e.g. direct antonymy, similarity, derived-from) to expand L_p and L_n over K iterations
  - L_o (objective) is the set of synsets not in L_p or L_n
- Trained classifiers on the training set
  - Rocchio and SVM
  - Use four values of K to create eight classifiers with different precision/recall characteristics
  - As K increases, precision (P) decreases and recall (R) increases
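
The seed-expansion step can be sketched on toy data. This is only an illustration of the idea: it works on plain words instead of WordNet synsets, and the relation maps `SIMILAR` and `ANTONYMS` and the function `expand_seeds` are hypothetical. It is not the actual SentiWordNet procedure.

```python
# Toy relation maps standing in for WordNet relations (assumed data).
SIMILAR = {"nice": {"pleasant"}, "nasty": {"awful"}, "pleasant": {"agreeable"}}
ANTONYMS = {"nice": {"nasty"}, "pleasant": {"unpleasant"}}


def expand_seeds(pos_seeds, neg_seeds, similar, antonyms, k):
    """Grow the positive set L_p and the negative set L_n for k iterations.

    Similarity keeps the orientation of a term; antonymy flips it.
    """
    l_p, l_n = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_p = {t for s in l_p for t in similar.get(s, ())}
        new_p |= {t for s in l_n for t in antonyms.get(s, ())}
        new_n = {t for s in l_n for t in similar.get(s, ())}
        new_n |= {t for s in l_p for t in antonyms.get(s, ())}
        l_p |= new_p
        l_n |= new_n
    return l_p, l_n


l_p, l_n = expand_seeds({"nice"}, {"nasty"}, SIMILAR, ANTONYMS, k=2)
print(sorted(l_p), sorted(l_n))
```

Each additional iteration reaches terms further from the seeds, which is consistent with the slide's remark that recall rises and precision falls as K grows.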

SentiWordNet: Results
- 24.6% of synsets have Objective < 1.0
  - Many terms are classified with some degree of subjectivity
- 10.45% with Objective <= …; …% with Objective <= 0.125
  - Only a few terms are classified as definitively subjective
- Difficult (if not impossible) to accurately assess performance

Corpus-based method
Sentiment-annotated corpora (English and Russian) of approx. … short utterances concerning popular books. Each utterance contains from 1 to 15 sentences and is marked with a label {neg / pos}.

Corpus processing
- Stemming and determination of the morphological features of each word (without morphological disambiguation);
- Parsing to obtain the dependency tree for each sentence [Potemkin, 4];
- Joining the particle "no/not" to the associated word (not understand => not_understand);
- Selection of modifier + key word constructions (adjective+noun, adverb+verb);
- Counting the number of occurrences of each key word = nverb;
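
The negation-joining step can be sketched as follows. As a simplifying assumption, the sketch attaches the particle to the next token, whereas the slides take the associated word from the dependency tree. The names `NEGATION_PARTICLES` and `join_negation` are hypothetical.

```python
NEGATION_PARTICLES = {"no", "not"}


def join_negation(tokens):
    """Attach 'no'/'not' to the following token, e.g. not understand => not_understand."""
    result = []
    pending = None  # negation particle waiting for a word to attach to
    for tok in tokens:
        if pending is None and tok.lower() in NEGATION_PARTICLES:
            pending = tok.lower()
        elif pending is not None:
            result.append(f"{pending}_{tok}")
            pending = None
        else:
            result.append(tok)
    if pending is not None:
        result.append(pending)  # trailing particle with nothing to attach to
    return result


print(join_negation("I do not understand this book".split()))
```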

Corpus processing (continued)
- Counting the number of occurrences in positively labeled utterances = nvp and in negatively labeled utterances = nvn;
- Calculation of the normalized assessment factor for each key word: kv = (nvp - nvn) / nverb;
- The same calculations for each modifier, giving the normalized assessment factor kd, and for each syntagma in the corpus, giving the normalized assessment factor ks.
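
The normalized assessment factor k = (n_pos - n_neg) / n_total can be computed with a few counters. The sketch below is a minimal illustration on an invented toy corpus. The function `assessment_factors` and the `CORPUS` data are assumptions. The same function serves for key words (kv), modifiers (kd) or syntagmas (ks), depending on which items are passed in.

```python
from collections import Counter


def assessment_factors(utterances):
    """Compute (n_pos - n_neg) / n_total for every item.

    utterances: iterable of (items, label) pairs, where label is 'pos' or 'neg'
    and items are the key words, modifiers or syntagmas found in the utterance.
    """
    n_total, n_pos, n_neg = Counter(), Counter(), Counter()
    for items, label in utterances:
        for item in items:
            n_total[item] += 1
            if label == "pos":
                n_pos[item] += 1
            elif label == "neg":
                n_neg[item] += 1
    return {item: (n_pos[item] - n_neg[item]) / n_total[item] for item in n_total}


# Invented toy corpus: (extracted items, utterance label).
CORPUS = [
    (["book", "pleasant"], "pos"),
    (["book", "banal"], "neg"),
    (["book"], "pos"),
    (["banal"], "neg"),
]

factors = assessment_factors(CORPUS)
print(factors)
```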

Assessment thresholds
Assessment factors ks ∈ [-1, 1]:
- ks ∈ [-1, -0.6) = neg
- ks ∈ [-0.6, 0.6] = 0
- ks ∈ (0.6, 1] = pos
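
The thresholds translate directly into a labeling function. The name `label_from_factor` and the range check are assumptions of this sketch. The boundary values -0.6 and 0.6 fall in the neutral interval, as in the intervals above.

```python
def label_from_factor(k: float, threshold: float = 0.6) -> str:
    """Map an assessment factor in [-1, 1] to 'neg', '0' or 'pos'."""
    if not -1.0 <= k <= 1.0:
        raise ValueError("assessment factor must lie in [-1, 1]")
    if k < -threshold:   # [-1, -0.6)
        return "neg"
    if k > threshold:    # (0.6, 1]
        return "pos"
    return "0"           # [-0.6, 0.6]


for value in (-1.0, -0.6, 0.0, 0.6, 0.75):
    print(value, label_from_factor(value))
```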

Table of syntagma signatures
(rows: orientation of the modifier; columns: orientation of the key word; each cell gives one syntagma labeled neg and one labeled pos)

          neg-key                       0-key                     pos-key
neg-mod   neg: not_palatable demagogy   neg: uninteresting book   neg: banal action-film
          pos: defeated enemy           pos: forgotten kingdoms   pos: secondary pleasure
0-mod     neg: star fever               neg: unexpected level     neg: late success
          pos: imminent defeat          pos: only book            pos: continuous growth
pos-mod   neg: happy end                neg: good intentions      neg: sweet honey
          pos: fine rubbish             pos: pleasant book        pos: best masterpiece

Histogram of syntagma distribution over the texts

Histogram of the distribution of the 1st word of the syntagma

Histogram of the distribution of the 2nd word of the syntagma

Conclusion
The report presents considerations for determining the sentiment of a syntagma from the signatures of its constituent words, for structures such as adjective+noun and adverb+verb. Logical formulas specifying the calculation of semantic orientation are listed. An experiment on a sentiment-annotated corpus was performed. Further research will address predicative syntagmas of the type subject + verb + object.

References
- Charles E. Osgood, George Suci, & Percy Tannenbaum, The Measurement of Meaning. University of Illinois Press.
- …/tz21/
- …aachen.de/Publications/CEUR-WS/Vol-476/paper6.pdf