Semantic distance & WordNet. Serge B. Potemkin, Moscow State University, Philological faculty.



Distance and metrics: the fundamental concept is the distance between the entities under consideration; here, the semantic distance between words or concepts. Does it satisfy the metric space axioms?
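The question about the metric space axioms can be tested mechanically on any candidate distance. The sketch below is only an illustration: the function name metric_violations and the toy distances are assumptions, not part of the presentation. It checks d(a, a) = 0, non-negativity, symmetry and the triangle inequality over a small word list.

```python
from itertools import product

def metric_violations(words, dist, tol=1e-9):
    """Return (axiom, words) pairs for every place where `dist` breaks a metric axiom."""
    bad = []
    for a in words:
        if abs(dist(a, a)) > tol:
            bad.append(("identity", (a,)))
    for a, b in product(words, repeat=2):
        if dist(a, b) < -tol:
            bad.append(("non-negativity", (a, b)))
        if abs(dist(a, b) - dist(b, a)) > tol:
            bad.append(("symmetry", (a, b)))
    for a, b, c in product(words, repeat=3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            bad.append(("triangle", (a, b, c)))
    return bad

# Toy distances, invented for illustration only.
TOY = {("star", "sun"): 1.0, ("sun", "light"): 1.0, ("star", "light"): 3.0}

def toy_dist(a, b):
    if a == b:
        return 0.0
    return TOY.get((a, b), TOY.get((b, a)))

violations = metric_violations(["star", "sun", "light"], toy_dist)
```

With these toy numbers the direct distance star-light exceeds the path through sun, so the only reported violations concern the triangle inequality.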

Distance is needed for: word sense disambiguation, determining the structure of texts, text summarization and annotation, information extraction and retrieval, automatic indexing, lexical selection, the automatic correction of word errors in text …

Approaches to distance measuring: corpora-based; dictionary-based; Roget-structured thesauri; WordNet and other semantic networks

WordNet: synonym sets (synsets); subsumption hierarchy (hyponymy / hypernymy); 3 meronymic (PART-OF) relations, COMPONENT-OF, MEMBER-OF, SUBSTANCE-OF, and their inverses; antonymy; COMPLEMENT-OF

WordNet shortcomings: synsets – inadequate coverage; non-English versions cover 20–70% of the English one ( synsets for Russian); extension is hard; distance measuring is controversial

Corpora-based approach: two words wa and wb are the closer, the more often their neighbors (within ±5 words) coincide. Example (distributional profile of the word star): space 0.28, movie 0.2, famous 0.13, light 0.09, rich 0.04, …
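A minimal sketch of the corpora-based idea, under stated assumptions: the profile is taken as the relative frequency of words within ±5 positions of the target, and two profiles are compared by cosine similarity. The slide does not say which comparison measure is used, so the cosine is a stand-in. The function names and the toy corpus are hypothetical.

```python
from collections import Counter
import math

def profile(tokens, target, window=5):
    """Relative frequencies of the words seen within +/- `window` of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[lo:i] + tokens[i + 1:hi])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity of two sparse profiles (assumed comparison measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Toy corpus, invented for illustration only.
corpus = ("the star shines in space and the sun shines in space "
          "while the movie star is famous").split()
star = profile(corpus, "star", window=2)
sun = profile(corpus, "sun", window=2)
similarity = cosine(star, sun)
```

On a real corpus the profiles would normally be filtered for function words first; the sketch leaves them in to stay short.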

Dictionary-based approach: two words wa and wb are the closer, the more words their definitions share. Example: wa = linguistics, wb = stylistics. Definitions: {the, study, of, language, in, general, and, of, particular, languages, and, their, structure, and, grammar, and, history} and {the, study, of, style, in, written, or, spoken, language}. Apart from function words, 2 words (study, language) coincide in the definitions.
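The definition overlap for this example can be reproduced in a few lines. This is a sketch, assuming that function words are dropped before comparison and that no stemming is applied (so "languages" and "language" count as different words); the stop-word list is an assumption chosen to fit the two definitions.

```python
# Assumed function-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "of", "in", "and", "or", "their"}

def content_words(definition):
    """Set of lower-cased words in a definition, minus the function words."""
    return {w for w in definition.lower().split() if w not in STOP_WORDS}

LINGUISTICS = ("the study of language in general and of particular languages "
               "and their structure and grammar and history")
STYLISTICS = "the study of style in written or spoken language"

shared = content_words(LINGUISTICS) & content_words(STYLISTICS)
overlap = len(shared)
```

Here shared holds the words study and language, which matches the count of 2 given for the example.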

Bilingual dictionary approach: two words wa and wb are the closer, the more often their equivalents coincide. ρ(wa, wb) = 1 / Σ ni, where the sum runs over all coinciding Russian equivalents and ni is the number of dictionaries in which equivalent i occurs; or ρ(wa, wb) = Σ nai nbi / (||aR|| ||bR||)
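Both formulas can be sketched as follows. Each English word is represented by a mapping from its Russian equivalents to the number of dictionaries listing that equivalent. Two points are assumptions, not statements from the slide: in the first formula ni is taken as min(nai, nbi), and words with no shared equivalent get an infinite distance. The counts in the toy data are invented.

```python
import math

def rho_inverse(a, b):
    """First formula: 1 / sum of n_i over the shared equivalents.
    Assumption: n_i = min(a[i], b[i])."""
    total = sum(min(a[r], b[r]) for r in a.keys() & b.keys())
    return 1.0 / total if total else math.inf

def rho_cosine(a, b):
    """Second formula: sum(na_i * nb_i) / (||a|| * ||b||)."""
    dot = sum(a[r] * b[r] for r in a.keys() & b.keys())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

# Toy entries: Russian equivalent -> number of dictionaries (invented counts).
praise = {"похвала": 3, "хвала": 2}
commendation = {"похвала": 2, "одобрение": 1}

d_inverse = rho_inverse(praise, commendation)
s_cosine = rho_cosine(praise, commendation)
```

Note the opposite directions of the two measures: the first shrinks as more equivalents coincide, while the second is the cosine of the two count vectors and grows towards 1.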

Multidimensional scaling: the semantic network is a graph whose nodes are words and whose edges are links between words via the bilingual lexicon, with edge length ||edge|| = ρ(wa, wb). The graph can be embedded in an N-dimensional space, where N is the number of words in the lexicon (>100000); multidimensional scaling is used for visualization.
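To illustrate what multidimensional scaling does, here is a small sketch that places points in a low-dimensional space so that their Euclidean distances approximate a given distance matrix. It uses plain gradient descent on the squared distance error, which is one simple variant and not necessarily the procedure used in the presentation. The function names, the step size and the 3-word toy matrix are assumptions.

```python
import math
import random

def mds(dist, dim=2, steps=2000, lr=0.05, seed=0):
    """Place len(dist) points in `dim` dimensions so that their Euclidean
    distances approximate the target matrix `dist` (gradient descent)."""
    rng = random.Random(seed)
    n = len(dist)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            grad = [0.0] * dim
            for j in range(n):
                if i == j:
                    continue
                diff = [pts[i][k] - pts[j][k] for k in range(dim)]
                d = math.sqrt(sum(x * x for x in diff)) or 1e-9
                coef = (d - dist[i][j]) / d  # > 0 pulls together, < 0 pushes apart
                for k in range(dim):
                    grad[k] += coef * diff[k]
            for k in range(dim):
                pts[i][k] -= lr * grad[k]
    return pts

def stress(dist, pts):
    """Sum of squared differences between embedded and target distances."""
    n = len(dist)
    return sum((math.dist(pts[i], pts[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

# Toy distance matrix for three words (invented values).
TARGET = [[0.0, 3.0, 4.0],
          [3.0, 0.0, 5.0],
          [4.0, 5.0, 0.0]]
points = mds(TARGET)
```

Three points with these distances fit exactly in the plane, so the stress goes to nearly zero; for a real lexicon the 2-D picture is only an approximation.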

New synonyms

1-neighborhood of accolade: links between synonyms (black); links between synonyms from the dictionary (green); 2 isolated clusters.
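Isolated clusters of this kind can be found as connected components among the neighbors of a word once the word itself is removed. The sketch below assumes the network is given as a list of undirected links; the function name and the toy links (including the word brace) are hypothetical and only mimic the praise and bracket senses of accolade.

```python
def clusters(word, links):
    """Connected components among the neighbors of `word`, with `word` removed."""
    neighbors = {b for a, b in links if a == word} | {a for a, b in links if b == word}
    adj = {n: set() for n in neighbors}
    for a, b in links:
        if a in neighbors and b in neighbors:
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result

# Toy links, invented for illustration only.
LINKS = [("accolade", "praise"), ("accolade", "commendation"),
         ("accolade", "brace"), ("accolade", "bracket"),
         ("praise", "commendation"), ("brace", "bracket")]
senses = clusters("accolade", LINKS)
```

Each component then stands for one candidate sense of the word.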

Dominant in the acerbity neighborhood: acerbity (терпкость, "tartness") is excluded; the cluster (bold lines) is derived by a Markovian process; asperity (резкость, "harshness") is the centre of the cluster

2 dominants for bicycling (wheel+crook)

Adjustable parameters: space dimension; minimal number of dictionaries linking synonyms; maximal distance from the word under consideration; maximal number of displayed words; word excluded from clustering; …
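Several of these parameters can be read as cut-offs on a shortest-path search through the network. The sketch below is an assumption about how they might interact, not the presentation's implementation. It uses Dijkstra's algorithm over a graph whose edges carry a length and a dictionary count. The graph layout, the function name and the toy values are hypothetical. The space dimension is a parameter of the scaling step and is not covered here.

```python
import heapq

def neighborhood(graph, start, max_distance=2.0, max_words=10,
                 min_dictionaries=1, excluded=()):
    """Words within `max_distance` of `start` (closest first), following only
    links attested by at least `min_dictionaries` dictionaries."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    done = []
    while heap and len(done) < max_words + 1:  # +1 for the start word itself
        d, word = heapq.heappop(heap)
        if d > best.get(word, float("inf")):
            continue  # stale heap entry
        done.append((word, d))
        for nxt, (length, n_dicts) in graph.get(word, {}).items():
            nd = d + length
            if nxt in excluded or n_dicts < min_dictionaries or nd > max_distance:
                continue
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return done[1:]  # drop the start word

# Toy network: word -> {neighbor: (edge length, number of dictionaries)}.
GRAPH = {
    "accolade": {"praise": (0.5, 2), "honor": (1.0, 1), "brace": (1.0, 1)},
    "praise": {"accolade": (0.5, 2), "commendation": (0.5, 3)},
    "commendation": {"praise": (0.5, 3)},
    "honor": {"accolade": (1.0, 1)},
    "brace": {"accolade": (1.0, 1)},
}
strict = neighborhood(GRAPH, "accolade", max_distance=1.0, min_dictionaries=2)
```

Raising min_dictionaries keeps only well-attested links, which in the toy network leaves praise and commendation.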

Compare LDB with WordNet (accolade)

Synset            WordNet # of syn.   LDB # of syn.   Synonyms in LDB
award             3n+2v               80
accolade          1n                  8               commendation, praise, approbation, applause, + honorable mention, mention, positive mention
honor = honour    4n+3v               >100
laurels           2n                  15

n – noun, v – verb

Controversy 1: the immediate hyperonym of the accolade synset in WordNet is symbol (an arbitrary sign (written or printed) that has acquired a conventional significance). The immediate hyperonym of commendation (which is more frequent than accolade) is the accolade synset, whereas accolade is actually a hyponym of commendation. It is impossible to disambiguate accolade (bracket) from accolade (praise).

Controversy 2: in WordNet, dog 1 ("domestic dog") has the hyperonym canine, canid; further up come mammal, …, entity. Neither animal nor pet is directly linked with dog as a hyperonym. A tree structure is inadequate for semantic coding.

Conclusion: each meaning of a polysemous word can be coded as a pair (wE, wR), in contrast to synset coding. A metric superimposed on the LDB enables homograph disambiguation and extraction of dominants. A network has particular advantages over a hierarchical representation of semantic relations.