Solving Some Text Mining Problems with Conceptual Graphs M. Bogatyrev, V. Tuhtin Tula State University Faculty of Cybernetics Laboratory of Information.

Презентация:



Advertisements
Похожие презентации
Some ideas of semantic analysis for anaphora resolution Dmitry P. Vetrov Dorodnicyn Computing Centre of RAS.
Advertisements

Statistics Probability. Statistics is the study of the collection, organization, analysis, and interpretation of data.[1][2] It deals with all aspects.
WELCOME TO THE WORLD OF FUZZY SYSTEMS. DEFINITION Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept.
© 2002 IBM Corporation Confidential | Date | Other Information, if necessary © Wind River Systems, released under EPL 1.0. All logos are TM of their respective.
Functional modeling of processes in the development of new technical solutions Kozhevnikova V.I. Scientific advisor: Chizhiumov S.D.
A new interface model for the Jazyki Mira typological database Oleg Belyaev The research is supported by RFBR grant ( а.
Control Processes Research Center Director Dr. Tech. Sc., Professor V.I. Gurman.
LANGUAGE, SPEECH, SPEECH ACTIVITY Suggests to allocate the following functions: communicative; thinking tools; mastering the socio-historical; experience;
Power saving control for the mobile DVB-H receivers based on H.264/SVC standard Eugeny Belyaev, Vitaly Grinko, Ann Ukhanova Saint-Petersburg State University.
Institute for Information Problems of the Russian academy of Sciences and its linguistic research Olga Kozhunova CML-2008, Becici, 6-13 September.
Fractal A fractal is a mathematical set that has a fractal dimension that usually exceeds its topological dimension and may fall between the integers.
Science Science (from Latin scientia, meaning "knowledge") is a systematic enterprise that builds and organizes knowledge in the form of testable explanations.
Comparative Analysis of Phylogenic Algorithms V. Bayrasheva, R. Faskhutdinov, V. Solovyev Kazan University, Russia.
HPC Pipelining Parallelism is achieved by starting to execute one instruction before the previous one is finished. The simplest kind overlaps the execution.
In mathematics, the notion of permutation is used with several slightly different meanings, all related to the act of permuting (rearranging) objects.
Introduction The modern world of computer graphics is mostly dominated by polygonal models. Due to their scalability and ease of rendering such models.
Tool: Pareto Charts. The Pareto Principle This is also known as the "80/20 Rule". The rule states that about 80% of the problems are created by 20% of.
Intelligence framework for labour-market and educational services resources management Personalreserve Authors: Antonets A. Galushkin M. c.t.s. Kravets.
Normal Distribution. in probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v Customer-to-Provider Connectivity with BGP Connecting a Multihomed Customer to Multiple Service.
Транксрипт:

Solving Some Text Mining Problems with Conceptual Graphs M. Bogatyrev, V. Tuhtin Tula State University Faculty of Cybernetics Laboratory of Information Systems 2008

The Nature of Text Mining 1.W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992), Knowledge Discovery in Databases: An Overview, AI Magazine: pp. 213– D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA. Data mining: "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data"[1] "the science of extracting useful information from large data sets or databases."[2] Text mining: process of deriving high quality information from text; text data mining; text analytics information retrieval, machine learning, statistics, Computational Linguistics Text mining is interdisciplinary:

Computational (Corpora) Linguistics text categorization, text clustering, concept/entity extraction, sentiment analysis, document summarization Natural Language Processing annotation abstraction ontologies semantic roles Objects of tagging clusters, trends, associations, deviations Corpora: large and structured text tagging Data: Plain text Knowledge Models: rules; ontologies Processing objects Metadata Knowledge Discovery Text Mining Global Problems Analysis of: syntax grammar morphology semantics Problems

: Conceptual Graph: Example: John is going to Boston by bus Concepts Conceptual relations [City*a:'Boston'] [Bus*b:''] [Person*c:'John'] [Going*d:''] (agent?d?c) (dest?d?a) (instrument?d?b) Representations. Conceptual Graph Standard by J. Sowa [8] 1. Conceptual Graph Interchange Form (CGIF) 2. XML Form Proposition … Applying Predicate Calculus (CGIF + NOTIO)

Conceptual Graphs in Digital Libraries Supporting CGs in Digital Libraries: 1. Building and storing CGs Automated building of CGs Organizing access to CGs in Datastore 2. Solving applied problems with CGs Automated building and developing catalogues and rubricators of DLs KDD problems

Semantic Role Labelling helps to create conceptual relations in CGs Supporting Conceptual Graphs : Supporting Conceptual Graphs : Building and storing CGs Standard way of building CG The sentences are marked with part-of- speech tags. Some titles and sentences from abstracts are filtered The selected sentences are parsed, obtaining their syntactic tree. The syntactic tree is traversed and the canonical conceptual graphs related to it nodes are joined. Lexical restrictions are needed: 1.DL contains scientific papers 2.Only abstracts are transformed to CGs

Semantic Role Labelling for CGs Building The working of a genetic algorithm is usually explained by the search for superior building blocks. John is going to Boston by bus

Conceptual Graphs in Some Text Mining Problems 1. Building Association Rules - initial set - transactional set -subsets represented in T - Association Rule Supported by Having Confidence as - Set of CGs - Set of generalized CGs - Association Rule on CGs Generalization for concepts Disjoin for relations

Conceptual Graphs in Some Text Mining Problems 2. Building Ontologies by Aggregation of CGs Supporting Contexts: - with CGs: - with Corpora: In analyzing the ambiguities, Wittgenstein developed his theory of language games, which allow words to have different senses in different contexts, applications, or modes of use.

Solving Text Mining problems by CGs clustering CGs Hierarchy CGs Contexts problem CGs Similarity problem ? Clustering algorithm for specific similarity measures

Conceptual Graphs Clustering Similarity Measures Conceptual similarity Relational similarity Some modifications of similarity measures Unified similarity measure

Genetic algorithm : speciality of decisions Ackley test function Fitness function trajectories Initial population Final population

Genetic algorithm for clustering GA chromosomes representing the clustering for various encoding schemes for clustering[5]: (a) group number; (b) matrix; (c) permutation with the separator character 7; (d) greedy permutation; (e) order based. Our encoding scheme: picks the number of object which is in the same cluster as i -th object …… a1a1 aiai a2a2 anan n objects Clusters: {X 1, X 3, X 6 }, {X 2, X 4, X 5 }

Genetic algorithm for clustering Chain encoding for Conceptual Graphs: realizes implicit parallelism of genetic algorithms; forces clustering algorithm to work faster; is invariant under similarity measure on CGs …… a1a1 aiai a2a2 anan n objects An idea about a possibility to vary fitness function of GA by varying its parameters

EVO – LIB Project EVO – LIB Project Systems architecture 13 Объекты Агент Структура БД

Conceptual Graphs Clustering: Conceptual Graphs Clustering: Data Example for Clustering 1.We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)). 2.We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function. 3.At such extremes our intuition breaks down. 4.A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs. 5.A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs. 6.Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of attraction to our evolutionary algorithms. 7.Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator. 8.We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima. 9.This allows us to create functions with multiple local optima attractive to crossover. 10.The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover. 11.This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover. 12.This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.

Conceptual Graphs Clustering: Conceptual Graphs Clustering: clustering results - applying conceptual nearness- applying relational nearness

Resume Conceptual graphs is the perspective tool for modelling semantics of texts in DL A process of creating ontologies can be based on technologies which use conceptual graphs Conceptual graphs clustering helps in solving structural problems in DLs and in understanding its data Evolutionary approach is perspective in semantic modelling with conceptual graphs To progress CGs technologies, a joined efforts of computer specialists and linguists are needed.

References A World of Conceptual Graphs: Boytcheva, S. Dobrev, P. Angelova, G.CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science 2120, Springer F. Southey J. G. Linders. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, P.p Hirst G. Ontology and the Lexicon. - Handbook on Ontologies in Information Systems, Berlin – Springer, Cole, R. M. Clustering With Genetic Algorithms citeseer.ist.psu.edu/cole98clustering.html Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates, Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science Vol Springer-Verlag, Pp Sarbo, J. Formal conceptual structure in language. In Dubois, D. M., editor, Proceedings of Computing Anticipatory Systems (CASYS'98), pp , Woodbury, New York Sowa R., Conceptual Graphs: Draft Proposed American National Standard, International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer Богатырёв М.Ю., Латов В.Е. Исследование генетических алгоритмов кластеризации. - Изв. ТулГУ. Сер. Математика. Механика. Информатика. Том 8, вып. 3. Информатика. - Тула, С Holland J.H. Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press. Reprinted by MIT, Растригин Л.А. Адаптация сложных систем. Рига: Зинатне, с Емельянов В.В., Курейчик В.В., Курейчик В.М. Теория и практика эволюционного моделирования. – М.: Физматлит, с Богатырёв М.Ю. Генетические алгоритмы: принципы работы, моделирование, применение. Тула, ТулГУ, с M. Bogatyrev. Modelling Systems With Symmetry// Proceedings of the 4 th International IMACS Symposium of Mathematical Modelling. - Vienna, Austria, February 5-7, ARGESIM-Verlag, Vienna, pp M. Bogatyrev, V. Latov, K. Avdeev. Symmetry Based Decomposition and its Application in Evolutionary Modelling. – Applied Mathematica: Proc. of 8 th International Mathematica Symposium. Avignon, June, France, 2006