Скачать презентацию

Идет загрузка презентации. Пожалуйста, подождите

Презентация была опубликована 3 года назад пользователемВасилий Лемешев

1 Solving Some Text Mining Problems with Conceptual Graphs M. Bogatyrev, V. Tuhtin Tula State University Faculty of Cybernetics Laboratory of Information Systems 2008

2 The Nature of Text Mining 1.W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992), Knowledge Discovery in Databases: An Overview, AI Magazine: pp. 213– D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA. Data mining: "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data"[1] "the science of extracting useful information from large data sets or databases."[2] Text mining: process of deriving high quality information from text; text data mining; text analytics information retrieval, machine learning, statistics, Computational Linguistics Text mining is interdisciplinary:

3 Computational (Corpora) Linguistics text categorization, text clustering, concept/entity extraction, sentiment analysis, document summarization Natural Language Processing annotation abstraction ontologies semantic roles Objects of tagging clusters, trends, associations, deviations Corpora: large and structured text tagging Data: Plain text Knowledge Models: rules; ontologies Processing objects Metadata Knowledge Discovery Text Mining Global Problems Analysis of: syntax grammar morphology semantics Problems

4 : Conceptual Graph: Example: John is going to Boston by bus Concepts Conceptual relations [City*a:'Boston'] [Bus*b:''] [Person*c:'John'] [Going*d:''] (agent?d?c) (dest?d?a) (instrument?d?b) Representations. Conceptual Graph Standard by J. Sowa [8] 1. Conceptual Graph Interchange Form (CGIF) 2. XML Form Proposition … Applying Predicate Calculus (CGIF + NOTIO)

5 Conceptual Graphs in Digital Libraries Supporting CGs in Digital Libraries: 1. Building and storing CGs Automated building of CGs Organizing access to CGs in Datastore 2. Solving applied problems with CGs Automated building and developing catalogues and rubricators of DLs KDD problems

6 Semantic Role Labelling helps to create conceptual relations in CGs Supporting Conceptual Graphs : Supporting Conceptual Graphs : Building and storing CGs Standard way of building CG The sentences are marked with part-of- speech tags. Some titles and sentences from abstracts are filtered The selected sentences are parsed, obtaining their syntactic tree. The syntactic tree is traversed and the canonical conceptual graphs related to it nodes are joined. Lexical restrictions are needed: 1.DL contains scientific papers 2.Only abstracts are transformed to CGs

7 Semantic Role Labelling for CGs Building The working of a genetic algorithm is usually explained by the search for superior building blocks. John is going to Boston by bus

8 Conceptual Graphs in Some Text Mining Problems 1. Building Association Rules - initial set - transactional set -subsets represented in T - Association Rule Supported by Having Confidence as - Set of CGs - Set of generalized CGs - Association Rule on CGs Generalization for concepts Disjoin for relations

9 Conceptual Graphs in Some Text Mining Problems 2. Building Ontologies by Aggregation of CGs Supporting Contexts: - with CGs: - with Corpora: In analyzing the ambiguities, Wittgenstein developed his theory of language games, which allow words to have different senses in different contexts, applications, or modes of use.

10 Solving Text Mining problems by CGs clustering CGs Hierarchy CGs Contexts problem CGs Similarity problem ? Clustering algorithm for specific similarity measures

11 Conceptual Graphs Clustering Similarity Measures Conceptual similarity Relational similarity Some modifications of similarity measures Unified similarity measure

12 Genetic algorithm : speciality of decisions Ackley test function Fitness function trajectories Initial population Final population

13 Genetic algorithm for clustering GA chromosomes representing the clustering for various encoding schemes for clustering[5]: (a) group number; (b) matrix; (c) permutation with the separator character 7; (d) greedy permutation; (e) order based. Our encoding scheme: picks the number of object which is in the same cluster as i -th object …… a1a1 aiai a2a2 anan n objects Clusters: {X 1, X 3, X 6 }, {X 2, X 4, X 5 }

14 Genetic algorithm for clustering Chain encoding for Conceptual Graphs: realizes implicit parallelism of genetic algorithms; forces clustering algorithm to work faster; is invariant under similarity measure on CGs …… a1a1 aiai a2a2 anan n objects An idea about a possibility to vary fitness function of GA by varying its parameters

15 EVO – LIB Project EVO – LIB Project Systems architecture 13 Объекты Агент Структура БД

16 Conceptual Graphs Clustering: Conceptual Graphs Clustering: Data Example for Clustering 1.We assume that the modality (i.e., number of local optima) of a fitness landscape is related to the difficulty of finding the best point on that landscape by evolutionary computation (e.g., hillclimbers and genetic algorithms (GAs)). 2.We first examine the limits of modality by constructing a unimodal function and a maximally multimodal function. 3.At such extremes our intuition breaks down. 4.A fitness landscape consisting entirely of a single hill leading to the global optimum proves to be hard for hillclimbers but apparently easy for GAs. 5.A provably maximally multimodal function, in which half the points in the search space are local optima, can be easy for both hillclimbers and GAs. 6.Exploring the more realistic intermediate range between the extremes of modality, we construct local optima with varying degrees of attraction to our evolutionary algorithms. 7.Most work on optima and their basins of attraction has focused on hills and hillclimbers, while some research has explored attraction for the GA's crossover operator. 8.We extend the latter results by defining and implementing maximal partial deception in problems with k arbitrarily placed global optima. 9.This allows us to create functions with multiple local optima attractive to crossover. 10.The resulting maximally deceptive function has several local optima, in addition to the global optima, each with various size basins of attraction for hillclimbers as well as attraction for GA crossover. 11.This minimum distance function seems to be a powerful new tool for generalizing deception and relating hillclimbers (and Hamming space) to GAs and crossover. 12.This paper describes an initial version of a library of sharable and reusable medical ontological theories, organized according to a proposed classification of ontologies.

17 Conceptual Graphs Clustering: Conceptual Graphs Clustering: clustering results - applying conceptual nearness- applying relational nearness

18 Resume Conceptual graphs is the perspective tool for modelling semantics of texts in DL A process of creating ontologies can be based on technologies which use conceptual graphs Conceptual graphs clustering helps in solving structural problems in DLs and in understanding its data Evolutionary approach is perspective in semantic modelling with conceptual graphs To progress CGs technologies, a joined efforts of computer specialists and linguists are needed.

19 References A World of Conceptual Graphs: Boytcheva, S. Dobrev, P. Angelova, G.CGExtract: Towards Extraction of Conceptual Graphs from Controlled English. Lecture Notes in Computer Science 2120, Springer F. Southey J. G. Linders. Notio - A Java API for Developing CG Tools. 7th International Conference on Conceptual Structures, P.p Hirst G. Ontology and the Lexicon. - Handbook on Ontologies in Information Systems, Berlin – Springer, Cole, R. M. Clustering With Genetic Algorithms citeseer.ist.psu.edu/cole98clustering.html Montes-y-Gomez, Gelbukh, Lopez-Lopez, Baeza-Yates, Text Mining at Detail Level Using Conceptual Graphs. Lecture Notes in Computer Science Vol Springer-Verlag, Pp Sarbo, J. Formal conceptual structure in language. In Dubois, D. M., editor, Proceedings of Computing Anticipatory Systems (CASYS'98), pp , Woodbury, New York Sowa R., Conceptual Graphs: Draft Proposed American National Standard, International Conference on Conceptual Structures ICCS-99, Lecture Notes in Artificial Intelligence 1640, Springer Богатырёв М.Ю., Латов В.Е. Исследование генетических алгоритмов кластеризации. - Изв. ТулГУ. Сер. Математика. Механика. Информатика. Том 8, вып. 3. Информатика. - Тула, С Holland J.H. Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press. Reprinted by MIT, Растригин Л.А. Адаптация сложных систем. Рига: Зинатне, с Емельянов В.В., Курейчик В.В., Курейчик В.М. Теория и практика эволюционного моделирования. – М.: Физматлит, с Богатырёв М.Ю. Генетические алгоритмы: принципы работы, моделирование, применение. Тула, ТулГУ, с M. Bogatyrev. Modelling Systems With Symmetry// Proceedings of the 4 th International IMACS Symposium of Mathematical Modelling. - Vienna, Austria, February 5-7, ARGESIM-Verlag, Vienna, pp M. Bogatyrev, V. Latov, K. Avdeev. Symmetry Based Decomposition and its Application in Evolutionary Modelling. – Applied Mathematica: Proc. of 8 th International Mathematica Symposium. Avignon, June, France, 2006

Еще похожие презентации в нашем архиве:

Готово:

Tool: Pareto Charts. The Pareto Principle This is also known as the "80/20 Rule". The rule states that about 80% of the problems are created by 20% of.

Tool: Pareto Charts. The Pareto Principle This is also known as the "80/20 Rule". The rule states that about 80% of the problems are created by 20% of.

© 2017 MyShared Inc.

All rights reserved.