Organizing the information interoperation of heterogeneous astronomical resources for problem solving in a virtual observatory. V.V. Vitkovsky, O.P. Zhelenkova.

Transcript:

Organizing the information interoperation of heterogeneous astronomical resources for problem solving in a virtual observatory. V.V. Vitkovsky, O.P. Zhelenkova (SAO RAS), D.O. Briukhov, V.N. Zakharov, L.A. Kalinichenko (IPI RAS). VAK-2004 (All-Russian Astronomical Conference), Symposium 1 "Telescopes of the Future and Virtual Observatories".

"Astronomical data is ideal for use in the development of this new type of science because it has no commercial value or ethical constraints, there's lots of it, it's complex, it's heterogeneous and it's real." (Jim Gray)

Talk outline: goals of the preliminary design of the RVO (Russian Virtual Observatory) information infrastructure; examples of problems solved with a VO; methods for integrating heterogeneous sources; an example of a subject mediator; possible architectural solutions; organizational issues.

Project goal. A preliminary design supported by the RFBR (Russian Foundation for Basic Research). Analysis of VO work worldwide; identification of the classes of astrophysical problems the project should target; identification of the first-priority data sources (archives) and software services to be included in the RVO infrastructure. Definition of the architecture and main components of the infrastructure, their interfaces and technologies; definition of the conceptual schema of subject mediators for the first-priority problem classes. The main architectural decision of this VO information infrastructure project is expected to be the use of subject mediator technology.

Examples of digital observation archives. Archives of the Hubble Space Telescope, the Chandra X-Ray Observatory, the Two Micron All Sky Survey (2MASS) and the Digitized Palomar All Sky survey. Sloan Digital Sky Survey (SDSS): a sky survey (50% of the northern hemisphere) in 5 spectral bands from the ultraviolet to the infrared. The Strasbourg data centre (CDS). Archives holding observations of tens of millions of astronomical objects, from both spectral and monitoring surveys, are openly accessible on the web. To solve the problems of integrated use of astronomical data, the astronomical community is developing a new approach to working with them: the virtual observatory (VO). NVO: the project aims to federate existing and planned US archives of observational data for shared use.

VO: General requirements. A Virtual Observatory (VO) is a collection of interoperating data archives and software tools which utilize the internet to form a scientific research environment in which astronomical research programs can be conducted. The VO consists of a collection of data centres, each with unique collections of astronomical data, software systems and processing capabilities. If large surveys and catalogues could be joined into a uniform and interoperating "digital universe", entire new areas of astronomical research would become feasible. Astronomical data falls into two broad categories: catalog (hundreds of attributes for billions of objects) and image (tens of TB of pixel data). The specific data classes include source catalog, time series, event list, visibility data, image (including the various image subclasses) and spectrum. A great many astronomical queries have a spatial component, so a spatial indexing scheme is crucial for good query execution performance.

IVOA. The International Virtual Observatory Alliance (IVOA) was created to facilitate the international coordination and collaboration needed to develop and deploy the tools, systems and organizational structures that will make it possible to use astronomical archives as an integrated, interoperable VO. IVOA members: AstroGrid: UK VO initiative; Aus.VO: Australian Virtual Observatory; AVO: Astrophysical Virtual Observatory; AVO: AVO Science Working Group; CDS: Centre de Données astronomiques de Strasbourg; China.VO: Chinese Virtual Observatory; CVO: Canadian Virtual Observatory; GAVO: German Virtual Observatory; GSC: UK Grid Steering Committee; India.VO: Indian Virtual Observatory; JVO: Japanese Virtual Observatory (jvo.nao.ac.jp/index.e.html); NVO: National Virtual Observatory; RVO: Russian Virtual Observatory. IVOA plans to develop a Resource Registry. Metadata (FITS); semantics (Unified Content Descriptors: a 4-level hierarchical tree containing 1500 terms).

From Tera to Petabytes. Large Synoptic Survey Telescope (LSST): ranging from Earth's vicinity to the edge of the optical universe. It will reach 24th magnitude in 10 seconds and will survey up to 14,000 square degrees three times per month. Over a period of years, 30,000 square degrees will be surveyed in multiple bands, and the co-added images will go to 27th magnitude. High technology in microelectronics, large optics fabrication and metrology, and software. Comparing the LSST (8.4 m) telescope with the SDSS, and allowing also for its increased pixel sampling and resolution, the advantage in figure of merit is a factor of close to 200. Data products will consist of photometric catalogs that are continuously updated during the survey, a moving-object database, images in at least 5 bands (updated on a regular schedule) and the huge time-tagged processed image database; in total the data volume will climb to around 15 petabytes.

Example problems

[Diagram: Subject Domain in Natural Science. Elements shown: material system definition in natural language; domain terminology and concepts (abstract, methodological, concrete); theories (models) T1, ..., Tn, each with a signature, concretizations (attributes, types, classes, processes) and simulators; semantics of the T1...Tn constituents; observable/measurable characteristics of T1, ..., Tn (attributes, types, classes, procedures); methods and instruments for observation, experimentation, measurement, data analysis and discovery; observations, simulations and measurements for T1; explaining and forecasting; semantic interpretations; problems, methods of solution, algorithms, programs, workflows; simulation.]

The distant-object search problem. For a number of years, SAO RAS has been studying radio sources under the "Big Trio" program led by Yu.N. Parijskij. Within this program the problem of searching for distant galaxies is being solved, and a search methodology has been developed that includes the following stages: selection by radio properties (angular size of the radio source, morphology, spectral index); selection by optical properties; selection by the presence of X-ray emission; study of the environment of distant objects.

NVO Astronomical Grid Applications The Galaxy Morphology prototype is a highly-specialized analysis service aimed at studying the morphological properties of galaxies in rich clusters. The Galaxy Morphology prototype needs to support the following operations: find online catalogs of galaxies in clusters, obtain images of the many hundreds of those galaxies, compute a set of morphological parameters on those images using Grid computing, and integrate the new results into the catalogs. Correlation Functions of Galaxies: gravity naturally leads to a highly clustered universe. Cosmologists have chosen to characterize such clustering using n-point correlation functions. Precision measurements of the higher-order correlations of galaxies are now possible due to the availability of high quality data from the SDSS survey.

VirtU - The Virtual Universe (UK). VirtU is a computing infrastructure to enable direct and rigorous comparisons of realistic simulations of cosmic structures, based on the best current theoretical understanding, with real data. The TVO is a completely novel concept. Member scientists, in close collaboration with Europe's Virtual Observatory programme, will build up the infrastructure required to publish simulated data and analysis tools in standardized formats. The best simulations will become readily accessible to non-specialists, leading to entirely new science applications. This model, now widely accepted as the standard cosmogony, is based on two key assumptions: (i) that the Universe underwent an early period of inflationary expansion during which its curvature was flattened and small irregularities of quantum origin were imprinted, and (ii) that these irregularities grew into cosmological structures by gravitational evolution driven by massive, weakly interacting elementary particles, or cold dark matter (CDM). This model agrees with the distribution of galaxies, as mapped by the new generation of surveys.

The relationship between the TVO, TOI and AstroGrid

An example of an extragalactic application of VirtU: TVO

An example of an extragalactic application of VirtU: TOI

Requirements for scientific results publishing. To publish means to make data products in an archive available through services that are accessible via a VO-supplied internet site. The aims are: to allow independent checks of conclusions based on theoretical results by reproducing certain results; to allow comparisons with similar results and methodologies, or with the corresponding data, by observers and theoreticians; to make theoretical results more easily accessible and understandable for observers. Journals may require links to the actual data products and/or software used in published work. To allow querying of publications and of real and simulated data products in a uniform manner (joint queries on structured content items and on metadata, on observations and publications). Invariants for observable classes; observable classes as interpretations of theories (models); triggers watching for inconsistencies between observations and theoretical models.

Methods for integrating heterogeneous sources

Virtual integration: either the global schema is obtained by integrating a set of collection schemas fixed in advance (Global as View, GAV), or the global schema is defined independently of the collections, as a schema of the subject domain (Local as View, LAV). Materialization of integrated data (data warehouses). Combined methods (GLAV, partial materialization).
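
The difference between the two virtual-integration styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source collections, their attribute names and the `sources_for` helper are hypothetical, and the LAV part is reduced to a lookup over source descriptions rather than real query rewriting.

```python
# Two hypothetical source collections with different local schemas.
catalog_a = [{"ra_deg": 10.0, "flux_mjy": 250.0}]
catalog_b = [{"alpha": 20.0, "s_jy": 0.1}]


def global_radio_source_gav():
    """GAV: the global class is defined as a view over a fixed set of sources."""
    for row in catalog_a:
        yield {"ra": row["ra_deg"], "flux_jy": row["flux_mjy"] / 1000.0}
    for row in catalog_b:
        yield {"ra": row["alpha"], "flux_jy": row["s_jy"]}


# LAV: the global schema exists first; each source is described in its terms.
# A new source is added by adding one description, without touching the others.
lav_descriptions = {
    "catalog_a": {"global_class": "radio_source", "provides": {"ra", "flux_jy"}},
    "catalog_b": {"global_class": "radio_source", "provides": {"ra", "flux_jy"}},
}


def sources_for(global_class, needed_attributes):
    """Return the names of sources whose description covers the request."""
    return sorted(
        name
        for name, description in lav_descriptions.items()
        if description["global_class"] == global_class
        and set(needed_attributes) <= description["provides"]
    )
```

In the GAV function, adding a third catalog means editing the view itself; in the LAV dictionary it means adding one independent entry, which is the property the subject-mediator approach relies on.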

SkyQuery: A Distributed Web-based Query Service for Astronomy. SkyQuery provides a user-friendly interface to run distributed queries over the federation of registered astronomical archives. SkyQuery will not only provide location transparency, but will also take care of vertical fragmentation of the data and will run the query efficiently to minimize query execution costs. Briefly, the technologies used are: DATABASES: in principle, any database can be used; for this service, we will use SQL Server. Each database will be accessible through a .NET web service (hereafter SkyNode). PORTAL: the portal is another C# .NET web service that executes a distributed query by splitting the job up between the SkyNode web services. CLIENT: the client is an ASP web page. It is planned to have data covering most of the sky in over 10 different wavelengths. The astronomy data set today consists of over 50 surveys of the sky, with a total data volume of 100 TB.

DistributedQueryService. The OGSA-DAI (Distributed Access and Integration) Distributed Query Processor (DQP) handles a single query referencing data held at multiple sites. DQP requires the Grid's capabilities for systematic access to remote data and computational resources. DQP extends the core of OGSA-DAI by defining a new portType, Grid Distributed Query (GDQ), and two new services: Grid Distributed Query Service (GDQS) and Grid Query Evaluator Service (GQES). Query processing in DQP consists of the following five stages: (1) logical optimisation (rewriting in GAV); (2) physical optimisation; (3) partitioning (adding the move operator); (4) scheduling (partitions are allocated to Grid nodes); (5) query evaluation. A parallel DB machine is used.

Subject Mediator Concept. The mediator architecture (Wiederhold, 1992) deals with the problem of integrating heterogeneous information. The sources are "heterogeneous" on many levels. A mediator provides a uniform query interface to the multiple data sources, thereby freeing the user from having to locate the relevant sources, query each one in isolation, and manually combine the information from the different sources.

Mediator Definition as a Subject Metainformation Consolidation. For the sake of the mediator's scalability, two separate phases of its functioning are distinguished: consolidation and operational. In the consolidation phase the efforts of the scientific community are focused on defining the mediator's subject by declaring its metainformation. The metainformation created in the consolidation phase constitutes a definition of the subject domain of the mediator. During the operational phase arbitrary information collections, expressed in terms of the mediator, can be registered at the mediator. The registration process is autonomous and can be carried out by collection providers independently of each other. Users of the mediator know only the metainformation defining the mediator's subject and formulate their queries in terms of that subject.
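
The two phases can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration (the `SubjectMediator` class, its method names and the attribute mappings are hypothetical); it is not the mediator implementation described in the talk.

```python
class SubjectMediator:
    """Toy mediator: a subject definition plus independently registered collections."""

    def __init__(self, subject_attributes):
        # Consolidation phase: the community fixes the subject definition.
        self.subject_attributes = frozenset(subject_attributes)
        self.collections = {}

    def register(self, name, rows, mapping):
        """Operational phase: a provider registers a collection on its own.

        `mapping` translates local attribute names into subject attributes.
        """
        unknown = set(mapping.values()) - self.subject_attributes
        if unknown:
            raise ValueError(f"not part of the subject definition: {sorted(unknown)}")
        self.collections[name] = (rows, mapping)

    def query(self, predicate):
        """Users query in subject terms only; sources stay hidden from them."""
        results = []
        for rows, mapping in self.collections.values():
            for row in rows:
                obj = {subject: row[local] for local, subject in mapping.items()}
                if predicate(obj):
                    results.append(obj)
        return results
```

A usage example: after `m = SubjectMediator(["ra", "flux"])`, two providers can call `m.register(...)` with different local column names, and `m.query(lambda o: o["flux"] > 1.0)` returns matching objects from both without the user knowing which collection they came from.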

Advantages of subject domain mediation. 1. Semantic integration of heterogeneous information collections is achieved. 2. Users need to know only the subject definitions as defined by a community. 3. Information providers can disseminate their information for integration independently of each other and at any time. 4. Autonomous information collections are completely independent of the mediator and its consolidated metainformation definitions. 5. Users have integrated access to all information registered up to the moment of a query. 6. Mediators form a recursive structure: multiple subjects can be semantically integrated by defining higher-level mediators.

Example description of a subject mediator for the class of distant-object search problems

Mediator schema

Examples of descriptors of the basic concepts. UCD POS_EQ_RA_MAIN represents: Right Ascension; UCD POS_EQ_DEC_MAIN represents: Declination; UCD ERROR represents: Error or Uncertainty in Measurements; UCD ID_MAIN represents: Main Identifier of a Celestial Object; UCD CODE_MULT_INDEX represents: Multiplicity Index Code; UCD EXTENSION_DIAM represents: Angular Diameter or Size of the Major Axis; UCD CLASS_OBJECT represents: Object Type Classification; UCD MORPH_TYPE represents: Morphological Type; UCD PHOT_FLUX represents: Flux; UCD OBS_FREQUENCY represents: Frequency of the observation; UCD SPECT_SP-INDEX represents: Spectral Index = -d(Log F)/d(Log nu); UCD REDSHIFT_HC represents: Redshift (normally heliocentric).

Some collections selected for registration in the mediator. rcCatalog(rSource/RCdata[spatialCoord, flux, origin, spIndex]), radioScienceData(rsd/RadioScienceData[spatialCoord, flux, origin, spIndex]); nvss(nvssSource/NVSSdata[spatialCoord, flux, origin]), radioScienceData(rsd/RadioScienceData[spatialCoord, flux, origin]): coordinates, which the RC catalog and NVSS give as a string in a specific format, are converted to degrees; flux values are converted from mJy to Jy; the frequency value is taken from the column names. 2mass(2massSource/2MASSdata[spatialCoord, flux, origin]), irScienceData(irs/IRScienceData[spatialCoord, flux, origin]): the errors for the catalog are given in its description and must be entered at registration; the magnitudes of objects must be converted to the flux scale adopted in the mediator.
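
The conversions listed above might look roughly like this in Python. It is a sketch under explicit assumptions: the coordinate strings are taken to be whitespace-separated sexagesimal values ("HH MM SS.S" and "+DD MM SS"), which may not match the actual catalog formats, and the photometric zero point is a caller-supplied parameter rather than a value from the talk.

```python
def ra_to_degrees(text):
    """Convert right ascension 'HH MM SS.S' (assumed format) to degrees."""
    hours, minutes, seconds = (float(part) for part in text.split())
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_degrees(text):
    """Convert declination '+DD MM SS' or '-DD MM SS' (assumed format) to degrees."""
    text = text.strip()
    # Read the sign from the string so that '-00 30 00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (abs(float(part)) for part in text.split())
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


def mjy_to_jy(flux_mjy):
    """Convert a flux density from millijansky to jansky."""
    return flux_mjy / 1000.0


def magnitude_to_flux_jy(magnitude, zero_point_jy):
    """Convert a magnitude to flux density: F = F0 * 10**(-0.4 * m).

    `zero_point_jy` (the flux of a zero-magnitude source in the band)
    must be supplied by the caller; no particular value is implied here.
    """
    return zero_point_jy * 10.0 ** (-0.4 * magnitude)
```

For example, `ra_to_degrees("12 00 00")` gives 180.0 and `mjy_to_jy(250.0)` gives 0.25.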

Example OQL query for finding radio sources. Select the coordinates and fluxes of radio sources whose spectral index lies in a given range, whose fluxes do not exceed a given value and whose linear size does not exceed a given value. The spectral index is computed by the function calcIndex over the subset of objects of type RadioScienceData in the class radioScienceData that have coinciding coordinates. Query: select scoord, sf from (select scoord: s.spatialCoord, sf: s.flux, spind: calcIndex(RA, DE, partition) from (select r from radioScienceData r where r.flux.fluxValue < value3 and r.las < value4) as s group_by RA: s.spatialCoord.ra DE: s.spatialCoord.de) where between (spind, value1, value2)
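
What a function like calcIndex could compute can be sketched in Python from the descriptor definition "Spectral Index = -d(Log F)/d(Log nu)". This is an assumption about its behaviour, not the authors' implementation: the sketch groups measurements by exactly coinciding coordinates, as the group_by clause does, and fits a least-squares slope in log-log space. The row field names are hypothetical.

```python
import math
from collections import defaultdict


def calc_index(measurements):
    """Spectral index -d(log F)/d(log nu) from (frequency, flux) pairs."""
    xs = [math.log10(freq) for freq, _ in measurements]
    ys = [math.log10(flux) for _, flux in measurements]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need measurements at two or more distinct frequencies")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # minus the fitted log-log slope


def indices_by_position(rows):
    """Group rows by (ra, de), as the group_by clause does, and fit each group."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[(row["ra"], row["de"])].append((row["freq"], row["flux"]))
    return {
        position: calc_index(points)
        for position, points in partitions.items()
        if len({freq for freq, _ in points}) >= 2
    }
```

Positions observed at only one frequency are skipped, since no slope can be fitted for them. In practice coordinates from different catalogs rarely coincide exactly, so a real implementation would need a positional tolerance.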

Identification of radio sources. Identify radio sources whose spectral index lies in a given range, whose fluxes do not exceed a given value and whose linear size does not exceed a given value with optical sources whose object is a galaxy, and return these galaxies. Query: select o.observes from radioScienceData r, opticalScienceData o where between (r.spIndex, value1, value2) and r.flux.fluxValue < value3 and r.las < value4 and match(r, o) and o.observes in galaxy
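
The slide does not define the match(r, o) predicate. One common way to cross-identify sources is by angular separation on the sky, sketched below in Python; the 5-arcsecond default tolerance and the dictionary field names are assumptions made purely for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2.0) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2
    )
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def match(radio, optical, tolerance_arcsec=5.0):
    """True if the two positions agree within the (assumed) tolerance."""
    separation = angular_separation_deg(
        radio["ra"], radio["dec"], optical["ra"], optical["dec"]
    )
    return separation * 3600.0 <= tolerance_arcsec
```

A fixed tolerance is the simplest choice; a more careful identification would take the positional errors of both catalogs into account.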

Possible architectural solutions

Components of the RVO infrastructure. The main components of the VO architecture are: the mediator's metainformation repository; facilities supporting the registration of information sources at the mediator; facilities for compiling mediator queries and scheduling their concurrent execution over multiple sources; a database management system (object-relational) used to compute query answers; an environment for uniform access to data sources and services (grid); a problem-solving environment, including workflow management and knowledge extraction facilities; adapters for connecting specific information sources to the mediator, and their interfaces; digital library support facilities; portals through which different categories of users interact with the VO.

Subject mediator facilities

Components of the common infrastructure

Mediators in OGSA-DAI

Data Mining (knowledge extraction) as part of a PSE (problem-solving environment). Two basic classes of models: predictive and descriptive. Predictive: one of the observational features is chosen as the target, and the model provides a way of calculating the target as a function of the rest of the features: Y=F(X1, …,Xn). There are two approaches: classification (predicts the class to which an object may belong, with a certain probability) and regression (predicts a value of the target). Descriptive: a) clustering, which applies certain criteria of similarity (in contrast with classification, the features and classes of the partitioning are unknown); b) associative models (looking for stable associations, e.g., diapers and beer). Many algorithms exist for each model (classification and regression decision trees, genetic algorithms, neural networks, discriminant analysis, etc.). Technology of data mining: 1) problem statement, 2) data preparation, 3) model development and choice of algorithm, 4) evaluation and interpretation. Not all models allow interpretation (e.g., neural networks), but models expressed as rules do provide a way to interpret the results.
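
The contrast between a predictive and a descriptive model can be shown with a small pure-Python sketch on one-dimensional data. The two algorithms used here (a nearest-centroid classifier and a basic k-means) are chosen for brevity and are not among the algorithms listed on the slide; the feature values and class labels are invented.

```python
def nearest_centroid_fit(features, labels):
    """Predictive model: learn one mean feature value per known class."""
    sums = {}
    for x, label in zip(features, labels):
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(values, k, iterations=20):
    """Descriptive model: find k groups without any labels."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced picks from the sorted values.
    centers = ordered[:: max(1, len(ordered) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers
```

The classifier needs labelled examples and returns a class for a new object, while the clustering step receives only the values and discovers the grouping itself.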

Data Mining (2). DARWIN (Thinking Machines Corp.) was bought by Oracle. The first release of Oracle DM appeared in 2001 (Oracle 9i). DM is incorporated directly into the Oracle DB. Algorithms are implemented as stored procedures. Parallel computations are used where possible. Windows is not a proper environment. A specific repository contains information on models, their applications and results. DM4J: a data mining graphical client. Oracle provides a DM infrastructure, not DM instrument facilities; this allows DM to be incorporated into applications, and the DM infrastructure provides a way of solving application problems. Java API and PL/SQL: two kinds of interfaces. JDM: a new standard under development. DBMS_DATA_MINING, DBMS_MINING_TRANSFORMATION. Predictive algorithms: classification (Naïve Bayes, Adaptive Bayes, Support Vector Machines (SVM)), regression, searching for essential attributes (actually, creating new concepts; example: decomposition of a matrix (animal X properties) into two matrices whose product gives the original one). Descriptive algorithms: enhanced K-means, O-cluster, association search (Apriori algorithm). Unstructured data analysis (texts, bioinformation, maps, schemas, etc.).

Organizational issues

The RVO community. The RVO community (forming a community of scientists involved in creating the RVO and using it in research). RVO and education. RVO and international cooperation. The RVO project in organizational terms (structure, management, funding, working groups, symposia). Sustainable development.

IVOA Working Groups: Resource Registry; Data Modeling; Content Description (UCD); Data Access Layer; VOTable; VO Query Language; Grid & Web Services; Standards & Processes. Interest groups: VO Architecture; VO Applications; VO Theory; GGF Astro-RG.

IVOA Documents UCD (Unified Content Descriptor) IVOA Working Draft Metadata Content within VO Resources, Version 1.9.9b IVOA Working Draft, A unified domain model for astronomy, for use in the Virtual Observatory, Version 0.9 IVOA Working Draft, Observation Coverage and Space-Time Coordinates IVOA DM WG Internal Note, Data Model for Quantity IVOA DM WG Internal Draft, Data Model for Observation, Version 0.2 IVOA DM WG Internal Draft, IVOA Astronomical Data Query Language, Version IVOA Working Draft,

IVOA Documents: IVOA SkyNode Interface, Version 0.7, IVOA Working Draft; IVOA Data Access Layer (DAL) Work Package, July 2003; Resource Metadata for the Virtual Observatory, Version 0.8, IVOA Working Draft; IVOA Document Standards, Version 0.1, IVOA Working Draft; IVOA: Theory in the VO; Astro-RG: Proposed Global Grid Forum Research Group Charter; The Astronomical Grid Community.