Презентация на тему: " Data Warehousing Design Approaches and Methodologies." — Транскрипт:
Data Warehousing Design Approaches and Methodologies
Top-Down Approach Data is transferred from diverse OLTP systems into a centralized place where it could be used for analysis. The data flow in the Top-Down Approach: data extraction from the operational data sources loading into the staging area validation and consolidation transfer to the Operational Data Store (ODS) loading into the data warehouse in a parallel process Detailed data is regularly extracted from the ODS and temporarily hosted in the staging area for aggregation, summarization and then extracted and loaded into the data warehouse. The major benefit of a Top-Down Approach is that it provides an integrated, flexible architecture to support downstream analytic data structures. On the downside, a top-down approach may take longer and cost more to deploy than other approaches, especially in the initial increments.
Bottom-Up Approach The bus structure contains all the common elements that are used by data marts such as conformed dimensions, measures etc defined for the enterprise as a whole. The data flow in the Bottom-Up Approach: extraction of data from operational databases into the staging area process and consolidation loading into the ODS ODS is refreshed extraction into the staging area process to fit into the data mart structure The data from the data mart, then is extracted to the staging area, aggregated, summarized and so on and loaded into the data warehouse and made available to the end user for analysis. The major benefit of a bottom-up approach is user-friendly, flexible data structure using dimensional, star schema models. It also delivers value rapidly. One problem is that it requires organizations to enforce the use of standard dimensions and facts to ensure integration and deliver a single version of the truth.
Hybrid Approach The Hybrid Approach aims to harness the speed and user orientation of the Bottom-Up Approach to the integration of the Top-Down Approach. The data flow in the Hybrid Approach: The Extract, Transform and Load (ETL) tool is deployed ETL extracts data from the source into a non persistent staging area data extraction into dimensional data marts that contain both atomic and summary data data transfer from data marts to the data warehouse The data from the various data marts are then transferred to the data warehouse and query tools are reprogrammed to request summary data from the marts and atomic data from the data warehouse. The major benefit of a hybrid approach is that it combines rapid development techniques within an enterprise architecture framework. However, backfilling a data warehouse can be a disruptive process that delivers no ostensible value and therefore may never be funded.
Federated Approach This is a hub-and-spoke architecture often described as the architecture of architectures. It recommends an integration of: heterogeneous data warehouses data marts packaged applications that already exist in the enterprise. The goal is to integrate existing analytic structures wherever possible to define the highest value metrics, dimensions and measures to share and reuse them within existing analytic structures. This may result in the creation of a common staging area to eliminate redundant data feeds or building of a data warehouse that sources data from multiple data marts, data warehouses or analytic applications. The major problem with the federated approach is that it is not well documented. Also, integrating meta data is a pernicious problem in a heterogeneous, ever-changing environment.
NCR Data Warehousing Methodology NCR Teradata Solutions Methodology allows for a controlled way of building and introducing a successful data warehouse. It describes all the steps required to generate the information out of the structured data. The methodology consists of four phases: Data Warehouse Strategy Development Planning Design & Implementation Support & Enhancement
SAS Institute Rapid Data Warehousing Method The SAS System provides a data warehouse model that addresses the entire scope of warehouse management, organization and exploitation. Management process involves creation and administration of a data warehouse. It consists of data extraction, data transformation, data loading, job scheduling and metadata management. Organization is at the heart of the SAS Data Warehouse Model. It utilizes the SAS database engine, repository, and the accompanying metadata system. Exploitation includes an array of extremely powerful and flexible reporting, analysis and statistical tools that include point and click interfaces to mine, visualize and present data. It also includes OLAP tools for interactive multidimensional analysis, interactive query tools, and client/server development tools for custom interfaces.
Microsoft Data Warehousing The goal of the Microsoft Data Warehousing Framework is to provide access to any data from any source. Microsoft SQL Server 7.0 includes built-in support for OLE-DB and Open Information Models for universal data and metadata access. The built-in Data Transformation Services simplify the process of data transformation and loading. Third parties offer OLE-DB providers to access other data sources and complementary Extraction, Transformation, and Loading tools that extend the capabilities of Data Transformation Services. Microsoft Office 2000, the newest version of the world's most widely used desktop productivity suite, includes rich, integrated business intelligence features: Performing data analysis and reporting with Excel PivotTables and Access Data Access Pages Building interactive Web-based solutions with Office Web Components Collaborating to turn insight into action with Outlook
The Kimball Method Ralph Kimball was co-inventor of the Xerox Star workstation, the first commercial product to use mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems, and is the founder and former CEO of Red Brick Systems. He invented the Nine-Step Method in the Design of a Data Warehouse: Choosing the Process Choosing the Grain Identifying and Conforming the Dimensions Choosing the Facts Storing Precalculations in the Fact Table Rounding Out the Dimension Tables Choosing the Duration of the Database The Need to Track Slowly Changing Dimensions Deciding the Query Priorities and the Query Modes
References Eckerson, Wayne. "Four Ways to Build a Data Warehouse". 101communications January Kimball, Ralph. "Letting the Users Sleep, Part 1". DBMS and Internet Systems January McConnell, Steve, Rapid Development: Taming Wild Software Schedules, Redmond, WA: Microsoft Press, "Microsoft Data Warehousing Framework". Microsoft Corporation. 13 January "Microsoft Data Warehousing Framework Poster". Microsoft Corporation. 13 January "Tutorial 4 : Design of the data warehouse: Kimball Vs Inmon". Exforsys Inc. 13 January "Services for Data Warehousing Solutions". NCR Corporation January Tanrikorur, Tulu. "Enterprise-Ready DSS: The Hybrid Approach". Miller Freeman Inc January "Tutorial 4 : Design of the data warehouse: Kimball Vs Inmon". Exforsys Inc January Welbrock, P.R., Strategic Data Warehousing Principles Using SAS® Software, Cary, NC: SAS Institute Inc., 1998.