Distribuição dos Dados em Ambientes de Data Warehousing: O Sistema WebD2W e Algoritmos Voltados à Fragmentação Horizontal dos Dados

Cristina Dutra de Aguiar Ciferri

A data warehousing environment consolidates data of interest from distributed, autonomous and heterogeneous information sources into a single database, called a data warehouse. This environment provides efficient and flexible retrieval of strategic information for management and decision-making processes, and keeps the integrated data in the warehouse at a high level of quality and reliability. The data extracted from each information source are translated, cleaned when needed, and integrated with information from other sources before being stored in the data warehouse. This loading process is performed in advance, so that OLAP (online analytical processing) queries can be answered directly from the data warehouse, without accessing the original information sources. In general, the warehouse data are stored in a centralized database. The main motivation of this work is therefore to distribute the data of such a database, taking into account both the characteristics of data warehousing applications and the needs of decision-making analysts. On the one hand, distributing the data warehouse brings several advantages to a data warehousing environment. On the other hand, a distributed data warehousing environment must face additional challenges caused by this distribution. In this thesis, we introduce the WebD2W system, focusing on one of its main objectives: the distribution of the data warehouse. The WebD2W (Web Distributed Data Warehousing) system is a distributed client-server data warehousing environment aimed not only at distributing the data warehouse, but also at providing distributed access to these data using Web technology as an infrastructure.
The generic objectives of the WebD2W system are: to increase the availability of the warehouse data, to increase the availability of access to such data, to maintain the consistency of the distributed data, to improve OLAP query performance, to guarantee fragmentation, replication and location transparency in data manipulation, and to support a large number of users. Besides presenting the architecture of the WebD2W system, we also propose a set of algorithms for horizontally fragmenting the warehouse data: the FHU-D, FHU-DHA, FHM-D, FHM-DHA and FH-MN algorithms. These algorithms are based on the concepts of a derivation graph, the propagation of the fragmented dimensions and their respective restrictions to the graph vertices, and the fragmentation or reconstruction of aggregations. The proposed algorithms serve as a basis for the WebD2W system. They are distinguished by the fact that they: (i) take into account the organization of the data warehouse in different levels of aggregation; (ii) can be applied to different scenarios, according both to the characteristics of the derivation graph that represents the data warehousing application being fragmented and to the dimensionality of the fragmentation process; (iii) focus on the execution of drill-down, roll-up and slice-and-dice queries at the individual sites; (iv) are independent of whether the multidimensional data are stored in relational data structures (i.e., ROLAP systems) or in specialized data structures (i.e., MOLAP systems); (v) can be applied both when all aggregations that can be generated from the detailed
