Schema quality analysis in a data integration system

AUTHOR(S)
PUBLICATION DATE

2008

ABSTRACT

Information Quality (IQ) has become a critical topic in organizations and, consequently, in Information Systems research. Poor information quality can severely reduce the overall effectiveness of an organization. The growth of data warehouses, and the direct access to information from various sources by managers and information users, have increased both the need for and the awareness of high-quality information in organizations. The notion of IQ has emerged over the past years and has attracted steadily increasing interest. There is no common or agreed-upon definition or measure of IQ beyond the general, classical notion of "fitness for use": information is considered appropriate for use from the perspective of users' requirements, i.e., the value of information depends on its utility when it is used. In data integration systems, accessing information that is spread over multiple, distributed and heterogeneous sources is an important problem in many domains. Typically there are many ways to obtain answers to a global query, using data from different sources in different combinations, but in general it is prohibitively expensive to obtain all answers. While much work has been done on query processing and on choosing plans under cost criteria, very little is known about the important problem of measuring IQ aspects of data integration global schemas. In this work, we propose an IQ analysis for a data integration system, focused mainly on the system schemas. Our main goal is to minimize query processing time. Our hypothesis is that an acceptable way to decrease query execution time is to build good schemas, with high quality scores, and we have based our approach on this assumption. We focused on developing IQ analysis mechanisms that address schema quality, especially the quality of the integrated schema.
Initially, we built a list of IQ criteria related to data integration and chose to focus on formally specifying the definitions and algorithms of three schema IQ criteria: minimality, schema completeness and type consistency. We also defined an algorithm to improve schema minimality, as well as algorithms for testing type consistency measurements. With these experiments, we have shown that query execution time in a data integration system may decrease when the query is submitted to a schema with high minimality and type consistency scores.
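The abstract names the schema criteria but does not give their formulas. As a rough illustration only, the following sketch assumes a simple formulation: minimality as 1 minus the fraction of redundant schema elements, and type consistency as the fraction of attributes whose data type matches the most frequent type used for the same concept. The `Attribute` record, its `concept` field (the real-world concept an attribute represents), and the functions `minimality`, `type_consistency` and `remove_redundancy` are hypothetical names introduced here; they are not the definitions or algorithms specified in this work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """A schema attribute (hypothetical model used only for this sketch)."""
    entity: str
    name: str
    data_type: str
    concept: str  # assumed: the real-world concept the attribute represents


def minimality(attrs: list[Attribute]) -> float:
    """Assumed definition: 1 - (#redundant attributes / #attributes).

    An attribute counts as redundant if an earlier attribute already
    represents the same concept.
    """
    if not attrs:
        return 1.0
    seen: set[str] = set()
    redundant = 0
    for a in attrs:
        if a.concept in seen:
            redundant += 1
        else:
            seen.add(a.concept)
    return 1 - redundant / len(attrs)


def type_consistency(attrs: list[Attribute]) -> float:
    """Assumed definition: share of attributes whose type equals the
    most frequent type among attributes of the same concept."""
    if not attrs:
        return 1.0
    types_per_concept: dict[str, Counter] = {}
    for a in attrs:
        types_per_concept.setdefault(a.concept, Counter())[a.data_type] += 1
    # For each concept, only the attributes using its majority type count as consistent.
    consistent = sum(c.most_common(1)[0][1] for c in types_per_concept.values())
    return consistent / len(attrs)


def remove_redundancy(attrs: list[Attribute]) -> list[Attribute]:
    """Greedy minimality improvement (assumed policy): keep the first attribute per concept."""
    seen: set[str] = set()
    kept = []
    for a in attrs:
        if a.concept not in seen:
            seen.add(a.concept)
            kept.append(a)
    return kept


# Hypothetical integrated schema: Order.customer duplicates Customer.cust_id
# and also uses a different data type.
schema = [
    Attribute("Customer", "cust_id", "int", "customer_id"),
    Attribute("Customer", "name", "varchar", "customer_name"),
    Attribute("Order", "customer", "varchar", "customer_id"),
    Attribute("Order", "total", "decimal", "order_total"),
]
cleaned = remove_redundancy(schema)
print(minimality(schema), type_consistency(schema))    # 0.75 0.75
print(minimality(cleaned), type_consistency(cleaned))  # 1.0 1.0
```

On this small hypothetical schema, both scores start at 0.75 and reach 1.0 once the duplicated `customer_id` attribute is dropped. Keeping the first occurrence is only one possible policy; a real minimality improvement in a data integration system would likely also have to account for the mappings between the integrated schema and the sources.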

SUBJECT(S)

data integration, data quality, information quality, computer science
