Uncertainty score in categorical databases


IBICT - Instituto Brasileiro de Informação em Ciência e Tecnologia




The volume of biological data has grown significantly, in particular the biomolecular data stored in databases such as GenBank, KEGG, SCOP, PDB, and UniProt, which are made available through the internet and have had a major impact on research and development activities. This growth is explained by the development of novel and less costly data-gathering techniques, as well as by the lower cost and higher availability of storage and communication resources. A key feature that distinguishes these databases is the procedure used to generate and maintain them. Several databases are created using automated (in silico) procedures, and the resulting data is not curated by an expert. Other databases, called curated databases, employ specialized supervision for both the generation and the revision of characteristics, which may be performed by the users who access the databases through the internet. Curated databases offer much higher annotation quality, but are very costly compared to automatic processes. In this scenario, research on novel methodologies and techniques that support the revision process is relevant, since they make it more efficient and less costly. This work aims to investigate, develop, and evaluate such methodologies and techniques, and it makes two main contributions. The first is a methodology for temporally characterizing the modifications in a categorical database. We apply this methodology to UniProtKB/Swiss-Prot and quantify the changes to the keywords of its records. We also characterize the modifications in the keyword associations from a temporal perspective. The second contribution is a methodology for improving the revision process. An example application scenario is the revision of the keywords field of the UniProtKB/Swiss-Prot database, where the proposed methodology is shown to be effective.


Bioinformatics, theses. Databases, theses. Data mining (computing), theses.
