Bulk-loading Dynamic Metric Acess Methods / Operação de carga-rápida (bulk-loading) em métodos de acesso métricos

AUTOR(ES)
DATA DE PUBLICAÇÃO

2007

RESUMO

The similarity degree between data elements is the primordial factor for information retrieval in databases that handle complex data, such as genetic sequences, time series and multimedia objects (long images, audio, videos, texts). To answer these queries in a reduced time, it is necessary methods that use metrics to evaluate the similarity between elements. These methods are known as Metric Access Methods. The most known Metric Access Methods in the literature are the M-tree and the Slim-tree. There are two ways to build index in any access method: inserting element one by one or using the bulk-load operation. The first build type is very common and required for all kinds of dynamic access methods. The bulk-load operations are used for bigger datasets, as for example, in the recovery of backups and re-creation of database indexes. In these situations, the individual insertion takes much time. The bulk-load operation makes it possible to construct indexes more efficiently and faster, because it has the availability of the whole data when the index structure are created, and thus, it is possible to explore the properties of the whole set. Database Management Systems offer bulk-load operations for the traditional methods, so it is important that they can be also supplied for Metric Access Methods. This work presents three bulk-loading approaches and it proposes a technique to bulk-load data into Metric Access Methods. An algorithm based on this technique was developed to construct a Slim-tree. This is the first bulk-load algorithm based on sampling that always produces a valid Slim-tree, therefore is the first one described in literature that can be enclosed in a Database Management System. The experiments show that this algorithm keeps good clustering of data and in such a way that it surpasses the performance of sequential insertion, taking into account the construction performance and the efficiency to perform queries

ASSUNTO(S)

estruturas de indexação indexing structures base de dados bulk-loading database métodos de acesso métricos operação de carga-rápida metric access methods

Documentos Relacionados