A heuristic for the conference classification problem exploring multiple and indirect relationships

AUTHOR(S)
PUBLICATION DATE

2008

ABSTRACT

Extracting usable knowledge from large amounts of data has become one of the main challenges in a variety of fields, including scientific, industrial, and governmental areas. This task requires the data to be represented in a way that not only captures the relational information but also allows effective and efficient mining of the data and understanding of the resulting knowledge. In most cases, however, the data are modeled as graphs that are unable to represent multiple relations. This restriction may cause a mismatch between the real data and the constructed model and, as a consequence, essential information about the real application is lost. Therefore, this work discusses relation-based data mining and, in contrast with traditional techniques, proposes an innovative multigraph mining heuristic capable of dealing with multiple and indirect relations in the data, using these relations to identify correlated groups. We constructed a theoretical basis on which many real-life applications can be modeled so that they preserve important relations that exist in the data and are ignored by other models. We applied our new technique to a real scenario of co-authorship networks, in which we aim to group and classify scientific conferences based on authorship affinities. To do so, we modeled the data of these networks as multigraph sets and then used them to find groups of conferences that are correlated. If these groups can be found in at least a certain number of different parts of the multigraph, they are considered to belong to the same area. Although the problem we dealt with is NP-complete and the computational cost of the heuristic varies considerably, experimental results show that our technique is effective in identifying different areas, even when the data are sparse.
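
The grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the heuristic proposed in the work, whose details are not given here; it is a simplified stand-in under stated assumptions. Conferences are nodes, each author who published in two conferences contributes one parallel edge between them, a pair of conferences counts as correlated when it has at least `min_support` parallel edges, and indirect relations are handled by merging conferences connected through a chain of correlated pairs. The function name `group_conferences`, the parameter `min_support`, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def group_conferences(author_venues, min_support=2):
    """Group conferences linked by at least `min_support` distinct authors.

    Hypothetical simplification: not the heuristic from the original work.
    """
    # Multigraph stored as a mapping: one parallel edge per author and pair.
    parallel_edges = defaultdict(set)
    for author, venues in author_venues.items():
        for a, b in combinations(sorted(set(venues)), 2):
            parallel_edges[(a, b)].add(author)

    # Keep only the pairs supported by enough parallel edges.
    adjacency = defaultdict(set)
    for (a, b), authors in parallel_edges.items():
        if len(authors) >= min_support:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components over the kept pairs, so that indirect links
    # (A-B and B-C) place A and C in the same group.
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Invented toy data: author -> conferences where the author published.
toy = {
    "author_1": ["DB1", "DB2"],
    "author_2": ["DB1", "DB2", "DB3"],
    "author_3": ["DB2", "DB3"],
    "author_4": ["IR1", "IR2"],
    "author_5": ["IR1", "IR2"],
    "author_6": ["DB1", "IR1"],
}
groups = group_conferences(toy, min_support=2)
print(groups)
```

In this toy input, DB1 and DB3 share only one author directly, but both are strongly tied to DB2, so the indirect relation places all three in one group. The single author linking DB1 and IR1 falls below the support threshold, so the two areas stay separate.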

SUBJECT(S)

computing theses. information retrieval theses.
