AUTOMATIC CLASSIFICATION OF SEMI-STRUCTURED DATA / CLASSIFICAÇÃO AUTOMÁTICA DE DADOS SEMI-ESTRUTURADOS

AUTHOR(S)
PUBLICATION DATE

2009

ABSTRACT

The problem of data classification goes back to the definition of taxonomies covering areas of knowledge. With the advent of the Web, the amount of available data has increased by several orders of magnitude, making manual classification infeasible. This dissertation proposes a method for automatically classifying semi-structured data, represented as frames, without any prior knowledge of structured classes. It introduces an algorithm, based on K-Medoid, that organizes a set of frames into classes structured as a strict hierarchy. Frames are classified according to a closeness criterion that takes into account the attributes of each frame and their values.
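As a rough illustration of the idea in the abstract, the sketch below clusters a few frames with a basic k-medoids loop. It is not the dissertation's algorithm: the `distance` function is a hypothetical closeness criterion (full credit for a shared attribute with equal values, half credit for a shared attribute with different values), the initialization with the first `k` frames is an assumption made for determinism, and the result is a flat partition rather than the strict hierarchy the dissertation builds.

```python
from typing import Dict, List, Tuple

# A frame is modeled here as a mapping from attribute names to values (assumption).
Frame = Dict[str, str]


def distance(a: Frame, b: Frame) -> float:
    """Hypothetical closeness criterion based on attributes and their values.

    A shared attribute with equal values scores 1.0, a shared attribute with
    different values scores 0.5, and an attribute present in only one frame
    scores 0. The distance is 1 minus the normalized score.
    """
    all_attrs = set(a) | set(b)
    if not all_attrs:
        return 0.0
    score = 0.0
    for attr in set(a) & set(b):
        score += 1.0 if a[attr] == b[attr] else 0.5
    return 1.0 - score / len(all_attrs)


def assign(frames: List[Frame], medoids: List[int]) -> List[int]:
    """Label each frame with the index of its closest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: distance(f, frames[medoids[c]]))
        for f in frames
    ]


def k_medoids(frames: List[Frame], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Basic alternating k-medoids; returns (medoid indices, cluster labels)."""
    medoids = list(range(k))  # assumption: start from the first k frames
    for _ in range(max_iter):
        labels = assign(frames, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the lowest total distance
            # to the other members of its cluster.
            best = min(
                members,
                key=lambda i: sum(distance(frames[i], frames[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(frames, medoids)


# Made-up example frames, for illustration only.
frames = [
    {"type": "book", "author": "A", "pages": "100"},
    {"type": "car", "brand": "X", "doors": "4"},
    {"type": "book", "author": "B", "pages": "200"},
    {"type": "car", "brand": "Y", "doors": "2"},
]
medoids, labels = k_medoids(frames, k=2)
print(medoids, labels)
```

Medoids are used instead of centroids because frames with differing attribute sets have no natural mean, whereas an actual frame can always serve as the representative of its class.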

SUBJECT(S)

classification
