A study of kernel methods for data classification and clustering

AUTHOR(S)
SOURCE

IBICT - Instituto Brasileiro de Informação em Ciência e Tecnologia

PUBLICATION DATE

25/08/2009

ABSTRACT

The design of learning machines involves modelling a set of samples from the mapping between input-output pairs. The samples used for training provide the information that determines the model parameters, and the validation and/or test samples evaluate the classifier's generalization ability. In most cases, however, the classifier obtained at the end of this process does not capture the similarity relationships between samples and classes, so the information provided by the data is modelled incompletely. In this work, we address simultaneously a basic problem of data analysis and a basic problem of kernel learning machine design: the number of groups in a set of samples and the choice of kernel function parameters. For both, the metric used is the Empirical Alignment, which measures the similarity between the kernel matrix and the proximity matrix of Fuzzy C-Means (FCM). We show that this metric can be maximized with respect to the parameters of the FCM and of the kernel function: the greater the consistency between the structural information embedded in the two matrices, the higher the alignment. These parameters cannot, however, be determined by direct adjustment methods. The problem is therefore formulated as a single-objective optimization, and two evolutionary methods, the Genetic Algorithm and Particle Swarm Optimization, are used to find approximations of the parameters that maximize the metric. The parameters obtained are used in Least Squares Support Vector Machines (LS-SVMs), following the methodology proposed here for designing classifiers. Ordering the samples in these matrices with the eigenvector and Minus methods makes it possible to observe the similarity between individuals of each group and yields additional information that helps characterize the groups.
Experiments on test and benchmark databases corroborate the metric and the methods used for binary classification and clustering. Moreover, with respect to the problem stated at the outset, the observations provided give a clearer understanding of the relationships involved and of the methods employed, allowing better use of the structural information in the data.
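
As an illustration of the metric named in the abstract, the following is a minimal sketch of the empirical alignment between a kernel matrix and a fuzzy proximity matrix. It is not the thesis's implementation. Several choices here are assumptions: the Gaussian (RBF) kernel with width `sigma`, the proximity definition P[i, j] = sum over clusters of min(u[k, i], u[k, j]), the hand-written membership matrix `U` standing in for the output of an FCM run, and a plain grid search over `sigma` used in place of the Genetic Algorithm or Particle Swarm Optimization.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def proximity_from_memberships(U):
    """Fuzzy proximity matrix from memberships U of shape (clusters, samples).

    Assumed definition: P[i, j] = sum_k min(U[k, i], U[k, j]).
    """
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)


def empirical_alignment(K1, K2):
    """Frobenius inner product of K1 and K2, normalized by their norms."""
    num = np.sum(K1 * K2)
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den


# Toy data: two well-separated pairs of points (hypothetical example).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# Hypothetical FCM memberships for 2 clusters; each column sums to 1.
U = np.array([[0.95, 0.90, 0.05, 0.10],
              [0.05, 0.10, 0.95, 0.90]])

P = proximity_from_memberships(U)

# Grid search over the kernel width, a simple stand-in for GA or PSO.
sigmas = [0.1, 1.0, 10.0, 100.0]
scores = [empirical_alignment(rbf_kernel(X, s), P) for s in sigmas]
best_sigma = sigmas[int(np.argmax(scores))]

for s, a in zip(sigmas, scores):
    print(f"sigma={s:g}  alignment={a:.4f}")
print("best sigma on this grid:", best_sigma)
```

In this toy setting, a kernel width that makes the kernel matrix share the block structure of the proximity matrix scores higher than a very large width, for which the kernel matrix is close to all ones.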

SUBJECT(S)

electrical engineering, theses.
