Seleção de características por meio de algoritmos genéticos para aprimoramento de rankings e de modelos de classificação / Feature selection by genetic algorithms to improve ranking and classification models

AUTOR(ES)
DATA DE PUBLICAÇÃO

2011

RESUMO

Content-based image retrieval (CBIR) and classification systems rely on feature vectors extracted from images considering specific visual criteria. It is common that the size of a feature vector is of the order of hundreds of elements. When the size (dimensionality) of the feature vector is increased, a higher degree of redundancy and irrelevancy can be observed, leading to the "curse of dimensionality" problem. Thus, the selection of relevant features is a key aspect in a CBIR or classification system. This thesis presents new methods based on genetic algorithms (GA) to perform feature selection. The Fc ("Fitness coach") family of fitness functions proposed takes advantage of single valued ranking evaluation functions, in order to develop a new method of genetic feature selection tailored to improve the accuracy of CBIR systems. The ability of the genetic algorithms to boost feature selection by employing evaluation criteria (fitness functions) improves up to 22% the precision of the query answers in the analyzed databases when compared to traditional wrapper feature selection methods based on decision-tree (C4.5), naive bayes, support vector machine, 1-nearest neighbor and association rule mining. Other contributions of this thesis are two filter-based feature selection algorithms for classification purposes, which calculate the simplified silhouette statistic as evaluation function: the silhouette-based greedy search (SiGS) and the silhouette-based genetic algorithm search (SiGAS). The proposed algorithms overcome the state-of-the-art ones (CFS, FCBF and ReliefF, among others). It is important to stress that the gain in accuracy of the proposed methods family Fc, SiGS and SIGAS is allied to a significant decrease in the feature vector size, what can reach up to 90%

ASSUNTO(S)

algoritmos genéticos classificação classification consultas por similaridade feature selection genetic algorithms imagens médicas medical images seleção de características similarity search

Documentos Relacionados