Clustering based on fuzzy algorithms (Clusterização baseada em algoritmos fuzzy)

Nicomedes Lopes Cavalcanti Júnior

Clustering analysis is a technique with applications in many different fields, such as data mining, pattern recognition, and image processing. Clustering algorithms aim to partition a data set into clusters such that the items within a given cluster have a high degree of similarity, while items belonging to different clusters have a high degree of dissimilarity. An important distinction among clustering algorithms is between hard and fuzzy algorithms. Hard clustering algorithms associate a data point, or individual, with a single cluster, whereas fuzzy clustering algorithms associate an individual with all clusters, varying the individual's degree of membership from cluster to cluster. The advantage of fuzzy clustering algorithms is that they are better able to represent uncertainty; this matters, for example, when an individual does not fit entirely into any one class but bears some similarity to several classes. An intuitive way to measure similarity between two individuals is to use a distance measure such as the Euclidean distance. A number of distances are available in the literature. Many popular clustering algorithms try to minimize an objective criterion based on a distance measure. Through an iterative process, these algorithms compute parameters in such a way that the value of the objective criterion decreases until it reaches a convergence state. The problem with many of the distances found in the literature is that they are static. For iterative clustering algorithms, it seems reasonable to have distances that change, or update, according to what is happening with the data and the algorithm's data structure. This dissertation presents two different adaptive distances tailored for the fuzzy c-means algorithm by Prof. Francisco de Carvalho. This algorithm was chosen because of its widespread use.
To evaluate the proposed distances, experiments were carried out on benchmark datasets and on artificial datasets (for the latter, a Monte Carlo experiment was performed to obtain more accurate results). Up to now, comparisons of the new fuzzy c-means algorithms, obtained using adaptive distances, with similar state-of-the-art algorithms have made it possible to conclude that, in general, the new algorithms outperform the state-of-the-art ones.
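
The iterative scheme described in the abstract can be illustrated with a minimal sketch of the standard fuzzy c-means algorithm. It uses a fixed (static) squared Euclidean distance, not the adaptive distances proposed in the dissertation, which the abstract does not specify. The function name `fuzzy_c_means`, its parameters, the fuzzifier value `m=2.0`, and the stopping rule are assumptions made for illustration only.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with a static squared Euclidean distance.

    Hypothetical helper for illustration; returns (centers, memberships,
    objective_history). Memberships come from the last membership update.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Assumption: start from c randomly chosen data points as prototypes.
    centers = [list(p) for p in rng.sample(points, c)]
    history, u = [], []
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every prototype.
        d2 = [[sum((x - v) ** 2 for x, v in zip(p, ctr)) for ctr in centers]
              for p in points]
        # Membership update: u_ik is proportional to d2_ik^(-1/(m-1)),
        # normalized so that each point's memberships sum to 1.
        u = []
        for row in d2:
            zeros = [k for k, d in enumerate(row) if d == 0.0]
            if zeros:
                # The point coincides with one or more prototypes:
                # split full membership among them.
                u.append([1.0 / len(zeros) if k in zeros else 0.0
                          for k in range(c)])
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in row]
                total = sum(inv)
                u.append([w / total for w in inv])
        # Objective criterion: J = sum_i sum_k u_ik^m * d2_ik.
        history.append(sum(u[i][k] ** m * d2[i][k]
                           for i in range(n) for k in range(c)))
        # Prototype update: mean of the points weighted by u_ik^m.
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers[k] = [sum(w[i] * points[i][a] for i in range(n)) / total
                          for a in range(dim)]
        # Stop once the objective no longer decreases appreciably.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return centers, u, history
```

Calling `fuzzy_c_means(points, c=2)` on a list of coordinate tuples returns, for each individual, one membership degree per cluster, and the recorded objective values form a non-increasing sequence, matching the convergence behaviour described above. An adaptive-distance variant would presumably add a step that re-estimates the distance parameters in each iteration, but that step is not sketched here because the abstract gives no details.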
