Semi-supervised learning based in disagreement by similarity / Classificação semi-supervisionada baseada em desacordo por similaridade
AUTHOR(S)
Victor Antonio Laguna Gutiérrez
PUBLICATION DATE
2010
ABSTRACT
Semi-supervised learning is a machine learning paradigm in which the induced hypothesis is improved by taking advantage of unlabeled data. It is particularly useful when labeled data are scarce and difficult to obtain. In this context, the Cotraining algorithm was proposed. Cotraining is a widely used semi-supervised approach that assumes the availability of two independent views of the data. In most real-world scenarios, this multi-view assumption is highly restrictive, which limits Cotraining's usefulness for classification. In this work, we propose the Co2KNN algorithm, a one-view Cotraining approach that combines two different k-Nearest Neighbors (KNN) strategies, referred to as global and local KNN. In the global KNN, the neighbors used to classify a new instance are the training examples that contain this instance among their own k nearest neighbors. In the local KNN, on the other hand, the neighborhood used to classify a new instance is the set of its k nearest training examples, as computed by the traditional KNN approach. Co2KNN builds on the theoretical framework of Semi-supervised Learning by Disagreement, which holds that the success of combining two classifiers in the Cotraining framework is due to the disagreement between them. Our experiments show that Co2KNN significantly improves classification accuracy, especially when only one view of the training data is available. Moreover, we present an optimized algorithm that reduces the time complexity of computing the global KNN, allowing Co2KNN to tackle real classification problems.
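To make the difference between the two neighborhoods concrete, the following is a minimal sketch of local KNN, global KNN, and a one-view co-training loop that pairs them. It is not the author's Co2KNN implementation. Assumptions made here for illustration only: Euclidean distance, majority voting, the vote share used as a confidence score, each learner handing its most confident labels to its peer, a "keep the more confident vote" combination rule, and an empty global neighborhood yielding no prediction. All function names (`local_knn`, `global_knn`, `co2knn_sketch`, `co2knn_predict`) are hypothetical, and `global_knn` uses the naive computation, not the optimized algorithm the abstract mentions.

```python
import math
from collections import Counter


def _vote(labels):
    """Majority label plus its vote share, used here as a confidence score."""
    if not labels:
        return None, 0.0  # empty neighborhood: no prediction
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def local_knn(x, X, y, k):
    """Local (traditional) KNN: vote among the k training examples closest to x."""
    order = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))
    return _vote([y[i] for i in order[:k]])


def global_knn(x, X, y, k):
    """Global KNN: vote among the training examples that would count x among
    their own k nearest neighbors. Naive O(n^2 log n) work per query."""
    voters = []
    for j, xj in enumerate(X):
        others = sorted(math.dist(xj, X[i]) for i in range(len(X)) if i != j)
        # x is in xj's k-neighborhood if fewer than k points are strictly closer to xj
        if len(others) < k or math.dist(xj, x) <= others[k - 1]:
            voters.append(y[j])
    return _vote(voters)


def co2knn_sketch(X, y, U, k=3, per_round=1, rounds=5):
    """Hypothetical one-view co-training loop: each KNN variant labels the
    unlabeled points it is most confident about and gives them to its peer."""
    sets = {"local": (list(X), list(y)), "global": (list(X), list(y))}
    learners = {"local": local_knn, "global": global_knn}
    peer = {"local": "global", "global": "local"}
    pool = list(U)
    for _ in range(rounds):
        if not pool:
            break
        for name, clf in learners.items():
            Xs, ys = sets[name]
            scored = []
            for idx, u in enumerate(pool):
                label, conf = clf(u, Xs, ys, k)
                if label is not None:
                    scored.append((conf, idx, label))
            scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
            chosen = scored[:per_round]
            peer_X, peer_y = sets[peer[name]]
            for _, idx, label in chosen:
                peer_X.append(pool[idx])
                peer_y.append(label)
            # remove the newly labeled points from the unlabeled pool
            for idx in sorted((c[1] for c in chosen), reverse=True):
                pool.pop(idx)
    return sets


def co2knn_predict(x, sets, k=3):
    """Hypothetical combination rule: keep the more confident of the two votes."""
    votes = [local_knn(x, *sets["local"], k), global_knn(x, *sets["global"], k)]
    return max(votes, key=lambda v: v[1])[0]
```

For example, with `X = [(0.0,), (0.2,), (1.0,), (1.2,)]` and `y = ["a", "a", "b", "b"]`, the point `(0.6,)` has three local neighbors for `k=3`, but with `k=1` no training example counts it as its nearest neighbor, so `global_knn` returns `(None, 0.0)`. This asymmetry is the kind of disagreement between the two views that a one-view co-training scheme can exploit.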
SUBJECT(S)
semi-supervised learning; semi-supervised learning based on disagreement; disagreement-based learning; co-training; classification
Related Documents
- Aprendizado semissupervisionado multidescrição em classificação de textos
- Abordagens para aprendizado semissupervisionado multirrótulo e hierárquico
- Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data
- O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos
- Reabilitação não supervisionada ou semi-supervisionada: uma alternativa prática