Aprendizado semissupervisionado multidescrição em classificação de textos / Multi-view semi-supervised learning in text classification

AUTOR(ES)
DATA DE PUBLICAÇÃO

2010

RESUMO

Semi-supervised learning algorithms learn from a combination of both labeled and unlabeled data. Thus, they can be applied in domains where few labeled examples and a vast amount of unlabeled examples are available. Furthermore, semi-supervised learning algorithms may achieve a better performance than supervised learning algorithms trained on the same few labeled examples. A powerful approach to semi-supervised learning, called multi-view learning, can be used whenever the training examples are described by two or more disjoint sets of attributes. Text classification is a domain in which semi-supervised learning algorithms have shown some success. However, multi-view semi-supervised learning has not yet been well explored in this domain despite the possibility of describing textual documents in a myriad of ways. The aim of this work is to analyze the effectiveness of multi-view semi-supervised learning in text classification using unigrams and bigrams as two distinct descriptions of text documents. To this end, we initially consider the widely adopted CO-TRAINING multi-view algorithm and propose some modifications to it in order to deal with the problem of contention points. We also propose the COAL algorithm, which further improves CO-TRAINING by incorporating active learning as a way of dealing with contention points. A thorough experimental evaluation of these algorithms was conducted on real text data sets. The results show that the COAL algorithm, using unigrams as one description of text documents and bigrams as another description, achieves significantly better performance than a single-view semi-supervised algorithm. Taking into account the good results obtained by COAL, we conclude that the use of unigrams and bigrams as two distinct descriptions of text documents can be very effective

ASSUNTO(S)

aprendizado multidescrição machine learning unigramas classificação de textos bigrams co-training cial biogramas multi-view learning aprendizado de máquina unigrams aprendizado semissupervisionado co-training semi-supervised learning self-training coal self-training text classification

Documentos Relacionados