Aprendizado semissupervisionado multidescrição em classificação de textos / Multi-view semi-supervised learning in text classification
AUTHOR(S)
Ígor Assis Braga
PUBLICATION DATE
2010
ABSTRACT
Semi-supervised learning algorithms learn from a combination of labeled and unlabeled data. Thus, they can be applied in domains where few labeled examples and a vast number of unlabeled examples are available. Furthermore, semi-supervised learning algorithms may achieve better performance than supervised learning algorithms trained on the same few labeled examples. A powerful approach to semi-supervised learning, called multi-view learning, can be used whenever the training examples are described by two or more disjoint sets of attributes. Text classification is a domain in which semi-supervised learning algorithms have shown some success. However, multi-view semi-supervised learning has not yet been well explored in this domain, despite the possibility of describing textual documents in a myriad of ways. The aim of this work is to analyze the effectiveness of multi-view semi-supervised learning in text classification using unigrams and bigrams as two distinct descriptions of text documents. To this end, we initially consider the widely adopted CO-TRAINING multi-view algorithm and propose some modifications to it in order to deal with the problem of contention points. We also propose the COAL algorithm, which further improves CO-TRAINING by incorporating active learning as a way of dealing with contention points. A thorough experimental evaluation of these algorithms was conducted on real text data sets. The results show that the COAL algorithm, using unigrams as one description of text documents and bigrams as another, achieves significantly better performance than a single-view semi-supervised algorithm. Taking into account the good results obtained by COAL, we conclude that the use of unigrams and bigrams as two distinct descriptions of text documents can be very effective.
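To make the two-view idea concrete, the following is a minimal sketch of a generic co-training loop that uses unigrams and bigrams as the two descriptions of each document. It is not the modified CO-TRAINING or the COAL algorithm proposed in this work: the tiny Naive Bayes classifier, the toy documents, the names `co_train` and `predict`, and the parameters `rounds` and `per_round` are all assumptions made for illustration. In particular, when the two views assign different labels to the same document, this sketch simply keeps the label from the first view, which is an arbitrary choice and not the treatment of contention points described above.

```python
import math
from collections import Counter


def unigrams(text):
    """View 1: single lowercase tokens."""
    return text.lower().split()


def bigrams(text):
    """View 2: pairs of adjacent lowercase tokens."""
    toks = text.lower().split()
    return [a + "_" + b for a, b in zip(toks, toks[1:])]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for feats, y in zip(docs, labels):
            self.counts[y].update(feats)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, feats):
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # one extra slot for unseen features
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for f in feats:
                lp += math.log((self.counts[c][f] + 1) / (total + v))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(lp - m) for lp in logp.values())
        return {c: math.exp(lp - m) / z for c, lp in logp.items()}


def co_train(labeled, unlabeled, views=(unigrams, bigrams), rounds=5, per_round=1):
    """Generic co-training: each view labels its most confident pool documents."""
    labeled, pool = list(labeled), list(unlabeled)

    def train():
        texts, ys = zip(*labeled)
        return [NaiveBayes().fit([v(t) for t in texts], ys) for v in views]

    models = train()
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for view, model in zip(views, models):
            scored = []
            for t in pool:
                proba = model.predict_proba(view(t))
                label = max(proba, key=proba.get)
                scored.append((proba[label], t, label))
            scored.sort(reverse=True)
            for _conf, t, label in scored[:per_round]:
                picked.setdefault(t, label)  # assumption: first view wins a conflict
        for t, label in picked.items():
            labeled.append((t, label))
            pool.remove(t)
        models = train()  # retrain both views on the enlarged labeled set
    return models, labeled


def predict(models, text, views=(unigrams, bigrams)):
    """Combine the views by multiplying their class probabilities."""
    combined = {}
    for view, model in zip(views, models):
        for c, p in model.predict_proba(view(text)).items():
            combined[c] = combined.get(c, 1.0) * p
    return max(combined, key=combined.get)


# Toy data, invented for this sketch.
labeled = [("the team won the match", "sports"),
           ("the market fell sharply", "finance")]
unlabeled = ["the team lost the match", "stocks fell sharply today",
             "the coach praised the team", "the market rallied today"]

models, grown = co_train(labeled, unlabeled, rounds=2)
print(len(grown), predict(models, "the team won"))
```

Each round, both view classifiers are trained on the current labeled set, each nominates the pool document it is most confident about, and the nominated documents move into the labeled set with their predicted labels. A real experiment would replace the toy classifier and data with a proper learner and corpus.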
SUBJECT(S)
machine learning; semi-supervised learning; multi-view learning; text classification; unigrams; bigrams; co-training; self-training; COAL; cial
Related Documents
- Abordagens para aprendizado semissupervisionado multirrótulo e hierárquico
- Semi-supervised learning based in disagreement by similarity
- O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos
- Efeitos do treinamento físico multimodal na prevenção secundária de queda em idosos: treinamento supervisionado e semissupervisionado
- Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data