The effect of using different forms of terms extraction on its comprehensibility and representability in Portuguese textual domains / O efeito do uso de diferentes formas de extração de termos na compreensibilidade e representatividade dos termos em coleções textuais na língua portuguesa
AUTHOR(S)
Merley da Silva Conrado
PUBLICATION DATE
2009
ABSTRACT
Term extraction from text collections, a subtask of text pre-processing in Text Mining, serves many purposes in knowledge extraction processes. Terms must be extracted carefully because their quality strongly affects the results. In this work, term quality covers both representativeness within the specific domain and comprehensibility. Given this importance, the work evaluates how different term simplification techniques affect the comprehensibility and representativeness of terms extracted from text collections in Portuguese. Term extraction follows the methodology presented in this work, and the techniques compared are stemming (radicalização), lemmatization, and substantivation (conversion to noun forms). To support this methodology, a term extraction tool named ExtraT was developed. The extracted terms were evaluated both objectively and subjectively. The subjective evaluations, carried out with domain specialists, analyze the representativeness of the terms in the related documents, the comprehensibility of the terms produced by each technique, and the specialists' opinions. The objective evaluations, supported by TaxEM and by Thesagro (National Agricultural Thesaurus), consider the number of terms extracted by each technique and their representativeness in the related documents. This objective evaluation of representativeness relies on the CTW (Context Term Weight) measure. Eight real collections from the agronomy domain were used in the experimental evaluation. The results point out positive and negative characteristics of each technique and show that the best technique for this domain depends on the pre-established goal, which may be obtaining more comprehensible terms for the user or reducing the number of extracted terms.
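As a rough illustration of the trade-off described above, the sketch below counts distinct terms before and after a simplification step. It is not ExtraT and does not reproduce the thesis methodology: the suffix list, the `toy_stem` function, and the sample tokens are all hypothetical, chosen only to show how stemming can shrink the term set while producing less readable forms.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration).
# A real Portuguese stemmer applies many more rules and exceptions.
SUFFIXES = ("ções", "ção", "es", "as", "os", "s", "a", "o")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(tokens, simplify=None):
    """Count terms, optionally applying a simplification function to each token."""
    if simplify is None:
        return Counter(tokens)
    return Counter(simplify(token) for token in tokens)


# Made-up sample tokens from an agronomy-like vocabulary.
tokens = ["planta", "plantas", "plantação", "plantações", "solo", "solos"]

raw = extract_terms(tokens)                # no simplification
stemmed = extract_terms(tokens, toy_stem)  # toy stemming

print(len(raw), "distinct terms without simplification")
print(len(stemmed), "distinct terms after toy stemming:", sorted(stemmed))
```

On this sample the toy stemmer halves the number of distinct terms, but the surviving forms (for example "plant" and "sol") are harder for a reader to interpret than the original words, which mirrors the comprehensibility versus term-count tension the abstract reports.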
SUBJECT(S)
pre-processing; term extraction; text mining; stemming; lemmatization; substantivation