Sumarização automática multidocumento: seleção de conteúdo com base no Modelo CST (Cross-document Structure Theory) / Multidocument summarization: content selection based on CST (Cross-document Structure Theory)

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

Multidocument summarization consists of producing a summary from a group of texts on the same topic, containing the most relevant information according to the user's interest. With the growing volume of information on the Internet and the limited time available to read and process the information of interest, automatic summaries have become a very important resource. In this work, we explore content selection methods for multidocument summarization based on CST (Cross-document Structure Theory), a recently proposed model that has already been investigated in Computational Linguistics. In particular, we define and formalize content selection operators based on the CST model. These operators represent possible summarization preferences and address the main challenges of multidocument summarization: redundancy, complementarity, and contradiction among pieces of information. The operators are specified as templates containing rules and functions that relate the preferences to CST relations. Specifically, we define operators for extracting main information, extracting context information, identifying authorship, treating redundancy, and showing contradictory information. We also explore the impact of the CST model on superficial summarization methods. Experiments were carried out on journalistic texts written in Brazilian Portuguese. The results show that using the CST model helps to improve the informativeness and quality of automatic summaries.
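
The abstract does not give the operator templates themselves, so the following is only a minimal sketch of how a redundancy-treatment operator over CST relations might look, not the method defined in this work. Several things are assumptions made for illustration: the names `Relation`, `rank_by_connectivity`, `redundancy_operator`, and `select_content`; the sentence identifiers; the choice of ranking sentences by how many CST relations they take part in; and the choice of treating the CST relations Identity and Equivalence as signals of redundancy.

```python
from dataclasses import dataclass

# Assumption: these CST relation labels are taken to signal redundant content.
REDUNDANCY_RELATIONS = {"Identity", "Equivalence"}


@dataclass(frozen=True)
class Relation:
    """A CST relation between two sentences from different documents."""
    source: str  # hypothetical sentence id, e.g. "D1_S1"
    target: str
    label: str   # CST relation name, e.g. "Equivalence"


def rank_by_connectivity(sentence_ids, relations):
    """Order sentences by the number of CST relations they participate in."""
    counts = {sid: 0 for sid in sentence_ids}
    for rel in relations:
        counts[rel.source] += 1
        counts[rel.target] += 1
    # sorted() is stable, so ties keep their original input order.
    return sorted(sentence_ids, key=lambda sid: -counts[sid])


def redundancy_operator(ranked_ids, relations):
    """Drop a sentence if it is redundant with a better-ranked kept sentence."""
    redundant_pairs = {
        frozenset((rel.source, rel.target))
        for rel in relations
        if rel.label in REDUNDANCY_RELATIONS
    }
    kept = []
    for sid in ranked_ids:
        if any(frozenset((sid, other)) in redundant_pairs for other in kept):
            continue
        kept.append(sid)
    return kept


def select_content(sentence_ids, relations, max_sentences):
    """Rank, remove redundancy, then cut to the summary size."""
    ranked = rank_by_connectivity(sentence_ids, relations)
    return redundancy_operator(ranked, relations)[:max_sentences]


# Toy input: two documents, two sentences each (invented for illustration).
sentences = ["D1_S1", "D1_S2", "D2_S1", "D2_S2"]
relations = [
    Relation("D1_S1", "D2_S1", "Equivalence"),
    Relation("D1_S1", "D2_S2", "Elaboration"),
    Relation("D1_S2", "D2_S2", "Contradiction"),
]
summary = select_content(sentences, relations, max_sentences=3)
```

In this toy input, `D2_S1` is dropped because it is linked by Equivalence to the better-ranked `D1_S1`, while the sentences joined by Elaboration and Contradiction are kept. An operator for showing contradictory information could be sketched the same way, by pulling in the counterpart of any selected sentence that has a Contradiction relation.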

SUBJECT(S)

CST; content selection; multidocument summarization
