Um método para a fusão automática de sentenças similares em português / A method for automatic fusion of similar sentences in Portuguese

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

In recent years, there has been increasing interest in Natural Language Processing (NLP) applications that process a collection of texts on the same subject and generate a new output text, for instance, a summary or an answer to a given question. To generate good-quality texts, these applications must cope with phenomena such as information redundancy, contradiction and complementarity. In this context, a process that identifies the information shared by a set of related sentences and generates a new sentence by merging the information in the input sentences, without redundancy or contradiction, is highly relevant to applications that process multiple texts. Automatic sentence fusion is a relatively new research topic in the NLP literature, and for Portuguese in particular we are not aware of any previous work. This work proposes a new method for fusing similar sentences in Portuguese, based on a symbolic, domain-independent approach, and presents Zíper, a sentence fusion system that implements the proposed method. Zíper is the first system of its kind to generate sentences that express all the information in the input sentences, i.e., the union of the input set. It can also generate sentences that express only the information repeated across the set (considered the most important), i.e., the intersection of the input sentences. The system was evaluated intrinsically, and the results show that, in general, the generated sentences are well formed and preserve the original message of the set (the entire message in fusion by union, and only the main message in fusion by intersection). Zíper was also evaluated extrinsically in the context of a multi-document summarizer for Portuguese. The results suggest that it can improve summary quality by reducing redundancy, which often causes loss of cohesion and coherence.
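The abstract does not describe Zíper's internals, so the following is only a toy illustration of the difference between fusion by union and fusion by intersection. It assumes each input sentence has already been reduced to a hand-made set of (subject, predicate, object) facts; that representation, the function names `fuse_union`, `fuse_intersection` and `realize`, and the naive template realizer are all hypothetical. None of this is the method proposed in the thesis, which works on real Portuguese sentences and must also deal with contradictions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical "information unit" extracted from a sentence: (subject, predicate, object).
Fact = Tuple[str, str, str]


def fuse_union(sentences: List[FrozenSet[Fact]]) -> List[Fact]:
    """Union fusion (toy): keep every fact from any input, stating each one once."""
    fused: List[Fact] = []
    for facts in sentences:
        for fact in sorted(facts):  # sorted only to make the output order deterministic
            if fact not in fused:   # skip redundant facts already taken from an earlier sentence
                fused.append(fact)
    return fused


def fuse_intersection(sentences: List[FrozenSet[Fact]]) -> List[Fact]:
    """Intersection fusion (toy): keep only facts expressed by every input sentence."""
    if not sentences:
        return []
    common = set(sentences[0])
    for facts in sentences[1:]:
        common &= facts
    return sorted(common)


def realize(facts: List[Fact]) -> str:
    """Naive realization: group predicates by subject so the subject is stated once."""
    groups: Dict[str, List[str]] = {}
    for subj, pred, obj in facts:
        groups.setdefault(subj, []).append(f"{pred} {obj}")
    clauses = []
    for subj, preds in groups.items():
        vp = preds[0] if len(preds) == 1 else ", ".join(preds[:-1]) + " and " + preds[-1]
        clauses.append(f"{subj} {vp}")
    text = "; ".join(clauses)
    return text[:1].upper() + text[1:] + "." if text else ""


# Two hypothetical, already-analysed input sentences about the same event.
s1 = frozenset({("the earthquake", "hit", "Chile"), ("the earthquake", "measured", "8.8")})
s2 = frozenset({("the earthquake", "hit", "Chile"), ("the earthquake", "killed", "hundreds")})

print(realize(fuse_union([s1, s2])))         # The earthquake hit Chile, measured 8.8 and killed hundreds.
print(realize(fuse_intersection([s1, s2])))  # The earthquake hit Chile.
```

Here "intersection" is taken as the facts shared by all inputs; a real system could instead treat information repeated in only some of the sentences as redundant, and would need linguistic analysis rather than hand-built fact sets.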

SUBJECT(S)

automatic sentence fusion; text-to-text generation