Atribuição automática de autoria de obras da literatura brasileira / Atribuição automática de autoria de obras da literatura brasileira

AUTOR(ES)
DATA DE PUBLICAÇÃO

2010

RESUMO

Authorship attribution consists in categorizing an unknown document among some classes of authors previously selected. Knowledge about authorship of a text can be useful when it is required to detect plagiarism in any literary document or to properly give the credits to the author of a book. The most intuitive form of human analysis of a text is by selecting some characteristics that it has. The study of selecting attributes in any written document, such as average word length and vocabulary richness, is known as stylometry. For human analysis of an unknown text, the authorship discovery can take months, also becoming tiring activity. Some computational tools have the functionality of extracting such characteristics from the text, leaving the subjective analysis to the researcher. However, there are computational methods that, in addition to extract attributes, make the authorship attribution, based in the characteristics gathered in the text. Techniques such as neural network, decision tree and classification methods have been applied to this context and presented results that make them relevant to this question. This work presents a data compression method, Prediction by Partial Matching (PPM), as a solution of the authorship attribution problem of Brazilian literary works. The writers and works selected to compose the authors database were, mainly, by their representative in national literature. Besides, the availability of the books has also been considered. The PPM performs the authorship identification without any subjective interference in the text analysis. This method, also, does not make use of attributes presents in the text, differently of others methods. The correct classification rate obtained with PPM, in this work, was approximately 93%, while related works exposes a correct rate between 72% and 89%. In this work, was done, also, authorship attribution with SVM approach. For that, were selected attributes in the text divided in two groups, one word based and other in function-words frequency, obtaining a correct rate of 36,6% and 88,4%, respectively.

ASSUNTO(S)

literatura brasileira brazilian literature estilometria ciencia da computacao processamento de linguagem natural (pln) atribuição de autoria authorship attribution natural language processing (nlp) prediction by partial matching (ppm) stylometry prediction by partial matching (ppm)

Documentos Relacionados