Um estudo sobre a Teoria da Predição aplicada à análise semântica de Linguagens Naturais. / A study on the Theory of Prediction applied to the semantical analysis of Natural Languages.

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

In this work, computer learning is studied as a problem of induction. Starting from a proposed architecture for a system for the semantic analysis of Natural Languages, the two modules necessary for its construction were built and tested independently: a pre-processor, capable of mapping the contents of texts to a representation in which the semantics of each symbol is explicit, and an inductor module, capable of formulating theories to explain chains of events. The component responsible for the induction of theories implements a restricted version of the Solomonoff Predictor, capable of producing hypotheses belonging to the set of Regular Languages. This device has high computational complexity and very long processing times even for very simple inputs. Nonetheless, this work presents new and interesting results on its functional performance. The pre-processing module of the proposed system is an implementation of Latent Semantic Analysis, a method that draws on statistical correlation to build a representation capable of approximating the semantic relations made by human beings. It was used to index the more than 470 thousand texts contained in the first disk of the Reuters RCV1 corpus, yielding, across dozens of parameter variations, 71.5 GB of data that were used for various statistical analyses. The test results provide convincing evidence that the use of this pre-processing module leads to considerable gains in the proposed system. Integrating the two components into a full-fledged semantic analyser of Natural Languages is, at this moment, unachievable due to the processing time required by the inductor module, and remains a task for future work. Still, Solomonoff's Theory of Prediction appears adequate for the semantic analysis of Natural Languages, provided that new ways of mitigating its processing time are devised.
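
As an illustration of the Latent Semantic Analysis step mentioned in the abstract, the following is a minimal sketch in Python, assuming NumPy is available. It is not the implementation used in this work: the toy corpus, the raw term counts (no weighting scheme), the function names, and the choice of k = 2 dimensions are all assumptions made for the example.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Count matrix with one row per term and one column per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    return X, vocab

def lsa_document_vectors(X, k):
    """Project documents onto the k strongest latent dimensions via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each row of the result is one document in the reduced space.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: two finance texts and two sports texts.
docs = [
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
    "the team won the football match",
    "football fans cheered the team",
]
X, vocab = build_term_document_matrix(docs)
vecs = lsa_document_vectors(X, k=2)

same_topic = cosine(vecs[0], vecs[1])   # finance vs finance
cross_topic = cosine(vecs[0], vecs[2])  # finance vs sports
```

In this toy setting, documents on the same topic end up close together in the reduced space while documents on different topics are nearly orthogonal, which is the kind of approximate semantic relation the abstract attributes to the method. A realistic pipeline over a corpus the size of RCV1 would likely need sparse matrices and a term-weighting scheme, neither of which is shown here.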

SUBJECT(S)

artificial intelligence; computer learning; formal semantics; natural language
