GENETIC PROGRAMMING, ARTIFICIAL NEURAL NETWORKS AND BALANCING TECHNIQUES IN AGRICULTURAL DATA MODELING: A STUDY OF WHITE MOLD DISEASE

AUTHOR(S)
SOURCE

IBICT - Instituto Brasileiro de Informação em Ciência e Tecnologia

PUBLICATION DATE

01/08/2012

ABSTRACT

Regression problems are common in the literature: the goal is to infer, from a data set, the relationship between the dependent (output) variable and the independent (input) variables. Inferring this relationship is not a simple task, because the data are often highly non-linear and noisy. Two machine learning techniques able to handle this type of data are investigated: Genetic Programming and Artificial Neural Networks. Even so, in many cases a machine learning technique cannot find a satisfactory solution because the data set is unbalanced. The aim of this study was therefore to apply machine learning techniques to the regression of unbalanced data, evaluating and comparing the results obtained with different approaches. The balancing method consists of constructing weights for the data set, one per sample, each representing the importance of that example during model learning. This unbalanced-data modeling problem is applied to real agronomic data, specifically to the study of white mold disease, caused by the fungus Sclerotinia sclerotiorum (Lib.) de Bary. Because the disease is highly destructive to crops, knowing whether resistance structures called sclerotia are present in an area is of paramount importance, so that appropriate action can be taken to treat the disease. In this case study, the task is to use learning techniques to build a predictive model of sclerotia from meteorological characteristics and the sample location, for the state of Paraná, using an unbalanced data set. Different approaches to the techniques and to the balancing method were employed in constructing the model. Artificial Neural Networks with the resilient propagation learning algorithm achieved the best performance in building the sclerotia prediction model, predicting the actual outcome with a correlation of 0.763 and a mean absolute error of 24.35.
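The per-sample weighting idea can be illustrated with a minimal sketch. The abstract does not specify how the weights are built, so the scheme here (equal-width binning of the target, weights inversely proportional to bin frequency) and the names `inverse_frequency_weights` and `weighted_mae` are assumptions for illustration, not the method used in the study.

```python
from collections import Counter


def inverse_frequency_weights(targets, n_bins=5):
    """Return one weight per sample; rare target ranges get larger weights.

    Assumption: equal-width binning of the target variable. The study may
    use a different weighting scheme.
    """
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are equal
    # Bin index of each target; the maximum value goes into the last bin.
    bins = [min(int((t - lo) / width), n_bins - 1) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)  # normalise so the mean weight is 1
    return [w * scale for w in raw]


def weighted_mae(y_true, y_pred, weights):
    """Mean absolute error in which each sample counts by its weight."""
    total = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)
```

Used as a fitness or loss function, an error of this kind penalizes a model that ignores the rare, high-valued samples more heavily than an unweighted mean absolute error would.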
To determine whether the balancing method employed improved the results, we applied the Kruskal-Wallis test. The test showed a statistically significant improvement for Genetic Programming with the balancing technique over Genetic Programming without it. However, the technique with the best results was the neural network with the resilient propagation learning algorithm, on the white mold data set and in some of the experimental cases.
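For readers unfamiliar with the Kruskal-Wallis test, the following is a small standard-library sketch of the H statistic with tie correction. The function names are hypothetical, and the p-value helper covers only the two-group case (one degree of freedom), which is an assumption about how the with-balancing and without-balancing runs would be compared.

```python
import math


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic using average ranks and tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}   # average rank of each distinct value (ranks start at 1)
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    return h / correction


def p_value_two_groups(h):
    """Chi-square survival function with 1 degree of freedom (two groups only)."""
    return math.erfc(math.sqrt(h / 2))
```

Each group would hold the errors from repeated runs of one configuration; a p-value below the chosen significance level indicates that the configurations differ.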

SUBJECT(S)

modeling, sclerotia, computing, regression, unbalanced, computer science
