Estratégias para tratamento de variáveis com dados faltantes durante o desenvolvimento de modelos preditivos / Strategies for treatment of variables with missing data during the development of predictive models
AUTHOR(S)
Fernando Assunção
SOURCE
IBICT - Instituto Brasileiro de Informação em Ciência e Tecnologia
PUBLICATION DATE
09/05/2012
ABSTRACT
Predictive models are increasingly used in the market to help companies mitigate risk, grow portfolios, retain customers, and prevent fraud, among other purposes. During model development, however, some of the predictive variables commonly contain unfilled data (missing values), so a procedure for treating these variables must be adopted. Given this scenario, the aim of this study is to discuss frameworks for dealing with missing data in predictive models and to encourage the use of some that are already known in academia but not yet used by the market. The study describes seven methods, which were applied empirically to a credit score data set. Each framework yielded a predictive model, and the results were evaluated and compared using a set of widely used performance metrics (KS, Gini, ROC curve, approval curve). In this application, the best-performing frameworks were those that treat missing data as a separate category (a technique already used by the market) and the one that groups missing data into the most conceptually similar category. The worst-performing framework, by contrast, was the one that simply ignores the variable containing missing values, another procedure commonly used by the market.
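As a rough illustration only, the Python sketch below shows two of the treatments named in the abstract (missing values as a separate category, and missing values grouped into an existing category) together with two of the metrics used to compare the models (KS and Gini). It is not the study's implementation. The function names, the "MISSING" label, the use of None to mark a missing value, and the convention that a higher score means lower risk are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

MISSING = "MISSING"  # hypothetical label for the separate category


def missing_as_category(values: Sequence[Optional[str]]) -> List[str]:
    """Keep missing values (None) as their own category."""
    return [MISSING if v is None else v for v in values]


def merge_missing_into(values: Sequence[Optional[str]], target: str) -> List[str]:
    """Group missing values (None) into an existing category chosen by the analyst."""
    return [target if v is None else v for v in values]


def ks_statistic(scores: Sequence[float], is_bad: Sequence[int]) -> float:
    """Largest gap between the cumulative score distributions of bads and goods."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    pairs = sorted(zip(scores, is_bad))
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, bad) in enumerate(pairs):
        if bad:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare the distributions only after the last record of a tied score.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def gini(scores: Sequence[float], is_bad: Sequence[int]) -> float:
    """Gini computed as 2 * AUC - 1, assuming higher scores mean lower risk."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]
    # AUC: share of (good, bad) pairs ranked correctly; ties count as half.
    wins = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc = wins / (len(goods) * len(bads))
    return 2 * auc - 1


if __name__ == "__main__":
    raw = ["owner", None, "renter", None]  # hypothetical categorical variable
    print(missing_as_category(raw))
    print(merge_missing_into(raw, "renter"))
    print(ks_statistic([0.2, 0.4, 0.6, 0.8], [1, 0, 1, 0]))
    print(gini([0.2, 0.4, 0.6, 0.8], [1, 0, 1, 0]))
```

In practice, the category that receives the missing values in the second treatment would be chosen on conceptual grounds, as the abstract describes. The sketch takes it as an argument.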
SUBJECT(S)
credit score; missing data; missing values; multiple imputation; predictive models
Related Documents
- Tratamento de dados faltantes empregando biclusterização com imputação múltipla
- Models for count data with applications
- ESTIMAÇÃO DE MODELOS LOGLINEARES COM DADOS FALTANTES: UMA APLICAÇÃO AO SAEB/99
- Strategies for the development of credit score with the inference rejected
- Development and Evaluation of Predictive Controllers Based on Bilinear Models