Missing value substitution: an approach based on evolutionary algorithm for clustering data / Substituição de valores ausentes: uma abordagem baseada em um algoritmo evolutivo para agrupamento de dados

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. This work proposes and evaluates a missing-value imputation algorithm built on an evolutionary algorithm for clustering. It rests on the assumption that clusters of (partially unknown) data can provide useful information for the imputation process. To assess the proposed method experimentally, missing values were simulated on six classification datasets under two missingness mechanisms widely used in practice: missing completely at random (MCAR) and missing at random (MAR). Imputation algorithms have traditionally been assessed by measures of prediction capability. However, this traditional approach does not reveal how imputed values influence the ultimate modeling task (e.g., classification). This work reports experimental results from both the prediction and the insertion bias perspectives in classification problems. The results illustrate different scenarios in which the proposed algorithm performs similarly to six other imputation algorithms reported in the literature. Finally, statistical analyses suggest that better prediction results do not necessarily imply lower classification bias.
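The idea that clusters of partially observed data can guide imputation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps in a plain k-means-style loop for the evolutionary clustering algorithm proposed in the work, it simulates MCAR only, and the function names (`simulate_mcar`, `partial_distance`, `cluster_impute`) are hypothetical and do not come from the original.

```python
import random


def simulate_mcar(rows, rate, rng):
    """Hide each value independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def partial_distance(row, centroid):
    """Mean squared difference over the attributes observed in `row`."""
    pairs = [(v, c) for v, c in zip(row, centroid) if v is not None]
    if not pairs:
        return 0.0  # nothing observed: the row is equally close to every centroid
    return sum((v - c) ** 2 for v, c in pairs) / len(pairs)


def cluster_impute(rows, k, iterations=20, seed=0):
    """Fill each None with the matching attribute of the row's cluster centroid.

    Assumption: a k-means-style loop stands in for the evolutionary
    clustering algorithm described in the abstract.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])

    # Column means over observed values, used only to complete the initial centroids.
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    # Initial centroids: k randomly chosen rows, gaps filled with column means.
    centroids = [
        [v if v is not None else col_means[j] for j, v in enumerate(r)]
        for r in rng.sample(rows, k)
    ]

    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row to its nearest centroid, using observed attributes only.
        labels = [
            min(range(k), key=lambda c: partial_distance(r, centroids[c]))
            for r in rows
        ]
        # Update each centroid attribute from the observed values of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            for j in range(n_cols):
                observed = [r[j] for r in members if r[j] is not None]
                if observed:
                    centroids[c][j] = sum(observed) / len(observed)

    # Observed values are kept; missing ones come from the assigned centroid.
    return [
        [v if v is not None else centroids[lab][j] for j, v in enumerate(r)]
        for r, lab in zip(rows, labels)
    ]
```

A typical use would be to call `simulate_mcar` on a complete dataset, run `cluster_impute` on the result, and compare the imputed values against the hidden originals. That comparison measures prediction capability only; the classification bias discussed in the abstract would additionally require training a classifier on the original and on the imputed data and comparing the two.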

SUBJECT(S)

clustering; imputation; missing values; data mining
