Missing data characteristics and the choice of imputation technique: an empirical study

One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets. It leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this pro...

Bibliographic Details
Main Authors: Alade, Oyekale Abel; Sallehuddin, Roselina; Mohamed Radzi, Nor Haizan; Selamat, Ali
Format: Article
Published: Springer Nature Switzerland AG 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/89957/
http://dx.doi.org/10.1007/978-3-030-33582-3_9
id my.utm.89957
record_format eprints
spelling my.utm.89957 2021-03-31T06:31:56Z http://eprints.utm.my/id/eprint/89957/ QA75 Electronic computers. Computer science. Springer Nature Switzerland AG 2020. Article. PeerReviewed. Alade, Oyekale Abel and Sallehuddin, Roselina and Mohamed Radzi, Nor Haizan and Selamat, Ali (2020) Missing data characteristics and the choice of imputation technique: an empirical study. Advances in Intelligent Systems and Computing, 1073. pp. 88-97. ISSN 2194-5357 http://dx.doi.org/10.1007/978-3-030-33582-3_9 DOI:10.1007/978-3-030-33582-3_9
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description One important characteristic of good data is completeness. Missing data is a major problem in the classification of medical datasets. It leads to incorrect classification of patients, which endangers their health management. Many imputation techniques have been employed to solve this problem, but these techniques do not take into account the characteristics that cause the missingness. In this paper, we investigated the causes of missing data in a medical dataset and proposed a multiple imputation technique to solve the problem of missing data. A 5-fold-iteration multiple imputation was employed. All of the missing values in the dataset were regenerated (100%). The imputed datasets were validated using an extreme learning machine (ELM) classifier. The results show an improvement in the accuracy of the imputed datasets. The work can, however, be extended to compare the accuracy of the imputed datasets with different classifiers.
format Article
author Alade, Oyekale Abel
Sallehuddin, Roselina
Mohamed Radzi, Nor Haizan
Selamat, Ali
title Missing data characteristics and the choice of imputation technique: an empirical study
publisher Springer Nature Switzerland AG
publishDate 2020
url http://eprints.utm.my/id/eprint/89957/
http://dx.doi.org/10.1007/978-3-030-33582-3_9