An enhanced machine learning framework for Type 2 diabetes classification using imbalanced data with missing values

Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis of such a condition is challenging due to its complex interdependence on various factors. There is a need to develop critical decision support systems to assist medical practitioners in the diagnosis p...

Bibliographic Details
Main Authors: Roy, Kumarmangal; Ahmad, Muneer; Waqar, Kinza; Priyaah, Kirthanaah; Nebhen, Jamel; Alshamrani, Sultan S.; Raza, Muhammad Ahsan; Ali, Ihsan
Format: Article
Published: Wiley 2021
Subjects:
Online Access: http://eprints.um.edu.my/33909/
id my.um.eprints.33909
record_format eprints
spelling my.um.eprints.33909 2022-08-16T01:05:06Z http://eprints.um.edu.my/33909/
Wiley 2021-07-06 Article PeerReviewed Roy, Kumarmangal and Ahmad, Muneer and Waqar, Kinza and Priyaah, Kirthanaah and Nebhen, Jamel and Alshamrani, Sultan S. and Raza, Muhammad Ahsan and Ali, Ihsan (2021) An enhanced machine learning framework for Type 2 diabetes classification using imbalanced data with missing values. Complexity, 2021. ISSN 1076-2787, DOI https://doi.org/10.1155/2021/9953314
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic Q Science (General)
QA Mathematics
T Technology (General)
description Diabetes is one of the most common metabolic diseases that cause high blood sugar. Early diagnosis is challenging because the condition depends on many interrelated factors, so there is a need for decision support systems that assist medical practitioners in the diagnosis process. This research proposes a predictive model that achieves high classification accuracy for type 2 diabetes. The study consisted of two parts. First, it investigated handling missing data with three imputation methods: median value imputation, K-nearest neighbor imputation, and iterative imputation. It then evaluated these imputations using linear, tree-based, and ensemble classification algorithms to see how each method affected classification accuracy. Second, an Artificial Neural Network was used to model the best-performing imputed data, balanced with SMOTETomek so that each class is fairly represented. This approach gave the best accuracy, 98% on the test data, outperforming the accuracies reported in prior studies using the same dataset. The dataset used in this study is restricted in terms of gender and population. For future work, the study recommends adopting a larger population sample without geographic boundaries. Additionally, because the Artificial Neural Network model did not undergo any specific hyperparameter tuning, it would be interesting to explore tuning on normalized data to improve accuracy further.
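The description names median and K-nearest neighbor imputation but this record does not say how they were implemented. The following is only a minimal sketch of the two ideas in standard-library Python, not the authors' code. The toy rows, the column meanings (glucose, BMI), the helper names, and the choice of a distance computed over jointly observed columns are all assumptions made for illustration. In practice a library such as scikit-learn (`SimpleImputer`, `KNNImputer`) would normally be used instead.

```python
import math
from statistics import median


def median_impute(rows):
    """Replace each missing value (None) with the median of its column."""
    n_cols = len(rows[0])
    medians = [
        median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)
    ]
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Euclidean distance over columns observed in both rows, rescaled to the
    full column count; infinite if the rows share no observed column."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that column over the k
    nearest rows in which the column is observed."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:  # leave the gap if no row has this column observed
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical toy data: columns are (glucose, BMI); None marks a missing value.
data = [
    [90.0, 22.0],
    [95.0, None],
    [150.0, 35.0],
    [160.0, 37.0],
    [None, 36.0],
]

median_filled = median_impute(data)
knn_filled = knn_impute(data, k=2)
```

On this toy table, median imputation fills every gap in a column with the same value, whereas the neighbor-based version fills the missing glucose in the last row from the two rows with the most similar BMI. That difference is the kind of effect the study measures when it compares downstream classification accuracy across imputation methods.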
format Article
author Roy, Kumarmangal
Ahmad, Muneer
Waqar, Kinza
Priyaah, Kirthanaah
Nebhen, Jamel
Alshamrani, Sultan S.
Raza, Muhammad Ahsan
Ali, Ihsan
author_sort Roy, Kumarmangal
title An enhanced machine learning framework for Type 2 diabetes classification using imbalanced data with missing values
title_sort enhanced machine learning framework for type 2 diabetes classification using imbalanced data with missing values
publisher Wiley
publishDate 2021
url http://eprints.um.edu.my/33909/