Comparison of imbalanced data treatments: a case study on cleft lip and palate data

This study investigated whether resampling and penalized approaches to balancing data would improve the classification model produced by the random forests learning algorithm on a small and imbalanced Cleft Lip and Palate (CLP) patients’ dataset. Comparison betwe...

Bibliographic Details
Main Authors: Zaturrawiah Ali Omar, Chin, Su Na, Siti Rahayu Mohd. Hashim, Norhafiza Hamzah
Format: Proceedings
Language: English
Published: Faculty of Science and Natural Resources 2020
Subjects: QA Mathematics, SD Forestry
Online Access: https://eprints.ums.edu.my/id/eprint/21431/1/Comparison%20of%20imbalanced%20data%20treatments.pdf
https://eprints.ums.edu.my/id/eprint/21431/2/Comparison%20of%20imbalanced%20data%20treatments1.pdf
https://eprints.ums.edu.my/id/eprint/21431/
https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf
id my.ums.eprints.21431
record_format eprints
spelling my.ums.eprints.21431 2021-06-17T02:32:21Z https://eprints.ums.edu.my/id/eprint/21431/ PeerReviewed Zaturrawiah Ali Omar and Chin, Su Na and Siti Rahayu Mohd. Hashim and Norhafiza Hamzah (2020) Comparison of imbalanced data treatments: a case study on cleft lip and palate data. https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
language English
topic QA Mathematics
SD Forestry
description This study investigated whether resampling and penalized approaches to balancing data would improve the classification model produced by the random forests learning algorithm on a small and imbalanced Cleft Lip and Palate (CLP) patients’ dataset. A Balanced Random Forest (BRF), the Synthetic Minority Over-sampling Technique (SMOTE) applied with Random Forests (RF), and a Weighted Random Forest (WRF) were compared on the CLP dataset using the area under the curve (AUC) and the tradeoff between sensitivity and specificity. The results showed no difference in predictive ability among the untreated (RF), oversampling (SMOTE+RF) and penalty (WRF) treatments, but poor performance for the downsampling treatment (BRF). The small training and test sample sizes were observed to have contributed to these results and to have severely affected the performance of the classifier used for each treatment. The SMOTE+RF oversampling method, however, appeared promising for the CLP dataset.
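The description above names AUC and the sensitivity/specificity tradeoff as the comparison criteria. As a rough illustration of what those quantities measure, the sketch below computes them in plain Python. It is not the authors' code: the function names, the toy labels and scores, and the two thresholds are all hypothetical, and the sketch assumes both classes are present in the data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 is the minority (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Assumes at least one positive and one negative case.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the CLP study): 3 minority cases among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

# Lowering the threshold trades specificity for sensitivity.
for threshold in (0.5, 0.25):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")

print(f"AUC={auc(y_true, scores):.3f}")
```

With so few minority cases, a single misclassified case shifts sensitivity by a third in this toy set, which illustrates why small test samples can dominate such comparisons.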
format Proceedings
author Zaturrawiah Ali Omar
Chin, Su Na
Siti Rahayu Mohd. Hashim
Norhafiza Hamzah
title Comparison of imbalanced data treatments: a case study on cleft lip and palate data
publisher Faculty of Science and Natural Resources
publishDate 2020
url https://eprints.ums.edu.my/id/eprint/21431/1/Comparison%20of%20imbalanced%20data%20treatments.pdf
https://eprints.ums.edu.my/id/eprint/21431/2/Comparison%20of%20imbalanced%20data%20treatments1.pdf
https://eprints.ums.edu.my/id/eprint/21431/
https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf