Comparison of imbalanced data treatments: a case study on cleft lip and palate data
This study investigated whether the resampling and penalized approaches to balancing data would improve the classification model produced by the random forests learning algorithm on a small and imbalanced Cleft Lip and Palate (CLP) patients’ dataset. Comparison betwe...
Main Authors: | Zaturrawiah Ali Omar; Chin, Su Na; Siti Rahayu Mohd. Hashim; Norhafiza Hamzah |
---|---|
Format: | Proceedings |
Language: | English |
Published: | Faculty of Science and Natural Resources, 2020 |
Subjects: | QA Mathematics; SD Forestry |
Online Access: | https://eprints.ums.edu.my/id/eprint/21431/1/Comparison%20of%20imbalanced%20data%20treatments.pdf https://eprints.ums.edu.my/id/eprint/21431/2/Comparison%20of%20imbalanced%20data%20treatments1.pdf https://eprints.ums.edu.my/id/eprint/21431/ https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf |
id |
my.ums.eprints.21431 |
---|---|
record_format |
eprints |
spelling |
my.ums.eprints.21431 2021-06-17T02:32:21Z https://eprints.ums.edu.my/id/eprint/21431/ Comparison of imbalanced data treatments: a case study on cleft lip and palate data Zaturrawiah Ali Omar Chin, Su Na Siti Rahayu Mohd. Hashim Norhafiza Hamzah QA Mathematics SD Forestry This study investigated whether the resampling and penalized approaches to balancing data would improve the classification model produced by the random forests learning algorithm on a small and imbalanced Cleft Lip and Palate (CLP) patients’ dataset. Balanced Random Forest (BRF), Synthetic Minority Over-sampling Technique (SMOTE) with Random Forests (RF), and Weighted Random Forest (WRF) were compared on the CLP dataset using the area under the curve (AUC) and the tradeoff between sensitivity and specificity. The results showed no difference in predictive ability between the untreated model (RF), the oversampling treatment (SMOTE+RF) and the penalty treatment (WRF), but poor performance for the downsampling treatment (BRF). The small training and test sample sizes contributed to these results and severely affected the performance of the classifier used for each treatment. The SMOTE+RF oversampling method, however, proved promising for the CLP dataset. Faculty of Science and Natural Resources 2020 Proceedings PeerReviewed text en https://eprints.ums.edu.my/id/eprint/21431/1/Comparison%20of%20imbalanced%20data%20treatments.pdf text en https://eprints.ums.edu.my/id/eprint/21431/2/Comparison%20of%20imbalanced%20data%20treatments1.pdf Zaturrawiah Ali Omar and Chin, Su Na and Siti Rahayu Mohd. Hashim and Norhafiza Hamzah (2020) Comparison of imbalanced data treatments: a case study on cleft lip and palate data. https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf |
institution |
Universiti Malaysia Sabah |
building |
UMS Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sabah |
content_source |
UMS Institutional Repository |
url_provider |
http://eprints.ums.edu.my/ |
language |
English |
topic |
QA Mathematics SD Forestry |
spellingShingle |
QA Mathematics SD Forestry Zaturrawiah Ali Omar Chin, Su Na Siti Rahayu Mohd. Hashim Norhafiza Hamzah Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
description |
This study investigated whether the resampling and penalized approaches to balancing data would improve the classification model produced by the random forests learning algorithm on a small and imbalanced Cleft Lip and Palate (CLP) patients’ dataset. Balanced Random Forest (BRF), Synthetic Minority Over-sampling Technique (SMOTE) with Random Forests (RF), and Weighted Random Forest (WRF) were compared on the CLP dataset using the area under the curve (AUC) and the tradeoff between sensitivity and specificity. The results showed no difference in predictive ability between the untreated model (RF), the oversampling treatment (SMOTE+RF) and the penalty treatment (WRF), but poor performance for the downsampling treatment (BRF). The small training and test sample sizes contributed to these results and severely affected the performance of the classifier used for each treatment. The SMOTE+RF oversampling method, however, proved promising for the CLP dataset. |
format |
Proceedings |
author |
Zaturrawiah Ali Omar Chin, Su Na Siti Rahayu Mohd. Hashim Norhafiza Hamzah |
author_facet |
Zaturrawiah Ali Omar Chin, Su Na Siti Rahayu Mohd. Hashim Norhafiza Hamzah |
author_sort |
Zaturrawiah Ali Omar |
title |
Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
title_short |
Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
title_full |
Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
title_fullStr |
Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
title_full_unstemmed |
Comparison of imbalanced data treatments: a case study on cleft lip and palate data |
title_sort |
comparison of imbalanced data treatments: a case study on cleft lip and palate data |
publisher |
Faculty of Science and Natural Resources |
publishDate |
2020 |
url |
https://eprints.ums.edu.my/id/eprint/21431/1/Comparison%20of%20imbalanced%20data%20treatments.pdf https://eprints.ums.edu.my/id/eprint/21431/2/Comparison%20of%20imbalanced%20data%20treatments1.pdf https://eprints.ums.edu.my/id/eprint/21431/ https://www.ums.edu.my/fssa/wp-content/uploads/2020/12/PROCEEDINGS-BOOK-ST-2020-e-ISSN.pdf |
_version_ |
1760229842184306688 |
score |
13.189132 |
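
The description above evaluates each treatment by the tradeoff between sensitivity and specificity, and names downsampling as one way of balancing the classes. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation: the function names, the toy labels and the single one-off downsampling step are all assumptions made for illustration (a Balanced Random Forest would typically resample within each tree rather than once up front).

```python
import random
from collections import Counter


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Sensitivity: share of true positives that were detected.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of true negatives that were correctly rejected.
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def downsample_majority(labels, seed=0):
    """Return sorted row indices of a class-balanced subset.

    Every class is randomly cut down to the size of the rarest class.
    This is a hypothetical helper, used here only to illustrate downsampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    n_min = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)


# Made-up toy labels (2 positives, 8 negatives); not data from the study.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
balanced_idx = downsample_majority(y_true)
print(sens, spec, balanced_idx)
```

With so few minority cases, downsampling leaves only a handful of rows to train on, which is consistent with the description's remark that small sample sizes affected every treatment.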