Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]
Most classifiers used in bankruptcy studies have less difficulty when the data set is balanced between non-bankrupt and bankrupt firms. Such studies typically evaluate model performance through the accuracy rate. However, the accuracy rate is not an appropriate measure when dealing with an imbalanced distrib...
Main Authors: | Abdul Rahim, Amirah Hazwani; Abdul Rashid, Nurazlina; Ahmad, Abd-Razak; Shamsuddin, Norin Rahayu |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2021 |
Subjects: | HG Finance; Financial engineering |
Online Access: | https://ir.uitm.edu.my/id/eprint/56212/1/56212.pdf https://ir.uitm.edu.my/id/eprint/56212/ |
id |
my.uitm.ir.56212 |
---|---|
record_format |
eprints |
spelling |
my.uitm.ir.56212 2023-03-12T23:44:58Z https://ir.uitm.edu.my/id/eprint/56212/ Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]. Abdul Rahim, Amirah Hazwani; Abdul Rashid, Nurazlina; Ahmad, Abd-Razak; Shamsuddin, Norin Rahayu. HG Finance; Financial engineering. 2021. Conference or Workshop Item. PeerReviewed. text. en. https://ir.uitm.edu.my/id/eprint/56212/1/56212.pdf (2021) In: e-Proceedings of the 5th International Conference on Computing, Mathematics and Statistics (iCMS 2021), 4-5 August 2021. (Submitted) |
institution |
Universiti Teknologi Mara |
building |
Tun Abdul Razak Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Mara |
content_source |
UiTM Institutional Repository |
url_provider |
http://ir.uitm.edu.my/ |
language |
English |
topic |
HG Finance; Financial engineering |
description |
Most classifiers used in bankruptcy studies have less difficulty when the data set is balanced between non-bankrupt and bankrupt firms. Such studies typically evaluate model performance through the accuracy rate. However, the accuracy rate is not an appropriate measure when the data set has an imbalanced distribution, so sensitivity and precision were used instead to measure classifier performance. This study employed three sampling strategies to deal with imbalanced datasets: oversampling, undersampling, and SMOTE (Synthetic Minority Oversampling Technique). The aim of this research is to examine how different sampling methods affect the performance of a bankruptcy prediction model using highly imbalanced real data. The subjects of the research were SMEs in the storage and transportation business. The sample comprises 9190 firms, of which 0.84% are bankrupt and 99.16% are non-bankrupt. Partial Least Squares-Discriminant Analysis (PLS-DA) was selected as the classifier. The findings suggest that, with PLS-DA, SMOTE increases the classification probability for an imbalanced dataset. By contrast, neither oversampling nor undersampling improved the PLS-DA results. |
format |
Conference or Workshop Item |
author |
Abdul Rahim, Amirah Hazwani; Abdul Rashid, Nurazlina; Ahmad, Abd-Razak; Shamsuddin, Norin Rahayu |
title |
Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.] |
publishDate |
2021 |
url |
https://ir.uitm.edu.my/id/eprint/56212/1/56212.pdf https://ir.uitm.edu.my/id/eprint/56212/ |
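The description's point about accuracy, and the three sampling strategies it names, can be illustrated with a minimal Python sketch. This is not the authors' code and does not implement PLS-DA. The count of 77 bankrupt firms is an assumption (about 0.84% of 9190; the record gives only percentages), all function names are hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the bankrupt class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def sensitivity(y_true, y_pred):
    """Share of truly bankrupt firms that were flagged (recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    """Share of flagged firms that are truly bankrupt."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

# ASSUMPTION: 77 bankrupt firms out of 9190 (roughly 0.84%).
N_FIRMS, N_BANKRUPT = 9190, 77
y_true = [1] * N_BANKRUPT + [0] * (N_FIRMS - N_BANKRUPT)
y_naive = [0] * N_FIRMS  # a "classifier" that never predicts bankruptcy

def random_oversample(minority, majority, rng):
    """Duplicate random minority rows until both classes are equal in size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

def random_undersample(minority, majority, rng):
    """Keep a random majority subset the same size as the minority class."""
    return minority, rng.sample(majority, len(minority))

def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

if __name__ == "__main__":
    print("accuracy   ", round(accuracy(y_true, y_naive), 4))
    print("sensitivity", sensitivity(y_true, y_naive))
    print("precision  ", precision(y_true, y_naive))
```

Under these assumed counts, the never-bankrupt baseline is correct for 9113 of 9190 firms while detecting no bankruptcies at all, which is why sensitivity and precision are the more informative measures here. In practice, resampling is usually applied to the training split only, so that evaluation still reflects the original class distribution.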