Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]

Most classifiers used in bankruptcy studies encounter less difficulty when the data set is balanced between non-bankrupt and bankrupt firms. Model performance is usually evaluated through the accuracy rate. However, accuracy rate is not an appropriate measurement when dealing with imbalanced distrib...

Bibliographic Details
Main Authors: Abdul Rahim, Amirah Hazwani, Abdul Rashid, Nurazlina, Ahmad, Abd-Razak, Shamsuddin, Norin Rahayu
Format: Conference or Workshop Item
Language: English
Published: 2021
Subjects: HG Finance; Financial engineering
Online Access: https://ir.uitm.edu.my/id/eprint/56212/1/56212.pdf
https://ir.uitm.edu.my/id/eprint/56212/
id my.uitm.ir.56212
record_format eprints
spelling Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]. (2021) In: e-Proceedings of the 5th International Conference on Computing, Mathematics and Statistics (iCMS 2021), 4-5 August 2021. (Submitted)
institution Universiti Teknologi Mara
building Tun Abdul Razak Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Mara
content_source UiTM Institutional Repository
url_provider http://ir.uitm.edu.my/
language English
topic HG Finance
Financial engineering
description Most classifiers used in bankruptcy studies encounter less difficulty when the data set is balanced between non-bankrupt and bankrupt firms. Model performance is usually evaluated through the accuracy rate. However, the accuracy rate is not an appropriate measure when the class distribution of the data set is imbalanced, so sensitivity and precision were used instead to measure the performance of the classifier. This study employed three sampling strategies to deal with imbalanced datasets: oversampling, undersampling, and SMOTE (Synthetic Minority Oversampling Technique). The aim of this research is to examine how different sampling methods affect the performance of a bankruptcy prediction model built on highly imbalanced real data. The subjects were SMEs in the storage and transportation business. The sample comprises 9190 firms, of which 0.84% are bankrupt and 99.16% are non-bankrupt. Partial Least Squares-Discriminant Analysis (PLS-DA) was selected as the classifier. The findings suggest that, with PLS-DA as the classifier, SMOTE increases the probability of correct classification for an imbalanced dataset, whereas neither oversampling nor undersampling improved the PLS-DA results.
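The ideas in the description can be illustrated with a small standard-library Python sketch. It is not the authors' implementation: the record does not show their code, the PLS-DA classifier is not reproduced here, and the toy data (992 majority and 8 minority points), the helper names, and the neighbour count `k=3` are all assumptions made up for illustration. The sketch shows why accuracy misleads on imbalanced data and how the three sampling strategies differ.

```python
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (bankrupt) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def sensitivity(y_true, y_pred):
    """Share of truly bankrupt firms that were flagged (recall)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def precision(y_true, y_pred):
    """Share of flagged firms that are truly bankrupt (0.0 if none flagged)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority rows until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority rows, without replacement."""
    return rng.sample(majority, target_size)

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical toy data: two numeric features per firm.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(992)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
y_true = [0] * len(majority) + [1] * len(minority)
y_pred = [0] * len(y_true)  # a useless model that never predicts bankruptcy

print("accuracy:", accuracy(y_true, y_pred))        # 0.992, yet no bankrupt firm is found
print("sensitivity:", sensitivity(y_true, y_pred))  # 0.0
print("precision:", precision(y_true, y_pred))      # 0.0 by the convention above

over = random_oversample(minority, len(majority), rng)
under = random_undersample(majority, len(minority), rng)
synth = smote_like(minority, len(majority) - len(minority), k=3, rng=rng)
print(len(over), len(under), len(minority) + len(synth))
```

Oversampling only repeats existing minority rows and undersampling discards majority rows, whereas the SMOTE-style step generates new points between existing minority rows. In practice, resampling would normally be applied to the training split only, so that evaluation still reflects the original class distribution.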
format Conference or Workshop Item
author Abdul Rahim, Amirah Hazwani
Abdul Rashid, Nurazlina
Ahmad, Abd-Razak
Shamsuddin, Norin Rahayu
title Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]