Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]



Bibliographic Details
Main Authors: Abdul Rahim, Amirah Hazwani, Abdul Rashid, Nurazlina, Ahmad, Abd-Razak, Shamsuddin, Norin Rahayu
Format: Conference or Workshop Item
Language: English
Published: 2021
Online Access: https://ir.uitm.edu.my/id/eprint/56212/1/56212.pdf
https://ir.uitm.edu.my/id/eprint/56212/
Description
Summary: Most classifiers used in bankruptcy studies have little difficulty with a balanced dataset of non-bankrupt and bankrupt firms, and model performance is usually evaluated through the accuracy rate. However, accuracy is not an appropriate measure when the dataset is imbalanced, so sensitivity and precision were used instead to measure the performance of the classifier. This study employed three sampling strategies to deal with imbalanced datasets: oversampling, undersampling, and SMOTE (Synthetic Minority Oversampling Technique). The aim of this research is to examine how different sampling methods affect the performance of a bankruptcy prediction model built on highly imbalanced real data. The subjects of the research were SMEs in the storage and transportation business. The sample comprises 9190 firms, of which 0.84% are bankrupt and 99.16% are non-bankrupt. Partial Least Squares Discriminant Analysis (PLS-DA) was selected as the classifier. The findings suggest that, with PLS-DA, SMOTE increases the classification probability for an imbalanced dataset, whereas neither oversampling nor undersampling improved the PLS-DA results.
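The summary contrasts accuracy with sensitivity and precision and names three resampling strategies. The following is a minimal, standard-library-only Python sketch of those ideas, not the authors' code: the function names, the toy data (992 hypothetical non-bankrupt and 8 hypothetical bankrupt firms with two made-up features), the fixed random seeds, and the default of k=5 neighbours for the SMOTE-style sampler are all illustrative assumptions. It does not implement PLS-DA, the classifier used in the study.

```python
import math
import random


def metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, precision), treating `positive` (bankrupt) as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = len(y_true) - tp - fn - fp
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of bankrupt firms that are caught
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of "bankrupt" predictions that are correct
    return accuracy, sensitivity, precision


def _split(y, minority):
    """Return (minority indices, majority indices)."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    return minority_idx, majority_idx


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split(y, minority)
    extra = [rng.choice(minority_idx) for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority=1, seed=0):
    """Keep a random subset of majority rows the same size as the minority class."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split(y, minority)
    idx = rng.sample(majority_idx, len(minority_idx)) + minority_idx
    return [X[i] for i in idx], [y[i] for i in idx]


def smote(X, y, minority=1, k=5, seed=0):
    """SMOTE-style sketch: interpolate between a minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split(y, minority)
    points = [X[i] for i in minority_idx]
    if len(points) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(points) - 1)
    synthetic = []
    for _ in range(len(majority_idx) - len(points)):
        i = rng.randrange(len(points))
        base = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 992 non-bankrupt (0) and 8 bankrupt (1) firms, two made-up ratios each
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(992)]
    X += [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(8)]
    y = [0] * 992 + [1] * 8
    # A "classifier" that always predicts non-bankrupt: accuracy 0.992, sensitivity 0.000
    acc, sens, prec = metrics(y, [0] * len(y))
    print(f"baseline: accuracy={acc:.3f} sensitivity={sens:.3f} precision={prec:.3f}")
    samplers = [("oversampling", random_oversample), ("undersampling", random_undersample), ("SMOTE", smote)]
    for name, sampler in samplers:
        _, ys = sampler(X, y)
        print(f"{name}: {ys.count(0)} non-bankrupt, {ys.count(1)} bankrupt")
```

The baseline line shows why accuracy misleads on imbalanced data: a model that never predicts bankruptcy still scores 0.992 accuracy while catching no bankrupt firms. Oversampling and SMOTE both grow the bankrupt class to 992 rows (by duplication and by interpolation, respectively), while undersampling shrinks the non-bankrupt class to 8 rows. In practice, resampling is applied to the training split only, so that the evaluation set keeps the real class ratio.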