Optimal Weighted Learning of PCA and PLS for Multicollinearity Discriminators and Imbalanced Groups in Big Data (S/O: 13224)

Main Authors: Mahat, Nor Idayu, Engku Abu Bakar, Engku Muhammad Nazri, Zakaria, Ammar, Mohd Nazir, Mohd Amril Nurman, Misiran, Masnita
Format: Monograph
Language: English
Published: UUM
Online Access: https://repo.uum.edu.my/id/eprint/31770/1/13224.pdf
https://repo.uum.edu.my/id/eprint/31770/
Description
Summary: This study developed a statistical classification algorithm that assigns a future observation to one of several predetermined groups when the measured data face two major threats: (i) multicollinearity among the measured variables and (ii) imbalanced groups. The algorithm weights the contribution of each of the n objects to explaining the separation between groups. The weights are then used together with either Principal Component Analysis (PCA) or Partial Least Squares (PLS) to tackle the collinearity among variables. Next, the weighted and transformed features are used to train a Linear Discriminant Function (LDA) and to evaluate the constructed rule. The algorithm is structured within k-fold cross-validation in an attempt to minimise bias in the estimated classification performance, which is measured by the error rate. Simulations on both bivariate and multivariate cases show promising results: the weighted PCA on LDA and the weighted PLS on LDA perform better than the traditional LDA, kernel discriminant, and PCA+LDA methods. A closer investigation of the minority group using sensitivity gives some evidence of how competitive the two proposed methods are, although they perform similarly if the groups are well separated. Evidence from the real data sets gives results similar to the simulated ones. Hence, both the weighted PCA on LDA and the weighted PLS on LDA can be recommended for discriminating imbalanced groups with correlated variables.
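The pipeline described in the summary (weight the objects, reduce collinearity with PCA, train LDA, evaluate under k-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' algorithm: the summary does not state how the weights are computed, so the inverse-group-size weighting in `class_balance_weights` is a hypothetical stand-in. The two-group restriction, the equal-prior midpoint cutoff, the stratified fold assignment, the synthetic data in `make_data`, and all function names are likewise assumptions made for illustration. Only the PCA variant is shown.

```python
import numpy as np

def class_balance_weights(y):
    # ASSUMPTION: weight each object by 1 / (size of its group), normalised to sum to 1.
    # The summary does not specify the weighting scheme; this is a placeholder.
    classes, counts = np.unique(y, return_counts=True)
    w = np.empty(len(y), dtype=float)
    for c, n_c in zip(classes, counts):
        w[y == c] = 1.0 / n_c
    return w / w.sum()

def weighted_pca(X, w, n_components):
    # Eigen-decomposition of the weighted covariance matrix about the weighted mean.
    mean = w @ X
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def fit_lda(Z, y):
    # Two-group linear discriminant with pooled covariance.
    # ASSUMPTION: equal priors, so the cutoff is the midpoint of the group means.
    classes = np.unique(y)
    m0 = Z[y == classes[0]].mean(axis=0)
    m1 = Z[y == classes[1]].mean(axis=0)
    S = np.zeros((Z.shape[1], Z.shape[1]))
    for c in classes:
        Zc = Z[y == c] - Z[y == c].mean(axis=0)
        S += Zc.T @ Zc
    S /= len(y) - 2
    a = np.linalg.solve(S, m1 - m0)
    cutoff = a @ (m0 + m1) / 2.0
    return classes, a, cutoff

def predict_lda(model, Z):
    classes, a, cutoff = model
    return np.where(Z @ a > cutoff, classes[1], classes[0])

def cv_error_and_sensitivity(X, y, minority, n_components=1, k=5, seed=0):
    rng = np.random.default_rng(seed)
    # Stratified folds (an assumption) so every training set contains both groups.
    fold_id = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == c))
        fold_id[members] = np.arange(len(members)) % k
    pred = np.empty_like(y)
    for f in range(k):
        test = fold_id == f
        train = ~test
        # Weights and PCA are estimated on the training fold only.
        w = class_balance_weights(y[train])
        mean, V = weighted_pca(X[train], w, n_components)
        model = fit_lda((X[train] - mean) @ V, y[train])
        pred[test] = predict_lda(model, (X[test] - mean) @ V)
    error_rate = float(np.mean(pred != y))
    sensitivity = float(np.mean(pred[y == minority] == minority))
    return error_rate, sensitivity

def make_data(seed=0):
    # Synthetic bivariate example: highly correlated variables, 200 vs 20 objects.
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.95], [0.95, 1.0]])
    X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
    X1 = rng.multivariate_normal([2.0, 2.0], cov, size=20)
    X = np.vstack([X0, X1])
    y = np.array([0] * 200 + [1] * 20)
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    err, sens = cv_error_and_sensitivity(X, y, minority=1)
    print(f"error rate: {err:.3f}, minority sensitivity: {sens:.3f}")
```

Estimating the weights and the PCA projection inside each training fold, rather than on the full data, keeps the held-out fold from influencing the fitted rule, which is in the spirit of the summary's use of cross-validation to limit bias in the error rate. A PLS variant would replace `weighted_pca` with a supervised projection; that step is omitted here because the summary gives no detail on how the weights enter PLS.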