Kernel partial least square regression with high resistance to multiple outliers and bad leverage points on near-infrared spectral data analysis

Multivariate statistical methods such as partial least squares regression (PLSR) are commonly used to handle the high-dimensional data space of near-infrared (NIR) spectral datasets. PLSR is useful for tackling the multicollinearity and heteroscedasticity problem that can be...

Bibliographic Details
Main Authors: Silalahi, Divo Dharma; Midi, Habshah; Arasan, Jayanthi; Mustafa, Mohd Shafie; Caliman, Jean-Pierre
Format: Article
Published: MDPI AG 2021
Online Access: http://psasir.upm.edu.my/id/eprint/93966/
https://www.mdpi.com/2073-8994/13/4/547
Citation: Silalahi, Divo Dharma and Midi, Habshah and Arasan, Jayanthi and Mustafa, Mohd Shafie and Caliman, Jean-Pierre (2021) Kernel partial least square regression with high resistance to multiple outliers and bad leverage points on near-infrared spectral data analysis. Symmetry, 13 (4). pp. 1-23. ISSN 2073-8994. MDPI AG, 2021-03-26. Article, peer reviewed. https://www.mdpi.com/2073-8994/13/4/547 DOI: 10.3390/sym13040547
Institution: Universiti Putra Malaysia
Building: UPM Library
Collection: Institutional Repository
Continent: Asia
Country: Malaysia
Content Provider: Universiti Putra Malaysia
Content Source: UPM Institutional Repository
URL Provider: http://psasir.upm.edu.my/
Description: Multivariate statistical methods such as partial least squares regression (PLSR) are commonly used to handle the high-dimensional data space of near-infrared (NIR) spectral datasets. PLSR is useful for tackling the multicollinearity and heteroscedasticity problems commonly found in such data. When the original input space has a nonlinear structure, however, the classical PLSR model might not be appropriate. In addition, contamination by multiple outliers and high leverage points (HLPs) can further damage the model. HLPs generally comprise both good leverage points (GLPs) and bad leverage points (BLPs). Removing the BLPs is warranted because they have a significant impact on the parameter estimates and can slow convergence. The GLPs, on the other hand, improve the efficiency of model calibration and should not be eliminated. This study introduces two robust alternatives to the existing kernel partial least squares (KPLS) regression: the kernel partial robust GM6-estimator (KPRGM6) regression and the kernel partial robust modified GM6-estimator (KPRMGM6) regression. Nonlinearity in PLSR is handled through kernel-based learning, which nonlinearly maps the original input data matrix into a high-dimensional feature space corresponding to a reproducing kernel Hilbert space (RKHS). To increase robustness, improved GM6 estimators are combined with the nonlinear PLSR. In an investigation using several artificial dataset scenarios from Monte Carlo simulations and two NIR spectral datasets, the proposed robust KPRMGM6 was found to be superior to the robust KPRGM6 and the non-robust KPLS.
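As a rough illustration of the kernel-based step the abstract describes, the following is a minimal sketch of plain, non-robust KPLS in the common NIPALS style. It is not the authors' KPRGM6 or KPRMGM6 procedure, and it contains no GM6 weighting or outlier handling. The function names (`rbf_kernel`, `kpls_fit`, `kpls_predict`), the choice of an RBF kernel, the `gamma` value, and the toy sine data are all assumptions made for this example; the record does not state which kernel or settings the study used.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel between the rows of A and B (assumed kernel choice).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpls_fit(K, Y, n_components, n_iter=100, tol=1e-10):
    # Non-robust kernel PLS on a training Gram matrix K (n x n).
    n = K.shape[0]
    Y = np.asarray(Y, dtype=float).reshape(n, -1)
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                      # centre the data in feature space
    Kd, Yd = Kc.copy(), Yc.copy()       # deflated copies used in the loop
    T = np.zeros((n, n_components))     # X-side score vectors
    U = np.zeros((n, n_components))     # Y-side score vectors
    for a in range(n_components):
        u = Yd[:, [0]] / np.linalg.norm(Yd[:, 0])
        for _ in range(n_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            c = Yd.T @ t
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        T[:, [a]], U[:, [a]] = t, u
        P = np.eye(n) - t @ t.T         # remove the extracted component
        Kd = P @ Kd @ P
        Yd = P @ Yd
    # Dual regression coefficients: alpha = U (T' Kc U)^{-1} T' Yc
    alpha = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ Yc)
    return alpha, y_mean

def kpls_predict(K_new, K_train, alpha, y_mean):
    # K_new holds kernel values between new rows and training rows (m x n).
    K_nc = (K_new
            - K_train.mean(axis=0, keepdims=True)
            - K_new.mean(axis=1, keepdims=True)
            + K_train.mean())
    return K_nc @ alpha + y_mean

# Usage on toy data (a noise-free sine curve, chosen only for illustration).
x = np.linspace(-3.0, 3.0, 60)[:, None]
y = np.sin(x[:, 0])
K = rbf_kernel(x, x, gamma=1.0)
alpha, y_mean = kpls_fit(K, y, n_components=3)
y_hat = kpls_predict(K, K, alpha, y_mean)[:, 0]
print("training RMSE:", np.sqrt(np.mean((y_hat - y) ** 2)))
```

Because every step works on the Gram matrix rather than on explicit feature vectors, the nonlinear mapping into the RKHS never has to be computed directly. A robust variant along the lines the abstract describes would additionally need to downweight outliers and BLPs, which this sketch does not attempt.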