A modified reweighted fast consistent and high-breakdown estimator for high-dimensional datasets

Outlier detection and classification algorithms play a critical role in statistical analysis. The reweighted fast consistent and high breakdown point (RFCH) estimator is an outlier-resistant estimator of multivariate location and dispersion. Still, some difficulties hamper the application of the RFC...

Bibliographic Details
Main Authors: A. Baba, Ishaq; Midi, Habshah; June, Leong W.; Ibragimov, Gafurjan
Format: Article
Language: English
Published: Elsevier 2024
Online Access: http://psasir.upm.edu.my/id/eprint/112070/1/1-s2.0-S2772662224000286-main.pdf
http://psasir.upm.edu.my/id/eprint/112070/
https://www.sciencedirect.com/science/article/pii/S2772662224000286?via%3Dihub
id my.upm.eprints.112070
record_format eprints
spelling my.upm.eprints.112070 2024-10-28T02:34:06Z http://psasir.upm.edu.my/id/eprint/112070/ Elsevier 2024 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/112070/1/1-s2.0-S2772662224000286-main.pdf
A. Baba, Ishaq and Midi, Habshah and June, Leong W. and Ibragimov, Gafurjan (2024) A modified reweighted fast consistent and high-breakdown estimator for high-dimensional datasets. Decision Analytics Journal, 10. art. no. 100424. pp. 1-11. ISSN 2772-6622 https://www.sciencedirect.com/science/article/pii/S2772662224000286?via%3Dihub doi:10.1016/j.dajour.2024.100424
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Outlier detection and classification algorithms play a critical role in statistical analysis. The reweighted fast consistent and high breakdown point (RFCH) estimator is an outlier-resistant estimator of multivariate location and dispersion. However, several difficulties hamper its application in high-dimensional settings, the main one being that the RFCH cannot be applied when the dimension exceeds the sample size. We propose a modified reweighted fast consistent and high breakdown point (MRFCH) estimator that is applicable to high-dimensional settings. The basic idea is to modify the Mahalanobis distance so that it uses only the diagonal elements of the scatter matrix in the computation of the RFCH algorithm. The proposed method preserves the robustness properties of the RFCH estimator. The result is a robust and efficient high-dimensional procedure for computing location and scatter matrix estimates, together with a powerful outlier detection method. One of the main advantages of the proposed procedure over the existing RFCH is that it can be applied to both low- and high-dimensional datasets. In real-life datasets and a simulation study, the proposed method showed promising results in computational time and irrespective of sample size, dimension, amount of contamination, and distance of the contamination. Thus, the proposed algorithm can be applied to the problem of regression outliers in high-dimensional data (HDD) and can serve as a better alternative to the minimum regularized covariance determinant (MRCD) estimator. © 2024 The Author(s)
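The diagonal-distance idea in the description can be illustrated with a short sketch. This is not the MRFCH algorithm from the article: the coordinatewise median and the scaled MAD are used here only as assumed stand-ins for robust location and scale, and the function name `diagonal_distances` and the simulated data are hypothetical. The sketch shows that a distance built from the diagonal of the scatter matrix stays computable when the dimension exceeds the sample size, whereas the full sample covariance is singular in that case.

```python
import numpy as np


def diagonal_distances(X, location, scatter_diag):
    """Squared distances that use only the diagonal of the scatter matrix.

    Equivalent to a Mahalanobis distance whose scatter matrix has had its
    off-diagonal entries set to zero, so no matrix inversion is needed.
    """
    return (((X - location) ** 2) / scatter_diag).sum(axis=1)


rng = np.random.default_rng(0)
n, p = 20, 50                      # more variables than observations
X = rng.normal(size=(n, p))
X[:3] += 10.0                      # plant three shifted outliers in rows 0-2

# Assumed stand-ins for robust estimates (not the RFCH estimates themselves):
# coordinatewise median for location, scaled MAD for per-variable scale.
location = np.median(X, axis=0)
scale = 1.4826 * np.median(np.abs(X - location), axis=0)

d2 = diagonal_distances(X, location, scale ** 2)

# The full sample covariance has rank at most n - 1 < p, so it cannot be
# inverted for a classical Mahalanobis distance.
rank = np.linalg.matrix_rank(np.cov(X, rowvar=False))

# Rows with the three largest diagonal distances.
flagged = np.argsort(d2)[-3:]
print("covariance rank:", rank, "of", p)
print("rows with largest distances:", sorted(flagged.tolist()))
```

Dropping the off-diagonal entries ignores correlations between variables, which is the price paid for a distance that remains defined in any dimension; how the article's procedure reweights and calibrates these distances is not covered by this sketch.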
format Article
author A. Baba, Ishaq
Midi, Habshah
June, Leong W.
Ibragimov, Gafurjan
title A modified reweighted fast consistent and high-breakdown estimator for high-dimensional datasets
publisher Elsevier
publishDate 2024
url http://psasir.upm.edu.my/id/eprint/112070/1/1-s2.0-S2772662224000286-main.pdf
http://psasir.upm.edu.my/id/eprint/112070/
https://www.sciencedirect.com/science/article/pii/S2772662224000286?via%3Dihub