Feature selection by mutual information: robust ranking on high-dimension low-sample-size data

Bibliographic Details
Main Author:	Chin, Fung Yuen
Format:	Final Year Project / Dissertation / Thesis
Published:	2024
Subjects:	HA Statistics; Q Science (General)
Online Access:	http://eprints.utar.edu.my/7067/1/THE_1002128.pdf ; http://eprints.utar.edu.my/7067/

id	my-utar-eprints.7067
record_format	eprints
spelling	my-utar-eprints.7067 (2025-01-19T01:59:21Z); Chin, Fung Yuen (2024) Feature selection by mutual information: robust ranking on high-dimension low-sample-size data. PhD thesis, UTAR. NonPeerReviewed; application/pdf; http://eprints.utar.edu.my/7067/1/THE_1002128.pdf ; http://eprints.utar.edu.my/7067/
institution	Universiti Tunku Abdul Rahman
building	UTAR Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tunku Abdul Rahman
content_source	UTAR Institutional Repository
url_provider	http://eprints.utar.edu.my
topic	HA Statistics; Q Science (General)
description	Feature selection is the process of choosing a subset of relevant features, and discarding unnecessary ones, for use in building a predictive model. The current benchmark for the data set is obtained using all features, including redundant and noisy ones. This research therefore proposes an optimal baseline for the data set based on a feature ranking method; the same procedure also yields the total number of features required, which serves as a guideline for how many features a feature selection method should retain. In addition, high-dimensional data make feature selection harder because of the curse of dimensionality. To address this problem, a robust feature selection algorithm named ranked mutual information with support vector machine (rMI-SVM) is used. It can be applied to data with missing values regardless of the linearity of the data set, and it requires no additional parameters or preset number of features. The features selected by rMI-SVM help avoid overfitting because each chosen candidate feature contributes new information to the predictive model. Receiver operating characteristic (ROC) curves were plotted to compare the sensitivity of the model built with rMI-SVM against the regression method using the same number of features, and a Z-score graph was plotted to confirm that the features chosen by rMI-SVM were not selected by chance. The experimental results show that the proposed method selects a compact subset of features that performs better than both the benchmark for the data set and the optimal baseline proposed in this study. The biological meaning of the selected features confirms that they are related to the relevant disease.
format	Final Year Project / Dissertation / Thesis
author	Chin, Fung Yuen
title	Feature selection by mutual information: robust ranking on high-dimension low-sample-size data
publishDate	2024
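The abstract describes three steps: ranking features by mutual information (MI), keeping a candidate only when it adds new information to an SVM-based model, and a Z-score check that the chosen features are not chance picks. The rMI-SVM procedure itself is not given in this record, so the following is only a minimal, dependency-free sketch of that general pattern, not the thesis's method. Assumptions: features are already discretized to small integer codes, MI is a plug-in estimate, a leave-one-out nearest-centroid classifier is swapped in for the SVM, and the Z-score is taken against a label-permutation null. All function names are hypothetical.

```python
"""Hypothetical sketch of MI-ranked forward selection with a permutation
Z-score check. NOT the thesis's rMI-SVM: a nearest-centroid classifier
stands in for the SVM, and features are assumed to be discrete codes."""
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))).
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def column(X, j):
    """Values of feature j across all samples (rows of X)."""
    return [row[j] for row in X]


def rank_by_mi(X, y):
    """Feature indices sorted by decreasing MI with the class label."""
    scores = [mutual_information(column(X, j), y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def loo_centroid_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`
    (a dependency-free stand-in for an SVM)."""
    correct = 0
    for i in range(len(X)):
        groups = {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k != i:  # hold out sample i
                groups.setdefault(label, []).append([row[j] for j in feats])
        centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                     for label, rows in groups.items()}
        point = [X[i][j] for j in feats]
        pred = min(centroids, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y):
    """Walk the MI ranking; keep a feature only if it raises LOO accuracy."""
    selected, best = [], 0.0
    for j in rank_by_mi(X, y):
        score = loo_centroid_accuracy(X, y, selected + [j])
        if score > best:
            selected, best = selected + [j], score
    return selected, best


def permutation_z(xs, y, n_perm=200, seed=0):
    """Z-score of a feature's MI against MI under randomly permuted labels."""
    rng = random.Random(seed)
    observed = mutual_information(xs, y)
    labels = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(mutual_information(xs, labels))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((v - mean) ** 2 for v in null) / (n_perm - 1))
    # The Z-score is undefined when the null distribution is constant.
    return (observed - mean) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    X = [list(r) for r in zip([0, 0, 0, 0, 1, 1, 1, 1],   # tracks the label
                              [0, 1, 0, 1, 0, 1, 0, 1],   # independent noise
                              [0, 0, 0, 1, 1, 1, 1, 0])]  # partly informative
    print(rank_by_mi(X, y))                    # [0, 2, 1]
    print(forward_select(X, y))                # ([0], 1.0)
    print(permutation_z(column(X, 0), y) > 2)  # True
```

On the toy data the label-tracking feature ranks first and is the only one kept, because neither later candidate raises the leave-one-out accuracy. For real high-dimension low-sample-size data one would substitute an actual SVM and an MI estimator suited to continuous features; how rMI-SVM handles missing values is not described in this record.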

