Staff View: Bayesian random forests for high-dimensional classification and regression with complete and incomplete microarray data

Bayesian random forests for high-dimensional classification and regression with complete and incomplete microarray data

Random Forests (RF) are ensemble-of-trees methods widely used for prediction, interpretation and variable selection. Their wide acceptance can be attributed to their robustness to high dimensionality. However, when the high-dimensional data are sparse, RF procedures are ineffi...

Bibliographic Details
Main Author:	Oyebayo, Olaniran Ridwan
Format:	Thesis
Language:	English
Published:	2018
Subjects:	QA76.75-76.765 Computer software
Online Access:	http://eprints.uthm.edu.my/326/1/24p%20OLANIRAN%20RIDWAN%20OYEBAYO.pdf http://eprints.uthm.edu.my/326/2/OLANIRAN%20RIDWAN%20OYEBAYO%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/326/3/OLANIRAN%20RIDWAN%20OYEBAYO%20WATERMARK.pdf http://eprints.uthm.edu.my/326/

id	my.uthm.eprints.326
record_format	eprints
spelling	my.uthm.eprints.326 2021-07-21T04:51:58Z http://eprints.uthm.edu.my/326/ Bayesian random forests for high-dimensional classification and regression with complete and incomplete microarray data Oyebayo, Olaniran Ridwan QA76.75-76.765 Computer software 2018-11 Thesis NonPeerReviewed text en http://eprints.uthm.edu.my/326/1/24p%20OLANIRAN%20RIDWAN%20OYEBAYO.pdf text en http://eprints.uthm.edu.my/326/2/OLANIRAN%20RIDWAN%20OYEBAYO%20COPYRIGHT%20DECLARATION.pdf text en http://eprints.uthm.edu.my/326/3/OLANIRAN%20RIDWAN%20OYEBAYO%20WATERMARK.pdf Oyebayo, Olaniran Ridwan (2018) Bayesian random forests for high-dimensional classification and regression with complete and incomplete microarray data. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
institution	Universiti Tun Hussein Onn Malaysia
building	UTHM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tun Hussein Onn Malaysia
content_source	UTHM Institutional Repository
url_provider	http://eprints.uthm.edu.my/
language	English
topic	QA76.75-76.765 Computer software
description	Random Forests (RF) are ensemble-of-trees methods widely used for prediction, interpretation and variable selection. Their wide acceptance can be attributed to their robustness to high dimensionality. However, when the high-dimensional data are sparse, RF procedures are inefficient. This thesis therefore aims to improve the efficiency of RF by providing a probabilistic framework based on Bayesian reasoning. The modification addresses two main modelling problems: high dimensionality and missing data. These problems were studied extensively within the scope of classification (binary and multi-class) and regression (linear and survival). The new procedure, called Bayesian Random Forest (BRF), modifies terminal node parameter estimation and the selection of random subsets for splitting. The BRF algorithm combines the strengths of random subset and greedy selection procedures to create new maximal ordered variable relevance weights. These weights are in turn used to develop new impurity functions for selecting the optimal splits for each tree in a forest. BRF works mainly because the maximal weights are computed using a data-driven procedure called the bootstrap prior, which was shown to satisfy the uniformly minimum variance property under mild regularity conditions. In addition, BRF ensures that important variables are selected at each subset selection step, thus reducing false signals and ultimately improving model accuracy. As a further extension, the missing covariates problem was handled by pre-imputing the variables using Multivariate Imputation by Chained Equations (MICE) before building the forests. Performance was assessed using simulated datasets and eighteen real-life classification and regression microarray cancer datasets. Empirical results established appreciable superiority over RF and several other competing methods. Keywords: Random Forest, Bayesian Inference, Classification, Regression, Missing Data.
format	Thesis
author	Oyebayo, Olaniran Ridwan
title	Bayesian random forests for high-dimensional classification and regression with complete and incomplete microarray data
publishDate	2018
url	http://eprints.uthm.edu.my/326/1/24p%20OLANIRAN%20RIDWAN%20OYEBAYO.pdf http://eprints.uthm.edu.my/326/2/OLANIRAN%20RIDWAN%20OYEBAYO%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/326/3/OLANIRAN%20RIDWAN%20OYEBAYO%20WATERMARK.pdf http://eprints.uthm.edu.my/326/

Similar Items