Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions

Feature selection, a critical data preprocessing step in machine learning, is an effective way of removing irrelevant variables and thus reducing the dimensionality of the input features. Removing uninformative or, even worse, misinformative input columns helps train a machine learning model on more generalised data, with better performance on new and unseen data. In this paper, eight feature selection techniques paired with a gradient boosting regressor model were evaluated, based on a statistical comparison of their prediction errors and computational efficiency, in characterising a shallow marine reservoir. Analysis of the results shows that the best techniques for selecting relevant logs for permeability, porosity and water saturation prediction were the Random Forest, SelectKBest and Lasso regularisation methods, respectively. These techniques not only reduced the features of the high-dimensional dataset but also achieved low prediction errors, based on MAE and RMSE, and improved computational efficiency. This indicates that Random Forest, SelectKBest and Lasso regularisation can identify the best input features for permeability, porosity and water saturation predictions, respectively. © 2021 Elsevier B.V.
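
As an illustration of the kind of pairing the abstract describes, the sketch below feeds a filter-style selector (SelectKBest) into a gradient boosting regressor and scores it by MAE and RMSE with scikit-learn. It is a minimal, hypothetical example: the file name well_logs.csv, the target column PHI and the choice of k = 5 are assumptions made for illustration, not details taken from the paper's dataset or workflow.

# Illustrative sketch (not the authors' code): pair a filter-style feature
# selector with a gradient boosting regressor and score it by MAE and RMSE.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical well-log table: candidate input logs plus a porosity target (PHI).
logs = pd.read_csv("well_logs.csv")            # assumed file name, for illustration only
X = logs.drop(columns=["PHI"])                 # candidate input logs
y = logs["PHI"]                                # target property, here porosity

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SelectKBest keeps the k logs with the strongest univariate relation to the target;
# the gradient boosting regressor is then trained on only those logs.
model = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=5)),   # k is an assumed value
    ("gbr", GradientBoostingRegressor(random_state=42)),
])
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE: {mae:.4f}  RMSE: {rmse:.4f}")

The same scaffold could be repeated with the other selectors the abstract mentions, for example Lasso-based selection via SelectFromModel or Random Forest feature importances, to compare prediction errors and runtime per target property.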

Bibliographic Details
Main Authors: Otchere, D.A., Ganat, T.O.A., Ojero, J.O., Tackie-Otoo, B.N., Taki, M.Y.
Format: Article
Published: Elsevier B.V. 2022
Online Access:https://www.scopus.com/inward/record.uri?eid=2-s2.0-85113218886&doi=10.1016%2fj.petrol.2021.109244&partnerID=40&md5=82a135f80f4b611a342755fe48872095
http://eprints.utp.edu.my/28851/
id my.utp.eprints.28851
record_format eprints
spelling my.utp.eprints.28851 2022-03-17T02:21:04Z Article NonPeerReviewed Otchere, D.A. and Ganat, T.O.A. and Ojero, J.O. and Tackie-Otoo, B.N. and Taki, M.Y. (2022) Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions. Journal of Petroleum Science and Engineering, 208. Elsevier B.V. http://eprints.utp.edu.my/28851/
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
format Article
author Otchere, D.A.
Ganat, T.O.A.
Ojero, J.O.
Tackie-Otoo, B.N.
Taki, M.Y.
title Application of gradient boosting regression model for the evaluation of feature selection techniques in improving reservoir characterisation predictions
publisher Elsevier B.V.
publishDate 2022
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85113218886&doi=10.1016%2fj.petrol.2021.109244&partnerID=40&md5=82a135f80f4b611a342755fe48872095
http://eprints.utp.edu.my/28851/