Staff View: On the training sample size and classification performance: An experimental evaluation in seismic facies classification

On the training sample size and classification performance: An experimental evaluation in seismic facies classification

Machine learning algorithms (MLAs) perform better when enough high-quality training data is provided. However, a lack of training data is frequent in seismic facies classification and many other supervised learning applications. Data labeling for seismic facies classification is time-consuming and r...

Bibliographic Details
Main Authors:	Babikir, I., Elsaadany, M., Sajid, M., Laudon, C.
Format:	Article
Published:	Elsevier B.V. 2023
Online Access:	http://scholars.utp.edu.my/id/eprint/37516/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159765394&doi=10.1016%2fj.geoen.2023.211809&partnerID=40&md5=bb201dd3f5ec9479db4d5d1224cf93c6

id	oai:scholars.utp.edu.my:37516
record_format	eprints
spelling	oai:scholars.utp.edu.my:375162023-10-04T13:30:40Z http://scholars.utp.edu.my/id/eprint/37516/ On the training sample size and classification performance: An experimental evaluation in seismic facies classification Babikir, I. Elsaadany, M. Sajid, M. Laudon, C. Machine learning algorithms (MLAs) perform better when enough high-quality training data is provided. However, a lack of training data is frequent in seismic facies classification and many other supervised learning applications. Data labeling for seismic facies classification is time-consuming and requires considerable effort from the domain expert. This study investigates the effect of training data size on the performance of three popular supervised MLAs used for seismic facies classification. We labeled slices from two seismic datasets of diverse geologic environments and varying classification complexity. The AN Field in the Malay Basin represents a simple classification problem with three classes, whereas a more complex six-class classification is defined in the Dangerous Grounds (DG) dataset offshore Sabah. The labeled data were repeatedly reduced by half, resulting in eight training subsets of varying sizes. We trained and evaluated support vector machine (SVM), random forest (RF), and neural network (NN) models using a 10-fold cross-validation (CV) procedure. Performance metrics were computed to study the change in performance in response to the training data size. The experimental results show that, for the DG dataset, where the classification is complex due to the heterogeneous geology and a larger number of classes, the larger the training subset, the better the classification performance. Nevertheless, for the simple classification scenario of the AN dataset, the classifiers reached a performance plateau when trained on limited samples. We found that the NN model is the best performer on large datasets. The RF classifier performed well on both datasets. It proved to be robust when trained on limited samples of the DG data. The SVM performed the best where there was a clear margin of separation between the defined classes (the AN data). In contrast, it performed poorly on the DG data and exhibited a performance decline on the large AN subsets. © 2023 Elsevier B.V. Elsevier B.V. 2023 Article NonPeerReviewed Babikir, I. and Elsaadany, M. and Sajid, M. and Laudon, C. (2023) On the training sample size and classification performance: An experimental evaluation in seismic facies classification. Geoenergy Science and Engineering, 226. ISSN 29498910 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159765394&doi=10.1016%2fj.geoen.2023.211809&partnerID=40&md5=bb201dd3f5ec9479db4d5d1224cf93c6 10.1016/j.geoen.2023.211809 10.1016/j.geoen.2023.211809 10.1016/j.geoen.2023.211809
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	Machine learning algorithms (MLAs) perform better when enough high-quality training data is provided. However, a lack of training data is frequent in seismic facies classification and many other supervised learning applications. Data labeling for seismic facies classification is time-consuming and requires considerable effort from the domain expert. This study investigates the effect of training data size on the performance of three popular supervised MLAs used for seismic facies classification. We labeled slices from two seismic datasets of diverse geologic environments and varying classification complexity. The AN Field in the Malay Basin represents a simple classification problem with three classes, whereas a more complex six-class classification is defined in the Dangerous Grounds (DG) dataset offshore Sabah. The labeled data were repeatedly reduced by half, resulting in eight training subsets of varying sizes. We trained and evaluated support vector machine (SVM), random forest (RF), and neural network (NN) models using a 10-fold cross-validation (CV) procedure. Performance metrics were computed to study the change in performance in response to the training data size. The experimental results show that, for the DG dataset, where the classification is complex due to the heterogeneous geology and a larger number of classes, the larger the training subset, the better the classification performance. Nevertheless, for the simple classification scenario of the AN dataset, the classifiers reached a performance plateau when trained on limited samples. We found that the NN model is the best performer on large datasets. The RF classifier performed well on both datasets. It proved to be robust when trained on limited samples of the DG data. The SVM performed the best where there was a clear margin of separation between the defined classes (the AN data). In contrast, it performed poorly on the DG data and exhibited a performance decline on the large AN subsets. © 2023 Elsevier B.V.
format	Article
author	Babikir, I. Elsaadany, M. Sajid, M. Laudon, C.
spellingShingle	Babikir, I. Elsaadany, M. Sajid, M. Laudon, C. On the training sample size and classification performance: An experimental evaluation in seismic facies classification
author_facet	Babikir, I. Elsaadany, M. Sajid, M. Laudon, C.
author_sort	Babikir, I.
title	On the training sample size and classification performance: An experimental evaluation in seismic facies classification
title_short	On the training sample size and classification performance: An experimental evaluation in seismic facies classification
title_full	On the training sample size and classification performance: An experimental evaluation in seismic facies classification
title_fullStr	On the training sample size and classification performance: An experimental evaluation in seismic facies classification
title_full_unstemmed	On the training sample size and classification performance: An experimental evaluation in seismic facies classification
title_sort	on the training sample size and classification performance: an experimental evaluation in seismic facies classification
publisher	Elsevier B.V.
publishDate	2023
url	http://scholars.utp.edu.my/id/eprint/37516/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159765394&doi=10.1016%2fj.geoen.2023.211809&partnerID=40&md5=bb201dd3f5ec9479db4d5d1224cf93c6
_version_	1779441395581321216
score	13.23648
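
The sampling design summarized in the description field (labeled data halved step by step into eight training subsets, each evaluated with 10-fold cross-validation) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names halving_subsets and kfold_splits, the sample count of 12800, the random seed, and the choices that the full labeled set counts as the first subset and that the subsets are nested are all assumptions not stated in the record. Class stratification and the SVM, RF, and NN models themselves are left out.

```python
import random


def halving_subsets(indices, n_subsets=8, seed=0):
    """Return nested index subsets, each half the size of the previous one.

    Assumption: the full labeled set is the first subset, and every later
    subset is drawn from the one before it.
    """
    rng = random.Random(seed)
    current = list(indices)
    rng.shuffle(current)  # randomize once so each prefix is a random sample
    subsets = []
    for _ in range(n_subsets):
        subsets.append(list(current))
        current = current[: len(current) // 2]  # keep half for the next round
    return subsets


def kfold_splits(indices, k=10):
    """Yield (train, test) index lists for plain, unstratified k-fold CV."""
    indices = list(indices)
    folds = [indices[i::k] for i in range(k)]  # k interleaved folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labeled-sample count, chosen so that every halving is exact.
subsets = halving_subsets(range(12800))
sizes = [len(s) for s in subsets]
smallest_splits = list(kfold_splits(subsets[-1], k=10))
```

In a real experiment each (train, test) pair would be passed to a classifier and the resulting metrics averaged per subset size, which gives the learning curve the abstract describes.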

Similar Items