An evaluation of various data pre-processing techniques with machine learning models for water level prediction

artificial neural network; data processing; decomposition analysis; machine learning; prediction; river water; support vector machine; water level; Dungun Basin; Malaysia; Terengganu; West Malaysia

Bibliographic Details
Main Authors: Tiu E.S.K., Huang Y.F., Ng J.L., AlDahoul N., Ahmed A.N., Elshafie A.
Other Authors: 57202286717
Format: Article
Published: Springer Science and Business Media B.V. 2023
id my.uniten.dspace-27313
record_format dspace
spelling my.uniten.dspace-273132023-05-29T17:42:33Z An evaluation of various data pre-processing techniques with machine learning models for water level prediction Tiu E.S.K. Huang Y.F. Ng J.L. AlDahoul N. Ahmed A.N. Elshafie A. 57202286717 55807263900 57192698412 56656478800 57214837520 16068189400 artificial neural network; data processing; decomposition analysis; machine learning; prediction; river water; support vector machine; water level; Dungun Basin; Malaysia; Terengganu; West Malaysia Floods are the most frequent type of natural disaster. They destroy wildlife habitat, damage bridges, railways, roads, and properties, and put millions of people at risk. As such, flood detection systems have been developed to monitor changes in water level and raise an alarm should there be imminent danger. River water level prediction is a significant task in flood mitigation planning and floodplain management. Usually, using raw rainfall series data directly with machine learning (ML) regression methods does not result in sufficiently good prediction accuracy. The raw data should be pre-processed using specific techniques to enhance their quality before being applied to the prediction methods. This paper addresses the stated problem by utilizing various data pre-processing techniques, namely the Variational Mode Decomposition (VMD), Bagging, Boosting, Bagging-VMD, and Boosting-VMD, to enhance the quality of the input data and thus improve model accuracy. The five proposed pre-processing techniques were applied to the observed daily rainfall series of the Dungun river basin, Malaysia, for the period from November to February (Northeast Monsoon) from 1996 to 2016. Two machine learning models, the artificial neural network (ANN) and the support vector regression (SVR), served as the base models (Ori) and were used in conjunction with the data pre-processing methods.
The comparison between the ML methods with and without data pre-processing was done. It was found that prediction of water levels with the two ML methods of SVR and ANN together with the Boosting-VMD was superior to those results derived with just the base original model (Ori). The advantage of the enhanced models (respectively, founded on SVR and ANN) over the original models (SVR and ANN) is best reflected in the performance statistics. Numerical results in terms of root mean square error (RMSE) of (0.42, 0.20 vs 1.85, 1.82), mean absolute percentage error (MAPE) of (4.36, 2.82 vs 18.89, 22.56), mean absolute error (MAE) of (0.28, 0.16 vs 1.25, 1.41), and Nash-Sutcliffe efficiency coefficient (NSE) (0.96, 0.99 vs 0.25, 0.27) were obtained for the respective models. Additionally, various data visualization graphs such as hydrographs, residual hydrographs, peak-estimates, and box and whisker plots were illustrated to compare between various data pre-processing techniques. The experimental results showed that both the Boosting and the Boosting-VMD methods showed better performance over the other techniques. The Boosting-ANN model was found to be the better model to predict river water levels with the lowest RMSE (0.19), MAPE (2.72), and MAE (0.15) and the highest NSE (0.99). © 2021, The Author(s), under exclusive licence to Springer Nature B.V. Final 2023-05-29T09:42:33Z 2023-05-29T09:42:33Z 2022 Article 10.1007/s11069-021-04939-8 2-s2.0-85111487732 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85111487732&doi=10.1007%2fs11069-021-04939-8&partnerID=40&md5=f7c8d6581db2b5f2cb92943cb0b1d0d5 https://irepository.uniten.edu.my/handle/123456789/27313 110 1 121 153 Springer Science and Business Media B.V. Scopus
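The abstract compares models using four statistics: RMSE, MAPE, MAE, and NSE. The record does not reproduce their formulas, so the following is only a minimal sketch assuming the standard textbook definitions (MAPE expressed in percent, NSE as one minus the ratio of squared-error sum to the variance sum of the observations). The function names and the sample water-level values are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero (division by the observation).
    """
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical water levels in metres (illustrative only, not data from the paper).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

for name, metric in (("RMSE", rmse), ("MAPE", mape), ("MAE", mae), ("NSE", nse)):
    print(f"{name}: {metric(observed, predicted):.4f}")
```

Under these definitions, lower RMSE, MAPE, and MAE and an NSE closer to 1 indicate a better fit, which is the direction of improvement the abstract reports for the Boosting-based models.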
institution Universiti Tenaga Nasional
building UNITEN Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tenaga Nasional
content_source UNITEN Institutional Repository
url_provider http://dspace.uniten.edu.my/
description artificial neural network; data processing; decomposition analysis; machine learning; prediction; river water; support vector machine; water level; Dungun Basin; Malaysia; Terengganu; West Malaysia
author2 57202286717
author_facet 57202286717
Tiu E.S.K.
Huang Y.F.
Ng J.L.
AlDahoul N.
Ahmed A.N.
Elshafie A.
format Article
author Tiu E.S.K.
Huang Y.F.
Ng J.L.
AlDahoul N.
Ahmed A.N.
Elshafie A.
spellingShingle Tiu E.S.K.
Huang Y.F.
Ng J.L.
AlDahoul N.
Ahmed A.N.
Elshafie A.
An evaluation of various data pre-processing techniques with machine learning models for water level prediction
author_sort Tiu E.S.K.
title An evaluation of various data pre-processing techniques with machine learning models for water level prediction
title_short An evaluation of various data pre-processing techniques with machine learning models for water level prediction
title_full An evaluation of various data pre-processing techniques with machine learning models for water level prediction
title_fullStr An evaluation of various data pre-processing techniques with machine learning models for water level prediction
title_full_unstemmed An evaluation of various data pre-processing techniques with machine learning models for water level prediction
title_sort evaluation of various data pre-processing techniques with machine learning models for water level prediction
publisher Springer Science and Business Media B.V.
publishDate 2023