Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia
In this study, the ability of numerous statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. Inconsistencies in the percentage of missing data between monitoring stations (varying from 20 percent (modera...
Main Authors: | Naeimah Mamat, Siti Fatin Mohd Razali |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2023 |
Online Access: | http://journalarticle.ukm.my/21963/1/kjt_18.pdf http://journalarticle.ukm.my/21963/ https://www.ukm.my/jkukm/volume-3501-2023/ |
id |
my-ukm.journal.21963 |
---|---|
record_format |
eprints |
spelling |
my-ukm.journal.21963 2023-07-27T05:58:45Z http://journalarticle.ukm.my/21963/ Naeimah Mamat and Siti Fatin Mohd Razali (2023) Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia. Jurnal Kejuruteraan, 35 (1). pp. 191-201. ISSN 0128-0198. Penerbit Universiti Kebangsaan Malaysia. Article, PeerReviewed, application/pdf, en. http://journalarticle.ukm.my/21963/1/kjt_18.pdf https://www.ukm.my/jkukm/volume-3501-2023/ |
institution |
Universiti Kebangsaan Malaysia |
building |
Tun Sri Lanang Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Kebangsaan Malaysia |
content_source |
UKM Journal Article Repository |
url_provider |
http://journalarticle.ukm.my/ |
language |
English |
description |
This study investigated the ability of several statistical and machine learning models to impute water quality data at three monitoring stations along the Langat River in Malaysia. The greatest obstacle was that the percentage of missing data differed between monitoring stations, ranging from 20 percent (moderate) to over 50 percent (high). The main objective was to select the best imputation method and to determine whether the best-performing method differs between stations. Seven imputation methods were compared: Multiple Predictive Mean Matching (PMM), Multiple Random Forest Imputation (RF), Multiple Bayesian Linear Regression Imputation (BLR), Multiple Linear Regression (non-Bayesian) Imputation (LRNB), Multiple Classification and Regression Tree (CART), k-nearest neighbours (kNN) and Bootstrap-based Expectation Maximisation (EMB). Among the seven techniques, kNN produced consistently reliable results. The imputed data are all rated as ‘very good’ (NSE > 0.75). This was confirmed by |PBIAS| < 5.30 (all imputed data are ‘very good’) and KGE ≥ 0.87 (all imputations are rated as ‘good’). The index of agreement also indicates good imputation performance at all three monitoring stations (WI ≥ 0.94), despite the varying percentages of missing data. According to the findings, the kNN imputation approach outperforms the others and should be prioritised in practical use. Future research with the existing methods could benefit from the addition of geographical data. |
format |
Article |
author |
Naeimah Mamat, Siti Fatin Mohd Razali |
title |
Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia |
publisher |
Penerbit Universiti Kebangsaan Malaysia |
publishDate |
2023 |
url |
http://journalarticle.ukm.my/21963/1/kjt_18.pdf http://journalarticle.ukm.my/21963/ https://www.ukm.my/jkukm/volume-3501-2023/ |
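The description above rates the imputations with four goodness-of-fit scores: NSE, PBIAS, KGE and WI. As a rough illustration only, the sketch below computes them using their commonly cited definitions (Nash-Sutcliffe efficiency, percent bias, Kling-Gupta efficiency and Willmott's index of agreement). This record does not state which formula variants the authors used, so the definitions, the PBIAS sign convention, the function names and the sample values are all assumptions made for this example and are not taken from the paper.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect match."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias, assuming the 100 * sum(obs - sim) / sum(obs) convention."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is a perfect match."""
    n = len(obs)
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def willmott_index(obs, sim):
    """Willmott's index of agreement (original 1981 form), in [0, 1]."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


# Made-up values for illustration only (not data from the study):
# 'obs' stands for held-out observed values, 'sim' for their imputed values.
obs = [6.1, 5.8, 6.4, 7.0, 6.7]
sim = [6.0, 5.9, 6.5, 6.8, 6.8]

print("NSE  ", nse(obs, sim))
print("PBIAS", pbias(obs, sim))
print("KGE  ", kge(obs, sim))
print("WI   ", willmott_index(obs, sim))
```

In an evaluation of this kind, known values are typically masked, imputed, and then compared with the originals; the thresholds quoted in the description (for example NSE > 0.75) would be applied to scores like these.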