Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia

In this study, the ability of numerous statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. Inconsistencies in the percentage of missing data between monitoring stations (varying from 20 percent (modera...

Bibliographic Details
Main Authors: Naeimah Mamat, Siti Fatin Mohd Razali
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2023
Online Access: http://journalarticle.ukm.my/21963/1/kjt_18.pdf
http://journalarticle.ukm.my/21963/
https://www.ukm.my/jkukm/volume-3501-2023/
id my-ukm.journal.21963
record_format eprints
spelling my-ukm.journal.21963 2023-07-27T05:58:45Z http://journalarticle.ukm.my/21963/
Penerbit Universiti Kebangsaan Malaysia 2023 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/21963/1/kjt_18.pdf Naeimah Mamat and Siti Fatin Mohd Razali (2023) Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia. Jurnal Kejuruteraan, 35 (1). pp. 191-201. ISSN 0128-0198 https://www.ukm.my/jkukm/volume-3501-2023/
institution Universiti Kebangsaan Malaysia
building Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description In this study, the ability of several statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. The greatest obstacle was the inconsistent percentage of missing data between monitoring stations, which varied from 20 percent (moderate) to over 50 percent (high). The main objective was to select the best imputation method and to assess whether the methods performed differently at the different stations. The paper compares seven imputation methods: Multiple Predictive Mean Matching (PMM), Multiple Random Forest Imputation (RF), Multiple Bayesian Linear Regression Imputation (BLR), Multiple Linear Regression (non-Bayesian) Imputation (LRNB), Multiple Classification and Regression Tree (CART), k-nearest neighbours (kNN) and Bootstrap-based Expectation Maximisation (EMB). Among the seven techniques, kNN produced consistently reliable results. The imputed data are all rated as ‘very good’ (NSE > 0.75). This was confirmed by |PBIAS| < 5.30 (all imputed data rated as ‘very good’) and KGE ≥ 0.87 (all imputations rated as ‘good’). Imputation performance remains strong at all three monitoring stations, with an index of agreement WI ≥ 0.94, despite the varying percentages of missing data. According to the findings, the kNN imputation approach outperforms the others and should be prioritised in practice. Future research with the existing methods could benefit from the addition of geographical data.
format Article
author Naeimah Mamat
Siti Fatin Mohd Razali
title Comparisons of various imputation methods for incomplete water quality data: a case study of the Langat River, Malaysia
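
The description rates imputation quality with four goodness-of-fit statistics (NSE, PBIAS, KGE and the index of agreement WI) but the record does not give their formulas. The sketch below uses the commonly cited textbook definitions, which are an assumption here and may differ in detail from those used in the article; in particular, the sign convention for PBIAS varies between references, which does not affect |PBIAS|. The function names and the sample values are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias (assumed convention: 100 * sum(obs - sim) / sum(obs))."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def willmott_index(obs, sim):
    """Willmott's index of agreement (WI), bounded between 0 and 1."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    denom = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / denom

if __name__ == "__main__":
    # Hypothetical held-out observations and their imputed counterparts.
    observed = [1.0, 2.0, 3.0, 4.0]
    imputed = [1.0, 2.0, 3.0, 5.0]
    print("NSE  :", nse(observed, imputed))
    print("PBIAS:", pbias(observed, imputed))
    print("KGE  :", kge(observed, imputed))
    print("WI   :", willmott_index(observed, imputed))
```

In an evaluation of this kind, values that are actually known would be masked, imputed, and then compared with the originals using these statistics; the thresholds quoted in the description (for example NSE > 0.75 for ‘very good’) would then be applied to the results.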