Water quality index using modified random forest technique: Assessing novel input features

Water quality analysis is essential to understanding the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammon...

Bibliographic Details
Main Authors: Wong, Wen Yee; Al-Ani, Ayman Khallel Ibrahim; Hasikin, Khairunnisa; Mohd Khairuddin, Anis Salwa; Abdul Razak, Sarah; Hizaddin, Hanee Farzana; Mokhtar, Mohd Istajib; Azizan, Muhammad Mokhzaini
Format: Article
Published: Tech Science Press 2022
Subjects: TD Environmental technology. Sanitary engineering
Online Access: http://eprints.um.edu.my/41705/
id my.um.eprints.41705
record_format eprints
spelling my.um.eprints.41705 2023-11-22T04:33:36Z http://eprints.um.edu.my/41705/ Tech Science Press 2022 Article PeerReviewed Wong, Wen Yee and Al-Ani, Ayman Khallel Ibrahim and Hasikin, Khairunnisa and Mohd Khairuddin, Anis Salwa and Abdul Razak, Sarah and Hizaddin, Hanee Farzana and Mokhtar, Mohd Istajib and Azizan, Muhammad Mokhzaini (2022) Water quality index using modified random forest technique: Assessing novel input features. CMES-Computer Modeling in Engineering & Sciences, 132 (3). pp. 1011-1038. ISSN 1526-1492
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic TD Environmental technology. Sanitary engineering
description Water quality analysis is essential to understanding the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a heavy metal-polluted river. Therefore, this paper explores and analyzes novel input features in order to formulate an improved WQI. It discusses the feasibility of alternative water quality input variables as new discriminant features, which are a step toward formulating adaptive water quality parameters according to the land use activities surrounding the river. The results indicate that WQI can be predicted using the new input features. This work analyzes 17 new input features, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate (NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), E. coli, and total coliform, in predicting WQI using machine learning techniques. Five regression algorithms, namely random forest (RF), AdaBoost, support vector regression (SVR), decision tree regression (DTR), and multilayer perceptron (MLP), are applied for preliminary model selection. The results show that the RF algorithm exhibits better prediction performance, with an R² of 0.974. This work then proposes a modified RF that incorporates the synthetic minority oversampling technique (SMOTE) into the conventional RF method. The proposed modified RF method achieves 77.68% accuracy, 74% precision, 69% recall, and a 71% F1-score. In addition, a sensitivity analysis is included to highlight the importance of the turbidity variable in WQI prediction. The results of the sensitivity analysis highlight the importance of certain water quality variables that are not present in the conventional WQI formulation.
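The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE interpolation idea. This is not the authors' implementation: the function name `smote`, its parameters, and the toy readings are hypothetical, and only the Python standard library is used. In practice a maintained library implementation would normally be preferred, and the balanced set would then be passed to a random forest classifier.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (turbidity, conductivity) readings for a rare class.
rare_class = [(1.0, 10.0), (1.2, 11.0), (0.9, 9.5), (1.1, 10.5)]
new_points = smote(rare_class, n_new=6, k=2, seed=42)
balanced = rare_class + new_points
print(len(balanced))  # 4 original + 6 synthetic samples
```

Because every synthetic sample is an interpolation between two existing minority samples, the oversampled class stays inside the range of the observed readings and does not extrapolate beyond it.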
format Article
author Wong, Wen Yee
Al-Ani, Ayman Khallel Ibrahim
Hasikin, Khairunnisa
Mohd Khairuddin, Anis Salwa
Abdul Razak, Sarah
Hizaddin, Hanee Farzana
Mokhtar, Mohd Istajib
Azizan, Muhammad Mokhzaini
title Water quality index using modified random forest technique: Assessing novel input features
publisher Tech Science Press
publishDate 2022
url http://eprints.um.edu.my/41705/