Prediction of rice biomass using machine learning algorithms

Bibliographic Details
Main Author: Radhwane, Derraz
Format: Thesis
Language:English
Published: 2022
Online Access:http://psasir.upm.edu.my/id/eprint/104544/1/FP%202022%2070%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/104544/
Description
Summary: Conventional rice sampling methods are effective. However, they are destructive, laborious, time-consuming, impractical for large fields, and subject to human error. Unmanned aerial vehicles (UAVs) may address these issues. Machine learning algorithms (MLs) can predict rice biomass from UAV-based vegetation indices (VIs). Nevertheless, VIs are highly collinear and noisy, and collecting large VI datasets is expensive. These issues affect the MLs' model performance, stability (under/overfitting), variance, and confidence. This study aims to: (i) compare the base and ensemble MLs' model performance, variance, stability, and confidence for predicting rice biomass using collinear (multicollinearity context (MCC)) and non-collinear (non-multicollinearity context (NMCC)) VIs; (ii) compare the predictability of rice above-ground biomass (TAGB) from noisy VIs and Kalman-filter-denoised VIs using the histogram gradient boosting regressor (HGBR); and (iii) develop a trigonometric-Euclidean-smoother interpolator (TESI), including linear (LN-TESI), cubic (C-TESI), quadratic (Q-TESI), and logarithmic (L-TESI) interpolators, for augmenting continuous time-series and non-time-series VI data, and compare it to the tabular variational autoencoder (TVAE) and the conditional tabular generative adversarial network (CTGAN) for preventing under- and overfitting of a deep neural network (DNN).
A split-plot randomised complete block design (RCBD) experiment was conducted in a rice granary in Terengganu, Malaysia, with 120 quadrants. Each quadrant provided five rice biomass traits during the tillering, booting, and milking stages. A MicaSense RedEdge multispectral camera mounted on a DJI quadcopter drone was used to acquire the blue, green, red, red-edge, and near-infrared (NIR) bands, from which the VI values corresponding to each quadrant were extracted. Besides the biomass dataset, the non-time-series fertiliser dataset and the time-series oil palm and rice datasets were also collected to validate the TESI, TVAE, and CTGAN results.
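To illustrate why VIs derived from a handful of bands tend to be collinear, the following is a minimal sketch, not the thesis's pipeline. The abstract does not list which VIs were used, so NDVI and NDRE are assumed here as two common examples; the reflectance values are hypothetical, and the variance inflation factor (VIF) is computed only for the simple two-predictor case, where it reduces to 1 / (1 - r^2).

```python
from statistics import mean


def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def ndre(nir, red_edge):
    """Normalised difference red-edge index from NIR and red-edge reflectance."""
    return (nir - red_edge) / (nir + red_edge)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x, y):
    """VIF of either predictor when only two predictors are present.

    With two predictors, the R^2 of regressing one on the other equals r^2,
    so VIF = 1 / (1 - r^2). Raises ZeroDivisionError if they are perfectly
    collinear.
    """
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical per-quadrant band reflectances (illustrative, not study data).
nir = [0.40, 0.45, 0.50, 0.55, 0.60]
red = [0.10, 0.08, 0.07, 0.05, 0.04]
red_edge = [0.25, 0.26, 0.27, 0.27, 0.28]

ndvi_values = [ndvi(n, r) for n, r in zip(nir, red)]
ndre_values = [ndre(n, e) for n, e in zip(nir, red_edge)]

# Both indices share the NIR band, so they move together and the VIF is large.
print("VIF (NDVI vs NDRE):", vif_two_predictors(ndvi_values, ndre_values))
```

A VIF near 1 indicates little collinearity; a threshold of 10 is a common rule of thumb for problematic collinearity, which matches the VIF figures reported in the summary.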
For the first objective, the MLs' model performance and stability were better in MCC than in NMCC for predicting all rice biomass traits. The ensemble MLs outperformed the base MLs for predicting all rice biomass traits in both MCC and NMCC. The variances of the coefficient of determination (R²) and root mean squared error (RMSE) showed no consistent pattern across the base and ensemble MLs in MCC and NMCC. Neither multicollinearity nor the choice between base and ensemble MLs affected the model confidence; rather, confidence was subject to the cross-effects of the ML and dataset characteristics.
For the second objective, the denoised VIs (R² = 0.74–0.95, RMSE = 2.43–13.94 g q⁻¹) outperformed the noisy VIs (R² = 0.63–0.90, RMSE = 3.28–17.91 g q⁻¹) for TAGB prediction. The denoised VIs performed best at the booting stage (R² = 0.93–0.95, RMSE = 8.22–9.30 g q⁻¹), followed by the tillering (R² = 0.75–0.84, RMSE = 2.43–2.96 g q⁻¹) and milking stages (R² = 0.74–0.80, RMSE = 13.34–13.94 g q⁻¹). The HGBR overfitted the least on the denoised VIs at the booting stage, with a training-testing change in R² (ΔR²) of 0.02–0.09 and in RMSE (ΔRMSE) of 1.93–6.54 g q⁻¹, followed by the tillering (ΔR² = 0.08–0.21, ΔRMSE = 1.23–2.36 g q⁻¹) and milking stages (ΔR² = 0.14–0.25, ΔRMSE = 5.57–10.02 g q⁻¹).
For the third objective, the TESI, TVAE, and CTGAN were applied to increase the sizes of the four datasets. The TESI retained the features' original probability distributions in the four datasets. The C-TESI achieved the lowest mean squared error mean percentage (MAEP) on the oil palm (0.60–2.85%), rice (0.77–1.72%), and fertiliser (2.04–2.21%) datasets. The TESI kept the variance inflation factor (VIF) ranges around or below 10 on the four datasets: it either retained a VIF range of 1.99–10.06 or reduced it to 1.55–6.66. Furthermore, the TESI either retained the Spearman's rank correlation coefficient (rs) range of 0.79–0.97 or increased it to 0.81–0.99 on the four datasets.
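The Kalman-filter denoising step in the second objective can be sketched as follows. The summary does not give the filter's state model or noise settings, so this sketch assumes a scalar random-walk model; `process_var`, `measurement_var`, and the synthetic VI series are all hypothetical choices for illustration, not values from the study.

```python
import random


def kalman_denoise(measurements, process_var=1e-5, measurement_var=2.5e-3):
    """Scalar Kalman filter assuming a random-walk state model.

    process_var (Q) and measurement_var (R) are hypothetical tuning values.
    """
    estimate = measurements[0]      # initialise with the first observation
    error_var = measurement_var     # initial uncertainty of that estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged; uncertainty grows by Q.
        error_var += process_var
        # Update: blend the prediction and the measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


# Synthetic, slowly rising VI series with additive Gaussian noise (illustrative).
rng = random.Random(0)
truth = [0.3 + 0.001 * t for t in range(200)]
noisy = [v + rng.gauss(0.0, 0.05) for v in truth]
denoised = kalman_denoise(noisy)

print("RMSE noisy vs truth:   ", rmse(noisy, truth))
print("RMSE denoised vs truth:", rmse(denoised, truth))
```

The ratio of `process_var` to `measurement_var` controls the trade-off: a smaller ratio smooths more strongly but lags behind real changes in the signal, so in practice both values would need tuning against the data.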
The DNN achieved the highest R² (0.77–0.99) and lowest RMSE (2.8E+01–8.1E+05) ranges on the four datasets augmented with the TESI. The Q-TESI, C-TESI, and L-TESI outperformed the LN-TESI in retaining the features' original probability distributions, minimising the augmentation loss, reducing the VIF, increasing the rs, and decreasing the DNN's under- and overfitting.
Overall, because most agronomic research relies on only a few sensor bands, vegetation indices are highly collinear. Therefore, exploring the multilevel sensitivity of different MLs to multicollinearity may inform the methodological choices of future agronomic studies. Additionally, stable VI-biomass models accurately reflect rice yield potential, and denoising the VIs may significantly improve them. Further, compared to the LN-TESI, the Q-TESI, C-TESI, and L-TESI reduce the dependence of the interpolation error on the square of the distance between the data points. Consequently, the Q-TESI, C-TESI, and L-TESI may approximate the nonlinear changes of crop phenology in time-spaced sampling, thereby reducing the cost of sampling for scientists. Furthermore, they intensify non-time-series zonal synthetic sampling, which reduces sampling labour.
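The remark about interpolation error and the squared distance between data points can be illustrated with ordinary polynomial interpolation. This is a sketch of the general numerical property only, not of the TESI itself, whose formulation is not given in the summary. The logistic growth curve and all function names below are hypothetical.

```python
import math


def linear_interp(x0, y0, x1, y1, x):
    """Straight-line interpolation between (x0, y0) and (x1, y1)."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def quadratic_interp(xs, ys, x):
    """Lagrange quadratic through three points (xs, ys), evaluated at x."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return y0 * l0 + y1 * l1 + y2 * l2


def max_linear_error(f, a, b, n_intervals, samples_per_interval=50):
    """Largest sampled |f - linear interpolant| on [a, b] with equal spacing."""
    h = (b - a) / n_intervals
    worst = 0.0
    for i in range(n_intervals):
        x0 = a + i * h
        x1 = x0 + h
        for j in range(samples_per_interval + 1):
            x = x0 + h * j / samples_per_interval
            approx = linear_interp(x0, f(x0), x1, f(x1), x)
            worst = max(worst, abs(f(x) - approx))
    return worst


def growth(t):
    """Hypothetical logistic growth curve standing in for a crop trait."""
    return 1.0 / (1.0 + math.exp(-(t - 5.0)))


# Halving the sampling interval cuts the linear error by roughly a factor of 4.
coarse = max_linear_error(growth, 0.0, 10.0, 5)
fine = max_linear_error(growth, 0.0, 10.0, 10)
print("linear, spacing 2.0:", coarse)
print("linear, spacing 1.0:", fine)

# A quadratic through three samples follows the curvature that a line misses.
xs = (2.0, 4.0, 6.0)
ys = tuple(growth(t) for t in xs)
print("true value at t=3:  ", growth(3.0))
print("linear estimate:    ", linear_interp(2.0, ys[0], 4.0, ys[1], 3.0))
print("quadratic estimate: ", quadratic_interp(xs, ys, 3.0))
```

For a twice-differentiable curve, the linear interpolation error on an interval of width h is bounded by (h²/8) times the maximum of the absolute second derivative, which is why sparse sampling penalises linear interpolation on strongly curved growth phases.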