Imputation Analysis of Time-Series Data Using a Random Forest Algorithm

Bibliographic Details
Main Authors: Nur Najmiyah, Jaafar; Muhammad Nur Ajmal, Rosdi; Khairur Rijal, Jamaludin; Faizir, Ramlie; Habibah, Abdul Talib
Format: Conference or Workshop Item
Language: English
Published: Springer Singapore 2024
Subjects: TS Manufactures
Online Access: http://umpir.ump.edu.my/id/eprint/41147/1/Imputation%20Analysis%20of%20Time-Series%20Data.pdf
http://umpir.ump.edu.my/id/eprint/41147/2/Imputation%20Analysis%20of%20Time-Series%20Data%20Using%20a%20Random%20Forest%20Algorithm.pdf
http://umpir.ump.edu.my/id/eprint/41147/
https://doi.org/10.1007/978-981-99-8819-8_4
id my.ump.umpir.41147
record_format eprints
spelling my.ump.umpir.41147 2024-05-16T04:24:57Z http://umpir.ump.edu.my/id/eprint/41147/ PeerReviewed Nur Najmiyah, Jaafar and Muhammad Nur Ajmal, Rosdi and Khairur Rijal, Jamaludin and Faizir, Ramlie and Habibah, Abdul Talib (2024) Imputation Analysis of Time-Series Data Using a Random Forest Algorithm. In: Intelligent Manufacturing and Mechatronics, Lecture Notes in Networks and Systems, 850, pp. 51-60. 4th International Conference on Innovative Manufacturing, Mechatronics and Materials Forum, iM3F2023, 07-08 August 2023, Pekan, Malaysia. ISSN 2367-3389 ISBN 978-981-99-8819-8 https://doi.org/10.1007/978-981-99-8819-8_4
institution Universiti Malaysia Pahang Al-Sultan Abdullah
building UMPSA Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic TS Manufactures
description Missing data poses a significant challenge in large datasets, particularly those containing time-series information, and can lead to inaccuracies in data analysis and machine learning model development. To address this issue, this paper compared and evaluated four imputation methods: MissForest, MICE, Simplefill, and Softimpute, which utilized the Random Forest algorithm. The research examined the impact of missing ratios and temporal variations on the performance of the imputation methods. The results indicated that MissForest consistently outperformed the other methods, exhibiting the lowest RMSE values and a high coefficient of determination (R²), which indicates its accuracy and its ability to explain the variation in the data. Furthermore, graphical analyses demonstrated the stability of MissForest over time, whereas MICE and Simplefill were more sensitive to date changes. Softimpute was relatively consistent but performed slightly worse than MissForest. Overall, this study highlights the effectiveness of MissForest as the preferred imputation method for AVL time-series data.