Exploration of COVID‑19 data in Malaysia through mapper graph

Bibliographic Details
Main Authors: Carey Ling, Yu Fan; Piau, Phang; Liew, Siaw Hong; Vivek Jason, Jayaraj; Benchawan, Wiwatanapataphee
Format: Article
Language: English
Published: Springer Nature 2024
Online Access: http://ir.unimas.my/id/eprint/45351/3/Exploration%20of%20COVID%E2%80%9119%20data%20-%20Copy.pdf
http://ir.unimas.my/id/eprint/45351/
https://link.springer.com/article/10.1007/s13721-024-00472-3
https://doi.org/10.1007/s13721-024-00472-3
Description
Summary: Huge amounts of data have been collected from various sources during the COVID-19 pandemic, providing a unique opportunity for analysis, data-driven modelling, and machine learning to understand the complexity of COVID-19 more effectively and to make informed decisions. To keep pace with the expanding quantity and complexity of data while employing minimal assumptions, a topological data analysis tool known as the Mapper algorithm is used to explore Malaysia’s daily confirmed cases, deaths, and vaccination data from the onset of the pandemic to June 2022 via data visualization and clustering. Support vector-based feature selection and a heuristic approach for fine-tuning parameters internally within the algorithm are applied. Two anomalous groups of nodes with exceptionally high case numbers emerged in the Mapper graphs for daily data, one for the Delta-dominant period and one for the Omicron-dominant period. Selangor’s cumulative cases were found to be numerically dissimilar from those of other states from August 2021 onwards. The evolution of the Mapper graphs revealed unique early COVID-19 progression in Johor, Negeri Sembilan, and Kuala Lumpur in the first half of 2020, followed by a significant increase in confirmed cases in Sabah in September 2020. Clusters identified by the Mapper algorithm are comparable with those obtained from principal component analysis and hierarchical clustering. However, hierarchical clustering does not further subdivide the Selangor data into two to three separate clusters as the Mapper algorithm does. This research provides valuable insights for comprehending the pandemic timeline in Malaysia via the Mapper algorithm, which serves as a highly compact data visualization technique.
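
For readers unfamiliar with the Mapper algorithm named in the summary, the following is a minimal sketch of its general steps (filter, overlapping cover, clustering within each cover element, edges between clusters that share points). It is not the authors' pipeline: the filter function, the interval cover, the distance-threshold clustering, the parameter values, and all function names are assumptions chosen for illustration, and the points are synthetic toy data, not the Malaysian COVID-19 data.

```python
from itertools import combinations
from math import dist


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbours overlap by a fraction."""
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    starts = [lo + i * step for i in range(n_intervals)]
    # Pin the last upper bound to hi so rounding cannot drop the maximum.
    return [(s, hi if i == n_intervals - 1 else s + length)
            for i, s in enumerate(starts)]


def threshold_clusters(points, indices, eps):
    """Connected components of the graph linking points at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, edges join
    nodes that have at least one point in common."""
    values = [filter_fn(p) for p in points]
    cover = interval_cover(min(values), max(values), n_intervals, overlap)
    nodes = []
    for a, b in cover:
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(threshold_clusters(points, members, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Synthetic toy data (assumption): five points on a line plus one far-away point.
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (4.0, 5.0)]
nodes, edges = mapper_graph(toy_points, filter_fn=lambda p: p[0],
                            n_intervals=2, overlap=0.5, eps=1.5)
print(sorted(sorted(n) for n in nodes))
print(len(edges))
```

In this toy run the far-away point ends up as its own isolated node, which is loosely analogous to how an anomalous group of observations can separate from the main body of a Mapper graph. In practice a dedicated library and a tuned filter, cover, and clustering method would replace these hand-rolled choices.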