Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms

As the SARS-CoV-2 pandemic continues to evolve, researchers worldwide are deploying digital signal processing (DSP) and machine learning approaches to mitigate and suppress its spread and to better understand the virus. This study presents an alignment-free approach to classify the SARS-Co...

Bibliographic Details
Main Authors: Singh, O. P., Vallejo, M., Badawy, I. M. L., Aysha, A., Madhanagopal, J., Mohd. Faudzi, A. A.
Format: Article
Language: English
Published: Elsevier Ltd. 2021
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/95161/1/AhmadAthifMohd2021_ClassificationofSARS.pdf
http://eprints.utm.my/id/eprint/95161/
http://dx.doi.org/10.1016/j.compbiomed.2021.104650
id my.utm.95161
record_format eprints
spelling my.utm.95161 2022-04-29T22:02:39Z http://eprints.utm.my/id/eprint/95161/ Article PeerReviewed application/pdf en Singh, O. P. and Vallejo, M. and Badawy, I. M. L. and Aysha, A. and Madhanagopal, J. and Mohd. Faudzi, A. A. (2021) Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms. Computers in Biology and Medicine, 136. ISSN 0010-4825 http://dx.doi.org/10.1016/j.compbiomed.2021.104650 DOI: 10.1016/j.compbiomed.2021.104650
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description As the SARS-CoV-2 pandemic continues to evolve, researchers worldwide are deploying digital signal processing (DSP) and machine learning approaches to mitigate and suppress its spread and to better understand the virus. This study presents an alignment-free approach to classifying SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. A total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity using DSP techniques and ranked them with filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. Classifier performance was assessed on the training dataset by accuracy and F-measure via 10-fold cross-validation. Kappa scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4%, a sensitivity of 96.2%, and a specificity of 98.2% when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers, outperforming previous studies.
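The description does not say how the eight biomarkers are computed or which software was used. As a rough illustration only, the sketch below shows one common way to measure three-base periodicity: map the sequence to four binary (Voss) indicator sequences and evaluate the discrete Fourier transform at frequency 1/3. It also shows how accuracy, sensitivity, specificity, and Cohen's kappa follow from a 2 × 2 confusion matrix. The function names `period3_power` and `confusion_metrics`, the Voss mapping, and the normalisation by sequence length are assumptions made for this example and may differ from the authors' method.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the A, C, G, T indicators.

    Assumption: Voss binary indicator mapping and division by sequence
    length; the record does not state which mapping the study used.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3: sum over positions k where the
        # base occurs of exp(-2*pi*i*k/3). Other symbols (e.g. N) are skipped.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, specificity and Cohen's kappa for two classes."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Toy inputs only; these are not data from the study.
    print(period3_power("ATGATGATGATG"))
    print(confusion_metrics(tp=8, fn=2, tn=9, fp=1))
```

A sequence with a strong codon-like repeat gives a large value at frequency 1/3, while a homopolymer gives a value near zero. Kappa is reported alongside accuracy because accuracy alone can look high when one class dominates the data.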
format Article
author Singh, O. P.
Vallejo, M.
Badawy, I. M. L.
Aysha, A.
Madhanagopal, J.
Mohd. Faudzi, A. A.
author_sort Singh, O. P.
title Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
publisher Elsevier Ltd.
publishDate 2021