Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify the SARS-Co...
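The full abstract in the description field of this record mentions biomarkers based on three-base periodicity computed with DSP techniques, but the record does not say which eight biomarkers were used. As a minimal sketch of the general idea only, the Python below maps a sequence to four binary indicator sequences (the Voss mapping, an assumption here, not confirmed as the authors' choice) and measures the spectral power at a frequency of one cycle per three bases. The function name `period3_power` and the normalization are hypothetical.

```python
import cmath


def period3_power(seq: str) -> float:
    """Normalized spectral power at frequency 1/3 (one cycle per three bases).

    Assumption: Voss mapping, i.e. one binary indicator sequence per base.
    This is an illustrative measure, not the biomarker set from the paper.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Single DFT coefficient of the indicator sequence at frequency 1/3:
        # only positions holding this base contribute a unit phasor.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    # Divide by n^2 so sequences of different lengths are comparable.
    return total / (n * n)


if __name__ == "__main__":
    periodic = "ATG" * 30      # toy sequence with a strict period of 3
    non_periodic = "AC" * 45   # toy sequence with a period of 2
    print(period3_power(periodic), period3_power(non_periodic))
```

On toy input a strictly period-3 sequence scores much higher than a period-2 sequence of the same length, which is the property such a feature relies on. Characters other than A, C, G, and T are ignored by the sum but still count toward the length.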
Main Authors: | Singh, O. P., Vallejo, M., Badawy, I. M. L., Aysha, A., Madhanagopal, J., Mohd. Faudzi, A. A. |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier Ltd., 2021 |
Subjects: | TK Electrical engineering. Electronics Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/95161/1/AhmadAthifMohd2021_ClassificationofSARS.pdf http://eprints.utm.my/id/eprint/95161/ http://dx.doi.org/10.1016/j.compbiomed.2021.104650 |
id |
my.utm.95161 |
---|---|
record_format |
eprints |
spelling |
my.utm.95161 2022-04-29T22:02:39Z http://eprints.utm.my/id/eprint/95161/ Article PeerReviewed application/pdf en Singh, O. P. and Vallejo, M. and Badawy, I. M. L. and Aysha, A. and Madhanagopal, J. and Mohd. Faudzi, A. A. (2021) Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms. Computers in Biology and Medicine, 136. ISSN 0010-4825 http://dx.doi.org/10.1016/j.compbiomed.2021.104650 DOI: 10.1016/j.compbiomed.2021.104650 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
TK Electrical engineering. Electronics Nuclear engineering |
description |
Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. A total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity, using DSP techniques, and ranked them using filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The training dataset was used to test the performance of the classifiers based on accuracy and F-measure via 10-fold cross-validation. Kappa scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4%, sensitivity of 96.2%, and specificity of 98.2% when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers, outperforming previous studies. |
format |
Article |
author |
Singh, O. P.; Vallejo, M.; Badawy, I. M. L.; Aysha, A.; Madhanagopal, J.; Mohd. Faudzi, A. A. |
author_sort |
Singh, O. P. |
title |
Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms |
publisher |
Elsevier Ltd. |
publishDate |
2021 |
url |
http://eprints.utm.my/id/eprint/95161/1/AhmadAthifMohd2021_ClassificationofSARS.pdf http://eprints.utm.my/id/eprint/95161/ http://dx.doi.org/10.1016/j.compbiomed.2021.104650 |
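The description field of this record reports accuracy, F-measure, kappa, sensitivity, and specificity for the classifiers. As a rough illustration of how these quantities relate to one another, the sketch below derives all five from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical toy values, not data from the study.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts.

    tp/fn: positive-class samples predicted positive/negative.
    fp/tn: negative-class samples predicted positive/negative.
    Assumes every denominator below is non-zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: agreement beyond what the class proportions alone
    # would produce, which is why it is useful for unbalanced data.
    p_expected = (
        ((tp + fn) / total) * ((tp + fp) / total)
        + ((tn + fp) / total) * ((tn + fn) / total)
    )
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy counts chosen for illustration only.
    for name, value in binary_metrics(tp=40, fn=10, fp=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Kappa is lower than accuracy whenever part of the observed agreement could arise by chance, so comparing the two gives a quick check on whether class imbalance is inflating the accuracy figure.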