Multi-stage feature selection in identifying potential biomarkers for cancer classification

Biomarkers are indicators of the state or progression of a disease. Identifying biomarkers raises the probability of earlier diagnosis and can support the development of effective treatments. Besides conducting laboratory analys...

Bibliographic Details
Main Authors: Wong, Yit Khee, Chan, Weng Howe, Nies, Hui Wen, Moorthy, Kohbalan
Format: Conference or Workshop Item
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2022
Subjects:
Online Access: http://umpir.ump.edu.my/id/eprint/39085/1/Multi-stage%20feature%20selection%20in%20identifying%20potential%20biomarkers.pdf
http://umpir.ump.edu.my/id/eprint/39085/2/Multi-stage%20feature%20selection%20in%20identifying%20potential%20biomarkers%20for%20cancer%20classification_ABS.pdf
http://umpir.ump.edu.my/id/eprint/39085/
https://doi.org/10.1109/ICICyTA57421.2022.10037807
id my.ump.umpir.39085
record_format eprints
spelling my.ump.umpir.39085 2023-11-14T03:49:05Z http://umpir.ump.edu.my/id/eprint/39085/ PeerReviewed Wong, Yit Khee and Chan, Weng Howe and Nies, Hui Wen and Moorthy, Kohbalan (2022) Multi-stage feature selection in identifying potential biomarkers for cancer classification. In: 2022 2nd International Conference on Intelligent Cybernetics Technology and Applications, ICICyTA 2022, 15-16 December 2022, Virtual, Online. pp. 6-11. (186617). ISBN 979-835039913-4 https://doi.org/10.1109/ICICyTA57421.2022.10037807
institution Universiti Malaysia Pahang Al-Sultan Abdullah
building UMPSA Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang Al-Sultan Abdullah
content_source UMPSA Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA75 Electronic computers. Computer science
QA76 Computer software
T Technology (General)
TA Engineering (General). Civil engineering (General)
description Biomarkers are indicators of the state or progression of a disease. Identifying biomarkers raises the probability of earlier diagnosis and can support the development of effective treatments. Besides laboratory analysis, potential biomarkers can also be identified by analysing gene expression data through feature selection and machine learning. Many algorithms have been applied and introduced in this area, yet the high dimensionality of gene expression data remains a challenge, because it can introduce noise that negatively affects the analysis outcome. This study therefore aims to develop a better feature selection method for identifying potential biomarkers from gene expression data, and to construct a deep neural network classification model using the selected features. A multi-stage feature selection method named CIR is proposed, composed of Chi-square, Information Gain and Recursive Feature Elimination. The dataset integrates seven ovarian cancer gene expression datasets from the GEO database. The selected genes are evaluated through biological context verification, and the classification model through classification performance. The proposed method improves on existing methods in accuracy (+2.2294%), precision (+8.1415%), recall (+2.2294%), F1-score (+4.5494%) and AUC (+0.2302). CIR identified eight genes that could be potential biomarkers for ovarian cancer, namely WFDC2, S100A13, PRG4, NRCAM, OGN, B3GALT2, VGLL3 and GATM, which are further verified through the literature.
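The three-stage filtering named in the description (Chi-square, then Information Gain, then Recursive Feature Elimination) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the helper names `build_cir_selector` and `selected_indices`, the stage sizes (200, 50, 8), the use of `mutual_info_classif` as a stand-in for Information Gain, logistic regression as the RFE estimator, min-max scaling to satisfy the non-negativity requirement of `chi2`, and the synthetic data in place of the GEO datasets. The deep neural network classifier is not shown.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_cir_selector(k_chi2=200, k_info_gain=50, n_final=8, seed=0):
    """Chain Chi-square -> Information Gain -> RFE (hypothetical stage sizes)."""
    # Assumption: mutual information is used as the Information Gain score.
    info_gain = partial(mutual_info_classif, random_state=seed)
    return Pipeline([
        # chi2 requires non-negative inputs, so scale each feature to [0, 1].
        ("scale", MinMaxScaler()),
        ("chi2", SelectKBest(chi2, k=k_chi2)),
        ("info_gain", SelectKBest(info_gain, k=k_info_gain)),
        # Assumption: logistic regression ranks features inside RFE.
        ("rfe", RFE(LogisticRegression(max_iter=1000),
                    n_features_to_select=n_final)),
    ])


def selected_indices(selector, n_features):
    """Map the surviving columns back to indices in the original matrix."""
    idx = np.arange(n_features)
    for _, step in selector.steps:
        if hasattr(step, "get_support"):  # the scaler has no support mask
            idx = idx[step.get_support()]
    return idx


if __name__ == "__main__":
    # Synthetic stand-in for a samples x genes expression matrix.
    X, y = make_classification(n_samples=120, n_features=500,
                               n_informative=10, random_state=0)
    selector = build_cir_selector().fit(X, y)
    print("kept feature indices:", selected_indices(selector, X.shape[1]))
```

Ordering the stages from the cheapest filter to the most expensive wrapper keeps the cost of RFE manageable, since it only has to refit the estimator on the features that survived the two filter stages.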
format Conference or Workshop Item
author Wong, Yit Khee
Chan, Weng Howe
Nies, Hui Wen
Moorthy, Kohbalan
title Multi-stage feature selection in identifying potential biomarkers for cancer classification
publisher Institute of Electrical and Electronics Engineers Inc.
publishDate 2022
url http://umpir.ump.edu.my/id/eprint/39085/1/Multi-stage%20feature%20selection%20in%20identifying%20potential%20biomarkers.pdf
http://umpir.ump.edu.my/id/eprint/39085/2/Multi-stage%20feature%20selection%20in%20identifying%20potential%20biomarkers%20for%20cancer%20classification_ABS.pdf
http://umpir.ump.edu.my/id/eprint/39085/
https://doi.org/10.1109/ICICyTA57421.2022.10037807