Filter-Based Gene Selection Method for Tissues Classification on Large Scale Gene Expression Data

Bibliographic Details
Main Authors: Kabir Ahmad, Farzana; Yusof, Yuhanis; Yusoff, Nooraini
Format: Article
Published: Science Publishing Corporation Inc 2018
Subjects: QA75 Electronic computers. Computer science
Online Access: http://repo.uum.edu.my/25273/
http://doi.org/10.14419/ijet.v7i2.15.11216

Abstract
DNA microarray technology is an innovative tool that offers a new perspective on cellular systems by measuring the expression of a large number of genes at once. Despite this innovation, most microarray results depend on computational intelligence to interpret the large volume of data produced. Interpreting large-scale gene expression data remains a challenging problem because of its inherent “high dimensional low sample size” nature: microarray data mainly involve thousands of genes, n, measured on a very small number of samples, p. In addition, analyses of such data are prone to overfitting and are easily overwhelmed by their complexity. Because of this nature, many of the genes are also likely to be uninformative for classification. For this reason, many studies use feature selection methods to select the significant genes that give the maximum discriminative power between cancerous and normal tissues. In this study, we investigate and compare the effectiveness of four popular filter-based gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG) and the t-test, in selecting informative genes that distinguish cancerous from normal tissues. Two common classifiers, a Support Vector Machine (SVM) and a C4.5 decision tree, are trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets: breast cancer, colon, and lung. The study finds that IG and SNR are better suited to SVM, while IG is better suited to C4.5.
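The four filter criteria named above can be sketched as per-gene scoring functions followed by a top-k ranking. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binary label (1 = cancerous, 0 = normal), the common textbook form of each score (Golub-style SNR, the Fisher ratio of squared mean difference to summed class variances, and Welch's t-statistic), and, for IG, a single best binary threshold per gene, because the abstract does not say how expression values were discretized. All function names and the toy matrix are hypothetical.

```python
import math
from statistics import mean, stdev


def _split(x, y):
    """Split one gene's values by class (assumed: 1 = cancerous, 0 = normal)."""
    pos = [v for v, c in zip(x, y) if c == 1]
    neg = [v for v, c in zip(x, y) if c == 0]
    return pos, neg


def snr(x, y):
    """Signal-to-noise ratio: (mu1 - mu0) / (sd1 + sd0). Needs >= 2 samples per class."""
    pos, neg = _split(x, y)
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def fisher(x, y):
    """Fisher criterion: (mu1 - mu0)^2 / (var1 + var0)."""
    pos, neg = _split(x, y)
    return (mean(pos) - mean(neg)) ** 2 / (stdev(pos) ** 2 + stdev(neg) ** 2)


def t_stat(x, y):
    """Welch's t-statistic (unequal class variances assumed)."""
    pos, neg = _split(x, y)
    se = math.sqrt(stdev(pos) ** 2 / len(pos) + stdev(neg) ** 2 / len(neg))
    return (mean(pos) - mean(neg)) / se


def _entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h


def info_gain(x, y):
    """IG of the best single threshold on x (assumed discretization)."""
    base = _entropy(list(y))
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        cond = (len(left) * _entropy(left) + len(right) * _entropy(right)) / len(pairs)
        best = max(best, base - cond)
    return best


def select_top_k(X, y, score_fn, k):
    """Rank genes (columns of X) by |score| and return the indices of the top k."""
    n_genes = len(X[0])
    scores = [abs(score_fn([row[g] for row in X], y)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
X = [
    [5.0, 1.0, 2.0],
    [6.0, 3.0, 2.5],
    [5.5, 2.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.5, 1.0, 1.5],
    [2.0, 3.0, 2.5],
]
y = [1, 1, 1, 0, 0, 0]

for fn in (snr, fisher, t_stat, info_gain):
    print(fn.__name__, select_top_k(X, y, fn, 2))  # each prints [0, 2] here
```

On this toy matrix gene 0 separates the classes perfectly (SNR 4.0, IG 1.0 bit) and gene 1 has identical class means (score 0), so every criterion ranks gene 0 first. Ranking by absolute value treats genes that are down-regulated in cancer as equally informative as up-regulated ones, which is a design choice of this sketch.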
On the colon dataset, SVM achieved a specificity of 86% with SNR and 80% with IG. In contrast, C4.5 obtained a specificity of 78% with IG on the same dataset. These results indicate that, on this dataset, SVM performed slightly better than C4.5 on IG-preprocessed data.

Citation: Kabir Ahmad, Farzana and Yusof, Yuhanis and Yusoff, Nooraini (2018) Filter-Based Gene Selection Method for Tissues Classification on Large Scale Gene Expression Data. International Journal of Engineering & Technology, 7 (2.15). pp. 68-71. ISSN 2227-524X. doi:10.14419/ijet.v7i2.15.11216
Status: Peer reviewed
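The results above are reported as specificity. Assuming the usual definition with cancerous tissue as the positive class, specificity is the share of normal tissues that a classifier correctly labels as normal, TN / (TN + FP). The sketch below computes it from true and predicted labels; the function names and toy labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem; positive = cancerous (assumed)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # normal tissue wrongly called cancerous
        elif t == positive:
            fn += 1  # cancerous tissue missed
        else:
            tn += 1  # normal tissue correctly called normal
    return tp, fp, tn, fn


def specificity(y_true, y_pred, positive=1):
    """TN / (TN + FP): share of negative (normal) samples predicted as negative."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative samples")
    return tn / (tn + fp)


def sensitivity(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of positive (cancerous) samples predicted as positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


# Hypothetical toy labels: 5 normal (0) and 3 cancerous (1) samples.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]
print(specificity(y_true, y_pred))  # 0.8 (4 of 5 normal samples correct)
print(sensitivity(y_true, y_pred))  # about 0.667 (2 of 3 cancerous samples found)
```

Specificity alone says nothing about missed cancerous samples, which is why the sketch also provides sensitivity, TP / (TP + FN).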
Repository: UUM Institutional Repository, UUM Library, Universiti Utara Malaysia (http://repo.uum.edu.my/)