Grey relational analysis feature selection for cancer classification using support vector machine
Main Author: | |
---|---|
Format: | Thesis |
Published: | 2014 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/48461/ |
Summary: | Cancer is one of the leading causes of death worldwide, but it can be treated if it is diagnosed early. Machine learning classifiers are now widely applied in cancer detection because of their diagnostic accuracy in cancer classification problems. Their performance, however, depends on which variables are selected as input to the classification process. To choose these variables, this research proposes two classification models based on two feature selection methods: Grey Relational Analysis (GRA) and Improved Grey Relational Analysis (IGRA). Each method is combined with a Support Vector Machine (SVM) classifier, giving the models GRA-SVM and IGRA-SVM. GRA and IGRA act as feature selection methods in the preprocessing phase of the SVM classifier: they identify potential variables in the cancer data that can serve as significant inputs to the SVM and so improve its classification performance. The proposed GRA-SVM and IGRA-SVM models were compared with a standard SVM and with classification models from previous studies on geometric mean, sensitivity, specificity, accuracy, and area under the Receiver Operating Characteristic (ROC) curve. The results showed that the proposed GRA-SVM and IGRA-SVM models achieved better performance in classifying the cancer data, with better results ranging from 2.64% to 88.9% in the selection of potential variables. |
---|
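
The pipeline described in the summary (rank variables with GRA, keep the most relevant ones, then evaluate a classifier on sensitivity, specificity, geometric mean, and accuracy) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: it assumes the textbook grey relational grade with distinguishing coefficient `zeta = 0.5`, uses the class label as the reference series, and the function names `grey_relational_grades`, `rank_features`, and `binary_metrics` are hypothetical. The IGRA variant and the SVM training step are not shown.

```python
from math import sqrt


def min_max(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def grey_relational_grades(features, reference, zeta=0.5):
    """Return one grey relational grade per feature column.

    features  : list of columns, each a list of numeric values (one per sample)
    reference : reference series (assumed here to be the class label)
    zeta      : distinguishing coefficient, conventionally 0.5
    """
    ref = min_max(reference)
    # Absolute differences between the reference and each normalized feature.
    deltas = [[abs(r - v) for r, v in zip(ref, min_max(col))] for col in features]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every feature matches the reference exactly.
        return [1.0 for _ in features]
    # Grade = mean grey relational coefficient over all samples.
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def rank_features(grades, top_k):
    """Indices of the top_k features, highest grade first."""
    order = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
    return order[:top_k]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, geometric mean, and accuracy for 0/1 labels.

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy data (made up for illustration): two feature columns, four samples.
labels = [0, 0, 1, 1]
columns = [[1, 2, 3, 4], [4, 1, 3, 2]]
grades = grey_relational_grades(columns, labels)
selected = rank_features(grades, top_k=1)
```

In a full workflow the columns listed in `selected` would then be passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`), and `binary_metrics` would be applied to its predictions on held-out data.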