Feature Ranking Techniques For 3D ATS Drug Molecular Structure Identification
Format: Thesis
Language: English
Published: 2018
Online Access:
http://eprints.utem.edu.my/id/eprint/23334/1/Feature%20Ranking%20Techniques%20For%203D%20ATS%20Drug%20Molecular%20Structure%20Identification.pdf
http://eprints.utem.edu.my/id/eprint/23334/2/Feature%20Ranking%20Techniques%20For%203D%20ATS%20Drug%20Molecular%20Structure%20Identification.pdf
http://eprints.utem.edu.my/id/eprint/23334/
http://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=112737
Summary: Existing laboratory techniques for identifying amphetamine-type stimulant (ATS) drugs face several challenges, including the cost of training expert operators, the cost of materials, and the hazards involved in running the experiments. Furthermore, because new ATS drug designs constantly emerge on the illicit market, comprehensive analytical methods struggle to detect and validate these compounds. This research proposes a computational intelligence approach to assist the analysis phase of the ATS drug identification process. The dataset analyzed, the ATS drugs 3D molecular structure representation dataset, consists of 7212 sample records with 1185 features. The research found that this dataset embeds several complexities and uncertainties, namely high dimensionality and the presence of irrelevant and noisy features. To tackle these problems, the research reduces the dimensionality of the dataset by selecting a significant subset of its features. This led to the proposal of a feature selection approach that removes irrelevant and noisy data and selects the feature subset that best represents the ATS drugs and yields better identification performance. The proposed approach, named the Ensemble Filter-Embedded Feature Ranking Approach (FEFR), has a simple algorithmic framework and reuses existing feature selection techniques to cater for a variety of data issues. It is performed in two main phases. The first phase carries out a thorough analysis of the effectiveness and capability of various feature ranking techniques for ATS drug identification.
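The first phase compares rankers by how well their top-ranked features support classification. The following is a minimal sketch of that evaluation loop under stated assumptions: features are numeric, each ranker returns feature indices best-first, and a leave-one-out 1-nearest-neighbour classifier (roughly what an IBk-style classifier does with k = 1) stands in for the five classifiers. The function names and toy data are hypothetical, and the thesis's actual validation protocol is not specified in this record.

```python
import math

def top_k_subset(X, ranking, k):
    """Keep only the k best-ranked feature columns of every sample."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in X]

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean distance).
    Assumed stand-in for an IBk-style classifier, not the thesis's exact setup."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out sample i
            d = math.dist(xi, xj)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def evaluate_ranking(X, y, ranking, ks):
    """Accuracy for each subset size k, so different rankers can be compared."""
    return {k: loo_1nn_accuracy(top_k_subset(X, ranking, k), y) for k in ks}

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 3.0], [1.2, -5.0]]
y = ["A", "A", "A", "B", "B", "B"]
result = evaluate_ranking(X, y, ranking=[0, 1], ks=[1, 2])
print(result)  # {1: 1.0, 2: 0.6666666666666666}
```

On this toy set, adding the noisy second feature lowers accuracy from 1.0 to about 0.67. This is the kind of degradation that motivates removing irrelevant and noisy features.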
Six feature ranking techniques were used: Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), ReliefF, Support Vector Machine based Recursive Feature Elimination (SVM-RFE), and Variable Importance based Random Forest (VI-RF). The feature subset selected by each technique was run through five popular classifiers (Random Forest (RF), Naïve Bayes (NB), IBk, Sequential Minimal Optimization (SMO), and J48), and their performances were analyzed and compared. Experiments on the dataset showed that ReliefF and VI-RF performed best among the techniques at retaining the significant features and eliminating the irrelevant ones. In the second phase, the results of these two top performers are aggregated to benefit from their advantages while minimizing their shortcomings, yielding a more reliable result. All performance is evaluated in terms of the number of features selected and the classification accuracy. A paired t-test is also carried out to further validate the quality of FEFR on the classification accuracy metric. The results show that the feature subset selected by FEFR is either superior to, or at least as adequate as, the subsets selected by the individual feature ranking methods and the original dataset.
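IG, GR and SU are all derived from Shannon entropy: IG(Y;X) = H(Y) - H(Y|X), GR = IG / H(X), and SU = 2 * IG / (H(X) + H(Y)). The sketch below computes these three filter scores under one assumption: each feature column has already been discretized, so continuous 3D descriptors would need binning first. The helper names are illustrative and are not taken from the thesis or from any particular library.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(y, x):
    """H(Y | X): class entropy within each feature value, weighted by that value's frequency."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / len(y) * entropy(g) for g in groups.values())

def information_gain(x, y):
    return entropy(y) - conditional_entropy(y, x)

def gain_ratio(x, y):
    hx = entropy(x)  # split information of the feature
    return information_gain(x, y) / hx if hx > 0 else 0.0

def symmetrical_uncertainty(x, y):
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0

def rank_features(columns, y, score):
    """Return feature indices ordered best-first by the given score function."""
    scores = [score(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)

# Hypothetical discretized toy features.
y = ["A", "A", "B", "B"]
x_good = [0, 0, 1, 1]   # perfectly predicts the class
x_noise = [0, 1, 0, 1]  # independent of the class
print(information_gain(x_good, y), information_gain(x_noise, y))  # 1.0 0.0
print(rank_features([x_noise, x_good], y, symmetrical_uncertainty))  # [1, 0]
```

GR and SU both normalize IG. This normalization counteracts IG's bias toward features with many distinct values, which is one reason the three scores can rank the same features differently.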
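The summary does not specify how the ReliefF and VI-RF rankings are aggregated, or how the accuracies fed to the paired t-test are obtained. The code below is a hedged sketch only. It fuses two rankings by mean position, a Borda-style rule that is assumed here and not confirmed as FEFR's rule. It then computes the paired t statistic t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom, over matched accuracy differences d. The function names are hypothetical, and the accuracy figures in the example are invented for illustration, not results from the thesis.

```python
import math
from statistics import mean, stdev

def aggregate_rankings(rank_a, rank_b):
    """Fuse two best-first rankings of the same features by mean position (Borda-style).
    Assumed aggregation rule for illustration; ties are broken by position in rank_a."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same features")
    pos_a = {f: i for i, f in enumerate(rank_a)}
    pos_b = {f: i for i, f in enumerate(rank_b)}
    return sorted(rank_a, key=lambda f: ((pos_a[f] + pos_b[f]) / 2, pos_a[f]))

def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic and degrees of freedom for matched accuracy scores (e.g. per fold)."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1

# Hypothetical rankings of four features by ReliefF and VI-RF.
fused = aggregate_rankings(["f3", "f1", "f2", "f4"], ["f1", "f4", "f3", "f2"])
print(fused)  # ['f1', 'f3', 'f4', 'f2']

# Invented per-fold accuracies (%) for a fused subset vs. the full feature set.
t, df = paired_t_statistic([90.0, 92.0, 91.0, 93.0], [88.0, 90.0, 90.0, 90.0])
print(round(t, 3), df)  # 4.899 3
```

The resulting t is then compared against the critical value of Student's t distribution with n - 1 degrees of freedom. Alternatively, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.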