Feature Ranking Techniques For 3D ATS Drug Molecular Structure Identification

Existing laboratory techniques for identifying ATS drugs face several challenges, including the cost of training expert operators, the cost of materials, and the hazards of running the experiments. Furthermore, with the constantly emerging of the new ATS drugs design i...

Bibliographic Details
Main Author: Saw, Yee Ching
Format: Thesis
Language: English
Published: 2018
Subjects: Q Science (General); QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/23334/1/Feature%20Ranking%20Techniques%20For%203D%20ATS%20Drug%20Molecular%20Structure%20Identification.pdf
http://eprints.utem.edu.my/id/eprint/23334/2/Feature%20Ranking%20Techniques%20For%203D%20ATS%20Drug%20Molecular%20Structure%20Identification.pdf
http://eprints.utem.edu.my/id/eprint/23334/
http://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=112737
id my.utem.eprints.23334
record_format eprints
spelling my.utem.eprints.23334 2022-02-21T10:33:28Z Saw, Yee Ching (2018) Feature Ranking Techniques For 3D ATS Drug Molecular Structure Identification. Masters thesis, Universiti Teknikal Malaysia Melaka. NonPeerReviewed
institution Universiti Teknikal Malaysia Melaka
building UTEM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknikal Malaysia Melaka
content_source UTEM Institutional Repository
url_provider http://eprints.utem.edu.my/
language English
topic Q Science (General)
QA Mathematics
description Existing laboratory techniques for identifying amphetamine-type stimulant (ATS) drugs face several challenges, including the cost of training expert operators, the cost of materials, and the hazards of running the experiments. Furthermore, new ATS drug designs constantly emerge on the illicit market, which makes it difficult for comprehensive analytical methods to detect and validate these compounds. This research proposes a computational intelligence approach to assist the analysis phase of the ATS drug identification process. The dataset analyzed, the ATS drugs 3D molecular structure representation dataset, consists of 7212 sample records with 1185 features. The research investigated the complexities and uncertainties embedded in the dataset, namely high dimensionality and the presence of irrelevant and noisy features. These challenges motivated reducing the dimensionality of the dataset by selecting a significant subset of its features. This led to the proposal of a feature selection approach that removes irrelevant and noisy features and selects the feature subset that best represents the ATS drugs and yields better identification performance. The proposed approach, the Ensemble Filter-Embedded Feature Ranking Approach (FEFR), has a simple algorithmic framework and uses existing feature selection techniques to handle a variety of data issues. It is performed in two main phases. The first phase is a thorough analysis of the effectiveness and capability of various feature ranking techniques in ATS drug identification. Six feature ranking techniques were used: Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), ReliefF, support vector machine based recursive feature elimination (SVM-RFE), and variable importance based random forest (VI-RF). The feature subset selected by each technique was run through five popular classifiers: Random Forest (RF), Naïve Bayes (NB), IBK, Sequential Minimal Optimization (SMO), and J48, and their performances were analyzed and compared. Experiments on the dataset showed that ReliefF and VI-RF performed best at retaining the significant features and eliminating the irrelevant ones. In the second phase, the results of these two top performers are selected and aggregated to benefit from their advantages while minimizing their shortcomings, yielding a more reliable result. Performance is evaluated in terms of the number of features selected and classification accuracy. A paired t-test was also carried out to further validate the quality of FEFR based on classification accuracy. The results show that the feature subset selected by FEFR is either superior to, or at least as adequate as, the subsets selected by the individual feature ranking methods and the original dataset.
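The abstract does not say how the two top rankings are aggregated or how the paired t-test was set up, so the following is only a minimal sketch under stated assumptions, not the thesis's FEFR procedure. It assumes the two rankings are combined by mean rank position, and that the t-test compares per-fold classification accuracies. The function names, feature names, and accuracy values are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def aggregate_rankings(ranking_a, ranking_b, top_k):
    """Combine two ranked feature lists (best first) by mean rank position.

    Assumption: mean-rank aggregation; the source does not state the rule.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must cover the same features")
    pos_a = {f: i for i, f in enumerate(ranking_a)}
    pos_b = {f: i for i, f in enumerate(ranking_b)}
    # Lower mean position is better; ties are broken by feature name.
    combined = sorted(ranking_a, key=lambda f: ((pos_a[f] + pos_b[f]) / 2, f))
    return combined[:top_k]


def paired_t_statistic(acc_x, acc_y):
    """Paired t statistic for two equally long lists of per-fold accuracies."""
    if len(acc_x) != len(acc_y) or len(acc_x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    diffs = [x - y for x, y in zip(acc_x, acc_y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))


# Hypothetical rankings from two techniques (for example ReliefF and VI-RF).
relieff_rank = ["f3", "f1", "f4", "f2"]
virf_rank = ["f1", "f3", "f2", "f4"]
selected = aggregate_rankings(relieff_rank, virf_rank, top_k=2)
print(selected)

# Hypothetical per-fold accuracies: aggregated subset vs. a single ranker.
t_value = paired_t_statistic([0.90, 0.92, 0.91], [0.88, 0.91, 0.88])
print(round(t_value, 3))
```

The resulting t statistic would be compared against a t distribution with n - 1 degrees of freedom, where n is the number of folds. Mean rank is only one possible aggregation rule; a union or intersection of top-k subsets would be equally consistent with the wording of the abstract.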
format Thesis
author Saw, Yee Ching
title Feature Ranking Techniques For 3D ATS Drug Molecular Structure Identification
publishDate 2018