ACO-based feature selection algorithm for classification

A dataset with a small number of records but a large number of attributes exhibits the phenomenon known as the “curse of dimensionality”. Classifying this type of dataset requires Feature Selection (FS) methods to extract useful information. The modified graph clustering ant colony optimisa...

Bibliographic Details
Main Author: Al-mazini, Hassan Fouad Abbas
Format: Thesis
Language: English
Published: 2022
Subjects: QA Mathematics
Online Access: https://etd.uum.edu.my/10254/1/s903691_01.pdf
https://etd.uum.edu.my/10254/2/s903691_02.pdf
https://etd.uum.edu.my/10254/
id my.uum.etd.10254
record_format eprints
spelling my.uum.etd.10254 2023-01-30T00:31:03Z https://etd.uum.edu.my/10254/ ACO-based feature selection algorithm for classification Al-mazini, Hassan Fouad Abbas QA Mathematics 2022 Thesis NonPeerReviewed text en https://etd.uum.edu.my/10254/1/s903691_01.pdf text en https://etd.uum.edu.my/10254/2/s903691_02.pdf Al-mazini, Hassan Fouad Abbas (2022) ACO-based feature selection algorithm for classification. Doctoral thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic QA Mathematics
description A dataset with a small number of records but a large number of attributes exhibits the phenomenon known as the “curse of dimensionality”. Classifying this type of dataset requires Feature Selection (FS) methods to extract useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method based on grouping highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset, arising from its clustering method, its parameter sensitivity, and its determination of the final subset. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to address these three problems. The proposed improvements are: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for constructing subsets from the feature clusters; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method uses intensification and diversification mechanisms for local and global optimisation to produce clusters of highly correlated features. The adaptive technique for ant selection lets the parameter change in response to feedback from the search space. The genetic method determines the final subset automatically, based on crossover and a subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University of California, Irvine (UCI) repository and nine (9) deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms.
On the UCI datasets, the EGCACO algorithm outperformed the other benchmark optimisation algorithms in the number of selected features for 16 of the 18 datasets (88.89%) and achieved the best classification accuracy on eight (8) of them (44.44%). Further, experiments on the nine (9) DNA microarray datasets showed that the EGCACO algorithm is superior to the benchmark algorithms in classification accuracy (first rank) for seven (7) datasets (77.78%) and selects the fewest features in six (6) datasets (66.67%). The proposed EGCACO algorithm can be utilised for FS in DNA microarray classification tasks involving large datasets across various application domains.
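The idea of grouping highly correlated features and then keeping one feature per group can be illustrated with a minimal Python sketch. This is not the EGCACO algorithm from the thesis: the ant colony clustering, adaptive selection, and genetic steps are replaced here by a plain correlation threshold, connected components, and a greedy pick, purely for illustration. The function names, the 0.9 threshold, and the toy data are all assumptions and do not come from the record.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def cluster_features(columns, threshold=0.9):
    """Group feature indices linked by |correlation| >= threshold.

    Hypothetical stand-in for the clustering step: connected components
    of the thresholded correlation graph, found with union-find.
    """
    n = len(columns)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def pick_representatives(columns, labels, clusters):
    """Keep, from each cluster, the feature most correlated with the labels."""
    return [max(c, key=lambda i: abs(pearson(columns[i], labels)))
            for c in clusters]

# Toy data (assumed): features 0 and 1 are nearly identical, feature 2 is not.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
]
labels = [0, 0, 0, 1, 1, 1]

clusters = cluster_features(columns)
selected = pick_representatives(columns, labels, clusters)
print(clusters, selected)
```

In this sketch the redundant pair of features collapses into one cluster, so the selected subset has one feature per cluster. A metaheuristic such as ACO would instead search over candidate subsets rather than take the greedy pick used here.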
format Thesis
author Al-mazini, Hassan Fouad Abbas
title ACO-based feature selection algorithm for classification
publishDate 2022
url https://etd.uum.edu.my/10254/1/s903691_01.pdf
https://etd.uum.edu.my/10254/2/s903691_02.pdf
https://etd.uum.edu.my/10254/