Staff View: Efficient genetic partitioning-around-medoid algorithm for clustering

Efficient genetic partitioning-around-medoid algorithm for clustering

Throughout the years, considerable efforts made to tackle the clustering problem. Yet, because of the nature of the clustering problem, finding an efficient clustering optimization algorithm with reasonable performance is still an open challenge. In general, genetic based clustering algorithms sh...

Full description

Saved in:

Bibliographic Details
Main Author:	Garib, Sarmad Makki Mohammed
Format:	Thesis
Language:	English
Published:	2019
Subjects:	Cluster analysis - Computer programs
Online Access:	http://psasir.upm.edu.my/id/eprint/84550/1/FSKTM%202019%2050%20ir.pdf http://psasir.upm.edu.my/id/eprint/84550/
Tags:	Add Tag No Tags, Be the first to tag this record!

id	my.upm.eprints.84550
record_format	eprints
spelling	my.upm.eprints.845502021-12-31T08:22:58Z http://psasir.upm.edu.my/id/eprint/84550/ Efficient genetic partitioning-around-medoid algorithm for clustering Garib, Sarmad Makki Mohammed Throughout the years, considerable efforts made to tackle the clustering problem. Yet, because of the nature of the clustering problem, finding an efficient clustering optimization algorithm with reasonable performance is still an open challenge. In general, genetic based clustering algorithms showed the ability to reach near global optimal solution. These algorithms mostly built upon the partitioning k-means clustering algorithm. Nevertheless, these algorithms either dealt with part of the issues inherited in the k-means or in turn produce some deficiency by itself. One of the main issues in genetic k-means based algorithms is their sensitivity to outliers and unevenly distributed clusters due to the mean compromised computations. Besides, these algorithms more frequently implemented with the lengthy and redundant fixed N length integer encoding. Additionally, they used either random or non-mathematically proved population initializations methods. Adopting the medoid instead of the mean can enhance the efficiency. However, the complexity of the kmedoid based algorithms in general is more than the complexity of the k-means based algorithms. Lastly, in order to externally judge the validity of these types of clustering algorithms, there is a need for a method to correctly and efficiently evaluate their variant multiclass clustering results. This study utilizes genetic algorithms based upon the medoid rather than the mean as a centroid-selection schema to improve the clustering efficiency. It uses the compact and unique variable K length real encoding. Accordingly, the corresponding genetic operators are adapted to suite the medoid and to incorporate much clustering-specific domain knowledge. The algorithm is also preceded with careful seeding using mathematically proved to converge k-means++ algorithm. Secondly, by utilizing the medoid property as being an actual data item in the dataset and with the aid of the proposed indexing method, the complexity of the computation is notably decreased. Finally, an algorithm is developed for automatic evaluation of external validity measures on the generated variant multiclass clustering results. The experiments are divided into four interdependent sets. The first set showed the efficiency and performance of the proposed composing techniques including: kmean++ algorithm for initialization, fitness scaling for fitness selection, the proposed split and merge mutation operator, and choosing the medoid instead of the mean as a centroid-selection schema. The rest sets of experiments carried out to evaluate the proposed algorithms. Precisely, the second set revealed that the proposed genetic medoid based algorithms with both DB and VRC fitness functions produced more accurate results compared with the genetic means based algorithms in terms of the Fscore. The third set affirmed that the enhancement on the proposed algorithm, which made use of indexing method that suits the medoids, could boost the performance to about 9 to 27 times in terms of execution time depending on the complexity of the dataset. Finally, the fourth set dealt with the developed method for externally evaluating the variant multi class clusters. The experiments acknowledged that the proposed relabeling algorithm could perform better in quality than the external F-score measure. Also it outperform in term of its complexity O(N2) compared with the complexity O(2N) of the Adjusted Rand Index (ARI). 2019-05 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/84550/1/FSKTM%202019%2050%20ir.pdf Garib, Sarmad Makki Mohammed (2019) Efficient genetic partitioning-around-medoid algorithm for clustering. Doctoral thesis, Universiti Putra Malaysia. Cluster analysis - Computer programs
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
topic	Cluster analysis - Computer programs
spellingShingle	Cluster analysis - Computer programs Garib, Sarmad Makki Mohammed Efficient genetic partitioning-around-medoid algorithm for clustering
description	Throughout the years, considerable efforts made to tackle the clustering problem. Yet, because of the nature of the clustering problem, finding an efficient clustering optimization algorithm with reasonable performance is still an open challenge. In general, genetic based clustering algorithms showed the ability to reach near global optimal solution. These algorithms mostly built upon the partitioning k-means clustering algorithm. Nevertheless, these algorithms either dealt with part of the issues inherited in the k-means or in turn produce some deficiency by itself. One of the main issues in genetic k-means based algorithms is their sensitivity to outliers and unevenly distributed clusters due to the mean compromised computations. Besides, these algorithms more frequently implemented with the lengthy and redundant fixed N length integer encoding. Additionally, they used either random or non-mathematically proved population initializations methods. Adopting the medoid instead of the mean can enhance the efficiency. However, the complexity of the kmedoid based algorithms in general is more than the complexity of the k-means based algorithms. Lastly, in order to externally judge the validity of these types of clustering algorithms, there is a need for a method to correctly and efficiently evaluate their variant multiclass clustering results. This study utilizes genetic algorithms based upon the medoid rather than the mean as a centroid-selection schema to improve the clustering efficiency. It uses the compact and unique variable K length real encoding. Accordingly, the corresponding genetic operators are adapted to suite the medoid and to incorporate much clustering-specific domain knowledge. The algorithm is also preceded with careful seeding using mathematically proved to converge k-means++ algorithm. Secondly, by utilizing the medoid property as being an actual data item in the dataset and with the aid of the proposed indexing method, the complexity of the computation is notably decreased. Finally, an algorithm is developed for automatic evaluation of external validity measures on the generated variant multiclass clustering results. The experiments are divided into four interdependent sets. The first set showed the efficiency and performance of the proposed composing techniques including: kmean++ algorithm for initialization, fitness scaling for fitness selection, the proposed split and merge mutation operator, and choosing the medoid instead of the mean as a centroid-selection schema. The rest sets of experiments carried out to evaluate the proposed algorithms. Precisely, the second set revealed that the proposed genetic medoid based algorithms with both DB and VRC fitness functions produced more accurate results compared with the genetic means based algorithms in terms of the Fscore. The third set affirmed that the enhancement on the proposed algorithm, which made use of indexing method that suits the medoids, could boost the performance to about 9 to 27 times in terms of execution time depending on the complexity of the dataset. Finally, the fourth set dealt with the developed method for externally evaluating the variant multi class clusters. The experiments acknowledged that the proposed relabeling algorithm could perform better in quality than the external F-score measure. Also it outperform in term of its complexity O(N2) compared with the complexity O(2N) of the Adjusted Rand Index (ARI).
format	Thesis
author	Garib, Sarmad Makki Mohammed
author_facet	Garib, Sarmad Makki Mohammed
author_sort	Garib, Sarmad Makki Mohammed
title	Efficient genetic partitioning-around-medoid algorithm for clustering
title_short	Efficient genetic partitioning-around-medoid algorithm for clustering
title_full	Efficient genetic partitioning-around-medoid algorithm for clustering
title_fullStr	Efficient genetic partitioning-around-medoid algorithm for clustering
title_full_unstemmed	Efficient genetic partitioning-around-medoid algorithm for clustering
title_sort	efficient genetic partitioning-around-medoid algorithm for clustering
publishDate	2019
url	http://psasir.upm.edu.my/id/eprint/84550/1/FSKTM%202019%2050%20ir.pdf http://psasir.upm.edu.my/id/eprint/84550/
_version_	1724075441984110592
score	13.160551

Efficient genetic partitioning-around-medoid algorithm for clustering

Similar Items