Improved cluster partition in principal component analysis guided clustering

The principal component analysis (PCA) guided clustering approach is widely used on high dimensional data to improve the efficiency of K-means cluster solutions. Typically, Pearson correlation is used in PCA to provide an eigen-analysis to obtain the associated components that account for most of the v...

Bibliographic Details
Main Authors: Shaharudin, S. M., Ahmad, Norhaiza, Yusof, Fadhilah
Format: Article
Published: 2013
Subjects: Q Science
Online Access: http://eprints.utm.my/id/eprint/40261/
http://dx.doi.org/10.5120/13156-0839
id my.utm.40261
record_format eprints
spelling my.utm.40261 2019-03-17T04:21:45Z Shaharudin, S. M. and Ahmad, Norhaiza and Yusof, Fadhilah (2013) Improved cluster partition in principal component analysis guided clustering. International Journal of Computer Applications, 75 (11). pp. 23-25. ISSN 0975-8887. Article, PeerReviewed. DOI:10.5120/13156-0839
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic Q Science
description The principal component analysis (PCA) guided clustering approach is widely used on high dimensional data to improve the efficiency of K-means cluster solutions. Typically, Pearson correlation is used in PCA to provide an eigen-analysis that yields the components accounting for most of the variation in the data. However, PCA based on Pearson correlation can be sensitive to non-Gaussian data that involve skewed observations such as outlying values. Applying it to such data can therefore affect the cluster partitions and generate extremely imbalanced clusters in a high dimensional space. In this study, Tukey's biweight correlation, based on the M-estimate approach, is used in PCA as an alternative to Pearson correlation. This approach is more resistant to outlying values because it examines each observation and down-weights those that lie far from the center of the data. Two major features are highlighted: (1) fewer components are retained, and imbalanced clusters at the recommended cumulative percentage of variation threshold are avoided; (2) the cluster quality with respect to external, internal and relative criteria, as shown by the Rand, Silhouette and Davies-Bouldin indices, exceeds that of the clusters from PCA based on Pearson correlation.
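The idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard biweight midcorrelation (tuning constant c = 9, median and MAD as location and scale) as a stand-in for the paper's Tukey biweight correlation, and the function names, the 0.8 cumulative-variance threshold and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def biweight_midcorrelation(X, c=9.0):
    """Biweight midcorrelation matrix of the columns of X (n_samples x n_features).

    Assumption: c = 9 is the conventional tuning constant, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    dev = X - med                          # deviations from the column medians
    mad = np.median(np.abs(dev), axis=0)   # median absolute deviation per column
    if np.any(mad == 0):
        raise ValueError("a column has zero MAD; biweight weights are undefined")
    u = dev / (c * mad)
    # Tukey biweight: observations far from the center get weight 0
    w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
    a = dev * w
    a /= np.linalg.norm(a, axis=0)         # unit-norm columns, so the diagonal is 1
    return a.T @ a

def retained_components(R, threshold=0.8):
    """Leading eigenpairs of R whose cumulative share of variation reaches threshold.

    Assumption: threshold = 0.8 is an illustrative value only.
    """
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(eigvals))
    return eigvals[:k], eigvecs[:, :k]

# Hypothetical data: two strongly related variables, one unrelated, one gross outlier.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
X[0, 0], X[0, 1] = 50.0, -50.0

R = biweight_midcorrelation(X)
eigvals, eigvecs = retained_components(R, threshold=0.8)
```

The retained eigenvectors could then be used to project the standardized data before running K-means, and comparing `R` with `np.corrcoef(X, rowvar=False)` shows how a single outlier distorts the Pearson estimate while the biweight estimate stays close to the bulk of the data.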
format Article
author Shaharudin, S. M.
Ahmad, Norhaiza
Yusof, Fadhilah
title Improved cluster partition in principal component analysis guided clustering