Improved cluster partition in principal component analysis guided clustering
The principal component analysis (PCA) guided clustering approach is widely used on high-dimensional data to improve the efficiency of K-means cluster solutions. Typically, Pearson correlation is used in PCA to provide an eigen-analysis that yields the components accounting for most of the v...
Main Authors: | Shaharudin, S. M.; Ahmad, Norhaiza; Yusof, Fadhilah |
---|---|
Format: | Article |
Published: | 2013 |
Subjects: | Q Science |
Online Access: | http://eprints.utm.my/id/eprint/40261/ http://dx.doi.org/10.5120/13156-0839 |
id |
my.utm.40261 |
---|---|
record_format |
eprints |
spelling |
my.utm.40261 2019-03-17T04:21:45Z http://eprints.utm.my/id/eprint/40261/ Improved cluster partition in principal component analysis guided clustering. Shaharudin, S. M.; Ahmad, Norhaiza; Yusof, Fadhilah. Q Science. 2013. Article. PeerReviewed. Shaharudin, S. M. and Ahmad, Norhaiza and Yusof, Fadhilah (2013) Improved cluster partition in principal component analysis guided clustering. International Journal of Computer Applications, 75 (11). pp. 23-25. ISSN 0975-8887 http://dx.doi.org/10.5120/13156-0839 DOI:10.5120/13156-0839 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
Q Science |
description |
The principal component analysis (PCA) guided clustering approach is widely used on high-dimensional data to improve the efficiency of K-means cluster solutions. Typically, Pearson correlation is used in PCA to provide an eigen-analysis that yields the components accounting for most of the variation in the data. However, PCA based on Pearson correlation can be sensitive to non-Gaussian data that contain skewed observations such as outlying values. Thus, applying PCA based on Pearson correlation to such data could affect cluster partitions and generate extremely imbalanced clusters in a high-dimensional space. In this study, Tukey's biweight correlation, based on an M-estimate approach, is used in PCA as an alternative to Pearson correlation. This approach is more resistant to outlying values because it examines each observation and down-weights those that lie far from the center of the data. In particular, two major features are highlighted: (1) fewer components are retained, and imbalanced clusters at the recommended cumulative percentage of variation threshold are avoided; (2) the cluster quality with respect to external, internal and relative criteria, as shown by the Rand, Silhouette and Davies-Bouldin indices, outperforms that of the clusters from PCA based on Pearson correlation. |
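The down-weighting idea in the description can be illustrated with a small sketch. It uses the biweight midcorrelation, a common median/MAD formulation of Tukey's biweight correlation; this is an assumption, since the record does not give the article's exact M-estimate procedure. The function names, the tuning constant c = 9 and the toy data are illustrative and are not taken from the article.

```python
import math
import statistics


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def _biweight_deviations(values, c=9.0):
    """Median-centred deviations scaled by Tukey biweight weights.

    Observations further than c * MAD from the median get weight zero.
    The constant c = 9 is a conventional choice, assumed here.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    weighted = []
    for v in values:
        u = (v - med) / (c * mad)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted.append((v - med) * w)
    return weighted


def biweight_midcorrelation(x, y, c=9.0):
    """Robust correlation built from biweight-weighted deviations."""
    a = _biweight_deviations(x, c)
    b = _biweight_deviations(y, c)
    num = sum(p * q for p, q in zip(a, b))
    return num / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


# Toy data (hypothetical): a near-linear trend with one gross outlier at the end.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, -40.0]

if __name__ == "__main__":
    # The single outlier drags Pearson below zero, while the biweight
    # value stays clearly positive because the outlier receives weight zero.
    print("Pearson: ", pearson(x_demo, y_demo))
    print("Biweight:", biweight_midcorrelation(x_demo, y_demo))
```

In a PCA-guided workflow, a matrix of such pairwise correlations would take the place of the Pearson correlation matrix before the eigen-analysis; that step, and the subsequent K-means clustering, are not shown here.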
format |
Article |
author |
Shaharudin, S. M. Ahmad, Norhaiza Yusof, Fadhilah |
author_sort |
Shaharudin, S. M. |
title |
Improved cluster partition in principal component analysis guided clustering |
publishDate |
2013 |
url |
http://eprints.utm.my/id/eprint/40261/ http://dx.doi.org/10.5120/13156-0839 |