A comparative study and performance evaluation of similarity measures for data clustering
Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relat...
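The comparison the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not give their data, parameters, or centroid-update rule, so everything below is an assumption. The function names (`squared_euclidean`, `manhattan`, `pca_reduce`, `kmeans`), the synthetic two-blob data, the use of two principal components, and the mean-based centroid update under both measures are all hypothetical.

```python
import numpy as np

def squared_euclidean(X, C):
    # Pairwise squared L2 distances between rows of X (n, d) and rows of C (k, d).
    diff = X[:, None, :] - C[None, :, :]
    return np.sum(diff ** 2, axis=2)

def manhattan(X, C):
    # Pairwise L1 (city-block) distances between rows of X and rows of C.
    diff = X[:, None, :] - C[None, :, :]
    return np.sum(np.abs(diff), axis=2)

def pca_reduce(X, n_components):
    # Project mean-centered data onto its top principal components via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, distance, n_iter=50, seed=0):
    # Basic Lloyd-style K-means; `distance` decides the assignment step.
    # Assumption: centroids are updated as cluster means for both measures.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(distance(X, centroids), axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the centroids that are returned.
    labels = np.argmin(distance(X, centroids), axis=1)
    return labels, centroids

# Hypothetical simulated data: two Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 50)),
               rng.normal(4.0, 1.0, (100, 50))])
Z = pca_reduce(X, n_components=2)

for name, dist in [("squared Euclidean", squared_euclidean),
                   ("Manhattan", manhattan)]:
    labels, _ = kmeans(Z, k=2, distance=dist)
    print(name, "cluster sizes:", np.bincount(labels, minlength=2))
```

Passing the distance function as a parameter keeps the rest of the procedure identical, so any difference in the resulting clusters comes from the measure alone. Note that pairing Manhattan distance with a mean update is a simplification; a median update (k-medians) is the usual companion to the L1 measure.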
Main Authors: | Usman, Dauda; Mohamad, Ismail |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2014 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.utm.my/id/eprint/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf http://eprints.utm.my/id/eprint/60995/ |
id | my.utm.60995 |
---|---|
record_format | eprints |
spelling | my.utm.60995 2017-03-12T07:52:24Z http://eprints.utm.my/id/eprint/60995/ A comparative study and performance evaluation of similarity measures for data clustering Usman, Dauda; Mohamad, Ismail QA Mathematics Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in clustering high-dimensional datasets. Our experiments use the basic K-means algorithm with PCA applied, and we report results on simulated high-dimensional datasets for the two distance/similarity measures most commonly used in clustering. The results indicate that squared Euclidean distance performs much better than Manhattan distance. 2014 Conference or Workshop Item PeerReviewed application/pdf en http://eprints.utm.my/id/eprint/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf Usman, Dauda and Mohamad, Ismail (2014) A comparative study and performance evaluation of similarity measures for data clustering. In: 2nd International Science Postgraduate Conference 2014 (ISPC2014), 10-12 March, 2014, Johor Bahru, Malaysia. |
institution | Universiti Teknologi Malaysia |
building | UTM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknologi Malaysia |
content_source | UTM Institutional Repository |
url_provider | http://eprints.utm.my/ |
language | English |
topic | QA Mathematics |
description | Clustering is a useful technique that organizes a large quantity of unordered datasets into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in clustering high-dimensional datasets. Our experiments use the basic K-means algorithm with PCA applied, and we report results on simulated high-dimensional datasets for the two distance/similarity measures most commonly used in clustering. The results indicate that squared Euclidean distance performs much better than Manhattan distance. |
format | Conference or Workshop Item |
author | Usman, Dauda; Mohamad, Ismail |
title | A comparative study and performance evaluation of similarity measures for data clustering |
publishDate | 2014 |
url | http://eprints.utm.my/id/eprint/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf http://eprints.utm.my/id/eprint/60995/ |