A comparative study and performance evaluation of similarity measures for data clustering

Bibliographic Details
Main Authors: Usman, Dauda; Mohamad, Ismail
Format: Conference or Workshop Item
Language: English
Published: 2014
Online Access: http://eprints.utm.my/id/eprint/60995/1/IsmailMohamad2014_AComparativeStudyandPerformanceEvaluation.pdf
http://eprints.utm.my/id/eprint/60995/
Other Bibliographic Details
Summary: Clustering is a useful technique that organizes a large quantity of unordered data into a small number of meaningful and coherent clusters. A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, Manhattan distance, and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in clustering high-dimensional datasets. Our experiments use the basic K-means algorithm with PCA applied, and we report results on simulated high-dimensional datasets for the two distance/similarity measures most commonly used in clustering. The results indicate that squared Euclidean distance performs much better than Manhattan distance.
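The setup described in the summary (PCA followed by basic K-means under two distance measures) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `pca_reduce`, `pairwise_distance`, and `kmeans`, the `init` parameter, the simulated two-blob data, and the choice of a component-wise median update for the Manhattan case are all assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (SVD of the centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pairwise_distance(X, C, metric):
    """Distance from every row of X to every row of C, shape (n_points, n_centers)."""
    diff = X[:, None, :] - C[None, :, :]
    if metric == "sqeuclidean":
        return (diff ** 2).sum(axis=2)
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def kmeans(X, k, metric="sqeuclidean", n_iter=100, seed=0, init=None):
    """Basic K-means loop with a pluggable distance measure.

    Assumption: centers are updated with the mean for squared Euclidean
    distance and with the component-wise median for Manhattan distance.
    """
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        labels = pairwise_distance(X, centers, metric).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old center if a cluster becomes empty
            if metric == "sqeuclidean":
                new_centers[j] = members.mean(axis=0)
            else:
                new_centers[j] = np.median(members, axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the final centers.
    labels = pairwise_distance(X, centers, metric).argmin(axis=1)
    return labels, centers


if __name__ == "__main__":
    # Simulated high-dimensional data (hypothetical): two Gaussian blobs in 50 dimensions.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 1.0, size=(100, 50)),
                   rng.normal(10.0, 1.0, size=(100, 50))])
    X2 = pca_reduce(X, n_components=2)
    for metric in ("sqeuclidean", "manhattan"):
        labels, _ = kmeans(X2, k=2, metric=metric)
        print(metric, "cluster sizes:", np.bincount(labels, minlength=2))
```

On data this well separated, both measures would be expected to behave similarly; the difference reported in the summary concerns the authors' own simulated datasets, which are not reproduced here.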