The effect of different similarity distance measures in detecting outliers using single-linkage clustering algorithm for univariate circular biological data

Bibliographic Details
Main Authors: Nur Syahirah, Zulkipli; Siti Zanariah, Satari; Wan Nur Syahidah, Wan Yusoff
Format: Article
Language: English
Published: PJSOR
Online Access: http://umpir.ump.edu.my/id/eprint/35453/1/Zulkipli%20et%20al.%20PJSOR.pdf
http://umpir.ump.edu.my/id/eprint/35453/
http://dx.doi.org/10.18187/pjsor.v18i3.3982
Description
Summary: Clustering algorithms can be used to build an outlier detection procedure for univariate circular data. The circular distance between each pair of angular observations serves as the similarity measure used to group the observations appropriately. In this paper, we present a clustering-based procedure for detecting outliers in univariate circular biological data using various similarity distance measures. Three circular similarity distance measures (Satari distance, Di distance and Chang-Chien distance) were used to detect outliers with a single-linkage clustering algorithm. Satari distance and Di distance have similar formulas for univariate circular data. This study aims to develop the proposed clustering-based procedure and to demonstrate its effectiveness in detecting outliers under the various similarity distance measures. The procedure based on the SL-Satari/Di distance was compared with procedures based on other similarity measures, including SL-Chang, at various dendrogram cutting points. The results show that a clustering-based procedure using a single-linkage algorithm with various similarity distances is a practical and promising approach to detecting outliers in univariate circular data, particularly biological data. The SL-Satari/Di distance outperformed the SL-Chang distance under certain data conditions.
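
The procedure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not state the distance formulas, so the sketch assumes the shortest-arc distance π − |π − |θi − θj|| as a stand-in for the Satari/Di measure and 1 − cos(θi − θj) as a stand-in for the Chang-Chien measure. The function names, the fixed cut height, and the rule that clusters of size one are outliers are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def arc_distance(theta):
    """Pairwise shortest-arc distance pi - |pi - |a - b||, in [0, pi].

    Assumed stand-in for the Satari/Di measure (not taken from the record).
    """
    diff = np.abs(theta[:, None] - theta[None, :]) % (2 * np.pi)
    return np.pi - np.abs(np.pi - diff)


def cosine_distance(theta):
    """Pairwise 1 - cos(a - b), in [0, 2].

    Assumed stand-in for the Chang-Chien measure (not taken from the record).
    """
    return 1.0 - np.cos(theta[:, None] - theta[None, :])


def flag_outliers(theta, distance_fn, cut_height, max_outlier_cluster_size=1):
    """Single-linkage clustering on a circular distance matrix.

    The dendrogram is cut at `cut_height` (a hypothetical, user-chosen
    value); observations in clusters no larger than
    `max_outlier_cluster_size` are flagged as outliers.
    """
    theta = np.asarray(theta, dtype=float)
    d = distance_fn(theta)
    np.fill_diagonal(d, 0.0)
    # linkage expects a condensed (upper-triangular) distance vector.
    z = linkage(squareform(d, checks=False), method="single")
    labels = fcluster(z, t=cut_height, criterion="distance")
    sizes = np.bincount(labels)  # sizes[k] = number of points in cluster k
    return sizes[labels] <= max_outlier_cluster_size


# Made-up angles in radians: six close together and one far away.
theta = np.array([0.10, 0.20, 0.25, 0.30, 0.40, 0.50, 3.00])
print(flag_outliers(theta, arc_distance, cut_height=0.5))
print(flag_outliers(theta, cosine_distance, cut_height=0.5))
```

With these made-up angles both distances flag only the last observation. Because the two measures live on different scales (radians versus a 0 to 2 range), the same cut height does not mean the same thing for each, which is one reason to compare them at several dendrogram cutting points.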