An adaptive density-based method for clustering evolving data streams / Amineh Amini
Main Author: | Amineh Amini |
---|---|
Format: | Thesis |
Published: | 2014 |
Subjects: | |
Online Access: | http://studentsrepo.um.edu.my/4684/1/Amineh_Amini_PhD_Thesis_20140914.pdf http://studentsrepo.um.edu.my/4684/2/CD_Cover_Amineh.pdf http://studentsrepo.um.edu.my/4684/ |
Summary: | Density-based methods have emerged as a valuable class of techniques for clustering data streams. They can discover clusters of arbitrary shape, handle noise, and cluster without prior knowledge of the number of clusters. Data streams are characterized by infinite volume and dynamic change, allow only one or a small number of scans, and demand fast response times. Because of these characteristics, traditional density-based clustering is not applicable.
Recently, a number of density-based algorithms have been developed for clustering data streams. However, existing density-based data stream clustering algorithms are not without problems. The first problem is the high computation time required for the clustering process. The second is the dramatic decrease in clustering quality when the data vary in density. This study takes both problems into account and proposes a density-based algorithm for clustering evolving data streams. The proposed method, called MuDi-Stream (Multi Density clustering algorithm for evolving data Stream), is an online-offline algorithm with four main components. Three of the components are applied in the online phase, while the fourth is used in the offline phase. The main tasks of these components are keeping synopsis information, pruning this information, and forming final clusters.
In the first component, a hybrid method combining density grid and micro clustering techniques maintains summary information in the form of core mini clusters while mapping outliers to the grid. In the second component, the data points inside a grid form a new core mini cluster when the grid reaches a density threshold. In the last component of the online phase, grids and core mini clusters are pruned to keep memory usage limited. In the offline phase, a new multi-density-based clustering method forms final clusters using both the summarized synopsis information and statistical information.
The quality of the algorithm is comprehensively evaluated on various synthetic and real datasets with different characteristics, using a variety of quality metrics. The complexity analysis shows that it uses limited time and memory, which makes MuDi-Stream applicable to data streams. Furthermore, the scalability results show that the proposed algorithm is scalable in terms of both dimensionality and number of clusters. Finally, the experimental results show that the proposed method improves clustering quality in multi-density environments while minimizing the computation time. |
---|
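
The online phase described in the summary (absorbing points into core mini clusters, mapping outliers to a grid, promoting dense grid cells, and pruning) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class `OnlineSketch`, all parameter names and defaults, the exponential fading function, and the use of a raw point count as grid density are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MiniCluster:
    center: list        # weighted mean of absorbed points
    weight: float       # faded number of absorbed points
    last_update: int    # time step of the last absorption


@dataclass
class GridCell:
    points: list = field(default_factory=list)
    last_update: int = 0


class OnlineSketch:
    """Illustrative online phase; every name and default here is hypothetical."""

    def __init__(self, radius=1.0, cell_size=1.0, density_threshold=3,
                 decay=0.01, min_weight=0.5, prune_every=100):
        self.radius = radius
        self.cell_size = cell_size
        self.density_threshold = density_threshold
        self.decay = decay
        self.min_weight = min_weight
        self.prune_every = prune_every
        self.mini_clusters = []
        self.grid = {}
        self.t = 0

    def _faded(self, weight, since):
        # Assumed exponential fading: older information counts for less.
        return weight * 2 ** (-self.decay * (self.t - since))

    def insert(self, point):
        self.t += 1
        nearest = min(self.mini_clusters,
                      key=lambda mc: math.dist(mc.center, point),
                      default=None)
        if nearest is not None and math.dist(nearest.center, point) <= self.radius:
            # Component 1: absorb the point into the nearest core mini cluster.
            w = self._faded(nearest.weight, nearest.last_update)
            nearest.center = [(c * w + x) / (w + 1)
                              for c, x in zip(nearest.center, point)]
            nearest.weight = w + 1
            nearest.last_update = self.t
        else:
            # Component 1 (continued): map the outlier to its grid cell.
            key = tuple(math.floor(x / self.cell_size) for x in point)
            cell = self.grid.setdefault(key, GridCell())
            cell.points.append(list(point))
            cell.last_update = self.t
            # Component 2: a cell that reaches the density threshold
            # becomes a new core mini cluster.
            if len(cell.points) >= self.density_threshold:
                n = len(cell.points)
                center = [sum(col) / n for col in zip(*cell.points)]
                self.mini_clusters.append(MiniCluster(center, float(n), self.t))
                del self.grid[key]
        if self.t % self.prune_every == 0:
            self.prune()

    def prune(self):
        # Component 3: drop mini clusters and grid cells whose faded
        # weight has fallen below min_weight, to bound memory use.
        self.mini_clusters = [
            mc for mc in self.mini_clusters
            if self._faded(mc.weight, mc.last_update) >= self.min_weight
        ]
        self.grid = {
            k: c for k, c in self.grid.items()
            if self._faded(len(c.points), c.last_update) >= self.min_weight
        }
```

Feeding points one at a time through `insert` keeps only the summaries in memory. The offline phase, which the summary says forms final clusters from these summaries with a multi-density method, is not sketched here because the record gives no detail on how it works.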