An adaptive density-based method for clustering evolving data streams / Amineh Amini

Bibliographic Details
Main Author: Amini, Amineh
Format: Thesis
Published: 2014
Online Access: http://studentsrepo.um.edu.my/4684/1/Amineh_Amini_PhD_Thesis_20140914.pdf
http://studentsrepo.um.edu.my/4684/2/CD_Cover_Amineh.pdf
http://studentsrepo.um.edu.my/4684/
Institution: Universiti Malaya (UM Student Repository)
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Citation: Amini, Amineh (2014) An adaptive density-based method for clustering evolving data streams / Amineh Amini. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/4684/

Description

Density-based methods have emerged as a worthwhile class of algorithms for clustering data streams. They can discover clusters of arbitrary shape, handle noise, and cluster without prior knowledge of the number of clusters. Data streams, however, are infinite in volume and change dynamically; they allow only one or a small number of scans and demand fast response times. Because of these characteristics, traditional density-based clustering is not applicable. Recently, a number of density-based algorithms have been developed for clustering data streams, but they suffer from two problems. The first is the high computation time required by the clustering process. The second is a dramatic decrease in clustering quality when the data span a range of densities.

This study addresses both problems by proposing a density-based algorithm for clustering evolving data streams. The proposed method, called MuDi-Stream (Multi Density clustering algorithm for evolving data Stream), is an online-offline algorithm with four main components: three are applied in the online phase and one in the offline phase. Their main tasks are maintaining synopsis information, pruning that information, and forming the final clusters. In the first component, a hybrid method combining density-grid and micro-clustering techniques maintains summary information in the form of core mini clusters, while outliers are mapped to grid cells. In the second component, the data points inside a grid cell form a new core mini cluster once the cell reaches a density threshold. In the third component, the last one in the online phase, grid cells and core mini clusters are pruned to keep memory usage limited.
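
The three online components described above can be pictured with a minimal Python sketch. It is not the thesis's implementation: the CF-style summaries (faded weight, linear sum, squared sum), the exponential fading factor `2^(-lam * dt)`, the fixed absorption radius, and every name (`Summary`, `OnlineSketch`, `cell_width`, `radius`, `density_threshold`, `lam`, `min_weight`) are hypothetical assumptions chosen for illustration. A point joins its nearest core mini cluster if it is close enough; otherwise it is treated as an outlier and mapped to a grid cell. A cell whose faded weight reaches the density threshold is promoted to a new core mini cluster, and `prune` drops any summary whose faded weight has fallen below a minimum.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """CF-style summary shared by core mini clusters and grid cells (assumed design).

    Holds a time-faded weight plus per-dimension linear and squared sums.
    """
    weight: float
    ls: list
    ss: list
    last_update: float

    def decay_to(self, t, lam):
        # Assumed fading model: scale everything by 2^(-lam * elapsed time).
        f = 2.0 ** (-lam * (t - self.last_update))
        self.weight *= f
        self.ls = [v * f for v in self.ls]
        self.ss = [v * f for v in self.ss]
        self.last_update = t

    def add(self, x, t, lam):
        # Fade the old summary to time t, then add the new point with weight 1.
        self.decay_to(t, lam)
        self.weight += 1.0
        self.ls = [a + v for a, v in zip(self.ls, x)]
        self.ss = [a + v * v for a, v in zip(self.ss, x)]

    def center(self):
        # Fading scales ls and weight equally, so this ratio is unaffected by it.
        return [v / self.weight for v in self.ls]

    def radius(self):
        # Root-mean-square distance of the weighted points from the center.
        c = self.center()
        var = [s / self.weight - m * m for s, m in zip(self.ss, c)]
        return math.sqrt(max(0.0, sum(var)))


class OnlineSketch:
    """Hypothetical online phase: absorb, map outliers to the grid, promote, prune."""

    def __init__(self, cell_width, radius, density_threshold, lam, min_weight):
        self.cell_width = cell_width          # grid cell side length (assumed)
        self.radius = radius                  # max distance for absorption (assumed)
        self.density_threshold = density_threshold
        self.lam = lam                        # fading rate (assumed)
        self.min_weight = min_weight          # pruning threshold (assumed)
        self.clusters = []                    # core mini clusters
        self.grid = {}                        # cell index tuple -> Summary

    def insert(self, x, t):
        # Component 1: absorb x into the nearest core mini cluster if close enough.
        if self.clusters:
            nearest = min(self.clusters, key=lambda c: math.dist(x, c.center()))
            if math.dist(x, nearest.center()) <= self.radius:
                nearest.add(x, t, self.lam)
                return
        # Otherwise x is an outlier: map it to its grid cell.
        key = tuple(math.floor(v / self.cell_width) for v in x)
        cell = self.grid.get(key)
        if cell is None:
            cell = self.grid[key] = Summary(0.0, [0.0] * len(x), [0.0] * len(x), t)
        cell.add(x, t, self.lam)
        # Component 2: a cell that reaches the density threshold becomes a core mini cluster.
        if cell.weight >= self.density_threshold:
            self.clusters.append(self.grid.pop(key))

    def prune(self, t):
        # Component 3: fade every summary to time t and drop the light ones.
        for s in self.clusters + list(self.grid.values()):
            s.decay_to(t, self.lam)
        self.clusters = [c for c in self.clusters if c.weight >= self.min_weight]
        self.grid = {k: g for k, g in self.grid.items() if g.weight >= self.min_weight}
```

With `cell_width=1.0, radius=0.5, density_threshold=2.5, lam=0.01, min_weight=0.5`, feeding (0.1, 0.1), (0.2, 0.3), (0.3, 0.2) at times 0, 1, 2 promotes their shared cell to a core mini cluster, a fourth nearby point is absorbed into it, and a distant point stays in the grid until `prune` removes it once its weight has faded.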

In the offline phase, a new multi-density clustering method forms the final clusters using both the summarized synopsis information and statistical information. The quality of the algorithm is evaluated comprehensively on various synthetic and real datasets with different characteristics, using a variety of quality metrics. The complexity analysis shows that MuDi-Stream uses limited time and memory, which makes it applicable to data streams. The scalability results show that the algorithm scales in terms of both dimensionality and number of clusters. Finally, the experimental results show that the proposed method improves clustering quality in multi-density environments while minimizing computation time.
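
The offline component can likewise be sketched in Python. The merging rule below is a hypothetical stand-in, not the rule used by MuDi-Stream: two core mini clusters are linked when the distance between their centers is at most `alpha` times the sum of their radii, and linked mini clusters are merged with union-find. Because each mini cluster contributes its own radius, the linking distance adapts to local density instead of relying on one global threshold, which is the kind of behavior a multi-density method needs. The function name `offline_clusters` and the parameter `alpha` are illustrative assumptions.

```python
import math


def offline_clusters(centers, radii, alpha=1.0):
    """Group core mini clusters into final clusters (hypothetical multi-density rule).

    Mini clusters i and j are linked when
        dist(center_i, center_j) <= alpha * (radius_i + radius_j),
    so dense regions (small radii) need close centers while sparse regions
    (large radii) tolerate larger gaps. Returns one label per mini cluster.
    """
    n = len(centers)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) <= alpha * (radii[i] + radii[j]):
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive integers 0, 1, 2, ...
    labels, roots = [], {}
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

For three tightly spaced mini clusters of radius 0.15 at x = 0, 0.2 and 0.4, and three widely spaced ones of radius 1.2 at x = 10, 12 and 14, `offline_clusters` returns `[0, 0, 0, 1, 1, 1]`: each group is connected at its own density, and the gap between them keeps the groups apart.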