Towards lowering computational power in IoT systems: Clustering algorithm for high-dimensional data stream using entropy window reduction
Format: Article
Published: Elsevier B.V., 2024
Summary: In a world of connectivity driven by advances in the Internet of Things (IoT), a virtually unlimited number of data streams have emerged. Data stream clustering is therefore crucial for data mining and for extracting hidden knowledge. Various data stream clustering methods have recently been introduced, yet most of them suffer from the curse of dimensionality. Recently, a fully online buffer-based clustering algorithm for handling evolving data streams (BOCEDS) was developed. Like other existing density-based clustering methods, BOCEDS cannot handle high-dimensional data, and it demands high computational power and memory. This paper introduces the Entropy Window Reduction (EWR) algorithm, an improved version of BOCEDS. EWR is a fully online clustering technique that handles high-dimensional data streams by ranking and sorting features; the ranking is obtained by calculating the entropy of specific features with respect to the time window. The experimental results are compared with those of the BOCEDS, CEDAS, and MuDi-Stream algorithms and indicate that EWR outperformed these baselines. Evaluation on the KDDCup'99 dataset covers both clustering quality and complexity: the average F-measure, Jaccard index, Fowlkes–Mallows index, purity, and Rand index are 88%, 66%, 81%, 100%, and 66%, respectively, and memory usage and computational power are also measured. The results further show lower memory usage and computing power than the baseline algorithms. © 2023 The Authors
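The summary states only that EWR ranks and sorts features by the entropy of specific features with respect to a time window. The sketch below illustrates one way such windowed entropy ranking could work. The sliding window of recent points, the equal-width binning, the `n_bins` parameter, the highest-entropy-first ordering, and the class `EntropyWindowRanker` itself are all hypothetical choices made for illustration; this is not the authors' EWR implementation.

```python
import math
from collections import Counter, deque
from typing import Deque, List, Sequence


class EntropyWindowRanker:
    """Hypothetical sketch: rank features by Shannon entropy over a sliding window.

    Assumptions (not taken from the EWR paper): each feature is discretized into
    `n_bins` equal-width bins spanning the window's own min/max, and features
    with higher entropy are ranked first.
    """

    def __init__(self, window_size: int, n_bins: int = 10) -> None:
        # A bounded deque keeps only the most recent `window_size` points.
        self.window: Deque[Sequence[float]] = deque(maxlen=window_size)
        self.n_bins = n_bins

    def update(self, point: Sequence[float]) -> None:
        # Once the window is full, appending evicts the oldest point.
        self.window.append(point)

    def _feature_entropy(self, j: int) -> float:
        values = [p[j] for p in self.window]
        lo, hi = min(values), max(values)
        if hi == lo:
            return 0.0  # a constant feature carries no information
        width = (hi - lo) / self.n_bins
        # min(...) keeps the maximum value inside the last bin.
        bins = Counter(min(int((v - lo) / width), self.n_bins - 1) for v in values)
        n = len(values)
        return -sum((c / n) * math.log2(c / n) for c in bins.values())

    def rank_features(self) -> List[int]:
        """Return feature indices sorted by window entropy, highest first."""
        if not self.window:
            return []
        d = len(self.window[0])
        entropies = [self._feature_entropy(j) for j in range(d)]
        return sorted(range(d), key=lambda j: entropies[j], reverse=True)

    def project_top_k(self, point: Sequence[float], k: int) -> List[float]:
        """Keep only the k top-ranked features of a point (hypothetical reduction step)."""
        return [point[j] for j in self.rank_features()[:k]]


ranker = EntropyWindowRanker(window_size=4, n_bins=2)
for p in [(0.0, 5.0, 1.0), (1.0, 5.0, 1.0), (0.0, 5.0, 2.0), (1.0, 5.0, 1.0)]:
    ranker.update(p)
print(ranker.rank_features())  # → [0, 2, 1]
```

In the example, feature 1 is constant (entropy 0), feature 0 splits evenly across two bins (entropy 1 bit), and feature 2 splits 3:1 (about 0.81 bits). Recomputing every feature's entropy on each call costs roughly O(w·d) for a window of w points and d features; an incremental variant with fixed bin edges could instead maintain per-bin counts as points enter and leave the window.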
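The summary reports averaged F-measure, Jaccard index, Fowlkes–Mallows index, purity, and Rand index without defining them. A minimal sketch of the standard pair-counting definitions, plus purity, may help readers interpret such scores. The helper `clustering_scores` is hypothetical, and because the paper's exact F-measure variant is not stated here, the pairwise F-measure below is an assumption. scikit-learn's `sklearn.metrics.rand_score` and `fowlkes_mallows_score` compute the same pair-counting quantities and can be used to cross-check.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def clustering_scores(
    labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]
) -> Dict[str, float]:
    """Hypothetical helper: external clustering indices from two label lists."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two equal-length label sequences with >= 2 items")

    # Contingency table: how many points share each (cluster, class) pair.
    cells = Counter(zip(labels_pred, labels_true))
    cluster_sizes = Counter(labels_pred)
    class_sizes = Counter(labels_true)

    # Pair counting: TP = pairs in the same cluster AND the same class.
    tp = sum(math.comb(c, 2) for c in cells.values())
    same_cluster = sum(math.comb(c, 2) for c in cluster_sizes.values())
    same_class = sum(math.comb(c, 2) for c in class_sizes.values())
    fp = same_cluster - tp  # same cluster, different class
    fn = same_class - tp  # different cluster, same class
    tn = math.comb(n, 2) - tp - fp - fn  # different cluster, different class

    precision = tp / same_cluster if same_cluster else 0.0
    recall = tp / same_class if same_class else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )

    # Purity: each cluster is credited with its most frequent true class.
    best: Dict[Hashable, int] = {}
    for (cluster, _cls), count in cells.items():
        best[cluster] = max(best.get(cluster, 0), count)

    return {
        "f_measure": f_measure,  # pairwise F-measure (assumed variant)
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "fowlkes_mallows": tp / math.sqrt(same_cluster * same_class)
        if same_cluster and same_class
        else 0.0,
        "purity": sum(best.values()) / n,
        "rand": (tp + tn) / math.comb(n, 2),
    }


scores = clustering_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print({k: round(v, 3) for k, v in scores.items()})
# → {'f_measure': 0.444, 'jaccard': 0.286, 'fowlkes_mallows': 0.471, 'purity': 0.833, 'rand': 0.667}
```

Counting pairs through the contingency table avoids enumerating all O(n²) point pairs, which matters when scoring long streams. Note that purity alone rewards over-segmentation (one point per cluster gives a purity of 1.0), which is why it is usually reported alongside the pair-counting indices.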