A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES

Bibliographic Details
Main Author: Ling, Chien
Format: Final Year Project Report
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS) 2020
Online Access: http://ir.unimas.my/id/eprint/32941/1/Ling%20Chien%20-%2024%20pgs.pdf
http://ir.unimas.my/id/eprint/32941/4/Ling%20Chien%20ft.pdf
http://ir.unimas.my/id/eprint/32941/
Description
Summary: Clustering is used to identify the intrinsic grouping of a set of unlabelled data. It can be applied in exploratory data mining and statistical data analysis, and it plays an important role in the current digital environment. As the quality and complexity of data on the internet continue to increase, clustering methods have become indispensable for finding patterns in the data. Many types of clustering techniques have been developed, including partitioning methods, hierarchical clustering, density-based clustering, model-based clustering, and fuzzy clustering. This study focuses on only three: k-means clustering, agglomerative hierarchical clustering (with Ward’s linkage, complete linkage, and average linkage), and the Self-Organizing Map (SOM). The clustering algorithms are written in Python by modifying code obtained from the Internet. In this project, experiments on the visualisation and performance analysis of the selected clustering methods are conducted. In addition, a case study is conducted by applying the clustering technique to online product reviews. The visualisation experiment showed that each clustering technique has its own form of visualisation for cluster analysis. The predictive accuracy results indicated that k-means clustering and SOM are the most suitable techniques for cluster analysis. The case study concluded that the accuracy of clustering online product reviews is related to the structure and number of the sentences. Since this correlation is now known, extractive text summarisation with the clustering technique can be improved and further developed for application in customer review systems.