A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES

Clustering is used to identify the intrinsic grouping of a set of unlabelled data. It can be applied in exploratory data mining and statistical data analysis. Clustering plays an important role in the current digital environment. As the quality and complexity of data on the internet...

Bibliographic Details
Main Author: Ling, Chien
Format: Final Year Project Report
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS) 2020
Subjects: H Social Sciences (General)
Online Access: http://ir.unimas.my/id/eprint/32941/1/Ling%20Chien%20-%2024%20pgs.pdf
http://ir.unimas.my/id/eprint/32941/4/Ling%20Chien%20ft.pdf
http://ir.unimas.my/id/eprint/32941/
id my.unimas.ir.32941
record_format eprints
spelling my.unimas.ir.32941 2024-01-08T08:05:08Z http://ir.unimas.my/id/eprint/32941/ Ling, Chien (2020) A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES. [Final Year Project Report] (Unpublished) NonPeerReviewed
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic H Social Sciences (General)
description Clustering is used to identify the intrinsic grouping of a set of unlabelled data. It can be applied in exploratory data mining and statistical data analysis. Clustering plays an important role in the current digital environment. As the quality and complexity of data on the internet increase in today’s rapidly evolving environment, clustering methods have become indispensable for finding patterns in data. Many types of clustering techniques have been developed, including partitioning methods, hierarchical clustering, density-based clustering, model-based clustering, and fuzzy clustering. This study focuses on three of them: k-means clustering, agglomerative hierarchical clustering (with Ward’s, complete, and average linkage), and the Self-Organizing Map (SOM). The clustering algorithms are written in Python by modifying code obtained from the Internet. In this project, experiments on the visualisation and performance analysis of the selected clustering methods are conducted. In addition, a case study applies the clustering technique to online product reviews. The visualisation experiment showed that each clustering technique has its own form of visualisation for cluster analysis. The predictive accuracy results indicated that k-means clustering and the SOM are the most suitable techniques for cluster analysis. The case study concluded that the accuracy of clustering online product reviews is related to the structure and number of the sentences. Extractive text summarisation with the clustering technique can be improved and further developed for use in a customer review system, now that the correlation between them is known.
format Final Year Project Report
author Ling, Chien
title A COMPARISON STUDY OF DATA CLUSTERING AND VISUALISATION TECHNIQUES WITH VARIOUS DATA TYPES
publisher Universiti Malaysia Sarawak (UNIMAS)
publishDate 2020
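
The description in this record names k-means as one of the techniques studied and states that the algorithms were written in Python. The report's own code is not part of this record, so the block below is only a minimal sketch of standard k-means (Lloyd's algorithm) using the Python standard library, not the author's implementation. The function name `kmeans`, the toy data, and the naive choice of the first k points as initial centroids are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of numeric tuples; returns (centroids, labels)."""
    # Assumption: naive initialisation from the first k points.
    # Practical implementations usually seed randomly or with k-means++.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other, even though both initial centroids come from the first group. With labelled data, comparing such cluster labels against the known classes is one way a predictive accuracy figure like the one mentioned in the description could be obtained.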