A web-based implementation of k-means algorithms

The K-means algorithm has been around for more than half a century. Although it is a rather simple and dated algorithm, it remains widely used and taught to this day. The K-means algorithm requires two inputs before it can be applied to a data set: the value K and a proximity measure. Picking the right inputs is...

Bibliographic Details
Main Author: Lee, Quan
Format: Final Year Project / Dissertation / Thesis
Published: 2022
Subjects: QA76 Computer software
Online Access: http://eprints.utar.edu.my/5010/1/1801846_LEE_QUAN.pdf
http://eprints.utar.edu.my/5010/
id my-utar-eprints.5010
record_format eprints
spelling my-utar-eprints.5010 2022-12-26T14:19:36Z A web-based implementation of k-means algorithms Lee, Quan QA76 Computer software 2022 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/5010/1/1801846_LEE_QUAN.pdf Lee, Quan (2022) A web-based implementation of k-means algorithms. Final Year Project, UTAR. http://eprints.utar.edu.my/5010/
institution Universiti Tunku Abdul Rahman
building UTAR Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tunku Abdul Rahman
content_source UTAR Institutional Repository
url_provider http://eprints.utar.edu.my
topic QA76 Computer software
description The K-means algorithm has been around for more than half a century. Although it is a rather simple and dated algorithm, it remains widely used and taught to this day. The K-means algorithm requires two inputs before it can be applied to a data set: the value K and a proximity measure. Picking the right inputs, especially the proximity measure, is of utmost importance for achieving good results with the algorithm. Many different proximity measures exist, each best suited to particular types of applications and data sets. Despite this, most modern data mining tools offer users only a handful of proximity measures, the most common being Euclidean distance and Manhattan distance. This scarcity of proximity measures in data mining tools limits the performance of the algorithm. This is where k-luster comes in. k-luster, the web application developed as a result of this project, implements the K-means and K-means++ algorithms along with ten proximity measures, seven of which are distance measures, while the remaining three are similarity measures. The project was planned using the Kanban development methodology and was built using HTML, CSS, JavaScript, Django, NumPy and pandas. The completed web application is hosted on Heroku. k-luster allows users to upload their own data set, or to choose from one of three samples if they just want to try out the application. Experimenting with different settings and comparing the results makes it clear how large an impact the choice of proximity measure can have. In conclusion, this project has accomplished what it first set out to achieve. However, there is still much room for improvement. Firstly, k-luster could incorporate additional clustering algorithms, or even classification algorithms, in the future. Furthermore, the web application could save users’ past work so that they can resume it at a later time without skipping a beat.
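As a rough illustration of the two inputs the abstract describes (the value K and a proximity measure), the following minimal sketch runs K-means with K-means++ seeding on top of NumPy. It is not taken from k-luster: the function names (`euclidean`, `manhattan`, `kmeans_pp_init`, `kmeans`) and their parameters are hypothetical, and only two of the ten measures mentioned are shown. As a simplifying assumption, centroids are always updated as the arithmetic mean, whichever measure is used for assignment.

```python
import numpy as np


def euclidean(points, center):
    """Euclidean distance from every row of `points` to `center`."""
    return np.sqrt(((points - center) ** 2).sum(axis=1))


def manhattan(points, center):
    """Manhattan (city-block) distance from every row of `points` to `center`."""
    return np.abs(points - center).sum(axis=1)


def kmeans_pp_init(data, k, distance, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to the squared distance to the nearest existing center."""
    centers = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        nearest = np.min([distance(data, c) for c in centers], axis=0)
        weights = nearest ** 2
        total = weights.sum()
        if total == 0:
            # All points coincide with a chosen center; fall back to uniform.
            probs = np.full(len(data), 1.0 / len(data))
        else:
            probs = weights / total
        centers.append(data[rng.choice(len(data), p=probs)])
    return np.array(centers, dtype=float)


def assign(data, centers, distance):
    """Label each point with the index of its closest center."""
    dists = np.stack([distance(data, c) for c in centers], axis=1)
    return dists.argmin(axis=1)


def kmeans(data, k, distance=euclidean, max_iter=100, seed=0):
    """Cluster `data` into `k` groups using the given proximity measure."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(data, k, distance, rng)
    for _ in range(max_iter):
        labels = assign(data, centers, distance)
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(data, centers, distance), centers
```

Swapping `distance=manhattan` for the default is the only change needed to try a different measure, which is the kind of comparison the abstract describes. A similarity measure would need to be converted to a dissimilarity (or the `argmin` replaced with `argmax`) before it could be plugged in this way.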
format Final Year Project / Dissertation / Thesis
author Lee, Quan
title A web-based implementation of k-means algorithms
publishDate 2022
url http://eprints.utar.edu.my/5010/1/1801846_LEE_QUAN.pdf
http://eprints.utar.edu.my/5010/