New approaches to normalization techniques to enhance k-means clustering algorithm

Clustering is one of the fundamental data mining tools, helping researchers understand the natural grouping of attributes in datasets. The main aim of clustering is to ascertain similarities and arrangements within a large dataset by partitioning data into clusters. It is imp...

Bibliographic Details
Main Authors: Dalatu, Paul Inuwa, Midi, Habshah
Format: Article
Language: English
Published: Institute for Mathematical Research, Universiti Putra Malaysia 2020
Online Access: http://psasir.upm.edu.my/id/eprint/38339/1/3.%20Paul%20n%20Habshah.pdf
http://psasir.upm.edu.my/id/eprint/38339/
http://einspem.upm.edu.my/journal/fullpaper/vol14no1jan/3.%20Paul%20n%20Habshah.pdf
id my.upm.eprints.38339
record_format eprints
spelling Dalatu, Paul Inuwa and Midi, Habshah (2020) New approaches to normalization techniques to enhance k-means clustering algorithm. Malaysian Journal of Mathematical Sciences, 14 (1). pp. 41-62. ISSN 1823-8343; ESSN: 2289-750X
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Clustering is one of the fundamental data mining tools, helping researchers understand the natural grouping of attributes in datasets. The main aim of clustering is to ascertain similarities and arrangements within a large dataset by partitioning data into clusters. It is important to note that distance measures such as Euclidean distance should not be used without normalizing the dataset. A limitation of both the Min-Max (MM) and Decimal Scaling (DS) normalization methods is that the minimum and maximum values may be out-of-sample when the dataset is unknown. We therefore propose two new normalization approaches that prevent attributes with initially large magnitudes from outweighing attributes with initially smaller magnitudes. The two approaches are called New Approach to Min-Max (NAMM) and New Approach to Decimal Scaling (NADS). To evaluate their performance, a simulation study and real data applications are considered. The two proposed approaches showed good performance compared to the existing methods, achieving nearly maximum scores on the average external validity measures, recording lower computing time, and assigning almost all object points to their cluster centers. From the results obtained, it can be noted that the NAMM and NADS approaches yielded better performance as data preprocessing methods, which down-weight the magnitudes of large values.
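The sketch below illustrates why the description warns against using Euclidean distance without normalization. It implements only the textbook forms of the two existing methods, Min-Max (MM) and Decimal Scaling (DS); this record does not give the NAMM and NADS formulas, so they are not reproduced here. The function names and the small age/income dataset are hypothetical and chosen purely for illustration.

```python
import numpy as np

def min_max_normalize(X):
    """Textbook Min-Max: rescale each column to [0, 1] using the sample min and max."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - col_min) / col_range

def decimal_scaling_normalize(X):
    """Textbook Decimal Scaling: divide each column by 10**j, with j the
    smallest non-negative integer such that max(|x|) / 10**j < 1."""
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max(axis=0)
    powers = np.zeros(X.shape[1], dtype=int)
    for k, m in enumerate(max_abs):
        while m / 10.0 ** powers[k] >= 1.0:
            powers[k] += 1
    return X / 10.0 ** powers

def dist(a, b):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Hypothetical data: columns are (age in years, income in currency units).
X = np.array([[25, 20000],
              [30, 21000],
              [60, 20500]])

mm = min_max_normalize(X)
ds = decimal_scaling_normalize(X)

# On raw data the large-magnitude income column dominates the distance,
# so row 0 looks closer to row 2 (age 60) than to row 1 (age 30).
print("raw:     d(0,1) =", dist(X[0], X[1]), " d(0,2) =", dist(X[0], X[2]))
# After normalization both columns contribute on a comparable scale.
print("min-max: d(0,1) =", dist(mm[0], mm[1]), " d(0,2) =", dist(mm[0], mm[2]))
print("decimal: d(0,1) =", dist(ds[0], ds[1]), " d(0,2) =", dist(ds[0], ds[2]))
```

Both functions take their scaling constants from the sample itself, which is the limitation the description points to: a value arriving later that falls outside the observed minimum and maximum would land outside the normalized range.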
format Article
author Dalatu, Paul Inuwa
Midi, Habshah
title New approaches to normalization techniques to enhance k-means clustering algorithm
publisher Institute for Mathematical Research, Universiti Putra Malaysia
publishDate 2020
url http://psasir.upm.edu.my/id/eprint/38339/1/3.%20Paul%20n%20Habshah.pdf
http://psasir.upm.edu.my/id/eprint/38339/
http://einspem.upm.edu.my/journal/fullpaper/vol14no1jan/3.%20Paul%20n%20Habshah.pdf