Privacy preserving data mining using anonymization and K-means clustering on labor dataset
Privacy Preserving Data Mining (PPDM) has recently become an important research area. Several issues and problems related to PPDM have been identified. Information loss occurs when the original data are modified to preserve their privacy. Effects of PPDM also cause the level of data...
Main Author: | Ahmad Zahari, Samahah Solehah |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/96295/1/SamahahSolehahMSC2019.pdf.pdf http://eprints.utm.my/id/eprint/96295/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143456 |
id | my.utm.96295 |
---|---|
record_format | eprints |
spelling | my.utm.96295 2022-07-12T08:16:13Z http://eprints.utm.my/id/eprint/96295/ Privacy preserving data mining using anonymization and K-means clustering on labor dataset Ahmad Zahari, Samahah Solehah QA75 Electronic computers. Computer science Privacy Preserving Data Mining (PPDM) has recently become an important research area. Several issues and problems related to PPDM have been identified. Information loss occurs when the original data are modified to preserve their privacy. PPDM also lowers the level of data quality. The aim of this research is to minimize information loss and increase the accuracy of the mining results while maintaining the privacy level of the data. A randomization approach based on anonymization and clustering algorithms is proposed in order to minimize information loss and improve the clustering quality of PPDM results. The anonymization method is used to generalize and suppress the data and limit the disclosure risk. In addition, the accuracy of the data mining results could be increased by applying clustering with the K-Means and EM algorithms. The labor dataset is used in this research, and all instances are numerical values. The WEKA tool is used to run the clustering algorithms on the labor dataset. The outcome of this research is that the privacy level of the dataset was increased while information loss was minimized. The experimental results also show that the proposed method provides better results in terms of the privacy level of data mining. 2019 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/96295/1/SamahahSolehahMSC2019.pdf.pdf Ahmad Zahari, Samahah Solehah (2019) Privacy preserving data mining using anonymization and K-means clustering on labor dataset. Masters thesis, Universiti Teknologi Malaysia. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143456 |
institution | Universiti Teknologi Malaysia |
building | UTM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknologi Malaysia |
content_source | UTM Institutional Repository |
url_provider | http://eprints.utm.my/ |
language | English |
topic | QA75 Electronic computers. Computer science |
spellingShingle | QA75 Electronic computers. Computer science Ahmad Zahari, Samahah Solehah Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
description | Privacy Preserving Data Mining (PPDM) has recently become an important research area. Several issues and problems related to PPDM have been identified. Information loss occurs when the original data are modified to preserve their privacy. PPDM also lowers the level of data quality. The aim of this research is to minimize information loss and increase the accuracy of the mining results while maintaining the privacy level of the data. A randomization approach based on anonymization and clustering algorithms is proposed in order to minimize information loss and improve the clustering quality of PPDM results. The anonymization method is used to generalize and suppress the data and limit the disclosure risk. In addition, the accuracy of the data mining results could be increased by applying clustering with the K-Means and EM algorithms. The labor dataset is used in this research, and all instances are numerical values. The WEKA tool is used to run the clustering algorithms on the labor dataset. The outcome of this research is that the privacy level of the dataset was increased while information loss was minimized. The experimental results also show that the proposed method provides better results in terms of the privacy level of data mining. |
format | Thesis |
author | Ahmad Zahari, Samahah Solehah |
author_facet | Ahmad Zahari, Samahah Solehah |
author_sort | Ahmad Zahari, Samahah Solehah |
title | Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
title_short | Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
title_full | Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
title_fullStr | Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
title_full_unstemmed | Privacy preserving data mining using anonymization and K-means clustering on labor dataset |
title_sort | privacy preserving data mining using anonymization and k-means clustering on labor dataset |
publishDate | 2019 |
url | http://eprints.utm.my/id/eprint/96295/1/SamahahSolehahMSC2019.pdf.pdf http://eprints.utm.my/id/eprint/96295/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143456 |
_version_ | 1738510349168017408 |
score | 13.211869 |
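
The pipeline summarized in the description (generalize and suppress numeric attributes, then cluster the anonymized records) can be illustrated with a minimal sketch. This is not the thesis implementation, which used WEKA; the sample rows, the interval widths, the group-size threshold `k`, and all function names below are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def generalize(value, width):
    """Generalization: replace a number by the midpoint of its interval."""
    low = (value // width) * width
    return low + width / 2

def anonymize(rows, widths, k):
    """Generalize each attribute, then suppress (drop) every row whose
    generalized tuple is shared by fewer than k rows (assumed rule)."""
    generalized = [tuple(generalize(v, w) for v, w in zip(row, widths))
                   for row in rows]
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, n_clusters, iterations=20, seed=0):
    """Plain Lloyd-style K-means; needs n_clusters distinct points."""
    rng = random.Random(seed)
    # Seed from distinct points so identical generalized rows
    # cannot produce duplicate starting centroids.
    centroids = rng.sample(sorted(set(points)), n_clusters)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign(points, centroids), centroids

# Made-up numeric records with two attributes (not the real labor dataset).
rows = [(2.0, 38), (2.5, 38), (2.1, 39),
        (4.5, 35), (4.6, 36), (4.9, 35),
        (7.0, 27)]
anonymized = anonymize(rows, widths=(1.0, 5), k=2)
labels, centroids = kmeans(anonymized, n_clusters=2)
print(len(rows) - len(anonymized), "row(s) suppressed")
print(labels)
```

Wider intervals and a larger `k` hide individual records more thoroughly but discard more detail, which is the trade-off between privacy level and information loss that the description refers to.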