Staff View: Big Data Mining Using K-Means and DBSCAN Clustering Techniques

Big Data Mining Using K-Means and DBSCAN Clustering Techniques

The World Wide Web industry generates big and complex data such as web server log files. Many data mining techniques can be used to analyze log files to extract knowledge and valuable information for both organizations and web developers. Large amounts of heterogeneous data are generated by websites...

Bibliographic Details
Main Authors:	Fawzia Omer, A., Mohammed, H.A., Awadallah, M.A., Khan, Z., Abrar, S.U., Shah, M.D.
Format:	Article
Published:	Springer Science and Business Media Deutschland GmbH 2022
Online Access:	http://scholars.utp.edu.my/id/eprint/34107/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85137571960&doi=10.1007%2f978-3-031-05752-6_15&partnerID=40&md5=046b945c39ff7687ef54619b07e0ded3

id	oai:scholars.utp.edu.my:34107
record_format	eprints
spelling	oai:scholars.utp.edu.my:34107 2023-01-03T07:23:01Z http://scholars.utp.edu.my/id/eprint/34107/ Big Data Mining Using K-Means and DBSCAN Clustering Techniques Fawzia Omer, A. Mohammed, H.A. Awadallah, M.A. Khan, Z. Abrar, S.U. Shah, M.D. The World Wide Web industry generates big and complex data such as web server log files. Many data mining techniques can be used to analyze log files to extract knowledge and valuable information for both organizations and web developers. Large amounts of heterogeneous data are generated by websites, performing effective analysis on these data and transforming them into useful information using the existing traditional techniques is a challenging process. Therefore, this paper aims to analyze and cluster the log file data to get useful information that helps understand the users' behavior. A variety of data mining techniques were used to address the problem; three steps of data pre-processing were applied, namely the cleaning of data, the identification of users, and the identification of sessions. Results obtained after pre-processing phase showed that the data quality will improve when the number of records reduced by (51.45). The density-based spatial clustering of applications with noise (DBSCAN) and the K-means algorithm were used to develop clustering algorithms. Density-based clustering with three clusters outperformed the K-Means algorithm with three clusters in terms of accuracy. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG. Springer Science and Business Media Deutschland GmbH 2022 Article NonPeerReviewed Fawzia Omer, A. and Mohammed, H.A. and Awadallah, M.A. and Khan, Z. and Abrar, S.U. and Shah, M.D. (2022) Big Data Mining Using K-Means and DBSCAN Clustering Techniques. Studies in Big Data, 111. pp. 231-246.
ISSN 21976503 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85137571960&doi=10.1007%2f978-3-031-05752-6_15&partnerID=40&md5=046b945c39ff7687ef54619b07e0ded3 10.1007/978-3-031-05752-6_15 10.1007/978-3-031-05752-6_15
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	The World Wide Web industry generates big and complex data such as web server log files. Many data mining techniques can be used to analyze log files to extract knowledge and valuable information for both organizations and web developers. Websites generate large amounts of heterogeneous data, and analyzing these data effectively and transforming them into useful information with existing traditional techniques is challenging. Therefore, this paper aims to analyze and cluster the log file data to obtain useful information that helps in understanding users' behavior. A variety of data mining techniques were used to address the problem; three pre-processing steps were applied, namely data cleaning, user identification, and session identification. Results obtained after the pre-processing phase showed that data quality improved when the number of records was reduced by (51.45). The density-based spatial clustering of applications with noise (DBSCAN) and K-means algorithms were used for clustering. Density-based clustering with three clusters outperformed the K-means algorithm with three clusters in terms of accuracy. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
format	Article
author	Fawzia Omer, A. Mohammed, H.A. Awadallah, M.A. Khan, Z. Abrar, S.U. Shah, M.D.
spellingShingle	Fawzia Omer, A. Mohammed, H.A. Awadallah, M.A. Khan, Z. Abrar, S.U. Shah, M.D. Big Data Mining Using K-Means and DBSCAN Clustering Techniques
author_facet	Fawzia Omer, A. Mohammed, H.A. Awadallah, M.A. Khan, Z. Abrar, S.U. Shah, M.D.
author_sort	Fawzia Omer, A.
title	Big Data Mining Using K-Means and DBSCAN Clustering Techniques
title_short	Big Data Mining Using K-Means and DBSCAN Clustering Techniques
title_full	Big Data Mining Using K-Means and DBSCAN Clustering Techniques
title_fullStr	Big Data Mining Using K-Means and DBSCAN Clustering Techniques
title_full_unstemmed	Big Data Mining Using K-Means and DBSCAN Clustering Techniques
title_sort	big data mining using k-means and dbscan clustering techniques
publisher	Springer Science and Business Media Deutschland GmbH
publishDate	2022
url	http://scholars.utp.edu.my/id/eprint/34107/ https://www.scopus.com/inward/record.uri?eid=2-s2.0-85137571960&doi=10.1007%2f978-3-031-05752-6_15&partnerID=40&md5=046b945c39ff7687ef54619b07e0ded3
_version_	1754532127989301248
score	13.244109

