Data Pre-processing of Website Browsing Records: To Prepare Quality Dataset for Web Page Classification

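The increased usage of the internet worldwide has led to an abundance of web pages designed to supply information to internet users, and web page classification is becoming increasingly necessary to organize this growing number of pages. A classification model can also serve as a tool to restrict internet usage to specific categories of web pages. Because the quality of the dataset determines the performance of a web page classification model, that quality must be checked before the model is developed. Raw datasets are typically unreliable and noisy, which complicates data analysis, so data pre-processing is needed to prepare the dataset properly. In this study, website browsing records serve as the dataset. The primary goal of this paper is to investigate data pre-processing techniques for website browsing records, focusing on Game and Online Video Streaming web pages. Data pre-processing involves two main steps: data cleaning and web content pre-processing. Data cleaning reduces the size of the original dataset, showing that many records can be eliminated because they are inactive or unsuitable for representing Game and Online Video Streaming web pages. Web content pre-processing then removes noise from each HTML document, retaining only the relevant words that represent the web page, from which a word cloud image is created. A Convolutional Neural Network (CNN) will be used to construct a model that categorizes web pages as Game or Online Video Streaming, with the pre-processed data as its input.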
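The paper describes two pre-processing steps: data cleaning of browsing records, and web content pre-processing that strips noise from HTML and keeps the words that represent a page. The record does not give the authors' exact rules, so the following is only a minimal Python sketch under stated assumptions: the record fields `url` and `active`, the stopword list, the minimum token length, and the set of noise tags are all hypothetical, and the word-frequency output merely stands in for whatever input the authors' word cloud step used.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stopword list; the paper's actual list is not given in this record.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
             "for", "is", "are", "with", "this", "that"}

# Tags whose contents are treated as noise rather than page content (an assumption).
NOISE_TAGS = {"script", "style", "noscript"}


def clean_records(records):
    """Data cleaning sketch: drop records with no URL, records flagged inactive,
    and duplicate URLs. The 'url' and 'active' fields are hypothetical."""
    seen = set()
    kept = []
    for rec in records:
        url = (rec.get("url") or "").strip().lower()
        if not url or not rec.get("active", False) or url in seen:
            continue
        seen.add(url)
        kept.append(rec)
    return kept


class VisibleTextParser(HTMLParser):
    """Collects text that does not sit inside a noise tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many noise tags we are currently inside
        self.chunks = []    # text fragments outside noise tags

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def content_words(html):
    """Web content pre-processing sketch: strip markup and noise tags, lowercase,
    and keep alphabetic tokens of length >= 3 that are not stopwords."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if len(t) >= 3 and t not in STOPWORDS]


def word_frequencies(html):
    """Word counts of the kind a word cloud generator typically takes as input."""
    return Counter(content_words(html))


if __name__ == "__main__":
    records = [
        {"url": "https://games.example.com", "active": True},
        {"url": "https://games.example.com", "active": True},  # duplicate
        {"url": "https://old.example.com", "active": False},   # inactive
        {"url": "", "active": True},                           # missing URL
    ]
    print(len(clean_records(records)))  # 1
    page = ("<html><head><title>Play Games</title><script>var x = 1;</script>"
            "</head><body><p>Play free online games and play more games</p></body></html>")
    print(word_frequencies(page))
```

Cleaning runs before content extraction so that only surviving records need their pages fetched and parsed; the standard-library `HTMLParser` is used here to keep the sketch dependency-free, not because the authors are known to have used it.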

Bibliographic Details
Main Authors: Siti Hawa, Apandi; Jamaludin, Sallim; Rozlina, Mohamed; Norkhairi, Ahmad
Format: Article
Language: English
Published: Politeknik Negeri Padang 2024
Subjects: QA75 Electronic computers. Computer science; QA76 Computer software
Online Access:http://umpir.ump.edu.my/id/eprint/40046/1/Data%20Pre-processing%20of%20Website%20Browsing%20Records%20-%20To%20Prepare%20Quality%20Dataset%20for%20Web%20Page%20Classification.pdf
http://umpir.ump.edu.my/id/eprint/40046/
https://dx.doi.org/10.62527/joiv.8.1.1618
Citation: Siti Hawa, Apandi and Jamaludin, Sallim and Rozlina, Mohamed and Norkhairi, Ahmad (2024) Data Pre-processing of Website Browsing Records: To Prepare Quality Dataset for Web Page Classification. International Journal on Informatics Visualization, 8 (1). pp. 239-246. ISSN 2549-9904. https://dx.doi.org/10.62527/joiv.8.1.1618
Status: Published 2024-03; peer reviewed
License: CC BY-NC-SA 4.0
Repository: UMPSA Institutional Repository, Universiti Malaysia Pahang Al-Sultan Abdullah, Malaysia (http://umpir.ump.edu.my/)