Features extraction for illicit web pages identification using identification component analysis

Illicit Web content such as pornography, violence, and gambling has greatly polluted the minds of immature web users. Pornography is perhaps one of the biggest threats to the healthy mental life of today's children and teenagers. A proper way to identify illicit web pages efficient...

Bibliographic Details
Main Authors: Lee, Zhi Sam, Maarof, Mohd. Aizaini, Selamat, Ali, Shamsuddin, Siti Mariyam
Format: Conference or Workshop Item
Published: 2007
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/13979/
id my.utm.13979
record_format eprints
spelling my.utm.13979 2017-08-06T04:27:58Z http://eprints.utm.my/id/eprint/13979/ Features extraction for illicit web pages identification using identification component analysis Lee, Zhi Sam Maarof, Mohd. Aizaini Selamat, Ali Shamsuddin, Siti Mariyam QA75 Electronic computers. Computer science Illicit Web content such as pornography, violence, and gambling has greatly polluted the minds of immature web users. Pornography is perhaps one of the biggest threats to the healthy mental life of today's children and teenagers. A proper way to identify illicit web pages efficiently is highly desired. In this paper, we analyze the textual content of web pages in categories such as pornography, gynecology, sex education, and general business news using the independent component analysis (ICA) algorithm. We establish three similar models for comparison: a principal component analysis (PCA) model, an ICA model, and a PCA-ICA model. We evaluate the effectiveness of these proposed models using information retrieval measures such as precision, recall, F1, and accuracy. Our experimental results show that the PCA and PCA-ICA models can identify illicit web pages correctly, with overall performance above 90%. This research gives researchers insight into textual content-based web page categorization. 2007 Conference or Workshop Item PeerReviewed Lee, Zhi Sam and Maarof, Mohd. Aizaini and Selamat, Ali and Shamsuddin, Siti Mariyam (2007) Features extraction for illicit web pages identification using identification component analysis. In: International Conference on Intelligent and Advanced Systems (ICIAS’07), 2007, Kuala Lumpur.
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Lee, Zhi Sam
Maarof, Mohd. Aizaini
Selamat, Ali
Shamsuddin, Siti Mariyam
Features extraction for illicit web pages identification using identification component analysis
description Illicit Web content such as pornography, violence, and gambling has greatly polluted the minds of immature web users. Pornography is perhaps one of the biggest threats to the healthy mental life of today's children and teenagers. A proper way to identify illicit web pages efficiently is highly desired. In this paper, we analyze the textual content of web pages in categories such as pornography, gynecology, sex education, and general business news using the independent component analysis (ICA) algorithm. We establish three similar models for comparison: a principal component analysis (PCA) model, an ICA model, and a PCA-ICA model. We evaluate the effectiveness of these proposed models using information retrieval measures such as precision, recall, F1, and accuracy. Our experimental results show that the PCA and PCA-ICA models can identify illicit web pages correctly, with overall performance above 90%. This research gives researchers insight into textual content-based web page categorization.
format Conference or Workshop Item
author Lee, Zhi Sam
Maarof, Mohd. Aizaini
Selamat, Ali
Shamsuddin, Siti Mariyam
author_facet Lee, Zhi Sam
Maarof, Mohd. Aizaini
Selamat, Ali
Shamsuddin, Siti Mariyam
author_sort Lee, Zhi Sam
title Features extraction for illicit web pages identification using identification component analysis
title_short Features extraction for illicit web pages identification using identification component analysis
title_full Features extraction for illicit web pages identification using identification component analysis
title_fullStr Features extraction for illicit web pages identification using identification component analysis
title_full_unstemmed Features extraction for illicit web pages identification using identification component analysis
title_sort features extraction for illicit web pages identification using identification component analysis
publishDate 2007
url http://eprints.utm.my/id/eprint/13979/
_version_ 1643646303826083840
score 13.154949