Content based fraudulent website detection using supervised machine learning techniques

Fraudulent websites that pose as legitimate sources of information, goods, products, and services are proliferating and have resulted in losses of billions of dollars. Because of the many undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent Internet websites,...

Bibliographic Details
Main Authors: Maktabar, Mahdi; Zainal, Anazida; Maarof, Mohd. Aizaini; Kassim, Mohamad Nizam
Format: Conference or Workshop Item
Published: 2018
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/81884/
http://dx.doi.org/10.1007/978-3-319-76351-4_30
id my.utm.81884
record_format eprints
spelling my.utm.81884 2019-09-30T12:59:38Z http://eprints.utm.my/id/eprint/81884/ QA75 Electronic computers. Computer science 2018 Conference or Workshop Item PeerReviewed
Maktabar, Mahdi and Zainal, Anazida and Maarof, Mohd. Aizaini and Kassim, Mohamad Nizam (2018) Content based fraudulent website detection using supervised machine learning techniques. In: 17th International Conference on Hybrid Intelligent Systems, HIS 2017, 14-16 December 2017, Delhi, India. http://dx.doi.org/10.1007/978-3-319-76351-4_30
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description Fraudulent websites that pose as legitimate sources of information, goods, products, and services are proliferating and have resulted in losses of billions of dollars. Because of the many undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent Internet websites, yet none of them has managed to offer an efficient solution to suppress these fraudulent activities. In this regard, this research proposes a fraudulent website detection model based on sentiment analysis of the textual content of a given website, natural language processing, and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction, and classification. A crawler is used to obtain data from the Internet, and the data is cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase applies supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed fraudulent website detection model, with a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, achieved satisfactory results and served the aim of this research.
format Conference or Workshop Item
author Maktabar, Mahdi
Zainal, Anazida
Maarof, Mohd. Aizaini
Kassim, Mohamad Nizam
title Content based fraudulent website detection using supervised machine learning techniques
publishDate 2018
url http://eprints.utm.my/id/eprint/81884/
http://dx.doi.org/10.1007/978-3-319-76351-4_30
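
The description field names the evaluation protocol (10-fold stratified cross-validation, reported as accuracy and FPR) but the record gives no implementation details. The following is a minimal sketch of that protocol using only the Python standard library, not the authors' code. The function names, the toy texts, and the keyword-based learner `train_keyword_model` are all hypothetical stand-ins for the paper's feature extraction and classifiers. The sketch also assumes that predictions from all folds are pooled before the metrics are computed, and that label 1 means fraudulent.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def accuracy_and_fpr(y_true, y_pred, positive=1):
    """Accuracy = (TP + TN) / N; FPR = FP / (FP + TN)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


def train_keyword_model(texts, labels):
    """Hypothetical toy learner: flag words seen only in fraudulent training texts."""
    fraud_words, legit_words = set(), set()
    for text, label in zip(texts, labels):
        target = fraud_words if label == 1 else legit_words
        target.update(text.lower().split())
    suspicious = fraud_words - legit_words
    return lambda text: int(any(w in suspicious for w in text.lower().split()))


def cross_validate(samples, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    y_true, y_pred = [], []
    for held_out in stratified_k_folds(labels, k, seed):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_x, train_y)
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(predict(samples[i]))
    return accuracy_and_fpr(y_true, y_pred)


# Toy data (invented for illustration): 10 fraudulent and 10 legitimate page texts.
texts = ["win a free prize now"] * 10 + ["track your order and shipping status"] * 10
labels = [1] * 10 + [0] * 10
accuracy, fpr = cross_validate(texts, labels, train_keyword_model, k=10)
print(f"accuracy={accuracy:.2%} FPR={fpr:.2%}")
```

Stratifying the folds keeps the ratio of fraudulent to legitimate samples roughly constant across folds, so a single unbalanced fold does not distort the FPR. Averaging per-fold metrics instead of pooling predictions is an equally common choice; the record does not say which the authors used.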