Content based fraudulent website detection using supervised machine learning techniques


Bibliographic Details
Main Authors: Maktabar, Mahdi; Zainal, Anazida; Maarof, Mohd. Aizaini; Kassim, Mohamad Nizam
Format: Conference or Workshop Item
Published: 2018
Online Access: http://eprints.utm.my/id/eprint/81884/
http://dx.doi.org/10.1007/978-3-319-76351-4_30
Description
Summary: Fraudulent websites that pose as legitimate sources of information, goods, products and services are proliferating and have resulted in losses of billions of dollars. Because of the many undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent websites, yet none has offered an efficient solution for suppressing these activities. In this regard, this research proposes a fraudulent website detection model based on sentiment analysis of a website's textual content, natural language processing and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction and classification. A crawler is used to obtain data from the Internet, and the data are cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase applies supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed model, with a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, achieved satisfactory results and served the aim of this research.
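
The evaluation procedure named in the summary (10-fold stratified cross-validation, reported as accuracy and FPR) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_folds` and `accuracy_and_fpr` are hypothetical, and the sketch assumes binary labels in which 1 marks a fraudulent site and 0 a legitimate one. It uses only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy_and_fpr(y_true, y_pred, positive=1):
    """Return (accuracy, false positive rate) for binary predictions.

    Assumes `positive` marks the fraudulent class.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # FPR: share of legitimate sites wrongly flagged as fraudulent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy data: 30 fraudulent and 70 legitimate samples (made up for the demo).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=10)
```

In a full run, each fold would serve once as the held-out test set while a classifier is trained on the other nine, and the pooled predictions would be passed to `accuracy_and_fpr`. Stratification matters here because it keeps the ratio of fraudulent to legitimate sites similar in every fold, so no single fold skews the reported FPR.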