Comparative study of machine learning algorithms in website phishing detection

Bibliographic Details
Main Author: Kalybayev, Almukhammed
Format: Thesis
Language: English
Published: 2013
Online Access: http://eprints.utm.my/id/eprint/35828/5/AlmukhammedKalbayevMFSKSM2013.pdf
http://eprints.utm.my/id/eprint/35828/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70363?site_name=Restricted Repository
Description
Summary: Malicious programs designed to steal user credentials have become increasingly common in recent years, potentially leading to financial loss. Attackers use a variety of methods to collect confidential information, and online banking systems remain the main target of these attacks. Today, the most widespread protection against phishing attacks is the use of blacklists in antivirus software and browser toolbars. Unfortunately, the blacklist method fails to respond to newly emerging phishing attacks: because registering new domain names has become easy, no blacklist can guarantee a complete, up-to-date database. Another approach to countering phishing attacks is therefore required, one that is more accurate and efficient than the blacklist method. The purpose of this work is to evaluate and analyze the effectiveness of applying machine learning algorithms such as an Artificial Neural Network, Support Vector Machines and K-Nearest Neighbor to website phishing detection. Datasets of phishing and non-phishing websites were gathered in order to train and test the machine learning models and to compare the algorithms' evaluation metrics with each other. In addition, the final dataset was divided into three datasets with different ratios. This was done to determine whether the trained models perform consistently in testing, and whether these proportions have a positive or negative influence on the models' ability to classify websites. Finally, the performance of each machine learning algorithm was analyzed. This project recommends the Support Vector Machines algorithm as the best choice for phishing detection regardless of dataset proportion, because it performed almost identically across all test phases, averaging 98.5%.
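The evaluation procedure described in the summary can be sketched in a few lines. It consists of training a classifier, testing it, and repeating the process on splits with different proportions. This is a minimal illustration, not the thesis code. It implements only k-nearest neighbor, the simplest of the three algorithms to write without libraries. The four binary URL features, the synthetic toy dataset, the choice of k = 3, and the 50/70/90% training proportions are all hypothetical assumptions, since the record does not state the actual features, parameters or ratios.

```python
import math
import random
from collections import Counter

# Hypothetical binary features per website (illustrative only, NOT the
# thesis feature set): [has_ip_in_url, long_url, has_at_symbol, uses_https]
# Each sample is (features, label); label 1 = phishing, 0 = legitimate.


def make_toy_dataset(n: int, seed: int = 0) -> list[tuple[list[float], int]]:
    """Generate a synthetic, clearly artificial dataset for demonstration."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Phishing samples tend to set the three suspicious flags...
        p_suspicious = 0.8 if label == 1 else 0.2
        feats = [float(rng.random() < p_suspicious) for _ in range(3)]
        # ...while legitimate samples usually use HTTPS (toy assumption).
        feats.append(float(rng.random() < (0.3 if label == 1 else 0.9)))
        data.append((feats, label))
    return data


def split(data, train_percent: int, seed: int = 0):
    """Shuffle a copy of the data and cut it into train/test portions."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, k: int = 3) -> int:
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k: int = 3) -> float:
    """Fraction of test samples whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    data = make_toy_dataset(300)
    # Hypothetical proportions: 50%, 70% and 90% of samples used for training.
    for pct in (50, 70, 90):
        train, test = split(data, pct)
        print(f"train {pct}%: {len(train)} train / {len(test)} test, "
              f"k-NN accuracy = {accuracy(train, test):.3f}")
```

Because the data are synthetic, the printed accuracies say nothing about real phishing detection. The point is the structure of the experiment: the same model is retrained and re-tested at each proportion, so that its consistency across splits can be compared between algorithms. That consistency is the property the summary uses to favor Support Vector Machines.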