A graph-theoretic approach for the detection of phishing webpages

Over the years, various technical means have been developed to protect Internet users from phishing attacks. To enrich these anti-phishing efforts, we capitalise on concepts from graph theory and propose a set of novel graph features to improve phishing detection accuracy. The initial phase of...

Bibliographic Details
Main Authors: Tan, Choon Lin, Chiew, Kang Leng, Yong, Kelvin S.C., Sze, San Nah, Abdullah, Johari, Sebastian, Yakub
Format: Article
Language: English
Published: Elsevier 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: http://ir.unimas.my/id/eprint/31278/1/A%20Graph-Theoretic%20-%20Copy.pdf
http://ir.unimas.my/id/eprint/31278/
https://www.sciencedirect.com/science/article/pii/S016740482030078X
https://doi.org/10.1016/j.cose.2020.101793
id my.unimas.ir.31278
record_format eprints
spelling my.unimas.ir.31278 2022-08-09T03:43:16Z http://ir.unimas.my/id/eprint/31278/ A graph-theoretic approach for the detection of phishing webpages Tan, Choon Lin Chiew, Kang Leng Yong, Kelvin S.C. Sze, San Nah Abdullah, Johari Sebastian, Yakub QA75 Electronic computers. Computer science Elsevier 2020-08 Article PeerReviewed text en http://ir.unimas.my/id/eprint/31278/1/A%20Graph-Theoretic%20-%20Copy.pdf Tan, Choon Lin and Chiew, Kang Leng and Yong, Kelvin S.C. and Sze, San Nah and Abdullah, Johari and Sebastian, Yakub (2020) A graph-theoretic approach for the detection of phishing webpages. Computers & Security, 95. p. 101793. ISSN 0167-4048 https://www.sciencedirect.com/science/article/pii/S016740482030078X https://doi.org/10.1016/j.cose.2020.101793
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic QA75 Electronic computers. Computer science
description Over the years, various technical means have been developed to protect Internet users from phishing attacks. To enrich these anti-phishing efforts, we capitalise on concepts from graph theory and propose a set of novel graph features to improve phishing detection accuracy. The initial phase of the proposed technique involves extracting the hyperlinks in the webpage under scrutiny and fetching the corresponding neighbourhood webpages. During this process, the page linking data are collected and used to construct a web graph that models the overall hyperlink and network structure of the webpage. From the web graph, graph measures are computed and extracted as graph features to derive a classifier for detecting phishing webpages. Experimental results show that the proposed graph features achieve an improved overall accuracy of 97.8% when C4.5 is used as the classifier, outperforming the existing conventional features derived from the same data samples. Unlike conventional features, the proposed graph features leverage inherent phishing patterns that are only visible at a higher level of abstraction, making them robust and difficult to evade by direct manipulation of the webpage contents. Our proposed graph-based technique also shows promising results when benchmarked against a prominent phishing detection technique. Hence, the proposed technique is an important contribution to existing anti-phishing research towards improving detection performance.
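The pipeline outlined in the description (collect page linking data, construct a web graph, compute graph measures as features) might be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the record does not list the actual graph features, so the measures below (density, degree of the page under scrutiny, reciprocity) are assumptions chosen for illustration, and the function names, the `link_data` structure and the example URLs are all hypothetical. Fetching and parsing the pages is assumed to have happened already.

```python
def build_web_graph(link_data):
    """Build a directed web graph as {node: set of linked nodes}.

    link_data is a hypothetical mapping {page_url: [hyperlink_url, ...]}
    covering the page under scrutiny and its fetched neighbourhood pages.
    """
    graph = {}
    for page, links in link_data.items():
        out_links = graph.setdefault(page, set())
        for target in links:
            if target != page:  # self-links are ignored in this sketch
                out_links.add(target)
                graph.setdefault(target, set())  # targets are nodes too
    return graph


def graph_features(graph, root):
    """Compute a few illustrative graph measures (assumed, not from the paper)."""
    n = len(graph)
    m = sum(len(targets) for targets in graph.values())
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    # Count directed edges u -> v for which the reverse edge v -> u also exists.
    reciprocal = sum(
        1 for u, targets in graph.items() for v in targets if u in graph[v]
    )
    return {
        "nodes": n,
        "edges": m,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "root_out_degree": len(graph[root]),
        "root_in_degree": in_degree[root],
        "reciprocity": reciprocal / m if m else 0.0,
    }


# Hypothetical example: a suspect login page that links to a brand's site,
# while none of the brand's pages link back to it.
root = "http://suspect.example/login"
link_data = {
    root: ["http://brand.example/", "http://brand.example/help"],
    "http://brand.example/": ["http://brand.example/help"],
    "http://brand.example/help": ["http://brand.example/"],
}
features = graph_features(build_web_graph(link_data), root)
print(features)
```

A feature dictionary of this kind, computed for each labelled webpage, would then form one row of the training data for a decision-tree classifier such as C4.5.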
format Article
author Tan, Choon Lin
Chiew, Kang Leng
Yong, Kelvin S.C.
Sze, San Nah
Abdullah, Johari
Sebastian, Yakub
title A graph-theoretic approach for the detection of phishing webpages
publisher Elsevier
publishDate 2020
url http://ir.unimas.my/id/eprint/31278/1/A%20Graph-Theoretic%20-%20Copy.pdf
http://ir.unimas.my/id/eprint/31278/
https://www.sciencedirect.com/science/article/pii/S016740482030078X
https://doi.org/10.1016/j.cose.2020.101793