Malware behavior profiling from unstructured data

Recently, the emergence of new malware has posed a major threat, especially in the finance sector, where adversaries have stolen large amounts of online banking data. Information about a malware threat needs to be collected immediately after its outbreak. Early detection can save others from being t...

Bibliographic Details
Main Authors: Yoong, Jien Chiam, Maarof, Mohd. Aizaini, Kassim, Mohamad Nizam, Zainal, Anazida
Format: Conference or Workshop Item
Published: 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/92351/
http://dx.doi.org/10.1007/978-3-030-49345-5_14
id my.utm.92351
record_format eprints
spelling my.utm.92351 2021-09-28T07:38:56Z http://eprints.utm.my/id/eprint/92351/ 2020 Conference or Workshop Item PeerReviewed Yoong, Jien Chiam and Maarof, Mohd. Aizaini and Kassim, Mohamad Nizam and Zainal, Anazida (2020) Malware behavior profiling from unstructured data. In: 11th International Conference on Soft Computing & Pattern Recognition (SOCPAR 2019), 13 – 15 December 2019, Hyderabad, India. http://dx.doi.org/10.1007/978-3-030-49345-5_14
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description Recently, the emergence of new malware has posed a major threat, especially in the finance sector, where adversaries have stolen large amounts of online banking data. Information about a malware threat needs to be collected immediately after its outbreak, because early detection can save others from becoming victims. Unfortunately, there is a time delay before information about new malware reaches malware databases such as ExploitDB. A pre-emptive approach is needed to gather first-hand information about new malware as a preventive measure. One method is to extract information from open source data such as online news using Named Entity Recognition (NER). However, existing NER systems cannot accurately extract domain-specific entities from online news. The aim of this paper is to extract malware entities and their behaviour attributes using an extended version of NER with a Hidden Markov Model (HMM) and a Conditional Random Field (CRF). A malware-annotated corpus is produced to support supervised learning of the named entity tagger. The results show that CRF performs slightly better than HMM. A few experiments are performed to optimize the performance of CRF in terms of feature extraction. Finally, the malware behaviour information is visualized on a dashboard that combines a few statistical graphs drawn with matplotlib. The purpose of visualizing the malware behaviour profile extracted from online news is to help cyber security experts better understand malware behaviour.
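The description above mentions feature extraction for a CRF-based entity tagger. The record does not give the paper's feature set or label scheme, so the following is only a minimal sketch under stated assumptions: the feature names, the BIO labels (MALWARE, TARGET, VECTOR), and the sample sentence are all hypothetical and chosen for illustration. It shows one common way to turn tokens into per-token feature dictionaries and to group BIO tags back into entity spans.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i] from the word and its neighbours.

    The feature names are illustrative assumptions, not the paper's features.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "lower": word.lower(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity text, label) pairs.

    A stray I- tag with no matching open entity is ignored.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and label == tag[2:]:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


# Made-up sentence and hand-written tags, standing in for tagger output.
tokens = ["ExampleBot", "steals", "banking", "credentials", "via", "phishing", "emails"]
tags = ["B-MALWARE", "O", "B-TARGET", "I-TARGET", "O", "B-VECTOR", "I-VECTOR"]

print(sentence_features(tokens)[0])
print(bio_to_entities(tokens, tags))
```

Lists of feature dictionaries in this shape are what dictionary-based CRF toolkits typically consume, one list per sentence; which features actually help is the kind of question the feature-extraction experiments in the description would address.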
format Conference or Workshop Item
author Yoong, Jien Chiam
Maarof, Mohd. Aizaini
Kassim, Mohamad Nizam
Zainal, Anazida
title Malware behavior profiling from unstructured data
publishDate 2020
url http://eprints.utm.my/id/eprint/92351/
http://dx.doi.org/10.1007/978-3-030-49345-5_14