Structural features with nonnegative matrix factorization for metamorphic malware detection

Bibliographic Details
Main Authors: Yeong, Tyng Ling, Mohd Sani, Nor Fazlida, Abdullah, Mohd. Taufik, Abdul Hamid, Nor Asilah Wati
Format: Article
Language: English
Published: Elsevier Advanced Technology 2021
Online Access:http://psasir.upm.edu.my/id/eprint/95181/1/Structural%20features%20with%20nonnegative%20matrix%20factorization%20for%20metamorphic%20malware%20detection.pdf
http://psasir.upm.edu.my/id/eprint/95181/
https://www.sciencedirect.com/science/article/pii/S0167404821000407
Description
Summary: Metamorphic malware is well known for evading signature-based detection by exploiting various code obfuscation techniques. Current metamorphic malware detection approaches require some prior knowledge during the feature engineering stage to extract patterns and behaviors from malware. In this paper, we attempt to complement and extend previous techniques by proposing a metamorphic malware detection approach based on structure analysis, using information-theoretic measures and statistical metrics with a machine learning model. In particular, compression ratio, entropy, the Jaccard coefficient and Chi-square tests are used as feature representations to reveal the byte information in malware binary files. Nonnegative Matrix Factorization is then used to reduce the feature dimension. The experimental results show that the Jaccard coefficient on hexadecimal bytes is an effective feature representation for Windows metamorphic malware detection, with accuracy and F-score as high as 0.9972 and 0.9958, respectively. For Linux morphed malware detection, the Chi-square statistic test proves to be an effective feature representation, with accuracy and F-score as high as 0.9878 and 0.9901, respectively. Overall, the proposed feature representations and dimension reduction technique can be useful for detecting metamorphic malware.
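The following is a minimal sketch of how the four byte-level measures named in the summary, plus an NMF reduction step, might be computed. It uses only the Python standard library. It is not the authors' implementation. The following choices are illustrative assumptions: zlib as the compressor, Shannon entropy over the byte histogram, Jaccard similarity over sets of hex byte bigrams from two files, a Pearson chi-square statistic against a uniform byte distribution, and Lee-Seung multiplicative updates for NMF. All function names are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

# Sketch only: the compressor, n-gram size, reference distribution and NMF
# update rule are illustrative assumptions, not the paper's settings.


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size, using zlib at its default level."""
    return len(zlib.compress(data)) / len(data) if data else 0.0


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def hex_ngrams(data: bytes, n: int = 2) -> set:
    """Set of overlapping n-byte windows, written as hex strings."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 2) -> float:
    """Size of intersection over size of union of two files' hex n-gram sets."""
    sa, sb = hex_ngrams(a, n), hex_ngrams(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square of the 256 byte counts against a uniform distribution."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)  # missing byte values count as 0
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))


def _matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def _transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative m x n matrix V into W (m x k) and H (k x n) with
    Lee-Seung multiplicative updates, which do not increase ||V - WH||_F^2.
    Rows of W serve as the reduced k-dimensional features."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = _transpose(W)
        num, den = _matmul(Wt, V), _matmul(_matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = _transpose(H)
        num, den = _matmul(V, Ht), _matmul(W, _matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


if __name__ == "__main__":
    # Hypothetical toy inputs; real use would read binaries via open(path, "rb").read().
    samples = [bytes(range(256)) * 4, b"\x90" * 1024,
               b"MZ" + bytes(1022), bytes(range(0, 256, 2)) * 8]
    for s in samples:
        print(round(compression_ratio(s), 3), round(byte_entropy(s), 3),
              round(chi_square_uniform(s), 1))
    # Pairwise Jaccard similarities form a nonnegative matrix that NMF can compress.
    V = [[jaccard(a, b) for b in samples] for a in samples]
    W, _ = nmf(V, k=2)
    print([[round(x, 3) for x in row] for row in W])
```

The plain-Python matrix code keeps the sketch self-contained. For real feature matrices, a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would normally be used instead.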