Hybridized term-weighting method for web contents classification using SVM

Bibliographic Details
Main Authors: Odeh Sabbah, Thabit Sulaiman, Selamat, Ali, Selamat, Md. Hafiz, Ibrahim, Roliana, Fujita, Hamido
Format: Conference or Workshop Item
Published: 2015
Online Access:http://eprints.utm.my/id/eprint/63288/
https://www.researchgate.net/publication/282254779_Hybridized_Term-Weighting_Method_for_Web_Contents_Classification_using_SVM
Description
Summary: The role of intelligence and security informatics based on statistical computation is becoming increasingly significant in proactively detecting terrorist activities, as extremist groups misuse many of the facilities available on the Internet to incite violence and hatred. However, the performance of statistical methods is reported to be limited by inadequate accuracy, which stems from their inability to comprehend the meaning of human-written text. Misclassifying actual terrorism web content as non-terrorism, or vice versa, reduces the usefulness of intelligent techniques in supporting efforts against potential threats and limits the opportunities for effective use of intelligence and security informatics in the early detection of terrorist activities. In this paper, we propose a hybridized method based on basic term-weighting techniques for accurately detecting terrorist activities in textual contexts. The proposed method combines the feature sets generated by different individual term-weighting techniques, such as Term Frequency (TF), Document Frequency (DF), Term Frequency-Inverse Document Frequency (TF-IDF), Glasgow, and Entropy, into one feature set for effective classification. In addition, two combination functions are proposed to reduce the dimensionality of the combined feature set. The method is tested on a selected dataset from the Dark Web Portal Forum (DWPF) and benchmarked using a Support Vector Machine (SVM) and other well-known text classifiers such as K-Nearest Neighbor (KNN), Decision Trees (DT), Naïve Bayes (NB), and Extreme Learning Machine (ELM). Experimental results show that the hybridized method efficiently identifies terrorist-activity content and outperforms the individual methods.
Furthermore, the results reveal that hybridizing only a few feature sets yields classification performance that is relatively competitive with higher hybridization levels, given the number of features used for classification. The experiments with the hybridizing (combination) functions also show that applying the symmetric difference function to combine feature sets significantly reduces their dimensionality.
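The summary names five term-weighting schemes and a symmetric-difference combination function but gives no formulas. The Python sketch below illustrates the general idea under explicit assumptions; it is not the authors' implementation. The corpus is a hypothetical toy example (not DWPF data), each scheme's per-document weights are summed over the corpus to rank terms, the top-k terms form each scheme's feature set, Glasgow uses an assumed form log(tf + 1) / log(doc length) * log(N / df), and Entropy is computed as standard log-entropy weighting.

```python
import math
from collections import Counter

# Hypothetical toy corpus of tokenized posts (NOT the DWPF dataset).
DOCS = [
    ["attack", "plan", "forum", "post"],
    ["forum", "post", "news", "sport"],
    ["attack", "weapon", "plan", "plan"],
    ["news", "sport", "weather", "forum"],
]

METHODS = ("TF", "DF", "TF-IDF", "Glasgow", "Entropy")


def score_terms(docs):
    """Return {method: {term: score}} for five term-weighting schemes.

    Assumption for this sketch: per-document weights are summed over the
    corpus so each scheme yields one global ranking of terms.
    Requires at least two documents (log-entropy divides by log N).
    """
    counts = [Counter(d) for d in docs]          # term counts per document
    df = Counter(t for c in counts for t in c)   # document frequency
    gf = sum(counts, Counter())                  # corpus-wide term frequency
    n = len(docs)
    scores = {m: {} for m in METHODS}
    for t in df:
        idf = math.log(n / df[t])
        # (tf, doc length) for every document that contains t
        present = [(c[t], len(d)) for c, d in zip(counts, docs) if c[t]]
        scores["TF"][t] = gf[t]
        scores["DF"][t] = df[t]
        scores["TF-IDF"][t] = sum(tf * idf for tf, _ in present)
        # Assumed Glasgow form: log(tf + 1) / log(doc length) * log(N / df).
        scores["Glasgow"][t] = sum(
            math.log(tf + 1) / math.log(max(length, 2)) * idf
            for tf, length in present
        )
        # Log-entropy: local log(tf + 1) times global 1 + sum(p log p) / log N.
        plogp = sum((tf / gf[t]) * math.log(tf / gf[t]) for tf, _ in present)
        g = 1 + plogp / math.log(n)
        scores["Entropy"][t] = sum(math.log(tf + 1) * g for tf, _ in present)
    return scores


def top_k(score, k):
    """Select the k highest-scoring terms, breaking ties alphabetically."""
    return set(sorted(score, key=lambda t: (-score[t], t))[:k])


def feature_sets(docs, k=3):
    """One reduced feature set (top-k terms) per weighting scheme."""
    return {m: top_k(s, k) for m, s in score_terms(docs).items()}


selected = feature_sets(DOCS, k=3)
# Union: every term chosen by any scheme (the largest hybrid set).
hybrid_union = set().union(*selected.values())
# Symmetric difference of two sets: terms chosen by exactly one of them.
hybrid_symdiff = selected["TF"] ^ selected["TF-IDF"]
```

Here the union keeps every term that any scheme selected, while the symmetric difference drops the terms two schemes agree on, which is one way such a function could shrink a combined set. The summary does not state whether this matches the paper's exact definition. The resulting vocabulary would then index the feature vectors passed to a classifier such as an SVM (for example, scikit-learn's `LinearSVC`).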