An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

Bibliographic Details
Main Author: Al-Dyani, Wafa Zubair Abdullah
Format: Thesis
Language: English
Published: 2022
Online Access: https://etd.uum.edu.my/10228/1/s901775_01.pdf
https://etd.uum.edu.my/10228/2/s901775_02.pdf
https://etd.uum.edu.my/10228/
Description
Summary: Event Detection (ED) identifies events in various types of data. An ED model for news text documents helps decision-makers in many disciplines improve their strategies. However, identifying and summarizing events in such data is a non-trivial task because of the large volume of heterogeneous news text documents being published. These documents create a high-dimensional feature space that degrades the overall performance of the baseline methods in an ED model. To address this problem, this research presents an enhanced ED model with improved methods for its crucial phases: Feature Selection (FS), ED, and summarization. The work focuses on the FS problem by detecting events automatically through a novel wrapper FS method based on the Adapted Binary Bat Algorithm (ABBA) and the Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome the premature convergence of BBA and the fast convergence rate of MCL. The study also proposes four summarization methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets, and the effectiveness of ABBA-AMCL was compared with 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results showed that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrated that ABBA-AMCL detects real-world events in the Facebook news datasets with a Precision of 0.96 and a Recall of 1 on dataset 11, and a Precision of 1 and a Recall of 0.76 on dataset 12. In conclusion, the novel ABBA-AMCL presented in this research bridges the research gap and resolves the curse of dimensionality in the high-dimensional feature space of heterogeneous news text documents. The enhanced ED model can therefore organize news documents into distinct events and provide policymakers with valuable information for decision making.
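
For readers unfamiliar with the clustering step named in the summary, the following is a minimal sketch of the standard, textbook Markov Clustering (MCL) procedure: repeated expansion (matrix squaring) and inflation (element-wise powering followed by column normalization). It is not the thesis's adapted AMCL, whose modifications are not described in this record. The function names, the `inflation`, `max_iter`, and `tol` parameters, and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for the sketch."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Textbook MCL on a square adjacency matrix (hypothetical helper).

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then turn the graph into a random-walk matrix.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along 2-step walks
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows fade
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:  # stop once the matrix no longer changes
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph (an assumption for illustration): two disconnected triangles.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy_graph))
```

On this toy graph the two triangles come out as separate clusters, `[[0, 1, 2], [3, 4, 5]]`. In an ED setting the nodes would presumably be news documents and the edge weights a document-similarity measure, but that mapping is an assumption here, not something this record specifies.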