KDA: An unsupervised approach for analyzing keyphrases distance from news articles as a feature of keyphrase extraction

Bibliographic Details
Main Authors: Alam Miah, Mohammad Badrul, Suryanti, Awang
Format: Conference or Workshop Item
Language: English
Published: 2022
Online Access: http://umpir.ump.edu.my/id/eprint/36844/1/KDA%20_%20An%20unsupervised%20approach%20for%20analyzing%20keyphrases%20distance%20from%20news%20articles%20as%20a%20feature%20of%20keyphrase%20extraction.pdf
http://umpir.ump.edu.my/id/eprint/36844/
https://ncon-pgr.ump.edu.my/index.php/en/?option=com_fileman&view=file&routed=1&name=E-BOOK%20NCON%202022%20.pdf&folder=E-BOOK%20NCON%202022&container=fileman-files
Description
Summary: Automatic keyphrase extraction remains a significant and difficult research problem because of the exponential growth of information and internet sources. Many natural language processing and information retrieval tasks benefit greatly from keyphrases. Extracting the best keyphrases and summarizing documents to a high standard depends on the features extracted for those keyphrases. This paper proposes KDA, an unsupervised region-based technique that analyzes the distance of keyphrases in news articles as a feature for keyphrase extraction. The technique is divided into eight phases: data collection, data pre-processing, data processing, keyphrase searching, distance calculation, distance averaging, curve plotting, and curve fitting. First, two datasets of news articles are collected and passed through several pre-processing algorithms. The pre-processed data then enters the data processing stage, where it passes through the keyphrase searching, distance calculation, and distance averaging steps. Curve plotting is then applied, followed by curve fitting. The technique is evaluated on two of the most accessible benchmark datasets and compared with other available methods to demonstrate its efficiency, advantages, and importance. The experimental results show that the proposed approach analyzes keyphrase distance in news articles efficiently, producing an F1-score of 96.91% and presented keyphrases of 94.55%, and that it greatly improves the effectiveness of current keyphrase extraction methods.
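
The searching, distance calculation, averaging, and curve-fitting phases named in the summary might look roughly like the Python sketch below. It is an illustration only, not the authors' implementation. The summary does not define "distance", so the sketch assumes it means the token offset of a keyphrase's first occurrence divided by the article length. The function names, the four-region split, and the straight-line least-squares fit (a stand-in for whatever curve the paper fits) are all hypothetical choices.

```python
import re


def normalize(text):
    """Lowercase and split into alphanumeric tokens (assumed stand-in for pre-processing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyphrase_distance(tokens, phrase_tokens):
    """Offset of the phrase's first occurrence as a fraction of article length, or None."""
    n, m = len(tokens), len(phrase_tokens)
    if m == 0 or n < m:
        return None
    for i in range(n - m + 1):
        if tokens[i:i + m] == phrase_tokens:
            return i / n
    return None


def region_shares(distances, regions=4):
    """Share of found keyphrases that fall in each equal-width region of the article."""
    counts = [0] * regions
    for d in distances:
        counts[min(int(d * regions), regions - 1)] += 1
    total = len(distances)
    return [c / total for c in counts] if total else [0.0] * regions


def fit_line(xs, ys):
    """Closed-form least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy articles with gold keyphrases, invented purely for illustration.
    articles = [
        ("Floods hit the coastal city on Monday. Rescue teams were sent at once.",
         ["coastal city", "rescue teams"]),
        ("The central bank raised interest rates again, citing inflation.",
         ["central bank", "interest rates"]),
    ]
    distances = []
    for text, phrases in articles:
        tokens = normalize(text)
        for phrase in phrases:
            d = keyphrase_distance(tokens, normalize(phrase))
            if d is not None:  # skip keyphrases absent from the article text
                distances.append(d)
    shares = region_shares(distances)
    slope, intercept = fit_line(list(range(len(shares))), shares)
    print("mean distance:", sum(distances) / len(distances))
    print("region shares:", shares)
    print("fitted line: slope=%.3f intercept=%.3f" % (slope, intercept))
```

Under these assumptions, a negative fitted slope would suggest that keyphrases cluster near the start of an article, which is the kind of positional signal a distance feature could give an extractor.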