Review of feature extraction approaches on biomedical text classification

The overwhelming volume of online biomedical literature causes data congestion and makes it difficult to organize these documents and to retrieve the required ones from databases, especially the Medline database. One way to cope with this flood of documents is to apply...

Bibliographic Details
Main Authors: Dollah, R., Jafni, T. I., Hashim, H., Othman, M. S., Rasib, A. W.
Format: Article
Published: Inst Advanced Science Extension 2020
Subjects:
Online Access: http://eprints.utm.my/id/eprint/87028/
http://www.dx.doi.org/10.21833/ijaas.2020.04.001
id my.utm.87028
record_format eprints
spelling my.utm.87028 2020-10-31T12:16:41Z http://eprints.utm.my/id/eprint/87028/ Review of feature extraction approaches on biomedical text classification Dollah, R. Jafni, T. I. Hashim, H. Othman, M. S. Rasib, A. W. QA Mathematics
Inst Advanced Science Extension 2020-04 Article PeerReviewed Dollah, R. and Jafni, T. I. and Hashim, H. and Othman, M. S. and Rasib, A. W. (2020) Review of feature extraction approaches on biomedical text classification. International Journal of Advanced And Applied Sciences, 7 (4). pp. 1-8. http://www.dx.doi.org/10.21833/ijaas.2020.04.001 DOI:10.21833/ijaas.2020.04.001
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA Mathematics
description The overwhelming volume of online biomedical literature causes data congestion and makes it difficult to organize these documents and to retrieve the required ones from databases, especially the Medline database. One way to cope with this flood of documents is to apply classification. However, each document must first be represented by a set of terms or feature vectors. Identifying terms or features in biomedical literature is one of the most important and challenging tasks in text classification because a large number of new features and entities appear in the biomedical domain. In addition, combining sets of features from different terminological resources leads to naming conflicts, such as homonymous use of names, and to terminological ambiguities. The purpose of this research is therefore to investigate and evaluate effective ways of extracting relevant and meaningful features in order to increase classification accuracy and improve the performance of web searches. To this end, we conduct several classification experiments to evaluate and compare the effectiveness of feature extraction approaches for extracting relevant and informative features from the biomedical literature. Our experiments use two different sets of features: one extracted using the Genia Tagger tool and one extracted by medical experts from Pusat Perubatan Universiti Kebangsaan Malaysia (PPUKM). The results show that classification using the features extracted by medical experts outperforms classification using the features extracted by the Genia Tagger tool when a feature selection method is applied.
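The kind of experiment the abstract describes (represent each document as a feature vector, apply feature selection, then classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not say which feature selection method or classifier the authors used, so chi-square ranking and a Bernoulli naive Bayes classifier are swapped in as common choices, and the documents, labels, and candidate terms are hypothetical toy data, not taken from the paper.

```python
import math

def vectorize(text, vocabulary):
    """Binary feature vector: 1 if the term occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]

def chi_square(vectors, labels, j):
    """Chi-square score of binary feature j against the class labels."""
    n = len(vectors)
    score = 0.0
    for value in (0, 1):
        row = sum(1 for v in vectors if v[j] == value)
        for c in set(labels):
            col = sum(1 for y in labels if y == c)
            observed = sum(1 for v, y in zip(vectors, labels)
                           if v[j] == value and y == c)
            expected = row * col / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score

def select_top_k(vectors, labels, vocabulary, k):
    """Keep the k candidate terms with the highest chi-square score."""
    scores = [chi_square(vectors, labels, j) for j in range(len(vocabulary))]
    ranked = sorted(range(len(vocabulary)), key=lambda j: scores[j], reverse=True)
    return [vocabulary[j] for j in ranked[:k]]

def train_nb(vectors, labels):
    """Bernoulli naive Bayes with add-one (Laplace) smoothing."""
    model = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        prior = len(rows) / len(vectors)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(vectors[0]))]
        model[c] = (prior, probs)
    return model

def predict_nb(model, vector):
    """Return the class with the highest log-posterior for this vector."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if x else 1 - p) for x, p in zip(vector, probs))
    return max(model, key=log_posterior)

# Hypothetical toy corpus and candidate feature set (not from the paper).
train_docs = [
    ("insulin receptor signalling in diabetes patients", "endocrine"),
    ("glucose and insulin levels in diabetes", "endocrine"),
    ("tumor suppressor gene mutation in carcinoma", "oncology"),
    ("carcinoma tumor growth and gene expression", "oncology"),
]
candidate_terms = ["insulin", "diabetes", "tumor", "carcinoma", "in", "and"]

texts = [text for text, _ in train_docs]
labels = [label for _, label in train_docs]

# Step 1: feature selection over the candidate terms.
full_vectors = [vectorize(t, candidate_terms) for t in texts]
selected = select_top_k(full_vectors, labels, candidate_terms, k=4)

# Step 2: train and classify using only the selected features.
model = train_nb([vectorize(t, selected) for t in texts], labels)

def classify(text):
    return predict_nb(model, vectorize(text, selected))

print(selected)
print(classify("insulin therapy for diabetes"))
```

To compare two feature sets in this spirit, the same steps would be run once per candidate term list (for example, one list produced by a tagging tool and one compiled by domain experts) and the resulting accuracies compared on held-out documents.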