A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach

Extracting text from PDF documents is never easy when high accuracy and consistency are the two main criteria. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially when...

Bibliographic Details
Main Authors: Yong, Tien Fui; Azad, Saiful; Rahman, Mohammed Mostafizur; Kamal Z., Zamli; Gollam, Rabby
Format: Article
Language: English
Published: American Scientific Publisher 2018
Subjects: QA76 Computer software
Online Access: http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf
http://umpir.ump.edu.my/id/eprint/21838/
https://doi.org/10.1166/asl.2018.13029
id my.ump.umpir.21838
record_format eprints
spelling my.ump.umpir.21838 2018-11-29T03:06:43Z http://umpir.ump.edu.my/id/eprint/21838/ A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach Yong, Tien Fui Azad, Saiful Rahman, Mohammed Mostafizur Kamal Z., Zamli Gollam, Rabby QA76 Computer software American Scientific Publisher 2018-10-01 Article PeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf Yong, Tien Fui and Azad, Saiful and Rahman, Mohammed Mostafizur and Kamal Z., Zamli and Gollam, Rabby (2018) A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach. Advanced Science Letters, 24 (10). pp. 7844-7849. ISSN 1936-6612 https://doi.org/10.1166/asl.2018.13029 doi:10.1166/asl.2018.13029
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA76 Computer software
description Extracting text from PDF documents is never easy when high accuracy and consistency are the two main criteria. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially where academic literature is concerned. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps, namely (i) reading the contents of the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated via four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and compared with eight other benchmark systems available on the market. The experimental results demonstrate the effectiveness of the proposed system.
format Article
author Yong, Tien Fui
Azad, Saiful
Rahman, Mohammed Mostafizur
Kamal Z., Zamli
Gollam, Rabby
title A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach
publisher American Scientific Publisher
publishDate 2018
url http://umpir.ump.edu.my/id/eprint/21838/1/A%20Highly%20Accurate%20PDF.pdf
http://umpir.ump.edu.my/id/eprint/21838/
https://doi.org/10.1166/asl.2018.13029
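
The description above names Precision, Recall, F-Measure, and standard deviation as the evaluation metrics, but this record does not state how the paper computes them. The following is only a minimal sketch of one common way to score extracted text against a reference transcription. It assumes whitespace tokenization and multiset token overlap, and the function names `prf` and `summarize` are hypothetical, not taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def prf(extracted, reference):
    """Token-level precision, recall and F-measure.

    Assumption: tokens are whitespace-separated and matched as a multiset;
    the paper's own matching rule is not given in this record.
    """
    ext = Counter(extracted.split())
    ref = Counter(reference.split())
    overlap = sum((ext & ref).values())  # shared tokens, counted with multiplicity
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def summarize(pairs):
    """Mean F-measure and population standard deviation over (extracted, reference) pairs."""
    scores = [prf(extracted, reference)[2] for extracted, reference in pairs]
    return mean(scores), pstdev(scores)
```

Under these assumptions, a low standard deviation across a set of papers would indicate consistent conversion quality, which is the second criterion the description emphasizes alongside accuracy.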