Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization

Text summarization aims to condense a text by removing less useful information so that readers can obtain information quickly and precisely. Research on Indonesian abstractive text summarization has mostly focused on multi-document summarization, and those methods do not work optimally for single-document summarization. As the public summarization datasets and prior work in English focus on single-document summarization, this study emphasized Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT for abstractive text summarization on the IndoSum dataset using the BERTSum model. The investigation proceeded by testing various combinations of model encoders, model embedding sizes, and model decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder could improve the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model outputs.
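The abstract pairs an Indonesian BERT encoder with a GPT-like decoder. The sketch below shows one way such a model could be wired together with the Hugging Face transformers library; it is a minimal sketch under stated assumptions, not the authors' implementation. The checkpoint names are illustrative, and the resulting model would still need fine-tuning on IndoSum before it produces useful summaries.

# Minimal sketch (assumed setup): an Indonesian BERT encoder paired with a
# GPT-2-style decoder via Hugging Face's EncoderDecoderModel wrapper.
# Checkpoint names below are illustrative assumptions, not taken from the paper.
from transformers import AutoTokenizer, EncoderDecoderModel

encoder_name = "indobenchmark/indobert-base-p1"  # assumed Indonesian BERT checkpoint
decoder_name = "gpt2"                            # stand-in for a GPT-like decoder

enc_tok = AutoTokenizer.from_pretrained(encoder_name)
dec_tok = AutoTokenizer.from_pretrained(decoder_name)
dec_tok.pad_token = dec_tok.eos_token  # GPT-2 has no pad token by default

# Cross-attention weights between encoder and decoder are newly initialised here,
# which is why fine-tuning on IndoSum is still required.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_name, decoder_name)
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.pad_token_id
model.config.eos_token_id = dec_tok.eos_token_id

article = "Teks artikel berita berbahasa Indonesia ..."  # one input document
inputs = enc_tok(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(inputs["input_ids"], max_length=64, num_beams=4)
print(dec_tok.decode(summary_ids[0], skip_special_tokens=True))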


Bibliographic Details
Main Authors: Lucky, Henry, Suhartono, Derwin
Format: Article
Language:English
Published: Universiti Utara Malaysia Press 2022
Subjects: QA75 Electronic computers. Computer science
Online Access:https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf
https://repo.uum.edu.my/id/eprint/28753/
https://e-journal.uum.edu.my/index.php/jict/article/view/13548
id my.uum.repo.28753
record_format eprints
spelling my.uum.repo.28753 2023-02-09T03:05:34Z https://repo.uum.edu.my/id/eprint/28753/ Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization Lucky, Henry Suhartono, Derwin QA75 Electronic computers. Computer science Text summarization aims to condense a text by removing less useful information so that readers can obtain information quickly and precisely. Research on Indonesian abstractive text summarization has mostly focused on multi-document summarization, and those methods do not work optimally for single-document summarization. As the public summarization datasets and prior work in English focus on single-document summarization, this study emphasized Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT for abstractive text summarization on the IndoSum dataset using the BERTSum model. The investigation proceeded by testing various combinations of model encoders, model embedding sizes, and model decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder could improve the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model outputs. Universiti Utara Malaysia Press 2022 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf Lucky, Henry and Suhartono, Derwin (2022) Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21 (01). pp. 71-94. ISSN 2180-3862 https://e-journal.uum.edu.my/index.php/jict/article/view/13548
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Lucky, Henry
Suhartono, Derwin
Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
description Text summarization aims to condense a text by removing less useful information so that readers can obtain information quickly and precisely. Research on Indonesian abstractive text summarization has mostly focused on multi-document summarization, and those methods do not work optimally for single-document summarization. As the public summarization datasets and prior work in English focus on single-document summarization, this study emphasized Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT for abstractive text summarization on the IndoSum dataset using the BERTSum model. The investigation proceeded by testing various combinations of model encoders, model embedding sizes, and model decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder could improve the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model outputs.
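The description reports results with ROUGE and BERTScore. The minimal sketch below shows how those two metrics could be computed for a single candidate summary, assuming the rouge-score and bert-score Python packages; the example sentences are invented for illustration, not taken from IndoSum.

# Minimal sketch of the two reported metrics: ROUGE (rouge-score package)
# and BERTScore (bert-score package). Example strings are invented.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Pemerintah mengumumkan kebijakan energi baru pada hari Senin."
candidate = "Pemerintah umumkan kebijakan energi baru Senin ini."

# ROUGE-1/2/L F-measures based on n-gram and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
rouge = scorer.score(reference, candidate)
print({name: round(result.fmeasure, 4) for name, result in rouge.items()})

# BERTScore compares candidate and reference in a contextual embedding space;
# lang="id" selects a multilingual model under the hood.
precision, recall, f1 = bert_score([candidate], [reference], lang="id")
print("BERTScore F1:", float(f1.mean()))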
format Article
author Lucky, Henry
Suhartono, Derwin
author_facet Lucky, Henry
Suhartono, Derwin
author_sort Lucky, Henry
title Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
title_short Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
title_full Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
title_fullStr Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
title_full_unstemmed Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
title_sort investigation of pre-trained bidirectional encoder representations from transformers checkpoints for indonesian abstractive text summarization
publisher Universiti Utara Malaysia Press
publishDate 2022
url https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf
https://repo.uum.edu.my/id/eprint/28753/
https://e-journal.uum.edu.my/index.php/jict/article/view/13548
_version_ 1758580949089517568