Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
Text summarization aims to shorten a text by removing less useful information so that readers can obtain information quickly and precisely. In Indonesian abstractive text summarization, research has mostly focused on multi-document summarization, whose methods do not work optimally for single-document summarization. A...
Main Authors: | Lucky, Henry; Suhartono, Derwin |
---|---|
Format: | Article |
Language: | English |
Published: | Universiti Utara Malaysia Press, 2022 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf https://repo.uum.edu.my/id/eprint/28753/ https://e-journal.uum.edu.my/index.php/jict/article/view/13548 |
id |
my.uum.repo.28753 |
---|---|
record_format |
eprints |
spelling |
my.uum.repo.28753; 2023-02-09T03:05:34Z; https://repo.uum.edu.my/id/eprint/28753/; Lucky, Henry and Suhartono, Derwin (2022) Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21 (01). pp. 71-94. ISSN 2180-3862; QA75 Electronic computers. Computer science; Universiti Utara Malaysia Press; Article; PeerReviewed; application/pdf; en; cc4_by; https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf; https://e-journal.uum.edu.my/index.php/jict/article/view/13548 |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Institutional Repository |
url_provider |
http://repo.uum.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Text summarization aims to shorten a text by removing less useful information so that readers can obtain information quickly and precisely. In Indonesian abstractive text summarization, research has mostly focused on multi-document summarization, whose methods do not work optimally for single-document summarization. Because public summarization datasets and studies in English focus on single-document summarization, this study focused on Indonesian single-document summarization. Abstractive text summarization studies in English frequently use Bidirectional Encoder Representations from Transformers (BERT), and since an Indonesian BERT checkpoint is available, it was employed in this study. This study investigated the use of Indonesian BERT in abstractive text summarization on the IndoSum dataset using the BERTSum model. The investigation used various combinations of model encoders, model embedding sizes, and model decoders. Evaluation results showed that models with a larger embedding size and a Generative Pre-Training (GPT)-like decoder could improve the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and BERTScore of the model results. |
format |
Article |
author |
Lucky, Henry Suhartono, Derwin |
title |
Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization |
publisher |
Universiti Utara Malaysia Press |
publishDate |
2022 |
url |
https://repo.uum.edu.my/id/eprint/28753/1/JICT%2021%2001%202022%2071-94.pdf https://repo.uum.edu.my/id/eprint/28753/ https://e-journal.uum.edu.my/index.php/jict/article/view/13548 |