Staff View: Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging

Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging


Bibliographic Details
Main Authors:	Christian, Hans; Suhartono, Derwin; Chowanda, Andry; Kamal Z., Zamli
Format:	Article
Language:	English
Published:	Springer 2021
Subjects:	QA76 Computer software
Online Access:	http://umpir.ump.edu.my/id/eprint/32200/1/Text%20based%20personality%20prediction%20from%20multiple%20social%20media.pdf http://umpir.ump.edu.my/id/eprint/32200/ https://doi.org/10.1186/s40537-021-00459-1

id	my.ump.umpir.32200
record_format	eprints
spelling	my.ump.umpir.32200 2022-02-10T02:32:30Z http://umpir.ump.edu.my/id/eprint/32200/ Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging Christian, Hans Suhartono, Derwin Chowanda, Andry Kamal Z., Zamli QA76 Computer software Springer 2021-05-17 Article PeerReviewed pdf en cc_by_4 http://umpir.ump.edu.my/id/eprint/32200/1/Text%20based%20personality%20prediction%20from%20multiple%20social%20media.pdf Christian, Hans and Suhartono, Derwin and Chowanda, Andry and Kamal Z., Zamli (2021) Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging. Journal of Big Data, 8 (1). pp. 1-20. ISSN 2196-1115 https://doi.org/10.1186/s40537-021-00459-1
institution	Universiti Malaysia Pahang
building	UMP Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Pahang
content_source	UMP Institutional Repository
url_provider	http://umpir.ump.edu.my/
language	English
topic	QA76 Computer software
spellingShingle	QA76 Computer software Christian, Hans Suhartono, Derwin Chowanda, Andry Kamal Z., Zamli Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
description	The ever-increasing number of social media users has contributed to significant growth in the volume of online information. The content these users post on social media can often give valuable insight into their personalities (e.g., for predicting job satisfaction, specific preferences, and the success of professional and romantic relationships), without the hassle of taking a formal personality test. Termed personality prediction, the process involves extracting features from digital content and mapping them onto a personality model. Owing to its simplicity and proven capability, a well-known personality model, the Big Five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, many algorithms can be used to extract contextualized word embeddings from textual data for a personality prediction system; some of them are based on ensemble models and deep learning. Although useful, existing algorithms such as RNN and LSTM suffer from the following limitations. Firstly, they take a long time to train owing to their sequential inputs. Secondly, they lack the ability to capture the true (semantic) meaning of words; therefore, some context is lost. To address these limitations, this paper introduces a new prediction approach using a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as the feature extraction method on social media data sources. Finally, the system makes its prediction based on model averaging. Unlike earlier work, which adopts a single social media data source with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset.
format	Article
author	Christian, Hans Suhartono, Derwin Chowanda, Andry Kamal Z., Zamli
author_facet	Christian, Hans Suhartono, Derwin Chowanda, Andry Kamal Z., Zamli
author_sort	Christian, Hans
title	Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
title_short	Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
title_full	Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
title_fullStr	Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
title_full_unstemmed	Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
title_sort	text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
publisher	Springer
publishDate	2021
url	http://umpir.ump.edu.my/id/eprint/32200/1/Text%20based%20personality%20prediction%20from%20multiple%20social%20media.pdf http://umpir.ump.edu.my/id/eprint/32200/ https://doi.org/10.1186/s40537-021-00459-1 https://doi.org/10.1186/s40537-021-00459-1
_version_	1724608134358499328
score	13.149126
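
Illustrative note: the description above says the system reaches its final decision by model averaging over several pre-trained language models, with one predictive model per trait. The record does not give the exact procedure, so the following is only a minimal sketch of one common reading (averaging per-trait probabilities and thresholding). The function name `average_models`, the 0.5 threshold, and all probability values are hypothetical and are not taken from the paper.

```python
from statistics import mean

# The Big Five personality traits named in the description.
TRAITS = [
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]


def average_models(per_model_probs, threshold=0.5):
    """Average per-trait probabilities across models, then threshold.

    per_model_probs maps a model name to a dict of trait -> probability.
    The threshold of 0.5 is an assumption made for this sketch.
    """
    averaged = {
        trait: mean(probs[trait] for probs in per_model_probs.values())
        for trait in TRAITS
    }
    labels = {trait: p >= threshold for trait, p in averaged.items()}
    return averaged, labels


# Made-up probabilities standing in for the outputs of three classifiers,
# each built on a different pre-trained language model.
example_probs = {
    "bert": {"openness": 0.8, "conscientiousness": 0.4, "extraversion": 0.3,
             "agreeableness": 0.9, "neuroticism": 0.2},
    "roberta": {"openness": 0.6, "conscientiousness": 0.7, "extraversion": 0.2,
                "agreeableness": 0.7, "neuroticism": 0.4},
    "xlnet": {"openness": 0.7, "conscientiousness": 0.55, "extraversion": 0.4,
              "agreeableness": 0.8, "neuroticism": 0.3},
}

averaged, labels = average_models(example_probs)
```

In this sketch each trait is decided independently, so a single model that is uncertain about one trait (here the hypothetical `bert` score for conscientiousness) can be outvoted by the other two.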
