A Survey on Forms of Visualization and Tools Used in Topic Modelling

Bibliographic Details
Main Authors: Maskat, Ruhaila, Shaharudin, Shazlyn Milleana, Witarsyah, Deden, Mahdin, Hairulnizam
Format: Article
Language: English
Published: JOIV 2023
Online Access: http://eprints.uthm.edu.my/11627/1/J16136_dd38e6fdb59711e4af7fdf32b619097b.pdf
http://eprints.uthm.edu.my/11627/
Description
Summary: In this paper, we survey recent publications on topic modeling and analyze the forms of visualization and the tools they use. We expect this information to help Natural Language Processing (NLP) researchers decide which types of visualization suit their needs and which tools can produce them. It could also spark further development of existing visualizations, or the emergence of new ones where a gap exists. Topic modeling is an NLP technique used to identify topics hidden in a collection of documents. Visualizing these topics permits a faster understanding of the underlying subject matter in terms of its domain. The survey covers publications from 2017 to early 2022 and follows the PRISMA methodology. One hundred articles were collected, of which 42 were found eligible for this study after filtering. Two research questions were formulated. The first asks, "What are the different forms of visualizations used to display the result of topic modeling?" and the second asks, "What visualization software or API is used?" Our results show that different forms of visualization serve different display purposes. We categorized them as maps, networks, evolution-based charts, and others. We also found that LDAvis is the most frequently used software/API, followed by the R language packages and D3.js. The primary limitation of this survey is that it is not exhaustive; hence, some eligible publications may not be included.