A deep autoencoder-based representation for Arabic text categorization
Main Authors: | El-Alami, Fatima-Zahra; El Mahdaouy, Abdelkader; El Alaoui, Said Ouatik; En-Nahnahi, Noureddine |
---|---|
Format: | Article |
Language: | English |
Published: | Universiti Utara Malaysia Press 2020 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://repo.uum.edu.my/28135/1/JICT%2019%203%202020%20381-398.pdf http://repo.uum.edu.my/28135/ http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4 |
id |
my.uum.repo.28135 |
---|---|
record_format |
eprints |
spelling |
my.uum.repo.28135 2021-02-02T02:52:19Z http://repo.uum.edu.my/28135/ El-Alami, Fatima-Zahra and El Mahdaouy, Abdelkader and El Alaoui, Said Ouatik and En-Nahnahi, Noureddine (2020) A deep autoencoder-based representation for Arabic text categorization. Journal of Information and Communication Technology, 19 (3). pp. 381-398. ISSN 2180-3862. Article, PeerReviewed, application/pdf, en. http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4 |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Institutional Repository |
url_provider |
http://repo.uum.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
Representing Arabic text is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Bag-of-Words has until recently remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and a high-dimensional feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet using feature selection processes; (2) learning features via an unsupervised algorithm for text representation; (3) categorizing text using a deep Autoencoder. Our method took document semantics into account by combining implicit and explicit semantics, and it reduced the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset OSAC. The results showed the effectiveness of the proposed method compared to state-of-the-art ones. |
format |
Article |
author |
El-Alami, Fatima-Zahra; El Mahdaouy, Abdelkader; El Alaoui, Said Ouatik; En-Nahnahi, Noureddine |
author_sort |
El-Alami, Fatima-Zahra |
title |
A deep autoencoder-based representation for Arabic text categorization |
publisher |
Universiti Utara Malaysia Press |
publishDate |
2020 |
url |
http://repo.uum.edu.my/28135/1/JICT%2019%203%202020%20381-398.pdf http://repo.uum.edu.my/28135/ http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4 |