A deep autoencoder-based representation for Arabic text categorization

Arabic text representation is a challenging task for several applications, such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Until recently, Bag-of-Words remained the most common method for Arabic text representati...

Bibliographic Details
Main Authors: El-Alami, Fatima-Zahra, El Mahdaouy, Abdelkader, El Alaoui, Said Ouatik, En-Nahnahi, Noureddine
Format: Article
Language: English
Published: Universiti Utara Malaysia Press 2020
Subjects:
Online Access: http://repo.uum.edu.my/28135/1/JICT%2019%203%202020%20381-398.pdf
http://repo.uum.edu.my/28135/
http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4
id my.uum.repo.28135
record_format eprints
spelling my.uum.repo.28135 2021-02-02T02:52:19Z http://repo.uum.edu.my/28135/ A deep autoencoder-based representation for Arabic text categorization El-Alami, Fatima-Zahra El Mahdaouy, Abdelkader El Alaoui, Said Ouatik En-Nahnahi, Noureddine QA75 Electronic computers. Computer science Arabic text representation is a challenging task for several applications, such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Until recently, Bag-of-Words remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet based on feature selection processes; (2) learning features via an unsupervised algorithm for text representation; and (3) categorizing text using a deep Autoencoder. Our method allowed for the consideration of document semantics by combining both implicit and explicit semantics and reducing feature space dimensionality. To evaluate our method, we conducted several experiments on the standard Arabic dataset, OSAC. The obtained results showed the effectiveness of the proposed method compared to state-of-the-art ones. Universiti Utara Malaysia Press 2020 Article PeerReviewed application/pdf en http://repo.uum.edu.my/28135/1/JICT%2019%203%202020%20381-398.pdf El-Alami, Fatima-Zahra and El Mahdaouy, Abdelkader and El Alaoui, Said Ouatik and En-Nahnahi, Noureddine (2020) A deep autoencoder-based representation for Arabic text categorization. Journal of Information and Communication Technology, 19 (3). pp. 381-398. ISSN 2180-3862 http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
El-Alami, Fatima-Zahra
El Mahdaouy, Abdelkader
El Alaoui, Said Ouatik
En-Nahnahi, Noureddine
A deep autoencoder-based representation for Arabic text categorization
description Arabic text representation is a challenging task for several applications, such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Until recently, Bag-of-Words remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet based on feature selection processes; (2) learning features via an unsupervised algorithm for text representation; and (3) categorizing text using a deep Autoencoder. Our method allowed for the consideration of document semantics by combining both implicit and explicit semantics and reducing feature space dimensionality. To evaluate our method, we conducted several experiments on the standard Arabic dataset, OSAC. The obtained results showed the effectiveness of the proposed method compared to state-of-the-art ones.
format Article
author El-Alami, Fatima-Zahra
El Mahdaouy, Abdelkader
El Alaoui, Said Ouatik
En-Nahnahi, Noureddine
author_facet El-Alami, Fatima-Zahra
El Mahdaouy, Abdelkader
El Alaoui, Said Ouatik
En-Nahnahi, Noureddine
author_sort El-Alami, Fatima-Zahra
title A deep autoencoder-based representation for Arabic text categorization
title_short A deep autoencoder-based representation for Arabic text categorization
title_full A deep autoencoder-based representation for Arabic text categorization
title_fullStr A deep autoencoder-based representation for Arabic text categorization
title_full_unstemmed A deep autoencoder-based representation for Arabic text categorization
title_sort deep autoencoder-based representation for arabic text categorization
publisher Universiti Utara Malaysia Press
publishDate 2020
url http://repo.uum.edu.my/28135/1/JICT%2019%203%202020%20381-398.pdf
http://repo.uum.edu.my/28135/
http://jict.uum.edu.my/index.php/previous-issues/172-journal-of-information-and-communication-technology-jict-vol-19-no-3-july-2020#a4
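
The abstract in this record names two shortcomings of Bag-of-Words, high dimensionality and missing semantics, and an autoencoder as part of the remedy. The sketch below illustrates only the dimensionality-reduction idea: it builds a small term-count matrix and compresses it with a single-hidden-layer autoencoder written in NumPy. It is not the authors' method. The toy corpus, the function names, the layer size, and the training settings are all assumptions made for illustration, and the Arabic WordNet concept selection and the categorization stage described in the abstract are not reproduced.

```python
import numpy as np

# Hypothetical toy corpus: English tokens stand in for preprocessed
# Arabic terms (or WordNet concepts). Not taken from the paper.
docs = [
    "market bank economy trade",
    "bank trade finance market",
    "match team goal league",
    "team league player goal",
]


def bag_of_words(texts):
    """Return a (documents x vocabulary) term-count matrix and the vocabulary."""
    vocab = sorted({tok for text in texts for tok in text.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        for tok in text.split():
            counts[i, index[tok]] += 1.0
    return counts, vocab


def train_autoencoder(X, hidden=2, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder with batch gradient descent.

    Returns an encoder function and the mean squared reconstruction
    error recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # encoder: compressed representation
        R = H @ W2 + b2               # linear decoder: reconstruction
        err = R - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both layers.
        dR = 2.0 * err / err.size
        dW2 = H.T @ dR
        db2 = dR.sum(axis=0)
        dZ = (dR @ W2.T) * (1.0 - H ** 2)   # derivative of tanh
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(A):
        """Map term-count rows to the learned low-dimensional code."""
        return np.tanh(A @ W1 + b1)

    return encode, losses


X, vocab = bag_of_words(docs)
encode, losses = train_autoencoder(X)
Z = encode(X)
print("bag-of-words shape:", X.shape, "-> encoded shape:", Z.shape)
print(f"reconstruction error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a setting closer to the one the abstract describes, the input columns would be the selected terms and concepts rather than raw tokens, the network would have several hidden layers, and the encoded vectors would feed a classifier; those choices are left open here because the record does not specify them.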