Cross-modal retrieval: a review of methodologies, datasets, and future perspectives


Bibliographic Details
Main Authors: Han, Zhichao; Azman, Azreen Bin; Rina Binti Mustaffa, Mas; Binti Khalid, Fatimah
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2024
Online Access: http://psasir.upm.edu.my/id/eprint/113886/1/113886.pdf
http://psasir.upm.edu.my/id/eprint/113886/
https://ieeexplore.ieee.org/document/10638061/
id my.upm.eprints.113886
record_format eprints
spelling my.upm.eprints.113886 2025-01-13T02:43:05Z http://psasir.upm.edu.my/id/eprint/113886/ Cross-modal retrieval: a review of methodologies, datasets, and future perspectives Han, Zhichao; Azman, Azreen Bin; Rina Binti Mustaffa, Mas; Binti Khalid, Fatimah Institute of Electrical and Electronics Engineers Inc.
2024-08 Article PeerReviewed text en cc_by_nc_nd_4 http://psasir.upm.edu.my/id/eprint/113886/1/113886.pdf Han, Zhichao and Azman, Azreen Bin and Rina Binti Mustaffa, Mas and Binti Khalid, Fatimah (2024) Cross-modal retrieval: a review of methodologies, datasets, and future perspectives. IEEE Access, 12. pp. 115716-115741. ISSN 2169-3536; eISSN: 2169-3536 https://ieeexplore.ieee.org/document/10638061/ 10.1109/ACCESS.2024.3444817
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description With the rapid development of science and technology, mixed media of all types now contain large amounts of data, and single-modality multimedia data can no longer satisfy everyday requirements. Consequently, there is a pressing need for cross-modal retrieval technology. Its purpose is to mine the connections between samples of different modalities, that is, to use a sample in one modality to retrieve semantically similar samples in another. For example, users can retrieve multimedia data such as images or videos with text. However, different types of multimedia data are represented differently, and measuring the correlation between modalities is the main problem of cross-modal retrieval. Deep learning methods, currently the most popular approach, have achieved remarkable results in data processing and graphics, and many researchers have applied them to cross-modal retrieval to solve the problem of measuring similarity between different types of multimedia data. Drawing on the cross-modal retrieval literature, this paper defines the cross-modal retrieval problem, reviews the core ideas of the current mainstream methods, organized into three main types, lists the commonly used datasets and evaluation methods, and finally analyzes the open problems and future research trends of cross-modal retrieval.
format Article
author Han, Zhichao
Azman, Azreen Bin
Rina Binti Mustaffa, Mas
Binti Khalid, Fatimah
spellingShingle Han, Zhichao
Azman, Azreen Bin
Rina Binti Mustaffa, Mas
Binti Khalid, Fatimah
Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
author_facet Han, Zhichao
Azman, Azreen Bin
Rina Binti Mustaffa, Mas
Binti Khalid, Fatimah
author_sort Han, Zhichao
title Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
title_short Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
title_full Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
title_fullStr Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
title_full_unstemmed Cross-modal retrieval: a review of methodologies, datasets, and future perspectives
title_sort cross-modal retrieval: a review of methodologies, datasets, and future perspectives
publisher Institute of Electrical and Electronics Engineers Inc.
publishDate 2024
url http://psasir.upm.edu.my/id/eprint/113886/1/113886.pdf
http://psasir.upm.edu.my/id/eprint/113886/
https://ieeexplore.ieee.org/document/10638061/