An application for identifying movies from plot with word embeddings and deep learning

Bibliographic Details
Main Author: Kean, Soh Zhe Herng
Format: Final Year Project / Dissertation / Thesis
Published: 2023
Subjects: Q Science (General); T Technology (General)
Online Access: http://eprints.utar.edu.my/5519/1/fyp_CS_2023_KSZH.pdf
http://eprints.utar.edu.my/5519/
id my-utar-eprints.5519
record_format eprints
spelling my-utar-eprints.5519 2023-09-08T13:36:28Z
2023-01 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf
Kean, Soh Zhe Herng (2023) An application for identifying movies from plot with word embeddings and deep learning. Final Year Project, UTAR.
institution Universiti Tunku Abdul Rahman
building UTAR Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tunku Abdul Rahman
content_source UTAR Institutional Repository
url_provider http://eprints.utar.edu.my
topic Q Science (General)
T Technology (General)
description Natural language processing (NLP) is a field of computer science that aims to help computers understand and process human language. Advances in NLP have improved interactions between humans and computers: through NLP, the average technology user does not need to be a computer expert to “talk” to computers. A common NLP task is multiclass text classification, in which documents of similar meaning are grouped into one of several categories. This paper proposes a movie identifier that applies multiclass text classification to plot descriptions, combining natural language processing and deep learning techniques, to help people identify movies they have watched in the past but whose titles they have forgotten. The application can also help people who have heard only bits and pieces of a movie’s plot to search for the movie themselves. The proposed model receives as input movie plots extracted from a dataset. The text is then preprocessed, for example by stemming and lemmatization, and stopwords are removed to discard words that carry little meaning. The movie titles corresponding to the plots are encoded as integers to serve as targets for the model to predict. The plot text is likewise tokenized and encoded as integers so that the model can interpret it. Multiple architectures are reviewed and experimented on in this paper. Most of them, however, learn features from the text in a similar way: they transform the tokens into embeddings through an embedding layer, pass those embeddings through multiple layers of a neural network, and finally classify the input text to predict the title of the movie it refers to.
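The pipeline outlined in the description can be illustrated with a minimal sketch. It is not the author's implementation: the three toy plots, the short stopword list, the helper names (`preprocess`, `build_vocab`, `encode`, `forward`), and the mean-pooling classifier with random, untrained weights are all assumptions made for illustration. Stemming and lemmatization are omitted to keep the example dependency-free.

```python
import math
import random
import re

# Hypothetical toy data; the dataset used in the thesis is not named in this record.
PLOTS = [
    ("Titanic", "A young couple fall in love aboard a doomed ocean liner."),
    ("Jaws", "A police chief hunts a giant shark that terrorises a beach town."),
    ("Alien", "The crew of a spaceship is hunted by a deadly creature."),
]
# Tiny illustrative stopword list (assumption); real pipelines use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "by", "that", "and"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is reserved for unknown words."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(tokens, vocab):
    """Replace each token with its integer id (0 if the token is unseen)."""
    return [vocab.get(tok, 0) for tok in tokens]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(token_ids, embeddings, weights):
    """Embedding lookup -> mean pooling -> linear layer -> softmax."""
    dim = len(embeddings[0])
    vectors = [embeddings[i] for i in token_ids] or [[0.0] * dim]
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(scores)


# Encode movie titles as integer targets (label encoding).
titles = sorted({title for title, _ in PLOTS})
title_to_id = {title: i for i, title in enumerate(titles)}
targets = [title_to_id[title] for title, _ in PLOTS]

# Tokenize the plots and encode the tokens as integers.
token_lists = [preprocess(plot) for _, plot in PLOTS]
vocab = build_vocab(token_lists)
encoded_plots = [encode(tokens, vocab) for tokens in token_lists]

# Untrained random parameters: the output illustrates shapes, not real predictions.
rng = random.Random(0)
EMBED_DIM = 8
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for _ in range(len(vocab) + 1)]
weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in titles]

query_ids = encode(preprocess("a shark near the beach"), vocab)
probs = forward(query_ids, embeddings, weights)
predicted_title = titles[probs.index(max(probs))]
```

Because the weights are random, `predicted_title` is arbitrary here; in a trained model the embedding table and the classifier weights would be learned from the encoded plots and their integer targets.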
format Final Year Project / Dissertation / Thesis
author Kean, Soh Zhe Herng
title An application for identifying movies from plot with word embeddings and deep learning
publishDate 2023
url http://eprints.utar.edu.my/5519/1/fyp_CS_2023_KSZH.pdf
http://eprints.utar.edu.my/5519/