An application for identifying movies from plot with word embeddings and deep learning
Main Author: | Kean, Soh Zhe Herng |
---|---|
Format: | Final Year Project / Dissertation / Thesis |
Published: | 2023 |
Subjects: | Q Science (General); T Technology (General) |
Online Access: | http://eprints.utar.edu.my/5519/1/fyp_CS_2023_KSZH.pdf http://eprints.utar.edu.my/5519/ |
id |
my-utar-eprints.5519 |
---|---|
record_format |
eprints |
spelling |
my-utar-eprints.5519 2023-09-08T13:36:28Z Kean, Soh Zhe Herng (2023) An application for identifying movies from plot with word embeddings and deep learning. Final Year Project, UTAR. Q Science (General) T Technology (General) 2023-01 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/5519/1/fyp_CS_2023_KSZH.pdf http://eprints.utar.edu.my/5519/ |
institution |
Universiti Tunku Abdul Rahman |
building |
UTAR Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Tunku Abdul Rahman |
content_source |
UTAR Institutional Repository |
url_provider |
http://eprints.utar.edu.my |
topic |
Q Science (General) T Technology (General) |
description |
Natural language processing (NLP) is a field of computer science that aims to help computers understand and process human language. Advances in NLP have improved interaction between humans and computers: with NLP, the average technology user does not need to be a computer expert to “talk” to a computer. A common NLP task is multiclass text classification, which assigns each document to one of several categories so that documents of similar meaning are grouped together.
This paper proposes a movie identifier that applies multiclass text classification, through a combination of NLP and deep learning techniques, to help people identify movies they watched in the past but whose titles they have forgotten. The application can also help people who have heard only bits and pieces of a movie’s plot to find that movie.
The proposed model takes as input movie plots extracted from a dataset. The text is then preprocessed, for example by stemming and lemmatization, and stopwords are removed to discard words that carry little meaning. The movie titles corresponding to the plots are encoded as integers, which serve as the targets the model predicts. The plot text is likewise tokenized and encoded as integers so that the model can interpret it. Multiple architectures are reviewed and experimented on in this paper. Most of them, however, learn features from the text in a similar way: the tokens are transformed into embeddings by an embedding layer, the embeddings are passed through multiple layers of a neural network, and the input text is finally classified to predict the title of the movie it refers to. |
format |
Final Year Project / Dissertation / Thesis |
author |
Kean, Soh Zhe Herng |
title |
An application for identifying movies from plot with word embeddings and deep learning
|
publishDate |
2023 |
url |
http://eprints.utar.edu.my/5519/1/fyp_CS_2023_KSZH.pdf http://eprints.utar.edu.my/5519/ |
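The pipeline outlined in the description above (preprocess the plot, encode titles and tokens as integers, then embed, pass through layers, and classify) can be illustrated with a minimal standard-library sketch. This is not the thesis's implementation: the two-movie dataset, the stopword list, the suffix-stripping `toy_stem` (a crude stand-in for a real stemmer or lemmatizer), and all function names are assumptions made for illustration, and the model parameters are random and untrained.

```python
import math
import random
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and"}


def toy_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(plot):
    """Lowercase, split into word tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", plot.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def build_index(items, start):
    """Map each distinct item to an integer, numbering from `start`."""
    index = {}
    for item in items:
        index.setdefault(item, len(index) + start)
    return index


def encode(tokens, vocab, max_len):
    """Convert tokens to a fixed-length list of ids (truncate, then pad with 0)."""
    ids = [vocab.get(t, 0) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(ids, embeddings, weights):
    """Embedding lookup -> mean pooling over non-padding ids -> linear layer -> softmax."""
    dim = len(embeddings[0])
    vectors = [embeddings[i] for i in ids if i != 0]
    if vectors:
        pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    else:
        pooled = [0.0] * dim
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(scores)


# Hypothetical two-movie dataset, for illustration only.
data = [
    ("Sunken Ship", "A young artist is falling in love on a sinking ship."),
    ("Space Farm", "An astronaut is stranded on Mars and farms potatoes."),
]
titles = build_index((title for title, _ in data), start=0)  # title -> class id
token_lists = [preprocess(plot) for _, plot in data]
# Token ids start at 1 because id 0 is reserved for padding and unknown words.
vocab = build_index((t for toks in token_lists for t in toks), start=1)
encoded = [encode(toks, vocab, max_len=8) for toks in token_lists]

rng = random.Random(0)
dim = 4
# Untrained random parameters; row 0 of the embedding table is the padding slot.
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(len(vocab) + 1)]
weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(len(titles))]
probs = forward(encoded[0], embeddings, weights)  # one probability per title
```

In a trained system the embedding table and layer weights would be learned from the labelled plots, and the predicted title would be the class with the highest probability. Mean pooling followed by a single linear layer is only the simplest possible choice here; the description says several architectures were compared without naming them in this record.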