Identification of expressive speech video segment using acoustic features / Nur Amanini Syahirah Alim

Bibliographic Details
Main Author: Alim, Nur Amanini Syahirah
Format: Thesis
Language: English
Published: 2017
Online Access: https://ir.uitm.edu.my/id/eprint/98097/1/98097.pdf
https://ir.uitm.edu.my/id/eprint/98097/
Description
Summary: A sound retrieval method enables users to easily obtain the sounds they want. When we communicate, we exchange expressive and related messages. This project examines the identification of expressive speech video segments using acoustic features. Specifically, the segmentation separates the expressive speech from the non-expressive speech in a video. In the sermon video that we chose, the motivator’s expression looks similar from beginning to end, so the audience cannot focus on what the motivator is talking about because no part stands out on the basis of the motivator’s expression. This project applies manual video segmentation to differentiate expressive speech from non-expressive speech. It then extracts audio features, such as pitch and intensity, from the segmented expressive and non-expressive speech using Praat. Next, a Random Forest classifier, implemented in Python in the Spyder IDE, was trained on the audio features and achieved an accuracy of 43%; its predictions were used to classify expressive and non-expressive speech as the intended result. The project was evaluated by comparing the predicted segments with the manually segmented data to obtain the percentage of matches: 80% using pitch and 75% using intensity. The results have been verified, with the aim of improving the automatic identification of expressive speech video segments.
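
The comparison step described in the summary, in which predicted labels are matched against the manually segmented labels, could be sketched roughly as follows. This is only an illustration and not the thesis code: the function name percent_match and the five segment labels are hypothetical, and the printed figure is unrelated to the results reported above.

```python
def percent_match(predicted, manual):
    """Return the percentage of segments whose predicted label equals the manual label."""
    if len(predicted) != len(manual):
        raise ValueError("label sequences must have the same length")
    if not manual:
        raise ValueError("label sequences must not be empty")
    # Count the positions where the classifier agrees with the manual segmentation.
    matches = sum(p == m for p, m in zip(predicted, manual))
    return 100.0 * matches / len(manual)


# Hypothetical labels for five segments (illustrative only, not data from the thesis).
manual_labels = ["expressive", "non-expressive", "expressive", "expressive", "non-expressive"]
predicted_labels = ["expressive", "non-expressive", "non-expressive", "non-expressive", "non-expressive"]

print(f"{percent_match(predicted_labels, manual_labels):.1f}% of segments match")
```

In a setup like the one summarized above, the predicted labels would presumably come from a classifier trained on a single feature (pitch or intensity), so the same function could be called once per feature.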