Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba

Bibliographic Details
Main Author: Ghulam, Mujtaba
Format: Thesis
Published: 2018
Subjects: QA75 Electronic computers. Computer science
Online Access:http://studentsrepo.um.edu.my/10667/1/Ghulam_Mujtaba.pdf
http://studentsrepo.um.edu.my/10667/6/ghulam.pdf
http://studentsrepo.um.edu.my/10667/
id my.um.stud.10667
record_format eprints
citation Ghulam, Mujtaba (2018) Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10667/
date 2018-08 (Thesis, NonPeerReviewed)
timestamp 2021-02-08T20:34:52Z
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
description Forensic autopsy aims to reveal the cause of death (CoD) by examining a dead body. It is performed by medical pathologists during the investigation of criminal and civil law cases. Pathologists examine corpses externally and anatomically to collect autopsy findings, and they gather the history of the deceased and information about the death scene from the deceased's relatives and eyewitnesses. They then determine the CoD using their expert knowledge, correlating the current autopsy findings with previous autopsy reports. Determining the CoD from autopsy findings is therefore laborious, time consuming, and subject to the inconsistencies associated with any labor-intensive process. This motivates the use of automated text classification (ATC) techniques to overcome these issues, and this study applied ATC techniques to classify the CoD from forensic autopsy reports. Feature engineering is a critical step in ATC, because the success or failure of an ATC model depends heavily on the quality of the features used in the classification task. Traditional feature engineering techniques in ATC include bag of words (BoW) and n-grams. This study argues that BoW and its variants are inadequate for determining the CoD from forensic autopsy reports because they ignore word order, word context, and word-level synonymy and polysemy. To overcome these limitations, the study pursued four main objectives.
The first objective was to investigate existing feature engineering techniques for classifying free-text clinical reports, including forensic autopsy reports. The second was to develop a semi-automated, expert-driven feature engineering technique that addresses word-level synonymy and polysemy. The third was to propose a fully automated, conceptual graph-based feature engineering technique that addresses word order and word context. The fourth was to evaluate the proposed techniques by comparing their performance with that of existing baseline techniques.
For the experimental evaluation, forensic autopsy reports covering 16 different CoDs were obtained from a very large hospital in Kuala Lumpur, Malaysia. The reports were processed with various text preprocessing techniques. Discriminative features were then extracted from the preprocessed reports using the proposed feature engineering techniques and assembled into numeric master feature vectors. These vectors were fed to six machine learning algorithms to construct and evaluate the classification models. To demonstrate the effectiveness of the proposed techniques, their performance was compared with that of five state-of-the-art baseline feature engineering techniques. The experimental results showed that the proposed techniques outperformed traditional BoW and its variants, and that support vector machines and random forests outperformed the other four algorithms. The proposed techniques are feasible and practical for determining the CoD from forensic autopsy reports and can help pathologists determine the CoD from autopsy findings accurately and rapidly. The proposed techniques are also generally applicable to other kinds of free-text clinical reports.
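The word-order and synonymy argument in the description can be illustrated with a small sketch. This is not the thesis's expert-driven or conceptual graph-based technique. It is a plain n-gram count representation with an optional synonym lexicon, turned into fixed-length "master" vectors over a shared vocabulary. The example phrases, the SYNONYMS lexicon, and the helper names tokenize, ngrams, and master_vectors are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical expert-curated lexicon: surface variants -> one canonical concept.
# Entries are illustrative only and are NOT taken from the thesis.
SYNONYMS = {
    "haemorrhage": "bleeding",
    "hemorrhage": "bleeding",
    "bleed": "bleeding",
}


def tokenize(text):
    """Lowercase, split on whitespace, and replace known synonyms."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def master_vectors(reports, n=1):
    """Build a shared n-gram vocabulary and one count vector per report.

    n=1 is plain bag of words; n=2 keeps local word order via bigrams.
    """
    counts = [Counter(ngrams(tokenize(r), n)) for r in reports]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for n-grams that do not occur in a report.
    vectors = [[c[g] for g in vocab] for c in counts]
    return vocab, vectors


if __name__ == "__main__":
    # Invented phrases: same words, different order, different meaning.
    pair = ["blood in lungs not stomach", "blood in stomach not lungs"]
    _, uni = master_vectors(pair, n=1)
    _, bi = master_vectors(pair, n=2)
    print(uni[0] == uni[1])  # True: BoW cannot tell them apart
    print(bi[0] == bi[1])    # False: bigrams capture the order

    # Invented phrases: two spellings of the same finding.
    _, syn = master_vectors(["subdural haemorrhage", "subdural hemorrhage"])
    print(syn[0] == syn[1])  # True: both spellings map to "bleeding"
```

With n=1 the two invented reports produce identical vectors even though they state opposite findings, while n=2 separates them. The lexicon lets "haemorrhage" and "hemorrhage" share one feature. A flat per-token lookup like this handles synonymy only; resolving polysemy needs surrounding context that such a lookup cannot see.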
format Thesis
author Ghulam, Mujtaba
title Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba
publishDate 2018