Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba
Forensic autopsy focuses on revealing the cause of death (CoD) by examining a dead body. This process is performed by medical pathologists during the investigation of criminal and civil law cases. In forensic autopsy, pathologists examine corpses externally and anatomically to collect autopsy findin...
Main Author: | Ghulam, Mujtaba |
---|---|
Format: | Thesis |
Published: | 2018 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://studentsrepo.um.edu.my/10667/1/Ghulam_Mujtaba.pdf http://studentsrepo.um.edu.my/10667/6/ghulam.pdf http://studentsrepo.um.edu.my/10667/ |
id |
my.um.stud.10667 |
---|---|
record_format |
eprints |
spelling |
my.um.stud.10667 2021-02-08T20:34:52Z Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba Ghulam, Mujtaba QA75 Electronic computers. Computer science 2018-08 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10667/1/Ghulam_Mujtaba.pdf application/pdf http://studentsrepo.um.edu.my/10667/6/ghulam.pdf Ghulam, Mujtaba (2018) Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10667/ |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Student Repository |
url_provider |
http://studentsrepo.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science |
description |
Forensic autopsy focuses on revealing the cause of death (CoD) by examining a dead body. Medical pathologists perform this process during the investigation of criminal and civil law cases. In a forensic autopsy, pathologists examine corpses externally and anatomically to collect autopsy findings. They also collect the history of the deceased and death scene-related information from the deceased’s relatives and eyewitnesses. The pathologists then determine the CoD through their expert knowledge while correlating the current autopsy findings with previous autopsy reports. Determining the CoD from autopsy findings is therefore laborious, time-consuming, and subject to the inconsistencies associated with any labor-intensive process. Hence, automated text classification (ATC) techniques are needed to overcome these issues. This study aimed to employ ATC techniques to classify the CoD from forensic autopsy reports. In ATC, feature engineering is a critical step because the success or failure of any ATC model depends heavily on the quality of the features used in the classification task. Traditional feature engineering techniques in ATC include bag of words (BoW) and n-grams. This study argues that BoW and its variants are inadequate for determining the CoD from forensic autopsy reports because they ignore word order, word context, and word-level synonymy and polysemy.
To overcome these limitations, the study pursued four main objectives. First, it investigated the existing feature engineering techniques for classifying free-text clinical reports, including forensic autopsy reports. Second, it aimed to develop a semi-automated, expert-driven feature engineering technique to overcome the issue of word-level synonymy and polysemy. Third, it sought to propose a fully automated, conceptual graph-based feature engineering technique to address the issues of word order and word context. Finally, it evaluated the proposed techniques by comparing their performance with that of existing baseline techniques.
For the experimental evaluation, forensic autopsy reports covering 16 different CoDs were obtained from a very large hospital in Kuala Lumpur, Malaysia. These reports were preprocessed with various text preprocessing techniques. Discriminative features were then extracted from the preprocessed reports through the proposed feature engineering techniques and assembled into numeric master feature vectors. These master feature vectors were fed as input to six machine learning algorithms to construct and evaluate the classification models. To show the effectiveness of the proposed techniques, the study also compared their performance with five state-of-the-art baseline feature engineering techniques. Experimental results showed that the proposed techniques outperformed the traditional BoW and its variants. Moreover, the support vector machine and random forest algorithms outperformed the four other algorithms. The proposed techniques are feasible and practical for determining the CoD from forensic autopsy reports and can help pathologists determine the CoD from autopsy findings accurately and rapidly. Finally, the proposed techniques are generally applicable to other kinds of free-text clinical reports. |
format |
Thesis |
author |
Ghulam, Mujtaba |
title |
Feature engineering techniques to classify cause of death from forensic autopsy reports / Ghulam Mujtaba |
publishDate |
2018 |
url |
http://studentsrepo.um.edu.my/10667/1/Ghulam_Mujtaba.pdf http://studentsrepo.um.edu.my/10667/6/ghulam.pdf http://studentsrepo.um.edu.my/10667/ |
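
The description's point that bag of words (BoW) ignores word order, while n-grams retain some of it, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `bag_of_words` and `ngrams` and the two example sentences are hypothetical, invented here for illustration, and tokenization is assumed to be simple whitespace splitting.

```python
from collections import Counter


def bag_of_words(text):
    """Count each lowercase token; the order of tokens is discarded."""
    return Counter(text.lower().split())


def ngrams(text, n=2):
    """Count contiguous n-token sequences, which keep local word order."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sentences (not taken from the thesis data): same words,
# opposite meaning.
report_a = "fracture caused bleeding"
report_b = "bleeding caused fracture"

# BoW sees identical features for both sentences.
same_bow = bag_of_words(report_a) == bag_of_words(report_b)

# Bigrams differ, because each feature records which word follows which.
same_bigrams = ngrams(report_a) == ngrams(report_b)

print("BoW features identical:   ", same_bow)
print("Bigram features identical:", same_bigrams)
```

Under these assumptions, the two sentences are indistinguishable to a BoW model but not to a bigram model. Neither representation addresses synonymy or polysemy, which is the gap the description attributes to the expert-driven technique.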