Deep learning-based colorectal cancer classification using augmented and normalised gut microbiome data / Mwenge Mulenga

Bibliographic Details
Main Author: Mwenge Mulenga
Format: Thesis
Published: 2022
Online Access:http://studentsrepo.um.edu.my/14415/1/Mwenge_Mulenga.pdf
http://studentsrepo.um.edu.my/14415/2/Mwenge.pdf
http://studentsrepo.um.edu.my/14415/
Description
Summary: Colorectal cancer is the third most deadly cancer worldwide. The use of the gut microbiome for early detection of the disease has attracted much attention from the research community because of its non-invasive nature. Recent advances in next-generation sequencing technology, which have increased the availability of sequence data, have also created an enabling environment for the growth of gut microbiome research. At the same time, the research community has shown growing interest in machine learning-based disease detection using sequence-based gut microbiome data. Detecting colorectal cancer in this way offers a non-invasive alternative, since the data can be obtained from stool samples. Considering the limitations of existing detection methods such as colonoscopy and the faecal occult blood test, the medical research community has adopted sequence data to identify the disease. While the complex relations between the microbiome and host phenotypes make machine learning algorithms suitable for analysing microbiome data, deep learning methods are becoming more popular owing to their outstanding performance in related fields. However, the performance of deep learning methods is also affected by the high dimensionality, sparsity and feature dominance inherent in microbiome data. To address these limitations in deep learning classification of colorectal cancer based on gut microbiome data, three objectives were formulated. First, to investigate the methods used to address limitations associated with microbiome-based datasets in colorectal cancer identification using deep neural network algorithms.
Second, to develop novel techniques that combine the strengths of normalisation, feature engineering and data augmentation to address the problems of dimensionality, feature dominance and sparsity in colorectal cancer identification from gut microbiome data using deep neural network algorithms. Third, to evaluate the proposed techniques on benchmark datasets and compare the results with baseline methods. Accordingly, three techniques for combining existing normalisation methods, namely feature extension, chaining and stacking, were proposed. The results show that the proposed techniques significantly outperform the baseline methods. The research shows that a model that addresses dimensionality, feature dominance and sparsity produces outstanding prediction results in colorectal cancer identification using high-dimensional, sequence-based gut microbiome data. The improvements achieved by the proposed techniques could aid the growth of this research field and beyond.
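The summary names three ways of combining existing normalisation methods (feature extension, chaining and stacking) but does not define them. The sketch below is one plausible reading, not the thesis's implementation: feature extension concatenates several normalised views of the same abundance table side by side, chaining applies normalisations one after another, and stacking places the views along a new channel axis. The function names and the particular normalisations used (total-sum scaling, log with pseudocount, centred log-ratio, per-feature min-max) are assumptions chosen as common options for microbiome count tables.

```python
import numpy as np

# Hypothetical normalisations for a (samples x taxa) count table.
# These are common microbiome choices, assumed here for illustration.

def relative_abundance(x):
    """Total-sum scaling: divide each sample (row) by its total count."""
    x = np.asarray(x, dtype=float)
    totals = x.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero
    return x / totals

def log_transform(x, pseudocount=1.0):
    """Log with a pseudocount so that zero counts (sparsity) stay finite."""
    return np.log(np.asarray(x, dtype=float) + pseudocount)

def clr(x, pseudocount=1.0):
    """Centred log-ratio: log values minus each sample's mean log value."""
    logx = np.log(np.asarray(x, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)

def min_max(x):
    """Scale each feature (column) to [0, 1] to reduce feature dominance."""
    x = np.asarray(x, dtype=float)
    mn = x.min(axis=0, keepdims=True)
    rng = x.max(axis=0, keepdims=True) - mn
    rng = np.where(rng == 0, 1.0, rng)  # constant columns map to 0
    return (x - mn) / rng

# Assumed readings of the three combination techniques.

def feature_extension(x, methods):
    """Concatenate normalised views along the feature axis: (n, p * k)."""
    return np.concatenate([m(x) for m in methods], axis=1)

def chaining(x, methods):
    """Apply normalisations in sequence; the order matters: (n, p)."""
    out = x
    for m in methods:
        out = m(out)
    return out

def stacking(x, methods):
    """Stack normalised views as channels: (n, p, k)."""
    return np.stack([m(x) for m in methods], axis=-1)

if __name__ == "__main__":
    # Toy counts: 3 samples x 4 taxa (made up, not thesis data).
    counts = np.array([[10, 0, 5, 85],
                       [0, 3, 0, 7],
                       [50, 50, 0, 0]], dtype=float)
    views = [relative_abundance, clr, min_max]
    print(feature_extension(counts, views).shape)  # (3, 12)
    print(chaining(counts, [relative_abundance, log_transform, min_max]).shape)  # (3, 4)
    print(stacking(counts, views).shape)  # (3, 4, 3)
```

Under these assumptions, feature extension multiplies the input width (useful for a dense network), chaining keeps the original width, and stacking produces a channel axis of the kind a 1D convolutional network could consume. Which variant suits a given deep model is a design choice the sketch does not settle.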