A cluster analysis of population based cancer registry in Brunei Darussalam : an exploratory study

Bibliographic Details
Main Authors: Lai, Daphne Teck Ching; Malik, Owais A.
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2022
Online Access: http://journalarticle.ukm.my/19427/1/05.pdf
http://journalarticle.ukm.my/19427/
https://www.ukm.my/apjitm/articles-issues
Description
Summary: Machine learning techniques have mostly been applied to gene expression data in cancer research. Socio-demographic data available in cancer registries could also be explored to gain further insight into the relationships between cancer types and their contributing factors. Moreover, less attention has been paid to analysing mixed demographic data (numeric and categorical) from cancer registries and their association with cancer types. The aim of this study is to identify subgroups of patients with similar demographic characteristics in the population-based cancer registry of Brunei Darussalam and to examine the prevalent cancer types in these subgroups. Four clustering algorithms were explored in the cluster analysis of the Brunei Darussalam Cancer Registry: Two-step, Partitioning Around Medoids, Agglomerative Hierarchical and Model-based. Gower distance was used to measure similarity between records of mixed data types. To evaluate the clusters found, cluster distribution and the Silhouette index were used for cluster quality, Cohen's Kappa index for cluster stability and Cramér's V coefficient for the clinical relevance of the clusters. Six distinct demographic subgroups were consistently found by three algorithms, while the model-based clustering solution was not considered for deeper analysis because it produced highly imbalanced clusters. The subgroups found form good-quality clusters with moderate association with cancer types and high stability. The top three prevalent cancers associated with these subgroups were consistently identified by the three algorithms. Comparing the subgroups’ ages at diagnosis reveals possible screening behaviours of specific subgroups, suggesting a need for early-screening awareness programmes. This study demonstrates the use of cluster analysis in a cancer registry to identify demographic subgroups that could point to areas for developing targeted and improved healthcare management strategies.
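For readers unfamiliar with two of the measures named in the summary, the following is a minimal sketch of Gower distance for mixed records and Cramér's V for the association between cluster labels and cancer types. It is not the authors' implementation: the function names, the record layout (age, sex, district) and the toy values are all made up for illustration, and missing values are not handled.

```python
from collections import Counter
from math import sqrt


def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records given as equal-length tuples.

    numeric_ranges maps the index of each numeric field to its range
    (max - min over the data set). Every other field is treated as
    categorical: 0 if the two values match, 1 otherwise.
    """
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if k in numeric_ranges:
            r = numeric_ranges[k]
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)  # unweighted mean over all fields


def cramers_v(labels_a, labels_b):
    """Cramér's V between two equally long sequences of category labels."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    row = Counter(labels_a)
    col = Counter(labels_b)
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy records: (age at diagnosis, sex, district).
patients = [(45, "F", "A"), (50, "F", "A"), (70, "M", "B"), (65, "M", "B")]
ages = [p[0] for p in patients]
ranges = {0: max(ages) - min(ages)}  # only field 0 is numeric

d_close = gower_distance(patients[0], patients[1], ranges)
d_far = gower_distance(patients[0], patients[2], ranges)

# Hypothetical cluster labels and cancer types for the same four records.
clusters = [0, 0, 1, 1]
v_perfect = cramers_v(clusters, ["x", "x", "y", "y"])
v_none = cramers_v(clusters, ["x", "y", "x", "y"])
```

In this toy data the first two records differ only slightly in age, so their distance is small, while records that differ in every field reach the maximum distance of 1. Cramér's V likewise ranges from 0 (cluster membership tells nothing about cancer type) to 1 (each cluster maps to exactly one cancer type).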