Comparison on some machine learning techniques in breast cancer classification

Bibliographic Details
Main Authors: Mashudi, N. A., Rossli, S. A., Ahmad, N., Mohd. Noor, N.
Format: Conference or Workshop Item
Language: English
Published: 2021
Online Access: http://eprints.utm.my/id/eprint/95683/1/NurulAmirahMashudi2021_ComparisononSomeMachineLearningTechniques.pdf
http://eprints.utm.my/id/eprint/95683/
http://dx.doi.org/10.1109/IECBES48179.2021.9398837
Description
Summary: Breast cancer is the second most common cancer after lung cancer and one of the main causes of death worldwide. Women have a higher risk of breast cancer than men. Early diagnosis with an accurate and reliable system is therefore critical in breast cancer treatment. Machine learning techniques are well known and popular among researchers, especially for classification and prediction. An investigation was conducted to evaluate the performance of breast cancer classification of malignant and benign tumors using several machine learning techniques, namely k-Nearest Neighbors (k-NN), Random Forest, Support Vector Machine (SVM), and ensemble techniques, to compute the prediction of breast cancer survival with 10-fold cross validation. The proposed methods were additionally evaluated with 2-fold, 3-fold, and 5-fold cross validation to find the best accuracy rate. This study used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset with 23 selected attributes measured from 569 patients, of whom 212 have malignant tumors and 357 have benign tumors. The performance of the proposed methods was evaluated in terms of accuracy, sensitivity, and specificity. A comparison of all methods shows that the AdaBoost ensemble method gave the highest accuracy with 10-fold cross validation, at 98.77%, and also with 2-fold and 3-fold cross validation, at 98.41% and 98.24%, respectively. With 5-fold cross validation, however, SVM produced the best accuracy rate, at 98.60%, with the lowest error rate.
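The evaluation scheme described in the summary can be illustrated with a minimal Python sketch using only the standard library. It is not the authors' implementation: the function names `k_fold_indices` and `evaluate` are hypothetical, the folds are unshuffled and unstratified, malignant is assumed to be the positive class (label 1), and the labels in the usage section are made-up toy values rather than WDBC data.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_samples-1, without shuffling."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for a binary classifier.

    Assumption: `positive` marks the malignant class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Fold sizes for a dataset of 569 samples under 10-fold cross validation.
    print([len(test) for _, test in k_fold_indices(569, 10)])

    # Toy labels for illustration only (1 = malignant, 0 = benign).
    toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(evaluate(toy_true, toy_pred))
```

In a full experiment, a classifier would be trained on each `train` index list, scored on the matching `test` list, and the per-fold metrics averaged. Changing `k` to 2, 3, or 5 gives the other splits mentioned in the summary.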