Staff View: Impact of data balancing and feature selection on machine learning based network intrusion detection

Impact of data balancing and feature selection on machine learning based network intrusion detection

Unbalanced datasets are a common problem in supervised machine learning. The model learns the majority classes more thoroughly, so it recognizes the majority classes more effectively than the minority classes. Naturally, imbalanced da...

Bibliographic Details
Main Authors:	Barkah, Azhari Shouni, Selamat, Siti Rahayu, Zainal Abidin, Zaheera, Wahyudi, Rizki
Format:	Article
Language:	English
Published:	Information Technology Department, Politeknik Negeri Padang 2023
Online Access:	http://eprints.utem.edu.my/id/eprint/27751/2/0101729052023.pdf http://eprints.utem.edu.my/id/eprint/27751/ https://joiv.org/index.php/joiv/article/view/1041/0 http://dx.doi.org/10.30630/joiv.7.1.1041

id	my.utem.eprints.27751
record_format	eprints
spelling	my.utem.eprints.27751 2024-10-07T12:33:49Z http://eprints.utem.edu.my/id/eprint/27751/ Impact of data balancing and feature selection on machine learning based network intrusion detection Barkah, Azhari Shouni Selamat, Siti Rahayu Zainal Abidin, Zaheera Wahyudi, Rizki Unbalanced datasets are a common problem in supervised machine learning. The model learns the majority classes more thoroughly, so it recognizes the majority classes more effectively than the minority classes. Naturally, imbalanced data, such as disease data and network data, occurs in real life. DDoS is one network intrusion found to occur more often than R2L. The composition of network attacks is likewise imbalanced in public Intrusion Detection System (IDS) datasets such as NSL-KDD and UNSW-NB15. Researchers have proposed many techniques to transform such data into balanced data by duplicating the minority class and producing synthetic data. The Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) algorithms duplicate the data and construct synthetic data for the minority classes. Meanwhile, machine learning algorithms capture the pattern of the labeled data by considering the input features. Unfortunately, not all input features have an equal impact on the output (predicted class or value). Some features are interrelated and misleading. Therefore, the important features should be selected to produce a good model. In this research, we implement the recursive feature elimination (RFE) technique to select important features from the available dataset. According to the experiment, SMOTE provides a better synthetic dataset than ADASYN for the UNSW-NB15 dataset with a high level of imbalance. RFE feature selection slightly reduces the model's accuracy but improves the training speed. The Decision Tree classifier consistently achieves a better recognition rate than Random Forest and KNN. Information Technology Department, Politeknik Negeri Padang 2023 Article PeerReviewed text en http://eprints.utem.edu.my/id/eprint/27751/2/0101729052023.pdf Barkah, Azhari Shouni and Selamat, Siti Rahayu and Zainal Abidin, Zaheera and Wahyudi, Rizki (2023) Impact of data balancing and feature selection on machine learning based network intrusion detection. International Journal On Informatics Visualization, 7 (1). pp. 241-248. ISSN 2549-9610 https://joiv.org/index.php/joiv/article/view/1041/0 http://dx.doi.org/10.30630/joiv.7.1.1041
institution	Universiti Teknikal Malaysia Melaka
building	UTEM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknikal Malaysia Melaka
content_source	UTEM Institutional Repository
url_provider	http://eprints.utem.edu.my/
language	English
description	Unbalanced datasets are a common problem in supervised machine learning. The model learns the majority classes more thoroughly, so it recognizes the majority classes more effectively than the minority classes. Naturally, imbalanced data, such as disease data and network data, occurs in real life. DDoS is one network intrusion found to occur more often than R2L. The composition of network attacks is likewise imbalanced in public Intrusion Detection System (IDS) datasets such as NSL-KDD and UNSW-NB15. Researchers have proposed many techniques to transform such data into balanced data by duplicating the minority class and producing synthetic data. The Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) algorithms duplicate the data and construct synthetic data for the minority classes. Meanwhile, machine learning algorithms capture the pattern of the labeled data by considering the input features. Unfortunately, not all input features have an equal impact on the output (predicted class or value). Some features are interrelated and misleading. Therefore, the important features should be selected to produce a good model. In this research, we implement the recursive feature elimination (RFE) technique to select important features from the available dataset. According to the experiment, SMOTE provides a better synthetic dataset than ADASYN for the UNSW-NB15 dataset with a high level of imbalance. RFE feature selection slightly reduces the model's accuracy but improves the training speed. The Decision Tree classifier consistently achieves a better recognition rate than Random Forest and KNN.
format	Article
author	Barkah, Azhari Shouni Selamat, Siti Rahayu Zainal Abidin, Zaheera Wahyudi, Rizki
spellingShingle	Barkah, Azhari Shouni Selamat, Siti Rahayu Zainal Abidin, Zaheera Wahyudi, Rizki Impact of data balancing and feature selection on machine learning based network intrusion detection
author_facet	Barkah, Azhari Shouni Selamat, Siti Rahayu Zainal Abidin, Zaheera Wahyudi, Rizki
author_sort	Barkah, Azhari Shouni
title	Impact of data balancing and feature selection on machine learning based network intrusion detection
title_short	Impact of data balancing and feature selection on machine learning based network intrusion detection
title_full	Impact of data balancing and feature selection on machine learning based network intrusion detection
title_fullStr	Impact of data balancing and feature selection on machine learning based network intrusion detection
title_full_unstemmed	Impact of data balancing and feature selection on machine learning based network intrusion detection
title_sort	impact of data balancing and feature selection on machine learning based network intrusion detection
publisher	Information Technology Department, Politeknik Negeri Padang
publishDate	2023
url	http://eprints.utem.edu.my/id/eprint/27751/2/0101729052023.pdf http://eprints.utem.edu.my/id/eprint/27751/ https://joiv.org/index.php/joiv/article/view/1041/0 http://dx.doi.org/10.30630/joiv.7.1.1041
_version_	1814061422906179584
score	13.214268

Similar Items