Impact of dataset balancing on machine learning-based intrusion detection systems

Intrusion Detection Systems (IDS) are indispensable for cybersecurity, as they safeguard networks from increasingly sophisticated cyberattacks. This paper assesses the influence of dataset balancing on the performance of machine learning-based IDS, thereby addressing the challenge...

Bibliographic Details
Main Authors: Yusri, Muhammad Iqbal, Habaebi, Mohamed Hadi, Gunawan, Teddy Surya, Mansor, Hasmah, Kartiwi, Mira, Nur, Levy Olivia
Format: Proceeding Paper
Language: English
Published: IEEE 2024
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/114534/7/114534_Impact%20of%20dataset%20balancing.pdf
http://irep.iium.edu.my/114534/
https://ieeexplore.ieee.org/document/10675568
id my.iium.irep.114534
record_format dspace
spelling my.iium.irep.114534 2024-09-20T07:45:31Z http://irep.iium.edu.my/114534/ IEEE 2024-09-18 Proceeding Paper PeerReviewed application/pdf en
Yusri, Muhammad Iqbal and Habaebi, Mohamed Hadi and Gunawan, Teddy Surya and Mansor, Hasmah and Kartiwi, Mira and Nur, Levy Olivia (2024) Impact of dataset balancing on machine learning-based intrusion detection systems. In: 2024 IEEE 10th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA), 30-31 July 2024, Bandung, Indonesia. https://ieeexplore.ieee.org/document/10675568 DOI: 10.1109/ICSIMA62563.2024.10675568
institution Universiti Islam Antarabangsa Malaysia
building IIUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider International Islamic University Malaysia
content_source IIUM Repository (IREP)
url_provider http://irep.iium.edu.my/
language English
topic TK7885 Computer engineering
description Intrusion Detection Systems (IDS) are indispensable for cybersecurity, as they safeguard networks from increasingly sophisticated cyberattacks. This paper assesses the influence of dataset balancing on the performance of machine learning-based IDS, thereby addressing the challenge of imbalanced data in detecting network intrusions. We concentrate on three IDS implementations: Tree-based Intelligent IDS, Multi-Tiered Hybrid IDS (MTH-IDS), and Leader Class and Confidence Decision Ensemble (LCCDE). Using the CICIDS 2017 dataset, we applied the Synthetic Minority Over-Sampling Technique (SMOTE) to balance the data and implemented feature selection and hyperparameter optimization to improve model performance. Our comparative analysis shows that combining SMOTE with feature selection improves F1 scores, with the LCCDE model achieving the highest performance. The results underscore the significance of advanced ensemble techniques and data preprocessing in developing resilient IDS. This research emphasizes the necessity of ongoing optimization and evaluation of IDS models to ensure effective protection against evolving cyber threats.
format Proceeding Paper
author Yusri, Muhammad Iqbal
Habaebi, Mohamed Hadi
Gunawan, Teddy Surya
Mansor, Hasmah
Kartiwi, Mira
Nur, Levy Olivia
title Impact of dataset balancing on machine learning-based intrusion detection systems
publisher IEEE
publishDate 2024
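The description above names SMOTE as the balancing step and F1 as the reported metric. The following is a minimal standard-library sketch of both ideas, not the authors' implementation: the function names (`smote_oversample`, `f1_score`), the neighbour count `k`, and the toy two-feature samples are all assumptions made for illustration, and the paper's actual pipeline on CICIDS 2017 is not reproduced here.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote_oversample(X, y, k=3, seed=0):
    """SMOTE-style balancing (illustrative): each synthetic sample is a random
    point on the segment between a minority sample and one of its k nearest
    same-class neighbours. Every class is raised to the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        points = [row for row, lab in zip(X, y) if lab == label]
        # Skip the majority class; a class with one sample has no neighbour
        # to interpolate towards, so it is left unchanged.
        if n == target or len(points) < 2:
            continue
        for _ in range(target - n):
            base = rng.choice(points)
            neighbours = sorted(
                (p for p in points if p is not base),
                key=lambda p: squared_distance(base, p),
            )[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six "benign" flows and two "attack" flows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.1, 0.0], [0.0, 0.3], [5.0, 5.0], [6.0, 7.0]]
y = ["benign"] * 6 + ["attack"] * 2

balanced_X, balanced_y = smote_oversample(X, y)
print(Counter(balanced_y))
```

In common practice, oversampling of this kind is applied only to the training split, so that synthetic samples do not leak into the data used to compute F1; whether the paper follows that protocol is not stated in this record.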