Impact of dataset balancing on machine learning-based intrusion detection systems
Intrusion Detection Systems (IDS) are indispensable for cybersecurity, as they safeguard networks from increasingly sophisticated cyberattacks. This paper assesses the influence of dataset balancing on the performance of machine learning-based IDS, thereby addressing the challenge...
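Main Authors: | Yusri, Muhammad Iqbal; Habaebi, Mohamed Hadi; Gunawan, Teddy Surya; Mansor, Hasmah; Kartiwi, Mira; Nur, Levy Olivia |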
---|---|
Format: | Proceeding Paper |
Language: | English |
Published: | IEEE 2024 |
Subjects: | TK7885 Computer engineering |
Online Access: | http://irep.iium.edu.my/114534/7/114534_Impact%20of%20dataset%20balancing.pdf http://irep.iium.edu.my/114534/ https://ieeexplore.ieee.org/document/10675568 |
id |
my.iium.irep.114534 |
---|---|
record_format |
dspace |
spelling |
my.iium.irep.114534 2024-09-20T07:45:31Z http://irep.iium.edu.my/114534/ Impact of dataset balancing on machine learning-based intrusion detection systems Yusri, Muhammad Iqbal Habaebi, Mohamed Hadi Gunawan, Teddy Surya Mansor, Hasmah Kartiwi, Mira Nur, Levy Olivia TK7885 Computer engineering Intrusion Detection Systems (IDS) are indispensable for cybersecurity, as they safeguard networks from increasingly sophisticated cyberattacks. This paper assesses the influence of dataset balancing on the performance of machine learning-based IDS, thereby addressing the challenge of imbalanced data in detecting network intrusions. We concentrate on three IDS implementations: Tree-based Intelligent IDS, Multi-Tiered Hybrid IDS (MTH-IDS), and Leader Class and Confidence Decision Ensemble (LCCDE). Using the CICIDS 2017 dataset, we applied the Synthetic Minority Over-Sampling Technique (SMOTE) to balance the data and implemented feature selection and hyperparameter optimization to improve model performance. Our comparative analysis demonstrates that the combination of SMOTE and feature selection enhances the F1 scores, with the LCCDE model exhibiting the highest performance. The results underscore the significance of advanced ensemble techniques and data preprocessing in developing resilient IDS. This research emphasizes the necessity of ongoing optimization and evaluation of IDS models to guarantee effective protection against evolving cyber threats. IEEE 2024-09-18 Proceeding Paper PeerReviewed application/pdf en http://irep.iium.edu.my/114534/7/114534_Impact%20of%20dataset%20balancing.pdf Yusri, Muhammad Iqbal and Habaebi, Mohamed Hadi and Gunawan, Teddy Surya and Mansor, Hasmah and Kartiwi, Mira and Nur, Levy Olivia (2024) Impact of dataset balancing on machine learning-based intrusion detection systems. In: 2024 IEEE 10th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA), 30-31 July 2024, Bandung, Indonesia.
https://ieeexplore.ieee.org/document/10675568 10.1109/ICSIMA62563.2024.10675568 |
institution |
Universiti Islam Antarabangsa Malaysia |
building |
IIUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
International Islamic University Malaysia |
content_source |
IIUM Repository (IREP) |
url_provider |
http://irep.iium.edu.my/ |
language |
English |
topic |
TK7885 Computer engineering |
description |
Intrusion Detection Systems (IDS) are indispensable for cybersecurity, as they safeguard networks from increasingly sophisticated cyberattacks. This paper assesses the influence of dataset balancing on the performance of machine learning-based IDS, thereby addressing the challenge of imbalanced data in detecting network intrusions. We concentrate on three IDS implementations: Tree-based Intelligent IDS, Multi-Tiered Hybrid IDS (MTH-IDS), and Leader Class and Confidence Decision Ensemble (LCCDE). Using the CICIDS 2017 dataset, we applied the Synthetic Minority Over-Sampling Technique (SMOTE) to balance the data and implemented feature selection and hyperparameter optimization to improve model performance. Our comparative analysis demonstrates that the combination of SMOTE and feature selection enhances the F1 scores, with the LCCDE model exhibiting the highest performance. The results underscore the significance of advanced ensemble techniques and data preprocessing in developing resilient IDS. This research emphasizes the necessity of ongoing optimization and evaluation of IDS models to guarantee effective protection against evolving cyber threats. |
format |
Proceeding Paper |
author |
Yusri, Muhammad Iqbal; Habaebi, Mohamed Hadi; Gunawan, Teddy Surya; Mansor, Hasmah; Kartiwi, Mira; Nur, Levy Olivia |
author_sort |
Yusri, Muhammad Iqbal |
title |
Impact of dataset balancing on machine learning-based intrusion detection systems |
publisher |
IEEE |
publishDate |
2024 |