Description: VulBERTa: simplified source code pre-training for vulnerability detection

VulBERTa: simplified source code pre-training for vulnerability detection

Bibliographic Details
Main Authors:	Hanif, Hazim; Maffeis, Sergio
Format:	Conference or Workshop Item
Published:	IEEE 2022
Subjects:	QA75 Electronic computers. Computer science; QA76 Computer software
Online Access:	http://eprints.um.edu.my/40469/

Description
Summary:	This paper presents VulBERTa, a deep learning approach to detecting security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of the size of the training data and the number of model parameters.

Similar Items