Generic code clone detection model for java applications

Code clone is a common term used for codes that are repeated multiple times in a program. There are Type 1, Type 2, Type 3 and Type 4 code clones. Various code clone detection approaches and models have been used to detect a code clone. However, a major challenge faced in detecting code clone using...

Full description

Saved in:
Bibliographic Details
Main Authors: Mubarak-Ali, Al-Fahim, Sulaiman, Shahida
Format: Conference or Workshop Item
Language:English
Published: 2020
Subjects:
Online Access:http://eprints.utm.my/id/eprint/92334/1/ShahidaSulaiman2020_GenericCodeCloneDetectionModelforJavaApplications.pdf
http://eprints.utm.my/id/eprint/92334/
http://dx.doi.org/10.1088/1757-899X/769/1/012023
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Code clone is a common term used for codes that are repeated multiple times in a program. There are Type 1, Type 2, Type 3 and Type 4 code clones. Various code clone detection approaches and models have been used to detect a code clone. However, a major challenge faced in detecting code clone using these models is the lack of generality in detecting all clone types. To address this problem, Generic Code Clone Detection (GCCD) model that consists of five processes which are Preprocessing, Transformation, Parameterization, Categorization and Match Detection process is proposed. Initially, a pre-processing process produces source units through the application of five combinatorial rules. This is followed by the transformation process to produce transformed source units based on the letter to number substitution concept. Next, a parameterization process produces parameters used in categorization and match detection process. Next, a categorization process groups the source units into pools. Finally, a match detection process uses a hybrid exact matching with Euclidean distance to detect the clones. Based on these processes, a prototype of the GCCD was developed using Netbeans 8.0. The model was compared with the Generic Pipeline Model (GPM). The comparisons showed that the GCCD was able to detect clone pairs of Type-1 until Type-4 while the GPM was able to detect clone pair for Type-1 only. Furthermore, the GCCD prototype was empirically tested with Bellons benchmark data and it was able to detect clones in Java applications with up to 203,000 line of codes. As a conclusion, the GCCD model is able to overcome the lack of generality in detecting all code clone types by detecting Type 1, Type 2, Type 3 and Type 4 clones.