Constrained clustering approach to aid in remodularisation of object-oriented software systems / Chong Chun Yong

Bibliographic Details
Main Author: Chong, Chun Yong
Format: Thesis
Published: 2016
Subjects: QA76 Computer software
Online Access: http://studentsrepo.um.edu.my/6606/4/chun_yong.pdf
http://studentsrepo.um.edu.my/6606/
id my.um.stud.6606
record_format eprints
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA76 Computer software
description Effective software maintenance requires knowledge of the detailed workings of the software. However, the structure of a software system may not be clear to maintainers because it is poorly designed or, worse, because its documentation is missing or out of date. To address this issue, researchers have proposed applying software clustering to recover a high-level semantic representation of the software design by grouping sets of collaborating software components into meaningful subsystems. This high-level representation helps bridge the gap between the software design as perceived by maintainers and the actual code structure. However, software clustering is typically conducted in an unsupervised and rigid manner: maintainers have no influence on the clustering results, and only a single solution is produced for any given dataset. Even if maintainers possess additional information that could guide and improve the clustering results, traditional clustering algorithms have no way to take advantage of it. These practical concerns led the researcher to propose integrating domain knowledge into traditional unsupervised clustering algorithms, hereafter referred to as constrained clustering. This is a semi-supervised clustering technique in which domain experts express their knowledge as explicit clustering constraints that specify whether a pair of software components should or should not be placed in the same subsystem (commonly called must-link and cannot-link constraints). Apart from explicit constraints supplied by domain experts, further information to guide and improve the clustering results can be derived implicitly from the source code itself.
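The following is a minimal Python sketch showing how must-link and cannot-link constraints can steer an otherwise ordinary clustering step. It is a generic constrained agglomerative (average-link) procedure, not the algorithm used in the thesis. The function name `constrained_agglomerative`, the pairwise `similarity` dictionary, and the example class names and scores are all hypothetical.

```python
from itertools import combinations


def constrained_agglomerative(items, similarity, must_link, cannot_link, k):
    """Greedy average-link clustering that respects must-link / cannot-link pairs.

    Hypothetical illustration: similarity maps frozenset({a, b}) -> float and
    missing pairs count as 0.0. Stops at k clusters, or earlier if every
    remaining merge would break a cannot-link constraint.
    """
    # 1. Collapse must-link pairs (and their transitive closure) with union-find.
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    clusters = list(groups.values())

    # 2. Reject inputs where must-links force a cannot-link pair together.
    cannot = {frozenset(pair) for pair in cannot_link}
    for c in clusters:
        if any(frozenset(pair) in cannot for pair in combinations(c, 2)):
            raise ValueError("must-link and cannot-link constraints conflict")

    def violates(c1, c2):
        return any(frozenset((a, b)) in cannot for a in c1 for b in c2)

    def avg_similarity(c1, c2):
        total = sum(similarity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # 3. Repeatedly merge the most similar pair of clusters that is allowed.
    while len(clusters) > k:
        candidates = [
            (avg_similarity(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
            if not violates(clusters[i], clusters[j])
        ]
        if not candidates:
            break  # no admissible merge is left
        _, i, j = max(candidates)
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical classes and similarity scores, for illustration only.
classes = ["Order", "OrderLine", "Invoice", "Customer", "Logger"]
sim = {
    frozenset(("Order", "OrderLine")): 0.9,
    frozenset(("Order", "Invoice")): 0.6,
    frozenset(("Invoice", "Customer")): 0.5,
    frozenset(("Order", "Customer")): 0.4,
    frozenset(("Logger", "Order")): 0.95,
}
result = constrained_agglomerative(
    classes, sim,
    must_link=[("Invoice", "Customer")],
    cannot_link=[("Logger", "Order")],
    k=2,
)
print(result)
```

Must-links are applied before any merging, so `Invoice` and `Customer` start out together. Any merge that would place a cannot-link pair in one cluster is skipped. In this example `Logger` ends up on its own. Without the cannot-link, the first merge would pair it with `Order` (similarity 0.95).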
To help maintainers identify and interpret the implicit information hidden in the source code, this study proposes representing software as a weighted complex network and applying graph theory to understand and analyse the structure, behaviour, and complexity of software components and their relationships. The results of this analysis can then be converted into implicit clustering constraints. Maintainers can therefore use both explicit and implicit constraints to create a high-level semantic representation of the software design that is coherent and consistent with the actual code structure. This thesis proposes a constrained clustering approach to aid in the remodularisation of poorly designed or poorly documented object-oriented software systems. The source code of an object-oriented software system is first converted into UML class diagrams. Next, information is extracted from the class diagrams to measure the strength of cohesion among related classes and their relationships. This information is then transformed into a weighted complex network whose nodes and edges carry the measured weights. Graph theory metrics are then applied to the constructed network so that the structure, behaviour, and complexity of software components and their relationships can be analysed. The results are converted into sets of clustering constraints. Guided by the explicit and implicit clustering constraints, sets of cohesive clusters are progressively derived to serve as a high-level semantic representation of the software design. This research follows an empirical research methodology: the proposed approach is validated on 40 object-oriented open-source software systems written in Java.
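The pipeline in this paragraph (class relationships, then a weighted network, then a graph metric, then constraints) can be sketched in plain Python. Every concrete choice here is an assumption made for illustration: the relationship types and their weights in the hypothetical `RELATION_WEIGHTS` table, the use of node strength (weighted degree) as the graph metric, and the rule that any edge at or above a threshold becomes a must-link suggestion. The thesis's actual cohesion measure and graph metrics are not reproduced here.

```python
from collections import defaultdict

# Hypothetical weights per UML relationship type (illustrative assumption only).
RELATION_WEIGHTS = {
    "inheritance": 1.0,
    "composition": 0.9,
    "aggregation": 0.7,
    "association": 0.5,
    "dependency": 0.3,
}


def build_weighted_network(relations):
    """Build an undirected weighted graph from (source, target, relation_type) triples.

    Returns {class: {neighbour: weight}}; parallel relationships between the same
    pair of classes are summed into a single edge weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for src, dst, kind in relations:
        weight = RELATION_WEIGHTS[kind]
        graph[src][dst] += weight
        graph[dst][src] += weight
    return graph


def node_strength(graph):
    """Weighted degree of each node: the sum of its incident edge weights."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def implicit_must_links(graph, edge_threshold):
    """Hypothetical rule: suggest a must-link for every edge whose weight meets the threshold."""
    pairs = set()
    for a, nbrs in graph.items():
        for b, weight in nbrs.items():
            if a < b and weight >= edge_threshold:  # visit each undirected edge once
                pairs.add((a, b))
    return sorted(pairs)


# Hypothetical relationships, as they might be read off a UML class diagram.
relations = [
    ("OrderLine", "Order", "composition"),
    ("Order", "Customer", "association"),
    ("Invoice", "Customer", "association"),
    ("Invoice", "Order", "dependency"),
    ("Order", "Logger", "dependency"),
    ("Customer", "Logger", "dependency"),
    ("Invoice", "Logger", "dependency"),
    ("OrderLine", "Logger", "dependency"),
]
g = build_weighted_network(relations)
strength = node_strength(g)
print(max(strength, key=strength.get))  # -> Order
print(implicit_must_links(g, edge_threshold=0.8))  # -> [('Order', 'OrderLine')]
```

Lowering `edge_threshold` to 0.5 would also suggest `("Customer", "Invoice")` and `("Customer", "Order")`. Picking that cut-off, and deciding which metric should drive it, is exactly the judgement this sketch leaves open.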
Using MoJoFM, a well-established metric for measuring the similarity between two clustering results, the proposed approach achieves an aggregated average accuracy of 80.33% when compared against the original package diagrams of the 40 software systems. This considerably outperforms the conventional unconstrained clustering approach. The clustering results serve as supplementary information to help software maintainers make critical decisions when re-engineering, maintaining, and evolving software systems. Ultimately, this research helps reduce the cost of software maintenance through better comprehension of the recovered software design.
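For reference, MoJoFM is commonly defined as shown below, although the thesis may rely on a particular implementation of it. Here A is the clustering produced by a tool, B is the reference decomposition (in this study, the original package diagrams), and mno(A, B) is the minimum number of Move and Join operations needed to transform A into B. The denominator is the largest such distance that any partition of the same entities can have from B. The score therefore ranges from 0% to 100%, and 100% means the two decompositions are identical.

```latex
\mathrm{MoJoFM}(A, B) = \left(1 - \frac{\mathrm{mno}(A, B)}{\max\left(\mathrm{mno}(\forall A, B)\right)}\right) \times 100\%
```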
format Thesis
author Chong, Chun Yong
title Constrained clustering approach to aid in remodularisation of object-oriented software systems / Chong Chun Yong
publishDate 2016
url http://studentsrepo.um.edu.my/6606/4/chun_yong.pdf
http://studentsrepo.um.edu.my/6606/
spelling my.um.stud.6606 2020-01-18T03:01:04Z 2016 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/6606/4/chun_yong.pdf Chong, Chun Yong (2016) Constrained clustering approach to aid in remodularisation of object-oriented software systems / Chong Chun Yong. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/6606/