Staff View: Framework for mining XML format business process log data

Framework for mining XML format business process log data

With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. Consequently, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data of a hierarchical nature. Howev...

Bibliographic Details
Main Author:	Ang, Jin Sheng
Format:	Thesis
Language:	English
Published:	2024
Subjects:	T58.5-58.64 Information technology; QA299.6-433 Analysis
Online Access:	https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf https://etd.uum.edu.my/11012/

id	my.uum.etd.11012
record_format	eprints
spelling	my.uum.etd.11012 2024-02-29T00:24:50Z https://etd.uum.edu.my/11012/ Framework for mining XML format business process log data Ang, Jin Sheng T58.5-58.64 Information technology QA299.6-433 Analysis With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. Consequently, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data of a hierarchical nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to documents in eXtensible Markup Language (XML) format. This study introduces an alternative approach for mining XML format documents that can be modelled as tree-structured data. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thus facilitating comprehensive statistical analysis and data mining. FSSM was divided into two phases. The first phase converted tree-structured data into a flat structure that retained the structural information. The second phase converted the first-phase output into a structured format. Statistical analysis or classification was then conducted. The effectiveness of the methods and framework was assessed by applying them to both simulation datasets and real-life event logs from the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulation and real-life event log data, descriptive statistics, t-tests, and chi-square tests were successfully executed. Association rule mining yielded more rules than existing FSM methods. The Random Forest model outperformed the others with a classification accuracy of 0.75 on the simulation data, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models performed well, exceeding 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields. 2024 Thesis NonPeerReviewed text en https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf text en https://etd.uum.edu.my/11012/2/s904045_01.pdf text en https://etd.uum.edu.my/11012/3/s904045_02.pdf Ang, Jin Sheng (2024) Framework for mining XML format business process log data. Doctoral thesis, Universiti Utara Malaysia.
institution	Universiti Utara Malaysia
building	UUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
url_provider	http://etd.uum.edu.my/
language	English
topic	T58.5-58.64 Information technology; QA299.6-433 Analysis
spellingShingle	T58.5-58.64 Information technology QA299.6-433 Analysis Ang, Jin Sheng Framework for mining XML format business process log data
description	With the advent of the Internet, the volume of semi-structured and unstructured data has increased dramatically. Consequently, many frequent subtree mining (FSM) algorithms and methods have been developed to extract information from semi-structured data, specifically data of a hierarchical nature. However, many existing FSM algorithms and methods neglect or fail to preserve structural information, which hinders the extraction of meaningful insights from such data. In addition, statistical analysis and data mining techniques are difficult to apply to documents in eXtensible Markup Language (XML) format. This study introduces an alternative approach for mining XML format documents that can be modelled as tree-structured data. The Flatten Sequential Structure Model (FSSM) was developed to transform tree-structured data into a structured format while preserving its structural integrity, thus facilitating comprehensive statistical analysis and data mining. FSSM was divided into two phases. The first phase converted tree-structured data into a flat structure that retained the structural information. The second phase converted the first-phase output into a structured format. Statistical analysis or classification was then conducted. The effectiveness of the methods and framework was assessed by applying them to both simulation datasets and real-life event logs from the Business Process Intelligence Challenge (BPIC). After the FSSM phases were applied to the simulation and real-life event log data, descriptive statistics, t-tests, and chi-square tests were successfully executed. Association rule mining yielded more rules than existing FSM methods. The Random Forest model outperformed the others with a classification accuracy of 0.75 on the simulation data, while the decision tree achieved the highest accuracy (0.7474) on the BPIC 2017 dataset. On the BPIC 2018 dataset, all three models performed well, exceeding 0.99 in classification accuracy. The results indicate that transforming complex hierarchical data into a format suitable for statistical analysis simplifies the analysis process and makes it more accessible to researchers in various fields.
format	Thesis
author	Ang, Jin Sheng
author_facet	Ang, Jin Sheng
author_sort	Ang, Jin Sheng
title	Framework for mining XML format business process log data
title_short	Framework for mining XML format business process log data
title_full	Framework for mining XML format business process log data
title_fullStr	Framework for mining XML format business process log data
title_full_unstemmed	Framework for mining XML format business process log data
title_sort	framework for mining xml format business process log data
publishDate	2024
url	https://etd.uum.edu.my/11012/1/permission%20to%20deposit-allow%20embargo%2012%20months-s904045.pdf https://etd.uum.edu.my/11012/2/s904045_01.pdf https://etd.uum.edu.my/11012/3/s904045_02.pdf https://etd.uum.edu.my/11012/
_version_	1793158727691403264
score	13.160551

Similar Items