Dominance relationship-based skyline query framework over dynamic and incomplete database

Skyline queries rely on the notion of Pareto dominance, filter the data items by keeping only those data items that are the best, most preferred, also known as skylines, from a database to meet the user’s preferences. Skyline query has been studied extensively and a significant number of skyline...

Full description

Saved in:
Bibliographic Details
Main Author: Dehaki, Ghazaleh Babanejad
Format: Thesis
Language:English
Published: 2020
Subjects:
Online Access:http://psasir.upm.edu.my/id/eprint/98076/1/FSKTM%202021%209%20-%20IR.1.pdf
http://psasir.upm.edu.my/id/eprint/98076/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Skyline queries rely on the notion of Pareto dominance, filter the data items by keeping only those data items that are the best, most preferred, also known as skylines, from a database to meet the user’s preferences. Skyline query has been studied extensively and a significant number of skyline algorithms have been proposed, mostly attempt to resolve the optimisation problem that is mainly associated with a reduction in the processing time of skyline computations. In today’s era, the presence of incomplete data in a database is inevitable. The skyline algorithms in such situation will have to deal with several issues besides the optimisation problem. The missing values in databases give a negative influence on the number of pairwise comparisons that needs to be performed between the data items. Moreover, the transitivity property of skylines is no longer hold. Cyclic dominance is another issue that needs to be tackled as it yields empty skyline results. Furthermore, databases are dynamic in nature in which their states change throughout the time. These changes are necessary as databases must reflect the current and latest information of the applications. The changes are normally achieved through data manipulation operations and data definition operations. The skylines derived before changes are made towards the initial database are no longer valid in the new state of the database. Utilising the existing skyline algorithms would require performing the algorithms on the new state of the database. However, computing the skylines over the entire database after changes are made is inefficient as not all the data items are affected by the changes. In tackling the above stated issues, we propose a solution, named DyIn-Skyline, which consists of three main phases, namely: Phase I – processing skyline queries over the initial incomplete database, Phase II – processing skyline queries over a dynamic and incomplete database, in which the changing state of the database is due to a data manipulation operation(s) (insert, delete or update a data item(s)), and Phase III – processing skyline queries over a dynamic and incomplete database, in which the changing state of the database is due to a data definition operation(s) (add or remove a dimension(s)). For each phase, a framework is proposed. The proposed framework in the Phase I consists of three main components, namely: Data Grouping Builder (DGB), Bucket Skyline Identifier (BSI), and Final Skyline Identifier (FSI). We have also introduced and designed three lists, namely: Bucket Dominating (BDG), Bucket Dominated (BDD), and Domination History (DH) to keep track of the dominating data items, dominated data items, and dominance relationships, respectively; this information is useful and is utilised by the Phase II and Phase III of the DyIn-Skyline solution. The framework of Phase II consists of three components, namely: Skyline- Insert Identifier (S-II), which derives a set of skylines after a data item(s) is inserted into a database, Skyline-Delete Identifier (S-DI), which derives a set of skylines after an existing data item(s) is deleted from a database, and Skyline-Update Identifier (SUI), which produces a set of skylines after an existing data item(s) of a database is updated. Meanwhile, the framework of Phase III consists of two components, namely: Skyline-Add Dimension Analyser (S-ADA) which derives a set of skylines after a new dimension(s) is added to a database and Skyline-Remove Dimension Analyser (S-RDA) which derives a set of skylines after an existing dimension(s) is removed from a database. Extensive experiments have been conducted to evaluate the performance and prove the efficiency of our proposed solution, DyIn-Skyline, in processing skyline queries over a dynamic and incomplete database. The performance results of DyIn-Skyline are compared to other existing works that are the closest to this research, namely: ISkyline, SIDS, and Incoskyline. In most cases, DyIn-Skyline shows a steady performance and achieves better performance with regard to the number of pairwise comparisons and processing time compared to the previous works. Unlike ISkyline, SIDS, and Incoskyline which derive skylines over the entire database after changes are made towards the database, i.e. the new state of the database, DyIn-Skyline avoids unnecessary skyline computations. It relies on the information saved in the following lists: Bucket Dominating (BDG), Bucket Dominated (BDD), and Domination History (DH) and focuses only on those data items that are affected by the changes.