An empirical comparative study of instance-based schema matching
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Advanced Engineering and Science (IAES), 2018 |
Online Access: | http://irep.iium.edu.my/63238/1/An%20Empirical%20Comparative%20Study%20of%20Instance-based%20Schema%20Matching_Published_Version.pdf http://irep.iium.edu.my/63238/7/63238_An%20empirical%20comparative%20study%20of%20instance-based_scopus.pdf http://irep.iium.edu.my/63238/ http://iaescore.com/journals/index.php/IJEECS/article/view/12336/8489 |
Summary: | The main concern of schema matching is how to support the merging decision by providing matches between attributes of different schemas. Many works in the literature use database instances to detect correspondences between attributes, and most of them aim to improve match accuracy. We observed that no technique provides accurate matching for different types of data: some techniques treat numeric values as strings, while others process textual instances as numeric, which negatively influences match discovery and compromises the matching result. Thus, a practical comparative study of syntactic and semantic techniques is needed. The study analyzes these techniques to determine the strengths and weaknesses of each. This paper compares two instance-based matching techniques, namely (i) regular expression and (ii) Google similarity, for identifying matches between attributes. Several analyses have been conducted on real and synthetic data sets to evaluate the performance of these techniques with respect to Precision (P), Recall (R) and F-Measure. |
---|
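
To make the summary concrete, the following is a minimal sketch of syntactic instance-based matching with regular expressions, scored by Precision, Recall and F-Measure. It is not the paper's implementation: the pattern classes, the function names (`profile`, `dominant_class`, `match_attributes`, `precision_recall_f1`), and the toy schemas are all assumptions made for illustration. Google similarity is left out because it depends on search-engine hit counts that cannot be reproduced here.

```python
import re

# Hypothetical pattern classes; the record does not list the paper's actual expressions.
PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "decimal": re.compile(r"[+-]?\d+\.\d+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "text": re.compile(r"[A-Za-z][A-Za-z .'-]*"),
}


def profile(values):
    """Return, per pattern class, the fraction of values that fully match it."""
    total = len(values) or 1  # avoid division by zero for empty columns
    return {
        name: sum(bool(pattern.fullmatch(v)) for v in values) / total
        for name, pattern in PATTERNS.items()
    }


def dominant_class(values):
    """Return the best-matching pattern class, or None if nothing matches."""
    scores = profile(values)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def match_attributes(source, target):
    """Pair attributes whose instances share the same dominant pattern class."""
    pairs = set()
    for s_name, s_values in source.items():
        s_class = dominant_class(s_values)
        for t_name, t_values in target.items():
            if s_class is not None and s_class == dominant_class(t_values):
                pairs.add((s_name, t_name))
    return pairs


def precision_recall_f1(predicted, reference):
    """Score predicted attribute pairs against a reference set of true matches."""
    true_positives = len(predicted & reference)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy schemas (invented for this sketch): attribute name -> sample instances.
source = {
    "emp_id": ["101", "102", "103"],
    "name": ["Ann Lee", "Bob Ray", "Cy Dow"],
}
target = {
    "staff_no": ["7", "8", "9"],
    "age": ["30", "41", "52"],
    "full_name": ["Dee Fox", "Eli Orr", "Flo Poe"],
    "salary": ["10.5", "20.0", "31.25"],
}
reference = {("emp_id", "staff_no"), ("name", "full_name")}

predicted = match_attributes(source, target)
p, r, f = precision_recall_f1(predicted, reference)
print(sorted(predicted))
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy run the purely syntactic matcher also pairs `emp_id` with `age`, because both columns hold integers. Recall stays at 1 while precision drops, which illustrates the kind of weakness a syntactic technique can show and why a comparison with a semantic technique is of interest.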