Staff View: Web Data Extraction Approach for Deep Web using WEIDJ

Bibliographic Details
Main Authors:	Wan Aezwani, Wan Abu Bakar; Ahmad Nazari, Mohd Rose
Format:	Conference or Workshop Item
Language:	English
Published:	2019
Subjects:	QA75 Electronic computers. Computer science; QA76 Computer software
Online Access:	http://eprints.unisza.edu.my/1870/1/FH03-FIK-20-37010.pdf http://eprints.unisza.edu.my/1870/

id	my-unisza-ir.1870
record_format	eprints
spelling	my-unisza-ir.1870 2020-11-23T08:17:53Z http://eprints.unisza.edu.my/1870/ Web Data Extraction Approach for Deep Web using WEIDJ Wan Aezwani, Wan Abu Bakar; Ahmad Nazari, Mohd Rose QA75 Electronic computers. Computer science; QA76 Computer software 2019 Conference or Workshop Item NonPeerReviewed text en http://eprints.unisza.edu.my/1870/1/FH03-FIK-20-37010.pdf Wan Aezwani, Wan Abu Bakar and Ahmad Nazari, Mohd Rose (2019) Web Data Extraction Approach for Deep Web using WEIDJ. In: 16th International Learning and Technology Conference, L and T 2019, 30-31 January 2019, Effat University, Jeddah, Saudi Arabia.
institution	Universiti Sultan Zainal Abidin
building	UNISZA Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Sultan Zainal Abidin
content_source	UNISZA Institutional Repository
url_provider	https://eprints.unisza.edu.my/
language	English
topic	QA75 Electronic computers. Computer science; QA76 Computer software
spellingShingle	QA75 Electronic computers. Computer science; QA76 Computer software Wan Aezwani, Wan Abu Bakar; Ahmad Nazari, Mohd Rose Web Data Extraction Approach for Deep Web using WEIDJ
description	Data extraction is one of the most prominent areas of data mining and has been studied extensively, particularly with respect to data requirements and data reservoirs. The main aim of extracting semi-structured data is to retrieve useful information from the World Wide Web. Data held in the large portion of the Web known as the deep web is retrievable, but it must be requested through form submission because search engines cannot retrieve it. Data mining applications and automatic data extraction are cumbersome because the structure of web pages varies widely. Most previous data extraction techniques have dealt with data types such as text, audio, and video, but research that treats images as data is still lacking. The Document Object Model (DOM) is an example of a state-of-the-art data extraction technique related to research on mining image data, and it has been used to address semi-structured data extraction from the Web. However, as HTML documents grow larger, DOM-based extraction has been found to suffer from lengthy processing times and noisy information. In this research work, we propose an improved model, Wrapper Extraction of Image using DOM and JSON (WEIDJ), motivated by promising results in mining larger volumes of web data across various image formats and designed with web data extraction from the deep web in mind. To evaluate the efficiency of the proposed model, we compare its data extraction performance at different levels of page extraction with existing methods such as VIBS, MDR, DEPTA, and VIDE. WEIDJ yielded the best results, with a precision of 100, a recall of 97.93103, and an F-measure of 98.9547.
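The abstract describes WEIDJ only at a high level: DOM traversal to locate images on a page, with the extracted records wrapped as JSON. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The helper names (`TreeBuilder`, `extract_images`, `image_format`, `f_measure`), the void-tag list, and the record fields (`src`, `alt`, `format`, `depth`) are hypothetical, and Python's event-based `html.parser` stands in for a full DOM library by building a small parent/child tree. The sketch also includes the balanced F-measure (F1), the harmonic mean of precision and recall; assuming that definition, a precision of 100 and a recall of 97.93103 give 98.9547, which matches the reported F-measure.

```python
import json
from html.parser import HTMLParser

# Illustrative sketch only: this is NOT the WEIDJ implementation.
# It builds a small DOM-like tree with the standard-library HTMLParser,
# walks it, and wraps every <img> element as a JSON record.

# Hypothetical list of HTML void elements (they never have children).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class Node:
    """One element in the simplified document tree."""
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)  # attribute values may be None
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Turns parser events into a parent/child tree of Node objects."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # only non-void elements can hold children
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img ... />: attach, never descend.
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def image_format(src):
    """Guess the image format from the file extension in the URL path."""
    path = src.split("#", 1)[0].split("?", 1)[0]
    name = path.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def extract_images(html):
    """Return a JSON array describing every <img> that has a src attribute."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    stack = [(builder.root, 0)]
    while stack:  # iterative depth-first walk in document order
        node, depth = stack.pop()
        src = node.attrs.get("src")
        if node.tag == "img" and src:
            records.append({"src": src, "alt": node.attrs.get("alt") or "",
                            "format": image_format(src), "depth": depth})
        for child in reversed(node.children):  # reversed keeps document order
            stack.append((child, depth + 1))
    return json.dumps(records)


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    page = ('<html><body><div><img src="a/cat.PNG" alt="cat"></div>'
            '<p>text</p><img src="/dog.jpg?size=2" /></body></html>')
    print(extract_images(page))
    print(round(f_measure(100, 97.93103), 4))
```

Run as a script, the sample page yields two JSON records (a PNG at depth 4 and a JPG at depth 3), and the F1 line prints 98.9547. A real deep-web extractor would also need to submit query forms to reach result pages and would need noise filtering for large documents; this sketch attempts neither.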
format	Conference or Workshop Item
author	Wan Aezwani, Wan Abu Bakar; Ahmad Nazari, Mohd Rose
author_facet	Wan Aezwani, Wan Abu Bakar; Ahmad Nazari, Mohd Rose
author_sort	Wan Aezwani, Wan Abu Bakar
title	Web Data Extraction Approach for Deep Web using WEIDJ
title_short	Web Data Extraction Approach for Deep Web using WEIDJ
title_full	Web Data Extraction Approach for Deep Web using WEIDJ
title_fullStr	Web Data Extraction Approach for Deep Web using WEIDJ
title_full_unstemmed	Web Data Extraction Approach for Deep Web using WEIDJ
title_sort	web data extraction approach for deep web using weidj
publishDate	2019
url	http://eprints.unisza.edu.my/1870/1/FH03-FIK-20-37010.pdf http://eprints.unisza.edu.my/1870/

Similar Items