An automated web scraping tool for Malaysia tourism

This project designs a web scraper for Malaysia tourism data. Data are an essential element of the data analytics process, but the value of most public tourism data on the Internet has been overlooked because collecting the data is very time-consuming and difficult. Therefore, th...

Bibliographic Details
Main Author: Choong, Wei Jen
Format: Final Year Project / Dissertation / Thesis
Published: 2019
Subjects: Q Science (General)
Online Access: http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf
http://eprints.utar.edu.my/3493/
id my-utar-eprints.3493
record_format eprints
spelling my-utar-eprints.3493 2019-08-20T04:16:07Z An automated web scraping tool for Malaysia tourism Choong, Wei Jen Q Science (General) 2019-04-23 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf Choong, Wei Jen (2019) An automated web scraping tool for Malaysia tourism. Final Year Project, UTAR. http://eprints.utar.edu.my/3493/
institution Universiti Tunku Abdul Rahman
building UTAR Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tunku Abdul Rahman
content_source UTAR Institutional Repository
url_provider http://eprints.utar.edu.my
topic Q Science (General)
description This project designs a web scraper for Malaysia tourism data. Data are an essential element of the data analytics process, but the value of most public tourism data on the Internet has been overlooked because collecting the data is very time-consuming and difficult. This project is therefore motivated to provide a low-cost and simple solution for collecting public tourism data on the Internet. Through its realization, the project offers insights into methodology, concept, and design to those who want to build their own web scraper. On the technical side, the agile System Development Life Cycle (SDLC) methodology is adopted throughout the project. The emphasis is on capturing public tourism data from a travel website by targeting the HTML code structure of that particular website. The project thus demonstrates how to interpret the HTML code structure of a website and how to locate a targeted element for data extraction through an HTML locator. It also discusses the selection of the most suitable programming language, libraries, tools, and frameworks. Because the project is developed in Python, it also covers how to build a simple user interface in Python and how to save the extracted data to a CSV file. The project further covers some degree of data pre-processing, because the extracted data attributes may contain excessive text. Testing the performance of the proposed system is a very important aspect of the project, so the most appropriate testing approach is surveyed and implemented on the system. Finally, a contingency plan for backup and recovery is discussed in case the system encounters errors.
A web scraping system designed specifically for Malaysia tourism will be developed to ease the collection of tourism data, and it could potentially draw the attention of the tourism industry and the government sector to public tourism data for the improvement of Malaysia tourism.
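The extraction steps named in the description (locating a target element through its HTML locator, trimming excessive text, and saving the result to a CSV file) can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the description does not name the libraries used, so the sketch assumes only the Python standard library, and the class names `place-name` and `rating`, the helper names, the sample markup, and the values in it are all hypothetical.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextParser(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer))
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def clean_text(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract(html, css_class):
    """Return the cleaned text of each element carrying css_class."""
    parser = ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return [clean_text(text) for text in parser.results]


def parse_rating(text):
    """Keep only the first number in the text, dropping the surrounding words."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return match.group(0) if match else ""


def to_csv(names, ratings):
    """Serialize the paired columns as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "rating"])
    writer.writerows(zip(names, ratings))
    return out.getvalue()


# Invented sample markup standing in for a travel website's listing page.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="place-name">  Petronas   Twin Towers </h2>
  <span class="rating"> 4.5 of 5 bubbles </span>
</div>
<div class="listing">
  <h2 class="place-name">Batu Caves</h2>
  <span class="rating"> 4.0 of 5 bubbles </span>
</div>
"""

if __name__ == "__main__":
    names = extract(SAMPLE_HTML, "place-name")
    ratings = [parse_rating(r) for r in extract(SAMPLE_HTML, "rating")]
    # Write the same rows to disk; newline="" is what the csv module expects.
    with open("places.csv", "w", newline="", encoding="utf-8") as f:
        f.write(to_csv(names, ratings))
```

The sketch assumes reasonably well-formed markup. A real scraper would also need to fetch the pages, respect the site's terms of use, and tolerate changes to the page structure.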
format Final Year Project / Dissertation / Thesis
author Choong, Wei Jen
title An automated web scraping tool for Malaysia tourism
publishDate 2019
url http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf
http://eprints.utar.edu.my/3493/