An automated web scraping tool for Malaysia tourism

Bibliographic Details
Main Author: Choong, Wei Jen
Format: Final Year Project / Dissertation / Thesis
Published: 2019
Online Access: http://eprints.utar.edu.my/3493/1/CS%2D2019%2D1505499.pdf
http://eprints.utar.edu.my/3493/
Description
Summary: This project designs a web scraper for Malaysia tourism data. Data are an essential element of the data analytics process, but the value of most public tourism data on the Internet has been overlooked because collecting it is time-consuming and difficult. This project is therefore motivated to provide a low-cost and simple solution for collecting public tourism data from the Internet. Through the realization of this project, insights on methodology, concepts, and design will be offered to those who want to build their own web scraper. On the technical side, the agile System Development Life Cycle (SDLC) methodology is adopted throughout the project. The project emphasizes capturing public tourism data from a travel website by targeting the HTML code structure of that particular website. It will therefore demonstrate how to interpret the HTML code structure of a website and how to locate a targeted element for data extraction through an HTML locator. The project will also discuss the selection of the most suitable programming language, libraries, tools, and frameworks. As the project is developed in Python, it will also cover how to build a simple user interface in Python and how to save the extracted data into a CSV file. Furthermore, the project covers some degree of data pre-processing, because the extracted data attributes may contain excessive text. A very important aspect of this project is testing the performance of the proposed system, so the most appropriate testing approach will also be surveyed and implemented on the system. Last but not least, a contingency plan for backup and recovery will be discussed in case the system encounters errors.
A web scraping system specifically designed for Malaysia tourism will be developed to ease the collection of tourism data, and it could potentially draw the attention of the tourism industry and the government sector to public tourism data for the improvement of Malaysia tourism.
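The extraction pipeline described in the summary (locate targeted elements in a page's HTML structure, trim excessive text, save the result to a CSV file) can be illustrated with a minimal sketch. This record does not say which libraries the thesis used, so the sketch relies only on Python's standard-library `html.parser` and `csv` modules. The markup, the class names `listing`, `name`, and `reviews`, and the helper names `ListingParser`, `clean`, and `to_csv` are all hypothetical. They are not taken from the project or from any real travel website.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Hypothetical travel-site markup: each attraction sits in a <div class="listing">
# with a name element and a review-count element. Not real site data.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="name">  Sample Beach Resort </h2>
  <span class="reviews">12,345 reviews</span>
</div>
<div class="listing">
  <h2 class="name">Sample Heritage Trail</h2>
  <span class="reviews">9,876 reviews</span>
</div>
"""


class ListingParser(HTMLParser):
    """Collects text from elements whose class attribute matches a target locator."""

    def __init__(self, targets):
        super().__init__()
        self.targets = targets  # maps CSS class name -> output field name
        self.rows = []          # one dict per <div class="listing">
        self._field = None      # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "listing" in classes:
            self.rows.append({})  # a new record starts at each listing container
        for cls in classes:
            if cls in self.targets:
                self._field = self.targets[cls]

    def handle_endtag(self, tag):
        self._field = None  # stop capturing when the located element closes

    def handle_data(self, data):
        if self._field and self.rows:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data


def clean(row):
    """Pre-process one record: collapse whitespace and strip the ' reviews' suffix."""
    name = " ".join(row.get("name", "").split())
    digits = re.sub(r"[^\d]", "", row.get("reviews", ""))
    return {"name": name, "reviews": int(digits) if digits else None}


def to_csv(rows, fieldnames=("name", "reviews")):
    """Serialize cleaned records to CSV text (write to a file in real use)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


parser = ListingParser({"name": "name", "reviews": "reviews"})
parser.feed(SAMPLE_HTML)
cleaned = [clean(r) for r in parser.rows]
csv_text = to_csv(cleaned)
print(csv_text)
```

Here the class-based locator plays the role of the "HTML locator" mentioned above, and `clean` shows one possible form of the pre-processing step. A production scraper would usually fetch live pages and use a dedicated parsing library, and it would still need to be adapted to the target site's actual markup.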