DATA COMPRESSION AND DATA HIDING DURING LARGE DATA INGESTION

Bibliographic Details
Main Author: Lai, Zhen Yean
Format: Final Year Project
Language: English
Published: IRC 2019
Online Access: http://utpedia.utp.edu.my/20909/1/LAI%20ZHEN%20YEAN_22888.pdf
http://utpedia.utp.edu.my/20909/
Description
Summary: This paper examines data ingestion, the process of collecting data, which usually takes place inside an organization so that the organization can analyze the data further. A well-known storage system for big data analysis is the Hadoop Distributed File System (HDFS). Two Hadoop tools are related to data ingestion: Apache Sqoop and Apache Flume. Apache Sqoop transfers data between Hadoop and a Relational Database Management System (RDBMS). Apache Flume is a distributed service that collects data from a variety of sources and forwards it to Hadoop storage. The concern with these tools is that they do not have built-in data compression and data hiding features during data transmission. The proposed solution applies Fixed Length Coding (FLC) compression together with an audio steganography technique in a new data ingestion method to achieve both data compression and data hiding. The methodology implements the data compression and audio steganography during the transmission of data from the RDBMS to HDFS storage. However, one weakness remains: the limited capability to overcome data loss during audio steganography. A performance evaluation is carried out to validate the data transmission; the evaluation parameters include compression ratio, signal-to-noise ratio, and information loss.
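
The combination described in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation: the record does not say how the bits are hidden in the audio, so least-significant-bit (LSB) embedding in integer PCM samples is assumed here, and all function names are hypothetical. The compression ratio shown ignores the cost of sending the symbol table.

```python
import math


def flc_encode(text):
    """Fixed-length coding: every distinct symbol gets a code of the same width."""
    alphabet = sorted(set(text))
    # Bits needed to index the alphabet, i.e. ceil(log2(k)), with a minimum of 1.
    width = max(1, (len(alphabet) - 1).bit_length())
    index = {ch: i for i, ch in enumerate(alphabet)}
    bits = [(index[ch] >> shift) & 1
            for ch in text
            for shift in range(width - 1, -1, -1)]
    return bits, alphabet, width


def flc_decode(bits, alphabet, width):
    """Rebuild the text by reading the bit stream in chunks of `width` bits."""
    chars = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        chars.append(alphabet[value])
    return "".join(chars)


def embed_lsb(samples, bits):
    """Assumed hiding scheme: overwrite the lowest bit of one sample per payload bit."""
    if len(bits) > len(samples):
        raise ValueError("cover audio is too short for the payload")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bits):
    """Read the payload back from the lowest bit of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


def snr_db(cover, stego):
    """Signal-to-noise ratio in dB between the cover and the stego samples."""
    signal = sum(c * c for c in cover)
    noise = sum((c - s) ** 2 for c, s in zip(cover, stego))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


def compression_ratio(text, width):
    """Original size at 8 bits per character divided by the FLC-coded size."""
    return (8 * len(text)) / (width * len(text))
```

In this sketch a row read from the RDBMS would be encoded with `flc_encode`, embedded with `embed_lsb`, transmitted as audio samples, and recovered on the HDFS side with `extract_lsb` followed by `flc_decode`. Information loss can then be checked by comparing the decoded text with the original.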