NORMA eResearch @NCI Library

Enhancing Extraction in ETL flow by modifying as P-ECTL based on Spark Model

Bagave, Rahul (2020) Enhancing Extraction in ETL flow by modifying as P-ECTL based on Spark Model. Masters thesis, Dublin, National College of Ireland.


Abstract

Big data refers to volumes of data so large that they cannot be stored or processed efficiently by a traditional Extract Transform Load (ETL) system within a given time frame. Enterprises now generate so much data that it may exceed current processing capacity, and processing it with existing ETL methods is slow and laborious. At present, a standalone system is used to extract data from varied data sources. The proposed approach addresses the problem of processing big data for the analytical phase, specifically in business intelligence. To handle massive data volumes, the proposed framework extracts data with the Spark framework and distributes the extraction task across different nodes. Dirty data is removed at this initial stage, and the output of the extraction tasks running on different machines is serialized and loaded into a Relational Database Management System (RDBMS) using the Sqoop framework. The primary aim of this research is to enhance the traditional ETL process by implementing the proposed Parallel-Extract Clean Transform Load (P-ECTL) framework based on the Spark Model. The test results show that the P-ECTL framework shortens the lengthy extraction process and compares well against current extraction techniques.
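The ordering described in the abstract (extract in parallel, clean immediately, then transform and load into a relational store) can be illustrated with a minimal single-machine sketch. This is not the author's implementation: it assumes Python's concurrent.futures threads as a stand-in for distributed Spark executors and sqlite3 as a stand-in for the RDBMS that Sqoop would load. The source names, the cleaning rules, and the functions extract_and_clean, transform and p_ectl are all hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "data sources"; in the thesis these would be
# varied external sources read by Spark executors on different nodes.
SOURCES = {
    "crm": ["1,alice,34", "2,bob,", "3,carol,29"],
    "web": ["4,dave,41", "bad-row", "5,erin,37"],
}

def extract_and_clean(source_name, lines):
    """Extract one source and drop dirty rows immediately (the early 'C' step)."""
    clean = []
    for line in lines:
        parts = line.split(",")
        # Dirty record: wrong number of fields or an empty field.
        if len(parts) != 3 or not all(p.strip() for p in parts):
            continue
        rid, person, age = parts
        # Dirty record: non-numeric id or age.
        if not rid.isdigit() or not age.isdigit():
            continue
        clean.append((int(rid), person, int(age), source_name))
    return clean

def transform(record):
    """Hypothetical transformation: normalise the name's capitalisation."""
    rid, person, age, source_name = record
    return (rid, person.title(), age, source_name)

def p_ectl(sources, conn, workers=4):
    """Parallel extract+clean per source, then transform and load into SQL."""
    # One task per source, standing in for extraction partitions on separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(extract_and_clean, sources.keys(), sources.values()))
    # Merge the per-task results into one stream (stand-in for the Sqoop load).
    rows = [transform(r) for batch in batches for r in batch]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(p_ectl(SOURCES, conn))
    print(conn.execute("SELECT id, name, source FROM people ORDER BY id").fetchall())
```

Because dirty rows are discarded inside each extraction task, only clean records cross into the merge and load step. Threads only give concurrency on one machine; the sketch shows the data flow, not Spark's distribution across nodes.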

Item Type: Thesis (Masters)
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Cloud computing
Divisions: School of Computing > Master of Science in Cloud Computing
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 23 Mar 2020 17:23
Last Modified: 23 Mar 2020 17:23
URI: https://norma.ncirl.ie/id/eprint/4142
