Using Relational Algebra Operators to Model ETL Data Conciliation Tasks

Vasco Santos (School of Management and Technology, Polytechnic of Porto, Portugal); Orlando Belo (University of Minho, Portugal)

The design and development of a Data Warehousing System (DWS) is a high-risk, high-reward project because of its exceptional demand for complex resources. To minimize that risk, design methodologies and tools are applied throughout the phases of the project. The Extract-Transform-Load (ETL) component is one of the most critical components of a DWS, since it gathers, corrects and conforms the data to be loaded into the Data Warehouse. Data conciliation tasks are usually regarded as tedious, manually intensive work that typically deals with heterogeneous sources, yet they are critical to the correct representation of the enterprise's information. In this article, we analyze some common ETL tasks for data conciliation using a Relational Algebra approach, in an effort to standardize them for future use in a generic ETL environment. A slowly changing dimension scenario is used to support the data conciliation modelling process designed for this work.
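As a rough illustration of the general idea, and not the authors' actual model, the sketch below expresses one conciliation step for a slowly changing dimension with list-based analogues of relational algebra operators: a difference finds keys that are new in the source, and a join followed by a selection finds rows whose attribute changed. The relation names (`source`, `dimension`), the attributes (`cust_id`, `city`, `old_city`) and the sample data are all assumptions made for this example.

```python
# Hypothetical relations; names, attributes and values are illustrative only.
source = [
    {"cust_id": 1, "city": "Porto"},
    {"cust_id": 2, "city": "Braga"},
    {"cust_id": 3, "city": "Faro"},
]
dimension = [  # rows currently held in the dimension table
    {"cust_id": 1, "city": "Porto"},
    {"cust_id": 2, "city": "Lisboa"},
]


def project(rel, attrs):
    """Projection: keep the named attributes and drop duplicate tuples."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def rename(rel, mapping):
    """Rename: relabel attributes according to mapping (old name -> new name)."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rel]


def natural_join(left, right):
    """Natural join: pair tuples that agree on every shared attribute."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def difference(left, right):
    """Difference: tuples of left that do not appear in right."""
    return [row for row in left if row not in right]


# Keys present in the source but not yet in the dimension.
new_keys = difference(project(source, ["cust_id"]),
                      project(dimension, ["cust_id"]))
new_rows = natural_join(source, new_keys)

# Rows whose tracked attribute differs between source and dimension.
changed_rows = select(
    natural_join(source, rename(dimension, {"city": "old_city"})),
    lambda row: row["city"] != row["old_city"],
)
```

In this sketch `new_rows` would feed an insert into the dimension, while `changed_rows` would feed whichever slowly changing dimension policy is chosen, for example overwriting the attribute or adding a new row version. That choice is left open here.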

Journal: International Journal of Simulation: Systems, Science and Technology (IJSSST), Vol. 16

Published: Feb 28, 2015

DOI: 10.5013/IJSSST.a.16.01.04