1 a comparative study between etl and e-lt approaches for loading data into a data warehouse vikas...

24
1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

Post on 22-Dec-2015

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

1

A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse

Vikas Ranjan

CSCI 693

Page 2: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

2

Agenda

Introduction Extract, Transform and Load (ETL) Approach Strengths/ Weaknesses of ETL Extract, Load and Transform (E-LT) Approach Strengths/ Weaknesses of E-LT Experiments Results Conclusion/ Future Work References Q & A

Page 3: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

3

Introduction All Business Intelligence applications are data-centric.

Large volumes of data is stored and processed along with its history in the Data Warehouse.

The Data is extracted from various heterogeneous source systems and transformed as per the business requirements.

Data is used for analytical purposes- future forecasting, profitability analysis, trend analysis etc. to drive the business.

Page 4: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

4

Extract, Transform and Load (ETL)

Traditional approach of loading data into data warehouses. Data is first pulled/ pushed from various heterogeneous sources

like ERP, CRM, RDBMS, Flat Files. Business rules are applied on the data in the Staging Area. Transformed data is loaded into target database. Often designed backward, thus only the relevant data is fetched.

Page 5: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

5

Strengths/ Weaknesses of ETL

Strengths Can perform complex operations in single data flow diagrams. Mostly designed backward, thus only relevant data is loaded. Used for building real-time data warehouses. Robust tools are available like Informatica, Ab Initio, Data Stage .

Weaknesses Data transformation step of ELT is performed by ETL engine,

therefore increasing processing time. Data is moved over the network twice. Since it is developed backward, more effort for future redesign.

Page 6: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

6

Extract, Load and Transform (E-LT)

Newer Approach of loading into target data warehouse. The Data is extracted from various sources same way it is done via

ETL. This extracted data is loaded directly into target data warehouse. The transformations and complex business rules are applied by

native SQL drivers. The processing is done by database engine rather than ETL engine.

Page 7: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

7

Strengths/ Weaknesses of E-LT

Strengths Since all the data is available, the future changes can be easily

incorporated. Once the data is loaded on the target platform, all transformations are

placed on the RDBMS engine. This reduces network congestion. Provides optimal performance as no extra hardware needed.Weaknesses E-LT suffers from a limited availability of tools like Informatica

Pushdown, Data Integrator etc. Does not work well for complex business cases. E-LT cannot be used to design near real-time enterprise data

warehouse.

Page 8: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

8

Tools and Data Set

Software / Hardware

OS Platform: Sun M9000 Server (www.sun.com). Relational Database Management System: Teradata V2R6.2.

1 (www.teradata.com). ETL/E-LT Tool: Informatica PC 8.6.1 Hot Fix 9 Advanced E

dition (www.informatica.com).

Data Set All the data used is experiments is obtained from the test data

base a telecom company. All the experiments were conducted using Informatica as ETL

and E-LT tool.

Page 9: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

9

Experiment 1

ETL vs. E-LT (Full Pushdown): Informatica has recently introduced both ETL and E-LT capabilities in its Power Center tool

Job Name: Tax _Write_Off Job Same Job was developed using ETL approach as well as E-LT

approach. Informatica pushed down all the code processing to the RDBMS

engine.

Page 10: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

10

Experiment1 contd.

ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

Page 11: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

11

Experiment 1 contd.

E-LT Job ( Full Pushdown): In the full pushdown, Informatica Server pushed all the code to RDBMS engine and Informatica Server worked purely as an E-LT tool

Page 12: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

12

Experiment 1 Results

Approximately Five Times more performance gain using E-LT Full Pushdown

Approach Data Read (rows)

Data Load (rows)

Runtime

(second)

Throughput

(rows/sec)

Memory CPU

ETL 32044 32044 110 291 DB/ETL Server

DB/ETL Server

E-LT Full 32044 32044 6 5340 DB Memory

DB CPU

Page 13: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

13

Experiment 2

ETL vs. E-LT (Target Pushdown):

Job Name: Tax_Extract_ETL_tst Job Same job was developed using ETL approach as well as E-LT approach with Target Pushdown only The Pushdown to database engine happened at Target database only

Page 14: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

14

Experiment 2 contd.

ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

Page 15: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

15

Experiment 2 contd.

E-LT Job (Target Pushdown): Informatica Server pushed code processing on target database to RDBMS engine

Page 16: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

16

Experiment 2 Results

No Performance Differences since the Pushdown was on Target Database only

Approach Data Read (rows)

Data Load (rows)

Runtime

(second)

Throughput

(rows/sec)

Memory CPU

ETL 154271 462813 1879 246 ETL/Partial DB

ETL/Partial DB

E-LT Target 154271 462813 1887 245.26 DB/E-LT Server

DB/E-LT Server

Page 17: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

17

Experiment 3

ETL vs. E-LT (Source Pushdown):

Job Name: Tax_Extract_ETL Job Same job was developed using ETL approach as well as E-LT approach with Source Pushdown only The Pushdown to database engine happened at Source database only

Page 18: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

18

Experiment 3 contd.

ETL Job: Informatica Server handled all the code processing and generated its own internal SQL

Page 19: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

19

Experiment 3 contd.

E-LT Job (Source Pushdown): Informatica Server pushed down code processing on source side to RDBMS engine

Page 20: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

20

Experiments 3 Results No Performance Differences as Pushdown was on Source Database only Source Pushdown E-LT does not work with Teradata Sequence Generator

Approach Data Read (rows)

Data Load (rows)

Runtime

(second)

Throughput

(rows/sec)

Memory CPU

ETL 154271 154271 948 162 ETL/Partial DB ETL/Partial DB

E-LT Source

154271 154271 947 162 DB/E-LT Server DB/E-LT Server

Page 21: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

21

Results Significant performance gains were obtained using full pushdown E-

LT over ETL approach.

No performance gains using both source and target pushdown of E-LT approach and ETL approach as both used ETL Server and Database Engine resources( memory and CPU).

Code changes were needed to redesign existing ETL jobs to use E-LT power of database engine.

E-LT does not work in building Real Time Data warehouses

Page 22: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

22

Conclusion and Future Work ETL works well for very complex transformations and

active data warehouses

E-LT works well for small and medium-sized data marts and when the source and target are on the same database platform only

Future: Building data warehouse solutions using hybrid approach (combination of ETL and E-LT processes)

.

Page 23: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

23

References

.

[1] A. Simitsis, P. Vassiliadis, T. Sellis “Optimizing ETL Processes in DataWarehouses,” in Proc. 21st International Conference on Data Engineering, 2005, (ICDE 2005), pp. 564-575. [2] G.X. Zhou, Q.S. Xie, Y. Hu, “E-LT Integration to Heterogeneous Data Information for SMEs Networking based on E-HUB,” in Proc. Fourth International Conference on Natural Computation, 2008, IEEE, pp. 212-216.[3] I. William, S. Derek, and N. Genia, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufman, 2008, pp. 215-229.[4] R. J. Davenport, September 2007. [Online] ETL vs. ELT: A Subjective View. InSource IT Consulting Ltd., U.K. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf[5] L. Troy, C. Pydimukkala, How to Use PowerCenter with Teradata to Load and Unload Data, Informatica Corporation [Online], Available at: www.myinformatica.com

Page 24: 1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693

24

Questions ???