Page 1: Advanced PL/SQL and Oracle ETL

Doug Cosman, Senior Oracle DBA

SageLogix, Inc. (www.SageLogix.com)

doug_cosman@yahoo.com

Open World 2003

Page 2: Agenda

Overview of Oracle 9i ETL
  Provides Fast Transformations Using Only the 9i DB

Advanced PL/SQL Features Necessary for Understanding Oracle 9i ETL

PL/SQL Performance Techniques for Data Warehouse Environments

Page 3: What Is ETL?

Extract
  Pull the Data From the Source

Transform
  Convert the Input Format to the Target Format
  Encode any Values

Load
  Insert the Transformed Data into the Target Tables

Page 4: Oracle 9i ETL

Extract
  Oracle 9i External Tables

Transform
  PL/SQL Pipelined Table Functions
  Oracle Warehouse Builder Can Also be Used to Build Pipelined Table Functions
    Maps Source Data Layout and Target Schema and Builds PL/SQL and SQL Code

Load
  Direct Path Inserts

Page 5: Performance Factors

SQL Execution Time
  Efficiency of Execution Plan
  Hardware Resource Waits

Code Logic Execution Time
  Speed of Host Language

Variable Binding
  Time to Bind Values Back to Host Language

Page 6: PL/SQL Binding

Types of Binds
  IN-Binds
    Bind Values From Host Language to SQL Engine
  OUT-Binds
    Values are Returned from SQL Objects to Host Variables

Bind Options
  Single Row Binds
  Bulk Binds

Page 7: Single Row Binds

Cursor FOR-LOOP

DECLARE
  CURSOR cust_cur (p_customer_id NUMBER) IS
    SELECT *
      FROM f_sales_detail
     WHERE customer_id = p_customer_id;

  v_customer_id NUMBER := 1234;
BEGIN
  FOR rec IN cust_cur (v_customer_id) LOOP
    INSERT INTO sales_hist (customer_id, detail_id, process_date)
    VALUES (v_customer_id, rec.sales_id, sysdate);
  END LOOP;
END;

Page 8: Context Switching

[Diagram: the PL/SQL Engine and the DB SQL Engine side by side, connected by an IN-BIND from the PL/SQL Engine to the SQL Engine and an OUT-BIND back from the SQL Engine.]

Page 9: Single Row Binds

The Most Expensive Operation by Far is the Binding

Single Row Binding is SLOW for Large Result Sets

Page 10: Bulk Binding

PL/SQL Bulk Bind Support Added in 8i

IN-Binds
  An Array of Values is Passed to the SQL Engine

OUT-Binds
  SQL Engine Populates a PL/SQL Bind-Array

Context Switch Once per Batch Instead of Once per Row

Performance Increase of Up to 15 Times

Page 11: Bulk Operators

BULK COLLECT
  Specifies that Bulk Fetches Should be Used
  Be Careful to Handle Last Batch

LIMIT
  Defines the Batch Size for Bulk Collections

FORALL
  Bulk DML Operator
  Not a Looping Construct like a Cursor-For-Loop
  PL/SQL Table is Referenced in the Statement

Page 12:

DECLARE
  TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE
    INDEX BY BINARY_INTEGER;

  sales_ids     sales_t;
  v_customer_id NUMBER := 1234;
  max_rows      CONSTANT NUMBER := 10000;

  CURSOR sales (p_customer_id NUMBER) IS
    SELECT sales_id
      FROM f_sales_detail
     WHERE customer_id = p_customer_id;
BEGIN
  OPEN sales (v_customer_id);
  LOOP
    -- %NOTFOUND is NULL before the first fetch, so this test only fires
    -- after the final (possibly partial) batch has been inserted below.
    EXIT WHEN sales%NOTFOUND;
    FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows;

    -- The whole batch is bound to the SQL engine in a single call.
    FORALL i IN 1..sales_ids.COUNT
      INSERT INTO sales_hist (customer_id, detail_id, process_date)
      VALUES (v_customer_id, sales_ids(i), sysdate);
  END LOOP;
  CLOSE sales;
END;

Page 13: Native Compilation

Allows PL/SQL to be Executed as a Compiled C Program

Requires Native C Compiler on Host

Enabling
  Set init.ora PLSQL_* Parameters
  Compile as Native Code
    PL/SQL is First Compiled Down to P-Code
    C Source Code is Generated from the P-Code
    Native Compiler is Invoked Creating a ‘C’ Shared Object Library
    Subsequent Calls to PL/SQL Object are Run by the ‘C’ Library
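As a rough sketch of the enabling steps on a 9i database (not part of the original slides): the directory and makefile paths and the procedure name load_sales_hist are assumptions for illustration only, and the right values depend on the platform and installation.

```sql
-- init.ora entries (assumed paths; adjust for the actual host):
--   plsql_native_library_dir    = /u01/app/oracle/plsql_libs
--   plsql_native_make_utility   = /usr/bin/make
--   plsql_native_make_file_name = /u01/app/oracle/product/9.2.0/plsql/spnc_makefile.mk

-- Ask for native compilation for anything compiled in this session.
ALTER SESSION SET plsql_compiler_flags = 'NATIVE';

-- Recompile a (hypothetical) procedure so that it picks up the setting.
ALTER PROCEDURE load_sales_hist COMPILE;

-- Check how the object was compiled.
SELECT object_name, object_type, param_value
  FROM user_stored_settings
 WHERE UPPER(param_name) = 'PLSQL_COMPILER_FLAGS'
   AND object_name = 'LOAD_SALES_HIST';
```

Setting the flag back to 'INTERPRETED' and recompiling should return the object to p-code execution.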

Page 14: Native Compilation

Performance
  Language Execution Speed is About Five Times Faster when not Interacting with the Database
  In Typical Code Interacting with Larger Data Volumes Execution Speed is Very Similar to Interpreted Code
  Remember that Variable Binding can be a Bigger Factor than Code Execution Speed

Mixing Native and Interpreted PL/SQL
  Oracle Recommends an All or None Approach for Production
  Including User-Defined and Supplied Packages

Page 15: Collection Types

Associative Arrays (PL/SQL Tables)
  PL/SQL Type Only

Nested Tables
  Shared Type

Varrays
  Shared Type

Page 16: Associative Arrays

PL/SQL Type Only
  Not a SQL Type

Easy to Use
  Automatic Element Allocation
  No Need to Initialize

Two Kinds in 9i Release 2
  INDEX BY BINARY_INTEGER
  INDEX BY VARCHAR2

Similar to:
  Java Hashtables
  Perl and Awk Associative Arrays

Page 17: Associative Arrays

DECLARE
  TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30);
  email_map hash_table_t;

  CURSOR users IS
    SELECT username, user_id
      FROM dba_users;
BEGIN
  -- The loop record is named usr so that it does not shadow the built-in USER function.
  FOR usr IN users LOOP
    email_map(usr.username) := usr.user_id;
  END LOOP;
END;
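A minimal sketch, not from the original slides, of how a VARCHAR2-indexed associative array can be probed and traversed once it is populated. The variable names are hypothetical; the codes 3 and 7 are the age and income demo codes used later in the deck.

```sql
DECLARE
  TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30);
  code_map hash_table_t;
  k        VARCHAR2(30);
BEGIN
  -- Elements are allocated automatically on first assignment.
  code_map('age')    := 3;
  code_map('income') := 7;

  -- Test for a key before reading it; reading a missing key raises NO_DATA_FOUND.
  IF code_map.EXISTS('income') THEN
    dbms_output.put_line('income -> ' || code_map('income'));
  END IF;

  -- Walk every key with FIRST/NEXT; NEXT returns NULL after the last key.
  k := code_map.FIRST;
  WHILE k IS NOT NULL LOOP
    dbms_output.put_line(k || ' = ' || code_map(k));
    k := code_map.NEXT(k);
  END LOOP;
END;
/
```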

Page 18: Multi-Dimensional Arrays

New in 9i Release 1
  Implemented as Collection of Collections

DECLARE
  TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  TYPE twoDimensional IS TABLE OF element INDEX BY BINARY_INTEGER;
  twoD twoDimensional;
BEGIN
  twoD(1)(1) := 123;
  twoD(1)(2) := 456;
END;

Page 19: Nested Tables

No Maximum Size

Harder to Use than Associative Arrays
  Need to be Initialized
  Code Must Explicitly Allocate New Elements

Shared Type with SQL

Two Options for Type Definition
  Local PL/SQL Definition
  Global SQL Type Declared in the Database
    Allows Variables to be Shared Between Both Environments

Page 20: Nested Tables

PL/SQL Scoped Type

DECLARE
  TYPE nest_tab_t IS TABLE OF NUMBER;
  nt nest_tab_t := nest_tab_t();
BEGIN
  FOR i IN 1..100 LOOP
    nt.EXTEND;
    nt(i) := i;
  END LOOP;
END;

Page 21: Nested Tables

Globally Defined in SQL

CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT
( email_id  NUMBER,
  demo_code NUMBER,
  value     VARCHAR2(30) );
/

CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t;
/

Page 22: Nested Tables

SQL-Defined Nested Tables
  PL/SQL Variables can be Manipulated by the SQL Engine
  Local PL/SQL Variables Can Be:
    Sorted
    Aggregated
    Used for Dynamic In-Lists
    Joined With SQL Tables
    Joined with Other PL/SQL Nested Tables

Page 23: Table Functions

Nested Tables Enable Table Functions

SELECT * FROM TABLE( CAST(eml_dmo_nt AS email_demo_nt_t) )

TABLE Operator
  Tells Oracle to Treat the Variable as a SQL Table

CAST Operator
  Explicitly Tells Oracle the Data Type to be Used to Handle the Operation

Page 24: Table Function Example

DECLARE
  eml_dmo_nt email_demo_nt_t := email_demo_nt_t();
BEGIN
  -- Some logic that populates the nested table …
  eml_dmo_nt.EXTEND(3);
  eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23');
  eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41');
  eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k');

  -- Process the data in ascending order of email id.
  FOR r IN (SELECT *
              FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t))
             ORDER BY 1) LOOP
    -- The object attribute is demo_code (see the type definition on Page 21).
    dbms_output.put_line(r.email_id || ' ' || r.demo_code);
  END LOOP;
END;
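The same TABLE(CAST(...)) pattern also covers the dynamic in-list and aggregation cases from Page 22. The sketch below is an illustration rather than the author's code: the scalar type num_nt_t and the variable names are assumptions, while email_demographic is the target table used elsewhere in the deck.

```sql
-- Hypothetical SQL-level collection of numbers, declared once in the schema.
CREATE OR REPLACE TYPE num_nt_t AS TABLE OF NUMBER;
/

DECLARE
  ids   num_nt_t := num_nt_t(18, 22, 45);   -- the in-list, built at run time
  v_cnt NUMBER;
  v_max NUMBER;
BEGIN
  -- Dynamic in-list: one statement shape however many ids are supplied.
  -- COLUMN_VALUE is the column name Oracle gives a collection of scalars.
  SELECT COUNT(*)
    INTO v_cnt
    FROM email_demographic
   WHERE email_id IN (SELECT column_value
                        FROM TABLE(CAST(ids AS num_nt_t)));

  -- Aggregation over the PL/SQL variable itself.
  SELECT MAX(column_value)
    INTO v_max
    FROM TABLE(CAST(ids AS num_nt_t));

  dbms_output.put_line('matching rows: ' || v_cnt || ', largest id: ' || v_max);
END;
/
```

Because the list lives in a collection, the statement text stays the same between calls whatever the number of ids.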

Page 25: Returning Result Sets

Returning Collections Directly
  Return the Data Structure Itself

Returning Reference Cursors
  Returns an Open Cursor to an Application
  Doesn’t Return Data from PL/SQL Directly

Calling a Table Function from the SQL Context
  Convert Function Return Value into a Cursor
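For comparison with the collection-returning functions on the following pages, a reference-cursor version might look like the sketch below. The function name get_email_demo_cur is hypothetical; the table and columns are the ones used in the deck's other examples.

```sql
CREATE OR REPLACE FUNCTION get_email_demo_cur (p_email_id NUMBER)
  RETURN SYS_REFCURSOR
IS
  rc SYS_REFCURSOR;
BEGIN
  -- Only an open cursor is handed back; no rows pass through PL/SQL variables.
  OPEN rc FOR
    SELECT email_id, demo_id, value
      FROM email_demographic
     WHERE email_id = p_email_id;
  RETURN rc;   -- the caller fetches from the cursor and closes it
END;
/
```

From SQL*Plus this could be exercised with VARIABLE rc REFCURSOR, then EXEC :rc := get_email_demo_cur(45), then PRINT rc.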

Page 26: Returning Collections

Return a Collection Type Explicitly
  Best Suited for PL/SQL Calling Programs

FUNCTION get_email_demo (p_email_id NUMBER)
  RETURN email_demo_nt_t
IS
  eml_dmo email_demo_nt_t;
BEGIN
  SELECT email_demo_obj_t(email_id, demo_id, value)
    BULK COLLECT INTO eml_dmo
    FROM email_demographic
   WHERE email_id = p_email_id;

  -- Apply some business logic on the nested table here.
  RETURN eml_dmo;
END;

Page 27: Table Functions

Can be Used in a SQL Context Too
  A Table Function Takes a Collection Type as an Argument
  A Function that Returns a Collection Works Too
  Allows us to Pass Out PL/SQL Collections as a Cursor to any Host Language

SELECT *
  FROM TABLE( CAST( get_email_demo(45) AS email_demo_nt_t));

Page 28: Table Functions

Data is Buffered in the Local Variable During Function Execution
  Cursor Returns Rows after Function Completes
  Private Memory Issues if the Result Set is Large
  Need a Way to Stream Results

9i Pipelined Table Functions
  Provides a Streaming Interface
  Rows are Returned as they are Produced
  Rows are Actually Buffered in Small Batches
    Remember Bulk Binding Issue?
  Can be Run in Parallel
  PIPELINED Keyword
  PIPE ROW Operator

Page 29: Pipelined Table Function

FUNCTION get_email_demo
  RETURN email_demo_nt_t PIPELINED
IS
  CURSOR email_demo_cur IS
    SELECT email_demo_obj_t(email_id, demo_id, value)
      FROM email_demographic;

  eml_dmo_nt email_demo_nt_t;
BEGIN
  OPEN email_demo_cur;
  LOOP
    EXIT WHEN email_demo_cur%NOTFOUND;
    FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000;

    FOR i IN 1..eml_dmo_nt.COUNT LOOP
      /* Apply some business logic on the object here, and return a row. */
      PIPE ROW (eml_dmo_nt(i));
    END LOOP;
  END LOOP;
  CLOSE email_demo_cur;   -- release the cursor once the last batch is piped
  RETURN;
END;

Page 30: External Tables

One Last Piece of Background Information

Oracle 9i External Tables
  Provides a Way for Oracle to Read Directly from Flat Files on the Database Server
  File can be Queried as if it is a Real Database Table
    Can Sort, Aggregate, Filter Rows, etc.
  External File Can be Queried in Parallel
  Only Table Definition is Stored in the Database
    Data is ‘External’
  Table Definition is Similar to SQL*Loader Control File

Page 31: External Tables

CREATE TABLE ext_tab (
  email  VARCHAR2(50),
  age    NUMBER,
  income VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    LOGFILE data_dir:'ext_tab.log'
    BADFILE data_dir:'ext_tab.bad'
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    ( email  CHAR(50),
      age    INTEGER EXTERNAL(2),
      income CHAR(20) )
  )
  LOCATION ('ext_tab.dat')
)
REJECT LIMIT UNLIMITED;
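The definition above depends on a directory object named data_dir, and once the table exists the flat file can be queried like any other table. The statements below are a sketch under assumptions: the file system path and the etl_user account are hypothetical and do not come from the slides.

```sql
-- Directory object pointing at the folder that holds ext_tab.dat (assumed path).
CREATE OR REPLACE DIRECTORY data_dir AS '/data/etl/incoming';

-- READ for the data file, WRITE for the log and bad files (hypothetical user).
GRANT READ, WRITE ON DIRECTORY data_dir TO etl_user;

-- Filter, aggregate and sort the file contents without loading them first.
SELECT income,
       COUNT(*) AS emails,
       AVG(age) AS avg_age
  FROM ext_tab
 WHERE age IS NOT NULL
 GROUP BY income
 ORDER BY emails DESC;
```

Each query re-reads the file, so results reflect whatever ext_tab.dat contains at execution time.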

Page 32: ETL Example

Input record:

EMAIL              AGE  INCOME
[email protected]  58   over_100k

Output rows:

EMAIL_ID  DEMO_CODE  VALUE
2345      3          58
2345      7          over_100k

• Normalize, Encode and Pivot Input Record

Page 33:

PACKAGE BODY etl IS

  -- Key width matches ext_tab.email, VARCHAR2(50), so any email read from the file fits the index.
  TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(50);
  email_map hash_table_t;

  FUNCTION transform (new_data SYS_REFCURSOR)
    RETURN email_demo_nt_t
    PIPELINED
    PARALLEL_ENABLE (PARTITION new_data BY ANY)
  IS
    TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE
      INDEX BY BINARY_INTEGER;

    indata         ext_tab_array;
    email_demo_obj email_demo_obj_t := email_demo_obj_t(null, null, null);
    demo_map       hash_table_t;
  BEGIN
    LOOP
      EXIT WHEN new_data%NOTFOUND;
      FETCH new_data BULK COLLECT INTO indata LIMIT 1000;

      FOR i IN 1..indata.COUNT LOOP
        -- Each input row is pivoted into two output rows sharing one email_id.
        email_demo_obj.email_id  := email_map(indata(i).email);

        email_demo_obj.demo_code := 3;   -- age
        email_demo_obj.value     := indata(i).age;
        PIPE ROW (email_demo_obj);

        email_demo_obj.demo_code := 7;   -- income
        email_demo_obj.value     := indata(i).income;
        PIPE ROW (email_demo_obj);
      END LOOP;
    END LOOP;
    RETURN;
  END;

BEGIN
  -- Package initialization: cache the email-to-id lookup once per session.
  FOR email IN (SELECT email_id, email FROM email) LOOP
    email_map(email.email) := email.email_id;
  END LOOP;
END;

Page 34: Oracle 9i ETL

Transformation is Just a Simple INSERT as SELECT

Elegant Solution to Parallel, Transactional Co-processing

INSERT /*+ append nologging */ INTO email_demographic
  (SELECT /*+ parallel( a, 4 ) */ *
     FROM TABLE(
            CAST( etl.transform( CURSOR(SELECT * FROM ext_tab) )
                  AS email_demo_nt_t)) a);

Page 35: Parallel Co-processing

[Diagram: an Input File (Extract) feeds four PQ Slaves, each running PL/SQL (Transform), whose output is INSERTed into the DB (Load).]

Page 36: Performance Issues

Speed is Respectable but There is a Performance Bottleneck with the Table Function Mechanism
  Possibly an Issue Binding Data Back from the SQL Engine

Throughput is about Three Times Slower than Coding with BULK COLLECT and FORALL Operators
  However These Don’t Support Parallel Operations

Oracle Expects to have it Fixed in Next Release
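For reference, the serial BULK COLLECT and FORALL alternative mentioned above could be sketched as follows. This is an illustration under assumptions, not the code the author benchmarked: the collection type and variable names are hypothetical, and separate scalar collections are used so that FORALL binds plain arrays rather than record fields.

```sql
DECLARE
  TYPE map_t  IS TABLE OF NUMBER       INDEX BY VARCHAR2(50);
  TYPE num_t  IS TABLE OF NUMBER       INDEX BY BINARY_INTEGER;
  TYPE vc50_t IS TABLE OF VARCHAR2(50) INDEX BY BINARY_INTEGER;
  TYPE vc30_t IS TABLE OF VARCHAR2(30) INDEX BY BINARY_INTEGER;

  email_map map_t;                               -- email -> email_id cache
  emails    vc50_t;  ages num_t;  incomes vc30_t;       -- one input batch
  out_ids   num_t;   out_codes num_t;  out_vals vc30_t; -- pivoted output batch
  n         PLS_INTEGER;

  CURSOR src IS SELECT email, age, income FROM ext_tab;
BEGIN
  FOR e IN (SELECT email_id, email FROM email) LOOP
    email_map(e.email) := e.email_id;
  END LOOP;

  OPEN src;
  LOOP
    FETCH src BULK COLLECT INTO emails, ages, incomes LIMIT 1000;
    EXIT WHEN emails.COUNT = 0;   -- nothing fetched: all batches are done

    out_ids.DELETE;  out_codes.DELETE;  out_vals.DELETE;
    n := 0;
    FOR i IN 1..emails.COUNT LOOP
      -- Each input row becomes two output rows (demo codes 3 and 7).
      n := n + 1;
      out_ids(n)   := email_map(emails(i));
      out_codes(n) := 3;
      out_vals(n)  := TO_CHAR(ages(i));
      n := n + 1;
      out_ids(n)   := email_map(emails(i));
      out_codes(n) := 7;
      out_vals(n)  := incomes(i);
    END LOOP;

    -- One bulk bind per batch instead of one bind per row.
    FORALL j IN 1..n
      INSERT INTO email_demographic (email_id, demo_id, value)
      VALUES (out_ids(j), out_codes(j), out_vals(j));
  END LOOP;
  CLOSE src;
  COMMIT;
END;
/
```

As in the package on Page 33, an email missing from the lookup table raises NO_DATA_FOUND; a production version would need to decide how to handle such rows.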

Page 37: ETL Alternatives

The Multi-Table INSERT Statement
  New in 9i
  Each Sub-Query Input Row Can be INSERT’ed to a Different Table
  … or the Same Table Multiple Times

Faster than Using PL/SQL
  It’s Always Faster to do Something in Pure SQL than Using Any Host Language
  Binding is Avoided

Page 38: Multi-Table Insert

INSERT /*+ append nologging */ ALL
  INTO email_demographic (email_id, demo_id, value)
  VALUES (email_id, 3, age)
  INTO email_demographic (email_id, demo_id, value)
  VALUES (email_id, 7, income)
  (SELECT /*+ ordered index( b ) */
          b.email_id, a.income, a.age
     FROM ext_tab a, email b
    WHERE a.email = b.email);

Page 39: SQL-Only Processing

[Diagram: an Input File (Extract) feeds four PQ Slaves (Transform), whose output is INSERTed into the DB (Load); no PL/SQL layer is involved.]

Page 40: Performance Solutions

Minimize SQL Execution Time
  Exploiting Caching to Eliminate Some SQL Look-ups and Joins
  Direct Path Inserts

Code Logic Execution Time
  Replacing Interpreted PL/SQL with Native Compilation
  Eliminating Host Language Using Multi-Table INSERTS

Variable Binding
  Replace Single Row Binds with Bulk Binds

Page 41: Conclusion

Oracle 9i ETL is a High Performance ETL Solution
  Especially Once the Table Function Issue is Resolved

Already Included in the Cost of the RDBMS