www.sagelogix.com advanced pl/sql and oracle etl doug cosman senior oracle dba sagelogix, inc....
TRANSCRIPT
www.SageLogix.com
Advanced PL/SQL and Oracle ETL
Doug CosmanSenior Oracle DBA
SageLogix, [email protected]
Open World 2003Open World 2003
www.SageLogix.com
AgendaAgenda
Overview of Oracle 9i ETLProvides Fast Transformations Using Only the 9i DB
Advanced PL/SQL Features Necessary for Understanding Oracle 9i ETL
PL/SQL Performance Techniques for Data Warehouse Environments
www.SageLogix.com
What Is ETL?What Is ETL?
ExtractPull the Data From the Source
TransformConvert the Input Format to the Target Format
Encode any Values
LoadInsert the Transformed Data to the Target Tables
www.SageLogix.com
Oracle 9i ETLOracle 9i ETL
ExtractOracle 9i External Tables
TransformPL/SQL Pipelined Table Functions
Oracle Warehouse Builder Can also be Used to Build Pipelined Table Functions
Maps Source Data Layout and Target Schema and Builds PL/SQL and SQL Code
LoadDirect Path Inserts
www.SageLogix.com
Performance FactorsPerformance Factors
SQL Execution TimeEfficiency of Execution Plan
Hardware Resource Waits
Code Logic Execution TimeSpeed of Host Language
Variable BindingTime to Bind Values Back to Host Language
www.SageLogix.com
PL/SQL BindingPL/SQL Binding
Types of BindsIN-Binds
Bind Values From Host Language to SQL Engine
OUT-Binds
Values are Returned from SQL Objects to Host Variables
Bind OptionsSingle Row Binds
Bulk Binds
www.SageLogix.com
Single Row BindsSingle Row Binds
Cursor FOR-LOOP
DECLARE CURSOR cust_cur (p_customer_id NUMBER) IS SELECT * FROM f_sales_detail WHERE customer_id = p_customer_id; v_customer_id NUMBER := 1234; BEGIN FOR rec IN cust_cur (v_customer_id) LOOP INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, rec.sales_id, sysdate); END LOOP;END;
www.SageLogix.com
Context SwitchingContext Switching
PL/SQL EngineDB SQL Engine
OUT-BIND
IN-BIND
www.SageLogix.com
Single Row BindsSingle Row Binds
The Most Expensive Operation by Far is the Binding
Single Row Binding is SLOW for Large Result Sets
www.SageLogix.com
Bulk BindingBulk Binding
PL/SQL Bulk Bind Support added in 8i
IN-Binds An Array of Values is Passed to the SQL Engine
OUT-Binds SQL Engine Populates a PL/SQL Bind-Array
Context Switch Once per Batch Instead of Once per Row
Performance Increase of Up to 15 Times
www.SageLogix.com
Bulk OperatorsBulk Operators
BULK COLLECTSpecifies that Bulk Fetches Should be Used
Be Careful to Handle Last Batch
LIMITDefines the Batch Size for Bulk Collections
FORALLBulk DML Operator
Not a Looping Construct like a Cursor-For-Loop
PL/SQL Table is Referenced in the Statement
DECLARE TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE
INDEX BY BINARY_INTEGER; sales_ids sales_t; v_customer_id NUMBER := 1234; max_rows CONSTANT NUMBER := 10000; CURSOR sales(p_customer_id NUMBER) IS SELECT sales_id FROM f_sales_detail WHERE customer_id = p_customer_id; BEGIN OPEN sales(v_customer_id); LOOP EXIT WHEN sales%NOTFOUND; FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows; FORALL i IN 1..sales_ids.COUNT INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, sales_ids(i), sysdate); END LOOP; CLOSE sales; END;
www.SageLogix.com
Native CompilationNative Compilation
Allows PL/SQL to be Executed as a Compiled C Program
Requires Native C Compiler on Host
Enabling Set init.ora PLSQL_* Parameters
Compile as Native Code
PL/SQL is First Compiled Down to P-Code
C Source Code is Generated from the P-Code
Native Compiler is Invoked Creating a ‘C’ Shared Object Library
Subsequent Calls to PL/SQL Object are Run by the ‘C’ Library
www.SageLogix.com
Native CompilationNative Compilation
Performance Language Execution Speed is About Five Times Faster when not Interacting with the Database
In Typical Code Interacting with Larger Data Volumes Execution Speed is Very Similar to Interpreted Code
Remember that Variable Binding can be a Bigger Factor than Code Execution Speed
Mixing Native and Interpreted PL/SQLOracle Recommends an All or None Approach for Production
Including User-Defined and Supplied Packages
www.SageLogix.com
Collection TypesCollection Types
Associative Arrays (PL/SQL Tables)PL/SQL Type Only
Nested TablesShared Type
VarraysShared Type
www.SageLogix.com
Associative ArraysAssociative Arrays
PL/SQL Type OnlyNot a SQL Type
Easy to UseAutomatic Element Allocation
No Need to Initialize
Two Kinds in 9i Release 2INDEX BY BINARY_INTEGER
INDEX BY VARCHAR2
Similar to:
Java Hashtables
Perl and Awk Associative Arrays
www.SageLogix.com
Associative ArraysAssociative Arrays
DECLARE TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30); email_map hash_table_t;
CURSOR users IS SELECT username, user_id FROM dba_users;BEGIN FOR user IN users LOOP email_map(user.username) := user.user_id; END LOOP;END;
www.SageLogix.com
Multi-Dimensional Arrays
Multi-Dimensional Arrays
New in 9i Release 1Implemented as Collection of Collections
DECLARE TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER; TYPE twoDimensional IS TABLE OF element INDEX BY BINARY_INTEGER; twoD twoDimensional; BEGIN twoD(1)(1) := 123; twoD(1)(2) := 456; END;
www.SageLogix.com
Nested TablesNested Tables
No Maximum Size
Harder to Use than Associative ArraysNeed to be Initialized
Code Must Explicitly Allocate New Elements
Shared Type with SQL
Two Options for Type DefinitionLocal PL/SQL Definition
Global SQL Type Declared in the Database
Allows Variables to be Shared Between Both Environments
www.SageLogix.com
Nested TablesNested Tables
PL/SQL Scoped Type
DECLARE
TYPE nest_tab_t IS TABLE OF NUMBER;
nt nest_tab_t := nest_tab_t();
BEGIN
FOR i IN 1..100 LOOP
nt.EXTEND;
nt(i) := i;
END LOOP;
END;
www.SageLogix.com
Nested TablesNested Tables
Globally Defined in SQL
CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT
( email_id NUMBER,
demo_code NUMBER,
value VARCHAR2(30) );
/
CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t;
/
www.SageLogix.com
Nested TablesNested Tables
SQL-Defined Nested TablesPL/SQL Variables can be Manipulated by the SQL Engine
Local PL/SQL Variables Can Be:
Sorted
Aggregated
Used for Dynamic In-Lists
Joined With SQL Tables
Joined with Other PL/SQL Nested Tables
www.SageLogix.com
Table FunctionsTable Functions
Nested Tables Enable Table Functions
SELECT * FROM TABLE( CAST(eml_dmo_nt AS email_demo_nt_t) )
TABLE OperatorTells Oracle to Treat the Variable as a SQL Table
CAST OperatorExplicitly Tells Oracle the Data Type to be Used to Handle the Operation
www.SageLogix.com
Table Function Example
Table Function Example
DECLARE eml_dmo_nt email_demo_nt_t := email_demo_nt_t(); BEGIN -- Some logic that populates the nested table … eml_dmo_nt.EXTEND(3); eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23'); eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41'); eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k'); -- Process the data in assending order of email id. FOR r IN (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) ORDER BY 1) LOOP dbms_output.put_line(r.email_id || ' ' || r.demo_id); END LOOP; END;
www.SageLogix.com
Returning Result SetsReturning Result Sets
Returning Collections DirectlyReturn the Data Structure Itself
Returning Reference CursorsReturns and Open Cursor to an Application
Doesn’t Return Data from PL/SQL Directly
Calling a Table Function from the SQL Context
Convert Function Return Value into a Cursor
www.SageLogix.com
Returning CollectionsReturning Collections
Return a Collection Type ExplicitlyBest Suited for PL/SQL Calling Programs
FUNCTION get_email_demo(p_email_id NUMBER) RETURN email_demo_nt_t IS eml_dmo email_demo_nt_t; BEGIN SELECT email_demo_obj_t(email_id, demo_id, value) BULK COLLECT INTO eml_dmo FROM email_demographic WHERE email_id = p_email_id; -- Apply some business logic on the nested table here. RETURN eml_dmo; END;
www.SageLogix.com
Table FunctionsTable Functions
Can be Used in a SQL Context TooA Table Function Takes a Collection Type as an Argument
A Function that Returns a Collection Works Too
Allows us to Pass Out PL/SQL Collections as a Cursor to any Host Language
SELECT * FROM
TABLE( CAST( get_email_demo(45) AS email_demo_nt_t));
www.SageLogix.com
Table FunctionsTable Functions
Data is Buffered in the Local Variable During Function Execution
Cursor Returns Rows after Function CompletesPrivate Memory Issues if the Result Set is LargeNeed a Way to Stream Results
9i Pipelined Table FunctionsProvides a Streaming InterfaceRows are Returned as they are ProducedRows are Actually Buffered in Small Batches
Remember Bulk Binding Issue?Can be Run in ParallelPIPELINED KeywordPIPE ROW Operator
www.SageLogix.com
Pipelined Table Function
Pipelined Table Function
FUNCTION get_email_demo RETURN email_demo_nt_t PIPELINED IS CURSOR email_demo_cur IS SELECT email_demo_obj_t(email_id, demo_id, value) FROM email_demographic; eml_dmo_nt email_demo_nt_t; BEGIN OPEN email_demo_cur; LOOP EXIT WHEN email_demo_cur%NOTFOUND; FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000; FOR i IN 1..eml_dmo_nt.COUNT LOOP /* Apply some business logic on the object here, and return a row. */ PIPE ROW (eml_dmo_nt(i)); END LOOP; END LOOP; RETURN;END;
www.SageLogix.com
External TablesExternal Tables
One Last Piece of Background Information
Oracle 9i External TablesProvides a Way for Oracle to Read Directly from Flat Files on the Database Server
File can be Queried as if it is a Real Database Table
Can Sort, Aggregate, Filter Rows, etc.
External File Can be Queried in Parallel
Only Table Definition is Stored in the Database
Data is ‘External’
Table Definition is Similar to SQL*Loader Control File
www.SageLogix.com
External TablesExternal Tables
CREATE TABLE ext_tab (email VARCHAR2(50), age NUMBER, income VARCHAR2(20))ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY data_dirACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE LOGFILE data_dir: 'ext_tab.log' BADFILE data_dir: 'ext_tab.bad' FIELDS TERMINATED BY ',' MISSING FIELD VALUES ARE NULL (email CHAR(50), age INTEGER EXTERNAL(2), income CHAR(20) ) ) LOCATION ('ext_tab.dat') ) REJECT LIMIT UNLIMITED;
www.SageLogix.com
ETL ExampleETL Example
EMAIL AGE INCOME
[email protected] 58 over_100k
EMAIL_ID DEMO_CODE VALUE
2345 3 58
2345 7 over_100k
• Normalize, Encode and Pivot Input Record
PACKAGE BODY etl IS TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(30); email_map hash_table_t; FUNCTION transform (new_data SYS_REFCURSOR) RETURN email_demo_nt_t PIPELINED PARALLEL_ENABLE ( PARTITION new_data BY ANY ) IS TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE INDEX BY BINARY_INTEGER; indata ext_tab_array; email_demo_obj email_demo_obj_t := email_demo_obj_t(null,null,null); demo_map hash_table_t; BEGIN LOOP EXIT WHEN new_data%NOTFOUND; FETCH new_data BULK COLLECT INTO indata LIMIT 1000; FOR i IN 1..indata.COUNT LOOP email_demo_obj.email_id := email_map(indata(i).email); email_demo_obj.demo_code := 3; email_demo_obj.value := indata(i).age; PIPE ROW (email_demo_obj); email_demo_obj.demo_code := 7; email_demo_obj.value := indata(i).income; PIPE ROW (email_demo_obj); END LOOP; END LOOP; RETURN; END;BEGIN FOR email IN (SELECT email_id, email FROM email) LOOP email_map(email.email) := email.email_id; END LOOP;END;
www.SageLogix.com
Oracle 9i ETLOracle 9i ETL
Transformation is Just a Simple INSERT as SELECT
Elegant Solution to Parallel, Transactional Co-processing
INSERT /*+ append nologging */ INTO email_demographic
(SELECT /*+ parallel( a, 4 ) */ *
FROM TABLE(
CAST( etl.transform( CURSOR(SELECT * FROM ext_tab )) AS email_demo_nt_t)) a);
www.SageLogix.com
Parallel Co-processingParallel Co-processing
DB
PL/SQL
PL/SQL
PL/SQL
PL/SQL
PQ Slave
PQ Slave
PQ Slave
PQ Slave
InputFile INSERT
Extract Transform Load
www.SageLogix.com
Performance IssuesPerformance Issues
Speed is Respectable but There is a Performance Bottleneck with the Table Function Mechanism
Possibly an Issue Binding Data Back from the SQL Engine
Throughput is about Three Times Slower than Coding with BULK COLLECT and FORALL Operators
However These Don’t Support Parallel Operations
Oracle Expects to have it Fixed in Next Release
www.SageLogix.com
ETL AlternativesETL Alternatives
The Multi-Table INSERT Statement
New in 9i
Each Sub-Query Input Row Can be INSERT’ed to a Different Table
… or the Same Table Multiple Times
Faster than Using PL/SQLIt’s Always Faster to do Something in Pure SQL than Using Any Host Language
Binding is Avoided
www.SageLogix.com
Multi-Table InsertMulti-Table Insert
INSERT /*+ append nologging */ ALLINTO email_demographic (email_id, demo_id, value)VALUES (email_id, 3, age)INTO email_demographic (email_id, demo_id, value)VALUES (email_id, 7, income) (SELECT /*+ ordered index( b ) */ b.email_id, a.income, a.age FROM ext_tab a, email b WHERE a.email = b.email);
www.SageLogix.com
SQL-Only ProcessingSQL-Only Processing
DB
PQ Slave
PQ Slave
PQ Slave
PQ Slave
InputFile INSERT
Extract Transform Load
www.SageLogix.com
Performance SolutionsPerformance Solutions
Minimize SQL Execution TimeExploiting Caching to Eliminate Some SQL Look-ups and Joins
Direct Path Inserts
Code Logic Execution TimeReplacing Interpreted PL/SQL with Native Compilation
Eliminating Host Language Using Multi-Table INSERTS
Variable BindingReplace Single Row Binds with Bulk Binds
www.SageLogix.com
ConclusionConclusion
Oracle 9i ETL is a High Performance ETL Solution
Especially Once the Table Function Issue is Resolved
Already Included in the Cost of the RDBMS