UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Oracle Open World 2014 Scaling To Infinity Partitioning Data Warehouses on Oracle Database Session UGF-3587 Sunday 28-Sep 2014 Tim Gorman - www.Delphix.com


Page 1: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Oracle Open World 2014

Scaling To Infinity: Partitioning Data Warehouses on Oracle Database

Session UGF-3587 Sunday 28-Sep 2014

Tim Gorman - www.Delphix.com

Page 2: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Agenda
• Stating the problem
• Brief overview of star schemas
• Brief overview of star transformations
• The virtuous cycle
• The death spiral
• EXCHANGE PARTITION load technique in detail
• More EXCHANGE PARTITION ideas

Page 3: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

A short story…
• Background: DBA support for about 70 data warehouse databases at a large telecommunications company
  o Emails about “out of space in TEMP tablespace” scraped out of the “alert.log” file in a QA/TEST database
    • Offshore DBAs already reacting by killing sessions
  o QA/TEST database already has 56G temp, and PROD has almost 300G temp
  o Reviewing AWR reports reveals a parallel aggregated query against a view comprising a “star schema” between a fact table and 33 dimension tables
    • Further analysis reveals that none of the dimension-key columns on the fact table are supported by bitmap indexes
    • Thus, the “star join” consists of a partition-pruned FULL table scan on the fact, followed by 33 HASH joins in parallel to the dimensions

Page 4: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

A short story…
• When asked about the removed bitmap indexes…
  o An Informatica ETL developer commented… “I don’t know why all those indexes were removed”
    • It was because the bitmap indexes made ETL data loading into the fact table utterly impossible
  o Project manager confirmed this, adding… “Once the indexes were dropped, everything worked great”
  o DW architect added… “This setup has been running in PROD for 10 months!”
  o In other words, everything is OK! Stop causing trouble!
  o When asked how the end-users felt about performance… …not good at all… …bring on Netezza, Exadata, etc...

Page 5: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Why Star Schemas?
• BI analysts just want a big spreadsheet
  o Lots and lots of attribute and measure columns
    • Attributes characterize the data
    • Measures are usually additive and numeric
• Dimensional data model is really just that spreadsheet
  o Normalized to recursive depth of one
    • More sophisticated models might normalize further (snowflake)
• Normalized entities are dimension tables
  o Columns are primary key and attribute columns
• Original spreadsheet is the fact table
  o Columns are foreign-keys to dimensions and measures
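As a rough illustration of that shape, here is a minimal sketch in Oracle DDL. The table and column names (CUSTOMERS_DIM, PRODUCTS_DIM, ORDER_FACTS and their columns) are assumptions chosen for the example, not objects from the presentation.

```sql
-- Hypothetical dimension tables: a primary key plus descriptive attributes
CREATE TABLE customers_dim (
  cust_key    NUMBER        NOT NULL,
  cust_name   VARCHAR2(100),
  cust_region VARCHAR2(30),
  CONSTRAINT customers_dim_pk PRIMARY KEY (cust_key)
);

CREATE TABLE products_dim (
  prod_key      NUMBER        NOT NULL,
  prod_name     VARCHAR2(100),
  prod_category VARCHAR2(30),
  CONSTRAINT products_dim_pk PRIMARY KEY (prod_key)
);

-- Hypothetical fact table: foreign keys to the dimensions plus additive,
-- numeric measures (the "original spreadsheet" with attributes factored out)
CREATE TABLE order_facts (
  order_date DATE   NOT NULL,
  cust_key   NUMBER NOT NULL REFERENCES customers_dim (cust_key),
  prod_key   NUMBER NOT NULL REFERENCES products_dim (prod_key),
  quantity   NUMBER,
  amount     NUMBER(12,2)
);
```

The attributes live once in each dimension, and the fact table carries only keys and measures, which is the "normalized to a depth of one" idea above.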

Page 6: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Why Star Schemas?
[Diagram: Entity-Relational Modeling (transactional/operational: Customers, Orders, Order Lines, Products, Suppliers) versus Dimensional Modeling (Order Facts with Products Dim, Suppliers Dim, Customers Dim, Time Dim)]

Page 7: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Why Star Transformations?
Star transformation compared to other join methods (NL, SM, HA). The other join methods:
• Filter result set in one of the dimension tables
• Join from that dimension table to the fact along a low-cardinality dimension key
• Join back from fact to other dimensions using dimension PK
  o Filtering rows retrieved along the way
• Wasteful and inefficient
[Diagram: Fact table with Dim Table 1, Dim Table 2, Dim Table 3, Dim Table 4]

Page 8: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Why Star Transformations?
Star transformation:
• Filter on query set in each dimension
• Merge result set from all dimensions
• Join to the fact from merged result set, using BITMAP MERGE index scan
[Diagram: Dim Table 1, Dim Table 2, Dim Table 3, Dim Table 4]

Page 9: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Why Star Transformations?
• Point: Single-column bitmap indexes on dimension-key columns are required for star transformation
• Counter-point: Bitmap indexes become impossible to load/maintain when data volume increases past dozens of GB
• Catch-22? Does this mean that Oracle cannot handle large data warehouses?
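To make the "Point" concrete, here is a minimal sketch of the prerequisites for a star transformation. All object names (STORE_DIM, ITEM_DIM, SALES_FACT and the index names) are hypothetical. Whether the optimizer actually picks a star transformation depends on costing and on the edition and version in use, so treat this as an illustration rather than a guaranteed plan.

```sql
-- Hypothetical dimensions
CREATE TABLE store_dim (store_key NUMBER PRIMARY KEY, region   VARCHAR2(30));
CREATE TABLE item_dim  (item_key  NUMBER PRIMARY KEY, category VARCHAR2(30));

-- Hypothetical fact table, range-partitioned by date
CREATE TABLE sales_fact (
  sale_date DATE   NOT NULL,
  store_key NUMBER NOT NULL,
  item_key  NUMBER NOT NULL,
  amount    NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION p20140225 VALUES LESS THAN (DATE '2014-02-26')
);

-- One single-column bitmap index per dimension key.
-- Bitmap indexes on a partitioned table must be LOCAL.
CREATE BITMAP INDEX sales_fact_store_bx ON sales_fact (store_key) LOCAL;
CREATE BITMAP INDEX sales_fact_item_bx  ON sales_fact (item_key)  LOCAL;

-- Allow the optimizer to consider star transformation in this session
ALTER SESSION SET star_transformation_enabled = TRUE;

-- A typical star query: filters on the dimensions, aggregation on the fact
SELECT s.region, i.category, SUM(f.amount) AS total_amount
  FROM sales_fact f
  JOIN store_dim s ON s.store_key = f.store_key
  JOIN item_dim  i ON i.item_key  = f.item_key
 WHERE s.region   = 'WEST'
   AND i.category = 'PHONES'
 GROUP BY s.region, i.category;
```

The execution plan (for example via EXPLAIN PLAN and DBMS_XPLAN.DISPLAY) can then be checked for bitmap operations on the two indexes.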

Page 10: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The Virtuous Cycle
• Non-volatile, time-variant data implies…
  o Data warehouses are INSERT only
• Insert-only data warehouses imply…
  o Tables and indexes range-partitioned by a DATE column
• Tables range-partitioned by DATE enable…
  o Data loading using EXCHANGE PARTITION load technique
  o Incremental statistics gathering and summarization
• Data loading using EXCHANGE PARTITION enables…
  o Direct-path (a.k.a. append) inserts
  o Partitions organized into time-variant tablespaces
  o Data purging using DROP/TRUNCATE PARTITION instead of DELETE
  o Bitmap indexes and bitmap-join indexes
  o Elimination of ETL “load window” and 24x7 availability for queries
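A minimal sketch of the first links in that chain: a table range-partitioned by a DATE column, with purging done through partition maintenance rather than DELETE. The table CALL_FACT and its columns are hypothetical.

```sql
-- Hypothetical insert-only fact table, one partition per day
CREATE TABLE call_fact (
  call_date DATE   NOT NULL,
  acct_key  NUMBER NOT NULL,
  minutes   NUMBER
)
PARTITION BY RANGE (call_date) (
  PARTITION p20140223 VALUES LESS THAN (DATE '2014-02-24'),
  PARTITION p20140224 VALUES LESS THAN (DATE '2014-02-25'),
  PARTITION p20140225 VALUES LESS THAN (DATE '2014-02-26')
);

-- Add the next day's partition ahead of its load cycle
ALTER TABLE call_fact
  ADD PARTITION p20140226 VALUES LESS THAN (DATE '2014-02-27');

-- Purge the oldest day by dropping its partition instead of DELETE
ALTER TABLE call_fact DROP PARTITION p20140223;

-- Or keep the partition definition but discard its rows
ALTER TABLE call_fact TRUNCATE PARTITION p20140224;
```

Because each day is its own segment, loads, statistics, and purges can all be scoped to a single partition.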

Page 11: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The Virtuous Cycle
• Direct-path (a.k.a. append) inserts enable…
  o Load more data, faster, more efficiently
  o Optional NOLOGGING on inserts
  o Basic table compression (9i and above)
  o Eliminates contention in Oracle Buffer Cache during data loading
• Optional NOLOGGING inserts enable…
  o Option to generate less redo during data loads
• Basic table compression enables…
  o Less space consumed for tables and indexes
  o Fewer I/O operations during queries
• Partitions organized into time-variant tablespaces enable…
  o READ ONLY tablespaces for older, less-volatile data
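The following sketch shows how these options fit together in one load. The tables STAGE_CALLS and CALLS_HIST and the tablespace DW_2013Q4 are hypothetical. NOLOGGING has no effect if the database or tablespace is in FORCE LOGGING mode, so the redo saving is an assumption about the environment.

```sql
-- Hypothetical staging source
CREATE TABLE stage_calls (call_date DATE, acct_key NUMBER, minutes NUMBER);

-- Hypothetical target with basic compression and NOLOGGING attributes
CREATE TABLE calls_hist (call_date DATE, acct_key NUMBER, minutes NUMBER)
  COMPRESS NOLOGGING;

-- The APPEND hint requests a direct-path insert: rows are written above the
-- high-water mark, bypassing the buffer cache, which is what lets basic
-- compression and NOLOGGING apply to the load
INSERT /*+ APPEND */ INTO calls_hist
SELECT call_date, acct_key, minutes
  FROM stage_calls;

-- A direct-path insert must be committed before the session can query the table
COMMIT;

-- Once a period is closed, its (hypothetical) tablespace can be frozen
ALTER TABLESPACE dw_2013q4 READ ONLY;
```

After a NOLOGGING load the new data is not recoverable from redo, so a backup of the affected datafiles is normally taken afterwards.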

Page 12: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The Virtuous Cycle
• READ ONLY tablespaces for older, less-volatile data enable…
  o Tiered storage
  o Backup efficiencies
• Data purging using DROP/TRUNCATE PARTITION enables…
  o Faster, more efficient data purging than using DELETE statements
• Bitmap indexes enable…
  o Star transformations
• Star transformations enable…
  o Optimal query-execution plan for dimensional data models
  o Bitmap-join indexes
• Bitmap-join indexes enable…
  o Further optimization of star transformations
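For readers who have not used one, here is a minimal sketch of a bitmap-join index. It indexes a dimension attribute directly against the fact table's rows. SHIPMENTS_FACT, CARRIER_DIM, and the index name are hypothetical.

```sql
-- Hypothetical dimension; the join column must carry a primary key or
-- unique constraint for a bitmap-join index to be allowed
CREATE TABLE carrier_dim (
  carrier_key  NUMBER PRIMARY KEY,
  carrier_type VARCHAR2(30)
);

-- Hypothetical partitioned fact table
CREATE TABLE shipments_fact (
  ship_date   DATE   NOT NULL,
  carrier_key NUMBER NOT NULL,
  weight_kg   NUMBER
)
PARTITION BY RANGE (ship_date) (
  PARTITION p20140225 VALUES LESS THAN (DATE '2014-02-26')
);

-- The index is built on the fact table but keyed on a dimension attribute,
-- so a filter on CARRIER_TYPE can be resolved without joining to CARRIER_DIM
CREATE BITMAP INDEX shipments_carrier_type_bjx
  ON shipments_fact (carrier_dim.carrier_type)
  FROM shipments_fact, carrier_dim
  WHERE shipments_fact.carrier_key = carrier_dim.carrier_key
  LOCAL;
```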

Page 13: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The Death Spiral
• ETL using “conventional-path” INSERT, UPDATE, and DELETE operations
• Conventional-path operations are trouble with:
  o High-volume data loads
  o Contention in Shared Pool, Buffer Cache, global structures
    • Mixing of queries and loads simultaneously on table and indexes
    • Periodic rebuilds/reorgs of tables if deletions occur
    • Full redo and undo generation for all inserts, updates, and deletes
  o Bitmap indexes and bitmap-join indexes
    • Modifying bitmap indexes is slow
    • Prone to locking issues in concurrency situations
• ETL will dominate the workload in the database
  o Queries will consist mainly of “dumps” or extracts to downstream systems
  o Query performance will be abysmal and worsening…

Page 14: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The Death Spiral
• Without partitioning and insert-only (insert mostly?)
  o Query performance worsens as tables/indexes grow larger
  o ETL performance worsens as…
    • Loads must be performed into “live” tables
      o Users must be locked out during “load cycle”
      o In-flight queries must be killed during “load cycle”
      o Bitmap indexes must be dropped/rebuilt during “load cycle”
      o Entire tables must be re-analyzed during “load cycle”
  o Entire database must be backed up frequently
    • No chance of setting tablespaces to READ ONLY
    • Data cannot be “right-sized” to storage options according to IOPS
• Everything just gets harder and harder to do…
  o …and that stupid Oracle software is to blame…
  o Bring on Exadata or Netezza or <expensive-flavor-of-the-month>

Page 15: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition
• The technique of bulk-loading new data into a temporary “swap table”, which is then “published” using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse
  o fact tables
  o slowly-changing dimensions
• Assumptions for this upcoming example:
  o Composite partitioned fact table named TXN
    • Range partitioned on DATE column TXN_DATE
    • Hash sub-partitioned on NUMBER column ACCTKEY
    • Data to be loaded into partition P20140225 on TXN
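A table matching those assumptions might be declared roughly as follows. TXN, TXN_DATE, ACCTKEY, and P20140225 come from the slide. The PRODKEY and AMOUNT columns, the count of four subpartitions, and the second partition are assumptions added only to make the sketch complete.

```sql
CREATE TABLE txn (
  txn_date DATE         NOT NULL,  -- range partition key (from the slide)
  acctkey  NUMBER       NOT NULL,  -- hash subpartition key (from the slide)
  prodkey  NUMBER       NOT NULL,  -- hypothetical dimension key
  amount   NUMBER(12,2)            -- hypothetical measure
)
PARTITION BY RANGE (txn_date)
SUBPARTITION BY HASH (acctkey) SUBPARTITIONS 4
(
  PARTITION p20140224 VALUES LESS THAN (DATE '2014-02-25'),
  PARTITION p20140225 VALUES LESS THAN (DATE '2014-02-26')
);
```

Each daily partition then consists of four hash subpartitions, which is why the swap table in the walkthrough is itself hash-partitioned.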

Page 16: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition, step 1
[Diagram: composite-partitioned table TXN with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; hash-partitioned table TXN_SWAP]
CREATE TABLE TXN_SWAP … AS SELECT … FROM TXN PARTITION (P20140225)

Page 17: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition, step 2
[Diagram: composite-partitioned table TXN with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014; hash-partitioned table TXN_SWAP holding 25-Feb 2014, with three “Load” arrows]

Page 18: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition, step 3
[Diagram: composite-partitioned table TXN with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014; hash-partitioned table TXN_SWAP holding 25-Feb 2014, with three “CREATE INDEX” labels]

Page 19: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition, step 4
[Diagram: EXCHANGE PARTITION between the 25-Feb 2014 partition of composite-partitioned table TXN (partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014) and hash-partitioned table TXN_SWAP]

Page 20: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition, step 5
[Diagram: composite-partitioned table TXN with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; hash-partitioned table TXN_SWAP]
Gather partition statistics for table, columns, indexes

Page 21: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition
1. Create temporary table TXN_SWAP as a hash-partitioned table
2. Perform parallel, direct-path load of new data into TXN_SWAP
   • Perform any other DML needed to prepare data in TXN_SWAP for publishing into the TXN table
3. Create indexes on TXN_SWAP corresponding to the local indexes on TXN
4. Exchange partition to “publish” new data to TXN

       alter table TXN
         exchange partition P20140225 with table TXN_SWAP
         including indexes update global indexes;

5. Gather CBO statistics on table TXN partition P20140225
   • DBMS_STATS preference INCREMENTAL will gather partition-level statistics as well as updating global-level statistics
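The five steps can be sketched end to end as plain SQL. This is an illustration under stated assumptions, not the author's EXCHPART package. It assumes TXN has columns (TXN_DATE, ACCTKEY, PRODKEY, AMOUNT) with four hash subpartitions per partition and a single local bitmap index on PRODKEY, and that a staging table STAGE_TXN_FACT has the same columns in the same order. PRODKEY, AMOUNT, the subpartition count, and the index name are hypothetical.

```sql
-- Step 1: swap table shaped like one partition of TXN. The number of hash
-- partitions must match the subpartitions of the target partition.
CREATE TABLE txn_swap
  PARTITION BY HASH (acctkey) PARTITIONS 4
  AS SELECT * FROM txn PARTITION (p20140225);

-- Step 2: parallel, direct-path load of the new rows
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(n,4) */ INTO txn_swap n
SELECT /*+ FULL(x) PARALLEL(x,4) */ *
  FROM stage_txn_fact x;

COMMIT;

-- Step 3: indexes on the swap table matching the local indexes on TXN
-- (here, the one assumed local bitmap index on PRODKEY)
CREATE BITMAP INDEX txn_swap_prod_bx ON txn_swap (prodkey) LOCAL;

-- Step 4: publish the loaded data with a dictionary-only swap
ALTER TABLE txn
  EXCHANGE PARTITION p20140225 WITH TABLE txn_swap
  INCLUDING INDEXES
  UPDATE GLOBAL INDEXES;

-- Step 5: optimizer statistics for the freshly published partition
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'TXN',
    partname    => 'P20140225',
    granularity => 'AUTO');
END;
/
```

Queries against TXN see either the old or the new contents of P20140225, never a half-loaded state, because the only operation touching the live table is the exchange in step 4.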

Page 22: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Exchange Partition
• It is a good idea to encapsulate this logic inside PL/SQL packaged or stored procedures:

    execute exchpart.prepare('TXN','_SWAP','25-FEB-2014');
    alter session enable parallel dml;
    insert /*+ append parallel(n,4) */
      into txn_swap n
    select /*+ full(x) parallel(x,4) */ *
      from stage_txn_fact x;
    commit;
    execute exchpart.finish('TXN','_SWAP');

• EXCHPART package (“exchpart.sql”) posted at http://www.EvDBT.com/scripts

Page 23: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The “dribble effect”
• In real life, data loading is often much messier than shown here…
  o For example, for a daily load frequency, data to be loaded during the 25-Feb load cycle might consist of:
    • 950,000 rows for the 25-Feb partition
    • 4,500 rows for the 24-Feb partition
    • 400 rows for the 23-Feb partition
    • 700 rows for the 22-Feb partition
    • 200 rows for the 21-Feb partition
    • 100 rows for the 20-Feb partition
    • …and 3 rows for the 07-Jan partition…

Page 24: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The “dribble effect”
• How can this be handled?
  o One suggestion:
    1. Use EXCHPART package to load the data for the 25-Feb and 24-Feb partitions
    2. Load the data to the remainder of the partitions by just inserting (conventional-path) directly into the partitioned table
  o Must determine a threshold when to use EXCHPART, and when to simply insert rows
    • Data volume is the metric…
    • Threshold value of “N rows” varies due to many factors…
      o Number of bitmap indexes on partitioned table?
      o Using compression during load or not?
      o Degree of parallelism?
      o Are there any global indexes?

Page 25: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

The “dribble effect”
Example: Use EXCHANGE PARTITION when rows-to-be-loaded > 1000, else just use conventional INSERT

    for d in (select trunc(txn_dt) dt, count(*) cnt
                from EXT_STAGE_TXN
               group by trunc(txn_dt))
    loop
      if d.cnt > 1000 then
        -- large batch for this day: load via swap table and exchange
        exchpart.prepare('TXN','_SWAP'||to_char(d.dt,'YYYYMMDD'), d.dt);
        -- swap table name is hard-coded on the slide; in practice it follows d.dt
        insert /*+ append parallel(n,16) */ into TXN_20140224 n
        select /*+ parallel(x,16) */ *
          from EXT_STAGE x
         where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
        exchpart.finish('TXN', 'TXN_'||to_char(d.dt,'YYYYMMDD'));
        exchpart.drop_indexes('TXN_'||to_char(d.dt,'YYYYMMDD'));
      else
        -- small "dribble" for this day: conventional-path insert
        insert into TXN
        select * from ext_stage
         where txn_dt >= d.dt and txn_dt < d.dt + 1;
      end if;
    end loop;

Page 26: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Slowly-changing dimensions
• Loading time-variant fact and dimension tables is not the only load activity in most data warehouses
  o Often, some tables contain current or point-in-time data
    • Example: type-1 dimension derived from type-2 dimension
• With each load cycle loading new data into a new partition of the type-2 dimension, the type-1 dimension needs to be updated
  o Instead of performing transactional MERGE (i.e. Update or Insert) logic directly on the table
    • Rebuild the table into a temporary table, then “swap” it in using EXCHANGE PARTITION

Page 27: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Merge logic to update SCDs
• The MERGE statement to update the type-1 dimension (CURR_ACCT_DIM) from the just-loaded partition of the type-2 dimension (ACCT_DIM) could look like…

    merge into curr_acct_dim c
    using (select * from acct_dim
            where eff_dt >= '25-FEB-2014'
              and eff_dt <  '26-FEB-2014') a
       on (c.acctkey = a.acctkey)  -- join key assumed; MERGE requires an ON clause
     when matched then update set ...
     when not matched then insert ...;

• Simple to write, but sloooowwwwwwwww…

Page 28: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE
[Diagram: ACCT_DIM (type-2 dimension) with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; CURR_ACCT_DIM (type-1 dimension); ACCT_DIM_SWAP. Annotations: “Just loaded 25-Feb data”, “Data current as of 24-Feb”, “Truncated as we begin”]

Page 29: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE

    INSERT /*+ append parallel(t,8) */ INTO TMP_CURR_ACCOUNT_DIM T
    SELECT /*+ parallel(x,8) */ …(list of columns)…
      FROM (SELECT /*+ parallel(y,8) */ …(list of columns)…,
                   ROW_NUMBER() over (PARTITION BY acctkey
                                      ORDER BY effdt desc) rn
              FROM (SELECT /*+ parallel(z1,8) */ …(list of columns)…
                      FROM CURR_ACCOUNT_DIM z1
                    UNION ALL
                    SELECT /*+ parallel(z2,8) */ …(list of columns)…
                      FROM ACCOUNT_DIM partition(P20140225) z2) y) x
     WHERE x.RN = 1;

1. Inner-most query pulls changed data from type-2, merged with existing data from type-1
2. Middle query ranks within ACCTKEY values, sorted by EFFDT DESC
3. Outer-most query selects only latest row for each ACCTKEY and passes to INSERT

Page 30: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE
[Diagram: ACCT_DIM (type-2 dimension) with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; CURR_ACCT_DIM (type-1 dimension); ACCT_DIM_SWAP. Two “All rows” arrows pass through “union all” and “filter”]

Page 31: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE
[Diagram: ACCT_DIM (type-2 dimension) with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; CURR_ACCT_DIM (type-1 dimension); ACCT_DIM_SWAP, with three “CREATE INDEX” labels]

Page 32: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE
[Diagram: ACCT_DIM (type-2 dimension) with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; EXCHANGE PARTITION between CURR_ACCT_DIM (type-1 dimension) and ACCT_DIM_SWAP]

Page 33: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

ExchPart instead of MERGE
[Diagram: ACCT_DIM (type-2 dimension) with partitions 22-Feb 2014, 23-Feb 2014, 24-Feb 2014, 25-Feb 2014; CURR_ACCT_DIM (type-1 dimension); ACCT_DIM_SWAP]
Gather partition statistics for table, columns, indexes
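The publish step for the type-1 dimension can be sketched as below. EXCHANGE PARTITION needs a partitioned target, so this sketch assumes CURR_ACCT_DIM is declared with a single catch-all partition. That layout, the partition name PMAX, the STATUS column, and the index names are assumptions for illustration, and the rebuild itself is left as a placeholder comment.

```sql
-- Hypothetical type-1 dimension with one catch-all partition
CREATE TABLE curr_acct_dim (
  acctkey NUMBER NOT NULL,
  effdt   DATE   NOT NULL,
  status  VARCHAR2(20)
)
PARTITION BY RANGE (acctkey) (
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
CREATE INDEX curr_acct_dim_ix ON curr_acct_dim (acctkey) LOCAL;

-- Swap table with identical columns (empty at the start of each load cycle)
CREATE TABLE acct_dim_swap (
  acctkey NUMBER NOT NULL,
  effdt   DATE   NOT NULL,
  status  VARCHAR2(20)
);

-- ... rebuild the current image into ACCT_DIM_SWAP here, using a
-- direct-path INSERT ... SELECT with ROW_NUMBER() to keep the latest row ...

-- Index on the swap table matching the local index on the target
CREATE INDEX acct_dim_swap_ix ON acct_dim_swap (acctkey);

-- Publish: the rebuilt rows become the contents of CURR_ACCT_DIM
ALTER TABLE curr_acct_dim
  EXCHANGE PARTITION pmax WITH TABLE acct_dim_swap
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Refresh optimizer statistics for the dimension
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CURR_ACCT_DIM');
END;
/
```

After the exchange, ACCT_DIM_SWAP holds the previous image of the dimension. Before the next cycle its index would be dropped and the table truncated, so the rebuild starts from an empty table.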

Page 34: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Summary
1. Data warehouses use star schemas
2. Star schemas are best queried using star transformations
3. Star transformation requires bitmap indexes
   o Bitmap indexes or bitmap-join indexes
4. Large bitmap indexes become infeasible without using the EXCHANGE PARTITION load technique
5. INSERT is always faster than UPDATE, DELETE, or MERGE

Data loading using EXCHANGE PARTITION is the key to unlock infinite scalability

Page 35: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

KScope15
Conference for EPM, APEX, ADF, BI, Oracle developers and DBAs
http://www.KScope15.com #ODTUG #KScope15

Page 36: UGF3587 Scaling to Infinity - Partitioning DW in Oracle DB

Q & A
• Session: UGF-3587
• Email: [email protected]
• Twitter: @TimothyJGorman
• Blog: EvDBT.com
  o Papers: EvDBT.com/papers ← including this presentation!
  o Scripts: EvDBT.com/scripts
  o Videos: EvDBT.com/videos