Data Vault Partitioning Strategies
Dani Schnider, Trivadis AG
DOAG Conference, 23 November 2017
@dani_schnider DOAG2017
Our company.
Data Vault Partitioning Strategies2 23.11.2017
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on … and … technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
Locations: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Dani Schnider
Working for Trivadis in Glattbrugg/Zurich
– Senior Principal Consultant
– Data Warehouse Lead Architect
– Trainer of several Courses
Co-Author of the books
– Data Warehousing mit Oracle
– Data Warehouse Blueprints
Certified Data Vault Data Modeler
@dani_schnider danischnider.wordpress.com
Data Vault Tables
HUB
– Surrogate Key (PK)
– Business Key(s) (UK)
– Load Date
– Record Source
LINK
– Surrogate Key (PK)
– Foreign Key Hub 1
– Foreign Key Hub 2
– ...
– Load Date
– Record Source
SATELLITE
– Foreign Key to Hub (PK)
– Load Date (PK)
– Load End Date (optional)
– Context Attribute 1
– Context Attribute 2
– ...
– Context Attribute n
– Record Source
Source: How to Create a Data Vault Model, https://youtu.be/Q1qj_LjEawc
Example Data Vault Model (Subset)
Partitioning by Load Date
Partitioning by Load Date: A Good Strategy?
SID LOAD_DATE LOAD_END_DATE
1 01.01.2014 13.04.2014
1 13.04.2014 28.07.2015
1 28.07.2015 09.02.2017
1 09.02.2017 31.12.9999
2 15.03.2014 31.12.9999
3 26.06.2016 10.03.2017
3 10.03.2017 13.03.2017
3 13.03.2017 14.03.2017
3 14.03.2017 31.12.9999
Partitioning by Load Date: Use Cases
Master Data (Changing Data)          Transactional Data (Events)
Product                              Sales Transaction
Customer                             Order
Employee                             Web Tracking
…                                    …
Beer (general product description)   Brew (particular brew batch)
Partitioning by Load Date: Overview
Diagram (model subset): hubs H_Beer and H_Brew, link L_Beer_Brew, satellites S_Beer, S_Recipe and S_Brew_Journal.
Partitioning by Load Date: Hub Example
CREATE TABLE H_BREW (
  H_Brew_Key     RAW(16)           NOT NULL,
  Brew_No        NUMBER(4)         NOT NULL,
  Load_Date      DATE              NOT NULL,
  Record_Source  VARCHAR2(4 CHAR)  NOT NULL
)
PARTITION BY RANGE (Load_Date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_old_data
    VALUES LESS THAN (TO_DATE('01-01-2015', 'dd-mm-yyyy'))
);
Partitioning by Load Date: Satellite Example
CREATE TABLE S_BREW_JOURNAL (
  H_Brew_Key     RAW(16)           NOT NULL,
  Load_Date      DATE              NOT NULL,
  Brew_Date      DATE              NOT NULL,
  Brewer         VARCHAR2(40),
  ...
  Record_Source  VARCHAR2(4 CHAR)  NOT NULL
)
PARTITION BY RANGE (Load_Date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_old_data
    VALUES LESS THAN (TO_DATE('01-01-2015', 'dd-mm-yyyy'))
);
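To illustrate what partition pruning means for a table partitioned by load date, here is a minimal sketch against S_BREW_JOURNAL. The date range and the selected columns are illustrative assumptions, not taken from the slides. With monthly interval partitions, a filter on one calendar month of Load_Date should only need to read a single partition.

```sql
-- Sketch only: date range and column list are illustrative.
EXPLAIN PLAN FOR
SELECT H_Brew_Key, Load_Date, Brew_Date, Brewer
  FROM S_BREW_JOURNAL
 WHERE Load_Date >= DATE '2017-03-01'
   AND Load_Date <  DATE '2017-04-01';

-- The Pstart/Pstop columns of the plan show which partitions are accessed.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If pruning works as expected, Pstart and Pstop in the plan output refer to the same partition.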
Partitioning by Load Date
✓ Partition Pruning
✗ Partition-wise Join
✓ Rolling History
✓ Data Distribution
✓ Partition Exchange
Restrictions:
– Only for transactional data
– Global indexes should be avoided
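As a sketch of what "rolling history" can look like with monthly partitions on Load_Date: an old month is removed by dropping its partition instead of deleting rows. The chosen month is an illustrative assumption, and the statement fails if no partition exists for that date.

```sql
-- Sketch only: drops the monthly partition that contains 15 January 2015.
-- PARTITION FOR addresses the partition by a value inside its range, which
-- avoids depending on system-generated interval partition names.
ALTER TABLE S_BREW_JOURNAL
  DROP PARTITION FOR (TO_DATE('15-01-2015', 'dd-mm-yyyy'));
```

Without global indexes on the table, no index maintenance is needed after the drop, which is one reason for the restriction listed above.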
Partitioning by Load Date: Global Index Issue
ALTER TABLE H_BREW ADD CONSTRAINT H_BREW_PK
  PRIMARY KEY (H_Brew_Key)
  RELY DISABLE NOVALIDATE;
ALTER TABLE H_BREW ADD CONSTRAINT H_BREW_UN
  UNIQUE (Brew_No)
  RELY DISABLE NOVALIDATE;
ALTER TABLE S_BREW_JOURNAL ADD CONSTRAINT S_BREW_JOURNAL_PK
  PRIMARY KEY (H_Brew_Key, Load_Date)
  RELY DISABLE NOVALIDATE;
ALTER TABLE S_BREW_JOURNAL ADD CONSTRAINT H_BREW_S_BREW_JOURNAL_FK
  FOREIGN KEY (H_Brew_Key) REFERENCES H_BREW (H_Brew_Key)
  RELY DISABLE NOVALIDATE;
How to Avoid Global Indexes (PK/UK on Hubs)?
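Constraints declared with RELY DISABLE NOVALIDATE do not create an index, so no global index is built for them. If lookups on the hub keys still need index support, one possible option (an assumption of this sketch, not something stated on the slides) is a non-unique local index. The index names are hypothetical. A unique local index is not possible here, because it would have to include the partition key Load_Date.

```sql
-- Sketch only: hypothetical index names.
-- Non-unique LOCAL indexes are equipartitioned with the table, so dropping
-- or exchanging a partition affects only the matching index partition.
CREATE INDEX H_BREW_KEY_IX ON H_BREW (H_Brew_Key) LOCAL;
CREATE INDEX H_BREW_NO_IX  ON H_BREW (Brew_No)    LOCAL;
```

The trade-off is that uniqueness is then not enforced by the database and has to be guaranteed by the load process.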
Partitioning by Load End Date
Partitioning by Load End Date: Find Current Versions
SID LOAD_DATE LOAD_END_DATE
1 01.01.2014 13.04.2014
1 13.04.2014 28.07.2015
1 28.07.2015 09.02.2017
1 09.02.2017 31.12.9999
2 15.03.2014 31.12.9999
3 26.06.2016 10.03.2017
3 10.03.2017 13.03.2017
3 13.03.2017 14.03.2017
3 14.03.2017 31.12.9999
Partitioning by Load End Date: Overview
Diagram (model subset): hubs H_Beer and H_Brew, link L_Beer_Brew, satellites S_Beer, S_Recipe and S_Brew_Journal. The diagram marks three pairs of "History Partition" and "Current Partition".
Partitioning by Load End Date: Satellite Example
CREATE TABLE S_RECIPE (
  H_Beer_Key      RAW(16)           NOT NULL,
  Load_Date       DATE              NOT NULL,
  Load_End_Date   DATE              DEFAULT ON NULL TO_DATE('31-12-9999', 'dd-mm-yyyy'),
  Start_Temp      NUMBER(3),
  Mashing_Time_1  NUMBER(3),
  Mashing_Temp_1  NUMBER(3),
  …
  Record_Source   VARCHAR2(4 CHAR)  NOT NULL
)
ENABLE ROW MOVEMENT
PARTITION BY LIST (Load_End_Date)
(
  PARTITION p_current VALUES (TO_DATE('31-12-9999', 'dd-mm-yyyy')),
  PARTITION p_history VALUES (DEFAULT)
);
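A minimal sketch of how this layout might be used, assuming the S_RECIPE definition above. The bind variables :h_beer_key and :new_load_date are hypothetical. The first statement reads the current versions and can be pruned to p_current; the second end-dates a version, which moves the row into p_history and is the reason ENABLE ROW MOVEMENT is required.

```sql
-- Current versions only: the equality filter on the partition key
-- lets the optimizer prune to partition p_current.
SELECT H_Beer_Key, Load_Date, Start_Temp, Mashing_Time_1, Mashing_Temp_1
  FROM S_RECIPE
 WHERE Load_End_Date = TO_DATE('31-12-9999', 'dd-mm-yyyy');

-- End-dating a version changes its partition key value, so the row
-- migrates from p_current to p_history (hypothetical bind variables).
UPDATE S_RECIPE
   SET Load_End_Date = :new_load_date
 WHERE H_Beer_Key    = :h_beer_key
   AND Load_End_Date = TO_DATE('31-12-9999', 'dd-mm-yyyy');
```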
Partitioning by Load End Date
✓ Partition Pruning
✗ Partition-wise Join
✗ Rolling History
✓ Data Distribution
✓ Partition Exchange
Restrictions:
– Only for Satellites, requires LOAD_END_DATE
– ENABLE ROW MOVEMENT required
Partitioning by Load End Date: Partition Exchange
Special Implementation of Satellite Load Jobs
Only useful if most versions are replaced
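Diagram: S_Recipe with a History Partition and a Current Partition, plus a Load Table. Labelled arrows: "Insert new versions" and "Insert unchanged versions" (into the Load Table), "Move old versions" (to the History Partition), and "Partition Exchange" (between the Load Table and the Current Partition).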
1. Load Table contains
• All new versions
• All unchanged versions
2. Move old versions to history partition
• Insert rows with load end date
3. Exchange current partition
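The three steps above can be sketched as follows. This is an untested outline under several assumptions, not the author's load job: S_RECIPE_LOAD and STG_RECIPE are hypothetical helper tables, STG_RECIPE holds at most one changed version per H_Beer_Key, the primary key on S_RECIPE is declared RELY DISABLE NOVALIDATE (step 2 temporarily duplicates key values), and only the context attributes named on the satellite slide are listed.

```sql
-- Empty load table with the column layout of S_RECIPE
-- (CREATE TABLE ... FOR EXCHANGE requires Oracle 12.2 or later).
CREATE TABLE S_RECIPE_LOAD FOR EXCHANGE WITH TABLE S_RECIPE;

-- 1a. New versions from staging become the future current versions.
INSERT INTO S_RECIPE_LOAD
       (H_Beer_Key, Load_Date, Load_End_Date,
        Start_Temp, Mashing_Time_1, Mashing_Temp_1, Record_Source)
SELECT n.H_Beer_Key, n.Load_Date, TO_DATE('31-12-9999', 'dd-mm-yyyy'),
       n.Start_Temp, n.Mashing_Time_1, n.Mashing_Temp_1, n.Record_Source
  FROM STG_RECIPE n;

-- 1b. Current versions without a change in staging are copied unchanged.
INSERT INTO S_RECIPE_LOAD
       (H_Beer_Key, Load_Date, Load_End_Date,
        Start_Temp, Mashing_Time_1, Mashing_Temp_1, Record_Source)
SELECT c.H_Beer_Key, c.Load_Date, c.Load_End_Date,
       c.Start_Temp, c.Mashing_Time_1, c.Mashing_Temp_1, c.Record_Source
  FROM S_RECIPE PARTITION (p_current) c
 WHERE NOT EXISTS (SELECT 1
                     FROM STG_RECIPE n
                    WHERE n.H_Beer_Key = c.H_Beer_Key);

-- 2. Replaced versions are inserted with a load end date, so they land
--    in p_history (the end date is the load date of the new version).
INSERT INTO S_RECIPE
       (H_Beer_Key, Load_Date, Load_End_Date,
        Start_Temp, Mashing_Time_1, Mashing_Temp_1, Record_Source)
SELECT c.H_Beer_Key, c.Load_Date, n.Load_Date,
       c.Start_Temp, c.Mashing_Time_1, c.Mashing_Temp_1, c.Record_Source
  FROM S_RECIPE PARTITION (p_current) c
  JOIN STG_RECIPE n
    ON n.H_Beer_Key = c.H_Beer_Key;

-- 3. Swap the load table with the current partition (DDL, commits implicitly).
ALTER TABLE S_RECIPE
  EXCHANGE PARTITION p_current WITH TABLE S_RECIPE_LOAD
  WITHOUT VALIDATION;
```

After the exchange, S_RECIPE_LOAD holds the previous content of p_current and would be truncated before the next run. Because every current row is rewritten, the approach pays off mainly when most versions are replaced, as the slide notes.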
Partitioning by Hub Key
Improve Join Performance: Full Partition-wise Joins
– Between Hubs and Satellites
– Between Links and Hubs
Equal Distribution with HASH Partitioning
Run Extraction Queries in Parallel
Partition Key:
– Primary Key of Hub
– Foreign Key of Satellite / Link
Link: Composite HASH-HASH Partitioning
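Diagram: Hub and Satellite, each split into partitions part 1 to part 4. Four parallel slave processes (slave1 to slave4) each join one matching pair of Hub and Satellite partitions, coordinated by the query coordinator process (QC).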
Partitioning by Hub Key: Overview
Diagram (model subset): hubs H_Beer and H_Brew, link L_Beer_Brew, satellites S_Beer, S_Recipe and S_Brew_Journal.
Partitioning by Hub Key: Hub Example
CREATE TABLE H_BEER (
  H_Beer_Key     RAW(16)           NOT NULL,
  Beer_Name      VARCHAR2(40)      NOT NULL,
  Load_Date      DATE              NOT NULL,
  Record_Source  VARCHAR2(4 CHAR)  NOT NULL
)
PARTITION BY HASH (H_Beer_Key) PARTITIONS 8;
Partitioning by Hub Key: Satellite Example
CREATE TABLE S_BEER_DESCRIPTION (
  H_Beer_Key     RAW(16)           NOT NULL,
  Load_Date      DATE              NOT NULL,
  Style          VARCHAR2(40),
  ABV            NUMBER(3,1),
  IBU            NUMBER(3),
  Seasonal       VARCHAR2(10),
  Label_Color    VARCHAR2(10),
  Record_Source  VARCHAR2(4 CHAR)  NOT NULL
)
PARTITION BY HASH (H_Beer_Key) PARTITIONS 8;
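Because H_BEER and S_BEER_DESCRIPTION are hash-partitioned on the same key with the same number of partitions, a join on H_Beer_Key is a candidate for a full partition-wise join. The query below is a minimal sketch; the degree of parallelism 4 and the column selection are illustrative assumptions, and whether the optimizer actually chooses a partition-wise join depends on statistics and settings.

```sql
-- Sketch only: extraction query joining hub and satellite in parallel.
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(4) */
       h.Beer_Name, s.Load_Date, s.Style, s.ABV, s.IBU
  FROM H_BEER h
  JOIN S_BEER_DESCRIPTION s
    ON s.H_Beer_Key = h.H_Beer_Key;

-- Check the plan for a partition iterator above the join.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the plan, a full partition-wise join typically shows up as a PX PARTITION HASH ALL step above the join operation.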
Partitioning by Hub Key: Link Example
CREATE TABLE L_BEER_BREW (
  L_Beer_Brew_Key  RAW(16)           NOT NULL,
  H_Beer_Key       RAW(16)           NOT NULL,
  H_Brew_Key       RAW(16)           NOT NULL,
  Load_Date        DATE              NOT NULL,
  Record_Source    VARCHAR2(4 CHAR)  NOT NULL
)
PARTITION BY HASH (H_Beer_Key)
SUBPARTITION BY HASH (H_Brew_Key) SUBPARTITIONS 8
PARTITIONS 8;
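To check how evenly the HASH-HASH layout spreads the link rows, the row counts per subpartition can be read from the data dictionary after gathering statistics. This is a sketch under the assumption that the link table lives in the current schema; granularity 'ALL' is requested so that subpartition-level statistics are gathered.

```sql
-- Gather statistics on all levels (global, partition, subpartition).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'L_BEER_BREW',
    granularity => 'ALL');
END;
/

-- 8 partitions x 8 subpartitions = 64 rows expected in this result.
SELECT partition_name, subpartition_name, num_rows
  FROM user_tab_subpartitions
 WHERE table_name = 'L_BEER_BREW'
 ORDER BY partition_name, subpartition_name;
```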
Partitioning by Hub Key
✗ Partition Pruning
✓ Partition-wise Join
✗ Rolling History
✓ Data Distribution
✗ Partition Exchange
Restrictions:
– At most two partition keys per Link
Conclusion
Partitioning by Hub Key: Benefits
                      Load Date 1)   Load End Date 2)   Hub Key
Partition Pruning     ✓              ✓                  ✗
Partition-wise Join   ✗              ✗                  ✓
Rolling History       ✓              ✗                  ✗
Data Distribution     ✓              ✓                  ✓
Partition Exchange    ✓              ✓                  ✗
1) Only for transactional data
2) Only for Satellites
White Paper: Data Vault Partitioning Strategies
Dani Schnider, Trivadis white paper, 18 pages, dated 30.10.2017 (www.trivadis.com)
Download: https://danischnider.wordpress.com/publications/
Trivadis @ DOAG 2017 #opencompany
Booth: 3rd floor, right next to the escalator
We share our know-how!
Just stop by: live presentations and document archive, T-shirts, a prize draw and more.
We look forward to your visit.