Worst Practices in Data Warehouse Design

Worst Practices in Data Warehouse Design Kent Graziano Data Warrior LLC Twitter @KentGraziano


Post on 28-Nov-2014


DESCRIPTION

This presentation was given at OakTable World 2014 (#OTW14) in San Francisco. After many years of designing data warehouses and consulting on data warehouse architectures, I have seen a lot of bad design choices by supposedly experienced professionals. A sense of professionalism, confidentiality agreements, and some sense of common decency have prevented me from calling people out on some of this. No more! In this session I will walk you through a typical bad design like many I have seen. I will show you what I see when I reverse engineer a supposedly complete design, walk through what is wrong with it, and discuss options to correct it. This will test your knowledge of data warehouse best practices by seeing whether you can recognize these worst practices.

TRANSCRIPT

Page 1: Worst Practices in Data Warehouse Design

Worst Practices in Data Warehouse Design

Kent Graziano

Data Warrior LLC

Twitter @KentGraziano

Page 2: Worst Practices in Data Warehouse Design

Agenda

My Bio

My Book

Survey

Backstory

What’s wrong with this picture?

The fallacy of the unconstrained data warehouse

Moral of the Story

© Data Warrior LLC

Page 3: Worst Practices in Data Warehouse Design

My Bio

Kent Graziano

● Oracle ACE Director (BI/DW)

● Data Architecture and Data Warehouse Specialist

● 30+ years in IT

● 20+ years of Oracle-related work

● 15+ years of data warehousing experience

● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/)

● Co-Author of

● The Business of Data Vault Modeling

● The Data Model Resource Book (1st Edition)

● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group


Page 5: Worst Practices in Data Warehouse Design

Survey

Who are you?

● Data Modeler or Architect

● Project Managers

● IT Managers

● DBA

● Developer

Experience

● Data Warehousing?

● Less than 1 yr?

● 1-5 yrs?

● Over 5 years?


Page 6: Worst Practices in Data Warehouse Design

The Backstory

Metrics data mart

Outsourced

POC worked great

● 500 records loaded!

Real world: 100K+ rows

● 1st run – DBA cancelled after 8 hours

● Filled up 665GB temp space

Something wrong?


Page 7: Worst Practices in Data Warehouse Design

Next step

DBA says

● Too many parallel sessions

● Too many partitions on fact table

● Load includes

● Select *

● Select distinct

Me

● Reverse engineer the tables first

● Look at the design

● Yikes!


Page 8: Worst Practices in Data Warehouse Design

My email to management

“In general, the designs of both the source star schema and the target reporting table do not conform to best practices from either an Oracle tuning or data warehouse design perspective.”

“My only conclusion is that the folks who did the design were not well versed or experienced in designing high performance, high volume data warehouse databases on Oracle.”

“Some of the omissions are so basic as it is hard to comprehend how this could have been considered a completed system.”


Page 9: Worst Practices in Data Warehouse Design

What’s wrong with this picture?

● All optional columns

● The measure is optional!

● Even metadata!

● Extra Varchar columns

● No PK

● No UK

● No FKs

● No Indexes!


Page 10: Worst Practices in Data Warehouse Design

So what?

Works fine for 500 rows

● Full table scans

No clues for the optimizer

No clues for customer!

● Design intent?

● Data profile?

No PK/UK – could get duplicates in load

No FK – could be missing dimension keys

Lazy design!
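As a rough illustration of what declared constraints buy you, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The table and column names (dim_date, fact_metric, metric_code, metric_value) are hypothetical and are not taken from the reviewed design. It shows the database itself rejecting a duplicate fact row, a fact row with a missing dimension key, and a fact row with no measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

# Hypothetical star-schema fragment with the design intent declared up front.
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    cal_date TEXT NOT NULL UNIQUE              -- business key
);
CREATE TABLE fact_metric (
    date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
    metric_code  TEXT    NOT NULL,
    metric_value REAL    NOT NULL,             -- the measure is required
    PRIMARY KEY (date_key, metric_code)        -- defines what a duplicate is
);
""")

conn.execute("INSERT INTO dim_date VALUES (20141128, '2014-11-28')")
conn.execute("INSERT INTO fact_metric VALUES (20141128, 'CPU', 0.75)")


def try_insert(row):
    """Insert one fact row; return None on success or the constraint error text."""
    try:
        conn.execute("INSERT INTO fact_metric VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_error = try_insert((20141128, "CPU", 0.80))  # same key loaded twice
orphan_error = try_insert((19990101, "CPU", 0.10))     # date_key not in dim_date
null_error = try_insert((20141128, "MEM", None))       # measure left empty

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_metric").fetchone()[0]
print(duplicate_error, orphan_error, null_error, fact_rows, sep="\n")
```

All three bad rows are refused, so only the one valid fact row remains. The same declarations also document the intent for whoever inherits the schema.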


Page 11: Worst Practices in Data Warehouse Design

What’s wrong with this picture?

● All optional columns

● Even the PK and metadata!

● No UK

● PK on an optional column?


Page 12: Worst Practices in Data Warehouse Design

So what?

No clue on business key

SCD Type 1 or 2?

There is a CRC Key and CRC Attr

● But which date is the Type 2 date?

Again no clues in the indexes or NOT NULL

Have to look at data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different

Can’t discern the intent
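When the schema gives no hints, the only option is to profile the data. The sketch below, again using sqlite3 as a stand-in for Oracle, shows two checks of that kind. Only the two DW_REC_* column names come from the slide; the table name dim_customer, the candidate business key customer_code, and the sample rows are assumptions made for illustration. The results are evidence, not proof, of how the dimension was meant to behave.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dimension with no constraints, mimicking the reverse-engineered design.
conn.execute("""
CREATE TABLE dim_customer (
    customer_key      INTEGER,
    customer_code     TEXT,
    dw_rec_created_dt TEXT,
    dw_rec_updated_dt TEXT
)""")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
    [
        (1, "C001", "2014-01-01", "2014-01-01"),
        (2, "C002", "2014-01-01", "2014-03-15"),  # updated after it was created
        (3, "C002", "2014-03-15", "2014-03-15"),  # second row for the same code
    ],
)

# Rows whose update date differs from the creation date suggest in-place updates.
updated_in_place = conn.execute(
    "SELECT COUNT(*) FROM dim_customer "
    "WHERE dw_rec_updated_dt <> dw_rec_created_dt"
).fetchone()[0]

# A candidate business key with several rows suggests history is kept (Type 2).
multi_version_codes = [
    row[0]
    for row in conn.execute(
        "SELECT customer_code FROM dim_customer "
        "GROUP BY customer_code HAVING COUNT(*) > 1 "
        "ORDER BY customer_code"
    )
]

print(updated_in_place, multi_version_codes)
```

A declared unique key on the business key plus an effective date would have answered the same question without touching the data.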


Page 13: Worst Practices in Data Warehouse Design

How about the Date Dimension?

● All optional columns

● Assume 1st column is PK?

● No PK

● No UK

● No Indexes


Page 14: Worst Practices in Data Warehouse Design

More examples

Let’s look into the data model….


Page 15: Worst Practices in Data Warehouse Design

Other Stuff

Untested partitioning scheme

● Target report table partitioning and sub-partitioning is non-standard – not on date field

● Pre-created 200 list-based partitions

● But the domain only had 37 values!

Did not use partition-aware loading approach

No indexes on partitions or sub-partitions
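The mismatch between 200 pre-created partitions and a 37-value domain is easy to detect once both lists are in hand. The sketch below is a plain Python comparison. The REGION_nnn values are invented placeholders, and in practice the partition list would come from the database's data dictionary and the domain values from a distinct query on the partition column.

```python
# Hypothetical inputs: the list-partition values that were pre-created,
# and the values that actually occur in the partitioning column.
precreated_partitions = {f"REGION_{n:03d}" for n in range(1, 201)}  # 200 values
actual_domain_values = {f"REGION_{n:03d}" for n in range(1, 38)}    # 37 values

# Partitions that can never receive a row.
empty_partitions = precreated_partitions - actual_domain_values

# Values in the data that have no partition to land in.
unrouted_values = actual_domain_values - precreated_partitions

print(
    f"{len(empty_partitions)} of {len(precreated_partitions)} partitions "
    f"can never receive a row; {len(unrouted_values)} values have no partition"
)
```

Running a check like this against real data before go-live is one cheap way to test a partitioning scheme.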


Page 16: Worst Practices in Data Warehouse Design

Load approach

Uses a “select *” from source in a view

UPPER function in predicate

● Not needed

● Cancels index usage

Degree of parallelism hardcoded into view

Dummy columns coded into view

No documentation on why

NEVER TESTED with real data!
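To see why wrapping an indexed column in UPPER matters, the sketch below compares query plans for the two predicate forms. It uses SQLite through Python's sqlite3 module purely as a convenient stand-in, and the table src_metric and index ix_src_metric_code are hypothetical. Oracle's optimizer is different software, but an ordinary index on the bare column is likewise not usable for a predicate on UPPER(column) unless a matching function-based index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table with an ordinary index on the filter column.
conn.execute("CREATE TABLE src_metric (metric_code TEXT, payload TEXT)")
conn.execute("CREATE INDEX ix_src_metric_code ON src_metric (metric_code)")
conn.executemany(
    "INSERT INTO src_metric VALUES (?, ?)",
    [(f"M{n:05d}", "x") for n in range(1000)],
)


def plan(sql):
    """Return the plan steps SQLite reports for a statement, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the step description


plain_plan = plan("SELECT * FROM src_metric WHERE metric_code = 'M00042'")
upper_plan = plan("SELECT * FROM src_metric WHERE UPPER(metric_code) = 'M00042'")

print("plain predicate:", plain_plan)
print("UPPER predicate:", upper_plan)
```

The plain predicate is resolved through the index, while the UPPER version falls back to scanning the whole table. On 500 rows nobody notices; on real volumes it is the difference the DBA complained about.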


Page 17: Worst Practices in Data Warehouse Design

The Fallacy of the Unconstrained Data Warehouse

Rationale

● Fast to load – no constraints

● All the validation is in the code

Reality

● May be fast load, but slow query

● Not tuned for extract!

● Code may not have been QA’d well

● No model to tell the programmers the rules

● What columns are required?

● What are the FKs to check?

● What defines a duplicate row?

Cost

● Slow query response

● Bad data loaded

● Few clues to help tune
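If the validation really does live only in the load code, someone eventually has to audit the loaded data by hand. The sketch below shows what that audit might look like for the three questions on this slide: required columns, foreign keys, and duplicates. It uses sqlite3 as a stand-in for Oracle, and the tables, columns, and sample rows are hypothetical. Note that every check requires the auditor to guess the rule first, because nothing in the schema states it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unconstrained tables, loaded with a few deliberately bad rows.
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER);
CREATE TABLE fact_metric (date_key INTEGER, metric_code TEXT, metric_value REAL);
INSERT INTO dim_date VALUES (20141127), (20141128);
INSERT INTO fact_metric VALUES
    (20141127, 'CPU', 0.70),
    (20141128, 'CPU', 0.75),
    (20141128, 'CPU', 0.75),   -- duplicate row
    (20141129, 'CPU', 0.80),   -- date_key missing from dim_date
    (20141128, 'MEM', NULL);   -- measure missing
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Which columns are required? Guess: the measure.
missing_measures = scalar(
    "SELECT COUNT(*) FROM fact_metric WHERE metric_value IS NULL"
)

# What are the FKs to check? Guess: date_key points at dim_date.
orphan_keys = scalar("""
    SELECT COUNT(*) FROM fact_metric f
    WHERE NOT EXISTS (SELECT 1 FROM dim_date d WHERE d.date_key = f.date_key)
""")

# What defines a duplicate row? Guess: same date_key and metric_code.
duplicate_groups = scalar("""
    SELECT COUNT(*) FROM (
        SELECT date_key, metric_code FROM fact_metric
        GROUP BY date_key, metric_code HAVING COUNT(*) > 1
    )
""")

print(missing_measures, orphan_keys, duplicate_groups)
```

Each query here is a full scan of the fact table, which is one reason cleaning up after an unconstrained load costs more than declaring the rules in the first place.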


Page 18: Worst Practices in Data Warehouse Design

Moral of the story?

Be careful who you outsource to

Have someone independent do touch point reviews of design

● Costs extra, but we have spent MONTHS fixing this

Insist on documentation

Insist on knowledge transfer with internal DBA

Require load testing with performance criteria

Trust but Verify!

Page 20: Worst Practices in Data Warehouse Design

Kscope15.com

SUBMIT YOUR ABSTRACTS TODAY!

Page 21: Worst Practices in Data Warehouse Design

Contact Information

Kent Graziano

The Oracle Data Warrior

Data Warrior LLC

[email protected]

On Twitter @KentGraziano

Visit my blog at

http://kentgraziano.com