the it perspective: data warehousing, management, and ... · 2 2 objectives explain the basics of:...

47
1 1 The IT Perspective: Data Warehousing, Management, and Analytical Structures Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd [email protected]

Upload: others

Post on 13-May-2020

3 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

1 1

The IT Perspective: Data Warehousing, Management, and Analytical Structures

Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd [email protected]

Page 2: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

2 2

Objectives Explain the basics of: 1. Master Data Management

2. Data Warehousing

3. ETL

4. OLAP/Multidimensional Data

The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.

This seminar is based on a number of sources including a few dozen of Microsoft-owned presentations, used with permission. Thank you to Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft, Marin Bezic, and Donald Farmer with his entire team for all the support.

Page 3: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

3 3

Page 4: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

4

Master Data Management

Page 5: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

5 5

MDM

Ensures consistency of data across all organisational uses

Impacts overall data quality

Processes and tools for: Collection, aggregation, matching, distribution, and persistence of data

Consistently

Related to Federated Data Management

Key to MDM: Modelling

Page 6: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

6 6

Why MDM? It’s About Evolution of Enterprise Architecture

Page 7: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

7 7

MDM Processes

• Batched Acquisition from Staging Tables

• Members, Attributes, Parent-Child Relationships

• SQL Integration Services

Import & Integration

• Versioning Changes

• Auditing

• Compliance

• Tracking of Instances

Modeling • Subscription Views

• Export to:

• Operational Systems

• Data Warehouses

• BI Analytics

• Reporting Tools

Export & Subscription

Page 8: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

8 8

Microsoft Master Data Services SQL 2008 R2 Enterprise, Datacenter, Developer

Tools: Master Data Manager

Primary tool for managing your master data

MDS Configuration Manager

IT Pro tool

MDS Web Service For developers wanting to extend MDS

Concepts: Models

Entities

Attributes

Members

Hierarchies

Collections

Versions

Database

Page 9: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

9 9

Modelling Master Data

Model organises data at highest level Allowing versioning of changes to data

There are typically four categories of models: People (Customers, Staff)

Places (Geographies, Cities, Countries)

Things (Products)

Concepts (Accounts, Behaviours, Transactions)

Page 10: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

10 10

Example: Product MDM Model

Product (model)

Product (entity)

Name (free-form attr)

Code (free-form attr)

Subcategory (domain-

based attr)

Name (free-form attr)

Code (free-form attr)

Category (domain-

based attr)

Name (free-form attr)

Code (free-form attr)

StandardCost (free-form

attr)

ListPrice (free-form

attr)

Photo (file attr)

Page 11: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

11

1. Reviewing a Data Model Using Master Data Services

Page 12: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

12

Data Warehouse

Page 13: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

13 13

OLE DB

ODBC

DB2 Oracle

XML

SQL Server

Analysis Services

SQL Server

Report Server Models

SQL Server

Data Mining Models

SQL Server

Integration Services

MySAP

Hyperion Essbase

SAP

NetWeaver BI SQL Server

Teradata

Rich Connectivity Data Providers

Page 14: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

14 14

Let’s Store the Intelligence: DW

SQL Server Analysis Services server is a logical endpoint for data

Analytical data physically rests in another relational database called a Data Warehouse

Modelling of DW is a key BI function Integration with MDM is desirable

Page 15: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

15 15

Star Schema

Page 16: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

16 16

Star Schema Benefits

Simple, not-so-normalized model

High-performance queries Especially with Star Join Query Optimization

Mature and widely supported

Low-maintenance

Page 17: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

17 17

Snowflake Dimension Tables Define hierarchies using multiple dimension tables

Support fact tables with varying granularity

Simplify consolidation of heterogeneous data

Potential for slower query performance in relational reporting

No difference in performance in Analysis Services database

Page 18: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

18 18

Fact Table Fundamentals

Collection of measurements associated with a specific business process

Specific column types Foreign keys to dimensions

Measures – numeric and additive

Metadata and lineage

Consistent granularity – the most atomic level by which the facts can be defined

Page 19: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

19 19

Fact Table Examples

Day Grain

Quarter Grain

Reseller sales data by:

•Product

•Order Date

•Reseller

•Employee

•Sales Territory

Sales quota data by:

•Employee

•Time

Page 20: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

20 20

Date Dimension Table

Most common dimension used in analysis (aka Time dimension)

Use consistently with all facts

Useful common attributes – Year, Quarter, Month, Day

Time series analysis support

Navigation and summarization enabled with hierarchies, such as calendar or fiscal

Single table design (typically not snowflake design)

Tip: Format the key of the dimension as yyyymmdd (e.g. 20100115) to make it readily understandable

Page 21: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

21 21

Parent-Child Hierarchy

A dimension that contains a parent attribute

A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table

Common examples Organizational charts

General Ledger structures

Bill of Materials

Page 22: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

22 22

Parent-Child Hierarchy Example

Brian

Amy

Stacia

Stephen

Shu Michael

Peter

José

Syed

Page 23: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

23 23

Slowly Changing Dimensions

Maintain historical context as dimension data changes

Three common ways (there are more): Type 1: Overwrite the existing dimension record

Type 2: Insert a new ‘versioned’ dimension record

Type 3: Track limited history with attributes

Page 24: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

24 24

SCD Type 1

Existing record is updated

History is not preserved

Page 25: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

25 25

SCD Type 2

Existing record is ‘expired’ and new record inserted

History is preserved

Most common form of SCD

Page 26: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

26 26

SCD Type 3

Existing record is updated

Limited history is preserved

Implementation is rare

SalesTerritoryKey update to 10

Page 27: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

27 27

Let’s Get the Data

We would like to populate facts and dimensions in our Data Warehouse from OLTP data...

Page 28: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

28

Integration and ETL

Page 29: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

29 29

Let’s do ETL with SSIS

SQL Server Integration Services (SSIS) service

SSIS object model

Two distinct runtime engines: Control flow

Data flow

32-bit and 64-bit editions

Page 30: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

30 30

The Package

The basic unit of work, deployment, and execution

An organized collection of: Connection managers

Control flow components

Data flow components

Variables

Event handlers

Configurations

Can be designed graphically or built programmatically

Saved in XML format to the file system or SQL Server

Page 31: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

31 31

Control Flow

Control flow is a process-oriented workflow engine

A package contains a single control flow

Control flow elements Containers

Tasks

Precedence constraints

Variables

Page 32: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

32 32

Data Flow

The Data Flow Task

Performs traditional ETL and more

Fast and scalable

Data Flow Components

Extract data from Sources

Load data into Destinations

Modify data with Transformations

Service Paths

Connect data flow components

Create the pipeline

Page 33: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

33

1. Using SQL Server Integration Services for Aggregating and Deriving Data

Page 34: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

34

OLAP/Multidimensional Data

Page 35: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

35 35

SQL Server 2008 Analysis Services

Multidimensional Analysis Aggregates and organizes data from business data sources

Performs calculations difficult to perform using relational queries

Supports advanced business intelligence, such as Key Performance Indicators

Data Mining Discovers patterns in both relational and OLAP data

Enhances the OLAP component with discovered results

Page 36: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

36 36

Cube = Unified Dimensional Model

Multidimensional data

Combination of measures and dimensions as one conceptual model

Measures are sourced from fact tables

Dimensions are sourced from dimension tables

Page 37: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

37 37

Dimensions

Members from tables/views in a data source view (based on a Data Warehouse)

Contain attributes matching dimension columns

Organize attributes as hierarchies One All level and one leaf level

User hierarchies are multi-level combinations of attributes

Can be placed in display folders

Used for slicing and dicing by attribute

Page 38: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

38 38

Hierarchies

Benefits View of data at different levels of summarization

Path to drill down or drill up

Implementation Denormalized star schema dimension

Normalized snowflake dimension

Self-referencing relationship

Page 39: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

39 39

Hierarchy

Defined in Analysis Services

Ordered collection of attributes into levels

Navigation path through dimensional space

Very important to get right!

Customers by Geography

Country

State

City

Customer

Customers by Demographics

Marital

Gender

Customer

Page 40: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

40 40

Measure Group

Group of measures with same dimensionality

Analogous to a fact table

Cube can contain more than one measure group

E.g. Sales, Inventory, Finance

Defined by dimension relationships

Page 41: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

41 41

Sales Inventory Finance

Customers X

Products X X

Time X X X

Promotions X

Warehouse X

Department X

Account X

Scenario X

Measure Group

Measure Group D

ime

ns

ion

Page 42: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

42 42

Dimension Relationships

Define interaction between dimensions and measure groups

Relationship types Regular

Reference

Fact (Degenerate)

Many-to-many

Data mining

Page 43: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

43 43

Customer

City

State

Country

Gender Marital

Country

State

City

Customer

Gender

Customer

Marital

Gender

Customer

Customer

City

State

Country

Gender

Marital

Attributes Hierarchies

Age

Dimension Model

Page 44: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

44 44

Calculations

Expressions evaluated at query time for values that cannot be stored in fact table

Types of calculations Calculated members

Named sets

Scoped assignments

Calculations are defined using MDX

MDX = MultiDimensional EXpressions

Page 45: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

45

1. Using BIDS to Review Dimension Design

2. Cube Design and Functionality

Page 46: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

46 46

Summary

As a platform for enterprise Business Intelligence you should consider four services:

Data Warehouse (can be relational)

Process for Data Management (MDS)

Process for Data Integration (ETL)

Analysis (OLAP, Data Mining, Columnar)

= SQL Server 2008 R2

Page 47: The IT Perspective: Data Warehousing, Management, and ... · 2 2 Objectives Explain the basics of: 1. Master Data Management 2. Data Warehousing 3. ETL 4. OLAP/Multidimensional Data

47 47

© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.