Introducing Parallel Data Warehouse (The project formerly known as Madison)

Thomas Kejser
Senior Program Manager
Microsoft Corp.

Agenda
• The Typical problem with data warehouses
• MPP vs SMP
• SQL Server Parallel Data Warehouse
    • Hardware architecture
    • Query Processing
    • Data Loading

My email: [email protected]


The Typical Problem with Data Warehouses

Microsoft DW Solutions

SSRS, SSAS, SSIS

Microsoft & Partner Services

Symmetric Multi-Processing vs. Massively Parallel Processing

SMP
• HW advancements increasing ability to scale up
• But scaling limited by design
• High-end SMP very expensive
• Extremely high concurrency for simple workloads
• With less than 1-2 TB of data, SMP will almost always be better. At higher sizes, it depends
• Workloads: OLTP, Transactional, Data Warehousing

MPP
• HW advancements increasing ability to scale out
• Scaling to 1 PB+
• Scale-out is relatively low cost
• Relatively high concurrency for complex workloads
• > 2 TB up to 1 PB for DW workloads
• Workloads: Data Warehousing (esp. VLDB, complex workloads)

PDW: No Assembly Required
• Software
• Servers
• Storage arrays
• Network switches
• Cables
• Licenses
• Power distribution units
• Racks

Comes fully assembled
Software is installed at the factory
Fully configured

Basic Building Blocks

Compute Nodes
• Handle the CPU cycles required to answer queries

Storage Nodes
• Store data using Fiber Attached Disks. Scaled to supply the CPUs with enough throughput

Other nodes
• More about those later

Anatomy of a Compute Node

Pre-configured For Each SQL Server Instance On Each Compute Node.

Drives Configured As RAID1 To Avoid Appliance Failover for a Single Drive Failure
• IBM Compute Nodes Will Have 1 LUN (1 RAID1 Pair)
• Dell Compute Nodes Will Have 2 LUNs (2 RAID1 Pairs)
• HP Compute Nodes Will Have 3 LUNs (3 RAID1 Pairs)

TempDB:
• Sort-work Area For Data Loading Into Clustered Index Tables
• Work Area for PDW Temporary Work Files
• Spill Area For Hash Joins Not Fitting Into Memory

Anatomy of a Storage Node

Pre-configured
• 4 RAID10 Pairs for Primary User Data
• 1 RAID10 Pair for Database Logs
• 2 LUNs Are Spread Across Each RAID Pair

User Databases are Separate Physical SQL Server Databases
Staging Database (Optional) Used for Loading & to Minimize Fragmentation

More Node Types

Backup node:
• Stores backup files from the appliance
• Can be logged into by authorized Windows users
• Can be augmented with 3rd party H/W and S/W

Landing Zone:
• Used as a holding place for data to be loaded
• Can be logged into by authorized Windows users
• Can be augmented with 3rd party H/W and S/W

Management node:
• Runs the Windows domain controller (Active Directory)
• Used for deploying patches to all nodes in the appliance
• Holds images in case a node needs reimaging

Putting It All Together - PDW

[Figure: appliance diagram; surviving labels are Control Node and Spare Node.]

Failover Protection:
• Redundant Control Node
• Redundant Compute Node
• Cluster Failover
• Redundant Array of Inexpensive Databases

Software Architecture

[Figure: software architecture diagram. Labels recovered from the diagram:]
• Clients: Query Tool, MS BI (AS, RS), Other 3rd Party Tools, connecting over OLEDB, ODBC, ADO.Net, JDBC; DWSQL; Internet Explorer
• Control Node: MPP Engine, Data Movement Service, IIS (Admin Console), SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB)
• Compute Nodes: SQL Server, Data Movement Service, User Data
• Landing Zone Node

Create Database

CREATE DATABASE database_name
WITH (
    AUTOGROW = ON,
    REPLICATED_SIZE = 1024,
    DISTRIBUTED_SIZE = 16384,
    LOG_SIZE = 300
)

Distribution and Replication

Database Distributed & Replicated Tables

[Figure: example schema. Tables and columns shown:]
• Store Sales (SS): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
• Date Dim (D): D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
• Item (I): I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
• Promotion (P): P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
• Store (S): S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
• Customer (C): C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
• Customer Demographics (CD): CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …

Data Distribution with Replication

[Figure: six nodes are shown. Each holds a full copy of C, I, D, CD, S and P, plus its own slice of SS.]
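The difference between a distributed and a replicated table can be sketched in a few lines of Python. This is only an illustration: the CRC32 hash, the node count of six (taken from the figure) and the helper names `node_for` and `place` are assumptions, not the appliance's actual hashing scheme.

```python
import zlib

NODES = 6  # assumption: six compute nodes, as drawn in the figure


def node_for(key, nodes=NODES):
    """Pick a node for a distribution key. CRC32 is a stand-in hash."""
    return zlib.crc32(str(key).encode()) % nodes


def place(distributed_rows, replicated_rows, key, nodes=NODES):
    """Give every node a slice of the distributed table and a full replicated copy."""
    storage = [
        {"distributed": [], "replicated": list(replicated_rows)}
        for _ in range(nodes)
    ]
    for row in distributed_rows:
        storage[node_for(row[key], nodes)]["distributed"].append(row)
    return storage


# Toy data: a fact table hashed on ss_item_sk and a small dimension table.
store_sales = [{"ss_item_sk": i % 50, "ss_quantity": 1} for i in range(1000)]
item = [{"i_item_sk": i} for i in range(50)]
layout = place(store_sales, item, key="ss_item_sk")

for n, node in enumerate(layout):
    print(n, len(node["distributed"]), len(node["replicated"]))
```

Because every node holds the whole dimension table, a join between the fact slice and the dimension can run locally on each node without moving data.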

Table Creation

CREATE TABLE table_name
    [ ( { <column_definition> } [ ,...n ] ) ]
    [ AS SELECT select_criteria ]
    [ WITH ( <table_option> ) ]
[;]

<column_definition> ::= column_name <data_type> [ NULL | NOT NULL ]

<data_type> ::= type_name [ ( precision [ , scale ] ) ]

<table_option> ::=
{
    [ CLUSTER_ON ( column_name [ ,...n ] ) ]
    , [ DISTRIBUTE_ON ( column_name ) ] | [ REPLICATE ]
    , [ PARTITION_ON column_name ( RANGE { LEFT | RIGHT } FOR VALUES ( [ boundary_value [ ,...n ] ] ) ) ]
}

Type Class       Types Supported
Integers         tinyint, smallint, int, bigint
Floating point   float, real
Character        char, varchar, nchar, nvarchar
Date & time      date, time, datetime, datetime2, datetimeoffset, timestamp, smalldatetime
Fixed point      decimal, money, smallmoney
Binary           binary, varbinary (8192)
Other            uniqueidentifier (?)

Create Table – Behind the Scenes

Create Table store_sales with
    distribute_on (ss_item_sk)
    partition_on (ss_sold_date_sk)
    cluster_on (ss_sold_date_sk)

[Figure: physical layout of the table.]
• 8 Filegroups (one per core) - 1 Table per Filegroup
• 12 Partitions (ss_sold_date_sk)
• N-number of Pages (8K each)
• Row
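As a rough sketch of how a single row might be routed to one of these physical pieces: the distribution column selects a filegroup and the partition column selects a range partition. The CRC32 hash, the integer yyyymmdd date keys and the eleven monthly boundary values below are all hypothetical; only the counts (8 filegroups, 12 partitions) come from the slide.

```python
import zlib
from bisect import bisect_left, bisect_right

FILEGROUPS_PER_NODE = 8  # from the slide: one filegroup per core

# Hypothetical boundaries: 11 values split ss_sold_date_sk into 12 partitions.
BOUNDARIES = [20090001 + month * 100 for month in range(2, 13)]


def partition_for(value, boundaries, right=True):
    """Range partition index. RANGE RIGHT puts a boundary value in the
    partition to its right; RANGE LEFT puts it in the partition to its left."""
    if right:
        return bisect_right(boundaries, value)
    return bisect_left(boundaries, value)


def filegroup_for(key):
    """Stand-in hash on the distribution column."""
    return zlib.crc32(str(key).encode()) % FILEGROUPS_PER_NODE


def locate(row):
    """Return (filegroup, partition) for a store_sales row."""
    return (
        filegroup_for(row["ss_item_sk"]),
        partition_for(row["ss_sold_date_sk"], BOUNDARIES),
    )


row = {"ss_item_sk": 4711, "ss_sold_date_sk": 20090815}
print(locate(row))
print("pieces per node:", FILEGROUPS_PER_NODE * (len(BOUNDARIES) + 1))
```

With these numbers a single logical table becomes 8 x 12 = 96 physical pieces on each compute node, each clustered on ss_sold_date_sk.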


Physical File Layout (Per Compute Node)

MPP Query Processing

• Control Node: Query Rewritten Into Steps That Run Efficiently On Compute Nodes
• ODBC/JDBC: SQL92 with Analytical Extensions
• Distribution-incompatible Joins Resolved Using High Speed Dynamic Re-distribution

Select location, year, sum(b.sales_amt)
from customer a, sales b
where b.sales > 500
and a.custid = b.custid
group by 2,1
order by 1,2
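A minimal sketch of what dynamic re-distribution achieves, assuming a customer table hashed on custid and a sales table hashed on a different column. The column `saleid`, the CRC32 hash and all function names are hypothetical; the point is only that re-hashing one side on the join key makes the join local to each node.

```python
import zlib


def bucket(key, n):
    """Stand-in hash used to assign rows to nodes."""
    return zlib.crc32(str(key).encode()) % n


def distribute(rows, key, n):
    nodes = [[] for _ in range(n)]
    for row in rows:
        nodes[bucket(row[key], n)].append(row)
    return nodes


def redistribute(nodes, key):
    """Re-hash every row on a new key; stands in for a data movement step."""
    all_rows = [row for node in nodes for row in node]
    return distribute(all_rows, key, len(nodes))


def local_join(customers, sales):
    """Join the rows that happen to live on the same node."""
    by_id = {c["custid"]: c for c in customers}
    return [
        (by_id[s["custid"]]["location"], s["sales_amt"])
        for s in sales
        if s["custid"] in by_id
    ]


N = 4
customer = [{"custid": c, "location": f"loc{c % 3}"} for c in range(20)]
sales = [{"saleid": i, "custid": i % 20, "sales_amt": 10 * i} for i in range(100)]

customer_nodes = distribute(customer, "custid", N)
sales_nodes = distribute(sales, "saleid", N)      # incompatible with the join key
shuffled = redistribute(sales_nodes, "custid")    # now co-located with customer

joined = [
    pair
    for c, s in zip(customer_nodes, shuffled)
    for pair in local_join(c, s)
]
print(len(joined))
```

After the shuffle, the node-local joins together produce the same rows as a single global join.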

MPP Execution Plans

The MPP engine creates parallel execution plans from client SQL.

The plans can include the following types of operations:
• SQL operations: used to pass SQL directly to SQL Server on 1 or more nodes.
• DMS operations: used to move data among the nodes in an appliance for further processing.
• Temp table operations: used to stage data for further processing.
• Return operations: push data back to the client.

Simple plans may include just one type of operation.
Complex plans may include all of these operations.
Plans are executed serially, one step at a time.
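One way to picture such a plan is as an ordered list of typed steps that are run one after another. The sketch below is purely illustrative: the `Op`, `Step` and `execute` names and the toy state dictionary are invented here and do not reflect the MPP engine's internal representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Op(Enum):
    SQL = "sql"                # pass SQL to SQL Server on one or more nodes
    DMS = "dms"                # move data among nodes
    TEMP_TABLE = "temp_table"  # stage data for further processing
    RETURN = "return"          # push data back to the client


@dataclass
class Step:
    op: Op
    description: str
    run: Callable[[Dict], None]


def execute(plan: List[Step], state: Dict) -> List[Op]:
    """Run the steps serially, one at a time, and record their order."""
    log = []
    for step in plan:
        step.run(state)
        log.append(step.op)
    return log


# Toy state: two nodes, each holding a few values to be summed.
state = {"nodes": [[1, 2], [3, 4]], "temp": None, "partials": None, "result": None}

plan = [
    Step(Op.TEMP_TABLE, "create temp table", lambda s: s.update(temp=[])),
    Step(Op.SQL, "sum locally on each node",
         lambda s: s.update(partials=[sum(n) for n in s["nodes"]])),
    Step(Op.DMS, "move partial sums to the temp table",
         lambda s: s["temp"].extend(s["partials"])),
    Step(Op.RETURN, "return the final sum",
         lambda s: s.update(result=sum(s["temp"]))),
]

log = execute(plan, state)
print(state["result"], [op.value for op in log])
```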

Example Schema

[Figure: example schema with the Store Sales, Date Dim, Item, Promotion, Store, Customer and Customer Demographics tables.]

Data Distribution with Replication: Sales table distributed on customer... And partitioned by time

Distribution Compatible Query

SELECT CustomerId,
       SUM(Amount) AS TotalSales,
       SUM(Quantity) AS TotalUnitsSold
FROM Sales s
JOIN Item i ON s.ItemId = i.ItemId
WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
  AND Description LIKE '%gadgets%'
GROUP BY CustomerId
ORDER BY CustomerId;

MPP Query Plan

Step 1 – On each compute node:

SELECT s.[customerid],
       sum(s.[amount]) AS totalsales,
       sum(s.[quantity]) AS totalunitssold
FROM [tpch_3].[dbo].[h_sales_34] s
JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])
WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31'
       AND i.[description] LIKE '%gadgets%')
GROUP BY s.[customerid]
ORDER BY s.[customerid];
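Why a single step is enough here can be shown with a small simulation. It assumes, as in the example schema, that the sales table is hash distributed on the customer key and the item table is replicated, so each customer's rows live on exactly one node. The hash, the toy data and the function names are hypothetical, and the date filter and quantity sum are left out for brevity.

```python
import zlib
from collections import defaultdict


def node_of(customer_id, n):
    """Stand-in hash on the distribution column."""
    return zlib.crc32(str(customer_id).encode()) % n


def run_on_node(sales_slice, item):
    """The per-node query: join to Item, filter, group by CustomerId."""
    gadget_items = {i["ItemId"] for i in item if "gadgets" in i["Description"]}
    totals = defaultdict(int)
    for s in sales_slice:
        if s["ItemId"] in gadget_items:
            totals[s["CustomerId"]] += s["Amount"]
    return dict(totals)


item = [
    {"ItemId": 1, "Description": "blue gadgets"},
    {"ItemId": 2, "Description": "widgets"},
]
sales = [
    {"CustomerId": c, "ItemId": 1 + (c + k) % 2, "Amount": 10 + k}
    for c in range(12)
    for k in range(4)
]

N = 3
slices = [[s for s in sales if node_of(s["CustomerId"], N) == n] for n in range(N)]
per_node = [run_on_node(slice_, item) for slice_ in slices]

# No customer appears on two nodes, so the node results can simply be combined.
combined = {}
for result in per_node:
    combined.update(result)

print(sorted(combined.items()))
```

The groups never span nodes, so no second aggregation pass is needed; only the final ordering has to be applied across the node results.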

Query 1 Processing Flow

[Figure: a Query Tool connects to the Control Node (MPP Engine, Data Movement Service, and SQL Server holding DW Authentication, DW Configuration, DW Schema and TempDB), which works with Compute Node 1 through Compute Node N (each with SQL Server and User Data).]

MPP Engine:
• Parse SQL
• Validate & Authorize
• Build MPP Plan
• Execute Plan
• Return Data to Client

Reshuffling the data

SELECT SaleDate,
       SUM(Amount) AS TotalSales,
       SUM(Quantity) AS TotalUnitsSold
FROM Sales s
JOIN Item i ON s.ItemId = i.ItemId
WHERE SaleDate BETWEEN '2009-08-01' AND '2009-08-31'
  AND Description LIKE '%gadgets%'
GROUP BY SaleDate
ORDER BY SaleDate;

MPP Query Plan

Step 1 – Create temp table on control node:

CREATE TABLE [tempdb].[dbo].Q_[TEMP_ID_6760]
    ( saledate DATE, totalsales DECIMAL(38, 2), totalunitssold INTEGER )
WITH (DATA_COMPRESSION = PAGE);

Step 2 – Run on each compute node:

SELECT s.[saledate],
       sum(s.[amount]) AS totalsales,
       sum(s.[quantity]) AS totalunitssold
FROM [tpch_3].[dbo].[h_sales_34] s
JOIN [tpch_3].[dbo].item_37 i ON (s.[itemid] = i.[itemid])
WHERE (s.[saledate] BETWEEN '2009-08-01' AND '2009-08-31'
       AND i.[description] LIKE '%gadgets%')
GROUP BY s.[saledate]

MPP Query Plan continued

Step 3:

SELECT [saledate],
       sum([totalsales]) AS totalsales,
       sum([totalunitssold]) AS totalunitssold
FROM [tempdb].[dbo].Q_[TEMP_ID_6760]
GROUP BY [saledate]
ORDER BY [saledate]

Step 4:

DROP TABLE [tempdb].[dbo].Q_[TEMP_ID_6760];
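The plan is a two-phase aggregation: each node produces partial sums per sale date, and a final pass sums the partials. A minimal Python sketch of that idea follows; the function names and sample rows are made up for illustration, and only the amount column is aggregated.

```python
from collections import Counter


def partial_totals(rows):
    """Step 2, run on each compute node: sum the local rows per sale date."""
    totals = Counter()
    for r in rows:
        totals[r["SaleDate"]] += r["Amount"]
    return totals


def final_totals(partials):
    """Step 3: sum the partial results collected in the temp table."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds values for matching keys
    return dict(sorted(merged.items()))


# Rows are spread over nodes by customer, so one date can appear on many nodes.
node_slices = [
    [{"SaleDate": "2009-08-01", "Amount": 10}, {"SaleDate": "2009-08-02", "Amount": 5}],
    [{"SaleDate": "2009-08-01", "Amount": 7}],
]

temp_table = [partial_totals(rows) for rows in node_slices]
result = final_totals(temp_table)
print(result)
```

The second pass is needed because the grouping column differs from the distribution column, so the same sale date can show up in the partial results of several nodes.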

Reshuffling – Query Processing Flow

[Figure: a Query Tool connects to the Control Node (MPP Engine, Data Movement Service, and SQL Server holding DW Authentication, DW Configuration, DW Schema and TempDB), which works with the Compute Nodes (each with SQL Server and User Data).]

MPP Engine:
• Parse SQL
• Validate & Authorize
• Build MPP Plan
• Execute Plan
• Return Data to Client

Data Loading

[Figure: text files on the Landing Zone Node feed the load; the Control Node and Spare Node are also shown.]

Tables Are Hash Distributed Or Replicated

Data Loader Process

Load File -> Bulk Insert -> Partitioned Staging Table (Heap) -> Insert-Select -> Partitioned Final Table (CIDX)

Bulk Insert Phase
• Sort: each BATCH, in memory or TempDB
• Trace Flags: None
• BATCHSIZE: Calculated
• TABLOCK: ON
• TempDB: Entire BATCHSIZE for Sort
• TempDB Log: Minimal
• StageDB Log: Minimal
• ROLLBACK: Commits per BATCHSIZE; Rollback to last BATCH Only

Insert-Select Phase
• Sort: each partition, in memory or TempDB
• Trace Flags: 610 per NUMA Session
• MAXDOP: 1 Per NUMA Session
• TABLOCK: OFF
• TempDB: Entire PARTITION for sort
• TempDB Log: Minimal
• UserDB Log: Twice Data File Size
• ROLLBACK: Commits Full TRANSACTION; Rollback Full TRANSACTION
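The different rollback behaviour of the two phases can be mimicked with in-memory lists. This is a toy model under stated assumptions: `bulk_insert`, `insert_select` and the `fail_at` and `fail` switches are hypothetical names used to simulate a failure, not loader options.

```python
def bulk_insert(rows, batch_size, fail_at=None):
    """Commit once per batch. A failure discards only the batch in flight,
    so earlier batches stay in the staging table."""
    staged = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if fail_at is not None and start <= fail_at < start + len(batch):
            break  # simulated failure: roll back to the last committed batch
        staged.extend(sorted(batch))  # each batch is sorted before it is written
    return staged


def insert_select(staged, fail=False):
    """One transaction: either every row reaches the final table or none do."""
    if fail:
        return []  # simulated failure: the full transaction rolls back
    return sorted(staged)


rows = list(range(10, 0, -1))                 # 10, 9, ..., 1
staged = bulk_insert(rows, batch_size=4, fail_at=5)
print(staged)                                 # only the first batch survived
print(insert_select(staged))
print(insert_select(staged, fail=True))
```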


© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
