Designing and Tuning High Speed Data Loading

Thomas Kejser
Principal Program Manager
tkejser@microsoft.com

Agenda

- Tuning Methodology
- Bulk Load API Basics
- Design Patterns and Techniques
  - Parallelism
  - Table Layout
- Tuning the SQL Server Engine
- Tuning the Network Stack
- Tuning Integration Services

Tuning Methodology
Tuning ETL and ELT

The Tuning Loop

[Diagram: the tuning loop, with the stages Generate Hypothesis, Measure, Change, Measure, Save Result]

- Get a baseline
- Make one small change at a time
- Agree on targets for optimization
  - Actual runtime
  - CPU, Memory, I/O
- The greedy tuner: “Tune it till it breaks, then fix it, so you can break it again”
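The loop above can be sketched as a small measurement harness. This is only an illustration in Python, not anything from the deck: `measure`, `tuning_loop`, and the workload callables are hypothetical names, and wall-clock runtime stands in for whichever targets (runtime, CPU, memory, I/O) were agreed on.

```python
import time
from typing import Callable, Dict, List


def measure(workload: Callable[[], None]) -> float:
    """Return the wall-clock runtime of one workload run, in seconds."""
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start


def tuning_loop(baseline: Callable[[], None],
                changes: Dict[str, Callable[[], None]],
                target_seconds: float) -> List[dict]:
    """Measure a baseline, then try one change at a time and save every result."""
    results = [{"change": "baseline", "seconds": measure(baseline)}]
    for name, workload in changes.items():
        seconds = measure(workload)  # measure again after a single change
        results.append({"change": name, "seconds": seconds})
        if seconds <= target_seconds:  # agreed optimization target reached
            break
    return results
```

Each entry in `changes` is meant to represent the load with exactly one modification applied, so that any difference from the baseline can be attributed to that modification.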

Tools of the Trade - Windows

- Perfmon
  - Logical Disk
  - Memory
  - Processor
  - Process (specifically the DTEXEC process)
  - Network Interface
- Task Manager
- WinDbg
- KernRate

Tools of the Trade - SQL Server

- sys.dm_os_wait_stats
  - All my tuning starts here
  - Get familiar with common wait types
- sys.dm_os_latch_stats
  - Allows deep dive into LATCH_<X> waits
- sys.dm_os_spinlock_stats
  - When too much CPU seems to be spent
- sys.dm_io_virtual_file_stats
  - Because I/O systems are rarely perfect
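The counters in sys.dm_os_wait_stats are cumulative, so a single reading says little about one load run. A common approach is to snapshot the view before and after the run and rank the differences. The sketch below assumes the two snapshots have already been fetched into dictionaries of wait_type to wait_time_ms; the function name and the sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def wait_deltas(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank wait types by the wait time (ms) accumulated between two snapshots."""
    deltas = {
        wait_type: after_ms - before.get(wait_type, 0)
        for wait_type, after_ms in after.items()
    }
    # Keep only waits that grew during the test run, largest first.
    return sorted(
        ((w, ms) for w, ms in deltas.items() if ms > 0),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical wait_time_ms values captured before and after one load run.
before = {"PAGELATCH_UP": 1_000, "WRITELOG": 5_000, "CXPACKET": 200}
after = {"PAGELATCH_UP": 9_000, "WRITELOG": 7_500, "CXPACKET": 200}

top_waits = wait_deltas(before, after)
```

With these sample numbers PAGELATCH_UP ranks first, which would point the next hypothesis at PFS contention.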

Bulk Load API Basics

Four ways to Load Data to SQL Server

- Integration Services
  - OLEDB Destination
  - SQL Server Destination
- BULK INSERT
  - CSV or fixed-width files
- BCP
  - Like BULK INSERT, but can be run remotely
- INSERT ... SELECT

Minimally logged and Bulk

- Bulk load
  - Feeds a continuous stream of data into a table
  - As opposed to running singleton INSERT statements
- Minimally logged
  - Only allocations are logged, not individual rows/pages
- Key takeaway: an operation can be a bulk load operation without being minimally logged

To TABLOCK or not to TABLOCK

- General rule (batch style):
  - Heaps: use TABLOCK
  - Clustered indexes: do NOT use TABLOCK
- Minimally logged:
  - INSERT Heap WITH (TABLOCK) SELECT ...
  - If TF610 is on: INSERT ClusterIndex SELECT ...
- The same rules apply to the OLEDB and SQL Server Destinations in SSIS

Design Patterns

Integration Services or T-SQL

- Sometimes: a matter of preference
- Integration Services is graphical
  - Some users like this
  - Hard to make modular
- SQL Server uses T-SQL, a “text language”
  - Modular programming
- The right tool for the right job
  - Learn both…

SQL Server - Which load method?

BULK INSERT / BCP
- Pros
  - Can take a BU lock
  - No need for Linked Servers or OPENROWSET
- Cons
  - Only CSV and fixed-width files for input

INSERT ... SELECT
- Pros
  - Can perform transformations
  - Any OLEDB-enabled input
- Cons
  - Takes X locks on the table
  - Linked Servers or OPENROWSET needed

Integration Services - Which Destination?

OLEDB Destination
- Pros
  - Can be used over TCP/IP
  - ETL servers can be scaled out remotely
- Con
  - Typically slower than the SQL Server Destination

SQL Server Destination
- Pros
  - Fastest option
  - Easy to configure
- Con
  - Must run on the same box as SQL Server (shared memory connections)

Design Pattern: Parallel Load

- Create a (priority) queue for your packages
  - A SQL table is good for this purpose
- Packages / T-SQL include a loop:
  - Loop takes one item from the queue
  - Until the queue is empty…

[Diagram: a priority queue holding packages P1, P2, P3, P4, P5 … Pn feeds two DTEXEC processes, DTEXEC (1) and DTEXEC (2). Each runs a loop of Get Task, then Do Work.]
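The queue-and-loop pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's implementation: an in-memory `queue.PriorityQueue` stands in for the SQL table, threads stand in for the DTEXEC processes, and `run_workers` is a hypothetical name. The "work" is reduced to recording which worker took which task.

```python
import queue
import threading
from typing import List, Tuple


def run_workers(tasks: List[Tuple[int, str]], worker_count: int) -> List[Tuple[str, str]]:
    """Drain a priority queue with several workers; return (worker, task) pairs."""
    work_queue = queue.PriorityQueue()
    for task in tasks:
        work_queue.put(task)  # lower number = higher priority

    done: List[Tuple[str, str]] = []
    done_lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                _priority, task_name = work_queue.get_nowait()  # Get Task
            except queue.Empty:
                return  # queue empty: the loop ends
            # Do Work: a real worker would run the package or T-SQL here.
            with done_lock:
                done.append((name, task_name))

    threads = [threading.Thread(target=worker, args=(f"worker-{i + 1}",))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


completed = run_workers(
    [(1, "P1"), (1, "P2"), (2, "P3"), (2, "P4"), (3, "P5")], worker_count=2)
```

Because each worker pulls the next task only when it is free, long and short tasks balance out across workers without any up-front scheduling. Which worker ends up with which task is not deterministic.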

Design Pattern: Table Hash Partitioning

- Create filegroups to hold the partitions
  - Equally balance over LUNs using optimal layout
- Use the CREATE PARTITION FUNCTION command
  - Partition the tables into #cores partitions
- Use the CREATE PARTITION SCHEME command
  - Bind the partition function to the filegroups
- Add a hash column to the table (tinyint, just one byte per row)
- Calculate a good hash distribution
  - For example, use hashbytes with modulo, or binary_checksum

[Diagram: the hash column maps each row to one of the partitions numbered 0, 1, 2, 3, 4, 5, 6 … 253, 254, 255]
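The "hash with modulo" idea can be checked offline before touching the table. The sketch below is a Python analogue, not the T-SQL expression itself: it uses SHA-256 from `hashlib` where the slide suggests hashbytes, and the key format, `hash_bucket`, and the partition count of 16 are assumptions made for illustration. The bucket numbers it produces are not claimed to match what SQL Server would compute.

```python
import hashlib
from collections import Counter


def hash_bucket(key: str, partitions: int) -> int:
    """Map a key to a partition number in [0, partitions) via hash modulo."""
    if not 1 <= partitions <= 256:
        raise ValueError("a tinyint hash column holds 0..255, so use 1..256 partitions")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


# Hypothetical business keys; a real load would hash a column of the table.
keys = [f"order-{n}" for n in range(10_000)]
partitions = 16  # e.g. one partition per core

distribution = Counter(hash_bucket(k, partitions) for k in keys)
```

Inspecting `distribution` shows how evenly the rows would spread. A strongly skewed result suggests choosing a different key column or hash function.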

Design Pattern: Large Updates

[Diagram: the Sales table is partitioned by year (2001, 2002, 2003, 2004). Labels, in the order they appear: Sales_Delta, BULK INSERT, Update Records, Sales_Old, SWITCH, Sales_New, SWITCH, Sales Updated.]

Design Pattern: Large Deletes

[Diagram: the Sales table is partitioned by year (2001, 2002, 2003, 2004). Labels, in the order they appear: Sales_Temp (2001 Filtered), BULK INSERT, Sales_Temp (2001), SWITCH, SWITCH, 2001 (Filtered).]

Tuning the SQL Server Engine

ALLOC_FREESPACE_CACHE - Heap limits

- Measure: sys.dm_os_latch_stats
  - Long waits for ALLOC_FREESPACE_CACHE
- SQL Server® Books Online: “Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object.”
- Hypothesis: More heaps = more speed

[Chart: throughput in MB/sec (axis 0 to 250) against the number of concurrent bulk loads (axis 0 to 30)]

PAGELATCH_UP - PFS contention

- Measure: sys.dm_os_wait_stats
- Hypothesis generation
  - I/O problem?
  - What can we predict?
- Fix: Add more files to the filegroup!

RESOURCE_SEMAPHORE - Query memory usage

- DW load queries will often be very memory intensive
- By default, a single query can use at most 25% of SQL Server’s allocated memory
- Queries waiting to get a memory grant will wait for RESOURCE_SEMAPHORE
- Can use Resource Governor (RG) to work around it

SOS_SCHEDULER_YIELD

- Hypothesis: Caused by two bulk commands on the same scheduler
- Predict: We should see multiple bulk commands on the same scheduler
- Observe: And we do… (scheduler_id in sys.dm_exec_requests)
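The prediction can be checked mechanically once the relevant rows are pulled from sys.dm_exec_requests. The sketch below assumes they have been fetched as (session_id, scheduler_id, command) tuples and that bulk loads show the command text "BULK INSERT". The function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def crowded_schedulers(requests: List[Tuple[int, int, str]]) -> Dict[int, List[int]]:
    """Return scheduler_id -> session_ids where more than one bulk command runs."""
    by_scheduler: Dict[int, List[int]] = defaultdict(list)
    for session_id, scheduler_id, command in requests:
        if command == "BULK INSERT":  # assumed command text for bulk loads
            by_scheduler[scheduler_id].append(session_id)
    return {s: ids for s, ids in by_scheduler.items() if len(ids) > 1}


# Hypothetical (session_id, scheduler_id, command) rows.
rows = [
    (51, 0, "BULK INSERT"),
    (52, 3, "BULK INSERT"),
    (53, 0, "BULK INSERT"),
    (54, 5, "SELECT"),
]
collisions = crowded_schedulers(rows)
```

In this sample, sessions 51 and 53 share scheduler 0, which is the situation the hypothesis predicts.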

Fixing SOS_SCHEDULER_YIELD

- How can we fix this? Two ways:
  - Terminate and reconnect
  - Soft NUMA

[Diagram: Soft-NUMA Node 0 through Soft-NUMA Node X. Each node has x CPU cores (Core 0 through Core X) and its own TCP port (1433 through 1433 + X), and each receives a BULK INSERT.]

I/O Related Waits for BULK INSERT

- BULK INSERT uses a double buffering scheme
- Important to feed it fast enough
- Also, the target SQL Server must be able to absorb the writes

[Diagram: the CSV input passes through two 64KB buffers, then Parse, then Table. Waits annotated on the diagram: IMPROV_IOWAIT, OLEDB, ASYNC_NETWORK_IO, PAGEIOLATCH_EX.]

CXPACKET - When it Matters

- Statements of type INSERT…SELECT
- Measure: Sometimes throughput drops with higher DOP
- Hypothesis: Backpressure in query execution

[Chart: "Throughput / DOP", throughput in MB/sec (axis 0 to 50) against DOP (axis 1 to 46)]

Drinking From a Fire Hose

[Chart: "CXPACKET waits / Throughput", CXPACKET waits (axis 0 to 200,000,000) against throughput in MB/sec (axis 10 to 40)]

Solution: OPTION (MAXDOP X)

SQL Server waits - Summary

Wait Type | Typical Cause | Resolution
PAGELATCH_UP | Contention on PFS pages | Add more data files to the filegroup
ALLOC_FREESPACE_CACHE | Heap allocation bottleneck | Partition the target table and use SWITCH
SOS_SCHEDULER_YIELD | Network speed not keeping up | Optimize network settings in Windows (Jumbo Frames); increase packet size
RESOURCE_SEMAPHORE | Too much memory used by query | Optimize the query for less memory, or use Resource Governor to limit max allocation
LCK_X | Locks prevent parallelism | Use correct lock hints
WRITELOG | Transaction log contention | Use TF610, seek minimally logged operations
PAGEIOLATCH_<X> | I/O system not keeping up | Tune I/O
IMPROV_IOWAIT | Input file I/O too slow | Improve input file latency and/or throughput
CXPACKET | Normally harmless, but may be too much coordination | Use MAXDOP hint, but carefully
OLEDB / ASYNC_NETWORK_IO | Not feeding bulk load fast enough | Optimize source

Tuning the Network Stack

How to Affinitize NICs

- Using the Interrupt-Affinity Policy Tool you can affinitize individual NICs to CPU cores
- Affinitize each NIC to its own core
  - One NIC per hard NUMA node
  - Your mileage may vary; it depends on the box
- Match Soft NUMA TCP/IP connections with NICs
  - A NIC on the hardware NUMA node maps to a SQL bulk stream target on the same node

Tune Network Parameters

- Jumbo Frames = 9014 bytes, enabled
- Adaptive Inter-Frame Spacing disabled
- Flow control = Tx & Rx enabled
- Client & server Interrupt Moderation = Medium
- Coalesce buffers = 256
- Set server Rx buffers to 512 and server Tx buffers to 512
- Set client Rx buffers to 512 and client Tx buffers to 256
- Link speed 1000 Mbps, Full Duplex

Network Packet Size

- Measure: Perfmon shows a huge discrepancy between the number of reads and writes
- Hypothesis: This is caused by the small network packet size (default 4096) forcing the stream to be broken into smaller pieces
- Test and prove: Adjusting the network packet size to 32K increases throughput by 15%
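The hypothesis rests on simple arithmetic: a smaller packet size means more packets for the same stream. The sketch below shows the ratio for a hypothetical 1 GiB stream. It ignores protocol headers and treats "32K" as 32 × 1024 bytes, both of which are simplifying assumptions, and `packets_needed` is an illustrative name.

```python
def packets_needed(stream_bytes: int, packet_size: int) -> int:
    """Number of network packets needed to carry a stream of the given size."""
    return -(-stream_bytes // packet_size)  # integer ceiling division


stream_bytes = 1024 ** 3                                   # hypothetical 1 GiB bulk stream
default_packets = packets_needed(stream_bytes, 4096)       # default packet size
large_packets = packets_needed(stream_bytes, 32 * 1024)    # "32K" from the slide
reduction_factor = default_packets // large_packets
```

The eightfold drop in packet count does not translate into an eightfold speedup. The slide reports a 15% throughput gain, which is why the change has to be measured rather than assumed.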

Tuning Integration Services

Integration Services vs. SQL

Lab test setup:
- Transform fact data with surrogate key lookups
- 5 dimension tables, 100K rows each
- Partitioned fact table, total of 320M rows
- Test speed of hash joins

Test 2: Raw Join
Configuration | Time (s) | Krows/s
SSIS 2008 | 144 | 2222
SQL MAXDOP = 0 | 158 | 2025
SQL MAXDOP = 1 x 32 | 162 | 1975

Test 3: Join and write
Configuration | Time (s) | Krows/s
SQL MAXDOP = 1 x 32 | 246 | 1301
SSIS 2008 | 278 | 1151
SQL MAXDOP = 0 | 1927 | 166

Integration Services lookup join is comparable in speed with T-SQL!
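The two result columns are related by the fact table size. Assuming the time column is in seconds and every run processes all 320M rows (neither is stated outright on the slide), the Krows/s figures can be recomputed as follows. The dictionary and function names are illustrative.

```python
TOTAL_ROWS = 320_000_000  # fact table size from the lab setup


def krows_per_second(seconds: int) -> int:
    """Throughput in thousands of rows per second, rounded to the nearest integer."""
    return round(TOTAL_ROWS / seconds / 1000)


raw_join = {"SSIS 2008": 144, "SQL MAXDOP = 0": 158, "SQL MAXDOP = 1 x 32": 162}
join_and_write = {"SQL MAXDOP = 1 x 32": 246, "SSIS 2008": 278, "SQL MAXDOP = 0": 1927}

raw_join_rates = {name: krows_per_second(s) for name, s in raw_join.items()}
join_and_write_rates = {name: krows_per_second(s) for name, s in join_and_write.items()}
```

Under those assumptions the recomputed rates agree with the slide's figures for all six runs.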

Baseline of Package

- Sanity check:
  - How much memory does each package use?
  - How much CPU does each package stream use?
  - Need enough CPU and memory to run them all
- Performance counters:
  - Process: Private Bytes / Working Set (DTEXEC)
  - Processor: % Processor Time
  - Network Interface: Current Bandwidth
  - Network Interface: Bytes Total/sec

Scaling the Package - Method

- Using the parallel load technique described earlier, you can run multiple copies of the package
- Using the baseline of the package, you can now calculate how many scale servers you will need
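The calculation from the baseline can be sketched as below. It assumes that CPU and memory are the only constraints and that per-stream usage adds up linearly. `servers_needed` and all the numbers are hypothetical, not measurements from the deck.

```python
import math


def servers_needed(streams: int, cpu_cores_per_stream: float, mem_gb_per_stream: float,
                   cores_per_server: int, mem_gb_per_server: float) -> int:
    """Servers needed so that both CPU and memory fit all package streams."""
    by_cpu = math.ceil(streams * cpu_cores_per_stream / cores_per_server)
    by_memory = math.ceil(streams * mem_gb_per_stream / mem_gb_per_server)
    return max(by_cpu, by_memory, 1)


# Hypothetical baseline: each package copy uses 1.5 cores and 2 GB of memory.
needed = servers_needed(streams=32, cpu_cores_per_stream=1.5, mem_gb_per_stream=2.0,
                        cores_per_server=16, mem_gb_per_server=32.0)
```

Here CPU is the binding constraint (three servers by CPU, two by memory). Network bandwidth, which the baseline also measures, would be a third term to add in the same way.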

Data Loading - Fast Enough?

- Bulk load scales near linearly with bulk streams
  - Measured so far up to 96 cores
- Possible to reach 100% CPU load on all cores
  - “Just” get rid of all bottlenecks

Q&A

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Appendix
Tuning ETL and ELT