Scaling Out SSIS with Parallelism

Upload: chris-adkin

Post on 07-Dec-2014

DESCRIPTION

Scaling out integration services with SSIS, incorporating a deep dive into the dataflow engine with XPerf.

TRANSCRIPT

Page 1: Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine

Scaling Out SSIS with Parallelism

Page 2: About me . . .

- An independent SQL consultant
- A user of SQL Server from version 2000 onwards, with 12+ years' experience
- A DBA / developer hybrid

Page 3: What Will Be Covered?

- Techniques for scaling out the data flow and how well they scale
- A look into the inner workings of the dataflow engine using Xperf
- How ‘Elastic’ scalability might be achieved
- A wrap-up with some key ‘Takeaway’ points

Page 4: Integration Services Parallelism 101

- There is no parallel ‘On’ switch
- Parallelism has to be implemented by design, at:
  - Package level
  - In the execution flow
  - In the data flow, by hand and / or through:
    - Transforms that come with SSIS
    - Third party components
    - Separating out synchronous transforms

Page 5: Integration Services Performance 101

This flow helps determine:

1. Maximum data flow performance <= source extract speed.
   Does the source need to be parallelized?
2. CPU and I/O profile of the source when no back pressure is taking place.
   Does this swamp the available hardware resources?

Page 6: Parallel Throughput 101

Good parallel throughput requires:

- An even distribution of work between child threads ( data flows )
- Hardware to be configured such that it is “Hot spot free”
- SQL Server and SSIS configured such that hardware resources are utilised evenly

In other words, the SSIS equivalent of bad CXPACKET waits is to be avoided.

Page 7: Parallel Source Extract

Four different ways of extracting data from the source will be looked at:

- NTILE
- DELETE statement with an OUTPUT clause
- Hash partitioning the source table
- SELECT statement to ‘Partition’ the source by TransactionID
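The last option, carving the source up by TransactionID, can be sketched roughly as follows. This is only an illustration, not the queries used in the demo: the `dbo` schema, the `SELECT *` column list and the helper names `id_ranges` and `range_scan_queries` are assumptions made for the example.

```python
def id_ranges(min_id, max_id, workers):
    """Split the inclusive range [min_id, max_id] into contiguous, near-equal slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, workers)
    ranges, lo = [], min_id
    for w in range(workers):
        # The first `extra` workers take one additional key each.
        size = base + (1 if w < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def range_scan_queries(min_id, max_id, workers, table="dbo.bigTransactionHistory"):
    """Build one source query per data flow (table name is an assumption)."""
    return [
        f"SELECT * FROM {table} WHERE TransactionID BETWEEN {lo} AND {hi}"
        for lo, hi in id_ranges(min_id, max_id, workers)
    ]


if __name__ == "__main__":
    for query in range_scan_queries(1, 1_000_000, 4):
        print(query)
```

Each generated statement would be the source query of one data flow, so that no two data flows read the same rows.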

Page 8: Using “WITH RESULT SETS” To Use Stored Procedures As The Source
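The slide image is not part of the transcript, so the following is only a guess at the general shape of such a source statement. `EXEC ... WITH RESULT SETS` lets the caller declare the column names and types a stored procedure returns, which gives the SSIS source its metadata. The procedure name `dbo.usp_ExtractSlice` and the column list are hypothetical.

```python
def exec_with_result_sets(proc, columns):
    """Build an EXEC statement that declares the shape of the procedure's result set."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"EXEC {proc} WITH RESULT SETS (({cols}));"


if __name__ == "__main__":
    # Hypothetical procedure and column definitions, for illustration only.
    print(exec_with_result_sets(
        "dbo.usp_ExtractSlice",
        [("TransactionID", "int"), ("ProductID", "int"), ("Quantity", "int")],
    ))
```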

Page 9: The “Lab” Environment

- SQL Server 2012 SP1
- Windows Server 2008 R2
- Adam Machanic's “Big Adventure” database
- Hardware:
  - Intel i960, 6 core, 12 logical threads, 3.2 GHz
  - 22 GB memory
  - 2 x 80 GB Fusion-io (Gen 1) ioDrives

Page 10: Demo 1: Scaling out the source extract

Page 11: Destructive Read

Scaling beyond three threads was initially hampered by PAGELATCH_EX, LCK_M_X, LCK_M_IX and SOS_SCHEDULER_YIELD waits.

The ‘Winning’ approach:

- Partition the bigTransactionHistory table evenly across twelve file groups, one per logical processor.
- Assign specific threads to specific partitions.
- Turn page and row locking off on the clustered primary key and set lock escalation to AUTO on the table in order to force partition level locking.
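A rough sketch of what the statements behind this approach might look like is given below. In T-SQL, `ALLOW_ROW_LOCKS` and `ALLOW_PAGE_LOCKS` are set with `ALTER INDEX`, and `LOCK_ESCALATION` is set with `ALTER TABLE`. The object names (`PK_bigTransactionHistory`, `pf_TransactionHistory`) and the use of `$PARTITION` to pin a thread to one partition are assumptions for illustration; the demo's actual statements are not in the transcript.

```python
def partition_lock_settings(table, pk_index):
    """Statements that disable row/page locks and allow partition level escalation."""
    return [
        f"ALTER INDEX {pk_index} ON {table} "
        "SET (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF);",
        f"ALTER TABLE {table} SET (LOCK_ESCALATION = AUTO);",
    ]


def destructive_read(table, partition_function, partition_number, key="TransactionID"):
    """DELETE ... OUTPUT statement that drains a single partition."""
    return (
        f"DELETE FROM {table} "
        "OUTPUT DELETED.* "
        f"WHERE $PARTITION.{partition_function}({key}) = {partition_number};"
    )


if __name__ == "__main__":
    table = "dbo.bigTransactionHistory"  # schema name is an assumption
    for stmt in partition_lock_settings(table, "PK_bigTransactionHistory"):
        print(stmt)
    # One statement per data flow; partition N is handled by thread N.
    for n in range(1, 13):
        print(destructive_read(table, "pf_TransactionHistory", n))
```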

Page 12: Destructive Read Tuning For Four Data Flows

| Test | Execution Time (s) | CPU Consumption (%) | IO Throughput (MB/s) | % Improvement From Baseline |
| Baseline | 57 | 40 | 130 | |
| Forced partition level locking | 33 | 46 | 215 | 42 |
| OLE DB provider for SQL Server used instead of SQL Native Client | 28 | 50 | 240 | 51 |
| Packet size changed from 4K default to 8K | 22 | 50 | 275 | 61 |
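The improvement column appears to be the reduction in execution time relative to the 57 second baseline, rounded to a whole percent. A short check of that arithmetic, with `improvement_pct` being a name chosen for this example:

```python
def improvement_pct(baseline_s, tuned_s):
    """Percentage reduction in execution time relative to the baseline."""
    return round((baseline_s - tuned_s) / baseline_s * 100)


if __name__ == "__main__":
    baseline = 57
    for label, seconds in [
        ("Forced partition level locking", 33),
        ("OLE DB provider", 28),
        ("8K packet size", 22),
    ]:
        print(f"{label}: {improvement_pct(baseline, seconds)}%")
```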

Page 13: Execution Time (s) Per Data Flow (Thread) Count

[Chart. X axis: data flow (thread) count (1, 2, 3, 4, 6). Y axis: execution time in seconds (0 to 140). Series: Destructive Read, Partition Scan, Range Scan, NTILE.]

Page 14: Average Percentage CPU Consumption Per Data Flow (Thread) Count

[Chart. X axis: data flow (thread) count (1, 2, 3, 4, 6). Y axis: average CPU consumption in percent (0 to 80). Series: Destructive Read, Partition Scan, Range Scan, NTILE.]

Page 15: Wait Event Breakdown ( Percentage )

[Chart. One bar per extract method: Destructive Read, Range Scan, Partition Scan, NTILE. Y axis: 0% to 100%. Wait types: ASYNC_NETWORK_IO, PREEMPTIVE_OS_WAITFORSINGLEOBJECT, ASYNC_IO_COMPLETION, SOS_SCHEDULER_YIELD, WRITELOG, LOGBUFFER, PAGEIOLATCH_SH.]

Page 16: Conclusions From Scaling The Extract

- NTILE is clearly the slowest approach.
- The range scan and partition scan can only be separated by CPU consumption.
- Wait activity stats are dominated by ASYNC_NETWORK_IO and PREEMPTIVE_OS_WAITFORSINGLEOBJECT.
- The source is outperforming the rest of the flow.

Page 17: Scaling Out Destination

- Use a heap version of the bigTransactionHistory table partitioned across twelve file groups on (TransactionID % 12) + 1.
- Compare the scalability of the balanced data distributor versus the conditional split.
- The source is a single straight select from the bigTransactionHistory table.
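The partitioning expression on this slide maps each TransactionID to a partition number between 1 and 12. A small sketch, assuming consecutive integer IDs, shows why that gives an even spread across the file groups; `partition_number` and `distribution` are names invented for the example.

```python
from collections import Counter


def partition_number(transaction_id, partitions=12):
    """The slide's expression: (TransactionID % 12) + 1."""
    return (transaction_id % partitions) + 1


def distribution(ids, partitions=12):
    """Count how many IDs land in each partition."""
    return Counter(partition_number(i, partitions) for i in ids)


if __name__ == "__main__":
    counts = distribution(range(1, 1201))
    for partition in sorted(counts):
        print(partition, counts[partition])
```

With gaps or skew in the key values the spread would be less even, so the result depends on how TransactionID is populated.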

Page 18: A Recap On Transforms

- Synchronous
  - Non blocking
  - Rows in = Rows out
- Asynchronous
  - Rows out usually <> Rows in
  - Semi blocking
  - Blocking
- “Magic” virtual buffer ;-)

Page 19: Demo 2: Scaling out the destination

Conditional Split Vs The Balanced Data Distributor

Results on next slide

Page 20: Execution Time (s) Per Output Count

[Chart. X axis: output count (1 to 6). Y axis: execution time in seconds (0 to 160). Series: Balanced Data Distributor, Conditional Split. Annotation: “Saturation point, time to scale out”.]

Page 21: IO Throughput (MB/s) Per Output Thread Count

[Chart. X axis: output thread count (1 to 6). Y axis: IO throughput in MB/s (0 to 250). Series: Balanced Data Distributor, Conditional Split.]

The two Fusion-io cards are capable of more throughput than appears on any of the graphs in this material. What is presented is sustained throughput; when performing the actual tests, ‘Spikes’ of much higher throughput were observed during checkpoints.

Page 22: Average CPU Consumption ( % ) Per Thread Count

[Chart. X axis: thread count (1 to 6). Y axis: average CPU consumption in percent (0 to 60). Series: Balanced Data Distributor, Conditional Split.]

A transform level view of the CPU can be obtained via xperf as per the next slide . . .

Page 23

TxBDD.dll weight = 79,997,966

Page 24

TxSplit.dll weight = 13,004,998.777

Page 25: Threading

- Too few threads = CPU starvation
- Too many threads = context switching
- The “Sweet spot” is somewhere in between \O/

Elements in the dataflow that can create new threads:

- Execution paths
- Conditional splits, multicasts and the balanced data distributor create threads for their outputs
- Synchronous transforms

Page 26: Execution Paths, What Are They?

A section in the dataflow starting with an asynchronous component and ending with a transform or destination with no synchronous output . . . as the next slide will help illustrate.
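One way to picture this definition is a deliberately simplified model of a linear data flow, in which a path starts at the source or at the output of an asynchronous component and ends at the next asynchronous component or at the destination. This is a sketch of the idea only, not how the engine computes its execution plan; the component list and the `execution_paths` helper are illustrative.

```python
def execution_paths(components):
    """Split a linear pipeline into execution paths.

    components: list of (name, is_async) tuples in pipeline order.
    """
    paths, current = [], []
    for name, is_async in components:
        if is_async and current:
            # The asynchronous component's input closes the current path ...
            current.append(name)
            paths.append(current)
            # ... and its output opens a new one.
            current = [name]
        else:
            current.append(name)
    if current:
        paths.append(current)
    return paths


if __name__ == "__main__":
    flow = [
        ("OLE DB Source", False),
        ("Derived Column", False),   # synchronous
        ("Sort", True),              # asynchronous (blocking)
        ("Lookup", False),           # synchronous
        ("OLE DB Destination", False),
    ]
    for number, path in enumerate(execution_paths(flow), start=1):
        print(f"Execution Path {number}: {' -> '.join(path)}")
```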

Page 27: Execution Path

[Diagram labels: Execution Path 1, Execution Path 2.]

Page 28: Demo 4: Scaling out by splitting synchronous transforms up

Page 29: Execution Time / Thread Count

[Chart. X axis: thread count (1 to 5). Y axis: execution time (0 to 30). Legend: Union Pass Through.]

Page 30: CPU Consumption / Data Flow (Thread) Count

[Chart. X axis: data flow (thread) count (1 to 6). Y axis: CPU consumption (0 to 120). Legend: Union Pass Through.]

Page 31: IO Throughput Per Data Flow (Thread) Count ( MB/s)

[Chart. X axis: data flow (thread) count (1 to 6). Y axis: IO throughput in MB/s (0 to 180). Legend: Union Pass Through.]

Page 32

One execution path = 37,039 context switches
Two execution paths = 69,986 context switches

Page 33: ‘Elastic’ Scale Out

- Most of the demos so far have achieved data flow scale out via “Copy and paste”.
- Service Broker is highly elastic: the number of readers associated with a queue can be increased via the ALTER QUEUE command.
- SSIS has no “Out of the box” equivalent to this.
- However, the work pile pattern can be adapted in order to achieve ‘Elastic’ style scale out, as the next slide will illustrate.

Page 34: SSIS “Server Farm”

[Diagram labels: Package 1, Package 2, Package N; “WORK PILE”; SSIS Server 1, SSIS Server 2, SSIS Server N.]

DTExec . . . /set Package.variables[MaxThreads].Value;3 /set Package.variables[ThreadNumber].Value;1
DTExec . . . /set Package.variables[MaxThreads].Value;3 /set Package.variables[ThreadNumber].Value;2
DTExec . . . /set Package.variables[MaxThreads].Value;3 /set Package.variables[ThreadNumber].Value;3
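The per-worker variable settings on this slide could be generated rather than typed by hand, which is roughly what makes the approach ‘Elastic’: adding a worker only means raising MaxThreads and launching one more package instance. The sketch below builds only the `/set` portion shown on the slide, since the rest of the command line is elided there. How a package uses the two variables is not described in the transcript, so the modulo rule in `owns` is purely an assumption.

```python
def set_arguments(max_threads, thread_number):
    """The /set portion of the command line, in the form shown on the slide."""
    return (
        f"/set Package.variables[MaxThreads].Value;{max_threads} "
        f"/set Package.variables[ThreadNumber].Value;{thread_number}"
    )


def worker_arguments(max_threads):
    """One argument string per worker, numbered from 1 to max_threads."""
    return [set_arguments(max_threads, n) for n in range(1, max_threads + 1)]


def owns(work_item_id, max_threads, thread_number):
    """Hypothetical rule for deciding which worker takes a work item."""
    return work_item_id % max_threads == thread_number - 1


if __name__ == "__main__":
    for args in worker_arguments(3):
        print(args)
    # Which of three workers would pick up work items 0..5 under the assumed rule.
    for item in range(6):
        owner = next(n for n in range(1, 4) if owns(item, 3, n))
        print(f"work item {item} -> thread {owner}")
```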

Page 35: Areas For Future Investigation

- With dedicated server hardware for SSIS and for SQL Server, how does resource utilisation vary on each as the various parallel scale out techniques are used?
- How does SSIS perform with hyper-threading turned on and off?
- L2/3 cache is touted as the “New flash memory”:
  - How does the “Performance curve” behave in relation to L2/3 misses?
  - What can be done to influence L2/3 cache misses?

Page 36: Takeaways

- The performance and scalability of extracting from the source is paramount; the only wait events you want to see are ASYNC_NETWORK_IO and PREEMPTIVE_OS_WAITFORSINGLEOBJECT.
- When deleting from partitions ( and inserting into them ), significant performance gains can be had by forcing partition level locking.
- Packages with fewer execution paths will tend to incur fewer context switches and scale better.
- Seek out opportunities to scale out synchronous transforms by splitting them up as much as possible.
- Look to leverage the work pile pattern for ‘Elastic’ scale out.

Page 37: References and Material For Further Reading

- Integration Services: Performance Tuning Techniques. Elizabeth Vitt, Intellimentum and Hitachi Corporation.
- SQL Server Integration Services Performance Design Patterns. Matt Masson, Senior Program Manager, Microsoft.
- Increasing Throughput of Pipelines by Splitting Synchronous Transformations into Multiple Tasks. Sedat Yogurtcuoglu, Henk van der Valk, and Thomas Kejser.
- Resources for SSIS Performance Best Practices. Matt Masson and others.

Page 38: Questions?

Page 40: Coming up…

#SQLBITS

| Speaker | Title | Room |
| Jan Pieter Posthuma | ETL with Hadoop and MapReduce | Theatre |
| Phil Quinn | XML: The Marmite of SQL Server | Exhibition B |
| Laerte Junior | The Posh DBA: Troubleshooting SQL Server with PowerShell | Suite 3 |
| James Skipwith | Table-Based Database Object Factories | Suite 1 |
| Neil Hambly | SQL Server 2012 Memory Management | Suite 2 |
| Matija Lah | SQL Server 2012 Statistical Semantic Search | Suite 4 |