Optimizing SSIS Performance

Oct 2014 Sydney Business Intelligence User Group ETL Optimizing SSIS Performance Aaron Jackson

Primary Sponsor

Varigence software provides clients with efficient methods for developing, managing, and using Microsoft business intelligence solutions. The software ecosystem rests on the foundation of Biml, or our home-grown Business Intelligence Markup Language.

www.varigence.com.au

Business Intelligence Markup Language used to accelerate business intelligence development

IDE for accelerated business intelligence solutions

Excel add-in that will make anyone using PowerPivot or cubes in Excel more productive

Sponsor(s)

We just know Microsoft BI…and we know it well!

www.agilebi.com.au

Hilmax Solutions provides clients with easy-to-use business intelligence toolsets and solutions to extract critical and actionable insights from their business data. Our consultants are experts in business and technology solutions, with a focus on delivering business value.

www.hilmax.com.au

Sponsor(s)

SQL Tools is your source for Idera SQL Server tools sales and support in the APAC region. Idera has SQL Server tools for:

• BI Monitoring
• Performance & Availability
• Backup & Maintenance
• Security, Auditing & Governance
• Administration
• Development

Plus, free tools covering most functions. Email [email protected] for special trial offers.

Aaron Jackson

Aaron Jackson is an ETL consultant and blogger. Aaron has been working on data warehouse solutions in the finance industry since 2008 and has acquired significant domain knowledge. Aaron’s primary talents are:

• Database development
• SSIS development
• PowerShell development
• Performance tuning

Email [email protected]

Web www.barkingcat.com.au

Twitter @AaronJ85AU

LinkedIn www.linkedin.com/in/aaronljackson/

Abstract

With data volumes increasing in both your source data and your data warehouse, and ETL windows shrinking, it is important to ensure that your SSIS packages and SSIS environment are performing optimally.

In this session we will look at common SSIS anti-patterns, how to optimise SSIS packages, and how to troubleshoot and optimise SSIS performance, before finishing with SSIS best practices.

With plenty of practical advice on offer, make sure you attend this session.

Agenda

1. Common SSIS Anti-Patterns

2. Diagnosing Performance

3. Performance Tuning

4. SSIS Best Practices

Common SSIS Anti-Patterns

1. Blocking Operations

2. Not using T-SQL

3. Unnecessary Data

4. Using compression and indexes

5. Not following the KISS principle

Blocking Operations

• Must process all input rows before producing output
• All downstream components must wait for the blocking operation (slows everything down)
• Blocking components use a new memory buffer (uses additional memory)
• Blocking components are asynchronous components (although not every asynchronous component is fully blocking: Merge Join and Union All, for example, are only partially blocking)
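To make the difference concrete, here is a minimal Python sketch (not SSIS code) that models a data flow as a chain of generators. The names `streaming_transform` and `blocking_sort` are illustrative stand-ins for a synchronous component such as Derived Column and a blocking one such as Sort; the sketch counts how many source rows must be read before the first row reaches the next component.

```python
from typing import Callable, Iterable, Iterator, List


def source(rows: Iterable[int], consumed: List[int]) -> Iterator[int]:
    """Source component: yields rows one at a time and records each row it reads."""
    for row in rows:
        consumed.append(row)
        yield row


def streaming_transform(rows: Iterable[int]) -> Iterator[int]:
    """Non-blocking step, like Derived Column: each row passes straight through."""
    for row in rows:
        yield row * 10


def blocking_sort(rows: Iterable[int]) -> Iterator[int]:
    """Blocking step, like Sort: must read every input row before emitting any."""
    buffered = sorted(rows)  # the whole data set is held in memory at this point
    yield from buffered


def rows_read_before_first_output(
    transform: Callable[[Iterable[int]], Iterator[int]], data: List[int]
) -> int:
    consumed: List[int] = []
    pipeline = transform(source(data, consumed))
    next(pipeline)  # ask for the first row downstream
    return len(consumed)


data = [5, 3, 9, 1, 7]
print(rows_read_before_first_output(streaming_transform, data))  # 1
print(rows_read_before_first_output(blocking_sort, data))        # 5
```

The streaming step hands its first row on after reading one input row; the blocking step has to drain the entire input first, which is exactly why everything downstream waits.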

Blocking Operations

Why is this an anti-pattern?

• Inefficient use of time and resources

• Entire data set must be loaded before downstream processing can continue

• Can cause paging to disk

• Goes against the core design principles of SSIS:
  • Design SSIS packages to keep the data “moving”
  • Design SSIS packages to keep the data in memory

Blocking Operations

• The most common blocking operation is the Sort operator

• Also common is the Aggregate operator

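Blocking Operations

• Remove sort operations by pre-sorting data
  • Have the extract ordered by business keys
  • Use ORDER BY in the SQL source query, then mark the source output as sorted (IsSorted and SortKeyPosition in the Advanced Editor) so that SSIS knows the data is already in order
• Remove aggregates by using SQL where possible
  • COUNT, SUM, MAX, GROUP BY, AVG, STDEV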
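The idea of pushing sorting and aggregation into the source query can be sketched in Python. This is an illustration under stated assumptions: the standard-library `sqlite3` module stands in for SQL Server (the `GROUP BY` / `ORDER BY` behaviour used here is the same), and the `sales` table and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict

# sqlite3 stands in for SQL Server; the sales table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_key TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer_key, amount) VALUES (?, ?)",
    [("C2", 10.0), ("C1", 5.0), ("C2", 2.5), ("C3", 7.0), ("C1", 1.0)],
)

# Anti-pattern: pull every row out, then aggregate and sort on the client
# (the equivalent of an Aggregate plus a Sort inside the data flow).
raw_rows = conn.execute("SELECT customer_key, amount FROM sales").fetchall()
totals = defaultdict(float)
for key, amount in raw_rows:
    totals[key] += amount
client_side = sorted(totals.items())

# Preferred: let the database aggregate and order by the business key,
# so only summarised, pre-sorted rows leave the source.
server_side = conn.execute(
    "SELECT customer_key, SUM(amount) FROM sales "
    "GROUP BY customer_key ORDER BY customer_key"
).fetchall()

print(len(raw_rows), "rows transferred vs", len(server_side))
print(server_side)
```

Both approaches produce the same totals, but the second moves fewer rows and arrives already ordered, so no blocking Sort or Aggregate is needed downstream.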

Not using T-SQL

• T-SQL is a powerful language for set-based operations
• T-SQL is much more efficient for computing aggregates and performing updates
• Recent updates to the database engine make T-SQL even faster:
  • Columnstore indexes
  • In-memory (memory-optimized) tables
  • Improved cardinality estimator

Not using T-SQL

• SSIS is not set-based; it is row-based
• SSIS is better for complex logical decisions and error paths within the data flow (think data cleansing and business defaulting rules)
• A T-SQL statement fails as a whole on most errors (think SELECT 1/0), whereas SSIS can redirect individual error rows
• Aggregates and data manipulation are typically faster using T-SQL
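The error-handling difference can be sketched in plain Python. This is a conceptual model only, not SSIS or T-SQL: `set_based_ratio` mimics a single statement that fails entirely on one bad row, while `row_based_ratio` mimics a component whose error output is configured to redirect rows. Both function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int]


def set_based_ratio(rows: List[Row]) -> List[float]:
    """All-or-nothing, like one T-SQL statement: a single bad row fails the whole batch."""
    return [numerator / denominator for numerator, denominator in rows]


def row_based_ratio(rows: List[Row]) -> Tuple[List[float], List[Row]]:
    """Row by row with an error output, like an SSIS component that redirects error rows."""
    good: List[float] = []
    redirected: List[Row] = []
    for numerator, denominator in rows:
        try:
            good.append(numerator / denominator)
        except ZeroDivisionError:
            redirected.append((numerator, denominator))  # send to the error path instead of failing
    return good, redirected


rows = [(10, 2), (3, 0), (9, 3)]

try:
    set_based_ratio(rows)
except ZeroDivisionError:
    print("set-based: whole batch failed")

good, redirected = row_based_ratio(rows)
print(good, redirected)  # [5.0, 3.0] [(3, 0)]
```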

Not using T-SQL

Why is this an anti-pattern?

• SQL Server is a highly efficient system that should be used where appropriate
• Adding T-SQL to your arsenal can speed up your designs

“If the only tool you have is a hammer, every problem looks like a nail.”

Unnecessary Data

Unused columns pulled from the source

Using wider data types than required

Verbose logging settings (SSIS log)

Not taking advantage of bulk-logged T-SQL statements

Unnecessary Data

Why is this an anti-pattern?

Unnecessary data leads to inefficient designs

Packages are doing more work than is required

Wasteful use of resources

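Unnecessary Data

Use TRUNCATE TABLE instead of DELETE statements when removing all rows from a table

Take advantage of bulk insert where possible (for example, the fast-load option of the OLE DB Destination)

Configure SSIS application logging to report OnError and OnWarning events only (in production)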

Unnecessary Data

Remove unused columns

Reduce memory needed per row

Increase the number of rows that can be processed per buffer

Unnecessary Data

Sharpen data types
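A rough way to see how much removing unused columns and sharpening types saves is to estimate row width. The sketch below is a back-of-the-envelope model under loudly stated assumptions: 1 byte per character for `varchar`, 2 bytes per character for `nvarchar`, and SQL Server's fixed sizes for integer types. The real SSIS buffer layout adds its own overhead, and the customer columns are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed per-column widths in bytes (an approximation, not the exact SSIS buffer layout).
FIXED_WIDTHS: Dict[str, int] = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def column_width(data_type: str, length: int = 0) -> int:
    if data_type == "varchar":
        return length       # assumed 1 byte per character
    if data_type == "nvarchar":
        return 2 * length   # assumed 2 bytes per character (Unicode)
    return FIXED_WIDTHS[data_type]


def row_width(columns: List[Tuple[str, str, int]]) -> int:
    return sum(column_width(dtype, length) for _name, dtype, length in columns)


# Hypothetical extract as first designed: every column, generous types.
original = [
    ("CustomerKey", "bigint", 0),
    ("FirstName", "nvarchar", 255),
    ("LastName", "nvarchar", 255),
    ("CountryCode", "nvarchar", 50),
    ("Notes", "nvarchar", 4000),  # never used downstream
    ("StatusFlag", "int", 0),
]

# Unused column removed and types narrowed to fit the (assumed) actual data.
sharpened = [
    ("CustomerKey", "int", 0),
    ("FirstName", "varchar", 50),
    ("LastName", "varchar", 50),
    ("CountryCode", "varchar", 2),
    ("StatusFlag", "tinyint", 0),
]

print(row_width(original), row_width(sharpened))  # 9132 107
```

Under these assumptions each row shrinks by almost two orders of magnitude, which translates directly into more rows per buffer.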

Compression and Indexes

• Table Compression was introduced in SQL Server 2008

• Row-level or page-level compression can be used

• Compression can reduce storage requirements

• Compression can speed up read queries

• Indexes can be either clustered indexes or non-clustered indexes

Compression and Indexes

Why is this an anti-pattern?

• Compression slows down inserts:
  • 35% penalty for row compression
  • 208% penalty for page compression
• Indexes also add overhead and slow down inserts

Compression and Indexes

Remove compression from destination tables

Remove clustered indexes from destination tables

Use partitioning with partition switching (ALTER TABLE ... SWITCH) to manage performance instead

Not following the KISS principle

KISS stands for Keep It Simple, Stupid.

Don’t overcomplicate designs

SSIS does not forgive poor design

Avoid monolithic package design

Not following the KISS principle

Why is this an anti-pattern?

Complicated designs are hard to support, troubleshoot and debug

Complicated designs are hard to understand

Essentially this wastes people’s time

More than likely a sub-optimal design (slow performance)

Not following the KISS principle

Break your design into logical units of work

Use sub-packages with a distinct purpose

Use sequence containers within packages for logical units of work

Follow naming conventions

Use package configurations

Diagnosing Performance

1. Server O/S Performance

2. SSIS Performance

Server Performance

Things to pay attention to in Perfmon:

CPU

Memory

Disk

Network

Server Performance: CPU

Process / % Processor Time (Total) for dtexec

dtexec should be close to 100% CPU load (on a multi-core server this counter is summed across cores and can exceed 100%, so divide it by the number of logical processors)

Not reaching close to 100% could mean:

• Application contention (SQL Server)
• Hardware contention (disk I/O or memory is sub-optimal)
• A design limitation: not using parallelism, or too many single-threaded tasks
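Because the per-process CPU counter is summed across cores, a raw reading needs normalising before comparing it with the "close to 100%" guideline. A minimal sketch, assuming you have already captured the counter value and the logical processor count; the 90% cut-off and the sample readings are assumptions for illustration.

```python
def normalised_cpu_percent(process_processor_time: float, logical_processors: int) -> float:
    """Process / % Processor Time is summed across cores, so divide by the core count."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return process_processor_time / logical_processors


def cpu_bound(process_processor_time: float, logical_processors: int, threshold: float = 90.0) -> bool:
    """True if dtexec is close to saturating the CPUs (threshold is an assumed cut-off)."""
    return normalised_cpu_percent(process_processor_time, logical_processors) >= threshold


# Hypothetical readings for dtexec.
print(normalised_cpu_percent(380.0, 4))  # 95.0 -> CPU is the bottleneck
print(cpu_bound(120.0, 8))               # False -> look for contention or a design limit
```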

Server Performance: Memory

Process / Private Bytes for dtexec – the memory committed by dtexec that cannot be shared with other processes

Process / Working Set for dtexec – the physical memory (RAM) currently in use by dtexec

Memory / Page Reads/sec – memory pressure across the whole system. Microsoft suggests that a value over 500 indicates memory pressure

Server Performance: Disk I/O

Large subject area (Victor recently gave a talk on this as it relates to SQL Server)

Create a performance baseline when designing your system, for later comparison

Avg. Disk sec/Read – ideally 30 ms or less for OLAP workloads

Avg. Disk sec/Write – ideally 30 ms or less for OLAP workloads
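Perfmon reports the latency counters in seconds, so the 30 ms guideline corresponds to a reading of about 0.030. A small sketch that converts and checks a set of hypothetical samples against that target:

```python
LATENCY_TARGET_MS = 30.0  # guideline from the slide


def to_milliseconds(seconds_per_io: float) -> float:
    """Avg. Disk sec/Read and Avg. Disk sec/Write are reported in seconds."""
    return seconds_per_io * 1000.0


def latency_ok(avg_disk_sec: float, target_ms: float = LATENCY_TARGET_MS) -> bool:
    return to_milliseconds(avg_disk_sec) <= target_ms


# Hypothetical samples captured during an ETL run.
samples = {"Avg. Disk sec/Read": 0.012, "Avg. Disk sec/Write": 0.045}
for counter, value in samples.items():
    print(f"{counter}: {to_milliseconds(value):.1f} ms, within target: {latency_ok(value)}")
```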

Server Performance: Network

Network Interface / Current Bandwidth – the estimated bandwidth of each NIC, in bits per second

Network Interface / Bytes Total/sec – the rate at which bytes are sent and received over each NIC

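Network Interface / Packets/sec – the rate at which packets are sent and received (Transfers/sec, the usual IOPS counter, belongs to the disk objects rather than Network Interface)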

SSIS will move data over your network as fast as your network can handle it
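To tell whether SSIS is saturating the link, compare the two counters above. Note the units differ: Bytes Total/sec is in bytes while Current Bandwidth is in bits. A minimal sketch with hypothetical readings:

```python
def nic_utilisation_percent(bytes_total_per_sec: int, current_bandwidth_bps: int) -> float:
    """Bytes Total/sec is in bytes; Current Bandwidth is in bits, hence the factor of 8."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    return (bytes_total_per_sec * 8 * 100) / current_bandwidth_bps


# Hypothetical readings from a 1 Gbps NIC during a large data flow.
print(nic_utilisation_percent(100_000_000, 1_000_000_000))  # 80.0
```

A value that stays near 100% suggests the network, not the package, is the limit.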

SSIS Performance

Things to pay attention to in Perfmon (SQLServer:SSIS Pipeline object):

Buffer memory – total memory (virtual and physical) used by the data flow engine

Buffers in use – number of buffers used by the data flow engine

Buffers spooled – the number of buffers written to disk. When this number is rising, you are paging memory to disk
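Since the warning sign is a *rising* Buffers spooled value, a check over a series of samples is more useful than a single reading. A minimal sketch, assuming the samples have already been collected from Perfmon (the sample values below are hypothetical):

```python
from typing import Sequence


def is_spooling(buffers_spooled_samples: Sequence[int]) -> bool:
    """True if the Buffers spooled counter grew between any two consecutive samples."""
    return any(
        later > earlier
        for earlier, later in zip(buffers_spooled_samples, buffers_spooled_samples[1:])
    )


print(is_spooling([0, 0, 0, 0]))  # False: buffers are staying in memory
print(is_spooling([0, 0, 4, 9]))  # True: buffers are being paged to disk
```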

Performance Tuning

1. Remove Blocking Operations

2. Sharpen Data Types

3. T-SQL / Set-based Operations

4. Parallelism

5. Data Flow Tuning

6. Network

7. Lookups

Remove Blocking Operations

Asynchronous components:

Inefficient memory use

Slows execution

Prevents downstream components from running

Can cause paging to disk

Sharpen Data Types

Make data types as narrow as possible

Use less memory and increase throughput

Note – Excessive casting will degrade performance

Be aware of precision issues when using money, decimal and float types
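The precision warning is easy to demonstrate. Python's `float` is a binary floating-point type like SQL Server's `float`, and `decimal.Decimal` plays the role of an exact decimal type; this is an analogy in Python rather than SQL Server code.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so repeated sums drift.
float_total = sum([0.1] * 10)
decimal_total = sum([Decimal("0.1")] * 10, Decimal("0"))

print(float_total)                      # 0.9999999999999999
print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True
```

For monetary amounts that must reconcile exactly, an exact decimal type avoids this drift.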

T-SQL / Set-based Operations

T-SQL is faster for:

Aggregates

Grouping

Updates

Pivot

Unpivot

T-SQL / Set-based Operations

T-SQL can clean your data on the fly:

SELECT LTRIM(RTRIM(FIRST_NAME)) AS [FIRST_NAME]

You can use T-SQL for type conversions, coalescing and data type sharpening

However, you lose the error-handling power that SSIS provides
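Here is a runnable sketch of cleaning in the source query, combining trimming, a type conversion and a COALESCE default. It uses Python's built-in `sqlite3` as a stand-in for SQL Server (`LTRIM`, `RTRIM`, `CAST` and `COALESCE` exist in both), and the `staging_customer` table and its contents are hypothetical.

```python
import sqlite3

# sqlite3 stands in for SQL Server; staging_customer is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_customer (FIRST_NAME TEXT, CREDIT_LIMIT TEXT, COUNTRY TEXT)")
conn.executemany(
    "INSERT INTO staging_customer VALUES (?, ?, ?)",
    [("  Alice ", "1500", "AU"), ("Bob", "250", None)],
)

cleaned = conn.execute(
    """
    SELECT LTRIM(RTRIM(FIRST_NAME))      AS FIRST_NAME,    -- trim on the fly
           CAST(CREDIT_LIMIT AS INTEGER) AS CREDIT_LIMIT,  -- type conversion
           COALESCE(COUNTRY, 'Unknown')  AS COUNTRY        -- default for missing values
    FROM staging_customer
    ORDER BY rowid
    """
).fetchall()

print(cleaned)  # [('Alice', 1500, 'AU'), ('Bob', 250, 'Unknown')]
```

A value that cannot be converted would make a real SQL Server query fail (SQLite is more lenient), which is the trade-off against SSIS error outputs noted above.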

Parallelism

SSIS allows you to have concurrent streams of work

If two streams of work have no dependencies, why not run them in parallel?

(Screenshot: the arrows point to tasks that will execute concurrently)
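The same idea can be sketched with Python's `concurrent.futures`: two independent loads run side by side, and a dependent load starts only after both finish, much like a precedence constraint. The task names are hypothetical, and the barrier exists only to prove that the two independent tasks really overlap (with one worker it would time out).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Both independent tasks must reach the barrier together, which only
# succeeds if they are genuinely running at the same time.
both_running = threading.Barrier(2, timeout=5)


def load_dim_customer() -> str:  # hypothetical task with no dependency on the other load
    both_running.wait()
    return "DimCustomer loaded"


def load_dim_product() -> str:  # hypothetical task with no dependency on the other load
    both_running.wait()
    return "DimProduct loaded"


def load_fact_sales(dependencies) -> str:
    # Runs only after both dimension loads succeed, like a precedence constraint.
    return f"FactSales loaded after: {', '.join(dependencies)}"


with ThreadPoolExecutor(max_workers=2) as pool:
    dims = [pool.submit(load_dim_customer), pool.submit(load_dim_product)]
    dim_results = [future.result() for future in dims]

result = load_fact_sales(dim_results)
print(result)
```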

Parallelism

In your data flow task, allocating enough to run concurrently

This example is monolithic and not modular

(Don’t do this)

Data Flow Tuning

Three main levers to pull:

DefaultBufferMaxRows

DefaultBufferSize

EngineThreads

Data Flow Tuning: DefaultBufferMaxRows

This is the maximum number of rows that a single data flow buffer can hold (default 10,000).

This limit is only reached if that many rows fit within DefaultBufferSize; otherwise SSIS puts fewer rows in each buffer.

Data Flow Tuning: DefaultBufferSize

This is the size of each buffer in bytes (default 10 MB, i.e. 10,485,760 bytes)

The maximum value that can be set here is 100 MB (104,857,600 bytes)

Each EngineThread has its own buffer. Something to keep in mind when you are planning your resource capacity.
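The interaction between the two buffer settings can be expressed as "whichever limit is hit first wins". The sketch below is a simplified model that ignores the engine's internal per-row and per-buffer overhead; the row widths in the examples are hypothetical.

```python
DEFAULT_BUFFER_MAX_ROWS = 10_000        # default DefaultBufferMaxRows
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # default DefaultBufferSize (10 MB)
MAX_BUFFER_SIZE = 100 * 1024 * 1024     # upper limit quoted on the slide (100 MB)


def rows_per_buffer(
    row_width_bytes: int,
    max_rows: int = DEFAULT_BUFFER_MAX_ROWS,
    buffer_size: int = DEFAULT_BUFFER_SIZE,
) -> int:
    """Simplified model: rows per buffer is capped by both settings."""
    if row_width_bytes <= 0:
        raise ValueError("row_width_bytes must be positive")
    if buffer_size > MAX_BUFFER_SIZE:
        raise ValueError("DefaultBufferSize cannot exceed 100 MB")
    return min(max_rows, buffer_size // row_width_bytes)


print(rows_per_buffer(500))                     # 10000: DefaultBufferMaxRows is the limit
print(rows_per_buffer(9132))                    # 1148: DefaultBufferSize is the limit
print(rows_per_buffer(107, max_rows=50_000))    # 50000: narrow rows let a higher row cap pay off
```

Wide rows hit DefaultBufferSize long before DefaultBufferMaxRows, which is why sharpening data types matters before touching these settings.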

Data Flow Tuning: EngineThreads

This setting suggests how many threads the data flow engine can use in parallel (it is a hint to the engine rather than a hard limit)

The usefulness of this setting really depends on your package design

A bigger number is not always better

Network

If you have a dedicated SQL Server and a separate application server for SSIS:

Tune the connection manager's Packet Size property

Default value is 4,096 bytes (4 KB)

Max value is 32,767 bytes (32 KB)

Larger packets reduce network overhead
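A quick calculation shows why a larger packet size helps: fewer packets are needed for the same payload. This is a simplified model that ignores packet headers and protocol overhead, using one 10 MB buffer as a hypothetical payload.

```python
import math

DEFAULT_PACKET_SIZE = 4_096  # bytes
MAX_PACKET_SIZE = 32_767     # bytes, the largest value SQL Server accepts


def packets_needed(payload_bytes: int, packet_size: int) -> int:
    """Simplified: ignores per-packet header bytes and protocol overhead."""
    return math.ceil(payload_bytes / packet_size)


payload = 10 * 1024 * 1024  # one 10 MB data flow buffer
print(packets_needed(payload, DEFAULT_PACKET_SIZE))  # 2560
print(packets_needed(payload, MAX_PACKET_SIZE))      # 321
```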

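Lookups

Three modes of operation:

Full Cache – uses the most memory, but is fastest

Partial Cache – loads rows on the fly as they are requested; can be expensive for performance

No Cache – uses no cache memory, but queries the reference data for every row, so takes longer

Alternative – use a second source with a Merge Join on the lookup key (both inputs must be sorted on that key)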
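The trade-off between the three cache modes comes down to how often the reference data is queried. The Python model below counts queries for each mode; `ReferenceTable` is hypothetical, and the least-recently-used eviction in `partial_cache` is an assumption for the sketch, not a description of SSIS internals.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class ReferenceTable:
    """Hypothetical reference table that counts how many queries it receives."""

    def __init__(self, rows: Dict[str, int]) -> None:
        self.rows = rows
        self.queries = 0

    def load_all(self) -> Dict[str, int]:
        self.queries += 1
        return dict(self.rows)

    def fetch(self, key: str) -> Optional[int]:
        self.queries += 1
        return self.rows.get(key)


def full_cache(keys: List[str], ref: ReferenceTable) -> List[Optional[int]]:
    cache = ref.load_all()  # one query up front; every reference row held in memory
    return [cache.get(key) for key in keys]


def no_cache(keys: List[str], ref: ReferenceTable) -> List[Optional[int]]:
    return [ref.fetch(key) for key in keys]  # one query per input row


def partial_cache(keys: List[str], ref: ReferenceTable, capacity: int) -> List[Optional[int]]:
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[str, Optional[int]]" = OrderedDict()
    results: List[Optional[int]] = []
    for key in keys:
        if key in cache:
            cache.move_to_end(key)  # hit: answered from memory, no query
        else:
            cache[key] = ref.fetch(key)  # miss: query, then remember the answer
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry (assumed policy)
        results.append(cache[key])
    return results


countries = {"AU": 1, "NZ": 2, "US": 3}
keys = ["AU", "NZ", "AU", "AU", "US", "NZ"]

ref_full, ref_none, ref_partial = (ReferenceTable(countries) for _ in range(3))
full_results = full_cache(keys, ref_full)
no_cache_results = no_cache(keys, ref_none)
partial_results = partial_cache(keys, ref_partial, capacity=2)

print(ref_full.queries, ref_none.queries, ref_partial.queries)  # 1 6 4
```

All three modes return the same answers; they differ only in memory held versus round trips to the database.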

SSIS Best Practices

1. Parallel is fastest

2. Naming Conventions

3. Use annotations

4. Modular Designs

5. Use configurations

6. Separation Of Concerns

7. Emerging Technologies

Parallel is fastest

Take advantage of the SSIS architecture

It was designed to run tasks in parallel

MaxConcurrentExecutables must be -1 (the default, which allows the number of logical processors plus two) or greater than 1 to allow parallel execution of tasks

Parallel is fastest

Example:

Parent-Child design pattern

23 sub-packages

Parallel execution

SSIS will only execute what it can handle at any one time
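The throttling behaviour can be modelled with a bounded thread pool: 23 hypothetical child packages are queued, but no more than the configured limit run at once. The helper `effective_max_concurrent` encodes the rule that -1 means logical processors plus two; the child package itself is a stand-in that only records concurrency.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def effective_max_concurrent(max_concurrent_executables: int, logical_processors: int) -> int:
    """-1 means 'logical processors + 2'; any other positive value is used as-is."""
    if max_concurrent_executables == -1:
        return logical_processors + 2
    if max_concurrent_executables < 1:
        raise ValueError("use -1 or a positive number")
    return max_concurrent_executables


running = 0
peak = 0
lock = threading.Lock()


def run_child_package(package_number: int) -> int:
    """Hypothetical child package that only tracks how many are running at once."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # pretend to do some work
    with lock:
        running -= 1
    return package_number


limit = effective_max_concurrent(-1, os.cpu_count() or 1)
with ThreadPoolExecutor(max_workers=limit) as pool:
    finished = list(pool.map(run_child_package, range(1, 24)))  # 23 child packages

print(f"limit={limit}, peak concurrency={peak}, packages run={len(finished)}")
```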

Naming Conventions

Control flow tasks (task: prefix)

For Loop Container: FLC
Foreach Loop Container: FELC
Sequence Container: SEQC
ActiveX Script: AXS
Analysis Services Execute DDL: ASE
Analysis Services Processing: ASP
Bulk Insert: BLK
Data Flow: DFT
Data Mining Query: DMQ
Execute Package: EPT
Execute Process: EPR
Execute SQL: SQL
File System: FSYS
FTP: FTP
Message Queue: MSMQ
Script: SCR
Send Mail: SMT
Transfer Database: TDB
Transfer Error Messages: TEM
Transfer Jobs: TJT
Transfer Logins: TLT
Transfer Master Stored Procedures: TSP
Transfer SQL Server Objects: TSO
Web Service: WST
WMI Data Reader: WMID
WMI Event Watcher: WMIE
XML: XML
Expression: EXPR

Naming Conventions

Data flow components (component: prefix)

Flat File Source: FF_SRC
OLE DB Source: OLE_SRC
Conditional Split: CSPL
Copy Column: CPYC
Data Conversion: DCNV
Derived Column: DER
Lookup: LKP
Merge: MRG
Merge Join: MRGJ
Multicast: MLT
OLE DB Command: CMD
Pivot: PVT
Row Count: CNT
Row Sampling: RSMP
Script Component: SCR
Slowly Changing Dimension: SCD
Sort: SRT
Union All: ALL
Unpivot: UPVT
Flat File Destination: FF_DST
OLE DB Destination: OLE_DST

Naming Conventions

Example: SSIS log tables

The source can be identified as a Data Flow Task (DFT prefix)
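With consistent prefixes, the task type can be recovered from the source name recorded in the log. A minimal sketch, assuming names follow a hypothetical "PREFIX Description" pattern (prefix, then a space); the prefix map is a subset of the control flow table, and the log values are made up.

```python
from typing import Optional

# Subset of the control flow prefixes from the naming convention table.
TASK_PREFIXES = {
    "FLC": "For Loop Container",
    "FELC": "Foreach Loop Container",
    "SEQC": "Sequence Container",
    "DFT": "Data Flow",
    "EPT": "Execute Package",
    "SQL": "Execute SQL",
    "SCR": "Script",
    "FSYS": "File System",
}


def task_type_from_source(source: str) -> Optional[str]:
    """Assumes names look like 'PREFIX Description' (prefix, then a space)."""
    prefix, _, _ = source.partition(" ")
    return TASK_PREFIXES.get(prefix.upper())


# Hypothetical values from the source column of an SSIS log table.
log_sources = ["DFT Load DimCustomer", "SQL Truncate Staging", "Package"]
for source in log_sources:
    print(source, "->", task_type_from_source(source))
```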

Use annotations

Comments make it clear to your future self and others what you were trying to do

Makes it easier to read the logic

Don’t be that guy or girl (the one who doesn’t comment their code)

Modular Designs

Use sequence containers

Logically group tasks together

Easier to develop and debug (disable entire containers)

Modular Designs

Example

Use configurations

Different configuration options (Package Deployment Model):

• XML configuration file
• Environment variable
• Registry entry
• Parent package variable
• SQL Server

Use configurations

Configurations make your designs portable

This makes migrating changes between environments very easy

Configurations allow you to change the behaviour of your package without changing the code

Separation Of Concerns

SoC is a software design principle, widely applied in object-oriented design

In SSIS, this typically refers to the parent-child design pattern (sub-packages)

One example of this would be an orchestration package that controls each logical stage of an ETL process

Separation Of Concerns

Example:

Sequence containers logically group tasks together

Parent-Child design pattern

Emerging Technologies

Similar to NUnit and JUnit, we now have SSISTester

This is an automated unit and integration testing framework for SSIS packages

This could see the emergence of test-driven development in ETL development

Ties in with Continuous Integration and Continuous Delivery

Read more http://msdn.microsoft.com/en-us/magazine/dn342874.aspx

Sponsors