Parallel Execution Plans
Joe Chang
jchang6@yahoo.com
www.sql-server-performance.com/joe_chang.asp

Post on 04-Jan-2016

Page 2: Parallel Execution Plans Joe Chang jchang6@yahoo.com

Parallel Execution Plans

Allows single query to use multiple processors

Query should run faster but may consume more resources

Example

1 thread: 10 sec run time, 10 CPU-sec

2 threads: 6 sec run time, 12 CPU-sec

Parallel Execution Configuration

Cost Threshold for Parallelism

Minimum estimated plan cost at which a query is considered for parallel execution

Default 5: consider increasing to 20-50 for new systems

Max Degree of Parallelism

Default 0: can use all available processors

SQL Server determines level based on available memory and recent CPU usage
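
Both settings are server-wide options changed through sp_configure. The following is a minimal sketch, assuming sysadmin rights. The values 25 and 4 are illustrative picks (25 from the 20-50 range above, 4 from the MAXDOP recommendation at the end of this deck), not tested settings for any particular system.

```sql
-- Both options are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO

-- ASSUMPTION: 25 and 4 are example values only.
EXEC sp_configure 'cost threshold for parallelism', 25
EXEC sp_configure 'max degree of parallelism', 4
RECONFIGURE
GO

-- Called with only the option name, sp_configure reports the current values.
EXEC sp_configure 'cost threshold for parallelism'
EXEC sp_configure 'max degree of parallelism'
```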

Parallel Plan Operators

The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs.

The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams.

The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.

Execution Plan Cost Formulas

Table Scan or Index Scan

I/O: 0.0375785 + 0.0007407 per page

CPU: 0.0000785 + 0.0000011 per row

Index Seek – Plan Formula

I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)

= 0.003203425 + 0.000740741 per additional page (>1GB)

CPU Cost = 0.000079600 + 0.000001100 per additional row

Bookmark Lookup – May have changed ?

I/O Cost = multiple of 0.006250000 (≤1GB)

= multiple of 0.003124925 (>1GB)

CPU Cost = 0.0000011 per row

Table Scan or Index Scan

IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows)

IUD CPU Cost = 0.000001 per row
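
As a worked check of the Index Seek CPU formula above, the sketch below evaluates it for 1,000,000 rows. The variable name @Rows is hypothetical, and the row count is chosen to match the 1M-row Index Seek example shown later in this deck, which reports a CPU cost of 1.1000785.

```sql
DECLARE @Rows int
SET @Rows = 1000000   -- illustrative row count

-- Index Seek CPU cost: base cost for the first row,
-- plus a per-row increment for each additional row.
SELECT 0.0000796 + 0.0000011 * (@Rows - 1) AS IndexSeekCpuCost
-- 0.0000796 + 1.0999989 = 1.1000785
```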

Cost Interpretation

Time in seconds? CPU time?

0.0062500 sec -> 160/sec

0.000740741 sec -> 1350/sec (8KB) -> 169/sec (64KB) -> 10.8MB/sec

S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options, cost threshold for parallelism Option: "Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."

Too fast for 7200RPM disk random I/Os.

About right for 1997 sequential disk transfer rate?
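
The rates quoted on this slide follow from reading each cost constant as seconds per I/O and taking the reciprocal. This is a small sketch of that arithmetic, under the slide's own assumption that cost units are seconds; the column aliases are made up for illustration.

```sql
SELECT 1 / 0.00625               AS RandomIOsPerSec,   -- 160
       1 / 0.000740741           AS Pages8KBPerSec,    -- about 1350
       1 / 0.000740741 / 8       AS IOs64KBPerSec,     -- about 169
       1 / 0.000740741 * 8 / 1000 AS MBPerSec          -- about 10.8
```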

Test Table

CREATE TABLE M3A_20 (
    GroupID  int NOT NULL,
    ID       int NOT NULL,
    ID2      int NOT NULL,
    ID3      int NOT NULL,
    ID4      int NOT NULL,
    sID      smallint NOT NULL,
    bID1     bigint NOT NULL,
    bID2     bigint NOT NULL,
    bID3     bigint NOT NULL,
    rMoney   money NOT NULL,
    rDate    datetime NOT NULL,
    rReal    real NOT NULL,
    rDecimal decimal (9,4) NOT NULL,
    CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] )
        WITH FILLFACTOR = 100
)
GO

Data Population Script 1

SET NOCOUNT ON
DECLARE @BatchTotal int, @BatchSize int, @TotalRows int,
    @BatchStart int, @BatchEnd int, @BatchRow int, @I int,
    @RowsPerPage bigint, @Card int, @DistinctValues int
SELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000,
    @BatchSize=100000, @RowsPerPage=100, @Card=100000
SELECT @TotalRows=@BatchTotal*@BatchSize
SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@Card

-- Outer loop: one explicit transaction per batch of @BatchSize rows
WHILE @BatchStart <= @BatchEnd
BEGIN
    BEGIN TRANSACTION
    SELECT @BatchRow = @BatchStart*@BatchSize
    -- Inner loop: single-row inserts up to the end of the current batch
    WHILE @I <= @BatchRow
    BEGIN
        INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3,
            rMoney, rDate, rReal, rDecimal)
        VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1,
            (@TotalRows-@I)%@Card+1, @I%32768, @I, (@I-1)%@Card+1,
            1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows,
            10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'),
            10000*rand(), 10000*rand() )
        -- On error, leave both loops and jump to the cleanup label
        IF @@ERROR > 0 BEGIN GOTO B END
        SET @I = @I+1
    END
    COMMIT TRANSACTION
    CHECKPOINT
    PRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow)
    SET @BatchStart = @BatchStart+1
END
-- Cleanup: close any transaction still open after an error exit
B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION
PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row '
    + CONVERT(varchar,@BatchRow) + ', Trancount '
    + CONVERT(varchar(10),@@TRANCOUNT)

Data Population Script 1 Notes

Double WHILE loop

Each Insert/Update/Delete statement is an implicit transaction

Gets separate transaction log entry

Explicit transaction: generates a single transaction log write (max 64KB per I/O)

Single TRAN for entire loop requires excessively large log file

Inserts are grouped into intermediate-size batches

Data Population Scripts 2

DECLARE @L int
SELECT @L = 1
WHILE @L <= 3
BEGIN
    INSERT M3A_11 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3,
        rMoney, rDate, rReal, rDecimal)
    SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID,
        bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal
    FROM M3A_20
    WHERE GroupID = 1
        AND ID BETWEEN (@L-1)*500000+1 AND @L*500000
    SELECT @L = @L + 1
    CHECKPOINT
    PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121)
END

UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN

CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE

Primary table populated using single-row inserts in a WHILE loop

Additional tables populated with INSERT / SELECT statement

Single-row inserts: ~20-30K rows/sec

INSERT / SELECT statement: ~100K+ rows/sec

Index Seek Plans

Many rows returned, non-parallel plan

Parallel execution disabled

Cost: 9.34

Cost: 9.82

Cost: 4.94

Parallel plan

Index Seek Details

Non-parallel plan

Parallel plan

Index Seek – Non-parallel

Cost assigned to SELECT

Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% fill)

I/O cost is 8.2365

CPU cost is 1.1000785

Cost & sub-tree cost are correct, I/O & CPU are ½ of correct value

Index Seek – Parallel Plan

No cost assigned to SELECT

Index Seek I/O and CPU cost ½ of non-parallel plan

Index Seek with Aggregate

Index Seek Aggregate Parallel Plan Details

Table Scan

Cost: 9.01

Cost: 8.26

Table Scan Details

Non-parallel plan

Parallel plan

I/O cost same

CPU cost ½ of non-parallel plan

Table Scan Details

Non-parallel plan

Parallel plan

No cost on SELECT

No cost

I/O cost same

CPU cost ½ of non-parallel plan

Parallel Plan Cost Formula Patterns

CPU costs are ½ of non-parallel plan

Index Seek I/O costs are also ½

Scan I/O cost is same as non-parallel plan

Parallel plan costs are based on 2 processors

Actual number of processors determined at runtime

Overhead operations: Distribute, Repartition & Gather Streams
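
To make the halving pattern concrete, the sketch below applies it to the non-parallel Index Seek costs reported earlier in this deck (I/O 8.2365, CPU 1.1000785). It covers only the seek operator itself, does not model the stream operators, and the column aliases are hypothetical.

```sql
-- Index Seek under a parallel plan: both I/O and CPU are costed at
-- half of the non-parallel values (plan costs assume 2 processors).
SELECT 8.2365    * 0.5 AS ParallelSeekIoCost,    -- 4.11825
       1.1000785 * 0.5 AS ParallelSeekCpuCost,   -- 0.55003925
       (8.2365 + 1.1000785) * 0.5 AS ParallelSeekCost   -- 4.66828925
```

For a scan, only the CPU term would be halved, since the scan I/O cost is the same in both plans.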

Hash Join

Cost: 6.50

Cost: 4.79

200,000 rows

15-byte outer source (OS) row size

Hash Join Details

Non-parallel plan

Parallel plan

Hash Join Details

Non-parallel plan

Parallel plan

Hash Join – Non-parallel Plan

Hash Join – Parallel Plan

Hash Join with I/O Cost

900,000 rows

MAXDOP 1

Cost 74.1

Cost 85.1

Hash Join – Join I/O Cost

730,000 rows

740,000 rows

Hash Join – Bitmap

Hash Join Cost Formula

Index Seek – Plan Formula

I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)

= 0.003203425 + 0.000740741 per additional page (>1GB)

CPU Cost = 0.000079600 + 0.000001100 per additional row

Hash Join

CPU Cost = 0.017750000 base + 0.0000001749 (2-30 rows)

+ 0.0000000720 (100 rows)

0.000015091 per row

0.000015857 (parallel)

+ 0.000001880 per row per 4 bytes in OS

+ 0.000005320 per additional row in IS

I/O Cost = 0.000042100 per row over 64MB (Row Size+8)

0.0000036609 per 4 byte over 15B

Parallel Cost Formula

Base Cost 0.028500

Repartition Streams

Cost per row = 0.0000024705 base (15 bytes) + 0.000000759 per additional 4 bytes

Gather Streams

Cost per row = 0.0000018735 base (15 bytes) + 0.000000759 per additional 4 bytes

Dispatch
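
A rough sketch of how these constants might combine for 15-byte rows. It assumes, without confirmation from the slide, that the 0.0285 base cost is charged once per stream operator and that the per-row cost scales linearly with row count. @Rows and the column aliases are hypothetical.

```sql
DECLARE @Rows int
SET @Rows = 200000   -- illustrative row count, 15-byte rows

-- ASSUMPTION: base cost applies once per operator; per-row cost is linear.
SELECT 0.0285 + 0.0000024705 * @Rows AS RepartitionStreamsCost,  -- 0.0285 + 0.4941 = 0.5226
       0.0285 + 0.0000018735 * @Rows AS GatherStreamsCost        -- 0.0285 + 0.3747 = 0.4032
```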

Loop Join

Loop Join Details

Non-parallel plan, Outer Source

Parallel plan, Outer Source

Loop Join Details

Inner Source cost identical for both non-parallel and parallel plans

Loop Join Details

Non-parallel plan

Parallel plan

Merge Join

Merge Join Details

Non-parallel plan

Parallel plan

Merge Join Details

Non-parallel plan

Parallel plan

Merge Join Details

Non-parallel plan

Parallel plan

Index Seek + Aggregate Test

[Figure: two bar charts of duration per 1K rows (ms), 1P vs 2P. One panel covers 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum, 3 NULL (axis 0 to 1.2); the other covers 1 Sum, 2 Sum, 3 Sum (axis 0 to 0.7). Panel labels: Opteron 2.2GHz 1M; Xeon 2.4GHz/512K.]

Index Seek + Aggregate Test, Itanium 2

[Figure: bar chart of duration in ms per 1K rows (axis 0 to 0.8) at 1P, 2P, 4P and 8P for 1 of 10M, Count, Sum, Convert, Max, Money, Decimal. Itanium 2 1.5GHz/6M.]

Index Seek + Aggregate Test, SUM(INT)

[Figure: bar chart (axis 0 to 0.8) at 1P, 2P, 4P and 8P for Count, 1 Sum, 2 Sum, 3 Sum. Itanium 2 1.5GHz/6M.]

Index Seek + Aggregate Test, NULL

[Figure: bar chart (axis 0 to 0.9) at 1P, 2P, 4P and 8P for 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum. Itanium 2 1.5GHz/6M.]

Loop Join, COUNT(*)

[Figure: duration per 1K rows (ms), axis 0 to 7, against rows (000's) at 100, 1,000 and 10,000, for 1P, 2P, 4P and 8P. Itanium 2 1.5GHz/6M.]

Hash Join, COUNT(*)

[Figure: duration per 1K rows (ms), axis 0.0 to 4.0, against rows (000's) at 100, 1,000 and 10,000, for 1P, 2P, 4P and 8P. Itanium 2 1.5GHz/6M.]

Merge Join, COUNT(*)

[Figure: duration per 1K rows (ms), axis 0.0 to 3.5, against rows (000's) at 100, 1,000 and 10,000, for 1P, 2P and 4P. Itanium 2 1.5GHz/6M.]

General Recommendations

Useful in DW, ETL, and maintenance activities

Use judgment for transaction processing: is throughput more important, or faster execution of expensive queries?

Increase Cost Threshold from 5 to 20-50

Limit MAXDOP to 4

Verify or limit parallelism on Xeon systems with Hyper-Threading enabled
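
Besides the server-wide setting, the degree of parallelism can be capped for an individual statement with the MAXDOP query hint. The sketch below uses the M3A_20 test table from earlier in this deck; the query itself is a made-up example, not one of the measured tests.

```sql
-- Cap this one query at 4 processors, whatever the server-wide setting.
SELECT COUNT(*), SUM(ID2)
FROM M3A_20
WHERE GroupID = 1
OPTION (MAXDOP 4)

-- Force a non-parallel plan for the same query, e.g. to compare plan costs.
SELECT COUNT(*), SUM(ID2)
FROM M3A_20
WHERE GroupID = 1
OPTION (MAXDOP 1)
```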

Additional Information

www.sql-server-performance.com/joe_chang.asp

SQL Server Quantitative Performance Analysis
Server System Architecture
Processor Performance
Direct Connect Gigabit Networking
Parallel Execution Plans
Large Data Operations
Transferring Statistics
SQL Server Backup Performance with Imceda LiteSpeed

jchang6@yahoo.com