parallel execution plans joe chang jchang6@yahoo.com

Parallel Execution Plans

Joe Changjchang6@yahoo.comwww.sql-server-performance.com/joe_chang.asp

Parallel Execution PlansParallel Execution Plans

Allows single query to use multiple processors

Query should run faster but may consume more resources

Example

1 thread: 10 sec run time, 10 CPU-sec

2 threads: 6 sec run time, 12 CPU-sec

Parallel Execution ConfigurationParallel Execution Configuration

Cost Threshold For ParallelismMinimum query plan threshold for considering queries for parallel execution

Default 5: Considering increasing to 20-50 for new systems

Max Degree of ParallelismDefault 0: Can use all available processors

SQL Server determines level based on available memory and recent CPU usage

Parallel Plan OperatorsParallel Plan Operators

The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs.

The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams.

The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.

Execution Plan Cost FormulasExecution Plan Cost Formulas

Table Scan or Index Scan

I/O: 0.0375785 + 0.0007407 per pageCPU: 0.0000785 + 0.0000011 per row

Index Seek – Plan Formula

I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)

= 0.003203425 + 0.000740741 per additional page (>1GB)

CPU Cost = 0.000079600 + 0.000001100 per additional row

Bookmark Lookup – May have changed ?

I/O Cost = multiple of 0.006250000 (≤1GB)

= multiple of 0.003124925 (>1GB)

CPU Cost = 0.0000011 per row

Table Scan or Index Scan

IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows)

IUD CPU Cost = 0.000001 per row

Cost InterpretationCost Interpretation

Time in seconds? CPU time?0.0062500sec -> 160/sec

0.000740741 ->1350/sec (8KB)->169/sec(64K)-> 10.8MB/sec

S2K BOL: Administering SQL Server, Managing Servers,Setting Configuration Options: cost threshold for parallelism OptQuery cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration.

Too fast for 7200RPM disk random I/Os.

About right for 1997 sequential disk transfer rate?

Test TableTest Table

CREATE TABLE M3A_20 (GroupID int NOT NULL,ID int NOT NULL,ID2 int NOT NULL,ID3 int NOT NULL,ID4 int NOT NULL,sID smallint NOT NULL,bID1 bigint NOT NULL,bID2 bigint NOT NULL,bID3 bigint NOT NULL,rMoney money NOT NULL,rDate datetime NOT NULL,rReal real NOT NULL,rDecimal decimal (9,4) NOT NULL,CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] ) WITH FILLFACTOR = 100 )

Data Population Script 1Data Population Script 1SET NOCOUNT ON DECLARE @BatchTotal int, @BatchSize int, @TotalRows int, @BatchStart int, @BatchEnd int, @BatchRow int, @I int, @RowsPerPage bigint , @Card int , @DistinctValues intSELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000, @BatchSize=100000, @RowsPerPage=100, @Card=100000SELECT @TotalRows=@BatchTotal*@BatchSize SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@CardWHILE @BatchStart <= @BatchEnd BEGIN BEGIN TRANSACTION SELECT @BatchRow = @BatchStart*@BatchSize WHILE @I <= @BatchRow BEGIN INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal) VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1, (@TotalRows-@I)%@Card+1, @I%32768, @I, (@I-1)%@Card+1, 1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows, 10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'), 10000*rand(), 10000*rand() ) IF @@ERROR > 0 BEGIN GOTO B END SET @I = @I+1 END COMMIT TRANSACTION CHECKPOINTPRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) SET @BatchStart = @BatchStart+1END B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) + ', Trancount ' + CONVERT(varchar(10),@@TRANCOUNT)

Data Population Script 1 NotesData Population Script 1 Notes

Double While LoopEach Insert/Update/Delete statement is an implicit transaction

Gets separate transaction log entry

Explicit transaction – generates a single transaction log write (max 64KB per IO)

Single TRAN for entire loop requires excessively large log file

Inserts are grouped into intermediate size batches

Data Population Scripts 2Data Population Scripts 2

DECLARE @L int SELECT @L = 1WHILE @L <= 3 BEGIN INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal, rDecimal) SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal FROM M3A_20 WHERE GroupID = 1 AND ID BETWEEN (@L-1)*500000+1 AND @L*500000 SELECT @L = @L + 1 CHECKPOINT PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121)END

UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN

CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE

Primary table populated using single row inserts in a WHILE loop,Additional tables populated with INSERT / SELECT statement

Single row inserts ~20-30K rows/secINSERT / SELECT statement ~100K+ rows/sec

Index Seek PlansIndex Seek Plans

Many rows returned,Non-parallel plan

Parallel Execution disabled

Cost: 9.34

Cost: 9.82

Cost: 4.94Parallel Plan

Index Seek DetailsIndex Seek Details

Non-parallel plan

Parallel plan

Index Seek – Non-parallelIndex Seek – Non-parallel

Cost assigned to SELECT

Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% Fill)I/O cost is: 8.2365CPU Cost is 1.1000785Cost & sub-tree Cost is correct, I/O & CPU is ½ of correct value

Index Seek – Parallel PlanIndex Seek – Parallel Plan

No cost assigned to SELECT

Index Seek I/O and CPU cost ½ of non-parallel plan

Index Seek with AggregateIndex Seek with Aggregate

Index Seek Aggregate Parallel Index Seek Aggregate Parallel Plan DetailsPlan Details

Table ScanTable Scan

Cost: 9.01

Cost: 8.26

Table Scan Details Table Scan Details

Non-parallel plan

Parallel plan

I/O cost sameCPU cost ½ of non parallel plan

Table Scan DetailsTable Scan Details

Non-parallel plan

Parallel plan

No cost on Select

No cost

I/O cost sameCPU cost ½ of non parallel plan

Parallel Plan Cost Formulas PatternsParallel Plan Cost Formulas Patterns

CPU costs are ½ of non-parallel plan

Index Seek I/O cost are also ½

Scan I/O cost is same as non-parallel plan

Parallel plan costs are based on 2 processors

Actual number of processors determined at runtime

Overhead operationsDistribute, Repartition & Gather Streams

Hash Join Hash Join

Cost: 6.50

Cost: 4.79

200,000 rows15 byte OS row size

Hash Join DetailsHash Join Details

Non-parallel plan

Parallel plan

Hash Join DetailsHash Join Details

Non-parallel plan

Parallel plan

Hash Join – Non-parallel planHash Join – Non-parallel plan

Hash Join – Parallel PlanHash Join – Parallel Plan1234

Hash Join with I/O CostHash Join with I/O Cost

900,000 rowsMAXDOP 1

Cost 74.1

Cost 85.1

Hash Join – Join I/O CostHash Join – Join I/O Cost

730,000 rows

740,000 rows

Hash Join - BitmapHash Join - Bitmap

Hash Join Cost FormulaHash Join Cost Formula

Index Seek – Plan Formula

I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)

= 0.003203425 + 0.000740741 per additional page (>1GB)

CPU Cost = 0.000079600 + 0.000001100 per additional row

Hash Join

CPU Cost = 0.017750000 base + 0.0000001749 (2-30 rows)

+ 0.0000000720 (100 rows)

0.000015091 per row

0.000015857 (parallel)

+ 0.000001880 per row per 4 bytes in OS

+ 0.000005320 per additional row in IS

I/O Cost = 0.000042100 per row over 64MB (Row Size+8)

0.0000036609 per 4 byte over 15B

Parallel Cost FormulaParallel Cost Formula

Base Cost 0.028500

Repartition StreamCost per row

= 0.0000024705 Base (15 Bytes) + 0.000000759 per additional 4 Bytes

Gather StreamCost per row

= 0.0000018735 Base(15) + 0.000000759 per additional 4 Bytes

Dispatch

Loop JoinLoop Join

Loop Join DetailsLoop Join Details

Non-parallel planOuter Source

Parallel planOuter Source

Inner Source cost identical for both non-parallel and parallel plans

Non-parallel plan

Parallel plan

Merge JoinMerge Join

Merge Join DetailsMerge Join Details

Non-parallel plan

Parallel plan

Non-parallel plan

Parallel plan

Non-parallel plan

Parallel plan

Index Seek + Aggregate TestIndex Seek + Aggregate Test

1 Sum 1 NULL 2 Sum 2 NULL 3 Sum 3 NULL

00.10.20.30.40.50.60.7

1 Sum 2 Sum 3 Sum

Opteron2.2GHz 1M

Xeon 2.4GHz/512K

Index Seek + Aggregate Test, Itanium 2Index Seek + Aggregate Test, Itanium 2

1 of 10M Count Sum Convert Max Money Decimal

s 1P 2P 4P 8P

Itanium 2 1.5GHz/6M

Index Seek + Aggregate Test, SUM(INT)Index Seek + Aggregate Test, SUM(INT)

Itanium 2 1.5GHz/6M

Count 1 Sum 2 Sum 3 Sum

Index Seek + Aggregate Test, NULLIndex Seek + Aggregate Test, NULL

Itanium 2 1.5GHz/6M

1 Sum 1 NULL 2 Sum 2 NULL 3 Sum

Loop Join, COUNT(*)Loop Join, COUNT(*)

Itanium 2 1.5GHz/6M

100 1,000 10,000

rows (000's)

1P 2P 4P 8P

Hash Join, COUNT(*)Hash Join, COUNT(*)

Itanium 2 1.5GHz/6M

100 1,000 10,000rows (000's)

Merge Join, COUNT(*)Merge Join, COUNT(*)

Itanium 2 1.5GHz/6M

100 1,000 10,000rows (000's)

1P 2P 4P

General RecommendationsGeneral Recommendations

Useful in DW, ETL, and maintenance activities

Use judgment on transactions processing

Is throughput more important

Or faster expensive queries

Increase Cost Threshold from 5 to 20-50

Limit MAXDOP to 4

Verify or limit parallelism on Xeon systems with Hyper-Threading enabled

Additional InformationAdditional Information

www.sql-server-performance.com/joe_chang.asp

SQL Server Quantitative Performance AnalysisSQL Server Quantitative Performance AnalysisServer System ArchitectureServer System ArchitectureProcessor PerformanceProcessor PerformanceDirect Connect Gigabit NetworkingDirect Connect Gigabit NetworkingParallel Execution PlansParallel Execution PlansLarge Data OperationsLarge Data OperationsTransferring StatisticsTransferring StatisticsSQL Server Backup Performance with Imceda LiteSpeedSQL Server Backup Performance with Imceda LiteSpeedjchang6@yahoo.com

parallel execution plans joe chang jchang6@yahoo.com

Documents

quimica chang 10ma edición - raymond chang

cs 497c – introduction to unix lecture 36: - customizing...

nat traversal speaker: chin-chang chang date:2007.4.9

tpc-h studies joe chang jchang6@yahoo.com

storage performance for data warehousing joe chang...

parallel execution plans joe chang jchang6@yahoo.com

quantitative performance analysis joe chang...

sql server scaling on big iron (numa) systems joe chang...

rmarzoghi@yahoo.com jjahani80@yahoo.com sheslami@shirazu

[xls] · web viewstomato 2011 oanaelena31@yahoo.com...

comprehensive performance with automated execution plan...

alex chang & co. - by · 2014. 9. 5. · alex chang huey...

machine learning & bioinformatics 1 tien-hao chang (darby...

by semoon chang, chief economist gulf coast center for...

· 2017. 10. 20. · taeguk 2 chang cadete taeguk 6 chang...

poslovna (preduzetniČka) kultura i poslovna etika prof. dr...

mohamady jamal@yahoo...mohamady_jamal@yahoo.com...

initiator:danielurbinap@yahoo.com;wfstate:distributed...

chin-chih chang chang@cs.twsu

chang (2008)