Parallelization in Action with SAS Analytic Procedures



Page 1: Parallelization in Action with SAS Analytic Procedures · Title: Parallelization in Action with SAS Analytic Procedures Author: Bob Tschudi Keywords: SAS Analytic Procedures Created

Copyright © 2003, SAS Institute Inc. All rights reserved. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or Trademarks of their respective companies

Parallelization in Action with SAS Analytic Procedures

Robert Cohen

Senior Research Statistician

Linear Models R&D

Page 2

Your Rise and Shine Menu

Parallelization adds value to the IVC

Multithreading to provide parallel execution

How do you measure scalability?

Selected demonstrations

Marketing: I should have slept in

Boring: I should have left when I had the chance

Insulting: This guy thinks I’m a 10-year-old

Deceiving: The truth, but not the whole truth

Page 3

IVC: Parallelization Adds Value

Complete today’s analyses faster

Analyze tomorrow’s problems within today’s time constraints

Multithreaded Procedures

Parallel access to data

Page 4

The IVC in Action

Diagram labels: I, V, C

Page 5

Changes You Have to Make in Your Legacy Code

TINSTAAFL

There are exceptions

Page 6

Unthreaded GLM: 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

GLM runs in a single thread

GLM never blocks this thread

GLM work is NOT done in parallel

Page 7

Unthreaded GLM: 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

CPU Utilization: CPU 1 CPU 2

Page 8

Unthreaded GLM: 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Combined CPU Utilization (chart; axis ticks at 0, 50, and 100)

Page 9

Multithreaded GLM: 1 Active Thread, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Worker threads used for specific tasks

Invert X’X matrix

GLM thread blocks while a worker thread is active

GLM Thread

GLM does not execute in parallel

Page 10

Multithreaded GLM: 1 Active Thread, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

CPU Utilization: CPU 1 CPU 2

Page 11

Multithreaded GLM: 1 Active Thread, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Combined CPU Utilization (chart; axis ticks at 0, 50, and 100)

Page 12

Multithreaded GLM: 2 Active Threads, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

GLM thread spawns off worker threads

GLM Thread

Invert X’X matrix

Two independent worker threads per task

Work is done in parallel
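The pattern on this slide (a main thread hands one task to two independent worker threads, waits, then continues) can be sketched in Python. This is only an illustration under stated assumptions, not how PROC GLM is implemented. The slide names inverting X’X as the task; the sketch instead splits the simpler, related job of accumulating X’X over row blocks, which works because X’X is the sum of the cross-products of any partition of the rows. The function names `crossproduct` and `parallel_crossproduct` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def crossproduct(rows):
    """Return X'X (as a list of lists) for the given rows of X."""
    if not rows:
        return []
    k = len(rows[0])
    xtx = [[0.0] * k for _ in range(k)]
    for row in rows:
        for i in range(k):
            for j in range(k):
                xtx[i][j] += row[i] * row[j]
    return xtx


def parallel_crossproduct(rows, workers=2):
    """Split the rows into blocks, compute each block's X'X in a
    worker thread, and add the partial results together."""
    if not rows:
        return []
    k = len(rows[0])
    block = -(-len(rows) // workers)  # ceiling division
    blocks = [rows[i:i + block] for i in range(0, len(rows), block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The calling thread blocks here until every worker has finished.
        partials = list(pool.map(crossproduct, blocks))
    total = [[0.0] * k for _ in range(k)]
    for part in partials:
        for i in range(k):
            for j in range(k):
                total[i][j] += part[i][j]
    return total


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(parallel_crossproduct(x, workers=2))
```

In CPython the global interpreter lock keeps pure-Python arithmetic like this from actually running on two CPUs at once, so the sketch shows the structure of the split rather than a real speedup.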

Page 13

Multithreaded GLM: 2 Active Threads, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

CPU Utilization: CPU 1 CPU 2

Page 14

Multithreaded GLM: 2 Active Threads, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Combined CPU Utilization (chart; axis ticks at 0, 50, and 100)

Page 15

Multithreaded GLM: 4 Active Threads, 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Page 16

Threading Comparison

Multithreaded GLM: 2 CPU Box

Thread View: Running Waiting I/O Blocked Exited

Page 17

Amdahl’s Law

PF (parallelizable fraction) = 80%

CPUs    Speedup
1       1.00
2       1.67
4       2.50
8       3.33
16      4.00
32      4.44

Chart region labels: Not Scalable, Scalable
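The speedups in this table follow from the standard statement of Amdahl's Law, speedup = 1 / ((1 - PF) + PF / CPUs). The short Python sketch below reproduces the PF = 80% column; the function names `amdahl_speedup` and `speedup_table` are illustrative and do not come from the slides.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Predicted speedup on `cpus` processors under Amdahl's Law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


def speedup_table(parallel_fraction, cpu_counts):
    """Return (cpus, speedup rounded to 2 decimals) pairs."""
    return [(n, round(amdahl_speedup(parallel_fraction, n), 2))
            for n in cpu_counts]


if __name__ == "__main__":
    print("CPUs  Speedup")
    for cpus, speedup in speedup_table(0.80, [1, 2, 4, 8, 16, 32]):
        print(f"{cpus:>4}  {speedup:.2f}")
```

Each doubling of the CPU count buys less than the one before it, because the 20% serial part is never shortened.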

Page 18

Amdahl’s Law

Chart legend, Parallelizable Fraction: 100%, 99%, 95%, 90%, 80%, 60%
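One way to read the curves for these parallelizable fractions is through their ceilings: as the CPU count grows without bound, the Amdahl speedup approaches 1 / (1 - PF). The sketch below computes that limit for the fractions listed on the slide. The helper name `max_speedup` is an assumption made for illustration.

```python
import math


def max_speedup(parallel_fraction):
    """Limit of the Amdahl speedup as the CPU count goes to infinity."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_fraction == 1.0:
        return math.inf  # fully parallel work has no ceiling
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    for pf in (1.00, 0.99, 0.95, 0.90, 0.80, 0.60):
        print(f"PF = {pf:.0%}: speedup limit = {max_speedup(pf):.1f}")
```

Even at PF = 95% no number of CPUs can deliver more than a 20-fold speedup, which is why small changes in the serial fraction matter so much.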

Page 19

Scalability in PROC REG: Wide Data and Scalar I/O

Speedups

Linear

Amdahl, PF=93%

Test Details

50,000 observations

500 predictors

Stepwise Selection

Scalar I/O

Page 20

Scalability in PROC REG: Wide Data and Scalar I/O

Speedups

Linear

Amdahl, PF=93%

Test Details

50,000 observations

500 predictors

Stepwise Selection

Scalar I/O

Achieved
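The slides do not say how the PF value attached to an Amdahl curve was obtained. One plausible approach, offered here only as a sketch, is to solve Amdahl's Law for PF from a measured speedup: PF = (1 - 1/speedup) / (1 - 1/CPUs). The function name and the sample measurement in the code are hypothetical and are not results from the presentation.

```python
def estimate_parallel_fraction(speedup, cpus):
    """Solve Amdahl's Law for PF, given one measured speedup on `cpus` CPUs."""
    if cpus < 2:
        raise ValueError("need at least 2 CPUs to estimate PF")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)


if __name__ == "__main__":
    # Hypothetical measurement, NOT taken from the slides.
    measured_speedup, cpus = 5.37, 8
    pf = estimate_parallel_fraction(measured_speedup, cpus)
    print(f"Estimated PF on {cpus} CPUs: {pf:.3f}")
```

An estimate above 1 means the measured speedup exceeded the CPU count, that is, superlinear scaling, which Amdahl's Law by itself cannot describe.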

Page 21

Scalability in PROC REG: Narrow Data, Parallel I/O

Test Details

4 million observations

20 predictors

Parallel I/O

Speedups

Linear

Amdahl, PF=99.9%

Page 22

Scalability in PROC REG: Narrow Data, Parallel I/O

Test Details

4 million observations

20 predictors

Parallel I/O

Speedups

Linear

Amdahl, PF=99.9%

Achieved

Page 23

Scalability in PROC DMREG

Speedups

Linear

Amdahl, PF=93%

Test Details

500,000 observations

Predictors:

50 continuous

15 classification

Logistic model

Parallel I/O

Page 24

Scalability in PROC DMREG

Speedups

Achieved

Linear

Amdahl, PF=93%

Test Details

500,000 observations

Predictors:

50 continuous

15 classification

Logistic model

Parallel I/O

Page 25

Baseline Speedup and Scalability in PROC DMREG

Linear

Amdahl, PF = 93%

Speedups

Achieved

V9/V8 ***

Test Details

500,000 observations

Predictors:

50 continuous

15 classification

Logistic model

Parallel I/O

Page 26

Scalability in PROC GLM

Linear

Amdahl, PF = 98%

Speedups

Test Details

6000 observations

4 classification variables

2000 parameters

Page 27

Scalability in PROC GLM

Linear

Amdahl, PF = 98%

Speedups

Test Details

6000 observations

4 classification variables

2000 parameters

Achieved

Superlinear Scalability!

Page 28

Scalability in PROC LOESS

Linear

Amdahl, PF=95%

Speedups

Test Details

4000 observations

18 models evaluated

Confidence limits for selected model

Page 29

Scalability in PROC LOESS

Linear

Amdahl, PF=95%

Speedups

Test Details

4000 observations

18 models evaluated

Confidence limits for selected model

Achieved

Page 30

Scalability in PROC LOESS

Linear

Amdahl, PF=99%

Speedups

Test Details

4000 observations

1 model specified

Confidence limits for specified model

Page 31

Scalability in PROC LOESS

Linear

Amdahl, PF=99%

Speedups

Test Details

4000 observations

1 model specified

Confidence limits for specified model

Achieved

Page 32

Partially Multithreaded Procedures

Base SAS

• PROC SORT

• PROC SUMMARY

• SQL (Group by, Order by)

Enterprise Miner

• PROC DMDB

• PROC DMREG

• PROC DMINE

SAS/STAT

• PROC GLM

• PROC LOESS

• PROC REG

• PROC ROBUSTREG

NOTE: Not all usages of these procedures are scalable.

Your mileage may vary!

Page 33

Reading Between the Lines

Parallelization adds value to the IVC

Multithreading to provide parallel execution

How do you measure scalability?

Selected demonstrations

Analyze bigger volumes of data

Not as boring as I feared

Predicting scalability is a subtle task

Some of my jobs will run faster in SAS 9

Page 34

Questions and hopefully answers