Divide and Conquer: Writing Parallel SAS® Code to Speed Up Your SAS Program
Doug Haigh, SAS Institute Inc.

Post on 17-Jan-2016



Introduction

Have you ever wanted to
  Text and drive at the same time?
  Watch the big game and read a book at the same time?
  Be on vacation at the beach and get work done at the office?

Humans are not good at doing two things at the same time, but your SAS code can be.

Introduction

Parallel code is when two or more streams of execution occur at nearly the same time.

[Figure: streams of execution along a Time axis]

Introduction

Parallel SAS code requires SAS/CONNECT
  One CONNECT client to many CONNECT servers

Parallel SAS code using SAS/CONNECT is created by
  SAS Data Integration Studio
  SAS Enterprise Miner
  SAS Enterprise Guide / PROC SCAPROC

Background

SIGNON / SIGNOFF
  Establish/terminate a connection to a CONNECT server on
    » Same machine
    » Remote machine
    » SAS Grid machine

RSUBMIT / ENDRSUBMIT
  Sends SAS code to the CONNECT server for processing
  May or may not wait for the code to complete

Simple SIGNON

Define the session in one of three ways:

/* 1) Same machine */
OPTIONS SASCMD="!SASCMD";

/* 2) Remote machine, through a spawner */
%let mySess=mySpawnerHost.myDomain.com 1234;

/* 3) SAS Grid machine; capture the return code so it is not emitted as code */
%let rc=%sysfunc(grdsvc_enable(mySess,server=SASApp));

Then sign on, submit the work, and sign off:

SIGNON mySess;

RSUBMIT mySess;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

SIGNOFF mySess;

Simple SIGNON

[Figure: timeline of SIGNON, RSUBMIT, and SIGNOFF over Time for the CONNECT client and the CONNECT server(s)]

Multiple SIGNONs: Synchronous

SIGNON mySess1;
SIGNON mySess2;

RSUBMIT mySess1;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

RSUBMIT mySess2;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

SIGNOFF mySess1;
SIGNOFF mySess2;

Multiple SIGNONs: Synchronous

[Figure: timeline with SIGNON, RSUBMIT, and SIGNOFF phases]

Multiple SIGNONs: Asynchronous

SIGNON mySess1 SIGNONWAIT=NO;
SIGNON mySess2 SIGNONWAIT=NO;

RSUBMIT mySess1 WAIT=NO;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

RSUBMIT mySess2 WAIT=NO;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

SIGNOFF _ALL_;
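
To compare the synchronous and asynchronous versions, the elapsed time can be measured on the client. The following is a minimal sketch, not taken from the slides. It assumes the sessions are started on the same machine through SASCMD, and it adds a WAITFOR so that the timer stops only after both jobs have finished. The macro variable name start is illustrative. With two 5-second jobs, the synchronous form should take roughly the sum of the two, and the asynchronous form roughly the longer of the two, plus sign-on overhead in both cases.

```sas
OPTIONS SASCMD="!SASCMD";          /* assumption: servers run on the same machine */

%let start=%sysfunc(datetime());   /* illustrative timer variable */

SIGNON mySess1 SIGNONWAIT=NO;
SIGNON mySess2 SIGNONWAIT=NO;

RSUBMIT mySess1 WAIT=NO;
  data _NULL_; rc=sleep(5,1); run;
ENDRSUBMIT;

RSUBMIT mySess2 WAIT=NO;
  data _NULL_; rc=sleep(5,1); run;
ENDRSUBMIT;

WAITFOR _ALL_ mySess1 mySess2;     /* block until both jobs have finished */

%put NOTE: elapsed seconds = %sysevalf(%sysfunc(datetime()) - &start);

SIGNOFF _ALL_;
```

Removing SIGNONWAIT=NO and WAIT=NO (and the WAITFOR) gives the synchronous timing for comparison.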

Multiple SIGNONs: Asynchronous

[Figure: timeline with SIGNON, RSUBMIT, and SIGNOFF phases]

Reusing a Session

SIGNON mySess1 SWAIT=NO;
SIGNON mySess2 SWAIT=NO;

RSUBMIT mySess1 WAIT=NO;
  data _NULL_;
    rc=sleep(10,1);
  run;
ENDRSUBMIT;

RSUBMIT mySess2 WAIT=NO;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

WAITFOR _ALL_ mySess1;

RSUBMIT mySess1 WAIT=NO;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

SIGNOFF _ALL_;

Reusing a Session

[Figure: timeline with SIGNON, RSUBMIT, and SIGNOFF phases]

Reusing an Available Session

SIGNON mySess1 SWAIT=NO;
SIGNON mySess2 SWAIT=NO;

RSUBMIT mySess1 WAIT=NO CMACVAR=myVar1;
  data _NULL_;
    rc=sleep(10,1);
  run;
ENDRSUBMIT;

RSUBMIT mySess2 WAIT=NO CMACVAR=myVar2;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

WAITFOR _ANY_ mySess1 mySess2;

%determineAvailableSession(2);

RSUBMIT mySess&openSess WAIT=NO;
  data _NULL_;
    rc=sleep(5,1);
  run;
ENDRSUBMIT;

SIGNOFF _ALL_;

Reusing an Available Session

%macro determineAvailableSession(numSessions);
  %global openSess;   /* read by the caller as &openSess */
  %local sess;

  %do sess=1 %to &numSessions;
    /* myVar<n> is the CMACVAR of session n; 0 means its RSUBMIT has completed */
    %if (&&myVar&sess eq 0) %then %do;
      %let openSess=&sess;
      %let sess=&numSessions;   /* end the scan at the first free session */
    %end;
  %end;
%mend;
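
The scan can be checked without SAS/CONNECT by setting the CMACVAR variables by hand. The following is a minimal sketch, not part of the slides. It assumes the CMACVAR convention that 0 means the RSUBMIT has finished and 2 means it is still running, and the hand-set values stand in for what RSUBMIT would normally write.

```sas
/* Same scan as on the slide, repeated here so the block is self-contained */
%macro determineAvailableSession(numSessions);
  %global openSess;
  %local sess;
  %do sess=1 %to &numSessions;
    %if (&&myVar&sess eq 0) %then %do;
      %let openSess=&sess;
      %let sess=&numSessions;   /* end the scan */
    %end;
  %end;
%mend;

/* Simulated status values (assumption: set by hand, not by RSUBMIT) */
%let myVar1=2;   /* session 1 still running */
%let myVar2=0;   /* session 2 finished */

%determineAvailableSession(2);
%put NOTE: openSess=&openSess;   /* writes NOTE: openSess=2 to the log */
```

Because session 1 is marked as running and session 2 as finished, the macro selects session 2, which is the case the WAITFOR _ANY_ example relies on.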

Reusing an Available Session

[Figure: timeline with SIGNON, RSUBMIT, and SIGNOFF phases]

Reusing the Best Available Session

SIGNON mySess1 SWAIT=NO CMACVAR=mySignonVar1;
…
SIGNON mySessN SWAIT=NO CMACVAR=mySignonVarN;

%waitForAvailableSession(N);
RSUBMIT mySess&openSess WAIT=NO CMACVAR=myVar&openSess;
  data _NULL_;
    rc=sleep(10,1);
  run;
ENDRSUBMIT;

%waitForAvailableSession(N);
RSUBMIT mySess&openSess WAIT=NO CMACVAR=myVar&openSess;
  data _NULL_;
    rc=sleep(1,1);
  run;
ENDRSUBMIT;

…
SIGNOFF _ALL_;

Reusing the Best Available Session

%macro waitForAvailableSession(numSessions);
  %global openSess;   /* the caller reads &openSess, so this name must be global */
  %local sess sessFound rc;

  %let sessFound=0;
  %do %while (&sessFound eq 0);
    %do sess=1 %to &numSessions;
      /* usable when the SIGNON has completed (0) and no RSUBMIT is running (0) */
      %if (&&mySignonVar&sess eq 0) %then
        %if (&&myVar&sess eq 0) %then %do;
          %let openSess=&sess;
          %let sess=&numSessions;   /* end the scan */
          %let sessFound=1;
        %end;
    %end;
    /* nothing free yet: pause one second, then scan again */
    %if (&sessFound eq 0) %then %let rc=%sysfunc(sleep(1,1));
  %end;
%mend;
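
The scan reads myVar1 through myVarN, which RSUBMIT's CMACVAR= option sets only once a job has been submitted to that session. If the elided setup does not already define them, the first call to %waitForAvailableSession has nothing valid to test. A minimal sketch of one way to handle that, using a hypothetical helper named %initSessionVars that is not part of the slides: it seeds each variable with 0 so that every signed-on session is treated as free until its first RSUBMIT.

```sas
/* Hypothetical helper, not from the slides: pre-create myVar1..myVarN as 0 */
%macro initSessionVars(numSessions);
  %local sess;
  %do sess=1 %to &numSessions;
    %global myVar&sess;      /* make the variable visible to the scan macro */
    %let myVar&sess=0;       /* 0 = no RSUBMIT in progress */
  %end;
%mend;

%initSessionVars(3);
%put NOTE: myVar1=&myVar1 myVar2=&myVar2 myVar3=&myVar3;
```

It would be called once, before the first %waitForAvailableSession, with the same session count.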

Reusing the Best Available Session

[Figure: timeline with SIGNON, RSUBMIT, and SIGNOFF phases]

How about a macro to do all of this…

Perform SIGNONs as needed
  SASCMD or Grid

Retry SIGNONs if one fails

Manage RSUBMITs to available hosts

Retry RSUBMITs if one fails

Display progress of RSUBMITs

SIGNOFF hosts when no more work exists

Email user when done

%Distribute

Determine code that needs to be executed once when SIGNON completes
  LIBNAME, FILENAME

Create code that can run at each iteration
  Base iteration differences on the macro variables provided
    » Rem_Host, Rem_iHost, Rem_Seed, Rem_NIterAll, Rem_Niter, Rem_JobIters, Rem_JobID, GlobalNSub

Set up %Distribute parameters

Run

%Distribute

Signing on...
GridDistribute: Maximum number of nodes is 4
Processing...
GridDistribute: Signing on to Host #1
GridDistribute: Signing on to Host #2
GridDistribute: Signing on to Host #3
GridDistribute: Signing on to Host #4
Stat: [0:00:00] ???? (0/0) 100000
GridDistribute: Host #1 is host2.mydomain.com
GridDistribute: Host #2 is host4.mydomain.com
GridDistribute: Host #3 is host1.mydomain.com
GridDistribute: Host #4 is host3.mydomain.com
Stat: [0:00:02] !!!. (0/0) 100000
Stat: [0:00:02] .... (8000/0) 100000
Stat: [0:00:05] !!!! (8000/8000) 100000
<similar lines deleted>
Stat: [0:00:14] ...! (100000/94000) 100000

Summary

Writing parallel SAS code can significantly speed up processing

Some SAS products will do it for you

See the paper for discussion of additional considerations
  Information movement
  Data movement
  Output management
  RSUBMIT and the SAS Macro Facility


Questions?


Session ID #1935

Additional Considerations

Information movement
  %SYSLPUT / %SYSRPUT for macro variables

%SYSLPUT remVar=&localVar /REMOTE=mySess1;

RSUBMIT mySess1;
  …
  %SYSRPUT localVar=&remVar;
ENDRSUBMIT;
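
The two statements can be combined into a round trip: send a value to the server, change it there, and bring it back. The following is a minimal sketch in open code, not taken from the slides. It assumes a same-machine session started through SASCMD, and the value 42 and the %EVAL step are illustrative. Because the RSUBMIT is synchronous, the returned value should be available on the client as soon as ENDRSUBMIT completes.

```sas
OPTIONS SASCMD="!SASCMD";                      /* assumption: same-machine server */

%let localVar=42;                              /* illustrative starting value */

SIGNON mySess1;

%SYSLPUT remVar=&localVar /REMOTE=mySess1;     /* client -> server */

RSUBMIT mySess1;
  %let remVar=%eval(&remVar + 1);              /* runs on the server */
  %SYSRPUT localVar=&remVar;                   /* server -> client */
ENDRSUBMIT;

%put NOTE: localVar on the client is now &localVar;

SIGNOFF mySess1;
```

This form works as written only in open code; inside a client-side macro, the %SYSRPUT needs the quoting or wrapping described under "RSUBMIT and the SAS Macro Facility".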

Additional Considerations

Data Movement
  Shared file system / RDBMS
  PROC UPLOAD/DOWNLOAD
  RLS

Output Movement
  Log and List files
  LOG=, LIST=
  PROC PRINTTO

Additional Considerations

RSUBMIT and the SAS Macro Facility

When the RSUBMIT block is itself issued from inside a client-side macro, macro statements in it are processed by the client's macro processor rather than on the server, so

RSUBMIT mySess1;
  %SYSRPUT localVar=&remVar;
ENDRSUBMIT;

needs to be quoted

RSUBMIT mySess1;
  %NRSTR(%%)SYSRPUT localVar=&remVar;
ENDRSUBMIT;

or wrapped in a macro

RSUBMIT mySess1;
  %MACRO updateVar;
    %SYSRPUT localVar=&remVar;
  %MEND;
  %updateVar;
ENDRSUBMIT;