dhillon loopingindistudio

33
Rupinder Dhillon Dec 14, 2012 TASS-i

Upload: defconbond0073803

Post on 04-Jun-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 1/33

Rupinder Dhillon

Dec 14, 2012

TASS-i

Page 2: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 2/33

Agenda

Using DI Studio at Bell

Problem we faced in our DI Studio ETL jobs

How we used Parameters and Looping in DI Studio

Adding Looping and Parameters to a Sample job Additional options available

Some other applications

Page 3: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 3/33

Using DI Studio at Bell

Why use DI Studio? Can’t I just write all my ETL

in Base SAS?

Standardization

Built in Documentation

Traceability – ‘analyze’

Reusability/Maintainability/Consistency Centralization

Seamless integration with Scheduling

Page 4: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 4/33

Problem we faced in our DI Studio ETL jobs

What made us turn to Looping

Extracting MASSIVE data from Bell EDW

Spool space restrictions

Needed to load history a chunk at a time

Page 5: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 5/33

Page 6: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 6/33

Time Out: DI Studio Basics

Used to Build ETL flows (BUT NOT LIMITED TO)

Made up of Source tables, Target Tables, Transformations

(and mappings in between)

Source/Target tables found inthe Inventory tab.

Transformations found in

….. Transformations tab. 

Page 7: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 7/33

Time Out: DI Studio Basics

Transformations grouped together

Page 8: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 8/33

Here is a sample DI Studio Job

•  Source table: Customer_Orders•  Extract Step•  Table Loader into Target Table•  Target table: ODS_T_ORDER•  User Written Code: Update Control Table

Page 9: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 9/33

What does the Data Extract Step look like?

Page 10: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 10/33

What does the Data Extract Step look like?

Page 11: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 11/33

We want to turn it into a Looping Job

Page 12: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 12/33

We want to turn it into a Looping Job

Steps:

Define parameters in the Job you want to loop

Create a parent job to control the loop

Create a dataset of values you want to loop through

Map the dataset values to the parameters that the job is

expecting

Specify what job should be looped

Close the loop

Page 13: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 13/33

Define parameters in the Job you want to loop

Go to the Parameters Tab in the job properties

Page 14: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 14/33

Define parameters in the Job you want to loop

Click new prompt to define a new prompt•  May consider creating a group to keep the job parameters together

•  Job Control

Page 15: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 15/33

Define parameters in the Job you want to loop

On the General Tab:Define a parameter calledxtr_Date. This parameter will control what date ourextract will run for.

Page 16: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 16/33

Define parameters in the Job you want to loop

The new parameter appears under our Parameter Tab

Page 17: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 17/33

Define parameters in the Job you want to loop

Update our Data Extract where clause to use the newparameter

‘&’ in the job;tells us the job

uses a Parameter

Page 18: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 18/33

Create a parent job to control the loop

•  Start with dragging in our Control Table•  Add a User Written Code transformation

Page 19: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 19/33

Create a parent job to control the loop•

  In the General properties of the User Written Transformation, change the name

•  In the CODE tab, you can have:• All User Written• Automatic• User Written Body

Change to All User Written

Page 20: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 20/33

Create a parent job to control the loop

How do we create our list of xtr_Dates? With SAS Code… 

Page 21: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 21/33

Create a parent job to control the loop

What values did our SAS Code create?

Page 22: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 22/33

Create a parent job to control the loop

Add the Loop transformation

Page 23: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 23/33

Create a parent job to control the loop

Add the job that you want looped

Page 24: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 24/33

Create a parent job to control the loop

Close the Loop:

Drag in the Loop End Transformation

Page 25: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 25/33

Create a parent job to control the loop

Almost Done… We need to define the Loop Properties so that the loop

knows what to do with our xtr_Dates values

Page 26: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 26/33

Create a parent job to control the loop

In the Loop Properties – Parameter Mapping TabOur loop recognizes that the looping job is expecting a

Parameter. Map our xtr_Date Parameter to the xtr_Datevariable in our input loop dataset.

 What could we do with Row Number?

Page 27: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 27/33

Create a parent job to control the loop

Last Step – make sure our xtr_Date makes it to our Loopoutput:

Page 28: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 28/33

Our Looping Job is complete

 We have 2 jobs:1. The core job that should be looped2. The Loop control jobNote: Only the Loop Control Job needs to be

deployed

Page 29: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 29/33

Can put multiple jobs within a loop Can add Return Code checks for more flexibility

Can have multiple Parameters

Each loop can be set to run in Parallel when running in a

Grid Environment

Using looping to create conditional processing (if dataset

is empty, loop does not run)

Create Loop job Templates based on common function Developer just has to change the particulars (job to call

etc).

Additional options available

Page 30: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 30/33

Where we have found Looping to be useful… 

Loading Type 2 dimension tables when you have to runone day at a time

Working around extract or load constraints

Processing text files one at a time

Reloading mismatched or missing data ranges

Page 31: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 31/33

Lots more functionality to explore…. 

Looping in a GRID environment -http://support.sas.com/documentation/cdl/en/gridref/63292/HTML/default/viewer.htm#p069a2cddfgemtn0zxnvl0x94e87.htm 

Using Looping to create conditional processing -http://support.sas.com/resources/papers/proceedings12/097-2012.pdf  

Best Practices -

http://support.sas.com/resources/papers/proceedings11/137-2011.pdf  

Contact: [email protected] 

Page 32: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 32/33

Appendix – setting dataset variables into Macro Variables

%macro doit;%let id=%sysfunc(open(sashelp.class));

%let NObs=%sysfunc(attrn(&id,NOBS));

%syscall set(id);

%do i=1 %to &NObs;

%let rc=%sysfunc(fetchobs(&id,&i));

%put # # # Processing &Name # # #;

%* ---- Analysis Code Goes Here ----- *;

%end;

%let id=sysfunc(close(&id));

%mend;

http://www2.sas.com/proceedings/forum2007/050-2007.pdf  

Page 33: Dhillon LoopingInDIStudio

8/13/2019 Dhillon LoopingInDIStudio

http://slidepdf.com/reader/full/dhillon-loopingindistudio 33/33

Questions???