DataStage Custom Stages


Upload: vamsi-karthik

Posted on 02-Dec-2015



Page 1: DataStage Custom Stages

Custom Stages

Page 2: DataStage Custom Stages

© 2002. Infosys Technologies Ltd.

Agenda

Introduction

Types of Stages

How to build Custom Stages

Page 3: DataStage Custom Stages

Introduction

DataStage provides a large number of built-in stages to extract and transform data.

In addition to these, it provides the capability to build custom stages.

Page 4: DataStage Custom Stages

Types of Stages

There are three different types of Stages that can be built.

Custom

– Use an existing Orchestrate operator as a stage in parallel jobs.

Build

– Create your own operators and use them as a stage.

Wrapper

– Specify a UNIX command to be used as a stage.

Page 5: DataStage Custom Stages

Custom Stage

Custom Stages use already existing Orchestrate operators.

Steps in defining a Custom stage:

• Select the category in the repository in which the stage is to be created.

• Select File -> New Parallel Stage -> Custom.

• On the General page, specify the name of the operator to be used.

• On the Links page, specify the minimum and maximum number of input and output links.

• On the Properties page, specify the stage properties.

Page 6: DataStage Custom Stages

Wrapped Stages

Wrapped stages run UNIX commands.

When defining a Wrapped stage, you provide the following information:

• Details of the UNIX command that the stage will execute.

• Description of the data that will be input to the stage.

• Description of the data that will be output from the stage.

• Definition of the environment in which the command will execute.

– The UNIX command can be any command, such as sort, grep, or a script.
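A wrapped command typically consumes records on standard input and produces records on standard output, the way sort or grep do. As a sketch only, the C++ below is the core of a hypothetical grep-like filter that could be compiled into an executable and wrapped as a stage; the function names keepMatching and keepMatchingText, and the one-record-per-line text format, are assumptions for illustration and not part of DataStage.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Core of a hypothetical grep-like filter that could be wrapped as a stage.
// Assumption: one record per line (newline-delimited text).
// Copies every line containing `pattern` to `out`; returns the number kept.
std::size_t keepMatching(std::istream& in, std::ostream& out,
                         const std::string& pattern) {
    std::size_t kept = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(pattern) != std::string::npos) {
            out << line << '\n';
            ++kept;
        }
    }
    return kept;
}

// Convenience wrapper for trying the filter without a terminal:
// filters a whole newline-delimited string and returns the kept lines.
std::string keepMatchingText(const std::string& text, const std::string& pattern) {
    std::istringstream in(text);
    std::ostringstream out;
    keepMatching(in, out, pattern);
    return out.str();
}

// In the wrapped executable, main() would simply call
//   keepMatching(std::cin, std::cout, argv[1]);
```

Keeping the logic in a stream-based function means the same code runs unchanged whether the stage feeds it through a pipe or a test feeds it a string.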

Page 7: DataStage Custom Stages

Build Stages

Build stages enable you to create your own operators.

They are written in C++.

This gives you the control of a full programming language.

Page 8: DataStage Custom Stages

Build Stages

Buildop provides a simple means of creating your own operator. It does not use an existing operator or executable.

Reasons to use Buildop include:

Functionality of multiple stages can be combined into a single stage

Complex business logic that cannot easily be implemented using existing stages

Lookups across a range of values

Surrogate key generation

Better performance, as there is no unwanted functionality

Buildop is reusable. It can be used within a project, and it can also be exported and used in other projects.
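"Lookups across a range of values" means finding the reference row whose [low, high] range contains a key, which a plain equality lookup cannot do. A minimal C++ sketch of that per-record logic, assuming non-overlapping ranges already sorted by lower bound (for example, loaded once before the loop); RangeRow and rangeLookup are hypothetical names, not a DataStage API.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// One row of a hypothetical range-lookup reference table.
struct RangeRow {
    long long low;      // inclusive lower bound
    long long high;     // inclusive upper bound
    std::string value;  // returned when low <= key <= high
};

// Assumes `table` is sorted by `low` and the ranges do not overlap.
// Returns the matching value, or std::nullopt if no range contains `key`.
std::optional<std::string> rangeLookup(const std::vector<RangeRow>& table,
                                       long long key) {
    // Find the first row whose lower bound is greater than key...
    auto it = std::upper_bound(table.begin(), table.end(), key,
        [](long long k, const RangeRow& row) { return k < row.low; });
    if (it == table.begin()) return std::nullopt;  // key is below every range
    --it;                                          // ...so the previous row is the only candidate
    if (key <= it->high) return it->value;
    return std::nullopt;                           // key falls in a gap between ranges
}
```

Binary search keeps each lookup at O(log n) per record, which matters when the Per-Record code runs once for every input row.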

Page 9: DataStage Custom Stages

Build Stage

The interface is similar to that of the Wrapped stage.

When defining a Build stage, you need to provide:

– Input interface/schema

– Output interface/schema

– Transfer type: if “Auto Transfer” is selected, all the input columns are copied to the output.

– Header files and definitions

– Code to be executed before any input is processed (Pre-Loop)

– Code to be executed for each input record (Per-Record)

– Code to be executed after all input has been processed (Post-Loop)
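The three code sections run in a fixed order. The C++ below is a hypothetical model of that order only (it is not the DataStage API; BuildStageModel is an invented name and records are plain strings), assuming the default behavior where the Per-Record code runs once per input record.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of the order in which a Build stage's logic sections run.
// NOT the DataStage API; records are plain strings for simplicity.
struct BuildStageModel {
    std::function<void()> preLoop;                      // once, before any input
    std::function<void(const std::string&)> perRecord;  // once per input record (default loop)
    std::function<void()> postLoop;                     // once, after the last record

    void run(const std::vector<std::string>& input) const {
        if (preLoop) preLoop();
        for (const std::string& rec : input) {
            if (perRecord) perRecord(rec);
        }
        if (postLoop) postLoop();
    }
};
```

For example, initializing counters in preLoop, accumulating in perRecord, and emitting a summary in postLoop mirrors how the three sections are typically divided.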

Page 10: DataStage Custom Stages

Build Stage

Page 11: DataStage Custom Stages

Steps

Steps for defining a Build Stage

1. Select the Stage Types category in which the stage is to be created.

2. Choose File -> New Parallel Stage -> Build from the main menu, or New Parallel Stage -> Build from the shortcut menu.

• The General tab has Stage Type, Category, Operator (by default, the Build stage name) and Class Name (by default, the Build stage name).

• The Creator tab has generic information: the version of the Build stage, the author name, and copyright information.

• On the Properties page, all the options to be passed to the Build stage as run-time options are defined.

• The Build page contains three tabs:

  – Interfaces: the input and output interfaces/schemas defined for the stage.

  – Logic: three sections, Pre-Loop, Per-Record, and Post-Loop.

  – Advanced

Page 12: DataStage Custom Stages

Build Stage Macros

There are a number of macros you can use when specifying Pre-Loop, Per-Record, and Post-Loop code.

• Informational

• Flow-control

• Input and output

• Transfer

Page 13: DataStage Custom Stages

This slide shows the Interfaces tab of the Build page, which contains the defined input and output interfaces.

Page 14: DataStage Custom Stages

Build Stage Macros

Informational Macros

These macros are used to determine the number of inputs, outputs, and transfers:

• inputs() - returns the number of inputs to the stage.

• outputs() - returns the number of outputs from the stage.

• transfers() - returns the number of transfers in the stage.

Flow-Control Macros

These macros are used to override the default behavior of the Per-Record loop in the stage definition:

• endLoop() - stops looping once the current loop has completed and any auto outputs for this loop have been written.

• nextLoop() - immediately skips to the start of the next loop, without writing any outputs.

• failStep() - returns a failed status and terminates the job.
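The real macros exist only inside the code DataStage generates for a Build stage, so they cannot be run standalone. As a hedged illustration of their effect, the C++ below models the Per-Record loop with auto read and auto write turned on; LoopAction and runPerRecord are hypothetical names, and the per-record body returns which macro it "called" instead of calling it.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical model of the flow-control macros; NOT the DataStage API.
enum class LoopAction {
    Continue,  // no macro called: auto output is written, looping continues
    NextLoop,  // like nextLoop(): skip to the next loop, nothing written
    EndLoop,   // like endLoop(): write auto output for this loop, then stop
    FailStep   // like failStep(): fail and terminate the job
};

// Runs `body` once per input record (auto read on); `body` fills `out`
// and returns the action it requests. Returns the records auto-written.
template <typename Body>
std::vector<int> runPerRecord(const std::vector<int>& input, Body body) {
    std::vector<int> written;
    for (int rec : input) {                 // auto read of the next record
        int out = 0;
        LoopAction action = body(rec, out);
        if (action == LoopAction::FailStep)
            throw std::runtime_error("failStep(): job terminated");
        if (action == LoopAction::NextLoop) continue;  // no output for this loop
        written.push_back(out);             // auto write at the end of the loop
        if (action == LoopAction::EndLoop) break;
    }
    return written;
}
```

The difference between the two skip macros shows up in the output: a record ended with nextLoop() is never written, whereas the record on which endLoop() is called is still written before the loop stops.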

Page 15: DataStage Custom Stages

Build Stage Macros

Input and Output Macros

The following macros are available:

• readRecord(input) - reads the next record from input, if there is one. If there is no record, the next call to inputDone() will return true.

• writeRecord(output) - writes a record to output.

• inputDone(input) - returns true if the last call to readRecord() for the specified input failed to read a new record because the input has no more records.

• holdRecord(input) - suspends auto input for the current record, so that the operator does not automatically read a new record at the start of the next loop.

• discardRecord(output) - suspends auto output for the current record, so that the operator does not output the record at the end of the current loop.

• discardTransfer(index) - suspends auto transfer, so that the operator does not perform the specified transfer at the end of the current loop.
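Explicit reads matter when a stage must decide which input to advance, for example when merging two sorted inputs. The C++ below is a hypothetical model (InputPort and mergeSorted are invented names, not the DataStage API) that mimics the readRecord()/inputDone() semantics described above with auto read turned off; with auto read on, a similar effect is roughly what holdRecord() gives for the input that was not consumed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input port with auto read turned off.
// NOT the DataStage API; it only mimics readRecord()/inputDone().
class InputPort {
public:
    explicit InputPort(std::vector<int> records) : records_(std::move(records)) {}

    // Like readRecord(input): load the next record, if there is one.
    void readRecord() {
        if (pos_ < records_.size()) {
            current_ = records_[pos_++];
        } else {
            done_ = true;  // no record read, so inputDone() now returns true
        }
    }

    // Like inputDone(input): true once readRecord() failed to read a record.
    bool inputDone() const { return done_; }
    int current() const { return current_; }

private:
    std::vector<int> records_;
    std::size_t pos_ = 0;
    int current_ = 0;
    bool done_ = false;
};

// Merges two inputs that are each sorted ascending. Only the input whose
// record was just written is read again; the other keeps its current record.
std::vector<int> mergeSorted(InputPort a, InputPort b) {
    std::vector<int> output;  // stands in for writeRecord(output)
    a.readRecord();
    b.readRecord();
    while (!a.inputDone() || !b.inputDone()) {
        bool takeA = !a.inputDone() &&
                     (b.inputDone() || a.current() <= b.current());
        InputPort& src = takeA ? a : b;
        output.push_back(src.current());
        src.readRecord();
    }
    return output;
}
```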

Page 16: DataStage Custom Stages

Build Stage Macros

Transfer Macros

The following macros are available:

• doTransfer(index) - performs the transfer specified by index.

• doTransfersFrom(input) - performs all transfers from the specified input.

• doTransfersTo(output) - performs all transfers to the specified output.

• transferAndWriteRecord(output) - transfers and writes a record for the specified output. Calling this macro is equivalent to calling doTransfersTo() and then writeRecord().
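To make the relationship between these macros concrete, the C++ below is a hypothetical model in which a record is a map from column name to value and a transfer copies a whole input record into an output record. TransferModel, Record, and Transfer are invented names, and letting a transferred column overwrite a same-named output column is a simplification, not documented DataStage behavior.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of Build stage transfers; NOT the DataStage API.
using Record = std::map<std::string, std::string>;  // column name -> value

// One transfer: copy the current record of `input` into `output`.
struct Transfer {
    std::size_t input;
    std::size_t output;
};

struct TransferModel {
    std::vector<Record> inputs;                // current record on each input
    std::vector<Record> outputs;               // record being built on each output
    std::vector<std::vector<Record>> written;  // records written to each output
    std::vector<Transfer> transfers;

    TransferModel(std::size_t nInputs, std::size_t nOutputs)
        : inputs(nInputs), outputs(nOutputs), written(nOutputs) {}

    // Like doTransfer(index): perform one transfer (simplification:
    // a transferred column overwrites an output column of the same name).
    void doTransfer(std::size_t index) {
        const Transfer& t = transfers[index];
        for (const auto& col : inputs[t.input]) outputs[t.output][col.first] = col.second;
    }
    // Like doTransfersFrom(input): perform every transfer from `input`.
    void doTransfersFrom(std::size_t input) {
        for (std::size_t i = 0; i < transfers.size(); ++i)
            if (transfers[i].input == input) doTransfer(i);
    }
    // Like doTransfersTo(output): perform every transfer to `output`.
    void doTransfersTo(std::size_t output) {
        for (std::size_t i = 0; i < transfers.size(); ++i)
            if (transfers[i].output == output) doTransfer(i);
    }
    // Like writeRecord(output): emit the output record as it currently stands.
    void writeRecord(std::size_t output) { written[output].push_back(outputs[output]); }

    // Like transferAndWriteRecord(output): doTransfersTo() then writeRecord().
    void transferAndWriteRecord(std::size_t output) {
        doTransfersTo(output);
        writeRecord(output);
    }
};
```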

Page 17: DataStage Custom Stages

Build Stage

This page contains all the header file information and definitions.

Page 18: DataStage Custom Stages

Example

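The Definitions tab contains header files and definitions:

    #include "apt_util/string.h"
    #include "apt_util/ints.h"

    // Global counters used by the stage logic
    int iHold = 0;
    int iVar = 0;
    int iCounter = 0;

    // Buffered copy of one input record
    struct extract_type
    {
        long long gst_i;
        long long mail_addr_i;
        char surname[32];
        long long acct_cd_seq_i;
        long long dummy_grp_seq_i;
        char grp_end_d[10];
    };

    // Fixed-size buffer: holds at most 100 records, so the code that fills it
    // must not write past extract_rec[99]
    struct extract_type extract_rec[100];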

Page 19: DataStage Custom Stages

Example

The Pre-Loop section contains code to be executed before any input is processed.

Per-Record section: this section contains the logic to be executed for each record.

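    // i (next free slot in extract_rec) and tempMail (presumably the previous
    // MAIL_ADDR_I) are declared elsewhere; their declarations are not on this slide.
    if (input.MAIL_ADDR_I != tempMail)
    {
        // reading first record
        extract_rec[i].gst_i = input.GST_I;
        extract_rec[i].mail_addr_i = input.MAIL_ADDR_I;
        extract_rec[i].acct_cd_seq_i = input.ACCT_CD_SEQ_I;
        // Begin of Grouping logic
        // ... (the rest of the grouping logic is not shown on the slide)
    }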

Each input column is accessed as input.Column, where input is the name of the input interface.

Page 20: DataStage Custom Stages

The Per-Record section contains the code to be executed for each input record. This slide shows that code.

Page 21: DataStage Custom Stages

Example

The code is written in C++, the same as any C++ program but without a main() function:

    // write output to output interface
    // (i is the number of records buffered in extract_rec)
    for (int m = 0; m < i; m++)
    {
        output.GST_I = extract_rec[m].gst_i;
        output.MAIL_ADDR_I = extract_rec[m].mail_addr_i;
        output.ACCT_CD_SEQ_I = extract_rec[m].acct_cd_seq_i;
        output.PRIM_LAST_NAME = extract_rec[m].surname;
        // Writing the record to Output
        writeRecord(output.portid_);
    }

Data is passed to the output interface by assigning the computed values to output.Column, where output is the interface name.

The record is written by calling the writeRecord(output) macro, which writes the current values of the output interface as one output record.

Page 22: DataStage Custom Stages

Example

The Post-Loop section contains code to be executed after all input has been processed. It is written the same way as the Pre-Loop and Per-Record sections, but runs after the Per-Record section has completed.
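One common reason to need Post-Loop code is grouping: when records are grouped by a key such as MAIL_ADDR_I, each group is flushed only when the key changes, so the last group has no following change and must be flushed after the loop. The C++ below is a minimal control-break sketch under that assumption (input sorted by the key); Rec and GroupBuffer are hypothetical names, and this is not the example's actual grouping logic, which the slides do not show. It uses std::vector instead of the fixed 100-element array so the buffer cannot overflow.

```cpp
#include <vector>

// Hypothetical record with only the fields the sketch needs.
struct Rec {
    long long mailAddr;  // grouping key (compare MAIL_ADDR_I)
    long long gst;
};

// Control-break sketch, assuming input arrives sorted by mailAddr.
class GroupBuffer {
public:
    // Per-Record: when the key changes, flush the previous group first.
    void perRecord(const Rec& r) {
        if (!buffer_.empty() && r.mailAddr != buffer_.back().mailAddr) flush();
        buffer_.push_back(r);
    }

    // Post-Loop: the last group is never followed by a key change,
    // so it is flushed here.
    void postLoop() { flush(); }

    const std::vector<std::vector<Rec>>& groups() const { return groups_; }

private:
    void flush() {
        if (buffer_.empty()) return;
        groups_.push_back(buffer_);  // stands in for writeRecord() per buffered row
        buffer_.clear();
    }

    std::vector<Rec> buffer_;
    std::vector<std::vector<Rec>> groups_;
};
```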