
Post on 18-Apr-2015


DataStage Designer Tips & Tricks

Jim Tsimis, Advanced Technical Support
Mike Carney, Advanced Consulting Group
Michael Ruland, Field Engineering
Steven Totman, Product Manager Connectivity


Agenda - Designer Session

General Debug Tips & Tricks
Handling Complex Flat Files
Joy of the Command Line
Transaction Handling Tips & Tricks: managing transactions in Server, Enterprise Edition, Enterprise MVS Edition and RTI
Re-usability Tips & Tricks: Shared Containers, Templates, Pre-configured stages, Runtime column propagation
Performance Tuning


General Debug Tips & Tricks


Server - Generating Test Data

General Debug

Stage variables are always evaluated, so they drive the Transformer stage.

Notice that there are no input links. Output rows are generated until the constraint becomes false, at which point the job stops.
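
The row-generation idea above can be sketched outside DataStage. The following Python is only an analogy, not DataStage code: `counter` plays the role of a stage variable, the `counter <= limit` test plays the role of the output-link constraint, and the column names `id` and `name` are made up for illustration.

```python
from typing import Iterator


def generate_rows(limit: int) -> Iterator[dict]:
    """Emit rows with no input, stopping once the constraint turns false."""
    counter = 0  # plays the role of a stage variable
    while True:
        counter += 1  # the stage variable is re-evaluated on every pass
        if not counter <= limit:  # the output-link constraint
            break  # constraint is false, so the "job" stops
        # Hypothetical output columns derived from the stage variable.
        yield {"id": counter, "name": f"customer_{counter:04d}"}


rows = list(generate_rows(3))
for row in rows:
    print(row)
```

Changing `limit` changes how many test rows are produced, which mirrors driving the row count from a job parameter used in the constraint.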


Enterprise Edition - Generating Test Data



Building test data from live data

Head stage: selects the first N records from each partition of an input data set and copies the selected records to an output data set.
Tail stage: selects the last N records from each partition of an input data set and copies the selected records to an output data set.
Sample stage: samples an input data set. It operates in two modes. Percent mode extracts rows, selecting them by means of a random number generator, and writes a given percentage of these to each output data set. Period mode extracts every Nth row from each partition, where N is the period that you supply.
Filter stage: transfers, unmodified, the records of the input data set that satisfy the specified requirements and filters out all other records. You can specify different requirements to route rows down different output links.
External Filter stage: allows you to specify a UNIX command that acts as a filter on the data you are processing. An example would be to use the stage to grep a data set for a certain string or pattern and discard records that do not contain a match.
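
To make the per-partition behavior of Head, Tail and Sample concrete, here is a minimal Python sketch. It models a data set as a list of partitions and is an illustration only: the function names are invented, Period mode is assumed to keep rows N, 2N, 3N and so on (the real stage may start at a different offset), and Percent mode is approximated with a seeded random number generator.

```python
import random
from typing import List, Sequence

Partitions = Sequence[Sequence[int]]


def head(partitions: Partitions, n: int) -> List[list]:
    """First n rows of every partition."""
    return [list(p[:n]) for p in partitions]


def tail(partitions: Partitions, n: int) -> List[list]:
    """Last n rows of every partition."""
    return [list(p[max(len(p) - n, 0):]) for p in partitions]


def sample_period(partitions: Partitions, period: int) -> List[list]:
    """Every Nth row of every partition (assumed to start at row N)."""
    return [list(p[period - 1::period]) for p in partitions]


def sample_percent(partitions: Partitions, percent: float, seed: int = 0) -> List[list]:
    """Keep each row with the given probability, using a seeded generator."""
    rng = random.Random(seed)
    return [[row for row in p if rng.random() * 100 < percent] for p in partitions]


parts = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]
print(head(parts, 2))
print(tail(parts, 2))
print(sample_period(parts, 2))
```

Because each function works partition by partition, asking for N rows on a data set with P partitions can return up to N x P rows in total.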

Sequential File stage, Filter option: use this to specify that the data is passed through a filter program before being written to a file or files on output, or before being placed in a data set on input.
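
The filter-program idea (data is piped through an external command before it reaches its destination) can be sketched in Python with `subprocess`. This is an analogy rather than DataStage behavior: a tiny Python one-off script stands in for a UNIX filter such as grep so that the example does not depend on which commands are installed, and the sample records are invented.

```python
import subprocess
import sys

# Stand-in for a UNIX filter such as grep: keep only lines containing "ERROR".
FILTER_SCRIPT = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'ERROR' in line:\n"
    "        sys.stdout.write(line)\n"
)


def run_through_filter(text: str) -> str:
    """Pipe text through the external filter process and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", FILTER_SCRIPT],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise if the filter exits with a non-zero status
    )
    return result.stdout


data = "1,OK\n2,ERROR\n3,OK\n4,ERROR\n"
filtered = run_through_filter(data)
print(filtered, end="")
```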


Handling Complex Flat Files


Server - Decoding Multi-formatted Files

Input column definitions (3 columns).
The selected complex column is decoded into individual columns.


Enterprise Edition - Decoding Multi-formatted Files

Indicate the columns to import.
Map the columns to their destination.


Enterprise Edition - Taming the import

Print field option

od -x -A x
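
The od command on this slide is presumably there for inspecting the raw bytes of a file that will not import cleanly. As a rough Python counterpart, the sketch below prints a hexadecimal offset followed by the bytes in hex. It is byte-wise, so its layout differs from od's 2-byte-word output, and the sample record is invented for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return one line per `width` bytes: hex offset, then each byte in hex."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:07x} {hex_bytes}")
    return "\n".join(lines)


# Hypothetical fixed-width record: 4-char id, 6-char name, 3 binary bytes.
record = b"0001SMITH \x00\x12\x3c"
print(hex_dump(record))
```

Seeing the bytes makes it easier to spot where a field boundary or a non-text field does not match the column definitions.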


Working with Schemas

Converting Copybooks To Schemas


Enterprise Edition - Working with Complex Files

Make Subrecord stage: combines specified vectors in an input data set into a vector of subrecords whose columns have the names and data types of the original vectors.
Promote Subrecord stage: promotes the columns of an input subrecord to top-level columns. It can also promote the columns in vectors of subrecords, in which case it acts as the inverse of the Combine Records stage.
Split Subrecord stage: separates an input subrecord field into a set of top-level vector columns.
Make Vector stage: combines specified columns of an input data record into a vector of columns of the same type.
Split Vector stage: promotes the elements of a fixed-length vector to a set of similarly named top-level columns.
Combine Records stage: combines records, in which particular key-column values are identical, into vectors of subrecords.
Column Import stage: imports data from a single column and outputs it to one or more columns.
Column Export stage: exports data from a number of columns of different data types into a single column of data type string or binary.
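
A few of these restructuring stages are easier to picture with a small example. The Python below models records as dictionaries and sketches what Make Vector, Split Vector and Combine Records do to them. It is an illustration under assumptions: the function names, the `subrecs` column name and the `name0`, `name1` naming of split columns are hypothetical, not the names the stages actually generate.

```python
from typing import Dict, List


def make_vector(record: Dict, columns: List[str], name: str) -> Dict:
    """Fold several same-typed columns into one vector column."""
    out = {k: v for k, v in record.items() if k not in columns}
    out[name] = [record[c] for c in columns]
    return out


def split_vector(record: Dict, name: str) -> Dict:
    """Promote each vector element to its own top-level column."""
    out = {k: v for k, v in record.items() if k != name}
    for i, value in enumerate(record[name]):
        out[f"{name}{i}"] = value  # elements are numbered from 0
    return out


def combine_records(records: List[Dict], key: str) -> List[Dict]:
    """Group records sharing a key value into a vector of subrecords."""
    groups: Dict = {}
    for rec in records:
        sub = {k: v for k, v in rec.items() if k != key}
        groups.setdefault(rec[key], []).append(sub)
    return [{key: k, "subrecs": subs} for k, subs in groups.items()]


row = {"id": 1, "q1": 10, "q2": 20}
vec = make_vector(row, ["q1", "q2"], "q")
print(vec)
print(split_vector(vec, "q"))
```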


Enterprise Edition - Complex Structures

Subrecords
A subrecord is a nested data structure. The column with type subrecord does not itself define any storage, but the columns it contains do. These columns can have any data type, and you can nest subrecords one within another. The LEVEL property is used to specify the structure of subrecords.

[Diagram: example of a subrecord structure, labeled Promote and Make]

Vectors
A vector is a one-dimensional array of any type except tagged. Elements of a vector are of the same type and are numbered from 0. A vector can be of fixed or variable length. For fixed-length vectors the length is explicitly stated; for variable-length vectors a property defines a link field that gives the length at run time.

[Diagram: vector example, labeled Make and Split]
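
The link-field idea for variable-length vectors can be shown with a small binary example. The sketch below assumes a made-up layout (a 2-byte big-endian count followed by that many 4-byte big-endian integers); it is not the layout DataStage uses, only an illustration of a length that is read at run time before the elements are read.

```python
import struct
from typing import List


def read_variable_vector(buf: bytes) -> List[int]:
    """Read the link field first, then exactly that many vector elements."""
    (count,) = struct.unpack_from(">H", buf, 0)  # link field: element count
    return list(struct.unpack_from(f">{count}i", buf, 2))  # the elements


# Build a sample buffer: count = 3, then the elements 10, 20, 30.
payload = struct.pack(">H3i", 3, 10, 20, 30)
values = read_variable_vector(payload)
print(values[0], values[1], values[2])  # elements are numbered from 0
```

A fixed-length vector would need no link field, because the element count would be part of the column definition itself.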


Enterprise Edition - Combining Vectors and Subrecords

Vectors and subrecords can be combined to represent very complex data structures.


Joy of the Command Line


The Joy of the Command Line

What is dsjob?
Utility to back up all the jobs in a project
Utility to take BMPs from the command line
DSJob exposed as a web service


Automatically Backing up Projects

@echo off
rem This batch script is used to back up all the projects on a DataStage server. It
rem must be run from a DataStage client machine and the parameters below should be
rem modified to fit your environment. Use of parameters was avoided to simplify backup
rem and allow the command to be customized to a particular environment.
rem
rem Based on design by Manoli Krinos
rem Modified by M Ruland to allow iteration through a complete server set of projects
rem *****************************************************
rem Replace the following variables prior to running
rem *****************************************************
rem Host is server name
rem User is username to use to attach to DataStage
rem PW is password to use to attach to DataStage
rem BackupDir is the directory to place the backed up project in (don't forget final \)
rem DsxCmd is the path of the export command on client
rem DsxCmd1 is the dsjob command to retrieve the project list
rem TempProjFile is temp file to store project names
rem DSLog is the name of the log file accumulated during the backup
rem *****************************************************
rem *****************************************************
Set Host=yourhosthere
Set User=yourusername
Set PW=yourpassword
Set BackupDir=E:\Data\AutoBackup\UserConference\
SET DsxCmd=E:\Progra~1\Ascential\DataStage7Beta\dscmdexport.exe
SET DsxCmd1=C:\Ascential\DataStage\Engine\bin\dsjob.exe
Set TempProjFile=c:\temp\ProjectList.txt
Set DSLog=DataStageDumpLog
rem ******************************************************
rem
rem ------------------------------------------------------------------------
rem Get the current Date
rem ------------------------------------------------------------------------
FOR /f "tokens=2-4 delims=/ " %%a in ('DATE/T') do SET DsxDate=%%c%%a%%b
rem
rem ------------------------------------------------------------------------
rem Get the current Time
rem
rem ------------------------------------------------------------------------
FOR /f "tokens=1* delims=:" %%a in ('ECHO.^|TIME^|FINDSTR "[0-9]"') do (SET DsxTime=%%b)
rem
rem ------------------------------------------------------------------------
rem Set delimiters so that current time can be broken down into components
rem then execute FOR loop to parse the DsxTime variable into Hr/Min/Sec/Hun.
rem
rem ------------------------------------------------------------------------
SET delim1=%DsxTime:~3,1%
SET delim2=%DsxTime:~9,1%
FOR /f "tokens=1-4 delims=%delim1%%delim2% " %%a in ('echo %DsxTime%') do (
    set DsxHr=%%a
    set DsxMin=%%b
    set DsxSec=%%c
    set DsxHun=%%d)
ECHO *** Backing up server %Host% == please be patient
rem Retrieve the project list (the password comes from PW, not from the user name)
%DsxCmd1% -server %Host% -user %User% -password %PW% -lprojects > %TempProjFile%
echo AutoProjectBackup run on %DsxDate%%DsxHr%%DsxMin%%DsxSec% with the following parameters > %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
echo Host=%Host%, User=%user%, BackupDir=%BackupDir%, DsxCmd=%DsxCmd%, DsxCmd1=%DsxCmd1% >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
echo TempProjFile=%TempProjFile%, DSLog=%DSLog% >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
echo. >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
echo Following Projects found on %Host% >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
type %TempProjFile% >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
rem *************************
rem ** Begin backup loop **
rem *************************
for /F "tokens=1" %%i in (%TempProjFile%) do (
    ECHO The current Project is %%i >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    ECHO Backing up Project %%i
    rem
    rem ------------------------------------------------------------------------
    rem Issue message to screen (stdio) that the export is starting.
    rem ------------------------------------------------------------------------
    ECHO Exporting Project=%%i on Host=%Host% into File=%BackupDir%%Host%%%i%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.dsx ... >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    %DsxCmd% /H=%Host% /U=%User% /P=%PW% %%i %BackupDir%%Host%%%i%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.dsx >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    rem IF ERRORLEVEL is evaluated at run time; %%ERRORLEVEL%% would be expanded once, when the loop is parsed
    IF ERRORLEVEL 1 GOTO BADEXPORT
    ECHO. >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    ECHO *** Completed Export for Project: %%i on Host: %Host% to File: %BackupDir%%Host%%%i%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.dsx >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    ECHO ************************************************************************** >> %BackupDir%%DSLog%%DsxDate%%DsxHr%%DsxMin%%DsxSec%full.log
    ECHO *************