ds designer guide

Upload: princeanilb

Post on 30-May-2018


  • 8/14/2019 DS Designer Guide

    1/280


    Published by Ascential Software

    © 1997-2002 Ascential Software Corporation. All rights reserved.

    Ascential, DataStage and MetaStage are trademarks of Ascential Software
    Corporation or its affiliates and may be registered in other jurisdictions.

    Documentation Team: Mandy deBelin

    GOVERNMENT LICENSE RIGHTS

    Software and documentation acquired by or for the US Government are
    provided with rights as follows: (1) if for civilian agency use, with
    rights as restricted by vendor's standard license, as prescribed in FAR
    12.212; (2) if for Dept. of Defense use, with rights as restricted by
    vendor's standard license, unless superseded by a negotiated vendor
    license, as prescribed in DFARS 227.7202. Any whole or partial
    reproduction of software or documentation marked with this legend must
    reproduce this legend.


    Table of Contents

    Preface
    Organization of This Manual .......... x
    Documentation Conventions .......... x
    User Interface Conventions .......... xii
    DataStage Documentation .......... xiii

    Chapter 1. Introduction
    About Data Warehousing .......... 1-1
    Operational Databases Versus Data Warehouses .......... 1-2
    Constructing the Data Warehouse .......... 1-2
    Defining the Data Warehouse .......... 1-3
    Data Extraction .......... 1-3
    Data Aggregation .......... 1-3
    Data Transformation .......... 1-3
    Advantages of Data Warehousing .......... 1-4
    About DataStage .......... 1-4
    Client Components .......... 1-5
    Server Components .......... 1-6
    DataStage Projects .......... 1-6
    DataStage Jobs .......... 1-6
    DataStage NLS .......... 1-8
    Character Set Maps and Locales .......... 1-8
    DataStage Terms and Concepts .......... 1-9

    Chapter 2. Your First DataStage Project
    Setting Up Your Project .......... 2-2
    Starting the DataStage Designer .......... 2-3
    Creating a Job .......... 2-4


    Defining Table Definitions .......... 2-6
    Developing a Job .......... 2-9
    Adding Stages .......... 2-9
    Linking Stages .......... 2-10
    Editing the Stages .......... 2-11
    Editing the UniVerse Stage .......... 2-11
    Editing the Transformer Stage .......... 2-16
    Editing the Sequential File Stage .......... 2-21
    Compiling a Job .......... 2-23
    Running a Job .......... 2-24
    Analyzing Your Data Warehouse .......... 2-25

    Chapter 3. DataStage Designer Overview
    Starting the DataStage Designer .......... 3-1
    The DataStage Designer Window .......... 3-2
    Using Annotations .......... 3-18
    Description Annotation Properties .......... 3-19
    Annotation Properties .......... 3-20
    Specifying Designer Options .......... 3-21
    Default Options .......... 3-21
    Expression Editor Options .......... 3-24
    Graphical Performance Monitor Options .......... 3-25
    Job Sequencer Options .......... 3-25
    Printing Options .......... 3-28
    Prompting Options .......... 3-28
    Transformer Options .......... 3-31
    Exiting the DataStage Designer .......... 3-31

    Chapter 4. Developing a Job
    Getting Started with Jobs .......... 4-2
    Creating a Job .......... 4-2
    Opening an Existing Job .......... 4-2
    Saving a Job .......... 4-4
    Stages .......... 4-5
    Server Job Stages .......... 4-5
    Mainframe Job Stages .......... 4-7
    Parallel Job Stages .......... 4-9


    Active Stages .......... 4-9
    File Stages .......... 4-11
    Database Stages .......... 4-12
    Links .......... 4-12
    Linking Server Stages .......... 4-12
    Linking Parallel Jobs .......... 4-15
    Linking Mainframe Stages .......... 4-18
    Link Ordering .......... 4-20
    Developing the Job Design .......... 4-21
    Adding Stages .......... 4-21
    Moving Stages .......... 4-22
    Renaming Stages .......... 4-22
    Deleting Stages .......... 4-23
    Linking Stages .......... 4-23
    Editing Stages .......... 4-25
    Using the Data Browser .......... 4-31
    Using the Performance Monitor .......... 4-34
    Compiling Server Jobs and Parallel Jobs .......... 4-37
    Generating Code for Mainframe Jobs .......... 4-39
    Job Properties .......... 4-44
    Server Job and Parallel Job Properties .......... 4-44
    Specifying Job Parameters .......... 4-46
    Job Control Routines .......... 4-55
    Specifying Job Dependencies .......... 4-58
    Specifying Performance Enhancements .......... 4-60
    Specifying Execution Page Options .......... 4-62
    Specifying Maps and Locales .......... 4-63
    Mainframe Job Properties .......... 4-66
    Specifying Mainframe Job Parameters .......... 4-67
    Specifying Mainframe Job Environment Properties .......... 4-70
    Specifying Extension Variable Values .......... 4-71
    The Job Run Options Dialog Box .......... 4-72

    Chapter 5. Containers
    Local Containers .......... 5-1
    Creating a Local Container .......... 5-2


    Viewing or Modifying a Local Container .......... 5-2
    Using Input and Output Stages .......... 5-3
    Deconstructing a Local Container .......... 5-4
    Shared Containers .......... 5-5
    Creating a Shared Container .......... 5-5
    Viewing or Modifying a Shared Container Definition .......... 5-6
    Editing Shared Container Definition Properties .......... 5-7
    Using a Shared Container in a Job .......... 5-9
    Converting Containers .......... 5-16

    Chapter 6. Job Sequences
    Creating a Job Sequence .......... 6-2
    Activities .......... 6-4
    Triggers .......... 6-4
    Control Entities .......... 6-7
    Nested Conditions .......... 6-7
    Sequencer .......... 6-7
    Job Sequence Properties .......... 6-8
    Activity Properties .......... 6-12
    Job Activity Properties .......... 6-16
    Routine Activity Properties .......... 6-18
    Email Notification Activity Properties .......... 6-19
    Wait-For-File Activity Properties .......... 6-21
    ExecCommand Activity Properties .......... 6-22
    Exception Activity Properties .......... 6-22
    Nested Condition Properties .......... 6-23
    Sequencer Properties .......... 6-23
    Compiling the Job Sequence .......... 6-24

    Chapter 7. Table Definitions
    Table Definition Properties .......... 7-2
    The Table Definition Dialog Box .......... 7-2
    Importing a Table Definition .......... 7-10
    Manually Entering a Table Definition .......... 7-12
    Viewing or Modifying a Table Definition .......... 7-26
    Using the Data Browser .......... 7-28
    Stored Procedure Definitions .......... 7-30


    Importing a Stored Procedure Definition .......... 7-30
    The Table Definition Dialog Box for Stored Procedures .......... 7-31
    Manually Entering a Stored Procedure Definition .......... 7-33
    Viewing or Modifying a Stored Procedure Definition .......... 7-36

    Chapter 8. Programming in DataStage
    Programming in Server Jobs .......... 8-1
    The Expression Editor .......... 8-2
    Programming Components .......... 8-2
    Routines .......... 8-3
    Transforms .......... 8-4
    Functions .......... 8-4
    Expressions .......... 8-5
    Subroutines .......... 8-5
    Macros .......... 8-6
    Programming in Mainframe Jobs .......... 8-6
    Expressions .......... 8-6
    Routines .......... 8-7
    Programming in Parallel Jobs .......... 8-7

    Appendix A. Editing Grids

    Appendix B. Troubleshooting

    Index


    Preface

    This manual describes the features of the DataStage Designer. It is
    intended for application developers and system administrators who want
    to use DataStage to design and develop data warehousing applications.

    If you are new to DataStage, read the first two chapters for an overview
    of data warehousing and the concepts and use of DataStage.

    The manual contains enough information to get you started in designing
    DataStage jobs. For more detailed information about particular types of
    data source or data target, refer to DataStage Server Job Developer's
    Guide, DataStage Parallel Job Developer's Guide, and XE/390 Job
    Developer's Guide.


    • All punctuation marks included in the syntax (for example, commas,
      parentheses, or quotation marks) are required unless otherwise
      indicated.

    • Syntax lines that do not fit on one line in this manual are continued
      on subsequent lines. The continuation lines are indented. When
      entering syntax, type the entire syntax entry, including the
      continuation lines, on the same input line.

    User Interface Conventions

    The following picture of a typical DataStage dialog box illustrates the
    terminology used in describing user interface elements:

    [Figure: a typical DataStage dialog box, with callouts for Option
    Button, Button, Check Box, Browse Button, Drop Down List, The Inputs
    Page, The General Tab, and Field.]

    The DataStage user interface makes extensive use of tabbed pages,
    sometimes nesting them to enable you to reach the controls you need from
    within a single dialog box. At the top level, these are called pages; at
    the inner level, these are called tabs. In the example above, we are
    looking at the General tab of the Inputs page. When using context
    sensitive online help you will find that each page has a separate help
    topic, but each tab uses the help topic for the parent page. You can
    jump to the help pages for the separate tabs from within the online
    help.

    DataStage Documentation

    DataStage documentation includes the following:

    • DataStage Designer Guide. This guide describes the DataStage
      Designer, and gives a general description of how to create, design,
      and develop a DataStage application.

    • DataStage Manager Guide. This guide describes the DataStage Manager
      and describes how to use and maintain the DataStage Repository.

    • DataStage Server Job Developer's Guide. This guide describes the
      specific tools that are used in building a server job, and supplies
      programmer's reference information.

    • DataStage Parallel Job Developer's Guide. This guide describes the
      specific tools that are used in building a parallel job, and supplies
      programmer's reference information.

    • XE/390 Job Developer's Guide. This guide describes the specific tools
      that are used in building a mainframe job, and supplies programmer's
      reference information.

    • DataStage Director Guide. This guide describes the DataStage Director
      and how to validate, schedule, run, and monitor DataStage server
      jobs.

    • DataStage Administrator Guide. This guide describes DataStage setup,
      routine housekeeping, and administration.

    • DataStage Install and Upgrade Guide. This guide contains instructions
      for installing DataStage on Windows and UNIX platforms, and for
      upgrading existing installations of DataStage.

    These guides are also available online in PDF format. You can read them
    with the Adobe Acrobat Reader supplied with DataStage. See DataStage
    Install and Upgrade Guide for details about installing the manuals and
    the Adobe Acrobat Reader.


    Extensive online help is also supplied. This is especially useful when
    you have become familiar with using DataStage and need to look up
    particular pieces of information.


    Chapter 1. Introduction

    This chapter is an overview of data warehousing and DataStage.

    The last few years have seen the continued growth of IT (information
    technology) and the requirement of organizations to make better use of
    the data they have at their disposal. This involves analyzing data in
    active databases and comparing it with data in archive systems.

    Although offering the advantage of a competitive edge, the cost of
    consolidating data into a data mart or data warehouse was high. It also
    required the use of data warehousing tools from a number of vendors and
    the skill to create a data warehouse.

    Developing a data warehouse or data mart involves design of the data
    warehouse and development of operational processes to populate and
    maintain it. In addition to the initial setup, you must be able to
    handle on-going evolution to accommodate new data sources, processing,
    and goals.

    DataStage simplifies the data warehousing process. It is an integrated
    product that supports extraction of the source data, cleansing,
    decoding, transformation, integration, aggregation, and loading of
    target databases.

    Although primarily aimed at data warehousing environments, DataStage
    can also be used in any data handling, data migration, or data
    reengineering projects.

    About Data Warehousing

    The aim of data warehousing is to make more effective use of the data
    available in an organization and to aid decision-making processes.

    A data warehouse is a central integrated database containing data from
    all the operational sources and archive systems in an organization. It
    contains a copy of transaction data specifically structured for query
    analysis. This database can be accessed by all users, ensuring that each
    group in an organization is accessing valuable, stable data.

    A data warehouse is a snapshot of the operational databases combined
    with data from archives. The data warehouse can be created or updated at
    any time, with minimum disruption to operational systems. Any number of
    analyses can be performed on the data, which would otherwise be
    impractical on the operational sources.

    Operational Databases Versus Data Warehouses

    Operational databases are usually accessed by many concurrent users. The
    data in the database changes quickly and often. It is very difficult to
    obtain an accurate picture of the contents of the database at any one
    time.

    Because operational databases are task oriented, for example, stock
    inventory systems, they are likely to contain "dirty" data. The high
    throughput of data into operational databases makes it difficult to trap
    mistakes or incomplete entries. However, you can cleanse data before
    loading it into a data warehouse, ensuring that you store only good
    complete records.

    Constructing the Data Warehouse

    A data warehouse is created by extracting data from one or more
    operational databases. The data is transformed to eliminate
    inconsistencies, aggregated to summarize data, and loaded into the data
    warehouse. The end result is a dedicated database which contains stable,
    nonvolatile, integrated data. This data also represents a number of time
    variants (for example, daily, weekly, or monthly values), allowing the
    user to analyze trends in the data.
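
    The extract, transform, aggregate, and load sequence described above
    can be sketched in a few lines of plain Python. This is only an
    illustration of the idea, not DataStage code: the sample rows, the
    function names, and the in-memory dictionary standing in for the
    warehouse are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows as they might come from an operational database.
SOURCE_ROWS = [
    {"day": "2002-01-07", "product": " ab-100 ", "qty": 3},
    {"day": "2002-01-07", "product": "AB-100", "qty": 2},
    {"day": "2002-01-08", "product": "cd-200", "qty": 5},
    {"day": "2002-01-08", "product": "CD-200", "qty": None},  # incomplete entry
]


def extract(rows):
    """Yield copies of the source rows so the source is left untouched."""
    for row in rows:
        yield dict(row)


def transform(rows):
    """Drop incomplete rows and remove inconsistencies in product codes."""
    for row in rows:
        if row["qty"] is None:
            continue
        row["product"] = row["product"].strip().upper()
        yield row


def aggregate(rows):
    """Summarize quantities per (day, product) to give daily values."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["day"], row["product"])] += row["qty"]
    return dict(totals)


def load(totals, warehouse):
    """Store the summarized values in the in-memory stand-in warehouse."""
    warehouse.update(totals)
    return warehouse


warehouse = load(aggregate(transform(extract(SOURCE_ROWS))), {})
```

    Keeping each step as a separate function mirrors the way the text
    separates the phases; a real job would read from and write to
    databases rather than Python lists and dictionaries.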

    The data in a data warehouse is classified based on the subjects of
    interest to the organization. For a bank, these subjects may be
    customer, account number, and transaction details. For a retailer, these
    may include product, price, quantity sold, and order number.

    Each data warehouse includes detailed data. However, where only a
    portion of this detailed data is required, a data mart may be more
    suitable. A data mart is generated from the data contained in the data
    warehouse and contains focused data that is frequently accessed or
    summarized, for example, sales or marketing data.


    Data is transformed using routines based on a transformation rule. For
    example, product codes can be mapped to a common format using a
    transformation rule that applies only to product codes.

    After data has been transformed it can be loaded into the data warehouse
    in a recognized and required format.
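
    As a rough illustration of a transformation rule that applies only to
    product codes, the Python sketch below maps differently written codes
    to one common format. The code pattern (two letters followed by up to
    four digits), the target format "LL-NNNN", and the helper names are
    assumptions invented for this example; they are not taken from
    DataStage.

```python
import re

# Assumed rule: two letters, an optional separator, then one to four digits.
PRODUCT_CODE = re.compile(r"^\s*([A-Za-z]{2})[-_ ]?(\d{1,4})\s*$")


def normalize_product_code(raw):
    """Map a product code to the assumed common format, e.g. 'AB-0012'."""
    match = PRODUCT_CODE.match(raw)
    if match is None:
        raise ValueError(f"not a product code: {raw!r}")
    letters, digits = match.groups()
    return f"{letters.upper()}-{int(digits):04d}"


def apply_rule(rows, column, rule):
    """Apply a rule to one named column, leaving other columns unchanged."""
    return [{**row, column: rule(row[column])} for row in rows]


rows = [{"product": "ab 100", "supplier_ref": "ab12"}]
cleaned = apply_rule(rows, "product", normalize_product_code)
```

    Here only the product column is rewritten; supplier_ref keeps its
    original value even though it happens to look like a product code,
    which is the point of scoping a rule to one kind of data.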

    Advantages of Data Warehousing

    A data warehousing strategy provides the following advantages:

    • Capitalizes on the potential value of the organization's information

    • Improves the quality and accessibility of data

    • Combines valuable archive data with the latest data in operational
      sources

    • Increases the amount of information available to users

    • Reduces the requirement of users to access operational data

    • Reduces the strain on IT departments, as they can produce one
      database to serve all user groups

    • Allows new reports and studies to be introduced without disrupting
      operational systems

    • Encourages users to be self-sufficient

    About DataStage

    DataStage has the following features to aid the design and processing
    required to build a data warehouse:

    • Uses graphical design tools. With simple point-and-click techniques
      you can draw a scheme to represent your processing requirements.

    • Extracts data from any number or type of database.

    • Handles all the meta data definitions required to define your data
      warehouse. You can view and modify the table definitions at any point
      during the design of your application.

    • Aggregates data. You can modify SQL SELECT statements used to extract
      data.

    • Transforms data. DataStage has a set of predefined transforms and
      functions you can use to convert your data. You can easily extend the
      functionality by defining your own transforms to use.

    • Loads the data warehouse.

    DataStage consists of a number of client and server components. For more
    information, see "Client Components" on page 1-5 and "Server Components"
    on page 1-6.

    DataStage jobs are compiled and run on the DataStage server. The job
    will connect to databases on other machines as necessary, extract data,
    process it, then write the data to the target data warehouse. This type
    of job is known as a server job.

    If you have XE/390 installed, DataStage is able to generate jobs which
    are compiled and run on a mainframe. Data extracted by such jobs is then
    loaded into the data warehouse. Such jobs are called mainframe jobs.

    Client Components

    DataStage has four client components which are installed on any PC
    running Windows 2000 or Windows NT 4.0 with Service Pack 4 or later:

    • DataStage Designer. A design interface used to create DataStage
      applications (known as jobs). Each job specifies the data sources,
      the transforms required, and the destination of the data. Jobs are
      compiled to create executables that are scheduled by the Director and
      run by the Server (mainframe jobs are transferred and run on the
      mainframe).

    • DataStage Director. A user interface used to validate, schedule,
      run, and monitor DataStage server jobs.

    • DataStage Manager. A user interface used to view and edit the
      contents of the Repository.

    • DataStage Administrator. A user interface used to perform
      administration tasks such as setting up DataStage users, creating and
      moving projects, and setting up purging criteria.


    Mainframe jobs. These are available only if you have XE/390 installed. A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.

    There are two other entities that are similar to jobs in the way they appear in the DataStage Designer, and are handled by it. These are:

    Shared containers. These are reusable job elements. They typically comprise a number of stages and links. Copies of shared containers can be used in any number of server jobs and edited as required.

    Job Sequences. A job sequence allows you to specify a sequence of DataStage jobs to be executed, and actions to take depending on results.
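    The idea behind a job sequence (run jobs in order, then choose an action from each result) can be sketched in Python. This is an illustrative model only and not how the DataStage job sequencer is implemented: the run_sequence function, the job names, and the "ok"/"failed" result strings are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: each "job" is a plain function returning "ok" or "failed".
Job = Callable[[], str]


def run_sequence(steps: List[Tuple[str, Job]],
                 on_failure: Dict[str, Job]) -> List[str]:
    """Run jobs in order; on failure run that job's handler (if any) and stop."""
    log: List[str] = []
    for name, job in steps:
        result = job()
        log.append(f"{name}: {result}")
        if result != "ok":
            handler = on_failure.get(name)
            if handler is not None:
                log.append(f"{name} handler: {handler()}")
            break  # later jobs depend on this one, so they are not run
    return log


sequence_log = run_sequence(
    steps=[
        ("extract", lambda: "ok"),
        ("transform", lambda: "failed"),
        ("load", lambda: "ok"),
    ],
    on_failure={"transform": lambda: "notified operator"},
)
```

    Here the "load" step never runs because "transform" fails, which mirrors the notion of taking an action that depends on an earlier job's result.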

    DataStage jobs consist of individual stages. Each stage describes a particular database or process. For example, one stage may extract data from a data source, while another transforms it. Stages are added to a job and linked together using the Designer.

    There are three basic types of stage:

    Built-in stages. Supplied with DataStage and used for extracting, aggregating, transforming, or writing data. All types of job have these stages.

    Plug-in stages. Additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Only server jobs have these.

    Job Sequence Stages. Special built-in stages which allow you to define sequences of activities to run. Only Job Sequences have these.

    The following diagram represents one of the simplest jobs you could have: a data source, a Transformer (conversion) stage, and the final database. The links between the stages represent the flow of data into or out of a stage.

    [Figure: a Data Source stage linked to a Transformer Stage, which is linked to a Data Warehouse stage]
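    One way to picture this simplest job is as a small directed graph: stages are nodes and links are edges that carry data from one stage to the next. The sketch below is a hypothetical model written for illustration; the Job class and its methods are not part of DataStage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    """Illustrative model of a job: named stages joined by directed links."""
    stages: List[str] = field(default_factory=list)
    links: Dict[str, List[str]] = field(default_factory=dict)  # source -> targets

    def add_stage(self, name: str) -> None:
        self.stages.append(name)
        self.links.setdefault(name, [])

    def add_link(self, source: str, target: str) -> None:
        # A link represents data flowing out of `source` and into `target`.
        self.links[source].append(target)

    def flow_from(self, start: str) -> List[str]:
        """Follow links from `start`, listing stages in the order data reaches them."""
        order: List[str] = []
        pending = [start]
        while pending:
            stage = pending.pop(0)
            if stage not in order:
                order.append(stage)
                pending.extend(self.links[stage])
        return order


job = Job()
for stage_name in ("DataSource", "TransformerStage", "DataWarehouse"):
    job.add_stage(stage_name)
job.add_link("DataSource", "TransformerStage")
job.add_link("TransformerStage", "DataWarehouse")
data_flow = job.flow_from("DataSource")
```

    Following the links from the data source visits the three stages in the same left-to-right order as the diagram.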


    You must specify the data you want at each stage, and how it is handled. For example, do you want all the columns in the source data, or only a select few? Should the data be aggregated or converted before being passed on to the next stage?

    You can use DataStage with MetaBrokers in order to exchange meta data with other data warehousing tools. You might, for example, import table definitions from a data modelling tool.

    DataStage NLS

    DataStage has built-in National Language Support (NLS). With NLS installed, DataStage can do the following:

    Process data in a wide range of languages

    Accept data in any character set into most DataStage fields

    Use local formats for dates, times, and money

    Sort data according to local rules

    Convert data between different encodings of the same language (for example, for Japanese it can convert JIS to EUC)

    DataStage NLS is optionally installed as part of the DataStage server. If NLS is installed, various extra features (such as dialog box pages and drop-down lists) appear in the product. If NLS is not installed, these features do not appear.

    Using NLS, the DataStage server engine holds data in Unicode format. This is an international standard character set that contains nearly all the characters used in languages around the world. DataStage maps data to or from Unicode format as required.
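    The pattern of mapping data into Unicode and back out to another encoding can be illustrated with Python's standard codecs. This is only an analogy for the JIS-to-EUC conversion mentioned above, not DataStage code. It assumes that "JIS" corresponds to ISO-2022-JP and "EUC" to EUC-JP; the sample text is invented.

```python
text = "日本語"  # sample Japanese text, held by Python as Unicode

# Data arriving in a JIS encoding (ISO-2022-JP is assumed here).
jis_bytes = text.encode("iso2022_jp")

# Map to Unicode on the way in, then map out to EUC-JP on the way out.
as_unicode = jis_bytes.decode("iso2022_jp")
euc_bytes = as_unicode.encode("euc_jp")
```

    The byte sequences differ between the two encodings, but both decode to the same Unicode text, which is why Unicode works well as the common internal format.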

    Character Set Maps and Locales

    Each DataStage project has a language assigned to it during installation. This equates to one or more character set maps and locales which support that language. One map and one locale are assigned as project defaults.

    The maps define the character sets that the project can use.

    The locales define the local formats for dates, times, sorting order, and so on that the project can use.


    The DataStage client and server components also have maps assigned to them during installation to ensure that data is transferred between them in the correct character set. For more information, see DataStage Administrator's Guide.

    When you design a DataStage job, you can override the project default map at several levels:

    For a job

    For a stage within a job

    For a column within a stage (for Sequential, ODBC, and generic plug-in stages)

    For transforms and routines used to manipulate data within a stage

    For imported meta data and table definitions
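    A layered override like this is often resolved by taking the most specific setting that has been supplied and falling back to the project default otherwise. The sketch below assumes that precedence (column over stage over job over project) and covers only three of the levels listed above. The effective_map function and the map names are hypothetical placeholders, not names taken from DataStage.

```python
from typing import Optional


def effective_map(project_default: str,
                  job_map: Optional[str] = None,
                  stage_map: Optional[str] = None,
                  column_map: Optional[str] = None) -> str:
    """Return the most specific map that has been set, else the project default."""
    # Assumed precedence: column, then stage, then job.
    for candidate in (column_map, stage_map, job_map):
        if candidate is not None:
            return candidate
    return project_default


no_override = effective_map("ISO8859-1")
stage_override = effective_map("ISO8859-1", job_map="UTF8", stage_map="SHIFT-JIS")
```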

    The locale and character set information becomes an integral part of the job. When you package and release a job, the NLS support can be used on another system, provided that the correct maps and locales are installed and loaded.

    DataStage Terms and Concepts

    The following terms are used in DataStage:

    Term | Description
    administrator | The person who is responsible for the maintenance and configuration of DataStage, and for DataStage users.
    after-job subroutine | A routine that is executed after a job runs.
    after-stage subroutine | A routine that is executed after a stage processes data.
    Aggregator stage | A stage type that computes totals or other functions of sets of data.
    Annotation | A note attached to a DataStage job in the Diagram window.
    BCPLoad stage | A plug-in stage supplied with DataStage that bulk loads data into a Microsoft SQL Server or Sybase table. (Server jobs only.)


    before-job subroutine | A routine that is executed before a job is run.
    before-stage subroutine | A routine that is executed before a stage processes any data.
    built-in data elements | There are two types of built-in data elements: those that represent the base types used by DataStage during processing and those that describe different date/time formats.
    built-in transforms | The transforms supplied with DataStage. See DataStage Server Job Developer's Guide for a complete list.
    Change Apply stage | A parallel job stage that applies a set of captured changes to a data set.
    Change Capture stage | A parallel job stage that compares two data sets and records the differences between them.
    Cluster | Type of system providing parallel processing. In cluster systems, there are multiple processors, and each has its own hardware resources such as disk and memory.
    column definition | Defines the columns contained in a data table. Includes the column name and the type of data contained in the column.
    Column Export stage | A parallel job stage that exports a column of another type to a string or binary column.
    Column Import stage | A parallel job stage that imports a column from a string or binary column.
    Combine Records stage | A parallel job stage that combines several columns associated by a key field to build a vector.
    Compare stage | A parallel job stage that performs a column by column compare of two pre-sorted data sets.
    Complex Flat File stage | A mainframe source stage that extracts data from a flat file containing complex data structures, such as arrays, groups, and redefines.
    Compress stage | A parallel job stage that compresses a data set.
    container | A group of stages and links in a job design.


    Graphical performance monitor | A monitor that displays status information and performance statistics against links in a job open in the DataStage Designer canvas as the job runs in the Director or debugger.
    Hashed File stage | A stage that extracts data from or loads data into a database that contains hashed files. (Server jobs only)
    Head stage | A parallel job stage that copies the specified number of records from the beginning of a data partition.
    Informix XPS stage | A parallel job stage that allows you to read and write an Informix XPS database.
    Inter-process stage | A server job stage that allows you to run server jobs in parallel on an SMP system.
    job | A collection of linked stages, data elements, and transforms that define how to extract, cleanse, transform, integrate, and load data into a target database. Jobs can either be server jobs or mainframe jobs.
    job control routine | A routine that is used to create a controlling job, which invokes and runs other jobs.
    job sequence | A controlling job which invokes and runs other jobs, built using the graphical job sequencer.
    Join stage | A mainframe processing stage or parallel job active stage that joins two input sources.
    Link Collector stage | A server job stage that collects previously partitioned data together.
    Link Partitioner stage | A server job stage that allows you to partition data so that it can be processed in parallel on an SMP system.
    local container | A container which is local to the job in which it was created.
    Lookup stage | A mainframe processing stage and Parallel active stage that performs table lookups.
    Lookup File stage | A parallel job stage that provides storage for a lookup table.


    mainframe job | A job that is transferred to a mainframe, then compiled and run there.
    Make Subrecord stage | A parallel job stage that combines a number of vectors to form a subrecord.
    Make Vector stage | A parallel job stage that combines a number of fields to form a vector.
    Merge stage | A parallel job stage that combines data sets.
    meta data | Data about data, for example, a table definition describing columns in which data is structured.
    MetaBroker | A tool that allows you to exchange meta data between DataStage and other data warehousing tools.
    MPP | Type of system providing parallel processing. In MPP (massively parallel processing) systems, there are multiple processors, and each has its own hardware resources such as disk and memory.
    Multi-Format Flat File stage | A mainframe source stage that handles different formats in flat file data sources.
    NLS | National Language Support. With NLS enabled, DataStage can support the handling of data in a variety of character sets.
    normalization | The conversion of records in NF2 (nonfirst-normal form) format, containing multivalued data, into one or more 1NF (first normal form) rows.
    null value | A special value representing an unknown value. This is not the same as 0 (zero), a blank, or an empty string.
    ODBC stage | A stage that extracts data from or loads data into a database that implements the industry standard Open Database Connectivity API. Used to represent a data source, an aggregation step, or a target data table. (Server jobs only)
    operator | The person scheduling and monitoring DataStage jobs.


    Orabulk stage | A plug-in stage supplied with DataStage that bulk loads data into an Oracle database table. (Server jobs only)
    Oracle stage | A parallel job stage that allows you to read and write an Oracle database.
    parallel extender | The DataStage option that allows you to run parallel jobs.
    parallel job | A type of DataStage job that allows you to take advantage of parallel processing on SMP, MPP, and cluster systems.
    Peek stage | A parallel job stage that prints column values to the screen as records are copied from its input data set to one or more output data sets.
    plug-in | A definition for a plug-in stage.
    plug-in stage | A stage that performs specific processing that is not supported by the standard server job stages.
    Promote Subrecord stage | A parallel job stage that promotes the members of a subrecord to a top level field.
    Relational stage | A mainframe source/target stage that reads from or writes to an MVS/DB2 database.
    Remove duplicates stage | A parallel job stage that removes duplicate entries from a data set.
    Repository | A DataStage area where projects and jobs are stored as well as definitions for all standard and user-defined data elements, transforms, and stages.
    SAS Data Set stage | A parallel job stage that provides storage for SAS data sets.
    SAS stage | A parallel job stage that allows you to run SAS applications from within the DataStage job.
    Sample stage | A parallel job stage that samples a data set.
    Sequential File stage | A stage that extracts data from, or writes data to, a text file. (Server job and parallel job only)
    server job | A job that is compiled and run on the DataStage server.


    shared container | A container which exists as a separate item in the Repository and can be used by any server job in the project.
    SMP | Type of system providing parallel processing. In SMP (symmetric multiprocessing) systems, there are multiple processors, but these share other hardware resources such as disk and memory.
    Sort stage | A mainframe processing stage or parallel job active stage that sorts input columns.
    source | A source in DataStage terms means any database, whether you are extracting data from it or writing data to it.
    Split Subrecord stage | A parallel job stage that separates a number of subrecords into top level columns.
    Split Vector stage | A parallel job stage that separates a number of vector members into separate columns.
    stage | A component that represents a data source, a processing step, or the data mart in a DataStage job.
    table definition | A definition describing the data you want including information about the data table and the columns associated with it. Also referred to as meta data.
    Tail stage | A parallel job stage that copies the specified number of records from the end of a data partition.
    Teradata stage | A parallel stage that allows you to read and write a Teradata database.
    transform function | A function that takes one value and computes another value from it.
    Transformer Editor | A graphical interface for editing Transformer stages.
    Transformer stage | A stage where data is transformed (converted) using transform functions.



    2 Your First DataStage Project

    This chapter describes the steps you need to follow to create your first data warehouse, using the sample data provided. The example builds a server job and uses a UniVerse table called EXAMPLE1, which is automatically copied into your DataStage project during server installation.

    EXAMPLE1 represents an SQL table from a wholesaler who deals in car parts. It contains details of the wheels they have in stock. There are approximately 255 rows of data and four columns:

    CODE. The product code for each type of wheel.

    PRODUCT. A text description of each type of wheel.

    DATE. The date new wheels arrived in stock (given in terms of year, month, and day).

    QTY. The number of wheels in stock.

    The aim of this example is to develop and run a DataStage job that:

    Extracts the data from the file.

    Converts (transforms) the data in the DATE column from a complete date (YYYY-MM-DD) stored in internal data format, to a year and month (YYYY-MM) stored as a string.

    Loads data from the DATE, CODE, and QTY columns into a data warehouse. The data warehouse is a sequential file that is created when you run the job.
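    Before building the job in the Designer, it may help to see the same extract, transform, and load steps written out as ordinary code. The Python sketch below is an analogy only. The sample rows are invented stand-ins for EXAMPLE1, the function names are hypothetical, and the dates are held as Python date objects rather than in DataStage's internal format.

```python
from datetime import date
from typing import Dict, List

# Hypothetical sample rows standing in for EXAMPLE1 (values are invented).
source_rows: List[Dict[str, object]] = [
    {"CODE": "WH001", "PRODUCT": "Alloy wheel", "DATE": date(1997, 3, 14), "QTY": 12},
    {"CODE": "WH002", "PRODUCT": "Steel wheel", "DATE": date(1998, 11, 2), "QTY": 40},
]


def transform(row: Dict[str, object]) -> Dict[str, object]:
    """Keep DATE, CODE, and QTY; reduce DATE to a YYYY-MM string."""
    return {
        "DATE": row["DATE"].strftime("%Y-%m"),
        "CODE": row["CODE"],
        "QTY": row["QTY"],
    }


def run_job(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    extracted = list(rows)                           # extract every source row
    converted = [transform(r) for r in extracted]    # transform each row
    return converted                                 # rows ready to be loaded


target_rows = run_job(source_rows)
```

    The PRODUCT column is dropped along the way, matching the example's aim of loading only DATE, CODE, and QTY.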


    To load a data mart or data warehouse, you must do the following:

    Set up your project

    Create a job

    Develop the job

    Edit the stages in the job

    Compile the job

    Run the job

    This chapter describes the minimum tasks required to create a DataStage job. In the example, you will use the built-in settings and options supplied with DataStage. However, because DataStage allows you to customize and extend the built-in functionality provided, it is possible to perform additional processing at each step. Where this is possible, additional procedures are listed under a section called Advanced Procedures. These advanced procedures are discussed in detail in subsequent chapters.

    Setting Up Your Project

    Before you create any DataStage jobs, you must set up your project by entering information about your data. This includes the name and location of the tables or files holding your data and a definition of the columns they contain. Information is stored in table definitions in the Repository. The easiest way to enter a table definition is to import directly from the source data.

    If you were working on a large data warehousing project, you would probably use the DataStage Manager to set up the project. As this example is simple, and requires you only to import a single table definition, you are better doing this directly from the DataStage Designer.


    Starting the DataStage Designer

    To start the DataStage Designer, choose Start ➤ Programs ➤ Ascential DataStage ➤ DataStage Designer. The Attach to Project dialog box appears:

    This dialog box appears when you start the DataStage Designer, Manager, or Director client components from the DataStage program folder. In all cases, you must attach to a project by entering your logon details.

    Note: The program group may be called something other than DataStage, depending on how DataStage was installed.

    To connect to a project:

    1. Enter the name of your host in the Host system field. This is the name of the system where the DataStage Server components are installed.

    2. Enter your user name in the User name field. This is your user name on the server system.

    3. Enter your password in the Password field.

    Note: If you are connecting to the server via LAN Manager, you can select the Omit check box. The User name and Password fields gray out and you log on to the server using your Windows NT Domain account details.

    4. Choose the project to connect to from the Project drop-down list box. This list box displays all the projects installed on your DataStage server. Choose your project from the list box. At this point, you may only have one project installed on your system and this is displayed by default.


    5. Select the Save settings check box to save your logon settings.

    6. Click OK. The DataStage Designer window appears with the New dialog box open, ready for you to create a new job:

    Creating a Job

    When a DataStage project is installed, it is empty and you must create the jobs you need. Each DataStage job can load one or more data tables in the final data warehouse. The number of jobs you have in a project depends on your data sources and how often you want to extract data or load the data warehouse.


    2. Enter Example1 in the Job name field.

    3. Enter Example in the Category field.

    4. Click OK to save the job. The updated DataStage Designer window displays the name of the saved job.

    Defining Table Definitions

    For most data sources, the quickest and simplest way to specify a table definition is to import it directly from your data source or data warehouse. In this example, you must specify a table definition for EXAMPLE1.

    Importing a Table Definition

    The following steps describe how to import a table definition for EXAMPLE1:

    1. In the Repository window of the DataStage Designer, select the Table Definitions branch, and choose Import ➤ UniVerse Table Definitions from the shortcut menu. The Import Metadata (UniVerse Tables) dialog box appears:

    2. Choose localuv from the DSN drop-down list box.


    3. Click OK. The updated Import Metadata (UniVerse Tables) dialog box displays all the files for the chosen data source name:

    Note: The screen shot shows an example of tables found under localuv. Your system may contain different files to the ones shown here.

    4. Select project.EXAMPLE1 from the Tables list box, where project is the name of your DataStage project.

    5. Click OK. The column information from EXAMPLE1 is imported into DataStage. A table definition is created and is stored under the Table Definitions ➤ UniVerse ➤ localuv branch in the Repository. The updated DataStage Designer window displays the new table definition entry in the Repository window.

    To view the new table definition, double-click the project.EXAMPLE1 item in the Repository window. The Table Definition dialog box appears.

    This dialog box has up to five pages. Click the tabs to display each page. The General page contains information about where the data is found and when the definition was created.


    Developing a Job

    Jobs are designed and developed using the Designer. The job design is developed in the Diagram window (the one with grid lines). Each data source, the data warehouse, and each processing step is represented by a stage in the job design. The stages are linked together to show the flow of data.

    This example requires three stages:

    A UniVerse stage to represent EXAMPLE1 (the data source)

    A Transformer stage to convert the data in the DATE column from a YYYY-MM-DD date in internal date format to a string giving just year and month (YYYY-MM)

    A Sequential File stage to represent the file created at run time (the data warehouse in this example)

    Adding Stages

    Stages are added using the tool palette. This palette contains icons that represent the components you can add to a job. By default the tool palette is docked to the top of the Designer screen, but you can move it anywhere. The components present depend on what was installed with DataStage. A typical tool palette is shown below:

    [Figure: tool palette, with icons labeled Link, Container Input Stage, Transformer Stage, Annotation, BCP Load Stage, UniVerse Stage, Sequential File Stage, Hashed File Stage, Container Stage, Container Output Stage, Description Annotation, Orabulk Stage, Folder Stage, UniData Stage, ODBC Stage, Aggregator Stage, IPC Stage, Link Partitioner Stage, Link Collector Stage]


    5. Save the job design by choosing File ➤ Save.

    Keep the Designer open as you will need it for the next step.

    Advanced Procedures

    For more advanced procedures, see the following topics in Chapter 4:

    Moving Stages on page 4-22

    Renaming Stages on page 4-22

    Deleting Stages on page 4-23

    Editing the Stages

    Your job design currently displays the stages and the links between them. You must edit each stage in the job to specify the data to use and what to do with it. Stages are edited in the job design by double-clicking each stage in turn. Each stage type has its own editor.

    Editing the UniVerse Stage

    The data source (EXAMPLE1) is represented by a UniVerse stage. You must specify the data you want to extract from this file by editing the stage.

    Double-click the stage to edit it. The UniVerse Stage dialog box appears:


    This dialog box has two pages:

    Stage. Displayed by default. This page contains the name of the stage you are editing. The General tab specifies where the file is found and the connection type.

    Outputs. Contains information describing the data flowing from the stage. You edit this page to describe the data you want to extract from the file. In this example, the output from this stage goes to the Transformer stage.

    To edit the UniVerse stage:

    1. Check that you are displaying the General tab on the Stage page. Choose localuv from the Data source name drop-down list. localuv is where EXAMPLE1 is copied to during installation.

    The remaining parameters on the General and Details tabs are used to enter logon details and describe where to find the file. Because EXAMPLE1 is installed in localuv, you do not have to complete these fields, which are disabled.

    2. Click the Outputs tab. The Outputs page appears:


    The Outputs page contains the name of the link the data flows along and the following four tabs:

    General. Contains the name of the table to use and an optional description of the link.

    Columns. Contains information about the columns in the table.

    Selection. Used to enter an optional SQL SELECT clause (an Advanced procedure).

    View SQL. Displays the SQL SELECT statement used to extract the data.

    3. Choose dstage.EXAMPLE1 from the Available tables drop-down

    list.

    4. Click Add to add dstage.EXAMPLE1 to the Table names field.

    5. Click the Columns tab. The Columns tab appears at the front of the dialog box.

    You must specify the columns contained in the file you want to use. Because the column definitions are stored in a table definition in the Repository, you can load them directly.


    10. Click OK to save the stage edits and close the UniVerse Stage dialog box. Notice that a small table icon appears on the output link to indicate that it now has column definitions associated with it.

    11. Choose File ➤ Save to save your job design so far.

    Note: In server jobs column definitions are attached to a link. You can view or edit them at either end of the link. If you change them in a stage at one end of the link, the changes are automatically seen in the stage at the other end of the link. This is how column definitions are propagated through all the stages in a DataStage server job, so the column definitions you loaded into the UniVerse stage are viewed when you edit the Transformer stage.
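    The note above describes column definitions as belonging to the link rather than to either stage. A small Python sketch makes the consequence concrete: if two stages hold a reference to the same link object, an edit made from one end is immediately visible from the other. The Link and Stage classes here are hypothetical illustrations, not DataStage structures. The link name DSLink3 and the column names follow this chapter's example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    name: str
    columns: List[str] = field(default_factory=list)  # column definitions live here


@dataclass
class Stage:
    name: str
    inputs: List[Link] = field(default_factory=list)
    outputs: List[Link] = field(default_factory=list)


link = Link("DSLink3")
universe = Stage("UniVerse", outputs=[link])
transformer = Stage("Transformer", inputs=[link])

# Load column definitions while "editing" the UniVerse stage...
universe.outputs[0].columns.extend(["CODE", "PRODUCT", "DATE", "QTY"])

# ...and the Transformer stage sees them, because both ends share one link.
seen_by_transformer = transformer.inputs[0].columns
```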


    Editing the Transformer Stage

    The Transformer stage performs any data conversion required before the data is output to another stage in the job design. In this example, the Transformer stage is used to convert the data in the DATE column from a YYYY-MM-DD date in internal date format to a string giving just the year and month (YYYY-MM).

    There are two links in the stage:

    The input from the data source (EXAMPLE1)

    The output to the Sequential File stage

    To enable the use of one of the built-in DataStage transforms, you will assign data elements to the DATE columns input and output from the Transformer stage. A DataStage data element defines more precisely the kind of data that can appear in a given column.

    In this example, you assign the Date data element to the input column, to specify the date is input to the transform in internal format, and the MONTH.TAG data element to the output column, to specify that the transform produces a string of the format YYYY-MM.
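    To make the conversion concrete, the sketch below turns an internal date into a YYYY-MM string in Python. It rests on an assumption this guide does not state: that the internal format is a count of days in which day 0 is 31 December 1967, as in the UniVerse convention. The month_tag function name is hypothetical and simply echoes the MONTH.TAG data element. It is not the built-in transform.

```python
from datetime import date, timedelta

# Assumption: day 0 of the internal date format is 31 December 1967.
DAY_ZERO = date(1967, 12, 31)


def month_tag(internal_date: int) -> str:
    """Convert a day count into a YYYY-MM string."""
    return (DAY_ZERO + timedelta(days=internal_date)).strftime("%Y-%m")


first_day_1968 = month_tag(1)    # one day after day 0
leap_day_1968 = month_tag(60)    # 29 February 1968 under this assumption
```

    Under that assumption, every internal date that falls within the same calendar month maps to the same output string.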

    Note: If the data in the other columns required transforming, you could assign DataStage data elements to these columns too.


    Double-click the Transformer stage to edit it. The Transformer Editor appears:

    Input columns are shown on the left, output columns on the right. The upper panes show the columns together with derivation details, the lower panes show the column meta data. In this case, input columns have already been defined for input link DSLink3. No output columns have been defined for output link DSLink4, so the right panes are blank.

    The next steps are to define the columns that will be output by the Transformer stage, and to specify the transform that will enable the stage to convert the type and format of dates before they are output.

    1. Working in the upper-left pane of the Transformer Editor, select the input columns that you want to derive output columns from. Click on the CODE, DATE, and QTY columns while holding down the Ctrl key.

    2. Click the left mouse button again and, keeping it held down, drag the selected columns to the output link in the upper-right pane. Drop the columns over the Column Name field by releasing the mouse button.


    is directly derived from the input DATE column. Select the text DSLink3 and delete it by pressing the Delete key.

    8. Right-click in the Expression Editor box to open the Suggest

    Operand menu:


    To edit the Sequential File stage:

    1. Click the Inputs tab. The Inputs page appears. This page contains:

    The name of the link. This is automatically set to the link name used in the job design.

    General tab. Contains the path name of the file, an optional description of the link, and update action choices. You can use the default settings for this example, but you may want to enter a file name (by default the file is named after the input link).

    Format tab. Determines how the data is written to the file. In this example, the data is written using the default settings, that is, as a comma-delimited file.

    Columns tab. Contains the column definitions for the data you want to extract. This tab contains the column definitions specified in the Transformer stage's output link.

    2. Enter the pathname of the text file you want to create in the File name field, for example, seqfile.txt. By default the file is placed in the server project directory (for example, c:\Ascential\DataStage\Projects\datastage) and is named after the input link, but you can enter, or browse for, a different directory.

    3. Click OK to close the Sequential File Stage dialog box.

    4. Choose File ➤ Save to save the job design.

    The job design is now complete and ready to be compiled.
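    As a rough picture of what a comma-delimited target file looks like, the following Python sketch writes two invented rows in DATE, CODE, QTY order using the standard csv module. An in-memory buffer stands in for seqfile.txt. The quoting and line-ending details are choices made for this sketch and may differ from the Sequential File stage's actual defaults.

```python
import csv
import io

# Invented sample rows in DATE, CODE, QTY order.
rows = [
    ("1997-03", "WH001", 12),
    ("1998-11", "WH002", 40),
]

# An in-memory buffer stands in for seqfile.txt so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
file_contents = buffer.getvalue()
```

    Each row becomes one line of text, with the column values separated by commas.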


    Compiling a Job

    When you finish your design you must compile it to create an executable job. Jobs are compiled using the Designer. To compile the job, do one of the following:

    Choose File ➤ Compile.

    Click the Compile button on the toolbar.

    The Compile Job window appears:

    The job is compiled. The result of the compilation appears in the display area. If the result of the compilation is Job successfully compiled with no errors you can go on to schedule or run the job. The executable version of the job is stored in your project along with your job design.

    If an error is displayed, click Show Error. The stage where the problem occurs is highlighted in the job design. Check that all the input and output column definitions have been specified correctly, and that you have entered directory paths and file or table names where appropriate.

    For more information about the error, click More. Click Close to close the Compile Job window.


    Running a Job

    Executable jobs are scheduled by the DataStage Director and run by the DataStage Server. You can start the Director from the Designer by choosing Tools ➤ Run Director.

    When the Director is started, the DataStage Director window appears with the status of all the jobs in your project:

    Highlight your job in the Job name column. To run the job, choose Job ➤ Run Now or click the Run button on the toolbar. The Job Run Options dialog box appears and allows you to specify any parameter values and to specify any job run limits. In this case, just click Run. The status changes to Running. When the job is complete, the status changes to Finished.

    Choose File ➤ Exit to close the DataStage Director window.

    Refer to DataStage Director Guide for more information about scheduling and running jobs.

    Advanced Procedures

    It is possible to run a job from within another job. For more information, see "Job Control Routines" on page 4-55 and Chapter 6, "Job Sequences."


    3 DataStage Designer Overview

    This chapter describes the main features of the DataStage Designer. It tells you how to start the Designer and takes a quick tour of the user interface.

    Starting the DataStage Designer

    To start the DataStage Designer, choose Start ➤ Programs ➤ Ascential DataStage ➤ DataStage Designer. The Attach to Project dialog box appears:

    You can also start the Designer from the shortcut icon on the desktop, or from the DataStage Suite applications bar if you have DataStage XE installed.

    You must connect to a project as follows:


    1. Enter the name of your host in the Host system field. This is the name of the system where the DataStage Server components are installed.

    2. Enter your user name in the User name field. This is your user name on the server system.

    3. Enter your password in the Password field.

    Note: If you are connecting to the server via LAN Manager, you can select the Omit check box. The User name and Password fields gray out and you log on to the server using your Windows NT Domain account details.

    4. Choose the project to connect to from the Project drop-down list box. This list box displays all the projects installed on your DataStage server.

    5. Select the Save settings check box to save your logon settings.

    6. Click OK. The DataStage Designer window appears, by default with the New dialog box open, allowing you to choose a type of job to create. You can set options to specify that the Designer opens with an empty server or mainframe job, or nothing at all; see "Specifying Designer Options" on page 3-21.

    Note: You can also start the DataStage Designer directly from the DataStage Manager or Director by choosing Tools ➤ Run Designer.

    The DataStage Designer Window

    By default, DataStage initially starts with the New dialog box open. You can choose to create a new job as follows:

        Server job. These run on the DataStage Server, connecting to other data sources as necessary.

        Mainframe job. These are available only if you have installed XE/390. Mainframe jobs are uploaded to a mainframe, where they are compiled and run.

        Parallel job. These are available only if you have installed the Parallel Extender. These run on DataStage servers that are SMP, MPP, or cluster systems.


        Shared containers. These are reusable job elements. Copies of shared containers can be used in any number of server jobs and edited as required.

        Job Sequences. A job sequence allows you to specify a sequence of DataStage server jobs to be executed, and actions to take depending on results.

    Or you can choose to open an existing job of any of these types. You can use the DataStage options to specify that the Designer always opens a new server or mainframe job, shared container or job sequence when it starts.

    The initial appearance of the DataStage Designer is shown below:

    The design pane on the right side and the Property browser are both empty, and a limited number of menus appear on the menu bar. To see a more fully populated Designer window, choose File ➤ New and choose the type of job to create from the New dialog box (this process will be familiar to you if you worked through the example in Chapter 2, "Your First DataStage Project.") For the purposes of this example, we created a server job.


    Menu Bar

    There are nine pull-down menus. The commands available in each menu change depending on whether you are currently displaying a server job, parallel job, or a mainframe job.

    File. Creates, opens, closes, and saves DataStage jobs. Also sets up printers, compiles server and parallel jobs, generates and uploads mainframe jobs, and exits the Designer.

    [Figure: the File menu as shown for a mainframe job, a server job, and a parallel job]


    Edit. Renames or deletes stages and links in the Diagram window. Defines job properties (Job Properties item), and displays the stage dialog boxes (Properties item). For server jobs and shared containers only, allows you to construct local or shared containers, deconstruct local containers, and convert local containers to shared containers and vice versa.

    View. Determines what is displayed in the DataStage Designer window. Displays or hides the toolbar, tool palette, status bar, Repository window, and Property browser. For server jobs and shared containers only, allows you to display or hide the debug bar. Other commands allow you to customize the tool palette and refresh the view of the Repository items in the Repository window.

    Diagram. Determines what actions are performed in the Diagram window. Displays or hides the grid or print lines, enables or disables annotations, activates or deactivates the Snap to Grid option, and zooms in or out of the Diagram window. Also turns performance monitoring on for server or parallel jobs. The snap to grid and zoom properties are applied to the job or container window currently selected. The settings are saved when the job or container is saved and restored when it is opened. The other settings are personal to you, and are saved between DataStage sessions ready for you to use again. When you change personal settings they affect all open windows immediately.


    The Property Browser

    The property browser is located by default in the top left corner of the DataStage Designer window (you can move it if required). It displays the properties of the object currently selected in the Diagram window. The properties given depend on the type of object selected. It allows you to edit some of the properties without opening a dialog box.

    For stages and containers, it gives:

        Stage type

        Shared container name (shared container stages only)

        Name

        Description

    You can edit the name, and add or edit a description, but you cannot change the stage type.

    For links, the property browser gives:

        Name

        Input link description

        Output link description

    You can edit the name, and add or edit a description.


    The Repository Window

    The Repository window gives details of the items associated with the current project which are held in the DataStage Repository. The window provides a subset of the DataStage Manager functionality. From the Designer you can add, delete, and edit the following:

        Data elements

        Job and job sequence properties

        Mainframe machine profiles

        Routines

        Shared container properties

        Stage type properties

        Table definitions

        Transforms

    Detailed information is in DataStage Developer's Help and DataStage Manager Guide. A guide to defining and editing table definitions is given in this guide (Chapter 7) because table definitions are so central to job design.


    In the Designer Repository window you can perform any of the actions that you can perform from the Repository tree in the Manager. When you select a category in the tree, a shortcut menu allows you to create a new item under that category or a new subcategory, or, for Table Definition categories, import a table definition from a data source. When you select an item in the tree, a shortcut menu allows you to perform various tasks depending on the type of item selected:

        Data elements, machine profiles, routines, transforms. You can create a copy of these items, rename them, delete them and display the properties of the item. Provided the item is not read-only, you can edit the properties.

        Jobs, shared containers. You can create a copy of these items, rename them, delete them and edit them in the diagram window.

        Stage types. You can add stage types to the diagram window palette and display their properties. Provided the item is not read-only, you can edit the properties.

        Table definitions. You can create a copy of table definitions, rename them, delete them and display the properties of the item. Provided the item is not read-only, you can edit the properties. You can also import table definitions from data sources.

    It is a good idea to choose View ➤ Refresh from the main menu bar before acting on any Repository items to ensure that you have a completely up-to-date view.

    You can drag certain types of item from the Repository window onto a diagram window or the diagram window area, or onto specific components within a job:

        Jobs. The job opens in a new diagram window or, if dragged to a job sequence window, is added to the job sequence.


        Shared containers. If you drag one onto an open diagram window, the shared container appears in the job. If you drag a shared container onto the background, a new diagram window opens showing the contents of the shared container.

        Stage types. Drag a stage type onto an open diagram window to add it to the job or container. You can also drag it to the tool palette to add it as a tool.

        Table definitions. Drag a table definition onto a link to load the column definitions for that link. The Select Columns dialog box allows you to select a subset of columns from the table definition to load if required.

    The Diagram Window

    The area to the right of the DataStage Designer holds the Diagram windows. A Diagram window appears for each job, job sequence, or shared container that you open in your project. By default the diagram window has a colored background. You can turn this off using the Options dialog box (see "Default Options" on page 3-21). The screenshots in this guide have the background turned off.

    The diagram window is the canvas on which you design and display your job. This window has the following components:

        Title bar. Displays the name of the job or shared container.

        Page tabs. If you use local containers in your job, the contents of these containers are displayed in separate windows within the job's diagram window. Switch between views using the tabs at the bottom of the diagram window.


        Grid lines. Allow you to position stages more precisely in the window. The grid lines are not displayed by default. Choose Diagram ➤ Show Grid Lines to enable them.

        Scroll bars. Allow you to view the job components that do not fit in the display area.

        Print lines. Display the area that is printed when you choose File ➤ Print. The print lines also indicate page boundaries. When you cross these, you have the choice of printing over several pages or scaling to fit a single page when printing. The print lines are not displayed by default. Choose Diagram ➤ Show Print Lines to enable them.

    You can use the resize handle or the Maximize button to resize a diagram window. To resize the contents of the window, use the zoom commands in the Diagram shortcut menu. If you maximize a window an additional menu appears to the left of the File menu, giving access to Diagram window controls.

    By default, any stages you add to the Diagram window will snap to the grid lines. You can, however, turn this option off by unchecking Diagram ➤ Snap to Grid, clicking the Snap to Grid button in the toolbar, or from the Designer Options dialog box.
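
    The snap-to-grid behavior can be pictured with a small Python sketch. This is an illustration only, not how the Designer is implemented: the function name snap_to_grid, the grid spacing of 10 units, and the round-to-nearest rule are all assumptions.

```python
def snap_to_grid(x, y, grid_size=10, snap_enabled=True):
    """Return the position a dropped stage would take.

    With snapping on, each coordinate moves to the nearest multiple of
    grid_size (an assumed spacing); with snapping off, the position is
    returned unchanged.
    """
    if not snap_enabled:
        return (x, y)

    def nearest(value):
        # Round half up to the closest grid line.
        return int((value + grid_size / 2) // grid_size) * grid_size

    return (nearest(x), nearest(y))


# A stage dropped at (23, 48) with and without the Snap to Grid option.
print(snap_to_grid(23, 48))
print(snap_to_grid(23, 48, snap_enabled=False))
```

    Turning the option off simply leaves the stage where it was dropped, which is useful when the grid spacing is too coarse for the layout you want.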

    The diagram window has a shortcut menu which gives you access to the settings on the Diagram menu (see "Menu Bar" on page 3-4):

    Toolbar

    The Designer toolbar contains the following buttons:


    The following is an example undocked server job tool palette:

    To add a stage to the Diagram window, choose it from the tool palette and click the Diagram window. The stage is added at the insertion point in the diagram window. If you click and drag on the diagram window to draw a rectangle as an insertion point, the stage will be sized to fit that rectangle. You can also drag stages from the tool palette or from the Repository window and drop them on the Diagram window.

    To link two stages, choose the Link button. Click the first stage, then drag the mouse to the second stage. The stages are linked when you release the mouse button.

    You can customize the tool palette to add or remove various buttons. You can add the buttons for plug-ins you have installed, and remove the buttons for stages you know you will not use. There are various ways in which you can customize the palette:

        In the palette itself.

        From the Repository window.

        From the Customize Toolbar dialog box.

    To customize the tool palette from the palette itself:

        To remove an existing item from the palette, select it while holding down the CTRL and Shift keys, and drag it off the palette (to anywhere other than a Diagram window).

        To move an item to another position in the palette, select it while holding down the CTRL and Shift keys and drag it to the desired position.

    [Figure: undocked server job tool palette. Button labels: Link, Container Input Stage, Transformer Stage, Annotation, BCP Load Stage, UniVerse Stage, Sequential File Stage, Hashed File Stage, Container Stage, Container Output Stage, Description Annotation, Orabulk Stage, Folder Stage, UniData Stage, ODBC Stage, Aggregator Stage, IPC Stage, Link Partition Stage, Link Collector Stage]


        To add an additional item to the palette, choose Customize Palette from the shortcut menu. The Customize Toolbar dialog box opens (see below for information about the Customize Toolbar dialog box).

    To customize the palette from the Repository window:

        To add an additional item to the palette, drag the item from the tree in the Repository window to the palette, or select Add to Palette from the item's shortcut menu.

    To customize the palette using the Customize Toolbar dialog box:

    1. Choose View ➤ Customize Palette, or choose Customize Palette from a shortcut menu. The Customize Toolbar dialog box appears.

    2. This dialog box lists all available stage types depending on the type of job (server, mainframe, or job sequence) whose diagram window is currently active. To add items to a palette, select the icon in the Available toolbar buttons window and click Add. To remove items, select them in the Current toolbar buttons window and click Remove. There are some tools that must always be present on the palette, and the Remove button is blanked out when these are selected.

    3. To arrange the buttons in the palette, select an item in the Current toolbar buttons list and use the Move Up and Move Down buttons.

    4. Click Close to close the dialog box and display the customized tool palette, or click Reset to reset to the default palette settings.
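
    The rules of the Customize Toolbar dialog box (add, remove unless the tool is mandatory, move up or down, reset) can be summarized in a toy Python model. This is a sketch for illustration and not DataStage code: the ToolPalette class is hypothetical, and which tools are mandatory is an assumption, since the guide does not list them.

```python
class ToolPalette:
    """Toy model of the Current toolbar buttons list."""

    def __init__(self, current, available, required):
        self._default = list(current)     # remembered for reset()
        self.current = list(current)      # buttons on the palette, in order
        self.available = list(available)  # buttons that could be added
        self.required = set(required)     # buttons that must stay on the palette

    def add(self, name):
        # Mirrors selecting an available button and clicking Add.
        if name in self.available and name not in self.current:
            self.current.append(name)

    def can_remove(self, name):
        # Mirrors the Remove button being blanked out for mandatory tools.
        return name in self.current and name not in self.required

    def remove(self, name):
        if not self.can_remove(name):
            raise ValueError(f"{name!r} cannot be removed from the palette")
        self.current.remove(name)

    def move_up(self, name):
        i = self.current.index(name)
        if i > 0:
            self.current[i - 1], self.current[i] = self.current[i], self.current[i - 1]

    def move_down(self, name):
        i = self.current.index(name)
        if i < len(self.current) - 1:
            self.current[i + 1], self.current[i] = self.current[i], self.current[i + 1]

    def reset(self):
        # Mirrors clicking Reset to restore the default palette.
        self.current = list(self._default)


palette = ToolPalette(
    current=["Link", "Transformer Stage", "Sequential File Stage"],
    available=["ODBC Stage", "Aggregator Stage"],
    required=["Link"],  # assumption: the guide does not say which tools are mandatory
)
palette.add("ODBC Stage")
palette.move_up("ODBC Stage")
palette.remove("Transformer Stage")
print(palette.current)
```

    Calling palette.reset() afterwards restores the three buttons the palette started with.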


    Status Bar

    The status bar appears at the bottom of the DataStage Designer window. It displays one-line help for the window components and information on the current state of job operations, for example, compilation of server jobs. You can hide the status bar by choosing View ➤ Status Bar.

    Debugger Toolbar

    Server jobs

    DataStage has a built-in debugger that can be used with server jobs or shared containers. The debugger toolbar contains buttons representing debugger functions. You can hide the debugger toolbar by choosing View ➤ Debug Bar. The debug bar has a drop-down list displaying currently open server jobs, allowing you to select one of these as the debug focus.

    Shortcut Menus

    There are a number of shortcut menus available which you display by clicking the right mouse button. The menu displayed depends on where you clicked.

        Background. Appears when you right-click on the background area in the left of the Designer (i.e. the space around Diagram windows), or in any of the toolbar/palette background areas. Gives access to the same items as the View menu (see page 3-5).

        Diagram window background. Appears when you right-click on a window background. Gives access to the same items as the Diagram menu (see page 3-5).

    [Figure: debugger toolbar. Button labels: Go, Stop Job, Job Parameters, Edit Breakpoints, Toggle Breakpoint, Clear All Breakpoints, Step to Next Link, Step to Next Row, Debug Window, View Job Log, Target debug job]


        Font. Click this to open a dialog box which allows you to specify a different font for the annotation text.

        Color. Click this to open a dialog box which allows you to specify a different color for the annotation text.

        Background color. Click this to open a dialog box which allows you to specify a different background color for the annotation.

        Border. Select this to specify that the border of the annotation is visible.

        Transparent. Select this to choose a transparent background.

        Description Type. Choose whether the Description Annotation displays the full description or short description from the job properties.

    Annotation Properties

    The Annotation Properties dialog box is as follows:

    The properties are the same as described for description annotations, except there are no Description Type options.


    The page has three areas:

        When Designer starts. Determines whether the Designer automatically opens a new job when started, or prompts you for a job to create or open.

            Nothing Open. This is the default option. The Designer opens with no jobs, shared containers, or job sequences open; you can then decide whether to open an existing item, or create a new one.

            Prompt for. Select this and choose New, Existing or Recent from the drop-down list. The New dialog box appears when you start the DataStage Designer, with the New, Existing, or Recent page on top, allowing you to choose an item to open.

            Create new. Select this and choose Server, Mainframe, Parallel, Sequence job or Shared container from the drop-down list. If this is selected, a new job of the specified type is automatically created when the DataStage Designer is started.

        New job/container view attributes. Determines whether the snap to grid option will be on or not for any new jobs, job sequences, or shared containers that are opened.

        Appearance. These options allow you to decide how the Designer background canvas is displayed and how the stage icons appear on the canvas.

    By default the canvas has a backgrou