DATASTAGE EXPERIMENTS

GETTING STARTED WITH DATASTAGE

Opening Virtual machine:

1) Run the Datastage shortcut.
2) Go to the Action menu in the menu bar and select "Ctrl+Alt+Delete".
3) Enter the login password "P@ssw0rd" and press "OK".
4) Wait about 5 minutes for all the services to load.

NOTE: Avoid moving the mouse cursor frequently and do not open Internet Explorer, as both slow down the services.

To check whether all the services are running:

1) Go to Run.
2) Type "services.msc".
3) Press Enter.
4) Check whether the "IBM WebSphere" service has started.

To clean up temporary files:

1) Run Cleanup.exe.
2) Click the Cleanup button.
3) Wait until all the temporary files are cleared.
4) Close.

Opening the Designer client (InfoSphere DataStage and QualityStage):

1) Run "Designer client.exe".
2) Enter the username and password, then click OK.

Exercise 1: Loading data from an 'oltpsrc' file to a 'dwhtarget' file

Step 1:

File->New->Parallel Job.

Create a project in the repository by right-clicking on dtstage1 and creating a new folder.

Name that folder.

Go to File -> Sequential File on the palette.

Drag and Drop the Sequential File option twice into the work area.

Go to General -> Link on the palette.

Connect the two sequential files using a link in the work area (like drawing an arrow in Paint).

sequential_file(oltp) -> sequential_file(DWH)

This job copies the contents from the OLTP flat file to the DWH flat file.

Step 2:

Create a txt file named “src.txt”.

Type some records with the structure (eno, ename, sal); see the sample after this step.

Rename Sequential_File_0 and Sequential_File_1 as 'oltpsrc' and 'dwhtarget' respectively.
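
For example, src.txt might contain the following (illustrative records, reusing the sample employees that appear later in these experiments):

eno,ename,sal
101,gokul,10000
102,gopal,20000
103,kumar,20000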

Step 3:

Setting oltpsrc properties

Double-click the 'oltpsrc' file in the work area.

Set the properties as follows

File: Location of the source file

First Line is Column Names: set to True if the first line of the src file has column names, else False.

Set Format as follows:

Final delimiter = end (represents end of file)

Delimiter = the delimiter you used in the src file for separating each field

Quote = single | double | none, as per the usage in the src file fields.

Define the column names and datatypes.

Step 4: Setting ‘dwhtarget’ file properties

File=path of target file

File Update Mode = Overwrite (overwrites the target file if it exists) | Create (creates a new file) | Append (appends to the target file)

First Line is Column Names = True (treats the first line of your src file as column names and skips it) | False (loads the first line to the target file)

Step 5: Save Your Project:

Go to File -> Save As.

Item name: Project name

Folder Path: Path of your Project Folder

Step 6: Compiling Project:

Click the compile button on the toolbar.

Step 7: Run the Project:

Click the run button on the toolbar.

Warnings

No limit: runs the job no matter how many warnings occur.

Abort job after: aborts the job after encountering the specified number of warnings.

Note:

Before clicking Run, close your src file and target file.

Link color status during run time:

Black - process not started

Blue - process is running

Red - process aborted

Green - process completed successfully

Step 8: Run Director:

Now go to Tools -> Run Director.

It maintains run logs for all the projects.

To view logs: select the desired project and go to View -> Log.

Exercise 2: Pump the data from source to target with some constraints using the 'FILTER' stage

Filter restricts the rows of a file based on conditions set against one or more fields in each row.

Eg: Select * from emp where sal>10000;

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop three sequential files into the work area.

Step 4: Drag and Drop a Filter from processing option on palette into work area

Step 5: Create a source file named “src.txt”

Step 6: Set sequential_File_0 properties the same as in exercise 1.

Step 7: Set the Filter properties as follows.

Setting Constraints:

Predicates:

1st where-clause condition, for the link DSLink12: (sal <= 10000)

sequential_file_1 will receive the rows that satisfy the above constraint.

2nd where-clause condition, for the link DSLink11: (sal > 10000 and sal <= 20000)

sequential_file_2 will receive the rows that satisfy the above constraint.

Options:

Output Rejects = true for DSLink10; then right-click on DSLink10 and select Convert to Stream.

Keep Output Rejects = false if there is no reject link.

Now sequential_file_3 will receive the rows rejected by the above two constraints.
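
In SQL terms, the three outputs correspond roughly to the following (a sketch, assuming the (eno, ename, sal) structure from step 5):

-- DSLink12 -> sequential_file_1
select eno, ename, sal from src where sal <= 10000;

-- DSLink11 -> sequential_file_2
select eno, ename, sal from src where sal > 10000 and sal <= 20000;

-- reject link DSLink10 -> sequential_file_3: rows matching neither predicate
select eno, ename, sal from src where sal > 20000;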

Output Settings:

Mapping Columns:

1. Select the output link from the combo box.
2. Drag and Drop the columns from the left to the right side.
3. Repeat the above steps for all the output links.

Step 8: Set sequential_file_1, sequential_file_2 and sequential_file_3 properties the same as in exercise 1.

Step 9: Compile.

Step 10: Run the project and observe the output.

Exercise 3: Load the target file from multiple src files using the 'Funnel' stage

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop four sequential files into the work area and rename them as src1, src2, src3 and target respectively.

Step 4: Drag and Drop a Funnel from the processing option on the palette into the work area.

Step 5: Set the src1, src2, src3 properties the same as in exercise 1.

Step 6: Set Funnel Properties as follows

Properties settings

Funnel Type=Continuous Funnel.

The target file is loaded with all the src files in the order in which records arrive at the funnel on their src links.

Funnel Type=Sequence Funnel.

The target file is loaded with all the src files in the order in which the src files are placed in the work area, i.e., from top to bottom.

Funnel Type=Sort Funnel.

The target file is loaded with all the src files in sorted order, based on the sort key value and sort order.
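
As a rough SQL analogy (a sketch; src1, src2 and src3 are assumed to share the (eno, ename, sal) structure):

-- Continuous/Sequence funnel: concatenate the sources
select eno, ename, sal from src1
union all
select eno, ename, sal from src2
union all
select eno, ename, sal from src3;

-- Sort funnel: the same, but ordered by the sort key
select eno, ename, sal from src1
union all
select eno, ename, sal from src2
union all
select eno, ename, sal from src3
order by ename asc;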

Output settings:

Step 7: set target file properties same as in exercise 1.

Step 8: Compile

Step 9: Run the project

Output:

Source files:

Target File on

1. Funnel Type=Continuous Funnel

2. Funnel Type=Sequence Funnel

3. Funnel Type=Sort Funnel with key=ename and sort order=Ascending.

Exercise 4: Pump the target file from the source file in sorted order using the 'SORT' stage

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two sequential files into the work area.

Step 4: Drag and Drop sort from processing option on the palette into the work area.

Step 5: set sequential_file_0 properties same as in exercise 1.

Step 6: set sort properties as follows

Output setting:

Step 7: set sequential_file_1 properties same as in exercise 1.

Step 8: compile and run the project.
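
In SQL terms, this job behaves roughly like the following (a sketch; the sort key ename and ascending order are assumptions, matching the sort funnel example in exercise 3):

-- Sort stage analogy: emit the source rows in key order
select eno, ename, sal from src order by ename asc;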

OUTPUT:

Source file:

Target File:

Sort can also be performed with the link directed from a Funnel:

The above case won't work, because the Funnel link should be directed directly to the Sort stage.

Exercise 5: Load the target file after removing duplicate rows from the src file using the 'Remove Duplicates' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two sequential files into the work area.

Step 4: Drag and Drop ‘Remove Duplicates’ from processing option on the palette into the work area.

Step 5: set sequential_file_0 properties same as in exercise 1.

Step 6: set ‘remove duplicates’ properties as follows.

Key=eno (Key column for the operation)

Duplicate to Retain=Last.

Row Duplicates:

Eno, ename, salary

101,gokul,10000

102,gopal,20000

101,gokul,15000

101,gokul,25000

103,kumar,20000

The record (101, gokul) appears three times with different salary values. We need the latest updated row, so we use the 'Remove Duplicates' stage, which removes all duplicate rows while retaining the last (or first) row.

Duplicate row search is made using the key, ‘eno’ in our case.

We can customize the duplicate to be retained by setting Duplicate to Retain=Last | First.
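
A rough SQL equivalent (a sketch using an analytic function; the stage itself decides 'first'/'last' from the input row order, which rownum approximates here):

-- rn = 1 picks the last occurrence of each eno, i.e. Duplicate to Retain = Last
select eno, ename, salary
from (
  select eno, ename, salary,
         row_number() over (partition by eno order by rownum desc) as rn
  from src
)
where rn = 1;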

Output Settings:

Step 7: set sequential_file_1 properties same as in exercise 1.

Step 8: compile and run the project.

OUTPUT FOR THE ABOVE SETTINGS:

SOURCE FILE:

TARGET FILE:

Exercise 6: Join the rows in two src files and load them into the target using the 'JOIN' stage

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop three sequential files into the work area.

Step 4: Drag and Drop ‘Join’ from processing option on the palette into the work area.

Step 5: Set sequential_file_0 and sequential_file_1 properties same as in exercise 1 but select a key in both files with which the join has to be made. In our example we have selected the key as ‘eno’.

Step 6: set join properties as follows.

Key= eno

Join Type= Inner|Left outer|Right outer|Full Outer

Output Settings:

Note:

While joining, keep your small table as the left table and your big table as the right table for better performance.
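
The join types correspond to the standard SQL joins (a sketch, assuming both sources share the key eno):

-- Inner: only eno values present in both sources
select * from src1 inner join src2 using (eno);

-- Left outer: all rows from src1, matched where possible
select * from src1 left outer join src2 using (eno);

-- Right outer: all rows from src2, matched where possible
select * from src1 right outer join src2 using (eno);

-- Full outer: all rows from both sides
select * from src1 full outer join src2 using (eno);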

Step 7: Set sequential_file_2 properties the same as in exercise 1.

Step 8: Compile and Run the project.

OUTPUT:

Source File 1 and 2:

Target file after Inner Join:

Target file after Left outer join:

Target file after Right outer join:

Target file after full outer join:

Exercise 7: Generate n dummy records under a defined table structure using the 'Row Generator' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop a sequential file into the work area.

Step 4: Drag and Drop Row Generator from Development/Debug option on the palette into the work area.

Step 5: Set Row_Generator properties as follows

Output Settings:

Specifying the length and scale values is important here.

Sal = 12000.00 (length = 7 and scale = 2) // generates all the values of a decimal column with the same number of digits.

Page 36: Datastage Experiments

The length value for char is a fixed length (all values of a char column have a fixed number of characters).

The length value for integer and varchar is their upper limit, i.e., the max number of digits for an integer and the max number of characters for a varchar.
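
The column definitions above correspond to a table structure like this (a sketch; the names and sizes are illustrative):

create table emp_gen (
  eno   number(4),      -- integer, max 4 digits
  ename varchar2(10),   -- varchar, max 10 characters
  sal   number(7,2)     -- decimal: length 7, scale 2
);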

Step 6: Set sequential_file_1 properties the same as in exercise 1.

Output:

Target File:

Exercise 8: Load data from a flat src file to a target Oracle database using the 'Oracle Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop a sequential file into the work area.

Step 4: Drag and Drop oracle connector from Database option on the palette into the work area.

Step 5: Set sequential_file_1 properties the same as in exercise 1.

Step 6: Starting Oracle services.

Start the OracleJobSchedulerORCL, OracleOraDb11g_home1TNSListener and OracleServiceORCL services.

Step 7: set oracle_connector properties as follows.

Check the Oracle connectivity by pressing the Test button under Connection.

You can also view the data that has been imported using the View Data button under Usage.
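
The connection can also be verified outside DataStage (a sketch; assumes the scott/tiger@orcl login used in the output section of this exercise, and a target table here assumed to be emp):

sqlplus scott/tiger@orcl

-- then, at the SQL prompt, inspect the target table
select count(*) from emp;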

Output Settings:

Specifying the length and scale values is important here.

Sal = 12000.00 (length = 7 and scale = 2) // generates all the values of a decimal column with the same number of digits.

The length value for char is a fixed length (all values of a char column have a fixed number of characters).

The length value for integer and varchar is their upper limit, i.e., the max number of digits for an integer and the max number of characters for a varchar.

Step 8: Compile and run the project.

Output:

Source File:

Target:

Username: Scott/tiger@orcl

Exercise 9: Load data from an Oracle database to a target flat file using the 'Oracle Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop a sequential file into the work area.

Step 4: Drag and Drop oracle connector from Database option on the palette into the work area.

Step 5: Starting Oracle services.

Start the OracleJobSchedulerORCL, OracleOraDb11g_home1TNSListener and OracleServiceORCL services.

Step 6: Import a table. (This takes a snapshot of the original table; the snapshot is used for further processing with better performance, since reading every record from the Oracle database over an Oracle connection incurs more overhead.)

Since the imported table is only a snapshot, you have to re-import it whenever the table changes.

Any changes you make to the table must be committed before importing it into DataStage, especially in Oracle.
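
For example (a sketch; the table and the change are illustrative):

-- commit the change so the re-imported snapshot can see it
update employee set sal = sal * 1.1 where deptid = 10;
commit;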

Username : scott

Password : tiger

Step 7: Set the oracle_connector properties as follows.

Column Settings:

Load the columns from the ‘employee’ table as follows

a. Click the Load button.
b. Select the table from the 'table definitions' wizard.
c. Select the desired columns from the 'select columns' wizard.

Step 8: Set sequential_file_0 properties the same as in exercise 1.

Step 9: Compile and run the project.

OUTPUT:

Target File:

Exercise 10: Load data from a Teradata database to an Oracle database using the 'Teradata Connector' and 'Oracle Connector' stages.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop Teradata Connector and Oracle Connector from the Database option on the palette into the work area.

Step 4: Start the Teradata services.

Step 5: Import a Teradata database.

Username: tduser

Password: tduser

Step 6: Set Teradata_Connector properties as follows.

Check the Teradata connectivity by pressing the Test button under Connection.

You can also view the data that has been imported using the View Data button under Usage.

Column Settings:

The procedure is the same as in exercise 9.

Specifying the length and scale values is important here (when loading from any DB to a DB, or from a file to any DB).

Sal = 12000.00 (length = 7 and scale = 2) // generates all the values of a decimal column with the same number of digits.

The length value for char is a fixed length (all values of a char column have a fixed number of characters).

The length value for integer and varchar is their upper limit, i.e., the max number of digits for an integer and the max number of characters for a varchar.

Step 7: Set the Oracle Connector properties the same as in exercise 8.

Step 8: Compile and run the project.

OUTPUT:

Target:

Username: Scott/tiger@orcl

Exercise 11: Load data from an Oracle database to a Teradata database using the 'Oracle Connector' and 'Teradata Connector' stages.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop Teradata Connector and Oracle Connector from the Database option on the palette into the work area.

Step 4: Start the Oracle and Teradata services.

Step 5: Import an Oracle table.

Step 6: Set the Oracle_Connector properties the same as in exercise 9.

Step 7: Set Teradata_Connector properties as follows.

Step 8: Compile and run the project.

Output:

At Teradata

Exercise 12: Load data from a Teradata database to a target flat file using the 'Teradata Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop a sequential file into the work area.

Step 4: Drag and Drop a Teradata Connector from the Database option on the palette into the work area.

Step 5: Start the Teradata services.

Step 6: Import a Teradata table.

Step 7: Set the Teradata_Connector properties the same as in exercise 10.

Step 8: Set the Sequential_File properties the same as in exercise 1.

Step 9: Compile and run the project.

OUTPUT:

Source table and Target Flat file.

Exercise 13: Load data from a flat file to a target Teradata database using the 'Teradata Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop a sequential file into the work area.

Step 4: Drag and Drop a Teradata Connector from the Database option on the palette into the work area.

Step 5: Start the Teradata services.

Step 6: Set the Sequential_File properties the same as in exercise 1.

Step 7: Set the Teradata_Connector properties the same as in exercise 10.

Step 8: Compile and run the project.

OUTPUT:

Source flat file and target Teradata table.

Exercise 14: Load data from a Teradata database to another Teradata database using the 'Teradata Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two Teradata Connectors from the Database option on the palette into the work area.

Step 4: Start the Teradata services.

Step 5: Import a Teradata table.

Step 6: Set teradata_connector_0 properties the same as in exercise 10.

Step 7: Set teradata_connector_1 properties the same as in exercise 11.

Step 8: Compile and Run the project.

OUTPUT:

Source Teradata table 'new_emp' and target Teradata table 'cpy_emp'.

Exercise 15: Load data from an Oracle database to another Oracle database using the 'Oracle Connector' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two oracle Connectors from the Database option on the palette into the work area.

Step 4: Start the Oracle services.

Step 5: Import an Oracle table.

Step 6: Set oracle_connector_0 properties the same as in exercise 11.

Step 7: Set oracle_connector_1 properties the same as in exercise 10.

Step 8: Compile and Run the project.

OUTPUT:

Source oracle table ‘dept’:

Target Oracle table ‘cpy_dept’:

Exercise 16: Perform some aggregations on the src flat file and load the results into a target flat file using the 'Aggregator' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two sequential files into the work area.

Step 4: Drag and Drop Aggregator from processing option on the palette into the work area.

Step 5: Set sequential_file_0 properties the same as in exercise 1.

Step 6: Set Aggregator properties as follows.

Select deptid, max(sal) “Max_Sal” from emp group by deptid;

Group = deptid (group by column)

Aggregation Type=Calculation|Count Rows | Re-calculation

Column For Calculation=sal (column on which the aggregation has to be performed)

Maximum Value Output Column=Max_Sal (alias name)

Column Mapping:

Column Settings

By default, the data type for every aggregation output is Double, so reset the type as needed.

Step 7: Set Sequential_File_1 properties the same as in exercise 1.

Step 8: Compile and Run the project.

OUTPUT:

Source File

Target File on ‘Select deptid, max(sal) “Max_Sal” from emp group by deptid;’

Exercise 17: Load from a src flat file to a target flat file with some derived columns using the 'Transformer' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two sequential files into the work area.

Step 4: Drag and Drop ‘Transformer’ from processing option on the palette into the work area.

Step 5: Set sequential_file_0 properties the same as in exercise 1.

Step 6: Set transformer properties as follows.

Drag and Drop the columns on which derivations have to be performed from left to right (Column Mapping).

On the right-hand side, right-click on each column and select Function -> any desired function; the function prototype will then be loaded into the column.

Edit the column as per the prototype (for example, on selecting UpCase, UpCase(%string%) will be loaded; edit the parameter value to DSLink5.ename).

Derive the 'Grade' column from the sal column using If Else, following the same procedure as above; a SQL sketch of these derivations follows below.

At the bottom right, rename the columns if you want (here we rename 'ename' as 'Emp_Name' and 'sal' as 'Annual_salary'). The changes will be updated in the DSLink6 table.

Be careful when setting the datatype for each derived column.
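
The derivations above are roughly equivalent to the following SQL (a sketch; the Grade thresholds are illustrative, since the actual If Else condition is not reproduced here):

select eno,
       upper(ename) as Emp_Name,          -- UpCase(DSLink5.ename)
       sal as Annual_salary,
       case when sal > 20000 then 'A'     -- illustrative If Else derivation
            else 'B' end as Grade
from src;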

Step 7: Set sequential_file_1 properties the same as in exercise 1.

Step 8: Compile and run the project.

OUTPUT:

Source File:

Target File:

Exercise 18: Compare two tables (DWH and OLTP), capture the changes in the OLTP table with respect to the DWH table, then load the changes to a flat file using the 'Change Capture' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two Oracle Connectors from the Database option on the palette into the work area.

Step 4: Drag and Drop a ‘sequential file’ from file option on the palette into the work area.

Step 5: Drag and Drop ‘change capture’ from processing option on the palette into the work area.

Step 6: Create two tables, student and dupstudent, with the structure (rollno, name, age, deptid) and insert the same records into both. Then make some changes in the dupstudent table (new inserts, deletes, updates).

Step 7: Set the Oracle Connector properties the same as in exercise 9.

Step 8: set change capture properties as follows.

Setting Properties

Change Key = rollno (a column that never changes, on which the comparison between the tables is made).

Change Value = Age, Deptid, Name (columns whose values change over time).

Drop Output For Copy, delete, edit, insert = False

If the two tables contain an identical record (a copy), don't drop that record; forward it to the flat file.

If a record in student is not present in dupstudent (i.e., it was deleted), that record is also forwarded to the flat file.

Similar actions occur for edit (update) and insert.

Column Settings:

The Change Capture stage generates a column called change_code by default, which indicates the following:

Copy-0

Insert-1

Update-2

Delete-3
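
A rough SQL sketch of what the stage computes (illustrative; uses the student/dupstudent tables from step 6 and the change_code values listed above):

select coalesce(d.rollno, s.rollno) as rollno,
       d.name, d.age, d.deptid,
       case when s.rollno is null then 1                -- insert
            when d.rollno is null then 3                -- delete
            when s.name = d.name and s.age = d.age
                 and s.deptid = d.deptid then 0         -- copy
            else 2                                      -- update
       end as change_code
from student s
full outer join dupstudent d on s.rollno = d.rollno;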

Column Mappings:

Step 9: Set the sequential_file properties the same as in exercise 1.

Step 10: compile and run the project.

OUTPUT:

Source tables:

Target File:

Exercise 19: Look up the existence of records in the DWH table with respect to the OLTP table and join the records using the 'Look Up' stage

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop three sequential files into the work area.

Step 4: Drag and Drop ‘Look Up’ from processing option on the palette into the work area.

Step 5: Set the OLTPSRC and DWHSRC file properties the same as in exercise 1.

NOTE: The oltp file should always be at the top and the dwh file at the bottom of the work area; otherwise an error will occur when running the project.

Step 6: Set the Look Up properties as follows.

Create a link with 'dno' from oltp_link to dwh_link, which acts as the key for comparison.

Drag and Drop the desired columns from oltp_link and dwh_link to target_link.

Step 7: Set the target file properties the same as in exercise 1.

Step 8: Compile and run the project.

OUTPUT:

Source Files (DWH and OLTP):

Result: Execution success

Target File:

Inference:

If the lookup finds all the related records in the DWH table with respect to the OLTP table using a key (here dno), it joins those records; the join type is a 'natural join with using clause'.

So a lookup can act as a join, with the above restriction.
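
In SQL terms (a sketch, using the dno key named above):

-- a successful lookup behaves like a natural join with a using clause
select * from oltp join dwh using (dno);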

Source files (DWH and OLTP):

Result:

Inference:

Since a record with the key (dno = 6) in the oltp table does not exist in the dwh table, an error occurred.

Exercise 20: Maintain a history of changes made in the DWH table with respect to the OLTP table using the 'Slowly Changing Dimension' stage.

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop three oracle connectors from database option on the palette into the work area.

Step 4: Drag and Drop a ‘sequential file’ from file option on the palette into the work area.

Step 5: Drag and Drop ‘Slowly Changing Dimension’ from processing option on the palette into the work area.

Step 6: Create a table oltp with the following description, insert some records, then commit.

Step 7: Create a table deptdwh with the following description.

Step 8: Set the OLTP oracle connector properties the same as in exercise 9 and use the oltp table.

Step 9: Set the DWH oracle connector properties the same as in exercise 9 and use the deptdwh table.

Step 10: Set the Target_DWH oracle connector properties the same as in exercise 9 and use the deptdwh table.

Step 11: Set the Fact sequential file properties the same as in exercise 1.

Step 12: Set the Slowly Changing Dimension properties as follows.

Fast Path: 1 of 5

Select output link as fact (sequential file).

Fast Path: 2 of 5 (Input)

Map the key column between oltp and dwh table.

Fast Path: 3 of 5 (Input)

Set Initial Value as 1

Create a txt file ‘System.txt’ in C:\ for system reference.

Give that file path under Source name:

Fast Path: 4 of 5 (Output)

Map columns for the Fact (sequential file).

Always map common columns from oltp table.

Fast Path: 5 of 5 (Output)

At Initial Stage:

Set Derivation, Purpose and Expire for columns.

Derivation and Expire can be set via double-click -> right-click -> Function -> desired function on the respective columns.

Purpose Settings:

Business Key: primary key

Surrogate key: to locate changes (for system reference)

Type 1: non-changeable values that are not a business key (e.g., date of birth).

Type 2: Changeable values.

Effective Date: Entry date of the record

Expiration Date: entry date of the immediately following duplicate record (so initially set it to null)

Current Indicator: Indicates the active record

Active-1

Inactive-0
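
For a Type 2 change, the stage's behaviour amounts to the following SQL (a sketch; the stdate, expdate and cid column names match the output described below, and surrogate-key handling is omitted):

-- expire the old version of the changed row
update deptdwh set expdate = sysdate, cid = 0
where deptno = 10 and cid = 1;

-- insert the new version as the active record
insert into deptdwh (deptno, dname, stdate, expdate, cid)
values (10, 'JAVA', sysdate, null, 1);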

Fast Path 5 of 5 (output) at final stage:

After setting fast path 5 of 5, fast path 2 of 5 will look as follows.

Step 13: Compile and run the project.

OUTPUT:

The deptdwh table is loaded with the records from the oltp table, with stdate as the current date, expdate as null and cid as 1 (active record).

Fact file content:

After making the following changes to the oltp table:

The deptdwh table receives the changed records as well as the newly inserted oltp records, with stdate as the current date, plus expdate and cid.

The 'dname' value of the row with deptno = 10 is changed from 'C' to 'JAVA'.

The old record gets its expiration date set to the starting date of the newly updated record.

The current indicator (cid) of the old record becomes 0; for the new record, cid = 1.

Fact file content:

Exercise 21: PIVOT STAGE

Step 1: Create a new parallel project.

Step 2: Save the project with a name.

Step 3: Drag and Drop two sequential files into the work area.

Step 4: Drag and Drop ‘pivot’ from processing option on the palette into the work area.

Step 5: Set sequential_file_0 properties the same as in exercise 1.

Step 6: Set Pivot properties as follows.

Input settings:

Output Settings:

Step 7: Set sequential_file_2 properties the same as in exercise 1.

Step 8: Compile and run the project.

OUTPUT:

Source File:

Target File:

NOTE: The datatype of all horizontal columns except the primary key column in the source table must be the same. In our case the q1, q2, q3 columns in the source table are integers, so all of them can fit into the column 'q' with integer datatype in the target table.
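
In SQL terms, the pivot performed here is roughly (a sketch; the key column name sno is an assumption, while q1, q2, q3 and q come from the note above):

-- horizontal columns q1..q3 pivoted into one vertical column q
select sno, q1 as q from src
union all
select sno, q2 from src
union all
select sno, q3 from src;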

Exercise 22: Run the jobs in a sequential manner (one after another) using a 'Sequence Job'

A Sequence Job is mainly used for executing jobs one after another.

It is essential when jobs must execute in a particular sequence, where one job depends on the finished execution state of another job.

For example consider the following query,

Select e.eno,e.ename,e.deptno,d.deptname from emp e join dept d on(e.deptno=d.deptno) where e.deptno in(10,20,30) order by 2;

The above query needs to execute three jobs (1. Join, 2. Filter, 3. Sort) in sequence.

Step 1: Create a new sequence project.

Step 2: Save the project with a name.

Step 3: Drag and Drop the jobs you want to execute sequentially from repository into the work area.

Step 4: Link the Jobs

Step 5: Compile and run the project.

Step 6: Open the Run Director and observe the logs to confirm successful execution of all the jobs.