scd type2 through informatica


Upload: varun-hr

Post on 07-Nov-2014


Page 1: Scd type2 through informatica

Informatica Scd Type-2 implementation

What is SCD Type-2:

The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables, with separate surrogate keys and/or different version numbers. With Type 2, history is preserved without limit, because a new record is inserted each time a change is made. Type 2 can be achieved in different ways.

How to implement SCD Type2 through Informatica:

There are a number of ways to implement SCD Type 2 in Informatica.

Example:

Create the source table CUST with the query below.

CREATE TABLE CUST
(
  CUST_ID   NUMBER,
  CUST_NM   VARCHAR2(250),
  ADDRESS   VARCHAR2(250),
  CITY      VARCHAR2(50),
  STATE     VARCHAR2(50),
  INSERT_DT DATE,
  UPDATE_DT DATE);

Insert the following data into the CUST table.

Source Data (dates in M/D/YYYY, i.e. 1 July 2011):

CUST_ID  CUST_NM        ADDRESS            CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  100 Main St.       Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    510 Broadway Ave.  Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    555 6th Ave.       Bangalore  KA     7/1/2011   7/1/2011
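If you want to experiment with the logic outside Informatica, the CUST DDL and source rows above can be loaded into an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for Oracle (it accepts NUMBER, VARCHAR2 and DATE as type names through its type affinity rules), and storing dates as ISO-8601 text is an assumption of the sketch.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle source schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUST (
        CUST_ID   NUMBER,
        CUST_NM   VARCHAR2(250),
        ADDRESS   VARCHAR2(250),
        CITY      VARCHAR2(50),
        STATE     VARCHAR2(50),
        INSERT_DT DATE,
        UPDATE_DT DATE)
""")

# Source rows from the example; dates stored as ISO text (assumption).
rows = [
    (80001, "Marion Atkins", "100 Main St.", "Bangalore", "KA", "2011-07-01", "2011-07-01"),
    (80002, "Laura Jones", "510 Broadway Ave.", "Hyderabad", "AP", "2011-07-01", "2011-07-01"),
    (80003, "Jon Freeman", "555 6th Ave.", "Bangalore", "KA", "2011-07-01", "2011-07-01"),
]
conn.executemany("INSERT INTO CUST VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

for row in conn.execute("SELECT CUST_ID, CUST_NM, CITY FROM CUST ORDER BY CUST_ID"):
    print(row)
```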

In this example the natural key is CUST_ID, and each version of a customer gets its own surrogate key (PM_PRIMARYKEY) in the dimension table.

To hold this history, create the target table CUST_D with the query below.


CREATE TABLE CUST_D
(
  PM_PRIMARYKEY INTEGER,
  CUST_ID       NUMBER,
  CUST_NM       VARCHAR2(250),
  ADDRESS       VARCHAR2(250),
  CITY          VARCHAR2(50),
  STATE         VARCHAR2(50),
  ACTIVE_DT     DATE,
  INACTIVE_DT   DATE,
  INSERT_DT     DATE,
  UPDATE_DT     DATE);

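Here ACTIVE_DT and INACTIVE_DT are used to identify history data: the current record for a customer has a null INACTIVE_DT, while older versions carry the date on which they were inactivated.

PM_PRIMARYKEY is the surrogate key that uniquely identifies every record in the target.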
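ACTIVE_DT and INACTIVE_DT make point-in-time questions easy to answer. As a rough illustration, the Python sketch below picks the version of a customer that was active on a given date. It assumes half-open validity intervals [ACTIVE_DT, INACTIVE_DT), which fits a run that stamps the same date on both the closed version and the new one. The function name `version_as_of` and the sample rows are hypothetical, shaped after the data example at the end of this tutorial, not taken from real output.

```python
from datetime import date
from typing import List, Optional, Tuple

# One CUST_D version: (PM_PRIMARYKEY, CUST_ID, CITY, ACTIVE_DT, INACTIVE_DT).
# INACTIVE_DT is None while the version is still active.
Row = Tuple[int, int, str, date, Optional[date]]


def version_as_of(history: List[Row], cust_id: int, as_of: date) -> Optional[Row]:
    """Return the version of cust_id that was active on as_of, or None.

    A version counts as active from ACTIVE_DT (inclusive) until
    INACTIVE_DT (exclusive); an open-ended version has INACTIVE_DT = None.
    """
    for row in history:
        _key, cid, _city, active_dt, inactive_dt = row
        if cid == cust_id and active_dt <= as_of and (inactive_dt is None or as_of < inactive_dt):
            return row
    return None


# Illustrative history for CUST_ID 80003 after runs on 2nd and 3rd Jul 2011.
history = [
    (3, 80003, "Bangalore", date(2011, 7, 2), date(2011, 7, 3)),
    (5, 80003, "Hyderabad", date(2011, 7, 3), None),
]
print(version_as_of(history, 80003, date(2011, 7, 2)))  # the Bangalore version (key 3)
```

Choosing a half-open interval means exactly one version matches on the changeover date, which avoids double counting.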

The diagram below gives a high-level overview of SCD Type 2 through Informatica.

[Figure: SCD Type-2 Overview, showing the mapping flow CUST → SQ_CUST → lkp_CUST_D → exp_FLAG → rtr_INS_UPD, followed by an insert path (upd_INSERT, CUST_D_INS) and an update path (upd_UPDATE, CUST_D_UPD).]

Before implementing, we need to identify which attributes to track for history. In this example we track changes in ADDRESS, CITY, or STATE: if any of them changes, we insert a new record and inactivate the old one. (The out_FLAG expression defined later also compares CUST_NM, so a name change is handled the same way.)
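Deciding which attributes trigger a new version is the core design choice of SCD Type 2. A minimal Python sketch of that check, assuming records are plain dicts keyed by column name; `TRACKED`, `changed_attributes` and `needs_new_version` are hypothetical names for illustration. Note that Python treats `None == None` as equal, unlike SQL NULL comparisons.

```python
# Attributes whose changes create a new history record in this example.
TRACKED = ("ADDRESS", "CITY", "STATE")


def changed_attributes(source: dict, target: dict, tracked=TRACKED) -> list:
    """Return the tracked attribute names whose values differ."""
    return [name for name in tracked if source.get(name) != target.get(name)]


def needs_new_version(source: dict, target: dict, tracked=TRACKED) -> bool:
    """True when at least one tracked attribute changed."""
    return bool(changed_attributes(source, target, tracked))


current = {"CUST_ID": 80003, "ADDRESS": "555 6th Ave.", "CITY": "Bangalore", "STATE": "KA"}
incoming = {"CUST_ID": 80003, "ADDRESS": "555 6th Ave.", "CITY": "Hyderabad", "STATE": "AP"}
print(changed_attributes(incoming, current))  # ['CITY', 'STATE']
```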

Creation of mapping:

Step 1: Import the source and target definitions into Informatica from the database.

Import the CUST source definition in the Source Analyzer workspace: go to Sources > Import from Database.


This opens the Import Tables window. Assuming that a system DSN has already been created for this connection, specify all the necessary details and click Connect.

Select the CUST table to import and click OK to continue.


The CUST source definition is created and appears in the workspace. Click Save to save the source definition in the repository.

The source table CUST contains only current data and has no history. This mapping runs daily to capture the history in the CUST_D target table, using the active and inactive date logic for SCD Type 2.

Follow the same steps in the Target Designer to import the CUST_D table.


Both the source CUST and the target CUST_D definitions are now available.

Mapping creation:

Create a mapping named m_SCD_Type_2.


Now drag the CUST table from Sources into the Mapping Designer workspace.

Create a Lookup transformation on the CUST_D table (lkp_CUST_D).

Create one input port in lkp_CUST_D named in_CUST_ID with data type double, and add the lookup condition CUST_ID = in_CUST_ID.


Connect CUST_ID from the Source Qualifier to the in_CUST_ID port of the lookup.


Create an Expression transformation (exp_FLAG) and drag CUST_ID, CUST_NM, ADDRESS, CITY, and STATE from the Source Qualifier into it. In the same way, drag PM_PRIMARYKEY, CUST_NM, ADDRESS, CITY, and STATE from lkp_CUST_D into the expression. Rename the ports so that source and lookup attributes can be told apart: prefix source ports with src_ (for example src_ADDRESS) and lookup ports with lkp_ (for example lkp_ADDRESS and lkp_PM_PRIMARYKEY).


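In the lookup transformation, apply a filter so that only active records are retrieved. You can do this in the Lookup SQL Override:

SELECT CUST_D.PM_PRIMARYKEY as PM_PRIMARYKEY,
       CUST_D.CUST_NM as CUST_NM,
       CUST_D.ADDRESS as ADDRESS,
       CUST_D.CITY as CITY,
       CUST_D.STATE as STATE,
       CUST_D.ACTIVE_DT as ACTIVE_DT,
       CUST_D.INACTIVE_DT as INACTIVE_DT,
       CUST_D.INSERT_DT as INSERT_DT,
       CUST_D.UPDATE_DT as UPDATE_DT,
       CUST_D.CUST_ID as CUST_ID
FROM CUST_D
WHERE INACTIVE_DT IS NULL

Active records are the ones whose INACTIVE_DT has not been set. Filtering on INACTIVE_DT IS NOT NULL would instead return only inactivated history rows, so existing customers would not be matched against their current version.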
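To see what the active-records filter does, the Python sketch below runs a query with the INACTIVE_DT IS NULL condition against a small CUST_D table in SQLite and loads the result into a dict, which loosely plays the role of the lookup cache keyed on CUST_ID. The two rows are illustrative values, and `lookup_cache`/`lookup` are hypothetical names; Informatica's own cache is not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUST_D (
        PM_PRIMARYKEY INTEGER, CUST_ID NUMBER, CUST_NM VARCHAR2(250),
        ADDRESS VARCHAR2(250), CITY VARCHAR2(50), STATE VARCHAR2(50),
        ACTIVE_DT DATE, INACTIVE_DT DATE, INSERT_DT DATE, UPDATE_DT DATE)
""")
# Illustrative rows: an inactivated and an active version of one customer.
conn.executemany(
    "INSERT INTO CUST_D VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (3, 80003, "Jon Freeman", "555 6th Ave.", "Bangalore", "KA",
         "2011-07-02", "2011-07-03", "2011-07-02", "2011-07-02"),
        (5, 80003, "Jon Freeman", "555 6th Ave.", "Hyderabad", "AP",
         "2011-07-03", None, "2011-07-03", "2011-07-03"),
    ],
)

# Only active versions: INACTIVE_DT has not been set yet.
LOOKUP_SQL = """
    SELECT PM_PRIMARYKEY, CUST_ID, CUST_NM, ADDRESS, CITY, STATE
    FROM CUST_D
    WHERE INACTIVE_DT IS NULL
"""

# Dict standing in for the lookup cache: one active row per CUST_ID.
lookup_cache = {}
for pk, cust_id, nm, addr, city, state in conn.execute(LOOKUP_SQL):
    lookup_cache[cust_id] = {"PM_PRIMARYKEY": pk, "CUST_NM": nm,
                             "ADDRESS": addr, "CITY": city, "STATE": state}


def lookup(cust_id):
    """Return the active row for cust_id, or None (lkp_PM_PRIMARYKEY is null)."""
    return lookup_cache.get(cust_id)


print(lookup(80003)["PM_PRIMARYKEY"], lookup(80004))
```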


Now create two output ports in the expression transformation:

out_DUMMY_DT: populates the ACTIVE_DT, INACTIVE_DT, INSERT_DT, and UPDATE_DT attributes in the target. Assign SYSDATE to this port.

out_FLAG: flags each record as a new insert, an insert of a changed record (with inactivation of its old version), or a record that needs no action.


To flag a record, we need to check three conditions:

1. Is the source record present in the target? If lkp_PM_PRIMARYKEY, which is returned by the lookup, is null, no record with the source CUST_ID exists in the target. Such records are inserted into the target directly.

2. Are there any changes between the source and target attributes? If lkp_PM_PRIMARYKEY is not null, the source CUST_ID is already present in the target. We then check whether CUST_NM, ADDRESS, CITY, or STATE differ between source and target. If any of them changed, we insert the source record into the target and, at the same time, inactivate the existing target record.

3. If the source record is present in the target and none of the attributes changed, the record is filtered out.

Write this condition in the out_FLAG expression editor:

IIF(ISNULL(lkp_PM_PRIMARYKEY), 1,
    IIF(lkp_CUST_NM != src_CUST_NM OR lkp_ADDRESS != src_ADDRESS OR
        lkp_CITY != src_CITY OR lkp_STATE != src_STATE, 2, 3))


With the above expression:

Flag 1: a new record that comes from the source and is not yet available in the target.
Flag 2: the record already exists in the target, but at least one compared attribute differs between source and target.
Flag 3: the record is present in both source and target, with no difference in the compared attributes.
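The out_FLAG logic can be mirrored in a few lines of Python, which is handy for checking expected flags on test data before running the session. A sketch only: rows are dicts, `out_flag` is a hypothetical name, and it does not model Informatica's NULL comparison semantics (Python's `!=` treats two `None` values as equal), so handle NULL-valued attributes separately if your data has them.

```python
from typing import Optional

# Attributes compared by the out_FLAG expression.
COMPARED = ("CUST_NM", "ADDRESS", "CITY", "STATE")


def out_flag(src: dict, lkp: Optional[dict]) -> int:
    """Mirror of out_FLAG: 1 = new, 2 = changed, 3 = unchanged."""
    # ISNULL(lkp_PM_PRIMARYKEY): the lookup found no active record.
    if lkp is None or lkp.get("PM_PRIMARYKEY") is None:
        return 1
    # Any compared attribute differs between lookup and source.
    if any(lkp.get(name) != src.get(name) for name in COMPARED):
        return 2
    return 3


old = {"PM_PRIMARYKEY": 3, "CUST_NM": "Jon Freeman", "ADDRESS": "555 6th Ave.",
       "CITY": "Bangalore", "STATE": "KA"}
new = {"CUST_NM": "Jon Freeman", "ADDRESS": "555 6th Ave.", "CITY": "Hyderabad", "STATE": "AP"}
print(out_flag(new, None), out_flag(new, old),
      out_flag(new, {**old, "CITY": "Hyderabad", "STATE": "AP"}))  # 1 2 3
```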

Now create a Router transformation (rtr_INS_UPD) and drag the following ports from the expression to the router:

lkp_PM_PRIMARYKEY, src_CUST_ID, src_CUST_NM, src_ADDRESS, src_CITY, src_STATE, out_DUMMY_DT, and out_FLAG.


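Create two groups in the router, one for insert and one for update. The INSERT group passes both new and changed records (group filter condition out_FLAG = 1 OR out_FLAG = 2). The UPDATE group passes only the records whose existing target version must be inactivated (out_FLAG = 2). Unchanged records (flag 3) match neither group, fall into the default group, and are not loaded as long as the default group is left unconnected.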
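A Router sends a row to every group whose condition it satisfies, so a changed record travels down both pipelines. The tiny Python sketch below makes that explicit, assuming the group conditions out_FLAG = 1 OR out_FLAG = 2 for INSERT and out_FLAG = 2 for UPDATE; `route` is a hypothetical name.

```python
def route(flag: int) -> list:
    """Return the router groups a row with this out_FLAG value reaches.

    Assumed conditions: INSERT = (flag 1 or 2), UPDATE = (flag 2).
    A row matching neither would go to the (unconnected) default group.
    """
    groups = []
    if flag in (1, 2):
        groups.append("INSERT")
    if flag == 2:
        groups.append("UPDATE")
    return groups


for flag in (1, 2, 3):
    print(flag, route(flag))
```

Flag 2 is the only value that reaches both groups, which is what lets one source row produce both the new version and the inactivation of the old one.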


At this point the mapping runs from the source through the lookup and the expression into the router.

Now connect the INSERT group of the router to the target. Connect src_CUST_ID, src_CUST_NM, src_ADDRESS, src_CITY, and src_STATE from the INSERT group to the corresponding target fields, and connect out_DUMMY_DT to the INSERT_DT, UPDATE_DT, and ACTIVE_DT fields. INACTIVE_DT stays unconnected, so newly inserted records are active.

Create a Sequence Generator transformation and connect its NEXTVAL port to the target PM_PRIMARYKEY field in the insert pipeline.
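The insert pipeline boils down to a column mapping plus a fresh surrogate key. A Python sketch of that mapping, assuming a Sequence Generator that starts at 1 and increments by 1 (modeled with `itertools.count`); `build_insert_row` is a hypothetical helper, and the run date stands in for SYSDATE.

```python
from datetime import date
from itertools import count

# Stand-in for the Sequence Generator's NEXTVAL (start 1, increment 1 assumed).
nextval = count(start=1)


def build_insert_row(src: dict, run_date: date) -> dict:
    """Map an INSERT-group row to CUST_D columns."""
    return {
        "PM_PRIMARYKEY": next(nextval),   # NEXTVAL -> surrogate key
        "CUST_ID": src["CUST_ID"],
        "CUST_NM": src["CUST_NM"],
        "ADDRESS": src["ADDRESS"],
        "CITY": src["CITY"],
        "STATE": src["STATE"],
        "ACTIVE_DT": run_date,            # out_DUMMY_DT
        "INACTIVE_DT": None,              # unconnected, so the new version is active
        "INSERT_DT": run_date,            # out_DUMMY_DT
        "UPDATE_DT": run_date,            # out_DUMMY_DT
    }


src = {"CUST_ID": 80001, "CUST_NM": "Marion Atkins", "ADDRESS": "100 Main St.",
       "CITY": "Bangalore", "STATE": "KA"}
row = build_insert_row(src, date(2011, 7, 2))
print(row["PM_PRIMARYKEY"], row["ACTIVE_DT"], row["INACTIVE_DT"])
```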


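Now drag lkp_PM_PRIMARYKEY and out_DUMMY_DT from the UPDATE group of the router into an Update Strategy transformation (upd_UPDATE), and connect them to a second instance of the target (CUST_D_UPD): lkp_PM_PRIMARYKEY to PM_PRIMARYKEY and out_DUMMY_DT to INACTIVE_DT. Because the update is issued by key, PM_PRIMARYKEY must be defined as the primary key in the CUST_D target definition.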


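In the Update Strategy transformation, set the Update Strategy Expression to DD_UPDATE.

This completes the SCD Type 2 mapping: source and Source Qualifier, lookup, expression, router, a Sequence Generator on the insert branch, and an Update Strategy on the update branch.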
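Informatica builds the UPDATE statement itself from the key defined in the target. To show the row-level effect we expect for each DD_UPDATE row, here is a SQLite sketch from Python on a trimmed-down, hypothetical CUST_D: an update keyed on PM_PRIMARYKEY that stamps the inactive date. Also setting UPDATE_DT is an assumption of the sketch, and `inactivate` is a made-up helper name.

```python
import sqlite3

# Trimmed CUST_D with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUST_D (PM_PRIMARYKEY INTEGER PRIMARY KEY, CUST_ID NUMBER, "
             "CITY VARCHAR2(50), ACTIVE_DT DATE, INACTIVE_DT DATE, UPDATE_DT DATE)")
conn.execute("INSERT INTO CUST_D VALUES (3, 80003, 'Bangalore', '2011-07-02', NULL, '2011-07-02')")


def inactivate(conn, pm_primarykey: int, run_date: str) -> int:
    """Close the old version by key; returns the number of rows updated."""
    cur = conn.execute(
        "UPDATE CUST_D SET INACTIVE_DT = ?, UPDATE_DT = ? WHERE PM_PRIMARYKEY = ?",
        (run_date, run_date, pm_primarykey),
    )
    return cur.rowcount


updated = inactivate(conn, 3, "2011-07-03")
print(updated, conn.execute("SELECT INACTIVE_DT FROM CUST_D WHERE PM_PRIMARYKEY = 3").fetchone())
```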


Create a workflow with a session for this mapping and assign the source and target relational connections.


Now let's walk through a data example.

Data available in the CUST source table on 1st Jul:

CUST_ID  CUST_NM        ADDRESS            CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  100 Main St.       Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    510 Broadway Ave.  Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    555 6th Ave.       Bangalore  KA     7/1/2011   7/1/2011

Assume this is the first time the mapping runs, so the target is empty.


When the job runs on 2nd Jul, all three records are inserted into the target. INACTIVE_DT is null for all of them, meaning they are all active records.

After this run, assume the source data changes on 2nd Jul. The changed source data looks like this:

CUST_ID  CUST_NM        ADDRESS            CITY       STATE  INSERT_DT  UPDATE_DT
80001    Marion Atkins  100 Main St.       Bangalore  KA     7/1/2011   7/1/2011
80002    Laura Jones    510 Broadway Ave.  Hyderabad  AP     7/1/2011   7/1/2011
80003    Jon Freeman    555 6th Ave.       Hyderabad  AP     7/1/2011   7/2/2011
80004    Veeru          90,HSR Layout      Bangalore  KA     7/2/2011   7/2/2011

In this data, the first two records have not changed since the last refresh, so their UPDATE_DT is unchanged. For CUST_ID 80003, however, CITY and STATE changed from the previous day, so its UPDATE_DT moved from 1st Jul to 2nd Jul. CUST_ID 80004 is a new customer.

If you run the same job on 3rd Jul, two records are inserted into the target: a new record for CUST_ID 80004, and a new version of the changed record for CUST_ID 80003.

The target now holds two records for CUST_ID 80003: the inactivated one (PM_PRIMARYKEY=3) and the active one (PM_PRIMARYKEY=5). Using ACTIVE_DT and INACTIVE_DT, you can tell which record was active for a given customer at any point in time.
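To check the whole flow against this data example, here is a compact Python simulation of the two runs over in-memory rows. It is a sketch, not Informatica: the dict built at the start of each run stands in for a static lookup cache filtered on INACTIVE_DT IS NULL, and the flag, router, sequence, and update steps are collapsed into one loop. `run_scd2` and `cust` are hypothetical helpers; the run dates of 2nd and 3rd Jul 2011 follow the example.

```python
from datetime import date
from itertools import count

COMPARED = ("CUST_NM", "ADDRESS", "CITY", "STATE")


def run_scd2(source: list, target: list, run_date: date, seq) -> None:
    """One run of the mapping over dict rows; mutates target in place."""
    # Lookup built once per run from active versions only.
    active = {r["CUST_ID"]: r for r in target if r["INACTIVE_DT"] is None}
    for src in source:
        lkp = active.get(src["CUST_ID"])
        if lkp is None:
            flag = 1
        elif any(lkp[c] != src[c] for c in COMPARED):
            flag = 2
        else:
            flag = 3
        if flag == 2:  # UPDATE group: inactivate the current version
            lkp["INACTIVE_DT"] = run_date
        if flag in (1, 2):  # INSERT group: add a new active version
            target.append({"PM_PRIMARYKEY": next(seq), "CUST_ID": src["CUST_ID"],
                           **{c: src[c] for c in COMPARED},
                           "ACTIVE_DT": run_date, "INACTIVE_DT": None,
                           "INSERT_DT": run_date, "UPDATE_DT": run_date})


def cust(cid, nm, addr, city, state):
    return {"CUST_ID": cid, "CUST_NM": nm, "ADDRESS": addr, "CITY": city, "STATE": state}


day1 = [cust(80001, "Marion Atkins", "100 Main St.", "Bangalore", "KA"),
        cust(80002, "Laura Jones", "510 Broadway Ave.", "Hyderabad", "AP"),
        cust(80003, "Jon Freeman", "555 6th Ave.", "Bangalore", "KA")]
day2 = day1[:2] + [cust(80003, "Jon Freeman", "555 6th Ave.", "Hyderabad", "AP"),
                   cust(80004, "Veeru", "90,HSR Layout", "Bangalore", "KA")]

cust_d, seq = [], count(start=1)
run_scd2(day1, cust_d, date(2011, 7, 2), seq)
run_scd2(day2, cust_d, date(2011, 7, 3), seq)

for r in cust_d:
    print(r["PM_PRIMARYKEY"], r["CUST_ID"], r["CITY"], r["ACTIVE_DT"], r["INACTIVE_DT"])
```

In this sketch the new 80003 version receives key 4 and 80004 receives key 5, because rows are processed in source order; the slides show key 5 for the new 80003 record, so the exact surrogate key values depend on the order in which rows reach the Sequence Generator. The history itself (key 3 inactivated on 3rd Jul, one active version per customer) comes out the same.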

Hope this helps you understand how to implement SCD Type 2 logic in Informatica.