
Page 1: Informatica to ODI 11g Developer Cheat Sheet

Ripu Jain, KPIT

Page 2: ELT (ODI) vs ETL (Informatica)

• Uses the database (source or target) as the transformation engine, vs. a separate ETL server for performing transformations
• White box vs. black box
• The ability to dynamically manage a staging area
• The ability to generate code on source and target systems alike, in the same transformation
• The ability to generate native SQL for any database on the market – most ETL tools will generate code for their own engines, and then translate that code for the target database
• The ability to generate DML and DDL, and to orchestrate sequences of operations on heterogeneous systems

(Diagram) Conventional ETL architecture: Extract → Transform (dedicated ETL server) → Load.
(Diagram) Next-generation “E-LT” architecture: Extract → Load → Transform, with the transformations running inside the source and/or target database.
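
To make the E-LT idea concrete, here is a minimal sketch (table and column names are hypothetical): once the extract/load step has copied source rows into the target database, the transformation is a single set-based SQL statement executed by the target engine itself, with no separate ETL server in the path.

    -- Transform runs inside the target database, not in an ETL engine:
    -- staging rows are reshaped and loaded in one set-based statement.
    INSERT INTO dw_customers (customer_id, full_name, country_code)
    SELECT src.cust_id,
           TRIM(src.first_name) || ' ' || TRIM(src.last_name),
           UPPER(src.country)
    FROM   stg_customers src
    WHERE  src.cust_id IS NOT NULL;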

Page 3: ODI Studio Components and Repositories

• Designer (Work Repository)
  • Develop and execute mappings (called interfaces in ODI)
  • Develop and execute workflows (called packages in ODI)
  • Design data integrity checks and build transformations
• Operator (Work Repository)
  • Monitoring tool
  • Manage interface executions in sessions, and scenarios in production
• Topology (Master Repository)
  • Create and manage connections to different technologies and data servers – both source and target
• Security (Master Repository)
  • Manages user profiles, roles and their privileges in ODI Studio
  • A developer will usually have access to the Designer and Operator components; an admin will have access to all four components

Page 4: Topology Navigator

This component is used to create and manage connections to different technologies – Oracle, MS, IBM, SAP, flat files, etc. – as you would in Informatica’s Designer and Workflow Manager clients. The sub-components are:

• Physical Architecture – lets you create the “Data Server” for the specific technology, be it your source or target. You will need to know the username/password of the SYSTEM user of the database, and the JDBC driver and URL connection details (for Oracle, typically oracle.jdbc.OracleDriver and a URL of the form jdbc:oracle:thin:@host:1521:SID).
• Physical Schema – after the Data Server is created, you need to manually create the Physical Schema(s) in ODI in which your source/target tables reside.
  • You can specify a “Work Schema” – the schema where ODI creates its temporary $ tables
  • You can specify a “Context” – associates the physical schema with a logical schema
• Logical Architecture – the ODI layer that lets you group physical schemas containing datastores that are structurally identical, but located on different data servers.
• Repositories – contains information about the Master and Work Repositories.

Page 5: Designer Navigator

• Projects
  • Same as managing Designer repositories in Informatica – repositories for different environments, or for a different ETL project.
  • A project can have multiple folders – collections of interfaces (mappings) and packages (workflows).
• Models
  • Same as importing source and target table definitions (called datastores in ODI) into Informatica folders.
  • You can define keys, references, conditions and filters, and journalize (Change Data Capture rules) models and datastores.
• Interface
  • Same as creating a mapping in Informatica, with joins, filters, lookups and other transformations.
• Package
  • Same as creating workflows in Informatica.
  • Execute multiple interfaces and procedures in a package, according to business rules, with the ability to send email, check variables, perform if-else logic, etc.

Page 6: Knowledge Modules

• Can be thought of as Informatica’s mapplets.
• Reusable code templates provided to perform certain standard integration operations.
• In ODI Designer, after a project is created, you must import certain KMs, depending on the mapping logic and business rules, before you can create a new interface.
• Types of KM (covered on the following pages): Reverse (RKM), Journalize (JKM), Loading (LKM), Check (CKM) and Integration (IKM).

Page 7: Reverse KM (RKM)

• Used to perform a customized reverse-engineering of data models for a specific technology, and to retrieve the model metadata into the ODI work repository.
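
On Oracle, for example, the metadata an RKM retrieves ultimately comes from dictionary views; below is a minimal sketch of the kind of query involved (the schema name is hypothetical, and the ODI-internal repository tables the results feed into are omitted).

    -- Collect column metadata for every table in a source schema;
    -- a customized RKM writes such results into the work repository.
    SELECT table_name,
           column_name,
           data_type,
           data_length,
           nullable
    FROM   all_tab_columns
    WHERE  owner = 'SALES'
    ORDER  BY table_name, column_id;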

Page 8: Journalize KM (JKM)

• Used to create a journal of data modifications (insert, update and delete) on the source databases, to keep track of the changes.
• Not used in an interface, but rather defined on a model.
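
Many JKMs implement journalizing with a journal table plus a trigger on the source table; the sketch below is a simplified stand-in, not ODI’s exact J$ structure.

    -- Simplified journal: one row per change to the source table.
    CREATE TABLE j_customers (
        jrn_flag CHAR(1),   -- 'I' = insert/update, 'D' = delete
        jrn_date DATE,
        cust_id  NUMBER     -- primary key of the journalized table
    );

    -- Trigger that records every change in the journal.
    CREATE OR REPLACE TRIGGER trg_j_customers
    AFTER INSERT OR UPDATE OR DELETE ON customers
    FOR EACH ROW
    BEGIN
        IF DELETING THEN
            INSERT INTO j_customers VALUES ('D', SYSDATE, :OLD.cust_id);
        ELSE
            INSERT INTO j_customers VALUES ('I', SYSDATE, :NEW.cust_id);
        END IF;
    END;
    /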

Page 9: Loading KM (LKM)

• Used in interfaces to extract data from source database tables and other systems (files, middleware, mainframe, etc.).
• Not required when all source datastores reside on the same data server as the staging area.
• Implements the declarative rules that need to be executed on the source server, and retrieves a single result set that it stores in a “C$” table in the staging area.
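
A minimal sketch of what an LKM effectively does (names are illustrative; the real C$ work tables are generated and dropped by ODI automatically):

    -- Work table created in the staging area for one interface run.
    CREATE TABLE c$_orders (
        order_id NUMBER,
        cust_id  NUMBER,
        amount   NUMBER
    );

    -- Rows filtered on the source server are fetched over the source
    -- connection (here via a database link) into the C$ table.
    INSERT INTO c$_orders (order_id, cust_id, amount)
    SELECT o.order_id, o.cust_id, o.amount
    FROM   orders@src_db o
    WHERE  o.status = 'SHIPPED';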

Page 10: Check KM (CKM)

• Used to check that constraints on the sources and targets are not violated.
• The CKM can be used in two ways:
  • To check the consistency of existing data. This can be done on any datastore, or within interfaces by setting the STATIC_CONTROL option to “Yes”. In the first case, the data checked is the data currently in the datastore; in the second case, the data in the target datastore is checked after it is loaded.
  • To check the consistency of incoming data before loading the records into a target datastore. This is done with the FLOW_CONTROL option. In this case, the CKM simulates the constraints of the target datastore on the resulting flow prior to writing to the target.
• The CKM accepts a set of constraints and the name of the table to check. It creates an “E$” error table to which it writes all the rejected records.
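
A minimal sketch of the flow-control pattern (illustrative names; the real E$ error tables and their audit columns are generated by ODI): rows that violate a target constraint are copied to the error table, then removed from the flow before the load.

    -- Move rows violating a NOT NULL constraint into the error table...
    INSERT INTO e$_orders (order_id, cust_id, amount, err_message)
    SELECT i.order_id, i.cust_id, i.amount, 'CUST_ID is null'
    FROM   i$_orders i
    WHERE  i.cust_id IS NULL;

    -- ...then delete them from the flow so only valid rows reach the target.
    DELETE FROM i$_orders
    WHERE  cust_id IS NULL;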

Page 11: Integration KM (IKM)

• Used in interfaces to integrate (load) data into the target datastore, either directly from the source (when the source datastore exists on the same data server as the staging area) or from the loading tables (i.e. the C$ tables loaded by the LKM, when a remote datastore sits on a separate data server from the staging area).
• Data might be inserted, updated, or handled as a slowly changing dimension.
• Integration modes:
  • Append – rows are appended to the target table; existing records are not updated. It is possible to delete all rows before performing the insert by setting the optional truncate property.
  • Control Append – performs the same operation as Append, but in addition the data flow can be checked by setting the flow-control property. Flow control checks data quality, ensuring that all references are validated before loading into the target.
  • Incremental Update – used for performing inserts and updates. Existing rows are updated and non-existing rows are inserted using the natural key defined in the interface, along with checking flow control (see the sketch after this list).
  • Slowly Changing Dimension – used to maintain Type 2 SCDs for slowly changing attributes.
• Types of IKM:
  • When the staging area is on the same server as the target.
  • When the staging area is on a different server than the target (also referred to as multi-technology IKMs).
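
A minimal sketch of the Incremental Update mode as one set-based statement (illustrative names; an actual IKM stages the flow in an I$ table and flags rows for insert or update rather than issuing a bare MERGE):

    -- Update-or-insert keyed on the natural key (order_id here):
    -- matching target rows are updated, unmatched rows are inserted.
    MERGE INTO dw_orders t
    USING i$_orders i
    ON    (t.order_id = i.order_id)
    WHEN MATCHED THEN
        UPDATE SET t.cust_id = i.cust_id,
                   t.amount  = i.amount
    WHEN NOT MATCHED THEN
        INSERT (order_id, cust_id, amount)
        VALUES (i.order_id, i.cust_id, i.amount);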

Page 12: IKM – Staging Area on the Same Server as the Target

• The IKM executes a single set-oriented SQL statement that applies the declarative rules to all “C$” tables and to source tables local to that server, generating the result set (see the sketch below).
• The IKM can write the result set either
  • directly into the target table (in the case of “Append”), or
  • into an integration table “I$” (e.g. for Incremental Update or SCD) before loading into the target. The integration table (also called the flow table) is an image of the target table with a few extra fields required to carry out specific operations on the data before loading it into the target. Rows in this table are flagged for insert/update, transformed, and checked against constraints; invalid rows are identified using the CKM, loaded into the “E$” table and removed from the “I$” table.
• After data loading completes, the IKM drops the temporary integration tables.
• The IKM can optionally call the CKM to check the consistency of the target datastore.
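
A minimal sketch of that single set-oriented statement (illustrative names): the C$ table is joined with a source table local to the target server, and the mapped result populates the I$ flow table in one pass.

    -- Apply the interface's joins and mapping expressions in one statement,
    -- flagging every row for insert ('I'); a later pass may switch flags
    -- to 'U' for rows that already exist in the target.
    INSERT INTO i$_orders (order_id, cust_id, amount, ind_update)
    SELECT c.order_id,
           c.cust_id,
           c.amount,
           'I'
    FROM   c$_orders c
    JOIN   customers cu            -- datastore local to the target server
           ON cu.cust_id = c.cust_id
    WHERE  cu.status = 'ACTIVE';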

Page 13: IKM – Staging Area on a Different Server from the Target

• Used for target data servers with no transformation capabilities, where only simple integration modes are possible (e.g. server to file).
• CKM operations cannot be performed with this strategy.