BODI Training v1.1


Business Objects Data Integrator Training
BODI Version XI3

Audience

• Application Developers

• Consultants

• Database Administrators working on data extraction, data warehousing, or data integration.


Assumptions

• You understand your source data systems, RDBMS, business intelligence, and e-commerce messaging concepts.

• You are familiar with SQL (Structured Query Language).

• You are familiar with Microsoft Windows or UNIX platforms, so you can use Data Integrator effectively.

Business Objects Data Integration Platform

• The Data Integration Platform consists of

– Data Integrator: data movement and management server

– Rapid Marts: a suite of packaged data marts for speedy delivery and integration of data



Rapid Mart SAP R/3 Modules

Accounts Payable    ----> FI-Finance
Accounts Receivable ----> FI-Finance
Cost Center         ----> CO-Controlling
Human Resources     ----> HR-Human Resources
Inventory           ----> MM-Materials Management
Plant Maintenance   ----> PM-Plant Maintenance
Production Planning ----> PP-Production Planning
Project Systems     ----> PS-Project Systems
Purchasing          ----> MM-Materials Management
Sales               ----> SD-Sales and Distribution

Data Integrator

• DI is a data movement and integration platform


Data Integrator Architecture


Data Integrator operating system platforms

• DI Designer runs on the following Windows platforms:
  – NT
  – 2000 Professional
  – 2000 Server
  – 2000 Advanced Server
  – 2000 Datacenter Server
  – XP

• All other DI components run on the above Windows platforms and on the following UNIX platforms:
  – Solaris 2.7 and 2.8 (SunOS releases 5.7 and 5.8)
  – HP-UX 11.00 (PA-RISC 2.0) and 11.1
  – IBM AIX 4.3.3.75 with maintenance level 4330-10, and AIX 5.1

Data Integrator Components

• Standard components are:
  – DI Job Server
  – DI Engine
  – DI Designer
  – DI Repository
  – DI Access Server
  – DI Administrator
  – DI Metadata Reports tool
  – DI Web Server
  – DI Service
  – DI SNMP Agent

Data Integrator Component Relationships


Data Integrator Components

• DI Job Server
  – Starts the data movement engine that integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions.
  – Can move data in either batch or real-time mode, and uses distributed query optimization, multi-threading, in-memory caching, in-memory data transformations, and parallel pipelining to deliver high data throughput and scalability.

Data Integrator Components

• DI Engine
  – When DI jobs are executed, the Job Server starts DI engine processes to perform data extraction, transformation, and movement.
  – DI engine processes use parallel pipelining and in-memory data transformations to deliver high data throughput and scalability.

Data Integrator Components

• DI Designer
  – Allows you to define data management applications consisting of data mappings, transformations, and control logic.
  – A development tool with a graphical user interface. It enables developers to create objects, then drag, drop, and configure them by selecting icons in flow diagrams, table layouts, and nested workspace pages.

Data Integrator Components

• DI Repository
  – A set of tables that hold user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate the sharing of metadata with other enterprise tools. Each repository is stored on an existing RDBMS.
  – Associated with one or more DI Job Servers.

Data Integrator Components

– There are two types of repositories:
  • A local repository is used by an application designer to store definitions of DI objects (such as projects, jobs, work flows, and data flows) and source/target metadata.
  • A central repository is an optional component that can be used to support multi-user development. The central repository provides a shared object library, allowing developers to check objects in and out of their local repositories.

Data Integrator Components

• DI Access Server
  – The Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame.
  – The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can start additional real-time services on additional computing resources if traffic for a given real-time service is high.
  – Multiple Access Servers can also be configured.

Data Integrator Components

• DI Administrator
  – Provides browser-based administration of DI resources, including:
    • Scheduling, monitoring, and executing batch jobs
    • Configuring, starting, and stopping real-time services
    • Configuring Job Server, Access Server, and repository usage
    • Configuring and managing adapters
    • Managing users
    • Publishing batch jobs and real-time services via Web services

Data Integrator Components

• DI Metadata Reports tool
  – Provides browser-based reports on DI metadata, which is stored in the repository. Reports are provided for:
    • Repository summary
    • Job analysis
    • Execution statistics
    • Impact analysis

Data Integrator Components

• DI Web Server
  – Supports browser access to the Administrator and the Metadata Reports tool.
  – The Windows service name for this server is DI Web Server service.
  – The UNIX equivalent is a daemon named the Tomcat server. This is the servlet engine used by the DI Web Server.

Data Integrator Components

• DI Service
  – The DI Service is installed when DI Job and Access Servers are installed. The DI Service starts Job Servers and Access Servers when you reboot your system.
  – The Windows service name is DATA INTEGRATOR Service.
  – The UNIX equivalent is a daemon named AL_JobService.

Data Integrator Components

• DI SNMP Agent
  – DI error events can be communicated using SNMP-supported applications for better error monitoring.
  – The DI SNMP agent monitors and records information about the Job Servers and the jobs running on the computer where the agent is installed.

Data Integrator Management Tools

• License Server
  – The License Server allows you to centrally control license validation for DI components and licensed extensions.

• Repository Manager
  – The Repository Manager allows you to create, upgrade, and check the versions of local and central repositories.

• Server Manager
  – The Server Manager allows you to add, delete, or edit the properties of Job Servers and Access Servers. It is automatically installed on each computer on which you install a Job Server or Access Server.

Data Integrator Objects

• All “entities” you create, modify, or work with in DI Designer are called objects. The local object library shows objects such as source and target metadata, system functions, projects, and jobs.

• DI has two types of objects:
  – Reusable objects
    • Have a single definition
    • All calls to the object refer to the object definition
    • Changes in the object definition are propagated to all calls to the object definition
  – Single-use objects
    • Defined only within the context of a single job or data flow, e.g. scripts

Data Integrator Object Relationships


Projects

• A reusable object that allows you to group jobs.
• The highest level of organization offered by DI.
• Used to group jobs that have schedules that depend on one another or that you want to monitor together.
• Only one project can be open at a time.
• Projects cannot be shared among multiple users.

Jobs

• A job is the only object that is executed.
• The following objects can be included in a job definition:
  – Data flows
    • Transforms
  – Work flows
    • Scripts
    • Conditionals
    • While loops
    • Try/catch blocks

Datastores

• Represent connections between DI and databases or applications, directly or through adapters.
• Allow DI to access metadata from a database or application, and hence to read from or write to that database or application.
• DI datastores can connect to:
  – Databases and mainframe file systems
  – Applications that have pre-packaged or user-written DI adapters
  – SAP R/3, SAP BW, PeopleSoft, J.D. Edwards OneWorld, and J.D. Edwards World

File Formats

• DI can use data stored in files as data sources or data targets.
• File format objects can describe files in:
  – Delimited format: characters such as commas or tabs separate each field
  – Fixed-width format: the column width is specified by the user
  – SAP R/3 format

Data Flows

• Data flows extract, transform, and load data; reading sources, transforming data, and loading targets all occur inside a data flow.
• A data flow can be added to a job or a work flow.
• From inside a work flow, a data flow can send and receive information to and from other objects through input and output parameters.

Data Flows

Diagram: source(s) feed data transformation operations, which load the target(s); input parameters pass information into the data flow, and output parameters pass information out.

Work Flows

• A work flow defines the decision-making process for executing data flows.
• The purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete.
• The following objects can be elements in work flows:
  – Work flows
  – Data flows
  – Scripts
  – Conditionals
  – While loops
  – Try/catch blocks

Work Flows

Diagram: a work flow sequences control operations before and after a data flow (control, then the data flow, then control again).

Conditionals

• Conditionals are single-use objects used to implement if/then/else logic in a work flow.
• To define a conditional, you specify a condition and two logical branches:
  – If: a Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
  – Then: work flow elements to execute if the If expression evaluates to TRUE.
  – Else: (optional) work flow elements to execute if the If expression evaluates to FALSE.
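
As an illustration, the If expression is written in the DI script expression language. A minimal sketch of a conditional's three parts (the global variable $G_RunMode and the work flow names are hypothetical, not from this deck):

    If:    $G_RunMode = 'FULL'
    Then:  WF_FullLoad    (runs when the expression is TRUE)
    Else:  WF_DeltaLoad   (runs when the expression is FALSE)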


Conditionals

Diagram: a conditional inside a work flow. If the "Process Successful" expression evaluates to TRUE, the Then branch runs a work flow; if FALSE, the Else branch sends an e-mail.

While Loops

• The while loop is a single-use object that you can use in a work flow.
• The while loop repeats a sequence of steps as long as a condition is true.
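
As a sketch, the while loop's condition is a DI script expression, and its body can contain a script step. The variable, datastore, and table names here are hypothetical, and sleep() is assumed to take milliseconds:

    While condition:   $V_Remaining > 0

    Script step inside the loop:
        # Re-count the unprocessed rows, then wait a minute before checking again.
        $V_Remaining = sql('DS_STAGE', 'SELECT COUNT(*) FROM LOAD_QUEUE WHERE STATUS = 0');
        sleep(60000);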


While Loops

Diagram: while the condition Number != 0 evaluates to TRUE, Step 1 and Step 2 execute repeatedly; when it evaluates to FALSE, the loop exits.

Try / Catch Blocks

• A try/catch block is a combination of one try object and one or more catch objects that allow you to specify alternative work flows if errors occur while DI is executing a job.
• Try/catch blocks:
  – "Catch" classes of exceptions "thrown" by DI, the DBMS, or the operating system
  – Apply solutions that you provide
  – Continue execution
• Try and catch objects are single-use objects.
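
For example, a catch block typically runs a script that logs or reports the problem. A minimal sketch in the DI script language (the message text is illustrative; print() and raise_exception() are assumed to be available script functions):

    # Script placed inside a catch block:
    print('Error caught; running alternative work flow before stopping.');
    # After any cleanup, optionally fail the job with an explicit message:
    raise_exception('Nightly load aborted after catch-block cleanup.');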


Try / Catch Blocks

• Categories of available exceptions are:
  – Database access errors
  – Email errors
  – Engine abort errors
  – Execution errors
  – File access errors
  – Microsoft connection errors
  – Parser errors
  – Predefined transform errors
  – Repository access errors
  – Resolver errors
  – System exception errors
  – User transform errors

Scripts

• Scripts are single-use objects used to call functions and assign values to variables in a work flow.
• A script can contain the following statements:
  – Function calls
  – If statements
  – While statements
  – Assignment statements
  – Operators
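
A minimal script sketch combining these statement types (the datastore, table, and variable names are hypothetical):

    # Assignment statements and function calls:
    $V_StartTime = sysdate();
    $V_RowCount  = sql('DS_TARGET', 'SELECT COUNT(*) FROM SALES_FACT');

    # An if statement:
    if ($V_RowCount = 0)
    begin
        print('SALES_FACT is empty; treating this run as an initial load.');
    end
    else
    begin
        print('SALES_FACT already holds [$V_RowCount] rows.');
    end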


Types Of Lookup Functions

• Lookup functions retrieve a value in a table or file based on the values in a different source table or file. DI provides three of them:
  1) Lookup
  2) Lookup_Ext
  3) Lookup_Seq
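
As a sketch, a lookup() call as it might appear in a Query output-column mapping. The datastore, table, and column names are hypothetical, and the exact argument order should be checked against the DI function reference:

    # Return CUST_KEY from CUSTOMER_DIM where CUST_ID matches the source value;
    # return 0 when no match is found; pre-load the lookup table into memory.
    lookup(DS_DIM.DBO.CUSTOMER_DIM, CUST_KEY, 0, 'PRE_LOAD_CACHE',
           CUST_ID, ORDERS.CUST_ID)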


Variables

• Variables are symbolic placeholders for values.

  – Local variables
    • Local variables are local to the work flow in which they are defined; a local variable defined in a work flow is available for use in any of the single-use objects in that work flow.
    • The value of a local variable can be passed as a parameter into another work flow or data flow called from the work flow.

Variables

  – Global variables
    • Global variables are global within a job.
    • Once a name for a global variable is used in a job, that name becomes reserved for the job.
    • Global variables are exclusive within the context of the job in which they are created.
    • Setting parameters is not necessary when you use global variables.
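
A short sketch of both kinds of variables used in a script. Variables themselves are declared in the Designer; the $G_/$LV_ prefixes are only a common naming convention, and all names here are hypothetical:

    $G_LoadDate = sysdate();   # global variable: visible anywhere in the job
    $LV_Region  = 'EMEA';      # local variable: visible only in this work flow
    print('Loading region [$LV_Region] as of [$G_LoadDate].');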


Parameters

• Parameters are expressions passed to a work flow or data flow when the work flow or data flow is called.
• Parameters can be defined to pass values into and out of work flows, data flows, and custom functions.
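
As a sketch of how a value travels via a parameter (all names here are hypothetical): a work flow script sets a local variable, the variable is bound to the data flow's input parameter where the data flow is called, and the parameter is then referenced inside the data flow:

    # In the calling work flow's script:
    $LV_CutoffDate = sysdate();

    # $LV_CutoffDate is then bound to the data flow's input parameter
    # $P_CutoffDate at the point of call, and a Query inside the data flow
    # can reference it, e.g. in its WHERE clause:
    #     ORDERS.ORDER_DATE <= $P_CutoffDate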


Transforms

• The following transforms are available from the object library on the Transforms tab:
  – Case
  – Date_Generation
  – Effective_Date
  – Hierarchy_Flattening
  – History_Preserving
  – Key_Generation
  – Map_Operation
  – Merge
  – Pivot (Columns to Rows)
  – Query
  – Reverse Pivot (Rows to Columns)
  – Row_Generation
  – SQL
  – Table_Comparison

Query Transform

• Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.


Query Transform

Screenshot: the Query transform editor, showing the input schema pane, the output schema pane, and the options area.

Case Transform

• Specifies multiple paths in a single transform (different rows are processed in different ways).
• Simplifies branch logic in data flows by consolidating case or decision-making logic into one transform.
• Paths are defined in an expression table.


SQL Transform

• Performs the indicated SQL query operation.
• Use this transform to perform standard SQL operations for things that cannot be performed using other built-in transforms.
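
Related to this, free-form SQL can also be issued from a script with the sql() function; the SQL transform is the data flow equivalent for producing result sets. A sketch (the datastore name and SQL text are hypothetical):

    # Run free-form SQL from a script and capture a scalar result:
    $V_LastLoad = sql('DS_TARGET', 'SELECT MAX(LOAD_DATE) FROM ETL_AUDIT');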



Merge Transform

• Combines incoming data sets, producing a single output data set with the same schema as the input data sets.



Row_Gen Transform

• Produces a data set with a single column.

• The column values start from zero and increment by one to a specified number of rows.



Key_Generation Transform

• Generates new keys for new rows in a data set.
• The Key_Generation transform looks up the maximum existing key value in a table and uses it as the starting value to generate new keys.
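
DI also exposes this behavior as a script-level function, key_generation(). A sketch with hypothetical datastore, table, and column names (check the argument order against the function reference):

    # Read max(CUST_KEY) from CUSTOMER_DIM and return the next key,
    # incrementing by 1:
    $V_NextKey = key_generation('DS_TARGET.DBO.CUSTOMER_DIM', 'CUST_KEY', 1);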



Date_Generation Transform

• Produces a series of dates incremented as you specify.
• Use this transform to produce the key values for a time dimension target. From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.


Table_Comparison Transform

• Compares two data sets and produces their difference as a data set with rows flagged as INSERT or UPDATE.
• The Table_Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated.


Map_Operation Transform

• Allows conversions between data manipulation operations.

• The Map_Operation transform allows you to change operation codes on data sets to produce the desired output.

– For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target.



History_Preserving Transform

• The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row.
• You can indicate in which columns the transform identifies changes to be preserved.
• If the values of those columns change, this transform creates a new row for each row flagged as UPDATE in the input data set.

Pivot Transform (Columns to Rows)

• Creates a new row for each value in a column that you identify as a pivot column.
• The Pivot transform allows you to change how the relationship between rows is displayed.
• For each value in each pivot column, DI produces a row in the output data set.
• You can create pivot sets to specify more than one pivot column.

Pivot Transform (Columns to Rows)

Input:

Region  Sales-2001  Sales-2002  Sales-2003
North   200         300         400
East    300         600         700
West    350         800         770
South   800         200         3750

Output (rows for North shown):

Region  Year  Sales
North   2001  200
North   2002  300
North   2003  400

Reverse Pivot Transform (Rows to Columns)

• Creates one row of data from several existing rows.
• The Reverse Pivot transform allows you to combine data from several rows into one row by creating new columns.
• For each unique value in a pivot axis column and each selected pivot column, DI produces a column in the output data set.

Reverse Pivot Transform (Rows to Columns)

Input:

Region  Year  Sales
North   2001  200
North   2002  300
North   2003  400

Output:

Region  2001  2002  2003
North   200   300   400

Functions

• Functions operate on single values, such as values in specific columns in a data set.

• You can use functions in the following operations:
  – Queries
  – Scripts
  – Conditionals
• You can use:
  – Built-in functions (DI functions)
  – Custom functions (user-defined functions)
  – Database and application functions (functions specific to a DBMS)

Procedures

• DI supports the use of stored procedures for Oracle, Microsoft SQL Server, Sybase, and DB2 databases.
• You can call stored procedures from the jobs you create and run in DI.
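
Once a stored procedure has been imported into a datastore, it can typically be called like a function, for example from a script. A sketch with hypothetical datastore, owner, and procedure names:

    # Call the imported procedure DBO.REFRESH_SUMMARY in datastore DS_TARGET
    # and capture its return value:
    $V_Status = DS_TARGET.DBO.REFRESH_SUMMARY($G_LoadDate);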


Debugging

• Execute a job in the Data Scan mode

• View and analyze the output data in the Data Scan window

• Compare and analyze different data samples


Debugging – Data Scan Mode / Analyzing the Output

Screenshots: the Data Scan window, showing the object list, the scan date and time, the schema area, and the data area.

Migration and Repositories

• The development process you use to create your ETL application involves three distinct phases: design, test, and production.

• Each phase may require a different computer in a different environment, and different security settings for each.

• To control the environment differences, each phase may require a different repository.


Migration and Repositories

Diagram: objects are exported from the design repository to the test repository, and then from the test repository to the production repository.

Migration and Repositories

• When moving objects from one phase to another, export jobs from your source repository to either a file or a database, then import them into your target repository.

Exporting Objects to a Database

• You can export objects from the current repository to another repository.
• However, the other repository must be the same version as the current one.
• The export process allows you to change environment-specific information defined in datastores and file formats to match the new environment.

Exporting/Importing Objects to/from a File

• You can also export objects to a file.
• If you choose a file as the export destination, DI does not provide options to change environment-specific information.
• Importing objects or an entire repository from a file overwrites existing objects with the same names in the destination repository. You must restart DI after the import process completes.

Parallel Execution

• The maximum number of parallel DI engine processes is set in the Job Server options (Tools > Options > Job Server > Environment).
• This allows transforms to run in parallel.

Parallel Work Flows / Data Flows
