sap businessobjects data services - community … sap businessobjects data services one tool for...

54
SAP BusinessObjects Data Services Technical Overview

Upload: truonghanh

Post on 28-May-2018

238 views

Category:

Documents


0 download

TRANSCRIPT

SAP BusinessObjects

Data Services

Technical Overview

© SAP 2008 / Page 2

1. Data Services Technical Architecture

2. Developing in Data Services

3. Data Quality Management

3. Deploying in Data Services

4. Deploying for Performance

5. Managing with Data Services

6. Maintenance with Data Services

Agenda

© SAP 2008 / Page 3© SAP 2007/Page 3

Are you spending too much resources learning

different tools?

Separate products create the need for redundant investments in training, software management, and life-cycle support

DI Development UI

DI Metadata

DATA INTEGRATOR

DI Engine

DQ Development (UI)

DQ Metadata

DATA QUALITY

DQ Engine

© SAP 2008 / Page 4

SAP BusinessObjects Data Services

One Runtime Architecture

One Development User Interface

One Metadata

Repository

One Administration Environment

One Set of Connectors

Data Services

Improve

Deliver

Transform

Access

Runtime Architecture

Metadata

Repository

Development User Interface

Administration and Connectors

Runtime Architecture

Metadata

Repository

Development User Interface

Administration and Connectors

Data Integrator

Data Quality

Data Services is the first single solution combining data integration and data quality

© SAP 2008 / Page 5© SAP 2007/Page 5

Introducing SAP BusinessObjects

Data Services

One tool for enterprise-class data integration and data quality

One intuitive development user interface (UI)

One administration environment

One runtime architecture

DATA SERVICESFirst single, enterprise-class data

Integration and quality application

One Runtime Architecture

One Development User Interface

One Metadata

Repository

One Administration Environment

One Set of Connectors

Data Services

Improve

Deliver

Transform

Access

© SAP 2008 / Page 6

SAP BusinessObjects Data Services

Gold Award Winner – 2008 Product of the Year

Source: SearchDataManagement.com, April 2009

© SAP 2008 / Page 7

Real Time

What is Data Services?

Technology with capabilities to

Explore (profile), Transform, and Move

data anywhere at any frequency

IMPROVE data (parse, cleanse, enhance,

match, consolidate)

Deliver auditable information

Maximize developer productivity

Deliver extreme scalability

On single servers

On multiple servers (Grid computing)

Benefits:

Database / Application agnostic

Total graphical environment maximizes

developer productivity with little or no

coding

Investment protected with extreme

scalability possibility

PhysicalData Integration

Semantic Layer

Batch

© SAP 2008 / Page 8

Use Case: Data Integrator

Integrate Heterogeneous Data

CRM

RDBMS

Excel

Emails

Files (Flat, XML)

Web

Mainframes

ERP

Original,

Raw data

Loads properly formatted and

structured data into Target/s

Richard Tan Hock Guan

Future Electornics

Singapore 068804

Tan Hok Guan

Future Elect. Co.

Singapore

Tan Hock Guan

Future & Electronics Co.

Singapore

Business

Intelligence

SAP BW

SAP R/3

CRM

Data

Mart

Data

Warehouse

Extracts &

Transforms

data

Data Integrator

Disparate sources One or more Targets

Tan, Hok Guan

Future Elect. Co.

7th Floor, SGX Center

2A Shenton Way, #01

Singapore

Mr. Richard Tan Hock Guan

Future Electornics

2 Shenton Way

SGX Centre 1, #07-01, S068804

Hock Guan Tan

Future & Electronics Co.

Shenton House

3 Shenton Way, #01-07

Singapore

© SAP 2008 / Page 9

Richard Tan Hock Guan

Future Electronics

2 Shenton Way

#07-01 SGX Centre 1

Singapore 068804

Tan Hock Guan

Future & Electronics Co.

3 Shenton Way

#07-01 Shenton House

Singapore 068805

Tan Hock Guan

Future & Electronics Co.

3 Shenton Way

#07-01 Shenton House

Singapore 068805

Original, Raw dataStandardized,

Corrected data

Matched,

De-Duplicated

data

Richard Tan Hock Guan

Future Electornics

2 Shenton Way

#07-01 SGX Centre 1

Singapore 068804

Tan Hok Guan

Future Elect.

2 Shenton Way

#07-01 SGX Centre 1

Singapore 068804

Apply “Fuzzy”

match

techniques

Using Dictionaries

+ Directories Rules

Data Quality

Management

Uncleaned, disparate data Cleansed data De-duped data

Tan, Hok Guan

Future Elect. Co.

7th Floor, SGX Center

2A Shenton Way, #01

Singapore

Mr. Richard Tan Hock Guan

Future Electornics

2 Shenton Way

SGX Centre 1, #07-01, S068804

Hock Guan Tan

Future & Electronics Co.

Shenton House

3 Shenton Way, #01-07

Singapore

Use Case: Data Quality Management

Scheduled Cleansing/Matching

© SAP 2008 / Page 10

Data Services Technical Architecture

Architecture

© SAP 2008 / Page 11© SAP 2008 / Page 11

Client-based (for job design)

and mostly Web-based (for

all other tasks)

Servers run on Windows and

UNIX (including Linux), while

Client runs only on Windows

Data Services Architecture

Designer (Windows) Administrator (Web)

Job Server and Engine

Heterogeneous

Data Sources Heterogeneous

Data Targets

Web

Applications

Local

RepositoryCentral

Repository

Dictionaries

+ Directories

Profiler

Repository

Request-Response

Access Server

Real-time

Services

© SAP 2008 / Page 12© SAP 2008 / Page 12

Data Services Physical Architecture

DI Designer (Windows) DI Administrator (Web)

Request-Response

Access Server

Real-time

Services

Job Server and Engine

Web Applications

Local

RepositoryCentral

Repository

Profiler

Repository

Server

(Windows,

UNIX, Linux)(Windows/Linux 32-bit,

UNIX 64-bit)

Repository (Logical –

repositories can be

separate/different

databases on

different servers)

Client

Global Parsing

Repository

Address

Directories

© SAP 2008 / Page 13

Developing in Data Services

Development Environment

© SAP 2008 / Page 14© SAP 2008 / Page 14

Data Services Designer

Repository

Window

Job

Dataflow

Project

Source

Target

Status

• Job Server

• Profiler Server

Workflow /

Dataflow

Canvas

© SAP 2008 / Page 15

Offers tremendous productivity

Pre-built (no coding) access to common

databases, common ERP applications

Access to ERP applications is via ERP

application layer, maintaining application

integrity

Exposes the application’s metadata layer

Pre-Built Interfaces

Databases:

• Oracle

• DB2

• Sybase & IQ

• SQL Server

• Informix

• Teradata

• ODBC

• MySQL

• Netezza

Applications:

• JD Edwards

• Oracle Apps

• PeopleSoft

• Siebel

• SFDC

• SAP BI

• SAP ERP (R/3)

• ABAP

• BAPI

• IDoc

Files & Transport:

• Text delimited

• Text fixed width

• EBCDIC

• XML

• Cobol

• Excel

• HTTP

• JMS

• SOAP (Web

Services)

Mainframe*

(with partner):

• ADABAS

• ISAM

• VSAM

• Enscribe

• IMS/DB

• RMS

• Both direct and

change data

© SAP 2008 / Page 16© SAP 2008 / Page 16

Connecting to Datastore Source / Target

Right-click to create Connection

Select Name, Type

Enter Connection Details

Message

1

2

© SAP 2008 / Page 17© SAP 2008 / Page 17

Connecting to File Source

Right-click to create File Schema

and Connection

1

Enter Connection Details2

© SAP 2008 / Page 18

Data Profiling

Analysis of data beyond viewing

Frequency distribution

Distinct values

Null values

Minimum/Maximum values

Data Patterns (e.g. Xxx Xxxx99,

99-Xxx)

Can drill down to view specific

records

© SAP 2008 / Page 19

Relationship Analysis

Comparison of values between data

sets to determine fit

Shows % of non-matching values

among

Table - Table

Flat file - Flat file

Table - Flat file

Can drill down to view actual

records

© SAP 2008 / Page 20

Interactive Debugger

Very useful for checking the logic of a

dataflow

Can examine and modify data row-by-row

Can place filters and breakpoints to pause

execution of job and returns control to user

Can break after processing a number of rows

© SAP 2008 / Page 21

Data Quality Management

Data Quality

© SAP 2008 / Page 22

Address Assignment Levels for Countries

Last-Line (or Country, Locality/City, Postcode) (248 countries/territories):

Singapore, Malaysia, India, etc.

Primary/Street name: (41 countries):

Brazil, Cyprus, French Guiana, French Polynesia, Greece, Greenland, Guadeloupe, Guam,

Martinique, Mayotte, Monaco, New Caledonia, New Zealand, Northern Mariana Islands,

Palau, Reunion, Saint Pierre and Miquelon, San Marino, Wallis and Futuna, (including the

ones listed under Primary/Street range)

Primary/Street range (to unit level - 25 countries):

American Samoa, Australia*, Austria, Belgium, Canada*, Denmark, Faroe Islands, Finland,

France, Germany, Italy, Japan*, Liechtenstein, Luxembourg, Netherlands, Norway, Poland,

Portugal, Puerto Rico, Spain, Sweden, Switzerland, United Kingdom, United States*, U.S.

Virgin Islands,

Note:

1. * - Requires Country-specific Engine

2. Address-line (or Unit) assignment level for Singapore before end

2008 through license with SingaporePost

© SAP 2008 / Page 23

More on Address Cleansing / Verification

Original Addresses

Cleansed Addresses

For some countries,

Address Cleanse

can correct or add

postal codes

© SAP 2008 / Page 24

More on Data (Customer) Cleansing

Original Customer Data

Cleansed Customer Data

Parsing (Identification and

Isolation of specific parts of

mixed data) can be extended

to non-Customer information

© SAP 2008 / Page 25

Universal Data Cleanse

Original, Raw dataCustomized

Dictionary /

Rules

Standardized and properly

formatted data

Custom Dictionary / Rules

Data Quality

Management

Uncleaned Product data Cleansed data

© SAP 2008 / Page 26

Rule1 = Size + Topping + Topping + Topping + Crust + Crust;

Action = Pizza;

Pizza = 1 : Pizza_Crust : 5;

Pizza = 1 : Pizza_Crust : 6;

Pizza = 1 : Pizza_Topping : 2;

Pizza = 1 : Pizza_Topping : 3;

Pizza = 1 : Pizza_Topping : 4;

Pizza = 1 : Pizza_Size : 1;

End_Action;

Universal Data Cleanse

Custom Parser

Word breaking

Break the input down into smaller pieces

Tokenization

Assign meaning to the pieces

Rule matching (pattern)

Match the pieces to rules

Actions & Action item

assignment

Create output from rule matches

Word Break Tokenize Rule Match Action

Large mushrooms sausage pepperoni stuffed crust

Pizza

Pizza_Crust

Pizza_Topping

Pizza_Size

© SAP 2008 / Page 27

Universal Data Cleanse

Custom Dictionaries

Dictionary

entries can

be added

or modified

Entire new

dictionary

can be

createdRule: Replace this word白い with “white”

when it is classified as COLOR

© SAP 2008 / Page 28

Deploying in Data Services

Deployment Environment

© SAP 2008 / Page 29

Multi-User Development Environment

DEV

Job Server

Central RepoDEV

TEST

Local Repo

TEST

Job Server

TEST PROD

PROD

Local Repo

PROD

Job Server

PROD

Job Server

PROD

Job Server

PROD

Job Server

Server Group

Central Repo

Check in GetGet Check in

Local Repo

Local Repo

Local Repo

Local Repo

© SAP 2008 / Page 30

Check-In/Check-Out Management

Sophisticated method for sharing/moving DI objects

Involves check-in/out of objects with versioning

Local

Centralized

© SAP 2008 / Page 31

Real-Time vs Batch (Scheduled) Processing

Support

Typical Batch

Job

Real-Time capabilities built into the engine in the form

of Request-Response message processing

ERP or Web

applications Data Services

Job Server and Engine

Message

Response

Immediate action

upon receiving

Message

Immediate action

upon receiving

Response

12

3

4

Access Server

Real-time

Services

Real-time Jobs

Listener

Converted to

Real-Time Job

© SAP 2008 / Page 32

Invoking/Consuming Data Services-Produced

Web Services

External

Web Service

client Request-Response

Access Server

Real-time

Services

Job Server and Engine

Repository

Data Services

Web Server

Web

Services

server

Invoke

Real-time

jobs

Invoke

Batch jobs

12

3

4a

4b

4c

5a

5b

© SAP 2008 / Page 33

Consuming 3rd

Party Web Services

Define Web Service

source

Use imported Web

Service operation as

standard function

2

1

© SAP 2008 / Page 34

Deploying for Performance

Deployment Environment

© SAP 2008 / Page 35© SAP 2008 / Page 35

Data Services Datastore Configurations

Data Sources/Targets can be

switched between different

development environments

© SAP 2008 / Page 36© SAP 2008 / Page 36

Data Services

System Configurations

System

configuration

DEV

System

configuration

PROD

Substitution Parameter

ConfigurationDEV PROD

$$FILE_PATH C:\TEMP C:\PROD

$$LOG_PATH C:\LOG C:\LOG

Datastore

Source

Datastore

Target

Datastore

configurationConfiguration1

Database type Oracle

Server name SJ-PROD

Database name Source_Data

User sd_5263

Password *****

Datastore

configurationLocalServer ProdServer

Database type mySQL Oracle

Server name Localhost SJ-PROD

Database name Source_Data Source_Data

User sa sd_5263

Password ***** *****

© SAP 2008 / Page 37

Parallel Execution

via explicit Parallel Data flows

Improving Performance - 1

The Dimension

data flows run in

parallel

The Dimensions data

flow run before the

Facts data flow

© SAP 2008 / Page 38

Parallel Execution

via Partitioned source and target

Improving Performance - 2

Non-partitioned data flow at Design Time

Partitioned at Execution time

If Source is

partitioned

2 ways

If Target is

partitioned 2

ways

Specifying

Partitioning

scheme

© SAP 2008 / Page 39

Improving Performance - 3

Parallel Execution

via specifying Degree of Parallelism (DOP) > 1

At Design Time

At Execution Time

DOP=2

© SAP 2008 / Page 40

Distributing Data Flow Execution - 1

Run as a separate process for the

following resource-intensive

operations

Hierarchy_Flattening

Join

GROUP_BY

ORDER_BY

DISTINCT

Table_Comparison

Lookup_ext function

Count_distinct function

Associate

CountryID

Global Address Cleanse

Global Suggestion List

Match

User-Defined

© SAP 2008 / Page 41

Distributing Data Flow Execution - 2

Various ways available to improve

performance

Flexible ETL (traditional transform)

Hybrid “ELT” (transform at target, transform

at source)

Concept is “Push-Down” processing Target

Data Integrator

Source

Traditional ETL

Target

Data Integrator

Source

Transform at Target

Target

Data Integrator

Source

Transform at Source

© SAP 2008 / Page 42

Distributing Data Flow Execution - 3

“Push-Down” in detail

“Push-Down” can be staged

onto database or file directory

To push down

operations to the

source server

To push down the

GroupBy operation

to the target server

Original

Data Flow

Data Flow with

Push-Down’s

Push-Down settings

© SAP 2008 / Page 43

Distributing Data Flow Execution - 4

TargetSource

Grid deployment (across multiple servers)

is easy

No changes required in Job flows

Simply specify at execution time what to

distribute across servers – Job, Data flow, or

sub-Data flow

© SAP 2008 / Page 44

Distributing Data Flow Execution - 5

Distribution Level 101

Job level distribution

All processes belonging to job execute on

the same computer

All sub-data flows run on same computer

© SAP 2008 / Page 45

Distributing Data Flow Execution - 6

Computer 1 Computer 2

Distribution Level 101

Data flow level distribution

All processes of each data flow can execute on a different computer

© SAP 2008 / Page 46

Distributing Data Flow Execution - 7

Computer 1Computer 3

Computer 2

Computer 4

Distribution Level 101

Sub data flow level distribution

Each sub data flow can execute on a different computer

© SAP 2008 / Page 47

Managing with Data Services

Management Environment

© SAP 2008 / Page 48

General Management

Web-based Management with graphical dash-board info

© SAP 2008 / Page 49

Recovery From Unsuccessful Job Execution

Automated Recovery

With this set, DI records the result of each successful

step in a job

During recovery, DI re-runs uncompleted or failed

steps under the same conditions as the original job

There is Manual Recovery, too

© SAP 2008 / Page 50

Maintenance with Data Services

Maintenance Environment

© SAP 2008 / Page 51

Auto-Reporting

Via Web-based Management

Console

Reports automatically generated

Report details can be interactively

chosen and drilled into

Reports can be printed

Interactive

Dataflow graphics

© SAP 2008 / Page 52

Lineage And Impact

Provides analyzing dependencies for

Impact (source to which target/s)

Lineage (target back to which source/s)

© SAP 2008 / Page 53

Thank you!

© SAP 2008 / Page 54

Copyright 2009 SAP AG

All Rights Reserved

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein

may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

SAP, R/3, xApps, xApp, SAP NetWeaver, Duet, SAP Business ByDesign, ByDesign, PartnerEdge and other SAP products and services mentioned herein as well as their

respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world.

Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius and other Business Objects products and

services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in several other

countries. Business Objects is an SAP Company. All other product and service names mentioned and associated logos displayed are the trademarks of their respective

companies. Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the

express prior written permission of SAP AG. This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This

document contains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of

business, product strategy, and/or development. Please note that this document is subject to change and may be changed by SAP at any time without notice. SAP assumes

no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items

contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of

merchantability, fitness for a particular purpose, or non-infringement.

SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these

materials. This limitation shall not apply in cases of intent or gross negligence.

The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links

contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.