sap businessobjects data services - community … sap businessobjects data services one tool for...
TRANSCRIPT
© SAP 2008 / Page 2
1. Data Services Technical Architecture
2. Developing in Data Services
3. Data Quality Management
3. Deploying in Data Services
4. Deploying for Performance
5. Managing with Data Services
6. Maintenance with Data Services
Agenda
© SAP 2008 / Page 3© SAP 2007/Page 3
Are you spending too much resources learning
different tools?
Separate products create the need for redundant investments in training, software management, and life-cycle support
DI Development UI
DI Metadata
DATA INTEGRATOR
DI Engine
DQ Development (UI)
DQ Metadata
DATA QUALITY
DQ Engine
© SAP 2008 / Page 4
SAP BusinessObjects Data Services
One Runtime Architecture
One Development User Interface
One Metadata
Repository
One Administration Environment
One Set of Connectors
Data Services
Improve
Deliver
Transform
Access
Runtime Architecture
Metadata
Repository
Development User Interface
Administration and Connectors
Runtime Architecture
Metadata
Repository
Development User Interface
Administration and Connectors
Data Integrator
Data Quality
Data Services is the first single solution combining data integration and data quality
© SAP 2008 / Page 5© SAP 2007/Page 5
Introducing SAP BusinessObjects
Data Services
One tool for enterprise-class data integration and data quality
One intuitive development user interface (UI)
One administration environment
One runtime architecture
DATA SERVICESFirst single, enterprise-class data
Integration and quality application
One Runtime Architecture
One Development User Interface
One Metadata
Repository
One Administration Environment
One Set of Connectors
Data Services
Improve
Deliver
Transform
Access
© SAP 2008 / Page 6
SAP BusinessObjects Data Services
Gold Award Winner – 2008 Product of the Year
Source: SearchDataManagement.com, April 2009
© SAP 2008 / Page 7
Real Time
What is Data Services?
Technology with capabilities to
Explore (profile), Transform, and Move
data anywhere at any frequency
IMPROVE data (parse, cleanse, enhance,
match, consolidate)
Deliver auditable information
Maximize developer productivity
Deliver extreme scalability
On single servers
On multiple servers (Grid computing)
Benefits:
Database / Application agnostic
Total graphical environment maximizes
developer productivity with little or no
coding
Investment protected with extreme
scalability possibility
PhysicalData Integration
Semantic Layer
Batch
© SAP 2008 / Page 8
Use Case: Data Integrator
Integrate Heterogeneous Data
CRM
RDBMS
Excel
Emails
Files (Flat, XML)
Web
Mainframes
ERP
Original,
Raw data
Loads properly formatted and
structured data into Target/s
Richard Tan Hock Guan
Future Electornics
Singapore 068804
Tan Hok Guan
Future Elect. Co.
Singapore
Tan Hock Guan
Future & Electronics Co.
Singapore
Business
Intelligence
SAP BW
SAP R/3
CRM
Data
Mart
Data
Warehouse
Extracts &
Transforms
data
Data Integrator
Disparate sources One or more Targets
Tan, Hok Guan
Future Elect. Co.
7th Floor, SGX Center
2A Shenton Way, #01
Singapore
Mr. Richard Tan Hock Guan
Future Electornics
2 Shenton Way
SGX Centre 1, #07-01, S068804
Hock Guan Tan
Future & Electronics Co.
Shenton House
3 Shenton Way, #01-07
Singapore
© SAP 2008 / Page 9
Richard Tan Hock Guan
Future Electronics
2 Shenton Way
#07-01 SGX Centre 1
Singapore 068804
Tan Hock Guan
Future & Electronics Co.
3 Shenton Way
#07-01 Shenton House
Singapore 068805
Tan Hock Guan
Future & Electronics Co.
3 Shenton Way
#07-01 Shenton House
Singapore 068805
Original, Raw dataStandardized,
Corrected data
Matched,
De-Duplicated
data
Richard Tan Hock Guan
Future Electornics
2 Shenton Way
#07-01 SGX Centre 1
Singapore 068804
Tan Hok Guan
Future Elect.
2 Shenton Way
#07-01 SGX Centre 1
Singapore 068804
Apply “Fuzzy”
match
techniques
Using Dictionaries
+ Directories Rules
Data Quality
Management
Uncleaned, disparate data Cleansed data De-duped data
Tan, Hok Guan
Future Elect. Co.
7th Floor, SGX Center
2A Shenton Way, #01
Singapore
Mr. Richard Tan Hock Guan
Future Electornics
2 Shenton Way
SGX Centre 1, #07-01, S068804
Hock Guan Tan
Future & Electronics Co.
Shenton House
3 Shenton Way, #01-07
Singapore
Use Case: Data Quality Management
Scheduled Cleansing/Matching
© SAP 2008 / Page 11© SAP 2008 / Page 11
Client-based (for job design)
and mostly Web-based (for
all other tasks)
Servers run on Windows and
UNIX (including Linux), while
Client runs only on Windows
Data Services Architecture
Designer (Windows) Administrator (Web)
Job Server and Engine
Heterogeneous
Data Sources Heterogeneous
Data Targets
Web
Applications
Local
RepositoryCentral
Repository
Dictionaries
+ Directories
Profiler
Repository
Request-Response
Access Server
Real-time
Services
© SAP 2008 / Page 12© SAP 2008 / Page 12
Data Services Physical Architecture
DI Designer (Windows) DI Administrator (Web)
Request-Response
Access Server
Real-time
Services
Job Server and Engine
Web Applications
Local
RepositoryCentral
Repository
Profiler
Repository
Server
(Windows,
UNIX, Linux)(Windows/Linux 32-bit,
UNIX 64-bit)
Repository (Logical –
repositories can be
separate/different
databases on
different servers)
Client
Global Parsing
Repository
Address
Directories
© SAP 2008 / Page 14© SAP 2008 / Page 14
Data Services Designer
Repository
Window
Job
Dataflow
Project
Source
Target
Status
• Job Server
• Profiler Server
Workflow /
Dataflow
Canvas
© SAP 2008 / Page 15
Offers tremendous productivity
Pre-built (no coding) access to common
databases, common ERP applications
Access to ERP applications is via ERP
application layer, maintaining application
integrity
Exposes the application’s metadata layer
Pre-Built Interfaces
Databases:
• Oracle
• DB2
• Sybase & IQ
• SQL Server
• Informix
• Teradata
• ODBC
• MySQL
• Netezza
Applications:
• JD Edwards
• Oracle Apps
• PeopleSoft
• Siebel
• SFDC
• SAP BI
• SAP ERP (R/3)
• ABAP
• BAPI
• IDoc
Files & Transport:
• Text delimited
• Text fixed width
• EBCDIC
• XML
• Cobol
• Excel
• HTTP
• JMS
• SOAP (Web
Services)
Mainframe*
(with partner):
• ADABAS
• ISAM
• VSAM
• Enscribe
• IMS/DB
• RMS
• Both direct and
change data
© SAP 2008 / Page 16© SAP 2008 / Page 16
Connecting to Datastore Source / Target
Right-click to create Connection
Select Name, Type
Enter Connection Details
Message
1
2
© SAP 2008 / Page 17© SAP 2008 / Page 17
Connecting to File Source
Right-click to create File Schema
and Connection
1
Enter Connection Details2
© SAP 2008 / Page 18
Data Profiling
Analysis of data beyond viewing
Frequency distribution
Distinct values
Null values
Minimum/Maximum values
Data Patterns (e.g. Xxx Xxxx99,
99-Xxx)
Can drill down to view specific
records
© SAP 2008 / Page 19
Relationship Analysis
Comparison of values between data
sets to determine fit
Shows % of non-matching values
among
Table - Table
Flat file - Flat file
Table - Flat file
Can drill down to view actual
records
© SAP 2008 / Page 20
Interactive Debugger
Very useful for checking the logic of a
dataflow
Can examine and modify data row-by-row
Can place filters and breakpoints to pause
execution of job and returns control to user
Can break after processing a number of rows
© SAP 2008 / Page 22
Address Assignment Levels for Countries
Last-Line (or Country, Locality/City, Postcode) (248 countries/territories):
Singapore, Malaysia, India, etc.
Primary/Street name: (41 countries):
Brazil, Cyprus, French Guiana, French Polynesia, Greece, Greenland, Guadeloupe, Guam,
Martinique, Mayotte, Monaco, New Caledonia, New Zealand, Northern Mariana Islands,
Palau, Reunion, Saint Pierre and Miquelon, San Marino, Wallis and Futuna, (including the
ones listed under Primary/Street range)
Primary/Street range (to unit level - 25 countries):
American Samoa, Australia*, Austria, Belgium, Canada*, Denmark, Faroe Islands, Finland,
France, Germany, Italy, Japan*, Liechtenstein, Luxembourg, Netherlands, Norway, Poland,
Portugal, Puerto Rico, Spain, Sweden, Switzerland, United Kingdom, United States*, U.S.
Virgin Islands,
Note:
1. * - Requires Country-specific Engine
2. Address-line (or Unit) assignment level for Singapore before end
2008 through license with SingaporePost
© SAP 2008 / Page 23
More on Address Cleansing / Verification
Original Addresses
Cleansed Addresses
For some countries,
Address Cleanse
can correct or add
postal codes
© SAP 2008 / Page 24
More on Data (Customer) Cleansing
Original Customer Data
Cleansed Customer Data
Parsing (Identification and
Isolation of specific parts of
mixed data) can be extended
to non-Customer information
© SAP 2008 / Page 25
Universal Data Cleanse
Original, Raw dataCustomized
Dictionary /
Rules
Standardized and properly
formatted data
Custom Dictionary / Rules
Data Quality
Management
Uncleaned Product data Cleansed data
© SAP 2008 / Page 26
Rule1 = Size + Topping + Topping + Topping + Crust + Crust;
Action = Pizza;
Pizza = 1 : Pizza_Crust : 5;
Pizza = 1 : Pizza_Crust : 6;
Pizza = 1 : Pizza_Topping : 2;
Pizza = 1 : Pizza_Topping : 3;
Pizza = 1 : Pizza_Topping : 4;
Pizza = 1 : Pizza_Size : 1;
End_Action;
Universal Data Cleanse
Custom Parser
Word breaking
Break the input down into smaller pieces
Tokenization
Assign meaning to the pieces
Rule matching (pattern)
Match the pieces to rules
Actions & Action item
assignment
Create output from rule matches
Word Break Tokenize Rule Match Action
Large mushrooms sausage pepperoni stuffed crust
Pizza
Pizza_Crust
Pizza_Topping
Pizza_Size
© SAP 2008 / Page 27
Universal Data Cleanse
Custom Dictionaries
Dictionary
entries can
be added
or modified
Entire new
dictionary
can be
createdRule: Replace this word白い with “white”
when it is classified as COLOR
© SAP 2008 / Page 29
Multi-User Development Environment
DEV
Job Server
Central RepoDEV
TEST
Local Repo
TEST
Job Server
TEST PROD
PROD
Local Repo
PROD
Job Server
PROD
Job Server
PROD
Job Server
PROD
Job Server
Server Group
Central Repo
Check in GetGet Check in
Local Repo
Local Repo
Local Repo
Local Repo
© SAP 2008 / Page 30
Check-In/Check-Out Management
Sophisticated method for sharing/moving DI objects
Involves check-in/out of objects with versioning
Local
Centralized
© SAP 2008 / Page 31
Real-Time vs Batch (Scheduled) Processing
Support
Typical Batch
Job
Real-Time capabilities built into the engine in the form
of Request-Response message processing
ERP or Web
applications Data Services
Job Server and Engine
Message
Response
Immediate action
upon receiving
Message
Immediate action
upon receiving
Response
12
3
4
Access Server
Real-time
Services
Real-time Jobs
Listener
Converted to
Real-Time Job
© SAP 2008 / Page 32
Invoking/Consuming Data Services-Produced
Web Services
External
Web Service
client Request-Response
Access Server
Real-time
Services
Job Server and Engine
Repository
Data Services
Web Server
Web
Services
server
Invoke
Real-time
jobs
Invoke
Batch jobs
12
3
4a
4b
4c
5a
5b
© SAP 2008 / Page 33
Consuming 3rd
Party Web Services
Define Web Service
source
Use imported Web
Service operation as
standard function
2
1
© SAP 2008 / Page 35© SAP 2008 / Page 35
Data Services Datastore Configurations
Data Sources/Targets can be
switched between different
development environments
© SAP 2008 / Page 36© SAP 2008 / Page 36
Data Services
System Configurations
System
configuration
DEV
System
configuration
PROD
Substitution Parameter
ConfigurationDEV PROD
$$FILE_PATH C:\TEMP C:\PROD
$$LOG_PATH C:\LOG C:\LOG
Datastore
Source
Datastore
Target
Datastore
configurationConfiguration1
Database type Oracle
Server name SJ-PROD
Database name Source_Data
User sd_5263
Password *****
Datastore
configurationLocalServer ProdServer
Database type mySQL Oracle
Server name Localhost SJ-PROD
Database name Source_Data Source_Data
User sa sd_5263
Password ***** *****
© SAP 2008 / Page 37
Parallel Execution
via explicit Parallel Data flows
Improving Performance - 1
The Dimension
data flows run in
parallel
The Dimensions data
flow run before the
Facts data flow
© SAP 2008 / Page 38
Parallel Execution
via Partitioned source and target
Improving Performance - 2
Non-partitioned data flow at Design Time
Partitioned at Execution time
If Source is
partitioned
2 ways
If Target is
partitioned 2
ways
Specifying
Partitioning
scheme
© SAP 2008 / Page 39
Improving Performance - 3
Parallel Execution
via specifying Degree of Parallelism (DOP) > 1
At Design Time
At Execution Time
DOP=2
© SAP 2008 / Page 40
Distributing Data Flow Execution - 1
Run as a separate process for the
following resource-intensive
operations
Hierarchy_Flattening
Join
GROUP_BY
ORDER_BY
DISTINCT
Table_Comparison
Lookup_ext function
Count_distinct function
Associate
CountryID
Global Address Cleanse
Global Suggestion List
Match
User-Defined
© SAP 2008 / Page 41
Distributing Data Flow Execution - 2
Various ways available to improve
performance
Flexible ETL (traditional transform)
Hybrid “ELT” (transform at target, transform
at source)
Concept is “Push-Down” processing Target
Data Integrator
Source
Traditional ETL
Target
Data Integrator
Source
Transform at Target
Target
Data Integrator
Source
Transform at Source
© SAP 2008 / Page 42
Distributing Data Flow Execution - 3
“Push-Down” in detail
“Push-Down” can be staged
onto database or file directory
To push down
operations to the
source server
To push down the
GroupBy operation
to the target server
Original
Data Flow
Data Flow with
Push-Down’s
Push-Down settings
© SAP 2008 / Page 43
Distributing Data Flow Execution - 4
TargetSource
Grid deployment (across multiple servers)
is easy
No changes required in Job flows
Simply specify at execution time what to
distribute across servers – Job, Data flow, or
sub-Data flow
© SAP 2008 / Page 44
Distributing Data Flow Execution - 5
Distribution Level 101
Job level distribution
All processes belonging to job execute on
the same computer
All sub-data flows run on same computer
© SAP 2008 / Page 45
Distributing Data Flow Execution - 6
Computer 1 Computer 2
Distribution Level 101
Data flow level distribution
All processes of each data flow can execute on a different computer
© SAP 2008 / Page 46
Distributing Data Flow Execution - 7
Computer 1Computer 3
Computer 2
Computer 4
Distribution Level 101
Sub data flow level distribution
Each sub data flow can execute on a different computer
© SAP 2008 / Page 49
Recovery From Unsuccessful Job Execution
Automated Recovery
With this set, DI records the result of each successful
step in a job
During recovery, DI re-runs uncompleted or failed
steps under the same conditions as the original job
There is Manual Recovery, too
© SAP 2008 / Page 51
Auto-Reporting
Via Web-based Management
Console
Reports automatically generated
Report details can be interactively
chosen and drilled into
Reports can be printed
Interactive
Dataflow graphics
© SAP 2008 / Page 52
Lineage And Impact
Provides analyzing dependencies for
Impact (source to which target/s)
Lineage (target back to which source/s)
© SAP 2008 / Page 54
Copyright 2009 SAP AG
All Rights Reserved
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein
may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
SAP, R/3, xApps, xApp, SAP NetWeaver, Duet, SAP Business ByDesign, ByDesign, PartnerEdge and other SAP products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world.
Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius and other Business Objects products and
services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in several other
countries. Business Objects is an SAP Company. All other product and service names mentioned and associated logos displayed are the trademarks of their respective
companies. Data contained in this document serves informational purposes only. National product specifications may vary.
The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the
express prior written permission of SAP AG. This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This
document contains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of
business, product strategy, and/or development. Please note that this document is subject to change and may be changed by SAP at any time without notice. SAP assumes
no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items
contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of
merchantability, fitness for a particular purpose, or non-infringement.
SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these
materials. This limitation shall not apply in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links
contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.