Copyright © 2005, SAS Institute Inc. All rights reserved. Real-time Data Quality for SAP Dietrich O. Banschbach Manager, R&D EMEA SAS International


Real-time Data Quality for SAP

Dietrich O. Banschbach

Manager, R&D EMEA

SAS International


Agenda

Overview

dfConnector for SAP

Scenarios

Technology

Additional Information


Overview: Companies

Companies involved:

SAP AG - the world’s largest Enterprise Resource Planning (ERP) software company

DataFlux Corporation (a SAS company) - a leading provider of data management solutions covering data quality, data profiling, data integration, data augmentation and data monitoring


Overview: SAP partnership

SAS is an SAP Software Partner with several SAP certified interfaces

DataFlux, an SAP Software Partner in its own right, has attained SAP interface certification for its DataFlux dfConnector for SAP product


dfConnector for SAP

DataFlux dfConnector for SAP enhances data quality in SAP systems – in real-time

Facilitates communication between SAP applications and DataFlux dfIntelliServer

Offers transparent access from SAP applications to DataFlux dfIntelliServer services for data validation, standardization, deduplication, error-tolerant search, etc.


dfConnector for SAP

Provides a remote function call (RFC) server that channels function calls from within SAP systems to dfIntelliServer and returns results to SAP

Includes a framework consisting of a set of DataFlux-supplied ABAP functions that map to dfIntelliServer functions. These can be called by any SAP application.

Functions can be used to build new or extend existing data quality solutions in SAP using DataFlux methods


dfConnector for SAP: Architecture

[Architecture diagram: SAP Web Application Server (BAdI API, Business Add-In in ABAP); RFC server, based on SAP Java Connector; dfIntelliServer (data quality algorithms, reference database); search index accessed via JDBC (SAS, Oracle, MySQL, DB2, MS SQL)]


dfConnector for SAP: Framework

Function modules written in ABAP use a standard “CALL FUNCTION ... DESTINATION” call to invoke a function that is not part of the current SAP system

The “CALL FUNCTION ... DESTINATION” call reaches dfConnector, which listens at the specified destination

dfConnector gathers all parameters and initiates the appropriate call into dfIntelliServer using its Java client API
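The call flow on this slide can be pictured as a small dispatcher. The following is an illustrative Python sketch, not the product's Java/JCo code: the two function module names come from the framework function list in this deck, but the handlers, the parameter names (INPUT, NAME, RESULT) and the `dispatch` helper are hypothetical stand-ins for the mapping from ABAP function names to dfIntelliServer client calls.

```python
# Illustrative sketch only: models how an RFC server might route a call
# from SAP to a data quality backend. All handlers are hypothetical stand-ins.

def standardize(params):
    # Stand-in for a backend call that standardizes a text value.
    return {"RESULT": params["INPUT"].strip().upper()}


def determine_gender(params):
    # Stand-in backend call; a real service would consult reference data.
    known = {"MARY": "F", "JOHN": "M"}
    return {"RESULT": known.get(params["NAME"].upper(), "U")}


# Map ABAP function module names to the handler that serves them.
HANDLERS = {
    "/DATAFLUX/STANDARDIZE": standardize,
    "/DATAFLUX/DETERMINE_GENDER": determine_gender,
}


def dispatch(function_name, params):
    """Gather the call parameters, invoke the matching handler, return results."""
    handler = HANDLERS.get(function_name)
    if handler is None:
        return {"ERROR": f"unknown function {function_name}"}
    return handler(params)
```

A table-driven mapping of this kind keeps the server generic: supporting another framework function means registering one more handler rather than changing the routing logic.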


dfConnector for SAP: Postal Address Validation

ABAP programmers can use the framework functions in any SAP application

As an example application that uses this framework, dfConnector for SAP supports postal address validation as defined in SAP’s BC-BAS-PV certification scenario.

Enhances SAP’s Business Address Services (formerly Central Address Management)

dfConnector is “Certified for SAP NetWeaver”. Formally tested with R/3 Enterprise (4.7)


dfConnector for SAP: Postal Address Validation

Customer, vendor and other addresses in SAP are checked in real-time for correct city names, street names, house numbers and zip codes

Missing information is auto-completed from a reference database

Quarterly adjustment process keeps addresses up to date via a batch run

− Reports which addresses are correct and which ones could not be validated (stating the reason)

− Process can be used to do initial validation of all addresses in SAP


dfConnector for SAP: Deduplication

In addition to postal address validation, a duplicate check is carried out before a new entry can be saved in SAP

Avoids multiple entries of the same customer or vendor name with slight differences in spelling

Offers error-tolerant (fuzzy) search


Scenarios: Postal Address Validation

This scenario enhances data quality within SAP in real-time as address data is entered interactively

Addresses are checked for correct:

− city names

− street names

− house numbers

− zip codes

Input is standardized according to postal authority requirements (e.g. USPS rules)

Missing information can be auto-completed


Scenario 1: Create new customer

Create new customer in SAPGUI using standard SAP transaction XD01

Fill in data:

• Company name

• City

• Country

• (No street)


Scenario 1: Create new customer

Required entry


Scenario 1: Create new customer

Error message in status line

Missing information: field is colored and cursor is positioned in that field


Scenario 1: Create new customer

Street name entered incorrectly (“Street” instead of “Drive”)

Region required to resolve the address

Click on “Check” button when all data has been entered


Scenario 1: Create new customer

Address is validated by dfIntelliServer

• City name converted to uppercase

• Postal code (ZIP) added

• Street name uppercased and standardized (DR=Drive)

• District added automatically
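The validation result in Scenario 1 can be illustrated with a toy lookup. This is a minimal Python sketch under loud assumptions: the reference entry, the postal code "00000" and the function `validate_address` are invented for illustration and do not reflect dfIntelliServer's reference data or API. It only shows the idea of uppercasing, correcting the street type from reference data and completing missing fields.

```python
# Toy reference data, invented for illustration (not real postal data).
REFERENCE = {
    # (city, region, street base name) -> official suffix, postal code, district
    ("SAMPLETOWN", "NC", "EXAMPLE"): {
        "suffix": "DR",
        "postal_code": "00000",
        "district": "SAMPLE COUNTY",
    },
}

# Street-type words and their abbreviations (DR = Drive, as on the slide).
SUFFIXES = {"STREET": "ST", "DRIVE": "DR", "PARKWAY": "PKWY"}


def validate_address(street, city, region):
    """Return a standardized, completed address, or None if it cannot be resolved."""
    words = street.upper().split()
    # Drop a trailing street-type word so "Example Street" and "Example Drive"
    # both resolve to the base name "EXAMPLE".
    if words and (words[-1] in SUFFIXES or words[-1] in SUFFIXES.values()):
        words = words[:-1]
    house_number = words[0] if words and words[0].isdigit() else ""
    base = " ".join(words[1:] if house_number else words)
    entry = REFERENCE.get((city.upper(), region.upper(), base))
    if entry is None:
        return None
    return {
        "street": " ".join(p for p in (house_number, base, entry["suffix"]) if p),
        "city": city.upper(),
        "region": region.upper(),
        "postal_code": entry["postal_code"],
        "district": entry["district"],
    }
```

Calling `validate_address("100 Example Street", "Sampletown", "nc")` yields the street as "100 EXAMPLE DR" together with the postal code and district from the toy table, mirroring the corrections listed on the slide.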


Scenario 2: Creating a customer with minimal data entry

Data entered in SAP:

• Part of a street name with a spelling mistake

• Postal code

• Country (required by SAP)


Scenario 2: Creating a customer with minimal data entry

Partial street name with spelling mistake

Basic postal code

No region specified


Scenario 2: Creating a new customer with minimal data entry

Address is validated by dfIntelliServer

• City name uppercased

• Postal code added (ZIP+4)

• Street name uppercased and standardized (PKWY=Parkway)

− Spelling mistake corrected

• District added automatically

• Region added automatically
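How a partial, misspelled street name might be resolved against reference data can be sketched as follows. This is an assumption-laden Python illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for error-tolerant matching, and the street list, the postal code "00000" and `resolve_street` are invented. The actual dfIntelliServer matching method is not described in this deck.

```python
from difflib import SequenceMatcher

# Invented reference data: streets known for each basic postal code.
STREETS_BY_POSTAL_CODE = {
    "00000": ["EXAMPLE PKWY", "SAMPLE DR", "MAPLE ST"],
}


def resolve_street(partial, postal_code, cutoff=0.7):
    """Pick the reference street that best matches a partial, possibly misspelled name."""
    typed = partial.upper().strip()
    best_name, best_score = None, 0.0
    for candidate in STREETS_BY_POSTAL_CODE.get(postal_code, []):
        # Compare against a prefix of similar length so partial input
        # is not penalized for the characters the user never typed.
        prefix = candidate[: len(typed) + 1]
        score = SequenceMatcher(None, typed, prefix).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    # Reject weak matches instead of guessing.
    return best_name if best_score >= cutoff else None
```

Restricting candidates to the streets of the entered postal code keeps the comparison set small, which is one plausible reason a postal code plus a fragment of the street can be enough to resolve the full address.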


Scenario 3: Inconsistent or unresolvable addresses

Neither postal code nor city is specified

User insists on saving a record even though the entry could not be validated

To ensure high availability of the SAP system, address data can still be entered and saved if dfConnector and/or dfIntelliServer are temporarily unavailable. Entries are marked as not having been checked against official address reference data. Those addresses can be corrected in the dfConnector Quarterly Address Adjustment process which checks and updates in batch mode
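The availability behavior described above amounts to a save path that never blocks on the validation service. A minimal Python sketch, assuming a hypothetical `validate` callable that returns corrected fields, returns None for an unresolvable address, or raises `ServiceUnavailable`; the status values are invented labels, not SAP field values:

```python
class ServiceUnavailable(Exception):
    """Raised by the hypothetical validation client when the backend is down."""


def save_address(address, validate, store):
    """Save an address even if validation is unavailable, recording its check status."""
    record = dict(address)
    try:
        result = validate(address)
    except ServiceUnavailable:
        # Backend down: keep data entry possible and flag the record for the batch run.
        record["check_status"] = "NOT_CHECKED"
    else:
        if result is None:
            # User saves although the address could not be validated.
            record["check_status"] = "UNVALIDATED"
        else:
            record.update(result)
            record["check_status"] = "VALIDATED"
    store.append(record)
    return record


def pending_for_batch(store):
    """Addresses that a later batch adjustment run should revisit."""
    return [r for r in store if r["check_status"] != "VALIDATED"]
```

Recording the status with the record is what lets a later batch run find and correct exactly the entries that were never checked.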


Scenario 3: Inconsistent or unresolvable addresses

Error message: No zip code and/or city specified


Scenario 4: Duplicate search

The following scenario shows the duplicate search and elimination capabilities of DataFlux dfConnector for SAP

The scenario first shows how easily a small typo can create a duplicate customer record in the SAP database without dfConnector

In comparison, the same process is performed using dfConnector for SAP to identify potential duplicates and resolve the situation


Scenario 4: Duplicate search

Using the standard SAP search, the user first checks whether the customer to be created already exists in SAP, but accidentally makes a small typo in the street name (Wesston instead of Weston)


Scenario 4: Duplicate search

The search returns no hits and the user proceeds under the assumption he can now create a unique customer

He creates and saves a new customer entry, thus creating a duplicate


Scenario 4: Duplicate search

With dfConnector, the duplicate search is then triggered. Potential duplicates are detected based on matchcodes created by dfIntelliServer


Scenario 4: Duplicate search

Transaction flow

Address data is entered in SAPGUI. Postal address validation executes

The /DATAFLUX/ADDR_SEARCH implementation of the BAdI “ADDRESS_SEARCH” is invoked

Function module /DATAFLUX/DUPLICATE_CHECK searches for duplicates

/DATAFLUX/DUPLICATE_CHECK calls dfConnector which gathers the entered SAP data.

Matchcodes are generated dynamically and a JDBC call is made to retrieve results from the external RDBMS. The results of the search are returned to dfConnector which passes them to SAP to display a list of potential duplicates
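The matchcode lookup in this flow can be sketched with an in-memory database. The following Python example is illustrative only: `toy_matchcode` is an invented stand-in (it is not the dfIntelliServer matchcode algorithm), SQLite stands in for the external RDBMS reached via JDBC, and the table and column names are hypothetical. The Weston/Wesston typo is the one from this scenario.

```python
import sqlite3


def toy_matchcode(name, street):
    """Toy matchcode: uppercase, drop punctuation and spaces, collapse doubled letters.

    NOT the dfIntelliServer algorithm; only a stand-in so that a small typo
    such as a doubled letter maps to the same key.
    """
    def squeeze(text):
        out = []
        for ch in text.upper():
            if not ch.isalnum():
                continue
            if ch.isalpha() and out and out[-1] == ch:
                continue
            out.append(ch)
        return "".join(out)

    return squeeze(name) + "|" + squeeze(street)


def build_index(customers):
    """Create an in-memory search index (stand-in for the external RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_index "
        "(customer_id TEXT, name TEXT, street TEXT, matchcode TEXT)"
    )
    conn.executemany(
        "INSERT INTO search_index VALUES (?, ?, ?, ?)",
        [(cid, name, street, toy_matchcode(name, street))
         for cid, name, street in customers],
    )
    return conn


def find_duplicates(conn, name, street):
    """Return existing customers whose matchcode equals that of the new entry."""
    rows = conn.execute(
        "SELECT customer_id, name, street FROM search_index "
        "WHERE matchcode = ? ORDER BY customer_id",
        (toy_matchcode(name, street),),
    )
    return rows.fetchall()
```

Because the matchcode is stored alongside each record, the duplicate check reduces to an indexed equality query, which is what makes it feasible in real time during data entry.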


Scenario 5: Quarterly adjustment process

Quarterly Adjustment is a batch process that ensures address data stays up to date

When new address reference data becomes available (e.g. from USPS), it can be activated in the system in three steps by running:

• SAP report to get all addresses

• DataFlux-provided report to check, standardize and auto-complete addresses

• SAP report to write the updated addresses back to the SAP database


Scenario 5: Quarterly adjustment process

RSADRQU1 report scans all addresses for a certain country and inserts them into an index table

/DATAFLUX/RSADRQU2 reads all SAP addresses from the index table and validates each address. Addresses are checked, auto-completed and standardized. If an address cannot be validated, it is flagged for later reporting. The report indicates the level of address quality, i.e. how many addresses are correct and how many are incorrect

RSADRQU3 writes back validated and corrected addresses to the operational SAP database. Alternatively reports reason for not being able to write them back
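The three reports form a collect, validate, write-back pipeline. A minimal Python sketch of that shape, with plain lists and dicts standing in for SAP tables; the function names, the record fields and the `validate` callable are hypothetical, and the "+" / "-" markers follow the ok / failed legend used in the report output:

```python
def collect_addresses(database, country):
    """Step 1: copy all addresses for one country into an index table."""
    return [dict(addr) for addr in database if addr["country"] == country]


def validate_index(index, validate):
    """Step 2: validate each indexed address and flag those that fail."""
    for entry in index:
        result = validate(entry)
        if result is None:
            entry["status"] = "-"  # failed, kept for reporting
        else:
            entry.update(result)
            entry["status"] = "+"  # ok
    ok = sum(1 for e in index if e["status"] == "+")
    return {"ok": ok, "failed": len(index) - ok}


def write_back(database, index):
    """Step 3: write validated addresses back to the operational data."""
    by_id = {addr["id"]: addr for addr in database}
    for entry in index:
        if entry["status"] == "+":
            fields = {k: v for k, v in entry.items() if k != "status"}
            by_id[entry["id"]].update(fields)
```

Working on a copy in the index table means the operational data is only touched in the final step, and only for addresses that passed validation.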


Scenario 5: Quarterly adjustment process

Checked addresses: + = ok, - = failed

Summary


Technology

Java 1.4.x/1.5 to interface SAP with DataFlux dfIntelliServer 6 using SAP Java Connector 2.1.3

ABAP programming to hook into the predefined interfaces (SAP Business Add-In) for address validation and deduplication

SAP Add-on Assembly Kit (AAK) to allow for SAP certification (e.g. namespaces, installation, deployment, upgrade, etc.)

Search index creation in SAS data sets or in any external JDBC-compliant RDBMS


Technology: dfConnector Framework Functions

/DATAFLUX/AREA_CODE

/DATAFLUX/DETERMINE_GENDER

/DATAFLUX/DETERMINE_LOCALE

/DATAFLUX/DETERMINE_ENTITY

/DATAFLUX/DIRECTORY_SEARCH

/DATAFLUX/DUPLICATE_CHECK

/DATAFLUX/GENERATE_MATCHCODE

/DATAFLUX/GEN_MATCHCODE_PARSED

/DATAFLUX/GEOCODE

/DATAFLUX/LOOKUP_COUNTY

/DATAFLUX/LOOKUP_PHONE

/DATAFLUX/PARSE

/DATAFLUX/QUERY_SERVER

/DATAFLUX/STANDARDIZE

/DATAFLUX/STANDARDIZE_PARSED

/DATAFLUX/STANDARDIZE_SCHEME

/DATAFLUX/DELETE_INDEX_ENTRY

/DATAFLUX/VERIFY_ADDRESS

/DATAFLUX/MAINTAIN_INDEX_ENTRY


Technology: /DATAFLUX/VERIFY_ADDRESS

Input data

Results


Technology: External Search Index

The external search index can be stored in an arbitrary RDBMS that supports the JDBC interface

Examples:

• SAS data sets

• MySQL

• Microsoft SQL Server

• MaxDB (formerly known as SAP DB)

• Oracle

• ...


Technology: External search index

Example: Stored in SAS


Technology: RFC server platforms

SAP-supported Java Connector (“JCo”) platforms (used by the RFC server component of dfConnector):

• Windows NT SP4 or later, Win 2000, XP, Win 2003 Server

• Sun Solaris/SPARC 8 or later

• IBM AIX 4.3 or later

• HP-UX 11.0 or later (PA-RISC processors only)

• OS/400 V5R1 or later (not for SAP JCo 2.0.5)

• COMPAQ Tru64 5.0 or later (not for SAP JCo 2.1.x)

• Z/Linux on S/390 (Linux / Z-series GLIBC 2.2.4 or later)

• Linux Kernel 2.2.14 or later (Intel compatible processors)


Additional Information

SUGI Birds-of-a-Feather (BoF) session “Enhancing SAP with SAS”, room 107, Tuesday at 6 p.m.

www.dataflux.com
