Data Quality / Data Cleansing in BW
TRANSCRIPT
![Page 1: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/1.jpg)
Data Quality / Data Cleansing in BW
Lothar Schubert, BW RIG
8/2001
![Page 2: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/2.jpg)
SAP AG 2001, Title of Presentation, Speaker Name 2
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
![Page 3: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/3.jpg)
Why Data Cleansing / Validation?
• BW data are highly integrated.
• BW data are queried frequently.
• BW data are expected to be of high quality.
• BW requires high data accuracy for effective decision support.
• BW data often serve as foundation for further processing.
Data Quality / Information Quality – what it means:
• Data / Information is relevant.
• Data / Information is timely.
• Data / Information is correct.
![Page 4: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/4.jpg)
Sources for Dirty Data
• Data are incorrect in source system
• Data consolidation causes issues
• Technical platforms are different (code pages, etc.)
• Administration issues (double loadings, …)
• Custom logic
• Technology issues (SW, DB, O/S, HW, …)
• …
![Page 5: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/5.jpg)
Data Contaminants - 1
012-3344  Cup Holder, green  US
012-3378  Cup Holder, red  US
012-4122  Lighter, black  US
012-5521  white cover  US
012-7662  green Cup Holder  US
012-4011  Cup Holder, green  JP
012-4122  phone plug  JP
012-6611  channel  JP
013-1452  plastic cover, red  JP
013-1452  (pink version of above)  JP
red wheel, type "014-2221"  CA
blue wheel, type "012-3342"  CA
023-2211  white wheel  CA
multiple keys
inconsistent keys
invalid characters
surprises
free form fields
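The contaminants labelled above can be detected mechanically. The following is a minimal sketch, assuming the key layout `NNN-NNNN` seen in the sample rows. The record tuples, `KEY_PATTERN`, and `find_contaminants` are hypothetical names introduced for illustration and are not part of BW.

```python
import re
from collections import defaultdict

# Sample records taken from the slide: (key or None, free-text description, country).
RECORDS = [
    ("012-3344", "Cup Holder, green", "US"),
    ("012-7662", "green Cup Holder", "US"),
    ("012-4122", "Lighter, black", "US"),
    ("012-4122", "phone plug", "JP"),
    ("013-1452", "plastic cover, red", "JP"),
    ("013-1452", "(pink version of above)", "JP"),
    (None, 'red wheel, type "014-2221"', "CA"),
    ("023-2211", "white wheel", "CA"),
]

# Assumed key layout: three digits, a hyphen, four digits.
KEY_PATTERN = re.compile(r"\d{3}-\d{4}")


def find_contaminants(records):
    """Return (keys used for several descriptions, records whose key hides in free text)."""
    by_key = defaultdict(set)
    embedded = []
    for key, text, country in records:
        if key is None:
            # The key field is empty: look for a key inside the free-form text.
            match = KEY_PATTERN.search(text)
            if match:
                embedded.append((match.group(), text, country))
            continue
        by_key[key].add(text)
    inconsistent = {k: sorted(v) for k, v in by_key.items() if len(v) > 1}
    return inconsistent, embedded
```

Calling `find_contaminants(RECORDS)` flags the keys `012-4122` and `013-1452` as carrying more than one description and recovers the key `014-2221` from the free-form field.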
![Page 6: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/6.jpg)
Data Contaminants - 2
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/10/2000  $ 67221
XYZ.com Ltd.  10/12/2000  $ 35332
XYZ.com Ltd.  10/14/2000  $ 31122
XYZ.com Ltd.  10/17/2000  $ 99999999
XYZ.com Ltd.  10/19/2000  $ 78882
XYZ.com Ltd.  10/10/99  $ 44332
XYZ.com Ltd.  10/12/99  $ 33222
ABC Co.  10/14/2000  $ 4333
LMN Ltd.  10/14/2000  $ 9000
XYZ.com Ltd.  10/14/2000  $ 31122
ZZZ Sl.  10/14/2000  $ 122211
data redundancy
data anomalies
data format
data redundancy
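A rough sketch of how the three contaminant types above might be flagged, using a subset of the sample rows. The function names and the outlier threshold (100 times the median) are assumptions chosen for illustration only.

```python
from collections import Counter
from statistics import median

# Subset of the rows shown on the slide: (customer, date as delivered, amount).
ROWS = [
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/10/2000", 67221),
    ("XYZ.com Ltd.", "10/12/2000", 35332),
    ("XYZ.com Ltd.", "10/14/2000", 31122),
    ("XYZ.com Ltd.", "10/17/2000", 99999999),
    ("XYZ.com Ltd.", "10/19/2000", 78882),
    ("XYZ.com Ltd.", "10/10/99", 44332),
]


def duplicates(rows):
    """Data redundancy: rows that occur more than once."""
    return [row for row, count in Counter(rows).items() if count > 1]


def short_years(rows):
    """Data format: dates whose year part does not have four digits."""
    return [row for row in rows if len(row[1].rsplit("/", 1)[-1]) != 4]


def anomalies(rows, factor=100):
    """Data anomalies: amounts far above the median (threshold is an assumption)."""
    mid = median(row[2] for row in rows)
    return [row for row in rows if row[2] > factor * mid]
```

On `ROWS`, `duplicates` reports the triple-loaded 10/10/2000 record, `short_years` reports the 10/10/99 record, and `anomalies` reports the 99999999 amount.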
![Page 7: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/7.jpg)
Data Contaminants - 3
Data Contamination during upload via Exits
• Application Exits
• Generic BW Exit RSAP0001
• Transfer / Update Routines
• Virtual Exits
Consider the following:
• Timeliness of Data
• Check for Versions
• Check for Return Codes
• Delta Trigger Capabilities
• Performance and General Architecture
![Page 8: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/8.jpg)
Where, When and by Whom to Implement Data Cleansing?
Where:
• In the Source System?
• During Data Extraction?
• In the BW System?
When:
• In the productive phase?
• In the test phase?
• In the blueprint phase?
Who:
• Is it a technical issue?
• Is it a project issue?
• Is it an organizational issue?
• Data cleansing occurs at all levels.
• Avoid the tendency to attempt cleansing only within the BW extraction process.
• Often data cleansing is best performed at the legacy / source system level.
• Data cleansing is one of the greatest risks in data movement efforts.
• Design belongs in the blueprint phase.
• Test data are often cleaner than real data.
• Often data quality and inconsistency issues are systemic in the organization and must be addressed at a higher level in the organization to get resolved.
![Page 9: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/9.jpg)
ROI of Data Quality
You should ask…
• What is the risk of incomplete / incorrect data sets?
• What is the cost to fix data, once contaminated?
• What are the corporate quality standards?
However, you should also ask…
• What is the reliability of source data?
• Where is the point of diminishing returns?
Data Quality as an Investment
![Page 10: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/10.jpg)
BW Architecture
[Figure: architecture diagram of the Business Information Warehouse. Recoverable labels: any source; Extraction / Open Staging; Persistent Staging Area; Transformation; Business Rules; ods objects; master data; InfoCubes; data marts; Asynchronous Distribution - Open Hub Services; any target; Synchronous Access; portal / application; Cleansing; Integration; Granularity; dwh; ods.]
![Page 11: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/11.jpg)
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
![Page 12: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/12.jpg)
Referential Integrity
Assuring Referential Integrity can be a major challenge in DWH design…
Relax! BW does it for you.
• Automated checks.
• Central Metadata Dictionary.
• Integrated Architecture.
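For readers unfamiliar with the term, a referential integrity check verifies that every key used in transaction data also exists in the corresponding master data. The sketch below is a generic illustration of that idea and not BW's implementation. The row layout and the function name `orphaned_keys` are hypothetical.

```python
def orphaned_keys(fact_rows, master_data, key):
    """Return the sorted key values used in fact_rows that have no master data record."""
    known = {row[key] for row in master_data}
    return sorted({row[key] for row in fact_rows if row[key] not in known})


# Hypothetical sample data.
facts = [
    {"customer": "C1", "amount": 10},
    {"customer": "C9", "amount": 5},
]
customers = [{"customer": "C1"}, {"customer": "C2"}]

missing = orphaned_keys(facts, customers, "customer")
```

Here `missing` contains `C9`, the only customer referenced by a fact row without a master data record.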
![Page 13: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/13.jpg)
Master Data Validation
![Page 14: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/14.jpg)
Check for Permitted Characters
Case A: characters not permitted
Case B: characters permitted
Permitted by standard:
!"%&'()*+,-/:;<=>?_0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
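A minimal sketch of such a check, assuming the standard character set is exactly the one listed on the slide (a blank, for example, is therefore reported as not permitted). `PERMITTED`, `invalid_characters`, and the `extra_permitted` parameter are hypothetical names. How additional characters are actually maintained in BW is not shown here.

```python
# Character set exactly as listed on the slide (assumption: the list is complete).
PERMITTED = set("!\"%&'()*+,-/:;<=>?_0123456789" "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def invalid_characters(value, extra_permitted=""):
    """Return the sorted characters of value that are not in the permitted set."""
    allowed = PERMITTED | set(extra_permitted)
    return sorted({ch for ch in value if ch not in allowed})
```

For example, `invalid_characters("CUP_HOLDER/01")` returns an empty list (case B), while `invalid_characters("Cup#1")` reports `#`, `p` and `u` (case A) unless those characters are passed in `extra_permitted`.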
![Page 15: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/15.jpg)
Consistency Check for Characteristic Values
Checking for…
• use of character values in fields of data type NUMC
• correct consideration of the conversion routine ALPHA
• use of lower case letters
• use of special characters
• plausibility of date / time fields
Consider Performance Impacts!
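The individual checks can be sketched as small predicates. This is an illustration under stated assumptions, not the BW check itself: dates are assumed to arrive as `YYYYMMDD`, and `alpha_input` only mimics the commonly described behaviour of the ALPHA input conversion (purely numeric values are padded with leading zeros to the field length, other values are left as they are). All function names are hypothetical.

```python
from datetime import datetime


def is_valid_numc(value):
    """NUMC fields should contain ASCII digits only."""
    return value.isascii() and value.isdigit()


def alpha_input(value, length):
    """Approximate ALPHA input conversion: zero-pad purely numeric values."""
    stripped = value.strip()
    if stripped.isascii() and stripped.isdigit():
        return stripped.zfill(length)
    return stripped


def has_lower_case(value):
    """True if the value contains at least one lower case letter."""
    return any(ch.islower() for ch in value)


def is_plausible_date(value):
    """True if value is an existing calendar date in YYYYMMDD form (assumed format)."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True
```

Each predicate inspects every incoming value, so running all of them on every record adds load time, which is the performance impact the slide warns about.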
![Page 16: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/16.jpg)
Data Integrity Checks on Packages
• APIs are available to read PSA contents
• Function RSAR_ODS_MAINTAIN, …
• Check for reference between records
• Summary checks, …
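The two kinds of package-level checks named above can be illustrated as follows. This sketch does not call any BW API. It assumes the package has already been read into a list of dictionaries, and the field names (`doc_no`, `kind`, `amount`) and the control values are hypothetical.

```python
def check_package(records, expected_count, expected_total):
    """Return a list of problems found in one data package.

    Reference check: every item record must refer to a header in the same package.
    Summary checks: record count and item amount total must match control values.
    """
    problems = []
    headers = {r["doc_no"] for r in records if r["kind"] == "header"}
    for r in records:
        if r["kind"] == "item" and r["doc_no"] not in headers:
            problems.append(f"item without header: {r['doc_no']}")
    if len(records) != expected_count:
        problems.append(f"record count {len(records)} != {expected_count}")
    total = sum(r["amount"] for r in records if r["kind"] == "item")
    if total != expected_total:
        problems.append(f"amount total {total} != {expected_total}")
    return problems


# Hypothetical package: one complete document and one item whose header is missing.
package = [
    {"doc_no": "D1", "kind": "header", "amount": 0},
    {"doc_no": "D1", "kind": "item", "amount": 100},
    {"doc_no": "D2", "kind": "item", "amount": 50},
]
```

With control values of 3 records and a total of 150, `check_package(package, 3, 150)` reports only the orphaned item of document `D2`.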
![Page 17: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/17.jpg)
Handling of Invalid Data Records
[Figure: data flow within the Business Information Warehouse. Recoverable labels: Scheduler, Extract, PSA, Staging Engine, OK, Error.]
Error Handling:
1 - No Update, No Reporting
2 - Valid Records Update, No Reporting
3 - Valid Records Update, Reporting Possible
Correction of invalid data:
• within the source system
• manually in PSA
• by rule (see RS_ERRORLOG_EXAMPLE)
Tip: Consider automation via event chains.
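One possible reading of the three error handling options, written as a small decision function. The enum, the function name, and the return shape are hypothetical. The sketch only models which records would be updated and whether the request would be released for reporting.

```python
from enum import Enum


class ErrorHandling(Enum):
    NO_UPDATE_NO_REPORTING = 1
    VALID_UPDATE_NO_REPORTING = 2
    VALID_UPDATE_REPORTING = 3


def process_request(records, is_valid, mode):
    """Return (records to update, records left for correction, reporting possible)."""
    valid = [r for r in records if is_valid(r)]
    invalid = [r for r in records if not is_valid(r)]
    if not invalid:
        # Nothing to correct: the whole request is updated and reportable.
        return valid, [], True
    if mode is ErrorHandling.NO_UPDATE_NO_REPORTING:
        return [], invalid, False
    reportable = mode is ErrorHandling.VALID_UPDATE_REPORTING
    return valid, invalid, reportable
```

The records returned in the second position are the ones that would then be corrected in the source system, manually in the PSA, or by rule.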
![Page 18: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/18.jpg)
Local Master Data
![Page 19: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/19.jpg)
Deletion Features during Update
![Page 20: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/20.jpg)
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
![Page 21: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/21.jpg)
Aggregate Check Tool
Report RRX_TRACE_CHECK_AGGREGATE
Check OSS Note 202469 for details.
![Page 22: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/22.jpg)
Custom Check Points
Key figures, Article, Cashier Number

| Article | Cashier Number | Sales (POS Receipts) | Sales (Receipt) | Overall result |
| --- | --- | --- | --- | --- |
| 777921 | 1128 | $ 0.00 | $ 0.00 | $ 0.00 |
| 777922 | 1128 | $ 0.00 | $ 0.00 | $ 0.00 |
| 777923 | 1128 | $ 0.00 | $ 0.00 | $ 0.00 |
| Overall result | | $ 0.00 | $ 0.00 | $ 0.00 |

• Identify check points in source system
• Write check point data to custom table
• Use generic extractor for load
• Populate check cube
• Perform Compress with 0 suppression
• Execute exception report
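The steps above amount to loading control totals from the source system alongside the BW totals and reporting only the combinations where the two differ. A minimal sketch of that comparison, assuming amounts in integer cents and totals keyed by (article, cashier number); the function name and sample values are hypothetical.

```python
def exception_report(source_totals, bw_totals):
    """Return non-zero differences (source minus BW) per key, sorted by key.

    Dropping the zero rows plays the role of zero suppression: only
    inconsistent check points remain in the report.
    """
    keys = set(source_totals) | set(bw_totals)
    diffs = {k: source_totals.get(k, 0) - bw_totals.get(k, 0) for k in keys}
    return {k: d for k, d in sorted(diffs.items()) if d != 0}


# Hypothetical check point totals in cents, keyed by (article, cashier number).
source = {("777921", "1128"): 1000, ("777922", "1128"): 2500}
loaded = {("777921", "1128"): 1000, ("777922", "1128"): 2400}

report = exception_report(source, loaded)
```

An empty report corresponds to the all-zero result shown in the table above. In this sample, article 777922 is off by 100 cents.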
![Page 23: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/23.jpg)
Audit Dimensions / Data Modelling
Audit Dimensions can identify:
• When were the data created?
• Which source did the data come from?
• Which tools were used for extraction?
• Which rules have touched the data?
• …
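As a data modelling illustration, an audit dimension can be pictured as a small record attached to every fact row. The sketch below is a generic model, not a BW data structure, and all class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AuditInfo:
    created_on: date          # when were the data created?
    source_system: str        # which source did the data come from?
    extraction_tool: str      # which tool was used for extraction?
    rules_applied: tuple      # which rules have touched the data?


@dataclass(frozen=True)
class Fact:
    amount: int
    audit: AuditInfo


def facts_touched_by(facts, rule):
    """Return the facts whose audit record lists the given rule."""
    return [f for f in facts if rule in f.audit.rules_applied]


# Hypothetical sample rows.
audit_a = AuditInfo(date(2001, 8, 1), "R3_PROD", "generic extractor", ("currency_conv",))
audit_b = AuditInfo(date(2001, 8, 2), "LEGACY", "flat file", ())
facts = [Fact(100, audit_a), Fact(250, audit_b)]
```

With this model, a suspicious rule can be traced to the affected rows, for example `facts_touched_by(facts, "currency_conv")`.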
![Page 24: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/24.jpg)
Display of individual Requests
Key figures: Amount
Request ID

| Request ID | Amount |
| --- | --- |
| 12389 | $ 80,000.00 |
| # | $ 28,078,400.00 |
| Overall result | $ 28,158,400.00 |

You can use the REQUEST ID to display and analyze individual requests.
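The drill-down by request ID is a grouping of the key figure by request. A small sketch that reproduces the totals in the table, assuming "#" stands for rows without an assigned request ID; the split of request 12389 into two rows is made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_request(rows):
    """Sum amounts per request ID; rows are (request_id, amount as string)."""
    totals = defaultdict(Decimal)
    for request_id, amount in rows:
        totals[request_id] += Decimal(amount)
    return dict(totals)


# Hypothetical fact rows; only the per-request sums come from the slide.
rows = [
    ("12389", "50000.00"),
    ("12389", "30000.00"),
    ("#", "28078400.00"),
]

totals = totals_by_request(rows)
overall = sum(totals.values())
```

Request 12389 sums to 80,000.00 and the unassigned rows to 28,078,400.00, which together give the overall result of 28,158,400.00.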
![Page 25: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/25.jpg)
Check Programs (RSRV)
• InfoCubes: Fact, SID, MID, …
• Hierarchies
• InfoObjects
• DDIC Definitions
• Characteristic Values
• …
![Page 26: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/26.jpg)
Data Quality Check Flags
![Page 27: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/27.jpg)
Agenda
About Data Quality
Data Cleansing
Data Validation
Data Repair
![Page 28: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/28.jpg)
Request Deletion Infocube / ODS
Consider Limitations!
![Page 29: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/29.jpg)
Selective Deletion
![Page 30: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/30.jpg)
InfoCube Reconstruction
![Page 31: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/31.jpg)
InfoCube Request Reversal Posting
Still works fine after compression / roll-up!
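Why a reversal posting survives compression can be shown with a toy model: a compressed cube no longer distinguishes requests, so a faulty request cannot be deleted by ID, but posting the same records with inverted signs cancels its effect. The dictionary-based cube and the function names are hypothetical and only illustrate the arithmetic.

```python
def post(cube, records):
    """Add records (key, amount) to a cube that stores one total per key."""
    for key, amount in records:
        cube[key] = cube.get(key, 0) + amount


def reverse(records):
    """Build the reversal posting: same keys, inverted signs."""
    return [(key, -amount) for key, amount in records]


# Hypothetical loads: one correct request and one faulty request.
good_request = [("A", 100), ("B", 50)]
bad_request = [("A", 999)]

cube = {}
post(cube, good_request)
post(cube, bad_request)           # totals are now contaminated
post(cube, reverse(bad_request))  # reversal restores the correct totals
```

After the reversal the cube again holds only the totals of the correct request, even though the faulty rows were never removed individually.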
![Page 32: Data Quality / Data Cleansing in BW](https://reader035.vdocuments.mx/reader035/viewer/2022071600/613d1b5e736caf36b759641f/html5/thumbnails/32.jpg)
Data Quality / Data Cleansing in BW
Lothar Schubert, BW RIG
8/2001