real world data warehouses
Data Warehousing - in the real world -
Dr. Thomas Zurek, November 2014
Big Data und Analytische Applikationen
Real-World Data Warehouses / Thomas Zurek
Who am I?
• Vice President of Development @ SAP for
  - Business Warehouse (BW)
  - Business Planning & Consolidation (BPC)
  - HANA Analytics
• 17 years at SAP
• PhD in Computer Science
• Universities of Karlsruhe and Edinburgh
Agenda
1. Examples
2. Business Intelligence (BI) + Data Warehouses (DW)
3. Data Warehouses
4. Layered Scalable Architecture (LSA)
5. In-Memory Databases + Data Warehousing
6. Summary
EXAMPLES
Examples of Business Intelligence Scenarios
• fraud detection
  - retail company
  - point-of-sales data & given discounts
  - huge amounts of data
  - a prototypical BI question
• long tail analysis
  - e-commerce companies like Amazon, Ebay, iTunes, Netflix, …
  - translate sales of popular products into (additional) sales in the long tail
  - BI integrated into operational processes
Long Tail Analysis (1) – An Amazon Example
Long Tail Analysis (2)
Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html
Long Tail Analysis (3)
Source: Chris Anderson, The Long Tail, Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html
BUSINESS INTELLIGENCE + DATA WAREHOUSES
Business Intelligence and Data Warehouses
• Business Intelligence (BI)
  An environment in which business users conduct analyses that yield an overall understanding of where the business has been, where it is now, and where it will be in the near future (i.e. planning).
• Data Warehouse (DW)
  - An implementation of an informational database used to collect, integrate and provide sharable data sourced from multiple operational databases for analyses.
  - Provides data that is reliable, consistent, understandable.
  - It typically serves as the foundation for a business intelligence system.
Business Intelligence and Data Warehouses
Business Intelligence: OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, …
Data Warehouse: connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …
Operational System: ERP, CRM, SCM, HR, …
Meta Data: security, models, …
Business Intelligence and Data Warehouses
Simply remember:
(1) BI and DW
(2) BI ≠ DW
Focus today: Data Warehouse (connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation, consistency, compliance, auditing, big data, scalability, …)
DATA WAREHOUSES
Multiple Data Sources
Why are there so many DBs at an enterprise?
• business processes → data captured in some DB
• organisation reflected in system landscape
• geography reflected in system landscape
• smaller systems easier to manage than big systems
• mergers and acquisitions
• external data: market data, supplier data, …
• …
A Typical Example for Business Processes in an Enterprise
source: http://thebankwatch.com/2006/09/13/simplifying-the-business-model/
A Typical Data Warehouse Architecture
Sources: Source 1, Source 2, Source 3, Source 4, Source 5
Areas: Data Warehouse, Corp. Memory, ODS, BI Layer
Steps: Data Acquisition, Harmonization, Data Propagation, Business transform, Reporting / Analyses / Planning, Provide data, End-user access / Presentation
Governance: Project Governance, IT Governance
Layer descriptions:
• Main Service: Spot for apps / Delta to app / App recovery
  Transform: Enriched || General business logic
  Content: Data source || Business domain specific
  History: Determined by rebuild requirements of apps
  Store: DSO (can be logically partitioned)
• Main Service: Decouple, fast load and distribute
  Transform: 1:1
  Content: 1 data source, all fields
  History: 4 weeks
  Store: PSA, DSO-WO
• Main Service: Integrated, harmonized
  Transform: Harmonize, quality assure (in flow || lookup)
  Content: Defined fields
  History: Short or not at all || Long term
  Store: Info source || IO/DSO/Z-table
• Main Service: Make data available for reporting & planning tools
  Transform: Application specific / (dis-)aggregate / lookup
  Content: Application specific
  History: Application specific
  Store: IC, DSO, Info Set, Virtual Provider, Multi Provider
Challenge 1: RELIABLE
• typical: data from 50-100 data sources
• availability of data sources not given
  - system downtimes
  - network failures
  - example:
    • availability per data source = 98%
    • all 100 data sources available = 0.98^100 ≈ 13%
    • at least 1 out of 100 data sources not available = 1 - 0.13 = 87%
→ having all data in one place ensures reliable data access
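The availability arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the data sources fail independently of each other, which the slide implies but does not state; the function name is illustrative.

```python
def all_available(per_source: float, n_sources: int) -> float:
    """Probability that all n sources are up at the same time,
    assuming independent failures (an assumption, not from the slide)."""
    return per_source ** n_sources


p_all = all_available(0.98, 100)
p_at_least_one_down = 1 - p_all

print(f"all 100 sources available: {p_all:.0%}")                # 13%
print(f"at least one source down:  {p_at_least_one_down:.0%}")  # 87%
```

With fewer sources the effect is milder but still noticeable, which is why a query that has to reach every source at run time is rarely reliable.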
Challenge 2: CONSISTENT
• Assume: each data source is consistent!
• Is the union of all data sources consistent?
NO!
In a DW, data gets synchronised and harmonized to provide a consistent view spanning multiple data sources.
Examples Challenge 2: Transformation, Cleansing
• Jun 1, 2011 = 1.6.2011 = 06/01/11 = …
• VW Touareg = VW TOUAREG = [product] 87654 = …
• currency and unit conversions:
  - box → kg
  - €, $, £, ¥, … → €
• resolve ID clashes:
  product 123 [in subsidiary A] ≠ product 123 [in subsidiary B]
• enrich data:
  add attributes from source A to data from source B
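A few of these cleansing steps can be sketched in Python. This is only an illustration under stated assumptions: the list of date formats, the reading of 06/01/11 as US-style month/day/year, the exchange rates, and all function names are hypothetical and not taken from the slides.

```python
from datetime import date, datetime

# Assumed input formats; "06/01/11" is interpreted as month/day/year.
DATE_FORMATS = ("%b %d, %Y", "%d.%m.%Y", "%m/%d/%y")


def normalise_date(text: str) -> date:
    """Try each known format and return the first successful parse."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unknown date format: {text!r}")


def normalise_product_name(name: str) -> str:
    """Harmonise spelling variants such as 'VW Touareg' and 'VW TOUAREG'."""
    return " ".join(name.split()).upper()


def qualify_product_id(subsidiary: str, product_id: int) -> str:
    """Resolve ID clashes by qualifying the ID with its source system."""
    return f"{subsidiary}/{product_id}"


# Hypothetical exchange rates into EUR, for illustration only.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.8, "GBP": 1.25}


def to_eur(amount: float, currency: str) -> float:
    """Convert an amount into the single reporting currency."""
    return amount * RATES_TO_EUR[currency]
```

In a real warehouse the rates would themselves be time-dependent data, and mapping a product name to a product number such as 87654 would need a maintained lookup table rather than string normalisation.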
Examples Challenge 2: History / Time-Dependency
• data is time-dependent, e.g.
  - employee A worked in department X in 2012
  - employee A worked in department Y in 2013
  - currency exchange rates
  - current view vs historic view analysis
• versioning of meta data
  - models change
  - development → test → production
  - auditing
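Time-dependent data is commonly stored with validity intervals and read with a key date. The following is a minimal sketch of that idea using the employee example; the exact interval boundaries and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    employee: str
    department: str
    valid_from: date
    valid_to: date  # inclusive


# Assumed intervals: the slide only says "in 2012" and "in 2013".
HISTORY = [
    Assignment("A", "X", date(2012, 1, 1), date(2012, 12, 31)),
    Assignment("A", "Y", date(2013, 1, 1), date(2013, 12, 31)),
]


def department_on(employee: str, key_date: date) -> Optional[str]:
    """Historic view: the department that was valid on the key date."""
    for row in HISTORY:
        if row.employee == employee and row.valid_from <= key_date <= row.valid_to:
            return row.department
    return None
```

A report on 2012 figures can then choose between the historic view (key date in 2012, department X) and the current view (latest key date, department Y).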
Automated data quality checks in the form of a plausibility gate
Single Point of Truth
Source 1, Source 2, Source ..., Source n
Business-level validation of the data reduces the administration effort and the subsequent "trouble"
Harmonised reports
Plausibility Gate
• UNSPSC code present?
• RVO with reference to a BVO?
• DUNS number present?
• Order of magnitude of BVO/RVO?
real customer example
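A plausibility gate of this kind can be sketched as a list of named checks that every record must pass before it is loaded. This is a hypothetical illustration: the field names `unspsc_code` and `duns_number` are assumptions, and the two BVO/RVO checks are left out because the slide does not define those terms.

```python
from typing import Callable

Record = dict[str, object]
Check = tuple[str, Callable[[Record], bool]]


def present(field: str) -> Callable[[Record], bool]:
    """Build a check that passes when the field is filled."""
    return lambda rec: bool(rec.get(field))


# Only the two presence checks from the slide are modelled here.
CHECKS: list[Check] = [
    ("UNSPSC code present", present("unspsc_code")),
    ("DUNS number present", present("duns_number")),
]


def plausibility_gate(records: list[Record]):
    """Split records into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for rec in records:
        failed = [name for name, check in CHECKS if not check(rec)]
        if failed:
            rejected.append((rec, failed))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Keeping the reasons next to each rejected record is what lets the business side correct the data at the source instead of discovering the problem later in a report.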
Challenge 3: UNDERSTANDABLE
• texts for cryptic numbers
• multi-language support
• data provenance: know where the data originated
• auditing: track changes
• relevance: show users data from their "realm of command"
LAYERED SCALABLE ARCHITECTURE (LSA)
A Typical Data Warehouse Architecture (layer diagram repeated from the Data Warehouses section)
Yet Another, Arbitrary Example …
Source: http://www.zentut.com/wp-content/uploads/2012/10/stand-alone-data-mart.jpg
The Layered Scalable Architecture (LSA)
• reference architecture for DW
• term introduced by SAP, but not SAP-specific
• layers:
– each layer has a certain task
– each layer has an associated service-level
– layers describe the step-wise refinement of data
• not every DW needs all LSA-layers
• modern technology allows layers to be removed or merged, as fewer or no performance-motivated services are required
• more: http://tinyurl.com/sap-lsa
LSA Reference Layers
• Data Acquisition Layer: 1:1 from extraction, temporary
• Corporate Memory: source system service level, long term, comprehensive, complete, master the unknown
• Quality & Harmonisation Layer: create harmonised view, guarantee quality
• Data Propagation Layer: digestible, integrated, unified data, ready to consume
• Business Transformation Layer: apply business logic
• Reporting Layer: reporting, analysis ready
• Virtualization Layer: abstraction
• Operational Data Store: near real time, operational like
Layer groups:
• EDW layers: application neutral, corporate owned, granular
• BI Applications/Analytics Layers
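The step-wise refinement of data through the layers can be pictured as a chain of small functions, one per layer. The sketch below is a toy illustration, not how SAP BW implements LSA: the record fields (`product`, `amount`, `price`) and the logic inside each step are assumptions chosen to keep the example short.

```python
from collections import defaultdict


def acquire(source_rows):
    """Data acquisition: 1:1 copy of what the source delivered."""
    return [dict(row) for row in source_rows]


def harmonise(rows):
    """Quality & harmonisation: unify spellings, drop incomplete rows."""
    out = []
    for row in rows:
        if row.get("amount") is None:
            continue  # fails the quality check
        out.append({**row, "product": row["product"].strip().upper()})
    return out


def apply_business_logic(rows):
    """Business transformation: derive an application-specific field."""
    return [{**row, "revenue": row["amount"] * row["price"]} for row in rows]


def report(rows):
    """Reporting: aggregate revenue per product."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["revenue"]
    return dict(totals)


raw = [
    {"product": "vw touareg ", "amount": 2, "price": 10.0},
    {"product": "VW TOUAREG", "amount": 1, "price": 10.0},
    {"product": "VW Touareg", "amount": None, "price": 10.0},
]
result = report(apply_business_logic(harmonise(acquire(raw))))
print(result)  # {'VW TOUAREG': 30.0}
```

Because each layer only depends on the output of the one before it, two steps can be merged into one function when the intermediate result is not needed, which mirrors the point that not every DW needs all LSA layers.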
IN-MEMORY DATABASES + DATA WAREHOUSING
Why In-Memory Databases?
Type of Memory   Size             Latency (~)
L1 CPU Cache     64K              1 ns
L2 CPU Cache     256K             5 ns
L3 CPU Cache     8M               20 ns
Main Memory      GBs up to TBs    100 ns
Disk             TBs              >1,000,000 ns

→ need cache-conscious data structures and algorithms!
SAP HANA is an example of an in-memory DBMS
(from 2011)
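To make the gaps between the memory tiers concrete, the latency figures can be turned into relative factors. This is a small sketch using the approximate values from the table; the disk figure is a lower bound on the slide, so the resulting factor is a lower bound as well.

```python
# Approximate latencies in nanoseconds, as given on the slide.
LATENCY_NS = {
    "L1 CPU cache": 1,
    "L2 CPU cache": 5,
    "L3 CPU cache": 20,
    "main memory": 100,
    "disk": 1_000_000,  # slide says "more than", so this is a lower bound
}


def slowdown(slower: str, faster: str) -> float:
    """How many times longer one tier takes than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


print(slowdown("disk", "main memory"))          # 10000.0
print(slowdown("main memory", "L1 CPU cache"))  # 100.0
```

The second factor is the reason for the call for cache-conscious structures: once the disk is out of the picture, the step from main memory to CPU cache becomes the next bottleneck.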
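The Data Warehousing Quadrant
Axes: data volume (modest to huge) vs. number of data models, sources, … (modest to huge)

                       models/sources: modest    models/sources: huge
data volume: huge      Very Large DW             Big DW
data volume: modest    Data Mart                 Enterprise DW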
The Data Warehousing Quadrant
Axes: data volume (modest to huge) vs. number of data models, sources, … (modest to huge)
• VLDW (huge data volume, modest number of models)
  - internet scale business process (e.g. Ebay, Amazon, …) generating huge amounts of (sensor) data
  - fairly modest challenges regarding semantics, consolidation, harmonization, integration with other data
  - few data sources
• EDW (modest data volume, huge number of models)
  - mix of scenarios with small and large amounts of data
  - many (1000s to 10000s) data models
  - many (100s) different data sources
• Data Mart (modest data volume, modest number of models)
  - data mart type of setup or operational (OLTP) analytics
  - modest number of tables
  - modest (need for) integrations between data models
• BDW (huge data volume, huge number of models)
  - more scenarios, more combinations of scenarios
  - more granular data, sensor / big data
Products shown: SAP HANA, SAP BW
SUMMARY
What You Should Take Away
1. Difference: BI vs DW
2. What are the problems that a DW handles?
3. How are those problems tackled?