US Office:
1355 Market Street, #488
San Francisco, CA 94103
German Office:
Katharinenstr. 15
04109 Leipzig, Germany
Beyond the Data Lake: Simplifying data integration for the modern age
Matthias Korn | Technical Consultant
2 The Challenge
Gartner 2014: “VARIETY is the biggest challenge.”
“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
3 Integration using the Data Warehouse
Data is integrated by copying it into a central repository
Approach: ETL process
Structure is applied in the repository
BI users query Data Marts
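The warehouse flow on this slide can be sketched in a few lines. This is a minimal illustration only, assuming two in-memory SQLite databases stand in for a source system and the central repository; the table names (orders, fact_orders), the view name (mart_revenue_per_customer) and the function run_etl are hypothetical and do not come from the talk.

```python
import sqlite3

# Hypothetical source system holding raw, inconsistently formatted orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice ", 1250), (2, "BOB", 990), (3, "alice", 4000)],
)

# Central repository (the data warehouse). Its structure is fixed up front.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, customer TEXT, amount REAL)")


def run_etl(src, dwh):
    """Copy orders from the source into the warehouse, cleaning them on the way."""
    # Extract: read the raw rows from the source system.
    rows = src.execute("SELECT id, customer, amount_cents FROM orders").fetchall()
    # Transform: normalise customer names and convert cents to currency units.
    cleaned = [(i, name.strip().lower(), cents / 100) for i, name, cents in rows]
    # Load: write the cleaned rows into the central repository.
    dwh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleaned)
    dwh.commit()
    return len(cleaned)


loaded = run_etl(source, warehouse)

# A data mart, modelled here as a view that BI users query.
warehouse.execute(
    "CREATE VIEW mart_revenue_per_customer AS "
    "SELECT customer, SUM(amount) AS revenue FROM fact_orders GROUP BY customer"
)
mart = warehouse.execute(
    "SELECT customer, revenue FROM mart_revenue_per_customer ORDER BY customer"
).fetchall()
print(loaded, mart)
```

Note that every row is physically copied before anyone can query it, and any change to the source schema means changing run_etl and the warehouse table.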
4 Why do so many DWH projects fail?
Slow path from data to actionable insights (6 to 9+ months)
77% failure rate*
Inflexible; costly modifications
Labour-intensive setup and maintenance
5 Data Lake – getting data in is pretty easy…
6 …but making sense of it is the challenge
Business User
7 Approaches to data fishing
Situation improved with YARN
Apache Mahout, HBase, Hive, Pig and MapReduce
Data Marts are created
BI users’ reporting tools query Data Marts
Wait, didn’t they already do this before?
8 “Transform” just changed its position: ETL -> ELT
Data Marts have to be created by Data Scientists
BI users can’t do new things
No permission concept
Much of the stored data is never used, which eats up the savings from low storage costs
9 The Logical Data Warehouse
Introduced by Gartner in 2012
Adds virtualization of data sources
Adds distributed processes
Logically consistent, subject-oriented integration of time-variant data
Via data management infrastructure
10 Logical Data Warehouse (LDW)
11 What does the Logical Data Warehouse do?
LDW knows where the data is stored instead of copying it
Repositories are used for data sources that are too slow
Presents all data in a single virtual database
Quickly reacts to changes in data models of source systems
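The "single virtual database" idea can be illustrated with a small sketch. It assumes SQLite's ATTACH DATABASE as a stand-in for connecting separate source systems; the schema names (crm, erp), tables and the view customer_revenue are hypothetical. It is not how DataVirtuality or any particular LDW product is implemented, only an analogy for a logical layer that stores definitions rather than copies.

```python
import sqlite3

# The hub connection plays the role of the logical layer: it holds only
# view definitions, while the data stays in the attached source databases.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE ':memory:' AS crm")  # hypothetical CRM source
hub.execute("ATTACH DATABASE ':memory:' AS erp")  # hypothetical ERP source

hub.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
hub.execute("CREATE TABLE erp.invoices (customer_id INTEGER, total REAL)")
hub.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
hub.executemany(
    "INSERT INTO erp.invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
)

# A logical model over both sources. SQLite allows TEMP views to reference
# tables in any attached database; no rows are copied when the view is created.
hub.execute(
    """
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS customer, SUM(i.total) AS revenue
    FROM crm.customers AS c
    JOIN erp.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    """
)

query = "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
before = hub.execute(query).fetchall()

# A new invoice written to the source is visible on the next query,
# because the view reads the source directly.
hub.execute("INSERT INTO erp.invoices VALUES (2, 25.0)")
after = hub.execute(query).fetchall()
print(before, after)
```

If a source changes its data model, only the view definition has to be adapted; nothing needs to be reloaded.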
12 Advantages of the Logical Data Warehouse
Real-time data available and ready for analysis
Immediately productive
Different use cases supported: Exploration, data manipulation and batch processing
Data Model creation not tied to physical database: Logical Data Model!
Permission concept implemented
Web service access using virtualization
Write-back to the connected data sources
13 Example data flow in an LDW
BI frontend is aware of all data sources and creates the SQL statement
A distributed query takes place
Performance optimization engine replicates data only if needed
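The data flow above, including "replicate only if needed", might look roughly like the following sketch. Everything here is an assumption made for illustration: the Source and QueryEngine classes, the fixed latency_ms figures (a real engine would measure response times) and the 200 ms threshold are hypothetical and do not describe the actual optimization engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple


@dataclass
class Source:
    name: str
    fetch: Callable[[], List[Row]]  # live read from the source system
    latency_ms: float               # assumed, pre-measured response time


@dataclass
class QueryEngine:
    threshold_ms: float = 200.0
    replicas: Dict[str, List[Row]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def read(self, source: Source) -> List[Row]:
        # Serve from a local replica if one was created earlier.
        if source.name in self.replicas:
            self.log.append(f"{source.name}: served from replica")
            return self.replicas[source.name]
        rows = source.fetch()
        # Replicate only sources that are too slow to query live.
        if source.latency_ms > self.threshold_ms:
            self.replicas[source.name] = list(rows)
            self.log.append(f"{source.name}: queried live, then replicated")
        else:
            self.log.append(f"{source.name}: queried live")
        return rows

    def join(self, left: Source, right: Source, left_key: int, right_key: int) -> List[Row]:
        # Distributed hash join: index the right side, then probe with the left.
        index: Dict[object, List[Row]] = {}
        for r in self.read(right):
            index.setdefault(r[right_key], []).append(r)
        return [l + r for l in self.read(left) for r in index.get(l[left_key], [])]


orders_db = Source("orders_db", lambda: [(1, "widget"), (2, "gadget")], latency_ms=20)
crm_api = Source("crm_api", lambda: [(1, "Acme"), (2, "Globex")], latency_ms=900)

engine = QueryEngine()
first = engine.join(orders_db, crm_api, 0, 0)
second = engine.join(orders_db, crm_api, 0, 0)
print(first)
print(engine.log)
```

In this sketch the fast database is always queried live, while the slow API is copied once and served locally afterwards. A real engine would also have to refresh or invalidate such replicas, which is left out here.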
DataVirtuality Thanks for your attention!
Connect with us at any time.