DataVirtuality - Beyond the Data Lake

Post on 08-Jan-2017

Category:

Data & Analytics

TRANSCRIPT

US Office:

1355 Market Street, #488

San Francisco, CA 94103

German Office:

Katharinenstr. 15

04109 Leipzig, Germany

Beyond the Data Lake: Simplifying data integration for the modern age

Matthias Korn | Technical Consultant

matthias.korn@datavirtuality.de

2 The Challenge

Gartner 2014: “VARIETY is the biggest challenge.”

“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”

3 Integration using the Data Warehouse

Data is integrated by copying it into a central repository

Approach: ETL process

Structure is applied in the repository

BI users query Data Marts
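The copy-based flow on this slide can be sketched in a few lines. This is a minimal illustration only, using Python's built-in sqlite3 as a stand-in for the central repository; the source rows, the `sales` table and the `mart_revenue` view are hypothetical names invented for the example and do not come from the talk.

```python
import sqlite3

# Hypothetical raw records from two source systems with different shapes.
crm_rows = [("alice", "DE", "120.50"), ("bob", "US", "80.00")]
shop_rows = [{"customer": "alice", "amount_cents": 9950}]


def extract():
    """Extract: read the raw records from each source system."""
    return crm_rows, shop_rows


def transform(crm, shop):
    """Transform: map both sources onto one schema (customer, amount)."""
    unified = [(name, float(amount)) for name, _country, amount in crm]
    unified += [(r["customer"], r["amount_cents"] / 100) for r in shop]
    return unified


def load(conn, rows):
    """Load: copy the structured rows into the central repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # The data mart is modelled as a view that BI users query.
    conn.execute(
        "CREATE VIEW IF NOT EXISTS mart_revenue AS "
        "SELECT customer, SUM(amount) AS revenue FROM sales GROUP BY customer"
    )


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
revenue = dict(warehouse.execute("SELECT customer, revenue FROM mart_revenue"))
```

Every new source or changed source schema means touching `transform` and reloading, which is the maintenance cost the next slide refers to.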

4 Why do so many DWH projects fail?

Slow data-to-actionable-insights (6 to 9+ months)

77% failure rate*

Inflexible; costly modifications

Labour-intensive setup and maintenance

5 Data Lake – getting data in is pretty easy…

6 …but making sense of it is the challenge

Business User

7 Approaches to data fishing

Situation improved with YARN

Apache Mahout, HBase, Hive, Pig and MapReduce

Data Marts are created

BI users' reporting tools query Data Marts

Wait, didn't they do this before already?

8 "Transform" just changed its position: ETL -> ELT

Data Marts have to be created by Data Scientists

BI users can't do new things

No permission concept

A lot of the stored data is never used, eating up the savings from low storage costs

9 The Logical Data Warehouse

Introduced by Gartner in 2012

Adds virtualization of data sources

Adds distributed processes

Logically consistent, subject-oriented integration of time-variant data

Via data management infrastructure

10 Logical Data Warehouse (LDW)

11 What does the Logical Data Warehouse do?

LDW knows where the data is stored instead of copying it

Repositories are used for data sources that are too slow

Presents all data in a single virtual database

Quickly reacts to changes in data models of source systems
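To make the idea of "knowing where the data is stored" concrete, here is a small sketch of a virtual layer over two separate databases. It is an assumption-laden toy, not DataVirtuality's implementation: the `VirtualDatabase` class, its `register` and `scan` methods, and the two in-memory SQLite sources are all hypothetical.

```python
import sqlite3

# Two independent source systems (stand-ins for e.g. a CRM and a web shop).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")])

shop = sqlite3.connect(":memory:")
shop.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
shop.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 99.5), (1, 20.0), (2, 80.0)]
)


class VirtualDatabase:
    """Hypothetical logical layer: stores where data lives, not the data."""

    def __init__(self):
        self.catalog = {}  # virtual table name -> (connection, physical table)

    def register(self, virtual_name, conn, physical_table):
        self.catalog[virtual_name] = (conn, physical_table)

    def scan(self, virtual_name):
        """Read rows from the owning source at query time."""
        conn, table = self.catalog[virtual_name]
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [c[0] for c in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]

    def revenue_per_customer(self):
        """Join data from both sources inside the virtual layer."""
        names = {c["id"]: c["name"] for c in self.scan("customers")}
        totals = {}
        for order in self.scan("orders"):
            name = names[order["customer_id"]]
            totals[name] = totals.get(name, 0.0) + order["amount"]
        return totals


vdb = VirtualDatabase()
vdb.register("customers", crm, "customers")
vdb.register("orders", shop, "orders")
totals = vdb.revenue_per_customer()
```

Because nothing is copied, a row inserted into `shop` after this point shows up in the next call to `revenue_per_customer()` without any reload step.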

12 Advantages of the Logical Data Warehouse

Real-time data available and ready for analysis

Immediately productive

Different use cases supported: Exploration, data manipulation and batch processing

Data Model creation not tied to physical database: Logical Data Model!

Permission concept implemented

Web service access using virtualization

Write back to the connected data sources

13 Example data flow in an LDW

BI frontend aware of all data sources - creates SQL statement

Distributed query taking place

Performance optimization engine replicates data only if needed
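The "replicate only if needed" rule can be illustrated with a simple threshold policy. This sketch is an assumption about how such a decision might look, not a description of the product's optimizer: `Source`, `Optimizer`, the `latency_ms` figure and the 100 ms threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    rows: list
    latency_ms: float  # assumed value, as if measured on earlier queries
    reads: int = 0     # counts how often the source system is actually hit

    def fetch(self):
        self.reads += 1
        return list(self.rows)


@dataclass
class Optimizer:
    threshold_ms: float = 100.0
    replicas: dict = field(default_factory=dict)

    def read(self, source):
        """Serve from a replica if one exists, else query the source directly."""
        if source.name in self.replicas:
            return self.replicas[source.name]
        rows = source.fetch()
        # Replicate only sources that are too slow to query live.
        if source.latency_ms > self.threshold_ms:
            self.replicas[source.name] = rows
        return rows


fast = Source("crm", [("alice",), ("bob",)], latency_ms=5.0)
slow = Source("legacy_erp", [("invoice-1",)], latency_ms=900.0)

optimizer = Optimizer()
for _ in range(2):
    optimizer.read(fast)
    optimizer.read(slow)
```

After the loop the fast source has been queried twice and the slow one only once. A real engine would also need a refresh policy, since a replica goes stale as soon as the source changes.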

DataVirtuality Thanks for your attention!

Connect with us at any time.

matthias.korn@datavirtuality.de
