DATA LAKE - REBIRTH OF ENTERPRISE DATA THINKING
MAKING BIG DATA MEANINGFUL FOR ALL ENTERPRISES
WWW.AGILEISS.COM
Making Big Data Meaningful for All
By Raj Babu [email protected]
HADOOP IS NOT FOR A SELECT FEW, BUT FOR ALL ENTERPRISES
About Agile iSS
Agile iSS is a BI & Analytics services company serving clients in Big Data, Data Lake, BI, BI on Cloud, and BI/Analytics as a Service.
Our goal is to make Big Data meaningful for all enterprises.
We focus on helping our clients upgrade their current EXPENSIVE, outdated and ineffective BI solutions to a POWERFUL, EFFECTIVE BI & ANALYTICS solution with a lower TCO (total cost of ownership).
ENTERPRISE DATA LAKE (EDL)
I have just two goals for my 25-minute presentation today: to convince you of the following.
• Big Data is not only a solution for the select few enterprises that have hundreds of TBs, or even ZBs, of data. Through the Enterprise Data Lake (EDL), Big Data is now mainstream and should be part of the standard IT stack for all mid-size and large enterprises.
• EDL makes enterprise BI systems more Agile, Nimble, Economical & Valuable.
Why is an Enterprise Data Lake solution (based on Big Data and NoSQL technology), combined with traditional BI, a significantly more effective Enterprise BI & Analytics solution than its predecessor, the EDW (Enterprise Data Warehouse), which has been tried and has failed over the last two decades?
Why Did EDW Fail?
If you Google “Challenges with EDW”, you will get something like this:
• Takes too long to get anything done.
• BI is too expensive to build and manage, and never on the schedule that the business wants.
• Our BI team and system can’t implement changes fast.
• Over-complicated architecture.
• Our BI can’t do anything ad hoc; they need requirements, design, architecture and ETL for everything, and it never gets done after all.
• Our BI is always incomplete; it never has all the data we need.
• Our BI is not suitable for ad-hoc analytics.
Why EDW Failed & EDL Is Taking Over
It is extremely expensive, and practically impossible, to gather requirements, design, build ETL for and store all the data needed in an EDW and its Data Marts (DM). EDWs and Data Marts are optimized for data analysis by processing and storing only subsets of the available datasets.
An EDL is designed to “RETAIN ALL DATASETS”. This is the single most powerful feature of an EDL, because we can never know in advance the complete scope of datasets that future analytics will need.
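To make the “retain all datasets” point concrete, here is a minimal Python sketch. It assumes a hypothetical set of order events (`RAW_EVENTS`) and hypothetical loaders `load_data_mart` and `load_data_lake`; none of these come from a real product. The mart keeps only the columns its original requirements named (schema-on-write), while the lake keeps every record exactly as received, so a question nobody anticipated (revenue by `channel`) can still be answered later without a new ETL cycle.

```python
import json

# Hypothetical raw source records, e.g. order events exported by an operational system.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "120.50", "region": "EU", "channel": "web"}',
    '{"order_id": 2, "amount": "75.00", "region": "US", "channel": "store"}',
    '{"order_id": 3, "amount": "42.25", "region": "EU", "channel": "store"}',
]


def load_data_mart(lines):
    """Schema-on-write: keep only the columns the original requirements named."""
    mart = []
    for line in lines:
        rec = json.loads(line)
        mart.append({"order_id": rec["order_id"], "amount": float(rec["amount"])})
    return mart


def load_data_lake(lines):
    """Retain-all: store every record exactly as received, in its native format."""
    return list(lines)


def revenue_by_channel(lake_lines):
    """A later, unplanned question answered straight from the raw lake records."""
    totals = {}
    for line in lake_lines:
        rec = json.loads(line)
        totals[rec["channel"]] = totals.get(rec["channel"], 0.0) + float(rec["amount"])
    return totals


mart = load_data_mart(RAW_EVENTS)
lake = load_data_lake(RAW_EVENTS)

print(revenue_by_channel(lake))  # {'web': 120.5, 'store': 117.25}
print("channel" in mart[0])      # False: the mart dropped that column at load time
```

In a real EDL the raw records would sit in files on HDFS or object storage rather than in a Python list; the list only stands in for that storage here.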
Why Does EDL Clearly Win over EDW?
• Serves ad-hoc requests with no latency and no development.
• Inexpensive to manage, with low maintenance cost, as there is little or no build effort.
• Minimal involvement of the development team, unless the data is needed in a Data Mart.
• All data is in the Data Lake.
• Ad-hoc access is possible; no SDLC is needed to access new data.
• No more waiting: the perfect place to offload all new and ad-hoc requests.
• In an EDL, a separate ETL process or database is not needed for reporting or analytics.
• Offers a perfect solution with NO heavy-duty ETL.
What Is a Data Lake?
If you Google “Data Lake”, you will get the following results:
• Wiktionary, data lake: “A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing ‘Big Data’.”
• TechTarget: “A data lake is a large object-based storage repository that holds data in its native format until it is needed.”
• Etymology: Pentaho CTO James Dixon is credited with coining the term “data lake”, which he described in his blog entry.
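The TechTarget phrase “holds data in its native format until it is needed” is usually called schema-on-read. Below is a minimal Python sketch under stated assumptions: `RAW_FILE` is a hypothetical CSV file landed in the lake untouched, and `read_with_schema` is a hypothetical helper, not a Hadoop API. Each consumer applies its own schema at query time, so the same raw data serves different users without a shared upfront design.

```python
import csv
import io

# Hypothetical raw file landed in the lake exactly as the source produced it (CSV text).
RAW_FILE = """customer_id,signup_date,country,score
101,2023-01-15,DE,0.82
102,2023-02-01,US,
103,2023-02-20,IN,0.47
"""


def read_with_schema(raw_text, schema):
    """Schema-on-read: parse and type only the fields a given consumer asks for."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        row = {}
        for field, cast in schema.items():
            value = rec.get(field) or None  # missing or empty values become None
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same untouched file with different schemas.
marketing_view = read_with_schema(RAW_FILE, {"customer_id": int, "country": str})
science_view = read_with_schema(RAW_FILE, {"customer_id": int, "score": float})

print(marketing_view[0])  # {'customer_id': 101, 'country': 'DE'}
print(science_view[1])    # {'customer_id': 102, 'score': None}
```

On Hadoop, Hive external tables apply the same idea at scale: the table definition is a schema laid over files that already exist in storage.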
What Is a Data Lake? (cont.)
From Wiktionary: Pentaho CTO James Dixon described it in his blog entry:
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”
What a Data Lake Has to Offer
[Figure: EDL image by PwC. Labels: ETL; EDL, ODS, Warm Archive (“In here all kinds of analytics happen: 85% analytics, 15% prototype reporting”); Data Marts.]
Is EDL a Product or a Tool?
EDL is really a reference architecture for an Enterprise BI solution that uses Hadoop-based Big Data as its foundation. Many leading DB vendors now see EDL as a clear winner and are incorporating it into their offerings, calling it a Data Hub.
Enterprise Data Lake (EDL) On-Premise Reference Architecture for BI & Analytics
[Figure components: Enterprise Data; Big Data ETL; Data Lake on Hadoop (Hortonworks, Cloudera, MapR); Meta Data; Traditional ETL; Data Marts; Analytics & Data Scientist; Direct Analytics & Reporting.]
Enterprise Data Lake (EDL) On-Premise Reference Architecture for BI & Analytics (Stack View)
[Figure components: Enterprise Data; Data Lake on Hadoop (Hortonworks, Cloudera, MapR); Meta Data; Traditional ETL; multiple Data Marts; Analytics & Data Scientist.]
Reference Architecture for EDL on Cloud or Hybrid
Your EDL Can Be the Following:
• A central enterprise data repository (ODS, Data Hub)
• A staging source for all systems
• A warm, active data archive/vault
• A Hadoop data warehouse
Your EDL Can Serve the Following:
• Anyone and everyone who is impatient to get their hands on data
• Those who can’t give requirements but wanted the reports yesterday
• Those who have no patience for ETL or report development
• Analytics and Data Science teams
• The ETL team, for staging
• Cost savings, by not having to buy DB capacity to store all data in the BI database
• Cases where the volume of data is too high to process through a regular DB
Who Supports the Data Lake or Data Hub?
Explore EDL - There Is Nothing to Lose
With EDL, there is no need for the expensive ETL, databases and long delays associated with your BI & Analytics platform.