Is Hadoop the Demise of Data Warehousing? - Senturus · 2016-08-30


IS HADOOP THE DEMISE OF DATA WAREHOUSING?

THOUGHTS ON THE IMPACT OF HADOOP ON BI SYSTEMS AND DATA WAREHOUSING

Part of our BI Demystified Series

John Peterson

CEO & Co-Founder

Senturus

Today’s Presenter


With thanks to:

Guy Wilnai, Sujee Maniyam and Knowledge @ Senturus

• INTRODUCTION

• THE DATA CHALLENGE

• WHAT IS HADOOP?

• ADVANTAGES & CHALLENGES

• IMPLICATIONS, PREDICTIONS & MISC. MUSINGS

• CONCLUSIONS

• Q&A

AGENDA

3 Copyright 2014 Senturus, Inc. All Rights Reserved

PRESENTATION SLIDE DECK ON WWW.SENTURUS.COM


WHO WE ARE

SENTURUS INTRODUCTION

Our Team:

Business depth combined with technical expertise. Former CFOs, CIOs, Controllers, Directors, BI Managers

SENTURUS: BUSINESS ANALYTICS CONSULTANTS


Business Intelligence Enterprise Planning Predictive Analytics

Creating Clarity from Chaos

• Former Head of BI/ Lead Architect – VISA

• Former Chief BI Architect – Jamba Juice

• Former Head of BI – Dole

• Former Chief BI Architect – Cisco

• Former Chief BI Architect – Central Garden & Pet

• Former Head of BI – Experian

• Former Head of BI – Robert Half International

• Former Head of Training (IBM Cognos, Southern California)

• Former Controller – The GAP

• Two former CFOs

• Former Partner – PwC ($50 million+ projects)

• Several former Vice Presidents of Marketing, Sales & Manufacturing/Supply Chain

• Several former COOs

• Several former CIOs

• Average experience = over 20 years

A FEW OF OUR TEAM MEMBERS (FORMER ROLES) Deep & Pragmatic Experience


750+ CLIENTS, 1600+ PROJECTS, 13+ YEARS


Outpacing our ability to harness it

THE DATA CHALLENGE

THE CHALLENGES (AND OPPORTUNITIES)


• Data volumes & velocity increasing exponentially

• Data types proliferating

• Rapid emergence of less structured (or unstructured) data sources

• Value of Data increasing

• Traditional ETL is time-consuming and costly

• Traditional storage costs skyrocketing (in total, not in $/TB)

• Business users increasingly frustrated at not being able to get access to information

THE NET RESULT


Something is bound to happen

A WARNING ABOUT TODAY’S FOCUS


IS ABOUT:

Hadoop as a potential platform or tool for Business Analytics & DW

IS NOT ABOUT:

Yet another “How Big Data will change the world” paradigm-shift prediction

ROLE OF HADOOP IN YOUR ENVIRONMENT

QUICK POLL

Under the Covers

WHAT IS HADOOP?

WHAT IS HADOOP?


Hadoop is a stuffed elephant

WHAT IS HADOOP REALLY?


• Hadoop is an open source distributed storage and processing framework

• Hadoop vs. RDBMS

Layer      Typical RDBMS      Hadoop Stack
Storage    Database Tables    HDFS Files*
Metadata   System Tables      HCatalog & YARN
Queries    SQL Query Engine   Multiple Engines

Typical RDBMS: all layers combined in a proprietary bundle

Hadoop Stack: all layers separate and independent, allowing flexible access

*Raw data to highly structured

REFERENCE ARCHITECTURE

Source: Hortonworks

REFERENCE ARCHITECTURE (DETAILED)

Source: Hortonworks

HADOOP STACK DISTRIBUTIONS


Distribution        Open Source   Premium
Apache              Y             N
Cloudera            Y             Y
Hortonworks         Y             N
MapR                Y (?)         Y
Intel               N             Y
EMC Greenplum HD    N             Y

ADVANTAGES OF HADOOP (FOR BI)


• Dramatically lower cost

– 50x to 100x (or more)

• Can store virtually any data type

• Can support multiple analytic engines

• Massively scalable

– Both Size and Performance

– 100’s of nodes, TB of RAM, PB of storage

• Open-source leads to rapid innovation

HADOOP OFFERS COST EFFECTIVE STORAGE

“A recent survey of large financial services firms, telecommunications carriers and retailers indicated that storing data in an RDBMS typically runs between $30,000 and $100,000 (USD) per TB per year in total costs”

– Cloudera white paper

- Hadoop can bring down the cost to ~$1,000 / TB
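
As a rough check on the figures above, the short sketch below computes the cost ratio implied by the quoted RDBMS range and the ~$1,000 per TB Hadoop figure. It assumes the Hadoop figure is also a per-year cost, which the slide does not state explicitly; the constant and function names are illustrative only.

```python
# Per-TB annual cost figures quoted on the slide (USD).
RDBMS_COST_PER_TB_LOW = 30_000
RDBMS_COST_PER_TB_HIGH = 100_000
# Assumption: the ~$1,000/TB Hadoop figure is per year, like the RDBMS range.
HADOOP_COST_PER_TB = 1_000


def cost_ratio(rdbms_cost_per_tb: float, hadoop_cost_per_tb: float) -> float:
    """Return how many times more expensive the RDBMS figure is."""
    return rdbms_cost_per_tb / hadoop_cost_per_tb


low = cost_ratio(RDBMS_COST_PER_TB_LOW, HADOOP_COST_PER_TB)
high = cost_ratio(RDBMS_COST_PER_TB_HIGH, HADOOP_COST_PER_TB)
print(f"RDBMS storage is roughly {low:.0f}x to {high:.0f}x the Hadoop cost per TB")
```

Under these assumptions the ratio spans 30x to 100x per TB.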

BIG DATA COST COMPARISON

Source: Neustar

BIG DATA COST COMPARISON

Source: Hortonworks

COST CASE STUDY (TELECOM)

• The carrier’s previous data processing environment was costing $59 million (USD) each year to manage 1PB of data, broken down as follows:

– $2 million (USD) per year = storage for 1PB raw archive data on network-attached storage (NAS) at $2,000 per TB per year

– $55 million (USD) per year = management and backup of 1PB processed data on EDW at $55,000 per TB per year

– $2 million (USD) per year = administration costs calculated at $1,000 per TB per year

• Calculating costs for moving data processing onto Cloudera, the carrier reduced infrastructure costs to $5.1 million (USD) total

– $5 million (USD) per year = hardware, software and infrastructure for 1PB at $5,000 per TB per year

– $100,000 (USD) per year = administration costs calculated at $100 per TB per year
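
The arithmetic in this case study can be checked with a few lines of Python. The sketch below uses the line items exactly as stated on the slide and assumes 1 PB = 1,000 TB. The variable names are illustrative. One caveat: $1,000 per TB on 1 PB would give $1 million, so the stated $2 million administration item matches its rate only if it covers 2 PB (raw plus processed data). That reading is an assumption, so the rate is not cross-checked here.

```python
TB_PER_PB = 1_000  # decimal convention, consistent with the per-TB rates on the slide

# Line items as stated on the slide, in USD per year.
before = {
    "NAS raw archive storage": 2_000_000,
    "EDW management and backup": 55_000_000,
    "administration": 2_000_000,
}
after = {
    "hardware, software and infrastructure": 5_000_000,
    "administration": 100_000,
}

total_before = sum(before.values())
total_after = sum(after.values())
savings = total_before - total_after
reduction_pct = 100 * savings / total_before

print(f"Before: ${total_before:,}  After: ${total_after:,}")
print(f"Annual savings: ${savings:,} ({reduction_pct:.1f}% reduction)")

# Cross-check the stated per-TB rates against the line items they describe.
assert 2_000 * TB_PER_PB == before["NAS raw archive storage"]
assert 55_000 * TB_PER_PB == before["EDW management and backup"]
assert 5_000 * TB_PER_PB == after["hardware, software and infrastructure"]
assert 100 * TB_PER_PB == after["administration"]
```

The totals reproduce the $59 million and $5.1 million figures on the slide, a reduction of a little over 91%.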

HADOOP CAN STORE ANY DATA TYPE

• Key-value pairs

• Text and binary data

• Structured

– Database records

• Semi-structured

– Sensor & Machine data

– Log files

• Un-structured

– Emails, tweets

“Set structure at query time”

Can retain atomic level data
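
"Set structure at query time" can be sketched in plain Python. The example below is an illustration only: the event records and the `read_with_schema` helper are hypothetical, and a real deployment would use a Hadoop query engine rather than in-memory lists. The point it shows is that the raw records are stored untouched and each query projects its own structure onto them.

```python
import json

# Raw, atomic-level records stored exactly as they arrived (hypothetical log lines).
RAW_EVENTS = [
    '{"ts": "2014-05-01T10:00:00", "user": "a17", "action": "view", "sku": "X-100"}',
    '{"ts": "2014-05-01T10:00:05", "user": "b42", "action": "buy", "sku": "X-100", "amount": 19.99}',
    '{"ts": "2014-05-01T10:01:12", "user": "a17", "action": "buy", "sku": "Y-200", "amount": 5.0}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the fields a particular query needs.

    Fields missing from a record come back as None, so new attributes can be
    added to the raw data without reloading or remodeling anything.
    """
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Query 1: revenue per SKU, using only the "sku" and "amount" fields.
revenue = {}
for row in read_with_schema(RAW_EVENTS, ["sku", "amount"]):
    if row["amount"] is not None:
        revenue[row["sku"]] = revenue.get(row["sku"], 0.0) + row["amount"]

# Query 2: actions per user, a different structure over the same raw data.
actions = {}
for row in read_with_schema(RAW_EVENTS, ["user", "action"]):
    actions.setdefault(row["user"], []).append(row["action"])

print(revenue)
print(actions)
```

Both queries read the same stored lines; neither required a schema to be fixed at load time.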

ANALYTICS IN HADOOP

• ‘Batch’ or ‘offline’ analytics

– MapReduce-based tools (Java MapReduce, Streaming, Pig, Hive)

– Available from the start and well understood

• Fast Ad-Hoc querying

– New wave of processing; an answer to MPP databases (Teradata, etc.)

– Impala (Cloudera), Stinger / Tez (Hortonworks), Shark on Spark (Apache)

• Streaming / Near-RealTime workloads

– Storm, Spark

– Propelled by YARN processing framework in Hadoop version 2.x
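
For readers new to the batch model, the MapReduce pattern behind the tools listed above can be sketched in a few lines. This is a single-process simulation in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are illustrative, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data in batch"])
print(counts)
```

Higher-level tools such as Pig and Hive generate jobs of this shape, which is why they inherit its batch-oriented latency.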

ANALYTICS IN HADOOP (CONT.)

• BI tools integration

– Rich BI tool integration

– Various levels of integration (basic, native, high-speed)

– Lots of vendors: Datameer, Pentaho, Tableau, QlikView, IBM Cognos…

• NoSQL store

– Find data very quickly (milliseconds, just like a traditional database)

– HBase

• Statistical tools

– R

• And, of course, the old favorite: SQL

– Example: InfiniDB (Calpont)

CHALLENGES OF HADOOP


• Everything is very NEW

• Playing field is changing DAILY

– The Wild West

• Tools still in v1.0 mode (at best)

• Does not eliminate the need for dimensional modeling

• Security TBD

• No “standard” (winners) declared yet

• Lots of rough edges still

• Simple things, like surrogate keys…
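
To illustrate why something as simple as a surrogate key becomes a rough edge: an RDBMS can hand out keys from a single sequence, but parallel workers have no shared counter. The sketch below shows one common workaround, in which each worker interleaves keys by its partition number. It is an illustration under assumed names (`assign_surrogate_keys`, `partition_id`, `num_partitions`), not a technique described in the slides and not a Hadoop API.

```python
def assign_surrogate_keys(rows, partition_id, num_partitions, start=0):
    """Assign keys partition_id, partition_id + N, partition_id + 2N, ...

    Each parallel worker handles one partition, so no two workers can ever
    produce the same key and no central sequence generator is required.
    `start` is the number of keys this partition has already issued.
    """
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id must be in [0, num_partitions)")
    for offset, row in enumerate(rows, start=start):
        yield partition_id + offset * num_partitions, row


# Two hypothetical workers keying customer records independently.
worker0 = list(assign_surrogate_keys(["acme", "globex"], partition_id=0, num_partitions=2))
worker1 = list(assign_surrogate_keys(["initech"], partition_id=1, num_partitions=2))

print(worker0)
print(worker1)
```

The keys are unique but not dense: gaps appear whenever partitions hold different numbers of rows, which is usually acceptable for a surrogate key.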

A DIZZYING FIELD OF PLAYERS

• Alpine Data Labs, San Mateo, CA.

• Cloudera, Palo Alto, CA.

• Concurrent, San Francisco, CA.

• Continuum Analytics, Austin, TX.

• Continuuity, Palo Alto, CA.

• Couchbase, Mountain View, CA.

• Datameer, San Mateo, CA.

• DataSift, San Francisco, CA.

• DataStax, San Francisco, CA.

• DataXu, Boston, MA.

• Enigma, New York, NY.

• Factual, Los Angeles, CA.

• GoodData, San Francisco, CA.

• Gravity, New York, NY.

• Guavus, San Mateo, CA.

• Hadapt, Cambridge, MA

• Hopper, Cambridge, MA.

• Hortonworks, Palo Alto, CA.

• KarmaSphere, Cupertino, CA

• Lattice Engines, San Mateo, CA.

• MapR Technologies, San Jose, CA.

• MemSQL, New York, NY.

• Mortar Data, New York, NY.

• Mu Sigma, Northbrook, IL + India.

• Neo Technology, San Mateo, CA

• Opera Solutions, San Diego, CA + India.

• ParAccel, Campbell, CA.

• Pivotal Software, Palo Alto, CA

• Platfora, San Mateo, CA.

• RainStor, San Francisco, CA.

• Rocket Fuel, Redwood City, CA.

• SiSense, Redwood Shores, CA and Israel.

• Skytree, Atlanta, GA.

• Splice Machine, San Francisco, CA.

• Splunk, San Francisco, CA

• Statwing, San Francisco, CA.

• SumAll, New York, NY.

• Talend, Los Altos, CA.

• WibiData, San Francisco, CA.

• Zettaset, Mountain View, CA

• Zoomdata, Reston, VA.

• 10gen, New York, NY

• 1010data, New York, NY.

Partial snapshot as of May 2014

IMPLICATIONS, PREDICTIONS & MISC. MUSINGS

TSUNAMI WARNING

IMPLICATIONS, PREDICTIONS & MUSINGS


• Hadoop as a Data Staging environment

• Hadoop as an Archive

• Hadoop as the Data Warehouse

– “Enterprise Data Hub”

• Future role of RDBMSs?

– For OLTP

– For Data Warehouse

• How much Transformation and where?

TYPICAL “BEST PRACTICES” BI ARCHITECTURE INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)


[Architecture diagram. Labeled components: Source Systems of Record (ERP Data, CRM Data, Planning Data, Other Sources); Data Integration; Data Warehouse (Conforming Business Process Dimensional Models); Information Security; Data Abstraction Model; Web Portal; Standard Reports; Ad hoc Querying; Slicing & Dicing; Dashboard Authoring; Report Authoring; Dashboards/Scorecards; Threshold Alerting. Outcomes: Self-service Reporting & Analysis; Single Version of the Truth; Threshold-based Alerts.]

POTENTIAL BI ARCHITECTURE USING HADOOP INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)


[Architecture diagram. Same labeled components as the typical architecture, with Hadoop Data Staging added: Source Systems of Record (ERP Data, CRM Data, Planning Data, Other Sources); Hadoop Data Staging; Data Integration; Data Warehouse (Conforming Business Process Dimensional Models); Information Security; Data Abstraction Model; Web Portal; Standard Reports; Ad hoc Querying; Slicing & Dicing; Dashboard Authoring; Report Authoring; Dashboards/Scorecards; Threshold Alerting. Outcomes: Self-service Reporting & Analysis; Single Version of the Truth; Threshold-based Alerts.]

IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)


• What have I got to learn?

– MapReduce = No

– Hand-coding = No

– Sqoop = Maybe

– SQL = YES

• Role of Existing Tools going forward

– ETL

– BI Front-ends

• Role of DW Appliances?

– HANA

– IBM PureData System (formerly Netezza), etc.

IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)


• What is the impact on end-users seeking information?

• We still need:

– Data delivered in business user-friendly state

– Rich, relevant and conforming dimensions

– Ability to account for dimension changes over time

– Good performance (transformation and aggregation)

– Ability to integrate with existing systems
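
The requirement to account for dimension changes over time is commonly met with a Type 2 slowly changing dimension, where each change closes the current row and opens a new one. The slides do not name a technique, so the sketch below is an assumption chosen for illustration; the `CustomerVersion` record, its fields, and the sample customer are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    surrogate_key: int
    customer_id: str          # natural (business) key
    region: str               # tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: str,
                 new_region: str, change_date: date) -> None:
    """Close the current version and append a new one if the region changed."""
    current = next((v for v in history
                    if v.customer_id == customer_id and v.valid_to is None), None)
    if current is not None and current.region == new_region:
        return  # nothing changed, keep the existing version open
    if current is not None:
        current.valid_to = change_date
    next_key = max((v.surrogate_key for v in history), default=0) + 1
    history.append(CustomerVersion(next_key, customer_id, new_region, change_date, None))


def region_as_of(history: List[CustomerVersion], customer_id: str,
                 as_of: date) -> Optional[str]:
    """Look up which region applied to the customer on a given date."""
    for v in history:
        if (v.customer_id == customer_id and v.valid_from <= as_of
                and (v.valid_to is None or as_of < v.valid_to)):
            return v.region
    return None


history: List[CustomerVersion] = []
apply_change(history, "C001", "West", date(2013, 1, 1))
apply_change(history, "C001", "East", date(2014, 3, 1))

print(region_as_of(history, "C001", date(2013, 6, 30)))
print(region_as_of(history, "C001", date(2014, 6, 30)))
```

Facts recorded in 2013 still report against the region that applied at the time, which is the behavior business users expect regardless of where the data is stored.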

JP’S CONCLUSION #1


Wow, this stuff is a BIG game changer

JP’S CONCLUSION #2


It’s too early to call on the specifics

JP’S CONCLUSION #3


DW Architectures & Technologies are in a huge state of flux

But…

DW Principles still apply

Resources, Upcoming Events, Q&A

NEED MORE INFO?

• Cloudera & Ralph Kimball

– Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals

– http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/best-practices-for-the-hadoop-data-warehouse-video.html

– Building a Hadoop Data Warehouse: Hadoop 101 for EDW Professionals

– http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html

• MapR & Jack Norris – How (and Why) Hadoop is Changing the Data Warehousing Paradigm

– http://tdwi.org/articles/2013/08/13/hadoop-changing-dw-paradigm.aspx

• Hortonworks – http://hortonworks.com/hadoop/

• Senturus.com – http://senturus.com/resources/

– jpeterson@senturus.com or jfrazier@senturus.com

ADDITIONAL RESOURCES


Contact us for help on a POC

www.senturus.com

UPCOMING EVENTS


More info….

Q & A

Helping Companies Learn From the Past, Manage the Present and Shape the Future

www.senturus.com 888-601-6010 info@senturus.com

Thank You

Copyright 2014 by Senturus, Inc. This entire presentation is copyrighted and may not be reused or distributed without the written consent of Senturus, Inc.
