Hadoop: Extending Your Data Warehouse

Tony Baer | Principal Analyst, Ovum. Moderated by Matt Brandwein | Product Marketing Manager, Cloudera. May 9, 2013


Page 1: Hadoop: Extending your Data Warehouse

Hadoop: Extending Your Data Warehouse

Tony Baer | Principal Analyst, Ovum

Moderated by Matt Brandwein | Product Marketing Manager, Cloudera

May 9, 2013

Page 2: Hadoop: Extending your Data Warehouse

Welcome to the webinar!

• All lines are muted

• Q&A after the presentation

• Ask questions at any time by typing them in the “Questions” pane on your WebEx panel

• Recording of this webinar will be available on-demand at cloudera.com

• Join the conversation on Twitter: @cloudera @TonyBaer #EDWHadoop

Page 3: Hadoop: Extending your Data Warehouse

Who is Cloudera?

What the Enterprise Requires

Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability

Suite of system and data management software

Comprehensive support and consulting services

Broadest Hadoop training and certification programs

Extensive Partner Ecosystem

Over 600 partners across hardware, software and services

The Leader in Big Data Management

Deliver a revolutionary data management platform powered by Apache Hadoop

World’s leading commercial vendor of Apache Hadoop

Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data

Customers & Users Across Industries

More production deployments than all other vendors combined

Page 4: Hadoop: Extending your Data Warehouse

© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.

Hadoop: Extending your Data Warehouse

Tony Baer

[email protected]

May 9, 2013

Twitter: @TonyBaer

Page 5: Hadoop: Extending your Data Warehouse

The BI Bottleneck

Hadoop & Enterprise Data Warehousing strategy

How Cloudera supports Hadoop as extended DW

Agenda

Page 6: Hadoop: Extending your Data Warehouse

Traditional BI/Data warehousing architecture

[Diagram: Sources → Extract → Staging Server (ETL Tool: Transform) → Load → Target(s): DW and Data Marts]

Page 7: Hadoop: Extending your Data Warehouse

DWs conceived for MBytes/GBytes of structured data

Data structured based on expected queries & analytics

Multiple tiers to separate distinct workloads

OLTP – ongoing, shallow interactions, simple queries

Transform – batch-oriented, IOPS-intensive

BI/analytics – data-intensive, spiky

Reduced or eliminated impact on OLTP

More complex architecture, more tradeoffs

DW — The base case

Page 8: Hadoop: Extending your Data Warehouse

EDW hitting the wall

Data growing in volume & complexity

Use cases require more, richer data

Customer retention

Operational Efficiency

Risk Mitigation

Data retention mandates/policies forcing hard decisions

ETL bursting batch windows

EDWs straining to accommodate volumes, varieties of data

Page 9: Hadoop: Extending your Data Warehouse

The ELT pattern

[Diagram: Sources → Extract → Target(s): DW and Data Marts, where Load/Transform takes place]
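The difference between the ETL and ELT patterns can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the DW target, and the rows, table names (`raw_sales`, `sales`), and function names are hypothetical, not taken from the slides.

```python
import sqlite3

# Hypothetical source rows: (customer, amount as raw, untrimmed text)
SOURCE_ROWS = [("alice", " 10.50 "), ("bob", "3"), ("alice", "4.25")]


def etl(rows):
    """ETL: transform on a staging tier (plain Python here), then load clean rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    cleaned = [(name, float(raw.strip())) for name, raw in rows]  # transform
    target.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)  # load
    return target


def elt(rows):
    """ELT: load raw rows first, then transform inside the target using SQL."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    target.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)  # load
    # Transform step: this consumes the target's own processing cycles.
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT customer, CAST(TRIM(amount) AS REAL) AS amount FROM raw_sales"
    )
    return target


def totals(conn):
    """Run the same analytic query against either target."""
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()
```

Both paths produce the same `sales` table. What changes is where the transform cycles are spent: on a separate staging tier (ETL) or on the target itself (ELT).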

Page 10: Hadoop: Extending your Data Warehouse

The benefits – and limits – of ELT

Pros

Fewer data movements

Flatter architecture

Reduced errors with fewer data movements

Cons

Transform vs. analytic workload tradeoffs

SLAs jeopardized

Triggers arms race for more infrastructure

[Chart: Processing Times and Infrastructure Costs vs. Data Volumes, assuming constant SLAs]

Page 11: Hadoop: Extending your Data Warehouse

Enterprise DWs – Size has its limits

SLAs hit the wall

Software licensing costs

PBytes @ $20k - $50k/TByte get $$$$$$

Managing/transforming new data types consumes resources
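The per-TByte figures above can be turned into a rough worked calculation. This is a back-of-the-envelope sketch: the decimal convention of 1 PByte = 1000 TBytes is an assumption (the slide does not specify one), and the function name is hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal convention, not stated on the slide


def edw_cost_range(petabytes, low_per_tb=20_000, high_per_tb=50_000):
    """Return the (low, high) cost in dollars for a warehouse of the given size."""
    terabytes = petabytes * TB_PER_PB
    return terabytes * low_per_tb, terabytes * high_per_tb


low, high = edw_cost_range(1)
print(f"1 PB: ${low:,} to ${high:,}")  # prints "1 PB: $20,000,000 to $50,000,000"
```

At these rates a single petabyte lands in the tens of millions of dollars.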

Page 12: Hadoop: Extending your Data Warehouse

But what if...

You don’t have to worry about batch windows

You don’t have to trade off transformation vs. analytic processing cycles

You can control s/w license cost escalation

You can keep that archived data live

You can more readily consume new types of data & keep your analytic options open

Page 13: Hadoop: Extending your Data Warehouse

The BI Bottleneck

Hadoop & Enterprise Data Warehousing strategy

How Cloudera supports Hadoop as extended DW

Agenda

Page 14: Hadoop: Extending your Data Warehouse

Introducing Hadoop

Originally, data processing framework for solving unique Internet-scale problems

Based on Google File System (GFS) & MapReduce

Apache Hadoop community emerged to develop platform for wider scale adoption

FS (financial services), telcos, retail, media discovered Hadoop’s benefits
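For readers new to the MapReduce model mentioned above, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It does not use the Hadoop API; all function names are hypothetical, and a real cluster would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread across machines, which is what lets the model scale out on commodity hardware.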

Page 15: Hadoop: Extending your Data Warehouse

Hadoop benefits

Scalability

Near linear performance up to 1000s of nodes

Cost

Leverages commodity h/w & open source s/w

Flexibility

Versatility with data, analytics & operation

Page 16: Hadoop: Extending your Data Warehouse

Hadoop’s trump card: Flexibility

Accommodates all kinds of data

Accommodates multiple workloads

Keeps your options open

Extensibility

Life beyond MapReduce

Many personalities

Best of both worlds

Convergence with SQL

Get the best of both worlds

Page 17: Hadoop: Extending your Data Warehouse

Hadoop as Data transformation platform

[Diagram: Sources → Extract → Hadoop (Load/Transform) → Target: existing DW/Data Mart environment (DW, Data Marts)]

Page 18: Hadoop: Extending your Data Warehouse

Why Hadoop as your data transformation platform?

Inexpensive cycles/storage

Low-cost platform reduces or eliminates tradeoff contingencies

No more transformation vs. analytics choice

Keep your archive active

Flexible division of labor

Data can remain in Hadoop or be moved to SQL

Raw data sits alongside transformed data

Page 19: Hadoop: Extending your Data Warehouse

Why Hadoop as extension to your DW?

Efficient division of labor

Run time-consuming, resource-intensive analytic workloads inside Hadoop

Routine query, analytics, & reporting in SQL DW or data mart

Query Hadoop directly

Most commercial BI tools read Hive metadata

Query Hadoop interactively

Emerging MapReduce alternatives supporting interactive query

Page 20: Hadoop: Extending your Data Warehouse

The BI Bottleneck

Hadoop & Enterprise Data Warehousing strategy

How Cloudera supports Hadoop as extended DW

Agenda

Page 21: Hadoop: Extending your Data Warehouse

Cloudera supports SQL convergence

Partners with leading ETL, BI, and Data warehousing platform & tool providers

Connect Hadoop & SQL platforms

Emerging trend: BI, ETL tools are working natively inside Hadoop

Introducing Impala

Brings high-performance interactive SQL inside Hadoop

Turns Hadoop into an MPP SQL analytic data target

Extends, doesn't replace your SQL EDW or data mart

Makes your DW strategy more flexible, iterative

Page 22: Hadoop: Extending your Data Warehouse

Taming Hadoop

Cloudera Manager

Automates deployment and health monitoring

Automates Hadoop configuration

New side-by-side deployment support

Cloudera Navigator

New feature of Cloudera Manager

Tracks data utilization activity from HDFS, Hive & HBase

Stepping stone for data security/stewardship… watch this space

Backup & Disaster Recovery (BDR)

New feature to automate recovery workflows

Page 23: Hadoop: Extending your Data Warehouse

Hadoop – Takeaways

Economical platform for offloading data transformation cycles

Extends enterprise analytics

Hadoop & SQL are converging, broadening your analytic options

Hadoop won’t replace your EDW, but will take more of the workload

Cloudera actively broadening CDH to support & extend your EDW

SQL convergence

Platform manageability

Data security & stewardship

Page 24: Hadoop: Extending your Data Warehouse

Impala: Cloudera’s Design Strategy

[Diagram: platform stack with Storage, Integration, Resource Management, and Metadata layers]

Engines: Batch Processing (MapReduce, Hive & Pig), Interactive SQL (Impala), Math (Machine Learning, Analytics), …

Storage: HDFS (Text, RCFile, Parquet, Avro, etc.), HBase (records)

Complement MapReduce with interactive MPP SQL engine

One pool of data

One metadata model

One security framework

One set of system resources

100% open source

An Integrated Part of the Hadoop Platform

Page 25: Hadoop: Extending your Data Warehouse

Impala Use Cases

Cost-effective, ad hoc query environment that offloads the data warehouse for:

Interactive BI/analytics on more data

Asking new questions

Data processing with tight SLAs

Query-able archive w/ full fidelity

Page 26: Hadoop: Extending your Data Warehouse

Leading BI tools work with Impala

Page 27: Hadoop: Extending your Data Warehouse

Questions?

• Type in the “Questions” panel

• Tweet @cloudera #EDWHadoop

• Recording will be available on-demand at cloudera.com

• Contact us:

[email protected] | @TonyBaer

[email protected] | @MattBrandwein

Thank you for attending!

Try Cloudera today: cloudera.com/downloads

Learn more about Impala: cloudera.com/impala

Get Hadoop Training: university.cloudera.com

Ready to go? Check out Cloudera Quickstart: cloudera.com/quickstart

Page 28: Hadoop: Extending your Data Warehouse