
Apache Hadoop in the Enterprise

Dr. Amr Awadallah, CTO/Founder

@awadallah, aaa@cloudera.com


Cloudera: The Leader in Big Data Management, Powered by Apache Hadoop™

The Leading Open Source Distribution of Apache Hadoop

Powerful Suite of System & Data Management Software

Built for the Enterprise

Founded: 2008

Employees: 450+

Customers: Over 50% of the Fortune 50 and 65% of the Fortune 500 plus top US intelligence and defense agencies

Partner Ecosystem: 700+ in hardware, software, and services

Education: 15,000+ trained annually; developers, admins, analysts, data scientists

Community: Founders and top supporters of the Hadoop open source ecosystem

©2013 Cloudera, Inc. All Rights Reserved.


Cloudera’s Mission: Help Organizations Gain Value from All Their Data

Solve data problems.

Solve problems with data.

Ask Bigger Questions.


Why is This Happening Now?


IT’S ALL (BIG) DATA (NOT)

10TB to 10PB


Complications of Status Quo

Structure, Storage, Network, Silos


The Story of “T”

[Diagram: OLTP, Enterprise Applications, ODS, Data Warehouse, Business Intelligence; flow labels: Query, Extract, Transform, Load, Transform]

Volume, Velocity, Variety = Problems

[Diagram: OLTP, Enterprise Applications, Data Warehouse, Business Intelligence; flow labels: Query, Extract, Transform, Load, Transform; numbered callouts:]

1. Slow Data Transformations = Missed ETL SLAs.

2. Slow Queries = Frustrated Business Users.

3. Must Archive. Archived data has a ton of latent value.

Data Warehouse Optimization

[Diagram: OLTP, Enterprise Applications, ODS, Data Warehouse, Cloudera, Business Intelligence; labels: Query (High $/Byte), Transform, Query History, Active Storage, ETL]

Our Vision: The Android of Big Data

Cloudera Enterprise | The Platform for Big Data

[Diagram: platform layers. Integration and Data Collection; Storage for All of your Data (Structured or Unstructured); Resource Management; Processing & Analytics (Batch Processing, Interactive SQL, Interactive Search, Machine Learning, Partner Apps, …); Metadata Management; Security]

Agility/Flexibility

Schema-on-Write (RDBMS):

• Prescriptive Data Modeling:

  • Create static DB schema

  • Transform data into RDBMS

  • Query data in RDBMS format

• New columns must be added explicitly before new data can propagate into the system.

• Good for Known Unknowns (Repetition)

Schema-on-Read (Hadoop):

• Descriptive Data Modeling:

  • Copy data in its native format

  • Create schema + parser

  • Query data in its native format (does ETL on the fly)

• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.

• Good for Unknown Unknowns (Exploration)
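The contrast between the two models can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop or RDBMS code: the sample records, the `device` field, the `WRITE_SCHEMA` tuple, and both function names are assumptions invented for the example.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (native format).
RAW_LINES = [
    '{"user": "ann", "action": "login"}',
    '{"user": "bob", "action": "view", "device": "phone"}',  # a new field shows up
]

# Schema-on-write: a fixed column list is applied once, at load time.
WRITE_SCHEMA = ("user", "action")


def load_with_schema_on_write(lines, columns=WRITE_SCHEMA):
    """Transform each record into the fixed columns; unknown fields are dropped."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


def query_with_schema_on_read(lines, columns):
    """Parse the raw lines at query time, projecting the columns the reader asks for."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


# Load-time transformation: "device" is not in the schema, so it is lost here.
table = load_with_schema_on_write(RAW_LINES)

# Query-time parsing: once the reader's schema describes "device", the field
# appears for data that was stored earlier, with no reload.
with_device = query_with_schema_on_read(RAW_LINES, ("user", "action", "device"))

print(table)
print(with_device)
```

In the schema-on-write path, recovering `device` would require changing the schema and reloading the source data. In the schema-on-read path only the projection passed to the query changes.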


Scalable Technology + Scalable Development

AUTO SCALE

Grows without requiring developers to re-architect their algorithms/applications
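One way to picture this claim is a MapReduce-style job in which the developer writes only a map function and a reduce function, and the number of partitions is a runtime setting. The sketch below is plain Python, not the Hadoop API; `map_count`, `reduce_counts`, `run_job`, and the sample lines are hypothetical names and data, and the partitions run sequentially rather than on separate nodes.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step: count the words in one partition of lines."""
    return Counter(word for line in partition for word in line.split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def run_job(lines, num_partitions):
    """Split the input, map each partition, then reduce the partial results.

    On a cluster each partition would be handled by a different node; here
    they run one after another, which is enough to show that the map and
    reduce logic does not change with the partition count.
    """
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    return reduce(reduce_counts, map(map_count, partitions), Counter())


LINES = ["big data", "data beats algorithms", "big big data"]

one_node = run_job(LINES, num_partitions=1)
four_nodes = run_job(LINES, num_partitions=4)

print(one_node == four_nodes)
```

Only `num_partitions` differs between the two runs; the functions the developer wrote are untouched.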


Economics: Return on Byte

[Chart labels: High ROB; Low ROB (but still a ton of aggregate value)]
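The slide does not define the metric, so the sketch below rests on an assumption: that Return on Byte is the value extracted from a dataset divided by the cost of storing it. The function name, the dollar figures, and the per-TB prices are all hypothetical inputs chosen only to show the arithmetic.

```python
def return_on_byte(value_usd, bytes_stored, cost_per_tb_usd):
    """Assumed definition: value extracted divided by the cost of storing the data."""
    terabytes = bytes_stored / 1e12
    storage_cost_usd = terabytes * cost_per_tb_usd
    return value_usd / storage_cost_usd


# Hypothetical low-value-per-byte dataset: 100 TB expected to yield $10,000.
VALUE_USD = 10_000
BYTES_STORED = 100e12

# Two made-up storage price points, in USD per TB.
rob_expensive = return_on_byte(VALUE_USD, BYTES_STORED, cost_per_tb_usd=10_000)
rob_cheap = return_on_byte(VALUE_USD, BYTES_STORED, cost_per_tb_usd=100)

print(rob_expensive, rob_cheap)
```

Under these assumed numbers the same data does not pay for itself at the higher storage price and breaks even at the lower one, which is one reading of why low-ROB data can still be worth keeping.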


Cloudera Impala

• With Impala:

  • Interactive ANSI-92 SQL queries

  • Native distributed query engine

  • Optimized for low latency

• Provides:

  • Answers as fast as you can ask

  • Everyone can ask questions of all data

  • Big data storage and analytics together

• Unified storage:

  • Supports HDFS and HBase

  • Flexible file formats and schemas

• Unified Metastore

• Unified Security

• Unified Client Interfaces:

  • ODBC/JDBC

  • SQL syntax

  • Hue Beeswax Web UI

[Diagram: BEFORE IMPALA vs. WITH IMPALA; layers: USER INTERFACE, BATCH PROCESSING, REAL-TIME ACCESS]

But What about the RDBMS?

“Use the right tool for the right job”

Optimize existing EDW systems for high-performance operational analytics

MOVE TO CLOUDERA

• Historical Data

• Data Processing

• Ad Hoc Exploration

• Transformation/Batch

KEEP IN EDW

• Operational Analytics

• Reporting

• Multi-statement Transactions


Legacy Information Architecture

[Diagram: Enterprise Applications, OLTP Systems, Networked Storage, ETL Grid, Data Warehouse, BI & Reporting]

New Information Architecture

[Diagram: Enterprise Applications, OLTP Systems, Networked Storage, ETL Grid, Data Warehouse, BI & Reporting]

The New Enterprise Big Data Stack


Maturity Path

[Diagram labels: Operational Efficiency; Competitive Advantage; ETL Acceleration; EDW Optimization; Deep BI Exploration; Historical Compliance; Agility of Schema; Not Only SQL; Any Data Type; Consolidation Data Hub; Business; IT]

Beyond Data Warehousing

COMMUNICATIONS: Location-based advertising

HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care

LAW ENFORCEMENT & DEFENSE: Threat analysis, social media monitoring, photo analysis

EDUCATION & RESEARCH: Experiment sensor analysis

FINANCIAL SERVICES: Risk & portfolio analysis; new products

ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; website optimization

UTILITIES: Smart meter analysis for network capacity

CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service

MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness

TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment

LIFE SCIENCES: Clinical trials; genomics

RETAIL: Consumer sentiment; optimized marketing

AUTOMOTIVE: Auto sensors reporting location, problems

HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis

OIL & GAS: Drilling exploration sensor analysis

The Cloudera Platform for Big Data

Benefit 1: Flexibility

• Store any data

• Run any analysis

• Keeps pace with the rate of change of incoming data

Benefit 2: Scalability

• Proven growth to PBs/1,000s of nodes

• No need to rewrite queries, automatically scales

• Keeps pace with the rate of growth of incoming data

Benefit 3: Economics

• Cost per TB at a fraction of other options

• Keep all of your data alive in an active archive

• Powering the “data beats algorithm” movement

Key Use Cases:

• Transformation Offload (aka ETL/ELT Offload)

• Exploratory Archive (aka Active Archive)

Dr. Amr Awadallah, CTO/Founder

@awadallah, aaa@cloudera.com