Hadoop, SQL & NoSQL: No Longer an Either-or Question



DESCRIPTION

It used to be black and white. If you needed MapReduce processing, you chose Hadoop; if you needed standard query and reporting, you chose a SQL data warehouse. The decision is no longer clear-cut. With YARN clearing the way for Hadoop to accept multiple workloads, Hadoop is no longer your father’s MapReduce machine, as frameworks are rapidly emerging for interactive SQL, search, streaming and other workloads. We are on the path toward a federated world of analytic and operational decision stores, but as the boundaries between platform types grow fuzzier, deciding which platforms to use and where to run which workloads grows trickier.

TRANSCRIPT

Page 1: Hadoop, SQL & NoSQL: No Longer an Either-or Question

© Copyright Ovum. All rights reserved. Ovum is a subsidiary of Informa plc.

Hadoop, SQL & NoSQL – No longer an either-or question

Tony Baer

Hadoop Summit 2014

June 4, 2014

Page 2

Agenda

Where we’ve come – Twins separated at birth & joyous reunion

Why/how the convergence?

Loose ends

Page 3

[Figure: evolution of data stores, 1970s to 2010s. Platforms shown: file systems, hierarchical data stores, network data stores, SQL RDBMS, OODBMS, and SQL, NoSQL, Hadoop.]

Page 4

Database timeline

Phases: Early Development, Commercialization, Ecosystem Formation, Ecosystem Broadens

Eras: Mainframe era; Midranges & PCs emerge; Client/server & n-Tier; Big Data

1960s (“prehistoric”): CODASYL, IMS

1970s: EF Codd publishes seminal RDBMS model; IBM System R, Ingres

1980s: DB2, Oracle, Teradata, PC-based DBMSs; SQL becomes de facto enterprise standard data platform

1990s–2000s: Tooling emerges; SQL market consolidates (Oracle, DB2, SQL Server, Teradata); MySQL/LAMP stack emerges; J2EE, .NET; NewSQL analytic platforms emerge

2014: DBMSs add multiple engines

Page 5

Big Data platform timeline (2003–2005, 2009, 2011, 2012, 2013, 2014)

Phases: Early Development, Commercialization, Ecosystem Formation

Platforms: first advanced SQL platforms emerge; Hadoop emerges; MongoDB, Cassandra and other NoSQL platforms emerge; Cloudera introduces commercial Hadoop support; Hortonworks enters market; major vendors enter Big Data market; 2nd wave NewSQL platforms emerge

Ecosystem: tooling emerges; Big Data tools emerge; Big Data apps emerge

Adoption: Internet firm early adopters, then enterprise early adopters (FS & Media), then mainstream adoption begins

Page 6

Platform proliferation = Data processing silos

[Figure: each platform type serves its own workloads in a separate silo. Platforms: SQL RDBMS, NewSQL RDBMS, NoSQL key-value, NoSQL JSON, Hadoop. Workloads: OLTP (ACID), OLTP (non-ACID), BI query & report, analytics, advanced analytics, operational decision support, MapReduce-based advanced analytics.]

Page 7

Agenda

Where we’ve come – Twins separated at birth & joyous reunion

Why/how the convergence?

Loose ends

Page 8

Analytic SLA requirements vary

Batch (days/hours): exploratory analytics, modeling

Periodic (hours/minutes): standard reporting, decision support

Interactive (seconds): interactive query, search

Real-time (split seconds): streaming, operational decision support
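The four SLA tiers on this slide (batch in days/hours, periodic in hours/minutes, interactive in seconds, real-time in split seconds) can be read as latency bands. As a minimal Python sketch, assuming hypothetical cut-offs of 1 second, 1 minute and 4 hours (the slide gives only rough ranges), a workload's response-time budget maps to a tier like this:

```python
from datetime import timedelta
from enum import Enum


class SlaTier(Enum):
    REAL_TIME = "real-time"      # split seconds
    INTERACTIVE = "interactive"  # seconds
    PERIODIC = "periodic"        # hours/minutes
    BATCH = "batch"              # days/hours


# Hypothetical cut-offs, checked from tightest to loosest.
_CUTOFFS = [
    (timedelta(seconds=1), SlaTier.REAL_TIME),
    (timedelta(minutes=1), SlaTier.INTERACTIVE),
    (timedelta(hours=4), SlaTier.PERIODIC),
]


def classify(latency_budget: timedelta) -> SlaTier:
    """Return the tightest tier whose cut-off exceeds the latency budget."""
    for cutoff, tier in _CUTOFFS:
        if latency_budget < cutoff:
            return tier
    return SlaTier.BATCH  # anything slower than the last cut-off
```

For example, a 200 ms budget lands in `REAL_TIME`, a 5-second budget in `INTERACTIVE`, and a one-day budget in `BATCH`. In practice the boundaries would come from a site's own SLA policy rather than these illustrative values.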

Page 9

Analytics problems cross silos – operational examples

Customer engagement

Interaction – Customer 360 query in DW

Behavior – Enrich with sentiment analysis on Hadoop

Engagement – Manage real-time engagement on NoSQL database

Risk mitigation

Baseline – Model party & transactional risk on DW or Hadoop

Enrich – Analyze, rank impact of externalities on Hadoop

Ingest – Real-time market feeds via streaming in-memory

Define – Decision processes offline via BPM

Act – Allow/deny credit on system of record

Page 10

Architecture – common threads

Aggressive tiering

Multiple storage engines

Multiple workload types

On the horizon:

Federated query

Workload/query orchestration

Loose ends:

Common security?

Page 11

SQL Databases adding multiple personas

IBM DB2

BLU architecture adds columnar, data skipping, advanced tiering

New MongoDB-compliant JSON data store

Oracle Database 12c

“In-Memory” option adds DRAM-based columnar, extreme compression

Microsoft SQL Server

PDW adds columnar indexing

PolyBase feature adds Hadoop integration

Teradata

Teradata 14.10 adds “Intelligent Memory” data tiering, columnar, Hadoop integration

Aster 6 adds graph, file store, “SNAP” framework for choreographing SQL, MapReduce, graph & Hadoop processing

SAP

“Smart Data Access” federated query over HANA, Sybase IQ, Teradata & Hadoop

Page 12

Hadoop growing beyond MapReduce

Apache Hadoop 2.0’s new YARN resource allocation framework allows multiple workloads

Interactive SQL – lots of flavors

Spark – The new MapReduce & more…

Search

Streaming

Loose ends:

Graph ready for prime time?

Page 13

Emerging NewSQL + NoSQL databases

JSON data stores exploding

Intuitive for representing Internet data

MongoDB, Couchbase

IBM, Teradata… potentially Oracle adding JSON

New transaction stores … not full ACID

Cassandra for NoSQL (integrated with Hadoop)

NuoDB, Clustrix, MemSQL & others reinvent OLTP for distributed Internet apps

HBase

DynamoDB, Berkeley DB (Oracle NoSQL database) & other key-value stores

Page 14

A variety of overlapping choices

[Figure: platform types move “from” separate silos “to” overlapping capabilities. Platforms: SQL RDBMS, NewSQL RDBMS, NoSQL key-value, NoSQL JSON, Hadoop. Capabilities shown: OLTP, distributed OLTP, DW, fast deep analytics, deep analytics, active archiving, SQL, JSON, graph, stream, machine data, account/user profiles, interactive content.]

Page 15

A variety of overlapping choices – but…

Who owns the logical hub?

[Figure: overlapping capabilities (OLTP, distributed OLTP, DW, fast and deep analytics, active archiving, SQL, JSON, graph, stream, machine data, account/user profiles, interactive content) grouped around four platform families: SQL RDBMS, NewSQL, Hadoop, NoSQL.]

Page 16

Agenda

Where we’ve come – Twins separated at birth & joyous reunion

Why/how the convergence?

Loose ends

Page 17

Loose ends

Ideally, policy-based federated query will be the solution

Who owns federated query?

Data platform?

BI tool?

Application?

Who owns workload management?

Who owns security?

Tug of war between data platforms likely

Page 18

Takeaways

Analytics no longer limited by platform constraints

Data platforms are taking on multiple personas – platform choice is not either/or

But

Analytics are no longer siloed

Execution remains siloed

The brass ring will be a logical hub for

Policy/SLA-based workload targeting & management

Security & operations/performance management
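A logical hub doing policy/SLA-based workload targeting could, at its simplest, consult a policy-ordered registry of platforms and pick the first one that supports the workload within the required SLA tier. The Python sketch below is hypothetical: the `Platform` registry, the workload labels and the tier assigned to each platform are illustrative assumptions, not vendor capabilities or an existing product API.

```python
from __future__ import annotations

from dataclasses import dataclass

# SLA tiers from the deck, ordered from tightest to loosest response time.
TIERS = ["real-time", "interactive", "periodic", "batch"]


@dataclass(frozen=True)
class Platform:
    name: str
    workloads: frozenset  # workload labels this platform accepts (illustrative)
    fastest_tier: str     # tightest SLA tier it can meet (assumed)

    def can_meet(self, tier: str) -> bool:
        # A platform meets its fastest tier and every looser one.
        return TIERS.index(tier) >= TIERS.index(self.fastest_tier)


def target(workload: str, tier: str, policy: list[Platform]) -> str | None:
    """Return the first platform, in policy order, that runs `workload` within `tier`."""
    for platform in policy:
        if workload in platform.workloads and platform.can_meet(tier):
            return platform.name
    return None  # no platform satisfies the policy; escalate or reject


# Hypothetical policy: list order expresses preference (e.g., cost or data locality).
POLICY = [
    Platform("NoSQL JSON", frozenset({"user-profiles", "interactive-content"}), "real-time"),
    Platform("SQL RDBMS", frozenset({"oltp", "bi-report", "sql"}), "interactive"),
    Platform("Hadoop", frozenset({"sql", "deep-analytics", "machine-data", "active-archive"}), "interactive"),
]
```

With this policy, `target("sql", "interactive", POLICY)` returns `"SQL RDBMS"` because it precedes Hadoop in the list, `target("deep-analytics", "batch", POLICY)` returns `"Hadoop"`, and `target("deep-analytics", "real-time", POLICY)` returns `None`. A real hub would also have to handle security and workload management, the loose ends the deck raises.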

Page 19

Thank you

Tony Baer

Ovum

(646) 546-5330

@TonyBaer
