Riding the Elephant - Hadoop 2.0

Simon Elliston Ball Head of Big Data - Red Gate Ventures @sireb Riding the Elephant: Hadoop 2.0 http://bit.ly/RidingElephants

Upload: simon-elliston-ball

Post on 04-Jul-2015

DESCRIPTION

Hadoop 2.0, and in particular YARN, has opened up a lot of potential applications beyond MapReduce. This presentation explains some of the ways this happened, and what you can now do that you couldn't before. It also introduces a new tool (Spark) and an infrastructure piece (Mesos) for even more efficient cluster use.

TRANSCRIPT

Page 1: Riding the Elephant - Hadoop 2.0

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

Riding the Elephant: Hadoop 2.0

http://bit.ly/RidingElephants

Page 2: Riding the Elephant - Hadoop 2.0

Append only distributed file-system

In the beginning…

Map Reduce

Java.

Page 3: Riding the Elephant - Hadoop 2.0

JVM Based (Scala, Groovy, Jython, Clojure)

More languages

Streaming (Python, whatever)

HDP for Windows and .NET SDK
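
As an illustration of the Streaming route, here is a minimal word-count sketch in Python. Hadoop Streaming passes records to the mapper and reducer over stdin/stdout as tab-separated key/value lines and sorts by key between the two phases; the function names and the `map` / `reduce` command-line argument are assumptions made for this example, not something taken from the slides.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word.

    The input must already be sorted by key, which the framework
    does between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run as "<script> map" or "<script> reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

On a cluster the same script would be handed to the streaming jar through its `-mapper` and `-reducer` options; locally, `sorted()` can stand in for the shuffle when trying the two functions out.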

Page 4: Riding the Elephant - Hadoop 2.0

Abstraction

Photo: https://www.flickr.com/photos/puroticorico/

Hive, Pig

Cascading

Scalding

Page 5: Riding the Elephant - Hadoop 2.0

SQL on Hadoop

Learning to share the toys

HBase

Solr on Hadoop

Sharing HDFS…

Page 6: Riding the Elephant - Hadoop 2.0

Map Reduce v1

[Diagram: a Job is submitted to the JobTracker on the Head Node, which hands Tasks to three Data Nodes. Each Data Node runs a TaskTracker that executes Tasks (Map / Reduce) in fixed slots: m slot 1, m slot 2 … m slot n and r slot 1, r slot 2 … r slot n.]

Page 7: Riding the Elephant - Hadoop 2.0

Map Reduce v1

[Diagram: the JobTracker on the Head Node holds the Job. The TaskTracker on each of the three Data Nodes, with its m slots (m slot 1 … m slot n) and r slots (r slot 1 … r slot n), sends MR Status back to the JobTracker.]
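
The fixed map and reduce slots in the diagrams above can be sketched as a toy model. This is only an illustration of why separate slot types can leave capacity idle; the class name, slot counts and task counts are invented for the example and are not figures from the talk.

```python
from dataclasses import dataclass


@dataclass
class TaskTrackerSlots:
    """Toy model of one MRv1 node: slot counts are fixed per node."""
    map_slots: int
    reduce_slots: int

    def run(self, map_tasks: int, reduce_tasks: int) -> dict:
        # A map task may only use a map slot, a reduce task only a reduce slot.
        running_maps = min(map_tasks, self.map_slots)
        running_reduces = min(reduce_tasks, self.reduce_slots)
        total_slots = self.map_slots + self.reduce_slots
        return {
            "running": running_maps + running_reduces,
            "waiting": (map_tasks - running_maps) + (reduce_tasks - running_reduces),
            "idle_slots": total_slots - running_maps - running_reduces,
        }


node = TaskTrackerSlots(map_slots=4, reduce_slots=2)
# Map-heavy phase: six map tasks queued, no reducers started yet.
print(node.run(map_tasks=6, reduce_tasks=0))
```

In this made-up case two map tasks wait while two reduce slots sit idle, which is the kind of waste that generic containers are meant to avoid.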

Page 8: Riding the Elephant - Hadoop 2.0

Typical Hadoop 1.x setup

HBase

Production

Adhoc

Page 10: Riding the Elephant - Hadoop 2.0

YARN architecture

[Diagram: a YARN Client talks to the ResourceManager. Three Data Nodes each run a Node Manager: the first hosts an Application Master and two Containers, the second hosts three Containers, and the third hosts an Application Master, a Container and a Free Slot.]
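
The roles in the YARN diagram can be mimicked with a small toy scheduler. This is a sketch under simplifying assumptions, not the real YARN API: memory is the only resource, placement is first-fit, and all application names and sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy node: tracks the containers it hosts and their memory."""
    name: str
    memory_mb: int
    containers: list = field(default_factory=list)

    def free_mb(self) -> int:
        return self.memory_mb - sum(size for _, size in self.containers)


class ResourceManager:
    """Toy scheduler: the first node with enough free memory wins."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app: str, memory_mb: int):
        for node in self.nodes:
            if node.free_mb() >= memory_mb:
                node.containers.append((app, memory_mb))
                return node.name
        return None  # no capacity: the request would have to wait


rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 4096)])
requests = [
    ("mapreduce-app-master", 1024),
    ("storm-worker", 3072),
    ("mapreduce-task", 2048),
    ("oversized-job", 8192),
]
placements = [rm.allocate(app, mb) for app, mb in requests]
print(placements)
```

The point of the sketch is that a container is just a slice of a node's resources, so a MapReduce task and a long-running service can sit side by side on the same node.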

Page 14: Riding the Elephant - Hadoop 2.0

Removing the choke point

Advantages

60%-150% better usage

Long running applications

Page 15: Riding the Elephant - Hadoop 2.0

Not quite…

Operating system for Big Data?

Security

…but a framework for Big Data Apps

Data Access abstraction

Page 16: Riding the Elephant - Hadoop 2.0

Storm on YARN

A whole batch of new applications

HOYA

Tez (Stinger)

MapReduce 2

Giraph

<Insert your application here>

Page 17: Riding the Elephant - Hadoop 2.0

Batch applications

Spinning YARNs with Spring

Services

Direct to YARN APIs

Spring Data Hadoop abstraction

Page 18: Riding the Elephant - Hadoop 2.0

Streaming

Why?

Machine Learning

Graphs

Services

Distributed Shell - Anything.

Page 19: Riding the Elephant - Hadoop 2.0

Spark

A higher abstraction

Hadoop based?

… but can run on YARN

In Memory

Distributed

Fault tolerant

Real-time

✓✓✓

✓

RDDs
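
To make the "fault tolerant" point a little more concrete: an RDD (Resilient Distributed Dataset) records the transformations that produced it, so lost data can be recomputed rather than restored from a replica. The pure-Python class below is a toy stand-in written for this transcript, not Spark's API; `ToyRDD` and its methods are hypothetical names, and it runs on a single machine.

```python
class ToyRDD:
    """Toy stand-in for an RDD: it stores how to compute itself
    (its lineage) rather than the computed results."""

    def __init__(self, compute, lineage):
        self._compute = compute  # zero-argument function returning an iterator
        self.lineage = lineage   # names of the steps that lead to this dataset

    @classmethod
    def from_list(cls, data):
        return cls(lambda: iter(data), ["from_list"])

    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._compute()),
                      self.lineage + ["map"])

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)),
                      self.lineage + ["filter"])

    def collect(self):
        # Nothing is evaluated until an action such as collect() is called.
        return list(self._compute())


even_squares = (ToyRDD.from_list([1, 2, 3, 4])
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0))
print(even_squares.lineage)
print(even_squares.collect())
```

Calling `collect()` a second time replays the same lineage from the source data, which is the recovery idea in miniature.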

Page 20: Riding the Elephant - Hadoop 2.0

Mesos

Wider sharing

Hadoop

Spark

Aurora

Mesos Framework

Hardware

YARN

MapReduce

HBase etc

HDFS

Page 21: Riding the Elephant - Hadoop 2.0

Hadoop is more than MapReduce

The new world

YARN opens up new paradigms

Infrastructure maturing: better sharing

Page 22: Riding the Elephant - Hadoop 2.0

Hadoop and beyond!

Page 23: Riding the Elephant - Hadoop 2.0

Thank you

Page 24: Riding the Elephant - Hadoop 2.0

Questions?

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

http://bit.ly/RidingElephants