Riding the Elephant - Hadoop 2.0
DESCRIPTION
Hadoop 2.0, and in particular YARN, has opened up many potential applications beyond MapReduce. This presentation explains some of the ways this happened, and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) that achieve even more efficient cluster use.
TRANSCRIPT
Simon Elliston Ball Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant: Hadoop 2.0
http://bit.ly/RidingElephants
Append-only distributed file system
In the beginning…
Map Reduce
Java.
JVM Based (Scala, Groovy, Jython, Clojure)
More languages
Streaming (Python, whatever)
HDP for Windows and .NET SDK
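As a rough illustration of the Streaming option: Hadoop Streaming feeds input lines to a mapper process on stdin, sorts the mapper's tab-separated key/value output by key, and pipes it to a reducer. The sketch below is a word count written in that style. The function names and the `run_locally` helper are hypothetical and only imitate the map, sort, reduce flow in a single process; a real job would read `sys.stdin` and print each record.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

Because the contract is just lines in and lines out, the same two functions could in principle be written in any language that can read stdin and write stdout.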
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
Map Reduce v1
Job
Head Node: JobTracker
Task, Task, Task
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
Map Reduce v1
Job
Head Node: JobTracker
MR Status, MR Status, MR Status
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
Data Node: TaskTracker, Task (Map / Reduce)
m slot 1, m slot 2 … m slot n
r slot 1, r slot 2 … r slot n
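The fixed map and reduce slots in the diagram can be modelled with a small toy, sketched here to show why they limit cluster use: a map task can never borrow a free reduce slot. The `TaskTracker` class and `schedule` function are hypothetical simplifications and not Hadoop's actual scheduler, which also considers data locality, heartbeats and job priorities.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Toy MRv1 worker with separate, fixed pools of map and reduce slots."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def try_start(self, kind):
        """Start a task only if a slot of the matching kind is free."""
        if kind == "map" and self.running_maps < self.map_slots:
            self.running_maps += 1
            return True
        if kind == "reduce" and self.running_reduces < self.reduce_slots:
            self.running_reduces += 1
            return True
        return False

    def idle_slots(self):
        """Count free slots of both kinds on this worker."""
        return (self.map_slots - self.running_maps) + (
            self.reduce_slots - self.running_reduces)


def schedule(trackers, tasks):
    """Toy JobTracker: offer each task to the trackers in order.

    Returns the tasks that found no free slot of their kind.
    """
    waiting = []
    for kind in tasks:
        if not any(tracker.try_start(kind) for tracker in trackers):
            waiting.append(kind)
    return waiting
```

With three trackers of two map and two reduce slots each, submitting eight map tasks leaves two tasks waiting while six reduce slots sit idle.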
Typical Hadoop 1.x setup
HBase
Production
Ad hoc
YARN architecture
YARN Client
ResourceManager
Data Node: Node Manager
Container, Application Master, Container
Data Node: Node Manager
Container, Container, Container
Data Node: Node Manager
Application Master, Container, Free Slot
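To make the diagram concrete, here is a minimal sketch of the idea that YARN hands out generic containers sized by resources rather than typed map or reduce slots. The `NodeManager` and `ResourceManager` classes below are toy stand-ins that borrow the names from the slide, and the first-fit, memory-only policy is an assumption for illustration. Real YARN scheduling involves pluggable schedulers, per-application Application Masters negotiating requests, and more resource types.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy worker node that tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy scheduler: grants containers by memory size, not by task type."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app, mb):
        """Place a container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                node.containers.append((app, mb))
                return node.name
        return None  # nothing fits; in this toy the request simply fails
```

Any application, whether a MapReduce job or a long-running service, asks for containers the same way, which is what lets different workloads share one cluster.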
Removing the choke point
Advantages
60%-150% better usage
Long-running applications
Not quite…
Operating system for Big Data?
Security
…but a framework for Big Data Apps
Data Access abstraction
Storm on YARN
A whole batch of new applications
HOYA
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>
Batch applications
Spinning YARNs with Spring
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Why?
Streaming
Machine Learning
Graphs
Services
Distributed Shell - Anything.
Spark
A higher abstraction
Hadoop based?
… but can run on YARN
In Memory
Distributed
Fault tolerant
Real-time
✓✓✓
✓
RDDs
✓
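The in-memory and fault-tolerant properties both come from how Spark's RDDs (Resilient Distributed Datasets) work: each dataset remembers the transformation that produced it, so a lost in-memory copy can be recomputed from its parent. The `ToyRDD` class below is a hypothetical single-process sketch of that lineage idea only. It is not the Spark API and it ignores partitioning and distribution entirely.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records how to compute data, not the data itself."""

    def __init__(self, compute, parent=None):
        self._compute = compute   # function that (re)builds this dataset
        self.parent = parent      # lineage link to the dataset it came from
        self._cache = None        # optional in-memory copy

    @classmethod
    def from_list(cls, items):
        items = list(items)
        return cls(lambda: list(items))

    def map(self, fn):
        """Describe a new dataset lazily; nothing is computed yet."""
        return ToyRDD(lambda: [fn(x) for x in self.collect()], parent=self)

    def filter(self, pred):
        """Describe a filtered dataset lazily."""
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)], parent=self)

    def cache(self):
        """Materialise the dataset and keep it in memory."""
        self._cache = self._compute()
        return self

    def lose_cache(self):
        """Simulate losing the in-memory copy, for example when a node fails."""
        self._cache = None

    def collect(self):
        """Return the cached copy if present, otherwise recompute from lineage."""
        if self._cache is not None:
            return list(self._cache)
        return self._compute()
```

After `lose_cache()`, calling `collect()` again walks back through the parent datasets and rebuilds the same result, which is the recovery behaviour the toy is meant to show.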
Mesos
Wider sharing
Hadoop
Spark
Aurora
Mesos Framework
Hardware
YARN
MapReduce
HBase etc
HDFS
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
Simon Elliston Ball Head of Big Data - Red Gate Ventures
http://bit.ly/RidingElephants