hadoop 1.x vs 2

Hadoop 1.x vs Hadoop 2, by Rommel Garcia, Solutions Engineer - Big Data, Hortonworks

Upload: rommel-garcia

Post on 26-Jan-2015

DESCRIPTION

There is a big shift at both the architecture and API level between Hadoop 1 and Hadoop 2, particularly with YARN. We held our first meetup to talk about this (http://www.meetup.com/Atlanta-YARN-User-Group/) on 10/13/2013.

TRANSCRIPT

Page 1: Hadoop 1.x vs 2

Hadoop 1.x vs Hadoop 2

Rommel Garcia Solutions Engineer - Big Data

Hortonworks

Page 2: Hadoop 1.x vs 2

Transition To Big Data

Relational

Dimensional (EDW)

Big Data

Page 3: Hadoop 1.x vs 2

Data Explosion

Page 4: Hadoop 1.x vs 2

3 Design Dimensions

Page 5: Hadoop 1.x vs 2

Key Hadoop Data Types

Sentiment

Clickstream

Sensor/Machine

Geographic

Server Logs

Text

Page 6: Hadoop 1.x vs 2

Hadoop is NOT

ESB

NoSQL

HPC

Relational

Real-time

The “Jack of all Trades”

Page 7: Hadoop 1.x vs 2

Hadoop 1

Limited to 4,000 nodes per cluster

O(# of tasks in a cluster)

JobTracker bottleneck - resource management, job scheduling and monitoring

Only has one namespace for managing HDFS

Map and Reduce slots are static

MapReduce is the only job type that can run
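The static map and reduce slots listed above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Hadoop code; the function name `run_with_static_slots` and the slot counts are assumptions chosen for illustration.

```python
def run_with_static_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Return (running_maps, running_reduces, idle_slots) for one scheduling round.

    Hypothetical model: a map task may only use a map slot and a reduce task
    may only use a reduce slot, so free slots of one kind cannot help the other.
    """
    running_maps = min(map_tasks, map_slots)
    running_reduces = min(reduce_tasks, reduce_slots)
    idle = (map_slots - running_maps) + (reduce_slots - running_reduces)
    return running_maps, running_reduces, idle


# A map-heavy moment: 10 pending maps, no reduces yet, 4 map + 4 reduce slots.
maps, reduces, idle = run_with_static_slots(10, 0, map_slots=4, reduce_slots=4)
print(maps, reduces, idle)  # 4 0 4
```

In this model four reduce slots sit idle while six map tasks wait, which is the kind of under-utilization that fixed slot types can cause.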

Page 8: Hadoop 1.x vs 2

Hadoop 1 - Basics

[Diagram] Blocks A, B, and C, each shown several times across the cluster

MapReduce (Computation Framework)

HDFS (Storage Framework)

Page 9: Hadoop 1.x vs 2

Hadoop 1 - Reading Files

[Diagram] Hadoop Client, NameNode (fsimage/edit), SNameNode, and DN | TT (DataNode | TaskTracker) nodes across Rack1, Rack2, Rack3 ... RackN

Flows: read file; return DNs, block ids, etc.; read blocks; heartbeat/block report; checkpoint
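The read path on this slide (ask the NameNode, get back DataNodes and block ids, then read the blocks from the DataNodes) might be sketched as follows. This is a toy model in plain Python, not the HDFS client API; the class names `ToyNameNode` and `ToyDataNode`, the path, and the block ids are all hypothetical.

```python
class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where replicas live."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode names

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class ToyDataNode:
    """Holds the actual block contents."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes


def read_file(namenode, datanodes, path):
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # Read each block from the first listed replica; in this model the
        # block contents never pass through the NameNode.
        data += datanodes[locations[0]].blocks[block_id]
    return data


nn = ToyNameNode()
dns = {"dn1": ToyDataNode(), "dn2": ToyDataNode()}
nn.file_blocks["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.block_locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]}
dns["dn1"].blocks["blk_1"] = b"hello "
dns["dn2"].blocks["blk_1"] = b"hello "
dns["dn2"].blocks["blk_2"] = b"world"
print(read_file(nn, dns, "/logs/a.txt"))  # b'hello world'
```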

Page 10: Hadoop 1.x vs 2

Hadoop 1 - Writing Files

[Diagram] Hadoop Client, NameNode (fsimage/edit), SNameNode, and DN | TT (DataNode | TaskTracker) nodes across Rack1, Rack2, Rack3 ... RackN

Flows: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint
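Replication pipelining, as named on this slide, can be sketched with a toy model: the client hands a block to the first DataNode only, and each DataNode stores it and forwards it to the next. This is plain Python under stated assumptions, not HDFS code; `pipeline_write`, the DataNode names, and the block id are hypothetical.

```python
def pipeline_write(block_id, payload, pipeline, datanodes):
    """Write one block through a chain of DataNodes and return the ack order.

    Each DataNode stores the block, then forwards it to the rest of the
    pipeline. Acks are collected from the tail of the pipeline back to the head.
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    datanodes[head][block_id] = payload
    return pipeline_write(block_id, payload, rest, datanodes) + [head]


datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
acks = pipeline_write("blk_1", b"hello", ["dn1", "dn2", "dn3"], datanodes)
print(acks)  # ['dn3', 'dn2', 'dn1']
```

After the call, all three toy DataNodes hold a copy of `blk_1`, although the client sent it only once.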

Page 11: Hadoop 1.x vs 2

Hadoop 1 - Running Jobs

[Diagram] Hadoop Client, JobTracker, and DN | TT (DataNode | TaskTracker) nodes across Rack1, Rack2, Rack3 ... RackN

Flows: submit job; deploy job; map; shuffle; reduce; outputs labeled "part 0"
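The map, shuffle, and reduce stages on this slide can be illustrated with a word count in plain Python. This is a single-process sketch, not the Hadoop MapReduce API; the function names and input lines are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key and order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle_phase(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In a real cluster the map and reduce functions run as tasks on many nodes and the shuffle moves data over the network; here all three stages run in one process.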

Page 12: Hadoop 1.x vs 2

Hadoop 1 - Security

[Diagram] Users, Firewall, LDAP/AD, Client Node/Spoke Server, KDC, Hadoop Cluster, Encryption Plugin

Flows: authN/authZ; service request; block token; delegation token

* block token is for accessing data

* delegation token is for running jobs

Page 13: Hadoop 1.x vs 2

Hadoop 1 - APIs

org.apache.hadoop.mapreduce.Partitioner

org.apache.hadoop.mapreduce.Mapper

org.apache.hadoop.mapreduce.Reducer

org.apache.hadoop.mapreduce.Job
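Of the classes listed above, `Partitioner` decides which reducer receives each key. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that arithmetic in plain Python. As an assumption for illustration it hashes ASCII keys the way Java's `String.hashCode()` does; actual Hadoop key types such as `Text` compute their hash codes differently, and the helper names here are hypothetical.

```python
def java_string_hash(s):
    """Java String.hashCode() for ASCII strings, as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, as Java int overflow does
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key_hash, num_partitions):
    """Clear the sign bit, then take the remainder, as HashPartitioner does."""
    return (key_hash & 0x7FFFFFFF) % num_partitions


print(java_string_hash("ab"))                     # 3105
print(hash_partition(java_string_hash("ab"), 4))  # 1
print(hash_partition(-1, 4))                      # 3
```

Masking with `0x7FFFFFFF` keeps the partition index non-negative even when the hash code is negative.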

Page 14: Hadoop 1.x vs 2

Hadoop 2

Potentially up to 10,000 nodes per cluster

O(cluster size)

Supports multiple namespaces for managing HDFS

Efficient cluster utilization (YARN)

MRv1 backward and forward compatible

Any app can integrate with Hadoop

Beyond Java

Page 15: Hadoop 1.x vs 2

Hadoop 2 - Basics

Page 16: Hadoop 1.x vs 2

Hadoop 2 - Reading Files

(w/ NN Federation)

[Diagram] Hadoop Client; NameNodes NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4 (fsimage/edit copy); SNameNode per NN or Backup NN per NN; DN | NM (DataNode | NodeManager) nodes across Rack1, Rack2, Rack3 ... RackN

Flows: read file; return DNs, block ids, etc.; read blocks; register/heartbeat/block report; checkpoint; fs sync

Block Pools: ns1, ns2, ns3, ns4, with DataNode pairs (dn1, dn2), (dn1, dn3), (dn4, dn5), (dn4, dn5)
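The block pools on this slide can be sketched with a toy DataNode that stores blocks for several namespaces at once, one pool per namespace. This is plain Python, not HDFS code; the class name `ToyFederatedDataNode`, the block ids, and the choice of `dn1` serving `ns1` and `ns2` are assumptions for illustration.

```python
class ToyFederatedDataNode:
    """One DataNode holding blocks for several namespaces, one pool per namespace."""

    def __init__(self, name):
        self.name = name
        self.pools = {}  # block pool id -> {block id -> bytes}

    def store(self, pool_id, block_id, payload):
        self.pools.setdefault(pool_id, {})[block_id] = payload

    def block_report(self, pool_id):
        # In this model each NameNode only hears about blocks in its own pool.
        return sorted(self.pools.get(pool_id, {}))


dn1 = ToyFederatedDataNode("dn1")
dn1.store("ns1", "blk_1", b"x")
dn1.store("ns2", "blk_7", b"y")
print(dn1.block_report("ns1"))  # ['blk_1']
print(dn1.block_report("ns2"))  # ['blk_7']
print(dn1.block_report("ns3"))  # []
```

Keeping the pools separate lets each namespace manage its own blocks independently while sharing the same physical DataNodes.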

Page 17: Hadoop 1.x vs 2

Hadoop 2 - Writing Files

[Diagram] Hadoop Client; NameNodes NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4 (fsimage/edit copy); SNameNode per NN or Backup NN per NN; DN | NM (DataNode | NodeManager) nodes across Rack1, Rack2, Rack3 ... RackN

Flows: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint; fs sync

Page 18: Hadoop 1.x vs 2

Hadoop 2 - Running Jobs

[Diagram] ResourceManager (ASM, Scheduler, queues); NodeManagers in Rack1, Rack2 ... RackN; Hadoop Client 1 and Hadoop Client 2

app1: AM1 with containers C1.1, C1.2, C1.3, C1.4

app2: AM2 with containers C2.1, C2.2, C2.3

Flows: create app1; submit app1; create app2; submit app2; status report

Legend: ASM negotiates Containers; NM reports to ASM; Scheduler partitions Resources
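The container model on this slide can be contrasted with fixed slots using a toy ResourceManager that grants containers of any requested size from whatever memory a node has free. This is a plain-Python sketch, not the YARN API; `ToyResourceManager`, the node names, the memory figures, and the first-fit policy are assumptions for illustration.

```python
class ToyResourceManager:
    """Tracks free memory per node and hands out containers first-fit."""

    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory in MB

    def allocate(self, app_id, memory_mb):
        """Grant a container on the first node with enough free memory, else None."""
        for node, free in self.free.items():
            if free >= memory_mb:
                self.free[node] = free - memory_mb
                return (app_id, node, memory_mb)
        return None


rm = ToyResourceManager({"nm1": 4096, "nm2": 2048})
am1 = rm.allocate("app1", 1024)  # container for the ApplicationMaster itself
c1 = rm.allocate("app1", 3072)   # task container; uses the rest of nm1
c2 = rm.allocate("app2", 2048)   # nm1 is full, so this lands on nm2
c3 = rm.allocate("app2", 512)    # no memory left anywhere
print(am1, c1, c2, c3)
```

Because containers are sized per request rather than typed as map or reduce, the same node memory can serve any application that asks for it.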

Page 19: Hadoop 1.x vs 2

Hadoop 2 - Security

[Diagram] Firewall, DMZ, Firewall; LDAP/AD; Knox Gateway Cluster; KDC; Hadoop Cluster; Enterprise/Cloud SSO Provider; Encryption

Clients: JDBC Client; REST Client; Browser (HUE); Native Hive/HBase

Page 20: Hadoop 1.x vs 2

Hadoop 2 - APIs

org.apache.hadoop.yarn.api.ApplicationClientProtocol

org.apache.hadoop.yarn.api.ApplicationMasterProtocol

org.apache.hadoop.yarn.api.ContainerManagementProtocol
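Each of the three protocols above connects a different pair of YARN components: clients talk to the ResourceManager over `ApplicationClientProtocol`, an ApplicationMaster talks to the ResourceManager over `ApplicationMasterProtocol`, and an ApplicationMaster talks to NodeManagers over `ContainerManagementProtocol`. The lookup below is a plain-Python sketch of that mapping, not a use of the Java interfaces; `protocol_between` and the step descriptions are hypothetical.

```python
PROTOCOLS = {
    "ApplicationClientProtocol": ("client", "ResourceManager"),
    "ApplicationMasterProtocol": ("ApplicationMaster", "ResourceManager"),
    "ContainerManagementProtocol": ("ApplicationMaster", "NodeManager"),
}


def protocol_between(caller, callee):
    """Return the protocol name used between two components, or None."""
    for name, pair in PROTOCOLS.items():
        if pair == (caller, callee):
            return name
    return None


# A simplified lifecycle of one application as (caller, callee, action) steps.
steps = [
    ("client", "ResourceManager", "submit application"),
    ("ApplicationMaster", "ResourceManager", "register and request containers"),
    ("ApplicationMaster", "NodeManager", "start containers"),
]
for caller, callee, action in steps:
    print(protocol_between(caller, callee), "-", action)
```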

Page 21: Hadoop 1.x vs 2

Resources

http://hortonworks.com/products/hortonworks-sandbox/

http://hortonworks.com/products/hdp-2/

http://hortonworks.com/resources/

http://hadoopsummit.org/san-jose/

Page 22: Hadoop 1.x vs 2

Hadoop Summit 2014

Page 23: Hadoop 1.x vs 2

Thank you!

www.linkedin.com/in/rommelgarcia

twitter.com/rommelgarcia

[email protected]

Hortonworks