![Page 1: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/1.jpg)
Hadoop 1.x vs Hadoop 2
Rommel Garcia, Solutions Engineer - Big Data
Hortonworks
![Page 2: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/2.jpg)
Transition To Big Data
Relational
Dimensional (EDW)
Big Data
![Page 3: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/3.jpg)
Data Explosion
![Page 4: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/4.jpg)
3 Design Dimensions
![Page 5: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/5.jpg)
Key Hadoop Data Types
Sentiment
Clickstream
Sensor/Machine
Geographic
Server Logs
Text
![Page 6: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/6.jpg)
Hadoop is NOT
ESB
NoSQL
HPC
Relational
Real-time
The “Jack of all Trades”
![Page 7: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/7.jpg)
Hadoop 1
Limited to roughly 4,000 nodes per cluster
Scheduling overhead grows as O(# of tasks in a cluster)
JobTracker is a bottleneck: it handles resource management, job scheduling, and monitoring
Only one namespace for managing HDFS
Map and reduce slots are static
MapReduce is the only kind of job that can run
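The cost of static map and reduce slots can be illustrated with a small calculation. This is a minimal sketch, not Hadoop code: the function names and the slot counts are made up for illustration, and the "shared pool" simply stands for any scheme in which a slot is not tied to one task type.

```python
def run_with_static_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """Return (running, idle) when each slot accepts only one kind of task."""
    running_maps = min(map_tasks, map_slots)
    running_reduces = min(reduce_tasks, reduce_slots)
    idle = (map_slots - running_maps) + (reduce_slots - running_reduces)
    return running_maps + running_reduces, idle


def run_with_shared_pool(map_tasks, reduce_tasks, total_slots):
    """Return (running, idle) when any slot can host any kind of task."""
    running = min(map_tasks + reduce_tasks, total_slots)
    return running, total_slots - running


# Hypothetical workload: 10 pending map tasks, no reduce tasks yet.
static_result = run_with_static_slots(10, 0, map_slots=6, reduce_slots=4)
shared_result = run_with_shared_pool(10, 0, total_slots=10)
print(static_result)  # (6, 4): four reduce slots sit idle
print(shared_result)  # (10, 0)
```

With the same ten slots, the static split leaves four of them unused while map tasks are still waiting.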
![Page 8: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/8.jpg)
Hadoop 1 - Basics
[Diagram: groups of blocks labeled A, B, and C]
MapReduce (Computation Framework)
HDFS (Storage Framework)
![Page 9: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/9.jpg)
Hadoop 1 - Reading Files
[Diagram: Rack1, Rack2, Rack3 ... RackN, each holding nodes that run a DataNode and a TaskTracker (DN | TT)]
Hadoop Client -> NameNode: read file
NameNode -> Hadoop Client: return DNs, block ids, etc.
Hadoop Client -> DNs: read blocks
DNs -> NameNode: heartbeat/block report
SNameNode: checkpoint (fsimage/edit)
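The read path on this slide can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not the HDFS client API: the `NameNode` and `DataNode` classes, their method names, the path, and the block ids are all hypothetical. It only shows that the NameNode answers with metadata while the bytes come from DataNodes.

```python
class NameNode:
    """Holds metadata only: which blocks form a file and where replicas live."""

    def __init__(self):
        self.file_blocks = {}      # path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode names

    def get_block_locations(self, path):
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(namenode, datanodes, path):
    """Ask the NameNode for locations, then read each block from a DataNode."""
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # This sketch takes the first listed replica for simplicity.
        data += datanodes[locations[0]].read_block(block_id)
    return data


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
namenode = NameNode()
namenode.file_blocks["/logs/a.txt"] = ["blk_1", "blk_2"]
namenode.block_locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn3", "dn1"]}
for block_id, payload in (("blk_1", b"hello "), ("blk_2", b"world")):
    for dn_name in namenode.block_locations[block_id]:
        datanodes[dn_name].blocks[block_id] = payload

content = read_file(namenode, datanodes, "/logs/a.txt")
print(content)  # b'hello world'
```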
![Page 10: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/10.jpg)
Hadoop 1 - Writing Files
[Diagram: Rack1, Rack2, Rack3 ... RackN, each holding nodes that run a DataNode and a TaskTracker (DN | TT)]
Hadoop Client -> NameNode: request write
NameNode -> Hadoop Client: return DNs, etc.
Hadoop Client -> DNs: write blocks
DN -> DN: replication pipelining
DNs -> NameNode: block report
SNameNode: checkpoint (fsimage/edit)
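Replication pipelining means the client sends a block to one DataNode, and each DataNode forwards it to the next. The sketch below is a simplified model with hypothetical names (`DataNode`, `write_block`, `dn1`, `dn4`, `dn5`); it ignores packets, acknowledgements, and failure handling.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write_block(self, block_id, data, downstream):
        """Store the block locally, then forward it down the pipeline.

        Returns the names of all DataNodes that stored the block,
        in pipeline order.
        """
        self.blocks[block_id] = data
        stored_on = [self.name]
        if downstream:
            stored_on += downstream[0].write_block(block_id, data, downstream[1:])
        return stored_on


# Assume the NameNode returned these three DataNodes for the block.
pipeline = [DataNode("dn1"), DataNode("dn4"), DataNode("dn5")]

# The client writes only to the first DataNode in the pipeline.
stored_on = pipeline[0].write_block("blk_1", b"payload", pipeline[1:])
print(stored_on)  # ['dn1', 'dn4', 'dn5']
```

The client uploads the data once; the copies for the remaining replicas travel between DataNodes.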
![Page 11: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/11.jpg)
Hadoop 1 - Running Jobs
[Diagram: Rack1, Rack2, Rack3 ... RackN, each holding nodes that run a DataNode and a TaskTracker (DN | TT)]
Hadoop Client -> JobTracker: submit job
JobTracker -> TaskTrackers: deploy job
part 0
map
shuffle
reduce
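The map, shuffle, and reduce stages named on this slide can be sketched as a single-process word count. This is plain Python, not the Hadoop API; the function names and the input lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["a b a", "b c"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In a cluster, each stage runs in parallel across TaskTrackers; here everything runs in one process to keep the data flow visible.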
![Page 12: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/12.jpg)
Hadoop 1 - Security
Users
FIREWALL
LDAP/AD
Client Node/Spoke Server
KDC
Hadoop Cluster
authN/authZ
service request
block token
delegation token
* block token is for accessing data
* delegation token is for running jobs
Encryption Plugin
![Page 13: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/13.jpg)
Hadoop 1 - APIs
org.apache.hadoop.mapreduce.Partitioner
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Job
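Of these classes, the Partitioner decides which reduce task receives a given key. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below mirrors that formula under one assumption: keys are hashed like a Java `String` with characters in the Basic Multilingual Plane. Hadoop's own `Text` type hashes its bytes differently, so real partition numbers may differ. The function names are hypothetical.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap around like a Java int
    return h - (1 << 32) if h & 0x80000000 else h


def hash_partition(key, num_reduce_tasks):
    """Map a key to a reduce task index in [0, num_reduce_tasks)."""
    # Masking with Integer.MAX_VALUE clears the sign bit, so the
    # result of the modulo is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


print(java_string_hash("hello"))   # 99162322
print(hash_partition("hello", 4))  # 2
```

Because the partition depends only on the key, every value for a given key ends up at the same reducer.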
![Page 14: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/14.jpg)
Hadoop 2
Potentially up to 10,000 nodes per cluster
Scheduling overhead grows as O(cluster size)
Supports multiple namespaces for managing HDFS
More efficient cluster utilization (YARN)
Backward and forward compatible with MRv1
Any application can integrate with Hadoop
Not limited to Java
![Page 15: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/15.jpg)
Hadoop 2 - Basics
![Page 16: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/16.jpg)
Hadoop 2 - Reading Files
(w/ NN Federation)
[Diagram: Rack1, Rack2, Rack3 ... RackN, each holding nodes that run a DataNode and a NodeManager (DN | NM)]
NameNodes: NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4
Hadoop Client -> NN1/ns1: read file
NN1/ns1 -> Hadoop Client: return DNs, block ids, etc.
Hadoop Client -> DNs: read blocks
DNs -> NameNodes: register/heartbeat/block report
Per NN: SNameNode (checkpoint, fsimage/edit copy) or Backup NN (fs sync, checkpoint)
Block Pools: ns1 (dn1, dn2), ns2 (dn1, dn3), ns3 (dn4, dn5), ns4 (dn4, dn5)
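With federation, something must decide which NameNode serves a given path. One way to picture this is a client-side mount table that maps path prefixes to namespaces. The sketch below is an assumption-laden illustration, not Hadoop code: the mount points are invented, the function name is hypothetical, and the block pool lists are one reading of the dn labels on the slide.

```python
# Hypothetical mount table: path prefix -> namespace.
MOUNT_TABLE = {"/user": "ns1", "/data": "ns2", "/logs": "ns3", "/tmp": "ns4"}

# DataNodes holding each namespace's block pool, as read from the slide.
BLOCK_POOLS = {
    "ns1": ["dn1", "dn2"],
    "ns2": ["dn1", "dn3"],
    "ns3": ["dn4", "dn5"],
    "ns4": ["dn4", "dn5"],
}


def resolve_namespace(path, mount_table):
    """Return the namespace whose mount point is the longest prefix of path."""
    best = None
    for prefix in mount_table:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return mount_table[best]


namespace = resolve_namespace("/data/2014/events", MOUNT_TABLE)
print(namespace)               # ns2
print(BLOCK_POOLS[namespace])  # ['dn1', 'dn3']
```

Note that one DataNode (dn1 here) can hold blocks for more than one namespace, each in its own block pool.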
![Page 17: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/17.jpg)
Hadoop 2 - Writing Files
[Diagram: Rack1, Rack2, Rack3 ... RackN, each holding nodes that run a DataNode and a NodeManager (DN | NM)]
NameNodes: NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4
Hadoop Client -> NN1/ns1: request write
NN1/ns1 -> Hadoop Client: return DNs, etc.
Hadoop Client -> DNs: write blocks
DN -> DN: replication pipelining
DNs -> NameNodes: block report
Per NN: SNameNode (checkpoint, fsimage/edit copy) or Backup NN (fs sync, checkpoint)
![Page 18: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/18.jpg)
Hadoop 2 - Running Jobs
[Diagram: Rack1, Rack2 ... RackN, each with three NodeManagers. The NodeManagers host the ApplicationMasters AM1 and AM2 and the containers C1.1, C1.2, C1.3, C1.4 (app1) and C2.1, C2.2, C2.3 (app2).]
Hadoop Client 1: create app1, submit app1
Hadoop Client 2: create app2, submit app2
ResourceManager: ASM (ApplicationsManager), Scheduler, queues
ASM negotiates Containers
NM reports to ASM
Scheduler partitions Resources
status report
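The container placement in this diagram can be sketched as a tiny allocator. This is a deliberately simplified model, not the YARN API: the class and method names, the memory sizes, and the first-fit policy are all assumptions. In real YARN the ApplicationMaster negotiates containers with the ResourceManager and then asks NodeManagers to launch them.

```python
class NodeManager:
    def __init__(self, name, capacity_mb):
        self.name = name
        self.free_mb = capacity_mb
        self.containers = []


class ResourceManager:
    def __init__(self, node_managers):
        self.nodes = node_managers

    def submit(self, app_no, am_mb):
        """Start the ApplicationMaster for a new application in a container."""
        return self._place(f"AM{app_no}", am_mb)

    def request_container(self, app_no, index, mb):
        """Grant one more container to an application, if capacity allows."""
        return self._place(f"C{app_no}.{index}", mb)

    def _place(self, container_id, mb):
        # First-fit: use the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                node.containers.append(container_id)
                return node.name
        return None  # the cluster has no room right now


nodes = [NodeManager("nm1", 4096), NodeManager("nm2", 4096)]
rm = ResourceManager(nodes)
placements = [
    rm.submit(1, 1024),                # AM1
    rm.request_container(1, 1, 2048),  # C1.1
    rm.request_container(1, 2, 2048),  # C1.2
    rm.submit(2, 1024),                # AM2
    rm.request_container(2, 1, 2048),  # C2.1
    rm.request_container(2, 2, 2048),  # C2.2: cluster is full
]
print(placements)  # ['nm1', 'nm1', 'nm2', 'nm1', 'nm2', None]
```

Containers are sized per request rather than fixed as map or reduce slots, so two applications can share the same nodes.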
![Page 19: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/19.jpg)
Hadoop 2 - Security
FIREWALL
LDAP/AD
Knox Gateway Cluster
KDC
Hadoop Cluster
Enterprise/Cloud SSO Provider
JDBC Client
REST Client
FIREWALL
DMZ
Browser (HUE)
Native Hive/HBase
Encryption
![Page 20: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/20.jpg)
Hadoop 2 - APIs
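org.apache.hadoop.yarn.api.ApplicationClientProtocol (client to ResourceManager)
org.apache.hadoop.yarn.api.ApplicationMasterProtocol (ApplicationMaster to ResourceManager)
org.apache.hadoop.yarn.api.ContainerManagementProtocol (ApplicationMaster to NodeManager)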
![Page 21: Hadoop 1.x vs 2](https://reader038.vdocuments.mx/reader038/viewer/2022102922/54c654404a7959b1098b465a/html5/thumbnails/21.jpg)
Resources
http://hortonworks.com/products/hortonworks-sandbox/
http://hortonworks.com/products/hdp-2/
http://hortonworks.com/resources/
http://hadoopsummit.org/san-jose/