Availability and Integrity in Hadoop (Strata EU edition)

DESCRIPTION
Strata EU conference slides on HA in Hadoop; demo omitted. A longer slide set is to follow.

TRANSCRIPT
© Hortonworks Inc. 2012
Data Availability and Integrity in Apache Hadoop
Steve Loughran @[email protected]
Questions Hadoop Ops teams ask
•Can Hadoop keep my data safe?
•Can Hadoop keep my data available?
•What happens when things go wrong?
•Can you improve this?
Can Hadoop Keep My Data Safe?

[Diagram: an HDFS cluster — two racks of DataNodes behind ToR switches, a core switch, the NameNode, Secondary NameNode, and (Job Tracker); a file is split into block1, block2, block3…]
Replication handles data integrity

•CRC32 checksum per 512 bytes
•Verified across datanodes on write
•Verified on all reads
•Background verification of all blocks (~weekly)
•Corrupt blocks re-replicated
•All replicas corrupt → operations team intervention

2009: Yahoo! lost 19 out of 329M blocks on 20K servers – bugs now fixed
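The per-chunk checksum idea can be sketched outside Hadoop with the JDK's own CRC32 class. This is a minimal illustration of the scheme (one CRC32 per 512-byte chunk), not HDFS's actual implementation; the class and method names are made up for the sketch:

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkChecksum {
    static final int BYTES_PER_CHECKSUM = 512; // HDFS default: one CRC32 per 512 bytes

    // Compute one CRC32 per 512-byte chunk, the way HDFS checksums block data.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            crc.reset();
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1300];
        Arrays.fill(block, (byte) 7);
        long[] sums = checksums(block);
        System.out.println(sums.length); // 3 chunks: 512 + 512 + 276 bytes

        // Flip a single bit: the affected chunk's checksum no longer matches,
        // which is how a reader detects a corrupt replica.
        block[0] ^= 1;
        System.out.println(sums[0] != checksums(block)[0]); // true
    }
}
```

Because only the damaged chunk's checksum changes, corruption can be localised to a 512-byte region rather than forcing a whole-block comparison.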
Harder: Switch failure

[Diagram: the same cluster — a failed ToR switch cuts an entire rack of DataNodes off from the NameNode and the rest of the cluster]
Bonded 1 GbE across >1 switch
Avoids hardware problems, not software
NameNode failure: rare but costly

[Diagram: NameNode with shared storage for the filesystem image and journal ("edit log"); the Secondary NameNode receives the streamed journal and checkpoints the filesystem image]

1. Try to reboot/restart
2. Bring up a new NameNode server
   - with the same IP
   - or restart the DataNodes

Yahoo!: 22 NameNode failures on 25 clusters in 18 months = .99999 availability
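A rough sanity check on that availability figure. The slides give only the failure count; the per-failure recovery time here is an assumption for illustration, roughly ten minutes each:

```java
public class Availability {
    public static void main(String[] args) {
        // 25 clusters observed for 18 months (~30-day months) of service time
        double clusterHours = 25 * 18 * 30 * 24;
        // 22 NameNode failures, assumed ~10 minutes of downtime each (assumption, not from the slides)
        double downtimeHours = 22 * 10.0 / 60;
        double availability = 1.0 - downtimeHours / clusterHours;
        System.out.printf("%.5f%n", availability); // prints 0.99999: "five nines"
    }
}
```

With recovery times on that order, the quoted five-nines figure is consistent with 22 failures over 450 cluster-months.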
What to improve
•Address costs of NameNode failure in Hadoop 1
•Add live NN failover (HDFS 2.0)
•Eliminate shared storage (HDFS 2.x)
•Add resilience to the entire stack
Full Stack HA
Add resilience to planned/unplanned outages of the layers underneath
HA in Hadoop 1 (HDP1)
Use existing HA clustering technologies to add cold failover of key manager services:
•VMWare vSphere HA
•RedHat HA Linux
HA Linux: heartbeats & failover

[Diagram: two NameNode servers under RedHat HA Linux; the service addresses (NN IP, 2NN IP, JT IP) float across the physical hosts (IP1–IP4), so the NameNode, Secondary NameNode, and (Job Tracker) fail over between machines behind the ToR switches while the DataNodes keep running]
Linux HA Implementation
•Replace the init.d script with a "Resource Agent" script
•Probe the deep state of HDFS and the Job Tracker
•Detection & handling of hung processes is hard
•Test in virtual + physical environments
•Testing with physical clusters
Yes, but does it work?

public void testKillHungNN() {
  assertRestartsHDFS {
    // signal 19 (SIGSTOP) hangs the NameNode rather than killing it
    nnServer.kill(19, "/var/run/hadoop/hadoop-hadoop-namenode.pid")
  }
}

Groovy JUnit tests: "Tools of Chaos" to break remote hosts and infrastructures
And how long does it take?

Small cluster: 1-3 minutes
Medium cluster: 2-4 minutes
(where "medium" == a petabyte or less)

Cold failover is good enough for small/medium clusters
"Full Stack": IPC client

Configurable retry & time to block:
  ipc.client.connect.max.retries
  dfs.client.retry.policy.enabled

1. Blocking works for most clients (HBase, Pig…)
2. Failure-aware applications can tune/disable
3. Job Tracker added a "Safe Mode" for outages
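The two properties named above live in the client-side Hadoop configuration files. A sketch of what setting them might look like; the property names are from the slide, but the values here are illustrative, not recommendations:

```xml
<!-- core-site.xml: how many times the IPC client retries connecting -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
</property>

<!-- hdfs-site.xml: enable the blocking/retrying DFS client policy -->
<property>
  <name>dfs.client.retry.policy.enabled</name>
  <value>true</value>
</property>
```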
Putting it all together: Demo
HA in Hadoop HDFS 2

Hadoop 2.0 HA

[Diagram: two NameNodes, one Active and one Standby, each paired with a Failure Controller; the controllers use a three-node ZooKeeper quorum to agree on which NameNode is Active, and the DataNodes report to both]
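The core of the failure-controller design is that the controllers contend for a single lock and exactly one NameNode wins the Active role; when the Active dies, its lock is released and the standby's controller wins the re-election. A toy sketch of that contract, simulating ZooKeeper's ephemeral lock with an in-process AtomicReference (all names here are invented for the sketch; real HDFS 2 uses the ZKFailoverController):

```java
import java.util.concurrent.atomic.AtomicReference;

public class FailoverSketch {
    // Stand-in for the ZooKeeper ephemeral lock the failure controllers contend for.
    static final AtomicReference<String> activeLock = new AtomicReference<>();

    // Each controller tries to grab the lock; only one NameNode becomes Active.
    static String tryBecomeActive(String nn) {
        return activeLock.compareAndSet(null, nn) ? "Active" : "Standby";
    }

    public static void main(String[] args) {
        System.out.println("nn1: " + tryBecomeActive("nn1")); // nn1: Active
        System.out.println("nn2: " + tryBecomeActive("nn2")); // nn2: Standby

        // The Active NameNode dies: its ephemeral lock vanishes,
        // and the standby's controller wins the next election.
        activeLock.compareAndSet("nn1", null);
        System.out.println("nn2: " + tryBecomeActive("nn2")); // nn2: Active
    }
}
```

The real system adds fencing of the old Active before promotion, so a partitioned-but-alive NameNode cannot keep writing; the lock alone is not enough.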
When will HDFS 2 be ready?
Moving from alpha to beta ... production in 2013

Download and play with early releases!
Moving forward
•Retry policies for all remote client protocols/libraries in the stack.
•Dynamic (zookeeper?) service lookup
•YARN needs HA of Resource Manager, individual MR clusters
• “No more Managers”
Summary
•HDFS handles corruption and partial loss of data today
•Hadoop 1 now has cold failover for small/medium clusters
•Hadoop 2 adding hot failover
•Full Stack HA for resilience to outages
Single Points of Failure
There's always a SPOF
Q. How do you find it?
A. It finds you