Did you really want that data?

Steve Loughran, Hewlett-Packard Laboratories
Bristol Hadoop & NoSQL Workshop, September 2011


DESCRIPTION

A review of past work on data loss in large-scale systems and a discussion of its implications for Apache Hadoop, including proposals for operations processes and future source code improvements.

TRANSCRIPT

Page 1: Did you really want that data?

Steve Loughran
Hewlett-Packard Laboratories
Bristol Hadoop & NoSQL Workshop, September 2011

Page 2: Outline

1 Problem: is the data safe?

2 Memory

3 Network

4 Storage

Page 3: Is data in Apache™ Hadoop™ safe?

1 Can data get lost or corrupted?

2 Can this be detected and corrected?

3 Where are the risks: RAM, Network, Storage?

4 What about Hadoop itself?

5 What can we do about this?

Page 4: Memory: risk grows linearly per GB of RAM

Microsoft 2011 study of consumer PCs: correlation with overclocking & CPU cycles; P(recurrence) = 0.3 [Nightingale 2011].
CERN: 1-bit errors unexpectedly low; 2-bit errors found when none expected [Panzer-Steindel 2007].
Google: recurrent “hard” errors dominate; “chip-kill” is the best ECC. 8% of DRAMs have ≥1 error/year [Schroeder 2009].

“In more than 85% of the cases a correctable error is followed by at least one more correctable error in the same month”
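Those rates compound quickly at cluster scale. A back-of-envelope sketch (my numbers, not the slides': take the Schroeder figure of 8% of DIMMs seeing at least one error per year, and assume DIMMs fail independently):

```python
# Back-of-envelope (assumption: independent DIMMs, 8% per-DIMM
# yearly error rate from [Schroeder 2009]): the chance that a
# cluster of N DIMMs gets through a year error-free shrinks fast.
def p_any_dimm_error(n_dimms: int, p_per_dimm: float = 0.08) -> float:
    """P(at least one DIMM has an error in a year)."""
    return 1.0 - (1.0 - p_per_dimm) ** n_dimms

# A 100-node cluster with 8 DIMMs per node:
print(f"{p_any_dimm_error(100 * 8):.6f}")   # effectively 1.0
```

At even modest cluster sizes a memory error somewhere is a near-certainty, which is why the next slide's advice is about containment rather than prevention.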

Page 5: Memory: Reduce the risk

1 Use Chip-kill/Chipspare ECC for the servers that matter.

2 Burn-in tests to read/write memory patterns.

3 Monitor ECC failures and swap DIMMs with recurrent problems.

4 Data-scrubbing for main memory?

Page 6: Network Issues

Risk: undetected corruption

Ethernet: flawed CRC32 [Stone 2000].
IPv4: 16-bit header checksum only.
IPv6: no checksum.
TCP + UDP: weak 16-bit additive sum only.
HTTP: optional content-length header.
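The weakness of the TCP/UDP checksum is easy to demonstrate: a 16-bit ones-complement sum is order-independent, so transposing two 16-bit words in a payload is invisible to it. A minimal sketch of an RFC 1071-style sum (illustration only, not a full TCP implementation):

```python
# Sketch (simplified model of the RFC 1071 Internet checksum used by
# TCP/UDP): a 16-bit ones-complement sum over 16-bit words. Because
# addition commutes, swapping two words in a segment yields the same
# checksum -- that corruption goes undetected.
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement sum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

original = b"\x12\x34\xAB\xCD"
swapped  = b"\xAB\xCD\x12\x34"                    # two words transposed
print(internet_checksum(original) == internet_checksum(swapped))  # True
```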

Page 7: Recommendations for Hadoop

1 Explore Jumbo Ethernet frames.

2 Servlets/JSP pages to send content-length headers.

3 HTTP client code to verify content-length.

4 Consider the Content-MD5 header [RFC 1864].

5 Servlets/JSP pages to disable caching.
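Recommendation 4 is simple to sketch. Per RFC 1864, Content-MD5 is the base64 encoding of the body's 128-bit MD5 digest; a hypothetical client-side check (the helper names are illustrative, not Hadoop APIs) could verify both length and digest:

```python
# Sketch of the Content-MD5 check recommended above (RFC 1864:
# base64 of the MD5 digest of the body). A mismatch in either the
# length or the digest means the payload was mangled in transit.
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Value for a Content-MD5 header, per RFC 1864."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def verify(body: bytes, content_length: int, header_md5: str) -> bool:
    """Check a received body against its declared length and digest."""
    return len(body) == content_length and content_md5(body) == header_md5

body = b"block-report"
header = content_md5(body)
print(verify(body, len(body), header))          # True
print(verify(body + b"!", len(body), header))   # False: length and digest differ
```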

Page 8: Storage

HDFS saves data to disks in 64-2048 MB blocks.
Each block is replicated to three or more servers (usually).
Each block has a CRC checksum stored alongside it.
Block checksums are checked on block reads.
Blocks are verified in idle time; once/week is apparently normal.
Checksum failures trigger re-replication of good copies.
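The store-a-checksum-alongside-the-block scheme can be sketched in a few lines (a toy model, not HDFS code):

```python
# Toy model of the scheme described above: keep a CRC32 next to each
# block and re-check it on every read, so silent disk corruption is
# detected before the data is handed to the caller.
import zlib

class ChecksummedStore:
    def __init__(self):
        self._blocks = {}                        # block_id -> (data, crc)

    def write(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id: str) -> bytes:
        data, crc = self._blocks[block_id]
        if zlib.crc32(data) != crc:              # caught on read...
            raise IOError(f"checksum mismatch on {block_id}")
        return data                              # ...caller falls back to a replica

store = ChecksummedStore()
store.write("blk_1", b"payload")
print(store.read("blk_1"))                       # b'payload'
```

In real HDFS the failed read would be retried against another replica and the bad copy re-replicated from a good one.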

Page 9: Hard Disk Drives: The Good, the Bad and the Ugly [Elerath 2007]

Head fly height.
Head/platter contact: scratches and thermal asperities.
Side-track erasure: may increase with density.
Tracking problems: “sector not found”.
Drive electronics.

Page 10: Are Disks the Dominant Contributor for Storage Failures?

[Figure: breakdown of failures in NetApp server classes, from Jiang 2008]

Page 11: My interpretation of [Jiang 2008]

1 Disk failures are only a part of the problem.

2 High-end interconnects only help once they’ve got redundancy.

3 RAID assumes the SPOF is the disk and only corrects for that.

Page 12: Between HDD and HDFS

Physical interconnect [Jiang 2008].
Controller/HDD incompatibilities [Panzer-Steindel 2007, Ghemawat 2003].
DMA.
Operating system.
Device drivers.
OS-level filesystem.

Page 13: Hadoop Itself

Risk of bugs in Hadoop HDFS library

Race conditions.
Intra-server checksums, not inter-server.
Past: HDFS replicator under-replicating.
Pre-0.20.204: append unreliable.

Risk of bugs in JVM versions.
Namenode log corruption.
Only HDFS data is scrubbed; less efficient than SCSI/SATA VERIFY commands.

Page 14: Is your data safe?

P(single-block-corrupt) may increase with larger block sizes.
Time to recover increases with larger block sizes (disk-I/O bound for a single block).
P(all-blocks-corrupt) may increase with larger block sizes.
Some compression schemes (gzip) are hard to recover from single-bit corruption.
How will the layers up the Hadoop stack cope?

Yahoo!’s numbers show the risk is small, but non-zero [Radia 2011].
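To see why block size matters, a rough model (my assumption, not from the slides: independent per-bit corruption at some tiny rate) shows the per-block corruption probability growing with block size:

```python
# Rough model (assumption: each bit independently corrupts with a
# tiny probability p_bit). A bigger block is simply a bigger target,
# so P(this block contains at least one bad bit) grows with size.
def p_block_corrupt(block_mb: int, p_bit: float = 1e-15) -> float:
    bits = block_mb * 8 * 1024 * 1024
    return 1.0 - (1.0 - p_bit) ** bits

for mb in (64, 256, 2048):
    print(mb, f"{p_block_corrupt(mb):.2e}")
```

The p_bit value here is invented for illustration; the shape of the curve, not the absolute numbers, is the point.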

Page 15: User actions

One immediate option: replicate critical files at 4x.
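Why 4x helps: losing a block outright requires every replica to go bad, so under an independence assumption each extra replica multiplies the loss probability by the per-replica corruption probability (back-of-envelope, not measured data):

```python
# Back-of-envelope (assumption: replicas corrupt independently with
# per-replica probability p): P(block lost) = p ** replication, so
# going from 3x to 4x buys another factor of p.
def p_block_lost(p_replica: float, replication: int) -> float:
    return p_replica ** replication

p = 1e-4                       # invented per-replica corruption probability
print(p_block_lost(p, 3))      # ~1e-12
print(p_block_lost(p, 4))      # ~1e-16
```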

Page 16: Operational actions

1 Burn-in tests.

2 Monitor SMART errors and any reported corrupt blocks.

3 Decommission any disk with SMART errors or corrupt blocks offline immediately.

4 Test outside Hadoop: SATA VERIFY, others?

5 Use LZO compression.

6 Use ext4 w/ journalling.

7 Share stats w/ other HDFS users.

One issue: if it’s an interconnect problem, does a disk swap fix it?

Page 17: HDFS Source Enhancements

Add an option to SATA VERIFY critical data after a write.
Add methods to decommission/recommission a single drive.
Add monitoring of spill-data corruption.
Tools to recover from corrupted LZO files.
A tool for a bit-by-bit vote over 3+ inconsistent files.
A means to host small block sequences on the same node (assuming Federated HDFS enables small blocks in large clusters).
Leave corrupt blocks alone to prevent physical sector reuse?
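The vote over 3+ inconsistent replicas could be prototyped as a per-byte majority (a hypothetical helper, not an existing HDFS tool):

```python
# Sketch of the "vote over 3+ inconsistent files" idea above: for
# each byte position, take the value the majority of replicas agree
# on. With independent single-byte corruptions, two good votes
# outweigh one bad one.
from collections import Counter

def majority_vote(replicas: list) -> bytes:
    """Rebuild a block by per-byte majority over equal-length replicas."""
    assert len(replicas) >= 3 and len({len(r) for r in replicas}) == 1
    return bytes(
        Counter(column).most_common(1)[0][0]
        for column in zip(*replicas)
    )

good = b"hello hdfs"
r1 = b"hellX hdfs"             # one flipped byte
r2 = b"hello hdXs"             # a different flipped byte
print(majority_vote([good, r1, r2]))   # b'hello hdfs'
```

This only works when the replicas disagree in different places; if two replicas share the same corruption, the vote elects the bad byte.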

Page 18: Proposal: HDFS Fault Injection

Deliberately corrupt multiple HDFS block replicas.
Hand corrupt blocks to the layers above.
Report errors and timeouts on read/write operations.
Encourage upper layers to support & test recovery.
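A trivial fault injector in this spirit (assuming a block replica is modelled as a plain byte string here) flips a few random bits so upper layers can be exercised against corrupt input:

```python
# Toy fault injector for the proposal above (assumption: a block
# replica is just bytes). An odd number of bit-flips always changes
# the data, so the damaged copy is guaranteed to differ.
import random

def corrupt(data: bytes, n_bits: int, seed: int = 0) -> bytes:
    """Return a copy of data with n_bits random single-bit flips."""
    rng = random.Random(seed)           # seeded: reproducible test runs
    buf = bytearray(data)
    for _ in range(n_bits):
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)

block = b"A" * 64
damaged = corrupt(block, n_bits=3)
print(damaged != block)                 # True
```

Feeding such damaged replicas to the layers above is exactly the stress the proposal calls for.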

Page 19: Summary: data can get corrupted

Memory is a problem, but low risk outside the SPOFs.
Networking is manageable.
Storage is the threat, all the way down.
Operations tactics can mitigate this.
Scope to improve HDFS and Hadoop networking.
Fault injection could stress upper layers better.
