![Page 1: A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur,](https://reader035.vdocuments.mx/reader035/viewer/2022062421/56649d345503460f94a0a55c/html5/thumbnails/1.jpg)
A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran
UC Berkeley, Facebook
ACM SIGCOMM 2014
A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster
K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur, K. Ramchandran
UC Berkeley, Facebook
The 5th USENIX Workshop on Hot Topics in File and Storage Technologies, HotStorage 2013
http://www.camdemy.com/media/11869
Outline
• Introduction
• Hitchhiker’s erasure code
• Evaluation results
• Conclusion
Need for Redundant Storage in Data Centers
• Frequent unavailability events in data centers
  – Unreliable components
  – Software glitches, maintenance shutdowns, power failures, etc.
• Redundancy is necessary for reliability and availability
Popular Approach for Redundant Storage: Replication
• Distributed file systems used in data centers store multiple copies of data on different machines
• Machines are typically chosen on different racks
  – to tolerate rack failures
• E.g., Hadoop Distributed File System (HDFS) stores 3 replicas by default
• HDFS
Massive Data Sizes: Need Alternative to Replication
• Small to moderately sized data: disk storage is inexpensive
  – Replication is viable
• No longer true at massive scales of operation
  – e.g., the Facebook data warehouse cluster stores multiple tens of petabytes (PBs)
“Erasure codes” are an alternative
Erasure Codes in Data Centers
• Facebook data warehouse cluster
  – Uses Reed-Solomon (RS) codes instead of 3-replication on a portion of the data
  – Saves multiple petabytes of storage space
Erasure Codes

| | Replication | Reed-Solomon (RS) code |
|---|---|---|
| Block 1 | a | a (data block) |
| Block 2 | a | b (data block) |
| Block 3 | b | a+b (parity block) |
| Block 4 | b | a+2b (parity block) |
| Redundancy | 2x | 2x |
| First-order comparison | Tolerates any one failure | Tolerates any two failures |

In general:
• Replication: lower MTTDL (Mean Time To Data Loss), high storage requirement
• RS codes: an order of magnitude higher MTTDL with much less storage
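The first-order comparison above can be checked mechanically. The Python sketch below is illustrative only: it models each block as a linear combination coef_a·a + coef_b·b with ordinary integer coefficients taken from the table, and the names `REPLICATION`, `RS_2_2` and `data_loss_patterns` are hypothetical. A failure pattern loses data exactly when the surviving blocks no longer contain two linearly independent combinations.

```python
from itertools import combinations

# Each block stores coef_a * a + coef_b * b (toy layouts from the table)
REPLICATION = {1: (1, 0), 2: (1, 0), 3: (0, 1), 4: (0, 1)}  # a, a, b, b
RS_2_2 = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}       # a, b, a+b, a+2b

def recoverable(code, survivors):
    """(a, b) is recoverable iff some pair of surviving blocks is linearly independent."""
    vectors = [code[i] for i in survivors]
    return any(p * s - q * r != 0 for (p, q), (r, s) in combinations(vectors, 2))

def data_loss_patterns(code, failures):
    """All sets of `failures` failed blocks after which the data cannot be rebuilt."""
    blocks = list(code)
    return [lost for lost in combinations(blocks, failures)
            if not recoverable(code, [i for i in blocks if i not in lost])]

if __name__ == "__main__":
    print(data_loss_patterns(REPLICATION, 2))  # [(1, 2), (3, 4)]: both copies of a, or of b
    print(data_loss_patterns(RS_2_2, 2))       # []: any two blocks suffice
```

Both layouts use 2x storage, but only the RS layout survives every two-block failure.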
Erasure Codes
• Using RS codes instead of 3-replication on less-frequently accessed data has led to savings of multiple Petabytes in the Facebook warehouse cluster
• The Facebook warehouse cluster employs a (k=10, r=4) RS code, resulting in a 1.4x storage requirement ((10 + 4) / 10)
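The 1.4x figure is the stripe width divided by the number of data blocks. A short Python sketch of that arithmetic, with 3-replication for comparison (the function name is hypothetical):

```python
from fractions import Fraction

def storage_overhead(k, r):
    """Blocks stored per data block for a code with k data and r redundant blocks."""
    return Fraction(k + r, k)

# (k=10, r=4) RS code: 14 blocks stored for every 10 data blocks, tolerates any 4 failures
rs_overhead = storage_overhead(10, 4)
# 3-replication viewed as 1 data block plus 2 extra copies, tolerates any 2 failures
replication_overhead = storage_overhead(1, 2)

print(float(rs_overhead), float(replication_overhead))  # 1.4 3.0
```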
Reed-Solomon (RS) Codes
• A (#data, #parity) RS code:
  – tolerates the failure of any #parity blocks
  – these (#data + #parity) blocks constitute a “stripe”
• The Facebook warehouse cluster uses a (10, 4) RS code
Example: (2, 2) RS code
• #data = 2 data blocks: a, b
• #parity = 2 parity blocks: a+b, a+2b
• 4 blocks in a stripe
Existing Systems
• Need additional storage
  – Huang et al. (Windows Azure) 2012, Sathiamoorthy et al. (Xorbas) 2013, Esmaili et al. (CORE) 2013
• Add additional parities to reduce download
  – Hu et al. (NCFS) 2011
• Highly restricted parameters
  – Khan et al. (Rotated-RS) 2012: #parity ≤ 3
  – Xiang et al., Wang et al. 2010, Hu et al. (NCCloud) 2012: #parity ≤ 2
  – Hitchhiker performs as well or better in these restricted settings too
Erasure Codes in Data Centers: HDFS-RAID
Borthakur, “HDFS and Erasure Codes (HDFS-RAID)”
Fan, Tantisiriroj, Xiao and Gibson, “DiskReduce: RAID for Data-Intensive Scalable Computing”, PDSW 09
(10, 4) Reed-Solomon code
• Any 10 blocks are sufficient to recover the data
• Can tolerate any 4 failures
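To see why any 10 of the 14 blocks suffice, here is a minimal Python sketch of a systematic (k=10, r=4) MDS code. It is a toy under stated assumptions, not HDFS-RAID's implementation: it works over the prime field GF(257) instead of the GF(2^8) arithmetic used by byte-oriented RS codes, stores one symbol per block, and builds the parity rows from a Cauchy matrix (every square submatrix of a Cauchy matrix is invertible, which makes the stacked identity-plus-Cauchy generator MDS). All function names are hypothetical; `pow(x, -1, P)` needs Python 3.8+.

```python
from itertools import combinations

P = 257  # toy prime field; byte-oriented RS codes typically use GF(2^8) instead

def generator(k, r):
    """Systematic generator: k identity rows (data) + r Cauchy rows (parity)."""
    identity = [[int(i == j) for j in range(k)] for i in range(k)]
    # Cauchy entries 1/(x_i - y_j) with x_i = k + i and y_j = j, all distinct
    cauchy = [[pow(k + i - j, -1, P) for j in range(k)] for i in range(r)]
    return identity + cauchy

def encode(data, r):
    """Return k + r block values; block i is generator row i times the data."""
    return [sum(g * d for g, d in zip(row, data)) % P for row in generator(len(data), r)]

def solve_mod_p(A, y):
    """Gauss-Jordan elimination mod P for a square, invertible system A x = y."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] % P)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, P)
        M[col] = [v * inv % P for v in M[col]]
        for i in range(n):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(a - f * b) % P for a, b in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]

def decode(survivors, k, r):
    """survivors maps block index -> value; any k surviving blocks are enough."""
    G = generator(k, r)
    chosen = sorted(survivors)[:k]
    return solve_mod_p([G[i] for i in chosen], [survivors[i] for i in chosen])

if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # ten data symbols
    blocks = encode(data, 4)
    # Every one of the C(14, 4) = 1001 four-failure patterns is decodable
    for lost in combinations(range(14), 4):
        survivors = {i: v for i, v in enumerate(blocks) if i not in lost}
        assert decode(survivors, 10, 4) == data
    print("all 1001 four-failure patterns recovered")
```

Note that this decoder always reads 10 full blocks, which is exactly the reconstruction cost the following slides examine.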
Impact on Data Center Network
RS codes significantly increase network usage during reconstruction
Burdens the already oversubscribed top-of-rack (TOR) switches and higher-level routers
Machine Unavailability Events
• From HDFS NameNode logs
• An event is logged when a machine sends no heartbeat for more than 15 minutes
  – i.e., machines unavailable for more than 15 minutes in a day
  – 15 minutes is the cluster's default wait time before flagging a machine as unavailable
• The machine's blocks are marked unavailable and handled by a periodic recovery process
• Period studied: 22 Jan. to 24 Feb. 2013
http://hadoop.apache.org/
Rashmi et al., “A Solution to the Network Challenges of Data Recovery in Erasure-coded Storage: A Study on the Facebook Warehouse Cluster”, USENIX HotStorage Workshop 2013.
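The logging rule described above amounts to a timeout check. A minimal Python sketch, assuming a hypothetical map from machine name to the time of its last heartbeat. This is not the HDFS NameNode's code or API; only the 15-minute threshold comes from the slide.

```python
from datetime import datetime, timedelta

UNAVAILABLE_AFTER = timedelta(minutes=15)  # the cluster's wait time, per the slide

def unavailable_machines(last_heartbeat, now):
    """Return machines whose last heartbeat is more than 15 minutes old."""
    return sorted(m for m, seen in last_heartbeat.items() if now - seen > UNAVAILABLE_AFTER)

if __name__ == "__main__":
    now = datetime(2013, 1, 22, 12, 0)
    heartbeats = {  # hypothetical machine names
        "node-a": now - timedelta(minutes=2),
        "node-b": now - timedelta(minutes=16),  # flagged; its blocks would await recovery
        "node-c": now - timedelta(minutes=15),  # exactly at the threshold: not yet flagged
    }
    print(unavailable_machines(heartbeats, now))  # ['node-b']
```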
Median of ≈50 machine-unavailability events logged per day
Missing blocks per stripe
Dominant scenario: single-block recovery
Facebook Data Warehouse Cluster
• Median of 180 TB transferred across racks per day for recovery operations
  – around 5 times that under 3-replication
• Reduction of more than 50 TB of cross-rack traffic per day
RS codes: The Good and The Bad
• Maximum possible fault tolerance for a given storage overhead
  – Storage-capacity optimal
  – (“maximum-distance-separable” in coding-theory parlance)
• Flexibility in choice of parameters
  – Supports any number of data and parity blocks
• Not designed to handle reconstruction operations efficiently
  – Negative impact on the network
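The "storage capacity optimal" claim can be made precise with the Singleton bound, a standard coding-theory result (stated here in our own notation, not taken from the slides). A code that stores k data blocks as n = k + r blocks has minimum distance d bounded as below, and a code with minimum distance d recovers from any d − 1 erasures; maximum-distance-separable (MDS) codes such as RS meet the bound with equality:

```latex
d \;\le\; n - k + 1 \;=\; r + 1
\qquad\Longrightarrow\qquad
\#\text{failures tolerated} \;=\; d - 1 \;\le\; r,
\qquad
\text{MDS (e.g., RS): } d = r + 1
```

For the (10, 4) RS code, n = 14 and d = 5, so any 4 of the 14 blocks can be lost at a 1.4x storage overhead.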
Goal
• To build a system with:
Hitchhiker
• Is a system with:
At an Abstract Level
HITCHHIKER
Hitchhiker’s Erasure Code: Toy Example
Take a (2, 2) Reed-Solomon code. Each block holds two bytes, one from each of two RS stripes (a and b):

| | Block 1 (data) | Block 2 (data) | Block 3 (parity) | Block 4 (parity) |
|---|---|---|---|---|
| Byte 1 (stripe a) | a1 | a2 | a1+a2 | a1+2a2 |
| Byte 2 (stripe b) | b1 | b2 | b1+b2 | b1+2b2 |
In the (2, 2) RS code, reconstructing Block 1 requires downloading and reading 4 bytes: a2 and a1+a2 (to recover a1), plus b2 and b1+b2 (to recover b1).
Next, add information from the first group onto the parities of the second group: Block 4’s second byte becomes b1+2b2+a1. No additional storage is needed!

| | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| Byte 1 | a1 | a2 | a1+a2 | a1+2a2 |
| Byte 2 | b1 | b2 | b1+b2 | b1+2b2+a1 |
Fault-Tolerance (Toy Example)
Same fault tolerance as the RS code: the code can tolerate the failure of any 2 nodes. For example, if Blocks 1 and 2 fail:
• Decode a1 and a2 from the first bytes of Blocks 3 and 4 (a1+a2 and a1+2a2), exactly as in RS.
• Subtract a1 from Block 4’s second byte (b1+2b2+a1) to obtain b1+2b2.
• Decode b1 and b2 from b1+b2 and b1+2b2.
Efficient Reconstruction
To reconstruct Block 1, download b2 (from Block 2), b1+b2 (from Block 3) and b1+2b2+a1 (from Block 4):
• Subtract b2 from b1+b2 to obtain b1.
• Subtract b1 and 2b2 from b1+2b2+a1 to obtain a1.
Data transferred (download and IO) is only 3 bytes, instead of 4 bytes as in RS.
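The toy example can be replayed end to end. The Python sketch below works over the prime field GF(257) so that subtraction and division are exact (an assumption; production codes use GF(2^8)), and treats each block as holding one symbol from each of the two RS stripes. `hh_encode`, `rs_decode`, `hh_decode` and `repair_block1` are hypothetical names. It checks both claims from the slides: any 2 block failures remain decodable, and Block 1 is rebuilt from 3 symbols instead of 4.

```python
from itertools import combinations

P = 257  # toy prime field (assumption); real deployments use GF(2^8)
COEFFS = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}  # (2, 2) RS: x, y, x+y, x+2y

def hh_encode(a1, a2, b1, b2):
    """Each block holds (symbol of stripe a, symbol of stripe b)."""
    blocks = {n: ((c * a1 + d * a2) % P, (c * b1 + d * b2) % P) for n, (c, d) in COEFFS.items()}
    a_sym, b_sym = blocks[4]
    blocks[4] = (a_sym, (b_sym + a1) % P)  # piggyback a1 onto Block 4's b-symbol
    return blocks

def rs_decode(pairs):
    """Solve for (x, y) from two (block, value) pairs of a plain (2, 2) RS stripe."""
    (i, yi), (j, yj) = pairs
    (p, q), (r, s) = COEFFS[i], COEFFS[j]
    inv = pow((p * s - q * r) % P, -1, P)
    return (s * yi - q * yj) * inv % P, (p * yj - r * yi) * inv % P

def hh_decode(survivors):
    """Recover (a1, a2, b1, b2) from any two surviving blocks."""
    i, j = sorted(survivors)[:2]
    a1, a2 = rs_decode([(i, survivors[i][0]), (j, survivors[j][0])])
    # Remove the piggyback (if Block 4 survived) before decoding stripe b
    b_vals = {n: (v[1] - a1) % P if n == 4 else v[1] for n, v in survivors.items()}
    b1, b2 = rs_decode([(i, b_vals[i]), (j, b_vals[j])])
    return a1, a2, b1, b2

def repair_block1(blocks):
    """Rebuild Block 1 = (a1, b1) by reading only 3 symbols."""
    b2, b1_plus_b2, piggybacked = blocks[2][1], blocks[3][1], blocks[4][1]
    b1 = (b1_plus_b2 - b2) % P
    a1 = (piggybacked - b1 - 2 * b2) % P  # (b1 + 2*b2 + a1) - b1 - 2*b2
    return (a1, b1), 3  # symbols read; a plain (2, 2) RS repair reads 4

if __name__ == "__main__":
    blocks = hh_encode(10, 20, 30, 40)
    for lost in combinations(blocks, 2):
        survivors = {n: v for n, v in blocks.items() if n not in lost}
        assert hh_decode(survivors) == (10, 20, 30, 40)
    print(repair_block1(blocks))  # ((10, 30), 3)
```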
Hitchhiker’s Erasure Code
• Builds on top of RS codes
• Uses our theoretical framework of “Piggybacking”*
• Three versions
  – XOR
  – XOR+
  – Non-XOR
* K. V. Rashmi, Nihar Shah, K. Ramchandran, “A Piggybacking Design Framework for Read- and Download-efficient Distributed Storage Codes”, in IEEE International Symposium on Information Theory, 2013.
Hop-and-Couple (Disk Layout)
• A way of choosing which bytes to mix
  – couples bytes farther apart in a block
  – to minimize the degree of discontinuity in disk reads during data reconstruction
• Translates savings in network transfer into savings in disk IO as well
  – by making reads contiguous
RS vs Hitchhiker from the Network’s Perspective…
Data Transfer during Reconstruction in RS-based System
• Transfer: 10 full blocks
• Connect to 10 machines
Data Transfer during Reconstruction in Hitchhiker
Reconstruction of data blocks 1-9:
• Transfer: 2 full blocks + 9 half blocks (= 6.5 blocks total)
• Connect to 11 machines
Reconstruction of data block 10:
• Transfer: 13 half blocks (= 6.5 blocks total)
• Connect to 13 machines
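The per-block counts on these slides can be totted up directly. A small Python sketch of that arithmetic for the (k=10, r=4) configuration; the function name is hypothetical and the counts are exactly those stated on the slides:

```python
from fractions import Fraction

RS_BLOCKS, RS_MACHINES = 10, 10  # RS: 10 full blocks from 10 machines

def hitchhiker_repair_cost(block_id):
    """(blocks transferred, machines contacted) to rebuild one data block."""
    if 1 <= block_id <= 9:
        full, half = 2, 9
    elif block_id == 10:
        full, half = 0, 13
    else:
        raise ValueError("only data blocks 1-10 use Hitchhiker's optimized repair")
    return full + Fraction(half, 2), full + half

for blk in (1, 10):
    transferred, machines = hitchhiker_repair_cost(blk)
    saving = 1 - transferred / RS_BLOCKS
    print(f"block {blk}: {float(transferred)} blocks from {machines} machines, "
          f"{float(saving):.0%} less than RS")
```

Every data block costs 6.5 blocks instead of 10, a 35% saving, in line with the reduction verified on the 60-machine test cluster; the price is contacting 11 or 13 machines instead of 10.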
Hop-and-Couple
• Technique to pair bytes under Hitchhiker’s erasure code
• Makes disk reads during reconstruction contiguous
Hop-and-Couple: (a) hop length = 1 byte; (b) hop length = half the size of a unit
Figure 7: Two ways of coupling bytes to form stripes for Hitchhiker's erasure code. The shaded bytes are read and downloaded for the reconstruction of the first unit. While both methods require the same amount of data to be read, the reading is discontiguous in (a), while (b) ensures that the data to be read is contiguous.
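The effect of the hop length can be illustrated with a small model. The Python sketch below rests on a simplification we assume for illustration: a unit of `unit_size` bytes is split into couples (byte i paired with byte i + hop), and reconstruction needs only the second byte of every couple from this unit. Both layouts read the same number of bytes; only the number of contiguous runs (roughly, separate disk reads) changes. All names are hypothetical.

```python
def couples(unit_size, hop):
    """Pair byte i with byte i + hop; assumes unit_size is a multiple of 2 * hop."""
    assert unit_size % (2 * hop) == 0
    return [(start + off, start + off + hop)
            for start in range(0, unit_size, 2 * hop)
            for off in range(hop)]

def bytes_read(unit_size, hop):
    """Offsets read if reconstruction needs only the second byte of each couple."""
    return sorted(second for _, second in couples(unit_size, hop))

def contiguous_runs(offsets):
    """Number of separate contiguous ranges in a sorted list of offsets."""
    if not offsets:
        return 0
    return 1 + sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur != prev + 1)

if __name__ == "__main__":
    size = 8
    for hop in (1, size // 2):  # (a) hop length 1 byte, (b) half a unit
        reads = bytes_read(size, hop)
        print(hop, reads, contiguous_runs(reads))
    # hop=1 -> [1, 3, 5, 7], 4 runs; hop=4 -> [4, 5, 6, 7], 1 run
```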
Hitchhiker’s Erasure Code
Figure 2: Two stripes of a (k=10, r=4) Reed-Solomon (RS) code. Ten units of data (first ten rows) are encoded using the RS code to generate four parity units (last four rows).
Figure 3: The theoretical framework of Piggybacking [22] for parameters (k=10, r=4). Each row represents one unit of data.
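The piggybacking idea behind Figure 3 can be written compactly. This is a simplified sketch of the general framework in notation of our own choosing: take two stripes $a = (a_1,\dots,a_k)$ and $b = (b_1,\dots,b_k)$ of the same RS code with parity functions $f_1,\dots,f_r$, and add piggyback functions $g_j$ of the first stripe to the parities of the second. In the toy example above, $k = r = 2$, $f_1(x) = x_1 + x_2$, $f_2(x) = x_1 + 2x_2$, $g_1 = 0$ and $g_2(a) = a_1$.

```latex
\begin{aligned}
\text{data node } i \;(1 \le i \le k):&\quad (a_i,\; b_i)\\
\text{parity node } k+j \;(1 \le j \le r):&\quad \bigl(f_j(a),\; f_j(b) + g_j(a)\bigr)
\end{aligned}
```

Decoding from any k nodes first recovers $a$ from the first symbols by plain RS decoding, then computes each $g_j(a)$, subtracts it, and RS-decodes $b$. Storage is unchanged and the fault tolerance is that of the underlying RS code; the piggybacks only change how much must be read to repair a single node.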
Hitchhiker-XOR
• The XOR-only feature of these erasure codes significantly reduces the computational complexity of decoding, making degraded reads and failure recovery faster.
• Hitchhiker's erasure code optimizes only the reconstruction of data units; reconstruction of parity units is performed as in RS codes.
Figure 4: Hitchhiker-XOR code for (k=10, r=4). Each row represents one unit of data.
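Why XOR-only piggybacks keep decoding cheap: in GF(2^8) addition is XOR, so if a parity carries the XOR of a few data units on top of its RS value, a lost unit is recovered by XORing the piggybacked parity with the recomputed RS value and the other units of that group, with no finite-field multiplication. A minimal Python sketch, assuming purely for illustration a group made of data units 1 to 3 and short made-up byte strings (the actual grouping is the one shown in Figure 4; all names here are hypothetical):

```python
from functools import reduce

def xor_bytes(*chunks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))

# Hypothetical first-substripe halves of data units 1-3
a1, a2, a3 = b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"
f_b = b"\x01\x02\x03"  # stand-in for the RS parity of the second substripe, assumed already recomputed

stored = xor_bytes(f_b, a1, a2, a3)  # parity of stripe b carrying the XOR piggyback

# Repair of unit 1: strip f(b), then XOR away the surviving group members
recovered_a1 = xor_bytes(stored, f_b, a2, a3)
assert recovered_a1 == a1
```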
Hitchhiker-XOR+
• Hitchhiker-XOR+ reduces the amount of data required for reconstruction while using only additional XOR operations.
• It requires the underlying RS code to have what we term the all-XOR-parity property: at least one parity function of the RS code must be an XOR of all the data units.
Figure 5: Hitchhiker-XOR+ for (k=10, r=4). Parity 2 of the underlying RS code is all-XOR.
Hitchhiker-nonXOR
• Hitchhiker-nonXOR guarantees the same savings as Hitchhiker-XOR+ even when the underlying RS code does not possess the all-XOR-parity property, but at the cost of additional finite-field arithmetic.
• Hitchhiker-nonXOR can be built on top of any RS code.
Figure 6: Hitchhiker-nonXOR code for (k=10, r=4). This can be built on any RS code. Each row is one unit of data.
Implementation & Evaluation Setup(1)
• Implemented on top of HDFS-RAID
  – the RS-based erasure-coding module in HDFS
  – used in the Facebook data warehouse cluster
• Deployed and tested on a 60-machine test cluster at Facebook
  – verified a 35% reduction in network transfers during reconstruction
Implementation & Evaluation Setup(2)
• Evaluation of timing metrics on the Facebook data warehouse cluster in production
  – under real-time production traffic and workloads
  – using MapReduce to run encoding and reconstruction jobs, just as HDFS-RAID does
Decoding Time
• RS decoding is performed on only half of each block
• Faster computation for degraded reads and recovery
• XOR versions: 25% less decoding time than non-XOR
36% reduction
Read & Transfer Time
• Read & transfer time is 30% lower in Hitchhiker (HH)
• Similar reduction for other block sizes (4 MB and 64 MB) as well
35% less
Median and 95th percentile of the read time
Encoding Time
Benefits outweigh the higher encoding cost in many systems (e.g., HDFS):
• encoding raw data into erasure-coded data is a one-time operation
• it is often run as a background job
72% higher
Hitchhiker
Conclusion
• We present Hitchhiker, a new erasure-coded storage system.
• Hitchhiker reduces both network and disk traffic during reconstruction by 25% to 45% compared with RS-based systems.
References
• [6] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Trans. Inf. Th., Sept. 2010.
• [17] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: Novel erasure codes for big data. In VLDB, 2013.
• [19] D. Papailiopoulos, A. Dimakis, and V. Cadambe. Repair optimal erasure codes through Hadamard designs. IEEE Trans. Inf. Th., May 2013.
• [20] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. In Proc. USENIX HotStorage, June 2013.
• [21] K. V. Rashmi, N. B. Shah, and P. V. Kumar. Optimal exact-regenerating codes for the MSR and MBR points via a product-matrix construction. IEEE Trans. Inf. Th., 2011.
• [22] K. V. Rashmi, N. B. Shah, and K. Ramchandran. A piggybacking design framework for read- and download-efficient distributed storage codes. In IEEE International Symposium on Information Theory, 2013.
• [26] N. Shah, K. Rashmi, P. Kumar, and K. Ramchandran. Distributed storage codes with repair-by-transfer and non-achievability of interior points on the storage-bandwidth tradeoff. IEEE Trans. Inf. Theory, 2012.
Figure 9: Data read patterns during reconstruction of blocks 1, 4 and 10 in Hitchhiker-XOR+: the shaded bytes are read and downloaded.