![Page 1: Megasthenis Asteris Alex Dimakismegasthenis.github.io/repository/ISIT2012-Repairable... · 2016. 5. 17. · Megasthenis Asteris Alex Dimakis . Distributed Storage Cluster of machines](https://reader035.vdocuments.mx/reader035/viewer/2022071513/6133c6bedfd10f4dd73b4eb3/html5/thumbnails/1.jpg)
Repairable Fountain Codes
Megasthenis Asteris, Alex Dimakis
Distributed Storage
Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)
• Failures are the norm.
• We need to protect the data: introduce redundancy.
• Erasure codes (e.g., Reed-Solomon) are already in use.
Problem Description
Create a linear code with the following properties:
• Systematic form: the input is part of the output.
Problem Description
• Rateless property: columns are created independently, so the number of columns can grow without bound (→ ∞).
Problem Description
• MDS property: any k columns are full rank, so any k encoded symbols suffice to recover the input.
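To make the MDS property concrete, here is a small brute-force check. It is an illustrative sketch under our own assumptions: the field size q is prime (so arithmetic mod q is field arithmetic), and the toy matrices are chosen by us. A code with k × n generator matrix G is MDS exactly when every set of k columns has rank k.

```python
from itertools import combinations

def rank_mod_q(rows, q):
    """Rank over GF(q), q prime, of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], q - 2, q)  # Fermat inverse, valid because q is prime
        m[rank] = [v * inv % q for v in m[rank]]
        for r in range(rank + 1, len(m)):
            if m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_mds(G, q):
    """True iff every set of k columns of the k x n matrix G has rank k over GF(q)."""
    k, n = len(G), len(G[0])
    for cols in combinations(range(n), k):
        sub = [[G[r][c] for c in cols] for r in range(k)]
        if rank_mod_q(sub, q) < k:
            return False
    return True

# Systematic [I | P] examples over GF(7) with k = 2, n = 4
G_good = [[1, 0, 1, 1],
          [0, 1, 1, 2]]
G_zero = [[1, 0, 1, 1],
          [0, 1, 0, 2]]   # a zero in the parity part
print(is_mds(G_good, 7), is_mds(G_zero, 7))  # True False
```

The second matrix fails because its first parity column equals the first systematic column, so those two columns only have rank 1.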
Problem Description
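• Good locality: a column is a linear combination of at most $\ell$ other columns, so it can be repaired by reading at most $\ell$ symbols.
(Figure: input symbols $x_1, \dots, x_5$ and parities $p_1, p_2$.)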
Problem Description
Summarizing, we want:
• Systematic form
• Rateless property
• MDS property
• Good locality
Approach
• Systematic form + sparse parities = good locality.
• Sparsity is a measure of locality: the sparser the parities, the better the locality.
Q: How sparse can the parities be?
Approach
However...
• Systematic MDS codes cannot afford a single zero in the parity columns.
• Hence MDS codes cannot have sparse parities.
• Relax the MDS property: (1+ε)k symbols should suffice for decoding.
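Why a systematic MDS code can afford no zero in its parity columns follows from a short rank argument. The notation $G = [\, I_k \mid P \,]$, $e_m$ for the $m$-th systematic column, and $p_j$ for the $j$-th parity column is ours, introduced only for this sketch.

```latex
\begin{aligned}
&G = [\, I_k \mid P \,], \quad P_{ij} = 0 \text{ for some row } i \text{ and parity column } j,\\
&S = \{\, p_j \,\} \cup \{\, e_m : m \neq i \,\} \quad (|S| = k),\\
&(p_j)_i = P_{ij} = 0 \;\Rightarrow\; p_j \in \operatorname{span}\{\, e_m : m \neq i \,\},\\
&\Rightarrow\; \operatorname{rank}(S) \le k - 1 < k .
\end{aligned}
```

These k columns cannot recover the input, which contradicts the MDS property. So every parity column of a systematic MDS code has all k entries nonzero, the opposite of sparse.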
Approach
In light of these observations, we want:
• Systematic form
• Rateless property
• Near-MDS property: (1+ε)k symbols suffice for decoding, where ε should be arbitrarily small
• Sparse parities
The rateless and near-MDS requirements together describe fountain codes.
Prior work
• [Gummadi]: systematic LT/Raptor codes; ε cannot be arbitrarily small.
• [Shokrollahi]: Raptor codes
- Systematic version: parities are no longer sparse in the input.
Our construction
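(Figure: parity $j$ combines $d(k)$ of the $k$ input symbols, each input $i$ weighted by a coefficient $c_{i,j}$.)
• $k$ input symbols
• Systematic part: the input symbols themselves
• For each parity $j$:
- Choose $d(k)$ input symbols.
- Choose coefficients $c_{i,j} \sim U\{0, \dots, q-1\}$.
- Output $p_j = \sum_i c_{i,j} x_i$ over the chosen symbols $i$.
How small can the degree $d(k)$ of the encoded symbols be?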
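The construction above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: q is prime (so integers mod q form the field, as with q = 17 or 113 in the simulation), each parity draws its d(k) inputs without replacement, and d(k) = ⌈c ln k⌉ follows the logarithmic degree in the results. All function names are hypothetical.

```python
import math
import random

def degree(k, c):
    """Hypothetical parity degree d(k) = ceil(c * ln k), capped at k."""
    return min(k, math.ceil(c * math.log(k)))

def parity_stream(k, d, q, rng):
    """Yield parity columns forever (rateless); each is a dict {input index i: c_ij}.

    Every column is generated independently of the others: pick d distinct inputs
    uniformly at random, then draw each coefficient c_ij uniformly from {0, ..., q-1}.
    """
    while True:
        support = rng.sample(range(k), d)
        yield {i: rng.randrange(q) for i in support}

def encode(x, parity_cols, q):
    """Systematic encoding: the inputs themselves, then p_j = sum_i c_ij * x_i mod q."""
    parities = [sum(c * x[i] for i, c in col.items()) % q for col in parity_cols]
    return list(x) + parities

rng = random.Random(0)
k, q, c = 100, 17, 5
d = degree(k, c)                          # ceil(5 * ln 100) = 24
x = [rng.randrange(q) for _ in range(k)]
stream = parity_stream(k, d, q, rng)
cols = [next(stream) for _ in range(k)]   # rate R = 0.5: k parities for k inputs
codeword = encode(x, cols, q)
print(d, len(codeword))                   # 24 200
```

Because each parity is drawn from the stream on demand, more columns can be appended at any time without touching existing ones. A drawn coefficient may be 0, so a parity touches at most d(k) inputs.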
Results
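Theorem 1 (coupon collector): decoding w.h.p. $\Rightarrow d(k) = \Omega(\ln k)$.
Theorem 2: $d(k) = c \ln k$ with $c \propto 1/\varepsilon$ $\Rightarrow$ any $(1+\varepsilon)k$ random columns of $G$ are linearly independent with probability close to 1 (the failure probability involves the ratio $k/q$).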
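The coupon-collector intuition behind Theorem 1 can be made concrete with a back-of-the-envelope count. In a simplified model of our own (not the paper's proof), the decoder receives s systematic symbols and p parities, each parity touching d distinct inputs chosen uniformly at random. An input whose systematic symbol is missing and that no received parity touches cannot be recovered, and the expected number of such inputs is (k − s)(1 − d/k)^p.

```python
import math

def expected_uncovered(k, s, p, d):
    """Expected number of inputs neither received systematically nor touched by any
    of p received parities, each covering d distinct inputs chosen uniformly at
    random out of k (simplified model, assumed for illustration)."""
    return (k - s) * (1 - d / k) ** p

k = 1000
s, p = 500, 600                                  # 1.1 k received symbols in total
const = expected_uncovered(k, s, p, d=3)         # constant degree
logd = expected_uncovered(k, s, p, d=math.ceil(5 * math.log(k)))  # d = c ln k, c = 5
print(f"d = 3: {const:.1f} uncovered; d = 5 ln k: {logd:.2e} uncovered")
```

Since (1 − d/k)^p ≈ e^(−dp/k) and p is a constant fraction of k, a constant degree leaves a constant fraction of inputs uncovered, while the count only vanishes once d grows like ln k.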
Proving Theorem 2
• Classical approach [Kovalenko & Levitskaya, Cooper, Karp, …]: analyze the rank of a random matrix.
• But here, the $(1+\varepsilon)k$ received columns mix an arbitrary number of systematic columns with a random part (the parities).
Proof – Part 1
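(Flowchart: Does a perfect matching $M$ exist? If no, $\operatorname{rank}(G_K) < k$. If yes, are the coefficients bad? If yes, $\operatorname{rank}(G_K) < k$; if no, $\operatorname{rank}(G_K) = k$.)
• Focus on a $k \times k$ submatrix $G_K$.
• Full rank corresponds to a perfect matching in the associated bipartite graph: without one, $G_K$ is singular; with one, $G_K$ is full rank w.h.p. over the random coefficients (Edmonds' theorem and the Schwartz–Zippel lemma) [Ho et al.].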
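The two-step logic of Part 1 can be checked numerically on small matrices. The sketch below is our own illustration (prime q, toy supports chosen by us). It finds a perfect matching on the nonzero positions with augmenting paths, then fills those positions with independent uniform values from {0, ..., q−1} and counts singular outcomes. By the standard statements of the two results: when a perfect matching exists, the determinant is a nonzero polynomial of degree k in the coefficients (Edmonds), so Schwartz–Zippel bounds the singular fraction by k/q. Without one, every term of the determinant expansion contains a forced zero.

```python
import random

def rank_mod_q(rows, q):
    """Rank over GF(q), q prime, of a matrix given as a list of rows."""
    m = [[v % q for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], q - 2, q)  # Fermat inverse (q prime)
        m[rank] = [v * inv % q for v in m[rank]]
        for r in range(rank + 1, len(m)):
            if m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % q for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def has_perfect_matching(support, k):
    """Kuhn's augmenting-path algorithm on the bipartite graph rows <-> columns,
    with an edge (r, c) for every position allowed to be nonzero."""
    adj = [[c for c in range(k) if (r, c) in support] for r in range(k)]
    match = {}  # column -> row currently matched to it

    def augment(r, seen):
        for c in adj[r]:
            if c not in seen:
                seen.add(c)
                if c not in match or augment(match[c], seen):
                    match[c] = r
                    return True
        return False

    return all(augment(r, set()) for r in range(k))

def random_fill(support, k, q, rng):
    """Independent uniform values from {0, ..., q-1} on the support, zeros elsewhere."""
    return [[rng.randrange(q) if (r, c) in support else 0 for c in range(k)]
            for r in range(k)]

rng = random.Random(1)
k, q = 6, 17
support = {(i, i) for i in range(k)} | {(i, (i + 1) % k) for i in range(k)}
blocked = {(r, c) for r in range(1, k) for c in range(k)}   # row 0 has no nonzeros

trials = 2000
singular = sum(rank_mod_q(random_fill(support, k, q, rng), q) < k for _ in range(trials))
print(has_perfect_matching(support, k), singular / trials, "vs. bound", k / q)
print(has_perfect_matching(blocked, k))   # False: singular for every coefficient choice
```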
Proof – Part 2
Key technical result: there is a perfect matching w.h.p.
- Tools: an Erdős–Rényi random matrix result, Hall's marriage theorem, and the first moment method.
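The key technical result can be probed empirically. The sketch below is a Monte Carlo sanity check, not the paper's first-moment proof, and it rests on our own assumptions: degree ⌈c ln k⌉, inputs drawn without replacement, and n = k/R total columns. It keeps a random subset of (1+ε)k columns and tests whether some matching assigns every input to a distinct received column. This is exactly the condition Hall's theorem characterizes, and it is necessary for rank k.

```python
import math
import random

def covers_all_inputs(k, columns):
    """Kuhn's augmenting-path algorithm: is there a matching that assigns every
    input 0..k-1 to a distinct received column? columns[j] = inputs column j touches."""
    touching = [[] for _ in range(k)]          # input -> received columns containing it
    for j, col in enumerate(columns):
        for i in col:
            touching[i].append(j)
    owner = {}                                 # column -> input matched to it

    def augment(i, seen):
        for j in touching[i]:
            if j not in seen:
                seen.add(j)
                if j not in owner or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    return all(augment(i, set()) for i in range(k))

def matching_probability(k, eps, rate=0.5, c=5, trials=50, seed=0):
    """Monte Carlo estimate of Pr[the received columns admit an input-saturating matching]."""
    rng = random.Random(seed)
    d = min(k, math.ceil(c * math.log(k)))     # hypothetical degree d(k) = ceil(c ln k)
    n = round(k / rate)                        # total columns at rate R = k / n
    m = math.ceil(round((1 + eps) * k, 6))     # received columns; round() guards float noise
    hits = 0
    for _ in range(trials):
        systematic = [{i} for i in range(k)]
        parities = [set(rng.sample(range(k), d)) for _ in range(n - k)]
        received = rng.sample(systematic + parities, m)
        hits += covers_all_inputs(k, received)
    return hits / trials

print(matching_probability(k=100, eps=0.1))
```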
Conclusions
• We introduced a new family of fountain codes that are:
- Systematic
- Near-MDS
- Of logarithmic locality (easy repair of a single symbol failure, which is very useful in distributed storage systems)
• Our proof involved analyzing a new family of random matrices.
• An interesting open problem: can belief propagation decoding be used for this ensemble?
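Repairing a single lost input symbol only requires one sparse parity that contains it, which is where the logarithmic locality comes from. A minimal sketch, assuming a prime q and a parity represented as a dict {input index: coefficient} (names hypothetical): if $x_i$ is lost and a parity $p_j$ has $c_{i,j} \neq 0$, then $x_i = c_{i,j}^{-1}\,(p_j - \sum_{m \neq i} c_{m,j} x_m) \bmod q$, which reads at most $d(k) - 1$ other inputs plus the parity itself.

```python
def repair(i, parity_value, parity_col, x, q):
    """Recover lost input x_i from one parity column that contains it.

    parity_col maps input index -> coefficient c_ij; parity_value is p_j.
    Reads only the other inputs in the parity's support.
    Requires q prime and c_ij != 0, so that c_ij is invertible mod q.
    """
    c_ij = parity_col.get(i, 0) % q
    if c_ij == 0:
        raise ValueError("this parity does not involve x_i")
    rest = sum(c * x[m] for m, c in parity_col.items() if m != i) % q
    return (parity_value - rest) * pow(c_ij, q - 2, q) % q

q = 17
x = [3, 14, 0, 9, 5]
col = {0: 2, 3: 7, 4: 11}                  # a sparse parity over x0, x3, x4
p = sum(c * x[m] for m, c in col.items()) % q
damaged = x[:3] + [None] + x[4:]           # x3 is lost and is never read by repair
print(repair(3, p, col, damaged, q))       # 9
```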
Simulation
(Plot: Pr(decoding) vs k for R = 0.5, c = 5, with k from 100 to 260; curves for (ε, q) = (0.01, 17), (0.13, 17), (0.25, 17), and (0.01, 113).)
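The simulation setup can be reproduced in outline. The sketch below is a hypothetical Monte Carlo reimplementation, not the authors' code, and it makes no claim to reproduce the plotted curves. It builds n = k/R columns (k systematic plus random parities of degree ⌈c ln k⌉, with inputs drawn without replacement and coefficients uniform over {0, ..., q−1}, q prime), keeps ⌈(1+ε)k⌉ random columns, and counts a success when they have rank k over the field.

```python
import math
import random

def rank_mod_q(cols, k, q):
    """Rank over GF(q), q prime, of the k x len(cols) matrix whose columns are cols."""
    rows = [[col[r] % q for col in cols] for r in range(k)]
    rank = 0
    for c in range(len(cols)):
        pivot = next((r for r in range(rank, k) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], q - 2, q)       # Fermat inverse (q prime)
        rows[rank] = [v * inv % q for v in rows[rank]]
        for r in range(rank + 1, k):
            if rows[r][c]:
                f = rows[r][c]
                rows[r] = [(a - f * b) % q for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == k:
            break                                # already full rank
    return rank

def decoding_probability(k, eps, q, rate=0.5, c=5, trials=10, seed=0):
    """Fraction of trials in which ceil((1 + eps) k) random columns have rank k."""
    rng = random.Random(seed)
    d = min(k, math.ceil(c * math.log(k)))       # hypothetical d(k) = ceil(c ln k)
    n = round(k / rate)                          # total columns at rate R = k / n
    m = math.ceil(round((1 + eps) * k, 6))       # round() guards against float noise
    successes = 0
    for _ in range(trials):
        cols = [[int(r == i) for r in range(k)] for i in range(k)]   # systematic part
        for _ in range(n - k):
            col = [0] * k
            for i in rng.sample(range(k), d):
                col[i] = rng.randrange(q)        # c_ij ~ U{0, ..., q-1}
            cols.append(col)
        received = rng.sample(cols, m)           # a random set of (1 + eps) k columns
        successes += rank_mod_q(received, k, q) == k
    return successes / trials

print(decoding_probability(k=100, eps=0.1, q=17))
```

Pure-Python elimination is slow for the plotted range of k; the sketch favors readability over speed.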