data persistence in sensor networks: towards optimal encoding for data recovery in partial network...

22
Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and Dan Rubenstein DNA Research Group, Columbia University

Upload: elaine-stone

Post on 18-Dec-2015

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Data Persistence in Sensor Networks: Towards Optimal

Encoding for Data Recovery in Partial Network Failures

Abhinav Kamra, Jon Feldman, Vishal Misra and Dan RubensteinDNA Research Group, Columbia University

Page 2: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Motivation and Model

Typical Scenario of Sensor Networks Large number of nodes deployed to

``sense'' environmentData collected periodically

pulled/pushed through a sink/gateway node

Nodes prone to failure (disaster, battery life, targeted attack)

Want data to survive individual node failures``Data Persistence''

Page 3: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

OverviewErasure codes

LT-CodesSoliton distribution

Coding for failure-prone sensor networksMajor resultsA brief sketch of proofsA case study of failure-prone sensor

networks

Page 4: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Erasure Codes

Message

Encoding

Received

Message

Encoding Algorithm

Decoding Algorithm

Transmission

n

cn

n≥

n

Page 5: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Luby Transform Codes

Simple Linear CodesImprovement over “Tornado codes”Rateless Codes

Page 6: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Erasure Codes: LT-Codes

b1 b2 b3 b4 b5F=

n=5 input blocks

Page 7: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

LT-Codes: Encoding

b1 b2 b3 b4 b5

c1

1. Pick degree d1 from a pre-specified distribution. (d1=2)

2. Select d1 input blocks uniformly at random. (Pick b1 and b4 )

3. Compute their sum (XOR).

4. Output sum, block IDs

E(F)=

F=

Page 8: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

LT-Codes: Encoding

E(F)=

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7

L

F=

Page 9: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

LT-Codes: Decoding

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7

F= b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7

F= b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7

F= b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7

335445555ccbccbccb←−←−←−

F= b1 b2 b3 b4 b5

c1 c2 c3 c4 c5 c6 c7E(F)=

F=

b5 b5 b5

b1 b2 b3 b4 b5

c1 c2 c3 c5 c6 c7

F=

b5c4b5 b5

b1 b2 b3 b4 b5

c1 c2 c3 c5 c6 c7

F=

b5c4b5 b5

b1 b2 b3 b4 b5

c1 c2 c3 c5 c6 c7

F=

b5c4b5 b5

b1 b2 b3 b4 b5

c1 c2 c3 c5 c6 c7

F=

b5c4b5 b5b2b2

b1 b2 b3 b4 b5

c1 c2 c3 c5 c6 c7

F=

b5c4b5 b5b2b2

Page 10: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Degree Distribution for LT-Codes Soliton Distribution:

Avg degree H(N) ~ ln(N) In expectation: Exactly one degree 1 symbol in

each round of decoding Distribution very fragile in practice

Niii

N

i <<−

=

=

1 for )1(

1

11

π

π

Page 11: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Failure-prone Sensor NetworksAll earlier works:

How many encoded symbols needed to recover all original symbols (all or nothing decoding)

Failure-prone networks:How many original symbols can be

recovered from given surviving encoded symbols

Page 12: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Iterative Decoder

x1 x3

x5 x2

x1 x3

x4

x3

Received Symbols

Recovered Symbols

• 5 original symbols x1 … x5

• 4 encoded symbols received

• Each encoded symbol is XOR of component original symbols

x3

x1

x4

Page 13: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Sensor Network Model

Encoded Symbols remaining: kWant to maximize “r”, the recovered

original data symbolsNo idea apriori what k will be

Page 14: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Coding is bad, for small kN original symbolsk encoded symbols receivedIf k ≤ 0.75N, no coding required

N = 128

Page 15: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Proof SketchTheorem: To recover first N/2 symbols, it is best to not do any

encodingProof:

1. Let C(i, j) = Expected symbols recovered from i degree 1 and j symbols of degree 2 or more.

2. C(i, j) ≤ C(i+1, j-1) if C(i, j) ≤ N/2a. Sort given symbols in decoding orderb. All degree 1 symbols will be decoded before other symbolsc. Last symbol in decoded order will be of degree > 1 (see b.)d. Replace this symbol by a random degree 1 symbole. New degree 1 symbol more likely to be useful

3. Hence, more degree 1 symbols => Better output4. No coding is best to recover any first N/2 symbols5. All degree 1 => Coupon Collector’s => ≈ 3N/4 symbols to

recover N/2 distinct symbols

Page 16: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Ideal Degree Distribution

Theorem: To recover r data units such that

r < jN/(j+1), the optimal degree distribution has symbols of degree j or less only.

Page 17: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Lower degree are better for small kIf k ≤ kj, use symbols of up to degree j

So, use kj – kj-1 degree j symbols in close to optimal distribution

N = 128

Page 18: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Case Study: Single-sink Sensor Network

Storage

1

2

4

3

Sink

Sensor nodenodes exchange symbols

nodes 2 and 3 transfer new symbols to the sink

Page 19: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Case Study: Single-sink Sensor NetworkNetwork prone to failureNodes store unencoded symbols at first

and higher degrees with timeSink receives low degree symbols first and

higher degree as time goes on

1

2

4

3

Sink

Page 20: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Distributed SimulationClique TopologyN = 128 nodes in a clique topologySink receives one symbol per unit time

Page 21: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Distributed SimulationChain Topology

N = 128 nodes in a chain topology

1 2 3 … N

Page 22: Data Persistence in Sensor Networks: Towards Optimal Encoding for Data Recovery in Partial Network Failures Abhinav Kamra, Jon Feldman, Vishal Misra and

Related WorkBulk Data Distribution: Coding is

usefulTornado (Efficient Erasure Correcting Codes by M. Luby et. al.,

IEEE Transactions on Information Theory, vo. 47, no. 2, 2001)

LT-Codes (LT Codes by M. Luby, FOCS 2002)

Reliable Storage in Sensor NetworksDecentralized erasure code (Ubiquitous Access to

Distributed Data in Large-Scale Sensor Networks through Decentralized Erasure Codes by A. Dimakis et. al., IPSN 2005)

Random Linear Coding (“How Good is Random Linear Coding Based Distributed Networked Storage?” by M. Medard et. al., NetCod 2005)