Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Algorithms and Data Structures (Work by Ferraro-Petrillo, Finocchi, I. & Grandoni)


DESCRIPTION

The first part of my lectures will be devoted to the design of practical algorithms for very large graphs. The second part will be devoted to algorithms resilient to memory errors. Modern memory devices may suffer from faults, where some bits arbitrarily flip and corrupt the values of the affected memory cells. Such faults may seriously compromise the correctness and performance of computations, and the larger the memory usage, the higher the probability of incurring memory errors. In recent years, many algorithms for computing in the presence of memory faults have been introduced in the literature: in particular, an algorithm or a data structure is called resilient if it is able to work correctly on the set of uncorrupted values. This part will cover recent work on resilient algorithms and data structures.

TRANSCRIPT

Page 1: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Algorithms and Data Structures

(Work by Ferraro-Petrillo, Finocchi, I. & Grandoni)

Page 2: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures
    •  Priority Queues
    •  Dictionaries
4.  Experimental Results
5.  Conclusions and Open Problems

2

Page 3: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors

Memory error: one or more bits are read differently from how they were last written.

Many possible causes:
•  electrical or magnetic interference (cosmic rays)
•  hardware problems (bit permanently damaged)
•  corruption in the data path between memories and processing units

Errors in DRAM devices have been a concern for a long time [May & Woods 79, Ziegler et al 79, Chen & Hsiao 84, Normand 96, O’Gorman et al 96, Mukherjee et al 05, …]

3

Page 4: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors

Soft errors: randomly corrupt bits, but do not leave any physical damage (cosmic rays)

Hard errors: corrupt bits in a repeatable manner because of a physical defect, e.g., stuck bits (hardware problems)

4

Page 5: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Error Correcting Codes (ECC)

Error correcting codes (ECC) allow detection and correction of one or multiple bit errors

Typical ECC is SECDED (i.e., single error correct, double error detect)

Chip-Kill can correct up to 4 adjacent bits at once

ECC has several overheads in terms of performance (33%), size (20%) and cost (10%).

ECC memory chips are mostly used in memory systems for server machines rather than for client computers

5

Page 6: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Impact of Memory Errors

The consequence of a memory error is system dependent:

1. Correctable errors: fixed by ECC

2. Uncorrectable errors:

   2.1. Detected: explicit failure (e.g., a machine reboot)

   2.2. Undetected:
        2.2.1. Induced failure (e.g., a kernel panic)
        2.2.2. Unnoticed (but application corrupted, e.g., segmentation fault, file not found, file not readable, …)

6

Page 7: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

How Common are Memory Errors?

7

Page 8: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

How Common are Memory Errors?

[Schroeder et al 2009]: 2.5 years of measurements (Jan 06 – Jun 08) on the Google fleet (10^4 machines, ECC memory)

Memory errors are NOT rare events!

8

Page 9: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

How Common are Memory Errors?

[Hwang et al 2012]

9

Only a minority (2-20%) of nodes experience a single error. The majority experience a larger number of errors (half of the nodes see > 100 errors, and the top 5% of nodes see > 1 million errors)

Page 10: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Error Distribution

[Hwang et al 2012]

10

Very skewed distribution of errors across nodes: the top 5% of error nodes account for more than 95% of all errors

Page 11: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Error Correlation

[Hwang et al 2012]

11

Errors happen in a correlated fashion: even a single error on a node raises the probability of future errors to more than 80%, and after seeing just a handful of errors this probability increases to more than 95%.

Page 12: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors

Recent studies point to main memory as one of the leading hardware causes for machine crashes and component replacements in today’s data centers. As the amount of DRAM in servers keeps growing and chip densities increase, DRAM errors might pose an even larger threat to the reliability of future generations of systems.

12

Page 13: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors

Not all machines (clients) have ECC memory chips. The increased demand for larger capacities at low cost only makes the problem more serious (large clusters of inexpensive memories).

Need for reliable computation in the presence of memory faults

13

Page 14: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors

Other scenarios in which memory errors have an impact (and seem to be modeled in an adversarial setting):

•  Memory errors can cause security vulnerabilities: fault-based cryptanalysis [Boneh et al 97, Xu et al 01, Bloemer & Seifert 03], attacking Java Virtual Machines [Govindavajhala & Appel 03], breaking smart cards [Skorobogatov & Anderson 02, Bar-El et al 06]

•  Avionics and space electronic systems: the amount of cosmic rays increases with altitude (soft errors)

14

Page 15: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors in Space

15

Page 16: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors in Space

16

Page 17: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors in Space

17

Page 18: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Memory Errors in Space

18

Page 19: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Recap on Memory Errors

1. Memory errors can be harmful: uncorrectable memory errors can cause catastrophic events (reboot, kernel panic, data corruption, …)

19

I’m thinking of getting back into crime, Luigi. Legitimate business is too corrupt…

Page 20: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

A small example

Classical algorithms may not be correct in the presence of (even very few) memory errors

An example: merging two faithfully ordered lists A and B, each of length Θ(n).

[Figure: a single corrupted key in one of the input lists (a value overwritten by 80) can cause the merged output Out to contain Θ(n²) inversions among the correct keys]

20
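To make the example concrete, here is a minimal Python sketch (mine, not from the lecture): a textbook merge of two faithfully ordered lists in which a single key of one input is overwritten before merging; counting the inversions among the correct keys of the output exhibits the quadratic blow-up sketched above.

```python
def merge(a, b):
    """Textbook merge; assumes (wrongly, once a fault occurs) that a and b are sorted."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def inversions(seq):
    """Number of out-of-order pairs (quadratic, but fine for a small demo)."""
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq)) if seq[i] > seq[j])

n = 500
a = list(range(0, 2 * n, 2))   # 0, 2, 4, ...   (faithfully ordered)
b = list(range(1, 2 * n, 2))   # 1, 3, 5, ...   (faithfully ordered)
a[1] = 10**9                   # a single memory error corrupts one key of a

out = merge(a, b)
correct = [x for x in out if x != 10**9]   # look only at the uncorrupted keys
print(inversions(correct))                 # Theta(n^2) inversions among correct keys
```

With n = 500 the single corruption already yields on the order of 10^5 inversions among the correct keys: once the merge reaches the corrupted (huge) key of a, it drains all of b before the remaining correct keys of a.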

Page 21: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Recap on Memory Errors

2. Memory errors are NOT rare: even a small cluster of computers with a few GB per node can experience one bit error every few minutes.

21

I know my PIN number: it’s my name I can’t remember…

Page 22: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Recap on Memory Errors

3. ECC may not be available (or may not be enough): No ECC in inexpensive memories. ECC does not guarantee complete fault coverage; expensive; system halt upon detection of uncorrectable errors; service disruption; etc… etc…

22

Page 23: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Impact of Memory Errors

23

Page 24: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Algorithms and Data Structures

Resilient Algorithms and Data Structures: capable of tolerating memory errors on data (even throughout their execution) without sacrificing correctness, performance and storage space

Make sure that the algorithms and data structures we design are capable of dealing with memory errors

24

Page 25: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Faulty-Memory Model [Finocchi, I. 04]

•  Memory fault = the correct data stored in a memory location gets altered (destructive faults)

•  Faults can appear at any time in any memory location simultaneously

•  Assumptions:
   –  Only O(1) words of reliable memory (safe memory)
   –  Corrupted values indistinguishable from correct ones

Wish to produce correct output on uncorrupted data (in an adversarial model)

•  Even recursion may be problematic in this model.

25

Page 26: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Terminology

δ = upper bound known on the number of memory errors (may be function of n)

α = actual number of memory errors (happen during specific execution)

Note: typically α ≤ δ

All the algorithms / data structures described here need to know δ in advance

26

Page 27: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Other Faulty Models

The design of fault-tolerant algorithms has received attention for 50+ years

Liar Model [Ulam 77, Renyi 76,…] Comparison questions answered by a possibly lying adversary. Can exploit query replication strategies.

Fault-tolerant sorting networks [Assaf Upfal 91, Yao Yao 85,…] Comparators can be faulty. Exploit substantial data replication using fault-free data replicators.

Parallel Computations [Huang et al 84, Chlebus et al 94, …] Faults on parallel/distributed architectures: PRAM or DMM simulations (rely on fault-detection mechanisms)

27

Page 28: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Other Faulty Models

•  Robustness in Computational Geometry [Schirra 00, …]: faults from unreliable computation (geometric precision) rather than from memory errors

•  Noisy / Unreliable Computation [Braverman Mossel 08]: faults (with given probability) from unreliable primitives (e.g., comparisons) rather than from memory errors

•  Memory Checkers [Blum et al 93, Blum et al 95, …]: programs, not reliable objects: self-testing and self-correction. Essential error detection and error correction mechanisms.

•  …

28

Page 29: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures
    •  Priority Queues
    •  Dictionaries
4.  Experimental Results
5.  Conclusions and Open Problems

29

Page 30: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Sorting

We are given a set of n keys that need to be sorted

Q1. Can we sort the correct values efficiently in the presence of memory errors?

Q2. How many memory errors can we tolerate in the worst case if we wish to maintain optimal time and space?

The value of some keys may get arbitrarily corrupted

We cannot tell which keys are faithful and which are corrupted

30

Page 31: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Terminology

•  Faithfully ordered sequence = ordered except for corrupted keys

•  Resilient sorting algorithm = produces a faithfully ordered sequence (i.e., wish to sort correctly all the uncorrupted keys)

•  Faithful key = never corrupted

[Example: the sequence 1 2 3 4 5 6 7 8 9 10 with one key overwritten by 80 is still faithfully ordered]

•  Faulty key = corrupted

31

Page 32: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Trivially Resilient

Resilient variable: consists of (2δ+1) copies x_1, x_2, …, x_{2δ+1} of a standard variable x

Value of a resilient variable is given by the majority of its copies:
•  cannot be corrupted by faults
•  can be computed in linear time and constant space [Boyer Moore 91]

Trivially-resilient algorithms and data structures have Θ(δ) multiplicative overheads in terms of time and space

Note: Trivially-resilient does more than ECC (SECDED, Chip-Kill, ….)

32
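A minimal Python sketch of this idea (the class name and interface are illustrative, not from the lecture): a resilient variable keeps 2δ+1 copies, and a read recovers the last written value as the majority of the copies, computed in one pass with O(1) safe memory via the Boyer-Moore majority-vote algorithm cited above.

```python
class ResilientVar:
    """A value stored as 2*delta + 1 copies. Since at most delta copies can be
    corrupted, the majority of the copies is always the last value written."""

    def __init__(self, value, delta):
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        for i in range(len(self.copies)):
            self.copies[i] = value

    def read(self):
        # Boyer-Moore majority vote: one pass over the copies, O(1) extra space.
        candidate, count = None, 0
        for c in self.copies:
            if count == 0:
                candidate, count = c, 1
            elif c == candidate:
                count += 1
            else:
                count -= 1
        return candidate
```

Each read and write costs Θ(δ) time and each variable occupies Θ(δ) space, which is exactly the multiplicative overhead stated above.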

Page 33: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Trivially Resilient Sorting

Can trivially sort in O(δ n log n) time with δ memory errors, i.e., an O(n log n) sorting algorithm able to tolerate only O(1) memory errors

33
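Continuing the illustrative ResilientVar sketch above (again my code, not the lecture's), trivially resilient sorting just sorts an array of resilient variables and re-reads a majority value at every comparison, which makes the O(δ n log n) cost explicit.

```python
from functools import cmp_to_key

def trivially_resilient_sort(resilient_keys):
    """Sort a list of ResilientVar objects. Every comparison re-reads the
    majority of the 2*delta + 1 copies, so the total cost is O(delta * n log n)."""
    def compare(a, b):
        x, y = a.read(), b.read()
        return (x > y) - (x < y)
    resilient_keys.sort(key=cmp_to_key(compare))
    return resilient_keys
```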

Page 34: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Sorting

Upper Bound [Finocchi, Grandoni, I. 05]: a comparison-based sorting algorithm that takes O(n log n + δ²) time with δ memory errors, i.e., an O(n log n) sorting algorithm able to tolerate up to O((n log n)^{1/2}) memory errors

Lower Bound [Finocchi, I. 04]: any comparison-based resilient O(n log n) sorting algorithm can tolerate the corruption of at most O((n log n)^{1/2}) keys

34

Page 35: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Sorting

35

[Babenko and Pouzyrevsky, ’12]: a randomized algorithm (based on quicksort) which runs in O(n log n + δ(n log n)^{1/2}) expected time (or a deterministic one in O(n log n + δ n^{1/2} log n) worst-case time) with δ memory errors

The lower bound assumes that algorithms are not allowed to introduce replicas of existing elements.

Page 36: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Sorting (cont.)

Integer Sorting [Finocchi, Grandoni, I. 05]: a randomized integer sorting algorithm that takes O(n + δ²) time with δ memory errors, i.e., an O(n) randomized integer sorting algorithm able to tolerate up to O(n^{1/2}) memory errors

36

Page 37: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Binary Search

[Figure: binary search on a faithfully ordered array with some corrupted cells (values shown: 2 3 4 5 8 9 13 20 26 1 7 10 80); a non-resilient search can report search(5) = false]

Wish to get correct answers at least on correct keys:

search(s) either finds a key equal to s, or determines that no correct key is equal to s

If only faulty keys are equal to s, the answer is uninteresting (we cannot hope to get a trustworthy answer)

37

Page 38: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Trivially Resilient Binary Search

Can search in O(δ log n) time with δ memory errors

38
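In the same illustrative style (a sketch assuming the keys are stored as the ResilientVar objects introduced earlier, not the lecture's code), trivially resilient binary search performs an ordinary binary search whose O(log n) probes each read a majority value in O(δ) time, giving the O(δ log n) bound above.

```python
def trivially_resilient_search(resilient_keys, target):
    """Binary search over a faithfully ordered list of ResilientVar objects.
    Each probe reads a majority value in O(delta) time, so the whole search
    takes O(delta * log n) time."""
    lo, hi = 0, len(resilient_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = resilient_keys[mid].read()
        if key == target:
            return mid
        elif key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # no correct key equals target
```

The resilient searching results on the next slides bring this down to O(log n + δ) overall, paying the δ term only once instead of at every probe.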

Page 39: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Searching

Upper Bounds:
•  Randomized algorithm with O(log n + δ) expected time [Finocchi, Grandoni, I. 05]
•  Deterministic algorithm with O(log n + δ) time [Brodal et al. 07]

Lower Bounds:
•  Ω(log n + δ) lower bound (deterministic) [Finocchi, I. 04]
•  Ω(log n + δ) lower bound on expected time [Finocchi, Grandoni, I. 05]

39

Page 40: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Dynamic Programming

d-dimensional Dynamic Programming [Caminiti et al. 11]: running time O(n^d + δ^{d+1}) and space usage O(n^d + nδ); can tolerate up to δ = O(n^{d/(d+1)}) memory errors

40

Page 41: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures
    •  Priority Queues
    •  Dictionaries
4.  Experimental Results
5.  Conclusions and Open Problems

41

Page 42: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Data Structures

Data structures are more vulnerable to memory errors than algorithms:
•  algorithms are affected by errors only during their execution
•  data structures are affected by errors throughout their lifetime

42

Page 43: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Priority Queues

Maintain a set of elements under insert and deletemin

insert adds an element

deletemin deletes and returns either the minimum uncorrupted value or a corrupted value

Consistent with resilient sorting

43

Page 44: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Priority Queues

Upper Bound: both insert and deletemin can be implemented in O(log n + δ) time [Jorgensen et al. 07] (based on cache-oblivious priority queues)

Lower Bound: a resilient priority queue with n > δ elements must use Ω(log n + δ) comparisons to answer an insert followed by a deletemin [Jorgensen et al. 07]

44

Page 45: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Dictionaries

Maintain a set of elements under insert, delete and search

insert and delete as usual, search as in resilient searching: search(s) either finds a key equal to s, or determines that no correct key is equal to s

Again, consistent with resilient sorting

45

Page 46: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Dictionaries

Randomized resilient dictionary implements each operation in O(log n + δ) time

[Brodal et al. 07]

More complicated deterministic resilient dictionary implements each operation in O(log n + δ) time

[Brodal et al. 07]

46

Page 47: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resilient Dictionaries

Pointer-based data structures

Faults on pointers likely to be more problematic than faults on keys

Randomized resilient dictionaries of Brodal et al. built on top of traditional (non-resilient) dictionaries

Our implementation built on top of AVL trees

47

Page 48: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures
    •  Priority Queues
    •  Dictionaries
4.  Experimental Results
5.  Conclusions and Open Problems

48

Page 49: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Experimental Framework

Algorithm / Data Structure overheads:

•  Non-Resilient: O(f(n))
•  Trivially Resilient: O(δ · f(n))
•  Resilient: O(f(n) + g(δ))

49

Resilient sorting from [Ferraro-Petrillo et al. 09]

Resilient dictionaries from [Ferraro-Petrillo et al. 10]

Implemented resilient binary search and heaps

The implementations of resilient sorting and dictionaries are more engineered than those of resilient binary search and heaps

Page 50: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Experimental Platform

•  2 CPUs Intel Quad-Core Xeon E5520 @ 2.26 GHz

•  L1 cache 256 KB, L2 cache 1 MB, L3 cache 8 MB

•  48 GB RAM

•  Scientific Linux release with Linux kernel 2.6.18-164

•  gcc 4.1.2, optimization flag -O3

50

Page 51: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Fault Injection

This talk: Only random faults

Algorithm / data structure and fault injection implemented as separate threads

(Run on different CPUs)

Preliminary experiments (not here): error rates depend on memory usage and time.

51
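As a rough illustration of such a setup (the actual experiments were written in C, with the algorithm and the injector as separate threads on different CPUs; the Python code, names and fault rate below are mine), one thread keeps overwriting random cells of a shared array while the main thread runs a non-resilient workload, here random binary searches:

```python
import random
import threading
import time

def inject_faults(data, stop_event, faults_per_sec=1000):
    """Fault-injection thread: overwrite random cells with random 32-bit values."""
    while not stop_event.is_set():
        data[random.randrange(len(data))] = random.getrandbits(32)
        time.sleep(1.0 / faults_per_sec)

def binary_search(data, target):
    lo, hi = 0, len(data) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if data[mid] == target:
            return mid
        elif data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

n = 1_000_000
data = sorted(random.getrandbits(32) for _ in range(n))

stop = threading.Event()
injector = threading.Thread(target=inject_faults, args=(data, stop), daemon=True)
injector.start()

# Non-resilient workload running while faults are injected. (CPython's GIL
# serializes the two threads, so this only approximates the concurrent C setup.)
failures = sum(binary_search(data, data[random.randrange(n)]) == -1
               for _ in range(100_000))
stop.set()
injector.join()
print("searches that missed a key present when the search started:", failures)
```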

Page 52: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Resiliency: Why should we care?

What's the impact of memory errors? We try to analyze the impact of errors on mergesort, priority queues and dictionaries using a common framework (sorting)

Attempt to measure error propagation: try to estimate how far the output sequence is from being sorted (because of memory errors)

Heapsort is implemented on an array. For coherence, in AVLSort we do not induce faults on pointers

Faults on AVL pointers will be measured in a separate experiment

52

Page 53: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Error Propagation

•  k-unordered sequence = faithfully ordered except for k (correct) keys

•  k-unordered sorting algorithm = produces a k-unordered sequence, i.e., it faithfully sorts all but k correct keys

Example (2-unordered): 1 2 3 4 9 5 7 8 6 10 80

•  Resilient = 0-unordered, i.e., it faithfully sorts all correct keys

53
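A small sketch of how such a measure can be computed (my own helper, not the metric code used in the experiments; for simplicity it identifies corrupted keys by value): the smallest k for which an output is k-unordered equals the number of correct keys minus the length of their longest nondecreasing subsequence.

```python
import bisect

def min_k_unordered(output, corrupted_values):
    """Smallest k such that `output` is k-unordered: how many correct keys must
    be dropped so that the remaining correct keys are in nondecreasing order."""
    correct = [x for x in output if x not in corrupted_values]
    # Longest nondecreasing subsequence via patience sorting, O(n log n):
    # tails[i] = smallest possible tail of such a subsequence of length i + 1.
    tails = []
    for x in correct:
        i = bisect.bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(correct) - len(tails)

# The 2-unordered example above: dropping the correct keys 9 and 6 sorts the rest.
print(min_k_unordered([1, 2, 3, 4, 9, 5, 7, 8, 6, 10, 80], corrupted_values={80}))  # 2
```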

Page 54: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

The Importance of Being Resilient

n = 5,000,000: 0.01% (random) errors in input → 0.13% errors in output; 0.02% (random) errors in input → 0.22% errors in output

54


Page 55: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

The Importance of Being Resilient

n = 5,000,000: 0.01% (random) errors in input → 0.40% errors in output; 0.02% (random) errors in input → 0.47% errors in output

55


Page 56: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

The Importance of Being Resilient

n = 5,000,000: 0.01% (random) errors in input → 68.20% errors in output; 0.02% (random) errors in input → 79.62% errors in output

56


Page 57: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

The Importance of Being Resilient

57


Page 58: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Error Amplification

Mergesort: 0.002-0.02% (random) errors in input → 24.50-79.51% errors in output
AVLsort: 0.002-0.02% (random) errors in input → 0.39-0.47% errors in output
Heapsort: 0.002-0.02% (random) errors in input → 0.01-0.22% errors in output

They all show some error amplification. The large variations are likely to depend on data organization.

Note: Those are errors on keys. Errors on pointers are more dramatic for pointer-based data structures

58

Page 59: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

The Importance of Being Resilient

AVL with n = 5,000,000; α errors on the memory used (keys, parent pointers, pointers, etc…); 100,000 searches; around α searches fail: on average, able to complete only about (100,000/α) searches before crashing

59


Page 60: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn’t Trivial Resiliency Enough?

Memory errors are a problem. Do we need to tackle them with new algorithms / data structures?

Aren’t simple-minded approaches enough?

60

Page 61: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn’t Trivial Resiliency Enough?

δ = 1024

61

Page 62: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn’t Trivial Resiliency Enough?

•  δ = 1024
•  100,000 random search

62

Page 63: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn’t Trivial Resiliency Enough?

•  δ = 512
•  100,000 random ops

63

Page 64: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn’t Trivial Resiliency Enough?

•  δ = 1024
•  100,000 random ops
•  no errors on pointers

64

Page 65: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Isn't Trivial Resiliency Enough?

All experiments for 10^5 ≤ n ≤ 5·10^5, δ = 1024, unless specified otherwise

Mergesort: trivially resilient about 100-200X slower than non-resilient

Binary Search: trivially resilient about 200-300X slower than non-resilient

Dictionaries: trivially resilient AVL about 300X slower than non-resilient

Heaps: trivially resilient about 1000X slower than non-resilient (δ = 512) [deletemins are not random and slow]

65

Page 66: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

Memory errors are a problem. Trivial approaches produce slow algorithms / data structures.

Need non-trivial (hopefully fast) approaches

How fast can resilient algorithms / data structures be?

66

Page 67: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

α = δ = 1024

67

Page 68: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

α = δ = 1024

68

Page 69: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 1024
•  100,000 random search

69

Page 70: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 1024
•  100,000 random search

70

Page 71: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 512
•  100,000 random ops

71

Page 72: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 512
•  100,000 random ops

72

Page 73: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 1024
•  100,000 random ops

73

Page 74: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resilient Algorithms

•  α = δ = 1024
•  100,000 random ops

74

Page 75: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance of Resiliency

All experiments for 10^5 ≤ n ≤ 5·10^5, α = δ = 1024, unless specified otherwise

Mergesort: resilient mergesort about 1.5-2X slower than non-resilient mergesort [trivially resilient mergesort about 100-200X slower]

Binary Search: resilient binary search about 60-80X slower than non-resilient binary search [trivially resilient binary search about 200-300X slower]

Heaps: resilient heaps about 20X slower than non-resilient heaps (α = δ = 512) [trivially resilient heaps about 1000X slower]

Dictionaries: resilient AVL about 10-20X slower than non-resilient AVL [trivially resilient AVL about 300X slower]

75

Page 76: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

76

How well does the performance of resilient algorithms / data structures scale to larger data sets?

Previous experiments: 10^5 ≤ n ≤ 5·10^5

New experiment with n = 5·10^6 (no trivially resilient)

Page 77: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

77


n = 5,000,000

Page 78: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

n = 5,000,000


78

Page 79: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets


100,000 random search on n = 5,000,000 elements

79

log2 n ≈ 22

Page 80: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets


80

100,000 random search on n = 5,000,000 elements

Page 81: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

100,000 random ops on a heap with n = 5,000,000


81

log2 n ≈ 22

Page 82: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

100,000 random ops on a heap with n = 5,000,000


82

Page 83: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

100,000 random ops on AVL with n = 5,000,000


83

log2 n ≈ 22

Page 84: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

100,000 random ops on AVL with n = 5,000,000


84

Page 85: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Larger Data Sets

All experiments for n = 5·10^6

Mergesort [was 1.5-2X for 10^5 ≤ n ≤ 5·10^5]: resilient mergesort is 1.6-2.3X slower (requires ≤ 0.04% more space)

Binary Search [was 60-80X for 10^5 ≤ n ≤ 5·10^5]: resilient search is 100-1000X slower (requires ≤ 0.08% more space)

Heaps [was 20X for 10^5 ≤ n ≤ 5·10^5]: resilient heap is 100-1000X slower (requires 100X more space)

Dictionaries [was 10-20X for 10^5 ≤ n ≤ 5·10^5]: resilient AVL is 6.9-14.6X slower (requires about 1/3 space)

85

Page 86: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Sensitivity to δ

86

How critical is the choice of δ ?

Underestimating δ (α > δ) compromises resiliency

Overestimating δ (α << δ) gives some performance degradation

Page 87: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Performance Degradation

α = 32, but the algorithm overestimates δ = 1024:

Mergesort: resilient mergesort improves by 9.7% in time and degrades by 0.04% in space

Binary Search: resilient search degrades to 9.8X in time and by 0.08% in space

Heaps: resilient heap degrades to 13.1X in time and by 59.28% in space

Dictionaries: resilient AVL degrades by 49.71% in time

87

Page 88: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Robustness

88

Resilient mergesort and dictionaries appear more robust than resilient search and heaps

I.e., resilient mergesort and dictionaries scale better with n and are less sensitive to δ (so less vulnerable to bad estimates of δ), …

How much of this is due to the fact that their implementations are more engineered?

Page 89: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Outline of the Talk

1.  Motivation and Model
2.  Resilient Algorithms:
    •  Sorting and Searching
3.  Resilient Data Structures
    •  Priority Queues
    •  Dictionaries
4.  Experimental Results
5.  Conclusions and Open Problems

89

Page 90: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Concluding Remarks

•  Need for reliable computation in the presence of memory errors

•  Investigated basic algorithms and data structures in the faulty-memory model: we do not wish to detect / correct errors, only to produce correct output on correct data

•  Tight upper and lower bounds in this model

•  After first tests, resilient implementations of algorithms and data structures look promising

90

Page 91: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Future Work and Open Problems

•  More (faster) implementations, engineering and experimental analysis?

•  Resilient graph algorithms?

•  Lower bounds for resilient integer sorting?

•  Better faulty memory model?

•  Resilient algorithms oblivious to δ?

•  Full repertoire for resilient priority queues (delete, decreasekey, increasekey)?

91

Page 92: Algorithms for Big Data: Graphs and Memory Errors 5 (Lecture by Giuseppe Italiano)

Thank You!

92

My memory’s terrible these days…