
Trading Fault Tolerance for Performance in AN encoding

Norman A. Rink and Jeronimo Castrillon
Technische Universität [email protected]

Computing Frontiers, 15-17 May 2017, Siena, Italy

Outline

1. Introduction – hardware faults and software-based fault tolerance

2. AN encoding

3. Configurable compiler-implemented AN encoding

4. Fault experiments and results

5. Summary and outlook


Hardware faults and soft errors


- Faults are a long-standing and recurring issue.
  - In safety-critical embedded devices.
  - In servers/data centers and HPC workloads.

- What causes hardware faults?
  - Cosmic radiation.
  - “Dim silicon” (near-threshold computing to save energy).
  - Temperature variations, process variations.

→ These typically lead to transient hardware faults, aka soft errors.

- Non-negligible fault rates in emerging computing paradigms:
  - Silicon-/carbon-based nano-technologies, graphene.
  - Chemical information processing, bio-inspired/quantum computing, NVM.

taken from S. Borkar, “Designing reliable systems from unreliable components: …,” IEEE Micro, vol. 25, no. 6, 2005.

Hardware faults and soft errors – cont’d


- The trade-off of computing with faulty hardware:

[Figure: trade-off plot. Axes: cost (run time, energy) and error probability/output degradation. Regions marked in the plot:
  - safety-/security-critical applications, e.g. automotive, operating system (kernels)
  - non-critical user applications, processes that can be restarted
  - approximate computing applications, e.g. image processing, machine learning
  - future and emerging HW (perhaps)]

→ Hardware-implemented fault tolerance is not flexible.

Error detection through redundant code

- Typical approach to detecting/correcting hardware faults:
  - Replicate data flow (duplication is sufficient for error detection).
  - Insert checks.


Original code:

    %3 = add i64 %0, %1
    %4 = mul i64 %3, %2

Fault-tolerant code after the transformation:

    ; duplicated dataflow
    %3 = add i64 %0, %1
    %r3 = add i64 %r0, %r1
    %4 = mul i64 %3, %2
    %r4 = mul i64 %r3, %r2

    ; error check
    %f0 = icmp eq i64 %4, %r4
    br i1 %f0, label %continue, label %recover

The runtime overhead of duplicated dataflow is typically < 2x.

- State of the art: EDDI ’02, SWIFT ’05.
  - Many variations and improvements exist.
  - It is usually assumed that the memory system is already protected, typically by ECC.
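The duplication scheme can be mimicked in a few lines of Python. This is only a sketch of the idea: the names `compute_duplicated`, `FaultDetected` and the `flip_bit` test hook are assumptions made for illustration and do not come from EDDI or SWIFT.

```python
class FaultDetected(Exception):
    """Raised when the primary and the replica dataflow disagree."""


def compute_duplicated(a, b, c, flip_bit=None):
    """Compute (a + b) * c twice and compare the two results.

    flip_bit is a hypothetical test hook: if set, that bit of the primary
    intermediate value is flipped to mimic a transient hardware fault.
    """
    ra, rb, rc = a, b, c          # replicated inputs (%r0, %r1, %r2)

    v3 = a + b                    # %3
    r3 = ra + rb                  # %r3
    if flip_bit is not None:
        v3 ^= 1 << flip_bit       # injected fault in the primary dataflow

    v4 = v3 * c                   # %4
    r4 = r3 * rc                  # %r4

    if v4 != r4:                  # error check (icmp followed by br)
        raise FaultDetected("primary and replica differ")
    return v4
```

A fault that is masked by later operations (for example, multiplication by zero) would not be reported by this check, which is harmless because the result is still correct.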


AN encoding

- Correctness condition: Integer values are multiples of a fixed constant A.
→ Check for faults like this:

if (n % A != 0) { exit(AN_ERROR_CODE); }

- Advantages over replication:
  - Data in memory is encoded (automatically protected): no need to replicate loads/stores.
  - Suitable for multi-threaded and shared memory applications.

- Disadvantages: Large runtime overheads (up to and over several 10x).
  - Checking is expensive (modulo operation).
  - Decoding is expensive (division by A).
  - Decoding often required, e.g., for address operands.

→ Trade frequencies of these operations for runtime.

Aside: more eager variants of AN encoding have even larger overheads.
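A minimal sketch of encoding, checking and decoding, assuming A = 3 (the constant of the worked example in the backup slides) and hypothetical helper names. For any odd A > 1, a single bit flip changes an encoded value by plus or minus a power of two, which is never a multiple of A, so the check catches it.

```python
A = 3  # encoding constant; assumed here, any odd A > 1 behaves the same way


class ANError(Exception):
    """Raised instead of exit(AN_ERROR_CODE) so the sketch is easy to test."""


def encode(x):
    return x * A


def is_valid(n):
    # The correctness condition: encoded values are multiples of A.
    return n % A == 0


def decode(n):
    if not is_valid(n):            # expensive modulo operation
        raise ANError(f"{n} is not a multiple of {A}")
    return n // A                  # expensive division by A


def flip_bit(n, k):
    # Model of a single-bit fault in a register or memory word.
    return n ^ (1 << k)
```

For instance, `decode(encode(42))` returns 42, while `decode(flip_bit(encode(42), 5))` raises `ANError`.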

AN encoding – cont’d

- Make a program fault-tolerant by transforming it into an AN-encoded program.

1. Value encoding:

    integer constant c  →  integer constant c*A

2. Operation encoding, e.g.:

    multiplication:
        %3 = mul i64 %0, %1
      becomes
        %t3 = mul i64 %0, %1
        %3 = div i64 %t3, A

    load:
        %1 = load i64* %0
      becomes
        %t0 = ptrtoint i64* %0 to i64
        %t1 = div i64 %t0, A
        %t3 = inttoptr i64 %t1 to i64*
        %1 = load i64* %t3

3. Check insertion:
  - Where non-encoded values enter/leave the scope of AN encoding:
    - memory accesses, calls to external functions, return values
  → Often referred to as synchronization points.
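The arithmetic behind operation encoding can be sketched as follows. The helper names and the dictionary standing in for memory are assumptions for illustration. Addition needs no correction because x*A + y*A = (x+y)*A, whereas a product of two encoded values carries a factor A*A and must be divided by A once.

```python
A = 3  # encoding constant, assumed for illustration


def enc(c):
    # Value encoding: constant c becomes c*A.
    return c * A


def add_enc(x, y):
    # x*A + y*A is already (x + y)*A.
    return x + y


def mul_enc(x, y):
    # (x*A) * (y*A) = (x*y)*A*A, so divide by A once.
    return (x * y) // A


def load_enc(mem, p_enc):
    # The encoded pointer is decoded to obtain a usable address;
    # the loaded value stays encoded.
    return mem[p_enc // A]
```

With these definitions, `mul_enc(enc(6), enc(7))` equals `enc(42)`.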


Configurable compiler-implemented AN encoding


- optional pre-optimizations
  - AN encoding increases the complexity of code.
  - Hence, after AN encoding, the compiler may fail to spot opportunities for optimization.

- value encoding, operation encoding, check insertion
  - As previously discussed.
  - Remember, checks are inserted at so-called synchronization points.

- expansion
  - Turn encoding, decoding and checking operations into (sequences of) native machine operations.

Encoding variants


- check insertion
  - Always check before values are stored to memory.
  - Always check before values are decoded.

[Figure: code generated for the variants during compilation. Annotation: one local accumulator per function.]
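The slide only states that a variant uses one local accumulator per function. The sketch below assumes that values which would otherwise be checked individually are added into the accumulator and that a single modulo check is performed at function exit. This relies on the fact that a sum of multiples of A is again a multiple of A. All names are hypothetical.

```python
A = 3  # encoding constant, assumed for illustration


class ANError(Exception):
    pass


def increment_all_eager(encoded_values):
    """Check every result before it is stored: one modulo per value."""
    out, modulo_ops = [], 0
    for n in encoded_values:
        r = n + 1 * A                  # encoded increment
        modulo_ops += 1
        if r % A != 0:
            raise ANError("fault detected before store")
        out.append(r)
    return out, modulo_ops


def increment_all_accumulated(encoded_values):
    """Add results into one accumulator and check once at function exit."""
    out, acc = [], 0
    for n in encoded_values:
        r = n + 1 * A
        acc += r                       # cheap addition instead of a modulo
        out.append(r)                  # stored before any check has run
    if acc % A != 0:                   # the single modulo operation
        raise ANError("fault detected at function exit")
    return out, 1
```

Under this assumption the accumulator variant saves modulo operations, but a corrupted value is already stored by the time the fault is reported, which is one way in which fault tolerance is traded for performance.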

Runtime overheads


label   test case
A-C     bubblesort
D       CRC (cyclic redundancy checker)
E       DES encryption
F       Dijkstra
G-I     lex, parse, eval
J       Fibonacci
K       integer matrix multiply
L       memcopy
M-O     quicksort

[Figure: runtime overheads per test case. Annotation: best results with pre-optimizations.]


Fault injection


- Assumptions (commonly justified by the rarity of faults; SEU: single event upset):
  - Only a single fault affects program execution.
  - Only single bit flips occur.

- Simulate symptoms of faults by ...
  - ... flipping a random bit in a random register (faults in the CPU).
  - ... flipping a random bit in a random memory location that is accessed (faults in memory).

- Evaluate AN encoding on the conditional probability:

  pSDC = P( ”silent data corruption” | ”hardware fault (visible at the architecture level)” )
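A toy fault injection campaign illustrates how pSDC is estimated as the fraction of injected faults that end in silent data corruption. This is a sketch under strong simplifications: the workload is a plain sum, faults only hit operands, crashes and hangs are not modelled, and the names `workload` and `estimate_psdc` as well as A = 3 are assumptions.

```python
import random

A = 3  # encoding constant, assumed for illustration


def workload(values, encoded, fault=None):
    """Sum `values`; fault = (index, bit) flips one bit of one operand."""
    acc = 0
    for i, v in enumerate(values):
        x = v * A if encoded else v
        if fault is not None and fault[0] == i:
            x ^= 1 << fault[1]         # single bit flip
        acc += x
    if not encoded:
        return "ok", acc
    if acc % A != 0:                   # AN check before decoding
        return "detected", None
    return "ok", acc // A


def estimate_psdc(values, encoded, trials=1000, seed=0):
    """Fraction of single-fault runs that finish with a wrong result."""
    rng = random.Random(seed)
    _, golden = workload(values, encoded)      # fault-free reference run
    sdc = 0
    for _ in range(trials):
        fault = (rng.randrange(len(values)), rng.randrange(64))
        status, result = workload(values, encoded, fault)
        if status == "ok" and result != golden:
            sdc += 1                   # wrong output, nobody noticed
    return sdc / trials
```

In this toy setting every fault corrupts the unprotected sum, while the encoded version detects every single bit flip. Real programs, as in the experiments reported here, fall between these extremes because checks are placed only at selected points.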

Faults in the CPU – full results


[Figure: relative frequencies (0.0 to 1.0) of the outcomes correct, detected, hang, crash and sdc for faults in the CPU. One bar per configuration: each test case A to O, followed by its variants .1, .2, .3, .po.1, .po.2 and .po.3. The sdc share corresponds to pSDC.]

Faults in the CPU and in memory – SDC


Memory:

[Figure: pSDC (y-axis, 0.00 to 0.15) for test cases A to O and their means, for the variants 1, 2, 3, po.1, po.2, po.3.]

pSDC · 10^3:
- Memory accesses are relatively rare.
- A single vulnerable memory access has a much stronger effect.
- Stack accesses have been found to be particularly vulnerable.

CPU:

[Figure: pSDC (y-axis, 0.00 to 0.08) for test cases A to O and their means, for the variants 1, 2, 3, po.1, po.2, po.3. Annotations in the figure: 24x, 55x, 7x, 15x.]


Summary and outlook


- On less reliable HW, some applications ...
  - ... can live with the occasional error (approximate computing).
  - ... may require additional measures to tolerate faults (e.g. OS kernels).

- Encoding is an interesting alternative to instruction duplication.
  - Protection of entire computing systems, multi-threaded and shared memory applications.

- AN encoding has large runtime overheads (several 10x to 100x).
  - Here: reduction from 9.9x to 3.6x, accompanied by an increase in pSDC.
  - Further reduction to 2.1x if pointers are not encoded.

- Can we design HW that supports error detection by encoding?
  - We must better understand the interaction of encoding with compiler optimizations.
  - In assessments, we would like to avoid long-running fault injection experiments.



Work supported by the German Research Foundation (DFG) within the Cluster of Excellence ‘Center for Advancing Electronics Dresden’ (cfaed).

Thank you.



Backup

Worked example of AN encoding


Source statement (A = 3):

    *p = foo( (*p) + 42 );

Original code:

    %1 = load i64* %0
    %2 = add i64 %1, 42
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After value encoding (the constant 42 becomes 42 * 3 = 126):

    %1 = load i64* %0
    %2 = add i64 %1, 126
    %3 = call i64 @foo(i64 %2)
    store i64 %3, i64* %0

After operation encoding (pointers are decoded before memory accesses; the argument of foo is decoded before the call and the return value is encoded again):

    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2

    %2 = add i64 %1, 126

    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3

    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*

    store i64 %3, i64* %t7

After check insertion:

    call void @check(i64 %0)
    %t0 = ptrtoint i64* %0 to i64
    %t1 = div i64 %t0, 3
    %t2 = inttoptr i64 %t1 to i64*
    %1 = load i64* %t2
    call void @check(i64 %1)

    %2 = add i64 %1, 126

    call void @check(i64 %2)
    %t3 = div i64 %2, 3
    %t4 = call i64 @foo(i64 %t3)
    %3 = mul i64 %t4, 3

    call void @check(i64 %0)
    %t5 = ptrtoint i64* %0 to i64
    %t6 = div i64 %t5, 3
    %t7 = inttoptr i64 %t6 to i64*

    call void @check(i64 %3)
    store i64 %3, i64* %t7
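The fully transformed statement can be replayed in Python to see the checks at work. This is a sketch only: memory is modelled as a dictionary from addresses to encoded values, `foo` is a hypothetical stand-in (doubling its argument) for the function called in the example, and `ANError` replaces a program exit.

```python
A = 3  # same constant as in the worked example


class ANError(Exception):
    pass


def check(n):
    if n % A != 0:
        raise ANError(f"{n} is not a multiple of {A}")


def foo(x):
    # Hypothetical function working on plain (non-encoded) values.
    return 2 * x


def encoded_statement(mem, p_enc):
    """AN-encoded version of `*p = foo((*p) + 42)`; p_enc is the encoded pointer."""
    check(p_enc)
    addr = p_enc // A          # decode the pointer before the load
    v1 = mem[addr]             # memory holds encoded values
    check(v1)

    v2 = v1 + 126              # encoded add; 126 is the encoded constant 42

    check(v2)
    r = foo(v2 // A) * A       # decode the argument, encode the return value

    check(p_enc)
    addr = p_enc // A          # decode the pointer again before the store

    check(r)
    mem[addr] = r
```

Starting from `mem = {16: 15}` (the value 5, encoded) and the encoded pointer 48, the statement stores 282, which is the encoding of foo(5 + 42) = 94.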
