prophecy : using history for high-throughput fault tolerance

59
Prophecy: Using History for High- Throughput Fault Tolerance Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University

Upload: mika

Post on 03-Feb-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Prophecy : Using History for High-Throughput Fault Tolerance. Siddhartha Sen Joint work with Wyatt Lloyd and Mike Freedman Princeton University. Non-crash failures happen. Model as Byzantine (malicious). Mask Byzantine faults. Service. Clients. Mask Byzantine faults. Throughput. Clients. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Prophecy : Using History for High-Throughput Fault Tolerance

Prophecy: Using History for High-Throughput Fault Tolerance

Siddhartha Sen

Joint work with Wyatt Lloyd and Mike Freedman

Princeton University

Page 2: Prophecy : Using History for High-Throughput Fault Tolerance

Non-crash failures happen

Model as Byzantine (malicious)

Page 3: Prophecy : Using History for High-Throughput Fault Tolerance

Mask Byzantine faults

Clients Service

Page 4: Prophecy : Using History for High-Throughput Fault Tolerance

Mask Byzantine faults

Replicated service

Clients

Throughput

Page 5: Prophecy : Using History for High-Throughput Fault Tolerance

Mask Byzantine faults

Replicated service

Clients

Throughput

Linearizability(strong consistency)

Page 6: Prophecy : Using History for High-Throughput Fault Tolerance

Byzantine fault tolerance (BFT)

• Low throughput

• Modifies clients

• Long-lived sessions

Page 7: Prophecy : Using History for High-Throughput Fault Tolerance

Prophecy

• High throughput + good consistency

• No free lunch:– Read-mostly workloads– Slightly weakened consistency

Page 8: Prophecy : Using History for High-Throughput Fault Tolerance

Byzantine fault tolerance (BFT)

• Low throughput

• Modifies clients

• Long-lived sessions

D-ProphecyD-Prophecy

ProphecyProphecy

Page 9: Prophecy : Using History for High-Throughput Fault Tolerance

Traditional BFT reads

Clients

Replica Group

application

Agree?

Page 10: Prophecy : Using History for High-Throughput Fault Tolerance

A cache solution

Clients

Replica Group

applicationcache

Agree?

Page 11: Prophecy : Using History for High-Throughput Fault Tolerance

A cache solution

Clients

Replica Group

applicationcache

Agree?Problems:

• Huge cache• Invalidation

Page 12: Prophecy : Using History for High-Throughput Fault Tolerance

A compact cache

Clients

Replica Group

applicationcache

Requests Responsesreq1 resp1req2 resp2req3 resp3

Page 13: Prophecy : Using History for High-Throughput Fault Tolerance

A compact cache

Clients

Replica Group

applicationcache

Requests Responsessketch(req1) sketch(resp1)sketch(req2) sketch(resp2)sketch(req3) sketch(resp3)

Requests Responses

Page 14: Prophecy : Using History for High-Throughput Fault Tolerance

A sketcher

Clients

Replica Group

applicationsketcher

Page 15: Prophecy : Using History for High-Throughput Fault Tolerance

A sketcher

Clients

Replica Group

…………

…………

…………

sketch webpage

Page 16: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

……………………

…………Agree?

Fast, load-balanced reads

sketch webpage

Page 17: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

………………

……

…………Agree?

sketch webpage

Page 18: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

…………

…………

sketch webpage

key-value store

replicated state machine

Page 19: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

…………

…………Agree?

sketch webpage

…………

Maintain a fresh cache

Page 20: Prophecy : Using History for High-Throughput Fault Tolerance

NO!

Did we achieve linearizability?

Page 21: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

…………

…………

sketch webpage

…………

Page 22: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

…………

…………Agree?

sketch webpage

…………

Page 23: Prophecy : Using History for High-Throughput Fault Tolerance

Executing a read

Clients

Replica Group

…………

…………

…………Agree?

sketch webpage

…………

Fast reads may be stale

Page 24: Prophecy : Using History for High-Throughput Fault Tolerance

Load balancing

Clients

Replica Group

…………

…………

…………Agree?

sketch webpage

…………

Pr(k stale) = gk

Page 25: Prophecy : Using History for High-Throughput Fault Tolerance

Traditional BFT:• Each replica executes read• Linearizability

D-Prophecy:• One replica executes read• “Delay-once” linearizability

D-Prophecy vs. BFT

Clients

Replica Group

Page 26: Prophecy : Using History for High-Throughput Fault Tolerance

Byzantine fault tolerance (BFT)

• Low throughput

• Modifies clients

• Long-lived sessions

D-ProphecyD-Prophecy

ProphecyProphecy

Page 27: Prophecy : Using History for High-Throughput Fault Tolerance

Key-exchange overhead

11%

3%

Page 28: Prophecy : Using History for High-Throughput Fault Tolerance

Internet services

Clients

Replica Group

Page 29: Prophecy : Using History for High-Throughput Fault Tolerance

Sketcher

A proxy solution

Clients

Replica Group

Proxy

Consolidate sketchers

Page 30: Prophecy : Using History for High-Throughput Fault Tolerance

Sketcher

Clients

Replica Group

Trusted

A proxy solution

Sketcher must be fail-stop

Page 31: Prophecy : Using History for High-Throughput Fault Tolerance

Sketcher must be fail-stop

Sketcher

Clients

Replica Group

Trusted

A proxy solution

• Trust middlebox already• Small and simple

Page 32: Prophecy : Using History for High-Throughput Fault Tolerance

…………

…………

Sketcher

…………

Executing a read

Clients

Replica Group

Trusted

q

…………

…………

…………

Fast, load-balanced reads

Prophecy

Page 33: Prophecy : Using History for High-Throughput Fault Tolerance

Prophecy

Sketcher

Clients

Replica Group

Trusted

……………………

…………

…………

…………

…………

…………

Fast reads may be stale

…………

Page 34: Prophecy : Using History for High-Throughput Fault Tolerance

Delay-once linearizability

Page 35: Prophecy : Using History for High-Throughput Fault Tolerance

W, R, W, W, R, R, W, R

Delay-once linearizability

Read-after-write property

Page 36: Prophecy : Using History for High-Throughput Fault Tolerance

W, R, W, W, R, R, W, R

Delay-once linearizability

Read-after-write property

Page 37: Prophecy : Using History for High-Throughput Fault Tolerance

Example application

• Upload embarrassing photos1. Remove colleagues

from ACL 2. Upload photos3. (Refresh)

• Weak may reorder

• Delay-once preserves order

Page 38: Prophecy : Using History for High-Throughput Fault Tolerance

Byzantine fault tolerance (BFT)

• Low throughput

• Modifies clients

• Long-lived sessions

D-ProphecyD-Prophecy

ProphecyProphecy

Page 39: Prophecy : Using History for High-Throughput Fault Tolerance

Implementation

• Modified PBFT– PBFT is stable, complete– Competitive with Zyzzyva et. al.

• C++, Tamer async I/O– Sketcher: 2000 LOC– PBFT library: 1140 LOC– PBFT client: 1000 LOC

Page 40: Prophecy : Using History for High-Throughput Fault Tolerance

Evaluation

• Prophecy vs. proxied-PBFT– Proxied systems

• D-Prophecy vs. PBFT– Non-proxied systems

Page 41: Prophecy : Using History for High-Throughput Fault Tolerance

Evaluation

• Prophecy vs. proxied-PBFT– Proxied systems

• We will study:– Performance on “null” workloads– Performance with real replicated service– Where system bottlenecks, how to scale

Page 42: Prophecy : Using History for High-Throughput Fault Tolerance

Basic setup

Sketcher

Clients (100)

Replica Group (PBFT)

(concurrent)

Page 43: Prophecy : Using History for High-Throughput Fault Tolerance

Fraction of failed fast reads

Alexa top sites:< 15%

Page 44: Prophecy : Using History for High-Throughput Fault Tolerance

Small benefit on null reads

Page 45: Prophecy : Using History for High-Throughput Fault Tolerance

Apache webserver setup

Sketcher

Clients

Replica Group

Page 46: Prophecy : Using History for High-Throughput Fault Tolerance

Large benefit on real workload

3.7x

2.0x

Page 47: Prophecy : Using History for High-Throughput Fault Tolerance

Benefit grows with work94s (Apache)

Null workloads are misleading!

Page 48: Prophecy : Using History for High-Throughput Fault Tolerance

Benefit grows with work

Page 49: Prophecy : Using History for High-Throughput Fault Tolerance

Single sketcher bottlenecks

Page 50: Prophecy : Using History for High-Throughput Fault Tolerance

Scaling out

Page 51: Prophecy : Using History for High-Throughput Fault Tolerance

Scales linearly with replicas

Page 52: Prophecy : Using History for High-Throughput Fault Tolerance

Summary• Prophecy good for Internet services– Fast, load-balanced reads

• D-Prophecy good for traditional services

• Prophecy scales linearly while PBFT stays flat

• Limitations:– Read-mostly workloads (meas. study corroborates)– Delay-once linearizability (useful for many apps)

Page 53: Prophecy : Using History for High-Throughput Fault Tolerance

Thank You

Page 54: Prophecy : Using History for High-Throughput Fault Tolerance

Additional slides

Page 55: Prophecy : Using History for High-Throughput Fault Tolerance

Transitions

• Prophecy good for read-mostly workloads

• Are transitions rare in practice?

Page 56: Prophecy : Using History for High-Throughput Fault Tolerance

Measurement study

• Alexa top sites

• Access main page every 20 sec for 24 hrs

Page 57: Prophecy : Using History for High-Throughput Fault Tolerance

Mostly static content

Page 58: Prophecy : Using History for High-Throughput Fault Tolerance

Mostly static content15%

Page 59: Prophecy : Using History for High-Throughput Fault Tolerance

Dynamic content

• Rabin fingerprinting on transitions

• 43% differ by single contiguous change

• Sampled 4000 of them, over half due to:– Load balancing directives– Random IDs in links, function parameters