Scalable Computing on Open Distributed Systems
Jon Weissman, University of Minnesota
National E-Science Center
CLADE 2008
What is the Problem?
• Open distributed systems
  – tasks are submitted to the "system" for execution
  – workers do the computing: execute a task, return an answer
• The challenge
  – computations that are erroneous or late are less useful
  – workers may fail, err, be hacked, or be misconfigured
  – unpredictable time to return answers
• Both local- and wide-area systems
  – focus here: volunteer wide-area systems
Shape of the Solution
• Replication
  – works for all sources of unreliability: computation and data
• How to do this intelligently and scalably?
Replication Challenges
• How many replicas?
  – too many: a waste of resources
  – too few: the application suffers
• Most approaches assume ad-hoc replication
  – under-replication: task re-execution (higher latency)
  – over-replication: wasted resources (lower throughput)
• Using information about a node's past behavior, we can intelligently size the redundancy
Problems with ad-hoc replication
(figure: a mix of reliable and unreliable nodes; task x is sent to group A, task y to group B)
System Model
(figure: worker nodes annotated with reputation ratings 0.9, 0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.4, 0.4, 0.3)
• Reputation rating r_i: the degree of a node's reliability
• Dynamically size the redundancy based on r_i
• Note: variable-sized groups
• Assume no correlated errors (relaxed later)
Smart Replication
• Rating based on past interactions with clients
  – probability r_i over a window: correct/total or timely/total
  – extends to a worker group (assuming no collusion) => likelihood of correctness (LOC)
• Smarter redundancy
  – variable-sized worker groups
  – intuition: higher-reliability clients => smaller groups
For a group g of k clients with independent ratings r_i and majority threshold m = ⌈(k+1)/2⌉:

LOC(g) = Σ_{S⊆g, |S|≥m} ∏_{i∈S} r_i · ∏_{j∈g∖S} (1 − r_j)
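A minimal sketch (not the system's actual code) of how a group's LOC can be computed from individual ratings r_i, assuming independent errors and majority voting, by summing the probability of every majority-correct outcome:

```python
from itertools import combinations

def loc(reliabilities):
    """Likelihood of correctness for a group under majority voting.

    Sums, over every subset S of workers of size >= m (the smallest
    majority), the probability that exactly the workers in S answer
    correctly, assuming independent (uncorrelated) errors.
    """
    k = len(reliabilities)
    m = k // 2 + 1  # smallest majority
    total = 0.0
    for size in range(m, k + 1):
        for subset in combinations(range(k), size):
            p = 1.0
            for i in range(k):
                p *= reliabilities[i] if i in subset else 1 - reliabilities[i]
            total += p
    return total

# loc([0.9, 0.8, 0.7]) ≈ 0.902
```

The subset enumeration is exponential in k, which is tolerable here because groups stay small by design.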
Terms
• LOC (Likelihood of Correctness) of a group g
  – the 'actual' probability of getting a correct or timely answer from the group g of clients
• Target LOC
  – the success rate that the system tries to ensure while forming client groups
Scheduling Metrics
• Guiding metrics
  – throughput: the number of tasks successfully completed in an interval
  – success rate s: the ratio of throughput to the number of tasks attempted
Algorithm Space
• How many replicas?
  – algorithms compute how many replicas are needed to meet a success threshold
• How to reach consensus?
  – Majority (better against byzantine threats)
  – M-1 (better for timeliness)
  – M-2 (first two matching answers)
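The three consensus schemes can be sketched as follows. This is an illustrative interpretation of the slide's labels, taking M-1 as "accept the first answer returned" and M-2 as "accept once two answers match":

```python
from collections import Counter

def consensus(answers, scheme, group_size):
    """Decide whether the answers collected so far reach consensus.

    answers:    results received, in arrival order.
    scheme:     'majority', 'm-1' (first answer wins), or
                'm-2' (first two matching answers win).
    Returns the agreed answer, or None if no consensus yet.
    """
    if scheme == "m-1":
        return answers[0] if answers else None
    counts = Counter(answers)
    if scheme == "m-2":
        for ans, n in counts.items():
            if n >= 2:
                return ans
        return None
    if scheme == "majority":
        ans, n = counts.most_common(1)[0]
        return ans if n > group_size // 2 else None
    raise ValueError(f"unknown scheme: {scheme}")
```

M-1 never waits, so it favors timeliness; majority tolerates up to ⌊(k−1)/2⌋ byzantine workers at the cost of waiting for more responses.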
One Scheduling Algorithm
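The algorithm itself appears on the slide as a figure. A plausible greedy sketch, consistent with the surrounding description (not necessarily the paper's exact algorithm), is to add the most reliable available workers until the group's estimated LOC meets the target:

```python
from itertools import combinations
import math

def loc(rs):
    """Probability that a majority of the group is correct (independent errors)."""
    k, m = len(rs), len(rs) // 2 + 1
    total = 0.0
    for size in range(m, k + 1):
        for s in combinations(range(k), size):
            total += math.prod(rs[i] if i in s else 1 - rs[i] for i in range(k))
    return total

def form_group(workers, target):
    """Greedily add the highest-rated available workers until the
    group's estimated LOC reaches the target (or workers run out).

    workers: dict mapping worker id -> reputation rating r_i.
    Returns the chosen worker ids.
    """
    pool = sorted(workers, key=workers.get, reverse=True)
    group = []
    while pool:
        group.append(pool.pop(0))
        if loc([workers[w] for w in group]) >= target:
            break
    return group
```

Note that a single highly reliable worker can satisfy a modest target on its own, which is exactly the "higher reliability => smaller groups" intuition.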
Evaluation
• Baselines
  – Fixed algorithm: statically sized, equal groups; uses no reliability information
  – Random algorithm: forms groups by randomly assigning nodes until the target LOC is reached
• Simulated a wide variety of node-reliability distributions
Experimental Results: correctness
Simulation: byzantine behavior only; majority voting
Role of the target LOC
• Key parameter, hard to specify
• Too large: groups become too large (low throughput)
• Too small: groups become too small (low success rate)
• Instead, adaptively learn it
  – bias toward throughput, success rate s, or both
Adaptive Algorithm
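The adaptive algorithm appears on the slide as a figure. One hypothetical feedback rule matching the bullets above (my own illustration, not the paper's exact update) raises the target when the observed success rate falls short, and lowers it otherwise to shrink groups and recover throughput:

```python
def adapt_target(target, success_rate, desired_success,
                 step=0.01, lo=0.5, hi=0.999):
    """One feedback step for the LOC target.

    If the observed success rate is below the desired level, groups
    were too small, so raise the target; otherwise lower it slightly
    so groups shrink and throughput improves. Bounds keep the target
    in a sane range.
    """
    if success_rate < desired_success:
        return min(hi, target + step)
    return max(lo, target - step)
```

Running this after every scheduling interval lets the system converge on a target it never had to be told, biased by how `desired_success` trades off s against throughput.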
What about time?
• Timeliness: a result returned after time T is less (or not) useful
  – (1) soft deadlines: a user interacting with, or visualizing, output from the computation
  – (2) hard deadlines: need to get X results done before the HPDC/NSDI/… deadline
• Live experimentation on PlanetLab
• Real application: BLAST
Some PlanetLab data
(figures: computation and communication variability, both across and within nodes; temporal variability)
PlanetLab Environment
• RIDGE is our live system that implements reputation
• 120 wide-area nodes, fully correct, M-1 consensus
• Three timeliness environments based on deadlines: D = 120 s, D = 180 s, D = 240 s
Experimental Results: timeliness
Best BOINC (BOINC*) and conservative BOINC (BOINC-) vs. RIDGE
Makespan Comparison
Collusion
• What if errors are correlated?
• How could that happen?
  – Widespread bug (hardware or software)
  – Misconfiguration
  – Virus
  – Sybil attack
  – Malicious group
• Joint work with Emmanuel Jeannot (INRIA)
Key Ideas
• Executing a task yields answer groups A1, A2, …, Ak
  – each Ai has associated workers Wi1, Wi2, …, Win
  – and an estimated Pcollusion(workers in Ai)
• Learn the probability of correlated errors: Pcollusion(W1, W2)
• Estimate the probability of group correlated errors
  – Pcollusion(G), G = [W1, W2, W3, …], via f{Pcollusion(Wi, Wj) for all i, j}
• Rank answers and select one
  – using Pcollusion(G) and |G|
  – then update the matrix of Pcollusion(W1, W2) values
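A sketch of the ranking step. The slide leaves the aggregation function f unspecified; taking f = max over worker pairs, and the scoring rule below, are my own illustrative assumptions:

```python
from itertools import combinations

def group_collusion_prob(pair_prob, group):
    """Estimate the probability that a group's agreement stems from
    correlated (colluding) error, taking f = max over worker pairs.

    pair_prob: dict mapping frozenset({wi, wj}) -> learned Pcollusion.
    """
    if len(group) < 2:
        return 0.0
    return max(pair_prob.get(frozenset(p), 0.0)
               for p in combinations(group, 2))

def select_answer(answer_groups, pair_prob):
    """Rank candidate answers: prefer large groups with low estimated
    collusion probability (illustrative scoring rule, an assumption).

    answer_groups: dict mapping answer -> list of workers returning it.
    """
    def score(workers):
        return (1 - group_collusion_prob(pair_prob, workers)) * len(workers)
    return max(answer_groups, key=lambda ans: score(answer_groups[ans]))
```

Note that a lone honest worker can outrank a larger group whose members have a history of agreeing suspiciously often.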
Bootstrap Problem
• Building the collusion matrix: colluders must first be "baited"
  – over-replicate such that the majority group is still correct, exposing the colluders
• The needed group size k depends on the probability of worker collusion and the probability that the colluders fool the system
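The slide's symbols for these two probabilities were lost in extraction; using my own notation, if each worker colludes independently with probability $\varepsilon$, the colluders fool a majority-voting group of size $k$ only when they capture a majority of it:

$$P_{\text{fool}}(k) \;=\; \sum_{j=\lceil (k+1)/2 \rceil}^{k} \binom{k}{j}\, \varepsilon^{\,j} (1-\varepsilon)^{k-j}$$

so the bootstrap phase would choose the smallest $k$ with $P_{\text{fool}}(k) \le \delta$ for some tolerance $\delta$.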
Experimental scenarios (correctness figure):
• 4: one group of 30% colluders, always colluding
• 5: the same group, colluding 30% of the time
• 7: two groups (40% and 30% colluders)
Experimental results: throughput (figure)
Summary
• Reliable, scalable computing
  – correctness and timeliness
• Future work
  – combined models and metrics
  – workflows: coupling data and computation reliability

Visit ridge.cs.umn.edu to learn more