George Saad University of New Mexico Department of Computer Science Selfishness and Malice in Distributed Systems

Post on 13-Dec-2015

Page 1: George Saad University of New Mexico Department of Computer Science

George Saad
University of New Mexico

Department of Computer Science

Selfishness and Malice in Distributed Systems

Selfishness and Malice

• Selfishness and malice have a negative influence on the performance of distributed systems.
• Selfishness of players in a game can reduce social welfare.
• Malicious nodes can seriously disrupt the network.
• In this dissertation, we provide algorithms to address these issues.

Selfishness and Malice

• Selfishness (El-Farol game): we characterize the best correlated equilibrium (BCE) for a game of positive and negative network effects.
  “The Power of Mediation in an Extended El Farol Game”, SAGT’13 (2013)
• Malice: we develop algorithms to recover networks from Byzantine faults.
  “Self-Healing Communication”, SSS’13 (2013); “Self-Healing Computation”, SSS’14 (2014)

Part I : Selfishness

El-Farol Game

• A set of n selfish players
• Actions:
  • go to the bar
  • stay home
• The cost function, where x is the fraction of players that go:
  • cost to stay = 1,
  • cost to go: f(x)

Objective: find an equilibrium that minimizes the social cost, where the social cost is the total cost paid by all players.

Our El Farol Extension

We extend the cost function:
• The cost to stay can be any constant t > 0,
• The cost to go, f(x), is determined by the parameters c, s1 and s2.

Positive and Negative Network Effects

“Many real situations in fact display both kinds of [positive and negative] externalities … an on-line social media site with limited infrastructure might be most enjoyable if it has a reasonably large audience, but not so large that connecting to the Web site becomes very slow due to the congestion.”

[Easley and Kleinberg, 2010]

Solution Concepts

How to minimize my own cost unilaterally?
• Nash Equilibrium (NE)
  • Unfortunately, NE has high social cost.
• Correlated Equilibrium (CE)
  • Mediator implements CE.

Mediator

• A trusted coordinator that
  • gives recommendations to the players,
  • implements a correlated equilibrium.
• Note that all players have free will.
• A mediator is optimal when it implements the best correlated equilibrium.

Let the mediator have a probability distribution on k ≥ 1 strategy profiles.
• The players know the probability distribution and the strategy profiles.
• The mediator secretly selects one strategy profile according to the probability distribution.
• The mediator advises each player privately and separately.
• No player has an incentive to deviate unilaterally from the advice.

How to design such a mediator?

( [s11,…,s1n], p1 ), ( [s21,…,s2n], p2 ), …, ( [sk1,…,skn], pk )

( x1, p1 ), ( x2, p2 ), …, ( xk, pk )
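
The selection and advice steps above can be sketched in a few lines of Python. This is only an illustration: the function name `advise`, the four-player instance and the action labels `"go"`/`"stay"` are assumptions made here, and the sketch says nothing about how the profiles themselves are chosen.

```python
import random

def advise(profiles, rng):
    """Pick one strategy profile at random and return each player's private advice.

    profiles: list of (actions, probability) pairs, where actions[i] is the
    recommended action for player i. The list is public knowledge; only the
    sampled choice is secret.
    """
    actions_list = [actions for actions, _ in profiles]
    weights = [probability for _, probability in profiles]
    chosen = rng.choices(actions_list, weights=weights, k=1)[0]
    # Each player learns only its own entry of the chosen profile.
    return {player: action for player, action in enumerate(chosen)}

# Hypothetical 4-player instance with k = 2 publicly known profiles.
profiles = [
    (["stay", "stay", "stay", "stay"], 1 / 3),
    (["go", "go", "stay", "stay"], 2 / 3),
]
advice = advise(profiles, random.Random(0))
```

Whether the advice forms a correlated equilibrium depends on the profiles and probabilities; the sampler itself does not enforce the no-deviation condition.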

Example for (c, s1, s2)-El Farol Game

For a (2, 4, 4)-El Farol game:
• Best Nash Equilibrium:
  • ¼-fraction of players go.
  • Social cost = n.
• An optimal mediator:
  • Strategy profile 1: (x1 = 0, p1 = 1/3)
  • Strategy profile 2: (x2 = ½, p2 = 2/3)
  • Expected social cost = ⅔ n.
• The optimal social cost (no selfishness)
  • ½-fraction of players go.
  • Social cost = ½ n.
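
The numbers in this example can be checked with a short script. The transcript does not preserve the formula for f(x), so the piecewise-linear form used below (f(x) = c − s1·x up to x = c/s1, then s2·(x − c/s1)) is an assumption; it is chosen because it reproduces every figure on the slide. The function names are hypothetical, and costs are reported per player (multiply by n for the social cost).

```python
from fractions import Fraction as F

def go_cost(x, c, s1, s2):
    """ASSUMED cost of going when a fraction x of the players goes."""
    peak = F(c, s1)  # fraction at which going is cheapest
    return c - s1 * x if x <= peak else s2 * (x - peak)

def cost_per_player(x, c, s1, s2, stay=1):
    """Average cost when a fraction x goes and the rest stay."""
    return x * go_cost(x, c, s1, s2) + (1 - x) * stay

c, s1, s2 = 2, 4, 4
mediator = [(F(0), F(1, 3)), (F(1, 2), F(2, 3))]  # (x_i, p_i) from the slide

nash = cost_per_player(F(1, 4), c, s1, s2)
optimum = cost_per_player(F(1, 2), c, s1, s2)
mediated = sum(p * cost_per_player(x, c, s1, s2) for x, p in mediator)

# Incentive check for a player advised to stay. Large-n approximation:
# a single deviating player does not change the fraction x.
stay_weights = [p * (1 - x) for x, p in mediator]
deviate = sum(
    w * go_cost(x, c, s1, s2) for w, (x, _) in zip(stay_weights, mediator)
) / sum(stay_weights)
```

Under this assumed f, a player told to stay expects a cost of exactly 1 from going instead, which equals the cost of staying, so the advice is (weakly) self-enforcing; a player told to go pays 0 and has no reason to stay.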

How efficient is our mediator?

Our Contributions

• For a game of positive and negative network effects, we characterize:
  • Optimal Social Cost,
  • Best Nash Equilibrium (BNE), and
  • Best Correlated Equilibrium (BCE).
• Efficiency of the optimal mediator for this game:
  • When is BCE = BNE?
  • The Mediation Value (MV) and the Enforcement Value (EV) can be unbounded!
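
The slides do not define MV and EV. As a hedged sketch, the block below uses the definitions commonly given for cost-minimization games, reading MV as Mediation Value and EV as Enforcement Value and writing SC for social cost; these definitions are an assumption here, not taken from the transcript. The second line plugs in the (2, 4, 4) example, where both ratios happen to be small.

```latex
\[
  \mathrm{MV} = \frac{SC(\mathrm{BNE})}{SC(\mathrm{BCE})},
  \qquad
  \mathrm{EV} = \frac{SC(\mathrm{BCE})}{SC(\mathrm{OPT})}
\]
% (2,4,4)-El Farol example: SC(BNE) = n, SC(BCE) = (2/3) n, SC(OPT) = (1/2) n
\[
  \mathrm{MV} = \frac{n}{\tfrac{2}{3}\,n} = \frac{3}{2},
  \qquad
  \mathrm{EV} = \frac{\tfrac{2}{3}\,n}{\tfrac{1}{2}\,n} = \frac{4}{3}
\]
```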

Optimal Social Cost

We characterize x* as a function of parameters of our game.

Best Nash Equilibrium

Optimal Mediator

When is BCE = BNE?

• p is a function of c, s1 and s2.
• p can be 0 or 1 for some values of c, s1 and s2.
• If c ≤ 1, then all players would rather stay if f(1) ≥ 1, and all players would rather go if f(1) < 1.
• If c > 1 and λ(c, s1, s2) ≥ 1, then all players would rather go.
• BCE is advantageous over BNE when c > 1 and λ < 1.

Can MV be unbounded?

[Figure labels: c, s1, s2, c/s1, 1]

Can EV be unbounded?

[Figure labels: c, s1, s2, c/s1, 1]

Related Work

• Linear Congestion Games [CK’05]: 1.577 ≤ EV ≤ 1.6 and MV ≤ 1.015.
• Ranking Games [BFHS’07]: EV = n-1 and MV = n-1 for n>3.
• Virus Inoculation Game [DMNS’09]: EV = and MV = .

Conclusion

• We extended the El-Farol game to have both positive and negative network effects.

• For this extension, we have characterized:
  • the optimal social cost,
  • the BNE, and
  • the BCE.
• We characterized the MV and the EV for this game.
  • We show when BCE = BNE.
  • We show that MV and EV can be unbounded in this game.

Open Problems

• Multi-Site El-Farol Game (> 2 actions):
  • The bar has k > 2 sites.
  • Each player chooses which site to go to.
  • How many strategy profiles are required for BCE?
• If f(x) is polynomial in x, with degree > 1, then
  • what is the characterization of BCE?
  • Is the number of strategy profiles related to the degree of f(x)?

Part II : Malice

• Self-Healing Communication
• Self-Healing Computation

Malice

• We consider the presence of an adversary.

• Adversary takes over a subset of nodes to cause faults.

• Byzantine Faults vs Fail-Stop Faults

• Fault Tolerance:

• Replication

• Self-healing (automatic recovery)

Fault Tolerance

• Non-self-healing algorithms for Byzantine model: [NW’03, HK’04, FSY’05, AS’06, AJR’06, AS’07, JY’08, GKKY’10, GKKY’13].

• Self-healing algorithms for fail-stop model: [BSAS’06, ST’06, HRST’08, HST’09, PT’11, ST’11].

• Self-healing Algorithms for Byzantine faults?

• We develop self-healing algorithms to recover from Byzantine faults.

How to recover from Byzantine faults?

• Self-Healing Communication: Message is sent through a path of nodes.
• Self-Healing Computation: Computation is performed through circuits.

Our Model

• A network of n nodes
• Static and computationally bounded adversary
• Adversary controls up to ¼ of the nodes.
• Partially synchronous communication: there is an upper bound on the number of time steps between sending and receiving a message.
• Rushing adversary: bad nodes wait until they have received all messages from good nodes before responding.
• After the bad nodes are selected, a quorum graph is built [KLST’10]:
  • Any quorum is a set of θ(log n) nodes; and
  • Each node is in θ(log n) quorums.
  • At most ¼ of the nodes in any quorum are bad.

KLST’10: Valerie King, Steve Lonargan, Jared Saia and Amitabh Trehan, “Load balanced Scalable Byzantine Agreement through Quorum Building, with Full Information”, ICDCN 2010.

Naïve Communication (no self-healing)

• All-to-all communication between quorums
• Message cost O(l log² n), and latency O(l)
• However, we can do better by self-healing.
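
To see where the l log² n term comes from, here is a back-of-the-envelope count. It assumes that l is the number of quorum-to-quorum hops on the path and that every quorum has exactly ⌈log₂ n⌉ nodes (the slides only state θ(log n)); the function name and the sample values are hypothetical.

```python
import math

def naive_message_count(n, path_length):
    """Messages for all-to-all forwarding along a path of quorums.

    ASSUMPTION: each quorum has exactly ceil(log2 n) nodes, so one hop
    between two quorums costs quorum_size ** 2 messages.
    """
    quorum_size = math.ceil(math.log2(n))
    return path_length * quorum_size ** 2

n, path_length = 2 ** 15, 15  # hypothetical network size and path length
total = naive_message_count(n, path_length)
```

With these sample values every hop costs 15 × 15 messages, so the path costs 15 × 225 in total.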

Our Contribution

• We developed a self-healing algorithm that detects message corruptions and marks bad nodes.
• Each bad node causes O((log* n)²) corruptions, in expectation.

“Fool me once, shame on you. Fool me ω((log* n)²) times, shame on me.”

Iterated logarithm, e.g. log* 10^10 = 5

              | Naïve Communication | Our Algorithm
Message cost  | O(l log² n)         | O(l + log n)
Latency       | O(l)                | O(l)
Corruptions   | No corruptions      | O(t (log* n)²)
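
The iterated logarithm grows extremely slowly, which is why a (log* n)² bound is close to constant in practice. A minimal sketch, assuming base-2 logarithms (the slides do not state the base) and the common convention of counting applications until the value drops to 1 or below:

```python
import math

def log_star(x):
    """Iterated logarithm: number of times log2 is applied until the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

Under these assumptions `log_star(10**10)` evaluates to 5, matching the example on the slide.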

Our Algorithm (SEND)

[Diagram: SEND, with components SEND-PATH, CHECK (CHECK1, CHECK2) and HEAL]

HEAL is triggered O(t) times before all bad nodes are marked.
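
One plausible reading of how these components fit together is sketched below: forward the message, audit it only occasionally, and repair only when the audit finds a corruption. The control flow, the callable interfaces and the base-2 logarithm are assumptions made for illustration; only the trigger probability 1/(log log n)² is taken from the CHECK1 slide.

```python
import math
import random

def send(message, send_path, check, heal, n, rng):
    """One SEND call: forward the message, occasionally audit, repair on detection.

    send_path, check and heal are caller-supplied callables (hypothetical
    interfaces); check returns True when it detects a corruption.
    """
    delivered = send_path(message)
    # Trigger probability 1/(log log n)^2, as stated for CHECK1; base 2 is assumed.
    trigger_probability = 1.0 / math.log2(math.log2(n)) ** 2
    if rng.random() < trigger_probability and check(message, delivered):
        heal()
    return delivered
```

Because the audit runs only with small probability, its cost is amortized over many SEND calls, while each triggered HEAL marks nodes so that corruptions cannot continue indefinitely.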

CHECK1

• SEND triggers CHECK1 with probability 1/(log log n)².
• Subquorum size is O(log log n).
• Latency is O(l) and message cost is O(l (log log n)²).
• Detects corruptions with constant probability for l = O(log² n).

CHECK2

• SEND triggers CHECK2 with probability 1/(log* n)².
• CHECK2 has O(log* n) rounds.
• Incremental subquorum size, up to O(log* n).
• Latency is O(l log* n) and message cost is O(l (log* n)²).

CHECK2 Analysis

• Deception interval: a substring of bad nodes, where a corruption occurs.
• Key points of detecting corruptions:
  • The deception interval shrinks logarithmically from round to round, with probability ≥ ½.
  • O(log* n) rounds to shrink the deception interval to size zero.
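
A toy simulation makes the round count concrete. It is an idealization, not the analysis from the dissertation: "shrinks logarithmically" is modeled as replacing the size s by log₂ s, a size of at most 1 is treated as detection, and each round succeeds independently with the given probability. All names are hypothetical.

```python
import math
import random

def rounds_until_detected(size, rng, shrink_probability=0.5):
    """Count rounds until an interval of the given size has collapsed.

    ASSUMED model: in each round the interval shrinks from s to log2(s)
    with probability shrink_probability, otherwise it stays unchanged.
    """
    rounds = 0
    while size > 1:
        rounds += 1
        if rng.random() < shrink_probability:
            size = math.log2(size)
    return rounds

always = rounds_until_detected(65536, random.Random(0), shrink_probability=1.0)
typical = rounds_until_detected(65536, random.Random(0))
```

When every round succeeds, an interval of size 65536 collapses in 4 rounds (65536 → 16 → 4 → 2 → 1); with success probability ½ the count is at least that and about twice as large on average.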

HEAL

• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict.
  * A pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict has at least one bad node.
• If ½ of the nodes in any quorum are marked, they are set unmarked.
• HEAL is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b,g) is monotonically increasing,
  • Δf(b,g) is at least some positive constant.
  • When f(b,g) = t, we are done.
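
The marking and unmarking rules of HEAL can be sketched as follows. The data representation (a set of accuser/accused pairs, quorums as sets of node ids) and the reading of "½ nodes" as "at least half" are assumptions made for this illustration; how accusations are derived from the inspected messages is not shown.

```python
def heal(accusations, quorums, marked):
    """Mark nodes in conflict, then unmark quorums that are at least half marked.

    accusations: set of (accuser, accused) pairs (hypothetical representation).
    quorums: list of sets of node ids. marked: set of already-marked node ids.
    """
    marked = set(marked)
    for accuser, accused in accusations:
        # A pair is in conflict only when the accusation is mutual.
        if (accused, accuser) in accusations:
            marked.update((accuser, accused))
    for quorum in quorums:
        # Too many marks in one quorum: reset the marked nodes of that quorum.
        if 2 * len(quorum & marked) >= len(quorum):
            marked -= quorum
    return marked
```

Marking both nodes of a conflicting pair may mark a good node, but it always marks at least one bad node, which is what lets the number of HEAL calls be bounded in terms of t.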

Empirical Results

• Our simulation runs:
  • over butterfly networks of quorums,
  • for different network sizes, up to n=30k, and
  • for different fractions of bad nodes.
• Simulation terminates after all bad nodes are marked.
• The results are taken over 3000 experiments.

Empirical Results: # messages reduces by a factor of 60 (n~30k)

• # messages is improved by a factor of 60 for CHECK1 (39,100 vs. 649).
• # messages is improved by a factor of 33 for CHECK2 (39,100 vs. 1,177).

Empirical Results: latency increases by 1½ times (n~30k)

• Latency increases by 1½ times for CHECK1 (18 vs. 13).
• Latency increases by 2 times for CHECK2 (25 vs. 13).
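
The rounded factors quoted on these two slides can be recomputed from the plotted values. Which number belongs to which series is inferred from the ratios (39,100 and 13 taken as the naïve baseline), so treat the pairing as an assumption.

```python
# Values read off the plots; the pairing with each series is an assumption.
naive_messages, check1_messages, check2_messages = 39_100, 649, 1_177
naive_latency, check1_latency, check2_latency = 13, 18, 25

message_factors = (
    naive_messages / check1_messages,
    naive_messages / check2_messages,
)
latency_factors = (
    check1_latency / naive_latency,
    check2_latency / naive_latency,
)
rounded = [round(f, 2) for f in message_factors + latency_factors]
```

The message ratios come out at roughly 60 and 33, and the latency ratios at roughly 1.4 and 1.9, consistent with the rounded "1½ times" and "2 times" on the slide.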

Empirical Results: Corruption Probability 0

[Plots: CHECK1, CHECK2]

Empirical Results: # messages reduces by O(log² n) times

Empirical Results: latency increases by θ(1) times

How to recover from Byzantine faults?

• Self-Healing Communication: Message is sent through a path of nodes.
• Self-Healing Computation: Computation is performed through circuits.

Quorum Graph

• Quorum Graph has:
  • n input quorums;
  • m quorum gates; and
  • one output quorum

Naïve Computation

• No self-healing
• All nodes in each quorum (gate) perform the same computation
• Results are sent between quorums via all-to-all communication
• Expensive resource cost

Our Contribution

We develop a self-healing algorithm for computation networks.

                  | Naïve Computation | Our Algorithm
Message cost      | O((n+m) log² n)   | O(m + n log n)
Computation cost  | O((n+m) log² n)   | O(m + n log n)
Latency           | O(l)              | O(l)
Corruptions       | No corruptions    | O(t (log* n)²)

Our Algorithm (COMPUTE)

[Diagram: COMPUTE, with components CHECK, EVALUATE and RECOVER]

CHECK Algorithm

• CHECK has O(log* n) rounds.
• In each round, nodes are selected uniformly at random, and the same computation is performed.

[Figure: Round 1, Round 2]

• Adversary corrupts computation in a deception subgraph.
• Key points of corruption detection:
  • We prove that the deception subgraph shrinks logarithmically in each round with constant probability.
  • Once the deception subgraph shrinks to size zero, the corruption is detected.

Shrinks Logarithmically

[Figure sequence: Rounds 1 to 4]

RECOVER

• Inspects what each participating node received and sent.
• Marks the nodes that are in conflict.
  * A pair of nodes is in conflict if they accuse each other.
• Each pair of nodes in conflict has at least one bad node.
• If ½ of the nodes in any quorum are marked, they are set unmarked.
• RECOVER is triggered O(t) times before all bad nodes are marked.
• We show this using a potential function argument:
  • f(b,g) is monotonically increasing, and
  • when it reaches t, we are done.

Empirical Results

• Our simulation runs:
  • over perfect binary trees of quorums,
  • for different network sizes, up to 8k, and
  • for different fractions of bad nodes.
• Simulation terminates after all leaders are good.
• The results are taken over 3000 experiments.

Empirical Results: # messages reduces by a factor of 65 (n~8k)

• Reduced by a factor of 65 (66M vs. 1.01M).

Empirical Results: latency increases by 1.75 times (n~8k)

• Increases 1.75 times (63 time steps vs. 36).

Empirical Results: Corruption Probability 0

Empirical Results: # messages reduces by O(log² n) times!

Empirical Results: latency increases by θ(1) times

Conclusion

• We developed self-healing algorithms to recover networks from Byzantine faults.

• Message cost is reduced polylogarithmically in n, compared to non-self-healing algorithms.

• Experiments show that message cost is reduced by
  • up to a factor of 60 for communication networks,
  • up to a factor of 65 for computation networks.

• For t < n/4, the expected total number of corruptions is O(t (log* n)²).

Open Problems

• Can we limit the number of corruptions to O(t)?
• How to self-heal networks with churn? With an adaptive adversary?
• How to self-heal asynchronous networks?
• We trigger CHECK and select the nodes in a centralized manner. How can we make CHECK decentralized?
  • We propose a decentralized CHECK for future work.
  • We implement a simulation that suggests interesting results.

Thanks! Any Questions?