Brice Lecture


TRANSCRIPT

  • Slide 1/41

    Communication Cost of Distributed Computing

    Abbas El Gamal

    Stanford University

    Brice Colloquium, 2009

  • Slide 2/41

    Motivation

    Performance of distributed information processing systems is often limited by communication:
    Digital VLSI
    Multi-processors
    Data centers
    Peer-to-peer networks
    Sensor networks
    Networked mobile agents

    Purpose of communication is to make a decision, compute a function, or coordinate an action based on distributed data

    How much communication is needed to perform such a task?

  • Slide 3/41

    Wireless Video Camera Network

    Today's surveillance systems: analog; costly; human operated
    Future systems: digital; networked; self-configuring; automated detection, e.g., suspicious activity, localization, tracking, . . .

    Sending all video data requires large communication bandwidth and energy

  • Slide 4/41

    Distributed Computing Approach (YSEG 04, EEG 07)

    [Figure: cameras extract scan-lines, subtract the background, perform local processing, and send the result to a clusterhead]

  • Slide 5/41

    Experimental Setup

    [Figures: view of the setup; view from a camera; top view of room; scan-lines from 16 cameras]

  • Slide 6/41

    Problem Setup

    Network with m nodes. Node j has data x_j
    Node j wishes to estimate g_j(x_1, x_2, . . . , x_m), j = 1, 2, . . . , m
    Nodes communicate and perform local computing

    [Figure: network connecting nodes with data x_1, x_2, x_3, . . . , x_j, . . . , x_m]

    What is the minimum amount of communication needed?

  • Slide 7/41

    Communication Cost of Distributed Computing

    Problem formulated and studied in several fields:
    Computer science: communication complexity; gossip
    Information theory: coding for computing; CEO problem
    Control: distributed consensus

    Formulations differ in:
    Data model
    Type of communication-local computing protocol
    Estimation criterion
    Metric for communication cost

  • Slide 8/41

    Outline

    1 Communication Complexity

    2 Information Theory: Lossless Computing

    3 Information Theory: Lossy Computing

    4 Distributed Consensus

    5 Distributed Lossy Averaging

    6 Conclusion

  • Slide 9/41


    Communication Complexity (Yao 79)

    Two nodes, two-way communication
    Node j = 1, 2 has binary k-vector x_j
    Both nodes wish to compute g(x_1, x_2)
    Use round-robin communication and local computing protocol


    What is the minimum number of bits of communication needed (the communication complexity, C(g))?

  • Slide 10/41

    Communication Complexity

    Example (Equality Function)
    g(x_1, x_2) = 1 if x_1 = x_2, and 0 otherwise

    Upper bound: C(g) ≤ k + 1 bits
    For k = 2: C(g) ≤ 3 bits

    Function table for k = 2 (rows x_1, columns x_2):

           00  01  10  11
      11    0   0   0   1
      10    0   0   1   0
      01    0   1   0   0
      00    1   0   0   0

    Lower bound: Every communication protocol partitions {0, 1}^k × {0, 1}^k
    into g-monochromatic rectangles, each having a distinct code
    If r(g) is the minimum number of such rectangles, then C(g) ≥ log r(g)
    For k = 2, r(g) = 8, so C(g) ≥ 3 bits
    In general, C(g) ≥ k + 1, so C(g) = k + 1 bits
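
    A minimal sketch of the protocol behind the upper bound (function and variable names are illustrative, not from the talk): node 1 sends its whole k-bit vector, and node 2 compares and replies with the single result bit, for k + 1 bits in total.

```python
# Sketch of the (k + 1)-bit upper-bound protocol for the equality function.
def equality_protocol(x1: str, x2: str) -> tuple[int, int]:
    """x1, x2 are binary strings of the same length k; returns (g, bits exchanged)."""
    assert len(x1) == len(x2)
    bits = len(x1)          # node 1 -> node 2: the k bits of x1
    g = int(x1 == x2)       # node 2 computes g(x1, x2) locally
    bits += 1               # node 2 -> node 1: the single result bit
    return g, bits

print(equality_protocol("0110", "0110"))   # (1, 5): k + 1 = 5 bits
print(equality_protocol("0110", "0111"))   # (0, 5)
```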

  • Slide 11/41

    Communication Complexity

    Number-on-Forehead Game

    m players. Player j has binary k-vector x_j on her forehead

    Player j can see all numbers except her own, x_j
    Every player wants to know whether x_1 = x_2 = . . . = x_m or not
    Players take turns broadcasting to all other players

    Recall that for m = 2, C(g) = k + 1

    What is C(g) for m ≥ 3?

    Consider the following protocol:
    Player 1 broadcasts A = 1 if x_2 = x_3 = . . . = x_m, and A = 0 otherwise
    If A = 0, then all players know that the values are not all equal
    If A = 1, only player 1 does not know whether all values are equal or not;
    Player 2 then broadcasts B = 1 if x_1 = x_3, and B = 0 otherwise

    C(g) = 2 bits suffice, independent of m and k!

    Overlap in information can significantly reduce communication
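
    The two-bit protocol is easy to simulate; the sketch below (interface and names are my own) works for any m ≥ 3 and any k.

```python
# Two-bit protocol for the number-on-forehead equality problem (m >= 3 players).
def nof_equality(x: list[str]) -> tuple[bool, int]:
    """x[j] is the k-bit vector on player j's forehead (player j cannot see it).
    Returns (all_equal, bits broadcast)."""
    # Player 1 checks the vectors she can see: x_2, x_3, ..., x_m.
    A = int(all(v == x[1] for v in x[1:]))
    if A == 0:
        return False, 1            # everyone learns the values are not all equal
    # A = 1: only player 1 is still unsure; player 2 compares x_1 with x_3.
    B = int(x[0] == x[2])
    return bool(B), 2

print(nof_equality(["101", "101", "101", "101"]))  # (True, 2)
print(nof_equality(["101", "100", "101", "101"]))  # (False, 1)
```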

  • Slide 12/41

    More on Communication Complexity

    Other definitions of communication complexity:
    Average: uniform distribution over node values
    Randomized: nodes use coin flips in communication
    Non-deterministic: non-zero probability of computing error

    Some references:

    A. Yao, Some complexity questions related to distributive computing, STOC, 1979

    A. Orlitsky, A. El Gamal, Average and randomized communication complexity, IEEE Trans. on Info. Th., 1990

    R. Gallager, Finding parity in a simple broadcast network, IEEE Trans. on Info. Th., 1988

    E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, 2006

  • Slide 13/41

    Information Theory: Lossless Computing

    m-node network. Node j observes source X_j
    X_j generates the i.i.d. process {X_j1, X_j2, . . .}, with X_ji ∈ X (a finite set)
    Sources may be correlated: (X_1, X_2, . . . , X_m) ~ p(x_1, x_2, . . . , x_m)
    Node j wishes to find an estimate ĝ_ji of g_j(X_1i, X_2i, . . . , X_mi) for each sample i = 1, 2, . . .

    Block coding: nodes communicate and perform local computing over blocks of n samples

    Lossless criterion

    ĝ_j^n = g_j^n for all j ∈ {1, 2, . . . , m} with high probability (whp)

    What is the minimum sum rate in bits/sample-tuple?
    Problem is in general open, even for 3 nodes

  • Slide 14/41

    Lossless Compression

    Two nodes, one-way communication

    Sender node observes i.i.d. source X ~ p(x)
    Receiver node wishes to find a lossless estimate X̂^n of X^n


    Theorem (Shannon 1948)
    The minimum lossless compression rate is the entropy of X:

    H(X) := E(−log p(X)) bits/sample

    Example (Binary Source)
    X ~ Bern(p): H(p) := −p log p − (1 − p) log(1 − p) ∈ [0, 1]
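
    The binary entropy function reappears on several later slides; a small helper (assumed names, base-2 logarithms) makes those numbers easy to reproduce.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))    # 1.0 bit/sample
print(binary_entropy(0.01))   # ~0.0808 bits/sample (used on a later slide)
```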

  • Slide 15/41

    Information Theory: Lossless Computing

  • Slide 16/41

    Lossless Compression with Side Information

    Two nodes, one-way communication

    Sender node has i.i.d. source X, receiver node has i.i.d. source Y, where (X, Y) ~ p(x, y)
    Receiver node wishes to estimate X^n losslessly


    H(X) is sufficient; can we do better?
    If the sender also knows Y^n, then the minimum rate is the conditional entropy

    H(X|Y) := E_{X,Y}(−log p(X|Y)) ≤ H(X)

    Theorem (Slepian-Wolf 1973)
    The minimum rate with side information at the receiver only is H(X|Y)

  • Slide 17/41

    Proof uses random binning

    Example (Doubly Symmetric Binary Sources)
    Let Y ~ Bern(1/2) and Z ~ Bern(p) be independent, and X = Y ⊕ Z
    Receiver has Y, wishes to losslessly estimate X

    Minimum rate: H(X|Y) = H(Y ⊕ Z|Y) = H(Z) = H(p)
    For example, if Z ~ Bern(0.01), H(p) ≈ 0.08
    Versus H(X) = H(1/2) = 1 if correlation is ignored

    Remarks
    The Shannon theorem holds for error-free compression (e.g., Lempel-Ziv)
    Slepian-Wolf does not hold in general for error-free compression

  • Slide 18/41

    Lossless Computing with Side Information

    Receiver node wishes to compute g(X, Y)
    Let R_g be the minimum rate


    Upper and lower bounds

    R_g ≤ H(X|Y)
    R_g ≥ H(g(X, Y)|Y)

  • Slide 19/41

    Bounds sometimes coincide

    Example (Mod-2 Sum)

    Y ~ Bern(1/2), Z ~ Bern(p) independent, X = Y ⊕ Z
    Function: g(X, Y) = X ⊕ Y = Z
    Lower bound: H(X ⊕ Y|Y) = H(Z|Y) = H(p)
    Upper bound: H(X|Y) = H(p)

    R_g = H(p)

    Bounds do not coincide in general

    Example
    X = (V_1, V_2, . . . , V_10), V_j i.i.d. Bern(1/2), Y ~ Unif{1, 2, . . . , 10}
    Function: g(X, Y) = V_Y
    Lower bound: H(V_Y|Y) = 1 bit
    Upper bound: H(X|Y) = 10 bits
    Can show that R_g = 10 bits
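
    To make the gap concrete, here is an exact computation of both bounds for the second example (the enumeration and the names are mine, not from the talk):

```python
from itertools import product
import math

k = 10
xs = list(product([0, 1], repeat=k))    # all values of X = (V_1, ..., V_10), uniform
ys = range(k)                           # Y uniform on {1, ..., 10}, independent of X

# Upper bound H(X | Y): X is independent of Y, so H(X | Y) = H(X) = k bits.
H_X_given_Y = math.log2(len(xs))

# Lower bound H(g(X, Y) | Y) with g(X, Y) = V_Y: for each y, V_y is Bern(1/2).
H_g_given_Y = 0.0
for y in ys:
    p1 = sum(x[y] for x in xs) / len(xs)     # P(V_y = 1) = 1/2
    H_g_given_Y += (1 / k) * (-p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1))

print(H_g_given_Y, "<= R_g <=", H_X_given_Y)   # 1.0 <= R_g <= 10.0
```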

  • Slide 20/41

    Theorem (Orlitsky, Roche 2001)

    R_g = H_G(X|Y), where H_G is the conditional entropy of the characteristic graph of X, Y, and g

    Generalizations:
    Two-way, two rounds:

    A. Orlitsky, J. Roche, Coding for computing, IEEE Trans. on Info. Th., 2001

    Infinite number of rounds:
    N. Ma, P. Ishwar, Two-terminal distributed source coding with alternating messages for function computation, ISIT, 2008

  • Slide 21/41

    Lossless Computing: m = 3

    2-sender, 1-receiver network, one-way communication
    Receiver wishes to estimate a function g(X, Y) losslessly
    What is the minimum sum rate in bits/sample-pair?


    Problem is in general open

  • Slide 22/41

    Distributed Lossless Compression

    Let g(X, Y) = (X, Y)

    Theorem (Slepian-Wolf)
    The minimum achievable sum rate is the joint entropy

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

    Same as if X, Y were jointly encoded!
    Achievability uses random binning
    Theorem can be generalized to m nodes

  • Slide 23/41

    Distributed Mod-2 Sum Computing

    Let Y ~ Bern(1/2), Z ~ Bern(p) be independent, and X = Y ⊕ Z

    Receiver wishes to compute g(X, Y) = X ⊕ Y = Z
    Slepian-Wolf gives a sum rate of H(X, Y) = 1 + H(p)
    Can we do better?

    Theorem (Körner, Marton 1979)
    Minimum sum rate is 2H(p)

    Achievability: each sender applies the same random binary matrix A with ≈ nH(p) rows to its block; the receiver adds the two messages to obtain AZ^n, which uniquely determines Z^n whp
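
    A toy illustration of the linear-binning idea behind the Körner-Marton scheme (the block length, the extra rows, and the brute-force decoder below are my illustrative assumptions; the decoder only scales to small n): both senders apply the same random binary matrix A, and the receiver adds the two syndromes to obtain A Z^n, from which the sparse Z^n is recovered.

```python
import itertools
import math
import numpy as np

def h(p):  # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rng = np.random.default_rng(0)
n, p = 16, 0.1
r = int(math.ceil(n * h(p))) + 4      # ~ nH(p) rows, plus a margin for this tiny n

A = rng.integers(0, 2, size=(r, n))   # common random binary matrix (linear bins)
y = rng.integers(0, 2, size=n)        # Y^n ~ Bern(1/2)
z = (rng.random(n) < p).astype(int)   # Z^n ~ Bern(p), independent of Y^n
x = (y + z) % 2                       # X^n = Y^n xor Z^n

# Each sender transmits its r-bit syndrome; the receiver adds them modulo 2.
s = (A @ x + A @ y) % 2               # equals A Z^n (mod 2)

# Toy decoder: lowest-weight vector consistent with the syndrome (whp equal to Z^n).
z_hat = None
for w in range(n + 1):
    for supp in itertools.combinations(range(n), w):
        cand = np.zeros(n, dtype=int)
        cand[list(supp)] = 1
        if np.array_equal((A @ cand) % 2, s):
            z_hat = cand
            break
    if z_hat is not None:
        break

print("recovered Z^n:", np.array_equal(z_hat, z))
print("rate per node:", r / n, "bits/sample; H(p) =", round(h(p), 3))
```

    At such a short block length the extra rows dominate; the per-node rate only approaches H(p) as n grows.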

  • Slide 24/41

    Information Theory: Lossy Computing

    Node j observes i.i.d. source X_j
    Every node wishes to estimate g(X_1i, X_2i, . . . , X_mi), i = 1, 2, . . .
    Nodes communicate and perform local computing
    As in the lossless case, we use block codes

    MSE distortion criterion

    (1/mn) Σ_{j=1}^m Σ_{i=1}^n E[(ĝ_ji − g_i)²] ≤ D

    What is the minimum sum rate R(D) in bits/sample-tuple?
    Problem is in general open (even for 2 nodes)

  • Slide 25/41

    Lossy Compression

    Two nodes, one-way communication

    Sender node observes i.i.d. Gaussian source X ~ N(0, P)
    Receiver node wishes to estimate X to a prescribed MSE distortion D


    What is the minimum required rate R(D)?

    Theorem (Shannon 1949)
    Let d := D/P be the normalized distortion. The rate distortion function is

    R(D) = (1/2) log(1/d)  for d ≤ 1
    R(D) = 0               for d > 1
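
    A direct transcription of this rate distortion function (parameter names are mine):

```python
import math

def gaussian_rd(D: float, P: float) -> float:
    """R(D) = (1/2) log2(1/d) bits/sample for d = D/P <= 1, and 0 otherwise."""
    d = D / P
    return 0.5 * math.log2(1 / d) if d <= 1 else 0.0

print(gaussian_rd(D=0.25, P=1.0))  # 1.0 bit/sample
print(gaussian_rd(D=2.0, P=1.0))   # 0.0: the target distortion needs no communication
```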

  • Slide 26/41

    Lossy Compression with Side Information

    Two nodes, one-way communication
    (X, Y) jointly Gaussian; zero mean; average power P; correlation coefficient ρ
    Receiver wishes to estimate X to MSE distortion D


    Theorem (Wyner-Ziv 1976)

    R(D) = (1/2) log((1 − ρ²)/d)  for d < 1 − ρ²
    R(D) = 0                      otherwise

    where d = D/P

  • Slide 27/41

    Lossy Averaging with Side Information

    Receiver node now wishes to estimate (X + Y)/2 to MSE distortion D

    Proposition

    R(D) = (1/2) log((1 − ρ²)/d)  for d < 1 − ρ²
    R(D) = 0                      otherwise

    where d = 4D/P

    Same rate as sending X/2 with MSE distortion D
    For two-way communication, both nodes wish to estimate the average with MSE distortion D

    Proposition (Su, El Gamal 2009)

    R(D) = log((1 − ρ²)/d)  for d < 1 − ρ²
    R(D) = 0                otherwise

    Same as two independent one-way rounds
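
    The rate expressions on this slide and the previous one transcribe directly (parameter names, including the correlation coefficient rho, are my labels):

```python
import math

def wyner_ziv_rate(D, P, rho):
    """Slide 26: R(D) = (1/2) log2((1 - rho^2)/d) for d = D/P < 1 - rho^2, else 0."""
    d = D / P
    return 0.5 * math.log2((1 - rho**2) / d) if d < 1 - rho**2 else 0.0

def one_way_average_rate(D, P, rho):
    """Slide 27, one-way: same expression but with d = 4D/P."""
    d = 4 * D / P
    return 0.5 * math.log2((1 - rho**2) / d) if d < 1 - rho**2 else 0.0

def two_way_average_rate(D, P, rho):
    """Slide 27, two-way (Su-El Gamal 2009): twice the one-way rate."""
    return 2 * one_way_average_rate(D, P, rho)

print(wyner_ziv_rate(0.05, 1.0, 0.9))        # compression with side information
print(two_way_average_rate(0.01, 1.0, 0.9))  # both nodes estimate (X + Y)/2
```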

  • Slide 28/41

    Distributed Consensus

  • Slide 29/41

    Distributed Consensus: Motivation

    Distributed coordination and information aggregation in distributed systems

    Flocking and schooling behaviors in nature

    Spread of rumors, epidemics

    Coordination of mobile autonomous vehicles

    Load balancing in parallel computers

    Finding file sizes, free memory size in peer-to-peer networks

    Information aggregation in sensor networks

    In many of these scenarios, communication is asynchronous and subject to link and node failures and topology changes

  • Slide 30/41

    Distributed Averaging

    m-node network. Node j observes a real-valued scalar x_j
    Every node wishes to estimate the average (1/m) Σ_{j=1}^m x_j
    Communication is performed in rounds, for example:

    2 nodes are selected in each round
    They exchange information and perform local computing

    Distributed protocol: communication and local computing do not depend on node identity

    Synchronous (deterministic): node-pair selection is predetermined
    Asynchronous (gossip): node pairs are selected at random

    How many rounds are needed to estimate the average (time to consensus)?
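
    A small simulation of asynchronous pairwise averaging under the standard gossip update, where the selected pair replaces both values with their midpoint (the stopping rule and names are my illustrative choices; the node values are the ones from the centralized example two slides later):

```python
import random

def gossip_average(x, max_rounds=10**5, tol=1e-6):
    """Randomized gossip: each round a random node pair averages its two values.
    Returns the number of rounds until every node is within tol of the average."""
    x = list(x)
    target = sum(x) / len(x)                   # the true average is preserved by every update
    for t in range(1, max_rounds + 1):
        j, k = random.sample(range(len(x)), 2) # asynchronous node-pair selection
        x[j] = x[k] = (x[j] + x[k]) / 2        # exchange information + local computing
        if max(abs(v - target) for v in x) < tol:
            return t
    return max_rounds

random.seed(1)
print(gossip_average([7.1, 4.3, 8.2, 5.5, 2.9]))  # rounds to (approximate) consensus
```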

  • Slide 31/41

    Recent Related Work

    D. Kempe, A. Dobra, J. Gehrke, Gossip-based computation of aggregate information, FOCS, 2003

    L. Xiao and S. Boyd, Fast linear iterations for distributed averaging, CDC, 2003

    S. Boyd, A. Ghosh, B. Prabhakar, D. Shah, Gossip algorithms: design, analysis and applications, INFOCOM, 2005 (gossip; convergence time as a function of graph connectivity)

  • Slide 32/41

    Example: Centralized Protocol

    Node 1 acts as cluster-head
    Rounds 1-4: cluster-head receives values from the other nodes
    The cluster-head computes the average
    Rounds 5-8: cluster-head sends the average to the other nodes
    The number of rounds for an m-node network: 2m − 2
    Computation is perfect

    [Figure: five-node example with x_1 = 7.1, x_2 = 4.3, x_3 = 8.2, x_4 = 5.5, x_5 = 2.9; the values are gathered at node 1, which computes the average 5.6 and sends it back to every node]
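
    The round count and the figure's numbers check out in a few lines (a trivial sketch, not from the talk):

```python
values = [7.1, 4.3, 8.2, 5.5, 2.9]   # node values from the figure
m = len(values)
rounds = (m - 1) + (m - 1)           # gather at the cluster-head, then distribute
average = sum(values) / m
print(rounds, average)               # 8 rounds, average 5.6 (up to rounding)
```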

  • Slide 33/41

    Distributed Consensus

  • Slide 34/41

    Issues

    Communicating/computing with infinite precision is not realistic

    Number of rounds is not a good measure of communication cost

    Quantized consensus:
    L. Xiao, S. Boyd, S.J. Kim, Distributed average consensus with least-mean-square deviation, J. Parallel Distrib. Comput., 2007

    M. Yildiz, A. Scaglione, Coding with side information for rate-constrained consensus, IEEE Trans. on Sig. Proc., 2008

    O. Ayaso, D. Shah, M. Dahleh, Information theoretic bounds on distributed computation, preprint, 2008

    We recently studied a lossy computing formulation:
    H. Su, A. El Gamal, Distributed lossy averaging, ISIT, 2009

  • Slide 35/41

    Distributed Lossy Averaging (Su, El Gamal 2009)

    Network with m nodes

    Node j observes Gaussian i.i.d. source X_j ~ N(0, 1)
    Assume sources are independent
    Each node wishes to estimate the average g^n := (1/m) Σ_{j=1}^m X_j^n

    Rounds of two-way, node-pair communication; block codes used

    Network rate distortion function
    R(D) is the minimum sum rate such that

    (1/mn) Σ_{j=1}^m Σ_{i=1}^n E[(ĝ_ji − g_i)²] ≤ D

    R(D) is known only for m = 2

  • Slide 36/41

    Cutset Lower Bound on R(D)

    [Figure: cutset separating node j from a super-node containing the remaining nodes x_1, . . . , x_m; super-node power P = (m − 1)/m²]

    Theorem (Cutset Lower Bound (Su, El Gamal 2009))

    R(D) ≥ (m/2) log((m − 1)/(m²D))  for D < (m − 1)/m²

    Bound is tight for m = 2
    Can achieve within a factor of 2 using a centralized protocol for large m
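
    A direct transcription of the cutset bound (names are mine); for m = 2 it matches the two-way result on slide 27 with rho = 0 and P = 1:

```python
import math

def cutset_lower_bound(m: int, D: float) -> float:
    """R(D) >= (m/2) log2((m - 1)/(m^2 D)) for D < (m - 1)/m^2, and 0 otherwise."""
    if D >= (m - 1) / m**2:
        return 0.0
    return 0.5 * m * math.log2((m - 1) / (m**2 * D))

print(cutset_lower_bound(m=2, D=0.05))   # equals log2(1/(4D)), the two-way rate of slide 27
print(cutset_lower_bound(m=50, D=0.01))  # lower bound on the sum rate for a larger network
```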

  • Slide 37/41

    Gossip Protocol

    Two nodes are selected at random in each round

    Use a weighted-average-and-compress scheme
    Before communication, node j sets g_j^n(0) = X_j^n

    At round t + 1, suppose nodes j, k are selected:
    Nodes exchange lossy estimates of their states g_j^n(t), g_k^n(t) at normalized distortion d
    Nodes update their estimates:

    g_j^n(t + 1) = (1/2) g_j^n(t) + (1/(2(1 − d))) ĝ_k^n(t)
    g_k^n(t + 1) = (1/2) g_k^n(t) + (1/(2(1 − d))) ĝ_j^n(t)

    where ĝ_j^n(t), ĝ_k^n(t) denote the exchanged lossy estimates

    Final estimate for node j is g_j^n(T)
    Rate at each round is 2 · (1/2) log(1/d) = log(1/d)
    Update equations reduce to standard gossip when d = 0 (rate → ∞)
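
    A sketch of the weighted-average-and-compress round. Modeling the lossy exchange with a Gaussian forward test channel, and every name below, are my assumptions; only the update coefficients and the per-round rate follow the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def lossy_estimate(g, d):
    """Model a lossy description of state g at normalized distortion d with a
    Gaussian forward test channel: g_hat = (1 - d) g + N(0, d (1 - d) var(g))."""
    var = g.var() + 1e-12
    return (1 - d) * g + rng.normal(0.0, np.sqrt(d * (1 - d) * var), size=g.shape)

def gossip_round(states, j, k, d):
    """One round between nodes j and k: each sets
    g <- (1/2) (own state) + (1/(2 (1 - d))) (received lossy estimate).
    The rate spent this round is 2 * (1/2) * log2(1/d) bits/sample."""
    gj_hat, gk_hat = lossy_estimate(states[j], d), lossy_estimate(states[k], d)
    states[j] = 0.5 * states[j] + gk_hat / (2 * (1 - d))
    states[k] = 0.5 * states[k] + gj_hat / (2 * (1 - d))
    return np.log2(1 / d)

m, n, d = 10, 1000, 0.1
x = [rng.standard_normal(n) for _ in range(m)]   # X_j^n, i.i.d. N(0, 1)
avg = np.mean(x, axis=0)                         # target g^n = (1/m) sum_j X_j^n
states = [xj.copy() for xj in x]                 # g_j^n(0) = X_j^n

total_rate = 0.0
for _ in range(500):
    j, k = rng.choice(m, size=2, replace=False)
    total_rate += gossip_round(states, j, k, d)

mse = float(np.mean([(g - avg) ** 2 for g in states]))   # empirical network distortion
print(round(total_rate, 1), "bits/sample spent; empirical MSE:", round(mse, 4))
```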

  • Slide 38/41

    Gossip Protocol

    Let E(R(D)) be the expected sum rate over the node-pair selection
    R(D) ≤ E(R(D))

    Can show that E(R(D)) is larger than the cutset bound by roughly a factor of log m
    Cutset bound is achievable within a factor of 2 using a centralized protocol

    Price of gossiping is roughly log m

    Protocol does not exploit build-up in correlation to reduce rate

  • Slide 39/41

    Effect of Using Correlation (m = 50)

    [Plot: expected sum rate E(R(D)) (0 to 80) versus D (0 to 0.02), comparing the protocol without and with exploiting correlation]

  • Slide 40/41

    Summary

    Communication cost of distributed computing studied in several fields

                     Source Model   Coding/Computing   Estimation Criterion   Communication Cost
    Comm. Comp.      discrete       scalar             zero error             bits
    Info. theory     random         block              lossless, lossy        bits/sample-tuple
    Dist. Consensus  continuous     scalar             MSE                    rounds

    Many cool results, but much remains to be done

  • Slide 41/41

    Thank You
