
Post on 11-Jan-2016


Page 1: Network Support for Cloud Services Lixin Gao, UMass Amherst

Network Support for Cloud Services

Lixin Gao, UMass Amherst

Page 2

Outline

• Data center networking
  – Design issues
  – Resource sharing
• Asynchronous computation model

Page 3

Conventional Data Center Networks

• Hierarchical tree structure
• High-speed core switches are expensive
• Hard to scale

Page 4

Data Center Network Design

• Commodity hardware
  – Server
  – Switch
• Scalable
• Fat-tree, DCell, BCube, VL2, ...

Page 5

DPillar Structure

• Devices
  – All servers have dual ports
  – All switches have n ports
• Server and switch columns
  – k columns
• Server naming
  – (col, label), label $= v_{k-1} \dots v_1 v_0$
• Connecting rule
  – Servers in $H_i$ and $H_{i+1}$, $i \in [0, \log_2 n - 1]$: their labels differ only at $v_i$
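The connecting rule can be sketched as a small predicate. This is only an illustration under stated assumptions: a label is modeled as a tuple (v_{k-1}, ..., v_0), the last column is assumed to wrap around to column 0, and "differ only at v_i" is read as "agree on every symbol other than v_i". The function name `share_switch` is hypothetical.

```python
def share_switch(a, b, k):
    """Return True if servers a and b would attach to a common switch.

    a and b are (col, label) pairs; label is a tuple (v_{k-1}, ..., v_0).
    Assumption: column k-1 is adjacent to column 0 (wrap-around).
    """
    (col_a, label_a), (col_b, label_b) = a, b
    if len(label_a) != k or len(label_b) != k:
        raise ValueError("labels must have exactly k symbols")

    # Find the lower column index i of an adjacent pair (H_i, H_{i+1}).
    if (col_a + 1) % k == col_b:
        i = col_a
    elif (col_b + 1) % k == col_a:
        i = col_b
    else:
        return False  # same column or non-adjacent columns

    pos = k - 1 - i  # position of v_i inside the tuple
    # All symbols other than v_i must match.
    return all(x == y
               for j, (x, y) in enumerate(zip(label_a, label_b))
               if j != pos)


if __name__ == "__main__":
    k = 3
    print(share_switch((0, (1, 0, 2)), (1, (1, 0, 3)), k))  # differ only at v_0
    print(share_switch((0, (1, 0, 2)), (1, (1, 1, 2)), k))  # differ at v_1
```

Treating the rule as a predicate on pairs keeps the sketch independent of how many symbols a label alphabet contains, which the slide does not state unambiguously.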

Page 6

Design Issues

• Inexpensive
• Scale to a large number of servers
• Fault-tolerant routing
• Load balancing

Page 7

Network Resource Sharing within Data Center

• Virtualization of CPU (Xen), memory (DiffEng), storage (SAN)
• Network resources can become a bottleneck
  – Sorting and shuffling of MapReduce
  – Sync among tasks slows down computation
  – Backup of VMs
• Bandwidth sharing
  – Granularity: point-to-point or group based
  – Fair share: centralized vs. distributed
  – Privacy: public cloud vs. private cloud

Page 8

MapReduce Model

• Map: generate key-value pairs
• Reduce: aggregate values for a key from multiple sources
• Shuffle and sort
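The three phases can be illustrated with a single-process word count. This is a minimal sketch, not a distributed implementation; the helper names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Map: turn every input record into zero or more (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Reduce: aggregate all values that arrived for each key."""
    return {key: reducer(key, values) for key, values in groups}


def word_count(lines):
    pairs = map_phase(lines, lambda line: ((w, 1) for w in line.split()))
    return reduce_phase(shuffle_and_sort(pairs), lambda _key, vals: sum(vals))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In a real cluster the shuffle step moves data between machines, which is why it appears among the network bottlenecks listed on the previous slide.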

Page 9

Iterative Computations

PageRank

Clustering

BFS

YouTube video suggestion

Pattern Recognition

Page 10

Synchronous Model

• Ease of MapReduce implementation
• However,
  – Overhead of sync operation, sorting
  – Slow convergence, waste of CPU, network resources
  – Many iterative computations can be performed asynchronously
    • PageRank, shortest path, adsorption, link proximity estimation, belief propagation, ...

Page 11

Shortest Paths

[Figure: weighted example graph with per-node distance labels (source at 0, unreached nodes at ∞), computed with MapReduce]

Page 12

Shortest Paths

[Figure: the example graph with updated distance labels after parallel execution under MapReduce]

Page 13

Shortest Paths

[Figure: the example graph with updated distance labels after parallel execution under MapReduce (continued)]
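The synchronous, round-based style of shortest-path computation shown in these slides can be sketched as follows. It is a single-process illustration in which each loop iteration stands in for one MapReduce job with a barrier at the end; the function name `sync_sssp` and the small graph are hypothetical and not taken from the slides.

```python
import math


def sync_sssp(graph, source):
    """Round-based single-source shortest paths.

    graph maps every node to a list of (neighbor, weight) edges;
    nodes without outgoing edges must still appear as keys.
    Returns (distances, number_of_rounds).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    rounds = 0
    while True:
        # Map: every reached node proposes a distance to each neighbor.
        # Keeping only the smallest proposal per target acts like a combiner.
        proposals = {}
        for u, edges in graph.items():
            if dist[u] == math.inf:
                continue
            for v, w in edges:
                cand = dist[u] + w
                if cand < proposals.get(v, math.inf):
                    proposals[v] = cand

        # Reduce: each node keeps the minimum of its old and proposed distance.
        new_dist = {v: min(dist[v], proposals.get(v, math.inf)) for v in graph}
        rounds += 1  # barrier: nobody sees new values until the round ends

        if new_dist == dist:
            return dist, rounds
        dist = new_dist


if __name__ == "__main__":
    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
    print(sync_sssp(g, "s"))
```

Every round re-scans all reached nodes and ends with a global barrier, including a final round that only confirms nothing changed.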

Page 14

An Asynchronous Model

• A general framework
  – Eliminate synchronization
  – Scheduling policy
• Prove correctness for a wide range of applications
  – PageRank, Personalized PageRank
  – Link Proximity Estimation
    • Commute time, Katz metric, shortest path
  – Bayesian Inference
• Scheduling policies
  – Top-k query
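One way to picture an asynchronous update with a scheduling policy is the shortest-path sketch below. It is not the framework from the talk: it is a single-process illustration that assumes non-negative edge weights and uses "smallest tentative distance first" as an example policy. The name `async_sssp` is hypothetical.

```python
import heapq
import math


def async_sssp(graph, source):
    """Shortest paths without global rounds.

    A node's new distance is visible to its neighbors immediately,
    and a priority queue decides which pending update runs next.
    graph maps every node to a list of (neighbor, weight) edges.
    Returns (distances, number_of_distance_updates).
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    pending = [(0, source)]  # scheduling policy: smallest distance first
    updates = 0
    while pending:
        d, u = heapq.heappop(pending)
        if d > dist[u]:
            continue  # stale entry, a better distance was already applied
        for v, w in graph[u]:
            cand = d + w
            if cand < dist[v]:
                dist[v] = cand  # applied at once, no barrier
                updates += 1
                heapq.heappush(pending, (cand, v))
    return dist, updates


if __name__ == "__main__":
    g = {"s": [("a", 1), ("b", 4)], "a": [("b", 2)], "b": []}
    print(async_sssp(g, "s"))
```

Swapping the priority key changes the scheduling policy without touching the update rule, which is the separation the bullets above describe.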

Page 15

Shortest Path

Facebook dataset

SSSP-m dataset

Page 16

PageRank

Google webgraph

PageRank-m webgraph

Page 17

Conclusions

• Network design within data center
  – Design based on commodity hardware
  – Network resource sharing
• Asynchronous computation framework
  – Reduced bandwidth requirement
  – Efficient computation

Page 18

An Example of Outage

• planet02.csc.ncsu.edu experiences packet loss on July 30, 2005

Page 19

Causes of Outages

• Most lost packets are caused by routing outages

Failure type        Lost packets    Fraction
Unknown             14572           0.2
Routing dynamics    58111           0.8
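The fractions in the table follow directly from the packet counts. The short check below uses only the two counts from the slide; the variable and function names are hypothetical.

```python
# Lost-packet counts as given in the table.
lost_packets = {"unknown": 14572, "routing dynamics": 58111}


def loss_fractions(counts):
    """Share of all lost packets attributed to each failure type."""
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


if __name__ == "__main__":
    for cause, share in loss_fractions(lost_packets).items():
        print(f"{cause}: {share:.3f}")
```

Rounded to one decimal place, the two shares match the 0.2 and 0.8 shown in the table.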

Page 20

Towards 5 Nines Reliability

• Exploiting redundancy on Internet paths
  – Multiple routing instances to ensure consistency
• Exploiting multiple sites within a cloud
  – Site selection through route monitoring
  – Deliver through private WAN
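To make the "5 nines" target concrete, the sketch below converts an availability of 99.999% into allowed downtime. It assumes a 365-day year; the function name `downtime_minutes_per_year` is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_minutes_per_year(nines):
    """Allowed downtime for an availability of 1 - 10**(-nines)."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, so even short routing outages consume a large part of the budget.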

Page 21

Packet Loss due to Routing Failures

• Failover events: 76% packets lost
• Recovery events: 26% packets lost

[Figure panels: Failover, Recovery]

Page 22

Round-trip Delay

• Failover events have a significant impact on packet round-trip delays. In the worst case, round-trip delays can exceed 900 ms.

[Figure panels: Failover, Recovery]

Page 23

Reordering during Failover Events

• The number of reordered packets is small. However, the offset of reordered packets is large.
• Real-time applications therefore need larger buffer sizes.
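The link between reordering offset and buffer size can be illustrated with a small measurement sketch. The slides do not define "offset"; here it is assumed to be the gap between a late packet's sequence number and the highest sequence number already received. The function name `reordering_stats` and the sample trace are hypothetical.

```python
def reordering_stats(arrivals):
    """Count reordered packets and find the largest offset.

    arrivals is the list of sequence numbers in arrival order.
    A packet counts as reordered if a higher sequence number arrived first.
    Assumed offset definition: highest sequence seen so far minus its own.
    Returns (reordered_count, max_offset).
    """
    highest_seen = None
    reordered = 0
    max_offset = 0
    for seq in arrivals:
        if highest_seen is not None and seq < highest_seen:
            reordered += 1
            max_offset = max(max_offset, highest_seen - seq)
        else:
            highest_seen = seq
    return reordered, max_offset


if __name__ == "__main__":
    count, offset = reordering_stats([1, 2, 5, 3, 4, 6])
    print(f"reordered packets: {count}, largest offset: {offset}")
```

Under this definition, a receiver that wants to restore order has to hold back at least as many packets as the largest offset, so a few badly displaced packets are enough to force a large buffer.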