Peregrine: An All-Layer-2 Container Computer Network



Page 1: Peregrine : An All-Layer-2 Container Computer Network

Peregrine: An All-Layer-2 Container Computer Network

Tzi-cker Chiueh, Cheng-Chun Tu, Yu-Cheng Wang, Pai-Wei Wang, Kai-Wen Li, and Yu-Ming Huan

∗Computer Science Department, Stony Brook University

†Industrial Technology Research Institute, Taiwan

Page 2: Peregrine : An All-Layer-2 Container Computer Network

Outline

• Motivation
  – Layer 2 + Layer 3 design
  – Requirements for cloud-scale DC
  – Problems of Classic Ethernet to the Cloud

• Solutions
  – Related Solutions
  – Peregrine's Solution

• Implementation and Evaluation
  – Software architecture
  – Performance Evaluation

Page 3: Peregrine : An All-Layer-2 Container Computer Network

L2 + L3 Architecture: Problems

• Problem: configuration
  – Routing table in the routers
  – IP assignment
  – DHCP coordination
  – VLAN and STP

• Problem: forwarding table size
  – Commodity switch: 16-32k entries

• Virtual machine mobility is constrained to a physical location

• Bandwidth bottleneck

Ref: Cisco data center with FabricPath and the Cisco FabricPath Switching System

Page 4: Peregrine : An All-Layer-2 Container Computer Network

Requirements for Cloud-Scale DC

• Any-to-any connectivity with non-blocking fabric
  – Scale to more than 10,000 physical nodes

• Virtual machine mobility
  – Large layer 2 domain

• Fast fail-over
  – Quick failure detection and recovery

• Support for multi-tenancy
  – Share resources between different customers

• Load balancing routing
  – Efficiently use all available links

Page 5: Peregrine : An All-Layer-2 Container Computer Network

Solution: A Huge L2 Switch!

• Single L2 network
• Non-blocking backplane bandwidth
• Config-free, plug-and-play
• Linear cost and power scaling
• Scale to 1 million VMs

[Figure: a single Layer 2 switch connecting groups of VMs]

However, Ethernet does not scale!

Page 6: Peregrine : An All-Layer-2 Container Computer Network

Revisit Ethernet: Spanning Tree Topology

[Figure: example topology with switches s1-s4 and nodes N1-N8]

Page 7: Peregrine : An All-Layer-2 Container Computer Network

Revisit Ethernet: Spanning Tree Topology

[Figure: the same topology with switches s1-s4 and nodes N1-N8, with the spanning-tree root marked and ports labeled R, D, and B]

Page 8: Peregrine : An All-Layer-2 Container Computer Network

Revisit Ethernet: Broadcast and Source Learning

[Figure: the same topology with switches s1-s4 and nodes N1-N8, with the spanning-tree root marked and ports labeled R, D, and B]

Benefit: plug-and-play

Page 9: Peregrine : An All-Layer-2 Container Computer Network

Ethernet's Scalability Issues

• Limited forwarding table size
  – Commodity switch: 16k to 64k entries

• STP as a solution to loop prevention
  – Not all physical links are used
  – No load-sensitive dynamic routing

• Slow fail-over
  – Fail-over latency is high (> 5 seconds)

• Broadcast overhead
  – Typical Layer 2 consists of hundreds of hosts

Page 10: Peregrine : An All-Layer-2 Container Computer Network

Related Works / Solution Strategies

• Scalability
  – Clos network / Fat-tree to scale out

• Alternative to STP
  – Link aggregation, e.g. LACP, Layer 2 trunking
  – Routing protocols to layer 2 network

• Limited forwarding table size
  – Packet header encapsulation or re-writing

• Load balancing
  – Randomness or traffic engineering approach

Page 11: Peregrine : An All-Layer-2 Container Computer Network

Design of Peregrine

Page 12: Peregrine : An All-Layer-2 Container Computer Network

Peregrine's Solutions

• Not all links are used
  – Disable Spanning Tree protocol

• L2 loop prevention
  – Redirect broadcast and block flooding packets

• Source learning and forwarding
  – Calculate all routes for all node pairs by Route Server

• Limited switch forwarding table size
  – Mac-in-Mac two-stage forwarding by Dom0 kernel module

Page 13: Peregrine : An All-Layer-2 Container Computer Network

ARP Intercept and Redirect

[Figure: hosts A and B connected through switches sw1-sw4, with the DS and RAS attached; control flow and data flow are drawn separately]

1. DS-ARP
2. DS-Reply
3. Send data

DS: Directory Service
RAS: Route Algorithm Server
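The three steps on this slide can be sketched as a lookup against a central directory instead of a broadcast. This is a minimal illustration under stated assumptions, not Peregrine's implementation: the `DirectoryService` class, its `register` and `resolve` methods, the `send` helper, and the sample addresses are all hypothetical names chosen for the sketch.

```python
from typing import Dict, Optional


class DirectoryService:
    """Hypothetical stand-in for the DS: maps IP addresses to MAC addresses."""

    def __init__(self) -> None:
        self._ip_to_mac: Dict[str, str] = {}

    def register(self, ip: str, mac: str) -> None:
        self._ip_to_mac[ip] = mac

    def resolve(self, ip: str) -> Optional[str]:
        # Steps 1 and 2: a unicast query and reply replace the ARP broadcast.
        return self._ip_to_mac.get(ip)


def send(ds: DirectoryService, dst_ip: str, payload: bytes) -> Optional[dict]:
    """Resolve dst_ip through the DS, then build a frame for the destination."""
    dst_mac = ds.resolve(dst_ip)
    if dst_mac is None:
        return None  # unknown host: nothing is flooded in this sketch
    # Step 3: send data directly to the resolved address.
    return {"dst_mac": dst_mac, "payload": payload}


ds = DirectoryService()
ds.register("10.0.0.2", "02:00:00:00:00:0b")  # host B (example values)
frame = send(ds, "10.0.0.2", b"hello")
unknown = send(ds, "10.0.0.9", b"hello")
```

Because every query is a unicast exchange with the DS, the cost of address resolution falls on one server rather than on every host in the broadcast domain, which is why the DS request rate is measured later in the deck.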

Page 14: Peregrine : An All-Layer-2 Container Computer Network

Peregrine's Solutions

• Not all links are used
  – Disable Spanning Tree protocol

• L2 loop prevention
  – Redirect broadcast and block flooding packets

• Source learning and forwarding
  – Calculate all routes for all node pairs by Route Server
  – Fast fail-over: primary and backup routes for each pair

• Limited switch forwarding table size
  – Mac-in-Mac two-stage forwarding by Dom0 kernel module

Page 15: Peregrine : An All-Layer-2 Container Computer Network

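Mac-in-Mac Encapsulation

[Figure: hosts A and B connected through switches sw1-sw4, with the DS attached; control flow and data flow are drawn separately; the encapsulated frame header is shown with the fields sw4, B, A]

1. ARP redirect
2. B is located at sw4
3. sw4
4. Encapsulation: encap sw4 in source MAC
5. Decapsulation: decap and restore original frame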
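The encap and decap round trip can be modeled in a few lines. This is a simplified sketch, not the Dom0 kernel module: it models the encapsulation as an outer field carrying the egress switch address, whereas the slide describes placing sw4 in a MAC field of the frame itself, and that exact field layout is not reproduced here. `Frame`, `EncapFrame`, `encap`, `decap`, and `forwarding_key` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    payload: bytes


@dataclass(frozen=True)
class EncapFrame:
    egress_switch_mac: str  # where the fabric should deliver the frame
    inner: Frame            # original frame, carried unchanged


def encap(frame: Frame, egress_switch_mac: str) -> EncapFrame:
    """Sender side: tag the frame with the destination's egress switch."""
    return EncapFrame(egress_switch_mac, frame)


def forwarding_key(enc: EncapFrame) -> str:
    """Switches in the fabric only look at the egress switch address."""
    return enc.egress_switch_mac


def decap(enc: EncapFrame) -> Frame:
    """Receiver side: restore the original frame."""
    return enc.inner


original = Frame(dst_mac="B", src_mac="A", payload=b"data")
on_wire = encap(original, egress_switch_mac="sw4")
restored = decap(on_wire)
```

The point of the design is visible in `forwarding_key`: intermediate switches need one forwarding entry per switch rather than one per VM, which is how the limited forwarding table size is addressed.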

Page 16: Peregrine : An All-Layer-2 Container Computer Network

Fast Fail-Over

• Goal: fail-over latency < 100 msec
  – Application agnostic
  – TCP timeout: 200ms

• Strategy: pre-compute a primary and backup route for each VM
  – Each VM has two virtual MACs
  – When a link fails, notify hosts using affected primary routes that they should switch to corresponding backup routes
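The strategy above can be sketched as a table of pre-computed routes that is flipped when a link fails. This is an illustration only: the `VmRoutes` record, the `on_link_failure` function, and the pairing of one virtual MAC with the primary route and the other with the backup route are assumptions made for the sketch, and the addresses and paths are example values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VmRoutes:
    primary_mac: str
    primary_path: List[str]
    backup_mac: str
    backup_path: List[str]
    use_backup: bool = False

    def active_mac(self) -> str:
        return self.backup_mac if self.use_backup else self.primary_mac


def uses_link(path: List[str], link: Tuple[str, str]) -> bool:
    """True if the path traverses the link in either direction."""
    hops = set(zip(path, path[1:]))
    return link in hops or (link[1], link[0]) in hops


def on_link_failure(routes: Dict[str, VmRoutes], link: Tuple[str, str]) -> List[str]:
    """Switch every VM whose primary route crosses the failed link."""
    switched = []
    for vm, r in routes.items():
        if not r.use_backup and uses_link(r.primary_path, link):
            r.use_backup = True  # no route computation happens at failure time
            switched.append(vm)
    return switched


routes = {
    "vmB": VmRoutes("02:00:00:00:0b:01", ["sw1", "sw2", "sw4"],
                    "02:00:00:00:0b:02", ["sw1", "sw3", "sw4"]),
    "vmC": VmRoutes("02:00:00:00:0c:01", ["sw1", "sw3", "sw4"],
                    "02:00:00:00:0c:02", ["sw1", "sw2", "sw4"]),
}
switched = on_link_failure(routes, ("sw2", "sw4"))
```

Only the VMs whose primary route is affected are touched, and the work at failure time is a lookup plus a notification, which is what makes a sub-100 msec target plausible.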

Page 17: Peregrine : An All-Layer-2 Container Computer Network

When a Network Link Fails

Page 18: Peregrine : An All-Layer-2 Container Computer Network

Implementation & Evaluation

Page 19: Peregrine : An All-Layer-2 Container Computer Network

Software Architecture

Page 20: Peregrine : An All-Layer-2 Container Computer Network

Review All Components

[Figure: hosts A and B connected through switches sw1-sw7, with the DS, RAS, and MIM module; the ARP redirect path and a backup route are marked]

• ARP request rate: how fast can the DS handle requests?
• How long does the RAS take to process a request?
• Performance of: MIM, DS, RAS, switch?

Page 21: Peregrine : An All-Layer-2 Container Computer Network

Mac-In-Mac Performance

• Time spent for decap / encap / total: 1us / 5us / 7us (2.66GHz CPU)
• Around 2.66K / 13.3K / 18.6K cycles

[Figure: CDF plot]
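The cycle counts follow directly from the measured times and the clock rate. The short check below is a worked calculation, with `CPU_MHZ` and `cycles` as names introduced for the sketch; at 2.66 GHz one microsecond corresponds to 2660 cycles.

```python
CPU_MHZ = 2660  # 2.66 GHz expressed in cycles per microsecond


def cycles(microseconds: int, cpu_mhz: int = CPU_MHZ) -> int:
    """Convert a duration in microseconds to CPU cycles."""
    return microseconds * cpu_mhz


measured_us = {"decap": 1, "encap": 5, "total": 7}
measured_cycles = {name: cycles(us) for name, us in measured_us.items()}

for name, count in measured_cycles.items():
    print(f"{name}: {count} cycles")
```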

Page 22: Peregrine : An All-Layer-2 Container Computer Network

Aggregate Throughput for Multiple VMs

1. ARP table size < 1k
2. Measure TCP throughput of 1 VM, 2 VMs, and 4 VMs communicating with each other.

Page 23: Peregrine : An All-Layer-2 Container Computer Network

ARP Broadcast Rate in a Data Center

• What's the ARP traffic rate in the real world?
  – From 2456 hosts, the CMU CS department reports 1150 ARP/sec at peak and 89 ARP/sec on average.
  – From 3800 hosts on a university network, there are around 1000 ARP/sec at peak and < 100 ARP/sec on average.

• To scale to 1M nodes: 20K-30K ARP/sec on average.

• Current optimal DS: 100K ARP/sec

Page 24: Peregrine : An All-Layer-2 Container Computer Network

Fail-over time and its breakdown

• Average fail-over time: 75ms
  – Switch: 25 ~ 45 ms (sending trap, soft unplug)
  – RS: 25ms (receiving trap and processing)
  – DS: 2ms (receiving info from RS and inform DS)
  – The rest is network delay and dom0 processing time
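The share left over for network delay and dom0 processing can be read off the breakdown by subtraction. The calculation below is a worked check using only the figures on this slide; the constant and function names are introduced for the sketch.

```python
AVERAGE_TOTAL_MS = 75
SWITCH_MS_RANGE = (25, 45)  # time for the switch to send the trap
RS_MS = 25                  # receiving the trap and processing
DS_MS = 2


def remaining_ms(switch_ms: int) -> int:
    """Time left for network delay and dom0 processing, given the switch time."""
    return AVERAGE_TOTAL_MS - switch_ms - RS_MS - DS_MS


fastest_switch, slowest_switch = SWITCH_MS_RANGE
remainder_high = remaining_ms(fastest_switch)
remainder_low = remaining_ms(slowest_switch)
print(f"remaining: {remainder_low} to {remainder_high} ms")
```

Against the 75ms average, the switch trap and the RS processing account for most of the latency, so those two components are where further reduction would have to come from.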

Page 25: Peregrine : An All-Layer-2 Container Computer Network

Conclusion

• A unified Layer-2-only network for LAN and SAN
• Centralized control plane and distributed data plane
• Use only commodity Ethernet switches
  – Army of commodity switches vs. few high-port-density switches
  – Requirements on switches: run fast and have a programmable routing table
• Centralized load-balancing routing using real-time traffic matrix
• Fast fail-over using pre-computed primary/backup routes

Page 26: Peregrine : An All-Layer-2 Container Computer Network

Thank you

Questions?

Page 27: Peregrine : An All-Layer-2 Container Computer Network

Review All Components: Result

[Figure: hosts A and B connected through switches sw1-sw7, with the DS and RS; the ARP redirect path and a backup route are marked]

• ARP request rate handled by the DS: 100K ARP/sec
• RS: 25ms per request
• Link down: 35ms
• 7us for packet processing

Page 28: Peregrine : An All-Layer-2 Container Computer Network

Thank you

Backup slides

Page 29: Peregrine : An All-Layer-2 Container Computer Network

OpenFlow Architecture

OpenFlow switch: A data plane that implements a set of flow rules specified in terms of the OpenFlow instruction set

OpenFlow controller: A control plane that sets up the flow rules in the flow tables of OpenFlow switches

OpenFlow protocol: A secure protocol for an OpenFlow controller to set up the flow tables in OpenFlow switches
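The split between the three parts can be illustrated with a toy match-action table. This is a conceptual model only and does not use any real OpenFlow library or wire format: `FlowRule`, `Switch`, `Controller`, the action strings, and the header field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlowRule:
    match: Dict[str, str]  # header field -> required value
    action: str            # e.g. "output:2" or "drop"
    priority: int = 0


class Switch:
    """Data plane: applies whatever rules are in its flow table."""

    def __init__(self) -> None:
        self.flow_table: List[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        self.flow_table.append(rule)
        self.flow_table.sort(key=lambda r: r.priority, reverse=True)

    def process(self, packet: Dict[str, str]) -> Optional[str]:
        for rule in self.flow_table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return None  # table miss: no rule matched


class Controller:
    """Control plane: decides the rules and pushes them to switches."""

    def set_up(self, switch: Switch, rule: FlowRule) -> None:
        # Stands in for the protocol exchange between controller and switch.
        switch.install(rule)


sw = Switch()
ctl = Controller()
ctl.set_up(sw, FlowRule({"dst_mac": "B"}, "output:2", priority=10))
ctl.set_up(sw, FlowRule({}, "drop", priority=0))  # empty match: catch-all

hit = sw.process({"dst_mac": "B", "src_mac": "A"})
fallback = sw.process({"dst_mac": "C", "src_mac": "A"})
```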


Page 30: Peregrine : An All-Layer-2 Container Computer Network

[Figure: an OpenFlow switch with a hardware data path and an OpenFlow control path, connected to an OpenFlow controller through the OpenFlow protocol (SSL/TCP)]

Page 31: Peregrine : An All-Layer-2 Container Computer Network

Conclusion and Contribution

• Using commodity switches to build a large-scale layer 2 network

• Provide solutions to Ethernet's scalability issues
  – Suppressing broadcast
  – Load balancing route calculation
  – Controlling MAC forwarding table
  – Scale up to one million VMs by Mac-in-Mac two-stage forwarding
  – Fast fail-over

• Future work
  – High availability of DS and RAS, master-slave model
  – Inter

Page 32: Peregrine : An All-Layer-2 Container Computer Network

Comparisons

• Scalable and available data center fabrics
  – IEEE 802.1aq: Shortest Path Bridging
  – IETF TRILL
  – Competitors: Cisco, Juniper, Brocade
  – Differences: commodity switches, centralized load balancing routing, and proactive backup route deployment

• Network virtualization
  – OpenStack Quantum API
  – Competitors: Nicira, NEC
  – Generality carries a steep performance price: every virtual network link is a tunnel
  – Differences: simpler and more efficient because it runs on L2 switches directly

Page 33: Peregrine : An All-Layer-2 Container Computer Network

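Three Stage Clos Network (m,n,r)

[Figure: three-stage Clos network with r ingress switches of size n x m, m middle switches of size r x r, and r egress switches of size m x n; each ingress and egress switch has n external ports]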

Page 34: Peregrine : An All-Layer-2 Container Computer Network

Clos Network Theory

• Clos(m, n, r) configuration: rn inputs, rn outputs
• 2r switches of size n x m plus m switches of size r x r, less than one rn x rn switch
• Each r x r switch can in turn be implemented as a 3-stage Clos network
• Clos(m,n,r) is rearrangeably non-blocking iff m >= n
• Clos(m,n,r) is strictly non-blocking iff m >= 2n-1
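The size comparison and the two non-blocking conditions can be checked numerically. The sketch below counts crosspoints as a proxy for switch cost, which is an assumption about how "less than rn x rn" is meant; the function names are introduced for the example.

```python
def clos_crosspoints(m: int, n: int, r: int) -> int:
    """Crosspoints in Clos(m, n, r): 2r switches of n x m plus m switches of r x r."""
    return 2 * r * (n * m) + m * (r * r)


def crossbar_crosspoints(n: int, r: int) -> int:
    """Crosspoints in a single rn x rn crossbar."""
    return (r * n) ** 2


def is_rearrangeably_nonblocking(m: int, n: int) -> bool:
    return m >= n


def is_strictly_nonblocking(m: int, n: int) -> bool:
    return m >= 2 * n - 1


# Example: Clos(8, 8, 8) has 64 inputs and 64 outputs.
clos_cost = clos_crosspoints(m=8, n=8, r=8)
crossbar_cost = crossbar_crosspoints(n=8, r=8)
print(clos_cost, crossbar_cost)
```

With m = n = 8 the network is rearrangeably non-blocking but not strictly non-blocking, since the strict condition would need m >= 15.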

Page 35: Peregrine : An All-Layer-2 Container Computer Network

Link Aggregation

Page 36: Peregrine : An All-Layer-2 Container Computer Network

ECMP: Equal-Cost Multipath

• Pros: multiple links are used
• Cons: hash collision, re-converge downstream to a single link
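Both the benefit and the hash-collision drawback come from how a flow is mapped to a link. The sketch below is a generic illustration of flow hashing, not any vendor's algorithm: the use of CRC32 over a five-tuple, the `ecmp_pick` function, and the sample flows are assumptions made for the example.

```python
import zlib
from typing import Sequence, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port)
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_pick(flow: FiveTuple, links: Sequence[str]) -> str:
    """Pick one of the equal-cost links by hashing the flow identifier."""
    key = "|".join(str(field) for field in flow).encode()
    return links[zlib.crc32(key) % len(links)]


links = ["uplink0", "uplink1"]
flows = [
    ("10.0.0.1", "10.0.1.1", 6, 40000, 80),
    ("10.0.0.2", "10.0.1.1", 6, 40001, 80),
    ("10.0.0.3", "10.0.1.2", 6, 40002, 443),
]
picks = [ecmp_pick(f, links) for f in flows]
```

Every packet of a flow takes the same link, which avoids reordering, but the choice ignores load: with three flows and two links at least two flows must share a link, and two large flows that hash to the same link will congest it while the other link sits idle.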

Page 37: Peregrine : An All-Layer-2 Container Computer Network

Example: Brocade Data Center

[Figure: Brocade data center design combining link aggregation and L3 ECMP]

Ref: Deploying Brocade VDX 6720 Data Center Switches with Brocade VCS in Enterprise Data Centers

Page 38: Peregrine : An All-Layer-2 Container Computer Network

PortLand

• Scale-out: three-layer, multi-root topology
• Hierarchical, encodes location into MAC address
• Location Discovery Protocol to find shortest path, route by MAC
• Fabric Manager maintains IP to MAC mapping
• 60-80 ms failover, centrally controlled and notified
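Encoding a location into a MAC address can be sketched as bit packing. The field layout used here (16-bit pod, 8-bit position, 8-bit port, 16-bit VM id) follows the layout described in the published PortLand design as best recalled and should be treated as an assumption; the function names are introduced for the example.

```python
from typing import Tuple


def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack location fields into a 48-bit address: pod.position.port.vmid."""
    if not (0 <= pod < 1 << 16 and 0 <= position < 1 << 8
            and 0 <= port < 1 << 8 and 0 <= vmid < 1 << 16):
        raise ValueError("field out of range")
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def decode_pmac(mac: str) -> Tuple[int, int, int, int]:
    """Recover (pod, position, port, vmid) from the address."""
    value = int.from_bytes(bytes(int(part, 16) for part in mac.split(":")), "big")
    return (value >> 32, (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF)


pmac = encode_pmac(pod=1, position=2, port=3, vmid=4)
fields = decode_pmac(pmac)
```

Because the leading bytes identify the pod, a switch can forward on an address prefix instead of holding one entry per host, which is the same forwarding-table pressure that Mac-in-Mac addresses in Peregrine.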

Page 39: Peregrine : An All-Layer-2 Container Computer Network

VL2: Virtual Layer 2

• Three-layer, Clos network
• Flat, IP-in-IP, Location Address (LA) and Application Address (AA)
• Link-state routing to disseminate LA
• VLB + flow-based ECMP
• Depends on ECMP to detect link failure
• Packet interception at S

[Figure: VL2 Directory Service]

Page 40: Peregrine : An All-Layer-2 Container Computer Network

Monsoon

• Three-layer, multi-root topology
• 802.1ah MAC-in-MAC encapsulation, source routing
• Centralized routing decision
• VLB + MAC rotation
• Depends on LSA to detect failures
• Packet interception at S

Monsoon Directory Service: IP <-> (server MAC, ToR MAC)

Page 41: Peregrine : An All-Layer-2 Container Computer Network

TRILL and SPB

TRILL
• Transparent Interconnect of Lots of Links, IETF
• IS-IS as a topology management protocol
• Shortest path forwarding
• New TRILL header
• Transit hash to select next-hop

SPB
• Shortest Path Bridging, IEEE
• IS-IS as a topology management protocol
• Shortest path forwarding
• 802.1ah MAC-in-MAC
• Compute 16 source node based trees

Page 42: Peregrine : An All-Layer-2 Container Computer Network

TRILL Packet Forwarding

• Link-state routing
• TRILL header
  – A-ID: nickname of A
  – C-ID: nickname of C
  – HopC: hop count

Ref: NIL Data Communications

Page 43: Peregrine : An All-Layer-2 Container Computer Network

SPB Packet Forwarding

• Link-state routing
• 802.1ah Mac-in-Mac
  – I-SID: Backbone Service Instance Identifier
  – B-VID: backbone VLAN identifier

Ref: NIL Data Communications

Page 44: Peregrine : An All-Layer-2 Container Computer Network

Re-arrangeable non-blocking Clos network

[Figure: three-stage Clos network with N = 6, n = 2, k = 2: three 2x2 ingress switches (n x k), two 3x3 middle switches ((N/n) x (N/n)), and three 2x2 egress switches (k x n), from input to output]

Example:
1. Three-stage Clos network
2. Condition: k >= n
3. An unused input at an ingress switch can always be connected to an unused output at an egress switch
4. Existing calls may have to be rearranged

Page 45: Peregrine : An All-Layer-2 Container Computer Network

Features of Peregrine network

• Utilize all links
• Load balancing routing algorithm
• Scale up to 1 million VMs
  – Two-stage dual-mode forwarding
• Fast fail-over

Page 46: Peregrine : An All-Layer-2 Container Computer Network

Goal

• Given a mesh network and traffic profile
  – Load balance the network resource utilization
    • Prevent congestion by balancing the network load to support as much traffic load as possible
  – Provide fast recovery from failure
    • Provide primary-backup route to minimize recovery time

[Figure: primary and backup routes between S and D]

Page 47: Peregrine : An All-Layer-2 Container Computer Network

Factors

• Only hop count
• Hop count and link residual capacity
• Hop count, link residual capacity, and link expected load
• Hop count, link residual capacity, link expected load, and additional forwarding table entries required

How to combine them into one number for a particular candidate route?

Page 48: Peregrine : An All-Layer-2 Container Computer Network

Route Selection: idea

[Figure: sources S1 and S2 and destinations D1 and D2 connected through nodes A, B, C, and D; one candidate S1-D1 route leaves link C-D free, the other shares it with S2-D2]

• Which route is better from S1 to D1?
• S2-D2 shares link C-D
• Link C-D is more important! Idea: use it as sparingly as possible

Page 49: Peregrine : An All-Layer-2 Container Computer Network

Route Selection: hop count and Residual capacity

[Figure: sources S1 and S2 and destinations D1 and D2 connected through nodes A, B, C, and D; one candidate S1-D1 route leaves link C-D free, the other shares it with S2-D2]

Traffic matrix:
• S1 -> D1: 1G
• S2 -> D2: 1G

Using hop count or residual capacity makes no difference!

Page 50: Peregrine : An All-Layer-2 Container Computer Network

Determine Criticality of a Link

Determine the importance of a link
– Criticality of link l for (s, d) = fraction of all (s, d) routes that pass through link l

Expected load of a link at initial state
– Bandwidth demand matrix: bandwidth demand for s and d

Page 51: Peregrine : An All-Layer-2 Container Computer Network

Criticality Example

From B to C there are four possible routes.

• Case 2: calculate s = B, d = C
• Case 3: s = A, d = C is similar

[Figure: nodes A, B, and C with per-link criticality labels of 0, 2/4, and 4/4]
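The criticality definition can be computed by enumerating routes and counting how many cross each link. The slide's topology did not survive extraction, so the graph below is a hypothetical one built to have four B-to-C routes; treating "all routes" as all loop-free paths is also an assumption, as are the function names.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def link(a: str, b: str) -> Link:
    """Canonical name for an undirected link."""
    return (a, b) if a <= b else (b, a)


def simple_paths(adj: Dict[str, List[str]], s: str, d: str) -> List[List[str]]:
    """All loop-free paths from s to d, found by depth-first search."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == d:
            paths.append(list(path))
            return
        for nxt in adj[node]:
            if nxt not in path:
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(s, [s])
    return paths


def criticality(adj: Dict[str, List[str]], s: str, d: str) -> Dict[Link, Fraction]:
    """Fraction of all (s, d) routes that pass through each link."""
    paths = simple_paths(adj, s, d)
    all_links = {link(a, b) for a in adj for b in adj[a]}
    counts = {l: 0 for l in all_links}
    for p in paths:
        for a, b in zip(p, p[1:]):
            counts[link(a, b)] += 1
    if not paths:
        return {l: Fraction(0) for l in all_links}
    return {l: Fraction(c, len(paths)) for l, c in counts.items()}


edges = [("A", "B"), ("B", "P1"), ("B", "P2"), ("P1", "J"), ("P2", "J"),
         ("J", "Q1"), ("J", "Q2"), ("Q1", "K"), ("Q2", "K"), ("K", "C")]
adj: Dict[str, List[str]] = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

crit = criticality(adj, "B", "C")
```

In this graph the link K-C is on every route and gets criticality 4/4, the parallel links get 2/4, and the link A-B is on no B-to-C route and gets 0, which mirrors the values labeled on the slide's figure.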

Page 52: Peregrine : An All-Layer-2 Container Computer Network

Expected Load

Assumption: load is equally distributed over all possible routes between S and D.

Consider a bandwidth demand of 20 for B-C. Expected load:

[Figure: nodes A, B, and C with per-link expected load labels of 0, 10, and 20]
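Under the equal-distribution assumption, a link's expected load is its criticality for each pair multiplied by that pair's demand, summed over pairs. The sketch below applies this to the slide's example of demand 20 for B-C; the link names `l0`, `l1`, `l2` and the function name are hypothetical, and the criticality values 0, 2/4, and 4/4 are taken from the previous slide.

```python
from fractions import Fraction
from typing import Dict, Tuple

Pair = Tuple[str, str]


def expected_load(
    criticality: Dict[str, Dict[Pair, Fraction]],
    demand: Dict[Pair, int],
) -> Dict[str, Fraction]:
    """Sum over (s, d) pairs of criticality times bandwidth demand, per link."""
    return {
        name: sum((frac * demand.get(pair, 0) for pair, frac in per_pair.items()),
                  Fraction(0))
        for name, per_pair in criticality.items()
    }


bc = ("B", "C")
criticality = {
    "l0": {bc: Fraction(0)},     # on no B-C route
    "l1": {bc: Fraction(2, 4)},  # on half of the B-C routes
    "l2": {bc: Fraction(4, 4)},  # on every B-C route
}
load = expected_load(criticality, {bc: 20})
```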

Page 53: Peregrine : An All-Layer-2 Container Computer Network

Cost Metrics

Cost metric represents the expected load per unit of available capacity on the link:

cost of a link = expected load / residual capacity

Idea: pick the link with minimum cost

[Figure: nodes A, B, and C with per-link cost labels of 0, 0.01, and 0.02]
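The cost metric is a single division per link. In the sketch below the residual capacity of 1000 is an assumption, chosen because it turns the expected loads of 10 and 20 from the previous slide into the 0.01 and 0.02 shown in the figure; summing link costs to score a route is likewise an assumption, and the link names are hypothetical.

```python
import math
from typing import Dict, List


def link_cost(expected_load: float, residual_capacity: float) -> float:
    """Expected load per unit of available capacity on the link."""
    if residual_capacity <= 0:
        return math.inf  # a full link should never look attractive
    return expected_load / residual_capacity


def route_cost(route: List[str], load: Dict[str, float],
               capacity: Dict[str, float]) -> float:
    """Score a candidate route as the sum of its link costs (an assumption)."""
    return sum(link_cost(load[l], capacity[l]) for l in route)


load = {"b-p": 10, "p-c": 10, "b-k": 10, "k-c": 20}
capacity = {name: 1000 for name in load}

candidates = {
    "route_1": ["b-p", "p-c"],
    "route_2": ["b-k", "k-c"],
}
best = min(candidates, key=lambda r: route_cost(candidates[r], load, capacity))
```

A link that many other routes are expected to use gets a higher cost even when it is currently idle, which is the behaviour the route selection slides ask for when they say to leave the critical link C-D free.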

Page 54: Peregrine : An All-Layer-2 Container Computer Network

Forwarding Table Metric

Consider using commodity switches with a forwarding table size of 16-32k.

Idea: minimize entry consumption, prevent forwarding table from being exhausted

– Available forwarding table entries at node n
– INC_FWD = extra entries needed to route A-C

[Figure: nodes A, B, and C annotated with forwarding table entry counts of 0, 100, 100, 200, and 300]

Page 55: Peregrine : An All-Layer-2 Container Computer Network

Load Balanced Routing

• Simulated network
  – 52 PMs with 4 NICs, total 384 links
  – Replay 17 multi-VDC 300-second traces

• Compare
  – Random shortest path routing (RSPR)
  – Full Link Criticality-based routing (FLCR)

• Metrics: congestion count
  – # of links with exceeded capacity

• Low additional traffic induced by FLCR
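The congestion count metric is simple enough to state as code. This is a minimal sketch of the definition on the slide; the function name, the link names, and the sample loads are hypothetical and are not results from the simulation.

```python
from typing import Dict


def congestion_count(load: Dict[str, float], capacity: Dict[str, float]) -> int:
    """Number of links whose offered load exceeds their capacity."""
    return sum(1 for name, value in load.items() if value > capacity[name])


capacity = {"l1": 1000, "l2": 1000, "l3": 1000}
load_a = {"l1": 1200, "l2": 1100, "l3": 100}  # traffic piled on two links
load_b = {"l1": 800, "l2": 800, "l3": 800}    # same total, spread out

count_a = congestion_count(load_a, capacity)
count_b = congestion_count(load_b, capacity)
```

The two sample loads carry the same total traffic, so the metric isolates how well a routing scheme spreads load rather than how much traffic it carries.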