Windows Azure: Scaling SDN in the Public Cloud


Page 3: Windows Azure: Scaling SDN in the Public Cloud

Windows Azure: Scaling SDN in the Public Cloud

Albert Greenberg

Director of Development

Windows Azure Networking

[email protected]

Page 4: Windows Azure: Scaling SDN in the Public Cloud

• Microsoft’s big bet on public cloud

• Companies move their IT infrastructure to the cloud

• Elastic scaling at lower cost than an on-premises DC

• Runs major Microsoft properties (Office 365, OneDrive, Skype, Bing, Xbox)

Page 5: Windows Azure: Scaling SDN in the Public Cloud

Summary

• Scenario: BYO Virtual Network to the Cloud
  • Per customer, with capabilities equivalent to its on-premises counterpart

• Challenge: How do we scale virtual networks across millions of servers?

• Solution: Host SDN solves it: scale, flexibility, timely feature rollout, debuggability
  • Virtual networks, software load balancing, …

• How: Scaling flow processing to millions of nodes
  • Flow tables on the host, with on-demand rule dissemination
  • RDMA to storage

• Demo: ExpressRoute to the Cloud (Bing it!)

Page 6: Windows Azure: Scaling SDN in the Public Cloud

Infrastructure as a Service: Develop, test, run your apps

Easy VM portability

If it runs on Hyper-V, it runs in Windows Azure: Windows, Linux, … (Ubuntu, Redis, MongoDB, …)

Deploy VMs anywhere with no lock-in

Page 7: Windows Azure: Scaling SDN in the Public Cloud

What Does IaaS Mean for Networking? Scenario: BYO Network

Windows Azure Virtual Networks

• Goal: BYO Address Space + Policy

• Azure is just another branch office of your enterprise, via VPN

• Communication between tenants of your Azure deployment should be efficient and scalable

(Figure: the enterprise's 10.1/16 address space and its Azure VNet's 10.1/16 address space, connected by a secure tunnel)

Page 8: Windows Azure: Scaling SDN in the Public Cloud

Public Cloud Scale

Page 9: Windows Azure: Scaling SDN in the Public Cloud
Page 10: Windows Azure: Scaling SDN in the Public Cloud

(Chart: growth in Compute Instances, 2010 vs. 2014)

Page 11: Windows Azure: Scaling SDN in the Public Cloud

(Chart: growth in Azure Storage, 2010 vs. 2014)

Page 12: Windows Azure: Scaling SDN in the Public Cloud

(Chart: growth in Azure DC Network Capacity, 2010 vs. 2014)

Page 13: Windows Azure: Scaling SDN in the Public Cloud

Windows Azure momentum

Page 14: Windows Azure: Scaling SDN in the Public Cloud

How do we support 50k+ virtual networks, spread over a single 100k+ server deployment in a DC?

Start by finding the right abstractions

Page 15: Windows Azure: Scaling SDN in the Public Cloud

SDN: Building the right abstractions for Scale

Abstract by separating management, control, and data planes

(Figure: Azure Frontend -> Controller -> Switch, stacked across the management and control planes)

Management plane: Create a tenant
Control plane: Plumb these tenant ACLs to these switches
Data plane: Apply these ACLs to these flows

Example: ACLs

• Data plane needs to apply per-flow policy to millions of VMs

• How do we apply billions of flow policy actions to packets?
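To make the three-plane split concrete, here is a minimal sketch in Python (the class and method names are illustrative, not Azure's actual interfaces) of a controller plumbing a tenant's ACLs to switches that then apply them per flow:

    # Management plane creates a tenant; control plane plumbs its ACLs to the
    # switches; data plane applies them to each flow. Illustrative names only.
    class VMSwitch:                              # data plane
        def __init__(self):
            self.acls = {}                       # tenant -> [(prefix, action)]

        def install_acls(self, tenant, acls):
            self.acls[tenant] = acls

        def apply(self, tenant, dst_ip):
            for prefix, action in self.acls.get(tenant, []):
                if dst_ip.startswith(prefix):    # first matching rule wins
                    return action
            return "block"                       # default deny

    class Controller:                            # control plane
        def __init__(self, switches):
            self.switches = switches

        def plumb(self, tenant, acls):
            for switch in self.switches:         # push to every switch
                switch.install_acls(tenant, acls)

    # Management plane: create a tenant and hand its policy to the controller.
    switches = [VMSwitch(), VMSwitch()]
    Controller(switches).plumb("green", [("10.1.1.", "allow")])
    assert switches[0].apply("green", "10.1.1.2") == "allow"
    assert switches[1].apply("green", "10.4.0.9") == "block"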

Page 16: Windows Azure: Scaling SDN in the Public Cloud

Solution: Host Networking

• If every host performs all packet actions for its own VMs, scale is much more tractable

• Use a tiny bit of the distributed computing power of millions of servers to solve the SDN problem: if millions of hosts work to implement billions of flows, each host only needs thousands (see the arithmetic below)

• Build the controller abstraction to push all SDN to the host
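As a rough worked example (illustrative numbers): 2 billion flow policies spread across 1 million hosts is 2 × 10^9 / 10^6 = 2,000 flow entries per host, a trivial amount of state and lookup work for a modern server.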

Page 17: Windows Azure: Scaling SDN in the Public Cloud

VNets on the Host

• A VNet is essentially a set of mappings from a customer-defined address space (CAs) to the provider addresses (PAs) of the hosts where the VMs are located

• Separate the interface that specifies a VNet from the interface that plumbs mappings to switches, via a Network Controller

• All CA <-> PA mappings for a local VM reside on the VM’s host, and are applied there

(Figure: customer config enters the Azure Frontend and flows over the northbound API into the Controller as a VNet description (CAs); the Controller's southbound API plumbs L3 forwarding policy (CAs <-> PAs) down to the VMSwitches hosting the Blue and Green VMs' CA spaces.)

Page 18: Windows Azure: Scaling SDN in the Public Cloud

VNet Controller

(Figure: customer config enters through the Azure Frontend; the Controller, backed by Secondary Controllers via a consensus protocol, holds the VNet description and L3 forwarding policy and programs the Azure VMSwitch on each node. Node1 (10.1.1.5) hosts Blue VM1 (10.1.1.2) and Green VM1 (10.1.1.2); Node2 (10.1.1.6) hosts Red VM1 (10.1.1.2) and Green VM2 (10.1.1.3); Node3 (10.1.1.7) hosts the Green S2S GW (10.1.2.1), which reaches the Green enterprise network 10.2/16 via a VPN GW. Note that the Blue, Green, and Red VMs reuse the same CA 10.1.1.2: CA spaces may overlap across VNets.)

Page 19: Windows Azure: Scaling SDN in the Public Cloud

Forwarding Policy: Traffic to On-Prem

(Figure: Green VM1 (CA 10.1.1.2) on Node1 (PA 10.1.1.5) sends a packet Src:10.1.1.2 Dst:10.2.0.9. Policy lookup against the Controller's L3 forwarding policy: 10.2/16 routes to the gateway on the host with PA 10.1.1.7, so the Azure VMSwitch encapsulates the packet as Src:10.1.1.5 Dst:10.1.1.7 GRE:Green. The VMSwitch on Node3 (10.1.1.7) decapsulates it to the Green S2S GW (10.1.2.1), which carries Src:10.1.1.2 Dst:10.2.0.9 over the L3VPN (PPP) through the VPN GW into the Green enterprise network 10.2/16.)

Page 20: Windows Azure: Scaling SDN in the Public Cloud

Cloud Load Balancing

• All infrastructure runs behind an LB to enable high availability and application scale

• How do we make application load balancing scale to the cloud?

• Challenges:
  • Load balancing the load balancers
  • Hardware LBs are expensive, and cannot support the rapid creation/deletion of LB endpoints required in the cloud
  • Support 10s of Gbps per cluster
  • Support a simple provisioning model

(Figure: an LB fronting Web Server VMs, SQL Services, and IaaS VMs)

Page 21: Windows Azure: Scaling SDN in the Public Cloud

All-Software Load Balancer: Scale using the Hosts

• Goal of an LB: Map a Virtual IP (VIP) to the Dynamic IP (DIP) set of a cloud service

• Two steps: Load Balance (select a DIP) and NAT (translate VIP->DIP and ports)

• Pushing the NAT to the vswitch makes the LBs stateless (ECMP) and enables direct return (see the sketch below)

• SDN controller abstracts out LB/vswitch interactions

(Figure: edge routers spread the client's VIP traffic across LB VMs; each LB VM selects a DIP (10.1.1.2-10.1.1.5) and forwards over a stateless tunnel to the Azure VMSwitch on the DIP's host, which performs the NAT; replies take the direct-return path to the edge without revisiting the LB. The NAT Controller takes the tenant definition (VIPs, # DIPs) and pushes the mappings.)
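To see why pushing the NAT down makes the LB stateless, here is a minimal Python sketch (illustrative names and addresses): every LB replica hashes a flow's 5-tuple to the same DIP, so ECMP can spray packets across replicas with no shared state, and the destination host's vswitch performs the NAT so replies can return directly:

    import hashlib

    # VIP -> DIP pool, as provisioned by the NAT controller.
    VIP_POOLS = {"79.3.1.2": ["10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5"]}

    def select_dip(vip, five_tuple):
        # Pure function of the flow: any LB replica picks the same DIP,
        # so the LB tier keeps no per-flow state (ECMP-friendly).
        digest = hashlib.sha256(repr(five_tuple).encode()).digest()
        pool = VIP_POOLS[vip]
        return pool[int.from_bytes(digest[:8], "big") % len(pool)]

    def vswitch_inbound_nat(pkt, dip):
        pkt["dst"] = dip                     # VIP -> DIP at the DIP's host
        return pkt

    def vswitch_outbound_nat(pkt, vip):
        pkt["src"] = vip                     # DIP -> VIP on the reply, which
        return pkt                           # goes straight back (direct return)

    flow = ("1.2.3.4", 55000, "79.3.1.2", 80, "tcp")
    dip = select_dip("79.3.1.2", flow)
    assert dip in VIP_POOLS["79.3.1.2"]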

Page 22: Windows Azure: Scaling SDN in the Public Cloud

How We Scaled Host SDN

Page 23: Windows Azure: Scaling SDN in the Public Cloud

Flow Tables are the right abstraction

(Figure: on node 10.4.1.5, the Controller takes the tenant description (VNet description, VNet routing policy, ACLs, NAT endpoints) and programs one typed flow table per policy into the Azure VMSwitch between Blue VM1 (10.1.1.2) and the NIC:)

VNET table:
  Flow           Action
  TO: 10.2/16    Encap to GW
  TO: 10.1.1.5   Encap to 10.5.1.7
  TO: !10/8      NAT out of VNET

LB NAT table:
  Flow           Action
  TO: 79.3.1.2   DNAT to 10.1.1.2
  TO: !10/8      SNAT to 79.3.1.2

ACLs table:
  Flow           Action
  TO: 10.1.1/24  Allow
  TO: 10.4/16    Block
  TO: !10/8      Allow

• VMSwitch exposes a typed Match-Action-Table API to the controller

• One table per policy

• Key insight: Let the controller tell the switch exactly what to do with which packets (e.g. encap/decap), rather than trying to use existing abstractions (Tunnels, …)

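A minimal sketch of such a typed Match-Action-Table in Python (an illustration of the concept, not the actual VMSwitch API), using the VNET table from the figure:

    from ipaddress import ip_address, ip_network

    class MatchActionTable:
        """One table per policy; the controller states the exact action."""
        def __init__(self, name):
            self.name, self.rules = name, []

        def add(self, dst_prefix, action, negate=False):
            # negate=True encodes "TO: !10/8"-style rules
            self.rules.append((ip_network(dst_prefix), negate, action))

        def match(self, dst_ip):
            ip = ip_address(dst_ip)
            for net, negate, action in self.rules:
                if (ip in net) != negate:    # first matching rule wins
                    return action
            return None

    vnet = MatchActionTable("VNET")
    vnet.add("10.2.0.0/16", "Encap to GW")
    vnet.add("10.0.0.0/8", "NAT out of VNET", negate=True)   # TO: !10/8
    assert vnet.match("10.2.0.9") == "Encap to GW"
    assert vnet.match("79.3.1.2") == "NAT out of VNET"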

Page 24: Windows Azure: Scaling SDN in the Public Cloud

1. Table typing and flow caching are critical to Dataplane Performance

(Figure: the same per-policy flow tables as on the previous slide, VNET, LB NAT, and ACLs, in the Azure VMSwitch on node 10.4.1.5.)

• COGS in the cloud is driven by VM density – 40GbE is here

• NIC Offloads are critical to achieving density

• Requires significant design work in the VMSwitch to scale overlay / NAT / ACL policy to line speed

• First-packet actions can be complex, but established-flow matches need to be typed, predictable, and simple
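A sketch of the first-packet / established-flow split (an assumed design detail for illustration, reusing the MatchActionTable sketch above): the first packet of a connection walks every policy table and the composed result is cached against the 5-tuple, so established flows cost a single exact-match lookup simple enough to offload:

    flow_cache = {}

    def process(pkt, tables):
        key = (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
        actions = flow_cache.get(key)
        if actions is None:
            # First packet: slow path walks the VNET, LB NAT, and ACL tables.
            actions = [a for a in (t.match(pkt["dst"]) for t in tables) if a]
            flow_cache[key] = actions        # typed, predictable, simple
        for action in actions:               # established flow: fast path
            pkt.setdefault("applied", []).append(action)
        return pkt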

Page 25: Windows Azure: Scaling SDN in the Public Cloud

2. Separate Controllers By Application

(Figure: the Network Controller exposes the northbound API and fronts per-application controllers, a VNet Controller owning the tenant and VNet descriptions and the VNet routing policy, and an LB Controller owning VIP endpoints and NAT rules; each programs its own typed tables (VNET, LB NAT, ACLs) in the Azure VMSwitch on node 10.4.1.5, between Blue VM1 (10.1.1.2) and the NIC.)

Page 26: Windows Azure: Scaling SDN in the Public Cloud

3. Eventing: Agents are also per-Application

• Attempting to give each VMSwitch a synchronously consistent view of the entire network is not scalable

• Separate rapidly changing policy (location mappings of VMs in a VNet) from static provisioning policy

• VMSwitches should request needed mappings on demand via eventing

• We need a smart host agent to handle eventing and look up mappings

(Figure: the VNet Controller pushes policy once to the VNet Agent; when the Azure VMSwitch finds no policy for a packet, it raises a mapping-request event to the agent, which sends a mapping request to the replicated Mapping Service and installs the returned mappings into the VNET flow table.)
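A minimal Python sketch of the eventing loop (illustrative names, not the real agent protocol): the switch raises an event on a policy miss, and the agent fetches exactly the mapping it needs, so no host ever holds the whole network's state:

    class MappingService:
        def __init__(self, table):
            self.table = table                       # (vnet, CA) -> PA

        def lookup(self, vnet, ca):
            return self.table[(vnet, ca)]

    class VMSwitch:
        def __init__(self):
            self.rules, self.agent = {}, None

        def install_rule(self, vnet, dst_ca, action):
            self.rules[(vnet, dst_ca)] = action

        def forward(self, vnet, dst_ca):
            if (vnet, dst_ca) not in self.rules:     # no policy found for packet
                self.agent.on_mapping_request_event(vnet, dst_ca)
            return self.rules[(vnet, dst_ca)]

    class VNetAgent:
        def __init__(self, vswitch, mappings):
            self.vswitch, self.mappings = vswitch, mappings

        def on_mapping_request_event(self, vnet, dst_ca):
            pa = self.mappings.lookup(vnet, dst_ca)  # fetched on demand, once
            self.vswitch.install_rule(vnet, dst_ca, "Encap to " + pa)

    service = MappingService({("Green", "10.1.1.3"): "10.1.1.6"})
    switch = VMSwitch()
    switch.agent = VNetAgent(switch, service)
    assert switch.forward("Green", "10.1.1.3") == "Encap to 10.1.1.6"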

Page 27: Windows Azure: Scaling SDN in the Public Cloud

Eventing: The Real API is on the Host

• The wire protocols between the controller, agent, and related services are now application-specific (rather than generic SDN APIs)

• The real southbound API (implemented by VNet, LB, ACLs, etc.) is now between the agents and the VMSwitch
  • A high-performance OS-level API rather than a wire protocol

• We have found that eventing is a requirement of any nontrivial SDN application

(Figure: the same eventing flow as the previous slide, with the southbound API drawn between the VNet Agent and the Azure VMSwitch, inside the VNet application.)

Page 28: Windows Azure: Scaling SDN in the Public Cloud

4. Separate Regional and Local Controllers

• VNet scope is a region: 100k+ nodes. One controller can’t manage them all!

• Solution: the regional controller defines the VNet, and local controllers program the end hosts

• Make the Mapping Service hierarchical, enabling DNS-style recursive lookup (sketched below)

(Figure: replicated Regional Controllers hold the VNet description and regional mappings; each Local Controller receives policy from the regional tier, serves local mappings to the agents and VNET flow tables on its hosts, and recurses unresolved mapping requests upward.)
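A minimal Python sketch of the DNS-style recursion (illustrative): a local mapping service answers from its own table when it can, and otherwise recurses to the regional service and caches the answer:

    class MappingService:
        def __init__(self, mappings, parent=None):
            self.mappings = dict(mappings)
            self.parent = parent                  # regional service, if any

        def lookup(self, vnet, ca):
            key = (vnet, ca)
            if key in self.mappings:
                return self.mappings[key]         # answered locally
            if self.parent is None:
                raise KeyError(key)               # unknown everywhere
            pa = self.parent.lookup(vnet, ca)     # recurse toward the region
            self.mappings[key] = pa               # cache for the next request
            return pa

    regional = MappingService({("Green", "10.1.2.1"): "10.1.1.7"})
    local = MappingService({}, parent=regional)
    assert local.lookup("Green", "10.1.2.1") == "10.1.1.7"
    assert ("Green", "10.1.2.1") in local.mappings    # now cached locally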

Page 29: Windows Azure: Scaling SDN in the Public Cloud

A complete virtual network needs storage as well as compute!

How do we make Azure Storage scale?

Page 30: Windows Azure: Scaling SDN in the Public Cloud

Storage is Software Defined, Too

• Erasure coding provides the durability of 3-copy writes with small (<1.5x) overhead by distributing coded blocks over many servers

• Lots of network I/O for each storage I/O

• We want to make storage clusters scale cheaply on commodity servers

(Figure: a write is erasure-coded into blocks committed across many servers)

To make storage cheaper, we use lots more network!
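For a concrete sense of the trade (illustrative parameters, not necessarily Azure's exact code): a Reed-Solomon-style code with 12 data fragments and 4 parity fragments stores 16 fragments for 12 fragments' worth of data, a 16/12 ≈ 1.33x overhead that still tolerates the loss of any 4 fragments, whereas triple replication costs 3x for comparable durability. The catch is that each write now fans out to 16 servers instead of 3, which is exactly the extra network I/O per storage I/O noted above.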

Page 31: Windows Azure: Scaling SDN in the Public Cloud

RDMA – High Performance Transport for Storage

• Remote DMA primitives (e.g. Read address, Write address) implemented on-NIC
  • Zero copy (the NIC handles all transfers via DMA)
  • Zero CPU utilization at 40Gbps (the NIC handles all packetization)
  • <2μs E2E latency

• RoCE enables InfiniBand RDMA transport over an IP/Ethernet network (all L3)

• Enabled at 40GbE for Windows Azure Storage, achieving massive COGS savings by eliminating many CPUs in the rack

All the logic is in the host: Software Defined Storage now scales with the Software Defined Network

(Figure: the application on one host asks its NIC to write local Buffer A to Buffer B in the remote application's memory; the NICs move the data via DMA and Buffer B is filled, without the remote CPU touching the transfer.)
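A conceptual, simulated sketch of that one-sided write in Python (hypothetical names; real RDMA goes through NIC verbs, not application code) to show the key property, namely that the remote application never runs during the transfer:

    class RdmaNic:
        def __init__(self):
            self.regions = {}                     # rkey -> registered buffer

        def register_memory(self, rkey, buf):
            self.regions[rkey] = buf              # pin + register (simulated)

        def rdma_write(self, remote_nic, rkey, data):
            # NIC-to-NIC transfer via DMA: no remote CPU, no extra copies.
            remote_nic.regions[rkey][:len(data)] = data

    remote, local = RdmaNic(), RdmaNic()
    buffer_b = bytearray(8)                       # Buffer B on the remote host
    remote.register_memory("rkey-B", buffer_b)
    local.rdma_write(remote, "rkey-B", b"payload!")   # write A -> B
    assert bytes(buffer_b) == b"payload!"             # Buffer B is filled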

Page 32: Windows Azure: Scaling SDN in the Public Cloud

Just so we’re clear… 40Gbps of I/O with 0% CPU

Page 33: Windows Azure: Scaling SDN in the Public Cloud

Hybrid Cloud: How do we Onboard Enterprise?

Page 34: Windows Azure: Scaling SDN in the Public Cloud

ExpressRoute: Direct Connection to Your VNet

• All VNET policy to tunnel to/from the customer circuit is implemented on the hosts

• Predictable low latency and high throughput to the cloud

(Figure: a dedicated circuit connects the customer directly into the VNet, alongside paths over the public internet)

Page 35: Windows Azure: Scaling SDN in the Public Cloud

ExpressRoute: Now live in MSIT!

Page 36: Windows Azure: Scaling SDN in the Public Cloud

ExpressRoute: Entirely Automated SDN Solution

(Figure: the customer router peers with the Azure edge router; on the host, routes land in the Gateway VM's BGP RIB, and the VMSwitch, VNET Agent, Gateway Controller, VNET Controller, SLB, and Mapping Service automate the rest of the path.)

Page 37: Windows Azure: Scaling SDN in the Public Cloud

DEMO: ExpressRoute

Page 38: Windows Azure: Scaling SDN in the Public Cloud

Result: We made SDN Scale

• VNET, SLB, ACLs, Metering, and more scale to millions of servers

• Tens of Thousands of VNETs

• Tens of Thousands of Gateways

• Hundreds of Thousands of VIPs

• 10s of Tbps of LB’d traffic

• Billions of Flows… all in the host!

(Chart: bandwidth served by SLB to a storage cluster over a week, in the 20-40 Gbps range)

Page 39: Windows Azure: Scaling SDN in the Public Cloud

Host Networking makes Physical Network Fast and Scalable

• Massive, distributed 40GbE network built on commodity hardware
  • No hardware per-tenant ACLs
  • No hardware NAT
  • No hardware VPN / overlay
  • No vendor-specific control, management, or data plane

• All policy is in software – and everything’s a VM!

• Network services deployed like all other services

• Battle-tested solutions in Windows Azure are coming to private cloud


Page 40: Windows Azure: Scaling SDN in the Public Cloud

We bet our infrastructure on Host SDN, and it paid off

• The incremental cost of deploying a new tenant, new VNet, or new load balancer is tiny – everything is in software

• Using scale, we can deploy a tenant cheaper and faster than any admin can on-prem

• Public cloud is the future! Join us!