reliability strategies for network function virtualization...

Post on 21-Apr-2018


Reliability Strategies for Network Function Virtualization and Cloud Networks

Massimo Tornatore
Department of Electronics, Information and Bioengineering
Politecnico di Milano, Italy

IEEE 2017 Emerging Technologies Reliability Roundtable (ETR-RT17)
Bologna, Italy, July 2nd

Outline

Cloud/Content and Reliability

1. Virtual Network Mapping in Cloud Networks: Content Connectivity vs. Network Connectivity

2. Network Function Virtualization (NFV): Reliable Service Chaining Problem

Conclusion and Future Directions

M. Tornatore - Reliability Strategies in NFV and Cloud Networks


Cloud Network

[Figure: users send requests to the cloud for data and content: social networking, storage, web browsing, videos, e‐mail]

[Figure: Cloud DC traffic growth [1]]

[1] Cisco Global Cloud Index: Forecast and Methodology, 2015–2020 White Paper

Any Content, Anywhere, Any Time

• 90% of the total Internet traffic is generated due to content dissemination [2]

• What really matters is the connectivity to content

• End‐to‐End → End‐to‐Content

[2] Cisco, Cisco Visual Networking Index: Forecast and Methodology, 2011‐2016, White Paper, May 2012

Are Cloud Networks Reliable?

• Data loss, service disconnection and security in the cloud are still open issues

• They are a major obstacle to the adoption of cloud services by business users

• Some numbers from [3]

• In June 2012, a lightning storm hit the Amazon Virginia data center, taking Netflix as well as Pinterest, Instagram and other sites offline for hours

• Two Sprint fiber‐optic cuts disrupted Alaska Airlines' operations in Oct. 2012

• A recent survey puts “data loss” at no. 2 on the list of top cloud threats

• Survey shows that in 2011, 19% of the businesses that experienced data loss are from the cloud

[3] J. Sterbenz et al., Resilience and Survivability in Communication Networks: Strategies, Principles, and Survey of Disciplines, Computer Networks, vol. 54, no. 8, pp. 1245‐1265, June 2010

From Cloud Computing to Edge Computing

• 5G networks must provide 99.999% service availability [4]

• Enablers: Fog Computing, Mobile Edge Computing (MEC), surrogate servers, caches, edge cloud…

• Latency? Traffic offloading? … Reliability!

• Most cloud services can be accessed even in case of network disconnection!
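To put the "five nines" target in perspective, an availability percentage can be converted into the downtime it allows per year. The sketch below is only that arithmetic; the function name and the choice of a 365-day year are assumptions made for illustration and are not part of the talk.

```python
# Illustrative arithmetic only: the helper name and the 365-day year are assumptions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year (in minutes) allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = yearly_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% availability leaves a budget of a little over five minutes of downtime per year.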


[4]  NGMN Alliance "5G white paper." Next generation mobile networks, white paper (2015).


1) Content Connectivity

New Survivability Metric: Content Connectivity

Traditional metric: Network connectivity (NC)
• Reachability of all nodes from any other node in the network

New metric: Content connectivity (CC)
• Reachability of content from any node in the network

[5] “Fault‐Tolerant Virtual Network Mapping to Provide Content Connectivity in Optical Networks,” M. F. Habib et al.
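The difference between the two metrics can be illustrated on a small undirected graph. The following is a minimal sketch, not the method of [5]: the function names, the graph representation and the five-node example are assumptions chosen for illustration.

```python
from collections import deque


def build_adjacency(nodes, links):
    """Adjacency sets of an undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def reachable_from(adj, start):
    """Set of nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def network_connected(nodes, links):
    """NC: every node can reach every other node."""
    nodes = list(nodes)
    if not nodes:
        return True
    return reachable_from(build_adjacency(nodes, links), nodes[0]) == set(nodes)


def content_connected(nodes, links, replicas):
    """CC: every node can reach at least one node holding the content."""
    adj = build_adjacency(nodes, links)
    covered = set()
    for replica in replicas:
        covered |= reachable_from(adj, replica)
    return covered == set(nodes)


if __name__ == "__main__":
    # Hypothetical network that a failure has split into {1, 2, 3} and {4, 5}.
    nodes = [1, 2, 3, 4, 5]
    links = [(1, 2), (2, 3), (4, 5)]
    print(network_connected(nodes, links))          # the two islands cannot reach each other
    print(content_connected(nodes, links, {3, 5}))  # each island still holds a replica
```

In this example the network is disconnected, yet content connectivity holds as long as each surviving island contains a replica.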

[Figure: enterprise network with an origin server and a proxy server]

Questions on Content Connectivity

• How do traditional network survivability problems evolve when we introduce Content Connectivity?

• Virtual Network Mapping (Multi‐layer protection)

• Can we save network resource with Content Connectivity?

Survivable Virtual Network Embedding (SVNE)

Initial failure of physical elements

Vertical correlated cascading failures cause failures on upper layers.

[Figure: virtual layer mapped over the physical layer]

• Note: Embedding vs. Mapping

Cut‐Set Definition

[Figure: example topology showing six cut‐sets, Cut 1 to Cut 6]

SVNE: Condition For Network Connectivity

We must ensure that no physical link supports all the virtual links of a virtual cut‐set (this must hold for every cut‐set of the virtual network)

A violation would be, e.g., physical link (1-2) supporting all virtual links of Cut 1
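For single physical-link failures, the cut‐set condition can be checked with an equivalent test: fail each physical link in turn, drop every virtual link routed over it, and verify that the virtual topology stays connected. The sketch below assumes a mapping given as a dictionary from virtual links to physical paths (node lists); the function names and the three-node example are hypothetical and only illustrate the condition.

```python
def is_connected(nodes, links):
    """True if the undirected graph (nodes, links) is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen == nodes


def physical_links(path):
    """Undirected physical links traversed by a path given as a node list."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def is_survivable(virtual_nodes, mapping):
    """mapping: {virtual link (u, v): physical path [u, ..., v]}.

    Returns True if no single physical link carries all the virtual links
    of some cut-set, i.e. the virtual topology stays connected after any
    single physical-link failure.
    """
    if not is_connected(virtual_nodes, list(mapping)):
        return False
    used = set().union(*(physical_links(p) for p in mapping.values()))
    for failed in used:
        surviving = [vl for vl, p in mapping.items() if failed not in physical_links(p)]
        if not is_connected(virtual_nodes, surviving):
            return False
    return True


if __name__ == "__main__":
    virtual_nodes = [1, 2, 3]
    # Each virtual link of the triangle uses its own physical link.
    good = {(1, 2): [1, 2], (2, 3): [2, 3], (1, 3): [1, 3]}
    # Virtual links (1,2) and (1,3) both cross physical link (1-2).
    bad = {(1, 2): [1, 2], (2, 3): [2, 3], (1, 3): [1, 2, 3]}
    print(is_survivable(virtual_nodes, good))
    print(is_survivable(virtual_nodes, bad))
```

In the second mapping, physical link (1-2) carries both virtual links of the cut that separates node 1 from the rest, so its failure isolates node 1.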

Non‐Survivable Embedding For Network Connectivity: Example

[Figure: non‐survivable embedding on a five‐node example topology (nodes 1 to 5), with Cut 1 highlighted]

Survivable Embedding For Network Connectivity: Example

No physical link supports all the virtual links of any cut‐set

[Figure: survivable embedding on the five‐node example topology (nodes 1 to 5)]

SVNE: Condition For Content Connectivity (K)

We must ensure that all virtual nodes can reach at least one surrogate server after the occurrence of K failures at the physical layer
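The condition can be checked by brute force on small instances: enumerate every set of K physical-link failures, remove the virtual links routed over a failed link, and verify that each virtual node still reaches a replica. This is a sketch under assumptions (failures are physical-link failures only, the mapping is a dictionary of physical paths, and all names are hypothetical); it is not the ILP or the heuristics mentioned in the talk.

```python
from itertools import combinations


def physical_links(path):
    """Undirected physical links traversed by a path given as a node list."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def all_reach_a_replica(nodes, links, replicas):
    """True if every node is connected to at least one replica node."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen = set(replicas)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in adj.get(u, set()) - seen:
            seen.add(v)
            stack.append(v)
    return seen >= set(nodes)


def content_connected_after_failures(virtual_nodes, mapping, replicas, k):
    """Check content connectivity against any k physical-link failures.

    Checking sets of exactly k failures is enough here, because removing
    more links can never restore reachability.
    """
    used = set().union(*(physical_links(p) for p in mapping.values()))
    for failed in combinations(used, min(k, len(used))):
        failed = set(failed)
        surviving = [vl for vl, p in mapping.items() if not (physical_links(p) & failed)]
        if not all_reach_a_replica(virtual_nodes, surviving, replicas):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical virtual chain 1-2-3-4, each virtual link on its own physical link.
    nodes = [1, 2, 3, 4]
    mapping = {(1, 2): [1, 2], (2, 3): [2, 3], (3, 4): [3, 4]}
    print(content_connected_after_failures(nodes, mapping, {1, 4}, k=1))
    print(content_connected_after_failures(nodes, mapping, {1}, k=1))
    print(content_connected_after_failures(nodes, mapping, {1, 4}, k=2))
```

With replicas at both ends of the chain, any single failure leaves every node attached to a replica even though network connectivity is lost; two failures can isolate a middle node, so the K=2 check fails.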

SVNE: Content Connectivity

Scenario A: 1 failure, 1 datacenter (trivial)

Same as network connectivity

[Figure: embedding on the five‐node example topology (nodes 1 to 5)]

SVNE: Content Connectivity

Scenario B: 1 failure, 2 datacenters

Network connectivity is not guaranteed, but content connectivity is guaranteed

[Figure: embedding on the five‐node example topology (nodes 1 to 5)]

SVNE: Content Connectivity, K=2

[Figure: two embeddings on the five‐node example topology (nodes 1 to 5) for K=2, one with 2 replicas and one with 4 replicas; panel labels: “Content connectivity guaranteed” and “Non‐survivable content‐connected embedding”]

Number of Replicas vs. Number of Virtual Links

• Which strategy is better to ensure content connectivity?
• Increase the number of replicas (more datacenters)?
• Increase the connectivity of the virtual network (more links)?

• Which is the best choice?

This issue is currently being addressed by members of our team

Classification of Analyzed Approaches

Approaches against single‐link failures:
• Network Connectivity (NC1)
• Content Connectivity (CC1)

Approaches against double‐link failures:
• Network Connectivity (NC2)
• Content Connectivity (CC2)

Can we provide NC1 after the first failure and maintain CC2 after the second failure, until failure recovery is complete? NC1 + CC2

The problem and how we solved it

• Inputs
 Physical topology
 Logical topology
 Fixed datacenter locations

• Outputs
 Survivable Virtual Net. Mapping

• Objective
 Minimize the resource usage (i.e., wavelengths)

Solution methods: Integer Linear Programming, heuristics

Numerical Results

• Physical topology: NSFNET (14 nodes, 22 bidirectional links)
• Logical topologies: different connectivity degrees (β = 0.29, 0.47, 0.57, 0.71, 1)
• Number of datacenters: 2 (DC1, DC2)
• Number of wavelengths per link: 20

[Figure: number of wavelength channels (y‐axis, 0 to 120) vs. connectivity degree β (x‐axis: 0.29, 0.47, 0.57, 0.71, 1.00) for CC1, CC2, NC1, NC1+CC2 and NC2]

Lesson Learned

• With a small additional effort in the design phase, we can ensure network connectivity against single‐link failures, augmented with content connectivity against double‐link failures, with minimum resources and a limited number of datacenters

2) Reliable Service Chaining

Network Function Virtualization

Network functions are implemented as virtual network functions (VNFs), i.e., virtual machines, on general‐purpose hardware

No more “middle-boxes”

[7] “Virtualizing Network Security with NFV and SDN Explored in New Whitepaper and Webinar”, www.infonetics.com

Service Chain

• VNFs are chained to set up a Service Chain (SC)

• Example: Web‐Service SC

User → NAT → FW → DPI → WOC → Web Server

• Each SC has its own requirements in terms of

Bandwidth

Latency

Resiliency

NAT: Network Address Translator; FW: Firewall; DPI: Deep Packet Inspection; WOC: WAN Optimized Controller

VNF Placement for Service Chaining

• A VNF can be shared by different SCs

• Several VNFs can share the same node

• Each SC has an end-to-end latency requirement

• Each VNF is characterized by its processing requirement (# of CPUs)

[Figure: Service Chain 1 to Service Chain n, each made of a Start Point, a sequence of VNFs (VNF1 to VNF4 shown) and an End Point, placed on the physical topology]

Resilient VNF “placement”: Questions to be answered

Where do we place VNFs and route traffic to ensure resiliency against link/node failures?

Which protection schemes shall we apply?

Protection schemes: Several possible combinations/choices

• Unprotected
• (Virtual) Link Protection (Vl-P)
• (Virtual) Node Protection (Vn-P)
• End-to-End Protection (E2E-P)

Numerical settings (1)

• NSFNET network topology (14 nodes, 22 links @1Gb/s)

• 5 different types of SCs

NAT: Network Address Translator; FW: Firewall; TM: Traffic Monitor; VOC: Video Optimization Controller; IDPS: Intrusion Detection Prevention System; WOC: WAN Optimized Controller

WS:   USER → NAT → FW → TM → WOC → IDPS → WEB SERVER
VS:   USER → NAT → FW → TM → VOC → IDPS → VIDEO SERVER
VoIP: USER → NAT → FW → TM → FW → NAT → VOICE SERVER
OG:   USER → NAT → FW → VOC → WOC → IDPS → GAME SERVER

Service Chain          Bandwidth (kb/s)   Max latency (ms)
Web Service (WS)       100                500
Video Streaming (VS)   4000               100
VoIP                   64                 100
Online Gaming (OG)     50                 60

[8] M. Claypool and K. Claypool, Latency and player actions in online games, Commun. ACM 49, 11 (November 2006), 40‐45

[9] A. Hmaity et al., "Virtual Network Function placement for resilient Service Chain provisioning," 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), Halmstad, 2016, pp. 245‐252.

Results – Number of required NFV Nodes

[Figure: two panels, Web Service and Online Gaming]

Results ‐ Average Hop Count

• Average path length (nr. of hops)

At high values of node capacity, Vl‐P produces the longest paths because many pairs of disjoint paths must be computed (hard disjointness constraint)

Lesson learned

• Applications have such diverse requirements (latency, computing intensity, bandwidth, reliability) that there is no one‐size‐fits‐all solution

→ «Slicing», application‐aware resource/protection provisioning

Other research directions
1) Self‐diagnosed networks (i.e., machine learning/analytics)

• Machine learning for fault diagnosis
 • A fault has a set of symptoms (warnings, alarms, other faults)
 • Fault diagnosis correlates observed symptoms so as to determine their root cause(s)
 • It leverages monitoring data (e.g., collected by the operator’s hotline [8]): counters, powers, temperatures, …

• Machine learning for fault localization
 • Authors in [9] use Network Kriging

[8] S. Gosselin et al., Application of Probabilistic Modeling and Machine Learning to the Diagnosis of FTTH GPON Networks, ONDM 2017

[9] K. Christodoulopoulos et al., Exploiting network kriging for fault localization, in Optical Fiber Communication Conference (pp. W1B‐5)

Other research directions
2) SDN control resiliency

[Figure: control plane with controllers C1, C2 and C3 mapped onto the data plane]

• Determining the number of controllers and their placement

• Determining logical control plane topology 

• Mapping control plane (routing) to physical network

• Controller‐to‐switch assignments

[10] S. Savas, M. Tornatore, M. F. Habib, P. Chowdhury, B. Mukherjee, Disaster-resilient control plane design and mapping in software-defined networks, in High Performance Switching and Routing (HPSR), 2015

Thank You!

...and thanks to them!

Biswanath Mukherjee, Farhan Habib, Sedef Savas, Achille Pattavina, Ali Hmaity, Francesco Musumeci

My publications on these topics

CONTENT CONNECTIVITY

• M. F. Habib, M. Tornatore, and B. Mukherjee, "Fault‐Tolerant Virtual Network Mapping to Provide Content Connectivity in Optical Networks," in Optical Fiber Communication Conference/National Fiber Optic Engineers Conference 2013, paper OTh3E.4.

• A. Hmaity, F. Musumeci and M. Tornatore, "Survivable virtual network mapping to provide content connectivity against double‐link failures," 2016 12th International Conference on the Design of Reliable Communication Networks (DRCN), Paris, 2016, pp. 160‐166

RELIABLE SERVICE CHAINING

• A. Hmaity, M. Savi, F. Musumeci, M. Tornatore and A. Pattavina, "Virtual Network Function placement for resilient Service Chain provisioning," 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), Halmstad, 2016, pp. 245‐252.


BACKUP SLIDES


Service chains modelling

[Figure: service chain model]

Mapping VNF Requests

• Phase 1: Mapping VNFs to NFV nodes

• Phase 2: Mapping VNF requests to NFV nodes that host VNFs

[Figure: a service chain (Start Point, VNF requests, End Point) and the VNFs (VNF1, VNF2, VNF3) hosted on NFV nodes]

Problem statement (2/2)

Objective function:

Minimize total number of “NFV nodes” (i.e., nodes hosting VNFs)

Three groups of constraints

VNF Request placement

VNF routing constraints

Performance (i.e., latency) constraints

Protection constraints

[Diagram: MILP, minimize the number of active NFV nodes
 Inputs: physical topology; SCs to be deployed; SC and VNF parameters
 Outputs: active NFV nodes; physical path for each SC (routing); size and position of VNFs (VNF placement)]

ILP Sets and parameters

• VNF request node mapping

• VNF requests to physical paths mapping

• Mapping NFV to VNF requests

Constraints (E2E‐P)

Latency and capacity constraints

Whose Problem(s) Are We Addressing?

Consumers (you and I)

Enterprises

Cloud‐service providers

Carriers


What Kind Of Issues Are We Addressing?

• Traffic Engineering (TE): “Put the traffic where the bandwidth is”

• Network Engineering (NE): “Put the bandwidth where the traffic is”

• Network Planning (NP): “Put the bandwidth where the traffic is forecasted to be”

TE: online, dynamic, provisioning problem, ms time scale

NE: intermediate problem, months time scale

NP: offline, static, dimensioning problem, 5‐yr time scale

Summary of Necessary and Sufficient Conditions

Single‐link failures:
• Network connectivity with K=1 (NC1)
 «CutSet» condition [6]
• Content connectivity with K=1 (CC1)
 1 replica: same as NC1
 > 1 replica: reachability of at least one replica under any single failure (much simpler)

Double‐link failures:
• Network connectivity with K=2 (NC2)
 CutSet condition applies to each pair of physical links
 Very hard condition
• Content connectivity with K=2 (CC2)
 1 replica: same as NC2
 > 1 replica: reachability of at least one replica under any double failure (much simpler)

[6] K. Lee and E. Modiano, Cross Layer Survivability in WDM‐based Networks, in IEEE INFOCOM, Rio de Janeiro, Brazil, April 2009

SCs latency requirements

Different latency “contributions”
• Propagation and transmission delay (network links)
• Processing delay for VNFs in NFV nodes, considering resource sharing
 Upscaling, i.e., a VNF is shared by different SCs (not considered)
 Context‐switching, i.e., two or more VNFs share the same hardware resources (processors)

[Figure: context switching costs]
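As a rough illustration of how these contributions add up against a service chain's latency budget, the sketch below sums link delays, VNF processing delays and a per-VNF context-switching penalty. Only the maximum latencies come from the numerical settings of the talk; the function names, the additive context-switching model and all delay values in the example are assumptions.

```python
# Maximum end-to-end latencies (ms) from the numerical settings of the talk.
MAX_LATENCY_MS = {"WS": 500, "VS": 100, "VoIP": 100, "OG": 60}


def end_to_end_latency_ms(link_delays_ms, vnf_processing_ms,
                          context_switch_ms=0.0, vnfs_on_shared_cpu=0):
    """Sum of the latency contributions along one service chain.

    Assumption: each VNF running on a shared processor adds a fixed
    context-switching penalty (a deliberately simple model).
    """
    return (sum(link_delays_ms)
            + sum(vnf_processing_ms)
            + context_switch_ms * vnfs_on_shared_cpu)


def meets_latency(sc_type, link_delays_ms, vnf_processing_ms,
                  context_switch_ms=0.0, vnfs_on_shared_cpu=0):
    """True if the chain respects the latency bound of its service type."""
    total = end_to_end_latency_ms(link_delays_ms, vnf_processing_ms,
                                  context_switch_ms, vnfs_on_shared_cpu)
    return total <= MAX_LATENCY_MS[sc_type]


if __name__ == "__main__":
    # Hypothetical values: three physical hops, five VNFs, three of them sharing a CPU.
    links = [10, 15, 20]
    processing = [2, 2, 2, 2, 2]
    print(end_to_end_latency_ms(links, processing, 1.0, 3))
    print(meets_latency("OG", links, processing, 1.0, 3))
    # A longer backup path (one extra 10 ms hop) breaks the gaming budget
    # while still fitting the VoIP one.
    print(meets_latency("OG", links + [10], processing, 1.0, 3))
    print(meets_latency("VoIP", links + [10], processing, 1.0, 3))
```

The last two calls hint at why protection can be infeasible for tight latency budgets: the longer protected path fits VoIP but not Online Gaming.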

Problem statement

The VNF Placement problem for survivable SC provisioning

• Given a physical topology, a set of SCs (with latency requirements) to deploy in the network, and the required resilience level (nodes and/or links)
• Decide the optimal placement of VNFs and the mapping of virtual links into the physical topology
• Minimizing the number of active NFV nodes
• Subject to latency, protection, routing, (capacity) constraints

[Diagram: MILP, minimize the number of active NFV nodes; labels: physical topology, SCs to be deployed, SC and VNF parameters, active NFV nodes, physical path for each SC (routing), VNF placement]

SVNE: Content Connectivity, K=2 (More virtual links)

[Figure: two embeddings on the five‐node example topology (nodes 1 to 5) for K=2, one with 1 replica and one with 2 replicas; panel labels: “Non‐survivable content‐connected embedding” and “Content connectivity guaranteed”]

Content connectivity against double-link failures can be guaranteed with a limited number of replicas if the connectivity degree of the virtual network is > 2

Problem Statement

• Inputs
 Physical topology
 Logical topology
 Fixed datacenter locations

• Outputs
 Survivable Virtual Net. Mapping

• Objective
 Minimize the resource usage (i.e., wavelengths)

• Constraints
 Flow constraints
 Placement constraints
 Capacity constraint

Solution methods: Integer Linear Programming, heuristics

Numerical Results

• Connectivity degree = 0.57

• All nodes are assumed to hold a datacenter

[Figure: number of wavelength channels (y‐axis, 25 to 75) vs. number of datacenters (x‐axis, 1 to 8) for CC1, CC2, NC1, NC1+CC2 and NC2]

Numerical results – Web Service SCs

[Figure: number of active NFV nodes (y‐axis, 0 to 16) vs. node capacity in CPU cores per NFV node (x‐axis, 2 to 12) for Unprotected, Vl‐P, Vn‐P and E2E‐P]

• E2E‐P and Vn‐P activate twice as many NFV nodes as the Vl‐P and Unprotected scenarios when NFV‐node capacity is high, and less than twice as many when NFV‐node capacity is small

• For a loose latency requirement (WS), Vl‐P resiliency comes at no additional cost in terms of NFV nodes with respect to the Unprotected case

Numerical results – Online Gaming SCs

[Figure: number of active NFV nodes (y‐axis, 0 to 16) vs. node capacity in CPU cores per NFV node (x‐axis, 2 to 12) for Unprotected, Vl‐P, Vn‐P and E2E‐P]

• For small values of NFV‐node capacity, only the Unprotected scenario is feasible

• Vl‐P is infeasible regardless of node capacity, and Vn‐P comes at the same cost as E2E‐P

• The operator is constrained to place backup VNFs off‐site to provide resiliency against link failures

Numerical settings (2)

Service Chain          Bandwidth (kb/s)   Max latency (ms)
Web Service (WS)       100                500
Video Streaming (VS)   4000               100
VoIP                   64                 100
Online Gaming (OG)     50                 60

VNF     CPU requirement (per user)
NAT     0.00092
FW      0.0009
TM      0.0133
WOC     0.0054
IDPS    0.0107
VOC     0.0054
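The per-user CPU figures can be combined with the four service chains of the numerical settings to estimate the processing demand of a chain. The sketch below assumes the figures are CPU cores per user and that demand scales linearly with the number of users; both are assumptions, as are the function names and the user counts in the example.

```python
import math

# Per-user CPU requirement of each VNF (values from the table above;
# the unit "cores per user" is an assumption).
CPU_PER_USER = {
    "NAT": 0.00092, "FW": 0.0009, "TM": 0.0133,
    "WOC": 0.0054, "IDPS": 0.0107, "VOC": 0.0054,
}

# VNF sequences of the four service chains used in the numerical settings.
CHAINS = {
    "WS": ["NAT", "FW", "TM", "WOC", "IDPS"],
    "VS": ["NAT", "FW", "TM", "VOC", "IDPS"],
    "VoIP": ["NAT", "FW", "TM", "FW", "NAT"],
    "OG": ["NAT", "FW", "VOC", "WOC", "IDPS"],
}


def chain_cpu_demand(sc_type, users):
    """Total CPU demand of one service chain for a number of users,
    assuming demand grows linearly with the number of users."""
    return users * sum(CPU_PER_USER[vnf] for vnf in CHAINS[sc_type])


def cores_needed(sc_type, users):
    """Whole CPU cores needed if all VNFs of the chain share one NFV node."""
    return math.ceil(chain_cpu_demand(sc_type, users))


if __name__ == "__main__":
    for sc in CHAINS:
        demand = chain_cpu_demand(sc, users=100)
        print(f"{sc}: {demand:.3f} cores for 100 users, "
              f"{cores_needed(sc, 100)} whole core(s)")
```

Comparing such totals with the node capacities on the x‐axis of the result plots (2 to 12 CPU cores per NFV node) gives a feel for how many chains a single NFV node could host under these assumptions.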

Protection schemes: Unprotected

• No resiliency against link/node failures

+ Low cost
- Low reliability

Protection schemes: Virtual link Protection (Vl‐P)

• Resiliency against link failures

+ High node consolidation
+ Low recovery time
- Large bandwidth usage, long paths (latency)

Protection schemes: Virtual node Protection (Vn‐P)

• Resiliency against node failures

+ High flexibility to meet SC latency requirements
- Large number of NFV nodes
- High recovery time

Protection schemes: End to End Protection (E2E‐P)

• Resiliency against both node and link failures

+ High flexibility to meet SC latency requirements
+ Highest resiliency
- Large number of nodes
