

End-to-End Delay Modeling for Embedded VNF Chains in 5G Core Networks

Qiang Ye, Member, IEEE, Weihua Zhuang, Fellow, IEEE, Xu Li, and Jaya Rao

Abstract—In this paper, an analytical end-to-end (E2E) packet delay model is established for multiple traffic flows traversing an embedded virtual network function (VNF) chain in fifth generation (5G) communication networks. Dominant-resource generalized processor sharing (DR-GPS) is employed to allocate both computing and transmission resources among flows at each network function virtualization (NFV) node, to achieve dominant-resource fair allocation and high resource utilization. A tandem queueing model is developed to characterize packets of multiple flows passing through an NFV node and its outgoing transmission link. For analysis tractability, we decouple packet processing (and transmission) of different flows in the modeling and determine the average packet processing and transmission rates of each flow as approximated service rates. An M/D/1 queueing model is developed to calculate packet delay for each flow at the first NFV node. Based on the analysis of packet inter-arrival time at the subsequent NFV node, we adopt an M/D/1 queueing model as an approximation to evaluate the average packet delay for each flow at each subsequent NFV node. The queueing model is proved to achieve more accurate delay evaluation than that using a G/D/1 queueing model. Packet transmission delay on each embedded virtual link between consecutive NFV nodes is also derived for E2E delay calculation. Extensive simulation results demonstrate the accuracy of the proposed E2E packet delay model, upon which delay-aware VNF chain embedding can be achieved.

Index Terms—NFV, SDN, embedded VNF chains, E2E delay modeling, tandem queueing model, CPU and bandwidth resources, bi-resource allocation, DR-GPS, rate decoupling.

I. INTRODUCTION

Future communication networks are expected to provide customized delay-sensitive end-to-end (E2E) service deliveries [1] (e.g., video streaming, machine-to-machine communications) with fine-grained quality-of-service (QoS) for the Internet-of-Things (IoT) [2]–[6]. Typical IoT application scenarios include remote control for smart homes, smart sensing [7], large-scale mobile social networking [8], industrial automation [9], high-definition video conferencing, and intelligent transportation systems [10], [11]. To support diversified applications and use cases, network servers providing different functions (e.g., classifiers, firewalls, and proxies) need to be deployed on a large scale to fulfill customized service requirements in both the wireless and core network domains. However, the increasingly densified network deployment significantly expands the installation and operational cost of the infrastructure.

Qiang Ye and Weihua Zhuang are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, N2L 3G1 (emails: {q6ye, wzhuang}@uwaterloo.ca).

Xu Li and Jaya Rao are with Huawei Technologies Canada Inc., Ottawa, ON, Canada, K2K 3J1 (emails: {Xu.LiCA, jaya.rao}@huawei.com).

Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

Network function virtualization (NFV) [12]–[15] provides a promising solution to reduce the deployment cost and realize flexible function placement and service customization. With NFV, network functions are decoupled from function-specific servers, softwarized as virtual network functions (VNFs), and placed on general-purpose programmable servers (also called NFV nodes [13], [16]). Specifically, through a resource virtualization platform [17], computing resources (i.e., CPU cores) for task processing on each NFV node are virtualized as virtual CPU (vCPU) cores, upon which virtual machines (VMs) are installed. Then, VNFs are programmed on VMs of different NFV nodes at different network locations to achieve high resource utilization.

For the evolving network paradigm, the backbone core network consists of a combination of network switches and NFV nodes interconnected via high-speed wired transmission links. NFV nodes hosting and operating VNFs are introduced to improve service provisioning and resource utilization. A set of VNFs and the virtual links connecting them constitute a logic VNF chain, referred to as a service function chain (SFC) [18], representing a specific sequence of network functions that a traffic flow¹ requires to traverse for E2E service provisioning. All VNF chains are managed by a virtualization controller and are placed onto the physical substrate network, with each VNF embedded on an NFV node and virtual links represented by transmission links and network switches. This process is known as VNF chain embedding [13], [14], [19], [20]. Note that the virtualization controller is software-defined networking (SDN) enabled [21], with all control functions decoupled from the underlying physical network [22], [23]. Therefore, the controller has direct control (programmability) over all VNFs to enhance resource utilization via traffic balancing and VM migration [24], [25].

E2E packet delay of a delay-sensitive service flow traversing an embedded VNF chain is a main metric indicating the embedding performance. Existing research works investigate how to achieve optimal embedding of VNF chains on the core network to minimize the deployment, operational, and delay violation costs under physical resource constraints and flow conservation constraints. The E2E packet delay of each traffic flow is calculated as a summation of packet transmission delays on each physical link, without considering the packet processing delay due to CPU processing on NFV nodes [13], [14], [26].

¹A traffic (service) flow refers to an aggregation of packets belonging to the same service type for the same source and destination node pair in the backbone network.


However, packets from different traffic flows passing through an NFV node usually require different amounts of CPU processing time on the NFV node and packet transmission time on the outgoing link, in sequence [27], [28]. Depending on the type of VNF each flow traverses, packets of some flows with large headers are bottlenecked by CPU processing, whereas packets of other flows with large payload sizes demand more transmission time. In addition, the packet arrival process of a flow at an NFV node correlates with packet processing and transmission at its preceding NFV nodes, which makes E2E delay analysis difficult. Therefore, how to develop an analytical model to evaluate the delay that each packet of a flow experiences when passing through an embedded VNF chain, including packet queueing delay, packet processing delay on NFV nodes, and packet transmission delay on links, is a challenging research issue. For VNF chain embedding, there is a tradeoff between E2E delay satisfaction and network cost reduction. Different VNF chains are often embedded on a common network path, with multiple VNF instances operated on an NFV node, to improve resource utilization and reduce the deployment and operational cost of network functions and links. On the other hand, sharing a set of physical resources with other flows on NFV nodes and links may degrade the delay performance of an individual flow. Thus, modeling the E2E packet delay of each flow is essential to achieve delay-aware VNF chain embedding. Since traffic flows traversing each NFV node demonstrate discrepant “dominant” resource consumption on either CPU or link bandwidth, how to allocate the two resources among traffic flows to guarantee allocation fairness and achieve high resource utilization, referred to as bi-resource allocation, needs investigation and is required for E2E delay modeling.

In this paper, we employ dominant-resource generalized processor sharing (DR-GPS) [28] as the bi-resource allocation scheme among traffic flows sharing resources at each NFV node. DR-GPS balances the tradeoff between service isolation and high resource utilization, and maintains dominant-resource fairness [29] among flows. Then, we model the packet delay of a flow passing through the NFV nodes and physical links (and switches) of an embedded VNF chain. The contributions of this paper are twofold:

1) With DR-GPS, we establish a tandem queueing model to extract the process of each flow going through CPU processing and link transmission at the first NFV node. To remove the rate coupling effect among flows, we derive the average processing rate as an approximation of the instantaneous processing rate for each flow, with the consideration of resource multiplexing. An M/D/1 queueing model is then developed for each decoupled processing queue to determine the average packet processing delay. Based on the analysis of the packet departure process from each decoupled processing queue, the decoupled transmission rate is derived for each flow.

2) We analyze the packet arrival process and then remove the rate coupling effect for each flow traversing the subsequent NFV node. To eliminate the dependence of packet processing and packet transmission between consecutive NFV nodes, the arrival process of each flow at the subsequent NFV node is approximated as a Poisson process, and an M/D/1 queueing model is employed to calculate the average delay for packet processing. It is proved that the average packet queueing delay for CPU processing based on the approximated M/D/1 queueing model is an improved upper bound over that calculated upon a G/D/1 queueing model.

The rest of this paper is organized as follows. Existing studies on E2E packet delay modeling for embedded VNF chains are reviewed in Section II. The system model under consideration is described in Section III. In Section IV, we present the bi-resource allocation scheme employed for flows traversing an NFV node. The E2E delay model for packet flows going through an embedded VNF chain is established in Section V. In Section VI, numerical results are presented to demonstrate the accuracy of the proposed analytical delay framework and its effectiveness for achieving delay-aware VNF chain embedding. Lastly, conclusions are drawn in Section VII. Main parameters and symbols used throughout this paper are summarized in Table I.

II. RELATED WORK

In most existing studies, the E2E packet delay for an embedded VNF chain is modeled as a summation of transmission delays when every packet of a flow traverses each embedded physical link, without considering the packet processing delay associated with VNFs. In [13] and [14], a delay violation penalty is expressed as a function of the E2E packet transmission delay of each traffic flow, which indicates the cost if the delay requirement for packets traversing an embedded VNF chain is violated. A delay violation cost minimization problem is then formulated to determine an optimal embedded physical network path that achieves the minimum E2E packet transmission delay for each flow. In [26], the total delay when a packet is routed between consecutive virtual nodes consists of packet processing delay, packet queueing delay, and packet transmission delay, and is determined using network traffic measuring tools instead of analytical modeling. The E2E delay for packets going through each source-destination (S-D) node pair in an embedded virtual network is then calculated to achieve QoS-aware multicast virtual network embedding. For embedded VNF chains, how CPU and bandwidth resources are shared among multiple flows traversing each NFV node determines the fractions of processing and transmission rates allocated to each flow, and thus affects the E2E packet delay calculation. Multi-resource sharing is studied in a data center (DC) environment [27]–[29], where dominant-resource fair allocation is adopted to equalize the dominant resource shares of multiple flows while achieving high resource utilization. However, how to model the E2E packet delay for each flow traversing a set of DC nodes needs further investigation [28]. An analytical E2E delay model for packets passing through each embedded VNF chain is established in [30], where an independent M/M/1/K queueing model is employed to evaluate packet delay at each VNF. However, the packet inter-arrival time at a subsequent VNF correlates with the processing and transmission rates at its preceding VNF, making the packet delay calculation unique at each VNF.


TABLE I: Main parameters and symbols

Symbol              Definition
I                   Set of traffic flows traversing a common embedded physical network path
Nz                  z-th NFV node along one embedded VNF chain
nz                  Number of network switches and physical links between Nz and Nz+1
λi                  Packet arrival rate of traffic flow i at N1
τi,1/τi,2           Time consumption of each packet of flow i for CPU processing/link transmission at N1
Ci,1/Ci,2           Maximum packet processing/transmission rate allocated to flow i at N1
ci,1/ci,2           Allocated packet processing/transmission rate out of Ci,1/Ci,2 to flow i
hi,1/hi,2           Fraction of CPU/link bandwidth resources allocated to flow i at N1
B                   One of the flow combinations for a backlogged flow set out of I
µi,1/µi,2           Decoupled packet processing/transmission rate for flow i at N1
ϱi,1                Non-empty probability of flow i's processing queue at N1
µ′i,1/µ′i,2         Decoupled packet processing/transmission rate for flow i at N2
ϱ′i,1               Non-empty probability of flow i's processing queue at N2
Yi                  Inter-departure time of successive packets of flow i from the decoupled processing at N1
Zi                  Packet inter-departure time for flow i passing through link transmission at N1
Di,z                Average packet delay for flow i passing through the z-th NFV node
D(f)i,z             Total packet transmission delay for flow i traversing nz switches and links

In addition, the assumption of an exponential distribution for the packet processing time needs justification. Existing studies establish analytical delay models for packets of a traffic flow going through a set of OpenFlow network switches in SDN [31], [32]. Since control functions are migrated from each network switch to the SDN controller, where processing resources are consumed for making routing decisions, the network switches are simplified with only packet forwarding functions. The E2E packet delay for a flow passing through a sequence of network switches is determined based on an M/M/1 queueing network model.

Overall, developing an accurate analytical E2E delay model for each traffic flow traversing an embedded VNF chain is challenging and of importance for achieving delay-aware VNF chain embedding.

III. SYSTEM MODEL

A. Embedded VNF Chains

An aggregated data traffic flow from the wireless network domain, belonging to an identical service type, is required to traverse a sequence of VNFs in the core network to fulfill certain service requirements. Each VNF is embedded and operated on an NFV node, and each virtual link represents a set of transmission links and network switches. Multiple VNF chains can be embedded on a common network path to improve resource utilization and reduce network deployment and operational cost. We consider a set, I, of traffic flows traversing different logic VNF chains over a common embedded physical path. In Fig. 1, two flows i and j (∈ I), representing two logic VNF chains f1 → f3 and f1 → f2, respectively, traverse one embedded network path and share the same physical resources. At the service level, flow i traverses a firewall function and a domain name system (DNS) function sequentially to fulfill a secured DNS service request, and flow j traverses a firewall function and an intrusion detection system (IDS) for secured end-to-end data streaming. At the network level, flow i goes through the first NFV node N1, operating VNF f1, and transmission link L0, and is then forwarded by n1 network switches {R1, ..., Rk, ..., Rn1} and n1 transmission links {L1, ..., Lk, ..., Ln1} before reaching the second NFV node, N2, operating VNF f3; flow j traverses the same physical path but passes through VNF f2 at the second NFV node N2. After passing through N2, traffic flows i and j are forwarded by a sequence of n2 switches and n2 links (not depicted in Fig. 1 for brevity) to reach the destination node in the core network. For a general case, the set I of flows traverses and shares an embedded physical path, with m NFV nodes denoted by Nz (z = 1, 2, ..., m) and nz pairs of network switches and physical links forwarding traffic between NFV nodes Nz and Nz+1 before reaching the destination node. With NFV, different VNFs can be flexibly orchestrated and installed at appropriate NFV nodes to enhance traffic balancing and reduce the deployment cost of network infrastructure. When a traffic flow passes through an NFV node, each packet of the flow first requires CPU time for packet processing, after which the processed packet is allocated link bandwidth resources for transmission [28]. The total amount of CPU time on each NFV node is assumed infinitely divisible [27], [28] and needs to be properly shared among the traffic flows passing through, and the bandwidth resources on transmission links are also shared among the traffic flows.

B. Traffic Model

Packet arrivals of flow i at the first NFV node, N1, are modeled as a Poisson process with arrival rate λi. For the resource requirements of a flow, we define the time profile of flow i passing through N1 as a two-dimensional time vector [τi,1, τi,2], indicating that every packet of flow i requires τi,1 time for CPU processing and τi,2 time for transmission if all CPU time on N1 and all link bandwidth resources on L0 are allocated to flow i [28]. Correspondingly, the rate vector [Ci,1, Ci,2] (in packet per second) of flow i is the reciprocal of the time profile, Ci,1 = 1/τi,1 and Ci,2 = 1/τi,2. Service flows with different packet structures often have discrepant time profiles when passing through an NFV node. For example, when going through a firewall function, service flows carrying small packets with a large header size, such as DNS request packets, are more CPU time demanding, whereas other data traffic flows, e.g., video traffic, with a large packet size require more time for packet transmission but less time for packet processing.


Fig. 1: Two embedded VNF chains sharing a common physical network path.

Therefore, a service flow always has a more critical resource consumption on either CPU time or link bandwidth, which is referred to as its dominant resource. With a given set of maximum available CPU resources and link bandwidth resources on an NFV node and its outgoing link, the time profiles of different traffic flows sharing one NFV node can be very different. The maximum available CPU time on N1 and the maximum link bandwidth on L0 are shared among the flows. Suppose flow i is allocated packet processing rate ci,1 out of the maximum processing rate Ci,1, and packet transmission rate ci,2 out of the maximum transmission rate Ci,2. Note that, when the CPU time is shared among multiple flows, there can be some overhead time as the total CPU resources switch among flows for different processing tasks. Existing studies show that this CPU switching overhead only becomes obvious when the traffic of each flow is saturated with a high percentage of CPU utilization (i.e., CPU cores are frequently interrupted for switching tasks) [18], [33]. Considering only non-saturated traffic, we assume that the allocated processing rate ci,1 for flow i (∈ I) varies linearly with its occupied fraction of CPU time (i.e., the useful fraction of CPU usage). Thus, we denote the fraction of CPU resources allocated to flow i by hi,1 = ci,1/Ci,1 and the fraction of link bandwidth resources by hi,2 = ci,2/Ci,2.

IV. BI-RESOURCE ALLOCATION DISCIPLINE

A. Dominant-Resource Generalized Processor Sharing

When different service flows multiplex at a common NFV node, we want to determine how CPU and link bandwidth resources should be shared among the traffic flows to achieve high utilization of each resource type and, at the same time, maintain a (weighted) fair allocation among the services. Since service flows can have different dominant resources, bi-resource sharing is more challenging than single-resource allocation, as it needs to balance high resource utilization and fair allocation for both resource types.

The generalized processor sharing (GPS) discipline is a benchmark fluid-flow (i.e., with resources being infinitely divisible) based single-resource allocation model for integrated-service networks in traditional communication networks [34], [35]. Each service flow, say flow i, at a common GPS server (e.g., a network switch) is assigned a positive value, ϕi, indicating its priority in bandwidth allocation. The GPS server guarantees that the allocated transmission rate gi for flow i satisfies

    gi ≥ (ϕi / Σ_{i∈I} ϕi) G                                              (1)

where G is the maximum service rate of the GPS server. Note that the inequality sign in (1) holds when some flows in I do not have packets to transmit, and thus more resources can be allocated among the backlogged flows. Therefore, GPS has the properties of achieving both service isolation and a high resource multiplexing gain among flows.

However, if GPS is directly applied in the bi-resource context (i.e., bi-resource GPS), it is difficult to simultaneously maintain fair allocation for both CPU time and link bandwidth and achieve high system performance. Consider flow i and flow j, with time profiles [τi,1, τi,2] and [τj,1, τj,2] respectively, traversing NFV node N1 with the same service priority for fair resource sharing. Assume τi,1 > τi,2 and τj,1 < τj,2. If we apply bi-resource GPS, both the maximum processing and transmission rates are equally divided between the two flows. Consequently, the performance of both flows traversing N1 is not maximized: for flow i, due to its unbalanced time profile, the allocated link transmission rate ci,2 is larger than the processing rate ci,1, leading to resource wastage on link transmission; the situation reverses for flow j, where packets are accumulated for transmission, causing an increase of queueing delay. Therefore, to improve the system performance, a basic principle [28] is that the fractions, hi,1 and hi,2, of CPU and bandwidth resources allocated to any flow i (∈ I) should be in the same proportion as its time profile, i.e., hi,1/hi,2 = τi,1/τi,2, to guarantee that the allocated processing rate is equal to the transmission rate, i.e., ci,1 = ci,2. In this way, queueing delay before packet transmission of each flow can be eliminated. However, with this basic principle, if we apply GPS on one of the two resources (i.e., single-resource GPS with equalized processing and transmission rates), the allocation of the other type of resource among traffic flows remains unbalanced due to the discrepancy of time profiles among different flows.
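As a small numerical check of this basic principle, the following Python sketch uses a hypothetical time profile and CPU fraction (only the resulting rate vector of [1000, 2000] packet/s happens to match the firewall values tested later in Table II) and verifies that choosing the bandwidth fraction in proportion to the time profile equalizes the allocated processing and transmission rates.

# Minimal sketch of the basic principle hi,1/hi,2 = τi,1/τi,2 (hypothetical values).
tau_i1, tau_i2 = 1e-3, 0.5e-3            # time profile [tau_i1, tau_i2]: CPU-dominant flow
C_i1, C_i2 = 1.0 / tau_i1, 1.0 / tau_i2  # rate vector [C_i1, C_i2] = [1000, 2000] packet/s

h_i1 = 0.5                               # hypothetical CPU fraction allocated to flow i
h_i2 = h_i1 * tau_i2 / tau_i1            # bandwidth fraction chosen per the basic principle

c_i1, c_i2 = h_i1 * C_i1, h_i2 * C_i2    # allocated processing/transmission rates
assert abs(c_i1 - c_i2) < 1e-9           # equalized rates: no queueing before transmission
print(h_i1, h_i2, c_i1, c_i2)            # 0.5 0.25 500.0 500.0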

To balance the tradeoff between high performance and fair resource allocation for each traffic flow, we employ the dominant-resource generalized processor sharing (DR-GPS) scheme [28] for the bi-resource allocation.


DR-GPS combines the concepts of dominant resource fairness (DRF) [29] and GPS, in which the fractions of allocated dominant resources among different backlogged flows are equalized based on service priority, and the other type of resource is allocated to ensure that the processing rate equals the transmission rate for each flow (the basic principle applies). When some flows do not have backlogged packets for processing, their allocated resources are redistributed among the other backlogged flows, if any. Since there are a finite number of flow combinations to form a backlogged flow set out of I, we denote B (⊆ I) as one of the flow combinations for a backlogged flow set. Each backlogged flow has a dominant resource consumption in either packet processing or packet transmission. We mathematically formulate DR-GPS in (P1) when the set, I (|I| ≥ 1), of traffic flows traverses N1, where | · | is the set cardinality, as follows:

    (P1):  max {h1,1, ..., hi,1, ..., hj,1, ..., h|B|,1}

    s.t.   Σ_{i∈B} hi,1 ≤ 1                                               (2a)
           Σ_{i∈B} hi,2 ≤ 1                                               (2b)
           hi,1 = (τi,1/τi,2) hi,2                                        (2c)
           hi,d/wi = hj,d/wj,   ∀i, j ∈ B                                 (2d)
           hi,1, hi,2, wi, wj ∈ [0, 1].                                   (2e)

In (P1), wi and wj are the weights of resource allocation representing the service priorities of flow i and flow j, respectively, and hi,d is the fraction of occupied dominant resources of flow i, which is either hi,1 or hi,2. Constraint (2c) guarantees ci,1 = ci,2; constraint (2d) equalizes the fractions of allocated dominant resources among the backlogged flows. Problem (P1) is a linear programming problem and can be solved efficiently to obtain the optimal solutions of hi,1 and hi,2 for any flow i.
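To illustrate, a minimal Python sketch of the DR-GPS allocation in (P1) for one backlogged flow set is given below; the helper name and structure are ours, equal weights are assumed by default, and the example time profiles correspond to the firewall rate vectors [1000, 2000] and [750, 500] packet/s listed later in Table II. For these two flows it yields equalized dominant shares of 0.6 and ci,1 = ci,2 for each flow.

# Sketch of DR-GPS (P1) for one backlogged flow set B; variable names are ours.
# time_profiles: flow -> (tau1, tau2); weights default to equal priorities.

def dr_gps(time_profiles, weights=None):
    flows = list(time_profiles)
    if weights is None:
        weights = {f: 1.0 for f in flows}
    # Per unit of the common (weighted) dominant share s, flow f consumes
    # a[f] of the CPU and b[f] of the link bandwidth (constraint (2c)).
    a, b = {}, {}
    for f in flows:
        tau1, tau2 = time_profiles[f]
        r = tau1 / tau2                   # r > 1: CPU-dominant; r < 1: bandwidth-dominant
        a[f] = weights[f] * (1.0 if r >= 1 else r)
        b[f] = weights[f] * (1.0 / r if r >= 1 else 1.0)
    # Largest common dominant share s respecting both capacity constraints (2a)-(2b);
    # at the optimum at least one resource is fully utilized (work conservation).
    s = min(1.0 / sum(a.values()), 1.0 / sum(b.values()))
    return {f: (s * a[f], s * b[f]) for f in flows}   # (h_{f,1}, h_{f,2}) per flow

# Two equally weighted flows: flow i CPU-dominant, flow j bandwidth-dominant.
print(dr_gps({"i": (1e-3, 0.5e-3), "j": (4e-3 / 3, 2e-3)}))
# ~{'i': (0.6, 0.3), 'j': (0.4, 0.6)} -> c_{i,1} = c_{i,2} = 600, c_{j,1} = c_{j,2} = 300 packet/s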

DR-GPS has the properties of i) service isolation, by guaranteeing a service rate as in (1) to each flow, and ii) work conservation, by fully utilizing at least one of the two types of resources in serving the backlogged flows [28], [36], [37]. Although the queueing delay for packet transmission is reduced by employing the DR-GPS scheme, the total packet delay² for each flow traversing the same NFV node needs to be evaluated. With GPS [34], [35], the process of multiple traffic flows passing through the common NFV node N1 is extracted as a tandem queueing model, shown in Fig. 2. The total packet delay is the summation of packet queueing delay before processing, packet processing delay, and packet transmission delay. In the following, we develop an analytical model to evaluate the total packet delay for each traffic flow traversing N1. Based on the delay modeling for flows traversing the first NFV node, the end-to-end packet delay for traffic flows passing through the embedded VNF chains can be analyzed.

²Total packet delay refers to the duration from the instant that a packet of one traffic flow reaches the processing queue of the NFV node to the instant when it is transmitted out of the NFV node over a physical link.

Fig. 2: A queueing model for multiple traffic flows traversing N1.

V. END-TO-END DELAY ANALYSIS

In this section, we first analyze the total packet delay for each traffic flow traversing the first NFV node, and then extend the delay model to evaluate the end-to-end delay for traffic flows passing through a sequence of NFV nodes of an embedded VNF chain.

A. Rate Decoupling

The main difficulty in analyzing the packet delay of each flow is that both the processing and transmission rates of each flow depend on the backlog status of the other flows at the same NFV node. For the case of two flows, when one of the flows has an empty processing queue, its processing and transmission resources are given to the other backlogged flow to exploit the traffic multiplexing gain. Therefore, the processing (transmission) rate of each flow switches between two deterministic rate values, depending on the status of the other flow. For a general case where we have the set, I, of multiplexing flows which includes a set, B, of backlogged flows excluding the tagged flow i, the processing rate ci,1 of flow i can be determined by solving (P1), based on the set B of backlogged flows. We further denote B as Br, where r = 1, 2, ..., (|I| choose |B|), representing one of the (|I| choose |B|) combinations of |B| backlogged flows. Thus, ci,1 changes with Br. This correlation of queue status among flows leads to the processing rate of each flow jumping over discrete deterministic values, making the total packet delay analysis complex.

To remove the coupling effect of the instantaneous processing rate of one flow fluctuating with the backlog status of other flows, we first determine the average processing rate µi,1 for flow i, with the consideration of resource multiplexing among different flows, i.e., the non-empty probabilities of the processing queues of all other flows, through a set of high-order nonlinear equations given by

    µi,1 = Σ_{|B|=0}^{|I|−1} Σ_{r=1}^{M} [ Π_{l∈Br} ϱl,1 · Π_{k∈B̄r} (1 − ϱk,1) ] ci,1
    ϱi,1 = λi/µi,1,   ∀i ∈ I.                                             (3)

In (3), M = (|I|−1 choose |B|), B̄r = I \ ({i} ∪ Br), and ϱi,1 is the non-empty probability of the processing queue of flow i at N1. Given the packet arrival rate of each flow in I, (3) has 2|I| equations with 2|I| variables and can be solved numerically for the set of average processing rates of all flows. For analysis tractability, we use the average processing rate, µi,1, as an approximation of the instantaneous processing rate ci,1, to transform the original correlated processing system into |I| uncorrelated processing queues.


A case of two traffic flows at an NFV node is demonstrated in Fig. 3.

Fig. 3: A queueing model for (a) decoupled packet processing and (b) decoupled packet processing and transmission.

With the decoupled deterministic processing rates, packet processing for each flow can be modeled as an M/D/1 queueing process, upon which we can further calculate both the decoupled packet processing delay and the total packet delay for processing. The accuracy of the processing rate decoupling is verified through extensive simulations presented in Section VI.
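For two flows, (3) reduces to a pair of coupled fixed-point equations that can be solved by simple iteration. A minimal Python sketch is given below; the helper name and the iteration scheme are ours, the per-backlogged-set rates (600 and 300 packet/s when both flows are backlogged, and 1/max(τ1, τ2) when a flow is alone) follow from the DR-GPS example in Section IV with the firewall rate vectors of Table II, and the arrival rates are example values.

# Sketch: fixed-point solution of (3) for two flows i and j at N1 (our own iteration).
# c_alone[f]: DR-GPS processing rate of f when the other flow is empty;
# c_both[f]:  DR-GPS processing rate of f when both flows are backlogged.

def decoupled_rates(c_alone, c_both, lam, iters=500):
    rho = {"i": 0.5, "j": 0.5}               # initial guess of non-empty probabilities
    mu = {}
    for _ in range(iters):
        for f, other in (("i", "j"), ("j", "i")):
            # Average over the backlog status of the other flow, as in (3).
            mu[f] = (1 - rho[other]) * c_alone[f] + rho[other] * c_both[f]
        for f in ("i", "j"):
            rho[f] = lam[f] / mu[f]          # rho_{f,1} = lambda_f / mu_{f,1}
    return mu, rho

# Firewall VNF at N1 (rate vectors of Table II), with example arrival rates.
mu, rho = decoupled_rates(c_alone={"i": 1000.0, "j": 500.0},
                          c_both={"i": 600.0, "j": 300.0},
                          lam={"i": 150.0, "j": 200.0})
print(mu, rho)   # mu_i ~ 827 packet/s, mu_j ~ 464 packet/s for these inputs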

In Fig. 3(a), the instantaneous link transmission rate of each flow is also correlated with the backlog status of the other flows. To remove the transmission rate correlation for packet delay analysis at the first NFV node, we analyze the packet arrival process of each flow at link transmission (i.e., the departure process from the preceding packet processing) in the following.

B. Queue Modeling at the First NFV Node

We study the packet departure process from each decoupled processing queue at N1. Taking flow i in Fig. 3 as an example, we focus on the inter-departure time between two successively departed packets from the decoupled processing. As indicated in [38] and [39], for an M/D/1 queueing system, the queue occupancy distribution in a steady state seen by a departing packet is the same as that seen by an arriving packet, due to the Poisson characteristic of the arrival process. Therefore, a departing packet from the decoupled processing sees the same empty probability of the processing queue as an arriving packet. Let random variable Yi be the inter-departure time of successive packets of flow i departing from the decoupled processing at N1. If the l-th departing packet sees a nonempty queue, then Yi = Ti, where Ti = 1/µi,1 is the decoupled processing time of a packet of flow i; if the departing packet sees an empty queue upon its departure, Yi = Xi + Ti, where random variable Xi denotes the duration from the time of the l-th packet departure of flow i to the time of the (l + 1)-th packet arrival. Because of the memoryless property of a Poisson arrival process, Xi has the same exponential distribution as the packet inter-arrival time, with parameter λi.

Therefore, the probability density function (PDF) of Yi, ξYi(t), can be calculated as

    ξYi(t) = (1 − ϱi,1) ξ(Xi+Ti)(t) + ϱi,1 ξTi(t)                         (4)

where ϱi,1 = λi/µi,1. Since Xi and Ti are independent random variables, the PDF of Xi + Ti is the convolution of the PDFs of Xi and Ti. Thus, (4) is further derived as

    ξYi(t) = (1 − λi/µi,1)[ξXi(t) ⊛ ξTi(t)] + (λi/µi,1) ξTi(t)
           = (1 − λi/µi,1)[λi e^{−λi t} u(t) ⊛ δ(t − Ti)] + (λi/µi,1) δ(t − Ti)
           = [λi(µi,1 − λi)/µi,1] e^{−λi(t−Ti)} u(t − Ti) + (λi/µi,1) δ(t − Ti)   (5)

where u(t) is the unit step function, δ(t) is the Dirac delta function, and ⊛ is the convolution operator. From (5), the cumulative distribution function (CDF) of Yi is given by

    FYi(t) = [1 − (1 − λi/µi,1) e^{−λi(t−Ti)}] u(t − Ti).                 (6)

Based on (5) and (6), both the mean and variance of Yi can be calculated as

    E[Yi] = 1/λi   and   D[Yi] = 1/λi² − 1/µi,1².                         (7)

From (6) and (7), we observe that, when λi is small, the departure process, delayed by the service time Ti, approaches a Poisson arrival process with parameter λi; when λi increases to approach µi,1, the departure process approaches a deterministic process with rate µi,1.
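A quick Monte Carlo check of (7) is sketched below (our own verification with hypothetical parameter values, separate from the paper's OMNeT++ simulations): it simulates an M/D/1 queue and compares the empirical mean and variance of the inter-departure time with 1/λi and 1/λi² − 1/µi,1².

# Sketch: simulate an M/D/1 queue and check inter-departure statistics against (7).
import random

def md1_interdeparture_stats(lam, mu, n_packets=200000, seed=1):
    random.seed(seed)
    service, t_arr, t_free = 1.0 / mu, 0.0, 0.0
    departures = []
    for _ in range(n_packets):
        t_arr += random.expovariate(lam)      # Poisson arrivals with rate lam
        start = max(t_arr, t_free)            # wait if the server is still busy
        t_free = start + service              # deterministic service time 1/mu
        departures.append(t_free)
    gaps = [b - a for a, b in zip(departures, departures[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return mean, var

lam, mu = 150.0, 600.0                        # hypothetical rates
mean, var = md1_interdeparture_stats(lam, mu)
print(mean, 1 / lam)                          # empirical mean vs E[Yi] = 1/lam
print(var, 1 / lam**2 - 1 / mu**2)            # empirical variance vs D[Yi] in (7)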

Since the packet departure rate from each decoupled processing queue is the same as the packet arrival rate at that processing queue, the decoupled transmission rate µi,2 for flow i is the same as µi,1, obtained by solving the set of equations in (3). At this point, we have a complete decoupled queueing model of both packet processing and packet transmission for flow i traversing the first NFV node N1, shown in Fig. 3(b). The average total packet delay for flow i is determined by

    Di,1 = 1/µi,1 + λi / [2µi,1²(1 − ϱi,1)] + 1/µi,2.                     (8)
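Equation (8) is the deterministic processing time plus the M/D/1 mean queueing delay plus the transmission time; a direct numeric evaluation is sketched below (the decoupled rates are example values, e.g., those produced by the fixed-point sketch above).

# Sketch: average total packet delay at N1 per (8); rates are example values.
def delay_first_node(lam, mu1, mu2):
    rho1 = lam / mu1                            # non-empty probability of the processing queue
    queueing = lam / (2 * mu1**2 * (1 - rho1))  # M/D/1 queueing delay before processing
    return 1 / mu1 + queueing + 1 / mu2         # Eq. (8)

print(delay_first_node(lam=150.0, mu1=827.0, mu2=827.0) * 1e3, "ms")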

Before modeling the delay for flows going through the second NFV node, N2, we analyze the departure process of packet transmissions of each flow at N1. Similar to the analysis of the packet departure process from the decoupled processing of flow i at N1, we set time 0 as the instant when the l-th packet departs from the processing queue and reaches the transmitting queue for immediate packet transmission. Since we have µi,1 = µi,2, Ti also indicates the packet transmission time, i.e., Ti = 1/µi,2. Let Zi denote the packet inter-departure time for flow i passing through link transmission. If the l-th departing packet from the processing queue sees a nonempty queue, we have Zi = Ti; otherwise, the following two cases apply, as illustrated in Fig. 4:


Case 1 – If the (l + 1)-th packet's arrival time at the processing queue is greater than the l-th packet's transmission time at the transmitting queue, i.e., Xi > Ti, we have

    Zi = ζ1 + 2ζ2 = (Xi − Ti) + 2Ti = Xi + Ti                             (9)

where ζ1 indicates the duration from the instant that the l-th packet departs from the transmission queue till the instant that the (l + 1)-th packet arrives at the processing queue, and ζ2 = Ti;

Case 2 – If the (l + 1)-th packet arrives at the processing queue while the l-th packet is still at the transmission queue, i.e., Xi ≤ Ti, we have

    Zi = ζ′1 + ζ′2 = [Ti − (Ti − Xi)] + Ti = Xi + Ti                      (10)

where ζ′1 denotes the remaining processing time of the (l + 1)-th packet at the processing queue after the l-th packet departs from the transmission queue, and ζ′2 = Ti.

Fig. 4: A composition of Zi under different cases.

As a result, the PDF of Zi is given by

    ξZi(t) = (1 − ϱi,1)[P{Xi ≤ Ti} ξ(Xi+Ti)(t) + P{Xi > Ti} ξ(Xi+Ti)(t)] + ϱi,1 ξTi(t)
           = (1 − ϱi,1) ξ(Xi+Ti)(t) + ϱi,1 ξTi(t).                        (11)

Comparing (4) and (11), we conclude that Zi and Yi have exactly the same probability distribution, and thus the same statistics of any order (e.g., expectation and variance).

C. Delay over the Virtual Link between NFV Nodes N1 and N2

So far, we have derived the packet departure process of each flow from the link transmission at the first NFV node N1. Before reaching the second NFV node N2, the flows may go through a sequence of network switches and physical links forwarding the traffic. The transmission rates allocated to flow i on these switches and links are the same as the transmission rate µi,2 from N1, to maximize the bandwidth utilization [40]. Therefore, queueing delays at the switches and links are not considered for each flow. The total packet transmission delay for flow i traversing n1 switches and n1 links before reaching N2 is given by

    D(f)i,1 = 2n1/µi,2.                                                   (12)

D. Delay at the Second NFV Node

Proposition 1. A Poisson packet flow traverses a tandem of k (k = 1, 2, 3, ...) servers, each with deterministic service capacity Y(k). If we have Y(q) ≥ Y(q−1), ∀q ∈ [2, k], the departure process of the traffic flow coming out of the k-th (k ≥ 2) server remains the same as the departure process from the first server.

The proof of Proposition 1 is given in Appendix A. According to Proposition 1, the arrival process at the second NFV node N2 is the same as the traffic departure process from N1. Based on (7) and (11), the arrival rate of flow i at N2 is λi. Thus, the same method as in (3) can be used to obtain the set of decoupled processing and transmission rates µ′i,1 and µ′i,2 for flow i at N2, as shown in Fig. 5, by taking into consideration the time profiles of the traffic flows going through the new VNF(s) at N2 and the instantaneous processing and transmission rates c′i,1 and c′i,2 allocated to flow i. The main difference in packet delay modeling for flow i at N2 from that at N1 is that the packet arrival process of flow i is a general process with average arrival rate λi. The process has inter-arrival time Zi with the same CDF, expectation, and variance as those of Yi in (6) and (7). Thus, we can model packet processing at N2 as a G/D/1 queueing process³, where the average packet queueing delay before processing for flow i at N2 is given by [38]

    Wi,2 = λi (1/λi² − 1/µi,1² − σi²) / [2(1 − ϱ′i,1)]
         ≤ λi (1/λi² − 1/µi,1²) / [2(1 − ϱ′i,1)].                         (13)

In (13), ϱ′i,1 = λi/µ′i,1, and Te is the idle duration within the inter-arrival time of successive packets of flow i at N2, with variance σi². Since the arrival process at N2 of each flow correlates with the preceding decoupled processing rates at N1, as indicated in (6), the G/D/1 queueing model is not accurate, especially when λi becomes large [38]. Also, it is difficult to obtain the distribution of Te to calculate σi² in (13). Using the upper bound in (13) to approximate Wi,2 is not accurate when λi is small (the queueing system is lightly loaded), since the probability that an arriving packet at the processing queue of N2 sees an empty queue increases, and σi² becomes large.

³Note that we consider the case where µ′i,1 < µi,2, ∀i ∈ I; for the case of µ′i,1 ≥ µi,2, there is no queueing delay for processing at N2.

Fig. 5: A decoupled queueing model for traffic flows traversing the first and second NFV nodes in sequence.

From (6) and (7), in the case of µ′i,1 < µi,2, Zi is more likely to approach an exponentially distributed random variable than a deterministic value with varying λi, under the condition of ϱ′i,1 < 1.


Therefore, to make the arrival process of each flow at the processing queue of N2 independent of the processing and transmission rates at N1, we approximate the packet arrival process of flow i at N2 as a Poisson process with rate parameter λi, and establish an M/D/1 queueing model to represent the packet processing of flow i. Proposition 2 indicates that the average packet queueing delay Qi,2 in the M/D/1 queueing model is an improved upper bound over that in the G/D/1 system in (13), especially when the input traffic is lightly loaded.

Proposition 2. Given µ′i,1 < µi,2, Qi,2 is an upper bound of Wi,2 when the processing queue of flow i at N2 is lightly loaded as well as when it is heavily loaded.

The proof of Proposition 2 is given in Appendix B. According to the approximation of the packet arrival process of flow i at N2, the average total packet delay at N2 is calculated, independently of the processing and transmission rates at N1, as

    Di,2 = 1/µ′i,1 + λi / [2µ′i,1²(1 − ϱ′i,1)] + 1/µ′i,2,   if µ′i,1 < µi,2
    Di,2 = 1/µ′i,1 + 1/µ′i,2,                               if µ′i,1 ≥ µi,2.       (14)
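A direct evaluation of (14) is sketched below (the decoupled rates at N2 are hypothetical; mu2_prev denotes the decoupled transmission rate µi,2 at the preceding node).

# Sketch: average total packet delay at N2 per (14); parameter values are hypothetical.
def delay_second_node(lam, mu1p, mu2p, mu2_prev):
    # mu1p, mu2p: decoupled rates mu'_{i,1}, mu'_{i,2} at N2; mu2_prev: mu_{i,2} at N1.
    if mu1p < mu2_prev:                         # M/D/1 approximation of the arrival process
        rho1p = lam / mu1p
        queueing = lam / (2 * mu1p**2 * (1 - rho1p))
    else:                                       # processing faster than upstream transmission
        queueing = 0.0                          # no queueing delay before processing
    return 1 / mu1p + queueing + 1 / mu2p

print(delay_second_node(lam=150.0, mu1p=700.0, mu2p=700.0, mu2_prev=827.0) * 1e3, "ms")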

E. Average E2E Delay

Based on the same delay modeling methodology as for packets traversing N2, the average total packet delay for flow i traversing each subsequent NFV node (if any) can be derived independently. Under the condition that the decoupled packet processing rate of flow i at a subsequent NFV node Nz (z > 2) is smaller than the decoupled packet transmission rate at its preceding NFV node Nz−1, using an approximated M/D/1 queueing model to represent packet processing at Nz is valid, since the packet arrival process of flow i at Nz is more likely to approach a Poisson process with varying λi. In general, the average E2E delay for a packet of flow i passing through an embedded VNF chain, consisting of m NFV nodes, is the summation of the average total delay for a packet passing through all NFV nodes and the total transmission delay on the switches and links along the path forwarding the packet, given by

    Di = Σ_{z=1}^{m} Di,z + Σ_{z=1}^{m} D(f)i,z.                          (15)

In (15), Di,z is the average packet delay for flow i passing through the z-th NFV node of the embedded VNF chain, and D(f)i,z is the total packet transmission delay for flow i traversing nz switches and nz links before reaching NFV node Nz+1, determined in the same way as in (12).
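Putting (8), (12), and (14) together, the E2E delay of a flow over an m-node chain can be assembled as sketched below (a self-contained example with hypothetical decoupled rates; the hop counts n1 = 20 and n2 = 25 match Table II, and the M/D/1 queueing term is assumed to apply at every node).

# Sketch: E2E packet delay per (15) for a chain with m = 2 NFV nodes (hypothetical rates).
lam = 150.0                                   # packet arrival rate (packet/s)
mu = [827.0, 700.0]                           # decoupled rates at N1 and N2 (mu_proc = mu_tx)
n_hops = [20, 25]                             # n_z switches/links after N_z (Table II)

def node_delay(lam, mu_proc, mu_tx):
    # Per-node delay: processing + M/D/1 queueing + transmission, as in (8) and (14).
    rho = lam / mu_proc
    return 1 / mu_proc + lam / (2 * mu_proc**2 * (1 - rho)) + 1 / mu_tx

D_e2e = (sum(node_delay(lam, m, m) for m in mu)
         + sum(2 * n / m for n, m in zip(n_hops, mu)))   # link delays, Eq. (12)
print(D_e2e * 1e3, "ms")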

VI. SIMULATION RESULTS

In this section, simulation results are presented to verify the accuracy of the proposed packet delay modeling of each flow passing through an embedded VNF chain. All simulations are carried out using the network simulator OMNeT++ [41]. We consider a network scenario where two equally weighted flows, i and j, traversing logic VNF chains firewall (f1) → DNS (f3) and firewall (f1) → IDS (f2) respectively, are embedded on a common physical path and share a set of processing and transmission resources, as shown in Fig. 1. Flow i represents DNS request traffic, whereas flow j represents video-conferencing data streaming. The packet arrival rate λi of flow i is set to 150 packet/s with a packet size of 4000 bits [42]. The packet size of flow j is set to 16000 bits, and we vary its arrival rate, λj, from 75 packet/s to 350 packet/s to reflect different traffic load conditions. The rate vector of each flow traversing an NFV node is tested over OpenStack [43], a resource virtualization platform installed on each NFV node. By injecting traffic flows with different packet sizes into different VNFs programmed on OpenStack, we test the maximum available packet processing and transmission rates for different flows. With DR-GPS, each flow is allocated a fraction of the maximum available processing and transmission rates, upon which packet-level simulation is conducted to evaluate the packet delay of each flow traversing each NFV node. Table II summarizes the rate vectors for flows i and j traversing different VNFs; other important simulation settings are also included.

A. Packet Delay at the First NFV Node (N1)

We first compare the packet processing delay and packet queueing delay of each flow traversing N1. In Fig. 6, it is demonstrated that both the packet processing delay and packet queueing delay derived using rate decoupling between flows i and j are close to the simulation results with rate coupling. We can see from Fig. 6(a) that the decoupled processing rate for flow i decreases with λj, since the processing queue nonempty probability of flow j increases statistically, shrinking the decoupled processing rate of the other flow. Packet transmission delay and queueing delay before transmission are evaluated for both flows at N1 in Fig. 7. For packet transmission delay, the analytical results match the simulation results well, and there is almost no queueing delay before packet transmission, which indicates the accuracy of the proposed transmission rate decoupling. In Fig. 8, we compare the queueing delays before packet transmission for flows i and j at N1 under the DR-GPS and bi-resource GPS schemes. Although bi-resource GPS achieves fair allocation of both CPU and bandwidth resources between the two flows, the amount of allocated bandwidth resources is overly provisioned for flow i and is underestimated for flow j, due to the discrepancy of time profiles between the flows. Thus, the packet queueing delay before link transmission for flow j becomes much larger than that with DR-GPS, where the packet queueing delays of both flows at the transmission link of N1 are minimized.

B. Packet Delay at the Second NFV Node (N2)

After traversing the first NFV node, N1, the packet processing delays for flow i and flow j at the second NFV node, N2, are evaluated in Fig. 9. With a close match between the analytical and simulation results, it is verified that the processing rate decoupling at N2 is accurate. The packet queueing delays before processing for both flows at N2 are demonstrated in Fig. 10 and Fig. 11.


TABLE II: Simulation parameters

Parameters               Flow i                    Flow j
Rate vector (Firewall)   [1000, 2000] packet/s     [750, 500] packet/s
Rate vector (DNS)        [1000, 1250] packet/s     —
Rate vector (IDS)        —                         [800, 312.5] packet/s
n1                       20                        20
n2                       25                        25
Simulation time          1000 s                    1000 s

Fig. 6: Packet delay for processing at N1; (a) average packet processing delay, (b) average packet queueing delay.

As λj increases, the packet queueing delay for flow i increases slightly, since the resource multiplexing gain obtained by flow i from flow j becomes small. We can see from Fig. 10 that the M/D/1 queueing model for packet processing at N2 provides a tighter upper bound than the G/D/1 queueing delay upper bound for flow i. In comparison with the results for flow j in Fig. 11, we observe that the G/D/1 upper bound on packet queueing delay is looser for flow i, since the processing queue of flow i is lightly loaded, with a low queue nonempty probability, as shown in Fig. 12, whereas the G/D/1 queueing delay upper bound becomes tighter for flow j with the increase of λj but less accurate in a heavy traffic load condition. For both flows, the proposed M/D/1 queueing model is an improved upper bound to approximate the packet queueing delay before processing at N2.

Fig. 7: Delay for packet transmissions at N1.

Fig. 8: Queueing delay for packet transmissions under different resource allocation disciplines.

Packet transmission delay for both flows at N2 is demonstrated in Fig. 13. Similar to Fig. 7, we can see that the decoupled transmission rates of each flow are close to the simulation results, and the queueing delay before packet transmission is negligible.


Fig. 9: Packet processing delay at N2.

Fig. 10: Packet queueing delay for flow i at N2.

Fig. 11: Packet queueing delay for flow j at N2.

Fig. 12: Nonempty probability for processing queues at N2.

Fig. 13: Delay for packet transmissions at N2.

Fig. 14: End-to-End packet delay for both flows.


Lastly, we evaluate in Fig. 14 the end-to-end delay for packets of each flow going through the whole embedded physical network path, which is a summation of the total packet delay for traversing all NFV nodes and the packet transmission delay over all physical links and network switches. It is demonstrated that the proposed analytical model provides an accurate evaluation of the end-to-end packet processing and transmission delay for multiple flows traversing VNF chains embedded on a common physical network path. The proposed analytical model can help support E2E delay-aware VNF chain embedding.

VII. CONCLUSION

In this paper, we present an analytical model to evaluate E2E packet delay for multiple traffic flows traversing a common embedded VNF chain. With DR-GPS, both CPU and bandwidth resources are allocated among different flows at each NFV node to achieve dominant-resource fair allocation with high resource utilization. A tandem queueing model is established to describe packets of each flow passing through an NFV node and its outgoing link. By removing the coupling effect on instantaneous packet processing rates among multiple flows, an M/D/1 queueing model is used to determine the average packet delay at each decoupled processing queue of the first NFV node. The correlation of packet transmission rates is also removed in the modeling, based on the analysis of the packet departure process from each decoupled processing queue. We further analyze the packet arrival process of each flow at the subsequent NFV node and establish an approximated M/D/1 queueing model to determine the average packet delay of a flow at a decoupled processing queue of the NFV node, which is proved to be a more accurate upper bound than that obtained with a G/D/1 queueing model in both lightly- and heavily-loaded traffic conditions. Packet transmission delay over each embedded virtual link between consecutive NFV nodes is also derived for the E2E delay calculation. Simulation results demonstrate the accuracy and effectiveness of the proposed analytical E2E packet delay model for achieving delay-aware VNF chain embedding.

ACKNOWLEDGMENT

The authors would like to thank Yushi (Alfred) Cao (an Undergraduate Research Assistant) for his help in testing over OpenStack the time profiles for traffic flows traversing an NFV node.

APPENDIX A. PROOF OF PROPOSITION 1

Proof: Define $Z^{(k)}$ as the packet inter-departure time at the $k$-th server, and define $\rho^{(1)}$ as the probability that the $l$-th departing packet sees a nonempty queue at the first server. If the $l$-th departing packet sees an empty queue at the first server, let $X^{(1)}$ denote the duration from time 0 until the instant that the $(l+1)$-th packet arrives at the server. With $Y^{(q)} \geq Y^{(q-1)}$, there is no queueing delay at each of the servers following the first server. Similar to the description in Fig. 4, if the $l$-th departing packet sees a nonempty queue, we have $Z^{(k)} = Y^{(1)}$; otherwise, two cases are considered:

Case 1 – If $X^{(1)} > \sum_{q=2}^{k} Y^{(q)}$,
$$Z^{(k)} = \Big(X^{(1)} - \sum_{q=2}^{k} Y^{(q)}\Big) + \sum_{q=1}^{k} Y^{(q)} = X^{(1)} + Y^{(1)}; \quad (16)$$

Case 2 – If $X^{(1)} \leq \sum_{q=2}^{k} Y^{(q)}$,
$$Z^{(k)} = \sum_{q=1}^{k} Y^{(q)} - \Big(\sum_{q=2}^{k} Y^{(q)} - X^{(1)}\Big) = Y^{(1)} + X^{(1)}. \quad (17)$$

Hence, $Z^{(k)}$ has the same PDF as $Y_i$ derived in (4), which ends the proof.
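As an illustrative cross-check (not the paper's simulator; the arrival rate and service times below are hypothetical, chosen so that no queueing occurs after the first server, in line with the premise of the proof), the following Monte Carlo sketch passes Poisson arrivals through a small tandem of deterministic FIFO servers and compares the empirical inter-departure statistics at the first and the last server:

```python
import random

# Monte Carlo sketch of the tandem setting in Proposition 1 (hypothetical
# parameters): packets traverse k FIFO servers with deterministic per-server
# service times Y[q]; we compare inter-departure times at the first and last server.

random.seed(1)
lam = 150.0                  # assumed Poisson packet arrival rate (packets/s)
Y = [4e-3, 3e-3, 2e-3]       # assumed service times (s); the first server is the
                             # bottleneck, so downstream servers add no queueing
n = 100_000

arrivals, t = [], 0.0
for _ in range(n):
    t += random.expovariate(lam)
    arrivals.append(t)

# FIFO tandem recursion: departure(q, l) = max(arrival from server q-1, departure(q, l-1)) + Y[q]
dep_prev, dep_first = arrivals, None
for q, y in enumerate(Y):
    dep, last = [], 0.0
    for a in dep_prev:
        last = max(a, last) + y
        dep.append(last)
    if q == 0:
        dep_first = dep      # departures from the first server
    dep_prev = dep

def stats(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

z_first = [b - a for a, b in zip(dep_first, dep_first[1:])]
z_last = [b - a for a, b in zip(dep_prev, dep_prev[1:])]
print("first server inter-departure (mean, var):", stats(z_first))
print("last server  inter-departure (mean, var):", stats(z_last))
```

If the inter-departure distribution is indeed preserved along the tandem, the two printed (mean, variance) pairs should agree up to sampling noise.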

APPENDIX B. PROOF OF PROPOSITION 2

Proof: When $\lambda_i$ is small, $\sigma_i^2$ in (13) is close to that for an M/D/1 queueing system, given by [38]
$$\sigma_i^2 \approx \frac{1}{\lambda_i^2} - \frac{1}{\mu_{i,1}'^{2}}. \quad (18)$$

Hence, $W_{i,2}$ is further derived as
$$W_{i,2} \approx \frac{\lambda_i \Big[\frac{1}{\lambda_i^2} - \frac{1}{\mu_{i,1}^2} - \Big(\frac{1}{\lambda_i^2} - \frac{1}{\mu_{i,1}'^{2}}\Big)\Big]}{2\big(1 - \rho_{i,1}'\big)} = \frac{\lambda_i \Big(\frac{1}{\mu_{i,1}'^{2}} - \frac{1}{\mu_{i,1}^2}\Big)}{2\big(1 - \rho_{i,1}'\big)}. \quad (19)$$

When $\lambda_i$ becomes large, the idle duration within the inter-arrival time of successive packets of flow $i$ at $N_2$ is small, making $\sigma_i^2$ negligible; moreover, $\lambda_i$ approaches $\mu_{i,1}'$ under such a heavy traffic load, so that $1/\lambda_i^2 \approx 1/\mu_{i,1}'^{2}$. Thus, we have
$$W_{i,2} \approx \frac{\lambda_i \Big(\frac{1}{\lambda_i^2} - \frac{1}{\mu_{i,1}^2}\Big)}{2\big(1 - \rho_{i,1}'\big)} \approx \frac{\lambda_i \Big(\frac{1}{\mu_{i,1}'^{2}} - \frac{1}{\mu_{i,1}^2}\Big)}{2\big(1 - \rho_{i,1}'\big)}. \quad (20)$$

On the other hand, under both traffic load cases, $Q_{i,2}$ in the approximated M/D/1 queueing system is derived as
$$Q_{i,2} = \frac{\lambda_i}{2\mu_{i,1}'^{2}\big(1 - \rho_{i,1}'\big)} \geq W_{i,2}. \quad (21)$$

Thus, we prove that $Q_{i,2}$ is an upper bound of $W_{i,2}$ in both lightly-loaded and heavily-loaded traffic conditions, and becomes a tighter upper bound than that in the G/D/1 system in (13) when $\lambda_i$ is small.
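As a quick numerical illustration (the service rates below are hypothetical placeholders, not values from the paper), the following sketch evaluates $W_{i,2}$ from (19)/(20) and $Q_{i,2}$ from (21) and checks that the M/D/1 waiting time upper-bounds the approximated waiting time:

```python
# Numeric sketch of Proposition 2 with hypothetical rates.

mu_prime = 250.0   # assumed decoupled average service rate mu'_{i,1} (packets/s)
mu = 400.0         # assumed instantaneous service rate mu_{i,1} (packets/s)

def w_i2(lam):
    """Approximated waiting time W_{i,2}, Eqs. (19)/(20)."""
    rho_prime = lam / mu_prime
    return lam * (1.0 / mu_prime**2 - 1.0 / mu**2) / (2.0 * (1.0 - rho_prime))

def q_i2(lam):
    """M/D/1 mean waiting time Q_{i,2}, Eq. (21)."""
    rho_prime = lam / mu_prime
    return lam / (2.0 * mu_prime**2 * (1.0 - rho_prime))

for lam in (50.0, 120.0, 200.0):   # packet arrival rates below mu'_{i,1} for stability
    w, q = w_i2(lam), q_i2(lam)
    print(f"lambda_i = {lam:5.1f}: W_i,2 = {w*1e3:.3f} ms, "
          f"Q_i,2 = {q*1e3:.3f} ms, upper bound holds: {q >= w}")
```

The gap $Q_{i,2} - W_{i,2} = \lambda_i / [2\mu_{i,1}^2(1 - \rho_{i,1}')]$ is nonnegative for any positive $\mu_{i,1}$, which is consistent with the bound in (21).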

REFERENCES

[1] Q. Ye, J. Li, K. Qu, W. Zhuang, X. Shen, and X. Li, “End-to-end quality of service in 5G networks – Examining the effectiveness of a network slicing framework,” IEEE Veh. Technol. Mag., vol. 13, no. 2, pp. 65–74, Jun. 2018.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surv. Tutor., vol. 17, no. 4, pp. 2347–2376, Fourth Quarter 2015.
[3] Q. Ye and W. Zhuang, “Distributed and adaptive medium access control for Internet-of-Things-enabled mobile networks,” IEEE Internet Things J., vol. 4, no. 2, pp. 446–460, Apr. 2017.


[4] Q. Ye and W. Zhuang, “Token-based adaptive MAC for a two-hop Internet-of-Things enabled MANET,” IEEE Internet Things J., vol. 4, no. 5, pp. 1739–1753, Oct. 2017.
[5] H. Li, M. Dong, and K. Ota, “Radio access network virtualization for the social Internet of Things,” IEEE Cloud Comput., vol. 2, no. 6, pp. 42–50, Nov. 2015.
[6] W. Wu, Q. Shen, K. Aldubaikhy, N. Cheng, N. Zhang, and X. Shen, “Enhance the edge with beamforming: Performance analysis of beamforming-enabled WLAN,” in Proc. IEEE WiOpt’18, May 2018, pp. 1–6.
[7] X. Duan, C. Zhao, S. He, P. Cheng, and J. Zhang, “Distributed algorithms to compute Walrasian equilibrium in mobile crowdsensing,” IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4048–4057, May 2017.
[8] G. Yang, S. He, and Z. Shi, “Leveraging crowdsourcing for efficient malicious users detection in large-scale social networks,” IEEE Internet Things J., vol. 4, no. 2, pp. 330–339, Apr. 2017.
[9] L. Lyu, C. Chen, Z. Shanying, and X. Guan, “5G enabled co-design of energy-efficient transmission and estimation for industrial IoT systems,” IEEE Trans. Ind. Informat., to appear, doi: 10.1109/TII.2018.2799685.
[10] F. Lyu, H. Zhu, H. Zhou, W. Xu, N. Zhang, M. Li, and X. Shen, “SS-MAC: A novel time slot-sharing MAC for safety messages broadcasting in VANETs,” IEEE Trans. Veh. Technol., to appear, doi: 10.1109/TVT.2017.2780829.
[11] W. Xu, H. A. Omar, W. Zhuang, and X. Shen, “Delay analysis of in-vehicle Internet access via on-road WiFi access points,” IEEE Access, vol. 5, pp. 2736–2746, Feb. 2017.
[12] R. Mijumbi, J. Serrat, J. L. Gorricho, N. Bouten, F. D. Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Commun. Surv. Tutor., vol. 18, no. 1, pp. 236–262, First Quarter 2016.
[13] L. Wang, Z. Lu, X. Wen, R. Knopp, and R. Gupta, “Joint optimization of service function chaining and resource allocation in network function virtualization,” IEEE Access, vol. 4, pp. 8084–8094, Nov. 2016.
[14] F. Bari, S. R. Chowdhury, R. Ahmed, R. Boutaba, and O. C. M. B. Duarte, “Orchestrating virtualized network functions,” IEEE Trans. Netw. Serv. Manage., vol. 13, no. 4, pp. 725–739, Dec. 2016.
[15] N. Zhang, P. Yang, S. Zhang, D. Chen, W. Zhuang, B. Liang, and X. Shen, “Software defined networking enabled wireless network virtualization: Challenges and solutions,” IEEE Netw., vol. 31, no. 5, pp. 42–49, May 2017.
[16] S. Q. Zhang, Q. Zhang, H. Bannazadeh, and A. Leon-Garcia, “Routing algorithms for network function virtualization enabled multicast topology on SDN,” IEEE Trans. Netw. Serv. Manage., vol. 12, no. 4, pp. 580–594, Dec. 2015.
[17] J. Ordonez-Lucena, P. Ameigeiras, D. Lopez, J. J. Ramos-Munoz, J. Lorca, and J. Folgueira, “Network slicing for 5G with SDN/NFV: Concepts, architectures, and challenges,” IEEE Commun. Mag., vol. 55, no. 5, pp. 80–87, May 2017.
[18] X. Li, J. Rao, H. Zhang, and A. Callard, “Network slicing with elastic SFC,” in Proc. IEEE VTC’17, Sept. 2017, pp. 1–5.
[19] J. Li, N. Zhang, Q. Ye, W. Shi, W. Zhuang, and X. Shen, “Joint resource allocation and online virtual network embedding for 5G networks,” in Proc. IEEE GLOBECOM’17, Dec. 2017, pp. 1–6.
[20] M. Mechtri, C. Ghribi, and D. Zeghlache, “A scalable algorithm for the placement of service function chains,” IEEE Trans. Netw. Serv. Manage., vol. 13, no. 3, pp. 533–546, Sept. 2016.
[21] B. A. A. Nunes, M. Mendonca, X. N. Nguyen, K. Obraczka, and T. Turletti, “A survey of software-defined networking: Past, present, and future of programmable networks,” IEEE Commun. Surv. Tutor., vol. 16, no. 3, pp. 1617–1634, Third Quarter 2014.
[22] H. Li, K. Ota, M. Dong, and M. Guo, “Mobile crowdsensing in software defined opportunistic networks,” IEEE Commun. Mag., vol. 55, no. 6, pp. 140–145, Jun. 2017.
[23] H. Li, M. Dong, and K. Ota, “Control plane optimization in software-defined vehicular ad hoc networks,” IEEE Trans. Veh. Technol., vol. 65, no. 10, pp. 7895–7904, Oct. 2016.
[24] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, “Live migration of virtual machines,” in Proc. ACM USENIX’05, May 2005, pp. 273–286.
[25] S. Akoush, R. Sohan, A. Rice, A. W. Moore, and A. Hopper, “Predicting the performance of virtual machine migration,” in Proc. IEEE MASCOTS’10, Aug. 2010, pp. 37–46.
[26] S. Ayoubi, C. Assi, K. Shaban, and L. Narayanan, “MINTED: Multicast virtual network embedding in cloud data centers with delay constraints,” IEEE Trans. Commun., vol. 63, no. 4, pp. 1291–1305, Apr. 2015.
[27] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica, “Multi-resource fair queueing for packet processing,” ACM SIGCOMM Comput. Commun. Rev., vol. 42, no. 4, pp. 1–12, Oct. 2012.
[28] W. Wang, B. Liang, and B. Li, “Multi-resource generalized processor sharing for packet processing,” in Proc. ACM IWQoS’13, Jun. 2013, pp. 1–10.
[29] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, “Dominant resource fairness: Fair allocation of multiple resource types,” in Proc. ACM NSDI’11, Apr. 2011, pp. 24–37.
[30] F. C. Chua, J. Ward, Y. Zhang, P. Sharma, and B. A. Huberman, “Stringer: Balancing latency and resource usage in service function chain provisioning,” IEEE Internet Comput., vol. 20, no. 6, pp. 22–31, Nov. 2016.
[31] B. Xiong, K. Yang, J. Zhao, W. Li, and K. Li, “Performance evaluation of OpenFlow-based software-defined networks based on queueing model,” Comput. Netw., vol. 102, pp. 172–185, Jun. 2016.
[32] M. Jarschel, S. Oechsner, D. Schlosser, R. Pries, S. Goll, and P. Tran-Gia, “Modeling and performance evaluation of an OpenFlow architecture,” in Proc. 23rd ITC, Sept. 2011, pp. 1–7.
[33] N. Egi et al., “Understanding the packet processing capability of multi-core servers,” Intel Tech. Rep., 2009.
[34] A. K. Parekh and R. G. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: The single-node case,” IEEE/ACM Trans. Netw., vol. 1, no. 3, pp. 344–357, Jun. 1993.
[35] Z.-L. Zhang, D. Towsley, and J. Kurose, “Statistical analysis of the generalized processor sharing scheduling discipline,” IEEE J. Sel. Areas Commun., vol. 13, no. 6, pp. 1071–1080, Aug. 1995.
[36] D. C. Parkes, A. D. Procaccia, and N. Shah, “Beyond dominant resource fairness: Extensions, limitations, and indivisibilities,” ACM Trans. Econ. Comput., vol. 3, no. 1, p. 3, Mar. 2015.
[37] A. Gutman and N. Nisan, “Fair allocation without trade,” in Proc. ACM AAMAS’12, Jun. 2012, pp. 719–728.
[38] D. P. Bertsekas, R. G. Gallager, and P. Humblet, Data Networks. Englewood Cliffs, NJ, USA: Prentice-Hall, 1987, vol. 2.
[39] L. Kleinrock, Queueing Systems, Volume 1: Theory. New York: Wiley, 1976.
[40] S. Larsen, P. Sarangam, R. Huggahalli, and S. Kulkarni, “Architectural breakdown of end-to-end latency in a TCP/IP network,” Int. J. Parallel Program., vol. 37, no. 6, pp. 556–571, Dec. 2009.
[41] “OMNeT++ 5.0,” [Online]. Available: http://www.omnetpp.org/omnetpp.
[42] A. Liska and G. Stowe, DNS Security: Defending the Domain Name System. Syngress, 2016.
[43] “OpenStack (Release Pike),” [Online]. Available: https://www.openstack.org.

Qiang Ye (S’16-M’17) received the B.S. degree in network engineering and the M.S. degree in communication and information system from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2009 and 2012, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2016. He has been a Post-Doctoral Fellow with the Department of Electrical and Computer Engineering, University of Waterloo, since 2016. His current research interests include SDN and NFV, network slicing for 5G networks, VNF chain embedding and end-to-end performance analysis, and medium access control and performance optimization for mobile ad hoc networks and the Internet of Things.


Weihua Zhuang (M’93-SM’01-F’08) has been with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, since 1993, where she is a Professor and a Tier I Canada Research Chair in Wireless Communication Networks. She is the recipient of the 2017 Technical Recognition Award from the IEEE Communications Society Ad Hoc & Sensor Networks Technical Committee, one of 2017 ten N2Women (Stars in Computer Networking and Communications), and a co-recipient of several best paper awards from IEEE conferences. Dr. Zhuang was the Editor-in-Chief of the IEEE Transactions on Vehicular Technology (2007-2013), Technical Program Chair/Co-Chair of IEEE VTC Fall 2017 and Fall 2016, and the Technical Program Symposia Chair of IEEE GLOBECOM 2011. She is a Fellow of the IEEE, the Royal Society of Canada, the Canadian Academy of Engineering, and the Engineering Institute of Canada. Dr. Zhuang is an elected member of the Board of Governors and VP Publications of the IEEE Vehicular Technology Society. She was an IEEE Communications Society Distinguished Lecturer (2008-2011).

Xu Li is a staff researcher at Huawei Technologies Inc., Canada. He received a Ph.D. (2008) degree from Carleton University, an M.Sc. (2005) degree from the University of Ottawa, and a B.Sc. (1998) degree from Jilin University, China, all in computer science. Prior to joining Huawei, he worked as a research scientist (with tenure) at Inria, France. His current research interests are focused on 5G system design and standardization, along with 90+ refereed scientific publications, 40+ 3GPP standard proposals, and 50+ patents and patent filings. He is/was on the editorial boards of the IEEE Communications Magazine and the IEEE Transactions on Parallel and Distributed Systems, among others. He was a TPC co-chair of the IEEE VTC 2017 (Fall) LTE, 5G and Wireless Networks Track and the IEEE GLOBECOM 2013 Ad Hoc and Sensor Networking Symposium. He was a recipient of NSERC PDF awards, the IEEE ICNC 2015 best paper award, and a number of other awards.

Jaya Rao (M’14) received his B.S. and M.S. degrees in Electrical Engineering from the University of Buffalo, New York, in 2001 and 2004, respectively, and his Ph.D. degree from the University of Calgary, Canada, in 2014. He is currently a Senior Research Engineer at Huawei Technologies Canada, Ottawa. Since joining Huawei in 2014, he has worked on research and design of CIoT, URLLC, and V2X based solutions in 5G New Radio. He has contributed for Huawei at 3GPP RAN WG2, RAN WG3, and SA2 meetings on topics related to URLLC, network slicing, mobility management, and session management. From 2004 to 2010, he was a Research Engineer at Motorola Inc. He was a recipient of the Best Paper Award at IEEE WCNC 2014.