Sungkyunkwan University
Mobile Computing
Softwarization and Machine Learning
for Intelligent Networks
Dr. Syed M. Raza and Prof. Hyunseung Choo
[email protected], [email protected]
8th November, 2020
Outline (1/3)
Limitations of conventional networks
Software-defined networking
► Historical perspective
► Architecture
► OpenFlow
► Testbeds: GENI, OFELIA, KOREN
► Mininet: introduction, features and benefits, video, extensions
► Controllers: overview, ONOS, Floodlight
► Wrap up
Outline (2/3)
Network function virtualization (NFV)
► Overview
► Practical example
► Architectural framework: complete framework, NFV orchestrator, Virtual Network Function (VNF) manager
► Containers: architecture, differences from virtual machines, available container platforms, practical example
► Wrap up
Network management and control for future networks
► Network management based on Machine Learning (ML)
► ML approaches
► ML techniques
Outline (3/3)
► Neural Network (NN): introduction video, overview, neuron, activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, Swish), training procedure, loss functions (MSE, BCE, CCE, SCCE), example of learning, different errors, improvement of results
► Deep Learning (DL): Deep Neural Network (DNN), DL models
Concluding remarks
Analogy for Intelligent Networks
An intelligent computer network can be understood by analogy with the human nervous system
(Figure: the human nervous system compared with a conventional network)
Conventional Networks (1/3)
Conventional Networks (2/3)
Conventional networks are not intelligent; their intelligence lies in the end hosts
It is hard to make conventional networks intelligent because:
► Control and networking protocols are distributed among the devices
► The control and data planes are tightly coupled
► There is no common view of the network for learning and decision making
(Figure: coupled data and control planes)
Conventional Networks (3/3)
Changes are required in the architecture of conventional networks to make them intelligent:
► Decoupling of the data and control planes
► Centralized network control
► A common network view
One way to make these changes is through softwarization, an approach called Software-Defined Networking (SDN)
(Figure: a decoupled control plane and data plane)
Software Defined Networking (1/3): Definition
A network with the following four features can be called a Software-Defined Network:
► Decoupled control and data planes: control functionality is removed from network devices, which become simple packet-forwarding devices
► Flow-based forwarding decisions: a flow is a sequence of packets between two endpoints that receives identical service at the forwarding devices
► External control logic: a software platform that enables programming of the forwarding devices, often called the SDN controller or Network Operating System (NOS)
► Network programmability: software applications running on top of the NOS that interact with the underlying data plane devices
Software Defined Networking (2/3): Historical Perspective
The term Software-Defined Networking originated around ideas and work related to OpenFlow at Stanford University [1]
Before SDN, there were many attempts at softwarizing networks
Overview of the history of programmable networks:
Category | Pre-SDN initiatives | More recent SDN developments
Data plane programmability | Xbind, IEEE P1520, smart packets, ANTS, SwitchWare, Calvert, high performance router, NetScript, Tennenhouse | ForCES, OpenFlow, POF
Control and data plane decoupling | NCP, GSMP, Tempest, ForCES, RCP, SoftRouter, PCE, 4D, IRSCP | SANE, Ethane, OpenFlow, NOX, POF
Network virtualization | Tempest, MBone, 6Bone, RON, PlanetLab, Impasse, GENI, VINI | Open vSwitch, Mininet, FlowVisor, NVP
Network operating system | Cisco IOS, JUNOS, ExtremeXOS, SR OS | NOX, Onix, ONOS
Technology pull initiatives | Open Signaling | ONF
D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, "Software-Defined Networking: A Comprehensive Survey," Proceedings of the IEEE, vol. 103, no. 1, 2015
Software Defined Networking (3/3): Architecture
Y. Zhang, L. Cui, W. Wang, and Y. Zhang, "A survey on software defined networking with multiple controllers," Journal of Network and Computer Applications, vol. 103, 2018
OpenFlow (1/3): Introduction
OpenFlow is a protocol that provides a standard interface for
programming the data plane switches
(Figure: end systems connect through OpenFlow switches, each containing flow tables, a group table, and a meter table in software on top of hardware/firmware; a secure channel carries the OpenFlow protocol over SSL to the SDN controller)
OpenFlow (2/3): Flow Entry
An entry in the Flow Table has three fields
► The rule: defines the flow; it consists of fields from the packet header
► The action: defines how matching packets should be processed
► Statistics: count the number of packets and bytes for each flow, and the time since the last packet matched the flow (an illustrative sketch follows below)
Source: http://docs.ruckuswireless.com/fastiron/08.0.61/fastiron-08061-sdnguide/GUID-031030CA-62EC-4009-A516-5510238EF8F4.html
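
As a purely illustrative sketch (the field names below are hypothetical, not OpenFlow wire-format names), a flow entry can be pictured as a Python record with the three fields described above:

    # Illustrative sketch only: field names are hypothetical and simply
    # mirror the rule/action/statistics structure described above.
    flow_entry = {
        "rule": {                       # match fields from the packet header
            "in_port": 1,
            "eth_type": 0x0800,         # IPv4
            "ipv4_dst": "10.0.0.2",
        },
        "action": ["output:2"],         # forward matching packets to port 2
        "statistics": {
            "packet_count": 0,          # packets matched so far
            "byte_count": 0,            # bytes matched so far
            "idle_time_sec": 0,         # time since the last matching packet
        },
    }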
OpenFlow (3/3): Functioning
Upon a packet's arrival at an OpenFlow switch:
► The packet is matched against the flow entries in the Flow Table
► If a header field matches, the corresponding action is executed
► The counters are updated
If the packet does not match any flow entry, it is sent to the controller over the secure channel (a controller-side sketch follows below)
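
Below is a minimal, hedged sketch of this table-miss behavior using the Ryu controller framework (listed later among the research-oriented controllers); it assumes OpenFlow 1.3 and is simplified from the style of Ryu's example switch applications:

    # Minimal sketch, assuming Ryu is installed and the switch speaks OpenFlow 1.3
    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class TableMissHandler(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
        def packet_in_handler(self, ev):
            # A packet that matched no flow entry is sent here by the switch
            msg = ev.msg
            dp = msg.datapath
            ofp, parser = dp.ofproto, dp.ofproto_parser
            # Install a flow entry so future packets from this input port are
            # handled by the switch itself (flooded here, for simplicity)
            match = parser.OFPMatch(in_port=msg.match['in_port'])
            actions = [parser.OFPActionOutput(ofp.OFPP_FLOOD)]
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
            dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=1,
                                          match=match, instructions=inst))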
SDN Testbeds (1/3): GENI
GENI is an open infrastructure for at-scale networking and distributed systems research and education that spans the US
M. Berman et al., "GENI: A federated testbed for innovative network experiments," Computer Networks, vol. 61, 2014
SDN Testbeds (2/3): OFELIA
OFELIA – Pan-European OpenFlow Testbed
• Berlin, Germany (TUB)
• Ghent, Belgium (IBBT)
• Zurich, Switzerland (ETH)
• Barcelona, Spain (i2CAT)
• Bristol, UK (UNIVBRIS)
• Catania, Italy (CNIT)
• Rome, Italy (CNIT)
• Trento, Italy (CREATE-NET)
• Pisa, Italy (CNIT, 2 locations)
• Uberlândia, Brazil (UFU)
• Castelldefels, Spain (CTTC)
M. Gerola et al., "Demonstrating inter-testbed network virtualization in OFELIA SDN experimental facility," IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Turin, 2013
SDN Testbeds (3/3): KOREN
A multi-purpose Korean backbone network connecting 8 metropolitan areas (Suwon, Daejeon, Gwangju, Daegu, Busan, Gangwon, Jeju) with link speeds from 10 Gbps to 160 Gbps
http://www.koren.kr/koren/eng/net/natworkmap.html?cate=3&menu=1
Mininet (1/4): Introduction
An emulated network environment that can run on a single PC
► Mininet uses lightweight virtualization to make a single system look like a complete network
► It runs real kernel, switch, and application code on a single machine
► Internally, it uses Linux containers to emulate hosts, switches, and links
Command-line, UI, and Python interfaces (see the sketch below)
Many OpenFlow features are built in
► Useful for developing, deploying, and sharing
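
A minimal sketch of the Python interface (assuming Mininet is installed; the topology name is our own):

    # Minimal Mininet Python API sketch: one switch, two hosts.
    # Run as root on a machine with Mininet installed: sudo python3 demo.py
    from mininet.net import Mininet
    from mininet.topo import Topo
    from mininet.log import setLogLevel

    class SingleSwitchTopo(Topo):
        def build(self):
            s1 = self.addSwitch('s1')
            h1, h2 = self.addHost('h1'), self.addHost('h2')
            self.addLink(h1, s1)
            self.addLink(h2, s1)

    if __name__ == '__main__':
        setLogLevel('info')
        net = Mininet(topo=SingleSwitchTopo())  # default controller and switches
        net.start()
        net.pingAll()   # check connectivity between the emulated hosts
        net.stop()

The command-line interface offers the same, e.g. sudo mn --topo single,2 builds an equivalent topology.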
Mininet (2/4): Features and Benefits
Reasons to use Mininet
► Fast
► Possible to create many custom topologies
► Can run real programs (anything that can run on Linux can run in a Mininet
host)
► Programmable OpenFlow switches
► Easy to use
► Open source
Alternatives to Mininet and their limitations
► Real systems: difficult to configure and expensive
► Networked VMs: limited scalability
► Simulators: no path to hardware deployment
Mininet (3/4): Introduction Video
This introductory video of Mininet shows:
► Basic information about Mininet (i.e., where to download and where to get
guidelines)
► Basic initiation of Mininet
► Different available commands and their functions
► Different simple topologies
► How OpenFlow packets can be observed in Wireshark
Mininet (4/4): Extensions
Mininet-WiFi
► An emulator for software-defined wireless networks
► It adds wireless functionality to the hosts and the switches
MaxiNet
► It extends the Mininet emulation environment to span several physical machines
► This makes it possible to emulate very large software-defined networks
MiniNExT
► It extends Mininet to make it easier to build complex networks
OpenNet
► An emulator for software-defined wireless local area networks and software-defined LTE backhaul networks
SDN Controllers (1/5)
Many SDN controllers with different functionalities and purposes are available
Generally, SDN controllers fall into two categories:
► Industry oriented: OpenDaylight, ONOS, Faucet, Floodlight
► Research oriented: NOX, Beacon, Ryu
SDN Controllers (3/5): ONOS
The Open Network Operating System (ONOS) is an open-source SDN NOS targeted at service provider and mission-critical networks
It was developed and led by ON.Lab and is now under the Linux Foundation
The major goals of ONOS are:
► Carrier-grade features (availability and performance) in the control plane
► Web-style agility
► Migration of existing networks to white boxes
► Innovation in both network hardware and software, on independent time scales
https://onosproject.org/
SDN Controllers (5/5): Floodlight
Floodlight is an open, enterprise-class OpenFlow controller
It was introduced by Big Switch Networks and is still backed by engineers at Big Switch Networks
Floodlight can handle mixed OpenFlow and non-OpenFlow network "islands"
Floodlight is designed for high performance and is multi-threaded from the ground up
http://www.projectfloodlight.org/floodlight/
SDN Takeaway
The logically centralized control plane and the complete network view provided by SDN form an ideal platform for implementing learning algorithms that control the network
Network Function Virtualization (NFV) (1/3)
Middle boxes in networks perform different functions, such as firewalls, gateways, and load balancers
Network Functions Virtualization (NFV) virtualizes network functions (VNFs) previously carried out by dedicated hardware
(Figure: dedicated middle boxes replaced by VNFs; BRAS: Broadband Remote Access Server)
Network Function Virtualization (NFV) (3/3): A Practical Example
UE: user equipment
VM: virtual machine
MME: mobility management entity
SGW: serving gateway
PGW: packet data network gateway
HSS: home subscriber server
PCRF: policy and charging rules function
(Figure: the conventional LTE architecture compared with a virtualized LTE architecture, where the core functions run as VMs)
NFV Architectural Framework (1/3)
NFV Management and Orchestration (MANO): responsible for the management of VNFs and services
MANO has the following main components:
► Orchestrator
► VNF Manager (VNFM)
► Virtualized Infrastructure Manager (VIM)
These manage the VNFs running on the NFV Infrastructure (NFVI)
ETSI GS NFV-MAN 001 V1.1.1 (2014-12): https://www.etsi.org/deliver/etsi_gs/NFV-MAN/001_099/001/01.01.01_60/gs_NFV-MAN001v010101p.pdf
NFV Architectural Framework (2/3)
NFV Orchestrator
► Creates, maintains, and tears down network services composed of VNFs; when a service spans multiple VNFs, the orchestrator enables creation of the end-to-end service across them
► Responsible for global resource management of NFVI resources, for example managing the compute, storage, and networking resources across multiple VIMs in the network
► Performs its functions NOT by talking directly to VNFs, but through the VNFM and the VIM
NFV Architectural Framework (3/3)
VNF Manager (VNFM)
► Manages one or multiple VNFs
► Handles life cycle management of VNF instances (a toy sketch follows below)
► Can perform the same functions as an EMS, but through an open interface/reference point (Ve-Vnfm)
VIM (Virtualized Infrastructure Manager)
► The management system for the NFVI
► Responsible for controlling and managing the NFVI compute, network, and storage resources within one operator's infrastructure domain
► Responsible for the collection of performance measurements and events
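
The following is a toy Python sketch of the VNFM's life cycle responsibilities; it is not an ETSI-defined API, and all class and method names are hypothetical:

    # Toy sketch only: class and method names are hypothetical, not ETSI APIs
    class VNFManager:
        def __init__(self, vim):
            self.vim = vim          # resources are requested through the VIM
            self.instances = {}

        def instantiate(self, vnf_id, descriptor):
            # Life cycle step 1: reserve NFVI resources via the VIM, then boot
            resources = self.vim.allocate(descriptor["requirements"])
            self.instances[vnf_id] = {"resources": resources, "state": "RUNNING"}

        def scale(self, vnf_id, replicas):
            # Life cycle step 2: grow or shrink a running VNF
            self.instances[vnf_id]["replicas"] = replicas

        def terminate(self, vnf_id):
            # Life cycle step 3: release the instance and its NFVI resources
            self.vim.release(self.instances.pop(vnf_id)["resources"])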
Containers (1/4)
Virtualization of the application instead of the hardware
Runs on top of the core OS (Linux or Windows)
Does not require dedicated CPU, memory, or network resources; these are managed by the core OS
Optimizes infrastructure for speed and density (see the sketch below)
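
A small sketch using the Docker SDK for Python (an assumption: pip install docker, with a local Docker daemon running):

    # Sketch: run an application container with the Docker SDK for Python
    import docker

    client = docker.from_env()              # connect to the local Docker daemon
    # The application (nginx) is virtualized; CPU, memory, and networking
    # remain managed by the core OS rather than a hypervisor
    container = client.containers.run("nginx:latest", detach=True,
                                      ports={"80/tcp": 8080})
    print(container.short_id, container.status)
    container.stop()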
Containers (2/4)
Containers (3/4)
There are various container platforms available
► Docker (the most popular)
► Linux Containers (LXC)
► LXD, the next-generation Linux container manager
► Solaris Zones (Oracle)
► rkt (CoreOS)
Container orchestration platforms
► Google Kubernetes
► Docker Swarm
► Amazon ECS
► Azure Container Service
► CoreOS Fleet
Containers (4/4)
Agones: dedicated game servers for online gaming, built on Kubernetes
https://agones.dev/site/
(Figure: players connecting to Agones-managed game servers running on Kubernetes)
NFV Takeaway
NFV, together with SDN, completes the softwarization of network devices and functions
MANO provides the platform to implement learning algorithms for improved deployment of VNFs and better resource utilization
Network Management and Control for Future Networks (1/2)
AI-based intelligent resource management and control for future wired networks and 6G
► Resource placement and allocation optimization, network personalization, Radio Access Network (RAN) design, etc.
Currently, optimal solutions are obtained by applying exhaustive search methods, genetic algorithms, combinatorial optimization, and branch-and-bound techniques
These incur significantly high time and computational complexity
Network Management and Control for Future Networks (2/2)
Sub-optimal solutions are obtained with techniques such as Lagrangian relaxation, iterative distributed optimization, heuristic algorithms, and game theory
Computation-intensive algorithms may not be feasible for large wired and cellular networks due to high control overhead
Sub-optimal solutions can be far from the optimal ones, and their convergence properties and optimality gap may be unknown
Network Management based on Machine Learning (ML) (1/2)
With machine learning (ML), the required information is learned directly from data samples
ML can be used to obtain practical solutions for radio resource allocation problems in large wired and cellular networks, given past optimal or near-optimal resource allocation decisions
Pros
► ML-based resource allocation algorithms can be implemented online
► Lower cost and faster development
Cons
► No performance guarantee (suboptimal performance)
► Lack of interpretability (black-box mapping of inputs to outputs)
► Depends on the availability of data
Network Management based on Machine Learning (ML) (2/2)
ML-based solutions are not feasible for every scenario
ML-based solutions should be used in one of the following scenarios:
► There is no mathematical model or efficient algorithm (modeling and/or algorithmic deficit)
► The task involves a function that maps well-defined inputs to well-defined outputs
► The function does not change rapidly over time
► Large datasets can be made available
► Errors can be tolerated and optimal solutions are not required
ML Approaches
(Figure: four panels illustrating regression, classification, clustering, and anomaly detection)
ML Approaches: Regression (1/2)
(Figure: regression example)
ML Approaches: Regression (2/2)
Application Example: Localization
F. Vanheel, J. Verhaevert, E. Laermans, I. Moerman, and P. Demeester, "Automated linear regression tools improve RSSI WSN localization in multipath indoor environment," EURASIP Journal on Wireless Communications and Networking, 2011, pp. 1-27
ML Approaches: Classification (1/2)
(Figure: classification into classes C1, C2, and C3)
ML Approaches: Classification (2/2)
Application Example: System Recognition
X. Zheng, Z. Cao, J. Wang, Y. He, and Y. Liu, "ZiSense: Towards interference resilient duty cycling in wireless sensor networks," ACM Conference on Embedded Network Sensor Systems (SenSys), November 2014
ML Approaches: Clustering (1/2)
(Figure: clustering into clusters C1, C2, and C3)
ML Approaches: Clustering (2/2)
Application Example: System Identification
N. Shetty, S. Pollin, and P. Pawełczak, "Identifying Spectrum Usage by Unknown Systems using Experiments in Machine Learning," IEEE Wireless Communications and Networking Conference (WCNC), April 2009
ML Approaches: Anomaly Detection (1/2)
(Figure: anomaly detection example)
ML Approaches: Anomaly Detection (2/2)
Application Example: Device Fingerprinting
A. S. Uluagac, S. V. Radhakrishnan, C. Corbett, A. Baca, and R. Beyah, "A passive technique for fingerprinting wireless devices with wired-side observations," IEEE Conference on Communications and Network Security (CNS), October 2013
ML Techniques (1/2)
Categories:
► Supervised learning
► Unsupervised learning
► Reinforcement learning
Supervised learning (see the sketch below)
► Given the dataset 𝐷 = {(𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑁, 𝑦𝑁)}
► Predict a 𝑦 that generalizes the input-output mapping in 𝐷 to inputs 𝑥 outside 𝐷
► Classification (discrete output) and regression (continuous output) problems
Classification: predict which class 𝑥 belongs to
Regression: predict a numerical value from 𝑥
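
A minimal sketch of this supervised setting with scikit-learn (the toy dataset is ours; KNN is one of the techniques listed on the next slide):

    # Learn the input-output mapping in D and predict y for an unseen x
    from sklearn.neighbors import KNeighborsClassifier

    # Toy dataset D = {(x_i, y_i)}: 2-D feature vectors with class labels
    X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
    y = [0, 0, 1, 1]

    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X, y)                     # generalize the mapping in D
    print(model.predict([[0.8, 0.9]]))  # classify an x outside D -> [1]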
ML Techniques (2/2)
Common supervised learning techniques
► Bayesian classification
► K-Nearest Neighbors (KNN)
► Neural Network (NN)
► Support Vector Machine (SVM)
► Decision Tree (DT) classification
► Recommender systems
Neural Networks (NNs) (1/17)
This introductory video of NNs shows:
► Basic structure of NNs
► Fundamental components of NNs
► Forward and backward propagation
Neural Networks (NNs) (2/17)
A NN defines a mapping 𝑔(𝑥, 𝜃): ℝ^n → ℝ^k from an input vector 𝑥 ∈ ℝ^n to an output vector 𝑦 ∈ ℝ^k
It consists of basic components known as neurons or nodes
Three layers:
► Input layer
► Hidden layer
► Output layer
Nodes can perform non-linear functions
(Figure: a neural network)
Neural Networks (NNs) (3/17)
A neuron 𝑘 in the hidden layer can be defined as:
𝑣_𝑘 = Σ_{𝑗=1}^{𝑚} 𝑤_{𝑘𝑗} 𝑥_𝑗 + 𝑏_𝑘
𝑦_𝑘 = 𝜑(𝑣_𝑘)
Commonly used learning algorithm:
► The backpropagation algorithm
Gradient descent is a common algorithm for optimizing the weights of the neurons based on the gradient of the loss function (a NumPy sketch follows below)
(Figure: non-linear model of a neuron)
Prof. Ekram Hossain, “Radio Resource Allocation in the Beyond 5G Era: Promises of Deep Learning and Deep Reinforcement Learning,” Lecture slides
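
A NumPy sketch of this neuron model (the sigmoid is one possible choice for 𝜑):

    # v_k = sum_j w_kj * x_j + b_k, then y_k = phi(v_k)
    import numpy as np

    def neuron(x, w, b):
        v = np.dot(w, x) + b                 # weighted sum plus bias
        return 1.0 / (1.0 + np.exp(-v))      # activation phi (sigmoid here)

    x = np.array([0.5, -1.0, 0.25])          # inputs x_j
    w = np.array([0.8, 0.2, -0.5])           # weights w_kj
    print(neuron(x, w, b=0.1))               # neuron output y_k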
Neural Networks (NNs) (4/17)
Activation Functions (1/5)
Sigmoid: 𝜑(𝑥) = 1 / (1 + 𝑒^(−𝑥))
Advantages:
► Smooth gradient, preventing high variation in output values
► Output values are bounded between 0 and 1, normalizing the output of each node
► Clear predictions, as inputs above 2 or below −2 are mapped close to 1 or 0
Disadvantages:
► Vanishing gradient for very high or very low input values, which can result in the network refusing to learn further
► Computationally expensive
Network applications:
► Used in problems requiring prediction and classification
► E.g., threat prediction for network security
Neural Networks (NNs) (5/17)
Activation Functions (2/5)
TanH or hyperbolic tangent: tanh(𝑥) = (𝑒^𝑥 − 𝑒^(−𝑥)) / (𝑒^𝑥 + 𝑒^(−𝑥))
Advantages:
► As the outputs are zero-centered, it is easier to handle inputs that are strongly positive, neutral, or strongly negative
► The remaining characteristics are similar to those of the Sigmoid function
Disadvantages:
► Similar to those of the Sigmoid function
Network applications:
► Also used for prediction applications in networks
► E.g., next Point of Attachment (PoA) prediction for proactive mobility, flow priority prediction for admission control, and congestion prediction in networks
Neural Networks (NNs) (6/17)
Activation Functions (3/5)
Rectified Linear Unit (ReLU): 𝜑(𝑥) = max(0, 𝑥)
Advantages:
► Computationally efficient, allowing the network to converge very quickly
► Although it looks like a linear function, it has a derivative and allows for backpropagation
Disadvantages:
► The dying ReLU problem: when inputs approach zero or are negative, the gradient of the function becomes zero and the model cannot learn
Network applications:
► Used in network applications whose datasets consist mostly of values greater than 0
► E.g., network traffic prediction for traffic engineering, and prediction of the available capacity of a network link
Neural Networks (NNs) (7/17)
Activation Functions (4/5)
Leaky Rectified Linear Unit (Leaky ReLU): 𝜑(𝑥) = max(0.1𝑥, 𝑥)
Advantages:
► Prevents the dying ReLU problem, since it has a small positive slope in the negative region
► Otherwise it behaves like ReLU
Disadvantages:
► Can produce inconsistent results for negative input values
Network applications:
► Similar to those of ReLU
► E.g., load prediction for proactive scaling of VNFs, and integrity prediction of virtualized infrastructure
Neural Networks (NNs) (8/17)
Activation Functions (5/5)
Swish: 𝜎(𝑥) = 𝑥 / (1 + 𝑒^(−𝑥))
Advantages:
► A self-gated activation function recently proposed by researchers at Google Brain
► Performs better than ReLU with a similar level of computational efficiency
Network applications:
► Can be used for multi-class classification applications in networks
► E.g., network traffic classification for differentiated user services, and anomaly classification for effective security applications (all five activation functions are sketched below)
P. Ramachandran, B. Zoph, and Q. V. Le, "Swish: A Self-Gated Activation Function," arXiv:1710.05941, 2017
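
The five activation functions above, sketched in NumPy:

    import numpy as np

    def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))        # (0, 1)
    def tanh(x):       return np.tanh(x)                      # (-1, 1), zero-centered
    def relu(x):       return np.maximum(0.0, x)              # zero for x < 0
    def leaky_relu(x): return np.maximum(0.1 * x, x)          # small negative slope
    def swish(x):      return x * sigmoid(x)                  # x / (1 + e^(-x))

    x = np.linspace(-3, 3, 7)
    for f in (sigmoid, tanh, relu, leaky_relu, swish):
        print(f.__name__, np.round(f(x), 3))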
Neural Networks (NNs) (9/17): Training Procedure (1/2)
Target function: 𝑦 = 𝑓*(𝑥)
► 𝑥 is the input vector and 𝑦 is the output vector
Model: 𝑦 = 𝑓(𝑥; 𝜃), where 𝜃 denotes the unknown parameters, i.e., the weights and biases
The goal is to learn 𝜃 precisely, so that our model comes close to the original function
A training dataset composed of inputs and outputs is typically used to train the model
Initialize the weights and biases randomly and feed the inputs to the input layer
The output of the input layer is used as input for the hidden layer, and data thus propagates through the hidden layers to the output layer
Neural Networks (NNs) (10/17): Training Procedure (2/2)
Forward pass: propagation of information from the input layer to the output layer
Loss function: determines the quality of the model by calculating the error between the predicted and the actual value, e.g., mean squared error (MSE)
Backward pass: the error signal is propagated backward through the hidden layers, updating the 𝜃 in each layer
The training process continues until the error rate falls below a threshold value
Epoch: one training cycle in which the training data completes a forward and a backward pass
Neural Networks (NNs) (11/17)
Loss Functions (1/4)
Mean Squared Error (MSE)
► MSE loss is generally used for regression tasks
► It is calculated by taking the mean of the squared differences between the actual (target) and predicted values
► It performs best when the output of the NN model is a real number
Neural Networks (NNs) (12/17)
Loss Functions (2/4)
Binary Cross-Entropy (BCE)
► BCE loss is used for binary classification tasks
► The output value should be passed through a sigmoid activation function, so the output range is (0, 1)
► While training the network, the target value fed to the network should be 1 if true and otherwise 0
Neural Networks (NNs) (13/17)
Loss Functions (3/4)
Categorical Cross-Entropy (CCE)
► CCE loss is used for multi-class classification tasks
► The number of nodes in the output layer is the same as the number of classes
► The output layer nodes use the softmax function, so each node's output is a probability value in (0, 1), and the target values must be fed as one-hot vectors
► A one-hot vector has the same size as the number of classes; the index position corresponding to the actual class is 1 and all others are 0
Neural Networks (NNs) (14/17)
Loss Functions (4/4)
Sparse Categorical Cross-Entropy (SCCE)
► The same as CCE, with only one difference
► Target values do not need to be passed as one-hot vectors
► Only the index of the class is passed as the target value (sketches of all four losses follow below)
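
NumPy sketches of the four loss functions just described (the inputs assume the activations stated above: sigmoid outputs for BCE, softmax outputs for CCE/SCCE):

    import numpy as np

    def mse(y_true, y_pred):            # regression
        return np.mean((y_true - y_pred) ** 2)

    def bce(y_true, p):                 # binary targets in {0, 1}, p in (0, 1)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def cce(y_onehot, p):               # one-hot targets, p = softmax outputs
        return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

    def scce(y_index, p):               # same loss, targets as class indices
        return -np.mean(np.log(p[np.arange(len(y_index)), y_index]))

    p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    print(cce(np.array([[1, 0, 0], [0, 1, 0]]), p))   # identical values:
    print(scce(np.array([0, 1]), p))                  # SCCE just skips one-hot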
Neural Networks (NNs) (15/17): Example of Learning XOR
The learning algorithm adapts the parameters 𝜃 to make 𝑓 as similar as possible to 𝑓*
MSE loss function:
► 𝑒(𝜃) = (1/4) Σ_{𝑋1,𝑋2} ( 𝑓*(𝑋1, 𝑋2) − 𝑓(𝑋1, 𝑋2; 𝜃) )²
Target function: 𝑓*(𝑋1, 𝑋2), with (𝑋1, 𝑋2) ∈ {(0,0), (0,1), (1,0), (1,1)}
Model function: 𝑦 = 𝑓(𝑋1, 𝑋2; 𝜃)
(Figure: the training loop of forward pass, loss function, backward pass, and weights update)
A runnable sketch of this example follows below.
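
A runnable NumPy sketch of this XOR example (a 2-3-1 network; the hidden layer size and learning rate are our choices, and a different seed may need more epochs):

    # Learn XOR with forward pass, MSE loss, backward pass, weights update
    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)     # f*(X1, X2)

    W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))  # theta: weights, biases
    W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

    for epoch in range(20000):
        h = sigmoid(X @ W1 + b1)                  # forward pass
        y = sigmoid(h @ W2 + b2)
        d_y = 2 * (y - t) / len(X) * y * (1 - y)  # backward pass (MSE gradient)
        d_h = (d_y @ W2.T) * h * (1 - h)
        W2 -= 0.5 * (h.T @ d_y); b2 -= 0.5 * d_y.sum(axis=0, keepdims=True)
        W1 -= 0.5 * (X.T @ d_h); b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

    print(np.round(y, 2))   # approaches [[0], [1], [1], [0]]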
Neural Networks (NNs) (16/17)
Training error: Error evaluated over training set
Test error or generalization error: Error evaluated over test set
Generalization gap: Gap between training error and test error
Minimizing training error can be regarded as a necessary but not
sufficient condition to obtain a low generalization error
Neural Networks (NNs) (17/17)
Underfitting:
► A machine learning algorithm is said to underfit if it cannot make the error over the training set small
► The model is not rich enough to capture the variations in the training data
Overfitting:
► The algorithm cannot make the gap between training error and test error small
► The model is so rich that it also fits the noise in the training data
Hyperparameters:
► Settings chosen before training, such as the number of neurons; the weights and biases are the parameters 𝜃 learned during training
(Figure: underfitted, overfitted, and balanced fits of the training data)
Deep Learning (DL) (1/2): Deep Neural Network (DNN)
DL: a branch of ML which can be supervised, semi-supervised, or unsupervised
DNN: a DL architecture consisting of many hidden layers (see the Keras sketch below)
DL improves performance at a faster rate when the dimension of the training data (number of features) increases
(Figure: a deep neural network)
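
A sketch of a DNN in Keras (assuming TensorFlow is installed; the layer sizes and dimensions below are illustrative):

    # A DNN: several hidden layers mapping an n-dim input to k output classes
    import tensorflow as tf

    n, k = 16, 4    # illustrative input feature count and number of classes
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n,)),
        tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 1
        tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 2
        tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 3
        tf.keras.layers.Dense(k, activation='softmax'),  # output layer
    ])
    # SCCE from the previous slides: targets are class indices, not one-hot
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    model.summary()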
Deep Learning (DL) (2/2): Model Architectures
Unsupervised Pre-trained Networks
► No formal training is required, as they are pre-trained on past experience
► Examples are Auto-Encoders, Deep Belief Networks, and Generative Adversarial Networks
► Network applications include resource allocation and placement
Convolutional Neural Networks (CNN)
► Take an input image and assign importance (learnable weights and biases) to different objects in the image in order to differentiate between them
► Network applications include network traffic classification, where traffic information is converted into images
Recurrent Neural Networks (RNN)
► Add additional weights to the network to create cycles in the network graph so as to maintain an internal state
► Network applications include network traffic and mobility prediction, where time-series data is used as input sequences
Concluding Remarks
Network softwarization is a paradigm shift from conventional networking
It sets the platform for ML/DL-based solutions that realize autonomous networks
This opens new research avenues with a new set of challenges involving networks, operating systems, and AI
In the next lectures, we will further discuss DL techniques and examine their applications in mobile computing
Questions are welcome via email