
Deliverable Horizon2020 EUJ-01-2016 723171 5G-MiEdge D3.3
Date: February 2019
Public Deliverable

5G-MiEdge
Millimeter-wave Edge Cloud as an Enabler for 5G Ecosystem

EU Contract No.: EUJ-01-2016-723171
Contractual date: M32
Actual date: M32
Authors: See list
Work Package: D3.3 - Context information management to create traffic map for mmWave edge cloud
Security: Public
Nature: Report
Version: 1.0
Number of pages: 53

Abstract

This deliverable is the final report of Task 3.2. It reports on the activities of Work Package 3 on the management and exploitation of context information and on learning algorithms for radio environment and traffic map prediction.

Keywords

Network architecture, 5G mobile, edge computing, system integration, context information, resource management, traffic prediction, control signaling, resource allocation, computation caching, load balancing

All rights reserved.

The document is proprietary of the 5G-MiEdge consortium members. No copy or distribution, in any form or by any means, is allowed without the prior written agreement of the owner of the property rights.

This document reflects only the authors' view. The European Community is not liable for any use that may be made of the information contained herein.

Authors

CEA: Nicola di Pietro [email protected]
KDDI Research: Katsuo Yunoki [email protected]
Sapienza University: Sergio Barbarossa [email protected]
Sapienza University: Stefania Sardellitti [email protected]
Sapienza University: Francesca Cuomo [email protected]
Sapienza University: Mattia Merluzzi [email protected]


    Table of contents

Abbreviations and acronyms  5
Executive Summary  8
1 Introduction  9
1.1 On privacy of context information and users' data  10
2 Overview of architectural aspects related to information exchange  11
3 Management and exploitation of context information for content and computation caching  14
3.1 Proactive caching and transport optimization  14
3.1.1 State of the art  14
3.1.2 Contribution  15
3.1.3 Scenario  16
3.1.4 Algorithm  17
3.1.5 Numerical Results  19
3.2 Computation caching  20
3.2.1 State of the art  21
3.2.2 Contribution  21
3.2.3 Considerations on the required signalling  22
3.2.4 Computation caching policies  22
3.2.5 Simulation results  25
3.2.6 Computation caching with federation of small cells  29
3.2.7 Simulation results  31
4 Learning algorithms for physical and application layer parameters and context information  34
4.1 Graph topology inference from data  34
4.2 Radio environment map (REM)  35
4.2.1 State of the art  35
4.2.2 Contribution  36
4.3 Traffic map  38
4.3.1 Graph-based prediction of traffic map  41
4.4 Estimation of file popularity across space  43
5 Relevance of the proposed algorithms and techniques to the project's use cases  46
5.1 Omotenashi services  46
5.2 Moving hotspot  47
5.3 2020 Tokyo Olympic  47
5.4 Outdoor dynamic crowd  48
5.5 Automated driving  48
6 Summary  49
7 References  50

Abbreviations and acronyms

Acronym  Description

    3GPP 3rd Generation Partnership Project

    5G 5th (fifth) Generation

    5G-MiEdge Millimeter-wave Edge Cloud as an Enabler for 5G Ecosystem

    5QI 5G QoS Identifier

    AF Application Function

    AMF Access and Mobility management Function

    ANDSF Access Network Discovery and Selection Function

    AP Access Point

API Application Programming Interface

    AS Application Server

    BS Base Station

    BSSID Basic Service Set Identification

    CDN Content Delivery Network

    C-Plane Control Plane

    CPN Connectivity Provider Network

    C-RAN Centralized RAN

    CRN Cognitive Radio Network

    C/U split Control/User-plane split

    D2D Device-to-Device

    DC Dual Connectivity

    D-RAN Distributed RAN

    DN Data Network

    DP Data Plane

    DSA Dynamic Spectrum Access

    eMBB Enhanced Mobile Broadband

    EPC Evolved Packet Core

    ETSI European Telecommunications Standards Institute

    GBR Guaranteed Bit Rate

    GSP Graph Signal Processing

GUI Graphical User Interface

    HD High Definition


    HTTP HyperText Transfer Protocol

    ICN Information-Centric Networks

IEEE The Institute of Electrical and Electronics Engineers

    IoT Internet of Things

    LADN Local Area Data Network

    LCM Life Cycle Management

    LoA Levels of Automation

    M2M Machine-to-Machine

    MAB Multi-armed Bandit

    ME Mobile Edge or Multi-access Edge

    ME app Mobile Edge application

    MEC Mobile Edge Computing or Multi-access Edge Computing

    MEH Mobile Edge Host

    MEO Mobile Edge Orchestrator

    MEP Mobile Edge Platform

    MEPM Mobile Edge Platform Manager

    MgNB Master gNodeB

    MiEdge mmWave Edge cloud

    mmWave Millimeter Wave

    MSF MEC Service Function

N3IWF Non-3GPP InterWorking Function

    NEF Network Exposure Function

    NFV Network Functions Virtualization

NR New Radio

    NSSAI Network Slice Selection Assistance Information

    OBU On-Board Unit

    OSS Operations Support System

    PCF Policy Control Function

PDU Protocol Data Unit

    QFI QoS Flow Identifier

    QoE Quality of Experience

    QoS Quality of Service

    RAT Radio Access Technology


    RAN Radio Access Network

    REM Radio Environment Map

    RL Reinforcement Learning

    RNI Radio Network Information

    RSS Received Signal Strength

    RSU Road Side Unit

    sBS Base Station for small cell

    SDN Software-Defined Network

    SgNB Secondary gNodeB

    SINR Signal to Interference-plus-Noise Ratio

    S-MEH Source ME host

    SMF Session Management Function

    SSC Serving Small Cell

    TA Tracking Area

    T-MEH Target ME host

    UDM Unified Data Management

    UE User Equipment

    UOF User plane Optimization Function

    UPF User Plane Function

    U-Plane User Plane

    uRLLC Ultra-Reliable & Low Latency Communications

    V2V Vehicle-to-Vehicle

    V2X Vehicle-to-Everything

    VM Virtual Machine

    WP Work Package


Executive Summary

5G-MiEdge focuses on combining millimeter-wave access technologies and multi-access edge computing to develop a novel edge cloud architecture that fulfils the challenging performance requirements of 5G and beyond network use cases. The third work package (WP3) of 5G-MiEdge is dedicated to the development of joint radio, computation, and storage resource management algorithms suitable for the millimeter-wave edge cloud paradigm. Such algorithms rely on agile, interoperable, and resilient control signaling to support multi-connectivity, and on distributed methods to forecast and share useful context information.

This deliverable reports on the results of 5G-MiEdge's Task 3.2: "Context information management for traffic map prediction". The provided analysis focuses on the management, exploitation, and learning of context information, with direct applications to two caching problems and to the construction of radio environment and traffic maps.


1 Introduction

Millimeter-wave (mmWave) wireless access is expected to be one of the key enabling technologies for multi-gigabit access networks. However, mmWave technologies also present some weak points, like limited coverage, severe blocking loss, and a heavy load on the backhaul. The introduction of multi-layer and multi-connectivity seems an attractive solution to compensate for those penalties. Hence, it is paramount to develop a new interoperable control plane and control signaling to support multi-connectivity. Moreover, edge computing is expected to be deployed with prefetching and caching functions to deliver desired contents to user terminals with low latency. Therefore, a new control plane is also required to support on-target delivery of contents between users and the edge cloud. Task 3.2 of 5G-MiEdge's WP3 focuses on defining new procedures to share and exploit measured context (location, traffic, action, content and computation popularity, etc. [D3.1]), on understanding the impact and roles of terminals, and finally on the management of liquid resources, which requires adapted signaling for joint communication and computing cluster formation. Cluster formation either requires a centralized intelligence orchestrating its formation and update, or is accomplished by distributed intelligence.

An efficient and effective management of resources and their optimal allocation are possible if the network is "aware" of the environment in which it operates. 5G-MiEdge's resource allocation techniques rely on the capacity of the network to regularly measure, monitor, elaborate, and possibly predict the parameters that characterize the status of the resources and the users. These parameters are in general referred to as context information [D3.1]. Examples of context information are the users' position or density in a given area, their experienced channel quality, the workload of a given edge cloud node as a percentage of its full computing-communication-caching capacity, etc. These parameters constitute the inputs of the optimization algorithms that 5G-MiEdge is proposing and that contribute to an effective liquid management of resources. Up-to-date context information needs to be constantly exchanged between the users and the network, and within different elements of the network itself.

In this document, algorithmic solutions are proposed that address two main context-dependent objectives of 5G-MiEdge's WP3:

• the design of joint resource orchestration algorithms for the distributed mmWave edge cloud of 5G wireless heterogeneous networks, based on proactive caching techniques;

• the design of machine learning methods to forecast radio environment and traffic maps, as well as network context information.

In Section 2, we briefly recall some main aspects of the network architecture proposed by 5G-MiEdge, with a focus on the role of interfaces to share and collect useful context information that can be exploited for optimizing the network performance.

Section 3 is dedicated to problems related to caching. Section 3.1 proposes a strategy to derive the optimal trade-off between the caching and transport costs associated with the distribution of contents in information networks. The section includes a subsection dedicated to file popularity estimation. In Section 3.2, a detailed analysis is provided of computation caching, the novel paradigm introduced in [AR6.1] and [AR6.2]. Computation caching is a particularly suitable scheme for the framework of computation (or task) offloading, typical of Multi-access Edge Computing (MEC). The goal of Section 3.2 is twofold: first, to provide a concrete example of how the knowledge of context information (task popularity and sizes) plays a crucial role in the optimization scenarios studied by 5G-MiEdge; second, to show the benefits of computation caching as a technique to drastically reduce the uplink radio traffic (from users to cloud) and the energy costs of computation offloading (both for users and cloud). These advantages lead to a higher system scalability and a more profitable network deployment.

In Section 4, we describe some techniques to construct radio environment maps (Section 4.2) and to recover the spatial pattern of wireless data traffic from sparse measurements (Section 4.3). These results rely on a powerful tool named Graph Signal Processing (GSP), introduced in Section 4.1, which extends classical signal processing tools to signals defined on graphs. Efficiently reconstructing reliable radio environment maps and traffic maps is instrumental to the design of effective resource allocation schemes. Finally, Section 4.4 proposes a technique to estimate the popularity of a set of files, an essential step to optimize the caching schemes proposed in Section 3.

In Section 5, the technical contributions described in Sections 3 and 4 are put in relation with the use cases of 5G-MiEdge presented in [D1.1] and [D1.3].

1.1 On privacy of context information and users' data

Before moving to the technical sections of this document, we would like to highlight that our project is fully aware of privacy concerns and of the mechanisms necessary to allow users to carefully choose the information they share with other network nodes and entities. In our proposed scenarios, it is always the user who requests a certain service and thereby implicitly agrees to share the information necessary to fulfill this request. We do not have any intention to harvest user-specific data beyond this. Explicit work on privacy-related issues is out of scope for this project and beyond the main area of expertise of the consortium. For the mechanisms used within this project, we do not require privacy policies beyond the best practice of currently run commercial networks.


2 Overview of architectural aspects related to information exchange

At an early stage of the project, we identified a preliminary 5G-MiEdge baseline architecture [D1.3], illustrated at a high level of detail in Fig. 2-1. It was derived from the architecture for 5G mobile networks defined in [TS23.501] by the 3rd Generation Partnership Project (3GPP), including considerations on interworking with non-3GPP access. To locate MEC hosts (MEHs) on the non-3GPP access side, no user-plane (U-plane) interconnection with the 5G mobile network side is considered; instead, the MEHs on both sides are connected through the Mp3 interface defined in [MEC003].

Fig. 2-1 High-level baseline system architecture for 5G-MiEdge

In terms of context information management, each MEH can retrieve context data from the users that it serves. These data can be shared among MEHs via the abovementioned Mp3 interface. Moreover, context information can also be centralized in 5G-MiEdge's MEC Service Function (MSF).

Radio resource information in 5G mobile networks is obtained by the Access and Mobility management Function (AMF) and may be shared with the MSF via the Network Exposure Function (NEF). The procedure for retrieving radio resource information is shown in Fig. 2-2.


Fig. 2-2 Procedure for retrieving radio resource information

The Radio Network Information (RNI) Application Interface (API) is also defined in [MEC012]. Through it, MEHs can obtain the radio link quality, but also the access point workload and availability directly, if this API is implemented on the access point. The information retrieval is performed via a protocol based on the well-known HyperText Transfer Protocol (HTTP). Fig. 2-3 shows an example of this procedure. It is also possible to subscribe to RNI event notifications.

Fig. 2-3 An example of RNI procedure [MEC012]
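To make the HTTP-based interaction concrete, the following minimal sketch shows how an MEH-side application might query radio bearer information and subscribe to event notifications. The host name, resource paths, query parameters, and response fields are illustrative assumptions and must be adapted to the actual RNI API resources defined in [MEC012].

# Minimal sketch of an RNI API interaction over HTTP. The endpoint URL,
# query parameters, and body fields below are illustrative assumptions,
# not taken verbatim from [MEC012].
import requests

RNI_BASE = "http://mec-platform.example.local/rni/v1"  # hypothetical host

def get_radio_bearer_info(app_instance_id: str, cell_id: str) -> dict:
    """Ask the RNI service for radio bearer information of one cell."""
    resp = requests.get(
        f"{RNI_BASE}/queries/rab_info",
        params={"app_ins_id": app_instance_id, "cell_id": cell_id},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

def subscribe_rni_events(callback_url: str, cell_id: str) -> dict:
    """Register a subscription so the RNI service pushes event notifications."""
    body = {
        "callbackReference": callback_url,   # where notifications are POSTed
        "filterCriteria": {"cellId": cell_id},
    }
    resp = requests.post(f"{RNI_BASE}/subscriptions", json=body, timeout=5)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    info = get_radio_bearer_info("meh-app-42", "0x01")
    print("RAB info:", info)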

Traffic statistics on a User Plane Function (UPF) can be retrieved via the Session Management Function (SMF). Those statistics can also be provided to the MSF via the NEF. A possible procedure for that is shown in Fig. 2-4. It is possible to obtain and measure traffic statistics per service; the type of service can be identified in a specific way, e.g., by the service ID of the data session.


    Fig. 2-4 Procedure for retrieving traffic information from SMF


3 Management and exploitation of context information for content and computation caching

In this section, we investigate two different caching-related problems. Both classic content caching and the novel concept of computation caching are paradigmatic examples of techniques whose effectiveness strictly depends on up-to-date, accurate, and timely shared context information (popularity and size of cacheable content and task results, evolution of users' interests, radio channel quality and interference levels, etc.). In Section 3.1, a proactive strategy for content delivery is proposed, which takes into consideration data transfer and storage costs in order to find the optimal evolution of the network configuration in terms of placement and routing of content. Section 3.2 recalls and provides new results on the computation caching technique, highlighting its benefits in terms of reduced uplink (from UE to MEH) data traffic, reduced workload at the MEH level, and improved overall network performance.

It is important to highlight that the results of this section all contribute to the reduction of network costs and to a considerable optimization of the exploitation of resources. The word "cost" covers several meanings: energy costs, time delays, volume of data flowing through the network, amount of control signalling, etc. Cutting these costs, as the numerical results of this section show, directly implies a higher scalability of the system and an easier implementation of the proposed techniques. For instance, the reduction of uplink data traffic from the UEs to the MEHs obtained through computation caching, as well as the decrease of the computational effort required for computation offloading, both make it possible to down-scale the system resources needed to serve the associated users, with non-trivial gains with respect to a system that does not leverage computation caching. A "lighter" network can then be more easily deployed, with lower associated costs and improved scalability. Moreover, optimized caching strategies globally reduce the flow of data through the network (or concentrate it in moments of the day when it is more sustainable), thus streamlining and speeding up communications.

3.1 Proactive caching and transport optimization

In this subtask, we devise a strategy for finding the optimal trade-off between the transport and caching energy costs associated with the distribution of contents in information networks. The results of this subtask led to the publications [BSC+18] and [SCM18].

3.1.1 State of the art

The rapid increase of content delivery on the Internet has motivated the development of novel networking paradigms such as Information-Centric Networking (ICN), which integrates content delivery as a native network feature and is therefore better suited for efficiently accessing and distributing contents [JST+09], [ADI+12]. One of the main benefits of ICN is to reduce the user content access delay and the network bandwidth usage by storing the contents at the network edge, close to the end user. ICN is based on named data objects, which are, for example, web pages, videos, documents, or other pieces of information. In contrast, current networks are host-centric, since communication is based on named hosts, which are, for example, web servers, PCs, and mobile handsets. In ICN networks, serving small base stations (MEHs) are equipped with storage capabilities to cache contents as they are requested by the end users. Many nodes contribute actively to content caching to reduce the network congestion, the access delay, and the origin servers' load [ZLZ15], [Wan+14]. Several content caching strategies have been proposed to maximize the local hit rate, i.e., the fraction of requests served by a given cache, optimizing the placement and routing of information objects in a static way [HMR17], [CGK+12], [KSK+18]. Clearly, an effective caching strategy builds significantly on the ability to learn and predict users' behaviour. This capability lies at the foundation of proactive caching [Bas+15] and motivates the need to merge future networks with big data analytics [Zey+16]. A distributed dynamic content replacement strategy that refreshes the cache contents as they travel through the network has been proposed in [LTV+15], where the authors considered the problem of finding the time evolution of the optimal placement and routing of contents that minimizes the sum of the transport and caching energies.

3.1.2 Contribution

We propose to incorporate the ICN strategy in the edge cloud to have a robust mechanism to handle mobility in the mmWave scenario. ICN is a networking infrastructure tailored for content delivery, based on named-data routing. In ICN, content objects move across the network according to users' requests and are retrieved by their name, and each network entity is equipped with limited storage capabilities. This helps in reaching contents without having explicit knowledge of their storage location, reducing access delay and network bandwidth utilization. The question then becomes how to distribute contents through the network. We address this question by finding the optimal trade-off between content replication and delivery time.

The proposed strategy is proactive with respect to the users' requests, as contents are pre-fetched depending on the distribution of their (estimated) popularity (see also Section 4.4). This applies to all kinds of caching strategies and is not a direct consequence of the ICN paradigm. In particular, we develop a dynamic energy-efficient strategy that jointly optimizes caching and delivery costs within each cluster of nodes, i.e., a moderate-size network in which each node (an entity like an MEH) has storage capabilities. Starting from the strategy proposed in [LTV+15], we incorporate a cost of caching the information objects that depends dynamically on the local and global popularity of the objects, in order to encourage the edge nodes to host the most popular contents. The resulting content distribution is a trade-off between replication and delivery time, obtained in an energy-efficient, dynamic way by finding the evolution of the network configuration in terms of placement and routing of content objects over time.


3.1.3 Scenario

Let us consider a transport network represented by a graph composed of a set of nodes (vertices) 𝒱 and a set of edges (links) ℰ, together with a set of information objects 𝒦, as illustrated in Fig. 3-1. We assume that contents can be permanently or temporarily stored on the nodes of this graph or travel through its links. Specifically, in some nodes, called repositories, the content objects are stored permanently, at least as long as their popularity does not change; in the other nodes, the contents may appear and disappear according to users' requests and network resource allocation.

Fig. 3-1 Network example

To simplify our formulation, we assume that all contents can be split into objects of equal size, identified by an index k ∈ 𝒦. Each node and each link are characterized, respectively, by a storage and a transport capability. We assume that time is divided into slots of fixed duration ∆τ, and we consider time frames, each composed of T time slots. At time slot n, each node u ∈ 𝒱 in the network can act as a repository node for a set of information objects K_u[n] ⊆ 𝒦, and can request a set of information objects Q_u[n] ⊆ 𝒦. Let us denote by 𝒒[n] ∈ {0,1}^(|𝒱||𝒦|) the request arrival process, whose entries are such that

q_u^k[n] = 1 if object k is requested by node u at time slot n, and q_u^k[n] = 0 otherwise.

The random process 𝒒[n] depends on the time evolution of the contents' popularity, which is modelled as a Poisson process with an average arrival rate p_u^k[n] at node u for object k following the Zipf distribution [BBD14]:

p_u^k[n] = β_u[n] / (r_u^k[n])^(α_u[n]),

where α_u[n], β_u[n], and r_u^k[n] are, respectively, the Zipf parameter, the request rate, and the rank of object k at node u at time n. The popularity of the content objects evolves in time following a rule based on local and global popularity measures associated with each time frame s. Denoting by p̄_u^k[s] the local popularity of object k at node u over frame s and applying a forgetting factor η ∈ [0,1], which takes into account all the previous probabilities, one can derive the following update formula:

p̄_u^k[s] = (1 − η) p_u^k[s] + η p̄_u^k[s − 1].

A similar formula applies to the global popularity p̄^k[s], obtained by averaging the request probabilities over all the nodes of the network:

p̄^k[s] = (1 − η) (1/|𝒱|) Σ_{u∈𝒱} p_u^k[s] + η p̄^k[s − 1].
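To illustrate these updates, the following minimal sketch tracks the local and global popularity estimates frame by frame, assuming the exponential-forgetting form given above; the catalogue size, node count, and request draws are toy values, not the project's simulation setup.

# Minimal sketch of the popularity updates with forgetting factor eta,
# assuming the exponentially weighted form given above. Catalogue size,
# node count, and per-frame request counts are toy values.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_objects, eta = 5, 10, 0.8

p_local = np.zeros((num_nodes, num_objects))   # local popularity per node
p_global = np.zeros(num_objects)               # global popularity

for frame in range(20):
    # empirical request frequencies in this frame (toy Zipf-like draws)
    counts = rng.poisson(lam=1.0 / (1 + np.arange(num_objects)),
                         size=(num_nodes, num_objects))
    p_frame = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

    # local update: (1 - eta) * current frequency + eta * previous estimate
    p_local = (1 - eta) * p_frame + eta * p_local
    # global update: average the per-node frequencies over all nodes
    p_global = (1 - eta) * p_frame.mean(axis=0) + eta * p_global

print("local popularity of object 0 per node:", np.round(p_local[:, 0], 3))
print("global popularity ranking:", np.argsort(-p_global))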

3.1.4 Algorithm

Considering the graph of our network, we can define a vertex signal s_u[k, n] over the nodes as

s_u[k, n] = 1 if object k is stored at node u at time slot n, and s_u[k, n] = 0 otherwise,

and an edge signal t_uv[k, n] over the links (i.e., a binary value indicating whether a content is being transported through an edge, where an edge is a link within the network), as

t_uv[k, n] = 1 if object k is transported over link (u, v) at time slot n, and t_uv[k, n] = 0 otherwise.

Typically, each content may be hosted on every node, removed from its actual position, and moved whenever convenient to another location, following the dynamic process of users' requests. Moreover, each content has to be stored in at least one repository node, which can be a node at the edge of the network, i.e., one that may directly access a content delivery network, or a node chosen in a proactive way. The node where a content object k is pre-fetched is chosen according to the following probabilistic measure w_u[k]:

w_u[k] = Σ_{v∈𝒱} B_uv p̄_v^k.

This measure depends on the centrality of each node u with respect to the other nodes requesting object k: the object is pre-fetched at the node where w_u[k] takes its minimum value, taking into account the length B_uv of the shortest path (in number of hops) between node u and node v, and the average popularity p̄_v^k of object k at node v. The goal of our work is to devise a proactive caching strategy that minimizes the sum of caching and transportation costs, taking into account that the cost of caching strictly depends on the time evolution of the popularity of the requested contents, so as to make the strategy proactive and context-aware. We define the energy cost for storing a content k on a node u during T consecutive time slots, in the time window [n′ − T + 1, n′], where n′ is the time frame, as

E_u^ca[k, n′] = Σ_{n=n′−T+1}^{n′} c_u[k, n] s_u[k, n],

where c_u[k, n] is the time-varying energy cost for keeping content k on node u at time n, defined as

c_u[k, n] = γ (1 − p̄_u^k[n]) + (1 − γ) (1 − p̄^k[n]),

where γ ∈ [0,1] is the trade-off coefficient between the local and global popularity costs. Note that the cost c_u[k, n] is low for contents with the highest popularity, to encourage the storage of these frequently requested contents. Then, we can define the cost associated with the content transport as

E^tr[k, n] = Σ_{(u,v)∈ℰ} t_uv[k, n].

The proposed proactive caching optimization problem is then defined as

min_{{s, t} ∈ χ} E(s, t),

where

E(s, t) = Σ_{k∈𝒦} Σ_{n} ( Σ_{u∈𝒱} c_u[k, n] s_u[k, n] + λ Σ_{(u,v)∈ℰ} t_uv[k, n] ),

where λ is a positive parameter controlling the ratio between transport and storage energy costs. The constraint set χ is defined in order to satisfy all the constraints associated with moving the contents throughout the network:


More specifically, the constraint set χ incorporates the following rules (a toy implementation of the resulting problem is sketched after this list):

• if object k is requested by node u at time slot n, then k either is already present in the cache of node u at time n, or it has to be transported to node u from a neighbour node v ∈ N_u within a maximum delivery time;

• if k is being cached at node u at time n, then k either was in the cache of u at time n − 1 or was received by node u from a neighbour node v ∈ N_u at time n − 1;

• if object k is delivered to node u from a neighbour node v ∈ N_u at time slot n, then this object either was in the cache of v at time n − 1 or was transferred to v from a neighbour node w ∈ N_v at time n − 1;

• an initial condition constraint assures a proactive selection of the repository nodes, which always store the objects in 𝒦_u^p and, at n = 0, nothing else;

• a constraint assures that the total amount of contents stored and delivered meets the storage and capacity constraints of the nodes and the edges of the network;

• a constraint states the binary nature of the storage and transport variables.
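The following minimal sketch shows how the placement and routing problem can be posed as a binary linear program, assuming the cost model reconstructed above; the topology, costs, requests, and the choice of PuLP with its bundled CBC solver are all illustrative assumptions, not the project's actual simulation code.

# Toy binary program for joint placement (s) and transport (t) of objects,
# assuming the cost model sketched above. Topology, costs, and requests are
# illustrative. Requires: pip install pulp (PuLP ships with the CBC solver).
import itertools
import pulp

nodes = [0, 1, 2, 3]
links = [(0, 1), (1, 2), (2, 3), (1, 3)]
arcs = links + [(v, u) for (u, v) in links]   # directed transport arcs
objects = [0, 1]
slots = range(4)
lam = 0.5                                     # transport/caching trade-off
repo = {0}                                    # repository node(s) at n = 0
cache_cap = 2                                 # max objects per node per slot
requests = {(3, 0, 2), (2, 1, 3)}             # (node u, object k, slot n)
c = {(u, k, n): 1.0 / (k + 2)                 # cheaper for popular objects
     for u in nodes for k in objects for n in slots}

prob = pulp.LpProblem("proactive_caching", pulp.LpMinimize)
s = pulp.LpVariable.dicts("s", (nodes, objects, slots), cat="Binary")
t = pulp.LpVariable.dicts("t", (arcs, objects, slots), cat="Binary")

# objective: caching cost + lambda * transport cost
prob += pulp.lpSum(c[u, k, n] * s[u][k][n]
                   for u in nodes for k in objects for n in slots) \
        + lam * pulp.lpSum(t[a][k][n]
                           for a in arcs for k in objects for n in slots)

def incoming(u, k, n):
    """Transport of object k arriving at node u during slot n."""
    return pulp.lpSum(t[(v, w)][k][n] for (v, w) in arcs if w == u)

for u, k, n in itertools.product(nodes, objects, slots):
    if (u, k, n) in requests:                 # request satisfaction
        prob += s[u][k][n] + incoming(u, k, n) >= 1
    if n == 0:                                # initial condition (repositories)
        prob += s[u][k][0] == (1 if u in repo else 0)
    else:                                     # cache continuity
        prob += s[u][k][n] <= s[u][k][n - 1] + incoming(u, k, n - 1)
for (u, v), k, n in itertools.product(arcs, objects, slots):
    if n == 0:
        prob += t[(u, v)][k][0] == 0
    else:                                     # transport continuity
        prob += t[(u, v)][k][n] <= s[u][k][n - 1] + incoming(u, k, n - 1)
for u, n in itertools.product(nodes, slots):  # storage capacity
    prob += pulp.lpSum(s[u][k][n] for k in objects) <= cache_cap

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("status:", pulp.LpStatus[prob.status])
print("cached (u, k, n):", [(u, k, n)
      for u, k, n in itertools.product(nodes, objects, slots)
      if s[u][k][n].value() == 1])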

3.1.5 Numerical Results

The simulation results are reported in the next two figures, for a network of 11 nodes and considering 10 content objects to store and/or deliver in a frame of 15 time slots. The performance of our strategy is compared with the non-proactive strategy proposed in [LTV+15], where the content popularity does not affect the optimization process. Fig. 3-2 shows the transport gain in terms of reduction of the number of hops, defined as G_h = n_h^n − n_h^p, where n_h^n and n_h^p are the average numbers of hops in the non-proactive and in the proactive algorithm, respectively. It can be observed that G_h increases when the transport cost λ decreases, since, in this case, the transport of objects is favoured by the proactive caching.

Fig. 3-2 Number of hops gain vs. transport cost parameter


Fig. 3-3 shows the energy cost comparison between the two strategies, with respect to the transport cost parameter. Note that proactivity yields considerable energy savings for low λ values, benefiting from the optimal transport strategy.

Fig. 3-3 Total average energy cost vs. transport cost parameter

3.2 Computation caching

In MEC networks, Serving Small Cells (SSCs), endowed with radio access technology, computing units (MEHs), and local cache memories, can be requested by UEs to run computing tasks on their behalf. The procedure of entrusting these computational assignments to small cells is called task or computation offloading. It allows UEs to save both time and energy, and it revolutionises the classical interaction between UEs and SSCs. The computation offloading procedure is made of three phases: an uplink communication phase, during which the UE uploads inputs and instructions to be run by its SSC; a computing phase, during which the SSC executes those instructions; and a downlink communication phase, during which the computation results are sent from the SSC to the UE. Differently from many current applications, the computation offloading paradigm entails a considerable increase of the uplink traffic at the mobile edge of the network, often constrained by tight latency requirements. It becomes crucial to endow the edge cloud with the capability to handle this tsunami of uplink information, so that neither the communication channels nor the computing resources at the SSC get congested, overwhelmed by a high number of computation offloading requests. In this section, we present a solution for the optimisation of computation offloading based on the caching of computational results. This strategy, called computation caching and distinct from more classical content caching, is highly context-dependent and is based on the knowledge (or estimation, learning, or measurement) of some contextual parameters that characterise the offloadable applications, namely their popularity, input size, and output size. We described the notion of context in [D3.1]. In this section, we show a meaningful example of how contextual information can be exploited to optimise computation offloading and, by extension, the new use cases and services typical of the 5G framework.


3.2.1 State of the art

The role of content caching in MEC networks is critically important and deeply investigated [AFK17], [BBD14], [IW16], [WZZ+17]. In the context of task offloading, a new form of caching was recently introduced, after noticing the pointlessness of repeating many times the same computation for the same reiterated offloading request. This paradigm is called computation caching [OC15], [Oue16], [OC17] and suggests exploiting the small cells' memory to store the results of offloadable computations [EBS17], [dC18], so that they can be simply retrieved from the cache instead of being recomputed each time they are requested. The goal is to eliminate redundant and repetitive processing, which has several advantages, e.g., drastically reducing computation time and saving energy for both UEs and SSCs, preventing uplink bottlenecks, freeing network resources, decreasing the SSCs' workload, and diminishing the number of virtual machine instantiations. We presented our first results on computation caching policies and algorithms in [AR6.1], [AR6.2], and [dC18]. We recall them here, together with some useful notation and some more recent numerical simulation results. Afterwards, we discuss the role of computation caching policies in federation algorithms for collaboration among small cells.

3.2.2 Contribution

Computation caching policies are enablers for proactive computation caching [EBS17], intended as the strategy of dynamically adapting the content of cache memories based on the continuous learning of task popularity and other statistics. The leading concept is that future offloading traffic can be predicted and computation caching can be proactively adjusted to smoothly react to traffic fluctuations. The policies that we consider focus on three quantities that characterise an offloadable task: its popularity, the size of its input, and the size of its result. In particular, this last parameter plays a crucial role: in computation caching, the size of the data to cache and to download (the task result) can be significantly different from the size of the data to upload from the UE to the SSC (the task input). This marks a sharp difference with classical content caching, in which cached data essentially have the same size as the corresponding data travelling through the network.

The numerical results presented later highlight several benefits of the adopted computation caching techniques. First, computation caching helps in reducing by considerable percentages the uplink traffic from UEs to the edge cloud, in the framework of computation offloading. This contributes to mitigating the "tsunami" of uplink traffic that is foreseen for future 5G communications. Moreover, thanks to computation caching, more offloading requests can be treated per time unit by the same small cell, hence increasing the quality of the provided services and allowing the same MEH to serve more UEs. This is mainly due to the fact that the average computation delay required by offloadable tasks is shortened, because already-cached task results do not need to be computed again. This also entails a reduction of the computational capacity needed at the MEHs' level to guarantee a certain performance. Finally, the numerical results obtained in the scenario with small cell federation not only confirm and strengthen the previous points, but also show that well-designed computation caching policies can be sufficient to guarantee high performance, thus making federation not essential to achieve high gains. This is particularly beneficial in scenarios where small cell clustering induces high fixed costs (due to backhaul implementation, overhead communications, etc.).

3.2.3 Considerations on the required signalling

Notice that the task popularity and the input/output data sizes can be measured (or estimated or inferred) directly by the SSC and, in general, do not need to be exchanged with or retrieved from other nodes of the network. Hence, at the single-small-cell level, the implementation of computation caching techniques does not require an increase of the exchanged control signalling with respect to the state-of-the-art protocols and algorithms for task offloading in MEC networks. Our computation caching policies and the related cache filling algorithms are based on context information and statistics that, in general, each SSC can collect on its own, during a learning or training phase that can last as long as necessary and be repeated at a suitable time granularity. Of course, it is possible to conceive scenarios in which popularities, data sizes, or instructions are transmitted among the small cells or other network nodes in a centralised or distributed manner to speed up the learning/training phase; though, even in these cases, the exchange of information would happen only once in a while, without drastically impacting the control signalling required to manage the network operations.

The situation is slightly different in the scenario proposed in Sections 3.2.6 and 3.2.7, in which the small cells' caches are federated. In this case, some additional signalling is required at each offloading request, to manage the search for cached results in neighbouring cache memories. All details will be provided later, but the important point for now is that, for each task offloading request received by the SSC whose task result is not cached therein, the SSC sends a message to each one of its neighbours to know if they are storing that result. Even when this procedure is carried out in the most naïve way, the number of messages exchanged is linear in the number N of neighbouring small cells, and the messages consist of just a few bits of information (a label to identify the task and a "yes" or "no" from the neighbours to the SSC). Thus, the signalling price to pay is small and fully compensated by the increased benefits of computation caching. Also in this case, it is possible to design strategies in which a bigger amount of information is shared only once in a while among the small cells (e.g., the cache indicators of each small cell), in order to reduce the amount of signalling at each task offloading request. If the SSC already knows which neighbouring cache contains the desired result, then it needs to send only one downloading request to retrieve it (instead of N).

3.2.4 Computation caching policies

In our setting, a UE offloads computational tasks to the MEC via its SSC. The communication rates, measured in bits per second, are denoted by R_UL in uplink and R_DL in downlink. We suppose that the computational capacity of the SSC is f CPU cycles per second and that the SSC can store up to m bits on a local memory to perform computation caching.


Offloadable tasks belong to a finite set 𝒞 = {c_1, …, c_K}, which we call the computation catalogue. A task c_k ∈ 𝒞 is represented by a triplet c_k = (W_k, e_k, W_k′), where W_k is the input data (a sequence of bits) to be processed, e_k is the number of CPU cycles per bit needed to elaborate the data, and W_k′ is the computation result (another sequence of bits). We denote by |W_k| and |W_k′| the sizes in bits of W_k and W_k′.

In order to represent the content of the SSC's cache, we define the cache indicator as the vector σ = (σ_1, …, σ_K) ∈ {0,1}^K such that σ_k = 1 if and only if the result W_k′ of c_k ∈ 𝒞 is stored in the SSC's cache. Thus, a cache indicator fully identifies the cache content. Since the cache size is limited to m bits, in general not all vectors in {0,1}^K correspond to a feasible cache configuration. Therefore, we define the set of feasible cache indicators as follows:

ℱ = { σ ∈ {0,1}^K : Σ_{k=1}^{K} σ_k |W_k′| ≤ m }.

Task offloading starts with a request from the UE to the SSC specifying the task to run and a time delay within which the UE needs to retrieve its result. Such an offloading request is denoted r = (k, t), meaning that the UE asks for the execution of the k-th task and to receive its result within t seconds. If the SSC has enough available resources to elaborate the task, the request is accepted; otherwise, it is denied.

Our goal is to describe strategies to reduce the costs of task offloading. The total cost of the offloading procedure is made of several independent contributions, among which we identify two main components:

i) the cost of uploading the computation inputs from the UE to the SSC;
ii) the cost of running the computation at the SSC.

Depending on the application, the word "cost" can indicate energy consumption, time delays, or any other metric that measures an expense or the quality of service. Nonetheless, in all scenarios, there are evident benefits in keeping a computation result available in the cache memory before it is requested from the SSC: indeed, whenever a result W_k′ is stored, the task c_k does not need to be run, its input data do not need to be uploaded, and W_k′ can be straightforwardly sent to the UE. The most important consequence is that the two abovementioned cost components (denoted i) and ii)) are zeroed. Hence, the total cost of offloading a cacheable task c_k ∈ 𝒞 is:

Γ_tot(c_k) = Γ_req(c_k) + (1 − σ_k) (Γ_UL(c_k) + Γ_comp(c_k)) + Γ_DL(c_k) + γ(c_k),

where Γ_req(c_k) is the cost of sending the offloading request r; σ_k is the k-th entry of the cache indicator; Γ_UL(c_k) is the cost of uploading the input data; Γ_comp(c_k) is the cost of computing the task result (assuming, for simplicity, that the CPU state does not vary in time and that the computation cost only depends on the task parameters); Γ_DL(c_k) is the cost of sending the result back to the UE; and γ(c_k) includes any other fixed cost that does not directly depend on c_k, e.g., any fixed processing cost at the MEC level or the cost of keeping the SSC's hardware active, including the cache memory. The cost of reading a task result from the cache is considered negligible.
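As a small worked example, the sketch below evaluates Γ_tot for each task of a toy catalogue under a given cache indicator, using offloading time as the cost metric; the rates and the 2 ms fixed term mirror the simulation settings of Section 3.2.5, while the task sizes are illustrative.

# Minimal sketch: total offloading cost (here, time in seconds) per task,
# following Gamma_tot above. Task sizes are toy values; rates and the 2 ms
# fixed latency follow the simulation settings reported in Section 3.2.5.
from dataclasses import dataclass

R_UL = 125e6          # uplink rate, bits/s
R_DL = 500e6          # downlink rate, bits/s
E_OVER_F = 1e-8       # e_k / f, seconds per input bit
REQ_BITS = 128        # 16-byte offloading request
GAMMA_FIXED = 2e-3    # fixed latency gamma(c_k), seconds

@dataclass
class Task:
    w_in: int         # |W_k|, input size in bits
    w_out: int        # |W_k'|, result size in bits

def gamma_tot(task: Task, cached: bool) -> float:
    """Gamma_req + (1 - sigma_k)(Gamma_UL + Gamma_comp) + Gamma_DL + gamma."""
    t = REQ_BITS / R_UL + task.w_out / R_DL + GAMMA_FIXED
    if not cached:                      # sigma_k = 0: upload and compute
        t += task.w_in / R_UL + E_OVER_F * task.w_in
    return t

catalogue = [Task(w_in=10**8, w_out=10**5), Task(w_in=10**6, w_out=10**7)]
sigma = [1, 0]                          # only the first result is cached
for k, (task, s_k) in enumerate(zip(catalogue, sigma)):
    print(f"task {k}: Gamma_tot = {gamma_tot(task, bool(s_k)):.4f} s")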

The previous considerations lead to an important question: given a cache of finite size, how do we choose which W_k′'s to store, with the goal of minimizing the overall costs? To answer, let us consider R offloading requests r_1, …, r_R, sent from the UE to the SSC during its service period. By definition, every request uniquely corresponds to a task: for every i = 1, …, R, we have r_i = (k, t_i), for some k ∈ {1, …, K} identifying a task in the catalogue and some latency constraint t_i. Thus, Γ_tot(r_i) = Γ_tot(c_k) for some k, and we define the cost over the whole serving period as:

Γ(σ) = Σ_{i=1}^{R} Γ_tot(r_i).

Our goal is to find the cache indicator that minimizes Γ(σ):

σ_opt = arg min_{σ∈ℱ} Γ(σ) = arg min_{σ∈ℱ} Σ_{i=1}^{R} Γ_tot(r_i).

Ideally, if σ_opt is known, the SSC guarantees an optimal cost minimization by storing the W_k′'s for which (σ_opt)_k = 1.

Since the number of cache indicators grows exponentially with K, it is not always algorithmically possible to run through all of them to exhaustively determine σ_opt. The scope of our analysis is to propose and evaluate strategies to choose cache indicators with close-to-optimal associated performance. A very natural choice is to assign a hierarchy among tasks and to fill the cache with the results of the highest-priority ones. A caching metric λ : 𝒞 → ℝ+ assigns to each task a "cacheability value". The caching policy based on λ is the application of the following cache filling algorithm, which prioritises the tasks with the highest cacheability value:

1: let π : {1, …, K} → {1, …, K} be a permutation such that λ(c_π(1)) ≥ ⋯ ≥ λ(c_π(K)).
2: set σ = (0, …, 0) and s = 0.
3: for k = 1, …, K, do
4:   if s + |W_π(k)′| ≤ m, then
5:     set σ_π(k) = 1 and s = s + |W_π(k)′|.
6:   end if
7: end for
8: fill the SSC's cache according to σ.

We call σ(λ) the indicator yielded by the previous algorithm. Clearly, a caching policy is based on a well-designed metric if Γ(σ(λ)) is close to Γ(σ_opt). A first observation, very natural and common in the context of content caching [IW16], is that a good caching policy needs to depend on the popularity of tasks. Indeed, to reduce costs, we want to avoid repeatedly processing frequently requested tasks. In this perspective, we define the popularity p_k of a task c_k as the probability that c_k is offloaded to the SSC. In our setting (and, in general, whenever the offloading requests are pairwise independent and their total number R is big enough to be statistically representative), we can write:


p_k = |{ i : r_i = (k, t_i) for some t_i }| · R^(−1).

In general, the task popularity is a typical example of context information that the SSC can learn and update during its serving period. Let us recall the three computation caching policies that we investigated in [dC18].

First policy: simply based on task popularity, we define

λ_1(c_k) = p_k, ∀ k ∈ {1, …, K}.

A better choice comes from the observation that caching the result of a very popular task with low input uploading and computation costs can be less advantageous than caching the result of a less popular task with higher costs. The latter directly depend on the size (in bits) of W_k, denoted |W_k|, which justifies the next policy.

Second policy: based on popularity and input data size, let

λ_2(c_k) = p_k |W_k|, ∀ k ∈ {1, …, K}.

Third policy: finally, a third policy is based on the observation that caching task results whose size is small allows storing more of them. Hence, it may be more convenient to cache a high number of small-size results, even if their popularity and input size do not maximise λ_2. To increase the caching priority of tasks with small |W_k′|, we define:

λ_3(c_k) = p_k |W_k| / |W_k′|, ∀ k ∈ {1, …, K}.

The introduction of λ_3 is an important novelty of our work, which we already discussed in [AR6.2] and formally presented in [dC18]. It turned out to be the most advantageous metric of the three, as also confirmed by the numerical simulations shown in the following.
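A minimal sketch of the three metrics and of the cache filling algorithm is given below; the greedy fill follows the 8-step procedure above, while the toy catalogue, the Zipf popularity, and the 2 % cache size are illustrative assumptions chosen to echo the simulation setup of Section 3.2.5.

# Minimal sketch of the three caching metrics and the greedy cache-filling
# algorithm of Section 3.2.4. The toy catalogue is randomly generated; the
# Zipf exponent matches the alpha = 0.6 used in the simulations.
import random

random.seed(1)
K = 1000
alpha = 0.6
A = sum(k ** -alpha for k in range(1, K + 1))
p = [1.0 / (A * (k + 1) ** alpha) for k in range(K)]        # Zipf popularity
w_in = [random.randint(10**6, 10**9) for _ in range(K)]     # |W_k| in bits
w_out = [random.randint(10**3, 10**9) for _ in range(K)]    # |W_k'| in bits

policies = {
    "lambda1": lambda k: p[k],                       # popularity only
    "lambda2": lambda k: p[k] * w_in[k],             # popularity x input size
    "lambda3": lambda k: p[k] * w_in[k] / w_out[k],  # favour small results
}

def greedy_fill(metric, m):
    """Sort tasks by decreasing cacheability and fill up to m bits."""
    sigma, used = [0] * K, 0
    for k in sorted(range(K), key=metric, reverse=True):
        if used + w_out[k] <= m:
            sigma[k], used = 1, used + w_out[k]
    return sigma

m = int(0.02 * sum(w_out))                           # cache = 2% of total size
for name, metric in policies.items():
    sigma = greedy_fill(metric, m)
    spared = sum(p[k] * w_in[k] * sigma[k] for k in range(K)) \
           / sum(p[k] * w_in[k] for k in range(K))
    print(f"{name}: expected spared input data = {100 * spared:.1f}%")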

3.2.5 Simulation results

In our numerical simulations, |W_k| and |W_k′| are chosen independently at random for every k as follows: let y, Y ∈ ℕ satisfy y ≤ Y and let x, X ∈ ℝ be two real numbers in [1, 10] (with x ≤ X if y = Y). When we say that |W_k| belongs to [x e y : X e Y[, we mean that x · 10^y ≤ |W_k| < X · 10^Y and |W_k| = u · 10^v, with u and v randomly fixed as follows: first, v is chosen uniformly in {y, y + 1, …, Y}; then, u is chosen uniformly either in [x, 10[ (if v = y), or in [1, 10[ (if y < v < Y), or in [1, X[ (if v = Y). The same rule is used for |W_k′|, independently of the corresponding |W_k|. With this strategy, there is no privileged order of magnitude among the values taken by |W_k| and |W_k′|, even when the maximum possible value is much bigger than the minimum.
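The sampling rule reads more easily in code; the sketch below, with illustrative function and variable names, draws one size from [x e y : X e Y[ exactly as described, choosing the exponent first so that no order of magnitude is privileged.

# Minimal sketch of the order-of-magnitude-uniform size sampling described
# above: draw the exponent v first, then the mantissa u within the bounds
# imposed by x and X. Function and variable names are illustrative.
import random

def sample_size(x: float, y: int, X: float, Y: int) -> float:
    """Draw a value u * 10**v from [x * 10**y, X * 10**Y)."""
    v = random.randint(y, Y)         # exponent, uniform in {y, ..., Y}
    lo = x if v == y else 1.0        # lowest decade: respect lower bound x
    hi = X if v == Y else 10.0       # highest decade: respect upper bound X
    u = random.uniform(lo, hi)       # mantissa, uniform in [lo, hi)
    return u * 10 ** v

# |W_k| in [1e6 : 1e9[ and |W_k'| in [1e3 : 1e9[, as in Table 3-1
w_in = sample_size(1.0, 6, 1.0, 9)
w_out = sample_size(1.0, 3, 1.0, 9)
print(f"|W_k| = {w_in:.3e} bits, |W_k'| = {w_out:.3e} bits")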

In the figures below, the abscissae represent the SSC's cache size: 0 % means that m = 0 bits and 100 % means that m = Σ_{k=1}^{K} |W_k′| bits. Fig. 3-4 and Fig. 3-5 were obtained with the simulation parameters specified in Table 3-1.


Table 3-1 Parameters for numerical simulations

Parameter   Value
α           0.6
|W_k|       [1e6 : 1e9[ bits
|W_k′|      [1e3 : 1e9[ bits
R_UL        125 Mb/s
R_DL        500 Mb/s
e_k/f       10^(−8) s/bit

In particular, we considered stable radio channel conditions and constant uplink and downlink communication rates. First, the cache was filled applying one of the three policies defined above; then, a high number of offloading requests was simulated. We supposed the popularity of offloading requests to obey the Zipf law [BBD14]:

p_k = (A k^α)^(−1), for a constant α and A = Σ_{k=1}^{K} k^(−α).

Notice that, without loss of generality, tasks can be assumed to be sorted in the catalogue by descending popularity. The simulated offloading operation consisted of four main serial steps: offloading request, input data uploading, task computation, and result downloading. If the results of the computation were found in the SSC's cache, data uploading and task computation were skipped and the results were directly sent to the UE. In all simulations, we assumed that a new offloading request was sent instantaneously after the results of the previous one were downloaded.

Fig. 3-4 shows, for K = 50000, the percentage of task input data that did not need to be uploaded or elaborated because the corresponding results were cached and available for downloading. For brevity, we call this the "spared input data". Measuring the spared input data is an effective way to evaluate the goodness of the caching policies: the more there is, the higher the corresponding saving in energy, time, or any other metric, both for the UE and for the SSC. Notice that avoiding to send the input data from the UE to the SSC also, and very importantly, entails an uplink traffic reduction. For computations such that W_k′ is cached, the uplink data transfer is reduced by |W_k| bits, which is not negligible at all. Maximizing the quantity of spared input data not only reduces the computing load at the SSC and the communication time/energy at the UE's side, but also contributes to lightening the uplink communication channel, allowing to handle more requests and/or more UEs.

In Fig. 3-4, the quality of the third policy is clearly confirmed by the separation among the curves. Remarkably, for a cache only as big as 2 % of the total size, the third policy allows to spare more than 80 % of the input data, whereas the first and second policy respectively achieve around 20 % and 30 %.

Fig. 3-5 shows the ratio between the average number of offloaded tasks per hour with and without computation caching. Measuring this gain involves the computation of the offloading time of every request. In the notation of the beginning of this subsection, where in this case $\Gamma_{tot}(c_k)$ indicates the total offloading time of $c_k = (W_k, e_k, W_k')$, we have $\Gamma_{req}(c_k) = 128/R_{UL}$ (where we supposed that a request $r_i$ has a standard size of 16 bytes), $\Gamma_{UL}(c_k) = |W_k|/R_{UL}$, $\Gamma_{comp}(c_k) = e_k |W_k|/f$, and $\Gamma_{DL}(c_k) = |W_k'|/R_{DL}$. We also added to the previous terms a latency of 2 ms, corresponding to $\gamma(c_k)$. Fig. 3-5 reasserts the superiority of the third policy, which yields gains of up to a factor of 10 for cache sizes of less than 20%, whereas the gain of the other policies does not go beyond a factor of 4. These gains translate into reduced uplink transmissions and facilitate the prevention of uplink bottlenecks.
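In code, the per-request offloading time used for Fig. 3-5 can be written as follows (a sketch under the parameters of Table 3-1; the function and argument names are ours):

    R_UL, R_DL = 125e6, 500e6     # uplink/downlink rates in bit/s (Table 3-1)
    EK_OVER_F = 1e-8              # e_k / f in s/bit (Table 3-1)
    GAMMA = 2e-3                  # additional latency gamma(c_k): 2 ms

    def offloading_time(w_in, w_out, cached):
        """Total offloading time Gamma_tot(c_k) of a task with input size
        w_in bits and result size w_out bits."""
        t = 128 / R_UL            # Gamma_req: request of 16 bytes = 128 bits
        t += w_out / R_DL         # Gamma_DL: results downloading
        t += GAMMA
        if not cached:            # cache miss: input upload plus computation
            t += w_in / R_UL      # Gamma_UL
            t += EK_OVER_F * w_in # Gamma_comp = e_k |W_k| / f
        return t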

    Fig. 3-4 Spared input data for 𝑲 = 𝟓𝟎𝟎𝟎𝟎

    Fig. 3-5 Gain in served offloading requests per hour for 𝑲 = 𝟓𝟎𝟎𝟎𝟎


The simulations that produced the results of Fig. 3-6 and Fig. 3-7 are organized as follows:

1. The cache of the SSC is considered already filled at time 0, using the cache-filling algorithm presented above.

2. We consider $T = 10^7$ successive time intervals.

3. During each time interval $t$, for $t = 1, \dots, T$, the SSC receives and treats a random number of offloading requests $R_t$. All the $R_t$ are i.i.d. Poisson random variables with parameter $\lambda = 5$, i.e., on average $R_t = 5$.

For each time interval $t$, each one of the $R_t$ requests corresponds to one task $c_k$. This choice is made at random, independently for every request, according to the Zipf distribution of task popularity. The average computational delay depicted in Fig. 3-6 represents the time (in seconds) needed to compute the results of the $R_t$ offloading requests received at time interval $t$, averaged over the total number of time intervals $T = 10^7$. Here, we consider the purely computational delay, i.e. $(1 - \sigma_k)\Gamma_{comp}(c_k) = (1 - \sigma_k)\, e_k |W_k| / f$, as a measure of the "load" of the SSC. Depending on the request arrival rate, the average computational delay makes it possible to estimate whether the system is well dimensioned and can handle the incoming traffic of computation offloading requests. As desired and expected, policy $\lambda_3$ yields the best average computational delays.

During each time interval $t$, the SSC receives $R_t$ computation offloading requests. Let us call $D_t$ the computational delay needed to compute their results. We say that a computational resource outage occurs at time $t$ if $D_t > 1$, i.e., if the computation of the task results requested at time interval $t$ is not complete by the end of the interval. Fig. 3-7 shows the measured computational resource outage probability as a function of the cache size.
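The average delay and outage statistics of Fig. 3-6 and Fig. 3-7 can be estimated with a Monte Carlo loop of the following form. This is an illustrative sketch: the cache indicator sigma would be produced by the policy under test, and T is reduced here with respect to the reported $10^7$ intervals.

    import numpy as np

    rng = np.random.default_rng(0)
    K, alpha, lam, T = 20000, 0.6, 5, 10**5
    p = np.arange(1, K + 1) ** (-alpha)
    p = p / p.sum()
    W_in = rng.uniform(1e6, 1e9, K)          # |W_k| in bits
    EK_OVER_F = 1e-8                         # e_k / f in s/bit
    sigma = np.zeros(K, dtype=bool)          # cache indicator (policy-dependent)

    delays = np.empty(T)
    for t in range(T):
        ks = rng.choice(K, size=rng.poisson(lam), p=p)  # R_t ~ Poisson(lambda)
        # purely computational delay (1 - sigma_k) e_k |W_k| / f, summed
        delays[t] = np.sum(EK_OVER_F * W_in[ks] * ~sigma[ks])

    print("average computational delay:", delays.mean())        # Fig. 3-6
    print("outage probability P(D_t > 1 s):", np.mean(delays > 1.0))  # Fig. 3-7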

    Fig. 3-6 Average computation delay for 𝑲 = 𝟐𝟎𝟎𝟎𝟎 and 𝝀 = 𝟓


    Fig. 3-7 Probability of computational outage for 𝑲 = 𝟐𝟎𝟎𝟎𝟎 and 𝝀 = 𝟓

    3.2.6 Computation caching with federation of small cells

Let us enlarge our scenario and consider the presence of more than one small cell. In such a context, the SSC that serves the UE (or UEs) can collaborate with and "talk" to its neighbouring small cells via backhaul connections. We are interested in studying how these connections can be exploited to increase the benefits of computation caching, since in this scenario the SSC can indirectly access all its neighbours' cache memories.

So, let us suppose that the SSC has $N$ neighbouring small cells, each with a cache of size $m$ bits. The backhaul communication rate between the SSC and each one of its neighbours is denoted by $R_{BH}$. We call $\mathcal{S} = \{s_0, \dots, s_N\}$ the set of small cells, where $s_0$ is the SSC. Let us call $C_n \subseteq \mathcal{C}$ the (content of the) cache of small cell $s_n \in \mathcal{S}$, for $n = 0, \dots, N$. Thus, we can generalize the notation of the previous subsections and characterize $C_n$ with a cache indicator vector $\sigma^n \in \{0,1\}^K$, such that $\sigma_k^n = 1$ if the $k$-th task result is stored in $C_n$ and $\sigma_k^n = 0$ otherwise. More generally, for a subset of small cells $\mathcal{S}' \subseteq \mathcal{S}$, we define its cumulative cache indicator $\sigma^{\mathcal{S}'} \in \{0,1\}^K$ as:

$\sigma_k^{\mathcal{S}'} = 1$, if $\exists n$ such that $s_n \in \mathcal{S}'$ and $\sigma_k^n = 1$;
$\sigma_k^{\mathcal{S}'} = 0$, otherwise.

With every small cell $s_n \in \mathcal{S}$ we associate a popularity vector $p^n \in \,]0,1]^K$. Each $p_k^n$ represents the popularity of task $c_k$ as measured (or learnt, known, estimated, etc.) by $s_n$. In general, $p^n$ and $p^{n'}$ are different for $n \neq n'$, because the users served by two different small cells do not necessarily have the same interests. Hence, even when some small cells use the same policy (among the ones described above) to fill their cache memories, the resulting contents of the caches will differ and they will store, in general, different computation results.


We now want to generalise the offloading costs expressed above to the considered scenario. In this context, computation offloading happens as follows: first, the UE sends an offloading request $r = (k, t)$ to its SSC; if the SSC's cache contains $W_k'$, the latter is sent to the UE; if instead $W_k' \notin C_0$, the SSC transfers the offloading request $r$ to its neighbours; if $W_k' \in C_n$ for some $n$ and if downloading $W_k'$ from the $n$-th neighbour costs less than computing it locally, then the SSC fetches $W_k'$ from that neighbour and forwards it to the UE; otherwise, if this operation costs too much or if $W_k' \notin C_n$ for all $n$, the SSC receives $W_k$ from the UE, computes $W_k'$, and sends it to the UE. So, if the SSC is federated with (i.e., is in communication with) a set $\mathcal{F} \subseteq \{s_1, \dots, s_N\}$ of other small cells, then the cost can be written as

$\Gamma_{tot}(c_k) = \Gamma_{req}(c_k) + \Gamma_{DL}(c_k) + \gamma(c_k) + (1 - \sigma_k^0)\big(\Pi_k (\Gamma_{UL}(c_k) + \Gamma_{comp}(c_k)) + (1 - \Pi_k)\, M_k\big)$,

with

$M_k = \min\big(\Gamma_{UL}(c_k) + \Gamma_{comp}(c_k),\; \Gamma_{BH}(c_k)\big)$,

where $\Gamma_{BH}(c_k)$ is the cost of retrieving the result of $c_k$ from one of the federated caches, and

$\Pi_k = \prod_{n : s_n \in \mathcal{F}} (1 - \sigma_k^n)$.

Notice that $\Pi_k = 0$ if $\sigma_k^n = 1$ for some $s_n \in \mathcal{F}$; otherwise, $\Pi_k = 1$.
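As a sketch, the cost above can be transcribed as follows. This is illustrative Python with our own names; in particular, we additionally assume $\Gamma_{BH}(c_k) = |W_k'|/R_{BH}$, i.e., that the backhaul cost is the transfer time of the cached result, which the text does not specify.

    def federated_offloading_time(w_in, w_out, cached_locally, cached_by_neighbour,
                                  R_UL=125e6, R_DL=500e6, R_BH=10e9,
                                  ek_over_f=1e-8, gamma=2e-3):
        """Gamma_tot(c_k) in the federated scenario. cached_locally plays
        the role of sigma_k^0; cached_by_neighbour is True when Pi_k = 0."""
        t = 128 / R_UL + w_out / R_DL + gamma       # request + download + gamma
        if cached_locally:                          # sigma_k^0 = 1
            return t
        t_local = w_in / R_UL + ek_over_f * w_in    # Gamma_UL + Gamma_comp
        if cached_by_neighbour:                     # Pi_k = 0: take the cheaper
            return t + min(t_local, w_out / R_BH)   # M_k, with assumed Gamma_BH
        return t + t_local                          # Pi_k = 1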

At this point, we would like to define a criterion that the SSC $s_0$ can apply to decide with which of its $N$ neighbours to federate. In general, the federation with a neighbour $s_n$ is beneficial to $s_0$ if two conditions are met:

1. The cache of $s_n$ contains task results which are not contained in the cache $C_0$ of $s_0$.

2. The cost of retrieving those results via the backhaul connection is less than the cost spent by $s_0$ to compute them, i.e., $M_k = \Gamma_{BH}(c_k)$.

Inspired by the second condition, we define as follows the reward perceived by the SSC when it can retrieve the result of a task $c_k$ that is not cached in its memory:

$g_k = \max\big(0,\; p_k^0 \big(\Gamma_{UL}(c_k) + \Gamma_{comp}(c_k) - \Gamma_{BH}(c_k)\big)\big)$.

This definition allows us to introduce a preference metric that $s_0$ can use to rank the utility of federating with the other small cells; we define the preference of the SSC for $s_n$ as:

$\pi_n^0 = \sum_{k=1}^{K} \sigma_k^n (1 - \sigma_k^0)\, g_k \in \mathbf{R}$.

This preference measures the "gain" perceived by the SSC when it has access to the cache $C_n$. As intended, only tasks which are cached in $s_n$ but not in $s_0$ contribute to the gain (this is ensured by the product $\sigma_k^n (1 - \sigma_k^0)$).

We can extend the definition of preference as follows: if $\mathcal{T} \subseteq \mathcal{S}$ is a cluster of federated small cells (containing the SSC), then the gain that the SSC perceives from federating with $s_n$ is

$\pi_n^{\mathcal{T}} = \sum_{k=1}^{K} \sigma_k^n (1 - \sigma_k^{\mathcal{T}})\, g_k \in \mathbf{R}$.


Let us call $\mathcal{F} \subseteq \{s_1, \dots, s_N\}$ the cluster of the small cells that are federated with $s_0$. We can simply build $\mathcal{F}$ as follows (a runnable sketch is given after this paragraph):

1: Set $\mathcal{T} \leftarrow \{s_0\}$ and $\mathcal{F} \leftarrow \emptyset$.
2: repeat
3:   Find $\hat{n} = \arg\max_{\{n : s_n \notin \mathcal{T}\}} \pi_n^{\mathcal{T}}$.
4:   Set $\mathcal{T} \leftarrow \mathcal{T} \cup \{s_{\hat{n}}\}$ and $\mathcal{F} \leftarrow \mathcal{F} \cup \{s_{\hat{n}}\}$.
5: until a desired stopping condition is reached.

Examples of stopping conditions are the achievement of a satisfactory total (i.e., per-cluster) cache size or of a maximum number of federated neighbours, which can depend on the associated federation costs and amount of control signalling.
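A compact Python transcription of this greedy construction is given below. It is an illustrative sketch: sigma is an $(N+1) \times K$ array of cache indicators with row 0 for the SSC, g is the vector of rewards $g_k$, both assumed to be precomputed, and the stopping condition chosen here (no residual gain or a maximum cluster size) is only one of the possibilities mentioned above.

    import numpy as np

    def build_federation(sigma, g, max_neighbours):
        """Greedily federate the SSC (row 0 of sigma) with the neighbours
        maximizing the preference pi_n^T at each step."""
        N = sigma.shape[0] - 1
        covered = sigma[0].astype(bool)    # cumulative indicator of cluster T
        federation = []                    # the set F
        while len(federation) < max_neighbours:
            best_n, best_pref = None, 0.0
            for n in range(1, N + 1):
                if n in federation:
                    continue
                # pi_n^T: reward of results cached by s_n but not by the cluster
                pref = float(np.sum(sigma[n] * ~covered * g))
                if pref > best_pref:
                    best_n, best_pref = n, pref
            if best_n is None:             # stop: no residual gain
                break
            federation.append(best_n)
            covered |= sigma[best_n].astype(bool)
        return federation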

    3.2.7 Simulation results

Similarly to the case where we considered only one small cell (the SSC), the popularity of offloading requests in this new scenario obeys Zipf's law: without loss of generality, let us suppose that the task catalogue is ordered in such a way that $c_1$ is the most popular task for the SSC, $c_2$ is its second most popular, and so on; then, $p_k^0 = (A k^\alpha)^{-1}$, for some constant $\alpha$ and $A = \sum_{k=1}^{K} k^{-\alpha}$. For all the other small cells, instead, let $\phi_n : \{1, \dots, K\} \to \{1, \dots, K\}$ be a random permutation of $K$ elements (i.e., of the $K$ tasks); then, we fix $p_k^n = p_{\phi_n(k)}^0$. In practice, the popularity values are the same, but their order is randomly shuffled.
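In code, generating the neighbours' popularity vectors amounts to shuffling the SSC's Zipf vector (a sketch with our own names):

    import numpy as np

    rng = np.random.default_rng(1)
    K, alpha = 20000, 0.6
    p0 = np.arange(1, K + 1) ** (-alpha)
    p0 = p0 / p0.sum()                 # Zipf popularities of the SSC, p_k^0
    p_n = p0[rng.permutation(K)]       # p_k^n = p^0_{phi_n(k)} for neighbour s_n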

Simulations are carried out as in Section 3.2.5 and the simulation parameters are the same as in Table 3-1 (unless stated otherwise). The performance indicators that we plot are also the same as in Section 3.2.5. The difference now is that the SSC is connected to its neighbours (the backhaul communication rate is $R_{BH} = 10$ Gbit/s) and the cache size is always fixed to 5% of the total cacheable amount of data. The SSC has access to the cache memories of the neighbours with which it is federated and the offloading costs are calculated as described in Section 3.2.6. Here, the figures are plotted as a function of the number of neighbours with which the SSC federates, using the federation algorithm presented at the end of Section 3.2.6 with $g_k = p_k^0 |W_k|$. In particular, zero neighbours means that the SSC operates on its own, without assistance from the other small cells.

The benefits of cache federation in improving the efficacy of computation caching are visible in all the figures below. For all policies, all measured indicators improve (increase or decrease, according to their nature) when the SSC increases the number of neighbours involved in the treatment of offloading requests. Qualitatively speaking, the main message is that small cell federation is beneficial and can substitute for an increase of the size of the small cells' cache memories. By comparing the results policy by policy, as for the results of Section 3.2.5, we have the confirmation that policy $\lambda_3$ achieves better performance than $\lambda_1$ and $\lambda_2$. Moreover, one can notice that in general the performance related to $\lambda_3$ improves slightly more slowly than that related to $\lambda_2$ and $\lambda_1$. This means that (again, in a qualitative sense) the process of federation is relatively more effective when the computation caching policy is $\lambda_2$ or $\lambda_1$. Nonetheless, even the best performance obtained with $\lambda_2$ or $\lambda_1$ when the SSC federates with 19 other neighbours does not beat the performance induced by policy $\lambda_3$, even in the absence of federation. This means that $\lambda_3$ is intrinsically better suited to the computation caching framework, at least when it concretely differs from the other policies (i.e., when $|W_k'|$ takes values in a wide enough range).

    Fig. 3-8 Spared input data for 𝑲 = 𝟐𝟎𝟎𝟎𝟎 and 𝝀 = 𝟓

    Fig. 3-9 Gain in number of treated requests per hour, 𝑲 = 𝟐𝟎𝟎𝟎𝟎, 𝝀 = 𝟓


    Fig. 3-10 Average computational delay for 𝑲 = 𝟐𝟎𝟎𝟎𝟎 and 𝝀 = 𝟓

    Fig. 3-11 Probability of computational resource outage, 𝑲 = 𝟐𝟎𝟎𝟎𝟎, 𝝀 = 𝟓


    4 Learning algorithms for physical and application layer parameters and context information

Run-time awareness of the operating environment conditions is key to a smart and efficient usage of resources in different types of dynamic systems. In this section, we report the advancements achieved in three different applications: i) construction of the radio environment map (REM); ii) recovery of the spatial pattern of wireless data traffic from sparse measurements; iii) prediction of file popularity across space and time. Adopting learning algorithms from the physical to the application layer makes it possible to optimize prefetching algorithms as well as proactive resource allocation strategies. In this case, context information can be learned from available data (e.g., current traffic), predicting and estimating different parameters across space and time. In the case of the REM, the knowledge of the electromagnetic field at some points in space can help in reconstructing the field at other points. At the same time, the knowledge of current traffic and of its time series helps in predicting future traffic demand, in order to be able to accommodate it. Moreover, if coupled with popularity estimation, this method can be used to assist data prefetching algorithms. In the ensuing sections, we report the state of the art and the progress achieved by 5G-MiEdge in these fields.

    4.1 Graph topology inference from data

Associating a graph-based representation with a dataset plays a crucial role in determining and extracting relevant information from the data. Recently, the research field known as Graph Signal Processing (GSP) [SM14] has extended classical signal processing tools to the analysis of signals defined over graphs. A key feature of GSP is that the analysis tools, such as the Graph Fourier Transform, depend on the graph topology. There is a large body of work whose goal is to learn the network topology from a set of observations [Kol09], [GSK18], [MSM18]. By modelling the observations as random variables or processes, the graph topology typically reflects correlations among signals defined over its vertices. However, looking only at correlations may fail to capture the causality relations existing among the data. Alternative approaches using partial correlation [Kol09] or Gaussian graphical models [FHT08], [LT10] have been deeply investigated. Some GSP-based approaches make assumptions about the graph by enforcing properties such as sparsity and/or smoothness of the signals [Kal16], [DTF+16].

The Graph Fourier Transform (GFT) for undirected graphs has been defined as the projection of the observed signal onto the space spanned by the eigenvectors of the graph Laplacian matrix. This implies that a signal defined over the vertices of a graph, if associated with different graph topologies (i.e., with different sets of edges), leads in general to different spectra. In [SBD19], we proposed a method to associate a graph topology with the observed signal so as to make the signal band-limited over the inferred graph.

Enforcing this band-limited property then enables the use of sampling theory to recover the overall signal from a subset of values, see e.g. [TBD16]. This property is appealing in all applications where it is convenient to reduce the number of observations.

The approach we proposed in [SBD19] is composed of two main steps:

1. Learn, jointly, the subset of the GFT basis vectors $\mathbf{U}$ associated with the band-limited signal $\mathbf{s}$ and the sparse signal representation from the observations;

2. Infer the weighted graph Laplacian $\mathbf{L}$, and then the graph topology, from the estimated (partial) GFT basis.

More specifically, we defined a signal $\mathbf{y}$ on a graph $\mathcal{G}$ as a mapping from the vertex set $\mathcal{V}$ to the set of real numbers. For undirected graphs with $N$ vertices, the GFT $\mathbf{s}$ of a graph signal $\mathbf{y}$ has been defined as the projection of $\mathbf{y}$ onto the subspace spanned by the eigenvectors $\mathbf{U} = \{\mathbf{u}_i\}_{i=1}^{N}$ of the Laplacian matrix $\mathbf{L}$, i.e. $\mathbf{s} = \mathbf{U}^T \mathbf{y}$. A band-limited graph signal is a signal whose GFT $\mathbf{s}$ is sparse, i.e., it can be written as $\mathbf{y} = \mathbf{U}\mathbf{s}$ with $\mathbf{s}$ sparse. Given a subset of indices $\mathcal{K} \subseteq \mathcal{V}$, the band-limiting operator over the set $\mathcal{K}$ is defined as $\mathbf{B}_{\mathcal{K}} = \mathbf{U}\boldsymbol{\Sigma}_{\mathcal{K}}\mathbf{U}^T$, where $\boldsymbol{\Sigma}_{\mathcal{K}}$ is a diagonal matrix whose $i$-th diagonal entry is 1 if $i \in \mathcal{K}$, and 0 otherwise.

A signal $\mathbf{y}$ is said to be perfectly band-limited, within the (frequency) index set $\mathcal{K}$, if $\mathbf{B}_{\mathcal{K}}\mathbf{y} = \mathbf{y}$ [TBD16]. The band-limited property is useful because, among other things, it enables signal reconstruction from a subset of samples. Let us suppose we observe a signal only over a subset of nodes $I$ and we wish to reconstruct the overall signal. Defining by $\mathbf{G}_I$ the selection diagonal matrix whose $i$-th diagonal entry is 1 if $i \in I$ and 0 otherwise, and denoting by $\mathbf{r} = \mathbf{G}_I \mathbf{y}$ the observed signal, under some conditions on the band-limited signal $\mathbf{y}$ [TBD16], the entire signal can be recovered from $\mathbf{r}$:

$\mathbf{y} = \mathbf{U}_{\mathcal{K}} \big(\mathbf{U}_{\mathcal{K}}^T \mathbf{G}_I \mathbf{U}_{\mathcal{K}}\big)^{-1} \mathbf{U}_{\mathcal{K}}^T \mathbf{r}$    (1)

where $\mathbf{U}_{\mathcal{K}}$ is the $N \times K$ matrix whose columns are the eigenvectors of $\mathbf{L}$ associated with the signal bandwidth.
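Equation (1) translates directly into a few lines of NumPy. The sketch below is illustrative: for simplicity, the bandwidth is taken as the first K Laplacian eigenvectors (in [SBD19] it is a general frequency index set), L must be a valid graph Laplacian, and the sampling set must satisfy the conditions of [TBD16] for the inner matrix to be invertible.

    import numpy as np

    def reconstruct_bandlimited(L, r, sample_idx, K):
        """Recover a K-bandlimited graph signal from the samples r taken
        at the nodes listed in sample_idx, following equation (1)."""
        # GFT basis: Laplacian eigenvectors, sorted by ascending eigenvalue
        _, U = np.linalg.eigh(L)
        U_K = U[:, :K]                      # N x K basis of the bandwidth
        N = L.shape[0]
        G = np.zeros((N, N))
        G[sample_idx, sample_idx] = 1.0     # selection matrix G_I
        # y = U_K (U_K^T G_I U_K)^{-1} U_K^T r
        return U_K @ np.linalg.solve(U_K.T @ G @ U_K, U_K.T @ (G @ r))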

We then apply the method proposed in [SBD19] to learn the orthonormal transform matrix $\mathbf{U}$, the sparse matrix $\mathbf{S}$ of the signals' GFTs, and the underlying graph topology, captured by the Laplacian matrix $\mathbf{L}$ that admits the columns of $\mathbf{U}$ as its eigenvectors. Our approach is based on the minimization, under proper constraints forcing the signal band-limitedness, of the objective function $\|\mathbf{Y} - \mathbf{U}\mathbf{S}\| + f(\mathbf{L}, \mathbf{Y}, \mathbf{S})$, i.e., the sum of the data fitting error plus a penalty function $f(\mathbf{L}, \mathbf{Y}, \mathbf{S})$ that drives the graph topology to reflect the desired properties of the observed graph signals. In the ensuing sections, we apply our graph topology inference method to the recovery of the radio environment map.

    4.2 Radio environment map (REM)

    4.2.1 State of the art

Building a radio environment map (REM) is instrumental in devising appropriate resource allocation schemes, as already suggested in a series of works [Li17], [Far16], [Pes+14], [Wei13]. In particular, building a dynamic REM is the key step to enable effective Dynamic Spectrum Access (DSA) in Cognitive Radio Networks (CRNs).


Typically, a REM is built starting from sparse measurements and then applying some kind of interpolation. The fundamental problem in building a REM is the trade-off between accuracy and number of measurements: the larger the number of measurements, the better the accuracy. A number of different approaches have been proposed in the literature; see e.g. [Pes+14] for a survey.

    4.2.2 Contribution

    To overcome the limitations of available approaches, we applied the innovative

    algorithm described in Section 4.1 for building a REM, based on the representation of

    the field to be reconstructed over a graph. The starting point of the proposed approach

    is that the relationships between the field values in different points in space can be

    properly represented through a graph whose topology captures the correlation among

    different points. Standard approaches implicitly assume that this graph is a regular

    graph, where each position in space is placed on a regular grid. Conversely, we do not

    assume the graph to be given a priori, but we want to learn its topology from

    measurements collected during a training phase. Then, once the learning step has been

    completed, we exploit the graph to find out the optimal interpolator. The details of our

    proposed approach are given in [SBD19].

    Let us consider an example of electromagnetic field map in a urban environment

    obtained using the ray tracing tool Remcom Wireless InSite 2.6.3 1. In Fig. 4-1, we

    report an example of the field observed in a district of Ottawa. In this case, there are

    four active radio base stations, located in the southeast, northeast, northwest, and

    southwest sides of the examined area. The field is deterministic, as obtained by the

    ray-tracing computer model. To make the field more representative of reality, we

    introduced log-normal random fading to better simulate a realistic environment.

    Fig. 4-1 REM: true field (background), reconstructed field (circles)


The background (continuous) color is the underlying field and it acts as a benchmark to test the validity of our algorithms. The N = 136 circles identify the graph nodes where we wish to reconstruct the field, based on a number of measurements much smaller than 136. To proceed, we start by inferring a graph structure that captures similarities across different nodes. The training set is built adopting different configurations of active radio base stations and incorporating random channel variability with respect to the deterministic ray-tracing model. The idea underlying our algorithm is to associate a graph with the dataset such that the observed signals look band-limited over the inferred graph. This is the key that enables us to apply sampling theory for signals defined over graphs [TBD16]. This theory is instrumental in deriving the optimal interpolator and in finding the optimal sample locations [TBD16]. This last possibility is useful in those cases where we can design a priori the locations where field measurements are taken. The possibility to reduce the number of observations and still be able to reconstruct the overall field is particularly important in this application. The details of the graph recovery algorithms and of the signal reconstruction from a reduced number of samples are reported in our recent contribution [SBD19]. A numerical example useful to assess the goodness of the proposed approach is reported in Fig. 4-1, where the color within the circles represents the field values reconstructed from a number of measurements equal to 10. Comparing the color within each circle with the color around the point (the continuous map, which represents the benchmark), we can see that the reconstructed value is very close to the real one. In numerical terms, in our experiment the normalized mean square error (NMSE) in this case is equal to 0.05. This is indeed a remarkable result because it shows that, by associating the proper graph with a set of measurements, one is able to strongly reduce the number of observations and still reconstruct the overall field with good accuracy. In Fig. 4-2, we plot the NMSE of the reconstructed REM field versus the number of samples. We can observe that, using a number of samples equal to K = 4, the NMSE is already small and the benefit from further increasing the number of samples is negligible.

    Fig. 4-2 NMSE versus number of samples used for the signal reconstruction


    4.3 Traffic map

In this task, we have analyzed wireless data traffic collected in the area of Milan, Italy [BDAPI]. The dataset represents data traffic as a function of space and time. A snapshot of the aggregated traffic is reported in Fig. 4-3.

    Fig. 4-3 Data traffic map in the city of Milan, Italy

We applied our methods for filtering and recovery of signals defined over a graph to this dataset. In particular, we inferred the graph topology reflecting the relations among data traffic values at different points in space and time, and then we used the graph representation to derive optimal sampling algorithms and signal recovery. More specifically, we considered the outgoing call activity generated by the Telecom Italia cellular network over Milan. The activity in terms of issued calls is spatially aggregated using a square grid (see [BDAPI] for details on data generation). We focus on the area around the historical centre of Milan by selecting a grid of N = 144 nodes. We observed the daily call traffic during the months of November and December 2013. The data are aggregated, for each day, over intervals of one hour, within the time window from 7:00 to 11:00 a.m. To better capture the dynamics of the call activity, we further distinguished working days and weekend days by processing them as two separate datasets. We used a training set of M = 15 and M = 10 days for the working days and the weekends, respectively. The remaining days of each set are used as test data. Our goal is to recover a graph such that the observed signal is band-limited over the inferred graph. A necessary condition for the recovery of the overall signal from a subset of samples is that the number of samples $N_s$ is not smaller than the bandwidth $K$.

Given the observed matrix $\mathbf{Y}$, we first estimate the transform matrix $\hat{\mathbf{U}}$ and then recover the network topology by using the total variation-based graph learning algorithm proposed in [SBD19], with K = 20. Then, using the Max-Det greedy sampling strategy proposed in [TBD16], we selected a subset of 20 nodes and recovered the overall signal using equation (1).
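The greedy Max-Det node selection can be sketched as follows. This is an illustrative re-implementation of our reading of the strategy in [TBD16], not the project code: at each step, the node maximizing the log pseudo-determinant of $\mathbf{U}_{\mathcal{K}}^T \mathbf{G}_I \mathbf{U}_{\mathcal{K}}$ is added to the sampling set (the pseudo-determinant is used because the matrix is rank-deficient until $N_s$ reaches $K$).

    import numpy as np

    def maxdet_sampling(U_K, num_samples, tol=1e-12):
        """Greedy Max-Det sampling: U_K is the N x K matrix of GFT basis
        vectors spanning the bandwidth; returns the selected node indices."""
        N, K = U_K.shape
        selected = []
        for _ in range(num_samples):
            best_n, best_val = None, -np.inf
            for n in range(N):
                if n in selected:
                    continue
                rows = U_K[selected + [n], :]
                sv = np.linalg.svd(rows, compute_uv=False)
                # log pseudo-determinant of U_K^T G_I U_K = rows^T rows
                val = np.sum(np.log(sv[sv > tol] ** 2))
                if val > best_val:
                    best_n, best_val = n, val
            selected.append(best_n)
        return selected

For the traffic map experiment described above, one would call this with num_samples = K = 20.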

An example of a reconstructed traffic map is reported in Fig. 4-4 and Fig. 4-5, where we used $N_s = K = 20$ signal samples to recover the call traffic over four consecutive hours on Thursday 28 November 2013. Comparing the recovered map (left column) with the real traffic (right column), we can notice that the reconstructed traffic map is very similar to the real one.

To investigate the trade-off between the signal bandwidth K and the normalized mean square error (NMSE), Fig. 4-6 shows that, by increasing both K and the number of samples, the accuracy of the signal recovery improves as well.

As a further result, in Fig. 4-7 we plot the NMSE versus the number of used samples. We can observe that the NMSE decreases as the number of observed samples increases and that, by using a small number of samples, i.e. $N_s = K = 20$, we can guarantee a good field reconstruction. Finally, in , we illustrate an example of a recovered call map for the weekend days, specifically on Saturday 7 December 2013. It c