Data Dissemination using Information-Centric Networking
by
Ali Shariatmadari
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2016 by Ali Shariatmadari
Abstract
Data Dissemination using Information-Centric Networking
Ali Shariatmadari
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2016
Information-Centric Networking (ICN) is a promising paradigm for answering
the challenges the current Internet is facing. It is a paradigm that puts content
first, and inherently enables content mobility and content security. In this
work, we use ICN in real-world applications. We present an ICN-based data-
dissemination layer for Smart City platforms. We also present a content-based
publish/subscribe overlay system based on that data-dissemination layer. We
are using the system to collect and publish data from various sources, including demos with Unmanned Autonomous Vehicles (UAVs) providing live
transportation video.
Furthermore, by promoting in-network caching, ICN is a promising paradigm
to answer current challenges in the service provider's domain. This work reports on a cache placement and content routing strategy for service providers
to delay the onset of congestion (time-to-exhaustion) to the extent possible in
order to optimize their capital expenditure for their limited capacity planning
budget. We show that even a limited deployment of ICN provides a substantial
increase in the time-to-exhaustion of the network and a decrease in the number of links with high utilization. We also study the effects of homogeneous
and heterogeneous caching mechanisms on the performance of an ICN based
content-delivery system.
to my wife, my mother, and my father
Acknowledgements
This work would not have been possible without the help and support of
many. First and foremost, I wish to offer my sincerest gratitude to my supervisor, Professor Alberto Leon-Garcia, who has supported me with generous and continuous advice and guidance throughout my study. His insightful suggestions and ideas have been precious for the development of this
thesis. It has been an honor and privilege for me to work with him, and for
that, I am grateful.
Besides my advisor, I would like to thank the respectable members of my
examination committee, Prof. Roch Glitho, Prof. Baochun Li, Prof. Ben Liang,
and Prof. Shahrokh Valaee, for their constructive comments, feedback, and
questions.
My sincere thanks also go to Dr. Ali Tizghadam for all the stimulating
discussions, suggestions, and ideas. Also, I would like to thank all the members
of the Network Architecture Lab.
I wish to give my special gratitude to my wife, Maryam, whose love and
support made my journey possible. Finally, I thank my parents for their
love and encouragement, without whom I would never have enjoyed so many
opportunities.
Contents
1 Motivations 1
1.1 Challenges of Current Internet . . . . . . . . . . . . . . . . . . 2
1.2 Possible Solution: Information-Centric Networking . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Data Dissemination using ICN in Smart City Platforms 4
1.3.2 Content Delivery in Service Providers . . . . . . . . . . 6
1.4 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Background and Related Works 8
2.1 Information-Centric Networking . . . . . . . . . . . . . . . . . 8
2.1.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 A Brief History of ICN . . . . . . . . . . . . . . . . . . 12
2.1.3 Named-Data Networking . . . . . . . . . . . . . . . . . 13
2.1.4 MobilityFirst . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.5 ICN Design Selection . . . . . . . . . . . . . . . . . . . 22
2.2 CVST Platform . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.1 Smart Application on Virtual Infrastructure . . . . . . 29
2.2.2 Publish/Subscribe Systems . . . . . . . . . . . . . . . . 31
2.3 Content Delivery over Internet . . . . . . . . . . . . . . . . . . 34
2.3.1 Content Delivery Networks . . . . . . . . . . . . . . . . 37
2.3.2 Content Provider’s Cache . . . . . . . . . . . . . . . . 39
2.3.3 Transparent Caching . . . . . . . . . . . . . . . . . . . 41
2.3.4 Cache Placement in ICN . . . . . . . . . . . . . . . . . 42
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Data Dissemination in CVST 44
3.1 ICN-Based Data Dissemination Layer . . . . . . . . . . . . . . 44
3.1.1 Publisher-Broker Exchange . . . . . . . . . . . . . . . . 46
3.1.2 Subscriber-Broker Exchange . . . . . . . . . . . . . . . 47
3.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Broker Architecture . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3.1 Broker Implementation . . . . . . . . . . . . . . . . . . 61
3.3.2 Communication Layer . . . . . . . . . . . . . . . . . . 62
3.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 Traffic Flow Sensors . . . . . . . . . . . . . . . . . . . 66
3.4.2 Public Transportation . . . . . . . . . . . . . . . . . . 69
3.4.3 Drone Vision as a Service . . . . . . . . . . . . . . . . 71
3.4.4 Subscription Portal . . . . . . . . . . . . . . . . . . . . 76
3.5 Evaluation and Performance Tests . . . . . . . . . . . . . . . . 78
3.5.1 IDD Publication Test . . . . . . . . . . . . . . . . . . . 79
3.5.2 Scalability of the Matching Engine . . . . . . . . . . . 80
3.5.3 IDD and IP Performance Comparison . . . . . . . . . . 82
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4 Content Delivery in Service Providers 86
4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 Content Distribution in Service Providers . . . . . . . . 87
4.1.2 Time-to-exhaustion . . . . . . . . . . . . . . . . . . . . 89
4.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 Demands and Storage Budget . . . . . . . . . . . . . . 94
4.2.2 Content Delivery Networks . . . . . . . . . . . . . . . . 96
4.2.3 Named-Data Networking . . . . . . . . . . . . . . . . . 100
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.3.1 Time-to-Exhaustion of different topologies . . . . . . . 103
4.3.2 Limited NDN Deployment . . . . . . . . . . . . . . . . 108
4.3.3 I/O Speed Effect . . . . . . . . . . . . . . . . . . . . . 110
4.3.4 Routing Protocol Effect in CDN . . . . . . . . . . . . . 111
4.3.5 Heterogeneous Caching . . . . . . . . . . . . . . . . . . 112
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5 Conclusion 115
5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.1.1 Data Dissemination in CVST . . . . . . . . . . . . . . 116
5.1.2 Time to Exhaustion . . . . . . . . . . . . . . . . . . . . 117
5.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Bibliography 119
List of Tables
2.1 Summary of memory technologies [1] . . . . . . . . . . . . . . 19
2.2 NDN and MobilityFirst Comparison . . . . . . . . . . . . . . . 24
3.1 The APIs exposed by XPUB and XSUB services . . . . . . . . 55
4.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
List of Figures
2.1 NDN Protocol Stack [2] . . . . . . . . . . . . . . . . . . . . . 14
2.2 Structure of NDN Packets . . . . . . . . . . . . . . . . . . . . 15
2.3 NDN Forwarding Process . . . . . . . . . . . . . . . . . . . . . 16
2.4 The MobilityFirst architecture [3] . . . . . . . . . . . . . . . . 20
2.5 Mobile Delivery in MobilityFirst [3] . . . . . . . . . . . . . . . 22
2.6 Layered Architecture of CVST Platform [4] . . . . . . . . . . . 27
2.7 Multi-tier Cloud for End-to-End Application Platform . . . . 30
2.8 SAVI test-bed main components [5] . . . . . . . . . . . . . . . 31
2.9 Peak Period Traffic Composition — North America [6] . . . . 35
2.10 Traffic estimation of different types for global and mobile networks 36
2.11 Internet traffic source distribution in 2013 [7] . . . . . . . . . . 40
2.12 Internet’s architecture is changing [8] . . . . . . . . . . . . . . 41
3.1 Application Platform for Smart Transportation . . . . . . . . 45
3.2 Publisher-Broker Communication . . . . . . . . . . . . . . . . 47
3.3 Subscriber-Broker Communication . . . . . . . . . . . . . . . . 48
3.4 High-level architecture of content-based publish/subscribe over
IDD in CVST . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5 Design of the Broker: Abstraction of the complexity of different
system components . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 Sequence Diagram of the Content-Based Publish/Subscribe Sys-
tem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Scalability of the Broker with Micro-service design . . . . . . . 59
3.8 Apache Avro schema used in XPUB-Matcher communication . 63
3.9 Avro schema used in Matcher-XSUB communication . . . . . . 63
3.10 Sample data gathered from traffic sensors . . . . . . . . . . . . 65
3.11 Schema of the traffic sensor data . . . . . . . . . . . . . . . . 65
3.12 Sample subscription for traffic sensor data . . . . . . . . . . . 66
3.13 A match all query . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.14 Data of traffic sensors on the CVST portal . . . . . . . . . . . 67
3.15 Sample data gathered from public transit vehicles . . . . . . . 68
3.16 Schema of for Toronto Public Transit Data . . . . . . . . . . . 69
3.17 A sample geo distance query for public transportation data . . 70
3.18 Publishing Drone Data . . . . . . . . . . . . . . . . . . . . . . 71
3.19 Sample Drone Data . . . . . . . . . . . . . . . . . . . . . . . . 72
3.20 Video playback of a drone flight on CVST portal . . . . . . . 73
3.21 Subscription Portal: Public Transportation Query . . . . . . . 74
3.22 Subscription Portal: Public Transportation Data . . . . . . . . 75
3.23 Subscription Portal: Traffic Sensor Query . . . . . . . . . . . . 76
3.24 Subscription Portal: Traffic Sensor Data . . . . . . . . . . . . 77
3.25 FIB table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.26 Interests and Data packets log during XPUB and publisher com-
munication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.27 Scalability of the Matching Engine - Experiment Setup . . . . 79
3.28 Scalability of the Matching Engine, one minute rolling average 80
3.29 Scalability of the Matching Engine, five minutes rolling average 81
3.30 Data usage: IDD vs IP — Experiment Setup . . . . . . . . . . 82
3.31 Data usage: IDD vs IP — Results . . . . . . . . . . . . . . . . 83
4.1 Network of a Service Provider . . . . . . . . . . . . . . . . . . 87
4.2 Content distribution in Service Providers . . . . . . . . . . . . 88
4.3 Flows between sources and destinations pass through multiple
links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4 Time-to-exhaustion. Traffic is increasing monthly until network
is congested. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Feasibility model for CDN . . . . . . . . . . . . . . . . . . . . 99
4.6 Feasibility model for NDN . . . . . . . . . . . . . . . . . . . . 102
4.7 Rocketfuel network . . . . . . . . . . . . . . . . . . . . . . . . 103
4.8 DGM network . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.9 Tree network . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.10 Time-to-exhaustion in Rocketfuel network . . . . . . . . . . . 106
4.11 Time-to-exhaustion in DGM network . . . . . . . . . . . . . . 107
4.12 Time-to-exhaustion in Tree network . . . . . . . . . . . . . . . 108
4.13 Changes in TTE of Rocketfuel topology with number of caches 109
4.14 Link utilization of NDN vs CDN . . . . . . . . . . . . . . . . . 110
4.15 Changes in TTE of Rocketfuel topology with I/O limit . . . . 111
4.16 Changes in TTE of Rocketfuel topology with Routing algorithm 112
4.17 Heterogeneous vs Homogeneous caching storage in NDN . . . 113
Chapter 1
Motivations
The current Internet is a product of four decades of evolution. Today, the rapid growth of content and of the number of connected devices is changing the architecture of the Internet. The Internet was designed for different circumstances, at a time when the primary concern was sharing resources. Computers and their accessories were few and expensive, with few connections between them. Therefore, the host-to-host communication model became the central principle of the design of the Internet. In this design, each machine must have an IP address and follow the TCP/IP protocol to be able to communicate with other machines in the network. Although TCP/IP has been doing the job well, today's network is not all about end-to-end communication between two hosts. Let us go over some challenges that the Internet is facing.
1.1 Challenges of Current Internet
A variety of things are expected to get connected to the Internet, billions of them. These things operate over multiple domains such as transportation, energy, weather, construction, health, agriculture, etc. This phenomenon, known as the Internet of Things (IoT), is changing the architecture of the Internet. These devices are highly heterogeneous and have hardware constraints: their power consumption, CPU capacity, and memory are orders of magnitude lower than those of conventional hosts. They usually have multiple interfaces over different communication protocols and offer few or no configuration options. TCP/IP is an end-to-end communication protocol and leaves higher-level services to the application layer. Therefore, in the constrained environment of IoT devices, using TCP/IP as a communication layer will be very challenging.
Internet traffic is also growing rapidly due to Over-The-Top (OTT) and Video-on-Demand (VoD) services such as Netflix and YouTube. Video traffic now consumes most of the bandwidth on the Internet. A more detailed analysis shows that Netflix (31.6%) and YouTube (18.7%) combined account for over 50% of downstream traffic in fixed access [6]. This growth is another force that is changing the architecture of the Internet. The content providers are exploiting economies of scale and using Content Delivery Networks (CDN) to transfer this ever-increasing traffic, which exacerbates the change. CDNs were introduced to overcome the limitations of traditional Web caching systems by deploying several caches throughout the globe and populating these caches with popular content during the off-peak traffic hours. Some content providers are very keen to work with service providers (SP) to provide these caches. For example, Netflix's Open Connect program is rapidly expanding its coverage by offering to install and maintain caches in the SP's network. But using TCP/IP for content delivery is quite inefficient. A gigabyte of content, like a TV show, can generate a petabyte¹ of transient data. Contents, such as live video streams, are transferred over the Internet multiple times, which puts enormous pressure on the infrastructure.
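A rough back-of-envelope calculation illustrates how one gigabyte of content can turn into a petabyte of transient traffic when every viewer pulls an independent copy over TCP/IP; the one-million-viewer figure below is an illustrative assumption, not a measurement from this thesis.

```python
# Transient traffic when N independent viewers each fetch their own copy
# of the same content (no shared caching). Viewer count is an assumption.
content_size_gb = 1                  # one TV-show episode, roughly 1 GB
viewers = 1_000_000                  # assumed independent viewers

total_bytes = content_size_gb * 10**9 * viewers
total_pb = total_bytes / 10**15      # 1 PB = 10^15 bytes

print(f"{total_pb:.0f} PB of transient data")  # -> 1 PB of transient data
```

In-network caching, as promoted by ICN, aims to collapse this redundancy by serving repeated requests from caches instead of end-to-end flows.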
1.2 Possible Solution: Information-Centric Networking
New networking paradigms such as Information-Centric Networking (ICN) provide solutions to these problems. ICN is a clean-slate network architecture for the future Internet. It places named data at the core of the network, and names are decoupled from content location, applications, storage, or media of transport. Decoupling a data name from its location gives ICN native support for mobility, since users only need to know the name of the content and not where the content is located. It also supports data security and privacy requirements by enabling digital signatures and encryption. This solution is not only agnostic about the source of the content but also gives us the capability of in-network caching for all contents. In-network caching helps place popular content near the consumer to lower latency, results in better utilization of the infrastructure, and increases throughput.
¹ 1 PB = 1000^5 bytes = 10^15 bytes = 1000 terabytes

An IoT platform is a substrate that offers data collection from a diverse set of sensors operating in different domains. The substrate should be able to
transfer various types of data generated by these sources and decouple data
collection and delivery. The substrate must not only provide data validation and integrity but also guarantee secure communication. Data sources must be able to respond to pull-based and push-based data requests. At the same time, the platform should provide support for middleware and value-added services such as data processing and aggregation. Such requirements
make ICN a potential alternative networking solution for an IoT platform.
Also, built-in support for in-network caching and multicasting in ICN improves the utilization of the underlying infrastructure by removing redundant flows of the same content, and helps providers control the extensive cost of last-mile technologies. Furthermore, detecting popular contents and storing them
in caches near the edge of the network will decrease the latency, and moving
away from host-to-host communication model and employing a strategy layer
will improve content delivery in the mobile environment.
1.3 Contributions
This thesis makes the following contributions.
1.3.1 Data Dissemination using ICN in Smart City Platforms
The urban population of the world is growing. By 2050, 2.5 billion people will be added to the world's urban population [9]. This growth poses major difficulties for cities in meeting objectives such as the quality of life and the socio-economic development of their citizens. The vision of a Smart City is a response to these challenges. One of the major obstacles on the path to Smart Cities is the heterogeneous technologies currently used in cities and their lack of interoperability. Therefore, a unified platform for the Internet of Things can become the building blocks of the Smart City concept, both at the infrastructure and service level [10].
A Smart City platform requires collecting data from a heterogeneous set of data sources in various domains, mobile and fixed. Also, the platform shall anonymize, cleanse, and check the integrity of the collected data. It shall send the received data, in various formats, to interested parties and shall guarantee a secure data transfer. The platform shall provide different methods for accessing the data streams, which include content as well as event notifications. For example, one customer shall be able to pull the data, and another may register to receive notifications from the system upon the availability of the data. The streams have diverse requirements for provenance, privacy and security. And last but not least, the platform shall be scalable to cope with the daily increase in the number of data sources and data sinks.
We present a platform to gather data streams from a wide range of data
sources including road cameras, loop detectors, planned and emergency road
closures, fixed and mobile traffic sensors, drones, social media networks, public
transit vehicles, etc. This platform makes the data available to a broad range
of customers using a novel data dissemination layer. We based the design of the data-dissemination layer on Information-Centric Networking, which inherently enables content mobility, caching, and security. Here we will focus on the Named Data Networking (NDN) implementation of ICN. NDN does not inherently support event notifications. Therefore, we enhanced NDN to add the
push notification capability. We present a Naming design for our system that
ensures we can use the inherent features of NDN, such as in-network caching,
scalability and mobility [11].
We implemented an ICN-aware content-based publish/subscribe system
using the data-dissemination layer. In this system, data sources are publishers
that send their data updates to a network of brokers. A user can express interest in the data updates through a set of subscription queries and subscribe to notifications of the availability of content that matches the queries. The broker registers the subscription queries, matches newly published data against them, and then notifies the subscribers.
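The register-match-notify cycle above can be sketched as follows; the query format, field names, and subscriber identifiers are hypothetical illustrations, not the structures of the actual CVST broker described in Chapter 3 (which uses a dedicated matching engine and Avro-encoded messages).

```python
# Minimal sketch of content-based matching in a broker. A subscription
# query matches a publication if every queried field is present in the
# publication with an equal value.
def matches(query: dict, publication: dict) -> bool:
    return all(publication.get(k) == v for k, v in query.items())

# Registered subscription queries (subscriber -> query); names are made up.
subscriptions = {
    "alice": {"type": "traffic_sensor", "road": "Gardiner"},
    "bob": {"type": "transit_vehicle"},
}

# A newly published data update from a sensor (illustrative fields).
publication = {"type": "traffic_sensor", "road": "Gardiner", "speed_kmh": 45}

# Notify every subscriber whose registered query matches the new data.
notified = [user for user, q in subscriptions.items() if matches(q, publication)]
print(notified)  # -> ['alice']
```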
1.3.2 Content Delivery in Service Providers
Exponential traffic growth due to the increasing popularity of Over-The-Top
Video services has put service providers under much pressure. By promoting
in-network caching, Information-Centric Networking is a promising paradigm
to answer current challenges in the service provider’s domain. In this work, we
report on a cache placement and content routing strategy for service providers
to delay the onset of congestion of their network. We aim to optimize the
capital expenditure of their limited capacity planning budget. We show that even a limited deployment of ICN substantially delays the onset of congestion in the network and decreases the number of links with high utilization [12].
1.4 Thesis organization
The rest of this document is organized as follows. First, Chapter 2 provides a review of the background required for a proper understanding of this thesis, including Information-Centric Networking and content delivery in the Internet. Then, Chapter 3 focuses on the design and implementation of the data dissemination layer. Chapter 4 focuses on how Service Providers may delay the congestion of their network by using Information-Centric Networking. Each chapter provides evaluation results for the proposed methods.
The thesis concludes with Chapter 5, which summarizes the contributions and
provides an outlook on future works.
Chapter 2
Background and Related Works
In this chapter, we review the concept of Information-Centric Networking in Section 2.1. We go over Naming, Name-based Routing and In-Network Caching, and then review different implementations of the ICN paradigm. We review CVST, a platform for Smart City applications, in Section 2.2. In Section 2.3 we survey current technologies used for content delivery over the Internet, such as Content Delivery Networks.
2.1 Information-Centric Networking
Information-Centric Networking (ICN) [2, 3, 13] is a clean-slate networking paradigm that tries to solve current networking problems by replacing the host-to-host communication model. ICN puts data at the center of the network and then designs the facilities necessary for transferring that data. Using ICN, users express their interest in content, and the network is then responsible for providing that content to them. In ICN, it does not matter
where the content is stored, and the roles of identifier and locator of the content
are decoupled. In the current architecture, IP plays both of these roles.
ICN assigns a name to the data itself, not the content container that stores
that data. Once content is created, it has a name that cannot be changed, which is similar to the way version control systems work in software development.
Content routers then use this name to route and forward data requests to the
authorized sources. Since the routing is based on the name instead of the
host address, network efficiency can be improved by using in-network caching.
Therefore, if a router has already cached the data, it can answer the data
request itself. Otherwise, the request, based on its name, is forwarded to
the next hop for processing. This decoupling also provides better support for user mobility. Most ICN designs also include inherent protection and authentication of the data itself, in contrast with encrypting the connection between the two parties, as in the current architecture.
2.1.1 Concepts
In this section, we will review the concepts and terminologies that are common
between ICN designs.
Naming
As discussed earlier, one of the problems of the current Internet architecture is that IP addresses play the role of both locator and identifier of the
information. HTTP URLs are translated to IP addresses using DNS and the
IP addresses are mapped to the location of the content server. Therefore, the
location of the data is attached to its name. Any change in the location of the
data will result in changing its name, and there is no consistent way to keep
track of identical copies of data in different places. To solve this problem, ICN
decouples content from its location. This decoupling shifts the paradigm from
current host-to-host communication to a hop-by-hop communication model between network entities. When a consumer requests data, the network provides that data from any authorized source. One of the first benefits of this model is that only the receiver can retrieve the information, and no data can be received unless the receiver requests it. This one-way requesting is different from the current architecture, in which anyone can send data to any IP address in the network. ICN designs put Naming at the core of the networking model, which makes it the most important part of designing an ICN model. A naming
model answers three questions [14]:
• validity: The ability to check that no one has tampered with the content, usually by having a verifiable digital signature.
• provenance: The ability to bind the data with the content publisher,
usually using its public key.
• relevance: The ability to map the content to the original request.
Name-based Routing
In ICN, after the receiver sends a request, the network will find the authorized
source for the data and will retrieve the content. It follows that all ICN
designs should do name-based routing. Also, naming data creates the ability
to aggregate all the requests for that data and intrinsically provides multicast
forwarding capability.
In-Network Caching
By decoupling information and its location, named data can be stored anywhere in the network, i.e. in-network caching. In-network caching is accomplished without any overlay and is an intrinsic part of ICN networks. In-network caching is an improvement over the way routers' storage is used today, which is only for buffering packets. In ICN, when a router receives an Interest for content that it has in its cache, it can provide it immediately.
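A router's cache can be modelled as a bounded content store with a replacement policy. The sketch below uses LRU eviction purely for illustration; ICN designs do not mandate a particular policy, and the name prefixes are made up.

```python
from collections import OrderedDict

# Sketch of a router's Content Store with LRU eviction.
class ContentStore:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()          # name -> cached Data payload

    def get(self, name):
        if name in self.store:
            self.store.move_to_end(name)    # mark as recently used
            return self.store[name]         # cache hit: answer locally
        return None                         # miss: forward the Interest

    def put(self, name, data):
        self.store[name] = data
        self.store.move_to_end(name)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cs = ContentStore(capacity=2)
cs.put("/toronto/traffic/cam1", b"frame-1")
cs.put("/toronto/traffic/cam2", b"frame-2")
cs.get("/toronto/traffic/cam1")             # cam1 becomes most recent
cs.put("/toronto/traffic/cam3", b"frame-3") # capacity exceeded: cam2 evicted
print(cs.get("/toronto/traffic/cam2"))      # -> None (evicted)
```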
Security
In TCP/IP, security is achieved by encrypting the transmission channel and authenticating the endpoints of the communication. In this model, there is no way to verify the authenticity of the data itself, and we have to trust the container of the data. Moreover, TCP/IP is designed to forward any traffic towards the destination, which results in an imbalance of power between senders and receivers. This imbalance creates the ability for attackers and spammers to launch Distributed Denial of Service (DDoS) attacks. However, in ICN, content can be protected against alteration or eavesdropping, and only genuine copies of the data can exist in the network. Also, the ICN architecture is receiver-driven, which prevents DDoS attacks.
Mobility
TCP/IP was designed with fixed, immobile hosts in mind, but today we are facing a sharp increase in the number of connected mobile devices. The network that a host is attached to determines the IP address of the host. Therefore, the IP address of a mobile device will change if it moves to another network. This change of address will disrupt every active TCP/IP session on the device. Workarounds using different overlay solutions may be used to remedy this problem, but these solutions come with many inefficiencies since the problem lies in the TCP/IP design. Moreover, IP networks must forward traffic on spanning trees to avoid loops and cannot make full use of the multiple connections of a particular host. ICN tackles both of these problems. ICN can take full advantage of the multiple connections that a device has and efficiently manage communication using all of them. The reason is that, in ICN, there is no end-to-end connection, and every device only talks to its next hop.
2.1.2 A Brief History of ICN
The introduction of the idea of separating names and locators goes back to the TRIAD project [15]. However, the Data-Oriented Network Architecture (DONA) [13] is one of the first complete ICN designs. DONA uses a flat name architecture that replaces the current hierarchical names (URLs) with the notion of self-certifying names.
A self-certifying name is a tuple of the cryptographic hash of the public
key of the content publisher, P, and a unique label, L, as an identifier of the
data that is published under that name. L can be a cryptographic hash of
the content, which makes the label unique and the data immutable. Entities
that are interested in that data will learn its name from a trusted external
source, such as a search engine for names. The name is self-certifying because
anyone who has access to the public key of the publisher can verify the relationship between the data, the content publisher and the label. The name
resolution and routing is done using servers called Resolution Handlers (RHs).
DONA uses source routing by querying these RH servers, which return a set of network links that a request must traverse to reach its destination. DONA is compatible with the current Internet architecture, but requests take a long path to reach their destination, which causes unnecessary delays.
Moreover, source routing information creates overhead in the packet header.
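The self-certifying name structure described above can be sketched as a (P, L) pair, where P is the hash of the publisher's public key and L is a label, here taken to be the hash of the content so the data is immutable. The SHA-256 choice and the placeholder key bytes are illustrative assumptions; DONA's design does not tie the scheme to a particular hash function.

```python
import hashlib

# Sketch of a DONA-style self-certifying name P:L. Key bytes are a
# placeholder, and SHA-256 is an illustrative choice of hash function.
publisher_public_key = b"-----BEGIN PUBLIC KEY----- ...placeholder..."
content = b"road closure: Gardiner Expressway, eastbound"

P = hashlib.sha256(publisher_public_key).hexdigest()  # binds to publisher
L = hashlib.sha256(content).hexdigest()               # binds to the data
name = f"{P}:{L}"

# Anyone holding the publisher's public key (and the content) can verify
# that P and L bind the data to that publisher, hence "self-certifying".
assert hashlib.sha256(publisher_public_key).hexdigest() == P
assert hashlib.sha256(content).hexdigest() == L
```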
In addition to DONA, there are many other proposed architectures for ICN, such as PURSUIT [16–18], SAIL [19], COMET [20] and CONVERGENCE [21], but here we review Named Data Networking [2] and MobilityFirst [3].
2.1.3 Named-Data Networking
Named-Data Networking (NDN) [2] is a fully-fledged ICN architecture, which was initially introduced in a Google Tech Talk [22] by Van Jacobson and developed as Content-Centric Networking at PARC [23]. NDN [24] then continued as an NSF-funded Future Internet Architecture project that began in 2010, a collaboration among 12 campuses.
Fig. 2.1 shows the NDN vision of the Internet protocol stack in comparison
Figure 2.1: NDN Protocol Stack [2] — content chunks form the narrow waist of NDN's hourglass, with security and strategy layers beneath, in place of IP packets in the current stack.
to the current Internet protocol stack. The narrow waist of the hourglass is a layer with minimal required functionality that plays the role of a universal agreement among the hosts that want to communicate over the network. Currently, IP plays this part, but in NDN, content chunks are the global agreement. NDN envisions that it will operate over various networking technologies, including TCP/IP.
NDN Architecture
NDN uses two kinds of packets for data delivery: Interest packets and Data packets. These two kinds are analogous to the way TCP Data and Ack packets work. The difference is that, in NDN, the consumer sends an Interest packet and then receives the Data packet corresponding to that Interest from the network, whereas in TCP, the server sends the Data and the client responds with an Ack. Fig. 2.2 shows the architecture of these packets.
Chapter 2. Background and Related Works 15
Figure 2.2: Structure of NDN Packets
To forward content hop-by-hop based on names, NDN uses three tables: a Pending Interest Table (PIT), a Forwarding Information Base (FIB), and a Content Store (CS). The PIT, which holds a list of all pending Interests and their incoming interfaces, prevents duplicate forwarding of an Interest and satisfies pending Interests when the corresponding Data packet arrives from the authorized source. The FIB matches name prefixes to output interfaces, and the CS caches incoming Data packets so that the node can satisfy future Interests.
When an Interest packet arrives at an NDN node (Fig. 2.3), the node first checks the CS, then the PIT, and then the FIB. Routers use longest-match lookup to match the Name in the Interest packet against FIB entries. When a Data packet arrives, the node first checks the PIT and, if there is a match, optionally stores the packet in the CS. Because each Interest packet results in one Data packet, the flow is balanced, and the Data packet always takes the reverse path of the Interest packet.
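This lookup sequence can be sketched in a few lines of Python. This is a toy model for illustration only; the class, method names, and return values are our own simplifications, not part of any NDN codebase.

```python
# Minimal model of NDN forwarding: CS -> PIT -> FIB for Interests,
# PIT -> optional CS insertion for Data. Illustrative only.

class NdnNode:
    def __init__(self, fib):
        self.cs = {}        # Content Store: name -> data
        self.pit = {}       # PIT: name -> set of incoming faces
        self.fib = fib      # FIB: name prefix -> outgoing face

    def on_interest(self, name, in_face):
        if name in self.cs:                      # 1. Content Store hit
            return ("data", self.cs[name], in_face)
        if name in self.pit:                     # 2. Already pending: aggregate
            self.pit[name].add(in_face)
            return ("aggregated", None, None)
        # 3. FIB: longest-prefix match on '/'-separated components
        parts = name.split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i])
            if prefix in self.fib:
                self.pit[name] = {in_face}
                return ("forward", None, self.fib[prefix])
        return ("drop", None, None)              # no route: drop or NACK

    def on_data(self, name, data):
        faces = self.pit.pop(name, None)
        if faces is None:                        # unsolicited Data
            return ("discard", set())
        self.cs[name] = data                     # optionally cache
        return ("satisfy", faces)                # forward to all downstream faces
```

Note how the PIT both suppresses duplicate upstream Interests (the "aggregated" branch) and records every downstream face so that one returning Data packet satisfies all waiting consumers.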
Figure 2.3: NDN Forwarding Process
Strategy Layer
The strategy layer (Fig. 2.1) uses the information in the PIT and FIB tables to find the best forwarding path for an Interest packet. For example, an adaptive forwarding strategy makes an informed decision about which interfaces to use for a particular Interest packet based on the number of Interest packets recorded in the PIT. It can also balance the forwarding of Interest packets among multiple interfaces, detect failures, and choose alternative forwarding paths. An effective strategy layer may use the multi-path forwarding capability of NDN to avoid congestion and failures. The strategy layer may also handle the transmission of control messages among neighboring routers.
The strategy layer plays a major role in optimizing the utilization of the underlying infrastructure, especially in a mobile environment, where packet delivery is unreliable. In mobile transmission, Interest or Data packets might get lost or damaged, or connectivity may be interrupted. The strategy layer may be used to retransmit Interest packets that are not satisfied within a reasonable period. Ultimately, though, the transport is receiver-driven, and the application that originates the Interest packets is responsible for unsatisfied Interests.
For example, when a client sends Interest packets for content, the routers along the path between the client and the authorized source can cache the corresponding Data packets. If the client moves to a new network during the transmission, some Data packets will not reach it. However, upon joining the new network, the client's strategy layer is triggered to reissue the Interest packets for the missing Data packets. These new Interest packets fetch the data from the nearest upstream content store that has cached the Data packets. On the other hand, if the authorized source of the data moves to another network, some Interest packets issued by the client will not reach the source and will eventually time out. The strategy layer reissues them until the content is completely retrieved.
In Chapter 3 we discuss the design of a data dissemination layer. In this design, the strategy layer acts as a load balancer: multiple instances of a service are registered under the same name, and the strategy layer sends Interest packets to them in round-robin fashion. Furthermore, in Chapter 4 we use the strategy layer to optimize content routing in the network and delay the onset of network congestion.
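The round-robin idea can be sketched as follows. This is hypothetical illustration code, not the actual NFD strategy API: the class and method names are our own.

```python
from itertools import cycle

class RoundRobinStrategy:
    """Toy forwarding strategy: rotate Interests for a name prefix across
    all faces registered under that prefix (illustrative only)."""

    def __init__(self):
        self.faces = {}      # prefix -> list of registered faces
        self.cycles = {}     # prefix -> cycling iterator over those faces

    def register(self, prefix, face):
        # Each service instance registers under the same prefix.
        self.faces.setdefault(prefix, []).append(face)
        self.cycles[prefix] = cycle(self.faces[prefix])

    def next_hop(self, prefix):
        # Successive Interests are spread evenly over the instances.
        return next(self.cycles[prefix])
```

A caller would register every service instance under one name and then pick `next_hop(prefix)` for each incoming Interest, spreading the load evenly.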
NDN Naming
NDN uses hierarchical names similar to URLs, though not necessarily human-readable; for example, a video can be named /cvst/videos/sample.mp4. In NDN, content is divided into chunks, and each chunk is immutable and has a unique name. For example, /cvst/videos/sample.mp4/_v2/_s1 points to the first segment of version 2 of sample.mp4. Usually, the first segment of the latest version of the content is represented by a path similar to /cvst/videos/sample.mp4. Hierarchical names, as in HTTP, allow efficient aggregation in routing tables and fast lookup. Interests can refer to names that do not yet exist, and publishers can generate content for those names on the fly.
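Assuming the `_v`/`_s` marker convention used in the example above (an application-level convention, not mandated by NDN itself), such names can be parsed as:

```python
def parse_name(name):
    """Split an NDN-style name like /cvst/videos/sample.mp4/_v2/_s1 into
    (content prefix, version, segment). The _v/_s markers are an
    application-level convention, illustrative only."""
    parts = [p for p in name.split("/") if p]
    version = segment = None
    prefix = []
    for p in parts:
        if p.startswith("_v"):
            version = int(p[2:])    # version component, e.g. _v2 -> 2
        elif p.startswith("_s"):
            segment = int(p[2:])    # segment component, e.g. _s1 -> 1
        else:
            prefix.append(p)        # ordinary name component
    return "/" + "/".join(prefix), version, segment
```

A bare prefix such as /cvst/videos/sample.mp4 parses with no version or segment, matching the convention that it refers to the first segment of the latest version.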
Name prefixes are usually globally meaningful, similar to domain names in HTTP, but they can also refer to a local context such as /home/projector. These naming conventions for pieces of data are not part of NDN itself, but they can be designed to give applications the ability of relative data retrieval. Naming design plays an important role in realizing the full potential of NDN in an application. We go over the naming design of our event notification layer in Chapter 3.
Performance and Scalability
Many studies [25–30] have investigated the performance, scalability, and practicality of Named-Data Networking. For example, the authors of [1] evaluate whether the NDN model can be implemented using today's technology. They first compare current memory technologies, since memory access latency is the bottleneck of today's router design. Table 2.1 summarizes memory technologies and their access latencies. The authors use HashCache [31] to implement the indexing required for the Content Store, PIT, and FIB tables.

Table 2.1: Summary of memory technologies [1]

Technology        Access time (ns)    Max Size
TCAM              4                   ~20 Mb
SRAM              0.45                ~210 Mb
RLDRAM            15                  ~2 Gb
DRAM              55                  ~10 GB
High-speed SSD    1,000               ~10 TB
SSD               10,000              ~1 TB
The authors propose using 40 bits for indexing the hash tables to reduce collisions. In addition, Bloom filters are used to perform the longest prefix match in the FIB. They propose that if a name has B components, the router queries B Bloom filters, one for each potential prefix match, and then queries the hash table to detect possible false positives. The memory bits needed for the Bloom filters are five to twenty times the number of items in the FIB. To store 250 million entries in the FIB, roughly the current number of globally unique host names, the router requires 1.5 GB of off-chip RLDRAM for the hash index and 4 Gbits of on-chip SRAM for each Bloom filter.
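The scheme can be illustrated with a toy sketch. The Bloom filter below is deliberately tiny, and its size, hash count, and the final dict lookup are our own illustrative choices, not the parameters proposed in [1].

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: k hash indices derived from SHA-256 slices.
    Toy sizes, for illustration only."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _idx(self, item):
        h = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(h[4*i:4*i+4], "big") % self.m for i in range(self.k)]

    def add(self, item):
        for i in self._idx(item):
            self.bits |= 1 << i

    def __contains__(self, item):
        return all(self.bits >> i & 1 for i in self._idx(item))

def longest_prefix_match(name, blooms, fib):
    """One Bloom filter per prefix length (component count); a hash-table
    (here, dict) lookup confirms hits and screens out false positives."""
    parts = [p for p in name.split("/") if p]
    for i in range(len(parts), 0, -1):           # longest prefix first
        prefix = "/" + "/".join(parts[:i])
        if i in blooms and prefix in blooms[i] and prefix in fib:
            return prefix
    return None
```

The Bloom filters act as a cheap pre-screen: the authoritative hash-table lookup runs only on prefixes the filters admit, which is how false positives are tolerated.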
2.1.4 MobilityFirst
MobilityFirst [3] treats mobile devices as the first-class citizens of its architecture and focuses on handling delay/disruption-tolerant networks in addition to multi-homing, multicast/anycast support, and security. MobilityFirst assigns a 160-bit Globally Unique Identifier (GUID) to any entity, such as devices, contexts, or data. GUIDs are either assigned randomly or are generated by a global or local Name Certification Service (NCS) as a self-certifying hash of the publisher's public key. For example, a single video content will have the same GUID everywhere in the network. By assigning GUIDs to all network objects, MobilityFirst supports both host-to-host and hop-by-hop data transfer.

Figure 2.4: The MobilityFirst architecture [3]
Using a Global Name Resolution Service (GNRS), GUIDs are mapped to one or more topological network addresses, which are used as the authoritative header for routing. Both flat name addressing and network-based addressing are thus used, in what is called a hybrid GUID and network-address-based routing scheme. Routing table size is reduced, but a distributed name-resolution service is required; it is implemented as a distributed hash table hosted by network routers.
As shown in Fig. 2.4, suppose John wants to receive content on all of his devices. He first registers them using an NCS, which assigns the same GUID to all of them. Upon link establishment with the network, these devices register their Network Addresses (NAs) in the GNRS. The sender looks up the GUID using the NCS and then, using send or get functions, sends information to or gets information from the devices. The packet of such a request carries a source GUID, a destination GUID, and a Service IDentifier (SID). The SID specifies the delivery method, e.g., unicast, multicast, or anycast. The packet can also include a set of network addresses for the destination GUID, resolved by using the GNRS. The network-address resolution task can also be delegated to the content routers in the network.
If message delivery fails due to movement or link disruption, the packet is stored in content routers in the network, and the routers then periodically query the GNRS to rebind the destination GUIDs to network addresses.
Fig. 2.5 shows a scenario of temporary disconnection of a mobile node. Delivery to node NA99 fails because the device has moved to another network with a new network address, NA75. Establishing the new network connection triggers a GUID/address rebind in the GNRS. The last content router, which has cached the data, periodically queries the GNRS for the new network address; when it receives the new address, it retries sending the data to its destination. MobilityFirst employs a Generalized STorage-Aware Routing (GSTAR) mechanism, a link-state routing protocol, at the intra-domain level to better support the disconnections, delays, and variable link conditions of mobility [32].

Figure 2.5: Mobile Delivery in MobilityFirst [3]
In addition to on-path caching on intermediate routers, MobilityFirst also provides off-path caching. Off-path cached copies of content are all known to the network under the same GUID, and all the cache servers register themselves in the GNRS with different network addresses. When a client sends a get request with the SID set to anycast, the content is transmitted from the nearest cache server.
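A toy model of the GNRS bindings and SID-based resolution described above follows. The API, GUID strings, address names, and the use of a scalar "distance" as the nearness metric are all illustrative assumptions, not the MobilityFirst implementation.

```python
class Gnrs:
    """Toy Global Name Resolution Service: GUID -> {network address: distance}.
    Illustrative only; real GNRS is a distributed hash table on routers."""

    def __init__(self):
        self.bindings = {}

    def bind(self, guid, addr, distance):
        # Called when a device or cache server (re)joins a network.
        self.bindings.setdefault(guid, {})[addr] = distance

    def unbind(self, guid, addr):
        # Called when a binding becomes stale (e.g. the device moved away).
        self.bindings.get(guid, {}).pop(addr, None)

    def resolve(self, guid, sid="unicast"):
        addrs = self.bindings.get(guid, {})
        if not addrs:
            return []
        if sid == "multicast":               # all current bindings
            return sorted(addrs)
        return [min(addrs, key=addrs.get)]   # unicast/anycast: nearest copy
```

The rebinding scenario of Fig. 2.5 then amounts to an `unbind` of the old address followed by a `bind` of the new one, after which a router's next `resolve` call returns the fresh address.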
2.1.5 ICN Design Selection
In this section, we present a comparative study of NDN and MobilityFirst. We first discuss the commonalities and then focus on the differences between the two models. NDN and MobilityFirst share three commonalities in their implementations.
Receiver Driven
Both designs first publish the content, meaning they advertise its availability, and consumers then subscribe to or request that content. The request does not have to happen at the same time, nor does the consumer need to know the location of the published content. In NDN, content availability is advertised, and the consumer can then send an Interest for that content. In MobilityFirst, data is registered in the NCS and assigned a GUID, and the consumer then requests that GUID by sending a get packet with the GUID of the content as the destination.
In-Network Caching
Since the name and locator are decoupled, in both NDN and MobilityFirst a router can serve content directly from its local cache or forward the request to the next hop. In-network caching happens regardless of the protocol used for transport. However, NDN and MobilityFirst differ in the amount of data they cache, which stems from the way content is named in the two models.
Content-oriented security
Contrary to current security models, which are based on securing the path that data takes to reach the consumer, ICN models can secure the content itself.
                      NDN                              MobilityFirst
Naming                Names are hierarchical           Flat naming, single component
Routing               Local routing based on a         Uses a distributed hash table
                      routing table
Caching               On-path and off-path             On-path and off-path
Mobility              Sending new Interests and        Late binding of name and
                      routing table updates            network address by routers
Security              Data packets include             Uses self-certifying names
                      signatures; distributed
                      trust model
Developer friendly    Open source, available in        Closed source, limited
                      many programming languages       availability

Table 2.2: NDN and MobilityFirst Comparison
Both NDN and MobilityFirst have the content signed by its creator. NDN puts this signature inside the Data packet, so every packet of data includes its verifiable signature. MobilityFirst uses self-certifying naming, and the signature is placed in the name of the content.
We now focus on the differences between the NDN and MobilityFirst models. There are three areas where the two models drift apart and use different implementations.
Naming
NDN meets the naming requirements (Section 2.1.1) by including in the data sent to the consumer a signature that covers both the content and its name. This signature binds the name and content together, verifiable by the user. Naming in NDN can be local, and names neither require a particular structure nor need to be globally unique. MobilityFirst, on the other hand, uses self-certifying names for GUIDs. A self-certifying name provides the binding between the name and the content; however, the name is not human-readable and cannot take an arbitrary structure. Moreover, users must rely on other means, such as search engines or applications, to find the name of the content they need.
Routing
An ICN network must be able to route content to consumers. To be scalable, NDN uses a hierarchical naming structure and consolidates name prefixes to reduce the routing tables; the size of the routing table is at least the number of unique prefixes in the network. MobilityFirst translates names to network addresses using the Global Name Resolution Service (GNRS) and then maps network addresses to interfaces in each router; the size of this routing table is bounded by the number of routers in the network. The trade-off is between maintaining a global name resolution system and maintaining large routing tables.
Narrow Waist
ICN designs, including NDN and MobilityFirst, use hop-by-hop communication between ICN layers, which can run over IP or other local delivery protocols. However, to provide global connectivity, each design must have a narrow waist. NDN defines chunks of data as the narrow waist of the network: content is divided into chunks, and each chunk has its own name and digital signature. Services such as transport protocols are implemented over this structure. MobilityFirst makes its name-based service layer the narrow waist. This layer uses GUIDs for all network-attached objects, including hosts, content, and services, and exposes a set of APIs that can be used by upper layers.
Table 2.2 shows a comparison between NDN and MobilityFirst. We chose NDN as our ICN implementation. The protocol is very simple in design and needs far fewer components to function, which makes it suitable for limited deployment. Moreover, the NDN naming method has many commonalities with the way HTTP and MPEG-DASH name content, and its names are human-readable. The NDN implementation is open source, with development kits available for many programming languages, and it is used in many open-source projects. Furthermore, our work can be extended by using MobilityFirst as the ICN implementation.
2.2 CVST Platform
The rapid rate of urbanization globally has become a challenge to municipalities and governments. The smart city arises as a promising answer to the challenges of urbanization, involving the gathering and analysis of information from different sources in real time. For example, in existing traffic management systems, data is frequently not shared or readily available outside its agency's domain. These limitations affect real-time analysis of data and reliable detection of the root causes of traffic jams.
Figure 2.6: Layered Architecture of CVST Platform [4]
Providing city information in a single platform is a key enabler of effective city management. Connected Vehicles and Smart Transportation (CVST) [33, 34] is an open and scalable platform for developing smart city applications. The platform consists of four main building blocks [4]. Fig. 2.6 depicts the multi-layer architecture of the CVST platform. The lowest layer is the Infrastructure as a Service (IaaS) layer, which provides resource management in a cloud environment. This layer is based on the SAVI (Section 2.2.1) cloud and provides resources that can be scaled up, down, or out to adjust to the varying demands of applications.
The Platform as a Service (PaaS) layer is divided into two parts. The bottom sub-layer is responsible for end-to-end multi-domain orchestration and uses capabilities from SAVI. The top sub-layer is concerned with data dissemination. The data dissemination layer of the CVST platform has the following requirements:
a. Collect data about the city from a variety of sources with different types.
b. Support mobility of data sources.
c. Meet privacy requirements of different data types.
d. Guarantee secure data transmission.
e. Allow customers to pull data from the platform on demand.
f. Notify the customers of data availability, e.g. push notifications.
g. Have the ability to scale out to support new data sources and data sinks.
h. Be optimized for best performance.
i. Provide services such as data anonymization, cleansing, and verification on top of data collection.
The Business Intelligence as a Service (BIaaS) layer provides an analytics platform to extract statistics, detect data trends, and identify data patterns. The BIaaS layer applies different techniques, such as stream analytics, for purposes such as KPI analysis. This layer provides a set of APIs (Application Programming Interfaces) that are used by both internal and external applications. BIaaS uses content-based publish/subscribe to collect and send data.
The Smart Applications as a Service (SaaS) layer offers a set of smart city applications. Applications such as real-time dashboards and monitoring systems, traffic flow optimization, and route assistance may be provided by public or private organizations. These applications use the APIs provided by the other layers to access raw or processed data.
2.2.1 Smart Application on Virtual Infrastructure
CVST uses an infrastructure that operates on virtualized resources, managed using IaaS and PaaS principles. Smart Applications on Virtual Infrastructures (SAVI) is an initiative to build a test-bed for research and development of future Internet architectures and applications. The SAVI [35] project explores the role of virtualization and software-defined infrastructure in application platforms and provides the tools necessary for experimenting with the deployment of future application platforms. SAVI provides large-scale computing, storage, and a fast network fabric over a cloud infrastructure.
All resources, computing, networking, and others, are managed by a single management system to offer enabling services. The resources required to support CVST span multiple resource tiers. As shown in Fig. 2.7, these tiers are spread across a large geographic extent, from remote massive core data centers, to smart edge resources located closer to the user, to Customer Premise Edge (CPE) resources, such as sensors, near the user or environment. The tiers provide services that differ vastly in their processing, storage, and networking capacity requirements. This three-tier application platform has been built and deployed on the SAVI test-bed.
As shown in Fig. 2.8, the SAVI test-bed is designed and implemented to help overcome challenges in implementing and testing new network applications. It provides resource management, scalability, reliability, security, and accountability to facilitate rapid development of applications.
Software-Defined Infrastructure (SDI) is an approach in which a software manager manages virtual and physical resources in a converged fashion. The SDI manager is hierarchical to ensure scalability and to handle heterogeneity.

Figure 2.7: Multi-tier Cloud for End-to-End Application Platform
Each resource type is controlled by one or more associated controllers, which themselves interact with the SDI manager. The controllers also communicate with a topology manager, which provides an integrated view of all resources, and with a monitoring and analytics system. The SDI manager gives the resource controllers coordination and an infrastructure-wide view, resulting in more efficient resource management.
The SAVI test-bed consists of eight nodes and has been in operation across Canada since 2013. The CANARIE and ORION networks provide Layer 2 connectivity between the SAVI core (i.e., datacenter) and smart edge nodes. Each node has its own SDI manager on top of OpenStack and OpenFlow.
SAVI offers CVST flexibility in resource management, a unified architecture, support for deployment of heterogeneous and programmable physical and virtual resources, and the powerful resources required for data analytics and intelligence. Services such as VM migration and multi-layer monitoring are used to improve the resiliency and robustness of CVST. Some of the heterogeneous physical resources that SAVI provides in this infrastructure include:
a. High-performance server blades with multi-core CPUs
Figure 2.8: SAVI test-bed main components [5]
b. Dedicated bare-metal machines with dedicated networking resources,
available in different flavors including high performance and low power
c. Graphics Processor Units (GPU) attached to bare-metal machines
d. Programmable hardware using NetFPGA, available either attached to bare-metal machines or as a standalone network device
2.2.2 Publish/Subscribe Systems
The CVST platform uses the publish/subscribe paradigm for data dissemination. Event processing systems use patterns such as publish/subscribe between different parties. Publish/subscribe is a building block in many applications, such as social media, financial systems, and network management. It decouples data sources and sinks and is an effective pattern for large-scale data dissemination systems.
There have been some attempts at building a publish/subscribe system using Information-Centric Networking. The authors in [36, 37] propose changing the design of NDN and adding a built-in notification system to it. Instead, we believe that being receiver-driven is at the heart of the NDN paradigm and can answer the needs of a high-performance publish/subscribe system.
Publish/subscribe is an abstraction for an information dissemination paradigm that moves information from a set of content creators (publishers) to content consumers (subscribers). Publishers create content and emit it to the system, and the system then notifies the interested subscribers. The communication between publishers and subscribers can either happen directly or be facilitated by a set of broker servers. Subscribers can express their interest in content through various models. The most popular subscription model is topic-based subscription: publishers create content and attach a label, or topic, and the system sends the content to the subscribers interested in that topic.
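A minimal topic-based broker can be sketched as follows. This is illustrative only, not the CVST implementation; the class and method names are our own.

```python
from collections import defaultdict

class TopicBroker:
    """Toy topic-based broker: publishers attach a topic label, and the
    broker fans each event out to the subscribers of that topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of this topic.
        for callback in self.subscribers[topic]:
            callback(event)
```

Note that publishers and subscribers never reference each other directly; the topic label is the only coupling between them.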
Typically, each subscriber has its own, possibly different, interest in the same data. Subscribers should be able to specify not only the topic of the data, but also conditions on the data itself. Every data source is a publisher that sends its data to the broker, and the subscribers then receive data from the broker based on some conditions. A publish/subscribe system in which subscribers can express criteria on the published data is a content-based publish/subscribe system; subscribers receive all published data that matches their criteria [38, 39]. Elvin [40], SIENA [41], and PADRES [42] are examples of content-based publish/subscribe systems.
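A content-based subscription can be modeled as a conjunction of predicates over event attributes. The sketch below is a simplified, SIENA-style model; the operator set and event format are our own illustrative choices, not any of the cited systems' actual APIs.

```python
import operator

def matches(subscription, event):
    """Content-based matching sketch: a subscription is a list of
    (attribute, op, value) predicates; an event (a dict) matches when
    every predicate holds. Illustrative only."""
    ops = {"=": operator.eq, "<": operator.lt, ">": operator.gt}
    return all(
        attr in event and ops[op](event[attr], value)
        for attr, op, value in subscription
    )
```

For example, the subscription `[("topic", "=", "traffic"), ("speed", "<", 30)]` matches only traffic events reporting speeds below 30, illustrating how content-based filtering refines a plain topic subscription.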
CVST aims to collect information from a variety of data sources. Many of these data sources operate in constrained environments with limited resources and therefore cannot run complicated applications. Current publish/subscribe systems support the content-based publish/subscribe paradigm in their application layer and inherit the shortcomings of the TCP/IP paradigm. These systems either require a sophisticated application layer or must configure the network layer specifically for their applications.
To support mobility in a TCP/IP network, the network must keep the TCP session alive by using rendezvous mechanisms [43]. These solutions either limit the type of applications supported by the network or compromise the security of the system. Furthermore, to support security, applications are responsible for data encryption and integrity checks.
For example, PADRES has no built-in support for security and involves the application layer in data multicast and routing. PADRES supports only a limited number of data formats. Furthermore, PADRES has to handle the mobility of publishers and subscribers in the application layer. The result is an application layer that becomes more and more complicated. A complex application layer cannot run on resource-constrained devices, and it is also very hard to scale. With the rise of cloud-based solutions, it is more desirable to have applications that can scale out by running similar instances that share the workload.
In contrast, Information-Centric Networking decouples content from its locator, applications, storage, and media. The network therefore supports not only caching and multicast, but also mobility, security, and scalability. In ICN, content distribution is done by the network, not the application. Furthermore, we extend ICN to support real-time event notification, which is required for a publish/subscribe system. In this work, we present the dissemination layer for the CVST platform using the Information-Centric Networking paradigm. We also discuss the details of the design and implementation of a content-based publish/subscribe overlay using this dissemination layer in Chapter 3.
2.3 Content Delivery over Internet
Video traffic consumes most of the bandwidth on the Internet. Fig. 2.9 shows the peak traffic composition of North America reported for the second half of 2013 [6]. Real-Time Entertainment is responsible for over 67% of downstream bytes during the peak period for fixed access and 40% for mobile access. More detailed analysis of the data shows that Netflix (31.6%) and YouTube (18.7%) combined account for over 50% of downstream traffic in fixed access. Moreover, this is just the beginning of the problem.
Nowadays, users watch videos on YouTube and Netflix, or share files us-
ing BitTorrent. Content dissemination, driven by video-centric services, has
caused an exponential growth of Internet traffic. As Fig. 2.10a shows, by 2017, traffic generated by IP video and file sharing will be in the range of 80 to 90 percent of the total IP traffic of the Internet. IP video consists of Internet video, IP Video on Demand (VoD), video-streamed gaming, and video conferencing. Globally, IP video traffic will account for 73 percent of traffic in 2017 [44]. Fig. 2.10b shows a similar estimate for mobile networks [45].

(a) Fixed Access    (b) Mobile Access

Figure 2.9: Peak Period Traffic Composition — North America [6]
Fig. 2.10a shows that the compound annual growth rate of video traffic is estimated to be about 69% [44]. In April 2014 alone, Netflix reached 50 million subscribers and started streaming in UltraHD (4K), Hulu reached 5 million paying subscribers, Comcast released its own cloud-based DVR, and AT&T announced its intention to invest $500 million in the streaming video business. These are all indications that, in the next few years, many similar video services will be offered to consumers.
The other problem operators are dealing with is mobile traffic. By 2017, mobile traffic will surpass wired traffic and will account for about 55% of total IP traffic. Fig. 2.9b shows that, during the peak period, Real-Time Entertainment traffic is the most dominant, accounting for almost 50% of the downstream transmission on the network. Fig. 2.10b presents the estimates of mobile video traffic, which will generate most of the mobile traffic growth through 2018 because video has a much higher bit rate than other mobile content.

(a) Global Consumer IP Traffic [44] (percentages in parentheses next to the legend denote the relative traffic shares in 2012 and 2017)
(b) Mobile Video Will Generate Over 69 Percent of Mobile Data Traffic by 2018 [45] (figures in parentheses refer to traffic in 2018)

Figure 2.10: Traffic estimation of different types for global and mobile networks
Between 2013 and 2018, mobile video will grow at a compound annual growth rate of 69%, reaching 11 exabytes (EB) per month of the total 15.9 EB of mobile traffic. This growth rate is the highest among all mobile content categories. Even today, mobile video represents more than half of global mobile data traffic, which has brought its own challenges to service providers. Video traffic will be the largest part of the traffic, with the highest growth rate, both globally and on mobile, and we have to get ready to face this challenge eventually [44].

1 EB = 1000^6 bytes = 10^18 bytes = 1,000,000 terabytes
Content delivery over the web has historically been from one server to multiple clients. With the rapid increase in web users, two problems arose. First, service providers could not handle all the web traffic, which led to putting web caches in their networks. Web caches [46] were the first attempt at in-network storage. The major problem with web caches was content inconsistency, because there was no coordination between content providers and cache owners. Moreover, with the rapid increase in users' connection speeds, web caches lost their usefulness.
The second problem was that web servers could not handle the load anymore. To fix this problem, content providers shifted toward a multiple-servers to multiple-clients model, in which a load-balancing server usually forwards requests to different servers based on their current load. However, due to the high expense, only big companies with enough resources were able to adopt this method; hence, we saw the rise of Content Delivery Networks.
2.3.1 Content Delivery Networks
Content Delivery Networks (CDNs) [47] provide the multi-server to multi-client paradigm for everyone. We can divide the building blocks of CDNs into the following categories [25]:
Storage
CDNs operate a vastly distributed network over the Internet and host their
customers' content in many locations. In the beginning, CDNs hosted only static
content, but now dynamic content hosting is also available. Because of the
relationship between CDNs and content providers, the content inconsistency
problem of web caches is eliminated. Content is handed to the CDN, which
replicates it on its servers across the globe. These servers are usually placed
at Internet peering points or inside operators' networks.
The structure of CDNs is complex, mostly because TCP/IP was designed as a
host-to-host communication protocol. CDNs lack fine-grained control over the
placement of their servers, the service is provided only for a subset of
applications, and there is no collaboration between different CDNs.
Request Routing
CDNs are a distributed implementation of the multi-server to multi-client model
and forward requests from clients toward the best server. Request routing in
CDNs plays the role of load balancing and is usually done using the Domain Name
System (DNS). DNS is designed for translating domain names to IP addresses, but
CDNs exploit it for load balancing.
When a user requests a URL, a DNS query is sent to the user's recursive DNS
server. This server forwards the query to the authoritative DNS server
responsible for the domain name, and the authoritative server responds with the
IP address of a server that hosts the requested content. If the authoritative
DNS server returns a different IP address based on parameters such as the
location of the user's recursive DNS server, content availability, or server
condition, we obtain a simple load-balancing mechanism, and this is what CDNs
usually do [48]. Another possibility is that the authoritative DNS name resolves
to an Anycast IP address; the query is then routed to the nearest authoritative
DNS server, each of which can return a different IP address to the user.
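As an illustration, the authoritative-server selection just described can be sketched in a few lines; the region table, load values, and tie-breaking rule below are hypothetical, not taken from any particular CDN:

```python
# Hypothetical authoritative-DNS selection: pick a content server for a
# client based on the region of its recursive resolver and current load.
SERVERS = {
    "eu": [("185.0.0.10", 0.9), ("185.0.0.11", 0.3)],  # (IP, load in 0..1)
    "na": [("23.0.0.10", 0.5), ("23.0.0.11", 0.7)],
}

def resolve(resolver_region: str, default_region: str = "na") -> str:
    """Return the IP of the least-loaded server in the resolver's region,
    falling back to a default region for unknown resolvers."""
    candidates = SERVERS.get(resolver_region, SERVERS[default_region])
    ip, _load = min(candidates, key=lambda s: s[1])
    return ip

resolve("eu")  # least-loaded European server
```

A real deployment would also weigh geographic distance, cache contents, and response TTLs; the point here is only that the answer varies per resolver.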
This process relies heavily on DNS, which raises several issues. For example,
when a user uses Google's Public DNS, the recursive DNS server does not
represent the user's true location. The method also assumes that the user's
recursive DNS server honors the TTL of the DNS response and evicts the record
after expiration, but the DNS protocol does not guarantee this, and in practice
it is not always followed [49].
The authoritative DNS server can also return an Anycast IP address to the user,
which results in a similar load-balancing effect. This method makes the system
less dependent on DNS, but the trade-off is that every cache server must hold
exactly the same content. Furthermore, IP routing will use only its
shortest-path metrics and will not take other factors, such as server load,
into account.
2.3.2 Content Provider’s Cache
The Internet is changing very fast. Fifty percent of the traffic is generated by
only 35 services (Fig. 2.11). Compare this with 2007, when fifty percent of
traffic came from thousands of web sites, and 2009, when fifty percent came from
150 web sites. These significant content sources, such as Netflix and Google,
are creating their own versions of a Content Delivery Network. Content providers
are putting their own caches inside operators' networks to
Figure 2.11: Internet traffic source distribution in 2013 [7]
serve the users. This architecture is a win-win for both content providers and
network operators, since operators also save the traffic that would otherwise
pass through their Internet exchange connections.
Some content providers are keen to work with operators to provide these
caches. For example, the Netflix Open Connect [50] program is rapidly expanding
its coverage, since Netflix offers to install and maintain a cache for free in
an ISP's network. Netflix saves money by not using a traditional CDN while
increasing its customers' quality of experience, because customers experience
lower delays. Netflix uses a proactive caching technique: it places popular
content in the cache during off-peak hours and later serves that content to
users. Large content providers, datacenters, and CDNs are not only connecting
directly to each other but are also connecting to operators' networks, bypassing
tier-1 providers [8]. These interconnections are changing the face of the
Internet from a hierarchical architecture to a flatter
one (Fig. 2.12).
Figure 2.12: The Internet's architecture is changing [8]: (a) Traditional Internet Logical Topology; (b) Emerging New Internet Logical Topology
2.3.3 Transparent Caching
Transparent caches [51] are cache servers deployed by operators directly in
their own networks, giving operators full control over them. Simply put, a
transparent cache looks at the content of different applications, such as video,
in the operator's network and serves it directly when possible. For example, it
detects when a video is becoming popular, caches it locally, and serves users
from that local cache. Transparency is important: the cache does not interfere
with user requests such as play, pause, or fast-forward, nor with the
advertisements that the content provider places in its content. This
transparency allows these nodes to cache different types of content from
different sources without requiring an agreement between operators and content
providers. This equipment is usually expensive and uses techniques such as deep
packet inspection.
Service providers use caching to delay congestion in their networks as much as
possible. In Chapter 4, we review how caching and routing affect the onset of
congestion in the network and show how using ICN for content delivery benefits
service providers.
2.3.4 Cache Placement in ICN
There is a wealth of literature on cache deployment in the context of ICN [26,
52, 53], some with contradictory results. The authors in [54] provide an
analytical model of the cache miss probability of a single caching system and
extend it to a network of caches. In [26], the authors use different centrality
metrics for sizing storage in content-centric networks, but find no incentive
for heterogeneous caching. The authors in [55] solve a budget-constrained
caching problem in the Content-Centric Networking context and note that topology
has a significant impact on the optimal cache placement; they consider hop
counts as the base metric for optimizing cache placement. Reference [25] studies
the evolution of CDNs and their challenges and shows how the ICN paradigm can
help overcome them.
2.4 Summary
In this chapter, we reviewed Information-Centric Networking and compared two
implementations of this paradigm, Named-Data Networking and MobilityFirst. We
also discussed the CVST platform and the requirements of its data dissemination
layer. Finally, we studied how Content Delivery Networks distribute content
optimally up to the edge of the service providers' networks, and the effects of
Over-The-Top services on service providers.
Chapter 3
Data Dissemination in CVST
In this chapter, we describe the design and implementation of the ICN-based
data dissemination layer of CVST and the content-based publish/subscribe
overlay using that layer.
In Section 3.1, we discuss the detailed design of our ICN-based Data
Dissemination (IDD) layer. In Section 3.2, we review the architecture of the
content-based publish/subscribe system in CVST, and in Section 3.3 we discuss
the implementation details of the system. In Section 3.5, we review the
performance tests and evaluations of the system.
3.1 Data Dissemination using Information Cen-
tric Networking
Figure 3.1: Application Platform for Smart Transportation
Fig. 3.1 shows the major building blocks of the CVST platform: data ingestion
through publishers, the data dissemination layer, analytics and algorithmic
engines, application programming interfaces (APIs), and the end-user portal. As
depicted in Fig. 2.6, the IDD is the top sub-layer of the PaaS layer.
Its task is to disseminate arbitrary streams of data from various sources to
any destination. Sources may include road sensors, cameras, social application
feeds, public transportation GPS traces, construction events, incident reports,
open data, and private data that can only be accessed securely. Data streams
can be real-time or retrieved from data stores. The system must be extensible
and able to accommodate new sources of information. The IDD sub-layer also
provides data verification and integrity, privacy, and security.
The communication layer in Fig. 3.4 is based on the NDN paradigm [24]. In NDN,
naming is one of the most important parts of application design; it affects both
the performance and the complexity of the system. Names are used to route
Interest packets toward the destination and to select the applications
responsible for processing them. With a proper naming design, NDN provides
support for data mobility, provenance, and integrity. However, one of the
requirements of the CVST platform is real-time event-notification capability,
which NDN does not inherently support. We therefore extended NDN with
event-notification capability to unify content distribution and event
notification. There are two naming designs in the IDD layer: one for
publisher-broker communication and one for subscriber-broker communication.
3.1.1 Publisher-Broker Exchange
On start, each publisher has a conversation with the broker to let it know that
it is alive. The publisher sends an Interest packet that can also carry some
configuration in its name. We call this the "start" process. The data name,
shown in Fig. 3.2, consists of four parts: the first part is the name of the
broker, here /broker; the second part is the action of this Interest, here
/pub/start; the third part is the full name of the publisher; and the last part
is publisher-specific configuration, which is encoded into the name and read by
the broker. This configuration includes a sequence number to be used by the
broker.
After the publisher sends the start Interest packet, the broker responds with an
acknowledgment packet that includes some configuration. The broker also
immediately sends an Interest packet to the publisher. The name of this Interest
packet consists of three parts: the name of the publisher, the action /data, and
the sequence number of the latest data that the broker has already received. At
start-up, the sequence number is a random number and increases
Figure 3.2: Publisher-Broker Communication: the publisher sends /broker/pub/start/<pub_id>/<config> and receives an ACK; the broker then sends the long-polling Interest /publisher/data/<seq#>; when new data is published, the broker follows with /publisher/data/<new seq#>
over time. The publisher satisfies this Interest packet when it has new
information to publish. The broker then receives that information and
immediately sends the next Interest packet. The Interest packets at the
publisher may expire without any new data; expiration means that if the
publisher does not generate new data for a while, it cannot push data to the
broker. This is not a problem, however, since the publisher periodically
re-initiates the "start" process discussed above and receives another Interest
packet from the broker. It must be noted that the sequence number is a choice
made by the publisher.
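As a sketch, the names above can be built and parsed as plain strings. Encoding the configuration as a percent-encoded JSON name component is an assumption made for this illustration; NDN itself encodes name components in a TLV format, and the publisher name is hypothetical:

```python
# Minimal sketch of the publisher-broker naming convention described above.
import json
from urllib.parse import quote, unquote

def start_interest_name(broker, publisher, config):
    """Name of the publisher's 'start' Interest: broker prefix, action,
    publisher name, and the encoded configuration as the last component."""
    return f"{broker}/pub/start{publisher}/{quote(json.dumps(config), safe='')}"

def data_interest_name(publisher, seq):
    """Name of the broker's long-polling data Interest."""
    return f"{publisher}/data/{seq}"

def parse_start(name, broker):
    """Broker side: recover the publisher name and its configuration."""
    rest = name[len(broker + "/pub/start"):]
    path, _, encoded = rest.rpartition("/")
    return path, json.loads(unquote(encoded))

name = start_interest_name("/broker", "/toronto/sensors/cam1", {"seq": 1042})
pub, cfg = parse_start(name, "/broker")
```

The broker would then long-poll with `data_interest_name(pub, cfg["seq"])`, incrementing the sequence number as data arrives.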
3.1.2 Subscriber-Broker Exchange
A similar data exchange happens on the subscriber side, as shown in Fig. 3.3. On
start, the subscriber sends its alive status to the broker. The name of this
Interest packet consists of the name of the broker, the action /sub/start, the
full path of the subscriber, and the subscriber's specific configuration. The
broker acknowledges this action and responds with a set of configurations.
Figure 3.3: Subscriber-Broker Communication: the subscriber sends /broker/sub/start/<sub_id>/<config> and receives an ACK; it then long-polls with /broker/sub/data/<sub_id>/<seq#>; on a match, the broker returns the notification /<sub_id>/match/<data_name>, and the subscriber sends /<data_name> to retrieve the Data
Then the subscriber sends an Interest packet for data notification. The name of
this Interest consists of the name of the broker, the action /sub/data, the path
of the subscriber, and the sequence number of the data the subscriber has
already received. When the broker acquires data from a publisher that matches
what the subscriber has requested, the name of that data is sent to the
subscriber. The subscriber then uses that name to claim the data. After
receiving the data, the subscriber sends the next data request.
Subscribers are required to send these Interest packets periodically. The
periodic Interest acts as a heartbeat for the subscriber and provides more
flexibility for the broker; for example, the broker can pause data matching for
a subscriber if the heartbeat stops. It is also possible to register a
subscriber without the heartbeat. In that case, the subscriber provides a
callback name in the registration process, and the broker sends an Interest,
similar to the publisher's start Interest, to notify the subscriber of the data.
The subscriber then sends the data-request Interest accordingly.
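Mirroring the publisher side, the subscriber-facing names can be sketched as simple string templates; the helper names and identifiers below are illustrative, with the component layout following Fig. 3.3:

```python
# Minimal sketch of the subscriber-broker naming convention described above.
def heartbeat_name(broker, sub_id, seq):
    """The subscriber's periodic long-poll / heartbeat Interest."""
    return f"{broker}/sub/data{sub_id}/{seq}"

def match_notification_name(sub_id, data_name):
    """Name the broker uses to notify the subscriber of matched data."""
    return f"{sub_id}/match{data_name}"

def matched_data_name(notification, sub_id):
    """Subscriber side: extract the data name, then fetch it directly."""
    return notification[len(sub_id + "/match"):]

n = match_notification_name("/apps/portal", "/toronto/sensors/cam1/data/1042")
matched_data_name(n, "/apps/portal")  # the name the subscriber then requests
```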
3.1.3 Discussion
In this section, we discuss the reasoning behind our design and its benefits in
different scenarios.
Simple is better than complex
One of the advantages of the presented naming design is that it simplifies the
architecture of the broker and minimizes the state that must be kept in the
system. For example, in publisher-broker communication, a publisher either sends
out the "start" Interest to notify the broker or satisfies the Interests from
the broker, while the broker only needs access to the latest data it has
received from the publisher.
The same reasoning applies on the subscriber side. Through the heartbeat, the
subscriber becomes responsible for keeping its status alive for as long as it is
interested in the data; the broker can be configured to stop data matching for a
subscriber whose heartbeat signal is absent.
Mobility
One of the main advantages provided by the IDD layer is mobility support. The
IDD layer uses a point-to-point communication protocol, unlike TCP, which is an
end-to-end protocol. Nodes communicate by knowing the name of the data they are
interested in, and the network routes that data toward the destination. A mobile
publisher therefore continues to receive Interest packets from the broker, and
the broker continues to receive the data, without the need to re-initiate the
communication.
If the network becomes partitioned, the publisher will not receive Interest
packets from the broker. When new data becomes available, the publisher goes
through the "start" process, since there is no pending Interest from the broker
on the publisher's side. Based on the publisher's configuration, this can be
repeated indefinitely. The publisher can also be configured to store the
historical data locally. When the link is back up, the broker is notified of the
existence of new data and sends the Interest for it. The sequence number in the
Interest identifies the latest information the broker has received. Depending on
its configuration, the publisher may have stored the historical data, in which
case it sends it to the broker; otherwise, if the historical data is not needed,
only the latest available data is forwarded. For example, the history of a live
video stream is not saved, whereas the log of a traffic sensor is saved and, if
asked for by the broker, sent over. The historical data can be purged when the
broker sends a new Interest packet with a new sequence number acknowledging
receipt of the data. This scenario also covers the case where the Interest
packet sent from the broker to the publisher is dropped.
If the data packet from the publisher is dropped, the broker will not send a new
Interest packet to the publisher, and there will be no pending Interest on the
publisher's side for the next data. To resend the data, the publisher enters the
"start" process after a deadline and notifies the broker of the existence of the
data. The deadline is configurable and is on the scale of the round-trip time.
On the subscriber side, the heartbeat is received by the broker even if the
subscriber is mobile. The Interest packet acts as a breadcrumb for the Data
packet: data takes the reverse path of the Interest packet until it reaches the
subscriber. If the subscriber has moved from its original place, it resends
another heartbeat Interest, which is satisfied either by one of the upstream
routers, thanks to in-network caching, or by the broker itself.
The heartbeat also resolves the network-partitioning scenario. During a
partition, the subscriber receives no new data, so its heartbeat Interest
carries an old sequence number. Once the partition heals, the broker receives a
heartbeat with that old sequence number. If the broker has been configured to
store historical data for the subscriber, the names of the historical data are
sent to the subscriber; otherwise, the name of the newest data is sent over. The
same process applies if the heartbeat Interest or the Data packet is dropped
along the way.
As presented earlier, the broker sends only the name of the matched data; the
subscriber is responsible for retrieving it, which follows the receiver-driven
philosophy of the ICN paradigm.
Better Infrastructure Utilization
Two important features of the IDD layer are multicast and in-network caching.
Routers store all incoming Interests in a Pending Interest Table and, upon
receiving the data, satisfy all Interest packets for the same name at once. They
also use their storage to save content in a Content Store to satisfy future
Interest packets. As presented earlier, the broker sends only the name of the
data to the subscribers, and the subscribers request that data separately. The
broker can satisfy the Interest packets of all subscribers with the same Data
packet, i.e., the broker sends out the data only once and all subscribers
receive it.
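A minimal sketch of the Pending Interest Table behavior described above: duplicate Interests for the same name are aggregated, only the first is forwarded upstream, and one Data packet satisfies every waiting face while also populating the content store. Face identifiers and data values are illustrative:

```python
from collections import defaultdict

class Router:
    def __init__(self):
        self.pit = defaultdict(list)   # name -> list of requesting faces
        self.content_store = {}        # name -> cached data
        self.upstream_sent = []        # names actually forwarded upstream

    def on_interest(self, name, face):
        if name in self.content_store:          # cache hit: answer locally
            return [(face, self.content_store[name])]
        first = name not in self.pit            # aggregate duplicate Interests
        self.pit[name].append(face)
        if first:
            self.upstream_sent.append(name)     # forward only the first one
        return []

    def on_data(self, name, data):
        self.content_store[name] = data         # opportunistic caching
        faces = self.pit.pop(name, [])
        return [(f, data) for f in faces]       # satisfy all faces at once

r = Router()
r.on_interest("/cvst/traffic/42", face="A")
r.on_interest("/cvst/traffic/42", face="B")     # aggregated, not re-forwarded
delivered = r.on_data("/cvst/traffic/42", b"...")
```

A later Interest for the same name is answered straight from the content store without touching the upstream link.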
Scalability
Multiple broker instances may receive the published data for the system. This
scalability is achieved by using the same name for all of them: the best-route
forwarding strategy sends the Interest packets toward the lowest-cost next hop,
and since the sequence number is set by the publisher and increases over time,
one of the broker instances will receive the data.
To handle scalability at the broker, we use the fact that, in our design, the
subscriber receives the name of the data from the broker and sends another
Interest packet to retrieve the matched data. The power to forward subscribers
to the right place is therefore in the hands of the broker. The best-route
strategy can also help forward the Interest packets for the matched data to the
proper server. For example, the broker can forward the subscriber to the
publisher itself and avoid storing the data, relying on the fact that data names
are unique and the network is responsible for retrieving the data.
Security
We must address two problems in the security domain. First, the broker must
accept data only from known publishers; no one should be able to inject data
into the system. Second, no one should be able to access the published data
without authorization.
Every Interest and Data packet is signed to ensure the provenance and integrity
of the data. In the CVST platform, the first problem is solved by having the
broker act as the trusted key-management system. The broker issues certificates
for publishers and subscribers in the registration phase; these certificates are
used to sign the Interest and Data packets. Through the broker, everyone has
access to each other's public key and can validate the provenance of the data.
To solve the second problem, we use a shared-key encryption algorithm. The
broker issues the shared key and provides it to the publishers and subscribers.
Since the broker can validate the identity of the publishers and subscribers,
only authorized users have access to the key. The publishers encrypt their data,
and the broker and subscribers decrypt it with the key.
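The platform signs packets with certificates issued by the broker; as a self-contained sketch of verifying a packet's provenance and integrity, the example below substitutes a shared-secret HMAC for the certificate-based signature (Python's standard library has no public-key primitives). The key and packet names are illustrative:

```python
import hashlib
import hmac

def sign_packet(key: bytes, name: str, payload: bytes) -> bytes:
    """Tag covering both the name and the payload of a packet."""
    return hmac.new(key, name.encode() + b"\x00" + payload, hashlib.sha256).digest()

def verify_packet(key: bytes, name: str, payload: bytes, tag: bytes) -> bool:
    """Constant-time check that name and payload are unmodified."""
    return hmac.compare_digest(sign_packet(key, name, payload), tag)

key = b"issued-by-broker-at-registration"   # illustrative shared secret
tag = sign_packet(key, "/publisher/data/1042", b"reading")
verify_packet(key, "/publisher/data/1042", b"reading", tag)    # accepted
verify_packet(key, "/publisher/data/1042", b"tampered", tag)   # rejected
```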
3.2 Broker Architecture
CVST collects data from many types of producers; however, consumers are
typically interested in only a portion of the published data. Therefore, CVST
uses a content-based publish/subscribe paradigm. For example, the central
database of the CVST platform is a subscriber that receives all new data updates
from all data sources. In another example, a drone incident is a content type
generated and disseminated in the platform, and a subscriber may be interested
only in drone incidents in a particular area. Fig. 3.4 shows the high-level
architecture of the publish/subscribe system in the CVST platform. The system
consists of publishers, subscribers, and the broker, which communicate over a
communication layer.
Figure 3.4: High-level architecture of content-based publish/subscribe over IDD in CVST
Micro-service Abstraction
Using micro-services is an approach to software system design in which the
system is structured into smaller individual service units. Each service runs as
an independent process and communicates with other services through APIs.
To abstract the implementation of different communication protocols, the broker
is divided into three services: XPUB, XSUB, and Matcher. Fig. 3.5 shows how
these services communicate with each other using a message queuing system.
API              Type  Description
/register        XPUB  Register a schema in the broker for publication
/unregister      XPUB  Remove the schema from the broker
/publish         XPUB  Publish new data based on a registered schema
/subscribe       XSUB  Register a query in the broker for subscription
/unsubscribe     XSUB  Remove a registered query from the broker
/schemas         XSUB  Request the list of registered schemas
/schemas/<:id>   XSUB  Request a specific schema using its id
Table 3.1: The APIs exposed by XPUB and XSUB services
XPUB is responsible for communicating with the publishers and XSUB with the
subscribers. XPUB and XSUB hide the complexity of the different communication
layers from the Matcher and make it possible to add more protocols without
affecting other parts of the system. They define a set of APIs that publishers
and subscribers use to talk to the system. They also hide the complexity of the
Matcher from the publishers and the subscribers, which makes it possible to
improve or replace the Matcher without affecting current publishers and
subscribers. XPUB and XSUB support multiple protocols; for each protocol, a
separate instance of XPUB or XSUB is started. For example, an instance of XPUB
may listen on any address, such as "ndn:/broker/xpub", "tcp://0.0.0.0:4040", or
"http://broker:8080/broker/xpub", as long as XPUB or XSUB has the protocol
implemented.
Figure 3.5: Design of the Broker: abstraction of the complexity of different system components
Schema Registration
Each publisher must first register itself with the system. Registering a
publisher means the publisher must provide a schema for its data, which the
broker later uses to verify the structure of the incoming data from that
publisher. The schema is also used by subscribers to define the criteria of
their subscriptions. The registration information of the publisher is saved in
the Publication Table. The publisher may also provide additional configuration
during registration.
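Broker-side schema verification can be sketched as follows, assuming a simplified field-to-type schema representation for illustration (the actual system expresses schemas in Apache Avro, Section 3.3); the publisher identifier and field names are hypothetical:

```python
# Map the schema's declared type names to Python types for checking.
TYPES = {"string": str, "int": int, "float": float, "bool": bool}

publication_table = {}   # publisher id -> registered schema

def register(publisher_id, schema):
    """Store the publisher's schema in the Publication Table."""
    publication_table[publisher_id] = schema

def conforms(publisher_id, data):
    """Verify that incoming data matches the registered schema exactly:
    same field names, and each value of the declared type."""
    schema = publication_table.get(publisher_id)
    if schema is None or set(data) != set(schema):
        return False
    return all(isinstance(data[f], TYPES[t]) for f, t in schema.items())

register("/toronto/sensors/cam1", {"road": "string", "speed": "float"})
conforms("/toronto/sensors/cam1", {"road": "DVP", "speed": 42.0})    # accepted
conforms("/toronto/sensors/cam1", {"road": "DVP", "speed": "fast"})  # rejected
```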
Subscriptions
For subscribers, registration involves providing a query based on the data
schemas that publishers have registered in the system. Subscribers also provide
callback paths, which the broker uses to send notifications about newly
published data that match the registered queries.
Figure 3.6: Sequence diagram of the content-based publish/subscribe system: registration, publication, subscription, match notification, and data retrieval
Message Queuing
All communication between the components of the broker is facilitated by a
message queuing system. Using message queuing decouples the different components
of the system and facilitates distributing them among various machines. The
message queuing system itself is a cluster of nodes that acts as one logical
system from the point of view of the other components.
Matching Engine
The incoming data from a publisher is sent to XPUB, which puts the data in a
queue; the data is then picked up by one of the Matcher's workers. First, the
matching engine checks, based on the Publication Table, whether the data
conforms to the schema provided by the publisher; if not, the data is rejected.
The data is then matched against the Subscription Table. If a match is found,
the data is put back into another queue together with additional data from the
subscription. The matched data is picked up from that queue by one of the XSUB
instances, which sends notifications to the subscribers. The matched data is
stored for later retrieval by the subscribers. The sequence diagram of
publication, matching, and data retrieval is depicted in Fig. 3.6.
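The matching step is a reverse query: each subscription stores a predicate, and each incoming datum is evaluated against all of them. A minimal sketch with an illustrative predicate format follows (the real system delegates this to Elasticsearch's Percolator, Section 3.3); the subscriber names and fields are hypothetical:

```python
subscription_table = {}   # subscriber id -> predicate over the data fields

def subscribe(sub_id, predicate):
    """Register a subscriber's query in the Subscription Table."""
    subscription_table[sub_id] = predicate

def match(data):
    """Reverse query: return the subscribers whose predicate the incoming
    data satisfies."""
    return [s for s, pred in subscription_table.items() if pred(data)]

subscribe("/apps/congestion", lambda d: d["speed"] < 20.0)
subscribe("/apps/drones", lambda d: d.get("type") == "drone_incident")

match({"road": "DVP", "speed": 12.5})   # only the congestion app matches
```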
3.2.1 Discussion
In this section, we discuss the advantages of abstracting and breaking down the
broker functionality across multiple micro-services.
Agility
By having separate services for different tasks, development can focus on
individual components independently. Each service can be updated or even
replaced without affecting the other parts of the system, as long as there is an
instance in the system that provides compatible APIs. For example, the matching
engine can be updated or even replaced independently of XPUB and XSUB.
Efficiency
Another advantage of the micro-service design is more efficient use of the
underlying infrastructure. As discussed in Section 2.2, CVST runs on top of an
IaaS layer that can provide resources on demand. Therefore, each service can
independently request only the resources it requires, which increases
efficiency.
Figure 3.7: Scalability of the Broker with the micro-service design
Scalability
The micro-service design provides solutions that scale well under high traffic
demand. Although the broker is logically one node, high traffic load is
distributed among the different system components. Fig. 3.7 shows how the system
may run in a distributed way: each part of the system runs as a set of
instances, glued together by the message queuing system.
For example, an XPUB may consist of multiple instances behind a load balancer,
listening on a particular network address. One of these instances receives the
data from a publisher and sends it to the message queuing system. Message
queuing is itself a cluster of nodes; XPUB workers can connect to any of the
message queuing nodes and store the data in the queue, and the message queuing
system then notifies the Matcher workers about the new data. The data is
replicated on the other nodes to protect the system against failure.
One of the available workers of the matching system picks up the new data from
the queue and matches it against the subscription queries. The matching engine
is also a cluster of nodes, and every node in that cluster can match data
against registered queries. The Matcher workers may connect to any node in the
matching-engine cluster to do the matching, and if a subscription matches, the
worker stores the data back in the message queuing system for the XSUB
instances.
Similarly, XSUB load is distributed among its instances by the message queuing
system. One of the XSUB workers picks up the data and notifies the subscribers
about the new match. XPUB and XSUB do not keep any state about their clients and
expose a set of RESTful APIs. The clients likewise keep no state about the
service instance with which they are communicating. Furthermore, at the network
level, using IDD ensures that no connections are made between clients and
specific instances.
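The stateless-worker pattern described above can be sketched with an in-process queue standing in for the message queuing cluster; the real system uses RabbitMQ (Section 3.3), and the worker count and payloads here are illustrative:

```python
import queue
import threading

work = queue.Queue()      # stands in for the message queuing cluster
results = queue.Queue()

def matcher_worker():
    """Stateless worker: pull an item, 'match' it, push the result back."""
    while True:
        item = work.get()
        if item is None:              # poison pill: shut this worker down
            work.task_done()
            break
        results.put(("matched", item))
        work.task_done()

workers = [threading.Thread(target=matcher_worker) for _ in range(3)]
for w in workers:
    w.start()

for i in range(10):                   # any worker may pick up any item
    work.put(i)
for _ in workers:                     # one shutdown signal per worker
    work.put(None)
work.join()
for w in workers:
    w.join()

matched = sorted(results.get() for _ in range(results.qsize()))
```

Because no worker holds per-client state, instances can be added or removed freely; the queue alone decides which worker handles which item.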
3.3 Implementation
In this section, we review some implementation details of different components
of the content-based publish/subscribe overlay.
3.3.1 Broker Implementation
In this section, we review the implementation details of the different broker
components. As depicted in Fig. 3.5, the broker has three main components:
Matcher, XPUB, and XSUB. They communicate with each other using a message
queuing system, while XPUB and XSUB communicate with the outside world through
the communication layer.
Message Queuing
For message queuing, we use RabbitMQ [56], an open-source messaging system that
is robust and easy to use. It runs on all major operating systems, supports many
development platforms, and provides clustering and high-availability features.
Matching Engine
The Matching Engine is responsible for checking if the incoming data satis-
fies the set of constraints defined by the queries registered by the subscribers.
Running Queries is one the main features of any database engine. However,
matching data against a query, i.e. a reverse query, is not a standard feature.
To have a fast, distributed and reliable Matching engine, we have chosen Elas-
Chapter 3. Data Dissemination in CVST 62
ticsearch [57]. Elasticsearch is a distributed search engine that provides the
reverse query capability known as Percolator [58]. We also used Elasticsearch
to store Publication and Subscription tables. When a publisher registers its
schema or a subscriber registers its query, the corresponding data will be saved
in Elasticsearch for later retrieval.
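The reverse-query idea can be illustrated with a minimal in-memory sketch. This is not the CVST code or the Elasticsearch implementation; it only mimics the percolation semantics for the two clause types our subscriptions use, with illustrative names:

```python
# Minimal in-memory "reverse query" (percolation) sketch.
# Supports the two clause types used in our subscriptions:
# "match" (substring match on a field) and "range" (lte/gte bounds).

def clause_matches(clause, doc):
    kind, body = next(iter(clause.items()))
    if kind == "match":
        field, term = next(iter(body.items()))
        return str(term) in str(doc.get(field, ""))
    if kind == "range":
        field, bounds = next(iter(body.items()))
        value = doc.get(field)
        if value is None:
            return False
        return (("lte" not in bounds or value <= bounds["lte"]) and
                ("gte" not in bounds or value >= bounds["gte"]))
    return False

def percolate(subscriptions, doc):
    """Return the ids of all registered queries that the document satisfies."""
    matched = []
    for sub_id, query in subscriptions.items():
        must = query["bool"]["must"]
        if all(clause_matches(c, doc) for c in must):
            matched.append(sub_id)
    return matched

subscriptions = {
    "slow-401": {"bool": {"must": [
        {"match": {"main_road_name": "401"}},
        {"range": {"avg_speed_capped": {"lte": 60}}},
    ]}},
}
reading = {"main_road_name": "HWY-401 Express", "avg_speed_capped": 42.0}
print(percolate(subscriptions, reading))  # ['slow-401']
```

Elasticsearch's Percolator performs the same inversion at scale: queries are indexed as documents, and each publication is run against the stored queries.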
Matcher, XPUB and XSUB
Matcher, XPUB and XSUB are implemented in Python. As discussed
in Section 3.2.1, each component runs as a set of independent processes that do
the same job in parallel. For example, the workers of Matcher pick up the data
from RabbitMQ and match them against the subscriptions using Elasticsearch
(Fig. 3.4) and then put the result back in another RabbitMQ queue.
3.3.2 Communication Layer
The system supports two communication protocols: IDD and HTTP. IDD
communication is based on the design discussed in Section 3.1. The HTTP
APIs provide a similar URL syntax as the IDD layer. Publishers and Sub-
scribers have the option to choose either of these protocols to communicate
with the broker.
Data Serialization
We use Apache Avro [59] as the data serialization system. Apache Avro is a
sub-project of Apache Hadoop [60]; it uses a compact binary data format
with rich data structures and integrates with many development platforms,
{
  "namespace": "ca.cvst.broker",
  "type": "record",
  "name": "xpub",
  "fields": [
    {"name": "publisher_schema", "type": "string"},
    {"name": "data", "type": "bytes"}
  ]
}

Figure 3.8: Apache Avro schema used in XPUB-Matcher communication
{
  "namespace": "ca.cvst.broker",
  "type": "record",
  "name": "xsub",
  "fields": [
    {"name": "subscribers", "type": "map", "values":
      {"type": "array", "items": "string"}
    },
    {"name": "data", "type": "bytes"}
  ]
}

Figure 3.9: Avro schema used in Matcher-XSUB communication
without the need for code generation. Apache Avro relies on a data schema to
read and write the data. It also supports schema exchange in a connection
handshake. An Apache Avro schema is a JSON (JavaScript Object Notation) document.
For each type of published data, every part of the system, such as pub-
lishers, subscribers, and the broker will use the same schema to encode and
decode the data. Using one schema throughout the system for each data source
ensures the consistency of the data everywhere. For example, if the publisher
uses an unknown, invalid or tampered schema, the broker will fail to verify the
data and will drop it. In addition, Apache Avro provides the
capability of schema evolution without disruption of the system functionality.
In other words, if the data schema changes, each component can still use the
old schema until the new schema is propagated to every component in the
system.
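The single-schema discipline described above can be sketched as follows. Plain JSON stands in for Avro's compact binary encoding, and the registry and verification logic are illustrative, not the actual broker code:

```python
import json

# Registry mapping fully qualified schema names to expected field types.
# JSON stands in for Avro's binary encoding in this sketch.
REGISTRY = {
    "ca.cvst.schemas.hw_sensor": {
        "fields": {"main_road_id": str, "avg_speed_capped": float,
                   "timestamp": int},
    },
}

def encode(schema_name, record):
    if schema_name not in REGISTRY:
        raise ValueError("unknown schema: " + schema_name)
    return json.dumps(record).encode()

def decode(schema_name, payload):
    """Verify the payload against the registered schema; drop it otherwise."""
    schema = REGISTRY.get(schema_name)
    if schema is None:
        return None  # unknown schema: broker drops the data
    record = json.loads(payload)
    for field, ftype in schema["fields"].items():
        if not isinstance(record.get(field), ftype):
            return None  # invalid or tampered data: broker drops it
    return record

blob = encode("ca.cvst.schemas.hw_sensor",
              {"main_road_id": "C09-00069", "avg_speed_capped": 51.15,
               "timestamp": 1463118302})
print(decode("ca.cvst.schemas.hw_sensor", blob) is not None)  # True
print(decode("ca.cvst.schemas.unknown", blob))                # None
```

Every component resolves the same schema name to the same definition, which is what guarantees consistency end to end.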
XPUB-Matcher Communication
XPUB workers listen on one or multiple addresses for new data publications.
The publications are received from the publishers in a binary format, seri-
alized by Apache Avro. XPUB then adds the schema name of the publisher
and stores the data in the queue for Matcher. Fig. 3.8 shows the
schema used for XPUB-Matcher communication. The schema has two fields.
“publisher_schema” is the name of the publisher, such as “TTC”, and “data”
is binary data encoded by Apache Avro.
Matcher-XSUB Communication
If the published data matches any of the subscriptions, Matcher will put the
data back in the queue so XSUB workers can notify the subscribers. We use
“headers” exchange in RabbitMQ to send data to multiple XSUBs only once.
In addition to the published data, Matcher will also include the list of the
subscribers that each XSUB must notify. Then the data is sent to the sub-
scriber. The subscribers choose the communication protocol in the registration
process. Fig. 3.9 shows the Apache Avro schema used in the Matcher-XSUB
communication. The field, “subscribers”, is the list of subscribers callback
addresses that the XSUB must notify about the new “data”.
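The routing decision of a “headers” exchange can be modeled in a few lines. This in-memory sketch (illustrative names; the real system uses the RabbitMQ client) shows how one published message is delivered once to every queue whose binding headers match:

```python
# In-memory model of a RabbitMQ "headers" exchange routing decision.
# Each binding carries headers plus an "x-match" mode: "all" or "any".

def route(bindings, message_headers):
    """Return the queues that should receive one copy of the message."""
    receivers = []
    for queue, binding in bindings.items():
        mode = binding.get("x-match", "all")
        pairs = [(k, v) for k, v in binding.items() if k != "x-match"]
        hits = [message_headers.get(k) == v for k, v in pairs]
        if (mode == "all" and all(hits)) or (mode == "any" and any(hits)):
            receivers.append(queue)
    return receivers

bindings = {
    "xsub-1": {"x-match": "all", "shard": "a"},
    "xsub-2": {"x-match": "all", "shard": "b"},
    "audit":  {"x-match": "any", "shard": "a", "region": "east"},
}
print(route(bindings, {"shard": "a"}))  # ['xsub-1', 'audit']
```

The broker publishes once; the exchange fans the message out, so each XSUB instance receives the matched data exactly once.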
{
  "main_road_id": "C09-00069",
  "main_road_name": "HWY-2",
  "ref_road_ID": 4308,
  "ref_road_name": "Kingsway",
  "length": 2.14527,
  "JAM_FACTOR": 1,
  "avg_speed_capped": 51.15,
  "avg_speed_uncapped": 51.15,
  "free_flow_speed": 55.92,
  "confidence": 0.92,
  "timestamp": 1463118302
}

Figure 3.10: Sample data gathered from traffic sensors
{
  "namespace": "ca.cvst.schemas",
  "type": "record",
  "name": "hw_sensor",
  "fields": [
    {"name": "main_road_id", "type": "string"},
    {"name": "main_road_name", "type": "string"},
    {"name": "confidence", "type": "float"},
    {"name": "ref_road_ID", "type": "int"},
    {"name": "ref_road_name", "type": "string"},
    {"name": "length", "type": "float"},
    {"name": "JAM_FACTOR", "type": "int"},
    {"name": "avg_speed_capped", "type": "float"},
    {"name": "avg_speed_uncapped", "type": "float"},
    {"name": "free_flow_speed", "type": "float"},
    {"name": "timestamp", "type": "long"}
  ]
}

Figure 3.11: Schema of the traffic sensor data
3.4 Examples
In this section, we review some examples of data published using our content-
based publish/subscribe system. We review three publications: traffic flow
sensors, public transportation, and live video feed of drone flights. We also
review our subscription portal, which can be used to create subscription queries
and receive publication data in real-time.
{
  "bool": {
    "must": [
      {"match": {"main_road_name": "HWY"}},
      {"match": {"main_road_name": "401"}},
      {"match": {"main_road_name": "Express"}},
      {"range": {"avg_speed_capped": {"lte": 60}}}
    ]
  }
}

Figure 3.12: Sample subscription for traffic sensor data
{
  "match_all": {}
}

Figure 3.13: A match all query
3.4.1 Traffic Flow Sensors
CVST collects data from the traffic flow sensors installed on the roads of the
city of Toronto. A publisher receives the raw data from a live feed, and before
publication, parses, validates and cleans them. Fig. 3.10 lists sample data
received from traffic sensors.
main_road_id is a unique string for the main road the sensor covers.
main_road_name is a text description of the road. ref_road_ID is a unique
identifier for the location of the sensor. ref_road_name is the text description
of the location of the sensor. length is the length of the road that is covered
by the sensor in kilometers. JAM_FACTOR is a number between 0 and 10 and
indicates the expected quality of travel. As the number approaches ten, the
quality of travel gets worse. For example, when there is a road closure, the
Jam Factor will be 10. avg_speed_capped is the average speed of the
road in km/h capped by the speed limit. avg_speed_uncapped is the average
Figure 3.14: Data of traffic sensors on the CVST portal
speed of the road in km/h not capped by the speed limit. free_flow_speed
is the free flow speed on this part of the road. confidence is an indication
of how the speed was determined and is usually a value between 0.7 and 1.0.
If the road is closed, the value is -1. timestamp is a Unix epoch time that
indicates when the data was generated.
The publisher will use the Apache Avro schema listed in Fig. 3.11 for
data serialization. Each schema has a namespace and a name. The type
of the schema is always “record”. Each schema defines a series of fields
that map directly to the corresponding fields in the data. As shown in
Fig. 3.11, namespace is set to ca.cvst.schemas, and the name of the schema
is hw_sensor. Therefore, the fully qualified name of the schema is
ca.cvst.schemas.hw_sensor. For each data field in Fig. 3.10 there
is a field in the schema. For example, main_road_id is defined as a field with
{
  "vehicle_id": 1007,
  "coordinates": [
    -79.50425,
    43.779148
  ],
  "routeNumber": "41",
  "route_name": "41-Keele",
  "dirTag": "41_0_41A",
  "heading": "216",
  "predictable": true,
  "GPStime": 1463363577,
  "last_update": "Mon, 16 May 2016 01:53:00 -0000",
  "timestamp": 1463363581,
  "dateTime": "Mon, 16 May 2016 01:53:01 -0000"
}

Figure 3.15: Sample data gathered from public transit vehicles
the type of string, and JAM_FACTOR is defined as a field with the type of int.
A subscriber can define a query based on the schema in Fig. 3.11. Fig. 3.12
lists a sample query that asks the broker to send the data of the sensors on
HWY 401 Express that report a speed less than or equal to 60 km/h. Here, the
match against HWY 401 Express is defined as a combination of three smaller
match conditions. In addition to the road name, a range condition is defined
for the avg_speed_capped. All of these conditions are wrapped in a must
clause, which acts as a logical AND operator.
A subscriber can receive all the data published by a publisher by registering
a match_all query as listed in Fig. 3.13. The central database in the CVST
platform (Fig. 3.1) is one of the subscribers that receives all the data and makes
them available for processing by other parts of the system, such as the
analytics engine. The portal server is another subscriber to the data and notifies the
web clients of the changes in the data, and the web clients will update their
interface accordingly. Fig. 3.14 shows the presentation of the traffic sensor
 1 {
 2   "namespace": "ca.cvst.schemas",
 3   "type": "record",
 4   "name": "ttc",
 5   "fields": [
 6     {"name": "vehicle_id", "type": "int"},
 7     {"name": "coordinates", "data_type": "geo_point",
 8       "type": {
 9         "type": "array", "items": "double"}
10     },
11     {"name": "routeNumber", "type": "string"},
12     {"name": "route_name", "type": "string"},
13     {"name": "dirTag", "type": "string"},
14     {"name": "heading", "type": "string"},
15     {"name": "predictable", "type": "boolean"},
16     {"name": "GPStime", "type": "long"},
17     {"name": "last_update", "type": "string"},
18     {"name": "timestamp", "type": "long"},
19     {"name": "dateTime", "type": "string"}
20   ]
21 }

Figure 3.16: Schema for Toronto Public Transit data
data on the CVST portal.
3.4.2 Public Transportation
Another source of data in CVST is the real-time information of the Toronto
Public Transit fleet. Fig. 3.15 shows sample data reported by public transit
vehicles and Fig. 3.16 shows the schema for that data. vehicle_id is a
unique id for the vehicle that has reported the data. coordinates is an array
of numbers that represent the current longitude and latitude of the vehicle.
routeNumber is a unique id for the route that the vehicle is operating on.
route_name is the text description of the route. dirTag provides more infor-
mation about the route. heading specifies the heading of the vehicle in degrees
and is between 0 and 360. A negative value indicates that the heading is not
currently available. predictable specifies whether the vehicle’s location is
{
  "bool": {
    "must": {
      "match": {
        "routeNumber": 41
      }
    },
    "filter": {
      "geo_distance_range": {
        "from": "50m",
        "to": "1km",
        "pin.location": {
          "lat": 43.779148,
          "lon": -79.50425
        }
      }
    }
  }
}

Figure 3.17: A sample geo distance query for public transportation data
currently predictable. GPStime specifies the time reported by the GPS unit installed on the
vehicle. last_update specifies the last time that the vehicle has reported its
position. timestamp specifies the Unix time epoch of the report. dateTime is
a text representation of timestamp.
Notice that coordinates (Line 7 in Fig. 3.16) has an extra attribute
data_type. This extra attribute provides more information for the match-
ing engine about the nature of the data and adds the capability for the sub-
scriber to define specific queries. For example, coordinates is defined as a
geo_point. Therefore, a subscriber can define a geo-distance query by provid-
ing a coordinate and a distance from that coordinate, and the matching engine
will calculate whether the data point falls in the specified area. These data-specific
queries are only possible if the matching engine knows in advance that the data is
a geo_point. Fig. 3.17 shows a sample geo-distance query that asks for the
data of all the vehicles of route number 41 when the distance of the vehicles
[Block diagram: an octorotor with an HD camera and telemetry radio links to a ground station system (digital video receiver, video processing, location processing, publisher), which feeds the CVST platform (broker, database server, video recording engine and storage, portal server, portal subscriber, clients).]

Figure 3.18: Publishing Drone Data
to a particular location is between 50 m and 1 km.
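The geo-distance check behind such a query can be approximated with the haversine formula. This sketch is illustrative and is not how Elasticsearch computes it internally:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2 +
         math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def in_distance_range(doc_lat, doc_lon, pin_lat, pin_lon, lo_m, hi_m):
    """Decide whether a reported coordinate falls in the subscribed band."""
    d = haversine_m(doc_lat, doc_lon, pin_lat, pin_lon)
    return lo_m <= d <= hi_m

# Pin from Fig. 3.17; a vehicle roughly 700 m east of the pin.
pin = (43.779148, -79.50425)
print(in_distance_range(43.779148, -79.4955, *pin, 50, 1000))  # True
```

The matching engine applies this predicate only because the schema declared coordinates as a geo_point, which is the point of the extra data_type attribute.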
3.4.3 Drone Vision as a Service
Road traffic information is typically gathered from sources such as loop
detectors, radar detectors, traditional CCTV and infrared cameras, and mobile
probes employing technologies such as GPS. Installation of these sources is
costly, so they are installed only on the main roads and intersections with
high traffic. Some sources, such as highway cameras, have other limitations as
well. For example, the maximum installation height of a camera is limited.
Moreover, most of these sources are immobile, so they are of little use when
traffic patterns in the city change.
Visual analytics play an important role by offering immediate surveillance
in small and large cities. However, current monitoring systems are spatially
1 {2 "_id": "1",3 "ect": "Tue, 24 Nov 2015 12:53:33 -0000",4 "geojson": {5 "features": [6 {7 "geometry": {8 "coordinates": [9 -79.3344972,
10 43.70273211 ],12 "type": "Point"13 },14 "properties": {15 "name": "Center"16 },17 "type": "Feature"18 },19 ],20 "type": "FeatureCollection"21 },22 "timestamp": 1448369613,23 "video": {24 "src": "/api/drone_camera/1/video",25 "type": "video/mp4"26 }27 }
Figure 3.19: Sample Drone Data
blind. For example, they can only provide road condition and visual cover-
age at discrete locations with a limited number of traffic cameras and data
sensing devices that are not sufficiently dense to provide on-demand immedi-
ate visual surveillance. Unmanned Autonomous Vehicles (UAVs), or drones, are
applicable in multiple smart city domains, including transportation, construc-
tion, agriculture, etc. In this section, we discuss how our platform offers an
infrastructure for Vision as a Service (VaaS) using UAVs.
UAVs are good candidates to help gather real-time information with a bet-
ter view and lower cost than the sensors currently in use. However, they need
a platform that can handle their mobility and provide security and real-time
data analysis. They can travel at a higher altitude and speed than vehicular
traffic towards the incident location.

Figure 3.20: Video playback of a drone flight on CVST portal

VaaS provides city planners with the ability
to allocate the required resources, i.e. drones and their associated networking
and computing resources, on demand and extends the coverage of the existing
intelligent transportation systems.
Fig. 3.18 shows the functional blocks of the system. We have deployed an
Octorotor UAV, mounted with an HD camera. The camera signal and GPS
location information are transmitted to a nearby ground station, where the
video feed is transcoded and published to the system.
First, the publisher publishes the start of the event to the broker. Then
the publisher starts collecting the video and location information from the
drone system. The drone reports its location information to the ground station
over the wireless control link, and the publisher extracts
the location information from the control software. The HD video camera
Figure 3.21: Subscription Portal: Public Transportation Query
installed on the drone sends the video over the high-bandwidth wireless link
to the ground. At the ground PC, the video is encoded to the proper size and
format and then published along with the current location of the drone to the
CVST platform as two separate publications.
The broker distributes the published data to all the subscribers. By default,
a UAV event has three subscribers:
a. The video recording system, which starts workers to record the video in
the proper format and store it in the right location.
b. The database subscriber that stores the event information, such as drone
Figure 3.22: Subscription Portal: Public Transportation Data
location updates and the URL of the live or recorded video feed.
c. The CVST portal, which hosts live streaming, playback and associated
analytics.
At the end of the event, the publisher publishes an end-of-event notification, and
subsequently the database and the portal are updated accordingly. Fig. 3.19
shows a sample data that is published during an event. It contains the location
of the event as a GeoJSON [61] document, the time of the event in timestamp
as a Unix epoch time and the current URL of the video feed. The video feed is
always available to be consumed by the web portal.
The first official live demonstration of VaaS took place in Toronto in October
2015, launching a UAV to a height of 75 meters adjacent to the Don
Valley Parkway. Fig. 3.20 shows a screenshot of the portal while playing a
live stream of a drone flight. The platform has been used to publish live video
feeds from drone flights in many demonstrations, and during these demos, the
live video feeds were available on the CVST platform. The recorded videos of
the flights can be viewed on the CVST portal [34].
Figure 3.23: Subscription Portal: Traffic Sensor Query
3.4.4 Subscription Portal
We also developed a web portal that can act as a subscriber and receive live
updates for different queries in real-time. Using the portal, users can register
an account, log in to the portal and register their queries for different pub-
lished data types. Fig. 3.21 shows the query builder interface when the user
is creating a subscription for public transport data.
Behind the scenes, the portal uses the XSUB API as discussed in Sec-
tion 3.2. The portal queries the available registered publishers in the system
and their schemas by calling the /schemas API. For example, as depicted
Figure 3.24: Subscription Portal: Traffic Sensor Data
in Fig. 3.21, two publishers are registered in the system. The field names are
populated based on the Apache Avro schemas of the publishers. Therefore,
the portal does not need any hard-coded data about the publishers to pro-
vide this functionality and can dynamically support new publishers. Any new
publisher in the system is automatically available to the users.
The portal provides a simple interface for creating subscriptions. For ex-
ample, as shown in Fig. 3.21, the user is interested in all the updates for vehicle
id 1003. Multiple conditions can be added to the query at the same time.
Similar to Fig. 3.12, the query contains a match query. The query builder also
asks for a Time To Live (TTL) for the query. After the TTL expires, the query
is removed from the system.
After submitting the query, results will be pushed to the portal as soon as
they are available. Fig. 3.22 shows the live results received from the publisher
based on the query defined above. The fields defined in the schema and their
values are presented to the user in a table. Fig. 3.23 shows the subscription portal
while the user is defining a query for the traffic sensor data. Here, the query
will receive the data of highway sensors on roads whose names contain 401.
Fig. 3.24 shows the subscription page while receiving live updates of this query.
1 FIB:
2   /xsub      nexthops={faceid=262 (cost=0)}
3   /hw_sensor nexthops={faceid=259 (cost=0)}
4   /xpub      nexthops={faceid=261 (cost=0)}

Figure 3.25: Forwarding Information Base table after XPUB, XSUB and publisher are started
 1 [Forwarder] onIncomingInterest face=261 interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 2 [ContentStore] find /xpub/publish/start/hw_sensor/%FEQ%AE L
 3 [ContentStore] no-match
 4 [Forwarder] onContentStoreMiss interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 5 [Forwarder] onOutgoingInterest face=259 interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 6 [Forwarder] onIncomingInterest face=259 interest=/hw_sensor/data/%FEQ%AE
 7 [ContentStore] find /hw_sensor/data/%FEQ%AE R
 8 [ContentStore] no-match
 9 [Forwarder] onContentStoreMiss interest=/hw_sensor/data/%FEQ%AE
10 [Forwarder] onOutgoingInterest face=261 interest=/hw_sensor/data/%FEQ%AE
11 [Forwarder] onIncomingData face=261 data=/hw_sensor/data/%FEQ%AE/%FD%01/%00%00
12 [ContentStore] insert /hw_sensor/data/%FEQ%AE/%FD%01/%00%00
13 [Forwarder] onIncomingData matching=/hw_sensor/data/%FEQ%AE
14 [Forwarder] onOutgoingData face=259 data=/hw_sensor/data/%FEQ%AE/%FD%01/%00%00
15 [Forwarder] onIncomingInterest face=259 interest=/hw_sensor/data/%FEQ%AF
16 [ContentStore] find /hw_sensor/data/%FEQ%AF R
17 [ContentStore] no-match
18 [Forwarder] onContentStoreMiss interest=/hw_sensor/data/%FEQ%AF
19 [Forwarder] onOutgoingInterest face=261 interest=/hw_sensor/data/%FEQ%AF
20 [Forwarder] onIncomingData face=261 data=/hw_sensor/data/%FEQ%AF/%FD%01/%00%00
21 [ContentStore] insert /hw_sensor/data/%FEQ%AF/%FD%01/%00%00
22 [Forwarder] onIncomingData matching=/hw_sensor/data/%FEQ%AF
23 [Forwarder] onOutgoingData face=259 data=/hw_sensor/data/%FEQ%AF/%FD%01/%00%00

Figure 3.26: Interests and Data packets log during XPUB and publisher communication
3.5 Evaluation and Performance Tests
In this section, we present some results of our system evaluation and perfor-
mance tests. In Section 3.5.1 we go over the network trace of publishing traffic
flow sensor data. In Section 3.5.2, we test the scalability of the workers of the
Matching Engine by putting the system under heavy load and then scale out
the workers by launching new virtual machines instances. In Section 3.5.3, we
test the performance of data delivery to subscribers using IP and Named-Data
Networking. All of these evaluations are end-to-end tests and involve all the
system components.
[Diagram: publications flow into a Message Queuing server, consumed by a pool of Worker VMs that match them against Query Servers.]

Figure 3.27: Scalability of the Matching Engine - Experiment Setup
3.5.1 IDD Publication Test
Fig. 3.25 lists the status of the Forwarding Information Base (FIB) table of
the router after XPUB, XSUB and the Traffic Flow publisher are started. Line 2
is the face registered by the XSUB, line 3 is the face registered by the Traffic
Flow publisher, and line 4 is the face registered by the XPUB service.
Fig. 3.26 lists the packet log of the start process discussed in Section 3.1.1.
The publisher periodically sends the “start” Interest packet to the XPUB instance.
Line 1 in Fig. 3.26 shows that the Interest packet of the “start” process is re-
ceived by the XPUB, under the name /xpub/publish/start/hw_sensor/%FEQ%AE.
/xpub is the path to the XPUB instance, /publish/start is the action verb
for the “start” process, /hw_sensor is the path of the publisher, and %FEQ%AE
is the binary format of the sequence number of the data available in the pub-
lisher.
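The component %FEQ%AE can be reproduced with a short helper. The 0xFE marker prefix follows the NDN naming conventions for sequence numbers; the exact byte layout here is an assumption inferred from the log output:

```python
def encode_sequence_component(seq):
    """Percent-encode an NDN sequence-number name component.

    The component is a 0xFE marker byte followed by the big-endian
    sequence number; bytes outside the NDN URI unreserved set are
    percent-encoded, as in the NFD logs.
    """
    length = max(1, (seq.bit_length() + 7) // 8)
    raw = bytes([0xFE]) + seq.to_bytes(length, "big")
    unreserved = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                     b"abcdefghijklmnopqrstuvwxyz0123456789-._~")
    return "".join(chr(b) if b in unreserved else "%%%02X" % b
                   for b in raw)

# The sequence number behind the component seen in Fig. 3.26:
# 0xFE marker, then 0x51 ('Q', printable) and 0xAE.
print(encode_sequence_component(0x51AE))  # %FEQ%AE
```

Incrementing the sequence yields the next component in the log (%FEQ%AF), which is why consecutive data items have adjacent names.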
Line 6 shows the Interest packet that the XPUB sends back to the publisher
at /hw_sensor/data/%FEQ%AE to request the newly published data. Notice
that the same sequence number is used in the name of the data. Line 11
[Plot: delivery rate (msg/s, left axis) and number of workers (right axis) versus time (s); series: workers, deliveries.mean(1m).]

Figure 3.28: Scalability of the Matching Engine, one minute rolling average
shows the data sent by the publisher to XPUB, properly segmented. Line 12
indicates that the data is cached so it will be available to other Interest packets
requesting the same data. Line 15 shows that after XPUB receives the new
data, it immediately sends an Interest packet requesting the next sequence
number.
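The exchange in Fig. 3.26 amounts to a simple sequence-number-driven pull loop. The sketch below models it in memory with illustrative names; the real system exchanges Interest and Data packets through the NDN forwarder:

```python
# In-memory sketch of the XPUB pull loop driven by sequence numbers.

class Publisher:
    def __init__(self):
        self.store = {}      # sequence number -> data item
        self.latest = -1

    def publish(self, data):
        self.latest += 1
        self.store[self.latest] = data

    def on_interest(self, seq):
        """Answer an Interest for /hw_sensor/data/<seq>, if available."""
        return self.store.get(seq)

def xpub_pull(publisher, next_seq):
    """Fetch every outstanding item from next_seq on; return (items, next_seq)."""
    received = []
    while True:
        data = publisher.on_interest(next_seq)
        if data is None:     # nothing newer yet; wait for the next "start"
            break
        received.append(data)
        next_seq += 1        # immediately request the next sequence number
    return received, next_seq

pub = Publisher()
pub.publish("reading-0")
pub.publish("reading-1")
items, nxt = xpub_pull(pub, 0)
print(items, nxt)  # ['reading-0', 'reading-1'] 2
```

The "start" Interest from the publisher is what tells XPUB a new sequence number exists, turning a pull-based protocol into an effective push notification.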
3.5.2 Scalability of the Matching Engine
Since the Matching Engine does most of the computationally intensive work
in the system, we ran an experiment to test how it scales out. Fig. 3.27
shows the setup of the experiment. We used separate machines for the different
components of this experiment: the Message Queuing Server, the Query Server,
and the Worker Servers are each a separate virtual machine (VM) instance running
[Plot: delivery rate (msg/s, left axis) and number of workers (right axis) versus time (s); series: workers, deliveries.mean(5m).]

Figure 3.29: Scalability of the Matching Engine, five minutes rolling average
on the SAVI platform. Therefore, we can launch as many workers as we
need independently of other parts of the system. To keep workers busy, we
bombarded the message queuing system with new messages and kept it full
throughout the experiment. Then we started workers one by one to consume
the messages and match them against a subscription query in the Query Server.
Over time, the number of workers is increased from 1 to 16, and the rate of
the message delivery is measured every ten seconds on the Message Queuing
Server.
Fig. 3.28 and Fig. 3.29 show the message delivery rate of the message
queuing system as one-minute and five-minute rolling averages, respectively.
To better understand the trend of the data, we have superimposed the number
of workers over the delivery rate. The left axes in Fig. 3.28 and Fig. 3.29
[Diagram: publications reach the Broker, which connects through router R1 over link L1 to router R2 and the subscribers.]

Figure 3.30: Data usage: IDD vs IP — Experiment Setup
show the delivery rate, while the right axes indicate the number of workers
over time. The increasing trend of the delivery rate follows the number of
workers in the system. For example, when there is one worker, the delivery
rate is about 500 msg/s. Increasing the number of workers to five increases
the delivery rate to 2500 msg/s. This experiment shows that the system can scale
out and accommodate a higher load with a higher delivery rate by adding more
parallel workers to the system.
3.5.3 IDD and IP Performance Comparison
Next, we set up an experiment to evaluate the performance of the IDD layer.
Fig. 3.30 shows the test setup. Similar to Section 3.5.2, we continuously publish
our test publications to the broker and set up a series of subscribers with
queries that match those publications. We have set up the system in a way that
all the communications between the broker and the subscribers are transmitted
over a single network link, noted as L1. As shown in Fig. 3.30, L1 is between
routers R1 and R2. R1 is the connection point of the broker’s network and
L1, and R2 is the connection point of the subscribers’ network and L1.
[Plot: data rate (KB/s) versus time (s); series: IDD, IP.]

Figure 3.31: Data usage: IDD vs IP — Results
We tested the system with both IP and IDD as the communication pro-
tocol. Throughout the trial, we increased the number of subscribers every 60
seconds and measured the link utilization of L1 every 10 seconds. Fig. 3.31
shows the one minute rolling average of the link utilization of L1 when sub-
scribers are added to the system every 60 seconds. As one can see, when the
subscribers use IP, the link utilization of L1 is consistently higher than when IDD
is used.
This difference comes from the fact that IDD puts only one copy of the
data on the wire. All the subscribers are sending Interest packets for the same
data name. After one of the subscribers sends its Interest packet over L1 to the
broker, the subsequent Interest packets are stored in R2. The broker satisfies
the first Interest packet with the matched data. This data reaches R2 on its
path. R2 caches the data and satisfies all of its pending Interest packets. On
the other hand, in the IP-based communication, there is no in-network caching
on the protocol layer and the broker has to send each subscriber a new copy
of the data, which results in a higher link utilization.
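The gap between the two curves follows from a back-of-the-envelope model of link L1 (illustrative numbers): with IP, the broker unicasts one copy per subscriber, while with IDD at most one copy of each data item crosses L1 because R2's Content Store satisfies the remaining Interests:

```python
def l1_load_kbps(protocol, publish_rate_kbps, subscribers):
    """Approximate load on link L1 for one publication stream.

    IP: the broker unicasts a copy to every subscriber across L1.
    IDD: one copy crosses L1; R2's Content Store answers the other
    Interests (assumes all subscribers request the same names and
    the cache holds the data long enough).
    """
    if protocol == "ip":
        return publish_rate_kbps * subscribers
    if protocol == "idd":
        return publish_rate_kbps  # independent of subscriber count
    raise ValueError(protocol)

for n in (1, 4, 16):
    print(n, l1_load_kbps("ip", 20, n), l1_load_kbps("idd", 20, n))
# With 16 subscribers, IP carries 16x the traffic of IDD over L1.
```

This linear-versus-constant behavior is exactly the divergence visible as subscribers are added every 60 seconds.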
To improve the performance of IP-based protocols, one must place a cache
near R2. Then, all the requests from the subscribers towards the broker must
be rerouted to that cache, for example, through configuration in the subscribers’
applications or packet-level inspection at R2. In IP, data is coupled with its
location, the application and the routers. Therefore, the network lacks support
for mobility and provenance. In IDD, they are decoupled: the application is
responsible for creating the content, and the network is responsible for delivering
it. The application does not have to know about the client or network
configuration, and the network does not need to inspect application-specific
packets to forward them to specific caches.
3.6 Summary
In this chapter, we presented our naming design for Named-Data Networking
to support push notifications in the data dissemination layer of the CVST
platform. We discussed how the design provides a simple, scalable communication
layer that supports security and mobility. We have used this design in a
scalable and distributed implementation of a content-based publish/subscribe
system. Using micro-services gives the publish/subscribe system the capability
to easily scale out and serve more requests. We also demonstrated some
examples, such as live streams of drone events, using this dissemination layer
in the CVST platform.
Chapter 4
Content Delivery in Service
Providers
The CDN architecture is optimized to deliver the content until it reaches the
network of the service provider. Service Providers (SPs) usually place Content
Delivery Network (CDN) caches at the Internet Exchange peering points,
connected to the core of the network. Inside the operator’s network, it is a
different matter. The area of the network that is close to the consumer, known
as the last-mile network, is not optimized for Over-The-Top (OTT) content.
The CDN architecture does not solve the inefficient use of the SP’s network
infrastructure. When users request the same content, that content
is transmitted over the network of the SP multiple times. In this chapter,
we use time-to-exhaustion (TTE) as our metric and formulate the problem of
placing caches in the network and routing the content in a way that maximizes
TTE.
Chapter 4. Content Delivery in Service Providers 87
[Diagram: Internet CDNs (Akamai, Amazon, Google, Netflix) connect to the operator's core; 3rd-party caches and the operator CDN sit at the aggregation/edge; the access layer forms the last mile.]

Figure 4.1: Network of a Service Provider
The rest of this chapter is organized as follows. In Section 4.1, the content
delivery problem in a service provider is investigated. Further, details of our
analytical model are discussed in Section 4.2. Simulation results and validation
are provided in Section 4.3.
4.1 Problem Definition
4.1.1 Content Distribution in Service Providers
Fig. 4.1 shows a simplified path that content takes from its source, through the
operator’s network and finally to the consumer. A service provider’s network
usually consists of multiple layers: the Core layer, the Aggregation layer, and
the Access layer. The core of the network transfers the highest volume of data
from various aggregation sites between sources and destinations. The Core
has a few points of presence (PoPs) and high-capacity links. Content
servers are usually connected to the core through an Internet exchange peering
point. The next level is the Aggregation level, which is a concentration point of
multiple distribution centers; these may themselves be connected to smaller
Edge distribution centers. Each center in the aggregation level usually serves
Figure 4.2: Content distribution in Service Providers. (a) Content delivery from peering points. (b) Effect of caching on network traffic.
about one to three million customers. The final layer, the Access layer, is
directly connected to the consumers. For example, a cable provider's edge
layer contains cable modem termination systems (CMTS), each serving about 10 to
50 thousand subscribers, while the access layer of a wireless service provider
contains cellular antennas.
Now consider subscribers that request OTT video content. As shown
in Fig. 4.2a, a new connection is created between the content source and the
consumer's machine for every content request. Even when all the users
request the same content, that content is transmitted over the network
multiple times. Note that the source of this content may be either controlled
by the operator itself or come from a VoD content server owned by a 3rd
party CDN. If the content source is a live stream from outside the network,
operators face an even bigger challenge than for VoD content: many
consumers watch live stream content concurrently, and the operator does not
have any control over the content coming from outside the network. This
structure is not scalable and is an apparent waste of the underlying resources.
For example, consider a service provider in Canada that serves OTT
content to its users in Ontario and Quebec. If the peering point is in Chicago,
all the requested traffic for users in Quebec and Ontario is served from
Chicago. Installing a cache in Toronto saves a lot of traffic that would
otherwise travel over the network from Chicago to users in Ontario and
Quebec.
Fig. 4.2b shows a network that has a cache near the Aggregation and Edge
level. All the flows that previously passed through the Core are now
terminated at a lower level of the network. Therefore, putting a cache in the
lower levels of the network saves the extra bandwidth used by multiple
transmissions and increases the available capacity of the network. Hence,
content providers are putting their caches inside the operator's network:
Netflix, with its Open Connect program, convinced operators to set up cache
servers even deeper in their networks, in places such as metro areas, to reduce
the traffic load on their cores and peering points.
4.1.2 Time-to-exhaustion
In a network with ever-increasing demand, such as a service provider's,
congestion is inevitable. For a service provider, serving content from a
peering point incurs cost. At the same time, serving more content to users
means more revenue. The growing demand will eventually exhaust the network at
some point in the future unless the onset of congestion is delayed. The
network's onset of congestion is the moment when the capacity of some link in
the network is exceeded, i.e., when that link becomes congested.

Figure 4.3: Flows between sources and destinations pass through multiple links
However, the onset of congestion depends not only on the network topology
but also on the pattern of demand growth. For example, congestion in a network
with linear demand growth differs from that in a network with exponential
demand growth. Furthermore, the demand matrix plays a major role in the onset
of congestion: introducing new services, offering new types of quality of
service for content delivery, or adding new customers all change the network's
onset of congestion.
Fig. 4.3 shows how content routing and caching can affect the onset of
congestion. Following the max-flow min-cut theorem, the maximum flow passing
from the sources to the destinations in a network equals the total link
capacity of the minimum cut of that network. Now consider a case where most of
the flows are routed through a critical link, making that link congested
sooner rather than later. Service providers have two solutions to this
problem. The first solution is optimizing the content routing and passing flows
Figure 4.4: Time-to-exhaustion. Traffic is increasing monthly until the network is congested. (Plot: traffic in GB/s vs. time in months; the Demand 1 and Demand 2 curves cross the network capacity line at TTE 1 and TTE 2.)
through different links. The critical link then has a lower average load over
time, and its congestion is delayed. The other option is to move the flow
destinations, e.g., caches, to other parts of the network. In other words,
putting caches in the network delays the onset of congestion.
For example, assume that the demand increases every month, as in the scenario
of Fig. 4.4. If the current demand in the network (shown as Demand 1) is
100 Gb/s and the network can handle a maximum of 400 Gb/s, the current
infrastructure will keep up with the traffic for the next 16 months.
However, by placing caches in the network and optimizing content routing,
the demand pattern changes (shown as Demand 2), and the network stays
congestion-free for another eight months.
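The back-of-the-envelope arithmetic behind this example can be written as a small helper that counts the congestion-free months. This is an illustrative sketch, not the thesis model: the function name is ours, and it uses the compound 5% monthly growth assumed later in Section 4.3, so its numbers differ from the roughly linear Demand 1 curve of Fig. 4.4.

```python
def months_to_exhaustion(demand_gbps, capacity_gbps, monthly_growth=0.05):
    """Number of months the demand can keep growing before exceeding capacity."""
    months = 0
    while demand_gbps * (1 + monthly_growth) <= capacity_gbps:
        demand_gbps *= 1 + monthly_growth  # demand compounds every month
        months += 1
    return months

# 100 Gb/s of demand against 400 Gb/s of capacity at 5% monthly growth:
# the network stays congestion-free for 28 more months.
print(months_to_exhaustion(100, 400))
```

Placing caches effectively raises the capacity seen by the demand (or lowers the demand seen by the core), which is what pushes this count further out.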
The problem SPs face is how to plan their future network to accommodate the
constant increase in demand, to provide a congestion-free network, and to
minimize costs. Another challenge is that the SP already has an established
network: SPs need time to purchase equipment, test it, and deploy it in their
infrastructure, and these investments keep the network congestion-free only for
a limited time.

The budget planning process also has a time element; in other words, the
budget is planned for a limited period, such as a year. Therefore, service
providers use the notion of time-to-exhaustion for forecasting. TTE is crucial
for network capacity planning, since it affects the amount and timing of
investment in the infrastructure. For example, with a limited budget, the SP
must choose how to plan the additional capacity, where to put the caches, and
what type of content should be cached.
We aim to maximize the time-to-exhaustion, given a limited budget, by placing
caches in the best locations and optimizing the content routing. We will show
that using ICN-based paradigms, such as Named-Data Networking (NDN),
outperforms optimal cache placement and content routing in a CDN and prolongs
the time-to-exhaustion of the network. The strategy layer of NDN can be used
to route the content optimally.
4.2 Problem Formulation
We model our network as a directed graph G(V,E) with the set of nodes V and
links E. U denotes the set of nodes that have a demand for contents. P denotes
the set of nodes that can satisfy demands for contents, e.g., Internet exchange
points. The set of nodes that are candidates for caching contents is denoted
by C. All the notations are listed in Table 4.1.

Constants
  V              Set of nodes
  E              Set of directional links
  G(V,E)         Graph of the network
  P              Nodes that are connected to the peering points
  U              Nodes that have a demand for contents
  C              Nodes that can cache contents
  L_k            Size of content k
  α_i^k          Demand for content k at node i
  Γ_i^+, Γ_i^-   Sets of ingress and egress neighbors of node i
  r_i^k          Maximum rate at which node i can read content k from its cache
  B              Total storage budget available for all caches
  V(.)           Function that maps storage to its budget value
  c_{i,j}        Capacity of link (i,j)
  I(.)           Indicator function: 1 if the condition is true, 0 otherwise
  M              Maximum number of caches in the network
  φ_{i,j}^{sd}   Shortest-path betweenness of link (i,j) from node s to node d

Common Variables
  S_i            Storage at node i
  p_i            Decision variable for cache placement at node i
  h_i^k          Decision variable for caching content k at node i
  β_i^k          Total demand at node i for content k

CDN-Specific Variables
  f_{i,j}^{kd}   Flow for content k on link (i,j) going to node d
  γ_s^{kd}       Traffic flow from node s to node d for content k

NDN-Specific Variables
  f_{i,j}^k      Rate at which interests for content k are sent on link (i,j)

Table 4.1: Notations
4.2.1 Demands and Storage Budget
To find the TTE of the network, we model the network for one time epoch.
We assume that within this time epoch the demands are known and fixed,
but the locations of the caches, the cached contents, and the content routing
are not. We also assume a limited storage budget, B, is available for the
capacity planning of all the caches in the network.
Since the demand of each user changes over time following different patterns,
we run an exhaustive search to find the TTE by solving a series of feasibility
problems. A feasibility problem has no objective and only finds a feasible
solution. For each budget value, we increase the demands of the users
following a known pattern. If, for a set of demands, the network becomes
congested, the problem becomes infeasible, and at that point we have found the
TTE of the network. The final solution of the model provides a cache placement
and content routing policy that maximizes the TTE of the network.
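The search procedure above can be sketched as a nested sweep: for each budget point, grow the demands month by month and call a feasibility oracle until it reports infeasibility. Everything below is an illustrative stand-in: `solve_feasibility` represents the feasibility programs formulated later in this chapter, and the toy oracle at the end simply compares total demand against a capacity that grows with the budget.

```python
def find_tte(solve_feasibility, base_demands, growth=0.05, max_months=240):
    """TTE = last month for which the feasibility problem has a solution."""
    demands = list(base_demands)
    for month in range(1, max_months + 1):
        demands = [d * (1 + growth) for d in demands]  # grow every user's demand
        if not solve_feasibility(demands):             # congested -> infeasible
            return month - 1
    return max_months

def tte_per_budget(budgets, make_oracle, base_demands):
    """Repeat the search for every storage-budget point on the x-axis."""
    return {b: find_tte(make_oracle(b), base_demands) for b in budgets}

# Toy oracle: feasible while total demand fits a 400 Gb/s network whose
# effective capacity grows linearly with the cache budget (pure illustration).
oracle = lambda budget: (lambda demands: sum(demands) <= 400 + 10 * budget)
print(tte_per_budget([0, 10], oracle, [50, 50]))  # {0: 28, 10: 32}
```

In the actual model the oracle is a linear program, so each budget point costs a handful of LP solves rather than a single comparison.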
Demands
We denote the demand at node i for content k by α_i^k. Note that α_i^k depends
on time; however, we solve the problem for each time epoch separately. The
demand at a node also depends on whether node i caches content k or not, which
is denoted by the binary variable h_i^k. In other words, the traffic that
populates a cache is itself a demand. Therefore, the total demand at node i can
be written as:

    β_i^k = α_i^k + h_i^k    ∀ i ∈ C ∪ U    (4.1)

Note that β_i^k is the number of requests for content k, not the size of the
demand. The size of the demand is L_k β_i^k, where L_k is the size of
content k.
Storage Budget
Each cache in the network is assigned a portion of the storage, denoted by S_i,
while B is the storage budget in dollar value. We assume that the relation
between the amount of storage and its dollar value can be written as a function
V(S_i). The sum of the budgets assigned to all caches must not exceed B.
Let p_i be the binary variable that decides whether node i is a cache. The
budget constraint can then be written as Eq (4.2):

    Σ_{i∈C} p_i V(S_i) ≤ B    (4.2)

Here we assume that V(.) is a linear function; however, this can be extended
to any convex function. For each cache, the total size of the cached objects
cannot exceed the storage of that cache, as written in Eq (4.3):

    Σ_k L_k h_i^k ≤ p_i S_i    ∀ i ∈ C    (4.3)

Also, the total number of caches placed in the network can be limited by an
upper bound M, as written in Eq (4.4):

    Σ_{i∈C} p_i ≤ M    (4.4)

To have homogeneous caching, we may also add a constraint that enforces
all S_i to be equal.
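A candidate placement can be checked directly against constraints (4.2)-(4.4). The sketch below assumes, as in the text, a linear V(.); the function name and the per-unit price are illustrative, not part of the model.

```python
def placement_feasible(S, p, h, L, B, M, price_per_unit=1.0):
    """Check Eqs (4.2)-(4.4): budget, per-cache storage, and cache count.

    S[i]: storage at node i; p[i]: 1 if node i is a cache; h[i][k]: 1 if
    content k is cached at i; L[k]: size of content k; B: budget; M: max caches.
    """
    if sum(p[i] * price_per_unit * S[i] for i in S) > B:  # Eq (4.2), linear V(.)
        return False
    if sum(p.values()) > M:                               # Eq (4.4)
        return False
    for i in S:                                           # Eq (4.3), per cache
        if sum(L[k] * h[i].get(k, 0) for k in L) > p[i] * S[i]:
            return False
    return True
```

In the optimization model these are constraints on decision variables; a checker like this is only useful for validating a solver's output or a hand-built placement.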
Cache Replacement Policy and Routing
The caching policy provided by the solution will maximize the TTE of the
network. h_i^k is the binary variable that shows whether content k is cached
at node i. Solving the model for two different time epochs with different
demands will result in different h_i^k; the difference between them defines the
cache replacement policy of node i. Adopting a fixed cache replacement policy
instead, such as Least Recently Used (LRU) or Least Frequently Used (LFU),
will reduce the TTE of the network.

The solution also provides the content routing policy for the network.
Adopting a routing protocol such as shortest-path will likewise reduce the TTE
of the network. We study this effect in the results section.
4.2.2 Content Delivery Networks
In service providers, transparent caching is done by placing one or more caches
in the network and re-routing requests towards them. SPs may also host content
sources of their own or from third parties. To model this, we define a
multi-commodity flow problem.
Flow Conservation
The flow conservation at node s for content k can be written as Eq (4.5). We
denote by f_{i,j}^{kd} the flow for content k on link (i,j) going to node d,
and by γ_s^{kd} the flow for content k from node s to node d. The left-hand
side of Eq (4.5) is the difference between the total egress (Γ_s^-) and
ingress (Γ_s^+) flows for content k at node s that are destined for node d.

    Σ_{j∈Γ_s^-} f_{s,j}^{kd} − Σ_{j∈Γ_s^+} f_{j,s}^{kd} = γ_s^{kd} − L_k β_s^k δ(s−d)    ∀ s,d ∈ V    (4.5)

The right-hand side of Eq (4.5) is the total flow originated at node s towards
node d for content k, minus the demand at node s for content k. δ(i) is the
Kronecker delta function: it is equal to 1 when i is zero, and zero otherwise.
Therefore, the term L_k β_s^k in Eq (4.5) only has an effect when s and d are
the same node. In other words, at any node other than d, the difference
between the egress and ingress flows destined to node d equals the traffic
produced at that node for node d. When s and d are equal, all the ingress
traffic into node d equals the demand at node d. Therefore, considering that
node d does not send traffic to itself (i.e., f_{d,j}^{kd} = 0 ∀j and
γ_d^{kd} = 0), Eq (4.5) reduces to

    Σ_{j∈Γ_d^+} f_{j,d}^{kd} = L_k β_d^k
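Eq (4.5) can be verified numerically for a candidate flow assignment. The checker below handles one content k and one destination d; the function name is ours, and demand[s] stands for L_k β_s^k in the notation of Table 4.1.

```python
def conserves_flow(flows, gamma, demand, nodes, d, tol=1e-9):
    """Verify Eq (4.5) at every node s for one content k and destination d.

    flows[(i, j)] = f_{i,j}^{kd}; gamma[s] = gamma_s^{kd};
    demand[s] = L_k * beta_s^k.
    """
    for s in nodes:
        egress = sum(f for (i, j), f in flows.items() if i == s)
        ingress = sum(f for (i, j), f in flows.items() if j == s)
        delta = 1 if s == d else 0  # Kronecker delta(s - d)
        if abs((egress - ingress) - (gamma[s] - demand[s] * delta)) > tol:
            return False
    return True

# A 3-node chain 0 -> 1 -> 2: node 0 originates 5 units destined for node 2.
flows = {(0, 1): 5.0, (1, 2): 5.0}
gamma = {0: 5.0, 1: 0.0, 2: 0.0}
demand = {0: 0.0, 1: 0.0, 2: 5.0}
assert conserves_flow(flows, gamma, demand, [0, 1, 2], 2)
```

At the transit node the balance is zero, and at the destination the ingress flow equals the demand, matching the reduced form of Eq (4.5).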
Cache Population Traffic
The cache population traffic is satisfied by the peering points. Therefore, the
total demand originated at the core (P) of the network must be at least the
size of the cached content (Eq (4.6)):

    Σ_{s∈P} γ_s^{ki} ≥ L_k h_i^k    ∀ i ∈ C    (4.6)
I/O and Link Capacity Limits
A node can only become a source of flow for a content request when it is a
cache and has the content cached. I(i ∈ C) in Eq (4.7) is equal to 1 only if
node i is a cache candidate, and h_i^k is equal to 1 when content k is cached
at node i. r_i^k is the rate at which node i can read contents from its cache
storage and put them on the wire; it is a limitation of the node's hardware,
e.g., the I/O limit of the node's hard disks.

    Σ_d γ_i^{kd} ≤ I(i ∈ C) r_i^k L_k h_i^k    (4.7)

Also, each link (i,j) has a limited capacity, denoted by c_{i,j}. The link
capacity constraint enforces that the sum of all the flows to all destinations
for all contents is less than the total link capacity, as in Eq (4.8):

    Σ_{k,d} f_{i,j}^{kd} ≤ c_{i,j}    (4.8)
The complete feasibility problem that models a CDN in the network of a
service provider is shown in Fig. 4.5:

    solve
    subject to
        Σ_{j∈Γ_s^-} f_{s,j}^{kd} − Σ_{j∈Γ_s^+} f_{j,s}^{kd} = γ_s^{kd} − L_k β_s^k δ(s−d)    ∀ s,d
        Σ_{s∈P} γ_s^{ki} ≥ L_k h_i^k    ∀ i ∈ C
        Σ_d γ_i^{kd} ≤ I(i ∈ C) r_i^k L_k h_i^k    ∀ i ∈ V \ P
        Σ_{k,d} f_{i,j}^{kd} ≤ c_{i,j}
        β_i^k = α_i^k + h_i^k    ∀ i ∈ V \ P
        Σ_i p_i V(S_i) ≤ B
        Σ_k L_k h_i^k ≤ p_i S_i

Figure 4.5: Feasibility model for CDN
Shortest-path routing
Routing in service provider networks is usually based on shortest-path
routing. To study the effects of shortest-path routing, we add a routing
constraint to our model. Shortest-path routing is modeled using the
shortest-path betweenness centrality of each link.

Betweenness centrality (BC) is one of the centrality metrics in graphs [62].
It measures the degree to which a node or a link is needed to connect other
nodes along paths. The shortest-path betweenness centrality of link (i,j) with
respect to source node s and destination node d, denoted φ_{i,j}^{sd}, is
defined as the proportion of the shortest paths from node s to node d that
pass through link (i,j). Therefore, the average traffic for content k that
passes through link (i,j) from source s to destination d can be written as
φ_{i,j}^{sd} γ_s^{kd}. To model shortest-path routing, we add Eq (4.9) to the
model. Eq (4.9) ensures that link (i,j) does not carry more traffic than its
share when routing is done using shortest paths.

    f_{i,j}^{kd} ≤ Σ_{s∈V} φ_{i,j}^{sd} γ_s^{kd}    (4.9)
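For an unweighted graph, φ_{i,j}^{sd} can be computed by enumerating all shortest s-d paths (every such path only uses edges that increase the BFS distance by one) and counting which links they use. A minimal sketch, assuming d is reachable from s; the function name is ours.

```python
from collections import deque

def shortest_path_edge_share(adj, s, d):
    """phi[(i, j)]: fraction of the shortest s->d paths that use edge (i, j)."""
    # BFS distances from s on the unweighted graph
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Shortest paths only traverse edges that increase the distance by one,
    # so a depth-first walk over those edges enumerates exactly those paths.
    paths = []
    def walk(u, path):
        if u == d:
            paths.append(list(path))
            return
        for v in adj[u]:
            if dist.get(v) == dist[u] + 1:
                path.append((u, v))
                walk(v, path)
                path.pop()
    walk(s, [])
    counts = {}
    for p in paths:
        for e in p:
            counts[e] = counts.get(e, 0) + 1
    return {e: c / len(paths) for e, c in counts.items()}
```

On the four-node square with edges 0-1, 0-2, 1-3, 2-3, the two shortest 0-to-3 paths each carry half the weight, so every edge gets φ = 0.5.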
4.2.3 Named-Data Networking
Interest Forwarding
To model NDN, we find the locations that can potentially satisfy the most
interest in contents. The notion of interest here is closer to content
popularity at a node, similar to the virtual interest packets studied
in [63], and is different from the Interest packet of the NDN paradigm. We
denote by f_{i,j}^k the rate at which interest for content k is forwarded on
link (i,j). Since NDN is a point-to-point protocol, we do not have flows from
sources to destinations, but potential interests that move around the network
until they are satisfied. Suppose node s has some interest in content k
(β_s^k). Then the egress interests (Σ_{j∈Γ_s^-} f_{s,j}^k) from node s are
increased by β_s^k. This is written as an inequality in Eq (4.10):

    Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k ≤ β_s^k    (4.10)
Now consider a node that has a content cached in its content store; it can
satisfy interest for that content and remove the interest from the network.
Each node also has an I/O limit for reading its content store, which limits
the rate at which interests are satisfied. Any remaining interest is forwarded
towards other nodes in the network. Therefore, a node can satisfy interests at
most at the rate bounded by its I/O limit, as written in Eq (4.11):

    Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k + I(s ∈ C) r_s^k h_s^k ≥ β_s^k    (4.11)

Consider the scenario where node s is not caching content k. Then Eq (4.10)
and Eq (4.11) reduce to an equality and enforce that node s forwards all of
its ingress and local interests. However, if node s caches content k, the
ingress interests can be satisfied by an amount bounded by the hardware
limitations of node s. Tracking the movement of this potential interest
through the network can be used to find the best place to cache the content.
Link capacity limit
The next step is to model the link capacity constraint. In NDN, Data packets
follow the reverse path of the Interest packets to reach the requester.
Therefore, sending an interest over link (i,j) results in the data being sent
back over link (j,i). We can use this to write the link capacity constraint as
Eq (4.12):

    Σ_k L_k f_{i,j}^k ≤ c_{j,i}    (4.12)
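The bookkeeping behind Eq (4.12) — interests on link (i,j) pulling data back over (j,i) — can be made explicit with a small helper. The names are illustrative.

```python
def data_link_loads(interest_rates, sizes):
    """Data load per link implied by Eq (4.12): each interest for content k
    sent on (i, j) pulls L_k of data back over the reverse link (j, i)."""
    loads = {}
    for (k, i, j), rate in interest_rates.items():
        loads[(j, i)] = loads.get((j, i), 0.0) + sizes[k] * rate
    return loads

def capacity_ok(interest_rates, sizes, capacity):
    """Check Eq (4.12) on every link that carries data."""
    return all(load <= capacity[link]
               for link, load in data_link_loads(interest_rates, sizes).items())
```

For example, interests for two contents on link (1,2) add up their data sizes on the reverse link (2,1), which is the link that must not exceed its capacity.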
Including Eq (4.1), Eq (4.2), and Eq (4.3), the complete feasibility problem
for NDN is shown in Fig. 4.6.
    solve
    subject to
        Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k ≤ β_s^k
        Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k + I(s ∈ C) r_s^k h_s^k ≥ β_s^k
        Σ_k L_k f_{i,j}^k ≤ c_{j,i}
        β_i^k = α_i^k + h_i^k    ∀ i ∈ V \ P
        Σ_i p_i V(S_i) ≤ B
        Σ_k L_k h_i^k ≤ p_i S_i

Figure 4.6: Feasibility model for NDN
4.3 Results
We evaluated the numerical results of our model using multiple network
topologies. Fig. 4.7 is one of the Rocketfuel networks [64], Fig. 4.8 is a
Dorogovtsev-Goltsev-Mendes (DGM) topology, and Fig. 4.9 is a tree network. The
Rocketfuel topology is simplified by removing the leaf nodes from the original
network and consolidating the demands from the removed nodes into their parent
nodes [65]. The simplified network has 50 nodes and 194 directed links. These
three topologies are comparable in size: the number of user nodes is 25 in
Rocketfuel and 27 in the DGM and tree topologies. We consider one peering
point for each network, and the rest of the nodes are cache candidates. At
each node, the demand for each content follows a Zipf distribution with
parameter α = 2. We assumed all the users have the same demand, and it is
uniformly increasing
Figure 4.7: Rocketfuel network
by 5% every month. This increase is based on current observations of OTT
demand growth. As mentioned in Section 4.2.1, for each budget point we solve a
series of feasibility problems and increase the demand until the network is
saturated. We also simulated the back-pressure algorithm of [63] to compare
with the performance of our model.
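The demand model used in the experiments — Zipf popularity with parameter α = 2 and uniform 5% monthly growth — can be sketched as follows (function names ours):

```python
def zipf_weights(n, alpha=2.0):
    """Normalized Zipf popularity over n contents: rank r gets weight r^-alpha."""
    raw = [1.0 / r ** alpha for r in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def monthly_demand(base_total, month, n_contents, growth=0.05, alpha=2.0):
    """Per-content demand after `month` months of uniform 5% growth."""
    total = base_total * (1 + growth) ** month
    return [total * w for w in zipf_weights(n_contents, alpha)]
```

With α = 2 the most popular content draws four times the demand of the second most popular, which is why a small amount of well-placed cache storage absorbs a large share of the traffic.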
4.3.1 Time-to-Exhaustion of different topologies
To evaluate the performance of the CDN method, we find the TTE of the network
by placing at most four caches. The assigned storage budget is equally divided
between these nodes, assuming they all use similar hardware; in other words,
we use homogeneous caching. For example, in the Rocketfuel network (Fig. 4.7),
Nodes 5, 10, 12, and 14 are selected for caching; in the DGM topology, Nodes
2, 3, 4, and 5; and in the tree topology, Nodes 2, 3, and 4 are
Figure 4.8: DGM network
selected for caching. It is worth noting that in the tree topology only three
nodes are selected for caching, since adding more caches did not increase the
TTE further. To evaluate the performance of NDN, we enable caching in all the
candidate nodes. The storage budget is therefore divided equally between more
nodes, and each node can cache fewer objects.
Fig. 4.10, Fig. 4.11, and Fig. 4.12 show the TTE in the different topologies
under the CDN model, the NDN model, and the NDN simulation using the
back-pressure algorithm. We assumed that there is demand for 2000 objects,
divided into 100 popularity groups, each object with a size of 1 Mb, and that
all the links in the network have a capacity of 1 Gb/s. We placed at most four
caches in the CDN scenario, while all the nodes can cache in the NDN
scenarios. Note that in all the topologies, the NDN simulation using
back-pressure closely follows our NDN model.

At a very low storage budget, CDN and NDN have a similar TTE, because most of
the content is provided by the peering point, which becomes the bottleneck of
the network. This means the network's onset of congestion will be
Figure 4.9: Tree network
similar for both NDN and CDN scenarios. Different topologies have different
TTEs at very low storage budget: the TTE depends on the onset of congestion,
and the onset of congestion depends on the topology of the network. TTE is
lowest for the tree topology and highest for the DGM topology. This
observation also agrees with the reciprocal of the network criticality of each
topology [66].
By increasing the caching storage, the TTE also increases. The storage budget
is equally divided between all caches; therefore, an increase in the total
storage budget increases the TTE. As the number of caches increases, each
cache receives a smaller portion of the budget. When the additional storage
available to each cache is not enough for another object, the number of cached
contents does not change, and neither does the TTE. This minimum required
increase in storage depends on the number of caches in the network. In NDN,
the steps are larger since there are more caches, and a greater increase in
the total storage budget is required to cache more contents.
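The step heights come from simple integer arithmetic: with the budget split evenly, each cache holds the floor of its share divided by the object size, so the more caches there are, the larger the budget increase needed before another object fits. A sketch in the units of the experiments (1 Mbit objects), assuming the 49 cache-candidate nodes of the Rocketfuel setup for the NDN case:

```python
def objects_per_cache(budget_mbit, n_caches, object_size_mbit=1):
    """Objects each cache can hold when the storage budget is split evenly."""
    return budget_mbit // n_caches // object_size_mbit

# The same 2 Gbit (2000 Mbit) budget split four ways (CDN) vs. 49 ways (NDN):
cdn = objects_per_cache(2000, 4)   # 500 objects per cache
ndn = objects_per_cache(2000, 49)  # 40 objects per cache
```

With 49 caches, the budget must grow by 49 Mbit before every cache gains one more object, which is exactly the wider step spacing of the NDN curves.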
Figure 4.10: Time-to-exhaustion in Rocketfuel network
In CDN, the steps are smaller since there are only four caches, and a smaller
increase in the storage budget, compared to NDN, results in more cached
contents. However, the height of the steps decreases as the budget increases,
because caching begins to lose its effect. There is also a limit on the
maximum TTE of each topology, after which even caching does not help anymore.
This TTE is the maximum that a network can reach with the help of caching.
Similar to the low-budget TTE, the maximum TTE also depends on the topology of
the network.

Furthermore, at low storage budgets there is little difference in TTE between
CDN and NDN; because of homogeneous caching, CDN sometimes even performs
better. However, in all the topologies, the network that uses
Figure 4.11: Time-to-exhaustion in DGM network
CDN saturates at a much lower storage budget than NDN. This better performance
is a direct result of the NDN paradigm: due to its in-network caching and
point-to-point nature, each cached content is sent over a link only once,
whereas in CDN each content is sent multiple times. This waste of link
capacity shows itself in the network saturating much sooner. There is a large
difference in maximum TTE between CDN and NDN in each topology. In Rocketfuel,
using CDN saturates the network after 47 months, but with NDN the network can
stay operational for 77 months. Similarly, DGM with CDN is operational for 74
months and with NDN for 82 months; the tree topology with CDN is operational
for 27 months and with NDN for 67 months. This large difference in the tree
topology arises because
Figure 4.12: Time-to-exhaustion in Tree network
in NDN caches are placed throughout the network. As shown in Fig. 4.9, NDN
places caches in Nodes 2 to 13, while CDN only places caches in Nodes 2, 3,
and 4. For example, a cache at Node 5 saves bandwidth on all the uplinks and
makes more capacity available to deliver more content.
4.3.2 Limited NDN Deployment
To see how much of the difference in TTE between CDN and NDN comes from the
number of caches in the network, we limit the number of caches in NDN to
four. Fig. 4.13 shows that even with four caches, content delivery using NDN
outperforms the CDN design. We have also considered an impractical case in
which every node in the CDN can also cache contents. This case is only for
Figure 4.13: Changes in TTE of Rocketfuel topology with number of caches
the comparison; in practice, it cannot be implemented due to the nature of
CDN. One could say that one of the reasons behind the NDN proposal is the
impossibility of in-network caching in TCP/IP. However, even if all the nodes
in the CDN had caching capability, the network would saturate similarly to the
case with four caches in the network. In addition, a limited NDN deployment
has a better TTE at low budgets than a full NDN deployment, which suggests
limiting the number of NDN caches when the storage budget is low.
We can also look at the link utilization in the network. Fig. 4.14 shows the
distribution of link utilization at the moment of network congestion. Using
CDN, more than 60% of the links have a utilization of more than 90%. In
contrast, the NDN scenario, even with a limited deployment,
Figure 4.14: Link utilization of NDN vs CDN (fraction of links in each utilization bin, from <10% to <100%, for the CDN-limited, CDN-full, NDN-limited, and NDN-full scenarios)
has less than 20% of the links at high utilization. Using NDN results in a
network in which more than 40% of the links have a utilization of less than
10%. This difference means that if CDN is used to increase the TTE of the
network, we would have to increase the capacity of most of the links, whereas
using NDN leaves far fewer bottlenecks, which makes capacity planning much
easier and cheaper.
4.3.3 I/O Speed Effect
One of the parameters we have considered in our model is the I/O limit of
each cache, which depends on the hardware design of the cache. Fig. 4.15 shows
the effect of this parameter. To better isolate the difference the I/O speed
makes, we increased the capacity of all the links to limit the effect of
congestion. As shown in Fig. 4.15, as the I/O limit increases from 10 Gb/s
Figure 4.15: Changes in TTE of Rocketfuel topology with I/O limit
to 100 Gb/s, the TTE also increases. However, a low link capacity greatly
diminishes the improvement gained from hardware with a higher I/O limit.
4.3.4 Routing Protocol Effect in CDN
As mentioned above, our model tries to maximize the TTE and therefore
optimizes the routing of data. In practice, however, routing is not optimal.
As shown in Fig. 4.16, enforcing shortest-path routing for CDN in the
Rocketfuel network reduces the TTE by more than ten months. NDN does not have
this problem, since its strategy layer can employ an optimal routing
algorithm.
Figure 4.16: Changes in TTE of Rocketfuel topology with routing algorithm
4.3.5 Heterogeneous Caching
Heterogeneous caching uses caches that each have a different amount of
storage, in contrast to homogeneous caching, where all the caches have the
same amount of storage. Homogeneous caching may be cheaper, since cache
hardware comes in pre-configured packages and customized hardware costs more.
Therefore, service providers must do a cost-benefit analysis before adopting a
heterogeneous caching system.

Fig. 4.17 shows the effect of heterogeneous caching on the TTE when NDN is
used. With heterogeneous caching, the model assigns each cache a different
storage capacity while satisfying the total storage budget constraint. It is
expected that the symmetry in the tree and DGM topologies would imply
Figure 4.17: Heterogeneous vs homogeneous caching storage in NDN
little benefit from heterogeneity. However, there is some difference in the
Rocketfuel topology, which is less symmetric than the other topologies we
modeled: with heterogeneous caching, the TTE in the Rocketfuel network
increases by at most three months.
4.4 Summary
Service providers are under a lot of pressure due to the daily increase of
Over-The-Top content. In this chapter, we presented a cache placement and
content routing method for service providers to delay the congestion of their
network, considering their limited budget. We modeled both ICN and CDN and
aimed to maximize the time-to-exhaustion of the network. Our results show that
even a limited deployment of ICN improves the time-to-exhaustion of the
network and lowers the number of links with high utilization.
Chapter 5
Conclusion
The Internet is evolving fast, both in architecture and in usage. Numerous
devices connect to the Internet every day, and more and more content is
constantly created. The current end-to-end communication model of TCP/IP was
not designed for these new use cases. However, networking paradigms such as
Information-Centric Networking aim to tackle these problems: they move towards
a point-to-point communication model, decouple data names from their
locations, and turn router buffers into caches for content storage. In this
chapter, we first review our contributions in this work and then propose some
ideas that could extend our contributions.
5.1 Contributions
In this work, we designed a content-based publish/subscribe system using the
ICN paradigm as the data dissemination layer in the CVST platform. We also
showed the benefits of using ICN for content delivery in service providers.
5.1.1 Data Dissemination in CVST
The CVST platform collects a rich set of data from many transportation data sources. These sources include traffic sensors, road cameras, road incident and closure reports, Twitter traffic reports, public transit data (bus location information and bike station data), border delay times, and last but not least the loop detector data.
We presented a content-based publish/subscribe system for CVST that employs the ICN paradigm. In a content-based publish/subscribe system, a subscriber can define a query in addition to the topic of interest and receive, in real time, the contents that match that query. We presented the architecture for a distributed broker that connects publishers and subscribers, registers the schemas for the publishers, and saves the queries submitted by the subscribers. These tasks are exposed as a set of APIs to publishers and subscribers. The broker uses a set of scalable micro-services and supports ICN-based and IP-based protocols to communicate with publishers and subscribers.
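The content-based model above can be made concrete with a minimal sketch: a subscription is a topic plus a query predicate that the broker evaluates against every publication. The broker class, field names, and predicate form here are hypothetical simplifications for illustration, not the actual API of our micro-services:

```python
class Broker:
    """Minimal content-based publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self.subscriptions = []  # list of (topic, predicate, callback)

    def subscribe(self, topic, predicate, callback):
        self.subscriptions.append((topic, predicate, callback))

    def publish(self, topic, content):
        # Deliver only to subscribers whose topic matches AND whose
        # query predicate holds for this particular publication.
        for sub_topic, predicate, callback in self.subscriptions:
            if sub_topic == topic and predicate(content):
                callback(content)

broker = Broker()
received = []
# Subscribe to traffic-sensor data, but only to congested readings.
broker.subscribe("traffic", lambda c: c["speed_kmh"] < 30, received.append)

broker.publish("traffic", {"sensor": "dvp-401", "speed_kmh": 22})
broker.publish("traffic", {"sensor": "gardiner-express", "speed_kmh": 85})
print(received)  # only the congested reading is delivered
```

This is the essential difference from topic-based systems: the second publication matches the topic but is filtered out by the query.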
The publisher-broker and subscriber-broker communication layer over ICN
provides a platform to build an efficient, robust, scalable, and secure data dis-
semination layer. We presented the detailed design of the data dissemination
layer and its advantages. The platform has been used to publish live video
feed from drone flights, as well as many other data types. Our demonstration
shows the feasibility of Vision as a Service in an application platform.
5.1.2 Time to Exhaustion
We proposed an in-network caching strategy for service providers to increase the time-to-exhaustion of their networks. We suggested that service providers use Information-Centric Networking for caching and content delivery. Even a limited deployment of ICN provides a substantial increase in the time-to-exhaustion of the network and lowers the number of links with high utilization. We studied different parameters that affect the performance of content delivery, such as I/O limits, routing algorithms, and heterogeneous versus homogeneous caching. We also validated our model by simulation.
5.2 Future Work
In this section, we review possible extensions of our work. They fall into two categories: extensions of the data dissemination layer for CVST, and extensions of content delivery in service providers using the ICN paradigm.
We demonstrated that the data dissemination layer uses the schema of a data type for publications and subscriptions. More data sources can easily be added to the layer: since the system understands the data based on its schema, adding a new data type only requires creating and registering its schema in the system. Access control, security, and privacy have native support in Named-Data Networking. Our publish/subscribe system can easily be extended to use these features for verification, encryption, and authorization. The broker can act as the central authority to control, issue, and validate the signing and encryption keys.
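The claim that adding a data type only requires registering its schema can be sketched as follows. The registry class and its flat field-name-to-type schema are illustrative stand-ins; the actual system uses full Avro-style schemas:

```python
class SchemaRegistry:
    """Toy schema registry: a publication is accepted only if it matches
    the registered schema for its data type (illustrative only)."""

    def __init__(self):
        self.schemas = {}

    def register(self, data_type, fields):
        self.schemas[data_type] = fields   # field name -> expected type

    def validate(self, data_type, record):
        fields = self.schemas.get(data_type)
        if fields is None:
            return False                   # unknown data type
        return (set(record) == set(fields) and
                all(isinstance(record[f], t) for f, t in fields.items()))

registry = SchemaRegistry()
# Adding a new source needs no code changes -- just a schema registration.
registry.register("bike_station", {"id": str, "lat": float,
                                   "lon": float, "free_docks": int})

print(registry.validate("bike_station",
                        {"id": "st-7", "lat": 43.66, "lon": -79.38,
                         "free_docks": 4}))              # True
print(registry.validate("bike_station", {"id": "st-7"}))  # False: missing fields
```

Because validation is driven entirely by the registered schema, every downstream component (matching, storage, notification) works unchanged for a new data type.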
We also expect that Vision as a Service (VaaS) will be available region-
wide by placing a network of drones throughout a region. Drones will be
dispatched on demand directly from base locations or transported by vehicle
to appropriate launching locations to investigate network anomalies.
The broker can also be extended to support aggregation queries. In the current design, an application can subscribe to the raw data by filtering based on some conditions; however, data aggregation is a common feature in IoT systems. The Matching Engine micro-service is a good candidate to implement the aggregation: its workers have direct access to the data and the queries, and they can use the Query servers as a temporary buffer for both spatial and temporal aggregation. Additional micro-services may be added to check the aggregation results and notify the subscribers. The interface for creating aggregation queries can be implemented by extending the Subscription Portal.
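As a sketch of the temporal side of such aggregation, the following groups buffered raw readings into fixed time windows and emits one averaged record per sensor per window. The window size, record format, and field names are illustrative assumptions, not the Matching Engine's actual data model:

```python
from collections import defaultdict

def aggregate_by_window(readings, window_s=300):
    """Average `value` per (sensor, time window). Each reading is a dict
    with `sensor`, `ts` (seconds), and `value` keys (hypothetical format)."""
    buckets = defaultdict(list)
    for r in readings:
        key = (r["sensor"], r["ts"] // window_s)   # 5-minute temporal bucket
        buckets[key].append(r["value"])
    return {key: sum(vs) / len(vs) for key, vs in buckets.items()}

readings = [
    {"sensor": "loop-12", "ts": 10,  "value": 40.0},
    {"sensor": "loop-12", "ts": 200, "value": 60.0},   # same 5-minute window
    {"sensor": "loop-12", "ts": 400, "value": 30.0},   # next window
]
print(aggregate_by_window(readings))
# {('loop-12', 0): 50.0, ('loop-12', 1): 30.0}
```

Spatial aggregation follows the same pattern with a region identifier in the bucket key instead of (or alongside) the sensor identifier.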
The Subscription Portal may also be extended to register remote subscribers. Currently, the portal acts as a subscriber and receives notifications for all of its registered subscriptions. However, the portal could register queries on behalf of remote subscribers given their callback paths: the user would enter the callback path on the query registration page of the portal, and the portal would pass that information to the XSUB API.
Our work on controlling time-to-exhaustion in service providers may be extended by integrating the cache placement and content routing logic into a central controller, such as the SDI controller that SAVI provides. This controller may use the solution of the optimization problem to set the routing tables in the network routers so as to maximize the time-to-exhaustion. The solution also provides a cache placement recommendation that may be combined with the dynamic resource allocation that an infrastructure such as SAVI provides to instantiate new cache instances and route traffic towards those instances.
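The controller integration could look like the following sketch, which takes the optimization output (a per-router, per-content-prefix next-hop assignment, in a hypothetical format) and groups it into per-router forwarding tables that an SDI controller would push to the routers:

```python
def build_forwarding_tables(solution):
    """Convert an optimization solution into per-router forwarding tables.
    `solution` maps (router, content_prefix) -> next_hop; this format is a
    hypothetical stand-in for the solver's actual output."""
    tables = {}
    for (router, prefix), next_hop in solution.items():
        tables.setdefault(router, {})[prefix] = next_hop
    return tables

# Example: the solver chose r2 as a cache location for /videos, so r1
# forwards /videos interests towards r2, which serves them locally.
solution = {
    ("r1", "/videos"): "r2",
    ("r1", "/sensors"): "r3",
    ("r2", "/videos"): "cache-r2",
}
tables = build_forwarding_tables(solution)
print(tables["r1"])  # {'/videos': 'r2', '/sensors': 'r3'}
```

Re-running the solver as demands shift and pushing only the changed entries would let the controller track the optimum without a full reconfiguration.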
Also, different caching hardware may be used in various parts of the network. Netflix Open Connect [50] provides caching servers with different capabilities, and service providers must place this hardware in the best locations in the network. Our work can be extended to take these hardware differences into account.
Bibliography
[1] D. Perino and M. Varvello, “A Reality Check for Content Centric Networking,” in Proceedings of the ACM SIGCOMM Workshop on Information-centric Networking, ser. ICN ’11. New York, NY, USA: ACM, 2011, pp. 44–49.
[2] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, “Networking Named Content,” in Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’09. New York, NY, USA: ACM, 2009, pp. 1–12.
[3] D. Raychaudhuri, K. Nagaraja, and A. Venkataramani, “MobilityFirst: A Robust and Trustworthy Mobility-Centric Architecture for the Future Internet,” SIGMOBILE Mob. Comput. Commun. Rev., vol. 16, no. 3, pp. 2–13, Dec. 2012.
[4] A. Leon-Garcia, H. Bannazadeh, and A. Tizghadam, “Smart city platforms on multitier Software-Defined infrastructure cloud computing,” in 2016 IEEE International Smart Cities Conference (ISC2 2016), Trento, Italy, Sep. 2016.
[5] J.-M. K.-M. Kang, H. Bannazadeh, and A. Leon-Garcia, “SAVI testbed: Control and management of converged virtual ICT resources,” in Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on, May 2013, pp. 664–667.
[6] “Global Internet Phenomena,” Sandvine, Tech. Rep., 2013. [Online]. Available: https://www.sandvine.com/trends/global-internet-phenomena/
[7] C. Labovitz, “Massive Ongoing Changes in Content Distribution,” http://blog.streamingmedia.com/wp-content/uploads/2013/07/2013CDNSummit-B102A.pdf, Tech. Rep., 2013.
[8] C. Labovitz, S. Iekel-Johnson, D. McPherson, J. Oberheide, and F. Jahanian, “Internet inter-domain traffic,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 75–86, 2011.
[9] World Urbanization Prospects: The 2014 Revision, Highlights (ST/ESA/SER.A/352), United Nations, Department of Economic and Social Affairs, Population Division, 2014.
[10] J. M. Hernandez-Munoz, J. B. Vercher, L. Munoz, J. A. Galache, M. Presser, L. A. H. Gomez, and J. Pettersson, “Smart cities at the forefront of the future internet,” in The Future Internet Assembly. Springer, 2011, pp. 447–462.
[11] A. Shariat, A. Tizghadam, and A. Leon-Garcia, “An ICN-Based Publish-Subscribe platform to deliver UAV service in smart cities,” in 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS): SmartCity16: The 2nd IEEE INFOCOM Workshop on Smart Cities and Urban Computing (SmartCity’16), San Francisco, USA, Apr. 2016.
[12] ——, “Optimizing time to exhaustion in service providers using Information-Centric networking,” in 28th International Teletraffic Congress (ITC 28), Wurzburg, Germany, Sep. 2016.
[13] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica, “A Data-Oriented (and beyond) Network Architecture,” SIGCOMM Comput. Commun. Rev., vol. 37, no. 4, pp. 181–192, Aug. 2007.
[14] D. Smetters and V. Jacobson, “Securing Network Content,” Tech. Rep., 2009.
[15] M. Gritter and D. R. Cheriton, “An Architecture for Content Routing Support in the Internet,” in Proceedings of the 3rd Conference on USENIX Symposium on Internet Technologies and Systems - Volume 3, ser. USITS’01. Berkeley, CA, USA: USENIX Association, 2001, pp. 4–4.
[16] PURSUIT. [Online]. Available: http://www.fp7-pursuit.eu/PursuitWeb/
[17] N. Fotiou, P. Nikander, D. Trossen, and G. C. Polyzos, “Developing Information Networking Further: From PSIRP to PURSUIT,” Broadband Communications, Networks, and Systems, pp. 1–13, 2012.
[18] D. Lagutin, K. Visala, and S. Tarkoma, “Publish/Subscribe for Internet: PSIRP Perspective.” Future Internet Assembly, vol. 84, 2010.
[19] SAIL Project. [Online]. Available: http://www.sail-project.eu
[20] COMET Project. [Online]. Available: http://www.cometproject.eu/
[21] CONVERGENCE. [Online]. Available: http://www.ict-convergence.eu/
[22] V. Jacobson. (2006, August) A New Way to Look at Networking. [Online]. Available: https://www.youtube.com/watch?v=oCZMoY3q2uM
[23] Content Centric Networking. [Online]. Available: http://www.ccnx.org
[24] Named Data Networking. [Online]. Available: http://named-data.net
[25] G. Carofiglio, G. Morabito, L. Muscariello, I. Solis, and M. Varvello, “From Content Delivery Today to Information Centric Networking,” Comput. Netw., vol. 57, no. 16, pp. 3116–3127, Nov. 2013.
[26] D. Rossi and G. Rossini, “On Sizing CCN Content Stores by Exploiting Topological Information,” in Computer Communications Workshops (INFOCOM WKSHPS), 2012 IEEE Conference on, March 2012, pp. 280–285.
[27] H. Yuan and P. Crowley, “Experimental Evaluation of Content Distribution with NDN and HTTP,” in INFOCOM, 2013 Proceedings IEEE, April 2013, pp. 240–244.
[28] M. Varvello, D. Perino, and L. Linguaglossa, “On the Design and Implementation of a Wire-Speed Pending Interest Table,” in Proceedings of the 2nd IEEE International Workshop on Emerging Design Choices in Name-Oriented Networking, NOMEN, vol. 13, 2013.
[29] C. Dannewitz, M. D’Ambrosio, and V. Vercellone, “Hierarchical DHT-Based Name Resolution for Information-Centric Networks,” Comput. Commun., vol. 36, no. 7, pp. 736–749, Apr. 2013.
[30] K. V. Katsaros, N. Fotiou, X. Vasilakos, C. N. Ververidis, C. Tsilopoulos, G. Xylomenos, and G. C. Polyzos, “On Inter-Domain Name Resolution for Information-Centric Networks,” in Proceedings of the 11th International IFIP TC 6 Conference on Networking - Volume Part I, ser. IFIP’12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 13–26.
[31] A. Badam, K. Park, V. S. Pai, and L. L. Peterson, “HashCache: Cache Storage for the Next Billion,” in Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, ser. NSDI’09. Berkeley, CA, USA: USENIX Association, 2009, pp. 123–136.
[32] S. C. Nelson, G. Bhanage, and D. Raychaudhuri, “GSTAR: Generalized storage-aware routing for MobilityFirst in the future mobile internet,” in Proceedings of the sixth international workshop on MobiArch. ACM, 2011, pp. 19–24.
[33] A. Tizghadam and A. Leon-Garcia, “Application platform for smart transportation,” in Future Access Enablers for Ubiquitous and Intelligent Infrastructures, ser. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, V. Atanasovski and A. Leon-Garcia, Eds. Springer International Publishing, 2015, vol. 159, pp. 26–32.
[34] CVST Portal. [Online]. Available: http://portal.cvst.ca
[35] Smart Application on Virtual Infrastructure. [Online]. Available: http://www.savinetwork.ca/
[36] A. Carzaniga, M. Papalini, and A. L. Wolf, “Content-Based Publish/Subscribe Networking and Information-Centric Networking,” in Proceedings of the ACM SIGCOMM Workshop on Information-centric Networking, ser. ICN ’11. New York, NY, USA: ACM, 2011, pp. 56–61.
[37] J. Chen, M. Arumaithurai, L. Jiao, X. Fu, and K. Ramakrishnan, “COPSS: An Efficient Content Oriented Publish/Subscribe System,” in Architectures for Networking and Communications Systems (ANCS), 2011 Seventh ACM/IEEE Symposium on, Oct 2011, pp. 99–110.
[38] H.-A. Jacobsen, “Publish/Subscribe,” in Encyclopedia of Database Systems. Springer, 2009, pp. 2208–2211.
[39] ——, “Content-based Publish/Subscribe,” in Encyclopedia of Database Systems. Springer, 2009, pp. 464–466.
[40] R. Baldoni, M. Contenti, and A. Virgillito, “The evolution of publish/subscribe communication systems,” in Future directions in distributed computing. Springer, 2003, pp. 137–141.
[41] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg, “Constructing scalable overlays for pub-sub with many topics,” in Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing. ACM, 2007, pp. 109–118.
[42] E. Fidler, H.-A. Jacobsen, G. Li, and S. Mankovski, “The PADRES Distributed Publish/Subscribe System,” in FIW, 2005, pp. 12–30.
[43] Y. Zhang, A. Afanasyev, J. Burke, and L. Zhang, “A Survey of Mobility Support in Named Data Networking,” in Proceedings of the third Workshop on Name-Oriented Mobility: Architecture, Algorithms and Applications (NOM’2016).
[44] “Cisco Visual Networking Index: The Zettabyte Era-Trends and Analysis,” Cisco, Tech. Rep., 2013.
[45] “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2013-2018,” Cisco, Tech. Rep., 2013.
[46] M. Rabinovich and O. Spatscheck, Web caching and replication. Addison-Wesley Reading, 2002.
[47] A.-M. K. Pathan, “Utility-oriented internetworking of content delivery networks,” Ph.D. dissertation, The University of Melbourne, 2009.
[48] B. Cain, A. Barbir, R. Nair, and O. Spatscheck, “Known Content Network (CN) Request-Routing Mechanisms,” 2003.
[49] J. Pang, A. Akella, A. Shaikh, B. Krishnamurthy, and S. Seshan, “On the responsiveness of DNS-based network control,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM, 2004, pp. 21–26.
[50] Netflix Open Connect Content Delivery Network. [Online]. Available: https://openconnect.itp.netflix.com/
[51] D. Rayburn. (2010) An Overview of Transparent Caching and Its Role in the CDN Market. [Online]. Available: http://blog.streamingmedia.com/2010/10/an-overview-of-transparent-caching.html
[52] G. Tyson, S. Kaune, S. Miles, Y. El-khatib, A. Mauthe, and A. Taweel, “A trace-driven analysis of caching in content-centric networks,” in Computer Communications and Networks (ICCCN), 2012 21st International Conference on, July 2012, pp. 1–7.
[53] P. Agyapong and M. Sirbu, “Economic incentives in information-centric networking: implications for protocol design and public policy,” Communications Magazine, IEEE, vol. 50, no. 12, pp. 18–26, December 2012.
[54] G. Carofiglio, M. Gallo, L. Muscariello, and D. Perino, “Modeling data transfer in content-centric networking,” in Teletraffic Congress (ITC), 2011 23rd International, Sept 2011, pp. 111–118.
[55] Y. Wang, Z. Li, G. Tyson, S. Uhlig, and G. Xie, “Optimal cache allocation for content-centric networking,” in Network Protocols (ICNP), 2013 21st IEEE International Conference on, Oct 2013, pp. 1–10.
[56] RabbitMQ. [Online]. Available: https://www.rabbitmq.com/
[57] Elasticsearch. [Online]. Available: https://www.elastic.co/products/elasticsearch
[58] Percolator. [Online]. Available: https://www.elastic.co/blog/percolator
[59] Apache Avro. [Online]. Available: http://avro.apache.org/
[60] Apache Hadoop. [Online]. Available: http://hadoop.apache.org/
[61] GeoJSON. [Online]. Available: http://geojson.org/
[62] A. Tizghadam and A. Leon-Garcia, “Betweenness centrality and resistance distance in communication networks,” Network, IEEE, vol. 24, no. 6, pp. 10–16, November 2010.
[63] E. M. Yeh, T. Ho, M. Burd, Y. Cui, and D. Leong, “VIP: A framework for joint dynamic forwarding and caching in named data networks,” CoRR, vol. abs/1310.5569, 2013.
[64] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson, “Inferring link weights using end-to-end measurements,” in Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, ser. IMW ’02. New York, NY, USA: ACM, 2002, pp. 231–236.
[65] D. Applegate and E. Cohen, “Making intra-domain routing robust to changing and uncertain traffic demands: Understanding fundamental tradeoffs,” in Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’03. New York, NY, USA: ACM, 2003, pp. 313–324.
[66] A. Tizghadam and A. Leon-Garcia, “Robust network planning in nonuniform traffic scenarios,” Computer Communications, vol. 34, no. 12, pp. 1436–1449, 2011.