Data Dissemination using Information-Centric Networking
by
Ali Shariatmadari
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2016 by Ali Shariatmadari
Abstract
Data Dissemination using Information-Centric Networking
Ali Shariatmadari
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2016
Information-Centric Networking (ICN) is a promising paradigm for answering
the challenges the current Internet is facing. It is a paradigm that puts content
first, and inherently enables content mobility and content security. In this
work, we use ICN in real-world applications. We present an ICN-based data-
dissemination layer for Smart City platforms. We also present a content-based
publish/subscribe overlay system based on that data-dissemination layer. We
are using the system to collect and publish data from various sources, including demos with Unmanned Autonomous Vehicles (UAVs) providing live
transportation video.
Furthermore, by promoting in-network caching, ICN is a promising paradigm
to answer current challenges in the service provider's domain. This work reports on a cache placement and content routing strategy for service providers
to delay the onset of congestion (time-to-exhaustion) to the extent possible in
order to optimize their capital expenditure for their limited capacity planning
budget. We show that even a limited deployment of ICN provides a substantial
increase in the time-to-exhaustion of the network and a decrease in the number of links with high utilization. We also study the effects of homogeneous
and heterogeneous caching mechanisms on the performance of an ICN based
content-delivery system.
to my wife, my mother, and my father
Acknowledgements
This work would not have been possible without the help and support of
many. First and foremost, I wish to offer my sincerest gratitude to my supervisor, Professor Alberto Leon-Garcia, who has supported me with generous and continuous advice and guidance throughout my study. His insightful suggestions and ideas have been precious for the development of this
thesis. It has been an honor and privilege for me to work with him, and for
that, I am grateful.
Besides my advisor, I would like to thank the respectable members of my
examination committee, Prof. Roch Glitho, Prof. Baochun Li, Prof. Ben Liang,
and Prof. Shahrokh Valaee, for their constructive comments, feedback, and
questions.
My sincere thanks also go to Dr. Ali Tizghadam for all the stimulating
discussions, suggestions, and ideas. Also, I would like to thank all the members
of the Network Architecture Lab.
I wish to give my special gratitude to my wife, Maryam, whose love and
support made my journey possible. Finally, I thank my parents for their
love and encouragement, without whom I would never have enjoyed so many
opportunities.
Contents
1 Motivations 1
1.1 Challenges of Current Internet . . . . . . . . . . . . . . . . . . 2
1.2 Possible Solution: Information-Centric Networking . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Data Dissemination using ICN in Smart City Platforms 4
1.3.2 Content Delivery in Service Providers . . . . . . . . . . 6
1.4 Thesis organization . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Background and Related Works 8
2.1 Information-Centric Networking . . . . . . . . . . . . . . . . . 8
2.1.1 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 A Brief History of ICN . . . . . . . . . . . . . . . . . . 12
2.1.3 Named-Data Networking . . . . . . . . . . . . . . . . . 13
2.1.4 MobilityFirst . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.5 ICN Design Selection . . . . . . . . . . . . . . . . . . . 22
2.2 CVST Platform . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.1 Smart Application on Virtual Infrastructure . . . . . . 29
2.2.2 Publish/Subscribe Systems . . . . . . . . . . . . . . . . 31
2.3 Content Delivery over Internet . . . . . . . . . . . . . . . . . . 34
2.3.1 Content Delivery Networks . . . . . . . . . . . . . . . . 37
2.3.2 Content Provider’s Cache . . . . . . . . . . . . . . . . 39
2.3.3 Transparent Caching . . . . . . . . . . . . . . . . . . . 41
2.3.4 Cache Placement in ICN . . . . . . . . . . . . . . . . . 42
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Data Dissemination in CVST 44
3.1 ICN-Based Data Dissemination Layer . . . . . . . . . . . . . . 44
3.1.1 Publisher-Broker Exchange . . . . . . . . . . . . . . . . 46
3.1.2 Subscriber-Broker Exchange . . . . . . . . . . . . . . . 47
3.1.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Broker Architecture . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3.1 Broker Implementation . . . . . . . . . . . . . . . . . . 61
3.3.2 Communication Layer . . . . . . . . . . . . . . . . . . 62
3.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 Traffic Flow Sensors . . . . . . . . . . . . . . . . . . . 66
3.4.2 Public Transportation . . . . . . . . . . . . . . . . . . 69
3.4.3 Drone Vision as a Service . . . . . . . . . . . . . . . . 71
3.4.4 Subscription Portal . . . . . . . . . . . . . . . . . . . . 76
3.5 Evaluation and Performance Tests . . . . . . . . . . . . . . . . 78
3.5.1 IDD Publication Test . . . . . . . . . . . . . . . . . . . 79
3.5.2 Scalability of the Matching Engine . . . . . . . . . . . 80
3.5.3 IDD and IP Performance Comparison . . . . . . . . . . 82
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4 Content Delivery in Service Providers 86
4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 Content Distribution in Service Providers . . . . . . . . 87
4.1.2 Time-to-exhaustion . . . . . . . . . . . . . . . . . . . . 89
4.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 92
4.2.1 Demands and Storage Budget . . . . . . . . . . . . . . 94
4.2.2 Content Delivery Networks . . . . . . . . . . . . . . . . 96
4.2.3 Named-Data Networking . . . . . . . . . . . . . . . . . 100
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.3.1 Time-to-Exhaustion of different topologies . . . . . . . 103
4.3.2 Limited NDN Deployment . . . . . . . . . . . . . . . . 108
4.3.3 I/O Speed Effect . . . . . . . . . . . . . . . . . . . . . 110
4.3.4 Routing Protocol Effect in CDN . . . . . . . . . . . . . 111
4.3.5 Heterogeneous Caching . . . . . . . . . . . . . . . . . . 112
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5 Conclusion 115
5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.1.1 Data Dissemination in CVST . . . . . . . . . . . . . . 116
5.1.2 Time to Exhaustion . . . . . . . . . . . . . . . . . . . . 117
5.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Bibliography 119
List of Tables
2.1 Summary of memory technologies [1] . . . . . . . . . . . . . . 19
2.2 NDN and MobilityFirst Comparison . . . . . . . . . . . . . . . 24
3.1 The APIs exposed by XPUB and XSUB services . . . . . . . . 55
4.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
List of Figures
2.1 NDN Protocol Stack [2] . . . . . . . . . . . . . . . . . . . . . 14
2.2 Structure of NDN Packets . . . . . . . . . . . . . . . . . . . . 15
2.3 NDN Forwarding Process . . . . . . . . . . . . . . . . . . . . . 16
2.4 The MobilityFirst architecture [3] . . . . . . . . . . . . . . . . 20
2.5 Mobile Delivery in MobilityFirst [3] . . . . . . . . . . . . . . . 22
2.6 Layered Architecture of CVST Platform [4] . . . . . . . . . . . 27
2.7 Multi-tier Cloud for End-to-End Application Platform . . . . 30
2.8 SAVI test-bed main components [5] . . . . . . . . . . . . . . . 31
2.9 Peak Period Traffic Composition — North America [6] . . . . 35
2.10 Traffic estimation of different types for global and mobile networks 36
2.11 Internet traffic source distribution in 2013 [7] . . . . . . . . . . 40
2.12 Internet’s architecture is changing [8] . . . . . . . . . . . . . . 41
3.1 Application Platform for Smart Transportation . . . . . . . . 45
3.2 Publisher-Broker Communication . . . . . . . . . . . . . . . . 47
3.3 Subscriber-Broker Communication . . . . . . . . . . . . . . . . 48
3.4 High-level architecture of content-based publish/subscribe over
IDD in CVST . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5 Design of the Broker: Abstraction of the complexity of different
system components . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6 Sequence Diagram of the Content-Based Publish/Subscribe Sys-
tem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.7 Scalability of the Broker with Micro-service design . . . . . . . 59
3.8 Apache Avro schema used in XPUB-Matcher communication . 63
3.9 Avro schema used in Matcher-XSUB communication . . . . . . 63
3.10 Sample data gathered from traffic sensors . . . . . . . . . . . . 65
3.11 Schema of the traffic sensor data . . . . . . . . . . . . . . . . 65
3.12 Sample subscription for traffic sensor data . . . . . . . . . . . 66
3.13 A match all query . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.14 Data of traffic sensors on the CVST portal . . . . . . . . . . . 67
3.15 Sample data gathered from public transit vehicles . . . . . . . 68
3.16 Schema of for Toronto Public Transit Data . . . . . . . . . . . 69
3.17 A sample geo distance query for public transportation data . . 70
3.18 Publishing Drone Data . . . . . . . . . . . . . . . . . . . . . . 71
3.19 Sample Drone Data . . . . . . . . . . . . . . . . . . . . . . . . 72
3.20 Video playback of a drone flight on CVST portal . . . . . . . 73
3.21 Subscription Portal: Public Transportation Query . . . . . . . 74
3.22 Subscription Portal: Public Transportation Data . . . . . . . . 75
3.23 Subscription Portal: Traffic Sensor Query . . . . . . . . . . . . 76
3.24 Subscription Portal: Traffic Sensor Data . . . . . . . . . . . . 77
3.25 FIB table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.26 Interests and Data packets log during XPUB and publisher com-
munication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.27 Scalability of the Matching Engine - Experiment Setup . . . . 79
3.28 Scalability of the Matching Engine, one minute rolling average 80
3.29 Scalability of the Matching Engine, five minutes rolling average 81
3.30 Data usage: IDD vs IP — Experiment Setup . . . . . . . . . . 82
3.31 Data usage: IDD vs IP — Results . . . . . . . . . . . . . . . . 83
4.1 Network of a Service Provider . . . . . . . . . . . . . . . . . . 87
4.2 Content distribution in Service Providers . . . . . . . . . . . . 88
4.3 Flows between sources and destinations pass through multiple
links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4 Time-to-exhaustion. Traffic is increasing monthly until network
is congested. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Feasibility model for CDN . . . . . . . . . . . . . . . . . . . . 99
4.6 Feasibility model for NDN . . . . . . . . . . . . . . . . . . . . 102
4.7 Rocketfuel network . . . . . . . . . . . . . . . . . . . . . . . . 103
4.8 DGM network . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.9 Tree network . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.10 Time-to-exhaustion in Rocketfuel network . . . . . . . . . . . 106
4.11 Time-to-exhaustion in DGM network . . . . . . . . . . . . . . 107
4.12 Time-to-exhaustion in Tree network . . . . . . . . . . . . . . . 108
4.13 Changes in TTE of Rocketfuel topology with number of caches 109
4.14 Link utilization of NDN vs CDN . . . . . . . . . . . . . . . . . 110
4.15 Changes in TTE of Rocketfuel topology with I/O limit . . . . 111
4.16 Changes in TTE of Rocketfuel topology with Routing algorithm 112
4.17 Heterogeneous vs Homogeneous caching storage in NDN . . . 113
Chapter 1
Motivations
The current Internet is a product of four decades of evolution. Today, the rapid growth of content and of the number of connected devices is changing the architecture of the Internet. The Internet was designed for different circumstances, at a time when the primary concern was sharing resources. Computers and their accessories were few and expensive, with few connections between them. Therefore, the host-to-host communication model became the central principle of the design of the Internet. In this design, each machine must have an IP address and follow the TCP/IP protocol to be able to communicate with other machines in the network. Although TCP/IP has been doing the job well, today's network is not all about end-to-end communication between two hosts. Let us go over some challenges that the Internet is facing.
1.1 Challenges of Current Internet
A variety of things are expected to get connected to the Internet, billions of them. These things operate over multiple domains such as transportation, energy, weather, construction, health, agriculture, etc. This phenomenon, known as the Internet of Things (IoT), is changing the architecture of the Internet. These devices are highly heterogeneous and have hardware constraints: their power consumption, CPU capacity, and memory are orders of magnitude lower than those of conventional hosts. They usually have multiple interfaces over different communication protocols and offer few or no configuration options. TCP/IP is an end-to-end communication protocol and leaves higher-level services to the application layer. Therefore, in the constrained environment of IoT devices, using TCP/IP as a communication layer will be very challenging.
Internet traffic is also growing rapidly due to Over-The-Top (OTT) and Video-on-Demand (VoD) services such as Netflix and YouTube. Video traffic now consumes most of the bandwidth on the Internet. A more detailed analysis shows that Netflix (31.6%) and YouTube (18.7%) combined account for over 50% of downstream traffic in fixed access [6]. This growth is another force that is changing the architecture of the Internet. The content providers are exploiting economies of scale and using Content Delivery Networks (CDN) to transfer this ever-increasing traffic, which exacerbates the change. CDNs were introduced to overcome the limitations of traditional Web caching systems by deploying several caches throughout the globe and populating these caches with popular content during the off-peak traffic hours. Some content providers are very keen to work with service providers (SP) to provide these caches. For example, Netflix's Open Connect program is rapidly expanding its coverage by offering to install and maintain caches in the SP's network. But using TCP/IP for content delivery is quite inefficient. A gigabyte of content, like a TV show, can generate a petabyte¹ of transient data. Contents, such as live video streams, are transferred over the Internet multiple times, which puts enormous pressure on the infrastructure.
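A rough back-of-envelope calculation illustrates how one gigabyte of content can turn into a petabyte of transient traffic when every viewer pulls an independent copy over TCP/IP; the one-million-viewer figure below is an illustrative assumption, not a measurement from this thesis.

```python
# Transient traffic when N independent viewers each fetch their own copy
# of the same content (no shared caching). Viewer count is an assumption.
content_size_gb = 1                  # one TV-show episode, roughly 1 GB
viewers = 1_000_000                  # assumed independent viewers

total_bytes = content_size_gb * 10**9 * viewers
total_pb = total_bytes / 10**15      # 1 PB = 10^15 bytes

print(f"{total_pb:.0f} PB of transient data")  # -> 1 PB of transient data
```

In-network caching, as promoted by ICN, aims to collapse this redundancy by serving repeated requests from caches instead of end-to-end flows.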
1.2 Possible Solution: Information-Centric Networking
New networking paradigms such as Information-Centric Networking (ICN) provide solutions to these problems. ICN is a clean-slate network architecture for the future Internet. It places named data at the core of the network, and names are decoupled from content location, applications, storage, or media of transport. Decoupling a data name from its location gives ICN native support for mobility, since users only need to know the name of the content and not where the content is located. It also supports data security and privacy requirements by enabling digital signatures and encryption. This solution is not only agnostic about the source of the content but also gives us the capability of in-network caching for all contents. In-network caching helps place popular content near the consumer to lower latency, results in better utilization of the infrastructure, and increases throughput.
¹ 1 PB = 1000^5 bytes = 10^15 bytes = 1000 terabytes

An IoT platform is a substrate that offers data collection from a diverse set of sensors operating in different domains. The substrate should be able to
transfer various types of data generated by these sources and decouple data
collection and delivery. The substrate must not only provide data validation and integrity but also guarantee secure communication. Data sources must be able to respond to pull-based and push-based data requests. At the same time, the platform should provide support for middleware and value-added services such as data processing and aggregation. Such requirements
make ICN a potential alternative networking solution for an IoT platform.
Also, built-in support for in-network caching and multicasting in ICN improves the utilization of the underlying infrastructure by removing redundant flows of the same content, and helps providers control the extensive cost of last-mile technologies. Furthermore, detecting popular contents and storing them
in caches near the edge of the network will decrease the latency, and moving
away from host-to-host communication model and employing a strategy layer
will improve content delivery in the mobile environment.
1.3 Contributions
This thesis makes the following contributions.
1.3.1 Data Dissemination using ICN in Smart City Platforms
The urban population of the world is growing. By 2050, 2.5 billion people will be added to the world's urban population [9]. This growth poses major difficulties for cities in meeting objectives such as the quality of life and the socio-economic development of their citizens. The vision of a Smart City is a response to these challenges. One of the major obstacles on the path to Smart Cities is the heterogeneous technologies currently used in cities and their lack of interoperability. Therefore, a unified platform for the Internet of Things can become the building blocks of the Smart City concept, both at the infrastructure and service level [10].
A Smart City platform requires collecting data from a heterogeneous set of data sources in various domains, mobile and fixed. Also, the platform shall anonymize, cleanse, and check the integrity of the collected data. It shall send the received data, in various formats, to interested parties and shall guarantee a secure data transfer. The platform shall provide different methods for accessing the data streams, which include content as well as event notifications. For example, one customer shall be able to pull the data, and another may register to receive notifications from the system upon the availability of the data. The streams have diverse requirements for provenance, privacy and security. And last but not least, the platform shall be scalable to cope with the daily increase in the number of data sources and data sinks.
We present a platform to gather data streams from a wide range of data
sources including road cameras, loop detectors, planned and emergency road
closures, fixed and mobile traffic sensors, drones, social media networks, public
transit vehicles, etc. This platform makes the data available to a broad range
of customers using a novel data dissemination layer. We based the design of the data-dissemination layer on Information-Centric Networking, which inherently enables content mobility, caching, and security. Here we will focus on the Named Data Networking (NDN) implementation of ICN. NDN does not inherently support event notifications. Therefore, we enhanced NDN to add the
push notification capability. We present a Naming design for our system that
ensures we can use the inherent features of NDN, such as in-network caching,
scalability and mobility [11].
We implemented an ICN-aware content-based publish/subscribe system
using the data-dissemination layer. In this system, data sources are publishers
that send their data updates to a network of brokers. A user can express interest in the data updates through a set of subscription queries and subscribe to notifications of the availability of content that matches the queries. The broker registers the subscription queries, matches newly published data against them, and then notifies the subscribers.
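The register-match-notify cycle above can be sketched as follows; the query format, field names, and subscriber identifiers are hypothetical illustrations, not the structures of the actual CVST broker described in Chapter 3 (which uses a dedicated matching engine and Avro-encoded messages).

```python
# Minimal sketch of content-based matching in a broker. A subscription
# query matches a publication if every queried field is present in the
# publication with an equal value.
def matches(query: dict, publication: dict) -> bool:
    return all(publication.get(k) == v for k, v in query.items())

# Registered subscription queries (subscriber -> query); names are made up.
subscriptions = {
    "alice": {"type": "traffic_sensor", "road": "Gardiner"},
    "bob": {"type": "transit_vehicle"},
}

# A newly published data update from a sensor (illustrative fields).
publication = {"type": "traffic_sensor", "road": "Gardiner", "speed_kmh": 45}

# Notify every subscriber whose registered query matches the new data.
notified = [user for user, q in subscriptions.items() if matches(q, publication)]
print(notified)  # -> ['alice']
```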
1.3.2 Content Delivery in Service Providers
Exponential traffic growth due to the increasing popularity of Over-The-Top
Video services has put service providers under much pressure. By promoting
in-network caching, Information-Centric Networking is a promising paradigm
to answer current challenges in the service provider’s domain. In this work, we
report on a cache placement and content routing strategy for service providers
to delay the onset of congestion of their network. We aim to optimize the
capital expenditure of their limited capacity planning budget. We show that even a limited deployment of ICN substantially delays the onset of congestion in the network and decreases the number of links with high utilization [12].
1.4 Thesis organization
The rest of this document is organized as follows. First, Chapter 2 provides a review of the background required for a proper understanding of this thesis, including Information-Centric Networking and content delivery in the Internet. Then, Chapter 3 focuses on the design and implementation of the data dissemination layer. Chapter 4 focuses on how Service Providers may delay the congestion of their network by using Information-Centric Networking. Each chapter provides evaluation results for the proposed methods.
The thesis concludes with Chapter 5, which summarizes the contributions and
provides an outlook on future works.
Chapter 2
Background and Related Works
In this chapter, we review the concept of Information-Centric Networking in Section 2.1. We go over Naming, Name-based Routing and In-Network Caching, and then review different implementations of the ICN paradigm. We review CVST, a platform for Smart City applications, in Section 2.2. In Section 2.3 we survey current technologies used for content delivery over the Internet, such as Content Delivery Networks.
2.1 Information-Centric Networking
Information-Centric Networking (ICN) [2, 3, 13] is a clean-slate networking paradigm that tries to solve current networking problems by replacing the host-to-host communication model. ICN puts data at the center of the network and then designs the facilities necessary for transferring that data. Using ICN, users express their interest in content, and the network is then responsible for providing that content to them. In ICN, it does not matter
where the content is stored, and the roles of identifier and locator of the content
are decoupled. In the current architecture, IP plays both of these roles.
ICN assigns a name to the data itself, not the content container that stores
that data. Once content is created, it has a name that cannot be changed, which is similar to the way version control systems work in software development.
Content routers then use this name to route and forward data requests to the
authorized sources. Since the routing is based on the name instead of the
host address, network efficiency can be improved by using in-network caching.
Therefore, if a router has already cached the data, it can answer the data
request itself. Otherwise, the request, based on its name, is forwarded to
the next hop for processing. This decoupling also provides better support for user mobility. Most ICN designs also include inherent protection and authentication of the data itself, in contrast with encrypting the connection between the two parties, as in the current architecture.
2.1.1 Concepts
In this section, we will review the concepts and terminologies that are common
between ICN designs.
Naming
As discussed earlier, one of the problems of the current Internet architecture is that IP addresses play the role of both locator and identifier of the
information. HTTP URLs are translated to IP addresses using DNS and the
IP addresses are mapped to the location of the content server. Therefore, the
location of the data is attached to its name. Any change in the location of the
data will result in changing its name, and there is no consistent way to keep
track of identical copies of data in different places. To solve this problem, ICN
decouples content from its location. This decoupling shifts the paradigm from
current host-to-host communication to a hop-by-hop communication model between network entities. When a consumer requests data, the network provides that data from any authorized source. One of the first benefits of this model is that only the receiver can retrieve the information, and no data can be received unless the receiver requests it. This one-way requesting is different from the current architecture, in which anyone can send data to any IP address in the network. ICN designs put Naming at the core of the networking model, which makes it the most important part of designing an ICN model. A naming
model answers three questions [14]:
• validity: The ability to check that no one has tampered with the content, usually by having a verifiable digital signature.
• provenance: The ability to bind the data with the content publisher,
usually using its public key.
• relevance: The ability to map the content to the original request.
Name-based Routing
In ICN, after the receiver sends a request, the network will find the authorized
source for the data and will retrieve the content. It follows that all ICN
designs should do name-based routing. Also, naming data creates the ability
to aggregate all the requests for that data and intrinsically provides multicast
forwarding capability.
In-Network Caching
By decoupling information and its location, named data can be stored anywhere in the network, i.e. in-network caching. In-network caching is accomplished without any overlay and is an intrinsic part of ICN networks. In-network caching is an improvement over the way routers' storage is used today, which is only for buffering packets. In ICN, when a router receives an Interest for content that it has in its cache, it can provide it immediately.
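A router's cache can be modelled as a bounded content store with a replacement policy. The sketch below uses LRU eviction purely for illustration; ICN designs do not mandate a particular policy, and the name prefixes are made up.

```python
from collections import OrderedDict

# Sketch of a router's Content Store with LRU eviction.
class ContentStore:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()          # name -> cached Data payload

    def get(self, name):
        if name in self.store:
            self.store.move_to_end(name)    # mark as recently used
            return self.store[name]         # cache hit: answer locally
        return None                         # miss: forward the Interest

    def put(self, name, data):
        self.store[name] = data
        self.store.move_to_end(name)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cs = ContentStore(capacity=2)
cs.put("/toronto/traffic/cam1", b"frame-1")
cs.put("/toronto/traffic/cam2", b"frame-2")
cs.get("/toronto/traffic/cam1")             # cam1 becomes most recent
cs.put("/toronto/traffic/cam3", b"frame-3") # capacity exceeded: cam2 evicted
print(cs.get("/toronto/traffic/cam2"))      # -> None (evicted)
```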
Security
In TCP/IP, security is achieved by encrypting the transmission channel and authenticating the endpoints of the communication. In this model, there is no way to verify the authenticity of the data itself, and we have to trust the container of the data. Moreover, TCP/IP is designed to forward any traffic towards the destination, which results in an imbalance of power between senders and receivers. This imbalance creates the ability for attackers and spammers to launch Distributed Denial of Service (DDoS) attacks. However, in ICN, content can be protected against alteration or eavesdropping, and only genuine copies of the data can exist in the network. Also, the ICN architecture is receiver-driven, which prevents DDoS attacks.
Mobility
TCP/IP was designed with fixed, immobile hosts in mind, but today we are facing a sharp increase in the number of connected mobile devices. The network that a host is attached to determines the IP address of the host. Therefore, the IP address of a mobile device will change if it moves to another network. This change of address will disrupt every active TCP/IP session on the device. Workarounds using different overlay solutions may be used to remedy this problem, but these solutions come with many inefficiencies since the problem lies in the TCP/IP design. Moreover, IP networks must forward traffic on spanning trees to avoid loops and cannot make full use of the multiple connections of a particular host. ICN tackles both of these problems. ICN can take full advantage of the multiple connections that a device has and efficiently manage communication using all of them. The reason is that, in ICN, there is no end-to-end connection, and every device only talks to its next hop.
2.1.2 A Brief History of ICN
The introduction of the idea of separating names and locators goes back to the TRIAD project [15]. However, the Data-Oriented Network Architecture (DONA) [13] is one of the first complete ICN designs. DONA uses a flat name architecture that replaces the current hierarchical names (URLs) with the notion of self-certifying names.
A self-certifying name is a tuple of the cryptographic hash of the public
key of the content publisher, P, and a unique label, L, as an identifier of the
data that is published under that name. L can be a cryptographic hash of
the content, which makes the label unique and the data immutable. Entities
that are interested in that data will learn its name from a trusted external
source, such as a search engine for names. The name is self-certifying because
anyone who has access to the public key of the publisher can verify the relationship between the data, the content publisher and the label. The name
resolution and routing is done using servers called Resolution Handlers (RHs).
DONA uses source routing by querying these RH servers, which return a set of network links that a request must traverse to reach its destination. DONA is compatible with the current Internet architecture, but requests take a long path to reach their destination, which causes unnecessary delays.
Moreover, source routing information creates overhead in the packet header.
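The self-certifying name structure described above can be sketched as a (P, L) pair, where P is the hash of the publisher's public key and L is a label, here taken to be the hash of the content so the data is immutable. The SHA-256 choice and the placeholder key bytes are illustrative assumptions; DONA's design does not tie the scheme to a particular hash function.

```python
import hashlib

# Sketch of a DONA-style self-certifying name P:L. Key bytes are a
# placeholder, and SHA-256 is an illustrative choice of hash function.
publisher_public_key = b"-----BEGIN PUBLIC KEY----- ...placeholder..."
content = b"road closure: Gardiner Expressway, eastbound"

P = hashlib.sha256(publisher_public_key).hexdigest()  # binds to publisher
L = hashlib.sha256(content).hexdigest()               # binds to the data
name = f"{P}:{L}"

# Anyone holding the publisher's public key (and the content) can verify
# that P and L bind the data to that publisher, hence "self-certifying".
assert hashlib.sha256(publisher_public_key).hexdigest() == P
assert hashlib.sha256(content).hexdigest() == L
```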
In addition to DONA, there are many other proposed architectures for ICN, such as PURSUIT [16–18], SAIL [19], COMET [20] and CONVERGENCE [21], but here we review Named Data Networking [2] and MobilityFirst [3].
2.1.3 Named-Data Networking
Named-Data Networking (NDN) [2] is a fully-fledged ICN architecture, which was initially introduced in a Google Tech Talk [22] by Van Jacobson and developed as Content-Centric Networking at PARC [23]. NDN [24] then continued as an NSF-funded Future Internet Architecture project that began in 2010, a collaboration among 12 campuses.
Fig. 2.1 shows the NDN vision of the Internet protocol stack in comparison
Figure 2.1: NDN Protocol Stack [2] — content chunks form the narrow waist of NDN's hourglass, with security and strategy layers beneath, in place of IP packets in the current stack.
to the current Internet protocol stack. The narrow waist of the hourglass is a layer with minimal required functionality that plays the role of a universal agreement among the hosts that want to communicate over the network. Currently, IP plays this part, but in NDN, content chunks are the global agreement. NDN envisions that it will operate over various networking technologies, including TCP/IP.
NDN Architecture
NDN uses two kinds of packets for data delivery: Interest packets and Data packets. These two kinds are analogous to the way TCP Data and Ack packets work. The difference is that, in NDN, the consumer sends an Interest packet and then receives the Data packet corresponding to that Interest from the network, whereas in TCP, the server sends the Data and the client responds with an Ack. Fig. 2.2 shows the architecture of these packets.
Chapter 2. Background and Related Works 15
Figure 2.2: Structure of NDN Packets
To forward content hop-by-hop based on names, NDN uses three tables: a Pending Interest Table (PIT), a Forwarding Information Base (FIB), and a Content Store (CS). The PIT, which holds a list of all pending Interests and their incoming interfaces, prevents duplicate forwarding of an Interest and satisfies pending Interests when the corresponding Data packet arrives from the authorized source. The FIB matches name prefixes to output interfaces, and the CS caches incoming Data packets so that the node can satisfy future Interests.
When an Interest packet arrives at an NDN node (Fig. 2.3), the node first checks the CS, then the PIT, and then the FIB. Routers use longest-match lookup to match the Name in the Interest packet against FIB entries. When a Data packet arrives, the node first checks the PIT and, if there is a match, optionally stores the packet in the CS. Because each Interest packet results in one Data packet, the flow is balanced, and the Data packet always takes the reverse path of the Interest packet.
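This lookup sequence can be sketched in a few lines of Python. This is a toy model for illustration only; the class, method names, and return values are our own simplifications, not part of any NDN codebase.

```python
# Minimal model of NDN forwarding: CS -> PIT -> FIB for Interests,
# PIT -> optional CS insertion for Data. Illustrative only.

class NdnNode:
    def __init__(self, fib):
        self.cs = {}        # Content Store: name -> data
        self.pit = {}       # PIT: name -> set of incoming faces
        self.fib = fib      # FIB: name prefix -> outgoing face

    def on_interest(self, name, in_face):
        if name in self.cs:                      # 1. Content Store hit
            return ("data", self.cs[name], in_face)
        if name in self.pit:                     # 2. Already pending: aggregate
            self.pit[name].add(in_face)
            return ("aggregated", None, None)
        # 3. FIB: longest-prefix match on '/'-separated components
        parts = name.split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i])
            if prefix in self.fib:
                self.pit[name] = {in_face}
                return ("forward", None, self.fib[prefix])
        return ("drop", None, None)              # no route: drop or NACK

    def on_data(self, name, data):
        faces = self.pit.pop(name, None)
        if faces is None:                        # unsolicited Data
            return ("discard", set())
        self.cs[name] = data                     # optionally cache
        return ("satisfy", faces)                # forward to all downstream faces
```

Note how the PIT both suppresses duplicate upstream Interests (the "aggregated" branch) and records every downstream face so that one returning Data packet satisfies all waiting consumers.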
Figure 2.3: NDN Forwarding Process
Strategy Layer
The strategy layer (Fig. 2.1) uses the information in the PIT and FIB tables to find the best forwarding path for an Interest packet. For example, an adaptive forwarding strategy makes an informed decision about which interfaces to use for a particular Interest packet based on the number of Interest packets recorded in the PIT. It can also balance the forwarding of Interest packets among multiple interfaces, detect failures, and choose alternative forwarding paths. An effective strategy layer may use the multi-path forwarding capability of NDN to avoid congestion and failures. The strategy layer may also handle the transmission of control messages among neighboring routers.
The strategy layer plays a major role in optimizing the utilization of the underlying infrastructure, especially in a mobile environment, where packet delivery is unreliable. In mobile transmission, Interest or Data packets might get lost or damaged, or connectivity may be interrupted. The strategy layer may be used to retransmit Interest packets that are not satisfied within a reasonable period. Ultimately, though, the transport is receiver-driven, and the application that originates the Interest packets is responsible for unsatisfied Interests.
For example, when a client sends Interest packets for content, the routers along the path between the client and the authorized source can cache the corresponding Data packets. If the client moves to a new network during the transmission, some Data packets will not reach it. However, upon joining the new network, the client's strategy layer is triggered to reissue the Interest packets for the missing Data packets. These new Interest packets fetch the data from the nearest upstream content store that has cached the Data packets. On the other hand, if the authorized source of the data moves to another network, some Interest packets issued by the client will not reach the source and will eventually time out. The strategy layer reissues them until the content is completely retrieved.
In Chapter 3 we discuss the design of a data dissemination layer. In this design, the strategy layer acts as a load balancer: multiple instances of a service are registered under the same name, and the strategy layer sends Interest packets to them in round-robin fashion. Furthermore, in Chapter 4 we use the strategy layer to optimize content routing in the network and delay the onset of network congestion.
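The round-robin idea can be sketched as follows. This is hypothetical illustration code, not the actual NFD strategy API: the class and method names are our own.

```python
from itertools import cycle

class RoundRobinStrategy:
    """Toy forwarding strategy: rotate Interests for a name prefix across
    all faces registered under that prefix (illustrative only)."""

    def __init__(self):
        self.faces = {}      # prefix -> list of registered faces
        self.cycles = {}     # prefix -> cycling iterator over those faces

    def register(self, prefix, face):
        # Each service instance registers under the same prefix.
        self.faces.setdefault(prefix, []).append(face)
        self.cycles[prefix] = cycle(self.faces[prefix])

    def next_hop(self, prefix):
        # Successive Interests are spread evenly over the instances.
        return next(self.cycles[prefix])
```

A caller would register every service instance under one name and then pick `next_hop(prefix)` for each incoming Interest, spreading the load evenly.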
NDN Naming
NDN uses hierarchical names similar to URLs, though not necessarily human-readable; for example, a video can be named /cvst/videos/sample.mp4. In NDN, content is divided into chunks, and each chunk is immutable and has a unique name. For example, /cvst/videos/sample.mp4/_v2/_s1 points to the first segment of version 2 of sample.mp4. Usually, the first segment of the latest version of the content is represented by a path similar to /cvst/videos/sample.mp4. Hierarchical names, as in HTTP, allow efficient aggregation in routing tables and fast lookup. Interests can refer to names that do not yet exist, and publishers can generate content for those names on the fly.
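Assuming the `_v`/`_s` marker convention used in the example above (an application-level convention, not mandated by NDN itself), such names can be parsed as:

```python
def parse_name(name):
    """Split an NDN-style name like /cvst/videos/sample.mp4/_v2/_s1 into
    (content prefix, version, segment). The _v/_s markers are an
    application-level convention, illustrative only."""
    parts = [p for p in name.split("/") if p]
    version = segment = None
    prefix = []
    for p in parts:
        if p.startswith("_v"):
            version = int(p[2:])    # version component, e.g. _v2 -> 2
        elif p.startswith("_s"):
            segment = int(p[2:])    # segment component, e.g. _s1 -> 1
        else:
            prefix.append(p)        # ordinary name component
    return "/" + "/".join(prefix), version, segment
```

A bare prefix such as /cvst/videos/sample.mp4 parses with no version or segment, matching the convention that it refers to the first segment of the latest version.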
Name prefixes are usually globally meaningful, similar to domain names in HTTP, but they can also refer to a local context such as /home/projector. These naming conventions for pieces of data are not part of NDN itself, but they can be designed to give applications the ability of relative data retrieval. Naming design plays an important role in realizing the full potential of NDN in an application. We go over the naming design of our event notification layer in Chapter 3.
Performance and Scalability
Many studies [25–30] have investigated the performance, scalability, and practicality of Named-Data Networking. For example, the authors of [1] evaluate whether the NDN model can be implemented using today's technology. They first compare current memory technologies, since memory access latency is the bottleneck of today's router design. Table 2.1 summarizes memory technologies and their access latencies. The authors use HashCache [31] to implement the indexing required for the Content Store, PIT, and FIB tables.

Table 2.1: Summary of memory technologies [1]

Technology        Access time (ns)    Max Size
TCAM              4                   ~20 Mb
SRAM              0.45                ~210 Mb
RLDRAM            15                  ~2 Gb
DRAM              55                  ~10 GB
High-speed SSD    1,000               ~10 TB
SSD               10,000              ~1 TB
The authors propose using 40 bits for indexing the hash tables to reduce collisions. In addition, Bloom filters are used to perform the longest prefix match in the FIB. They propose that if a name has B components, the router queries B Bloom filters, one for each potential prefix match, and then queries the hash table to detect possible false positives. The memory bits needed for the Bloom filters are five to twenty times the number of items in the FIB. To store 250 million entries in the FIB, roughly the current number of globally unique host names, the router requires 1.5 GB of off-chip RLDRAM for the hash index and 4 Gbits of on-chip SRAM for each Bloom filter.
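The scheme can be illustrated with a toy sketch. The Bloom filter below is deliberately tiny, and its size, hash count, and the final dict lookup are our own illustrative choices, not the parameters proposed in [1].

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: k hash indices derived from SHA-256 slices.
    Toy sizes, for illustration only."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _idx(self, item):
        h = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(h[4*i:4*i+4], "big") % self.m for i in range(self.k)]

    def add(self, item):
        for i in self._idx(item):
            self.bits |= 1 << i

    def __contains__(self, item):
        return all(self.bits >> i & 1 for i in self._idx(item))

def longest_prefix_match(name, blooms, fib):
    """One Bloom filter per prefix length (component count); a hash-table
    (here, dict) lookup confirms hits and screens out false positives."""
    parts = [p for p in name.split("/") if p]
    for i in range(len(parts), 0, -1):           # longest prefix first
        prefix = "/" + "/".join(parts[:i])
        if i in blooms and prefix in blooms[i] and prefix in fib:
            return prefix
    return None
```

The Bloom filters act as a cheap pre-screen: the authoritative hash-table lookup runs only on prefixes the filters admit, which is how false positives are tolerated.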
2.1.4 MobilityFirst
MobilityFirst [3] treats mobile devices as the first-class citizens of its architecture and focuses on handling delay/disruption-tolerant networks in addition to multi-homing, multicast/anycast support, and security. MobilityFirst assigns a 160-bit Globally Unique Identifier (GUID) to any entity, such as devices, contexts, or data. GUIDs are either assigned randomly or are generated by a global or local Name Certification Service (NCS) as a self-certifying hash of the publisher's public key. For example, a single video content will have the same GUID everywhere in the network. By assigning GUIDs to all network objects, MobilityFirst supports both host-to-host and hop-by-hop data transfer.

Figure 2.4: The MobilityFirst architecture [3]
Using a Global Name Resolution Service (GNRS), GUIDs are mapped to one or more topological network addresses, which are used as the authoritative header for routing. Both flat name addressing and network-based addressing are thus used, in what is called a hybrid GUID and network-address-based routing scheme. Routing table size is reduced, but a distributed name-resolution service is required; it is implemented as a distributed hash table hosted by network routers.
As shown in Fig. 2.4, suppose John wants to receive content on all of his devices. He first registers them using an NCS, which assigns the same GUID to all of them. Upon link establishment with the network, these devices register their Network Addresses (NAs) in the GNRS. The sender looks up the GUID using the NCS and then, using send or get functions, sends information to or gets information from the devices. The packet of such a request carries a source GUID, a destination GUID, and a Service IDentifier (SID). The SID specifies the delivery method, e.g., unicast, multicast, or anycast. The packet can also include a set of network addresses for the destination GUID, resolved by using the GNRS. The network-address resolution task can also be delegated to the content routers in the network.
If message delivery fails due to movement or link disruption, the packet is stored in content routers in the network, and the routers then periodically query the GNRS to rebind the destination GUIDs to network addresses.
Fig. 2.5 shows a scenario of temporary disconnection of a mobile node. Delivery to node NA99 fails because the device has moved to another network with a new network address, NA75. Establishing the new network connection triggers a GUID/address rebind in the GNRS. The last content router, which has cached the data, periodically queries the GNRS for the new network address; when it receives the new address, it retries sending the data to its destination. MobilityFirst employs a Generalized STorage-Aware Routing (GSTAR) mechanism, a link-state routing protocol, at the intra-domain level to better support the disconnections, delays, and variable link conditions of mobility [32].

Figure 2.5: Mobile Delivery in MobilityFirst [3]
In addition to on-path caching on intermediate routers, MobilityFirst also provides off-path caching. Off-path cached copies of content are all known to the network under the same GUID, and all the cache servers register themselves in the GNRS with different network addresses. When a client sends a get request with the SID set to anycast, the content is transmitted from the nearest cache server.
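A toy model of the GNRS bindings and SID-based resolution described above follows. The API, GUID strings, address names, and the use of a scalar "distance" as the nearness metric are all illustrative assumptions, not the MobilityFirst implementation.

```python
class Gnrs:
    """Toy Global Name Resolution Service: GUID -> {network address: distance}.
    Illustrative only; real GNRS is a distributed hash table on routers."""

    def __init__(self):
        self.bindings = {}

    def bind(self, guid, addr, distance):
        # Called when a device or cache server (re)joins a network.
        self.bindings.setdefault(guid, {})[addr] = distance

    def unbind(self, guid, addr):
        # Called when a binding becomes stale (e.g. the device moved away).
        self.bindings.get(guid, {}).pop(addr, None)

    def resolve(self, guid, sid="unicast"):
        addrs = self.bindings.get(guid, {})
        if not addrs:
            return []
        if sid == "multicast":               # all current bindings
            return sorted(addrs)
        return [min(addrs, key=addrs.get)]   # unicast/anycast: nearest copy
```

The rebinding scenario of Fig. 2.5 then amounts to an `unbind` of the old address followed by a `bind` of the new one, after which a router's next `resolve` call returns the fresh address.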
2.1.5 ICN Design Selection
In this section, we present a comparative study of NDN and MobilityFirst. We first discuss the commonalities and then focus on the differences between the two models. NDN and MobilityFirst share three commonalities in their implementations.
Receiver Driven
Both designs first publish the content, meaning they advertise its availability, and consumers then subscribe to or request that content. The request does not have to happen at the same time, nor does the consumer need to know the location of the published content. In NDN, content availability is advertised, and the consumer can then send an Interest for that content. In MobilityFirst, data is registered in the NCS and assigned a GUID, and the consumer then requests that GUID by sending a get packet with the GUID of the content as the destination.
In-Network Caching
Since the name and locator are decoupled, in both NDN and MobilityFirst a router can serve content directly from its local cache or forward the request to the next hop. In-network caching happens regardless of the protocol used for transport. However, NDN and MobilityFirst differ in the amount of data they cache, which stems from the way content is named in the two models.
Content-oriented security
Contrary to current security models, which are based on securing the path that data takes to reach the consumer, ICN models can secure the content itself.
                      NDN                              MobilityFirst
Naming                Names are hierarchical           Flat naming, single component
Routing               Local routing based on a         Uses a distributed hash table
                      routing table
Caching               On-path and off-path             On-path and off-path
Mobility              Sending new Interests and        Late binding of name and
                      routing table updates            network address by routers
Security              Data packets include             Uses self-certifying names
                      signatures; distributed
                      trust model
Developer friendly    Open source, available in        Closed source, limited
                      many programming languages       availability

Table 2.2: NDN and MobilityFirst Comparison
Both NDN and MobilityFirst have the content signed by its creator. NDN puts this signature inside the Data packet, so every packet of data includes its verifiable signature. MobilityFirst uses self-certifying naming, and the signature is placed in the name of the content.
We now focus on the differences between the NDN and MobilityFirst models. There are three areas where the two models drift apart and use different implementations.
Naming
NDN meets the naming requirements (Section 2.1.1) by including in the data sent to the consumer a signature that covers both the content and its name. This signature binds the name and content together, verifiable by the user. Naming in NDN can be local, and names neither require a particular structure nor need to be globally unique. MobilityFirst, on the other hand, uses self-certifying names for GUIDs. A self-certifying name provides the binding between the name and the content; however, the name is not human-readable and cannot take an arbitrary structure. Moreover, users must rely on other means, such as search engines or applications, to find the name of the content they need.
Routing
An ICN network must be able to route content to consumers. To be scalable, NDN uses a hierarchical naming structure and consolidates name prefixes to reduce the routing tables; the size of the routing table is at least the number of unique prefixes in the network. MobilityFirst translates names to network addresses using the Global Name Resolution Service (GNRS) and then maps network addresses to interfaces in each router; the size of this routing table is bounded by the number of routers in the network. The trade-off is between maintaining a global name resolution system and maintaining large routing tables.
Narrow Waist
ICN designs, including NDN and MobilityFirst, use hop-by-hop communication between ICN layers, which can run over IP or other local delivery protocols. However, to provide global connectivity, each design must have a narrow waist. NDN defines chunks of data as the narrow waist of the network: content is divided into chunks, and each chunk has its own name and digital signature. Services such as transport protocols are implemented over this structure. MobilityFirst makes its name-based service layer the narrow waist. This layer uses GUIDs for all network-attached objects, including hosts, content, and services, and exposes a set of APIs that can be used by upper layers.
Table 2.2 shows a comparison between NDN and MobilityFirst. We chose NDN as our ICN implementation. The protocol is very simple in design and needs far fewer components to function, which makes it suitable for limited deployment. Moreover, the NDN naming method has many commonalities with the way HTTP and MPEG-DASH name content, and its names are human-readable. The NDN implementation is open source, with development kits available for many programming languages, and it is used in many open-source projects. Furthermore, our work can be extended by using MobilityFirst as the ICN implementation.
2.2 CVST Platform
The rapid rate of urbanization globally has become a challenge to municipalities and governments. The smart city arises as a promising answer to the challenges of urbanization, involving the gathering and analysis of information from different sources in real time. For example, in existing traffic management systems, data is frequently not shared or readily available outside its agency's domain. These limitations affect real-time analysis of data and reliable detection of the root causes of traffic jams.
Figure 2.6: Layered Architecture of CVST Platform [4]
Providing city information in a single platform is a key enabler of effective city management. Connected Vehicles and Smart Transportation (CVST) [33, 34] is an open and scalable platform for developing smart city applications. The platform consists of four main building blocks [4]. Fig. 2.6 depicts the multi-layer architecture of the CVST platform. The lowest layer is the Infrastructure as a Service (IaaS) layer, which provides resource management in a cloud environment. This layer is based on the SAVI (Section 2.2.1) cloud and provides resources that can be scaled up, down, or out to adjust to the varying demands of applications.
The Platform as a Service (PaaS) layer is divided into two parts. The bottom sub-layer is responsible for end-to-end multi-domain orchestration and uses capabilities from SAVI. The top sub-layer is concerned with data dissemination. The data dissemination layer of the CVST platform has the following requirements:
a. Collect data about the city from a variety of sources with different types.
b. Support mobility of data sources.
c. Meet privacy requirements of different data types.
d. Guarantee secure data transmission.
e. Allow customers to pull data from the platform on demand.
f. Notify the customers of data availability, e.g. push notifications.
g. Have the ability to scale out to support new data sources and data sinks.
h. Be optimized for best performance.
i. Provide services such as data anonymization, cleansing, and verification on top of data collection.
The Business Intelligence as a Service (BIaaS) layer provides an analytics platform to extract statistics, detect data trends, and identify data patterns. The BIaaS layer applies different techniques, such as stream analytics, for purposes such as KPI analysis. This layer provides a set of APIs (Application Programming Interfaces) that are used by both internal and external applications. BIaaS uses content-based publish/subscribe to collect and send data.
The Smart Applications as a Service (SaaS) layer offers a set of smart city applications. Applications such as real-time dashboards and monitoring systems, traffic flow optimization, and route assistance may be provided by public or private organizations. These applications use the APIs provided by the other layers to access raw or processed data.
2.2.1 Smart Application on Virtual Infrastructure
CVST uses an infrastructure that operates on virtualized resources, managed using IaaS and PaaS principles. Smart Applications on Virtual Infrastructures (SAVI) is an initiative to build a test-bed for research and development of future Internet architectures and applications. The SAVI [35] project explores the role of virtualization and software-defined infrastructure in application platforms and provides the tools necessary for experimenting with the deployment of future application platforms. SAVI provides large-scale computing, storage, and a fast network fabric over a cloud infrastructure.
All resources, computing, networking, and others, are managed by a single management system to offer enabling services. The resources required to support CVST span multiple resource tiers. As shown in Fig. 2.7, these tiers are spread across a large geographic extent, from remote massive core data centers, to smart edge resources located closer to the user, to Customer Premise Edge (CPE) resources, such as sensors, near the user or environment. The tiers provide services that differ vastly in their processing, storage, and networking capacity requirements. This three-tier application platform has been built and deployed on the SAVI test-bed.
As shown in Fig. 2.8, the SAVI test-bed is designed and implemented to help overcome challenges in implementing and testing new network applications. It provides resource management, scalability, reliability, security, and accountability to facilitate rapid development of applications.
Software-Defined Infrastructure (SDI) is an approach in which a software manager manages virtual and physical resources in a converged fashion. The SDI manager is hierarchical to ensure scalability and to handle heterogeneity.

Figure 2.7: Multi-tier Cloud for End-to-End Application Platform
Each resource type is controlled by one or more associated controllers, which themselves interact with the SDI manager. The controllers also communicate with a topology manager, which provides an integrated view of all resources, and with a monitoring and analytics system. The SDI manager gives the resource controllers coordination and an infrastructure-wide view, resulting in more efficient resource management.
The SAVI test-bed consists of eight nodes and has been in operation across Canada since 2013. The CANARIE and ORION networks provide Layer 2 connectivity between the SAVI core (i.e., datacenter) and smart edge nodes. Each node has its own SDI manager on top of OpenStack and OpenFlow.
SAVI offers CVST flexibility in resource management, a unified architecture, support for deployment of heterogeneous and programmable physical and virtual resources, and the powerful resources required for data analytics and intelligence. Services such as VM migration and multi-layer monitoring are used to improve the resiliency and robustness of CVST. Some of the heterogeneous physical resources that SAVI provides in this infrastructure include:
a. High-performance server blades with multi-core CPUs
Figure 2.8: SAVI test-bed main components [5]
b. Dedicated bare-metal machines with dedicated networking resources,
available in different flavors including high performance and low power
c. Graphics Processor Units (GPU) attached to bare-metal machines
d. Programmable hardware using NetFPGA, available either attached to bare-metal machines or as a standalone network device
2.2.2 Publish/Subscribe Systems
The CVST platform uses the publish/subscribe paradigm for data dissemination. Event processing systems use patterns such as publish/subscribe between different parties. Publish/subscribe is a building block in many applications, such as social media, financial systems, and network management. It decouples data sources and sinks and is an effective pattern for large-scale data dissemination systems.
There have been some attempts at building a publish/subscribe system using Information-Centric Networking. The authors in [36, 37] propose changing the design of NDN and adding a built-in notification system to it. Instead, we believe that being receiver-driven is at the heart of the NDN paradigm and can answer the needs of a high-performance publish/subscribe system.
Publish/subscribe is an abstraction for an information dissemination paradigm that moves information from a set of content creators (publishers) to content consumers (subscribers). Publishers create content and emit it to the system, and the system then notifies the interested subscribers. The communication between publishers and subscribers can either happen directly or be facilitated by a set of broker servers. Subscribers can express their interest in content through various models. The most popular subscription model is topic-based subscription: publishers create content and attach a label, or topic, and the system sends the content to the subscribers interested in that topic.
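A minimal topic-based broker can be sketched as follows. This is illustrative only, not the CVST implementation; the class and method names are our own.

```python
from collections import defaultdict

class TopicBroker:
    """Toy topic-based broker: publishers attach a topic label, and the
    broker fans each event out to the subscribers of that topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of this topic.
        for callback in self.subscribers[topic]:
            callback(event)
```

Note that publishers and subscribers never reference each other directly; the topic label is the only coupling between them.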
Typically, each subscriber has its own, possibly different, interest in the same data. Subscribers should be able to specify not only the topic of the data, but also conditions on the data itself. Every data source is a publisher that sends its data to the broker, and the subscribers then receive data from the broker based on some conditions. A publish/subscribe system in which subscribers can express criteria on the published data is a content-based publish/subscribe system; subscribers receive all published data that matches their criteria [38, 39]. Elvin [40], SIENA [41], and PADRES [42] are examples of content-based publish/subscribe systems.
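A content-based subscription can be modeled as a conjunction of predicates over event attributes. The sketch below is a simplified, SIENA-style model; the operator set and event format are our own illustrative choices, not any of the cited systems' actual APIs.

```python
import operator

def matches(subscription, event):
    """Content-based matching sketch: a subscription is a list of
    (attribute, op, value) predicates; an event (a dict) matches when
    every predicate holds. Illustrative only."""
    ops = {"=": operator.eq, "<": operator.lt, ">": operator.gt}
    return all(
        attr in event and ops[op](event[attr], value)
        for attr, op, value in subscription
    )
```

For example, the subscription `[("topic", "=", "traffic"), ("speed", "<", 30)]` matches only traffic events reporting speeds below 30, illustrating how content-based filtering refines a plain topic subscription.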
CVST aims to collect information from a variety of data sources. Many of these data sources operate in constrained environments with limited resources and therefore cannot run complicated applications. Current publish/subscribe systems support the content-based publish/subscribe paradigm in their application layer and inherit the shortcomings of the TCP/IP paradigm. These systems either require a sophisticated application layer or must configure the network layer specifically for their applications.
To support mobility in a TCP/IP network, the network must keep the TCP session alive by using rendezvous mechanisms [43]. These solutions either limit the type of applications supported by the network or compromise the security of the system. Furthermore, to support security, applications are responsible for data encryption and integrity checks.
For example, PADRES has no built-in support for security and involves the application layer in data multicast and routing. PADRES supports only a limited number of data formats. Furthermore, PADRES has to handle the mobility of publishers and subscribers in the application layer. The result is an application layer that becomes more and more complicated. A complex application layer cannot run on resource-constrained devices, and it is also very hard to scale. With the rise of cloud-based solutions, it is more desirable to have applications that can scale out by running similar instances that share the workload.
In contrast, Information-Centric Networking decouples content from its locator, applications, storage, and media. The network therefore supports not only caching and multicast, but also mobility, security, and scalability. In ICN, content distribution is done by the network, not the application. Furthermore, we extend ICN to support real-time event notification, which is required for a publish/subscribe system. In this work, we present the dissemination layer for the CVST platform using the Information-Centric Networking paradigm. We also discuss the details of the design and implementation of a content-based publish/subscribe overlay using this dissemination layer in Chapter 3.
2.3 Content Delivery over Internet
Video traffic consumes most of the bandwidth on the Internet. Fig. 2.9 shows the peak traffic composition of North America reported for the second half of 2013 [6]. Real-Time Entertainment is responsible for over 67% of downstream bytes during the peak period for fixed access and 40% for mobile access. More detailed analysis of the data shows that Netflix (31.6%) and YouTube (18.7%) combined account for over 50% of downstream traffic in fixed access. Moreover, this is just the beginning of the problem.
Nowadays, users watch videos on YouTube and Netflix, or share files us-
ing BitTorrent. Content dissemination, driven by video-centric services, has
caused an exponential growth of Internet traffic. As Fig. 2.10a shows, by 2017, traffic generated by IP video and file sharing will be in the range of 80 to 90 percent of the total IP traffic of the Internet. IP video consists of Internet video, IP Video on Demand (VoD), video-streamed gaming, and video conferencing. Globally, IP video traffic will account for 73 percent of traffic in 2017 [44]. Fig. 2.10b shows a similar estimate for mobile networks [45].

(a) Fixed Access    (b) Mobile Access

Figure 2.9: Peak Period Traffic Composition — North America [6]
Fig. 2.10a shows that the compound annual growth rate of video traffic is estimated to be about 69% [44]. In April 2014 alone, Netflix reached 50 million subscribers and started streaming in UltraHD (4K), Hulu reached 5 million paying subscribers, Comcast released its own cloud-based DVR, and AT&T announced its intention to invest $500 million in the streaming video business. These are all indications that, in the next few years, many similar video services will be offered to consumers.
The other problem operators are dealing with is mobile traffic. By 2017, mobile traffic will surpass wired traffic and will account for about 55% of total IP traffic. Fig. 2.9b shows that, during the peak period, Real-Time Entertainment traffic is the most dominant, accounting for almost 50% of the downstream transmission on the network. Fig. 2.10b presents the estimates of mobile video traffic, which will generate most of the mobile traffic growth through 2018 because video has a much higher bit rate than other mobile content.

(a) Global Consumer IP Traffic [44] (percentages in parentheses next to the legend denote the relative traffic shares in 2012 and 2017)
(b) Mobile Video Will Generate Over 69 Percent of Mobile Data Traffic by 2018 [45] (figures in parentheses refer to traffic in 2018)

Figure 2.10: Traffic estimation of different types for global and mobile networks
Between 2013 and 2018, mobile video will grow at a compound annual growth rate of 69%, reaching 11 exabytes (EB) per month of the total 15.9 EB of mobile traffic. This growth rate is the highest among all mobile content categories. Even today, mobile video represents more than half of global mobile data traffic, which has brought its own challenges to service providers. Video traffic will be the largest part of the traffic, with the highest growth rate, both globally and on mobile, and we have to get ready to face this challenge eventually [44].

1 EB = 1000^6 bytes = 10^18 bytes = 1,000,000 terabytes
Content delivery over the web has historically been from one server to multiple clients. With the rapid increase in web users, two problems arose. First, service providers could not handle all the web traffic, which led to putting web caches in their networks. Web caches [46] were the first attempt at in-network storage. The major problem with web caches was content inconsistency, because there was no coordination between content providers and cache owners. Moreover, with the rapid increase in users' connection speeds, web caches lost their usefulness.
The second problem was that web servers could not handle the load anymore. To fix this problem, content providers shifted toward a multiple-servers to multiple-clients model, in which a load-balancing server usually forwards requests to different servers based on their current load. However, due to the high expense, only big companies with enough resources were able to adopt this method; hence, we saw the rise of Content Delivery Networks.
2.3.1 Content Delivery Networks
Content Delivery Networks (CDNs) [47] provide the multi-server to multi-client paradigm for everyone. We can divide the building blocks of CDNs into the following categories [25]:
Storage
CDNs operate a vastly distributed network over the Internet and host their
customers' content in many locations. In the beginning, CDNs hosted only static
content, but now dynamic content hosting is also available. Because of the
relationship between CDNs and content providers, the content inconsistency
problem of web caches is eliminated. Content is handed to the CDN, which
replicates it on its servers across the globe. These servers are usually placed
at Internet peering points or inside operators' networks.
The structure of CDNs is complex, mostly because TCP/IP was designed as a
host-to-host communication protocol. CDNs lack fine-grained control over the
placement of their servers, the service is provided only for a subset of
applications, and there is no collaboration between different CDNs.
Request Routing
CDNs are a distributed implementation of the multi-server to multi-client model
and forward requests from clients toward the best server. Request routing in
CDNs plays the role of load balancing and is usually done using the Domain Name
System (DNS). DNS is designed for translating domain names to IP addresses, but
CDNs exploit it for load balancing.
When a user requests a URL, a DNS query is sent to the user's recursive DNS
server. This server forwards the query to the authoritative DNS server
responsible for the domain name, and the authoritative server responds with the
IP address of a server that hosts the requested content. If the authoritative
DNS server returns a different IP address based on parameters such as the
location of the user's recursive DNS server, content availability, or server
condition, we obtain a simple load-balancing mechanism, and this is what CDNs
usually do [48]. Another possibility is that the authoritative DNS name resolves
to an Anycast IP address; the query is then routed to the nearest authoritative
DNS server, each of which can return a different IP address to the user.
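As an illustration, the authoritative-server selection just described can be sketched in a few lines; the region table, load values, and tie-breaking rule below are hypothetical, not taken from any particular CDN:

```python
# Hypothetical authoritative-DNS selection: pick a content server for a
# client based on the region of its recursive resolver and current load.
SERVERS = {
    "eu": [("185.0.0.10", 0.9), ("185.0.0.11", 0.3)],  # (IP, load in 0..1)
    "na": [("23.0.0.10", 0.5), ("23.0.0.11", 0.7)],
}

def resolve(resolver_region: str, default_region: str = "na") -> str:
    """Return the IP of the least-loaded server in the resolver's region,
    falling back to a default region for unknown resolvers."""
    candidates = SERVERS.get(resolver_region, SERVERS[default_region])
    ip, _load = min(candidates, key=lambda s: s[1])
    return ip

resolve("eu")  # least-loaded European server
```

A real deployment would also weigh geographic distance, cache contents, and response TTLs; the point here is only that the answer varies per resolver.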
This process relies heavily on DNS, which raises several issues. For example,
when a user uses Google's Public DNS, the recursive DNS server does not
represent the user's true location. The method also assumes that the user's
recursive DNS server honors the TTL of the DNS response and evicts the record
after expiration, but the DNS protocol does not guarantee this, and in practice
it is not always followed [49].
The authoritative DNS server can also return an Anycast IP address to the user,
which results in a similar load-balancing effect. This method makes the system
less dependent on DNS, but the trade-off is that every cache server must hold
exactly the same content. Furthermore, IP routing will use only its
shortest-path metrics and will not take other factors, such as server load,
into account.
2.3.2 Content Provider’s Cache
The Internet is changing very fast. Fifty percent of the traffic is generated by
only 35 services (Fig. 2.11). Compare this with 2007, when fifty percent of
traffic came from thousands of web sites, and 2009, when fifty percent came from
150 web sites. These significant content sources, such as Netflix and Google,
are creating their own versions of a Content Delivery Network. Content providers
are putting their own caches inside operators' networks to
Figure 2.11: Internet traffic source distribution in 2013 [7]
serve the users. This architecture is a win-win for both content providers and
network operators, since operators also save the traffic that would otherwise
pass through their Internet exchange connections.
Some content providers are keen to work with operators to provide these
caches. For example, the Netflix Open Connect [50] program is rapidly expanding
its coverage, since Netflix offers to install and maintain a cache for free in
an ISP's network. Netflix saves money by not using a traditional CDN while
increasing its customers' quality of experience, because customers experience
lower delays. Netflix uses a proactive caching technique: it places popular
content in the cache during off-peak hours and later serves that content to
users. Large content providers, datacenters, and CDNs are not only connecting
directly to each other but are also connecting to operators' networks, bypassing
tier-1 providers [8]. These interconnections are changing the face of the
Internet from a hierarchical architecture to a flatter
one (Fig. 2.12).
Figure 2.12: The Internet's architecture is changing [8]: (a) Traditional Internet Logical Topology; (b) Emerging New Internet Logical Topology
2.3.3 Transparent Caching
Transparent caches [51] are cache servers deployed by operators directly in
their own networks, giving operators full control over them. Simply put, a
transparent cache looks at the content of different applications, such as video,
in the operator's network and serves it directly when possible. For example, it
detects when a video is becoming popular, caches it locally, and serves users
from that local cache. Transparency is important: the cache does not interfere
with user requests such as play, pause, or fast-forward, nor with the
advertisements that the content provider places in its content. This
transparency allows these nodes to cache different types of content from
different sources without requiring an agreement between operators and content
providers. This equipment is usually expensive and uses techniques such as deep
packet inspection.
Service providers use caching to delay congestion in their networks as much as
possible. In Chapter 4, we review how caching and routing affect the onset of
congestion in the network and show how using ICN for content delivery benefits
service providers.
2.3.4 Cache Placement in ICN
There is a wealth of literature on cache deployment in the context of ICN [26,
52, 53], some with contradictory results. The authors in [54] provide an
analytical model of the cache miss probability of a single caching system and
extend it to a network of caches. In [26], the authors use different centrality
metrics for sizing storage in content-centric networks, but find no incentive
for heterogeneous caching. The authors in [55] solve a budget-constrained
caching problem in the Content-Centric Networking context and note that topology
has a significant impact on the optimal cache placement; they consider hop
counts as the base metric for optimizing cache placement. Reference [25] studies
the evolution of CDNs and their challenges and shows how the ICN paradigm can
help overcome them.
2.4 Summary
In this chapter, we reviewed Information-Centric Networking and compared two
implementations of this paradigm, Named-Data Networking and MobilityFirst. We
also discussed the CVST platform and the requirements of its data dissemination
layer. Finally, we studied how Content Delivery Networks distribute content
optimally up to the edge of the service providers' networks, and the effects of
Over-The-Top services on service providers.
Chapter 3
Data Dissemination in CVST
In this chapter, we describe the design and implementation of the ICN-based
data dissemination layer of CVST and the content-based publish/subscribe
overlay using that layer.
In Section 3.1, we discuss the detailed design of our ICN-based Data
Dissemination (IDD) layer. In Section 3.2, we review the architecture of the
content-based publish/subscribe system in CVST, and in Section 3.3 we discuss
the implementation details of the system. In Section 3.5, we review the
performance tests and evaluations of the system.
3.1 Data Dissemination using Information Cen-
tric Networking
Figure 3.1: Application Platform for Smart Transportation
Fig. 3.1 shows the major building blocks of the CVST platform: data ingestion
through publishers, the data dissemination layer, analytics and algorithmic
engines, application programming interfaces (APIs), and the end-user portal. As
depicted in Fig. 2.6, the IDD is the top sub-layer of the PaaS layer.
Its task is to disseminate arbitrary streams of data from various sources to
any destination. Sources may include road sensors, cameras, social application
feeds, public transportation GPS traces, construction events, incident reports,
open data, and private data that can only be accessed securely. Data streams
can be real-time or retrieved from data stores. The system must be extensible
and able to accommodate new sources of information. The IDD sub-layer also
provides data verification and integrity, privacy, and security.
The communication layer in Fig. 3.4 is based on the NDN paradigm [24]. In NDN,
naming is one of the most important parts of application design; it affects both
the performance and the complexity of the system. Names are used to route
Interest packets toward the destination and to select the applications
responsible for processing them. With a proper naming design, NDN provides
support for data mobility, provenance, and integrity. However, one of the
requirements of the CVST platform is real-time event-notification capability,
which NDN does not inherently support. We therefore extended NDN with
event-notification capability to unify content distribution and event
notification. There are two naming designs in the IDD layer: one for
publisher-broker communication and one for subscriber-broker communication.
3.1.1 Publisher-Broker Exchange
On start, each publisher has a conversation with the broker to let it know that
it is alive. The publisher sends an Interest packet that can also carry some
configuration in its name. We call this the "start" process. The data name,
shown in Fig. 3.2, consists of four parts: the first part is the name of the
broker, here /broker; the second part is the action of this Interest, here
/pub/start; the third part is the full name of the publisher; and the last part
is publisher-specific configuration, which is encoded into the name and read by
the broker. This configuration includes a sequence number to be used by the
broker.
After the publisher sends the start Interest packet, the broker responds with an
acknowledgment packet that includes some configuration. The broker also
immediately sends an Interest packet to the publisher. The name of this Interest
packet consists of three parts: the name of the publisher, the action /data, and
the sequence number of the latest data that the broker has already received. At
start-up, the sequence number is a random number and increases
Figure 3.2: Publisher-Broker Communication: the publisher sends /broker/pub/start/<pub_id>/<config> and receives an ACK; the broker then sends the long-polling Interest /publisher/data/<seq#>; when new data is published, the broker follows with /publisher/data/<new seq#>
over time. The publisher satisfies this Interest packet when it has new
information to publish. The broker then receives that information and
immediately sends the next Interest packet. The Interest packets at the
publisher may expire without any new data; expiration means that if the
publisher does not generate new data for a while, it cannot push data to the
broker. This is not a problem, however, since the publisher periodically
re-initiates the "start" process discussed above and receives another Interest
packet from the broker. It must be noted that the sequence number is a choice
made by the publisher.
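As a sketch, the names above can be built and parsed as plain strings. Encoding the configuration as a percent-encoded JSON name component is an assumption made for this illustration; NDN itself encodes name components in a TLV format, and the publisher name is hypothetical:

```python
# Minimal sketch of the publisher-broker naming convention described above.
import json
from urllib.parse import quote, unquote

def start_interest_name(broker, publisher, config):
    """Name of the publisher's 'start' Interest: broker prefix, action,
    publisher name, and the encoded configuration as the last component."""
    return f"{broker}/pub/start{publisher}/{quote(json.dumps(config), safe='')}"

def data_interest_name(publisher, seq):
    """Name of the broker's long-polling data Interest."""
    return f"{publisher}/data/{seq}"

def parse_start(name, broker):
    """Broker side: recover the publisher name and its configuration."""
    rest = name[len(broker + "/pub/start"):]
    path, _, encoded = rest.rpartition("/")
    return path, json.loads(unquote(encoded))

name = start_interest_name("/broker", "/toronto/sensors/cam1", {"seq": 1042})
pub, cfg = parse_start(name, "/broker")
```

The broker would then long-poll with `data_interest_name(pub, cfg["seq"])`, incrementing the sequence number as data arrives.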
3.1.2 Subscriber-Broker Exchange
A similar data exchange happens on the subscriber side, as shown in Fig. 3.3. On
start, the subscriber sends its alive status to the broker. The name of this
Interest packet consists of the name of the broker, the action /sub/start, the
full path of the subscriber, and the subscriber's specific configuration. The
broker acknowledges this action and responds with a set of configurations.
Figure 3.3: Subscriber-Broker Communication: the subscriber sends /broker/sub/start/<sub_id>/<config> and receives an ACK; it then long-polls with /broker/sub/data/<sub_id>/<seq#>; on a match, the broker returns the notification /<sub_id>/match/<data_name>, and the subscriber sends /<data_name> to retrieve the Data
Then the subscriber sends an Interest packet for data notification. The name of
this Interest consists of the name of the broker, the action /sub/data, the path
of the subscriber, and the sequence number of the data the subscriber has
already received. When the broker acquires data from a publisher that matches
what the subscriber has requested, the name of that data is sent to the
subscriber. The subscriber then uses that name to claim the data. After
receiving the data, the subscriber sends the next data request.
Subscribers are required to send these Interest packets periodically. The
periodic Interest acts as a heartbeat for the subscriber and provides more
flexibility for the broker; for example, the broker can pause data matching for
a subscriber if the heartbeat stops. It is also possible to register a
subscriber without the heartbeat. In that case, the subscriber provides a
callback name in the registration process, and the broker sends an Interest,
similar to the publisher's start Interest, to notify the subscriber of the data.
The subscriber then sends the data-request Interest accordingly.
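Mirroring the publisher side, the subscriber-facing names can be sketched as simple string templates; the helper names and identifiers below are illustrative, with the component layout following Fig. 3.3:

```python
# Minimal sketch of the subscriber-broker naming convention described above.
def heartbeat_name(broker, sub_id, seq):
    """The subscriber's periodic long-poll / heartbeat Interest."""
    return f"{broker}/sub/data{sub_id}/{seq}"

def match_notification_name(sub_id, data_name):
    """Name the broker uses to notify the subscriber of matched data."""
    return f"{sub_id}/match{data_name}"

def matched_data_name(notification, sub_id):
    """Subscriber side: extract the data name, then fetch it directly."""
    return notification[len(sub_id + "/match"):]

n = match_notification_name("/apps/portal", "/toronto/sensors/cam1/data/1042")
matched_data_name(n, "/apps/portal")  # the name the subscriber then requests
```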
3.1.3 Discussion
In this section, we discuss the reasoning behind our design and its benefits in
different scenarios.
Simple is better than complex
One of the advantages of the presented naming design is that it simplifies the
architecture of the broker and minimizes the state that must be kept in the
system. For example, in publisher-broker communication, a publisher either sends
out the "start" Interest to notify the broker or satisfies the Interests from
the broker, while the broker only needs access to the latest data it has
received from the publisher.
The same reasoning applies on the subscriber side. Through the heartbeat, the
subscriber becomes responsible for keeping its status alive for as long as it is
interested in the data; the broker can be configured to stop data matching for a
subscriber whose heartbeat signal is absent.
Mobility
One of the main advantages provided by the IDD layer is mobility support. The
IDD layer uses a point-to-point communication protocol, unlike TCP, which is an
end-to-end protocol. Nodes communicate by knowing the name of the data they are
interested in, and the network routes that data toward the destination. A mobile
publisher therefore continues to receive Interest packets from the broker, and
the broker continues to receive the data, without the need to re-initiate the
communication.
If the network becomes partitioned, the publisher will not receive Interest
packets from the broker. When new data becomes available, the publisher goes
through the "start" process, since there is no pending Interest from the broker
on the publisher's side. Based on the publisher's configuration, this can be
repeated indefinitely. The publisher can also be configured to store the
historical data locally. When the link is back up, the broker is notified of the
existence of new data and sends the Interest for it. The sequence number in the
Interest identifies the latest information the broker has received. Depending on
its configuration, the publisher may have stored the historical data, in which
case it sends it to the broker; otherwise, if the historical data is not needed,
only the latest available data is forwarded. For example, the history of a live
video stream is not saved, whereas the log of a traffic sensor is saved and, if
asked for by the broker, sent over. The historical data can be purged when the
broker sends a new Interest packet with a new sequence number acknowledging
receipt of the data. This scenario also covers the case where the Interest
packet sent from the broker to the publisher is dropped.
If the data packet from the publisher is dropped, the broker will not send a new
Interest packet to the publisher, and there will be no pending Interest on the
publisher's side for the next data. To resend the data, the publisher enters the
"start" process after a deadline and notifies the broker of the existence of the
data. The deadline is configurable and is on the scale of the round-trip time.
On the subscriber side, the heartbeat is received by the broker even if the
subscriber is mobile. The Interest packet acts as a breadcrumb for the Data
packet: data takes the reverse path of the Interest packet until it reaches the
subscriber. If the subscriber has moved from its original place, it resends
another heartbeat Interest, which is satisfied either by one of the upstream
routers, thanks to in-network caching, or by the broker itself.
The heartbeat also resolves the network-partitioning scenario. During a
partition, the subscriber receives no new data, so its heartbeat Interest
carries an old sequence number. Once the partition heals, the broker receives a
heartbeat with that old sequence number. If the broker has been configured to
store historical data for the subscriber, the names of the historical data are
sent to the subscriber; otherwise, the name of the newest data is sent over. The
same process applies if the heartbeat Interest or the Data packet is dropped
along the way.
As presented earlier, the broker sends only the name of the matched data; the
subscriber is responsible for retrieving it, which follows the receiver-driven
philosophy of the ICN paradigm.
Better Infrastructure Utilization
Two important features of the IDD layer are multicast and in-network caching.
Routers store all incoming Interests in a Pending Interest Table and, upon
receiving the data, satisfy all Interest packets for the same name at once. They
also use their storage to save content in a Content Store to satisfy future
Interest packets. As presented earlier, the broker sends only the name of the
data to the subscribers, and the subscribers request that data separately. The
broker can satisfy the Interest packets of all subscribers with the same Data
packet, i.e., the broker sends out the data only once and all subscribers
receive it.
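A minimal sketch of the Pending Interest Table behavior described above: duplicate Interests for the same name are aggregated, only the first is forwarded upstream, and one Data packet satisfies every waiting face while also populating the content store. Face identifiers and data values are illustrative:

```python
from collections import defaultdict

class Router:
    def __init__(self):
        self.pit = defaultdict(list)   # name -> list of requesting faces
        self.content_store = {}        # name -> cached data
        self.upstream_sent = []        # names actually forwarded upstream

    def on_interest(self, name, face):
        if name in self.content_store:          # cache hit: answer locally
            return [(face, self.content_store[name])]
        first = name not in self.pit            # aggregate duplicate Interests
        self.pit[name].append(face)
        if first:
            self.upstream_sent.append(name)     # forward only the first one
        return []

    def on_data(self, name, data):
        self.content_store[name] = data         # opportunistic caching
        faces = self.pit.pop(name, [])
        return [(f, data) for f in faces]       # satisfy all faces at once

r = Router()
r.on_interest("/cvst/traffic/42", face="A")
r.on_interest("/cvst/traffic/42", face="B")     # aggregated, not re-forwarded
delivered = r.on_data("/cvst/traffic/42", b"...")
```

A later Interest for the same name is answered straight from the content store without touching the upstream link.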
Scalability
Multiple broker instances may receive the published data for the system. This
scalability is achieved by using the same name for all of them: the best-route
forwarding strategy sends the Interest packets toward the lowest-cost next hop,
and since the sequence number is set by the publisher and increases over time,
one of the broker instances will receive the data.
To handle scalability at the broker, we use the fact that, in our design, the
subscriber receives the name of the data from the broker and sends another
Interest packet to retrieve the matched data. The power to forward subscribers
to the right place is therefore in the hands of the broker. The best-route
strategy can also help forward the Interest packets for the matched data to the
proper server. For example, the broker can forward the subscriber to the
publisher itself and avoid storing the data, relying on the fact that data names
are unique and the network is responsible for retrieving the data.
Security
We must address two problems in the security domain. First, the broker must
accept data only from known publishers; no one should be able to inject data
into the system. Second, no one should be able to access the published data
without authorization.
Every Interest and Data packet is signed to ensure the provenance and integrity
of the data. In the CVST platform, the first problem is solved by having the
broker act as the trusted key-management system. The broker issues certificates
for publishers and subscribers in the registration phase; these certificates are
used to sign the Interest and Data packets. Through the broker, everyone has
access to each other's public key and can validate the provenance of the data.
To solve the second problem, we use a shared-key encryption algorithm. The
broker issues the shared key and provides it to the publishers and subscribers.
Since the broker can validate the identity of the publishers and subscribers,
only authorized users have access to the key. The publishers encrypt their data,
and the broker and subscribers decrypt it with the key.
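The platform signs packets with certificates issued by the broker; as a self-contained sketch of verifying a packet's provenance and integrity, the example below substitutes a shared-secret HMAC for the certificate-based signature (Python's standard library has no public-key primitives). The key and packet names are illustrative:

```python
import hashlib
import hmac

def sign_packet(key: bytes, name: str, payload: bytes) -> bytes:
    """Tag covering both the name and the payload of a packet."""
    return hmac.new(key, name.encode() + b"\x00" + payload, hashlib.sha256).digest()

def verify_packet(key: bytes, name: str, payload: bytes, tag: bytes) -> bool:
    """Constant-time check that name and payload are unmodified."""
    return hmac.compare_digest(sign_packet(key, name, payload), tag)

key = b"issued-by-broker-at-registration"   # illustrative shared secret
tag = sign_packet(key, "/publisher/data/1042", b"reading")
verify_packet(key, "/publisher/data/1042", b"reading", tag)    # accepted
verify_packet(key, "/publisher/data/1042", b"tampered", tag)   # rejected
```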
3.2 Broker Architecture
CVST collects data from many types of producers; however, consumers are
typically interested in only a portion of the published data. Therefore, CVST
uses a content-based publish/subscribe paradigm. For example, the central
database of the CVST platform is a subscriber that receives all new data updates
from all data sources. In another example, a drone incident is a content type
generated and disseminated in the platform, and a subscriber may be interested
only in drone incidents in a particular area. Fig. 3.4 shows the high-level
architecture of the publish/subscribe system in the CVST platform. The system
consists of publishers, subscribers, and the broker, which communicate over a
communication layer.
Figure 3.4: High-level architecture of content-based publish/subscribe over IDD in CVST
Micro-service Abstraction
Using micro-services is an approach to software system design in which the
system is structured into smaller individual service units. Each service runs as
an independent process and communicates with other services through APIs.
To abstract the implementation of different communication protocols, the broker
is divided into three services: XPUB, XSUB, and Matcher. Fig. 3.5 shows how
these services communicate with each other using a message queuing system.
API              Type  Description
/register        XPUB  Register a schema in the broker for publication
/unregister      XPUB  Remove the schema from the broker
/publish         XPUB  Publish new data based on a registered schema
/subscribe       XSUB  Register a query in the broker for subscription
/unsubscribe     XSUB  Remove a registered query from the broker
/schemas         XSUB  Request the list of registered schemas
/schemas/<:id>   XSUB  Request a specific schema using its id
Table 3.1: The APIs exposed by XPUB and XSUB services
XPUB is responsible for communicating with the publishers and XSUB with the
subscribers. XPUB and XSUB hide the complexity of the different communication
layers from the Matcher and make it possible to add more protocols without
affecting other parts of the system. They define a set of APIs that publishers
and subscribers use to talk to the system. They also hide the complexity of the
Matcher from the publishers and the subscribers, which makes it possible to
improve or replace the Matcher without affecting current publishers and
subscribers. XPUB and XSUB support multiple protocols; for each protocol, a
separate instance of XPUB or XSUB is started. For example, an instance of XPUB
may listen on any address, such as "ndn:/broker/xpub", "tcp://0.0.0.0:4040", or
"http://broker:8080/broker/xpub", as long as XPUB or XSUB has the protocol
implemented.
Figure 3.5: Design of the Broker: abstraction of the complexity of different system components
Schema Registration
Each publisher must first register itself with the system. Registering a
publisher means the publisher must provide a schema for its data, which the
broker later uses to verify the structure of the incoming data from that
publisher. The schema is also used by subscribers to define the criteria of
their subscriptions. The registration information of the publisher is saved in
the Publication Table. The publisher may also provide additional configuration
during registration.
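Broker-side schema verification can be sketched as follows, assuming a simplified field-to-type schema representation for illustration (the actual system expresses schemas in Apache Avro, Section 3.3); the publisher identifier and field names are hypothetical:

```python
# Map the schema's declared type names to Python types for checking.
TYPES = {"string": str, "int": int, "float": float, "bool": bool}

publication_table = {}   # publisher id -> registered schema

def register(publisher_id, schema):
    """Store the publisher's schema in the Publication Table."""
    publication_table[publisher_id] = schema

def conforms(publisher_id, data):
    """Verify that incoming data matches the registered schema exactly:
    same field names, and each value of the declared type."""
    schema = publication_table.get(publisher_id)
    if schema is None or set(data) != set(schema):
        return False
    return all(isinstance(data[f], TYPES[t]) for f, t in schema.items())

register("/toronto/sensors/cam1", {"road": "string", "speed": "float"})
conforms("/toronto/sensors/cam1", {"road": "DVP", "speed": 42.0})    # accepted
conforms("/toronto/sensors/cam1", {"road": "DVP", "speed": "fast"})  # rejected
```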
Subscriptions
For subscribers, registration involves providing a query based on the data
schemas that publishers have registered in the system. Subscribers also provide
callback paths, which the broker uses to send notifications about newly
published data that match the registered queries.
Figure 3.6: Sequence diagram of the content-based publish/subscribe system: registration, publication, subscription, match notification, and data retrieval
Message Queuing
All communication between the components of the broker is facilitated by a
message queuing system. Using message queuing decouples the different components
of the system and facilitates distributing them among various machines. The
message queuing system itself is a cluster of nodes that acts as one logical
system from the point of view of the other components.
Matching Engine
The incoming data from a publisher is sent to XPUB, which puts the data in a
queue; the data is then picked up by one of the Matcher's workers. First, the
matching engine checks, based on the Publication Table, whether the data
conforms to the schema provided by the publisher; if not, the data is rejected.
The data is then matched against the Subscription Table. If a match is found,
the data is put back into another queue together with additional data from the
subscription. The matched data is picked up from that queue by one of the XSUB
instances, which sends notifications to the subscribers. The matched data is
stored for later retrieval by the subscribers. The sequence diagram of
publication, matching, and data retrieval is depicted in Fig. 3.6.
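The matching step is a reverse query: each subscription stores a predicate, and each incoming datum is evaluated against all of them. A minimal sketch with an illustrative predicate format follows (the real system delegates this to Elasticsearch's Percolator, Section 3.3); the subscriber names and fields are hypothetical:

```python
subscription_table = {}   # subscriber id -> predicate over the data fields

def subscribe(sub_id, predicate):
    """Register a subscriber's query in the Subscription Table."""
    subscription_table[sub_id] = predicate

def match(data):
    """Reverse query: return the subscribers whose predicate the incoming
    data satisfies."""
    return [s for s, pred in subscription_table.items() if pred(data)]

subscribe("/apps/congestion", lambda d: d["speed"] < 20.0)
subscribe("/apps/drones", lambda d: d.get("type") == "drone_incident")

match({"road": "DVP", "speed": 12.5})   # only the congestion app matches
```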
3.2.1 Discussion
In this section, we discuss the advantages of abstracting and breaking down the
broker functionality across multiple micro-services.
Agility
By having separate services for different tasks, development can focus on
individual components independently. Each service can be updated or even
replaced without affecting the other parts of the system, as long as there is an
instance in the system that provides compatible APIs. For example, the matching
engine can be updated or even replaced independently of XPUB and XSUB.
Efficiency
Another advantage of the micro-service design is more efficient use of the
underlying infrastructure. As discussed in Section 2.2, CVST runs on top of an
IaaS layer that can provide resources on demand. Therefore, each service can
independently request only the resources it requires, which increases
efficiency.
Figure 3.7: Scalability of the Broker with the micro-service design
Scalability
The micro-service design provides solutions that scale well under high traffic
demand. Although the broker is logically one node, high traffic load is
distributed among the different system components. Fig. 3.7 shows how the system
may run in a distributed way: each part of the system runs as a set of
instances, glued together by the message queuing system.
For example, an XPUB may consist of multiple instances behind a load balancer,
listening on a particular network address. One of these instances receives the
data from a publisher and sends it to the message queuing system. Message
queuing is itself a cluster of nodes; XPUB workers can connect to any of the
message queuing nodes and store the data in the queue, and the message queuing
system then notifies the Matcher workers about the new data. The data is
replicated on the other nodes to protect the system against failure.
One of the available workers of the matching system picks up the new data from
the queue and matches it against the subscription queries. The matching engine
is also a cluster of nodes, and every node in that cluster can match data
against registered queries. The Matcher workers may connect to any node in the
matching-engine cluster to do the matching, and if a subscription matches, the
worker stores the data back in the message queuing system for the XSUB
instances.
Similarly, XSUB load is distributed among its instances by the message queuing
system. One of the XSUB workers picks up the data and notifies the subscribers
about the new match. XPUB and XSUB do not keep any state about their clients and
expose a set of RESTful APIs. The clients likewise keep no state about the
service instance with which they are communicating. Furthermore, at the network
level, using IDD ensures that no connections are made between clients and
specific instances.
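The stateless-worker pattern described above can be sketched with an in-process queue standing in for the message queuing cluster; the real system uses RabbitMQ (Section 3.3), and the worker count and payloads here are illustrative:

```python
import queue
import threading

work = queue.Queue()      # stands in for the message queuing cluster
results = queue.Queue()

def matcher_worker():
    """Stateless worker: pull an item, 'match' it, push the result back."""
    while True:
        item = work.get()
        if item is None:              # poison pill: shut this worker down
            work.task_done()
            break
        results.put(("matched", item))
        work.task_done()

workers = [threading.Thread(target=matcher_worker) for _ in range(3)]
for w in workers:
    w.start()

for i in range(10):                   # any worker may pick up any item
    work.put(i)
for _ in workers:                     # one shutdown signal per worker
    work.put(None)
work.join()
for w in workers:
    w.join()

matched = sorted(results.get() for _ in range(results.qsize()))
```

Because no worker holds per-client state, instances can be added or removed freely; the queue alone decides which worker handles which item.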
3.3 Implementation
In this section, we review some implementation details of different components
of the content-based publish/subscribe overlay.
3.3.1 Broker Implementation
In this section, we review the implementation details of the different broker
components. As depicted in Fig. 3.5, the broker has three main components:
Matcher, XPUB, and XSUB. They communicate with each other using a message
queuing system, while XPUB and XSUB communicate with the outside world through
the communication layer.
Message Queuing
For message queuing, we use RabbitMQ [56], an open-source messaging system that
is robust and easy to use. It runs on all major operating systems, supports many
development platforms, and provides clustering and high-availability features.
Matching Engine
The Matching Engine is responsible for checking if the incoming data satis-
fies the set of constraints defined by the queries registered by the subscribers.
Running Queries is one the main features of any database engine. However,
matching data against a query, i.e. a reverse query, is not a standard feature.
To have a fast, distributed and reliable Matching engine, we have chosen Elas-
Chapter 3. Data Dissemination in CVST 62
ticsearch [57]. Elasticsearch is a distributed search engine that provides the
reverse query capability known as Percolator [58]. We also used Elasticsearch
to store Publication and Subscription tables. When a publisher registers its
schema or a subscriber registers its query, the corresponding data will be saved
in Elasticsearch for later retrieval.
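The reverse-query idea can be illustrated with a minimal in-memory sketch. This is not the CVST code or the Elasticsearch implementation; it only mimics the percolation semantics for the two clause types our subscriptions use, with illustrative names:

```python
# Minimal in-memory "reverse query" (percolation) sketch.
# Supports the two clause types used in our subscriptions:
# "match" (substring match on a field) and "range" (lte/gte bounds).

def clause_matches(clause, doc):
    kind, body = next(iter(clause.items()))
    if kind == "match":
        field, term = next(iter(body.items()))
        return str(term) in str(doc.get(field, ""))
    if kind == "range":
        field, bounds = next(iter(body.items()))
        value = doc.get(field)
        if value is None:
            return False
        return (("lte" not in bounds or value <= bounds["lte"]) and
                ("gte" not in bounds or value >= bounds["gte"]))
    return False

def percolate(subscriptions, doc):
    """Return the ids of all registered queries that the document satisfies."""
    matched = []
    for sub_id, query in subscriptions.items():
        must = query["bool"]["must"]
        if all(clause_matches(c, doc) for c in must):
            matched.append(sub_id)
    return matched

subscriptions = {
    "slow-401": {"bool": {"must": [
        {"match": {"main_road_name": "401"}},
        {"range": {"avg_speed_capped": {"lte": 60}}},
    ]}},
}
reading = {"main_road_name": "HWY-401 Express", "avg_speed_capped": 42.0}
print(percolate(subscriptions, reading))  # ['slow-401']
```

Elasticsearch's Percolator performs the same inversion at scale: queries are indexed as documents, and each publication is run against the stored queries.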
Matcher, XPUB and XSUB
Matcher, XPUB and XSUB are implemented in Python. As discussed
in Section 3.2.1, each component runs as a set of independent processes that do
the same job in parallel. For example, the workers of Matcher pick up the data
from RabbitMQ and match them against the subscriptions using Elasticsearch
(Fig. 3.4) and then put the result back in another RabbitMQ queue.
3.3.2 Communication Layer
The system supports two communication protocols: IDD and HTTP. IDD
communication is based on the design discussed in Section 3.1. The HTTP
APIs provide a similar URL syntax as the IDD layer. Publishers and Sub-
scribers have the option to choose either of these protocols to communicate
with the broker.
Data Serialization
We use Apache Avro [59] as the data serialization system. Apache Avro is a
sub-project of Apache Hadoop [60]; it uses a compact binary data format
with rich data structures and integrates with many development platforms,
{
  "namespace": "ca.cvst.broker",
  "type": "record",
  "name": "xpub",
  "fields": [
    {"name": "publisher_schema", "type": "string"},
    {"name": "data", "type": "bytes"}
  ]
}

Figure 3.8: Apache Avro schema used in XPUB-Matcher communication
{
  "namespace": "ca.cvst.broker",
  "type": "record",
  "name": "xsub",
  "fields": [
    {"name": "subscribers", "type": "map", "values":
      {"type": "array", "items": "string"}
    },
    {"name": "data", "type": "bytes"}
  ]
}

Figure 3.9: Avro schema used in Matcher-XSUB communication
without the need for code generation. Apache Avro relies on a data schema to
read and write the data. It also supports schema exchange in a connection
handshake. An Apache Avro schema is a JSON (JavaScript Object Notation) document.
For each type of published data, every part of the system, such as pub-
lishers, subscribers, and the broker will use the same schema to encode and
decode the data. Using one schema throughout the system for each data source
ensures the consistency of the data everywhere. For example, if the publisher
uses an unknown, invalid or tampered schema, the broker will fail to verify the
data and will drop it. In addition, Apache Avro provides the
capability of schema evolution without disruption of the system functionality.
In other words, if the data schema changes, each component can still use the
old schema until the new schema is propagated to every component in the
system.
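The single-schema discipline described above can be sketched as follows. Plain JSON stands in for Avro's compact binary encoding, and the registry and verification logic are illustrative, not the actual broker code:

```python
import json

# Registry mapping fully qualified schema names to expected field types.
# JSON stands in for Avro's binary encoding in this sketch.
REGISTRY = {
    "ca.cvst.schemas.hw_sensor": {
        "fields": {"main_road_id": str, "avg_speed_capped": float,
                   "timestamp": int},
    },
}

def encode(schema_name, record):
    if schema_name not in REGISTRY:
        raise ValueError("unknown schema: " + schema_name)
    return json.dumps(record).encode()

def decode(schema_name, payload):
    """Verify the payload against the registered schema; drop it otherwise."""
    schema = REGISTRY.get(schema_name)
    if schema is None:
        return None  # unknown schema: broker drops the data
    record = json.loads(payload)
    for field, ftype in schema["fields"].items():
        if not isinstance(record.get(field), ftype):
            return None  # invalid or tampered data: broker drops it
    return record

blob = encode("ca.cvst.schemas.hw_sensor",
              {"main_road_id": "C09-00069", "avg_speed_capped": 51.15,
               "timestamp": 1463118302})
print(decode("ca.cvst.schemas.hw_sensor", blob) is not None)  # True
print(decode("ca.cvst.schemas.unknown", blob))                # None
```

Every component resolves the same schema name to the same definition, which is what guarantees consistency end to end.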
XPUB-Matcher Communication
XPUB workers listen on one or multiple addresses for new data publications.
The publications are received from the publishers in a binary format, seri-
alized by Apache Avro. XPUB then adds the schema name of the publisher
and stores the data in the queue for Matcher. Fig. 3.8 shows the
schema used for XPUB-Matcher communication. The schema has two fields.
“publisher_schema” is the name of the publisher, such as “TTC”, and “data”
is binary data encoded by Apache Avro.
Matcher-XSUB Communication
If the published data matches any of the subscriptions, Matcher will put the
data back in the queue so XSUB workers can notify the subscribers. We use
“headers” exchange in RabbitMQ to send data to multiple XSUBs only once.
In addition to the published data, Matcher will also include the list of the
subscribers that each XSUB must notify. Then the data is sent to the sub-
scriber. The subscribers choose the communication protocol in the registration
process. Fig. 3.9 shows the Apache Avro schema used in the Matcher-XSUB
communication. The field, “subscribers”, is the list of subscribers callback
addresses that the XSUB must notify about the new “data”.
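The routing decision of a “headers” exchange can be modeled in a few lines. This in-memory sketch (illustrative names; the real system uses the RabbitMQ client) shows how one published message is delivered once to every queue whose binding headers match:

```python
# In-memory model of a RabbitMQ "headers" exchange routing decision.
# Each binding carries headers plus an "x-match" mode: "all" or "any".

def route(bindings, message_headers):
    """Return the queues that should receive one copy of the message."""
    receivers = []
    for queue, binding in bindings.items():
        mode = binding.get("x-match", "all")
        pairs = [(k, v) for k, v in binding.items() if k != "x-match"]
        hits = [message_headers.get(k) == v for k, v in pairs]
        if (mode == "all" and all(hits)) or (mode == "any" and any(hits)):
            receivers.append(queue)
    return receivers

bindings = {
    "xsub-1": {"x-match": "all", "shard": "a"},
    "xsub-2": {"x-match": "all", "shard": "b"},
    "audit":  {"x-match": "any", "shard": "a", "region": "east"},
}
print(route(bindings, {"shard": "a"}))  # ['xsub-1', 'audit']
```

The broker publishes once; the exchange fans the message out, so each XSUB instance receives the matched data exactly once.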
{
  "main_road_id": "C09-00069",
  "main_road_name": "HWY-2",
  "ref_road_ID": 4308,
  "ref_road_name": "Kingsway",
  "length": 2.14527,
  "JAM_FACTOR": 1,
  "avg_speed_capped": 51.15,
  "avg_speed_uncapped": 51.15,
  "free_flow_speed": 55.92,
  "confidence": 0.92,
  "timestamp": 1463118302
}

Figure 3.10: Sample data gathered from traffic sensors
{
  "namespace": "ca.cvst.schemas",
  "type": "record",
  "name": "hw_sensor",
  "fields": [
    {"name": "main_road_id", "type": "string"},
    {"name": "main_road_name", "type": "string"},
    {"name": "confidence", "type": "float"},
    {"name": "ref_road_ID", "type": "int"},
    {"name": "ref_road_name", "type": "string"},
    {"name": "length", "type": "float"},
    {"name": "JAM_FACTOR", "type": "int"},
    {"name": "avg_speed_capped", "type": "float"},
    {"name": "avg_speed_uncapped", "type": "float"},
    {"name": "free_flow_speed", "type": "float"},
    {"name": "timestamp", "type": "long"}
  ]
}

Figure 3.11: Schema of the traffic sensor data
3.4 Examples
In this section, we review some examples of data published using our content-
based publish/subscribe system. We review three publications: traffic flow
sensors, public transportation, and live video feed of drone flights. We also
review our subscription portal, which can be used to create subscription queries
and receive publication data in real-time.
{
  "bool": {
    "must": [
      {"match": {"main_road_name": "HWY"}},
      {"match": {"main_road_name": "401"}},
      {"match": {"main_road_name": "Express"}},
      {"range": {"avg_speed_capped": {"lte": 60}}}
    ]
  }
}

Figure 3.12: Sample subscription for traffic sensor data
{
  "match_all": {}
}

Figure 3.13: A match all query
3.4.1 Traffic Flow Sensors
CVST collects data from the traffic flow sensors installed on the roads of the
city of Toronto. A publisher receives the raw data from a live feed, and before
publication, parses, validates and cleans them. Fig. 3.10 lists sample data
received from traffic sensors.
main_road_id is a unique string for the main road the sensor covers.
main_road_name is a text description of the road. ref_road_ID is a unique
identifier for the location of the sensor. ref_road_name is the text description
of the location of the sensor. length is the length of the road that is covered
by the sensor in kilometers. JAM_FACTOR is a number between 0 and 10 and
indicates the expected quality of travel. As the number approaches ten, the
quality of travel gets worse. For example, when there is a road closure, the
Jam Factor will be 10. avg_speed_capped is the average speed of the
road in km/h capped by the speed limit. avg_speed_uncapped is the average
Figure 3.14: Data of traffic sensors on the CVST portal
speed of the road in km/h not capped by the speed limit. free_flow_speed
is the free flow speed on this part of the road. confidence is an indication
of how the speed was determined and is usually a value between 0.7 and 1.0.
If the road is closed, the value is -1. timestamp is a Unix epoch time that
indicates when the data was generated.
The publisher will use the Apache Avro schema listed in Fig. 3.11 for
data serialization. Each schema has a namespace and a name. The type
of the schema is always “record”. Each schema defines a series of fields
that map directly to the corresponding fields in the data. As shown in
Fig. 3.11, namespace is set to ca.cvst.schemas, and the name of the schema
is hw_sensor. Therefore, the fully qualified name of the schema is
ca.cvst.schemas.hw_sensor. For each data field in Fig. 3.10 there
is a field in the schema. For example, main_road_id is defined as a field with
{
  "vehicle_id": 1007,
  "coordinates": [
    -79.50425,
    43.779148
  ],
  "routeNumber": "41",
  "route_name": "41-Keele",
  "dirTag": "41_0_41A",
  "heading": "216",
  "predictable": true,
  "GPStime": 1463363577,
  "last_update": "Mon, 16 May 2016 01:53:00 -0000",
  "timestamp": 1463363581,
  "dateTime": "Mon, 16 May 2016 01:53:01 -0000"
}

Figure 3.15: Sample data gathered from public transit vehicles
the type of string, and JAM_FACTOR is defined as a field with the type of int.
A subscriber can define a query based on the schema in Fig. 3.11. Fig. 3.12
lists a sample query that asks the broker to send the data of the sensors on
HWY 401 Express that report a speed less than or equal to 60 km/h. Here, the
match against HWY 401 Express is defined as a combination of three smaller
match conditions. In addition to the road name, a range condition is defined
for the avg_speed_capped. All of these conditions are wrapped in a must
clause, which acts as a logical AND operator.
A subscriber can receive all the data published by a publisher by registering
a match_all query as listed in Fig. 3.13. The central database in the CVST
platform (Fig. 3.1) is one of the subscribers that receives all the data and makes
them available for processing by other parts of the system, such as the
analytics engine. The portal server is another subscriber to the data and notifies the
web clients of the changes in the data, and the web clients will update their
interface accordingly. Fig. 3.14 shows the presentation of the traffic sensor
 1 {
 2   "namespace": "ca.cvst.schemas",
 3   "type": "record",
 4   "name": "ttc",
 5   "fields": [
 6     {"name": "vehicle_id", "type": "int"},
 7     {"name": "coordinates", "data_type": "geo_point",
 8       "type": {
 9         "type": "array", "items": "double"}
10     },
11     {"name": "routeNumber", "type": "string"},
12     {"name": "route_name", "type": "string"},
13     {"name": "dirTag", "type": "string"},
14     {"name": "heading", "type": "string"},
15     {"name": "predictable", "type": "boolean"},
16     {"name": "GPStime", "type": "long"},
17     {"name": "last_update", "type": "string"},
18     {"name": "timestamp", "type": "long"},
19     {"name": "dateTime", "type": "string"}
20   ]
21 }

Figure 3.16: Schema for Toronto Public Transit data
data on the CVST portal.
3.4.2 Public Transportation
Another source of data in CVST is the real-time information of the Toronto
Public Transit fleet. Fig. 3.15 shows sample data reported by public transit
vehicles and Fig. 3.16 shows the schema for that data. vehicle_id is a
unique id for the vehicle that has reported the data. coordinates is an array
of numbers that represent the current longitude and latitude of the vehicle.
routeNumber is a unique id for the route that the vehicle is operating on.
route_name is the text description of the route. dirTag provides more infor-
mation about the route. heading specifies the heading of the vehicle in degrees
and is between 0 and 360. A negative value indicates that the heading is not
currently available. predictable specifies whether the vehicle’s location is
{
  "bool": {
    "must": {
      "match": {
        "routeNumber": 41
      }
    },
    "filter": {
      "geo_distance_range": {
        "from": "50m",
        "to": "1km",
        "pin.location": {
          "lat": 43.779148,
          "lon": -79.50425
        }
      }
    }
  }
}

Figure 3.17: A sample geo distance query for public transportation data
currently predictable. GPStime specifies the time reported by the GPS unit installed on the
vehicle. last_update specifies the last time that the vehicle has reported its
position. timestamp specifies the Unix time epoch of the report. dateTime is
a text representation of timestamp.
Notice that coordinates (Line 7 in Fig. 3.16) has an extra attribute
data_type. This extra attribute provides more information for the match-
ing engine about the nature of the data and adds the capability for the sub-
scriber to define specific queries. For example, coordinates is defined as a
geo_point. Therefore, a subscriber can define a geo-distance query by provid-
ing a coordinate and a distance from that coordinate, and the matching engine
will calculate whether the data point falls in the specified area. These data-specific
queries are only possible if the matching engine knows in advance that the data is
a geo_point. Fig. 3.17 shows a sample geo-distance query that asks for the
data of all the vehicles of route number 41 when the distance of the vehicles
[Block diagram: an octorotor with an HD camera and telemetry radio links to a ground station system (digital video receiver, video processing, location processing, publisher), which feeds the CVST platform (broker, database server, video recording engine and storage, portal server, portal subscriber, clients).]

Figure 3.18: Publishing Drone Data
to a particular location is between 50 m and 1 km.
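The geo-distance check behind such a query can be approximated with the haversine formula. This sketch is illustrative and is not how Elasticsearch computes it internally:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2 +
         math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def in_distance_range(doc_lat, doc_lon, pin_lat, pin_lon, lo_m, hi_m):
    """Decide whether a reported coordinate falls in the subscribed band."""
    d = haversine_m(doc_lat, doc_lon, pin_lat, pin_lon)
    return lo_m <= d <= hi_m

# Pin from Fig. 3.17; a vehicle roughly 700 m east of the pin.
pin = (43.779148, -79.50425)
print(in_distance_range(43.779148, -79.4955, *pin, 50, 1000))  # True
```

The matching engine applies this predicate only because the schema declared coordinates as a geo_point, which is the point of the extra data_type attribute.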
3.4.3 Drone Vision as a Service
Road traffic information is typically gathered from sources such as loop
detectors, radar detectors, traditional CCTV and infrared cameras, and mobile
probes employing technologies such as GPS. Installation of these sources is
costly, so they are installed only on the main roads and intersections with
high traffic. Some sources, such as highway cameras, have other limitations as
well. For example, the maximum installation height of a camera is limited.
Moreover, most of these sources are immobile, so they are of little use when
traffic patterns in the city change.
Visual analytics play an important role by offering immediate surveillance
in small and large cities. However, current monitoring systems are spatially
1 {2 "_id": "1",3 "ect": "Tue, 24 Nov 2015 12:53:33 -0000",4 "geojson": {5 "features": [6 {7 "geometry": {8 "coordinates": [9 -79.3344972,
10 43.70273211 ],12 "type": "Point"13 },14 "properties": {15 "name": "Center"16 },17 "type": "Feature"18 },19 ],20 "type": "FeatureCollection"21 },22 "timestamp": 1448369613,23 "video": {24 "src": "/api/drone_camera/1/video",25 "type": "video/mp4"26 }27 }
Figure 3.19: Sample Drone Data
blind. For example, they can only provide road condition and visual cover-
age at discrete locations with a limited number of traffic cameras and data
sensing devices that are not sufficiently dense to provide on-demand immedi-
ate visual surveillance. Unmanned Autonomous Vehicles (UAVs), or drones, are
applicable in multiple smart city domains, including transportation, construc-
tion, agriculture, etc. In this section, we discuss how our platform offers an
infrastructure for Vision as a Service (VaaS) using UAVs.
UAVs are good candidates to help gather real-time information with a bet-
ter view and lower cost than the sensors currently in use. However, they need
a platform that can handle their mobility and provide security and real-time
data analysis. They can travel at a higher altitude and speed than vehicular
traffic towards the incident location.

Figure 3.20: Video playback of a drone flight on CVST portal

VaaS provides city planners with the ability
to allocate the required resources, i.e. drones and their associated networking
and computing resources, on demand and extends the coverage of the existing
intelligent transportation systems.
Fig. 3.18 shows the functional blocks of the system. We have deployed an
Octorotor UAV, mounted with an HD camera. The camera signal and GPS
location information are transmitted to a nearby ground station, where the
video feed is transcoded and published to the system.
First, the publisher publishes the start of the event to the broker. Then
the publisher starts collecting the video and location information from the
drone system. The drone reports its location information to the ground station
over the wireless control link, and the publisher extracts
the location information from the control software. The HD video camera
Figure 3.21: Subscription Portal: Public Transportation Query
installed on the drone sends the video over the high-bandwidth wireless link
to the ground. At the ground PC, the video is encoded to the proper size and
format and then published along with the current location of the drone to the
CVST platform as two separate publications.
The broker distributes the published data to all the subscribers. By default,
a UAV event has three subscribers:
a. The video recording system, which starts workers to record the video in
the proper format and store it in the right location.
b. The database subscriber that stores the event information, such as drone
Figure 3.22: Subscription Portal: Public Transportation Data
location updates and the URL of the live or recorded video feed.
c. The CVST portal, which hosts live streaming, playback and associated
analytics.
At the end of the event, the publisher publishes an end-of-event notification, and
subsequently the database and the portal are updated accordingly. Fig. 3.19
shows a sample data that is published during an event. It contains the location
of the event as a GeoJSON [61] document, the time of the event in timestamp
as a Unix epoch time and the current URL of the video feed. The video feed is
always available to be consumed by the web portal.
The first official live demonstration of VaaS took place in Toronto in October
2015, launching a UAV to a height of 75 meters adjacent to the Don
Valley Parkway. Fig. 3.20 shows a screenshot of the portal while playing a
live stream of a drone flight. The platform has been used to publish live video
feeds from drone flights in many demonstrations, and during these demos, the
live video feeds were available on the CVST platform. The recorded videos of
the flights can be viewed on the CVST portal [34].
Figure 3.23: Subscription Portal: Traffic Sensor Query
3.4.4 Subscription Portal
We also developed a web portal that can act as a subscriber and receive live
updates for different queries in real-time. Using the portal, users can register
an account, log in to the portal and register their queries for different pub-
lished data types. Fig. 3.21 shows the query builder interface when the user
is creating a subscription for public transport data.
Behind the scenes, the portal uses the XSUB API as discussed in Sec-
tion 3.2. The portal queries the available registered publishers in the system
and their schemas by calling the /schemas API. For example, as depicted
Figure 3.24: Subscription Portal: Traffic Sensor Data
in Fig. 3.21, two publishers are registered in the system. The field names are
populated based on the Apache Avro schemas of the publishers. Therefore,
the portal does not need any hard-coded data about the publishers to pro-
vide this functionality and can dynamically support new publishers. Any new
publisher in the system is automatically available to the users.
The portal provides a simple interface for creating subscriptions. For ex-
ample, as shown in Fig. 3.21, the user is interested in all the updates for vehicle
id 1003. Multiple conditions can be added to the query at the same time.
Similar to Fig. 3.12, the query contains a match query. The query builder also
asks for a Time To Live (TTL) for the query. After the TTL expires, the query
is removed from the system.
After submitting the query, results will be pushed to the portal as soon as
they are available. Fig. 3.22 shows the live results received from the publisher
based on the query defined above. The fields defined in the schema and their
values are presented to the user in a table. Fig. 3.23 shows the subscription portal
while the user is defining a query for the traffic sensor data. Here, the query
will receive the data of highway sensors on roads whose names contain 401.
Fig. 3.24 shows the subscription page while receiving live updates of this query.
1 FIB:
2   /xsub      nexthops={faceid=262 (cost=0)}
3   /hw_sensor nexthops={faceid=259 (cost=0)}
4   /xpub      nexthops={faceid=261 (cost=0)}

Figure 3.25: Forwarding Information Base table after XPUB, XSUB and publisher are started
 1 [Forwarder] onIncomingInterest face=261 interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 2 [ContentStore] find /xpub/publish/start/hw_sensor/%FEQ%AE L
 3 [ContentStore] no-match
 4 [Forwarder] onContentStoreMiss interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 5 [Forwarder] onOutgoingInterest face=259 interest=/xpub/publish/start/hw_sensor/%FEQ%AE
 6 [Forwarder] onIncomingInterest face=259 interest=/hw_sensor/data/%FEQ%AE
 7 [ContentStore] find /hw_sensor/data/%FEQ%AE R
 8 [ContentStore] no-match
 9 [Forwarder] onContentStoreMiss interest=/hw_sensor/data/%FEQ%AE
10 [Forwarder] onOutgoingInterest face=261 interest=/hw_sensor/data/%FEQ%AE
11 [Forwarder] onIncomingData face=261 data=/hw_sensor/data/%FEQ%AE/%FD%01/%00%00
12 [ContentStore] insert /hw_sensor/data/%FEQ%AE/%FD%01/%00%00
13 [Forwarder] onIncomingData matching=/hw_sensor/data/%FEQ%AE
14 [Forwarder] onOutgoingData face=259 data=/hw_sensor/data/%FEQ%AE/%FD%01/%00%00
15 [Forwarder] onIncomingInterest face=259 interest=/hw_sensor/data/%FEQ%AF
16 [ContentStore] find /hw_sensor/data/%FEQ%AF R
17 [ContentStore] no-match
18 [Forwarder] onContentStoreMiss interest=/hw_sensor/data/%FEQ%AF
19 [Forwarder] onOutgoingInterest face=261 interest=/hw_sensor/data/%FEQ%AF
20 [Forwarder] onIncomingData face=261 data=/hw_sensor/data/%FEQ%AF/%FD%01/%00%00
21 [ContentStore] insert /hw_sensor/data/%FEQ%AF/%FD%01/%00%00
22 [Forwarder] onIncomingData matching=/hw_sensor/data/%FEQ%AF
23 [Forwarder] onOutgoingData face=259 data=/hw_sensor/data/%FEQ%AF/%FD%01/%00%00

Figure 3.26: Interests and Data packets log during XPUB and publisher communication
3.5 Evaluation and Performance Tests
In this section, we present some results of our system evaluation and perfor-
mance tests. In Section 3.5.1 we go over the network trace of publishing traffic
flow sensor data. In Section 3.5.2, we test the scalability of the workers of the
Matching Engine by putting the system under heavy load and then scale out
the workers by launching new virtual machines instances. In Section 3.5.3, we
test the performance of data delivery to subscribers using IP and Named-Data
Networking. All of these evaluations are end-to-end tests and involve all the
system components.
[Diagram: publications flow into a Message Queuing server, consumed by a pool of Worker VMs that match them against Query Servers.]

Figure 3.27: Scalability of the Matching Engine - Experiment Setup
3.5.1 IDD Publication Test
Fig. 3.25 lists the status of the Forwarding Information Base (FIB) table of
the router after XPUB, XSUB and the Traffic Flow publisher are started. Line 2
is the face registered by the XSUB, line 3 is the face registered by the Traffic
Flow publisher, and line 4 is the face registered by the XPUB service.
Fig. 3.26 lists the packet log of the start process discussed in Section 3.1.1.
The publisher periodically sends the “start” Interest packet to the XPUB instance.
Line 1 in Fig. 3.26 shows that the Interest packet of the “start” process is re-
ceived by the XPUB, under the name /xpub/publish/start/hw_sensor/%FEQ%AE.
/xpub is the path to the XPUB instance, /publish/start is the action verb
for the “start” process, /hw_sensor is the path of the publisher, and %FEQ%AE
is the binary format of the sequence number of the data available in the pub-
lisher.
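The component %FEQ%AE can be reproduced with a short helper. The 0xFE marker prefix follows the NDN naming conventions for sequence numbers; the exact byte layout here is an assumption inferred from the log output:

```python
def encode_sequence_component(seq):
    """Percent-encode an NDN sequence-number name component.

    The component is a 0xFE marker byte followed by the big-endian
    sequence number; bytes outside the NDN URI unreserved set are
    percent-encoded, as in the NFD logs.
    """
    length = max(1, (seq.bit_length() + 7) // 8)
    raw = bytes([0xFE]) + seq.to_bytes(length, "big")
    unreserved = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                     b"abcdefghijklmnopqrstuvwxyz0123456789-._~")
    return "".join(chr(b) if b in unreserved else "%%%02X" % b
                   for b in raw)

# The sequence number behind the component seen in Fig. 3.26:
# 0xFE marker, then 0x51 ('Q', printable) and 0xAE.
print(encode_sequence_component(0x51AE))  # %FEQ%AE
```

Incrementing the sequence yields the next component in the log (%FEQ%AF), which is why consecutive data items have adjacent names.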
Line 6 shows the Interest packet that the XPUB sends back to the publisher
at /hw_sensor/data/%FEQ%AE to request the newly published data. Notice
that the same sequence number is used in the name of the data. Line 11
[Plot: delivery rate (msg/s, left axis) and number of workers (right axis) versus time (s); series: workers, deliveries.mean(1m).]

Figure 3.28: Scalability of the Matching Engine, one minute rolling average
shows the data sent by the publisher to XPUB, properly segmented. Line 12
indicates that the data is cached so it will be available to other Interest packets
requesting the same data. Line 15 shows that after XPUB receives the new
data, it immediately sends an Interest packet requesting the next sequence
number.
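The exchange in Fig. 3.26 amounts to a simple sequence-number-driven pull loop. The sketch below models it in memory with illustrative names; the real system exchanges Interest and Data packets through the NDN forwarder:

```python
# In-memory sketch of the XPUB pull loop driven by sequence numbers.

class Publisher:
    def __init__(self):
        self.store = {}      # sequence number -> data item
        self.latest = -1

    def publish(self, data):
        self.latest += 1
        self.store[self.latest] = data

    def on_interest(self, seq):
        """Answer an Interest for /hw_sensor/data/<seq>, if available."""
        return self.store.get(seq)

def xpub_pull(publisher, next_seq):
    """Fetch every outstanding item from next_seq on; return (items, next_seq)."""
    received = []
    while True:
        data = publisher.on_interest(next_seq)
        if data is None:     # nothing newer yet; wait for the next "start"
            break
        received.append(data)
        next_seq += 1        # immediately request the next sequence number
    return received, next_seq

pub = Publisher()
pub.publish("reading-0")
pub.publish("reading-1")
items, nxt = xpub_pull(pub, 0)
print(items, nxt)  # ['reading-0', 'reading-1'] 2
```

The "start" Interest from the publisher is what tells XPUB a new sequence number exists, turning a pull-based protocol into an effective push notification.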
3.5.2 Scalability of the Matching Engine
Since the Matching Engine does most of the computationally intensive work
in the system, we ran an experiment to test how it scales out. Fig. 3.27
shows the setup of the experiment. We used separate machines for the different
components of this experiment: the Message Queuing Server, the Query Server,
and the Worker Servers are each a separate virtual machine (VM) instance running
[Plot: delivery rate (msg/s, left axis) and number of workers (right axis) versus time (s); series: workers, deliveries.mean(5m).]

Figure 3.29: Scalability of the Matching Engine, five minutes rolling average
on the SAVI platform. Therefore, we can launch as many workers as we
need independently of other parts of the system. To keep workers busy, we
bombarded the message queuing system with new messages and kept it full
throughout the experiment. Then we started workers one by one to consume
the messages and match them against a subscription query in the Query Server.
Over time, the number of workers is increased from 1 to 16, and the rate of
the message delivery is measured every ten seconds on the Message Queuing
Server.
Fig. 3.28 and Fig. 3.29 show the message delivery rate of the message
queuing system as one-minute and five-minute rolling averages, respectively.
To better understand the trend of the data, we have superimposed the number
of workers over the delivery rate. The left axes in Fig. 3.28 and Fig. 3.29
[Diagram: publications reach the Broker, which connects through router R1 over link L1 to router R2 and the subscribers.]

Figure 3.30: Data usage: IDD vs IP — Experiment Setup
show the delivery rate, while the right axes indicate the number of workers
over time. The increasing trend of the delivery rate follows the number of
workers in the system. For example, when there is one worker, the delivery
rate is about 500 msg/s. Increasing the number of workers to five increases
the delivery rate to 2500 msg/s. This experiment shows that the system can scale
out and accommodate a higher load with a higher delivery rate by adding more
parallel workers to the system.
3.5.3 IDD and IP Performance Comparison
Next, we set up an experiment to evaluate the performance of the IDD layer.
Fig. 3.30 shows the test setup. Similar to Section 3.5.2, we continuously publish
our test publications to the broker and set up a series of subscribers with
queries that match those publications. We have set up the system in a way that
all the communications between the broker and the subscribers are transmitted
over a single network link, noted as L1. As shown in Fig. 3.30, L1 is between
routers R1 and R2. R1 is the connection point of the broker’s network and
L1, and R2 is the connection point of the subscribers’ network and L1.
[Plot: data rate (KB/s) versus time (s); series: IDD, IP.]

Figure 3.31: Data usage: IDD vs IP — Results
We tested the system with both IP and IDD as the communication pro-
tocol. Throughout the trial, we increased the number of subscribers every 60
seconds and measured the link utilization of L1 every 10 seconds. Fig. 3.31
shows the one minute rolling average of the link utilization of L1 when sub-
scribers are added to the system every 60 seconds. As one can see, when the
subscribers use IP, the link utilization of L1 is consistently higher than when IDD
is used.
This difference comes from the fact that IDD puts only one copy of the
data on the wire. All the subscribers are sending Interest packets for the same
data name. After one of the subscribers sends its Interest packet over L1 to the
broker, the subsequent Interest packets are stored in R2. The broker satisfies
the first Interest packet with the matched data. This data reaches R2 on its
path. R2 caches the data and satisfies all of its pending Interest packets. On
the other hand, in the IP-based communication, there is no in-network caching
on the protocol layer and the broker has to send each subscriber a new copy
of the data, which results in a higher link utilization.
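The gap between the two curves follows from a back-of-the-envelope model of link L1 (illustrative numbers): with IP, the broker unicasts one copy per subscriber, while with IDD at most one copy of each data item crosses L1 because R2's Content Store satisfies the remaining Interests:

```python
def l1_load_kbps(protocol, publish_rate_kbps, subscribers):
    """Approximate load on link L1 for one publication stream.

    IP: the broker unicasts a copy to every subscriber across L1.
    IDD: one copy crosses L1; R2's Content Store answers the other
    Interests (assumes all subscribers request the same names and
    the cache holds the data long enough).
    """
    if protocol == "ip":
        return publish_rate_kbps * subscribers
    if protocol == "idd":
        return publish_rate_kbps  # independent of subscriber count
    raise ValueError(protocol)

for n in (1, 4, 16):
    print(n, l1_load_kbps("ip", 20, n), l1_load_kbps("idd", 20, n))
# With 16 subscribers, IP carries 16x the traffic of IDD over L1.
```

This linear-versus-constant behavior is exactly the divergence visible as subscribers are added every 60 seconds.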
To improve the performance of IP-based protocols, one must place a cache
near R2. Then, all the requests from the subscribers towards the broker must
be rerouted to that cache, for example, through configuration in the subscribers’
applications or packet-level inspection at R2. In IP, data is coupled with its
location, the application and the routers. Therefore, the network lacks support
for mobility and provenance. In IDD, they are decoupled: the application is
responsible for creating the content, and the network is responsible for delivering
it. The application does not have to know about the client or network
configuration, and the network does not need to inspect application-specific
packets to forward them to specific caches.
3.6 Summary
In this chapter, we presented our naming design for Named-Data Networking
to support push notifications in the data dissemination layer of the CVST
platform. We discussed how the design provides a simple, scalable communication
layer that supports security and mobility. We have used this design in a
scalable and distributed implementation of a content-based publish/subscribe
system. Using micro-services gives the publish/subscribe system the capability
to easily scale out and serve more requests. We also demonstrated some
examples, such as live streams of drone events, using this dissemination layer
in the CVST platform.
Chapter 4
Content Delivery in Service
Providers
The CDN architecture is optimized to deliver the content until it reaches the
network of the service provider. Service Providers (SPs) usually place Content
Delivery Network (CDN) caches at the Internet Exchange peering points,
connected to the core of the network. Inside the operator’s network, it is a
different matter. The area of the network that is close to the consumer, known
as the last-mile network, is not optimized for Over-The-Top (OTT) content.
The CDN architecture does not solve the inefficient use of the SP’s network
infrastructure. When users request the same content, that content
is transmitted over the network of the SP multiple times. In this chapter,
we use time-to-exhaustion (TTE) as our metric and formulate the problem of
placing caches in the network and routing the content in a way that maximizes
TTE.
Chapter 4. Content Delivery in Service Providers 87
[Diagram: Internet CDNs (Akamai, Amazon, Google, Netflix) connect to the operator's core; 3rd-party caches and the operator CDN sit at the aggregation/edge; the access layer forms the last mile.]

Figure 4.1: Network of a Service Provider
The rest of this chapter is organized as follows. In Section 4.1, the content
delivery problem in a service provider is investigated. Further, details of our
analytical model are discussed in Section 4.2. Simulation results and validation
are provided in Section 4.3.
4.1 Problem Definition
4.1.1 Content Distribution in Service Providers
Fig. 4.1 shows a simplified path that content takes from its source, through the
operator’s network and finally to the consumer. A service provider’s network
usually consists of multiple layers: the Core layer, the Aggregation layer, and
the Access layer. The core of the network transfers the highest volume of data
from various aggregation sites between sources and destinations. The Core
has a few points of presence (PoPs) and high-capacity links. Content
servers are usually connected to the core through an Internet exchange peering
point. The next level is the Aggregation level, which is a concentration point of
multiple distribution centers; these may themselves be connected to smaller
Edge distribution centers. Each center in the aggregation level usually serves
Figure 4.2: Content distribution in Service Providers. (a) Content delivery from peering points. (b) Effect of caching on network traffic.
about one to three million customers. The final layer, the Access layer, is
directly connected to the consumers. For example, a cable provider's edge
layer contains cable modem termination systems (CMTS), each serving about 10 to
50 thousand subscribers, while the access layer of a wireless service provider
contains cellular antennas.
Now consider subscribers that request OTT video content. As shown
in Fig. 4.2a, a new connection is created between the content source and the
consumer's machine for every content request. Even when all the users
request the same content, that content is transmitted over the network
multiple times. Note that the source of this content may be either controlled
by the operator itself or come from a VoD content server owned by a 3rd
party CDN. If the content source is a live stream from outside the network,
operators face an even bigger challenge than for VoD content: many
consumers watch live stream content concurrently, and the operator does not
have any control over the content coming from outside the network. This
structure is not scalable and is an apparent waste of the underlying resources.
For example, consider a service provider in Canada that serves OTT
content to its users in Ontario and Quebec. If the peering point is in Chicago,
all the requested traffic for users in Quebec and Ontario is served from
Chicago. Installing a cache in Toronto saves a lot of traffic that would
otherwise travel over the network from Chicago to users in Ontario and
Quebec.
Fig. 4.2b shows a network that has a cache near the Aggregation and Edge
level. All the flows that previously passed through the Core are now
terminated at a lower level of the network. Therefore, putting a cache in the
lower levels of the network saves the extra bandwidth used by multiple
transmissions and increases the available capacity of the network. Hence,
content providers are putting their caches inside the operator's network:
Netflix, with its Open Connect program, convinced operators to set up cache
servers even deeper in their networks, in places such as metro areas, to reduce
the traffic load on their cores and peering points.
4.1.2 Time-to-exhaustion
In a network with ever-increasing demand, such as a service provider's,
congestion is inevitable. For a service provider, serving content from a
peering point incurs cost. At the same time, serving more content to users
means more revenue. The growing demand will eventually exhaust the network at
some point in the future unless the onset of congestion is delayed. The
network's onset of congestion is the moment when the capacity of some link in
the network is exceeded, i.e., when that link becomes congested.

Figure 4.3: Flows between sources and destinations pass through multiple links
However, the onset of congestion depends not only on the network topology
but also on the pattern of demand growth. For example, congestion in a network
with linear demand growth differs from that in a network with exponential
demand growth. Furthermore, the demand matrix plays a major role in the onset
of congestion: introducing new services, offering new types of quality of
service for content delivery, or adding new customers all change the network's
onset of congestion.
Fig. 4.3 shows how content routing and caching can affect the onset of
congestion. Following the max-flow min-cut theorem, the maximum flow passing
from the sources to the destinations in a network equals the total link
capacity of the minimum cut of that network. Now consider a case where most of
the flows are routed through a critical link, making that link congested
sooner rather than later. Service providers have two solutions to this
problem. The first solution is optimizing the content routing and passing flows
Figure 4.4: Time-to-exhaustion. Traffic is increasing monthly until the network is congested. (Plot: traffic in GB/s vs. time in months; the Demand 1 and Demand 2 curves cross the network capacity line at TTE 1 and TTE 2.)
through different links. The critical link then has a lower average load over
time, and its congestion is delayed. The other option is to move the flow
destinations, e.g., caches, to other parts of the network. In other words,
putting caches in the network delays the onset of congestion.
For example, assume that the demand increases every month, as in the scenario
of Fig. 4.4. If the current demand in the network (shown as Demand 1) is
100 Gb/s and the network can handle a maximum of 400 Gb/s, the current
infrastructure will keep up with the traffic for the next 16 months.
However, by placing caches in the network and optimizing content routing,
the demand pattern changes (shown as Demand 2), and the network stays
congestion-free for another eight months.
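The back-of-the-envelope arithmetic behind this example can be written as a small helper that counts the congestion-free months. This is an illustrative sketch, not the thesis model: the function name is ours, and it uses the compound 5% monthly growth assumed later in Section 4.3, so its numbers differ from the roughly linear Demand 1 curve of Fig. 4.4.

```python
def months_to_exhaustion(demand_gbps, capacity_gbps, monthly_growth=0.05):
    """Number of months the demand can keep growing before exceeding capacity."""
    months = 0
    while demand_gbps * (1 + monthly_growth) <= capacity_gbps:
        demand_gbps *= 1 + monthly_growth  # demand compounds every month
        months += 1
    return months

# 100 Gb/s of demand against 400 Gb/s of capacity at 5% monthly growth:
# the network stays congestion-free for 28 more months.
print(months_to_exhaustion(100, 400))
```

Placing caches effectively raises the capacity seen by the demand (or lowers the demand seen by the core), which is what pushes this count further out.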
The problem SPs face is how to plan their future network to accommodate the
constant increase in demand, to provide a congestion-free network, and to
minimize costs. Another challenge is that the SP already has an established
network: SPs need time to purchase equipment, test it, and deploy it in their
infrastructure, and these investments keep the network congestion-free only for
a limited time.

The budget planning process also has a time element; in other words, the
budget is planned for a limited period, such as a year. Therefore, service
providers use the notion of time-to-exhaustion for forecasting. TTE is crucial
for network capacity planning, since it affects the amount and timing of
investment in the infrastructure. For example, with a limited budget, the SP
must choose how to plan the additional capacity, where to put the caches, and
what type of content should be cached.
We aim to maximize the time-to-exhaustion, given a limited budget, by placing
caches in the best locations and optimizing the content routing. We will show
that using ICN-based paradigms, such as Named-Data Networking (NDN),
outperforms optimal cache placement and content routing in a CDN and prolongs
the time-to-exhaustion of the network. The strategy layer of NDN can be used
to route the content optimally.
4.2 Problem Formulation
We model our network as a directed graph G(V,E) with the set of nodes V and
links E. U denotes the set of nodes that have a demand for contents. P denotes
the set of nodes that can satisfy demands for contents, e.g., Internet exchange
points. The set of nodes that are candidates for caching contents is denoted
by C. All the notations are listed in Table 4.1.

Constants
  V              Set of nodes
  E              Set of directional links
  G(V,E)         Graph of the network
  P              Nodes that are connected to the peering points
  U              Nodes that have a demand for contents
  C              Nodes that can cache contents
  L_k            Size of content k
  α_i^k          Demand for content k at node i
  Γ_i^+, Γ_i^-   Sets of ingress and egress neighbors of node i
  r_i^k          Maximum rate at which node i can read content k from its cache
  B              Total storage budget available for all caches
  V(.)           Function that maps storage to its budget value
  c_{i,j}        Capacity of link (i,j)
  I(.)           Indicator function: 1 if the condition is true, 0 otherwise
  M              Maximum number of caches in the network
  φ_{i,j}^{sd}   Shortest-path betweenness of link (i,j) from node s to node d

Common Variables
  S_i            Storage at node i
  p_i            Decision variable for cache placement at node i
  h_i^k          Decision variable for caching content k at node i
  β_i^k          Total demand at node i for content k

CDN-Specific Variables
  f_{i,j}^{kd}   Flow for content k on link (i,j) going to node d
  γ_s^{kd}       Traffic flow from node s to node d for content k

NDN-Specific Variables
  f_{i,j}^k      Rate at which interests for content k are sent on link (i,j)

Table 4.1: Notations
4.2.1 Demands and Storage Budget
To find the TTE of the network, we model the network for one time epoch.
We assume that within this time epoch the demands are known and fixed,
but the locations of the caches, the cached contents, and the content routing
are not. We also assume a limited storage budget, B, is available for the
capacity planning of all the caches in the network.
Since the demand of each user changes over time following different patterns,
we run an exhaustive search to find the TTE by solving a series of feasibility
problems. A feasibility problem has no objective and only finds a feasible
solution. For each budget value, we increase the demands of the users
following a known pattern. If, for a set of demands, the network becomes
congested, the problem becomes infeasible, and at that point we have found the
TTE of the network. The final solution of the model provides a cache placement
and content routing policy that maximizes the TTE of the network.
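The search procedure above can be sketched as a nested sweep: for each budget point, grow the demands month by month and call a feasibility oracle until it reports infeasibility. Everything below is an illustrative stand-in: `solve_feasibility` represents the feasibility programs formulated later in this chapter, and the toy oracle at the end simply compares total demand against a capacity that grows with the budget.

```python
def find_tte(solve_feasibility, base_demands, growth=0.05, max_months=240):
    """TTE = last month for which the feasibility problem has a solution."""
    demands = list(base_demands)
    for month in range(1, max_months + 1):
        demands = [d * (1 + growth) for d in demands]  # grow every user's demand
        if not solve_feasibility(demands):             # congested -> infeasible
            return month - 1
    return max_months

def tte_per_budget(budgets, make_oracle, base_demands):
    """Repeat the search for every storage-budget point on the x-axis."""
    return {b: find_tte(make_oracle(b), base_demands) for b in budgets}

# Toy oracle: feasible while total demand fits a 400 Gb/s network whose
# effective capacity grows linearly with the cache budget (pure illustration).
oracle = lambda budget: (lambda demands: sum(demands) <= 400 + 10 * budget)
print(tte_per_budget([0, 10], oracle, [50, 50]))  # {0: 28, 10: 32}
```

In the actual model the oracle is a linear program, so each budget point costs a handful of LP solves rather than a single comparison.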
Demands
We denote the demand at node i for content k by α_i^k. Note that α_i^k depends
on time; however, we solve the problem for each time epoch separately. The
demand at a node also depends on whether node i caches content k or not, which
is denoted by the binary variable h_i^k. In other words, the traffic that
populates a cache is itself a demand. Therefore, the total demand at node i can
be written as:

    β_i^k = α_i^k + h_i^k    ∀ i ∈ C ∪ U    (4.1)

Note that β_i^k is the number of requests for content k, not the size of the
demand. The size of the demand is L_k β_i^k, where L_k is the size of
content k.
Storage Budget
Each cache in the network is assigned a portion of the storage, denoted by S_i,
while B is the storage budget in dollar value. We assume that the relation
between the amount of storage and its dollar value can be written as a function
V(S_i). The sum of the budgets assigned to all caches must not exceed B.
Let p_i be the binary variable that decides whether node i is a cache. The
budget constraint can then be written as Eq (4.2):

    Σ_{i∈C} p_i V(S_i) ≤ B    (4.2)

Here we assume that V(.) is a linear function; however, this can be extended
to any convex function. For each cache, the total size of the cached objects
cannot exceed the storage of that cache, as written in Eq (4.3):

    Σ_k L_k h_i^k ≤ p_i S_i    ∀ i ∈ C    (4.3)

Also, the total number of caches placed in the network can be limited by an
upper bound M, as written in Eq (4.4):

    Σ_{i∈C} p_i ≤ M    (4.4)

To have homogeneous caching, we may also add a constraint that enforces
all S_i to be equal.
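A candidate placement can be checked directly against constraints (4.2)-(4.4). The sketch below assumes, as in the text, a linear V(.); the function name and the per-unit price are illustrative, not part of the model.

```python
def placement_feasible(S, p, h, L, B, M, price_per_unit=1.0):
    """Check Eqs (4.2)-(4.4): budget, per-cache storage, and cache count.

    S[i]: storage at node i; p[i]: 1 if node i is a cache; h[i][k]: 1 if
    content k is cached at i; L[k]: size of content k; B: budget; M: max caches.
    """
    if sum(p[i] * price_per_unit * S[i] for i in S) > B:  # Eq (4.2), linear V(.)
        return False
    if sum(p.values()) > M:                               # Eq (4.4)
        return False
    for i in S:                                           # Eq (4.3), per cache
        if sum(L[k] * h[i].get(k, 0) for k in L) > p[i] * S[i]:
            return False
    return True
```

In the optimization model these are constraints on decision variables; a checker like this is only useful for validating a solver's output or a hand-built placement.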
Cache Replacement Policy and Routing
The caching policy provided by the solution will maximize the TTE of the
network. h_i^k is the binary variable that shows whether content k is cached
at node i. Solving the model for two different time epochs with different
demands will result in different h_i^k; the difference between them defines the
cache replacement policy of node i. Adopting a fixed cache replacement policy
instead, such as Least Recently Used (LRU) or Least Frequently Used (LFU),
will reduce the TTE of the network.

The solution also provides the content routing policy for the network.
Adopting a routing protocol such as shortest-path will likewise reduce the TTE
of the network. We study this effect in the results section.
4.2.2 Content Delivery Networks
In service providers, transparent caching is done by placing one or more caches
in the network and re-routing requests towards them. SPs may also host content
sources of their own or from third parties. To model this, we define a
multi-commodity flow problem.
Flow Conservation
The flow conservation at node s for content k can be written as Eq (4.5). We
denote by f_{i,j}^{kd} the flow for content k on link (i,j) going to node d,
and by γ_s^{kd} the flow for content k from node s to node d. The left-hand
side of Eq (4.5) is the difference between the total egress (Γ_s^-) and
ingress (Γ_s^+) flows for content k at node s that are destined for node d.

    Σ_{j∈Γ_s^-} f_{s,j}^{kd} − Σ_{j∈Γ_s^+} f_{j,s}^{kd} = γ_s^{kd} − L_k β_s^k δ(s−d)    ∀ s,d ∈ V    (4.5)

The right-hand side of Eq (4.5) is the total flow originated at node s towards
node d for content k, minus the demand at node s for content k. δ(i) is the
Kronecker delta function: it is equal to 1 when i is zero, and zero otherwise.
Therefore, the term L_k β_s^k in Eq (4.5) only has an effect when s and d are
the same node. In other words, at any node other than d, the difference
between the egress and ingress flows destined to node d equals the traffic
produced at that node for node d. When s and d are equal, all the ingress
traffic into node d equals the demand at node d. Therefore, considering that
node d does not send traffic to itself (i.e., f_{d,j}^{kd} = 0 ∀j and
γ_d^{kd} = 0), Eq (4.5) reduces to

    Σ_{j∈Γ_d^+} f_{j,d}^{kd} = L_k β_d^k
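Eq (4.5) can be verified numerically for a candidate flow assignment. The checker below handles one content k and one destination d; the function name is ours, and demand[s] stands for L_k β_s^k in the notation of Table 4.1.

```python
def conserves_flow(flows, gamma, demand, nodes, d, tol=1e-9):
    """Verify Eq (4.5) at every node s for one content k and destination d.

    flows[(i, j)] = f_{i,j}^{kd}; gamma[s] = gamma_s^{kd};
    demand[s] = L_k * beta_s^k.
    """
    for s in nodes:
        egress = sum(f for (i, j), f in flows.items() if i == s)
        ingress = sum(f for (i, j), f in flows.items() if j == s)
        delta = 1 if s == d else 0  # Kronecker delta(s - d)
        if abs((egress - ingress) - (gamma[s] - demand[s] * delta)) > tol:
            return False
    return True

# A 3-node chain 0 -> 1 -> 2: node 0 originates 5 units destined for node 2.
flows = {(0, 1): 5.0, (1, 2): 5.0}
gamma = {0: 5.0, 1: 0.0, 2: 0.0}
demand = {0: 0.0, 1: 0.0, 2: 5.0}
assert conserves_flow(flows, gamma, demand, [0, 1, 2], 2)
```

At the transit node the balance is zero, and at the destination the ingress flow equals the demand, matching the reduced form of Eq (4.5).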
Cache Population Traffic
The cache population traffic is satisfied by the peering points. Therefore, the
total demand originated at the core (P) of the network must be at least the
size of the cached content (Eq (4.6)):

    Σ_{s∈P} γ_s^{ki} ≥ L_k h_i^k    ∀ i ∈ C    (4.6)
I/O and Link Capacity Limits
A node can only become a source of flow for a content request when it is a
cache and has the content cached. I(i ∈ C) in Eq (4.7) is equal to 1 only if
node i is a cache candidate, and h_i^k is equal to 1 when content k is cached
at node i. r_i^k is the rate at which node i can read contents from its cache
storage and put them on the wire; it is a limitation of the node's hardware,
e.g., the I/O limit of the node's hard disks.

    Σ_d γ_i^{kd} ≤ I(i ∈ C) r_i^k L_k h_i^k    (4.7)

Also, each link (i,j) has a limited capacity, denoted by c_{i,j}. The link
capacity constraint enforces that the sum of all the flows to all destinations
for all contents is less than the total link capacity, as in Eq (4.8):

    Σ_{k,d} f_{i,j}^{kd} ≤ c_{i,j}    (4.8)
The complete feasibility problem that models a CDN in the network of a
service provider is shown in Fig. 4.5:

    solve
    subject to
        Σ_{j∈Γ_s^-} f_{s,j}^{kd} − Σ_{j∈Γ_s^+} f_{j,s}^{kd} = γ_s^{kd} − L_k β_s^k δ(s−d)    ∀ s,d
        Σ_{s∈P} γ_s^{ki} ≥ L_k h_i^k    ∀ i ∈ C
        Σ_d γ_i^{kd} ≤ I(i ∈ C) r_i^k L_k h_i^k    ∀ i ∈ V \ P
        Σ_{k,d} f_{i,j}^{kd} ≤ c_{i,j}
        β_i^k = α_i^k + h_i^k    ∀ i ∈ V \ P
        Σ_i p_i V(S_i) ≤ B
        Σ_k L_k h_i^k ≤ p_i S_i

Figure 4.5: Feasibility model for CDN
Shortest-path routing
Routing in service provider networks is usually based on shortest-path
routing. To study the effects of shortest-path routing, we add a routing
constraint to our model. Shortest-path routing is modeled using the
shortest-path betweenness centrality of each link.

Betweenness centrality (BC) is one of the centrality metrics in graphs [62].
It measures the degree to which a node or a link is needed to connect other
nodes along paths. The shortest-path betweenness centrality of link (i,j) with
respect to source node s and destination node d, denoted φ_{i,j}^{sd}, is
defined as the proportion of the shortest paths from node s to node d that
pass through link (i,j). Therefore, the average traffic for content k that
passes through link (i,j) from source s to destination d can be written as
φ_{i,j}^{sd} γ_s^{kd}. To model shortest-path routing, we add Eq (4.9) to the
model. Eq (4.9) ensures that link (i,j) does not carry more traffic than its
share when routing is done using shortest paths.

    f_{i,j}^{kd} ≤ Σ_{s∈V} φ_{i,j}^{sd} γ_s^{kd}    (4.9)
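For an unweighted graph, φ_{i,j}^{sd} can be computed by enumerating all shortest s-d paths (every such path only uses edges that increase the BFS distance by one) and counting which links they use. A minimal sketch, assuming d is reachable from s; the function name is ours.

```python
from collections import deque

def shortest_path_edge_share(adj, s, d):
    """phi[(i, j)]: fraction of the shortest s->d paths that use edge (i, j)."""
    # BFS distances from s on the unweighted graph
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Shortest paths only traverse edges that increase the distance by one,
    # so a depth-first walk over those edges enumerates exactly those paths.
    paths = []
    def walk(u, path):
        if u == d:
            paths.append(list(path))
            return
        for v in adj[u]:
            if dist.get(v) == dist[u] + 1:
                path.append((u, v))
                walk(v, path)
                path.pop()
    walk(s, [])
    counts = {}
    for p in paths:
        for e in p:
            counts[e] = counts.get(e, 0) + 1
    return {e: c / len(paths) for e, c in counts.items()}
```

On the four-node square with edges 0-1, 0-2, 1-3, 2-3, the two shortest 0-to-3 paths each carry half the weight, so every edge gets φ = 0.5.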
4.2.3 Named-Data Networking
Interest Forwarding
To model NDN, we find the locations that can potentially satisfy the most
interest in contents. The notion of interest here is closer to content
popularity at a node, similar to the virtual interest packets studied
in [63], and is different from the Interest packet of the NDN paradigm. We
denote by f_{i,j}^k the rate at which interest for content k is forwarded on
link (i,j). Since NDN is a point-to-point protocol, we do not have flows from
sources to destinations, but potential interests that move around the network
until they are satisfied. Suppose node s has some interest in content k
(β_s^k). Then the egress interests (Σ_{j∈Γ_s^-} f_{s,j}^k) from node s are
increased by β_s^k. This is written as an inequality in Eq (4.10):

    Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k ≤ β_s^k    (4.10)
Now consider a node that has a content cached in its content store; it can
satisfy interest for that content and remove the interest from the network.
Each node also has an I/O limit for reading its content store, which limits
the rate at which interests are satisfied. Any remaining interest is forwarded
towards other nodes in the network. Therefore, a node can satisfy interests at
most at the rate bounded by its I/O limit, as written in Eq (4.11):

    Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k + I(s ∈ C) r_s^k h_s^k ≥ β_s^k    (4.11)

Consider the scenario where node s is not caching content k. Then Eq (4.10)
and Eq (4.11) reduce to an equality and enforce that node s forwards all of
its ingress and local interests. However, if node s caches content k, the
ingress interests can be satisfied by an amount bounded by the hardware
limitations of node s. Tracking the movement of this potential interest
through the network can be used to find the best place to cache the content.
Link capacity limit
The next step is to model the link capacity constraint. In NDN, Data packets
follow the reverse path of the Interest packets to reach the requester.
Therefore, sending an interest over link (i,j) results in the data being sent
back over link (j,i). We can use this to write the link capacity constraint as
Eq (4.12):

    Σ_k L_k f_{i,j}^k ≤ c_{j,i}    (4.12)
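The bookkeeping behind Eq (4.12) — interests on link (i,j) pulling data back over (j,i) — can be made explicit with a small helper. The names are illustrative.

```python
def data_link_loads(interest_rates, sizes):
    """Data load per link implied by Eq (4.12): each interest for content k
    sent on (i, j) pulls L_k of data back over the reverse link (j, i)."""
    loads = {}
    for (k, i, j), rate in interest_rates.items():
        loads[(j, i)] = loads.get((j, i), 0.0) + sizes[k] * rate
    return loads

def capacity_ok(interest_rates, sizes, capacity):
    """Check Eq (4.12) on every link that carries data."""
    return all(load <= capacity[link]
               for link, load in data_link_loads(interest_rates, sizes).items())
```

For example, interests for two contents on link (1,2) add up their data sizes on the reverse link (2,1), which is the link that must not exceed its capacity.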
Including Eq (4.1), Eq (4.2), and Eq (4.3), the complete feasibility problem
for NDN is shown in Fig. 4.6.
    solve
    subject to
        Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k ≤ β_s^k
        Σ_{j∈Γ_s^-} f_{s,j}^k − Σ_{j∈Γ_s^+} f_{j,s}^k + I(s ∈ C) r_s^k h_s^k ≥ β_s^k
        Σ_k L_k f_{i,j}^k ≤ c_{j,i}
        β_i^k = α_i^k + h_i^k    ∀ i ∈ V \ P
        Σ_i p_i V(S_i) ≤ B
        Σ_k L_k h_i^k ≤ p_i S_i

Figure 4.6: Feasibility model for NDN
4.3 Results
We evaluated the numerical results of our model using multiple network
topologies. Fig. 4.7 is one of the Rocketfuel networks [64], Fig. 4.8 is a
Dorogovtsev-Goltsev-Mendes (DGM) topology, and Fig. 4.9 is a tree network. The
Rocketfuel topology is simplified by removing the leaf nodes from the original
network and consolidating the demands from the removed nodes into their parent
nodes [65]. The simplified network has 50 nodes and 194 directed links. These
three topologies are comparable in size: the number of user nodes is 25 in
Rocketfuel and 27 in the DGM and tree topologies. We consider one peering
point for each network, and the rest of the nodes are cache candidates. At
each node, the demand for each content follows a Zipf distribution with
parameter α = 2. We assumed all the users have the same demand, and it is
uniformly increasing
Figure 4.7: Rocketfuel network
by 5% every month. This increase is based on current observations of OTT
demand growth. As mentioned in Section 4.2.1, for each budget point we solve a
series of feasibility problems and increase the demand until the network is
saturated. We also simulated the back-pressure algorithm of [63] to compare
with the performance of our model.
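The demand model used in the experiments — Zipf popularity with parameter α = 2 and uniform 5% monthly growth — can be sketched as follows (function names ours):

```python
def zipf_weights(n, alpha=2.0):
    """Normalized Zipf popularity over n contents: rank r gets weight r^-alpha."""
    raw = [1.0 / r ** alpha for r in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def monthly_demand(base_total, month, n_contents, growth=0.05, alpha=2.0):
    """Per-content demand after `month` months of uniform 5% growth."""
    total = base_total * (1 + growth) ** month
    return [total * w for w in zipf_weights(n_contents, alpha)]
```

With α = 2 the most popular content draws four times the demand of the second most popular, which is why a small amount of well-placed cache storage absorbs a large share of the traffic.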
4.3.1 Time-to-Exhaustion of different topologies
To evaluate the performance of the CDN method, we find the TTE of the network
by placing at most four caches. The assigned storage budget is equally divided
between these nodes, assuming they all use similar hardware; in other words,
we use homogeneous caching. For example, in the Rocketfuel network (Fig. 4.7),
Nodes 5, 10, 12, and 14 are selected for caching; in the DGM topology, Nodes
2, 3, 4, and 5; and in the tree topology, Nodes 2, 3, and 4 are
Figure 4.8: DGM network
selected for caching. It is worth noting that in the tree topology only three
nodes are selected for caching, since adding more caches did not increase the
TTE further. To evaluate the performance of NDN, we enable caching in all the
candidate nodes. The storage budget is therefore divided equally between more
nodes, and each node can cache fewer objects.
Fig. 4.10, Fig. 4.11, and Fig. 4.12 show the TTE in the different topologies
under the CDN model, the NDN model, and the NDN simulation using the
back-pressure algorithm. We assumed that there is demand for 2000 objects,
divided into 100 popularity groups, each object with a size of 1 Mb, and that
all the links in the network have a capacity of 1 Gb/s. We placed at most four
caches in the CDN scenario, while all the nodes can cache in the NDN
scenarios. Note that in all the topologies, the NDN simulation using
back-pressure closely follows our NDN model.

At a very low storage budget, CDN and NDN have a similar TTE, because most of
the content is provided by the peering point, which becomes the bottleneck of
the network. This means the network's onset of congestion will be
Figure 4.9: Tree network
similar for both NDN and CDN scenarios. Different topologies have different
TTEs at very low storage budget: the TTE depends on the onset of congestion,
and the onset of congestion depends on the topology of the network. TTE is
lowest for the tree topology and highest for the DGM topology. This
observation also agrees with the reciprocal of the network criticality of each
topology [66].
By increasing the caching storage, the TTE also increases. The storage budget
is equally divided between all caches; therefore, an increase in the total
storage budget increases the TTE. As the number of caches increases, each
cache receives a smaller portion of the budget. When the additional storage
available to each cache is not enough for another object, the number of cached
contents does not change, and neither does the TTE. This minimum required
increase in storage depends on the number of caches in the network. In NDN,
the steps are larger since there are more caches, and a greater increase in
the total storage budget is required to cache more contents.
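The step heights come from simple integer arithmetic: with the budget split evenly, each cache holds the floor of its share divided by the object size, so the more caches there are, the larger the budget increase needed before another object fits. A sketch in the units of the experiments (1 Mbit objects), assuming the 49 cache-candidate nodes of the Rocketfuel setup for the NDN case:

```python
def objects_per_cache(budget_mbit, n_caches, object_size_mbit=1):
    """Objects each cache can hold when the storage budget is split evenly."""
    return budget_mbit // n_caches // object_size_mbit

# The same 2 Gbit (2000 Mbit) budget split four ways (CDN) vs. 49 ways (NDN):
cdn = objects_per_cache(2000, 4)   # 500 objects per cache
ndn = objects_per_cache(2000, 49)  # 40 objects per cache
```

With 49 caches, the budget must grow by 49 Mbit before every cache gains one more object, which is exactly the wider step spacing of the NDN curves.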
Figure 4.10: Time-to-exhaustion in Rocketfuel network
In CDN, the steps are smaller since there are only four caches, and a smaller
increase in the storage budget, compared to NDN, results in more cached
contents. However, the height of the steps decreases as the budget increases,
because caching begins to lose its effect. There is also a limit on the
maximum TTE of each topology, after which even caching does not help anymore.
This TTE is the maximum that a network can reach with the help of caching.
Similar to the low-budget TTE, the maximum TTE also depends on the topology of
the network.

Furthermore, at low storage budgets there is little difference in TTE between
CDN and NDN; because of homogeneous caching, CDN sometimes even performs
better. However, in all the topologies, the network that uses
Figure 4.11: Time-to-exhaustion in DGM network
CDN saturates at a much lower storage budget than NDN. This better performance
is a direct result of the NDN paradigm: due to its in-network caching and
point-to-point nature, each cached content is sent over a link only once,
whereas in CDN each content is sent multiple times. This waste of link
capacity shows itself in the network saturating much sooner. There is a large
difference in maximum TTE between CDN and NDN in each topology. In Rocketfuel,
using CDN saturates the network after 47 months, but with NDN the network can
stay operational for 77 months. Similarly, DGM with CDN is operational for 74
months and with NDN for 82 months; the tree topology with CDN is operational
for 27 months and with NDN for 67 months. This large difference in the tree
topology arises because
Figure 4.12: Time-to-exhaustion in Tree network
in NDN caches are placed throughout the network. As shown in Fig. 4.9, NDN
places caches in Nodes 2 to 13, while CDN only places caches in Nodes 2, 3,
and 4. For example, a cache at Node 5 saves bandwidth on all the uplinks and
makes more capacity available to deliver more content.
4.3.2 Limited NDN Deployment
To see how much of the difference in TTE between CDN and NDN comes from the
number of caches in the network, we limit the number of caches in NDN to
four. Fig. 4.13 shows that even with four caches, content delivery using NDN
outperforms the CDN design. We have also considered an impractical case in
which every node in the CDN can also cache contents. This case is only for
Figure 4.13: Changes in TTE of Rocketfuel topology with number of caches
the comparison; in practice, it cannot be implemented due to the nature of
CDN. One could say that one of the reasons behind the NDN proposal is the
impossibility of in-network caching in TCP/IP. However, even if all the nodes
in the CDN had caching capability, the network would saturate similarly to the
case with four caches in the network. In addition, a limited NDN deployment
has a better TTE at low budgets than a full NDN deployment, which suggests
limiting the number of NDN caches when the storage budget is low.
We can also look at the link utilization in the network. Fig. 4.14 shows the
distribution of link utilization at the moment of network congestion. Using
CDN, more than 60% of the links have a utilization of more than 90%. In
contrast, the NDN scenario, even with a limited deployment,
Figure 4.14: Link utilization of NDN vs CDN (fraction of links in each utilization bin, from <10% to <100%, for the CDN-limited, CDN-full, NDN-limited, and NDN-full scenarios)
has less than 20% of the links at high utilization. Using NDN results in a
network in which more than 40% of the links have a utilization of less than
10%. This difference means that if CDN is used to increase the TTE of the
network, we would have to increase the capacity of most of the links, whereas
using NDN leaves far fewer bottlenecks, which makes capacity planning much
easier and cheaper.
4.3.3 I/O Speed Effect
One of the parameters we have considered in our model is the I/O limit of
each cache, which depends on the hardware design of the cache. Fig. 4.15 shows
the effect of this parameter. To better isolate the difference the I/O speed
makes, we increased the capacity of all the links to limit the effect of
congestion. As shown in Fig. 4.15, as the I/O limit increases from 10 Gb/s
Figure 4.15: Changes in TTE of Rocketfuel topology with I/O limit
to 100 Gb/s, the TTE also increases. However, a low link capacity greatly
diminishes the improvement gained from hardware with a higher I/O limit.
4.3.4 Routing Protocol Effect in CDN
As mentioned above, our model tries to maximize the TTE and therefore
optimizes the routing of data. In practice, however, routing is not optimal.
As shown in Fig. 4.16, enforcing shortest-path routing for CDN in the
Rocketfuel network reduces the TTE by more than ten months. NDN does not have
this problem, since its strategy layer can employ an optimal routing
algorithm.
Figure 4.16: Changes in TTE of Rocketfuel topology with routing algorithm
4.3.5 Heterogeneous Caching
Heterogeneous caching uses caches that each have a different amount of
storage, in contrast to homogeneous caching, where all the caches have the
same amount of storage. Homogeneous caching may be cheaper, since cache
hardware comes in pre-configured packages and customized hardware costs more.
Therefore, service providers must do a cost-benefit analysis before adopting a
heterogeneous caching system.

Fig. 4.17 shows the effect of heterogeneous caching on the TTE when NDN is
used. With heterogeneous caching, the model assigns each cache a different
storage capacity while satisfying the total storage budget constraint. It is
expected that the symmetry in the tree and DGM topologies would imply
Figure 4.17: Heterogeneous vs homogeneous caching storage in NDN
little benefit from heterogeneity. However, there is some difference in the
Rocketfuel topology, which is less symmetric than the other topologies we
modeled: with heterogeneous caching, the TTE in the Rocketfuel network
increases by at most three months.
4.4 Summary
Service providers are under a lot of pressure due to the daily increase of
Over-The-Top content. In this chapter, we presented a cache placement and
content routing method for service providers to delay the congestion of their
network, considering their limited budget. We modeled both ICN and CDN and
aimed to maximize the time-to-exhaustion of the network. Our results show that
even a limited deployment of ICN improves the time-to-exhaustion of the
network and lowers the number of links with high utilization.
Chapter 5
Conclusion
The Internet is evolving fast, both in architecture and in usage. Numerous
devices connect to the Internet every day, and more and more content is
constantly created. The current end-to-end communication model of TCP/IP was
not designed for these new use cases. However, networking paradigms such as
Information-Centric Networking aim to tackle these problems: they move towards
a point-to-point communication model, decouple data names from their
locations, and turn router buffers into caches for content storage. In this
chapter, we first review our contributions in this work and then propose some
ideas that could extend our contributions.
5.1 Contributions
In this work, we designed a content-based publish/subscribe system using the
ICN paradigm as the data dissemination layer in the CVST platform. We also
showed the benefits of using ICN for content delivery in service providers.
5.1.1 Data Dissemination in CVST
The CVST platform collects a rich set of data from many transportation data sources. These sources include traffic sensors, road cameras, road incident and closure reports, Twitter traffic reports, public transit data (bus location information and bike station data), border delay times, and last but not least the loop detector data.
We presented a content-based publish/subscribe system for CVST that employs the ICN paradigm. In a content-based publish/subscribe system, a subscriber can define a query in addition to the topic of interest and receive, in real time, the contents that match that query. We presented the architecture for a distributed broker that connects publishers and subscribers, registers the schemas for the publishers, and saves the queries submitted by the subscribers. These tasks are exposed as a set of APIs to publishers and subscribers. The broker uses a set of scalable micro-services and supports ICN-based and IP-based protocols to communicate with publishers and subscribers.
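The content-based model above can be made concrete with a minimal sketch: a subscription is a topic plus a query predicate that the broker evaluates against every publication. The broker class, field names, and predicate form here are hypothetical simplifications for illustration, not the actual API of our micro-services:

```python
class Broker:
    """Minimal content-based publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self.subscriptions = []  # list of (topic, predicate, callback)

    def subscribe(self, topic, predicate, callback):
        self.subscriptions.append((topic, predicate, callback))

    def publish(self, topic, content):
        # Deliver only to subscribers whose topic matches AND whose
        # query predicate holds for this particular publication.
        for sub_topic, predicate, callback in self.subscriptions:
            if sub_topic == topic and predicate(content):
                callback(content)

broker = Broker()
received = []
# Subscribe to traffic-sensor data, but only to congested readings.
broker.subscribe("traffic", lambda c: c["speed_kmh"] < 30, received.append)

broker.publish("traffic", {"sensor": "dvp-401", "speed_kmh": 22})
broker.publish("traffic", {"sensor": "gardiner-express", "speed_kmh": 85})
print(received)  # only the congested reading is delivered
```

This is the essential difference from topic-based systems: the second publication matches the topic but is filtered out by the query.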
The publisher-broker and subscriber-broker communication layer over ICN
provides a platform to build an efficient, robust, scalable, and secure data dis-
semination layer. We presented the detailed design of the data dissemination
layer and its advantages. The platform has been used to publish live video
feed from drone flights, as well as many other data types. Our demonstration
shows the feasibility of Vision as a Service in an application platform.
5.1.2 Time to Exhaustion
We proposed an in-network caching strategy for service providers to increase the time-to-exhaustion of their networks. We suggested that service providers use Information-Centric Networking for caching and content delivery. Even a limited deployment of ICN provides a substantial increase in the time-to-exhaustion of the network and lowers the number of links with high utilization. We studied different parameters that affect the performance of content delivery, such as I/O limits, routing algorithms, and heterogeneous versus homogeneous caching. We also validated our model by simulation.
5.2 Future Work
In this section, we review possible extensions of our work. They fall into two categories: extensions of the data dissemination layer for CVST, and extensions of content delivery in service providers using the ICN paradigm.
We demonstrated that the data dissemination layer uses the schema of a data type for publications and subscriptions. More data sources can easily be added to the layer: since the system understands the data based on its schema, adding a new data type only requires creating and registering its schema in the system. Access control, security, and privacy have native support in Named-Data Networking. Our publish/subscribe system can easily be extended to use these features for verification, encryption, and authorization. The broker can act as the central authority to control, issue, and validate the signing and encryption keys.
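The claim that adding a data type only requires registering its schema can be sketched as follows. The registry class and its flat field-name-to-type schema are illustrative stand-ins; the actual system uses full Avro-style schemas:

```python
class SchemaRegistry:
    """Toy schema registry: a publication is accepted only if it matches
    the registered schema for its data type (illustrative only)."""

    def __init__(self):
        self.schemas = {}

    def register(self, data_type, fields):
        self.schemas[data_type] = fields   # field name -> expected type

    def validate(self, data_type, record):
        fields = self.schemas.get(data_type)
        if fields is None:
            return False                   # unknown data type
        return (set(record) == set(fields) and
                all(isinstance(record[f], t) for f, t in fields.items()))

registry = SchemaRegistry()
# Adding a new source needs no code changes -- just a schema registration.
registry.register("bike_station", {"id": str, "lat": float,
                                   "lon": float, "free_docks": int})

print(registry.validate("bike_station",
                        {"id": "st-7", "lat": 43.66, "lon": -79.38,
                         "free_docks": 4}))              # True
print(registry.validate("bike_station", {"id": "st-7"}))  # False: missing fields
```

Because validation is driven entirely by the registered schema, every downstream component (matching, storage, notification) works unchanged for a new data type.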
We also expect that Vision as a Service (VaaS) will be available region-
wide by placing a network of drones throughout a region. Drones will be
dispatched on demand directly from base locations or transported by vehicle
to appropriate launching locations to investigate network anomalies.
The broker can also be extended to support aggregation queries. In the current design, an application can subscribe to the raw data by filtering based on some conditions; however, data aggregation is a common feature in IoT systems. The Matching Engine micro-service is a good candidate to implement the aggregation: its workers have direct access to the data and the queries, and they can use the Query servers as a temporary buffer for both spatial and temporal aggregation. Additional micro-services may be added to check the aggregation results and notify the subscribers. The interface for creating aggregation queries can be implemented by extending the Subscription Portal.
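As a sketch of the temporal side of such aggregation, the following groups buffered raw readings into fixed time windows and emits one averaged record per sensor per window. The window size, record format, and field names are illustrative assumptions, not the Matching Engine's actual data model:

```python
from collections import defaultdict

def aggregate_by_window(readings, window_s=300):
    """Average `value` per (sensor, time window). Each reading is a dict
    with `sensor`, `ts` (seconds), and `value` keys (hypothetical format)."""
    buckets = defaultdict(list)
    for r in readings:
        key = (r["sensor"], r["ts"] // window_s)   # 5-minute temporal bucket
        buckets[key].append(r["value"])
    return {key: sum(vs) / len(vs) for key, vs in buckets.items()}

readings = [
    {"sensor": "loop-12", "ts": 10,  "value": 40.0},
    {"sensor": "loop-12", "ts": 200, "value": 60.0},   # same 5-minute window
    {"sensor": "loop-12", "ts": 400, "value": 30.0},   # next window
]
print(aggregate_by_window(readings))
# {('loop-12', 0): 50.0, ('loop-12', 1): 30.0}
```

Spatial aggregation follows the same pattern with a region identifier in the bucket key instead of (or alongside) the sensor identifier.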
The Subscription Portal may also be extended to register remote subscribers. Currently, the portal acts as a subscriber and receives notifications for all of its registered subscriptions. However, the portal could register queries on behalf of remote subscribers given their callback paths: the user would enter the callback path on the query registration page of the portal, and the portal would pass that information to the XSUB API.
Our work on controlling time-to-exhaustion in service providers may be extended by integrating the cache placement and content routing logic into a central controller, such as the SDI controller that SAVI provides. This controller may use the solution of the optimization problem to set the routing tables in the network routers so as to maximize the time-to-exhaustion. The solution also provides a cache placement recommendation that may be combined with the dynamic resource allocation that an infrastructure such as SAVI provides to instantiate new cache instances and route traffic towards those instances.
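The controller integration could look like the following sketch, which takes the optimization output (a per-router, per-content-prefix next-hop assignment, in a hypothetical format) and groups it into per-router forwarding tables that an SDI controller would push to the routers:

```python
def build_forwarding_tables(solution):
    """Convert an optimization solution into per-router forwarding tables.
    `solution` maps (router, content_prefix) -> next_hop; this format is a
    hypothetical stand-in for the solver's actual output."""
    tables = {}
    for (router, prefix), next_hop in solution.items():
        tables.setdefault(router, {})[prefix] = next_hop
    return tables

# Example: the solver chose r2 as a cache location for /videos, so r1
# forwards /videos interests towards r2, which serves them locally.
solution = {
    ("r1", "/videos"): "r2",
    ("r1", "/sensors"): "r3",
    ("r2", "/videos"): "cache-r2",
}
tables = build_forwarding_tables(solution)
print(tables["r1"])  # {'/videos': 'r2', '/sensors': 'r3'}
```

Re-running the solver as demands shift and pushing only the changed entries would let the controller track the optimum without a full reconfiguration.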
Also, different caching hardware may be used in various parts of the network. Netflix Open Connect [50] provides caching servers with different capabilities, and service providers must place this hardware in the best locations in the network. Our work can be extended to take these hardware differences into account.
Bibliography
[1] D. Perino and M. Varvello, “A Reality Check for Content Centric Networking,” in Proceedings of the ACM SIGCOMM Workshop on Information-centric Networking, ser. ICN ’11. New York, NY, USA: ACM, 2011, pp. 44–49.
[2] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, “Networking Named Content,” in Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’09. New York, NY, USA: ACM, 2009, pp. 1–12.
[3] D. Raychaudhuri, K. Nagaraja, and A. Venkataramani, “MobilityFirst: A Robust and Trustworthy Mobility-Centric Architecture for the Future Internet,” SIGMOBILE Mob. Comput. Commun. Rev., vol. 16, no. 3, pp. 2–13, Dec. 2012.
[4] A. Leon-Garcia, H. Bannazadeh, and A. Tizghadam, “Smart city platforms on multitier Software-Defined infrastructure cloud computing,” in 2016 IEEE International Smart Cities Conference (ISC2 2016), Trento, Italy, Sep. 2016.
[5] J.-M. K.-M. Kang, H. Bannazadeh, and A. Leon-Garcia, “SAVI testbed: Control and management of converged virtual ICT resources,” in Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on, May 2013, pp. 664–667.
[6] “Global Internet Phenomena,” Sandvine, Tech. Rep., 2013. [Online]. Available: https://www.sandvine.com/trends/global-internet-phenomena/
[7] C. Labovitz, “Massive Ongoing Changes in Content Distribution,” http://blog.streamingmedia.com/wp-content/uploads/2013/07/2013CDNSummit-B102A.pdf, Tech. Rep., 2013.
[8] C. Labovitz, S. Iekel-Johnson, D. McPherson, J. Oberheide, and F. Jahanian, “Internet inter-domain traffic,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 75–86, 2011.
[9] World Urbanization Prospects: The 2014 Revision, Highlights (ST/ESA/SER.A/352), United Nations, Department of Economic and Social Affairs, Population Division, 2014.
[10] J. M. Hernandez-Munoz, J. B. Vercher, L. Munoz, J. A. Galache, M. Presser, L. A. H. Gomez, and J. Pettersson, “Smart cities at the forefront of the future internet,” in The Future Internet Assembly. Springer, 2011, pp. 447–462.
[11] A. Shariat, A. Tizghadam, and A. Leon-Garcia, “An ICN-Based Publish-Subscribe platform to deliver UAV service in smart cities,” in 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS): SmartCity16: The 2nd IEEE INFOCOM Workshop on Smart Cities and Urban Computing (SmartCity’16), San Francisco, USA, Apr. 2016.
[12] ——, “Optimizing time to exhaustion in service providers using Information-Centric networking,” in 28th International Teletraffic Congress (ITC 28), Wurzburg, Germany, Sep. 2016.
[13] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica, “A Data-Oriented (and beyond) Network Architecture,” SIGCOMM Comput. Commun. Rev., vol. 37, no. 4, pp. 181–192, Aug. 2007.
[14] D. Smetters and V. Jacobson, “Securing Network Content,” Tech. Rep., 2009.
[15] M. Gritter and D. R. Cheriton, “An Architecture for Content Routing Support in the Internet,” in Proceedings of the 3rd Conference on USENIX Symposium on Internet Technologies and Systems - Volume 3, ser. USITS’01. Berkeley, CA, USA: USENIX Association, 2001, pp. 4–4.
[16] PURSUIT. [Online]. Available: http://www.fp7-pursuit.eu/PursuitWeb/
[17] N. Fotiou, P. Nikander, D. Trossen, and G. C. Polyzos, “Developing Information Networking Further: From PSIRP to PURSUIT,” Broadband Communications, Networks, and Systems, pp. 1–13, 2012.
[18] D. Lagutin, K. Visala, and S. Tarkoma, “Publish/Subscribe for Internet: PSIRP Perspective.” Future Internet Assembly, vol. 84, 2010.
[19] SAIL Project. [Online]. Available: http://www.sail-project.eu
[20] COMET Project. [Online]. Available: http://www.cometproject.eu/
[21] CONVERGENCE. [Online]. Available: http://www.ict-convergence.eu/
[22] V. Jacobson. (2006, August) A New Way to Look at Networking. [Online]. Available: https://www.youtube.com/watch?v=oCZMoY3q2uM
[23] Content Centric Networking. [Online]. Available: http://www.ccnx.org
[24] Named Data Networking. [Online]. Available: http://named-data.net
[25] G. Carofiglio, G. Morabito, L. Muscariello, I. Solis, and M. Varvello, “From Content Delivery Today to Information Centric Networking,” Comput. Netw., vol. 57, no. 16, pp. 3116–3127, Nov. 2013.
[26] D. Rossi and G. Rossini, “On Sizing CCN Content Stores by Exploiting Topological Information,” in Computer Communications Workshops (INFOCOM WKSHPS), 2012 IEEE Conference on, March 2012, pp. 280–285.
[27] H. Yuan and P. Crowley, “Experimental Evaluation of Content Distribution with NDN and HTTP,” in INFOCOM, 2013 Proceedings IEEE, April 2013, pp. 240–244.
[28] M. Varvello, D. Perino, and L. Linguaglossa, “On the Design and Implementation of a Wire-Speed Pending Interest Table,” in Proceedings of the 2nd IEEE International Workshop on Emerging Design Choices in Name-Oriented Networking, NOMEN, vol. 13, 2013.
[29] C. Dannewitz, M. D’Ambrosio, and V. Vercellone, “Hierarchical DHT-Based Name Resolution for Information-Centric Networks,” Comput. Commun., vol. 36, no. 7, pp. 736–749, Apr. 2013.
[30] K. V. Katsaros, N. Fotiou, X. Vasilakos, C. N. Ververidis, C. Tsilopoulos, G. Xylomenos, and G. C. Polyzos, “On Inter-Domain Name Resolution for Information-Centric Networks,” in Proceedings of the 11th International IFIP TC 6 Conference on Networking - Volume Part I, ser. IFIP’12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 13–26.
[31] A. Badam, K. Park, V. S. Pai, and L. L. Peterson, “HashCache: Cache Storage for the Next Billion,” in Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, ser. NSDI’09. Berkeley, CA, USA: USENIX Association, 2009, pp. 123–136.
[32] S. C. Nelson, G. Bhanage, and D. Raychaudhuri, “GSTAR: Generalized storage-aware routing for MobilityFirst in the future mobile internet,” in Proceedings of the sixth international workshop on MobiArch. ACM, 2011, pp. 19–24.
[33] A. Tizghadam and A. Leon-Garcia, “Application platform for smart transportation,” in Future Access Enablers for Ubiquitous and Intelligent Infrastructures, ser. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, V. Atanasovski and A. Leon-Garcia, Eds. Springer International Publishing, 2015, vol. 159, pp. 26–32.
[34] CVST Portal. [Online]. Available: http://portal.cvst.ca
[35] Smart Application on Virtual Infrastructure. [Online]. Available: http://www.savinetwork.ca/
[36] A. Carzaniga, M. Papalini, and A. L. Wolf, “Content-Based Publish/Subscribe Networking and Information-Centric Networking,” in Proceedings of the ACM SIGCOMM Workshop on Information-centric Networking, ser. ICN ’11. New York, NY, USA: ACM, 2011, pp. 56–61.
[37] J. Chen, M. Arumaithurai, L. Jiao, X. Fu, and K. Ramakrishnan, “COPSS: An Efficient Content Oriented Publish/Subscribe System,” in Architectures for Networking and Communications Systems (ANCS), 2011 Seventh ACM/IEEE Symposium on, Oct 2011, pp. 99–110.
[38] H.-A. Jacobsen, “Publish/Subscribe,” in Encyclopedia of Database Systems. Springer, 2009, pp. 2208–2211.
[39] ——, “Content-based Publish/Subscribe,” in Encyclopedia of Database Systems. Springer, 2009, pp. 464–466.
[40] R. Baldoni, M. Contenti, and A. Virgillito, “The evolution of publish/subscribe communication systems,” in Future directions in distributed computing. Springer, 2003, pp. 137–141.
[41] G. Chockler, R. Melamed, Y. Tock, and R. Vitenberg, “Constructing scalable overlays for pub-sub with many topics,” in Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing. ACM, 2007, pp. 109–118.
[42] E. Fidler, H.-A. Jacobsen, G. Li, and S. Mankovski, “The PADRES Distributed Publish/Subscribe System,” in FIW, 2005, pp. 12–30.
[43] Y. Zhang, A. Afanasyev, J. Burke, and L. Zhang, “A Survey of Mobility Support in Named Data Networking,” in Proceedings of the third Workshop on Name-Oriented Mobility: Architecture, Algorithms and Applications (NOM’2016).
[44] “Cisco Visual Networking Index: The Zettabyte Era-Trends and Analysis,” Cisco, Tech. Rep., 2013.
[45] “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2013-2018,” Cisco, Tech. Rep., 2013.
[46] M. Rabinovich and O. Spatscheck, Web caching and replication. Addison-Wesley Reading, 2002.
[47] A.-M. K. Pathan, “Utility-oriented internetworking of content delivery networks,” Ph.D. dissertation, The University of Melbourne, 2009.
[48] B. Cain, A. Barbir, R. Nair, and O. Spatscheck, “Known Content Network (CN) Request-Routing Mechanisms,” 2003.
[49] J. Pang, A. Akella, A. Shaikh, B. Krishnamurthy, and S. Seshan, “On the responsiveness of DNS-based network control,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM, 2004, pp. 21–26.
[50] Netflix Open Connect Content Delivery Network. [Online]. Available: https://openconnect.itp.netflix.com/
[51] D. Rayburn. (2010) An Overview of Transparent Caching and Its Role in the CDN Market. [Online]. Available: http://blog.streamingmedia.com/2010/10/an-overview-of-transparent-caching.html
[52] G. Tyson, S. Kaune, S. Miles, Y. El-khatib, A. Mauthe, and A. Taweel, “A trace-driven analysis of caching in content-centric networks,” in Computer Communications and Networks (ICCCN), 2012 21st International Conference on, July 2012, pp. 1–7.
[53] P. Agyapong and M. Sirbu, “Economic incentives in information-centric networking: implications for protocol design and public policy,” Communications Magazine, IEEE, vol. 50, no. 12, pp. 18–26, December 2012.
[54] G. Carofiglio, M. Gallo, L. Muscariello, and D. Perino, “Modeling data transfer in content-centric networking,” in Teletraffic Congress (ITC), 2011 23rd International, Sept 2011, pp. 111–118.
[55] Y. Wang, Z. Li, G. Tyson, S. Uhlig, and G. Xie, “Optimal cache allocation for content-centric networking,” in Network Protocols (ICNP), 2013 21st IEEE International Conference on, Oct 2013, pp. 1–10.
[56] RabbitMQ. [Online]. Available: https://www.rabbitmq.com/
[57] Elasticsearch. [Online]. Available: https://www.elastic.co/products/elasticsearch
[58] Percolator. [Online]. Available: https://www.elastic.co/blog/percolator
[59] Apache Avro. [Online]. Available: http://avro.apache.org/
[60] Apache Hadoop. [Online]. Available: http://hadoop.apache.org/
[61] GeoJSON. [Online]. Available: http://geojson.org/
[62] A. Tizghadam and A. Leon-Garcia, “Betweenness centrality and resistance distance in communication networks,” Network, IEEE, vol. 24, no. 6, pp. 10–16, November 2010.
[63] E. M. Yeh, T. Ho, M. Burd, Y. Cui, and D. Leong, “VIP: A framework for joint dynamic forwarding and caching in named data networks,” CoRR, vol. abs/1310.5569, 2013.
[64] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson, “Inferring link weights using end-to-end measurements,” in Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, ser. IMW ’02. New York, NY, USA: ACM, 2002, pp. 231–236.
[65] D. Applegate and E. Cohen, “Making intra-domain routing robust to changing and uncertain traffic demands: Understanding fundamental tradeoffs,” in Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’03. New York, NY, USA: ACM, 2003, pp. 313–324.
[66] A. Tizghadam and A. Leon-Garcia, “Robust network planning in nonuniform traffic scenarios,” Computer Communications, vol. 34, no. 12, pp. 1436–1449, 2011.