
mPlane Reasoner(s) & Analysis Modules Pedro Casas FTW Vienna mPlane final workshop 30 November 2015, Heidelberg



Page 1

mPlane Reasoner(s) & Analysis Modules

Pedro Casas, FTW Vienna

mPlane final workshop, 30 November 2015, Heidelberg

Page 2



mPlane iterative measurement

[Figure: Measurement Layer (WP2, raw data): mProbe 1 … N and legacyProbe 1 … N, each behind an mInterface; Data collection & processing: legacyDB 1 … N and the mPlane Repository (DBStream); Intelligent Reasoner; Analysis Modules 1 … N]

Page 3


The Useful – Coordination and Analysis

The mPlane Reasoner(s)

Analysis Modules

Page 4

WP4 Overview

Intelligent Reasoner for Iterative and Adaptive Analysis: guides and automates the iterative measurement, exploration and diagnosis process

Monitoring Data Analysis Modules: complex data analysis with high visibility, working on filtered data accessed at the repositories and on very specific (low-volume) data from the probes

Supervisor: the glue of the mPlane protocol; provides centralized control of the distributed measurement framework


Page 5

The Reasoner is responsible for driving the measurement analysis process, which by nature is iterative, and ideally adaptive (learning).

Depending on the use-case, the Reasoner has different roles:

In the case of troubleshooting support: iteratively find the root causes of the associated problems

In the case of generic measurement analysis: automate the iterative process

Each use case defines/instantiates a specific Reasoner addressing its goals

Still, the generic design rules of a specific Reasoner can be reused in other use cases

The mPlane Reasoner

Page 6

The Reasoner – Components

The Reasoner consists of 3 different blocks:

The Knowledge Structure: the memory or knowledge of the system. It is initially based on expert domain knowledge (diagnosis rules) and extended by learning from past experiences (knowledge discovery)

The Reasoning/Diagnosis Process: automates and structures the iterative analysis

The Knowledge Discovery Process: enriches the knowledge structure and the reasoning process, based on (supervised or unsupervised) learning

Page 7

The Reasoner – The Overall Picture


The “Knowledge” of the Reasoner

Knowledge Discovery

What I Know


Automate Analysis, based on what I know

Page 8

The Diagnosis Process (1/2)

The Reasoner does not work on raw data, but on events

An event captures a particular type of network condition

E.g., link congestion, YouTube throughput drop, overloaded cell, Google CDN load-balancing, etc.

Events are extracted from raw measurements through a retrieval process (actual algorithms at WP2 and WP4, queries, etc.)

Events are defined as m-tuples including the following fields:

event name: e.g., link overload

location type: e.g., Gn downlink interface

time span: e.g., 2013-10-21-12:30:00, 2013-10-21-12:35:00

retrieval process: e.g., Simple Link Congestion Detection Algorithm – SLCDA (with utilization threshold Cth)

additional diagnosis features: e.g., number of flows, number of bytes, list of server IPs originating the flows, etc.
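To make the m-tuple concrete, here is a minimal Python sketch that models an event as an immutable record with the five fields listed above. It also adds an overlap check of the kind needed to relate events in time. The field names, the `overlaps` helper and the feature keys are illustrative assumptions, not the mPlane data model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Illustrative event m-tuple (field names are assumptions, not mPlane's schema)."""
    name: str               # e.g. "link overload"
    location_type: str      # e.g. "Gn downlink interface"
    start: datetime         # beginning of the time span
    end: datetime           # end of the time span
    retrieval_process: str  # e.g. "SLCDA (utilization threshold Cth)"
    # Additional diagnosis features (number of flows, bytes, server IPs, ...);
    # excluded from equality and hashing.
    features: dict = field(default_factory=dict, compare=False)

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("time span ends before it starts")

    def overlaps(self, other: "Event") -> bool:
        """True if the two time spans share at least one instant."""
        return self.start <= other.end and other.start <= self.end


# The link-overload example from the slide (feature values are hypothetical).
link_overload = Event(
    name="link overload",
    location_type="Gn downlink interface",
    start=datetime(2013, 10, 21, 12, 30),
    end=datetime(2013, 10, 21, 12, 35),
    retrieval_process="SLCDA (utilization threshold Cth)",
    features={"n_flows": 1200},
)
```

Temporal overlap is only one ingredient; the spatial relation between event locations would need its own, use-case-specific check.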

Page 9

The Diagnosis Process (2/2)

Some examples of events related to Root Cause Analysis (RCA):

1. A congested Gn interface in a mobile ISP during 5’:

2. An anomaly detected in YouTube traffic, impacting users’ QoE for 5’:

Page 10

Diagnosis Graph (1/4)

Relates problems/issues with events and root causes, exploring the temporal and spatial relationships between events

Which type of diagnosis graph reasoning? Rule-based reasoning (decision-tree-like graph):

Easier to implement and configure (domain knowledge is easy to add)

Gives a simple and direct association between the diagnosed root cause and the evidence(s), for better interpretation

It is very effective in practice

Other types of iterative reasoning can be implemented in the same way (not only RCA, but generic iterative measurement processes)

Using per-use-case graphs, the Reasoner looks for the presence of events and identifies the root cause as the leaf with the highest probability
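The leaf-selection step can be sketched in Python. Each leaf (root cause) is tied to the set of events that support it, and the leaf whose evidence is best covered by the observed events wins. Scoring a leaf by the fraction of its evidence events that are present is an assumed stand-in for the "probability" on the slide, and the event and cause names are hypothetical.

```python
def most_likely_root_cause(leaves, observed):
    """Return (root cause, score) for the leaf whose evidence is best supported.

    leaves:   mapping root cause -> set of event names that support it
    observed: set of event names currently present
    The score (fraction of a leaf's evidence events observed) is an assumed
    stand-in for the leaf probability.
    """
    best_cause, best_score = None, -1.0
    for cause, evidence in leaves.items():
        score = len(evidence & observed) / len(evidence) if evidence else 0.0
        if score > best_score:  # ties keep the first leaf in insertion order
            best_cause, best_score = cause, score
    return best_cause, best_score


# Hypothetical leaves of a YouTube diagnosis graph.
leaves = {
    "CDN server issue": {"youtube_throughput_drop", "cdn_rtt_increase"},
    "ISP core congestion": {"youtube_throughput_drop", "gn_link_overload"},
    "end-device issue": {"single_device_stallings"},
}
print(most_likely_root_cause(leaves, {"youtube_throughput_drop", "gn_link_overload"}))
# ('ISP core congestion', 1.0)
```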

Page 11

Example: Who to blame when YouTube is not working?

[Figure: end devices, the ISP network, the Internet (AS 1, AS 2) and YouTube]

Devices? ISP? Internet? YouTube?


Page 12

Diagnosis Graph (2/4)

An example of a Diagnosis Graph (DG) associated with the detection and RCA of QoE-relevant anomalies in YouTube:

In the example, the DG is structured in 5 different macro-blocks:

① QoE-relevant Anomaly Detection block

② End-device Diagnosis block

③ ISP Diagnosis block

④ Internet paths Diagnosis block

⑤ CDN servers Diagnosis block

Example of root causes and the associated rules’ description

Page 13

ISP Diagnosis block

Purpose: detect QoE degradation


1) Continuous passive monitoring

2) Triggering of active monitoring in case of alarms

Diagnosis Graph (3/4)

Page 14

High-level Diagnosis Graph for ISP (simplified from D4.2):

[Figure: decision nodes "Alarms from different POPs?" and "Alarms from different BRAS?"; diagnoses: issue external to SP domain, issue in SP core network, issue on BRAS, issue on DSLAM, issue on access lines; actions: trigger Internet active probe (inter-domain measurements), trigger POP active probe, trigger DSLAM active probe]

Diagnosis Graph (4/4)
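The simplified ISP graph can be read as a chain of yes/no questions over a batch of concurrent alarms. The Python sketch below is one possible reading. The `Alarm` fields, the order of the branches and the line-level check that separates DSLAM from access-line issues are all assumptions, because the figure's edges did not survive extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """A QoE alarm raised by passive monitoring (hypothetical fields)."""
    pop: str
    bras: str
    dslam: str
    line: str


def diagnose_isp(alarms):
    """Walk an assumed reading of the simplified ISP diagnosis graph.

    In the real workflow each branch would also trigger active probes
    (Internet, POP or DSLAM) to confirm the diagnosis.
    """
    if not alarms:
        return "no issue detected"
    if len({a.pop for a in alarms}) > 1:
        # Alarms from different POPs: Internet active probes, inter-domain measurements.
        return "issue external to SP domain"
    if len({a.bras for a in alarms}) > 1:
        # Alarms from different BRAS behind the same POP.
        return "issue in SP core network"
    if len({a.dslam for a in alarms}) > 1:
        return "issue on BRAS"
    if len({a.line for a in alarms}) > 1:
        # Several lines behind one DSLAM: DSLAM active probe.
        return "issue on DSLAM"
    return "issue on access lines"
```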

Page 15

Knowledge Discovery

Domain knowledge and operational experience are incomplete (using only domain-based diagnosis graphs limits the system's capabilities)

Therefore, an initial diagnosis graph can under-perform, both in accuracy and in completeness

The role of Automatic Knowledge Discovery: correlate all the events that occur at the same time and are spatially related to the service problem under investigation…

…and learn new diagnosis rules (new knowledge) from past experiences: supervised learning in the case of labeled data, unsupervised learning in the general case

Some mPlane techniques: Automatic Rule Mining, Sub-Space Clustering, Decision-Tree Learning

A final expert intervention validates the identified diagnosis rules, which are then added to the Knowledge Structure
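As an illustration of the supervised case, the Python sketch below mines simple association rules of the form "event → root cause" from labeled past episodes, keeping those with enough support and confidence. It produces candidates for expert validation. This is a generic single-antecedent rule-mining sketch, not necessarily the Automatic Rule Mining technique used in mPlane. The thresholds and event names are hypothetical.

```python
from collections import Counter


def mine_rules(episodes, min_support=2, min_confidence=0.8):
    """Mine single-event rules "event -> root cause" from labeled past episodes.

    episodes: list of (set of co-occurring event names, validated root cause).
    Returns (event, cause, support, confidence) tuples, best rules first,
    as candidates for expert validation before entering the Knowledge Structure.
    """
    event_count = Counter()
    pair_count = Counter()
    for events, cause in episodes:
        for event in events:
            event_count[event] += 1
            pair_count[(event, cause)] += 1
    rules = []
    for (event, cause), support in pair_count.items():
        confidence = support / event_count[event]  # estimate of P(cause | event)
        if support >= min_support and confidence >= min_confidence:
            rules.append((event, cause, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))
```

A single-antecedent rule is easy for an expert to check, which matches the validation step above. Multi-event antecedents would need an itemset miner such as Apriori.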

Page 16

Multiple mPlane Reasoners

An mPlane Reasoner is an extended mPlane client, which performs sequential tasks based on intermediate analysis results, actuating through the mPlane Supervisor interfaces

In practice, we implemented different Reasoners following the aforementioned principles, but tailored to the specific needs of each use case:

1. Reasoner in Node.js: basic mPlane Reasoner

2. Reasoner for Content Popularity Estimation

3. Reasoner for Content Curation

4. Reasoner for Web browsing QoE

5. Reasoner for Mobile Network RCA

6. Reasoner for Anomaly Detection and RCA

7. Reasoner for SLA Verification

8. Reasoner for Multimedia Content Delivery Analysis

9. Reasoner for GLIMPSE

Page 17

Analysis Modules or Algorithms further evaluate the measurements gathered and pre-analyzed by the lower layers of mPlane

They operate on small amounts of data (compared with the data available at WP3, or possibly gathered at WP2)

Analysis Modules

Page 18

Per-use-case algorithms

The main Analysis Modules are linked to the proposed use cases:

Find the cause of Quality of Experience (QoE) degradations

Estimate the future popularity trends of services and contents for network optimization

Classify and promote interesting web content to end-users

Assess and troubleshoot performance and quality of multimedia stream delivery

Diagnose performance issues in web browsing and identify the segment responsible for poor QoE


Find root cause of problems related to connectivity and poor QoE on mobile devices

Detect and diagnose anomalies in Internet-scale services (e.g., CDN-based services)

Verify SLAs

…but there is more

Page 19

Some Extended Analysis Modules

QoE

QoE-based monitoring for YouTube: metrics to detect playback stallings

Relate OWD variation to QoE, for a generic class of applications

Topology

Detect Anycast Services: determine if a service uses IP anycast

Reverse Traceroute – DisNETPerf: find probes near some point of interest in the network to launch active measurements

Topology discovery: identification of middle boxes, TCP proxies and NATs

MPLS transit tunnel analysis: classification of MPLS tunnels based on their usage/purpose (mono-path, ECMP, multi-FEC, etc.)

Topology/Performance

Analyze dynamics of forwarding and routing paths: determine whether routing paths follow perturbations experienced by forwarding paths or vice versa

Prediction of Unmeasured Paths: inference of path properties (RTT, available bandwidth, etc.) on unmeasured network paths

Page 20

Partial Mapping of Analysis Modules to Use Cases

Reasoners and Analysis Modules (as well as everything presented so far during the day) are available on the mPlane website as software tools:

Page 21

[Figure: MOS vs. number of stallings (4 seconds of stalling), on the real mobile network and in lab studies]

A single stalling event heavily degrades the end-user experience

Two or more stallings already mean bad quality

The duration of the stallings is less critical, but it also has an important impact on QoE

Stallings are the impairments perceived by the end-user (regardless of the video resolution, or even of DASH)

MOS = F(N, …), with N the number of stallings


Selected examples I: YouTube QoE

Page 22

We introduced a simple KPI to monitor YouTube QoE from passive network measurements

Buffer depletion generally occurs because the downlink bandwidth is lower than the video bitrate

Ex: for standard 360p YouTube videos, the video bitrate is VBR = 600 kbps, and the downlink bandwidth should be DBW > 750 kbps

Stallings and Download Throughput
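The KPI boils down to comparing a flow's download throughput with the video bitrate. The Python sketch below flags flows at risk of buffer depletion. The 1.25 margin is read off the 360p example above (750 / 600). Applying it as a fixed ratio for every bitrate is an assumption, and so are the `VideoFlow` field names.

```python
from dataclasses import dataclass

# Safety margin read off the slide's 360p example: 750 kbps / 600 kbps = 1.25.
# Applying it as a fixed ratio for every bitrate is an assumption of this sketch.
STALL_MARGIN = 750 / 600


@dataclass
class VideoFlow:
    """Per-flow summary from passive monitoring (hypothetical fields)."""
    bytes_down: int
    duration_s: float
    video_bitrate_kbps: float

    def __post_init__(self):
        if self.duration_s <= 0:
            raise ValueError("duration must be positive")

    @property
    def dbw_kbps(self) -> float:
        """Average download throughput of the flow in kbps."""
        return self.bytes_down * 8 / 1000 / self.duration_s


def stalling_risk(flow: VideoFlow, margin: float = STALL_MARGIN) -> bool:
    """True when the download throughput is too low to keep the playout buffer filled."""
    return flow.dbw_kbps < margin * flow.video_bitrate_kbps
```

For instance, a 360p flow delivering 1,000,000 bytes in 10 s averages 800 kbps and is not flagged, while one delivering half of that (400 kbps) is.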

Page 23

Selected examples II: Anomaly Detection and Diagnosis

We conceived a statistical AD tool which works with full feature distributions

The AD algorithm consists of two phases:

(1) Reference-set identification: find past traffic distributions which are a suitable reference of normality

(2) AD test: use a normalized variant of the Kullback-Leibler divergence to decide whether the current distribution is compatible with the reference set

x1 and x2 are similar → L(x1,x2) is small; x1 and x3 are dissimilar → L(x1,x3) is large
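The AD test in phase (2) can be sketched in Python with a plain Kullback-Leibler divergence between histograms. A current distribution is flagged when it is far from every distribution in the reference set. This is not the normalized variant used by the AD tool, whose normalization the slides do not give. The additive smoothing, the threshold value and the toy histograms are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two histograms over the same bins.

    A small eps is added to every bin so empty bins do not produce log(0)
    or division by zero; both histograms are then normalized to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same bins")
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def is_anomalous(current, reference_set, threshold):
    """Flag `current` when it is far from every distribution in the reference set."""
    return min(kl_divergence(current, ref) for ref in reference_set) > threshold


reference_set = [[100, 50, 10], [95, 55, 12]]  # toy histograms, not real data
print(is_anomalous([98, 52, 11], reference_set, threshold=0.05))   # False
print(is_anomalous([10, 50, 100], reference_set, threshold=0.05))  # True
```

Taking the minimum over the reference set means the current distribution only has to resemble one normal past distribution to pass.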

Page 24

Using ADTool for Detecting and Diagnosing Anomalies

Many interesting service anomalies are observed as abrupt changes in the DNS counts

Reasoner approach: correlate observations from multiple metrics revealing service-related and/or device-related anomalies:

Fully Qualified Domain Name, device OS, device manufacturer (TAC number in mobile devices), HTTP response code, and so on…

Example: real service/device-related anomaly in mobile devices

Page 25

Selected examples: Anomaly Detection and Diagnosis

DNS query counts in a mobile network

Periodic spikes: daily synchronization events

Peak-hour utilization

Traffic anomaly, what’s that? Easy to detect, not so easy to diagnose

Similar behavior in tablets

The anomaly is only observable for Apple devices (Akamai DNS) (Apple Push Notification Service)

Connection issues to Apple push notification servers

Page 26

Problem solved:

Anycast enumeration and geolocation

Iterative methodology based on geographically distributed vantage points (VPs):

Determine if a service uses IP anycast

Enumerate replicas sharing the same IP address

Geolocate those replicas

The iterative workflow is lightweight (O(100) packets) and fast (O(100) ms)

It shall support RIPE and mPlane/PlanetLab probes (RIPE integration in mPlane)

Selected examples III: Anycast Detection
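The slides do not spell out how anycast is detected. A common approach in latency-based anycast work, assumed here, is a speed-of-light test. Each VP's minimum RTT bounds how far away the target can be, and two VPs whose distance disks do not overlap must have reached different replicas. The Python sketch covers only this detection step, not enumeration or geolocation. The 200 km/ms propagation speed (about 2/3 of c in fiber) is an assumption.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0
# Assumption: signals in fiber travel at roughly 2/3 of c, i.e. ~200 km per ms.
FIBER_KM_PER_MS = 200.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def disk_radius_km(rtt_ms):
    """The target is at most (RTT / 2) * propagation speed away from the VP."""
    return rtt_ms / 2 * FIBER_KM_PER_MS


def is_anycast(measurements):
    """measurements: list of ((lat, lon) of VP, min RTT in ms) toward the same IP."""
    for (pos_a, rtt_a), (pos_b, rtt_b) in combinations(measurements, 2):
        if haversine_km(pos_a, pos_b) > disk_radius_km(rtt_a) + disk_radius_km(rtt_b):
            return True  # disjoint disks: a single host cannot satisfy both RTTs
    return False
```

The test is conservative: overlapping disks do not prove unicast, they only fail to prove anycast. More VPs tighten the result.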

Page 27

Selected examples IV: DisNETPerf

Problem solved: Reverse Traceroute (without IP spoofing or IP Record Route): find the mPlane probe that is closest to a given Point of Interest (PoI), to enable troubleshooting on the path from that PoI to some user without control on the PoI side (e.g., a YouTube server)

Neighborhood model: combined topology- and delay-based distance (same BGP AS + min RTT)

Main idea: we rely on a large set of widely spread probes (e.g., RIPE Atlas):

Given IPs (e.g., YouTube) and IPd (e.g., PoP @Heidelberg), locate IPdisnet

IPdisnet “mimics” IPs in terms of IPs-IPd path similarity

Run traceroute measurements from IPc to IPd

Collect data for troubleshooting purposes
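The neighborhood model combines two criteria: being in the same BGP AS as the PoI, and having the smallest minimum RTT to it. The Python sketch below selects the closest probe by ranking candidates lexicographically: same AS first, then lowest min RTT. How DisNETPerf actually weights the two criteria is not stated on the slide, so this ordering and the `Probe` fields are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Candidate probe (hypothetical fields)."""
    probe_id: int
    asn: int           # AS the probe is located in
    min_rtt_ms: float  # minimum RTT measured from the probe to the PoI


def closest_probe(probes, poi_asn):
    """Prefer probes in the PoI's AS, then the one with the lowest min RTT."""
    if not probes:
        raise ValueError("no candidate probes")
    # False sorts before True, so same-AS probes come first.
    return min(probes, key=lambda p: (p.asn != poi_asn, p.min_rtt_ms))
```

The selected probe would then run the traceroutes toward IPd in place of the uncontrolled PoI.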

Page 28

DisNETPerf in a Nutshell

mPlane – 2nd Review Meeting, Brussels, February 10th, 2015

Reverse Traceroute IPs IPd?

Page 29

Page 30

Backup slides

Page 31

Selected examples I: Content Popularity

Early detection of contents which will receive attention



Page 32

How mPlane can make it happen

[Figure: probes (passive) and HTTP requests; Analysis Modules: Popularity Modeler, Popularity Predictor; Reasoner: detect devices and caches close to location; Supervisor: notify popular; CDN supervisor: caching strategies based on future popular contents]

Page 33

Preliminary Results

Popularity Modeler and Predictor modules: topic models (GMM + LDA), maximum likelihood

Caching policy based on content popularity vs. LRU and LFU (Least Recently/Frequently Used)

We improve on state-of-the-art algorithms by obtaining a similar RMSE with a much smaller observation window (30 minutes vs. 4 hours)


Page 34

Selected examples II: Passive Media Curation

A new way of helping users find relevant content on the web, fast

User clicks are a good measure of interest (users don’t click randomly)

Curated (relevant) content

Page 35



How can mPlane make it happen

[Figure: user URLs (up to 5M requests/hour); WP2 (active); WP3 – scalable data analysis; WP4.1 – Analysis Modules: Portals vs. Contents, content popularity statistics, classify contents, elect content to promote; publish content (interesting URLs); orchestration running for a few months]

Page 36

Content versus Portal classifier module

Features: hostname, URL length, frequency as hostname, request-arrival-process cross-correlation, 1-day periodicity

These features « feed » a naive Bayes classifier. We tested different combinations; the best is URL length + periodicity:

As accurate as the 5 features together

Accuracy: tested on manually verified ground-truth traces, using 2/3 for training and 1/3 for “prediction”. Overall 96% accuracy of the classifier; 94% precision and 100% recall in detecting content-URLs
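A naive Bayes classifier on the two winning features fits in a few lines of Python. The sketch below implements Gaussian naive Bayes from scratch and assumes the features are already computed as [URL length, 1-day periodicity score]. The Gaussian likelihood, the variance floor and the toy training rows are assumptions for illustration, not the module's actual configuration or data.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: per-class, per-feature normal likelihoods."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(y)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Variance floor avoids division by zero for constant features.
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-6)
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, row, label):
        log_prior, means, variances = self.params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik

    def predict(self, row):
        return max(self.params, key=lambda label: self._log_posterior(row, label))


# Toy rows [URL length, 1-day periodicity score]; illustrative only, not real traces.
X = [[85, 0.10], [120, 0.05], [95, 0.20], [15, 0.90], [22, 0.80], [18, 0.95]]
y = ["content", "content", "content", "portal", "portal", "portal"]
model = GaussianNaiveBayes().fit(X, y)
```

Here long, non-periodic URLs fall on the content side and short, daily-periodic ones on the portal side, the intuition behind pairing URL length with periodicity.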


Page 37

Content promotion module (in progress)

Three types of promoted content-URLs so far:

Live stream: news/videos/blogs currently attracting the attention of the crowd

Top: most popular (last day, week, month, etc.) over all content-URLs

Hot: a « mixture » of popularity and freshness (adapted from reddit’s hot algorithm)

First users like them!

Hot score terms: timestamp of first view; absolute reference: start date of Netcurator; a freshness constant period (12 hours)



[Figure: relevance ratings: extremely relevant, very relevant, not that relevant]
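Reddit's hot ranking adds a log10 popularity term to a linear time term. Adapting it with the terms above gives roughly hot = log10(max(views, 1)) + (first view − reference date) / 12 h. The Python sketch assumes view counts as the popularity measure. The reference date below is a hypothetical placeholder for the Netcurator start date, which the slides do not give. Reddit's vote-sign term is dropped because view counts are never negative.

```python
import math
from datetime import datetime, timezone

# Hypothetical placeholder for "start date of Netcurator" (not given in the slides).
NETCURATOR_START = datetime(2015, 1, 1, tzinfo=timezone.utc)
FRESHNESS_PERIOD_S = 12 * 3600  # freshness constant period from the slide (12 hours)


def hot_score(views: int, first_view: datetime, reference: datetime = NETCURATOR_START) -> float:
    """reddit-style hot score: log10 popularity plus a linear freshness bonus."""
    popularity = math.log10(max(views, 1))
    freshness = (first_view - reference).total_seconds() / FRESHNESS_PERIOD_S
    return popularity + freshness
```

With a 12-hour period, a content-URL first seen 12 hours later needs ten times fewer views to reach the same score. Older URLs therefore sink unless their popularity keeps growing.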
