Multidimensional Analysis of Atypical Events in Cyber-Physical Data


Page 1: Multidimensional Analysis of Atypical Events in Cyber-Physical Data

23/4/21

Multidimensional Analysis of Atypical Events in Cyber-Physical Data

Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen-Chih Peng, Yizhou Sun


23/4/21 DAIS UIUC

Outline

Introduction

Backgrounds

Model Construction

Query Processing

Experiments


Introduction

Cyber-Physical System (CPS): integrates physical devices (e.g., sensors, cameras) with cyber components to form a situation-aware analytical system

Many promising applications:

traffic observation

intruder/motion detection

battlefield surveillance

remote healthcare

Key task: Analyze the atypical data with multi-dimensional information


Motivation Example I

Traffic monitoring system: a typical CPS

Inductive loop sensors

Thousands, placed every few miles along highways

Operating 24 hours a day, 7 days a week

Monitoring traffic and reporting congestion


Motivation Example II

Questions from transportation officers:

When does congestion usually happen downtown?

Where does congestion happen on weekdays?

In the past three months, which road was the most seriously congested, and how did those congestions start?

Traditional SQL queries cannot answer them


Our Contribution

The officers demand results that are summarized, self-organized and succinct, delivered in a short time

Our goal

Construct a data model for atypical data in CPS

Support efficient query processing with such a model


Challenges

Massive data

Thousands of sensors generate gigabytes, even terabytes, of data

Complex events

An atypical event is a dynamic process influencing multiple spatial regions

How to represent such an event? A new measure/model is needed

Effectiveness and efficiency

If the query range is large, many events are involved

Retrieve the significant ones in a short time: a new algorithm is needed


Our Contribution

Introduce techniques to discover atypical events and summarize them as atypical micro-clusters

Integrate similar micro-clusters into macro-clusters to show the big picture

Construct the data model of the atypical cluster forest

Use a guided algorithm to retrieve the significant clusters efficiently



CPS Systems in Traffic Application

PeMS: collects data on California highways

CarWeb: collects real-time GPS data from cars

Google Traffic: toolkit on Google Maps

CubeView by Shekhar et al.: implements traditional OLAP on the traffic data

AITVS: based on CubeView, using two more distinct views to support investigation

Most focus on SQL-based queries and lack analytical power

Built on the whole dataset: huge I/O overhead, and atypical data are dwarfed


Other Spatial OLAP Techniques

Spatial Cube by Stefanovic et al.: dimension members are spatially referenced and can be represented on a map

Trajectory Cube by Giannotti et al.: includes temporal, spatial, demographic and technographic dimensions, with two kinds of measures: spatial and numerical

Flow Cube by Gonzalez et al.: analyzes item flows in RFID applications

They target different objects, so they cannot be used directly for this problem


Preliminaries

Atypical record: (s, t, f(s,t))

s: sensor

t: reported time

f(s,t): severity measure

Analytical query: Q(W, T, etc.)

W: spatial region

T: time period

There might be query conditions on other dimensions

Return total severity: $F(W, T) = \sum_{s \in W,\, t \in T} f(s, t)$

Too abstract
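As a rough illustration of the total-severity query, the sketch below sums f(s, t) over the sensors in W and the time windows in T. The record layout, the integer window index, and the half-open period are assumptions made for this example, not details taken from the slides.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class AtypicalRecord(NamedTuple):
    sensor: str       # s: sensor identifier
    window: int       # t: index of the reporting time window (assumed encoding)
    severity: float   # f(s, t): severity measure, e.g. congested minutes


def total_severity(records: Iterable[AtypicalRecord],
                   region: Set[str],
                   period: Tuple[int, int]) -> float:
    """F(W, T): sum f(s, t) over sensors in W and windows in [start, end)."""
    start, end = period
    return sum(r.severity for r in records
               if r.sensor in region and start <= r.window < end)


# Hypothetical records for illustration only.
records = [
    AtypicalRecord("s1", 0, 4.0),
    AtypicalRecord("s1", 1, 5.0),
    AtypicalRecord("s2", 1, 5.0),
    AtypicalRecord("s3", 2, 5.0),
    AtypicalRecord("s4", 2, 2.0),
]
example_total = total_severity(records, {"s1", "s2"}, (0, 2))
```

The single number returned here shows why the slide calls this answer too abstract: it says nothing about where or when the severity accumulated.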


Problem Formulation

Let R be the CPS dataset. The task is to retrieve the atypical events from R, design a measure to represent each event, and integrate the information of multiple events

Process the analytical query Q online

We assume the atypical criterion is given and the atypical dataset can be acquired in advance


System Overview

[Figure: system overview in two stages. Atypical Forest Construction: the CPS dataset, atypical events, micro-clusters, macro-clusters. Analytical Query Processing: analytical query, red-zone guided filtering, qualified clusters, significant clusters.]



Atypical Event

Let us examine an atypical event, congestion in a traffic monitoring system:

starts from a single street segment

expands along the road and influences nearby roads

may cover hundreds of road segments when reaching full size

The data records in a congestion are spatially close and temporally related
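Because the records of one congestion are close in space and time, they can be grouped with a time threshold and a distance threshold. The following is a minimal sketch of one way to do that, using union-find over all record pairs. The one-dimensional sensor positions, the integer window indices, and the names `group_events`, `max_gap` and `max_dist` are assumptions for illustration; the slides do not specify the grouping procedure.

```python
from itertools import combinations


def group_events(records, position, max_gap, max_dist):
    """Group (sensor, window, severity) records into events.

    Two records are linked when their windows differ by at most max_gap
    and their sensors lie at most max_dist apart; events are the connected
    groups of linked records.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        (si, ti, _), (sj, tj, _) = records[i], records[j]
        if abs(ti - tj) <= max_gap and abs(position[si] - position[sj]) <= max_dist:
            parent[find(i)] = find(j)

    events = {}
    for i, rec in enumerate(records):
        events.setdefault(find(i), []).append(rec)
    return list(events.values())


# Hypothetical positions (miles along one road) and records.
position = {"s1": 0.0, "s2": 1.0, "s3": 2.0, "s9": 30.0}
records = [
    ("s1", 0, 4), ("s1", 1, 5), ("s2", 1, 5), ("s3", 2, 5),
    ("s9", 1, 5),      # far away in space: its own event
    ("s1", 100, 2),    # far away in time: its own event
]
events = group_events(records, position, max_gap=1, max_dist=1.5)
```

The pairwise scan is quadratic in the number of records, which is fine for a sketch but would need a spatial or temporal index on a dataset of realistic size.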


Retrieve the Atypical Event

Scan the dataset, retrieve the atypical records, and group them by a time threshold and a distance threshold

The atypical event is a set of atypical records

The size is not bounded (or bounded only by the size of dataset R)

Difficult to represent and integrate

Too detailed: not a good measure

ID | Atypical Records
EA | <s1, 8:05am - 8:10am, 4 min>; <s1, 8:10am - 8:15am, 5 min>; <s2, 8:10am - 8:15am, 5 min>; <s3, 8:15am - 8:20am, 5 min>; <s4, 8:15am - 8:20am, 2 min>; . . .
EB | <s3, 6:20pm - 6:25pm, 2 min>; <s4, 6:20pm - 6:25pm, 5 min>; <s1, 6:25pm - 6:30pm, 5 min>; <s4, 6:25pm - 6:30pm, 5 min>; <s5, 6:30pm - 6:35pm, 5 min>; . . .
EC | <s1, 8:20am - 8:25am, 1 min>; <s1, 8:25am - 8:30am, 5 min>; <s9, 8:25am - 8:30am, 5 min>; <s1, 8:30am - 8:35am, 5 min>; <s7, 8:35am - 8:40am, 3 min>; . . .


Atypical Micro-Cluster

Aggregate the atypical records along one dimension at a time

Summarize the total severity by sensor (spatial features)

Summarize the total severity by time window (temporal features)

The size is bounded by the total number of sensors and time windows

Still keeps detailed information
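The aggregation can be sketched as two group-by sums over the records of one event. The function name and the tuple layout are assumptions for illustration. The input uses only the five records listed for event EA, so the per-sensor totals are smaller than the full-event totals, which also include the elided records.

```python
from collections import defaultdict


def build_micro_cluster(event):
    """Summarize an event as (spatial features, temporal features).

    event: iterable of (sensor, time_window, severity_minutes) records.
    Spatial features map each sensor to its total severity; temporal
    features map each time window to its total severity.
    """
    spatial = defaultdict(int)
    temporal = defaultdict(int)
    for sensor, window, minutes in event:
        spatial[sensor] += minutes
        temporal[window] += minutes
    return dict(spatial), dict(temporal)


# The five records listed for event EA (the elided ones are left out).
event_a = [
    ("s1", "8:05am-8:10am", 4),
    ("s1", "8:10am-8:15am", 5),
    ("s2", "8:10am-8:15am", 5),
    ("s3", "8:15am-8:20am", 5),
    ("s4", "8:15am-8:20am", 2),
]
spatial_a, temporal_a = build_micro_cluster(event_a)
# temporal_a["8:05am-8:10am"] is 4 and temporal_a["8:10am-8:15am"] is 10
```

Each feature map has at most one entry per sensor or per time window, which is why the micro-cluster size is bounded even though the event itself is not.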


Example in Congestion Event

Micro-clusters built from events EA, EB and EC:

ID | Spatial Features | Temporal Features
CA | <s1, 182 min>; <s2, 97 min>; <s3, 33 min>; <s4, 12 min>; … | <8:05am - 8:10am, 4 min>; <8:10am - 8:15am, 10 min>; …
CB | <s1, 12 min>; <s2, 51 min>; <s3, 34 min>; <s4, 140 min>; … | <6:20pm - 6:25pm, 7 min>; <6:25pm - 6:30pm, 13 min>; …
CC | <s1, 103 min>; <s2, 75 min>; <s7, 54 min>; <s9, 60 min>; … | <8:20am - 8:25am, 1 min>; <8:25am - 8:30am, 15 min>; …


Integrate the Micro-clusters

Each micro-cluster represents an individual event

Atypical events may happen at similar places and times

For example, the 10E highway is congested during evening rush hours on weekdays

For analytical purposes, it is helpful to group those similar congestions as a whole

Two sub-problems:

Which ones to merge?

How to merge?


Similarity Measure for Atypical Clusters

Basic principles

Consider the similarity on multiple dimensions: users may specify a preference weight

Weight the measure by the data themselves (e.g., if sensor s1 reports higher severities in the clusters than s2, then the weight of s1 is higher): employ the severity as the weight
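The slides state the principles but not the formula. As one possible instance, the sketch below swaps in a weighted Jaccard similarity per dimension, so items with higher severity count more, and combines the dimensions with user preference weights. The choice of weighted Jaccard, the function names and the sample clusters are all assumptions, not the authors' measure.

```python
def feature_similarity(f1, f2):
    """Weighted Jaccard similarity between two {item: severity} maps.

    Items with large severities dominate both sums, so the severity
    itself acts as the weight of each sensor or time window.
    """
    keys = set(f1) | set(f2)
    shared = sum(min(f1.get(k, 0), f2.get(k, 0)) for k in keys)
    total = sum(max(f1.get(k, 0), f2.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def cluster_similarity(c1, c2, weights):
    """Combine per-dimension similarities with user preference weights.

    c1, c2: {dimension: {item: severity}}; weights: {dimension: weight},
    assumed to sum to 1.
    """
    return sum(w * feature_similarity(c1[dim], c2[dim])
               for dim, w in weights.items())


# Hypothetical clusters for illustration.
c1 = {"spatial": {"s1": 100, "s2": 20}, "temporal": {"8am": 120}}
c2 = {"spatial": {"s1": 30, "s3": 40}, "temporal": {"8am": 70}}
spatial_sim = feature_similarity(c1["spatial"], c2["spatial"])   # 30 / 160
combined = cluster_similarity(c1, c2, {"spatial": 0.5, "temporal": 0.5})
```

Raising the spatial weight makes clusters on the same road segments merge first, while raising the temporal weight favours clusters that recur at the same time of day.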


Cluster Integration

For two clusters C1 and C2, the system

carries out aggregation on the features of each dimension

for the common items, sums up their severities

keeps the non-overlapping items

Example

C1 {s1, 100 min; s2, 20 min}

C2 {s1, 30 min; s3, 40 min}

C23{s1, 130 min; s2, 20 min; s3, 40 min}

The spatial and temporal features are algebraic, so they are efficient to aggregate
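The merge rule can be sketched in a few lines: shared items have their severities added and the remaining items are carried over unchanged. The function names are assumptions; the input reproduces the C1 and C2 example from the slide.

```python
from collections import Counter


def merge_feature(f1, f2):
    """Merge two {item: severity} maps: sum common items, keep the rest."""
    merged = Counter(f1)
    merged.update(f2)   # Counter.update adds values instead of replacing them
    return dict(merged)


def merge_clusters(c1, c2):
    """Merge two clusters dimension by dimension.

    c1, c2: {dimension: {item: severity}} with the same dimensions.
    """
    return {dim: merge_feature(c1[dim], c2[dim]) for dim in c1}


c1_spatial = {"s1": 100, "s2": 20}
c2_spatial = {"s1": 30, "s3": 40}
merged_spatial = merge_feature(c1_spatial, c2_spatial)
# merged_spatial == {"s1": 130, "s2": 20, "s3": 40}
```

Since the merge is a plain sum, the order in which clusters are combined does not change the result, which is what makes the features cheap to aggregate.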


Macro-Clusters

Macro-clusters are generated by merging micro-clusters

Similarities are then computed among the macro-clusters, so even larger ones can be generated
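A hedged sketch of this repeated merging: at each step the most similar pair of clusters is merged, and the loop stops once no pair reaches a similarity threshold. The greedy strategy, the threshold, the weighted Jaccard similarity and the single spatial dimension are assumptions chosen to keep the example short.

```python
from itertools import combinations


def similarity(f1, f2):
    """Weighted Jaccard similarity of two {item: severity} maps (assumed)."""
    keys = set(f1) | set(f2)
    shared = sum(min(f1.get(k, 0), f2.get(k, 0)) for k in keys)
    total = sum(max(f1.get(k, 0), f2.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def merge(f1, f2):
    """Sum severities of common items and keep the non-overlapping ones."""
    return {k: f1.get(k, 0) + f2.get(k, 0) for k in set(f1) | set(f2)}


def build_macro_clusters(micro_clusters, threshold):
    """Greedily merge the most similar pair until none reaches threshold."""
    clusters = [dict(c) for c in micro_clusters]
    while len(clusters) > 1:
        # Every pair is compared in each round.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))
        if similarity(clusters[i], clusters[j]) < threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical micro-clusters: two similar congestions and one unrelated.
micro = [{"s1": 100, "s2": 20}, {"s1": 90, "s2": 30}, {"s7": 50}]
macro = build_macro_clusters(micro, threshold=0.5)
```

Here the first two micro-clusters merge into one macro-cluster, while the third stays on its own as a trivial cluster.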


Clustering Forest

The clusters form a tree hierarchy

Different aggregation paths (preferences on dimensions) form a cluster forest

[Figure: example cluster forest with leaf clusters c1 to c10, intermediate clusters b1 to b6, and roots a1 and a2]



The Efficiency Problem on Online Query

Usually it is not realistic to materialize the entire data forest

Only some intermediate results (i.e., the micro-clusters in lower-level cells) are pre-computed (partial materialization)

The time complexity of the cluster integration algorithm is O(n^2)

Query efficiency suffers if n is large, and an analytical query Q(W, T) usually covers a large region over a long time, so n is indeed large


The Effectiveness Problem

In the result, only a few significant macro-clusters are generated

The rest are trivial ones that cannot be aggregated with others


Pruning-beforehand Strategy

Filter out the insignificant micro-clusters

However, insignificant micro-clusters may integrate together and generate significant macro-clusters

Can we foretell which micro-clusters will contribute to significant macro-clusters?


Red-Zone Guided Clustering

It is fast to compute the total severity in a specified region, so

select the regions with high severities (red zones)

filter out the micro-clusters located outside those red zones

keep only the ones inside or intersecting red zones (where the significant macro-clusters may be located)
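The filtering steps above can be sketched as follows. Regions are modelled as fixed groups of sensors, and a region becomes a red zone when its total severity reaches a threshold. The region layout, the threshold, and all names are assumptions for illustration, not the authors' exact procedure.

```python
from collections import defaultdict


def sensor_totals(micro_clusters):
    """Total severity per sensor over all micro-clusters."""
    totals = defaultdict(int)
    for features in micro_clusters.values():
        for sensor, severity in features.items():
            totals[sensor] += severity
    return dict(totals)


def find_red_zones(region_sensors, totals, min_severity):
    """Regions whose summed sensor severity reaches min_severity."""
    return {region for region, sensors in region_sensors.items()
            if sum(totals.get(s, 0) for s in sensors) >= min_severity}


def filter_by_red_zones(micro_clusters, region_sensors, red_zones):
    """Keep micro-clusters that lie in or intersect at least one red zone."""
    red_sensors = set()
    for region in red_zones:
        red_sensors |= set(region_sensors[region])
    return {cid: f for cid, f in micro_clusters.items()
            if red_sensors & set(f)}


# Hypothetical regions and micro-clusters (spatial features only).
region_sensors = {"r1": ["s1", "s2"], "r2": ["s3", "s4"], "r3": ["s7", "s9"]}
micro_clusters = {
    "a": {"s1": 100, "s2": 20},
    "b": {"s2": 15, "s3": 10},   # small, but intersects region r1
    "c": {"s4": 12},
    "d": {"s7": 8},
}
totals = sensor_totals(micro_clusters)
red_zones = find_red_zones(region_sensors, totals, min_severity=50)
kept = filter_by_red_zones(micro_clusters, region_sensors, red_zones)
```

Micro-cluster b is small on its own but survives because it touches the red zone, which is the case that pruning by individual size would miss.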


Red-Zone Guided Clustering Example

[Figure: example of red-zone guided clustering over micro-clusters labeled a, b, c, d, e, f, g, h, i, j, k, m, n, o]



Experiment Setup

PeMS datasets from UC Berkeley

1 year of traffic data

4,076 loop detectors on 38 freeways in California

54 GB in total

Hardware

Intel 2200 dual CPU @ 2.20 GHz and 2.19 GHz

1.98 GB RAM; Windows XP SP2

All the algorithms are implemented in Java


Model Construction

Comparing Atypical Cluster (AC) with Original CubeView (OC) and Modified CubeView (MC)

AC is an order of magnitude faster than OC


Query Efficiency

All: no pruning; Pru: prune beforehand; Gui: guided clustering

Gui takes 20% of the time of All and is close to Pru


Query Effectiveness

Ground truth: generated by All

Pru may miss truly significant macro-clusters, but Gui can guarantee the recall
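For reference, recall against the ground truth can be computed as the fraction of ground-truth clusters that a method also returns. The cluster identifiers below are made up to show the calculation and are not experimental results.

```python
def recall(found, truth):
    """Fraction of ground-truth clusters that also appear in found."""
    truth = set(truth)
    if not truth:
        return 1.0   # nothing to find, so nothing was missed
    return len(truth & set(found)) / len(truth)


# Hypothetical identifiers of significant macro-clusters.
ground_truth = {"m1", "m2", "m3", "m4"}          # as produced by All
pru_result = {"m1", "m2", "m9"}                  # misses m3 and m4
gui_result = {"m1", "m2", "m3", "m4", "m7"}      # finds all of them

pru_recall = recall(pru_result, ground_truth)    # 0.5
gui_recall = recall(gui_result, ground_truth)    # 1.0
```

Extra clusters in the result, such as m7 here, lower precision but leave recall unchanged.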


Conclusions

We have investigated the problem of multi-dimensional analysis of atypical events in CPS

The atypical cluster is designed to represent the event and serve as the measure for the data model

The red-zone algorithm is proposed to retrieve the significant clusters for analytical queries

Performance is evaluated on large real datasets

Thank You Very Much! Any Questions?