multidimensional analysis of atypical events in cyber-physical data
Post on 13-Jan-2016
23/4/21
Multidimensional Analysis of Atypical Events in Cyber-Physical Data
Lu-An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen-Chih Peng, Yizhou Sun
DAIS UIUC
Outline
Introduction
Backgrounds
Model Construction
Query Processing
Experiments
Introduction
Cyber-Physical System (CPS): integrates physical devices (e.g., sensors, cameras) with cyber components to form a situation-aware analytical system
Many promising applications:
traffic observation
intruder/motion detection
battlefield surveillance
remote healthcare
Key task: analyze atypical data with multidimensional information
Motivation Example I
Traffic Monitoring System: a typical CPS
Inductive loop sensors
Thousands of them, placed every few miles on highways
Running 24 hours a day, 7 days a week
Monitoring traffic and reporting congestion
Motivation Example II
Questions from transportation officers:
When does congestion usually happen downtown?
Where does congestion happen on weekdays?
In the past three months, which road was the most seriously congested, and how did those congestion events start?
Traditional SQL queries cannot answer them
Our Contribution
Such questions demand results that are summarized, self-organized, and succinct, and that are delivered quickly
Our goals:
Construct a data model for atypical data in CPS
Support efficient query processing with such model
Challenges
Massive data
Thousands of sensors generate gigabytes or even terabytes of data
Complex events
An atypical event is a dynamic process that influences multiple spatial regions
How to represent such an event? – new measure/model
Effectiveness & efficiency
If the query range is large, many events are involved
Retrieving the significant ones quickly – new algorithm
Our Contribution
Introduce techniques to discover atypical events and summarize them as atypical micro-clusters
Integrate similar micro-clusters into macro-clusters to give the big picture
Construct the data model of an atypical cluster forest
Use a guided algorithm to retrieve significant clusters efficiently
Outline
Introduction
Backgrounds
Model Construction
Query Processing
Experiments
CPS in Traffic Applications
PeMS: collects data on California highways
CarWeb: collects real time GPS data from cars
Google Traffic: Toolkit on Google Map
CubeView by Shekhar et al.: implements traditional OLAP on traffic data
AITVS: based on CubeView, adds two more distinct views to support investigation
Most focus on SQL-based queries and lack analysis power
Built on the whole dataset: huge I/O overhead, and atypical data are dwarfed
Other Spatial OLAP Techniques
Spatial Cube by Stefanovic et al.: dimension members are spatially referenced and can be represented on a map
Trajectory Cube by Giannotti et al.: includes temporal, spatial, demographic, and technographic dimensions, with two kinds of measures: spatial and numerical
Flow Cube by Gonzalez et al.: analyzes item flows in RFID applications
Different objects of analysis: these cannot be used directly for this problem
Preliminaries
Atypical record: (s, t, f(s,t))
s: sensor
t: reported time
f(s,t): severity measure
Analytical query Q(W, T, etc.)
W: spatial region
T: time period
There might be query conditions on other dimensions
Returns the total severity:
$$F(W, T) = \sum_{s \in W} \sum_{t \in T} f(s, t)$$
Too abstract: a single number summarizes the whole query range
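The total-severity query above can be sketched as a filtered sum. This is a minimal illustration, assuming records are held in memory as (sensor, time-window index, severity) tuples and that W and T are given as plain sets; the name `total_severity` is hypothetical and not from the slides.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (sensor s, time window index t, severity f(s, t)).
Record = Tuple[str, int, float]

def total_severity(records: Iterable[Record], W: Set[str], T: Set[int]) -> float:
    """F(W, T): sum f(s, t) over records whose sensor is in W and time is in T."""
    return sum(f for s, t, f in records if s in W and t in T)

records = [("s1", 0, 4.0), ("s1", 1, 5.0), ("s2", 1, 5.0), ("s3", 2, 5.0)]
print(total_severity(records, W={"s1", "s2"}, T={0, 1}))  # 14.0
```

The result is one number for the whole region and period, which is exactly why the slide calls it too abstract for the officers' questions.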
Problem Formulation
Given the CPS dataset R, retrieve the atypical events from R, design a measure to represent each event, and integrate the information of multiple events
Process analytical queries Q online
We assume the atypical criteria are given and the atypical dataset can be acquired in advance
System Overview
Atypical forest construction: the CPS dataset → atypical events → micro-clusters → macro-clusters
Analytical query processing: analytical query → red-zone guided filtering → qualified clusters → significant clusters
Outline
Introduction
Backgrounds
Model Construction
Query Processing
Experiments
Atypical Event
Consider one kind of atypical event, congestion in a traffic monitoring system. A congestion event may:
start from a single street segment
expand along the road and influence nearby roads
cover hundreds of road segments when it reaches full size
The data records in a congestion event are spatially close and temporally related
Retrieve the Atypical Event
Scan the dataset, retrieve the atypical records, and group them using a time threshold and a distance threshold
An atypical event is a set of atypical records
Its size is not bounded (or is bounded only by the size of dataset R)
Difficult to represent and integrate
Too detailed: not a good measure
Example atypical records (sensor, time window, severity):
EA: <s1, 8:05am - 8:10am, 4 min>; <s1, 8:10am - 8:15am, 5 min>; <s2, 8:10am - 8:15am, 5 min>; <s3, 8:15am - 8:20am, 5 min>; <s4, 8:15am - 8:20am, 2 min>; ...
EB: <s3, 6:20pm - 6:25pm, 2 min>; <s4, 6:20pm - 6:25pm, 5 min>; <s1, 6:25pm - 6:30pm, 5 min>; <s4, 6:25pm - 6:30pm, 5 min>; <s5, 6:30pm - 6:35pm, 5 min>; ...
EC: <s1, 8:20am - 8:25am, 1 min>; <s1, 8:25am - 8:30am, 5 min>; <s9, 8:25am - 8:30am, 5 min>; <s1, 8:30am - 8:35am, 5 min>; <s7, 8:35am - 8:40am, 3 min>; ...
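The grouping step (scan, then group by a time threshold and a distance threshold) can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: sensor coordinates come from a hypothetical `loc` map, time windows are integer indices, distance is Euclidean, and grouping is single-linkage (records join the same event if they are within both thresholds of some other record in it). The slides do not say which linkage rule is used.

```python
import math
from typing import Dict, List, Tuple

Record = Tuple[str, int, float]  # (sensor, time window index, severity)

def group_events(records: List[Record],
                 loc: Dict[str, Tuple[float, float]],
                 time_threshold: int,
                 dist_threshold: float) -> List[List[Record]]:
    """Link two records when they are close in both time and space;
    each connected component becomes one atypical event."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (si, ti, _) in enumerate(records):
        for j in range(i + 1, len(records)):
            sj, tj, _ = records[j]
            if (abs(ti - tj) <= time_threshold
                    and math.dist(loc[si], loc[sj]) <= dist_threshold):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Record]] = {}
    for i, r in enumerate(records):
        groups.setdefault(find(i), []).append(r)
    return list(groups.values())

# Hypothetical sensor positions and records.
loc = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s9": (50.0, 0.0)}
records = [("s1", 0, 4.0), ("s2", 1, 5.0), ("s9", 1, 5.0), ("s1", 40, 3.0)]
events = group_events(records, loc, time_threshold=1, dist_threshold=2.0)
print(sorted(len(e) for e in events))  # [1, 1, 2]
```

The pairwise scan is quadratic in the number of atypical records; a spatial or temporal index would be the natural next step for a dataset of PeMS scale.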
Atypical Micro-Cluster
Aggregate the atypical records along one dimension at a time:
Summarize the total severity by sensor (spatial feature)
Summarize the total severity by time window (temporal feature)
The size is bounded by the total number of sensors and time windows
Still keeps detailed information
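A micro-cluster can be sketched as two severity maps built from one event. This minimal sketch assumes the `MicroCluster` and `summarize` names (both hypothetical) and feeds in only the first five records of EA listed above, so the spatial totals are smaller than those of the full event CA; the temporal value for 8:10am - 8:15am (5 + 5 = 10 min) already matches CA.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Tuple

Record = Tuple[str, str, float]  # (sensor, time window label, severity in minutes)

@dataclass
class MicroCluster:
    spatial: Counter = field(default_factory=Counter)   # sensor -> total severity
    temporal: Counter = field(default_factory=Counter)  # time window -> total severity

def summarize(event: Iterable[Record]) -> MicroCluster:
    """Aggregate one event's records along each dimension separately."""
    mc = MicroCluster()
    for sensor, window, severity in event:
        mc.spatial[sensor] += severity
        mc.temporal[window] += severity
    return mc

event_a = [("s1", "8:05am-8:10am", 4), ("s1", "8:10am-8:15am", 5),
           ("s2", "8:10am-8:15am", 5), ("s3", "8:15am-8:20am", 5),
           ("s4", "8:15am-8:20am", 2)]
ca = summarize(event_a)
print(ca.temporal["8:10am-8:15am"])  # 10
```

Each map has at most one entry per sensor or per time window, which is the size bound stated on the slide.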
Example: Congestion Events
Micro-clusters built from the events above:
CA
Spatial features: <s1, 182 min>; <s2, 97 min>; <s3, 33 min>; <s4, 12 min>; …
Temporal features: <8:05am - 8:10am, 4 min>; <8:10am - 8:15am, 10 min>; …
CB
Spatial features: <s1, 12 min>; <s2, 51 min>; <s3, 34 min>; <s4, 140 min>; …
Temporal features: <6:20pm - 6:25pm, 7 min>; <6:25pm - 6:30pm, 13 min>; …
CC
Spatial features: <s1, 103 min>; <s2, 75 min>; <s7, 54 min>; <s9, 60 min>; …
Temporal features: <8:20am - 8:25am, 1 min>; <8:25am - 8:30am, 15 min>; …
Integrate the Micro-clusters
Each micro-cluster represents an individual event
Atypical events may happen at similar places and times
For example, highway 10E is congested during evening rush hours on weekdays
For analytical purposes, it is helpful to group such similar congestion events as a whole
Two sub-problems:
Which ones to merge?
How to merge?
Similarity Measure for Atypical Clusters
Basic principles:
Consider similarity on multiple dimensions – users may specify a preference weight per dimension
Use a measure weighted by the data themselves (e.g., if sensor s1 reports higher severities in the clusters than s2, then s1 gets a higher weight) – employ the severity as the weight
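The slides do not give the exact similarity formula, so the sketch below swaps in one plausible choice that follows both principles: a severity-weighted Jaccard score per dimension (sum of minima over sum of maxima, so high-severity items dominate), combined across dimensions with user preference weights. The function names and the `pref` dictionary are hypothetical.

```python
from collections import Counter
from typing import Dict

def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Severity-weighted overlap of two feature maps: sum(min) / sum(max).
    Items with larger severities contribute more to the score."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0

def similarity(c1: Dict[str, Counter], c2: Dict[str, Counter],
               pref: Dict[str, float]) -> float:
    """Combine per-dimension similarities using user preference weights."""
    total = sum(pref.values())
    return sum(w * weighted_jaccard(c1.get(d, Counter()), c2.get(d, Counter()))
               for d, w in pref.items()) / total

# Spatial features of the two example clusters C1 and C2 from the slides.
print(weighted_jaccard(Counter(s1=100, s2=20), Counter(s1=30, s3=40)))  # 0.1875
```

Here the only shared sensor is s1, so the score is 30 / (100 + 20 + 40); shifting `pref` toward the temporal dimension would favor clusters that occur at similar times instead.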
Cluster Integration
For two clusters C1 and C2, the system carries out aggregation on the features of each dimension:
for common items, it sums up their severities
it keeps the non-overlapping items
Example
C1 {s1, 100 min; s2, 20 min}
C2 {s1, 30 min; s3, 40 min}
C1 + C2 = {s1, 130 min; s2, 20 min; s3, 40 min}
The spatial and temporal features are algebraic, so they are efficient to aggregate
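The merge rule above maps directly onto adding severity maps. A minimal sketch, assuming clusters are represented as a dictionary from dimension name to a `Counter` of severities (a hypothetical layout); note that `Counter` addition drops non-positive totals, which is harmless here because severities are positive.

```python
from collections import Counter
from typing import Dict

def integrate(c1: Dict[str, Counter], c2: Dict[str, Counter]) -> Dict[str, Counter]:
    """Merge two clusters dimension by dimension: severities of common items
    are summed, and items present in only one cluster are kept as they are."""
    return {d: c1.get(d, Counter()) + c2.get(d, Counter())
            for d in c1.keys() | c2.keys()}

c1 = {"spatial": Counter(s1=100, s2=20)}
c2 = {"spatial": Counter(s1=30, s3=40)}
merged = integrate(c1, c2)
print(dict(sorted(merged["spatial"].items())))  # {'s1': 130, 's2': 20, 's3': 40}
```

Because the merge is plain addition, it is associative and order-independent, which is what makes these features cheap to roll up into larger clusters.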
Macro-Clusters
Macro-clusters are generated by merging micro-clusters
Similarities are then computed among these macro-clusters, and even larger clusters can be generated from them
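Repeated merging can be sketched as greedy agglomerative clustering that records each merge as a tree node. The stopping rule (stop once no pair reaches a similarity `threshold`), the weighted-Jaccard similarity, and all names here are assumptions for illustration; the slides do not specify them. Running the same procedure with different `pref` weights yields different trees, in the spirit of a cluster forest.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List

@dataclass
class Cluster:
    name: str
    features: Dict[str, Counter]                              # dimension -> severity map
    children: List["Cluster"] = field(default_factory=list)  # merge history (tree)

def wjac(a: Counter, b: Counter) -> float:
    """Severity-weighted Jaccard similarity: sum(min) / sum(max)."""
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / den if den else 0.0

def sim(x: Cluster, y: Cluster, pref: Dict[str, float]) -> float:
    return sum(w * wjac(x.features.get(d, Counter()), y.features.get(d, Counter()))
               for d, w in pref.items()) / sum(pref.values())

def build_macro_clusters(clusters: List[Cluster], pref: Dict[str, float],
                         threshold: float) -> List[Cluster]:
    """Repeatedly merge the most similar pair until no pair reaches the
    threshold; each round scans all O(n^2) pairs. Returns the tree roots."""
    pool = list(clusters)
    while len(pool) > 1:
        x, y = max(combinations(pool, 2), key=lambda p: sim(p[0], p[1], pref))
        if sim(x, y, pref) < threshold:
            break
        merged = Cluster(
            name=f"({x.name}+{y.name})",
            features={d: x.features.get(d, Counter()) + y.features.get(d, Counter())
                      for d in x.features.keys() | y.features.keys()},
            children=[x, y])
        pool = [c for c in pool if c is not x and c is not y] + [merged]
    return pool

micro = [Cluster("a", {"spatial": Counter(s1=10)}),
         Cluster("b", {"spatial": Counter(s1=8)}),
         Cluster("c", {"spatial": Counter(s9=5)})]
roots = build_macro_clusters(micro, pref={"spatial": 1.0}, threshold=0.5)
print(sorted(r.name for r in roots))  # ['(a+b)', 'c']
```

Clusters a and b share sensor s1 (similarity 8/10), so they merge; c shares nothing and stays a trivial singleton root.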
Clustering Forest
The clusters form a tree hierarchy
Different aggregation paths (preferences on dimensions) form a cluster forest
(Figure: a cluster forest with clusters c1 through c10 at the bottom level, intermediate clusters b1 through b6, and root clusters a1 and a2)
Outline
Introduction
Backgrounds
Model Construction
Query Processing
Experiments
The Efficiency Problem for Online Queries
It is usually not realistic to materialize the entire forest
Only some intermediate results (i.e., the micro-clusters in lower-level cells) are precomputed (partial materialization)
The time complexity of the cluster integration algorithm is O(n^2), where n is the number of clusters to integrate
Query efficiency suffers when n is large, and an analytical query Q(W, T) usually covers a large region over a long time, so n is indeed large
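To give a sense of scale for the O(n^2) bound, the number of cluster pairs whose similarity must be compared can be counted directly. The value n = 10,000 below is hypothetical, not a figure from the experiments.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 10^4 \;\Rightarrow\; \frac{10^4 \times 9\,999}{2} = 49\,995\,000 \approx 5 \times 10^7
```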
The Effectiveness Problem
In the result, only a few significant macro-clusters are generated
The rest are trivial clusters that cannot be aggregated with others
Pruning-beforehand Strategy
Filter out the insignificant micro-clusters before integration
However, insignificant micro-clusters may merge and generate significant macro-clusters
Can we foretell which micro-clusters will contribute to significant macro-clusters?
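The weakness of pruning beforehand is easy to reproduce with made-up numbers. In this sketch the per-cluster totals, the threshold of 50, and the function name are all hypothetical: c1 and c2 each fall below the threshold and are dropped, even though together they would form a significant macro-cluster.

```python
from typing import Dict, List

def prune_beforehand(severity: Dict[str, float], min_severity: float) -> List[str]:
    """Keep only micro-clusters whose own total severity reaches the threshold."""
    return [cid for cid, sev in severity.items() if sev >= min_severity]

# Hypothetical totals: c1 and c2 describe the same recurring congestion, so
# merged they would reach 30 + 30 = 60 >= 50, but each alone is pruned.
severity = {"c1": 30.0, "c2": 30.0, "c3": 80.0}
print(prune_beforehand(severity, min_severity=50.0))  # ['c3']
```

This is the recall loss the experiments later attribute to Pru.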
Red-Zone Guided Clustering
Computing the total severity of a specified region is fast, so:
Select the regions with high severity (red zones)
Filter out the micro-clusters located outside those red zones
Keep only the ones inside or intersecting red zones (where significant macro-clusters may be located)
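The red-zone step can be sketched with a grid over the map. Everything concrete here is an assumption for illustration: regions are grid cells, a hypothetical `cell_of` map assigns each sensor to a cell, the records are already restricted to the query period, and a red zone is any cell whose total severity reaches a fixed threshold. The slides do not say how regions or thresholds are chosen.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

def red_zones(records: List[Tuple[str, float]], cell_of: Dict[str, Cell],
              threshold: float) -> Set[Cell]:
    """Sum severity per grid cell (a cheap aggregate) and keep the hot cells."""
    per_cell: Counter = Counter()
    for sensor, severity in records:
        per_cell[cell_of[sensor]] += severity
    return {cell for cell, sev in per_cell.items() if sev >= threshold}

def guided_filter(micro: Dict[str, Counter], cell_of: Dict[str, Cell],
                  zones: Set[Cell]) -> List[str]:
    """Keep micro-clusters with at least one sensor in a red zone, i.e.
    clusters that lie in or intersect a red zone."""
    return [cid for cid, spatial in micro.items()
            if any(cell_of[s] in zones for s in spatial)]

cell_of = {"s1": (0, 0), "s2": (0, 0), "s3": (1, 0), "s9": (5, 5)}
records = [("s1", 30.0), ("s2", 30.0), ("s3", 10.0), ("s9", 15.0)]
zones = red_zones(records, cell_of, threshold=50.0)
micro = {"c1": Counter(s1=30), "c2": Counter(s2=30), "c3": Counter(s3=10, s9=15)}
print(sorted(zones))                         # [(0, 0)]
print(guided_filter(micro, cell_of, zones))  # ['c1', 'c2']
```

Unlike pruning by each cluster's own severity, c1 and c2 survive here because their shared cell is hot, so they remain available to merge into a significant macro-cluster.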
Red-Zone Guided Clustering Example
(Figure: micro-clusters labeled a, b, c, ..., o, illustrating red-zone guided clustering)
Outline
Introduction
Backgrounds
Model Construction
Query Processing
Experiments
Experiment Setup
PeMS datasets from UC Berkeley:
1 year of traffic data
4,076 loop detectors on 38 freeways in California
54 GB in total
Hardware: Intel 2200 Dual CPU @ 2.20 GHz and 2.19 GHz
1.98 GB RAM; Windows XP SP2.
All the algorithms are implemented in Java
Model Construction
Compares the construction time of Atypical Cluster (AC) with Original CubeView (OC) and Modified CubeView (MC)
AC is an order of magnitude faster than OC
Query Efficiency
All: no pruning; Pru: pruning beforehand; Gui: guided clustering
Gui takes 20% of the time of All, close to Pru
Query Effectiveness
Ground truth: generated by All
Pru may miss truly significant macro-clusters, but Gui can guarantee the recall
Conclusions
We have investigated the problem of multidimensional analysis of atypical events in CPS
The atypical cluster is designed to represent an event and serve as the measure for the data model
The red-zone algorithm is proposed to retrieve significant clusters for analytical queries
Performance was evaluated on large real datasets
Thank You Very Much! Any Questions?