![Page 1: Beyond Streams and Graphs: Dynamic Tensor Analysis Jimeng Sun Christos Faloutsos Dacheng Tao Speaker: Jimeng Sun](https://reader035.vdocuments.mx/reader035/viewer/2022062307/551c3c5455034693488b48b0/html5/thumbnails/1.jpg)
Beyond Streams and Graphs: Dynamic Tensor Analysis
Jimeng Sun, Christos Faloutsos, Dacheng Tao
Speaker: Jimeng Sun
Motivation
Goal: incremental pattern discovery on streaming applications
Streams: E1: Environmental sensor networks; E2: Cluster/data center monitoring
Graphs: E3: Social network analysis
Tensors: E4: Network forensics; E5: Financial auditing; E6: fMRI brain image analysis
How to summarize streaming data effectively and incrementally?
E3: Social network analysis
Traditionally, people focus on static networks and find community structures.
We plan to monitor the change of the community structure over time and identify abnormal individuals.
[Figure: Authors × Keywords matrices for 1990 and 2004, with DB and DM communities marked]
E4: Network forensics
Directional network flows
A large ISP with 100 POPs, each POP with 10 Gbps link capacity [Hotnets2004]
450 GB/hour with compression
Task: identify abnormal traffic patterns and find out the cause
[Figure: source × destination matrices for normal traffic and abnormal traffic]
Collaboration with Prof. Hui Zhang and Dr. Yinglian Xie
Static Data model
For a timestamp, the stream measurements can be modeled using a tensor.
Dimension: a single stream, e.g., <Christos, "graph">
Mode: a group of dimensions of the same kind, e.g., Source, Destination, Port
[Figure: Source × Destination matrix at Time = 0]
Static Data model (cont.)
Tensor
Formally, $\mathcal{X} \in \mathbb{R}^{N_1 \times \cdots \times N_M}$
Generalization of matrices
Represented as a multi-array or data cube

| Order | 1st | 2nd | 3rd |
|---|---|---|---|
| Correspondence | Vector | Matrix | 3D array |
| Example | Sensors | Authors × Keywords | Sources × Destinations × Ports |
Dynamic Data model (our focus)
Streams come with structure:
(time, source, destination, port)
(time, author, keyword)
[Figure: Source × Destination matrices stacked along the time axis]
Dynamic Data model (cont.)
Tensor Streams
A sequence of Mth order tensors $\mathcal{X}_1, \ldots, \mathcal{X}_n$, where each $\mathcal{X}_i \in \mathbb{R}^{N_1 \times \cdots \times N_M}$ and $n$ is increasing over time

| Order | 1st | 2nd | 3rd |
|---|---|---|---|
| Correspondence | Multiple streams | Time evolving graphs | 3D arrays |
| Example | Sensors over time | author × keyword over time | Sources × Destinations × Ports over time |
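As a minimal sketch of the tensor-stream model, assume the input arrives as time-sorted `(time, source, destination, port, value)` records with integer ids. The record layout, the helper name `tensor_stream`, and the toy data are illustrative assumptions, not part of the slides. One dense 3rd order tensor is then emitted per timestamp:

```python
import numpy as np

def tensor_stream(records, shape):
    """Yield (time, tensor) pairs from time-sorted (time, i, j, k, value) records."""
    current_time, tensor = None, None
    for t, src, dst, port, value in records:
        if t != current_time:
            if tensor is not None:
                yield current_time, tensor      # previous timestamp is complete
            current_time, tensor = t, np.zeros(shape)
        tensor[src, dst, port] += value         # accumulate repeated entries
    if tensor is not None:
        yield current_time, tensor

# Hypothetical toy records: two timestamps, 3 sources x 2 destinations x 2 ports.
records = [
    (0, 0, 1, 0, 3.0),
    (0, 2, 1, 1, 1.0),
    (1, 0, 1, 0, 2.0),
    (1, 0, 1, 0, 4.0),
]
stream = list(tensor_stream(records, shape=(3, 2, 2)))
```

A dense array keeps the sketch short; for sparse data such as network flows, a sparse representation would likely be the better choice.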
Dynamic tensor analysis
[Figure: Old Tensors and the New Tensor (Source × Destination), the Old cores, and the projection matrices U_Source and U_Destination]
Roadmap
Motivation and main ideas
Background and related work
Dynamic and streaming tensor analysis
Experiments
Conclusion
Background: Singular value decomposition (SVD)
SVD: $A \approx U \Sigma V^T$, where $A$ is $m \times n$ and the factors are truncated to rank $k$
Best rank-$k$ approximation in $L_2$
PCA is an important application of SVD
[Figure: block diagram of $A$ ($m \times n$) and its rank-$k$ factors; other labels in the figure: $Y$, $U^T$, $R$]
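A small NumPy sketch of the best rank-k approximation follows. The helper name `best_rank_k` and the random test matrix are assumptions made for illustration. By the Eckart–Young theorem, the spectral-norm error of the truncated SVD equals the (k+1)-th singular value, which the last line measures:

```python
import numpy as np

def best_rank_k(A, k):
    """Return the rank-k truncated SVD of A and all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted, largest first
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # arbitrary example matrix
A_k, s = best_rank_k(A, k=2)
err = np.linalg.norm(A - A_k, 2)         # spectral (L2) norm of the residual
```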
Latent semantic indexing (LSI)
Singular vectors are useful for clustering or correlation detection

$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix}
9.64 & 0 \\
0 & 5.29
\end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$

The three factors are the document-concept matrix, the concept-association matrix, and the concept-term matrix.
Figure labels: terms pattern, cluster, frequent, query, cache; document groups DM and DB.
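The decomposition on this slide can be checked numerically. The sketch below recomputes the SVD of the 7 × 5 example matrix with NumPy and reads document and term clusters off the two leading singular vectors. Taking the largest absolute entry as the cluster assignment is an assumption made here for illustration; absolute values are used because the sign of a singular vector is arbitrary:

```python
import numpy as np

# Document-term matrix from the slide (rows: documents, columns: terms).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                                  # two concepts
doc_cluster = np.abs(U[:, :k]).argmax(axis=1)          # strongest concept per document
term_cluster = np.abs(Vt[:k, :]).argmax(axis=0)        # strongest concept per term
```

The first four documents and first three terms fall into one concept, and the remaining documents and terms into the other, matching the block structure of the matrix.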
Tensor Operation: Matricize $X_{(d)}$
Unfold a tensor into a matrix
[Figure: an example tensor with entries 1 through 8 being unfolded]
Acknowledgment to Tammy Kolda for this slide
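One way to sketch mode-d matricization in NumPy is shown below. The layout is an assumption: columns index mode d and rows index all remaining modes, so that $X_{(d)}^T X_{(d)}$ is $N_d \times N_d$, which is what the variance-matrix step in the DTA slides requires. The order of the rows is a convention and only needs to be used consistently:

```python
import numpy as np

def matricize(X, d):
    """Mode-d unfolding: rows index every mode except d, columns index mode d."""
    return np.moveaxis(X, d, -1).reshape(-1, X.shape[d])

X = np.arange(1, 9).reshape(2, 2, 2)   # small tensor with entries 1..8
X0 = matricize(X, 0)                   # shape (4, 2)
```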
Tensor Operation: Mode-product
Multiply a tensor with a matrix
[Figure: a source × destination × port tensor multiplied along the source mode by a group × source matrix, giving a group × destination × port tensor]
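The mode product can be sketched with `np.tensordot`. In this sketch the matrix has shape (new size, $N_d$), so a group × source matrix maps the source mode to groups; some texts define the operation with the transposed matrix instead. The function name and the sizes are illustrative assumptions:

```python
import numpy as np

def mode_product(X, U, d):
    """Multiply tensor X by matrix U (shape: new_size x X.shape[d]) along mode d."""
    out = np.tensordot(U, X, axes=(1, d))   # the new mode lands on axis 0
    return np.moveaxis(out, 0, d)           # move it back to position d

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 2))          # source x destination x port
G = rng.standard_normal((2, 4))             # group x source
Y = mode_product(X, G, 0)                   # group x destination x port
```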
Related work
Low-rank approximation: PCA, SVD: orthogonal based projection
Multilinear analysis: Tensor decompositions: Tucker, PARAFAC, HOSVD
Stream mining: Scan data once to identify patterns; Sampling: [Vitter85], [Gibbons98]; Sketches: [Indyk00], [Cormode03]
Graph mining: Explorative: [Faloutsos04], [Kumar99], [Leskovec05]…; Algorithmic: [Yan05], [Cormode05]…
Our Work
Roadmap
Motivation and main ideas
Background and related work
Dynamic and streaming tensor analysis
Experiments
Conclusion
Tensor analysis
Given a sequence of tensors $\mathcal{X}_1, \ldots, \mathcal{X}_n$, find the projection matrices $U_1, \ldots, U_M$ such that the reconstruction error $e$ is minimized:

$$
e = \sum_{t=1}^{n} \left\| \mathcal{X}_t - \mathcal{X}_t \prod_{i=1}^{M} \times_i \left( U_i U_i^T \right) \right\|_F^2
$$

Note that this is a generalization of PCA when n is a constant
[Figure: a Sources × Destinations × Ports tensor summarized by a Core Tensor with a Source Projection, a Destination Projection, and a Port Projection]
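To make the objective concrete, the sketch below computes a core tensor and the squared reconstruction error of a single tensor for given projection matrices. It assumes each $U_d$ has shape $N_d \times R_d$ with orthonormal columns; the helper names and the random data are illustrative, not from the slides:

```python
import numpy as np

def mode_product(X, U, d):
    """Multiply tensor X by matrix U (shape: new_size x X.shape[d]) along mode d."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, d)), 0, d)

def project(X, Us):
    """Core tensor: apply U_d^T along every mode d."""
    for d, U in enumerate(Us):
        X = mode_product(X, U.T, d)
    return X

def reconstruct(core, Us):
    """Map a core tensor back to the original space."""
    for d, U in enumerate(Us):
        core = mode_product(core, U, d)
    return core

def reconstruction_error(X, Us):
    """Squared Frobenius norm of X minus its projection-and-reconstruction."""
    return np.linalg.norm(X - reconstruct(project(X, Us), Us)) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4, 3))
# Random orthonormal bases from QR, full rank and reduced rank.
full = [np.linalg.qr(rng.standard_normal((n, n)))[0] for n in X.shape]
low = [np.linalg.qr(rng.standard_normal((n, 2)))[0] for n in X.shape]
core = project(X, low)
err_full = reconstruction_error(X, full)
err_low = reconstruction_error(X, low)
```

With full-rank orthonormal projections the error vanishes; with reduced ranks it is positive and bounded by the squared norm of the tensor.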
Why do we care?
Anomaly detection: reconstruction error driven; multiple resolution
Multiway latent semantic indexing (LSI)
[Figure: Authors × Keywords matrices from 1990 to 2004 along the time axis, with projection matrices U_A and U_K; DB and DM groups; example authors Philip Yu and Michael Stonebraker; example keywords Query and Pattern]
1st order DTA - problem
Given $x_1, \ldots, x_n$ where each $x_i \in \mathbb{R}^N$, find $U \in \mathbb{R}^{N \times R}$ such that the error $e$ is small:

$$
e = \sum_{i=1}^{n} \left\| x_i - U U^T x_i \right\|^2
$$

[Figure: data matrix $X$ ($n$ rows over time, $N$ sensor columns), projection $U^T$ ($R$ rows, labels: indoor, outdoor), and the projected data $Y$]
Note that Y = XU
1st order DTA
Input: new data vector $x \in \mathbb{R}^N$, old variance matrix $C \in \mathbb{R}^{N \times N}$
Output: new projection matrix $U \in \mathbb{R}^{N \times R}$
Algorithm:
1. Update the variance matrix: $C_{new} = x^T x + C$ (with $x$ as a row vector, so $x^T x$ is $N \times N$)
2. Diagonalize: $U \Lambda U^T = C_{new}$
3. Determine the rank $R$ and return $U$
Diagonalization has to be done for every new x!
[Figure: the new vector x is appended to the Old X along the time axis]
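The three steps can be sketched as follows. The slide does not say how the rank $R$ is determined, so the energy threshold used here (keep the fewest eigenvectors covering a given fraction of the total eigenvalue mass) is an assumption, as are the function name and the synthetic two-sensor data:

```python
import numpy as np

def dta_first_order(x, C, energy=0.9):
    """One 1st order DTA step; returns (U, C_new).

    `energy` is an assumed rule for choosing R, not taken from the slides.
    """
    x = np.asarray(x, dtype=float).reshape(1, -1)     # row vector, 1 x N
    C_new = x.T @ x + C                               # step 1: update variance matrix
    eigvals, eigvecs = np.linalg.eigh(C_new)          # step 2: diagonalize (ascending)
    eigvals = np.clip(eigvals[::-1], 0.0, None)       # largest first, clip tiny negatives
    eigvecs = eigvecs[:, ::-1]
    covered = np.cumsum(eigvals) / eigvals.sum()
    R = min(int(np.searchsorted(covered, energy)) + 1, len(eigvals))  # step 3
    return eigvecs[:, :R], C_new

# Two correlated sensors: readings lie close to the direction (1, 1).
rng = np.random.default_rng(0)
C = np.zeros((2, 2))
for _ in range(200):
    x = rng.standard_normal() * np.array([1.0, 1.0]) + 0.01 * rng.standard_normal(2)
    U, C = dta_first_order(x, C)
```

Each call performs a full eigendecomposition, which is the per-update cost the slide points out.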
1st order STA
Adjust U smoothly when new data arrive, without diagonalization [VLDB05]
For each new point x: project onto the current line; estimate the error; rotate the line in the direction of the error and in proportion to its magnitude
For each new point $x$ and for $i = 1, \ldots, k$:
$y_i := U_i^T x$ (projection onto $U_i$)
$d_i \leftarrow d_i + y_i^2$ (energy, proportional to the $i$-th eigenvalue)
$e_i := x - y_i U_i$ (error)
$U_i \leftarrow U_i + (1/d_i)\, y_i e_i$ (update estimate)
$x \leftarrow x - y_i U_i$ (repeat with remainder)
[Figure: points in the Sensor 1 × Sensor 2 plane, the current direction U, and the error of a new point]
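The update loop above can be sketched in NumPy as shown below. The function name, the small positive initial energy, the starting direction, and the optional `forgetting` factor (set to 1.0 so the update matches the slide as written) are assumptions made for illustration:

```python
import numpy as np

def sta_update(x, U, d, forgetting=1.0):
    """One 1st order STA step; U (N x k) and d (k,) are updated in place."""
    x = np.array(x, dtype=float)              # work on a copy of the input
    for i in range(U.shape[1]):
        y = U[:, i] @ x                       # projection onto U_i
        d[i] = forgetting * d[i] + y ** 2     # accumulated energy
        e = x - y * U[:, i]                   # error
        U[:, i] += (y / d[i]) * e             # rotate U_i toward the error
        x -= y * U[:, i]                      # continue with the remainder
    return U, d

# Two correlated sensors: readings lie close to the direction (1, 1).
rng = np.random.default_rng(0)
U = np.array([[1.0], [0.0]])                  # deliberately poor initial direction
d = np.full(1, 1e-3)                          # small positive initial energy
for _ in range(500):
    x = rng.standard_normal() * np.array([1.0, 1.0]) + 0.01 * rng.standard_normal(2)
    sta_update(x, U, d)
direction = U[:, 0] / np.linalg.norm(U[:, 0])
```

No eigendecomposition is involved; each new point costs work proportional to N times k.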
Mth order DTA
[Figure: update flow for each mode $d$]
Reconstruct Variance Matrix: $C_d = U_d S_d U_d^T$
Matricizing, Transpose: the new tensor gives $X_{(d)}$ and $X_{(d)}^T$
Construct Variance Matrix of Incremental Tensor: $X_{(d)}^T X_{(d)}$
Update Variance Matrix: $C_d \leftarrow C_d + X_{(d)}^T X_{(d)}$
Diagonalize Variance Matrix: $C_d = U_d S_d U_d^T$, giving the new $U_d$ and $S_d$
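A hedged sketch of one Mth order DTA update, following the reconstruct, update, and diagonalize steps named in the figure. It assumes fixed ranks per mode, zero-initialized factors, and an unfolding whose columns index mode d; the function names and the rank-1 toy stream are illustrative assumptions:

```python
import numpy as np

def matricize(X, d):
    """Mode-d unfolding: rows index every mode except d, columns index mode d."""
    return np.moveaxis(X, d, -1).reshape(-1, X.shape[d])

def dta_update(X_new, Us, Ss, ranks):
    """Update projection matrices Us and eigenvalues Ss with one new tensor."""
    new_Us, new_Ss = [], []
    for d, (U, S, R) in enumerate(zip(Us, Ss, ranks)):
        C = U @ np.diag(S) @ U.T                 # reconstruct variance matrix
        Xd = matricize(X_new, d)
        C = C + Xd.T @ Xd                        # update with the incremental tensor
        eigvals, eigvecs = np.linalg.eigh(C)     # diagonalize (ascending order)
        top = np.argsort(eigvals)[::-1][:R]      # keep the R largest
        new_Us.append(eigvecs[:, top])
        new_Ss.append(eigvals[top])
    return new_Us, new_Ss

# Toy stream: scaled copies of one rank-1 tensor built from unit vectors.
u = np.array([1.0, 2.0, 2.0]) / 3.0
v = np.array([0.6, 0.8])
w = np.array([0.8, 0.6])
shape, ranks = (3, 2, 2), (1, 1, 1)
Us = [np.zeros((n, r)) for n, r in zip(shape, ranks)]
Ss = [np.zeros(r) for r in ranks]
for a in (1.0, 2.0, 3.0):
    X = a * np.einsum("i,j,k->ijk", u, v, w)
    Us, Ss = dta_update(X, Us, Ss, ranks)
```

Storing only $U_d$ and $S_d$ between updates, rather than the full variance matrix, is what the reconstruct step makes possible.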
Mth order DTA: complexity
Storage: $O(\prod_i N_i)$, i.e., the size of an input tensor at a single timestamp
Computation: $\sum_i N_i^3$ (or $\sum_i N_i^2$) for the diagonalization of $C$, plus $\sum_i N_i \prod_j N_j$ for the matrix multiplication $X_{(d)}^T X_{(d)}$
For low order tensors (< 3), diagonalization is the main cost
For high order tensors, matrix multiplication is the main cost
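As a rough worked example of these two terms, the sketch below evaluates them for a dense 500 × 500 × 100 tensor, the size reported for the network data set later in the deck. Constant factors are ignored and the tensor is treated as dense, both of which are simplifying assumptions; sparse data would lower the multiplication term:

```python
from math import prod

def dta_costs(shape):
    """Return (diagonalization, matrix multiplication) operation counts."""
    diagonalization = sum(n ** 3 for n in shape)        # sum_i N_i^3
    multiplication = sum(n * prod(shape) for n in shape)  # sum_i N_i * prod_j N_j
    return diagonalization, multiplication

diag, matmul = dta_costs((500, 500, 100))
# diag   -> 251_000_000
# matmul -> 27_500_000_000
```

Under these assumptions the multiplication term is about two orders of magnitude larger for this 3rd order tensor, in line with the slide's remark about higher order tensors.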
Mth order STA
Run 1st order STA along each mode
Complexity:
Storage: $O(\prod_i N_i)$
Computation: $\sum_i R_i \prod_j N_j$, which is smaller than DTA
[Figure: the new tensor is matricized to $X_{(d)}^T$; a vector $x$ is projected onto $U_1$ to give $y_1$ and error $e_1$, and $U_1$ is updated]
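A minimal sketch of running the 1st order update along each mode follows. Feeding every mode-d fiber of the tensor to the update, the initial directions, and the small initial energies are assumptions made here for illustration; the helper names are hypothetical:

```python
import numpy as np

def sta_update(x, U, d):
    """One 1st order STA step; U (N x k) and d (k,) are updated in place."""
    x = np.array(x, dtype=float)              # work on a copy of the fiber
    for i in range(U.shape[1]):
        y = U[:, i] @ x
        d[i] += y ** 2
        e = x - y * U[:, i]
        U[:, i] += (y / d[i]) * e
        x -= y * U[:, i]

def sta_tensor_update(X, Us, ds):
    """Run the 1st order update on every mode-d fiber of X, for each mode d."""
    for mode, (U, d) in enumerate(zip(Us, ds)):
        fibers = np.moveaxis(X, mode, -1).reshape(-1, X.shape[mode])
        for x in fibers:                      # each fiber has length N_mode
            sta_update(x, U, d)
    return Us, ds

# Toy stream: scaled copies of one rank-1 tensor built from unit vectors.
u = np.array([1.0, 2.0, 2.0]) / 3.0
v = np.array([0.6, 0.8])
w = np.array([0.8, 0.6])
Us = [np.eye(n)[:, :1].copy() for n in (3, 2, 2)]   # one direction per mode
ds = [np.full(1, 1e-3) for _ in range(3)]
for a in (1.0, 2.0, 3.0):
    X = a * np.einsum("i,j,k->ijk", u, v, w)
    sta_tensor_update(X, Us, ds)
direction = Us[0][:, 0] / np.linalg.norm(Us[0][:, 0])
```

Each mode processes $\prod_{j \ne d} N_j$ fibers at a cost proportional to $R_d N_d$ per fiber, which gives the $R_d \prod_j N_j$ term per mode.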
Roadmap
Motivation and main ideas
Background and related work
Dynamic and streaming tensor analysis
Experiments
Conclusion
Experiment
Objectives:
Computational efficiency
Accurate approximation
Real applications: Anomaly detection; Clustering
Data set 1: Network data
TCP flows collected at CMU backbone
Raw data: 500 GB with compression
Construct 3rd order tensors with hourly windows with <source, destination, port>
Each tensor: 500 × 500 × 100
1200 timestamps (hours)
[Figure: sparse data, 10AM to 11AM on 01/06/2005; values follow a power-law distribution]
Data set 2: Bibliographic data (DBLP)
Papers from VLDB and KDD conferences
Construct 2nd order tensors with yearly windows with <author, keywords>
Each tensor: 4584 × 3741
11 timestamps (years)
Computational cost
[Figure: CPU time panels for the 3rd order network tensor and the 2nd order DBLP tensor]
OTA is the offline tensor analysis
Performance metric: CPU time (sec)
Observations:
DTA and STA are orders of magnitude faster than OTA
The slight upward trend in DBLP is due to the increasing number of papers each year (data become denser over time)
Accuracy comparison
Performance metric: the ratio of reconstruction error between DTA/STA and OTA, fixing the error of OTA to 20%
Observation: DTA performs very close to OTA in both datasets; STA performs worse in DBLP due to the bigger changes.
[Figure: panels for the 3rd order network tensor and the 2nd order DBLP tensor]
Network anomaly detection
Reconstruction error gives an indication of anomalies.
The prominent difference between normal and abnormal traffic is mainly due to unusual scanning activity (confirmed by the campus admin).
[Figure: reconstruction error over time, with example panels of normal traffic and abnormal traffic]
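One simple way to turn reconstruction errors into alerts is to flag timestamps whose error lies well above the typical level. The rule below (mean plus `alpha` standard deviations) and the error values are assumptions made for illustration; the slide does not state which threshold was used:

```python
import numpy as np

def flag_anomalies(errors, alpha=2.0):
    """Return indices whose error exceeds mean + alpha * std, and the threshold."""
    errors = np.asarray(errors, dtype=float)
    threshold = errors.mean() + alpha * errors.std()
    return np.flatnonzero(errors > threshold), threshold

# Hypothetical per-timestamp reconstruction errors with one spike.
errors = [1.0, 1.1, 0.9, 1.0, 5.0, 1.05, 0.95, 1.0]
anomalous, threshold = flag_anomalies(errors)
```

In a streaming setting the mean and standard deviation would be maintained incrementally rather than recomputed over the full history.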
Multiway LSI

| Authors | Keywords | Year |
|---|---|---|
| michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995 |
| surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004 |
| jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004 |

Group labels in the figure: DB, DM
Two groups are correctly identified: Databases and Data mining
People and concepts are drifting over time
Conclusion
Tensor stream is a general data model
DTA/STA incrementally decompose tensors into core tensors and projection matrices
The result of DTA/STA can be used in other applications: anomaly detection; multiway LSI
The world is not flat, neither should data mining be.
Final word: Think structurally!
Contact: Jimeng Sun [email protected]