Matching Heterogeneous Event Data
Xiaochen Zhu (1), Shaoxu Song (1), Xiang Lian (2), Jianmin Wang (1), Lei Zou (3)
(1) Tsinghua University, China
(2) University of Texas - Pan American, USA
(3) Peking University, China
SIGMOD 2014
Outline
Motivation
Event Matching Similarity
    Structural Similarity Function
    Iterative Computation
    Estimation
Matching Composite Events
Experiments
Conclusion
Information System and Event Log
Information systems play an important role in large enterprises, e.g. Enterprise Resource Planning (ERP) and Office Automation (OA).
These systems record the business history in their event logs.
Trace ID  Trace      Trace ID  Trace
1         ACDEF      6         BCDEF
2         BCDFE      7         BCDFE
3         ACDFE      8         BCDEF
4         ACDFE      9         BCDFE
5         ACDEF      10        BCDFE

Events of trace 1 (ACDEF):
Event ID  Trace ID  Event Name           Timestamp
1         1         Pay by Cash (A)      04-22 13:33:34
2         1         Check Inventory (C)  04-22 15:18:11
3         1         Validate (D)         04-22 15:31:50
4         1         Ship Goods (E)       04-23 08:14:26
5         1         Email Customer (F)   04-23 08:17:18
Event Data Integration
Complex event processing, provenance analysis, decision support
[Figure: the information systems of the Beijing, Shanghai, and Hong Kong subsidiaries each produce event logs, which are integrated into a business data warehouse.]
Exploring the correspondence among events
Heterogeneous Events
Different events may represent the same activity
ID  Trace
t1  Pay by Cash (A), Check Inventory (C), Validate (D), Ship Goods (E), Email Customer (F)
t2  Pay by Credit Card (B), Check Inventory (C), Validate (D), Email Customer (F), Ship Goods (E)
…   …

ID  Trace
s1  Order Accepted (1), Pay by Cash (2), Inventory Checking & Validation (4), ????????? (5), Send Notification (6)
s2  Order Accepted (1), Pay by Credit Card (3), Inventory Checking & Validation (4), Send Notification (6), ???????? (5)
…   …
Linguistic Matching, Semantic Matching, Opaque Matching, Dislocated Matching, Composite Events Matching
Convert Event Log to Graph
Text similarity fails; use statistics and structural information.
Event Log -> Event Dependency Graph (V, E, f)
Trace ID Trace
1 ACDEF
2 BCDFE
3 ACDFE
4 ACDFE
5 ACDEF
6 BCDEF
7 BCDFE
8 BCDEF
9 BCDFE
10 BCDFE
[Figure: event dependency graph over events A to F. Each vertex is labeled with its frequency of appearance, e.g. f(A)=0.4; each edge is labeled with the frequency of the two events occurring consecutively, e.g. f(B,C)=0.6.]
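The conversion from an event log to a dependency graph can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dependency_graph` is hypothetical, and it assumes that f(v) is the fraction of traces containing event v and that f(v1,v2) is the fraction of traces in which v1 is immediately followed by v2.

```python
from collections import Counter

def dependency_graph(traces):
    """Build vertex and edge frequencies (hypothetical helper).

    Assumption: frequencies are counted per trace, i.e. an event or a
    consecutive pair counts once per trace no matter how often it occurs.
    """
    n = len(traces)
    vertex_count, edge_count = Counter(), Counter()
    for trace in traces:
        vertex_count.update(set(trace))                 # events seen in this trace
        edge_count.update(set(zip(trace, trace[1:])))   # consecutive pairs in this trace
    f_vertex = {v: c / n for v, c in vertex_count.items()}
    f_edge = {e: c / n for e, c in edge_count.items()}
    return f_vertex, f_edge

# The ten traces from the slide.
traces = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
          "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]
f_vertex, f_edge = dependency_graph(traces)
print(f_vertex["A"], f_edge[("B", "C")])  # 0.4 0.6
```

On the ten traces above this reproduces the labels quoted on the slide, f(A)=0.4 and f(B,C)=0.6.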
Related Work
[Table: rows are the approaches Graph Edit Distance [1], Opaque Schema Matching [2], Behavioral Matching [3], and Event Matching Similarity; columns are Linguistic Matching, Semantic Matching, Opaque Matching, Dislocated Matching, and Composite Events.]
1. R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63, 2009.
2. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
3. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.
Event Matching Framework
[Figure: framework with the stages Event Logs, Dependency Graphs, Event Matching Similarities, Correspondences, and Composite Event Matching. The two dependency graphs cover events A to F and events 1 to 6, with frequency labels on vertices and edges.]

Event logs:
Trace ID  Trace      Trace ID  Trace
1         ACDEF      1         12456
…         …          …         …

Event matching similarities:
     1     2     3     4     5     6
A    0.23  0.80  0.52  0.20  0.15  0.19
B    0.38  0.53  0.76  0.24  0.20  0.23
C    0.30  0.16  0.20  0.61  0.20  0.22
D    0.34  0.15  0.20  0.37  0.24  0.25
E    0.27  0.21  0.19  0.18  0.28  0.20
F    0.30  0.19  0.23  0.23  0.20  0.72

Correspondences: A->2, B->3, C->4, D->1, E->5, F->6
With composite event matching: A->2, B->3, {C,D}->4, E->5, F->6
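One plausible way to turn the similarity matrix above into one-to-one correspondences is to pick the assignment with the largest total similarity. The sketch below assumes that selection rule (the slides do not state it) and uses a brute-force search that is only practical for a handful of events; `best_matching` is a hypothetical name.

```python
from itertools import permutations

def best_matching(rows, cols, sim):
    """Return the one-to-one matching with maximum total similarity.

    Assumption: correspondences maximize the sum of pairwise similarities.
    Brute force over all permutations, so only suitable for small inputs.
    """
    best, best_total = None, float("-inf")
    for perm in permutations(cols):
        total = sum(sim[r][c] for r, c in zip(rows, perm))
        if total > best_total:
            best, best_total = dict(zip(rows, perm)), total
    return best, best_total

rows, cols = list("ABCDEF"), list("123456")
values = [
    [0.23, 0.80, 0.52, 0.20, 0.15, 0.19],
    [0.38, 0.53, 0.76, 0.24, 0.20, 0.23],
    [0.30, 0.16, 0.20, 0.61, 0.20, 0.22],
    [0.34, 0.15, 0.20, 0.37, 0.24, 0.25],
    [0.27, 0.21, 0.19, 0.18, 0.28, 0.20],
    [0.30, 0.19, 0.23, 0.23, 0.20, 0.72],
]
sim = {r: dict(zip(cols, row)) for r, row in zip(rows, values)}
matching, total = best_matching(rows, cols, sim)
print(matching)
```

On this matrix the search selects A->2, B->3, C->4, D->1, E->5, F->6, the same correspondences listed on the slide. For larger logs an assignment solver such as the Hungarian algorithm would replace the permutation loop.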
An Intuition from SimRank*
Intuition for evaluating the similarity of two events v1 and v2:
1. S(v1, v2) = 1 if neither v1 nor v2 has an input neighbor;
2. v1 is similar to v2 if they frequently share similar input neighbors.
* G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD, pages 538–543, 2002.
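For reference, the original single-graph SimRank recurrence of Jeh and Widom can be written as below, where I(v) is the set of input neighbors of v and C in (0,1) is a decay constant. This is the cited measure, not the event matching similarity of this talk, which compares vertices of two different graphs and replaces the base case as described in rule 1.

```latex
s(a,b) =
\begin{cases}
1 & \text{if } a = b,\\[4pt]
\dfrac{C}{|I(a)|\,|I(b)|} \displaystyle\sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr)
  & \text{if } a \neq b,\ I(a) \neq \emptyset,\ I(b) \neq \emptyset,\\[4pt]
0 & \text{otherwise.}
\end{cases}
```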
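[Figure: the two dependency graphs, over events A to F and events 1 to 6.]
Problem: this intuition cannot deal with dislocated matching. A, B, and 1 have no input neighbors, so rule 1 gives S(A,1) = S(B,1) = 1, although the traces suggest that A and B correspond to 2 and 3.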
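Handle the Dislocated Matching
Introduce an artificial event vX into each graph (v1^X and v2^X):
1. S(v1^X, v2^X) = 1;
2. v1 is similar to v2 if they frequently share similar input neighbors.
[Figure: the two dependency graphs, over events A to F and events 1 to 6, each extended with its artificial event.]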
Iterative Computation
[Figure: the two dependency graphs with their artificial events v1^X and v2^X.]

I = 0
       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0     0     0     0     0     0
B      0     0     0     0     0     0     0
C      0     0     0     0     0     0     0
D      0     0     0     0     0     0     0
E      0     0     0     0     0     0     0
F      0     0     0     0     0     0     0

I = 1
       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.10  0.13  0.40  0.13  0.17
D      0     0.34  0.11  0.15  0.34  0.17  0.17
E      0     0.27  0.14  0.13  0.13  0.13  0.13
F      0     0.30  0.13  0.15  0.18  0.13  0.63

I = 2
       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.16  0.20  0.61  0.19  0.22
D      0     0.34  0.15  0.20  0.36  0.21  0.22
E      0     0.27  0.21  0.19  0.17  0.26  0.19
F      0     0.30  0.19  0.23  0.22  0.19  0.70

I = 20
       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.16  0.20  0.61  0.20  0.22
D      0     0.34  0.15  0.20  0.37  0.24  0.25
E      0     0.27  0.21  0.19  0.18  0.28  0.20
F      0     0.30  0.19  0.23  0.23  0.20  0.72
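The shape of such an iteration can be sketched with a SimRank-style update over two graphs. This is an illustrative stand-in, not the structural similarity function of the talk, and it is not expected to reproduce the numbers in the tables above. Everything here is an assumption: the artificial event `"X"` is made an input neighbor of every event with edge weight f(v), each input neighbor is paired with its best counterpart, edge frequencies that agree contribute more, and `c` is a decay constant. The graph for events 1 to 6 is assumed to mirror the first one.

```python
from itertools import product

def add_artificial(pred, freq):
    """Add artificial event 'X' as an input neighbor of every event.
    Assumption: the edge X -> v carries weight f(v)."""
    graph = {v: list(ps) + [("X", freq[v])] for v, ps in pred.items()}
    graph["X"] = []
    return graph

def directed(va, vb, ga, gb, look, c):
    """Average, over input neighbors of va, of the best-agreeing neighbor of vb."""
    total = 0.0
    for ua, wa in ga[va]:
        best = 0.0
        for ub, wb in gb[vb]:
            agree = 1.0 - abs(wa - wb) / (wa + wb)  # 1 when edge frequencies match
            best = max(best, c * agree * look(ua, ub))
        total += best
    return total / len(ga[va])

def similarity(pred1, freq1, pred2, freq2, c=0.8, rounds=20):
    g1, g2 = add_artificial(pred1, freq1), add_artificial(pred2, freq2)
    sim = {pair: 0.0 for pair in product(g1, g2)}
    sim[("X", "X")] = 1.0  # base case: the two artificial events match
    for _ in range(rounds):
        prev = dict(sim)
        for a, b in product(pred1, pred2):
            fwd = directed(a, b, g1, g2, lambda x, y: prev[(x, y)], c)
            bwd = directed(b, a, g2, g1, lambda y, x: prev[(x, y)], c)
            sim[(a, b)] = (fwd + bwd) / 2
    return sim

# Input neighbors as (event, edge frequency); weights for log 2 are assumed.
pred1 = {"A": [], "B": [], "C": [("A", 0.4), ("B", 0.6)], "D": [("C", 1.0)],
         "E": [("D", 0.4), ("F", 0.6)], "F": [("D", 0.6), ("E", 0.4)]}
freq1 = {"A": 0.4, "B": 0.6, "C": 1.0, "D": 1.0, "E": 1.0, "F": 1.0}
pred2 = {"1": [], "2": [("1", 0.4)], "3": [("1", 0.6)],
         "4": [("2", 0.4), ("3", 0.6)],
         "5": [("4", 0.4), ("6", 0.6)], "6": [("4", 0.6), ("5", 0.4)]}
freq2 = {"1": 1.0, "2": 0.4, "3": 0.6, "4": 1.0, "5": 1.0, "6": 1.0}
sim = similarity(pred1, freq1, pred2, freq2)
```

Because the artificial events are the only pair that starts at 1, similarity spreads outward from them round by round, which is the behavior the I = 0, 1, 2, 20 tables illustrate.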
Estimation
On huge and complex graphs, the computation needs tens or hundreds of iterations to converge.
Instead, we run only I rounds of iteration and then estimate the converged similarities.
Trade-off between accuracy and efficiency:
Larger I: higher accuracy, more time. Smaller I: lower accuracy, less time.
Matching Composite Events
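Candidates of composite events: C and D, E and F, … (pre-defined or discovered automatically)
Heuristic: choose the candidate that improves the average similarity.
[Figure: the two dependency graphs, over events A to F and events 1 to 6, followed by the first graph with C and D merged into the composite event {C,D} and with E and F merged into {E,F}.]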
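The heuristic can be sketched as a greedy loop: merge a candidate pair into one composite event, re-score, and keep the merge only if the score improves. The names `merge_events`, `greedy_composites`, and `toy_score` are hypothetical. In particular, `toy_score` is a deliberately simple stand-in for the average event matching similarity, assumed only so that the example runs on its own.

```python
def merge_events(traces, pair, name):
    """Replace each consecutive occurrence of pair (first, second) with one event."""
    first, second = pair
    merged = []
    for trace in traces:
        out, i = [], 0
        while i < len(trace):
            if trace[i:i + 2] == [first, second]:
                out.append(name)
                i += 2
            else:
                out.append(trace[i])
                i += 1
        merged.append(out)
    return merged

def greedy_composites(traces, candidates, score):
    """Accept a candidate merge only if it improves the score."""
    best = score(traces)
    accepted = []
    for pair in candidates:
        trial = merge_events(traces, pair, "{" + ",".join(pair) + "}")
        trial_score = score(trial)
        if trial_score > best:
            traces, best = trial, trial_score
            accepted.append(pair)
    return traces, accepted

def toy_score(traces):
    # Stand-in for the average similarity (assumption): the fraction of
    # traces that have exactly four events.
    return sum(len(t) == 4 for t in traces) / len(traces)

log = [list(t) for t in ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
                         "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]]
merged_log, accepted = greedy_composites(log, [("C", "D"), ("E", "F")], toy_score)
print(accepted)  # [('C', 'D')]
```

With this toy score the merge of C and D is accepted and the merge of E and F is rejected. In the real setting the score would be recomputed from the dependency graph of the merged log.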
Experiment Setting
Real-life data set: collected from a real bus manufacturer.
No. of Event Logs  149     Min Event Size  2
No. of Traces      6000    Max Event Size  11
The true event matching is generated manually by domain experts.
Criterion: the accuracy of event matching is evaluated by the F-measure of precision and recall.
Baselines: Graph Edit Distance [1], Opaque Matching [2], Behavioral Matching [3].
1. R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63, 2009.
2. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
3. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.
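The F-measure criterion can be illustrated with a short sketch. The function name `f_measure` is hypothetical, and the ground truth below is an assumption read off the example traces (C and D both map to event 4), not the experts' matching from the experiments.

```python
def f_measure(found, truth):
    """Harmonic mean of precision and recall over sets of (event, event) pairs."""
    found, truth = set(found), set(truth)
    hits = len(found & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(found)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)

# Correspondences found without composite matching (from the framework slide).
found = [("A", "2"), ("B", "3"), ("C", "4"), ("D", "1"), ("E", "5"), ("F", "6")]
# Assumed ground truth for the running example.
truth = [("A", "2"), ("B", "3"), ("C", "4"), ("D", "4"), ("E", "5"), ("F", "6")]
score = f_measure(found, truth)
print(round(score, 3))  # 0.833
```

Five of the six found pairs are correct and five of the six true pairs are found, so precision and recall are both 5/6.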
Conclusion
Event matching framework: works well with dislocated matching and with opaque event names.
An estimation function for the trade-off between accuracy and efficiency.
Heuristics for matching composite events.