Advanced Techniques for Mining Structured Data: Process Mining

Frequent Pattern Discovery / Event Forecasting

Dr A. Appice

Scuola di Dottorato in Informatica e Matematica XXXII


Problem definition

1. Given a set T of examples, which relate the characteristics of the event at time t (target variables, predictive space Y) to the (numeric and discrete) characteristics of the events observed in the window t-w, t-w+1, ..., t-1 (descriptor variables, descriptive space X)

2. Learn a forecasting model F(T) to forecast the characteristics of the next event:

- Regression (for numeric variables)

- Classification (for categorical variables)


Applications

• Use F(T) to:

- check conformance;

- recommend appropriate actions to enterprise users.


Event forecasting service

• Off-line step

- sliding window model + event log of full traces, in order to learn a forecasting model F(T)

• On-line step

- recent events in a running trace + F(T) generated off-line, in order to forecast the next event of the running trace
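The two steps can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: each event is reduced to its activity, and a frequency table (the most frequent next activity per window) stands in for the forecasting model F(T), which in these slides is a predictive clustering tree. All function names are hypothetical.

```python
from collections import Counter, defaultdict

NONE = "none"  # padding for window slots that fall before the start of the case


def window_before(trace, i, w):
    """Activities of the (at most) w events preceding position i, left-padded with 'none'."""
    past = list(trace[max(0, i - w):i])
    return tuple([NONE] * (w - len(past)) + past)


def offline_learn(full_traces, w):
    """Off-line step: turn full traces into (window -> next activity) examples and
    fit a stand-in model, a frequency table (this is NOT the PCT of the slides)."""
    counts = defaultdict(Counter)
    for trace in full_traces:
        for i, activity in enumerate(trace):
            counts[window_before(trace, i, w)][activity] += 1
    # Keep the most frequent next activity observed after each window.
    return {window: c.most_common(1)[0][0] for window, c in counts.items()}


def online_forecast(model, running_trace, w, default=NONE):
    """On-line step: forecast the next activity of a running trace from its last w events."""
    return model.get(window_before(running_trace, len(running_trace), w), default)


log = [["CREATE", "UPDATE", "UPDATE", "DELETE"], ["CREATE", "UPDATE", "DELETE"]]
model = offline_learn(log, w=2)
print(online_forecast(model, ["CREATE"], w=2))  # UPDATE
```

The off-line function only ever sees complete traces, while the on-line function only needs the tail of a running trace, which is what lets the model be learned once and queried cheaply.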


Event Forecasting Service

• Deployed in OPENNESS (PON VINCENTE)


Sliding window model

• Temporal correlation between events of a case

• The future event is correlated to the events observed in the recent past

• The timestamp is transformed into the time (in seconds) elapsed since the beginning of the case.

• When an optional characteristic is missing from the related event, the associated variable takes the value "none" in the training example.
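A minimal sketch of the sliding window transformation, assuming each event is a dict with hypothetical keys `activity`, `class_name`, `user` and `timestamp`. Following the worked example in these slides, empty window slots before the start of the case are padded with none for the categorical characteristics and 0 for the time.

```python
from datetime import datetime

FIELDS = ("activity", "class_name", "user")  # hypothetical names for the categorical characteristics
PAD = ("none", "none", "none", 0)            # empty window slot, as in the worked example


def encode(event, case_start):
    """Categorical characteristics ('none' if missing) + seconds elapsed since the case start."""
    categorical = tuple(event.get(f) or "none" for f in FIELDS)
    elapsed = int((event["timestamp"] - case_start).total_seconds())
    return categorical + (elapsed,)


def sliding_window_examples(case, w):
    """Turn one case (events sorted by timestamp) into (X, Y) training examples."""
    start = case[0]["timestamp"]
    encoded = [encode(e, start) for e in case]
    examples = []
    for i, y in enumerate(encoded):
        window = encoded[max(0, i - w):i]
        window = [PAD] * (w - len(window)) + window   # oldest slot first
        x = tuple(v for slot in window for v in slot)  # flatten the w events into X
        examples.append((x, y))                        # y: the next event, in Y
    return examples


DL = "com.liferay.portlet.documentlibrary.model.DLFileEntry"
case1 = [
    {"activity": "UPDATE", "class_name": DL, "user": "Paul", "timestamp": datetime(2014, 11, 24, 11, 42, 22)},
    {"activity": "UPDATE", "class_name": DL, "user": "Paul", "timestamp": datetime(2014, 11, 24, 17, 21, 25)},
]
for x, y in sliding_window_examples(case1, w=2):
    print(x, "->", y)
```

Applied to the first two events of case 1 with a window of size w = 2, this yields the two training examples worked out in the following slides.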


Sliding window model


Id | Activity | Class Name | User | Timestamp
1 | UPDATE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 11:42:22.0
1 | UPDATE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 17:21:25.0
1 | DELETE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 17:49:55.0
1 | CREATE | com.liferay.portlet.blogs.model.BlogsEntry | Paul | 2014-11-24 18:22:00.0
1 | UPDATE | com.liferay.portlet.blogs.model.BlogsEntry | Paul | 2014-11-24 18:32:00.0
2 | CREATE | com.liferay.portlet.blogs.model.BlogsEntry | Mary | 2014-11-25 12:12:12.0
... | ... | ... | ... | ...

Training example (1), window size w = 2:

- descriptive space X: (none, none, none, 0), (none, none, none, 0)

- predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0


Training example (2), window size w = 2:

- descriptive space X: (none, none, none, 0), (UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0)

- predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 20343


Alternatively: the Landmark model

• For each event, the Landmark spans from the starting time point of the case to the present time.

• The descriptive characteristics are aggregated over the Landmark:

- A categorical characteristic is transformed into n numeric variables (one variable for each distinct value of the characteristic domain). Each aggregated variable measures the frequency of that value over the Landmark.

- A numeric characteristic (e.g. time) is transformed into a numeric variable that sums its values over the Landmark.
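The Landmark aggregation can be sketched as below, assuming the Landmark for event i covers the events that precede it in the case, and that "frequency" means an absolute count (a relative frequency would divide by the Landmark length). The function name and the dict-based event format are hypothetical; the elapsed times are those of the first three events of case 1 in the log above.

```python
from collections import Counter


def landmark_features(case, i, domains, numeric="elapsed"):
    """Aggregate the events case[0:i], i.e. the Landmark preceding event i.

    domains maps each categorical characteristic to its distinct values: each value
    becomes one numeric variable (here an absolute count). The numeric
    characteristic is summed over the Landmark.
    """
    landmark = case[:i]
    features = {}
    for char, values in domains.items():
        counts = Counter(e[char] for e in landmark)
        for v in values:
            features[f"{char}={v}"] = counts[v]
    features[f"sum({numeric})"] = sum(e[numeric] for e in landmark)
    return features


case1 = [
    {"activity": "UPDATE", "elapsed": 0},
    {"activity": "UPDATE", "elapsed": 20343},
    {"activity": "DELETE", "elapsed": 22053},
]
print(landmark_features(case1, 3, {"activity": ["CREATE", "UPDATE", "DELETE"]}))
```

Unlike the sliding window, the number of variables does not depend on a window size: it is fixed by the size of each categorical domain, whatever the length of the case.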


Forecasting model: how-to?

• Predictive clustering tree (PCT)

- Tree-structured predictive clustering models that generalize decision trees


Example PCT (figure): the root tests X1; one branch ends in a leaf, the other tests X2 against 2. Each root-to-leaf path is one predictive cluster:

- X1 ∈ {1,1}: Y1 = c1, …, Yq = cq

- X1 ∉ {1,1} and X2 ≤ 2: Y1 = c′1, …, Yq = c′q

- X1 ∉ {1,1} and X2 > 2: Y1 = c″1, …, Yq = c″q


Predictive clusters

• Each cluster is associated with:

- the description of the events grouped in the cluster, based on properties of the events observed in the recent past;

- the values forecast for the properties of the next event in the case.

• A predictive cluster is a pair (S, f), where:

- S is a symbolic description defined on X;

- f is a predictive function f: X → Y.
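A predictive cluster (S, f) can be represented directly as a data structure. In this sketch, which is an assumption rather than something taken from the slides, S is a list of Boolean tests whose conjunction must hold, and f returns the same forecast for every example the cluster covers, as happens at a tree leaf. The clusters below are hypothetical and only mirror the shape of the example tree.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


@dataclass
class PredictiveCluster:
    tests: List[Callable[[Example], bool]]  # S: conjunction of Boolean tests on X
    prediction: Dict[str, Any]              # f: forecast value for each target in Y

    def covers(self, x: Example) -> bool:
        return all(test(x) for test in self.tests)


def forecast(clusters: List[PredictiveCluster], x: Example) -> Dict[str, Any]:
    """Apply f of the cluster whose description S is satisfied by x."""
    for cluster in clusters:
        if cluster.covers(x):
            return cluster.prediction
    raise ValueError("no predictive cluster covers this example")


# Hypothetical clusters shaped like the example tree (tests and values are made up).
IN_SET = {"CREATE", "UPDATE"}
clusters = [
    PredictiveCluster([lambda x: x["X1"] in IN_SET], {"Y1": "UPDATE"}),
    PredictiveCluster([lambda x: x["X1"] not in IN_SET, lambda x: x["X2"] <= 2], {"Y1": "CREATE"}),
    PredictiveCluster([lambda x: x["X1"] not in IN_SET, lambda x: x["X2"] > 2], {"Y1": "DELETE"}),
]
print(forecast(clusters, {"X1": "DELETE", "X2": 5}))  # {'Y1': 'DELETE'}
```

Because the clusters come from the leaves of one tree, their descriptions are mutually exclusive, so at most one cluster covers any example and the search order does not matter.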


Learning the forecasting model

• At each internal node t, a test has to be selected by maximizing the (inter-cluster) variance reduction over the target space, defined as follows:

$$\Delta_{\mathbf{Y}}(T(t), P) = \mathrm{Var}(T(t), \mathbf{Y}) - \sum_{t_i \in P} \frac{\#T(t_i)}{\#T(t)}\,\mathrm{Var}(T(t_i), \mathbf{Y}),$$

• where T(t) denotes the set of training examples falling in t, P defines a partition {T(t1), T(t2)} of T(t), and #T(·) denotes the number of examples in a set.
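For a single numeric target, the variance reduction can be computed as in this minimal sketch, which uses the population variance; the function name is hypothetical.

```python
from statistics import pvariance


def variance_reduction(parent, partition):
    """Delta(T(t), P) = Var(T(t)) - sum_i #T(t_i)/#T(t) * Var(T(t_i)), one numeric target."""
    n = len(parent)
    return pvariance(parent) - sum(len(part) / n * pvariance(part) for part in partition)


targets = [1.0, 1.0, 5.0, 5.0]                                # target values in T(t)
print(variance_reduction(targets, [[1.0, 1.0], [5.0, 5.0]]))  # 4.0: a pure split
print(variance_reduction(targets, [[1.0, 5.0], [1.0, 5.0]]))  # 0.0: an uninformative split
```

The weights #T(t_i)/#T(t) keep a small, pure child from dominating the score: each child contributes in proportion to the share of examples it receives.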


Learning the forecasting model

• The partition is defined according to a Boolean test on a predictor variable in X.

• A new partition is recursively found until a stopping criterion is satisfied:

- a node is a leaf when it hosts fewer than $2\sqrt{size(\mathbf{T})}$ examples, where size(T) is the number of training examples.
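The recursive partitioning can be sketched for one numeric target and numeric descriptive variables, with Boolean tests of the form x[j] ≤ threshold. Two assumptions apply: the leaf threshold is read as 2·√size(T), since the operator between 2 and size(T) is garbled in the extracted slide, and a node also becomes a leaf when no test reduces the variance. Function names are hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from statistics import mean, pvariance


def best_test(rows, ys):
    """Best Boolean test x[j] <= thr by variance reduction (single numeric target)."""
    n, parent_var = len(ys), pvariance(ys)
    best = (0.0, None, None)  # (reduction, feature index, threshold)
    for j in range(len(rows[0])):
        # Skip the largest value: it would leave the right branch empty.
        for thr in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= thr]
            right = [y for r, y in zip(rows, ys) if r[j] > thr]
            red = parent_var - len(left) / n * pvariance(left) - len(right) / n * pvariance(right)
            if red > best[0]:
                best = (red, j, thr)
    return best


def grow(rows, ys, min_size):
    """Recursively partition T(t); a node becomes a leaf if it hosts fewer than min_size examples."""
    _, j, thr = best_test(rows, ys) if len(ys) >= min_size else (0.0, None, None)
    if j is None:  # stopping criterion met, or no test reduces the variance
        return {"leaf": mean(ys)}  # predictive function: average target value
    yes = [i for i, r in enumerate(rows) if r[j] <= thr]
    no = [i for i, r in enumerate(rows) if r[j] > thr]
    return {
        "test": (j, thr),
        "yes": grow([rows[i] for i in yes], [ys[i] for i in yes], min_size),
        "no": grow([rows[i] for i in no], [ys[i] for i in no], min_size),
    }


def learn_tree(rows, ys):
    # Assumed reading of the criterion: min_size = 2 * sqrt(size(T)).
    return grow(rows, ys, 2 * math.sqrt(len(ys)))


print(learn_tree([[1], [2], [10], [11]], [1.0, 1.0, 5.0, 5.0]))
```

With four training examples the threshold is 4, so the root is split once and both children, holding two examples each, become leaves.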


Learning the forecasting model

• In the multi-target context, the variance reduction is computed for each target variable.

• The total variance reduction is the average of the per-target variance reductions over the set of target variables.
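A sketch of the multi-target averaging, assuming each target has its own dispersion function (e.g. variance for numeric targets, Gini index for categorical ones); the function names are hypothetical.

```python
from statistics import pvariance


def reduction(parent, parts, var):
    """Variance reduction of one target: var(parent) minus the weighted var of the parts."""
    n = len(parent)
    return var(parent) - sum(len(p) / n * var(p) for p in parts)


def multi_target_reduction(parent_rows, part_rows, var_fns):
    """Average of the per-target reductions over the q target variables.

    parent_rows: target vectors (y1, ..., yq) of the examples in T(t)
    part_rows:   the target vectors of T(t1) and of T(t2)
    var_fns:     one dispersion function per target
    """
    total = 0.0
    for k, var in enumerate(var_fns):
        parent = [row[k] for row in parent_rows]
        parts = [[row[k] for row in part] for part in part_rows]
        total += reduction(parent, parts, var)
    return total / len(var_fns)


parent = [(1.0, 10.0), (1.0, 10.0), (5.0, 10.0), (5.0, 10.0)]
split = [parent[:2], parent[2:]]
print(multi_target_reduction(parent, split, [pvariance, pvariance]))  # 2.0
```

Here the split separates the first target perfectly (reduction 4.0) but leaves the constant second target unchanged (reduction 0.0), so the averaged score is 2.0.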


Learning the forecasting model

• For a numeric target variable Y (i.e. Y ∈ 𝐘, Y is numeric), the variance function Var(·) returns the variance of Y over the examples in the partition T(t), whereas the predictive function is the average of the target values in a cluster (leaf node).

- The variance reduction is computed after scaling the real values of Y falling in T(t) to the interval [0,1].

• For a categorical target variable Y (i.e. Y ∈ 𝐘, Y is categorical), the variance function Var(·) returns the Gini index of Y over the examples in the partition T(t), whereas the predictive function is the majority class of Y in the cluster.
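The two dispersion functions and the two predictive functions can be sketched as follows. The min and max used for the [0,1] scaling are passed in explicitly, on the assumption that they are computed on T(t) and reused for its children, so that parent and child variances are comparable. Function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def scaled_variance(values, lo, hi):
    """Variance of a numeric target after min-max scaling to [0, 1] with bounds lo, hi."""
    if hi == lo:
        return 0.0
    return pvariance([(v - lo) / (hi - lo) for v in values])


def gini(values):
    """Gini index 1 - sum_c p_c^2 of a categorical target."""
    n = len(values)
    return 1.0 - sum((count / n) ** 2 for count in Counter(values).values())


def numeric_prototype(values):
    return mean(values)  # predictive function for a numeric target: the average


def categorical_prototype(values):
    return Counter(values).most_common(1)[0][0]  # predictive function: the majority class


print(gini(["UPDATE", "UPDATE", "DELETE", "CREATE"]))  # 0.625
print(scaled_variance([0, 20343], 0, 20343))            # 0.25
```

Scaling matters in the multi-target setting: without it, a target measured in seconds would dwarf the Gini indexes (bounded by 1) when the per-target reductions are averaged.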


Learning the forecasting model

• When a leaf node is found, a predictive cluster is added to the final model.

• The symbolic description of this predictive cluster is the conjunction of the Boolean tests along the path from the root to the current leaf.

• The predictive function is the one associated with the leaf; it is constructed for each target variable from the target values of the examples falling in the leaf partition.
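A sketch of how predictive clusters could be read off a grown tree, assuming a hypothetical dict format in which internal nodes hold a test label and leaves hold the target values of their examples; f is built per target as the mean (numeric) or the majority class (categorical).

```python
from collections import Counter
from statistics import mean


def leaf_prediction(examples):
    """f of a leaf: per target, the mean (numeric) or the majority class (categorical)."""
    f = {}
    for target in examples[0]:
        values = [e[target] for e in examples]
        if all(isinstance(v, (int, float)) for v in values):
            f[target] = mean(values)
        else:
            f[target] = Counter(values).most_common(1)[0][0]
    return f


def predictive_clusters(node, path=()):
    """One (S, f) pair per leaf; S is the conjunction of the tests on the root-to-leaf path."""
    if "examples" in node:
        return [(" and ".join(path) or "true", leaf_prediction(node["examples"]))]
    test = node["test"]
    return (predictive_clusters(node["yes"], path + (test,))
            + predictive_clusters(node["no"], path + (f"not ({test})",)))


tree = {
    "test": "X2 <= 2",
    "yes": {"examples": [{"activity": "UPDATE", "time": 10},
                         {"activity": "UPDATE", "time": 20},
                         {"activity": "DELETE", "time": 30}]},
    "no": {"examples": [{"activity": "CREATE", "time": 100}]},
}
for s, f in predictive_clusters(tree):
    print(s, "->", f)
```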


On-line phase

[Figure: running case → PCT → next event]


Experiments

• 10-fold cross-validation over the cases of a log, varying the window size between 2 and the maximum length of a case in the log
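The evaluation protocol can be sketched as below: folds are drawn over case identifiers, so that all events of a case fall in the same fold, and every fold is evaluated for every window size from 2 to the length of the longest case. The shuffling seed, the toy log and the function names are assumptions.

```python
import random


def case_folds(case_ids, k=10, seed=0):
    """Split case ids (not single events) into k folds, keeping each case in one fold."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def cross_validation_plan(cases, k=10, seed=0):
    """Yield (w, train_ids, test_ids) for every window size w and every fold."""
    folds = case_folds(cases.keys(), k, seed)
    longest = max(len(events) for events in cases.values())
    for w in range(2, longest + 1):  # from 2 to the maximum case length
        for i, test_ids in enumerate(folds):
            train_ids = [cid for j, fold in enumerate(folds) if j != i for cid in fold]
            yield w, train_ids, test_ids


# Hypothetical log: 20 cases with 1 to 4 events each.
cases = {cid: ["event"] * (cid % 4 + 1) for cid in range(20)}
plan = list(cross_validation_plan(cases))
print(len(plan))  # 3 window sizes x 10 folds = 30
```

Splitting by case rather than by event avoids leaking the tail of a test case into training through overlapping windows.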


[Figure: accuracy averaged over the target space]


[Figure: number of leaves]


[Figure: learning time]


Case study in VINCENTE: data

• Daily routine of the users of the OPENNESS platform belonging to a specific group (group id = 13723)

• between September 1, 2014 and November 30, 2014

• 201 full traces

• 5477 events

• 3 characteristics (activity, class, timestamp)


Case study: experimental setup

• Off-line learning: 90% of the traces, randomly selected (180 traces)

• On-line forecasting: the remaining 10% of the traces (21 traces)


Case study: off-line learning


Case study: on-line forecasting (1/2)

Trace | Number of Events | Activity type | Class name | Time (secs)
1 | 14 | 12.77% | 100.00% | 4.743
2 | 17 | 100.00% | 100.00% | 20.879
3 | 36 | 91.67% | 87.50% | 0.71
4 | 58 | 100.00% | 80.00% | 2.96
5 | 93 | 100.00% | 100.00% | 454.12
6 | 98 | 100.00% | 66.67% | 1950.00
7 | 101 | 50.00% | 50.00% | 928.72
8 | 105 | 100.00% | 0.00% | 14155.87
9 | 124 | 100.00% | 50.00% | 2388.25
10 | 125 | 58.33% | 58.33% | 9.39
11 | 127 | 100.00% | 50.00% | 2388.25


Case study: on-line forecasting (2/2)

Trace | Number of Events | Activity type | Class name | Time (secs)
12 | 129 | 100.00% | 83.33% | 2.48
13 | 132 | 100.00% | 87.50% | 2.15
14 | 138 | 92.86% | 71.42% | 1.62
15 | 139 | 100.00% | 94.12% | 1.47
16 | 141 | 100.00% | 66.67% | 1950.00
17 | 174 | 20.00% | 80.00% | 587.37
18 | 179 | 96.15% | 61.54% | 2.62
19 | 181 | 97.92% | 64.58% | 165.14
20 | 190 | 60.00% | 80.00% | 587.37
21 | 194 | 50.00% | 50.00% | 928.72
Avg | | 82.37% | 70.56% | 755.23

