Advanced Techniques for Mining Structured Data: Process Mining
Frequent Pattern Discovery / Event Forecasting
Dr A. Appice
Scuola di Dottorato in Informatica e Matematica XXXII
Problem definition
1. Given a set T of examples, each relating the characteristics of an event at time t (predictive variables) to the (numeric and discrete) characteristics of the events observed in the window t-w, t-w+1, ..., t-1 (descriptor variables)
2. Learn a forecasting model F(T) to forecast the characteristics of the next event:
- Regression (for numeric variables)
- Classification (for categorical variables)
Applications
• Use F(T):
• to check conformance
• to recommend appropriate actions to enterprise users
Event forecasting service
• Off-line step
Sliding window model + event log of full traces, in order to learn a forecasting model F(T)
• On-line step
recent events in a running trace + F(T) generated off-line, in order to forecast the next event of the running trace
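The two steps above can be sketched as follows; the function names and the pluggable `learn` interface are assumptions for illustration, not the deck's actual implementation:

```python
# Sketch of the two-step forecasting service (hypothetical names).

def offline_step(full_traces, w, learn):
    """Off-line: build windowed training examples T from an event log of
    full traces, then learn a forecasting model F(T)."""
    examples = []
    for trace in full_traces:
        for t in range(len(trace)):
            window = tuple(trace[max(0, t - w):t])   # events in t-w .. t-1
            examples.append((window, trace[t]))      # descriptors -> next event
    return learn(examples)                           # F(T)

def online_step(running_trace, w, model):
    """On-line: forecast the next event of a running trace from its
    most recent w events, using the model generated off-line."""
    return model(tuple(running_trace[-w:]))
```

Any concrete learner, such as the predictive clustering tree used later in the deck, can be plugged in as `learn`.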
Event Forecasting Service
• Deployed in OPENNESS (PON VINCENTE)
Sliding window model
• Temporal correlation between events of a case
• The future event is correlated with the events observed in the recent past
• The timestamp is transformed into the time (in seconds) elapsed since the beginning of the case.
• When an optional characteristic is missing from the related event, the associated variable takes the value "none" in the training example.
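The encoding described above can be sketched as follows (a minimal sketch; the event-tuple layout and function names are assumptions):

```python
from datetime import datetime

def encode(trace, w):
    """Sliding-window encoding of one case (field layout is an assumption).
    An event is (activity, class_name, user, timestamp); the timestamp
    becomes seconds elapsed since the beginning of the case, and window
    slots that fall before the case starts are padded with 'none' / 0."""
    t0 = trace[0][3]
    def features(ev):
        act, cls, user, ts = ev
        return [act, cls, user, (ts - t0).total_seconds()]
    examples = []
    for t in range(len(trace)):
        x = []
        for i in range(t - w, t):                  # events in t-w .. t-1
            x += features(trace[i]) if i >= 0 else ["none", "none", "none", 0]
        examples.append((x, features(trace[t])))   # (descriptive X, predictive Y)
    return examples
```

With w = 2, the first event of a case yields a fully padded descriptive vector, as in the worked examples on the following slides.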
Sliding window model

Id | Activity | Class Name | User | Timestamp
1 | UPDATE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 11:42:22.0
1 | UPDATE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 17:21:25.0
1 | DELETE | com.liferay.portlet.documentlibrary.model.DLFileEntry | Paul | 2014-11-24 17:49:55.0
1 | CREATE | com.liferay.portlet.blogs.model.BlogsEntry | Paul | 2014-11-24 18:22:00.0
1 | UPDATE | com.liferay.portlet.blogs.model.BlogsEntry | Paul | 2014-11-24 18:32:00.0
2 | CREATE | com.liferay.portlet.blogs.model.BlogsEntry | Mary | 2014-11-25 12:12:12.0
... | ... | ... | ... | ...

Training example (1), built from the first event of case 1 with window size w = 2:
descriptive space X: none, none, none, 0, none, none, none, 0
predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0
Sliding window model

Training example (2), built from the second event of case 1 (same table as above):
descriptive space X: none, none, none, 0, UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0
predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 20363
As an alternative: Landmark model
• For each event, the landmark extends from the starting time point of the case to the present time.
• The descriptive characteristics are aggregated over the landmark.
• A categorical characteristic is transformed into n numeric variables (one variable for each distinct value of the characteristic's domain). Each aggregated variable measures the frequency of the value over the landmark.
• A numeric characteristic (e.g. time) is transformed into a numeric variable that sums its values over the landmark.
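A minimal sketch of the landmark aggregation, under the assumption that an event is reduced to an (activity, elapsed_seconds) pair:

```python
from collections import Counter

def landmark_features(events, activity_domain):
    """Landmark aggregation over all events from the case start to now.
    A categorical characteristic becomes one frequency count per value of
    its domain; a numeric characteristic becomes a sum over the landmark."""
    counts = Counter(act for act, _ in events)
    x = [counts[v] for v in activity_domain]       # frequency per domain value
    x.append(sum(sec for _, sec in events))        # summed numeric characteristic
    return x
```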
Forecasting model: how-to?
• Predictive clustering tree (PCT)
• Tree-structured predictive clustering models that generalize decision trees
[Figure: an example PCT. The root tests X1 ∈ {1,1}; the matching branch leads to a leaf predicting Y1=c1, …, Yq=cq, while the other branch splits on X2, leading to leaves predicting Y1=c'1, …, Yq=c'q (X2 ≤ 2) and Y1=c''1, …, Yq=c''q (X2 > 2). As rules:
X1 ∈ {1,1}: Y1=c1, …, Yq=cq
X1 ∉ {1,1} and X2 ≤ 2: Y1=c'1, …, Yq=c'q
X1 ∉ {1,1} and X2 > 2: Y1=c''1, …, Yq=c''q]
Predictive clusters
• Each cluster is associated with:
• the description of the events grouped in the cluster, based on the properties of events observed in the recent past
• the values forecast for the properties of the next event in the case
(S, f)
• S is a symbolic description defined on X
• f is a predictive function f: X → Y
Learning the forecasting model
• At each internal node t, a test is selected by maximizing the (inter-cluster) variance reduction over the target space, defined as follows:

∆_Y(T(t), P) = Var(T(t), Y) − Σ_{ti ∈ P} ( #T(ti) / #T(t) ) · Var(T(ti), Y)

• where T(t) denotes the set of training examples falling in t, #T(·) denotes their number, and P defines a partition T(t1) and T(t2) of T(t).
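The variance reduction of a candidate split can be computed as a direct transcription of this formula (a sketch with hypothetical names); in the multi-target setting it is computed per target variable and then averaged:

```python
def variance(values):
    """Population variance of a list of numeric target values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(parent, partition):
    """Inter-cluster variance reduction of a candidate split:
    Var(T(t), Y) - sum over ti in P of (#T(ti)/#T(t)) * Var(T(ti), Y).
    'parent' holds the target values in T(t); 'partition' is the list
    [T(t1), T(t2)] of target-value lists."""
    n = len(parent)
    return variance(parent) - sum(
        len(part) / n * variance(part) for part in partition)
```

A split that separates the target values perfectly removes all of the parent variance; a split that leaves both parts as heterogeneous as the parent yields zero reduction.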
Learning the forecasting model
• The partition is defined according to a Boolean test on a predictor variable in X.
• A new partition is recursively found until a stopping criterion is satisfied:
• a node is a leaf when it hosts a number of examples smaller than √size(T), with size(T) the number of training examples
Learning the forecasting model
• In the multi-target context, the variance reduction is computed for each target variable.
• The total variance reduction is the average of the variance reductions taken over the set of target variables
Learning the forecasting model
• For a numeric target variable Y (i.e. Y ∈ 𝐘, Y numeric), the variance function Var(·) returns the variance of the target variable Y over the examples in the partition T(t), whereas the predictive function is the average of the target values in a cluster (leaf node).
• The variance reduction is computed after scaling the real values of Y falling in T(t) into the interval [0,1].
• For a categorical target variable Y (i.e. Y ∈ 𝐘, Y categorical), the variance function Var(·) returns the Gini index of the target variable Y over the examples in the partition T(t), whereas the predictive function is the majority class for the target variable in the cluster.
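These two heterogeneity measures and the corresponding leaf predictions can be sketched as follows (hypothetical names; the [0,1] scaling of the numeric case is included):

```python
from collections import Counter

def gini(labels):
    """Gini index, used as the 'variance' of a categorical target."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def scale01(values):
    """Min-max scaling of numeric target values into [0, 1], applied
    before computing the variance reduction."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def leaf_prediction(values, categorical):
    """Predictive function of a leaf: majority class for a categorical
    target, mean of the target values for a numeric one."""
    if categorical:
        return Counter(values).most_common(1)[0][0]
    return sum(values) / len(values)
```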
Learning the forecasting model
• If a leaf node is found, a predictive cluster is added to the final model.
• The symbolic description of this predictive cluster is the conjunction of the Boolean tests along the path from the root to the current leaf.
• The predictive function is that associated with the leaf, constructed for each target variable, by considering target values of examples falling in the leaf partition.
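Extracting the predictive clusters (S, f) from a learned tree can be sketched as a walk over root-to-leaf paths; the nested-dict tree encoding is an assumption for illustration:

```python
def predictive_clusters(node, path=()):
    """Enumerate the predictive clusters (S, f) of a learned PCT,
    encoded as nested dicts: S is the conjunction of the Boolean tests
    from the root to a leaf, f the leaf's per-target predictions."""
    if "prediction" in node:                       # leaf -> one cluster
        return [(" and ".join(path) or "true", node["prediction"])]
    yes = predictive_clusters(node["yes"], path + (node["test"],))
    no = predictive_clusters(node["no"], path + ("not(" + node["test"] + ")",))
    return yes + no
```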
On-line phase
[Figure: a running case is fed to the learned PCT, which forecasts the next event.]

Experiments
• 10-fold cross validation of the cases in a log, varying the window size between two and the maximum length of a case in the log
• Accuracy averaged on the target space
• Number of leaves
• Learning time
Case study in VINCENTE: data
• Daily routine of users of the platform OPENNESS belonging to a specific group (group id=13723)
• between September 1, 2014 and November 30, 2014
• 201 full traces
• 5477 events
• 3 characteristics (activity, class, timestamp)
Case study: experimental setup
• Off-line learning: 90% of randomly selected traces (180 traces)
• On-line forecasting: 10% of the traces (21 traces)
Case study: off-line learning
Case study: on-line forecasting (1/2)

Trace | Number of events | Activity type | Class name | Time (secs)
1 | 14 | 12.77% | 100.00% | 4.743
2 | 17 | 100.00% | 100.00% | 20.879
3 | 36 | 91.67% | 87.50% | 0.71
4 | 58 | 100.00% | 80.00% | 2.96
5 | 93 | 100.00% | 100.00% | 454.12
6 | 98 | 100.00% | 66.67% | 1950.00
7 | 101 | 50.00% | 50.00% | 928.72
8 | 105 | 100.00% | 0.00% | 14155.87
9 | 124 | 100.00% | 50.00% | 2388.25
10 | 125 | 58.33% | 58.33% | 9.39
11 | 127 | 100.00% | 50.00% | 2388.25
Case study: on-line forecasting (2/2)

Trace | Number of events | Activity type | Class name | Time (secs)
12 | 129 | 100.00% | 83.33% | 2.48
13 | 132 | 100.00% | 87.5% | 2.15
14 | 138 | 92.86% | 71.42% | 1.62
15 | 139 | 100.00% | 94.12% | 1.47
16 | 141 | 100.00% | 66.67% | 1950.00
17 | 174 | 20.00% | 80.00% | 587.37
18 | 179 | 96.15% | 61.54% | 2.62
19 | 181 | 97.92% | 64.58% | 165.14
20 | 190 | 60.00% | 80.00% | 587.37
21 | 194 | 50.00% | 50.00% | 928.72
Avg | | 82.37% | 70.56% | 755.23
Bibliography
A. Appice, S. Pravilovic and D. Malerba, Process Mining to Forecast the Future of Running Cases, 2nd International Workshop on New Frontiers in Mining Complex Patterns, NFMCP@ECML-PKDD 2013
A. Appice, D. Malerba, V. Morreale and G. Vella, Business Event Forecasting, IFKAD 2015, Bari, Italy, 10-12 June 2015