a two-step fast algorithm for the automated discovery of declarative workflows

28
A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows Claudio Di Ciccio and Massimo Mecella Claudio Di Ciccio ([email protected]) IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013) Wednesday, April the 17 th , Singapore

Upload: dior

Post on 29-Jan-2016

51 views

Category:

Documents


0 download

DESCRIPTION

A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows. Claudio Di Ciccio and Massimo Mecella. Process Mining. Definition. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

A Two-Step Fast Algorithm for the Automated Discovery of Declarative WorkflowsClaudio Di Ciccio and Massimo Mecella

Claudio Di Ciccio ([email protected])IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013)Wednesday, April the 17th, Singapore

Page 2: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Process Mining

• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions (logs).

• ProM [AalstEtAl2009] is one of the most used plug-in based software environment for implementing workflow mining (and more) techniques.• www.processmining.org

Definition

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows2

Page 3: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

• Artful processes [HillEtAl06] informal processes typically

carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)

• “knowledge workers”[ACTIVE09]

• Knowledge workers create artful processes “on the fly”

• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared Loosely structured Highly flexible

Artful processes and knowledge workersA different context for process mining

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows3

Page 4: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++

MINERful++ is the workflow discovery algorithm of MailOfMine Its input is a collection of strings T and an alphabet ΣT

Each string t is a trace Each character is an event (enacted task) The collection represents the log

Its output is a declarative process model What is a declarative process model?

Our mining algorithm

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows4

Page 5: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

On the modeling of processesThe imperative model

• Represents the whole process at once

• The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows5

Page 6: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

On the modeling of processes

• Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints• the idea is that every task can be

performed, except those which do not respect such constraints

• this technique fits for processes that are highly flexible and subject to changes, such as artful processes

The declarative modelIf A is performed,

B must be perfomed,no matter

before or afterwards(responded existence)

Whenever B is performed,C must be performed

afterwardsand B can not be repeated

until C is done(alternate response)

The notation here is based on [Pesic08, MaggiEtAl11] (ConDec, Declare)

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows6

Page 7: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Constraint templates as Regular Expressions (REs)Declare constraint templates

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows7

Page 8: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

On the modeling of processesImperative vs. declarative

Imperative

Declarative

Declarative models work better in presence of a partial specification of the process scheme

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows8

Page 9: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

A real discovered process model“Spaghetti process” [Aalst2011.book]

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows9

Page 10: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

The declarative specification of an artful process (e.g.)

A project meeting is scheduled We suppose that a final agenda will be committed

(“confirmAgenda”) after that requests for a new proposal (“requestAgenda”), proposals themselves (“proposeAgenda”) and comments (“commentAgenda”) have been circulated.

Shortcuts for tasks (process alphabet): p (“proposeAgenda”) r (“requestAgenda”) c (“commentAgenda”) n (“confirmAgenda”)

Scenario

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows10

Page 11: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

The declarative specification of an artful process (e.g.)

Existence constraints1. Participation(n)2. Uniqueness(n)3. End(n)

Relation constraints4. Response(r, p)5. RespondedExistence(c, p)6. Succession(p, n)

The agenda1. must be confirmed,2. only once:3. it is the last thing to do.

During the compilation:4. the proposal follows a

request;5. if a comment circulates,

there has been / will be a proposal;

6. after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal.

Constraints on activities

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows11

Page 12: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++

MINERful++ is a two-step algorithm1. Construction of a Knowledge Base2. Constraints inference by means of queries evaluated on the KB

This allows the discovery of constraints through a faster procedure on data which are smaller in size than the whole input

Returned constraints are weighted with their support the normalized fraction of cases in which the constraint is verified

over the set of input traces

Workflow discovery by constraints inference

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows12

Page 13: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++ by example

In order to see how it works now we see a run of MINERful++ over a string, compliant with the previous

example:

r r p c r p c r c p c n

We start with the construction of the algorithm’s Knowledge Base

Workflow discovery by constraints inference

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows13

Page 14: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++ by example

p p occurred 3 times in 1 string

γp(3) = 1

• For each m ≠ 3 γp(m) = 0

p did not occur as the first nor as the last charactergi(p) = 0gl(p) = 0

n γn(1) = 1

For each m ≠ 1, γn(m) = 0

n occurred as the last character in 1 stringgi(n) = 0gl(n) = 1

Building the “ownplay” of p and n

r r p c r p c r c p c n

r r p c r p c r c p c n

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows14

Page 15: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++ by example

With respect to the occurrence of p, n occurred…i. Never before: 3 times

δp,n(-∞) = 3ii. 2 char’s after: 1 time

δp,n(2) = 1iii. 6 char’s after: 1 time

δp,n(6) = 1iv. 9 char’s after: 1 time

δp,n(9) = 1v. Repetitions in-between:

i. onwards: 2 timesb→

p,n = 2ii. backwards: never

b←p,n = 0

Looking at the string

i. r r p c r p c r c p c n

ii. r r p c r p c r c p c n

iii. r r p c r p c r c p c n

iv. r r p c r p c r c p c n

v. i. r r p c r p c r c p c n

ii. r r p c r p c r c p c n

Building the “interplay” of p and n

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows15

Page 16: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++ by example

-∞ -5 -2 +1 +2 +4 +5 +8 +9

δr,p 2 1 2 2 2 1 2 1 1

Building the “interplay” of r and p

b→r,p = 1

b←r,p = 0

r r p c r p c r c p c n

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows16

Page 17: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++

Let r and p be the number of times in which r and p respectively appear in the log

we have, e.g.,support for Response(r, p)

hint: how many times p was not read in the traces after r occurred?In those cases, Response(r, p) does not hold

support for Succession(r, p) how many times p was not read in the traces after r occurred,

nor r was read before p occurred?In those cases, Succession(r, p) does not hold

Computing the support of constraints

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows17

Page 18: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++

Computing the support of constraints

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows18

Page 19: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

MINERful++ by example

1. r r p c r p c r c p c n

2. r c c r p p c p c r c p p n

3. c c c c c r r p c r r p r r p r p p p n

4. c p r p n

5. r p p n

6. r r r c r c p n

7. r p c n

8. c r p r r c c p p n

9. p c c n

10. c r r c c c r r p c p c c p n

Support for… Response(r, p)

1.0 Succession(r, p)

0.96364 Precedence(c, n)

0.9 The support can be used to prune out

those constraints falling under a given threshold E.g., 0.95

A threshold equal to 1.0 selects those constraints which are always valid on the log

Computing the support of constraints

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows19

Page 20: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Relation constraint templates subsumption

E.g., A trace like

a b a b c a b c csatisfies (w.r.t. b and a):

• RespondedExistence(a, b), RespondedExistence(b, a),CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b),Succession(a, b), AlternateSuccession(a, b),ChainSuccession(a, b)

The mining algorithm would have to show the most strict constraint only (ChainSuccession(a, b))

MINERful++ faces this issue, by pruning the returned constraints on the basis of the subsumption hierarchy of constraints

Constraint templates are not independent of each other

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows20

Page 21: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Relation constraint templates subsumptionConstraint templates are not independent of each other

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows21

Page 22: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Relation constraint templates subsumptionA hint on the pruning procedure

1.0

1.0

1.0

1.0

1.0

1.01.01.0

1.01.01.0

✖✖✖

✖✖

✖✖

✖✖✖

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows22

Page 23: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Relation constraint templates subsumptionA hint on the pruning procedure

0.8

0.8

0.9

0.9

0.9

0.90.70.7

0.90.90.9

✖ ✖?✖?✖

?✖?✖

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows23

Support

Support

Page 24: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Evaluation

MINERful++ is Independent on the formalism used for expressing constraints Modular (two-phase) Capable of eliminating redundancy in the process model Fast

Characteristics of MINERful++

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Sony VAIO VGN-FE11HIntel Core Duo T2300 1.66 GHz2 MB L2 cache2 GB of DDR2 RAM @ 667 Mhz

24

Page 25: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Evaluation

Linear w.r.t. the number of traces in the log|T|

Quadratic w.r.t. the size of traces in the log|tmax|

Quadratic w.r.t. the size of the alphabet|ΣT|

Hence, polynomial in the size of the inputO(|T|·|tmax |2·|ΣT |2)

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

On the complexity of MINERful++

25

Page 26: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Preliminary resultsMINERful++ on real data

Logs extracted from 2 mailboxes [DiCiccioMecella2012]

5 traces 34.75 events each on average 139 events read in total

Evaluation of inferred constraints conducted with an expert domain

Precision ≈ 0.794

26 CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Page 27: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

Future work

Integrate MINERful++ with ProM MINERful++ is already capable of reading/writing XES logs

Study the effects of errors in logs on the inferred workflow Error-injected synthetic logs can help us conduct an automated

analysis on the quality of results Refine the estimation calculi for support Auto-tune the support threshold

Depending on the constraint template, which can be more or less “robust”

Enlarge the set of discovered constraint templates to the branching Declare constraints

Research in progress

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows27

Page 28: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

References

• [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011).

• [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Güther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: Prom: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009)

• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)

• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009)

• [Pesic08] Pesic, M.: Constraint-based Workflow Management Systems: Shifting Control to Users. Ph.D. Thesis. Technische Universiteit Eindhoven, 2008.

• [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declar- ative process models. In: CIDM, IEEE (2011) 192–199

• [DiCiccioEtAl2012] Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine – analyzing mail messages for mining artful col- laborative processes. In: Data-Driven Process Discovery and Analysis. Springer, 55-81 (2012).

Cited articles and resources, in order of appearance

CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows28