distributed process discovery and conformance checking

79
Distributed Process Discovery and Conformance Checking prof.dr.ir. Wil van der Aalst www.processmining.org

Upload: wil-van-der-aalst

Post on 07-Nov-2014

269 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Distributed Process Discovery and Conformance Checking

Distributed Process Discovery and Conformance Checking

prof.dr.ir. Wil van der Aalstwww.processmining.org

Page 2: Distributed Process Discovery and Conformance Checking

PAGE 1

On the different roles of (process) models …

Page 3: Distributed Process Discovery and Conformance Checking

Play-Out

PAGE 2

Page 4: Distributed Process Discovery and Conformance Checking

Play-Out (Classical use of models)

PAGE 3

A B C D

A C B DA B C D

A E D

A C B DA C B D

A E D

A E D

Page 5: Distributed Process Discovery and Conformance Checking

Play-In

PAGE 4

Page 6: Distributed Process Discovery and Conformance Checking

Play-In

PAGE 5

A C B DA B C D

A E D

A C B DA C B D

A E D

A E DA B C D

Page 7: Distributed Process Discovery and Conformance Checking

Example Process Discovery(Vestia, Dutch housing agency, 208 cases, 5987 events)

PAGE 6

Page 8: Distributed Process Discovery and Conformance Checking

Example Process Discovery(ASML, test process lithography systems, 154966 events)

PAGE 7

Page 9: Distributed Process Discovery and Conformance Checking

Example Process Discovery(AMC, 627 gynecological oncology patients, 24331 events)

PAGE 8

Page 10: Distributed Process Discovery and Conformance Checking

Replay

PAGE 9

Page 11: Distributed Process Discovery and Conformance Checking

Replay

PAGE 10

A B C D

Page 12: Distributed Process Discovery and Conformance Checking

Replay

PAGE 11

A E D

Page 13: Distributed Process Discovery and Conformance Checking

Replay can detect problems

PAGE 12

AC D

Problem!missing token

Problem!token left behind

Page 14: Distributed Process Discovery and Conformance Checking

Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)

PAGE 13

Page 15: Distributed Process Discovery and Conformance Checking

Replay can extract timing information

PAGE 14

A5B8 C9 D13

5

8

9

13

3

4

5

43

265

8

764

7

74

3

Page 16: Distributed Process Discovery and Conformance Checking

PAGE 15

Performance Analysis Using Replay(WOZ objections Dutch municipality, 745 objections, 9583 event, f= 0.988)

Page 17: Distributed Process Discovery and Conformance Checking

PAGE 16

Big Data

Page 18: Distributed Process Discovery and Conformance Checking

Big Data

PAGE 17

Source: “Big Data: The Next Frontier for Innovation, Competition, and Productivity” McKinsey Global Institute, 2011.

“Enterprises globally stored more than 7 exabytesof new data on disk drives in 2010, while consumers stored morethan 6 exabytes of new data on devices such as PCs and notebooks.”

“All of the world's music can be stored

on a $600 disk drive.”

“Indeed, we are generating so much data today that it is

physically impossible to store it all. Health

care providers, for instance, discard 90

percent of the data that they generate.”

Page 19: Distributed Process Discovery and Conformance Checking

PAGE 18

Hilbert and Lopez. The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025):60-65, 2011.

Page 20: Distributed Process Discovery and Conformance Checking

PAGE 19

www.olifantenpaadjes.nl

Page 21: Distributed Process Discovery and Conformance Checking

PAGE 20

Page 22: Distributed Process Discovery and Conformance Checking

PAGE 21

Page 23: Distributed Process Discovery and Conformance Checking

Evidence-Based Computer Science

PAGE 22

Page 24: Distributed Process Discovery and Conformance Checking

PAGE 23

How good is my model?

Page 25: Distributed Process Discovery and Conformance Checking

Four Competing Quality Criteria

PAGE 24

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Page 26: Distributed Process Discovery and Conformance Checking

Example: one log four models

PAGE 25

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hfend

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

process discovery

fitness

precisiongeneralization

simplicity

“able to replay event log” “Occam’s razor”

“not overfitting the log” “not underfitting the log”

Page 27: Distributed Process Discovery and Conformance Checking

Model N1

PAGE 26

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Page 28: Distributed Process Discovery and Conformance Checking

Model N2

PAGE 27

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Page 29: Distributed Process Discovery and Conformance Checking

Model N3

PAGE 28

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Page 30: Distributed Process Discovery and Conformance Checking

Model N4

PAGE 29

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

Page 31: Distributed Process Discovery and Conformance Checking

PAGE 30

Process Discovery

Page 32: Distributed Process Discovery and Conformance Checking

Process Discovery (small selection)

PAGE 31

α algorithm

α++ algorithm

α# algorithm

language-based regions

state-based regionsgenetic mining

heuristic mining

hidden Markov models

neural networks

automata-based learning

stochastic task graphs

conformal process graph

mining block structures

multi-phase miningpartial-order based mining

fuzzy mining

LTL mining

ILP mining

distributed genetic mining

Page 33: Distributed Process Discovery and Conformance Checking

Petri net view: Just discover the places …

PAGE 32

Adding a place limits behavior:• overfitting ≈ adding too many places • underfitting ≈ adding too few places

Page 34: Distributed Process Discovery and Conformance Checking

Example: Process Discovery UsingState-Based Regions

PAGE 33

[ ]

a

d

b dc

e

cb

[a,b,c,d][a,b,c][a,c]

[ a,b][a,e] [a,d,e]

[a]

a

b

c

de

p2

end

p4

p3p1

start

Page 35: Distributed Process Discovery and Conformance Checking

Example of Region

PAGE 34

[ ]

a

d

b dc

e

cb

[a,b,c,d][a,b,c][a,c]

[ a,b][a,e] [a,d,e]

[a]

a

b

c

de

p2

end

p4

p3p1

start

enter: b,eleave: ddo-not-cross: a,c

Page 36: Distributed Process Discovery and Conformance Checking

Example: Process Discovery UsingLanguage-Based Regions

PAGE 35

R

A place is feasible if it can be added without disabling any of thetraces in the event log.

Page 37: Distributed Process Discovery and Conformance Checking

PAGE 36

Conformance Checking

Page 38: Distributed Process Discovery and Conformance Checking

Replaying trace “abeg”

PAGE 37

m=1r=1

a b e g

6

11

6= 0.83333

Page 39: Distributed Process Discovery and Conformance Checking

Can be lifted to log level

PAGE 38

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

p3p1

p2 p4

N1

p5

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

Page 40: Distributed Process Discovery and Conformance Checking

From “playing the token game” to optimal alignments …

a b » e ga b d e g

PAGE 39

observed trace: “abeg”

move in model only

Page 41: Distributed Process Discovery and Conformance Checking

Another alignment

a b c d e ga b » d e g

PAGE 40

observed trace: “abcdeg”

move in log only

Page 42: Distributed Process Discovery and Conformance Checking

Moves in an alignment

PAGE 41

a b » d e ga » c d e g

trace in event log

possible run of model

move in log

move in model move in both

Optimal alignment describes modeled behavior closest to observed behavior

Page 43: Distributed Process Discovery and Conformance Checking

Moves have costs

• Standard cost function:

−c(x,») = 1−c(»,y) = 1−c(x,y) = 0, if x=y−c(x,y) = ∞, if x≠y

PAGE 42

… a …… » …

… » …… a …

… a …… a …

… a …… b …

Page 44: Distributed Process Discovery and Conformance Checking

Non-fitting trace: abefdeg

PAGE 43

abefdeg

a b » e f d » e ga b d e f d b e g 2

2a b e f d e ga b » » d e g

Page 45: Distributed Process Discovery and Conformance Checking

Any cost structure is possible

PAGE 44

… send-letter(John,2 weeks, $400)

… send-email(Sue,3weeks,$500)

• Similar activities (more similarity implies lower costs).• Resource conformance (done by someone that does

not have the specified role).• Data conformance (path is not possible for this

customer). • Time conformance (missed the legal deadline)

Page 46: Distributed Process Discovery and Conformance Checking

Fitness

PAGE 45

astart register

request

bexamine thoroughly

cexamine casually

d checkticket

decide

pay compensation

reject request

reinitiate requeste

g

hfend

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

astart register

request

bexamine

thoroughly

cexamine casually

dcheck ticket

decide

pay compensation

reject request

reinitiate request

e

g

h

f

end

N1 : fitness = +, precision = +, generalization = +, simplicity = +

astart register

request

cexamine casually

dcheckticket

decide reject request

e hend

N4 : fitness = +, precision = +, generalization = -, simplicity = -

aregister request

dexamine casually

ccheckticket

decide reject request

e h

a cexamine casually

dcheckticket

decide

e g

a dexamine casually

ccheckticket

decide

e g

register request

register request

pay compensation

pay compensation

aregister request

b dcheckticket

decide reject request

e h

aregister request

d bcheckticket

decide reject request

e h

a b dcheckticket

decide

e gregister request

pay compensation

examine thoroughly

examine thoroughly

examine thoroughly

(all 21 variants seen in the log)

acdeh

abdeg

adceh

abdeh

acdeg

adceg

adbeh

acdefdbeh

adbeg

acdefbdeh

acdefbdeg

acdefdbeg

adcefcdeh

adcefdbeh

adcefbdeg

acdefbdefdbeg

adcefdbeg

adcefbdefbdeg

adcefdbefbdeh

adbefbdefdbeg

adcefdbefcdefdbeg

455

191

177

144

111

82

56

47

38

33

14

11

9

8

5

3

2

2

1

1

1

# trace

1391

1.0

0.8

1.0

1.0

Our A* algorithm exploits the Petri net marking equation and uses other “tricks” to prune the search space.

Aligned event log is starting point for other types of analysis.

Page 47: Distributed Process Discovery and Conformance Checking

PAGE 46

Distributing/Decomposing “Big Data” Process Mining Problems

Page 48: Distributed Process Discovery and Conformance Checking

PAGE 47

Page 49: Distributed Process Discovery and Conformance Checking

What if?

PAGE 48

book car

c

add extra insurance

d change booking

e

confirm initiate check-in

j

check driver’s license

k

charge credit card

i

select car

g

supply car

in

a

b

skip extrainsurance

f

h

add extra insurance

skip extra insurance

l

out

c1 c2

c3

c4

c5

c6

c7

c8

c9

c10

c11

acefgijklacddefhkjilabdefjkgil

acdddefkhijlacefgijklabefgjikl

...

processdiscovery

conformance checking

there are more than 1000 different

activities?

there are more than 1.000.000

cases?

there are more than 100.000.000

events?

Page 50: Distributed Process Discovery and Conformance Checking

Distributed computing

• multicore CPU• manycore GPU• cluster computing• grid computing• cloud computing• …

PAGE 49

Page 51: Distributed Process Discovery and Conformance Checking

How to distribute process discovery?

PAGE 50

Page 52: Distributed Process Discovery and Conformance Checking

How to distribute conformance checking?

PAGE 51

abcdegadcefbcfdeg

abdcegabcdefbcdegabdfcefdcegacdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

acdefgadcfeg

abdcefcdfegabcdeg

a b

c

d

e

c1in

c2

c3

c4

c5

g

outc6

f

abcdegabdceg

abcdefbcdegabcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdeg

a b

c

d

e

c1in

c2

c3

c4

c5

g

outc6

f

adcefbcfdegabdfcefdcegacdefbdceg

acdefgadcfeg

abdcefcdfeg

b is often skipped

f occurs too often

Page 53: Distributed Process Discovery and Conformance Checking

Classification based on partitioning of event log: vertical and horizontal

PAGE 52

sets of cases

sets of activities

Page 54: Distributed Process Discovery and Conformance Checking

Replication: Same event log on all computing nodes

PAGE 53

Only makes sense if random elements, e.g., genetic process mining.

Page 55: Distributed Process Discovery and Conformance Checking

Vertical distribution I: Split cases arbitrarily

PAGE 54

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

sets of cases

Page 56: Distributed Process Discovery and Conformance Checking

Vertical distribution II: Split cases based on a specific feature

PAGE 55

abcdegabdcefbcdeg

abdcegabcdefbcdegabdcefbdcegabcdefbdceg

abcdegabdceg

abdcefbdcefbdcegabcdeg

abcdefbcdefbdcegabcdefbdceg

abcdegabdceg

abdcefbcdegabcdeg

abcdegabdcegabcdegabdcegabcdegabcdegabdcegabcdeg

abdcefbcdegabcdefbcdegabdcefbdcegabcdefbdcegabcdefbdcegabdcefbcdeg

abdcefbdcefbdcegabcdefbcdefbdceg

Page 57: Distributed Process Discovery and Conformance Checking

Horizontal distribution

PAGE 56

sets of activities

Page 58: Distributed Process Discovery and Conformance Checking

Horizontal distribution: The key idea

PAGE 57

projected on a,b,e,f,g

projected on b,c,d,e

Page 59: Distributed Process Discovery and Conformance Checking

PAGE 58

Passages for Horizontal Distribution

Page 60: Distributed Process Discovery and Conformance Checking

Passages

PAGE 59

Page 61: Distributed Process Discovery and Conformance Checking

Passage P=(X,Y)

PAGE 60

causal dependency: may trigger or enable

Page 62: Distributed Process Discovery and Conformance Checking

Minimal passages

PAGE 61

a passage is minimal if it does not contain smaller passages

Page 63: Distributed Process Discovery and Conformance Checking

Passages define an equivalence relation on the edges in the graph

PAGE 62

Page 64: Distributed Process Discovery and Conformance Checking

Minimal passage 1: ({a},{b,c})

PAGE 63

Page 65: Distributed Process Discovery and Conformance Checking

Minimal passage 2: ({b,c,d},{d,e,f})

PAGE 64

Page 66: Distributed Process Discovery and Conformance Checking

Minimal passage 3: ({e},{g})

PAGE 65

Page 67: Distributed Process Discovery and Conformance Checking

Minimal passage 4: ({f},{h})

PAGE 66

Page 68: Distributed Process Discovery and Conformance Checking

Minimal passage 5: ({g,h},{i})

PAGE 67

Page 69: Distributed Process Discovery and Conformance Checking

So What?

• Any process model can be partitioned in minimal passages.

• Discovery and conformance checking can be done per passage!

PAGE 68

b

a

c

d

e

j

i

hf nk

l

m

o

pgi o

clouds may contain arbitrary subprocesses not explicitly recorded in the

event log (invisible activities or small networks used for routing, e.g. XOR/AND/OR-

split/joins)

Page 70: Distributed Process Discovery and Conformance Checking

Example result for Petri nets

PAGE 69

“The event log fits all passages if and only if the event log fits the whole model.”

b

a

c

d

e

j

i

hf nk

l

m

o

pgi o

Key insight: interface transitions controlled by event log

Page 71: Distributed Process Discovery and Conformance Checking

causal structure obtained using heuristics & domain knowledge

Discovery example

PAGE 70

a

in

g

out

b

f

a

b

c

d

c

d

e

f

e

g

a b

c

d

e

c1in

c2

c3

c4

c5

g

outc6

f

Page 72: Distributed Process Discovery and Conformance Checking

Conformance checking

PAGE 71

book car

c

add extra insurance

d change booking

e

confirm initiate check-in

j

check driver’s license

k

charge credit card

i

select car

g

supply car

in

a

b

skip extrainsurance

f

h

add extra insurance

skip extra insurance

l

out

c1 c2

c3

c4

c5

c6

c7

c8

c9

c10

c11

aceflacddeflabdefl

acdddeflaceflabefl

...

Page 73: Distributed Process Discovery and Conformance Checking

Create Skeleton

PAGE 72

Page 74: Distributed Process Discovery and Conformance Checking

Net fragments per passage

PAGE 73

Page 75: Distributed Process Discovery and Conformance Checking

Initial implementation in ProM

PAGE 74

Page 76: Distributed Process Discovery and Conformance Checking

Super linear speedups possible (even when using a single computer decomposition helps)

PAGE 75

Page 77: Distributed Process Discovery and Conformance Checking

PAGE 76

Conclusion

Page 78: Distributed Process Discovery and Conformance Checking

Conclusion

PAGE 77“Big Data”

b

a

c

d

e

j

i

hf nk

l

m

o

pgi o

Page 79: Distributed Process Discovery and Conformance Checking

PAGE 78

www.processmining.org

www.win.tue.nl/ieeetfpm/