a two-step fast algorithm for the automated discovery of declarative workflows
DESCRIPTION
A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows. Claudio Di Ciccio and Massimo Mecella. Process Mining. Definition. - PowerPoint PPT PresentationTRANSCRIPT
A Two-Step Fast Algorithm for the Automated Discovery of Declarative WorkflowsClaudio Di Ciccio and Massimo Mecella
Claudio Di Ciccio ([email protected])IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013)Wednesday, April the 17th, Singapore
Process Mining
• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions (logs).
• ProM [AalstEtAl2009] is one of the most used plug-in based software environment for implementing workflow mining (and more) techniques.• www.processmining.org
Definition
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows2
• Artful processes [HillEtAl06] informal processes typically
carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)
• “knowledge workers”[ACTIVE09]
• Knowledge workers create artful processes “on the fly”
• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared Loosely structured Highly flexible
Artful processes and knowledge workersA different context for process mining
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows3
MINERful++
MINERful++ is the workflow discovery algorithm of MailOfMine Its input is a collection of strings T and an alphabet ΣT
Each string t is a trace Each character is an event (enacted task) The collection represents the log
Its output is a declarative process model What is a declarative process model?
Our mining algorithm
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows4
On the modeling of processesThe imperative model
• Represents the whole process at once
• The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows5
On the modeling of processes
• Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints• the idea is that every task can be
performed, except those which do not respect such constraints
• this technique fits for processes that are highly flexible and subject to changes, such as artful processes
The declarative modelIf A is performed,
B must be perfomed,no matter
before or afterwards(responded existence)
Whenever B is performed,C must be performed
afterwardsand B can not be repeated
until C is done(alternate response)
The notation here is based on [Pesic08, MaggiEtAl11] (ConDec, Declare)
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows6
Constraint templates as Regular Expressions (REs)Declare constraint templates
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows7
On the modeling of processesImperative vs. declarative
Imperative
Declarative
Declarative models work better in presence of a partial specification of the process scheme
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows8
A real discovered process model“Spaghetti process” [Aalst2011.book]
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows9
The declarative specification of an artful process (e.g.)
A project meeting is scheduled We suppose that a final agenda will be committed
(“confirmAgenda”) after that requests for a new proposal (“requestAgenda”), proposals themselves (“proposeAgenda”) and comments (“commentAgenda”) have been circulated.
Shortcuts for tasks (process alphabet): p (“proposeAgenda”) r (“requestAgenda”) c (“commentAgenda”) n (“confirmAgenda”)
Scenario
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows10
The declarative specification of an artful process (e.g.)
Existence constraints1. Participation(n)2. Uniqueness(n)3. End(n)
Relation constraints4. Response(r, p)5. RespondedExistence(c, p)6. Succession(p, n)
The agenda1. must be confirmed,2. only once:3. it is the last thing to do.
During the compilation:4. the proposal follows a
request;5. if a comment circulates,
there has been / will be a proposal;
6. after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal.
Constraints on activities
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows11
MINERful++
MINERful++ is a two-step algorithm1. Construction of a Knowledge Base2. Constraints inference by means of queries evaluated on the KB
This allows the discovery of constraints through a faster procedure on data which are smaller in size than the whole input
Returned constraints are weighted with their support the normalized fraction of cases in which the constraint is verified
over the set of input traces
Workflow discovery by constraints inference
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows12
MINERful++ by example
In order to see how it works now we see a run of MINERful++ over a string, compliant with the previous
example:
r r p c r p c r c p c n
We start with the construction of the algorithm’s Knowledge Base
Workflow discovery by constraints inference
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows13
MINERful++ by example
p p occurred 3 times in 1 string
γp(3) = 1
• For each m ≠ 3 γp(m) = 0
p did not occur as the first nor as the last charactergi(p) = 0gl(p) = 0
n γn(1) = 1
For each m ≠ 1, γn(m) = 0
n occurred as the last character in 1 stringgi(n) = 0gl(n) = 1
Building the “ownplay” of p and n
r r p c r p c r c p c n
r r p c r p c r c p c n
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows14
MINERful++ by example
With respect to the occurrence of p, n occurred…i. Never before: 3 times
δp,n(-∞) = 3ii. 2 char’s after: 1 time
δp,n(2) = 1iii. 6 char’s after: 1 time
δp,n(6) = 1iv. 9 char’s after: 1 time
δp,n(9) = 1v. Repetitions in-between:
i. onwards: 2 timesb→
p,n = 2ii. backwards: never
b←p,n = 0
Looking at the string
i. r r p c r p c r c p c n
ii. r r p c r p c r c p c n
iii. r r p c r p c r c p c n
iv. r r p c r p c r c p c n
v. i. r r p c r p c r c p c n
ii. r r p c r p c r c p c n
Building the “interplay” of p and n
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows15
MINERful++ by example
-∞ -5 -2 +1 +2 +4 +5 +8 +9
δr,p 2 1 2 2 2 1 2 1 1
Building the “interplay” of r and p
b→r,p = 1
b←r,p = 0
r r p c r p c r c p c n
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows16
MINERful++
Let r and p be the number of times in which r and p respectively appear in the log
we have, e.g.,support for Response(r, p)
hint: how many times p was not read in the traces after r occurred?In those cases, Response(r, p) does not hold
support for Succession(r, p) how many times p was not read in the traces after r occurred,
nor r was read before p occurred?In those cases, Succession(r, p) does not hold
…
Computing the support of constraints
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows17
MINERful++
Computing the support of constraints
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows18
MINERful++ by example
1. r r p c r p c r c p c n
2. r c c r p p c p c r c p p n
3. c c c c c r r p c r r p r r p r p p p n
4. c p r p n
5. r p p n
6. r r r c r c p n
7. r p c n
8. c r p r r c c p p n
9. p c c n
10. c r r c c c r r p c p c c p n
Support for… Response(r, p)
1.0 Succession(r, p)
0.96364 Precedence(c, n)
0.9 The support can be used to prune out
those constraints falling under a given threshold E.g., 0.95
A threshold equal to 1.0 selects those constraints which are always valid on the log
Computing the support of constraints
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows19
Relation constraint templates subsumption
E.g., A trace like
a b a b c a b c csatisfies (w.r.t. b and a):
• RespondedExistence(a, b), RespondedExistence(b, a),CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b),Succession(a, b), AlternateSuccession(a, b),ChainSuccession(a, b)
The mining algorithm would have to show the most strict constraint only (ChainSuccession(a, b))
MINERful++ faces this issue, by pruning the returned constraints on the basis of the subsumption hierarchy of constraints
Constraint templates are not independent of each other
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows20
Relation constraint templates subsumptionConstraint templates are not independent of each other
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows21
Relation constraint templates subsumptionA hint on the pruning procedure
1.0
1.0
1.0
1.0
1.0
1.01.01.0
1.01.01.0
✖✖✖
✖✖
✖✖
✖✖✖
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows22
Relation constraint templates subsumptionA hint on the pruning procedure
0.8
0.8
0.9
0.9
0.9
0.90.70.7
0.90.90.9
✖ ✖?✖?✖
✖
✖
✖
?✖?✖
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows23
Support
Support
Evaluation
MINERful++ is Independent on the formalism used for expressing constraints Modular (two-phase) Capable of eliminating redundancy in the process model Fast
Characteristics of MINERful++
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows
Sony VAIO VGN-FE11HIntel Core Duo T2300 1.66 GHz2 MB L2 cache2 GB of DDR2 RAM @ 667 Mhz
24
Evaluation
Linear w.r.t. the number of traces in the log|T|
Quadratic w.r.t. the size of traces in the log|tmax|
Quadratic w.r.t. the size of the alphabet|ΣT|
Hence, polynomial in the size of the inputO(|T|·|tmax |2·|ΣT |2)
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows
On the complexity of MINERful++
25
Preliminary resultsMINERful++ on real data
Logs extracted from 2 mailboxes [DiCiccioMecella2012]
5 traces 34.75 events each on average 139 events read in total
Evaluation of inferred constraints conducted with an expert domain
Precision ≈ 0.794
26 CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows
Future work
Integrate MINERful++ with ProM MINERful++ is already capable of reading/writing XES logs
Study the effects of errors in logs on the inferred workflow Error-injected synthetic logs can help us conduct an automated
analysis on the quality of results Refine the estimation calculi for support Auto-tune the support threshold
Depending on the constraint template, which can be more or less “robust”
Enlarge the set of discovered constraint templates to the branching Declare constraints
Research in progress
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows27
References
• [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011).
• [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Güther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: Prom: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009)
• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)
• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009)
• [Pesic08] Pesic, M.: Constraint-based Workflow Management Systems: Shifting Control to Users. Ph.D. Thesis. Technische Universiteit Eindhoven, 2008.
• [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declar- ative process models. In: CIDM, IEEE (2011) 192–199
• [DiCiccioEtAl2012] Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine – analyzing mail messages for mining artful col- laborative processes. In: Data-Driven Process Discovery and Analysis. Springer, 55-81 (2012).
Cited articles and resources, in order of appearance
CIDM 2013, Singapore A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows28