enhancing business process mining with distributed tracing data …€¦ · instrument sample...
TRANSCRIPT
Chair of Software Engineering for Business Information Systems (sebis) Faculty of InformaticsTechnische Universität Münchenwwwmatthes.in.tum.de
Enhancing Business Process Mining with Distributed Tracing Data in a Microservice ArchitectureJochen Graeff (B.Sc.) | 21.08.2017 | Master thesis final presentationAdvisor: Martin Kleehaus
Agenda
MotivationResearch questionsApproach§ Build sample architecture§ Instrument sample architecture§ Develop activity generation algorithm§ Set-up extended architecture§ Analysis creation
Live DemoEvaluation§ Benefits§ Limitations
© sebisJochen Graeff – Master thesis final presentation 2
Motivation
How does user behaviour and system behaviour influence each other?
© sebisJochen Graeff – Master thesis final presentation 3
BUSINESS LAYER
APPLICATION LAYER>
Is there a gap between the layers?
© sebisJochen Graeff – Master thesis final presentation 4
S6
S9
S1
S4
S2
S3
S7
S5
S8Application
Layer
Business transactions
Run-time behavior
User session
BusinessLayer Reserve
carShow
Details
Search cars
End rental
Book car
Process discovery of multiple sessions
Search cars
Reservecar
Book car
Show detailsReport Issue
<
Process Mining
Span ID 1
Span ID 2
Span ID 3
S1
S2
S3
DistributedTracing
Analysis of single trace
Transparency
Research questions
RQ1. How can a relationship between business activities and a distributedapplication architecture be established?
RQ2. What data has to be extracted and how has it to be mapped to enableand store the relationship knowledge?
RQ3. How can business process mining be extended with technical aspects inorder to uncover
a) user and system throughput times for business activity executions and,b) correlations between business process performance and system
behaviour?
© sebisJochen Graeff – Master thesis final presentation 5
?
?
?
Build sample architecture
Jochen Graeff – Master thesis final presentation
- Car sharing platform
- Spring Cloud microservicearchitecture
- 3 x infrastructure services
- 6 x business services
1Buildsample architecture
Build sample architecture
© sebisJochen Graeff – Master thesis final presentation 7
EUREKADISCOVERY SERVICE
:8761
CONFIGURA-TION
SERVICE
:8888
API
ACCOUNTINGSERVICE
:6060
CARS SERVICE
:9090
MAPSSERVICE
:6061
NOTIFICA-TIONS
SERVICE
:6063
PAYMENTSSERVICE
:5050
USERSERVICE
:5050
API
API
API
API
API
API
ZUULEDGE
SERVICE
:8762
API
WEBUISERVICE
:5050
API
APIBusiness services
Infrastructure services
HTTP
Front-end services
Database access
Instrument sample architecture
© sebisJochen Graeff – Master thesis final presentation 8
- Car sharing platform
- Spring Cloud microservicearchitecture
- 3 x infrastructure services
- 6 x business services
1Buildsample architecture 2 Instrument
samplearchitecture 3
- Add Spring Cloud Sleuth to every business service
- append sessionIDto span in controller of webuiservice
Instrument sample architecture
- Instrument every business service with spring cloud sleuth in order to generate span data in the applications
§ Add dependency spring-cloud-starter-zipkin§ Set sampling rate
© sebisJochen Graeff – Master thesis final presentation 9
BUSINESSSERVICE
API
WEBUISERVICE
API
- Instrument service (as for business services)- Append sessionID in every webui service endpoint
definition: tracer.addTag("sessionID", sessionID);
Develop activity generation algorithm
© sebisJochen Graeff – Master thesis final presentation 10
- Car sharing platform
- Spring Cloud microservicearchitecture
- 3 x infrastructure services
- 6 x business services
1Buildsample architecture 2 Instrument
samplearchitecture 3
Develop activity generationalgorithm
- Spring Cloud Sleuth
- append sessionIDto span in controller of webuiservice
- definition of required attributes and foreign for event log
- Definition of healthy/failed activities
- user and system activities
- inputs: spans, annotation & mapping table
Activity generation
© sebisJochen Graeff – Master thesis final presentation 11
CASE ID ACTIVITY START_TS END_TS DURATION TYPE SERVICE_NAME
12345 Book car 07/06/17 12:45:32.000
07/06/17 12:45:32.800
800 ms user webui-service
12345 http:/initializebooking
07/06/17 12:45:32.200
07/06/17 12:45:32.700
500 ms system accounting-service
12345 http:/handlecarbooking
07/06/17 12:45:32.400
07/06/17 12:45:32.500
100 ms system cars-serivice
12345 Unlock car 07/06/17 12:37:32.400
07/06/17 12:37:35.400
3000 ms user webui-service
... ... ... ... ... ... ...
TRACE ID
SPAN ID
PARENTID
NAME TIMESTAMP DURATION
a a null http:/bookcar 07/06/17 12:45:32.000
800 ms
a b a initialize 07/06/17 12:45:32.100
600 ms
a c b http:/initializebooking
07/06/17 12:45:32.200
500 ms
a d c http:/handlecarbooking
07/06/17 12:45:32.400
100 ms
b b null http:/opencar 07/06/17 12:37:32.400
3000 ms
spans table (zipkin)
activities table
annotations table (zipkin)TRACE ID
SPAN ID
KEY VALUE
a a http.method GET
a a cs
a a cr
a a sessionID 12345
a b ss
mapping tabletechnical_activity
pretty_name
is_activity
calls_service
http:/bookcar Book car 1 webui-service
http:/initializebooking
http:/initializebooking
1 accounting-service
http:/handlecarbooking
http:/handlecarbooking
1 cars-service
http:/opencar Unlock car 1 webui-service
http:/findroute Find route 1 webui-service
Setup extended architecture
© sebisJochen Graeff – Master thesis final presentation 12
- Car sharing platform
- Spring Cloud microservicearchitecture
- 3 x infrastructure services
- 6 x business services
1Buildsample architecture 2 Instrument
samplearchitecture 3Develop log
generationalgorithm 4 Setup
extended architecture
- Spring Cloud Sleuth
- append sessionIDto span in controller of webuiservice
- definition of required attributes and foreign for event log
- Definition of healthy/failed activities
- user and system activities
- inputs: spans, annotation & mapping table
- Deploy MySQL- Deploy Zipkin- Deploy Celonis- Build log
generation service that executes the algorithm
3Develop activity generationalgorithm
ZIPKINDISTRIBUTED TRACINGSERVICE
:9411
API
ACTIVITIESGENERATIONSERVICE
:6064
API
Setup extended architecture
© sebisJochen Graeff – Master thesis final presentation 13
EUREKADISCOVERY SERVICE
:8761
CONFIGURA-TION
SERVICE
:8888
API
ACCOUNT-ING
SERVICE
:6060
CARS SERVICE
:9090
MAPSSERVICE
:6061
NOTI-FICATIONS SERVICE
:6063
PAYMENTSSERVICE
:5050
USERSERVICE
:5050
API
API
API
API
API
API
ZUULEDGE
SERVICE
:8762
WEBUISERVICE
:5050
API
APIBusiness services
Infrastructure services
HTTP
Front-end services
Database access
MySQLDatabase
Extended services
API CELONIS PROCESS MININGTOOL
:9000
Analysis creation
© sebisJochen Graeff – Master thesis final presentation 14
- Car sharing platform
- Spring Cloud microservicearchitecture
- 3 x infrastructure services
- 6 x business services
1Buildsample architecture 2 Instrument
samplearchitecture 3Develop log
generationalgorithm 4 Setup
extendedarchitecture 5 Analysis
Creation
- Spring Cloud Sleuth
- append sessionIDto span in controller of webuiservice
- definition of required attributes and foreign for event log
- Definition of healthy/failed activities
- user and system activities
- inputs: spans, annotation & mapping table
- Deploy MySQL- Deploy Zipkin- Deploy Celonis- Build log
generation service that executes the algorithm
- Configure data model
- Configure continuous data load
- Create 4 analysis with§ Business§ Application§ Cross-Domain§ Single Activity
views on the process
3Develop activity generationalgorithm
Analysis creation
© sebisJochen Graeff – Master thesis final presentation 15
Live Demo
© sebisJochen Graeff – Master thesis final presentation 16
Benefits
© sebisJochen Graeff – Master thesis final presentation 17
Cross-domain analysis
Provides a more holistic view between the business and application layer
Resource-efficient data source
Resource-efficient (easy to implement) input source for process mining in microservicearchitectures
Portability
Approach transferable to different architectures with limited effort (i.a. due to OpenTracingstandard)
Ubiquity
Ubiquitous through distributed tracing becoming a standard tool for microservicedebugging
Flexibility on process perspectives
Process scope flexibility through appending spans with arbitrary IDs
Bottom up process discovery
Bottom up process discovery in legacy systems
Limitations
© sebisJochen Graeff – Master thesis final presentation 18
System under survey (SUS)
Approach only tested on SUS- Single architecture- Data volume- Data contents
Process visualisation of system activities
Petri nets not a suitable visualisation method for system activities
Performance overhead
Necessity for SampingRate=1.0 for capturing whole process instances leads to performance overhead
Real-time event handling
Presented prototype only generates event log and reloads data model every 5 min
Backup
§ Workflow of activity generation and data reloading§ Related work§ Process Mining§ Inputs for activity generation§ Distributed tracing§ User request and span/trace context§ ‚End car rental‘ user activity sequence diagram
© sebisJochen Graeff – Master thesis final presentation 19
Workflow of activity generation and data reloading
© sebisJochen Graeff – Master thesis final presentation 20
User
Perform clicksin front-end
Trigger manual activitygeneration
Log generation service
Trigger scheduledactivity generation
~ 5min
Execute activity generationalgortihm
MySQL
Write activities
table
Write trace_span
table
Write technical_activities
table
Insert new entry into
reload_triggertable
PMT
Check for new entry in reload_trigger
table
Reload data model
~ 5min
Related work
5.3.R
elatedw
ork
Table 5.2.: Classification of related work
Logorigin
Activitytypes
Capturedbehaviour
Type of work Evaluationenviron-ment
Systemarchitecture
Languageindepend-ence
(Near)real-time
Poggi et al. [53] Web logs UserActivities
Business Algorithmevaluation
Real-lifeevent logs
No n/a No
Abe & Kudo [7] Web logs Useractivities
Business Framework Real-lifeevent logs
n/a n/a No
Bruckmann et al. [13] n/a UserActivities,systemactivities
Businessandsystem
Architectureproposal
n/a n/a n/a Yes
Leemans & van derAalst [41]
Joinpoint-pointcutmodelinstru-mentation
Useractivities
System Instrumentationstrategy,implementation
Real-lifeevent logs
Yes Yes No
Rubin et al. [59] Custominstru-mentation
Useractivities,systemactivities
User andsystem
Experimental Real-lifeevent logs
No n/a No
Proof-of-conceptprototype of this work
Distributedtracinginstru-mentation
Useractivities,systemactivities
Business,user andsystem
Instrumentationstrategy,architecturedescription,implementation
Simulateduserrequestson testingsystem
Yes Yes Yes
95
© sebisJochen Graeff – Master thesis final presentation 21
Process mining
© sebisJochen Graeff – Master thesis final presentation 22
Event LogTIMESTAMP ACTIVITY CASE ID2016-03-05 22:38:41.868 SEARCH CARS #12342016-03-05 23:46:32.306 REPORT ISSUE #56782016-03-05 23:47:42.321 RESERVE CAR #12342016-03-05 23:53:12.354 SHOW DETAILS #9012... ... ...
Search cars
Reserve car
Book car
Show detailsReport issue
<
“The idea of process mining is to discover, monitor and improve realprocesses (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's (information) systems.”
IEEE CIS Task Force on Process Mining
There are three Classes of Process Mining:
1. Process Discovery2. Conformance Checking3. Extension
Develop activity generation algorithm
© sebisJochen Graeff – Master thesis final presentation 23
spans table (zipkin)namespan_idtrace_idstart_tsparent_idduration...
annotations table (zipkin)span_idtrace_ida_keya_valuea_timestamp...
mapping tabletechnical_activitypretty_nametypeis_activitycalls_service
1 has 1..*
activities tablesessionIDactivitystart_tsend_tsdurationtrace_idspan_idservice namefailuresortingtype...
1 has 1..* 1..* has 1
Distributed tracing
© sebisJochen Graeff – Master thesis final presentation 24
The path taken through a simple servicing system on behalf of user request X.
The causal and temporal relationship between four spans of a trace.
Sigelman et al. (2010). Dapper, A Large Scale Distributed Systems Tracing Infrastructure. Google Research
time
SERVICE CTraceID = 100SpanID = 103ParentID = 101
SERVICE BTraceID = 100SpanID = 102ParentID = 101
SERVICE ATraceID = 100SpanID = 101ParentID = 100
FRONT-ENDTraceID = 100SpanID = 100ParentID = null
user
FRONTENDSERVICE
SERVICE A
SERVICE B SERVICE C
rpc1
rpc2 rpc3
requestX responseX
User request and span/trace context
© sebisJochen Graeff – Master thesis final presentation 25
ZUULSERVICE
ACCOUNTINGSERVICE
CARMANAGEMENTSERVICE
requestrequestrequest
responseresponseresponse
TraceID =aSpanID =asr
TraceID =aSpanID =ccs
TraceID =aSpanID =b
TraceID =aSpanID =ccr
TraceID =aSpanID =ass
TraceID =aSpanID =b
TraceID =aSpanID =csr
TraceID =aSpanID =ecs
TraceID =aSpanID =d
TraceID =aSpanID =ecr
TraceID =aSpanID =css
TraceID =aSpanID =d
TraceID =aSpanID =esr
TraceID =aSpanID =f
TraceID =aSpanID =ess
TraceID =aSpanID =f
‚End car rental‘ user activity
CARSSERVICE
lockscar
ACCOUNTINGSERVICE
PAYMENTSERVICE
ZUULSERVICE
NOTIFICATIONSERVICE
GET /lockCar
User
GET /endRental
GET /handlePayment
PUT /notifyUser
GET /finalizeBooking
notifiesuser
handlespayment
finalizesbooking
© sebisJochen Graeff – Master thesis final presentation 26