transactional monitoring for loosely coupled service architectures
@dkhan
Transactional Monitoring for Loosely Coupled Service Architectures
Daniel Khan
Node.js Technology Lead
Some Background
Who I am and what I do
• Daniel Khan
• @dkhan
• [email protected]
• Technology lead @Dynatrace
• Performance Monitoring
The Consumer's View
2000
2005
2016
The new world of Microservices
• Teams choose their technologies freely
• Independent deployment
• Elastic scaling
• Service brokers
• Circuit breakers
• Unknown or obscure dependencies
• Randomly interwoven third-party dependencies
• The monoliths are still somewhere
The website is slow!
Find the Faulty Part
Find out before the User does
So we have to Monitor
Follow each Transaction
Complete Transaction Coverage
[Diagram components: Browser / Native Mobile, Web Server / PHP, Java / .NET, C++, VB, ADK, CICS, Mainframe z/OS, MQ/ESB, Database, PurePath Collector, Dynatrace Server, Dynatrace Client, Performance Warehouse, Sessions Store, Exported Session, Offline Session Analysis]
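Following a transaction across tiers generally means tagging every hop with a shared identifier. The sketch below is only an illustration of that idea, not how Dynatrace implements it: the header names `x-trace-id` and `x-parent`, the `Span` record and the `call_service` helper are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One tier's part of a transaction (hypothetical record layout)."""
    trace_id: str
    service: str
    parent: Optional[str]


def call_service(service, downstream, headers, collected):
    """Simulate a tier handling a request and calling the tiers behind it."""
    # Reuse the incoming trace ID, or start a new trace at the entry point.
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    collected.append(Span(trace_id, service, headers.get("x-parent")))
    for child in downstream.get(service, []):
        # Pass the same trace ID on so every tier can be correlated later.
        child_headers = {"x-trace-id": trace_id, "x-parent": service}
        call_service(child, downstream, child_headers, collected)
    return trace_id


# Assumed example topology, loosely named after tiers in the diagram.
topology = {"browser": ["web"], "web": ["java"], "java": ["db", "mq"]}
spans = []
trace = call_service("browser", topology, {}, spans)
```

Grouping the collected spans by `trace_id` afterwards yields one end-to-end path per user transaction.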
2016
3 Metrics per Service
5 Metrics per Host
5 Metrics per Runtime
40 Services = 120 Metrics
20 Hosts = 100 Metrics
40 Runtimes = 200 Metrics
420 Metrics
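The arithmetic above can be reproduced in a few lines. The dictionary names are illustrative; the counts are the ones from the slide.

```python
# Metrics collected per entity and number of entities, as given on the slide.
METRICS_PER_ENTITY = {"service": 3, "host": 5, "runtime": 5}
ENTITY_COUNTS = {"service": 40, "host": 20, "runtime": 40}


def metric_totals(per_entity, counts):
    """Multiply metrics-per-entity by entity count for each entity kind."""
    return {kind: per_entity[kind] * counts[kind] for kind in counts}


totals = metric_totals(METRICS_PER_ENTITY, ENTITY_COUNTS)
grand_total = sum(totals.values())
print(totals, grand_total)
```

Doubling the number of services and runtimes alone pushes the total well past what a person can watch on dashboards.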
We cannot watch 400+ metrics, so we need to find ways to automate anomaly detection.
Response Times
Error Rates
Load
Anomaly Detection
Anomaly Detection Workflow
[Diagram nodes: Historic Data, "Normal" Model, New Data, Hypothesis, Likeliness, Judgement, Anomaly? Edge labels: update, calculate, derive, test, produces, defines]
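One way to read the workflow is as a loop: derive a "normal" model from historic data, test each new value against it, judge whether it is an anomaly, and update the model. The sketch below assumes a simple z-score test with a fixed threshold. That statistic, the `Baseline` class and its parameters are assumptions made for illustration; the talk does not say which model is actually used.

```python
import statistics


class Baseline:
    """A "normal" model built from historic data (z-score assumption)."""

    def __init__(self, historic, threshold=3.0):
        self.values = list(historic)
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def deviation(self, value):
        """How many standard deviations the new value lies from the mean."""
        mean = statistics.fmean(self.values)
        stdev = statistics.stdev(self.values)
        if stdev == 0:
            return 0.0 if value == mean else float("inf")
        return abs(value - mean) / stdev

    def judge(self, value):
        """Return True if the value is judged anomalous."""
        anomalous = self.deviation(value) > self.threshold
        if not anomalous:
            # Only normal observations update the model.
            self.values.append(value)
        return anomalous


# Hypothetical response times in milliseconds.
baseline = Baseline([100, 102, 98, 101, 99, 100, 103, 97])
normal_result = baseline.judge(101)
spike_result = baseline.judge(250)
```

The same loop could run per metric (response times, error rates, load), which is what makes 400+ metrics manageable without a human watching each one.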
Distinguish Impact from Cause
Automated Analysis of Problems
• Service slowdown
• Dependent services slow down
• Users are affected
• Analyze dependencies
• Exclude non-relevant services
• Follow causality chain
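The steps above can be sketched as a walk over a service dependency graph: start at the service where users are affected, ignore healthy services, and follow slow dependencies until reaching a slow service with no slow dependency of its own. This is a minimal sketch under assumed inputs; `find_root_causes`, the call graph and the service names are hypothetical, not the product's algorithm.

```python
def find_root_causes(entry, calls, anomalous):
    """Follow the causality chain from an affected entry service.

    calls maps each service to the services it depends on;
    anomalous is the set of services currently flagged as slow.
    """
    roots = []
    seen = set()
    stack = [entry]
    while stack:
        service = stack.pop()
        # Exclude non-relevant (healthy) services and avoid revisiting.
        if service in seen or service not in anomalous:
            continue
        seen.add(service)
        slow_deps = [d for d in calls.get(service, []) if d in anomalous]
        if slow_deps:
            # The slowdown is likely impact; keep following the chain.
            stack.extend(slow_deps)
        else:
            # No slow dependency left: candidate cause.
            roots.append(service)
    return roots


# Assumed example topology and anomaly flags.
calls = {
    "frontend": ["checkout", "search"],
    "checkout": ["payment", "inventory"],
    "payment": ["db"],
    "search": [],
    "inventory": [],
    "db": [],
}
anomalous = {"frontend", "checkout", "payment", "db"}
causes = find_root_causes("frontend", calls, anomalous)
```

Here `frontend`, `checkout` and `payment` are impact, while `db` is reported as the candidate cause; `search` and `inventory` are never considered because they are healthy.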
Productized
Thank You! | Daniel Khan | @dkhan | [email protected]