magpie : distributed profiling for performance analysis

Magpie: Distributed Profiling for Performance Analysis. Paul Barham (joint work with Rebecca Isaacs and Richard Mortier), 11th November 2002


Page 1: Magpie :  Distributed Profiling for Performance Analysis

Magpie: Distributed Profiling for Performance Analysis

Paul Barham (joint work with Rebecca Isaacs and Richard Mortier)

11th November 2002

Page 2: Magpie :  Distributed Profiling for Performance Analysis

What is Magpie?

• Bottom-up approach to characterising the workload of a distributed system
• Observe concurrency, communication & latency
  – Focus is on responsiveness, rather than throughput
• Resource consumption is accounted to individual service invocations
  – CPU, disk and network b/w consumed by each request on each machine in the distributed system
  – “Low level accounting + higher level annotations”
  – Online measurement, offline processing
• Why? Web Services = “wide area synchronous RPC”

Page 3: Magpie :  Distributed Profiling for Performance Analysis

Motivation: Indy Performance Modelling Toolkit

• Indy is a hybrid event/throughput based modelling kernel
  – Inputs: Topology, hardware, workload
  – Outputs: Utilizations, response times, bottlenecks, etc.
• Scope is multi-tier server farms running .NET web sites
• Our goal is to use Magpie to derive a generative model of the system workload suitable for input to Indy
  – Acquire a workload description with less human effort / knowledge than current approach of “incremental micro-benchmarking”
  – Extract a detailed model from a live, ‘representative’ system
    • Measure with a realistic mix of transaction types – caches!
    • Not just a long-term average across all transactions
  – Build a probabilistic model of the usage profile which includes “hidden” transaction types, e.g. error conditions, session state
    • Complex behaviour may not be easily observable manually
    • e.g. web transaction type discriminator is not necessarily the URL

Page 4: Magpie :  Distributed Profiling for Performance Analysis

Outline of rest of talk

1. Instrumenting a .NET web site

2. Some raw data and visualizations

3. Data analysis by clustering

4. Current status and future work

Page 5: Magpie :  Distributed Profiling for Performance Analysis

Duwamish7: VS.Net Enterprise Sample Site

Page 6: Magpie :  Distributed Profiling for Performance Analysis

10,000ft view of an e-commerce site

[Diagram: Client, Web Server and SQL Server. Arrows labelled: http://someurl.aspx, SQL request, data, web page.]

Page 7: Magpie :  Distributed Profiling for Performance Analysis

A little bit more detail…

[Diagram labels: Client(s); Web Server(s) with w3wp frontend, ASP.Net, BusinessLogic, ADO.Net, Cache, CLR / Jit Compiler; SQL Server(s) with DBMS, StoredProcs, Cache.]

• Site-specific content and code shown in purple

• Requests can be either static content, or active – execute code to generate HTML

• Code may execute SQL queries on database.

Page 8: Magpie :  Distributed Profiling for Performance Analysis

Internal Architecture of IIS6 / ASP.NET

• w3wp.exe
  – Per-“site” worker proc
  – ISAPI extension hosts CLR
  – ASP.Net, ADO.Net
• Threadpools
  – One for CLR, one for IIS
• Nested state machines
  – One per-request, event driven
  – Non-blocking upcalls
• All async I/O
  – Uses single IOCompletionPort
  – Demux using “OVERLAPPED”s
• http.sys
  – aka “Universal listener”
  – Kernel-mode driver
  – “TransmitFile() on steroids”
  – HTTP fragment cache

[Diagram labels. Kernel: TCP/IP, http.sys. User: HTTPAPI, Winsock, ULATQ, I/O Compn Port, Main W3 Threadpool, W3_MAIN_CONTEXT. Per-Request FSMs: START, URLINFO, AUTHENT, AUTHORIZ, HANDLE, RESPONSE, LOG, DONE. Type-specific Handlers: Static, ISAPI, CGI. ISAPI: “ECB”, aspnet_isapi.dll. Common Language Runtime (CLR): Application, ASP.Net, ADO.Net, Winsock.]

Page 9: Magpie :  Distributed Profiling for Performance Analysis

Low Level Profiling

Kernel Tracing

• Windows XP has very efficient low-level event tracing built in to the kernel (“ETW”)

• Command-line tools for turning on or off tracing of specific system activities

• Magpie uses ETW on each machine to capture
  – Every context switch
  – File IO
  – Disk IO
  – Network send and receive completions
  – Process and thread creation and deletion
  – Page faults

• Overhead is surprisingly small! (~8MB log for 5 mins)

Page 10: Magpie :  Distributed Profiling for Performance Analysis

Stitching together Asynchronous I/O and Thread Synchronization

Detours
  – Public tool written by Galen Hunt (MSR)
  – Can be used to intercept calls to any DLL and record their arguments etc.
• Magpie usage:
  – Intercept calls to Winsock2 API
    • Allows communication to be followed (WSASend, WSARecv)
  – Intercept the HTTPAPI
    • Follows the processing of HTTP requests by http.sys
  – Intercept certain Win32 API calls
    • Observe thread synchronization (e.g. SetEvent, Waits)
    • Follow async I/O completions (GetQueuedCompletionStatus)
    • Directly observe threadpool.

Page 11: Magpie :  Distributed Profiling for Performance Analysis

Higher Level Annotations

ISAPI filter
  – Extension DLLs loaded into IIS (web server) process
  – Sees all HTTP requests
  – Filter registers with IIS to receive particular event notifications
  – Can examine and modify both incoming and outgoing streams of data
• Magpie ISAPI filter:
  – Allocates a unique identifier to each incoming request and adds it to the HTTP headers
  – Generates additional trace events on entry and exit
    • Records cycle counter, request ID & resource usage
  – “Thread 274 is now working on Request CCC00017”

Page 12: Magpie :  Distributed Profiling for Performance Analysis

Following the active content

HTTPModule
  – Plugins for ASP.NET extensibility
  – Sees all active requests
  – Executes in the “managed” (CLR) environment
  – Each active request is processed by multiple HTTPModules, e.g. session, authentication, etc.
• Magpie HTTPModule:
  – Stashes request identifier in (managed) thread local state
  – Records cycle counter, managed thread id, request ID & resource usage

Page 13: Magpie :  Distributed Profiling for Performance Analysis

Accounting Database Activity

SQL Profiler
  – Standard part of SQL Server 2000 distribution
  – Logs selected events to table or file
    • (events can be user defined)
• Magpie SQL Profiling
  – “Wraps” original stored procedures using reflection
  – Wrappers generate user-defined profiler events before and after executing the original stored proc.
    • Recorded by the SQL Profiler in output trace
    • Data includes request ID, cycle counter + resource usage
  – Uses extended stored procedure to get cycle counter + resource usage stats

Page 14: Magpie :  Distributed Profiling for Performance Analysis

Intercepting Outgoing RPCs

Common Language Runtime Profiler

• Two COM interfaces:
  – ICorProfilerCallback: provides notifications, e.g. function enter/leave, module/class load, thread mappings, .Net remoting, JIT compilation
  – ICorProfilerInfo: Runtime provides API which allows profiler to examine and modify VM state.
• Magpie CLR Profiler:
  – Monitors CLR-to-OS thread mappings
    • Records thread ids, cycle counter + resource usage
  – Intercepts JIT compilation of relevant ADO.NET classes/methods
    • Rewrites byte-code (IL) to insert calls to our own profiling functions
    • Modifies SQL stored procedure invocations to call wrapped versions and pass an extra argument (i.e. request ID)
    • Lots of fun but very hairy code!

Page 15: Magpie :  Distributed Profiling for Performance Analysis

Putting it all together…

[Diagram: instrumentation points on the two machines, with HTTP arriving at the Web Server and TDS between Web Server and SQL Server.
Web Server: Kernel (PerfInfo, Pkt Capture), http.sys, HTTP API Intercept, IIS, ISAPI Filter, CLR, HTTP Module, ASP.NET, App Logic, ADO.NET, CLR Profiler & IL Patcher (Re-written IL), WinSock2 API Intercept.
SQL Server: Kernel (PerfInfo, Pkt Capture), DBMS, StoredProcs, Wrappers, Extended SPs, SQL Profiler (User-defined events), WinSock2 API Intercept.]

Page 16: Magpie :  Distributed Profiling for Performance Analysis

Raw Event Data – Yuk!

...
574530113665473L 157.58.60.98.3173 > 192.168.187.66.http: P 9987:10368(381) \\
    ack 80470 win 64240 (DF) (ttl 127, id 27528, len 421)
574530113724014 TcpReceive svchost.exe 11c 381 192.168.187.066:80 157.058.060.098:3173
...
574530114445762 f4c + HttpReceiveHttpRequest(ReqQueueHandle=1c0, RequestId=0, \\
    Flags=1, pRequestBuffer=faae48, RequestBufferLength=9d0, pBytesReceived=0, pOverlapped=faae14) 0
574530114613968 f4c - HttpReceiveHttpRequest() -> 3e5
574530114724562 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,0)
574530114836287 f4c - GetQueuedCompletionStatus(,,,,) -> 0 5a8d11fc 0
574530114947322 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,1b7740)
574530115070456 CSwitch 0 w3wp.exe 658 f4c Waiting WrQueue System 4 e50 3913680
574530115394668 CSwitch 0 System 4 e50 Ready - w3wp.exe 658 f4c 324212
574530115456005 f4c - GetQueuedCompletionStatus(,,,,) -> 1 5a8d11fc fab854
574530115683276 f4c ! HttpReceiveHttpRequest Ov: fab854 CID: 40000284e2000000 \\
    ReqID: 40000284e2000000 LA: 192.168.187.66:80 RA: 157.58.60.98:3173 \\
    Bytes: 0 Flags: 0 Verb: 4 RawUrl: /duwamish7/categories.aspx?ID=843
574530115882433 12673200697.357422 ccc00663 f4c /duwamish7/categories.aspx?ID=843
574530116116697 f4c + GetQueuedCompletionStatus(1b8,f5ff84,f5ff8c,f5ff98,0)
...
574530116959382 CSwitch 0 w3wp.exe 658 f4c Waiting WrQueue w3wp.exe 658 6dc 1564714
574530117373185 12673200697.3574 ccc00663 6dc /duwamish7/categories.aspx?ID=843
...
574530119411731 6dc + send(s=4f0, buf=2739044, len=7f, flags=0)
574530119656299L 192.168.187.66.4681 > 192.168.187.68.ms-sql-s: P 23578:23705(127) \\
    ack 227230 win 16735 (DF) (ttl 128, id 10136, len 167)
574530119721688 6dc - send() -> 7f
...
574530120169086 6dc + WSARecv(s=4f0, lpBuffers=499f4fc 16384, dwBufferCount=1, \\
    lpNumberOfBytesRecvd=499f514, lpFlags=499f510, lpOverlapped=0 0, \\
    lpCompletionRoutine=0)
574530120325932 CSwitch 0 w3wp.exe 658 6dc Waiting UserRequest w3wp.exe 658 ec8 3366550
...

Callouts on the log:
• incoming HTTP request packet
• IIS worker thread picks up request from http.sys (async I/O Completion)
• IndyProf label
• HttpModule label
• ASP worker thread takes over
• send request to SQL Server
• blocking wait for reply from SQL Server

Logs sorted together using cycle counter
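The "logs sorted together using cycle counter" step can be sketched as a k-way merge. This is a minimal illustration, not Magpie's actual tooling: it assumes each per-source log is already in timestamp order, that every line starts with the cycle-counter value (optionally followed by an `L`, as in the packet-capture lines above), and that all sources share one counter, i.e. they come from the same machine. The names `Event`, `parse_log` and `merge_logs` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Mapping, NamedTuple


class Event(NamedTuple):
    timestamp: int  # cycle-counter value taken from the start of the line
    source: str     # which log the line came from (hypothetical label)
    text: str       # remainder of the line, left unparsed


def parse_log(source: str, lines: Iterable[str]) -> Iterator[Event]:
    """Turn raw log lines into Events, skipping blanks and '...' elisions."""
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue
        stamp, _, rest = line.partition(" ")
        # Assumption: a trailing 'L' on the timestamp is only a suffix.
        yield Event(int(stamp.rstrip("L")), source, rest)


def merge_logs(logs: Mapping[str, Iterable[str]]) -> Iterator[Event]:
    """Lazily merge several time-ordered logs into one time-ordered stream."""
    streams = [parse_log(name, lines) for name, lines in logs.items()]
    return heapq.merge(*streams, key=lambda event: event.timestamp)
```

Because `heapq.merge` is lazy, the merged stream can be consumed without loading every log into memory. Merging across machines would additionally need clock alignment, which this sketch does not attempt.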

Page 17: Magpie :  Distributed Profiling for Performance Analysis

Visualisation Tools – PowerPoint Macros!

[Timeline figure: part of the visualisation of Transaction ccc00663: /duwamish7/categories.aspx?ID=843, starting at 48.48224443s. X axis: Time/s. Rows: WEB.f4c, WEB.6dc, Disk, Net RX, Net TX; Net TX, Net RX, Disk, SQL.a34.]

Event labels on the timeline: -GetQueuedComp, !HttpReceiveHt, IndyProfReq, +GetQueuedComp, +GetQueuedComp, HttpModBegin, +send, -send, +WSARecv, -WSARecv, SQL:StartProf

Callouts:
• HTTP Request packet from client
• IIS worker thread picks up request from http.sys
• IndyProf label
• HttpModule label
• Sync WinSock send to DB
• TDS request packet sent and received
• SQL thread unblocks
• SQL profiler trace label

KEY: blocked, IIS, ASP.NET, SQL, Disk, Unaccounted

Page 18: Magpie :  Distributed Profiling for Performance Analysis

Transaction ccc000b9: /duwamish7/categories.aspx?ID=831

[Timeline figure from 10.051s to 10.100s. Rows: WEB.eec, WEB.398, Disk, Net RX, Net TX; Net TX, Net RX, Disk, SQL.9c4.]

Event labels on the web server rows: !HttpReceiveHt, IndyProfReq, HttpModBegin, then three rounds of +send, -send, +WSARecv, -WSARecv

Event labels on the SQL server row: three rounds of -WSARecv, SQL:StartProf, SQL:EndProf

KEY: Blocked, IIS, ASP.NET, SQL, Disk, Other

Page 19: Magpie :  Distributed Profiling for Performance Analysis

Interleaving of Simultaneous Requests

[Timeline figure: transactions between 28.8s and 31.0s, starting at 28.84s. Each colour is a different transaction, grey = blocked. Rows: WEB.cc8, WEB.cc4, WEB.b90, WEB.ad8, WEB.4b0, Disk, Net RX, Net TX; Net TX, Net RX, Disk, SQL.6cc, SQL.e84.]

Page 20: Magpie :  Distributed Profiling for Performance Analysis

“An assortment of magical tastes”…

[Timeline figures: six example transactions, each drawn as per-thread rows (WEB.*, SQL.*) plus Disk, Net RX and Net TX rows for each machine. Event labels that appear on the timelines (truncated as in the figures): +/-GetQueuedComp, !HttpReceiveHt, +/-HttpReceiveHt, IndyProfReq, IndyProfResp, HttpModBegin, HttpModEnd, +/-send, +/-WSARecv, +/-WaitForSingle, +/-SetEvent, +/-PostQueuedCom, +/-HttpSendRespo, +/-HttpSendHttpR, ContextSwitchI, DiskRead, SQL:StartProf, SQL:EndProf.]

• Transaction ccc000b9: /duwamish7/categories.aspx?ID=831 (10.0513824767s to 10.1559640767s; rows WEB.eec, WEB.398, SQL.9c4)
• Transaction ccc000ba: /duwamish7/book.aspx?ID=37816 (22.7645571867s to 22.8266142967s; rows WEB.a18, WEB.848)
• Transaction ccc000b3: /duwamish7/images/banner/line.gif (22.5109738067s to 22.5302513033s; row WEB.a18)
• Transaction ccc001a6: /duwamish7/categories.aspx?ID=836 (18.3357530767s to 18.4334278667s; rows WEB.6dc, WEB.474, SQL.9d8)
• Transaction ccc0067f: /duwamish7/book.aspx?ID=58 (49.0598638833s to 49.13234228s; rows WEB.f4c, WEB.6dc, SQL.a34)
• Transaction ccc000b9: /duwamish7/book.aspx?ID=37816 (22.6001556s to 22.8630055267s; rows WEB.a18, WEB.848, SQL.130)

KEY: blocked, IIS, ASP.NET, SQL, Disk, Other

Page 21: Magpie :  Distributed Profiling for Performance Analysis

How similar are two requests?

[Timeline figures: two requests of the same page type shown one above the other for comparison.]

• Transaction ccc001a6: /duwamish7/categories.aspx?ID=836 (18.3357530767s to 18.4334278667s; rows WEB.6dc, WEB.474, Disk, Net RX, Net TX; Net TX, Net RX, Disk, SQL.9d8). Same event sequence as the second transaction, but with additional GetQueuedComp and ContextSwitchI events interleaved on both the web and SQL rows.
• Transaction ccc000b9: /duwamish7/categories.aspx?ID=831 (10.0513824767s to 10.1559640767s; rows WEB.eec, WEB.398, Disk, Net RX, Net TX; Net TX, Net RX, Disk, SQL.9c4). Event sequence: !HttpReceiveHt, IndyProfReq, HttpModBegin, three rounds of +send, -send, +WSARecv, -WSARecv, then HttpModEnd, +HttpSendRespo, -HttpSendRespo, IndyProfResp; on the SQL row, three rounds of -WSARecv, SQL:StartProf, SQL:EndProf.

Page 22: Magpie :  Distributed Profiling for Performance Analysis

Mining Some Structure…

Issues include:

• Multiple clocks (at least one per machine)
• Lots of concurrency
• Only partial orders provided by network traffic
• Noisy observations – preemptive scheduling
• Aperiodic sampling – irregular “events”

Current approach…

• Borrowing algorithms from gene-sequence comparison and speech recognition
• Construct a “string” representation of traces
• Cluster using variant of String Edit Distance

Page 23: Magpie :  Distributed Profiling for Performance Analysis

Levenshtein String Edit Distance

• Example:

    appropriate m-eaning
    |||||   |||||    |||
    approximate matching

    d(s1,s2)=7

• Can be computed in O(|s1|*|s2|) time using simple dynamic programming algorithm
  – d('', '') = 0
  – d(s, '') = d('', s) = |s|
  – d(s1+ch1, s2+ch2) = min( d(s1, s2) + (ch1==ch2 ? 0 : 1), d(s1+ch1, s2) + 1, d(s1, s2+ch2) + 1 )
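The recurrence above translates directly into a row-by-row dynamic programme. The sketch below is a standard textbook formulation rather than code from the talk; the function name `levenshtein` is an illustrative choice. It keeps only the previous row, so memory is O(|s2|) while time stays O(|s1|*|s2|).

```python
def levenshtein(s1: str, s2: str) -> int:
    """Unit-cost string edit distance between s1 and s2."""
    # prev[j] holds d(s1[:i-1], s2[:j]); row 0 is d('', s2[:j]) = j.
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]  # d(s1[:i], '') = i
        for j, ch2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j - 1] + (ch1 != ch2),  # match or substitute
                prev[j] + 1,                 # delete ch1
                curr[j - 1] + 1,             # insert ch2
            ))
        prev = curr
    return prev[-1]
```

Applied to the slide's example, `levenshtein("appropriate meaning", "approximate matching")` gives the 7 quoted above.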

Page 24: Magpie :  Distributed Profiling for Performance Analysis

Partial ordering using packets…

Web Server | SQL Server

HTTPREQ 46265 |
-GetQueuedCompletionStatus a18 |
!HttpReceiveHttpRequest a18 |
IndyProfReq a18 |
+GetQueuedCompletionStatus a18 |
CPU 4178433 Block |
-------------------------------|
HttpModBegin dc0 |
+ProcessRequestMain |
+OnStateChange |
-OnStateChange |
+FlushBuffer |
+send dc0 |
-------------------------------+-------------------------------
TDSREQ 23389 = 23389 TDSREQ
-send dc0 | Unblock
-FlushBuffer | SQL:StartProf 87c
+ReadNetlib | CPU 31605 Block
+WSARecv dc0 | Unblock
CPU 5038985 Block | CPU 31360 Block
ACK 23390 | Unblock
Unblock | +WaitForSingleObject 87c
CPU 13112 Block | -WaitForSingleObject 87c
 | +SetEvent 87c
 | CPU 33897 Block
 | Unblock
 | DiskRead 87c
 | CPU 4020880
 | +SetEvent 87c
 | SQL:EndProf 87c
 | CPU 182984
---------------------------------------------------------------
TDSRESP 8819 = 8819 TDSRESP
Unblock |
-WSARecv dc0 |
-ReadNetlib |
+OnStateChange |
-OnStateChange |
-ProcessRequestMain |
HttpModEnd dc0 |
+HttpSendResponseEntityBody dc0 |
HTTPRESP 23391 |
HTTPRESP 23392 |
CPU 4179107 Block |
ACK 46296 |
HTTPRESP 23393 |
HTTPRESP 23394 |
HTTPRESP 23395 |
ACK 46297 |
HTTPRESP 23396 |
HTTPRESP 23397 |
HTTPRESP 23398 |
ACK 46298 |
HTTPRESP 23399 |
HTTPRESP 23400 |
ACK 46299 |
ACK 46300 |
Unblock |
-HttpSendResponseEntityBody dc0 |
+PostQueuedCompletionStatus dc0 |
-PostQueuedCompletionStatus dc0 |
CPU 832046 Block |
-GetQueuedCompletionStatus a18 |
IndyProfResp a18 |
+HttpSendResponseEntityBody a18 |
-HttpSendResponseEntityBody a18 |
+HttpReceiveHttpRequest(A) a18 |
-HttpReceiveHttpRequest a18 |
+GetQueuedCompletionStatus a18 |
CPU 2089400 Block |
Unblock |
ACK 23413 |

Page 25: Magpie :  Distributed Profiling for Performance Analysis

Example alphabet for trace strings…

Page 26: Magpie :  Distributed Profiling for Performance Analysis

A Distance Metric for Magpie Traces

1. Assign each Magpie instrumentation point a discrete label

2. Each trace entry has an 8-tuple of resource usage deltas
   – (Web CPU, Web DISK, WAN Rx, WAN Tx, LAN Rx, LAN Tx, SQL CPU, SQL DISK)

3. Deterministically flatten the partial order into a total order

4. Consider as a string of ‘weighted characters’, where weight is the length of observation vector:
   – e.g. !1 (0 {1 >5 [1 ]0 <1 }4 B0 b4 $0 )0 B0 b0 Q0 q0

5. Extend string-edit-distance to use normalised Euclidean distance between observation ‘vectors’
   – Insert/delete cost = ||v|| (“distance of point from origin”)
   – Substitution cost = ||v1-v2|| (“distance between two points in 8D space”)
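Step 5 can be sketched by swapping the unit costs of the standard edit-distance recurrence for vector norms. The code below is an illustration under stated assumptions, not the talk's implementation: a trace is a list of `(label, vector)` pairs; the slide does not say how two entries with different labels are compared, so here a label mismatch is charged as a delete plus an insert; and the normalisation mentioned in step 5 is left out because its exact form is not given. The names `TraceEntry`, `norm`, `diff_norm` and `trace_distance` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One trace entry: instrumentation-point label plus resource-usage deltas.
TraceEntry = Tuple[str, Sequence[float]]


def norm(v: Sequence[float]) -> float:
    """Euclidean length of v: the distance of the point from the origin."""
    return math.sqrt(sum(x * x for x in v))


def diff_norm(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Euclidean distance between two observation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def trace_distance(t1: Sequence[TraceEntry], t2: Sequence[TraceEntry]) -> float:
    """Edit distance with insert/delete cost ||v|| and substitution cost ||v1-v2||."""
    # Row 0: turning the empty trace into t2[:j] inserts every entry.
    prev = [0.0]
    for _, v in t2:
        prev.append(prev[-1] + norm(v))
    for label1, v1 in t1:
        curr = [prev[0] + norm(v1)]  # delete everything seen so far in t1
        for j, (label2, v2) in enumerate(t2, start=1):
            if label1 == label2:
                substitute = prev[j - 1] + diff_norm(v1, v2)
            else:
                # Assumption: different labels cost a delete plus an insert.
                substitute = prev[j - 1] + norm(v1) + norm(v2)
            curr.append(min(substitute,
                            prev[j] + norm(v1),       # delete entry of t1
                            curr[j - 1] + norm(v2)))  # insert entry of t2
        prev = curr
    return prev[-1]
```

With these costs, two traces that hit the same instrumentation points and consume similar resources are close, while an extra or missing event costs as much as the resources it accounts for.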

Page 27: Magpie :  Distributed Profiling for Performance Analysis

Example String Edit Distances…

Page 28: Magpie :  Distributed Profiling for Performance Analysis

Trivial Clustering Algorithm…

• Doesn’t need to be very fancy… yet!
• Uses a ‘representative’ trace as cluster centroid
  – Pick the best of 5 as in quicksort
• Compute distance from each trace to each cluster centroid
  – Add a “best-so-far” threshold to string-edit algorithm
  – (dynamic programming algs are monotonic)
• Compare inter/intra-cluster mean distances to decide when to create a new cluster
• Periodically move ‘singleton’ clusters to an outliers list and try to merge back in at the end
• Approx O(N * C) - where C is #clusters
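The “best-so-far” threshold works because the minimum of each dynamic-programming row never decreases, so a comparison can be abandoned as soon as a whole row exceeds the best distance found so far. The sketch below shows that idea inside a deliberately simplified clustering loop. It is not the algorithm from the talk: it uses plain strings with unit costs, takes the first trace of each cluster as its centroid instead of a best-of-5 representative, and replaces the inter/intra-cluster comparison with a fixed, hypothetical `new_cluster_threshold`. Outlier handling is omitted.

```python
import math
from typing import List


def bounded_edit_distance(s1: str, s2: str, bound: float) -> float:
    """Edit distance, or math.inf once it provably exceeds `bound`."""
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]
        for j, ch2 in enumerate(s2, start=1):
            curr.append(min(prev[j - 1] + (ch1 != ch2),
                            prev[j] + 1,
                            curr[j - 1] + 1))
        # Every later cell is >= the smallest cell of this row, so the
        # final distance cannot come back under the bound.
        if min(curr) > bound:
            return math.inf
        prev = curr
    return prev[-1]


def cluster(traces: List[str], new_cluster_threshold: float) -> List[List[str]]:
    """Assign each trace to its nearest centroid, or start a new cluster."""
    centroids: List[str] = []
    members: List[List[str]] = []
    for trace in traces:
        best, best_idx = math.inf, None
        for idx, centroid in enumerate(centroids):
            # Pass the best distance so far as the early-abandon bound.
            d = bounded_edit_distance(trace, centroid, best)
            if d < best:
                best, best_idx = d, idx
        if best_idx is None or best > new_cluster_threshold:
            centroids.append(trace)   # assumption: first member is the centroid
            members.append([trace])
        else:
            members[best_idx].append(trace)
    return members
```

Each trace is compared against every centroid once, which matches the approximate O(N * C) cost quoted on the slide.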

Page 29: Magpie :  Distributed Profiling for Performance Analysis

Typical clusters…Centroid: !1(0R0r0)1B0b0Q0q00.000000 static_ccc006c9 !1(0R0r0)1B0b0Q0q00.088417 static_ccc00168 !1(0R0r0)1B0b0Q0q00.003326 static_ccc00618 !1(0R0r0)1B0b0Q0q00.060546 static_ccc006c3 !1(0R0r0)1B0b0Q0q00.013645 static_ccc00616 !1(0R0r0)1B0b0Q0q00.032970 static_ccc006c5 !1(0R0r0)0B0b0Q0q00.048854 static_ccc006c4 !1(0R0r0)1B0b0Q0q00.043589 static_ccc006c7 !1(0R1r0)1B0b0Q0q00.106195 static_ccc0038e !1(0R0r0)1B0b0Q0q00.057646 static_ccc00043 !1(0R0r0)1B0b0Q0q00.025930 static_ccc00556 !1(0R0r0)1B0b0Q0q00.008057 static_ccc0038f !1(0R0r0)1B0b0Q0q0...

Centroid: !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q00.000000 chkout_ccc0041c !7(0{2>18[3]0<0}3B0b3$0)0B0b0Q0q00.473224 chkout_ccc00092 !7(0{2>12[3]0<0}3B0b4$0)0B0b0Q0q00.273929 chkout_ccc00246 !7(0{2>16[3]0<0}3B0b4$0)1B0b0Q0q00.167314 chkout_ccc002a8 !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q00.185258 chkout_ccc004be !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q00.123095 chkout_ccc00675 !7(0{2>18[3]0<0}3B0b4$0)0B0b0Q0q00.100177 chkout_ccc005f4 !7(0{2>17[3]0<0}3B0b4$0)0B0b0Q0q00.318734 chkout_ccc001a5 !7(0{2>12[3]0<0}3B0b3$0)0B0b0Q0q00.049554 chkout_ccc00165 !7(0{2>17[3]0<0}3B0b3$0)0B0b0Q0q00.369109 chkout_ccc003ac !7(0{2>12[3]0<0}3B0b3$0)0B0b0Q0q0...

Centroid: !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
0.000000 books_ccc00748 !1(0{1>5[1]0<1}4B0b4$0)0B0b0Q0q0
0.396412 logon_ccc005be !2(0{2>5[1]0<0}4B0b3$0)0B0b0Q0q0
0.273409 books_ccc005d4 !1(0{1>4[1]0<1}4B0b5$0)1B0b0Q0q0
0.069283 books_ccc006e7 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
0.103720 books_ccc00442 !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
0.044051 books_ccc00040 !1(0{1>4[1]0<1}4B0b3$0)0B0b0Q0q0
0.208350 books_ccc000b9 !1(0{1>4[1]0<1}5B0b4$0)0B0b0Q0q0
0.147137 books_ccc002cf !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
0.099524 books_ccc0082b !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
0.244620 books_ccc003ed !1(0{1>7[1]0<1}5B0b4$0)0B0b0Q0q0
0.441445 logon_ccc001e0 !2(0{2>5[1]0<0}4B1b3$0)0B0b0Q0q0
0.116913 books_ccc00105 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
0.096342 books_ccc00686 !1(0{1>4[1]0<1}4B0b4$0)0B0b0Q0q0
...

Centroid: !1(0{2}5B0b4$0)0
0.000000 books_ccc00301 !1(0{2}5B0b4$0)0
0.214118 books_ccc00548 !1(0{2}5B0b4$0)0
0.204555 books_ccc006fd !1(0{2}5B0b4$0)0
0.150912 books_ccc0079f !1(0{2}5B0b4$0)0
0.019864 books_ccc00873 !1(0{2}5B0b4$0)0
0.208676 books_ccc00484 !1(0{2}5B0b4$0)0
0.212472 books_ccc0029f !1(0{2}5B0b4$0)0
0.036158 books_ccc00842 !1(0{2}5B0b4$0)0
0.210166 books_ccc00517 !1(0{2}5B0b4$0)0
0.171975 books_ccc00589 !1(0{2}5B0b4$0)0
0.288412 books_ccc003d3 !1(0{4}5B0b4$0)0
0.217915 books_ccc0069c !1(0{2}5B0b4$0)0
0.238472 books_ccc0076e !1(0{2}5B0b3$0)0

Page 30: Magpie :  Distributed Profiling for Performance Analysis

Visualisation of all Clusters

[Figure: n² pairwise distances. Axes: Transaction ID vs. Transaction ID; Distance.]

Page 31: Magpie :  Distributed Profiling for Performance Analysis

Just the ‘active content’…

[Figure: pairwise distances for the ‘active content’ transactions only. Axes: Transaction ID vs. Transaction ID; Distance.]

Page 32: Magpie :  Distributed Profiling for Performance Analysis

A First-cut Workload Model…

• Just use cluster centroids and sizes to generate ‘similar’ transaction mix?
– Pretty good at capturing coarse differences
   • e.g. Number of SQL requests
– Doesn’t deal with continuous distributions very well
   • e.g. Cache/memory performance, Zipf file size distributions
– But is it better than just using URL-based averages?

• Evaluation metric:
– Take the original trace, assume each transaction is replaced by the centroid of its cluster and add up the RMS error.
– Evaluate with and without the ‘outlier’ clusters.
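As a rough sketch of the two ideas on this slide: `generate_mix` draws a synthetic transaction mix from the cluster centroids, weighted by cluster size, and `rms_error_per_resource` replaces each transaction with its cluster centroid and reports the RMS error per resource. The data layout (one resource vector per transaction, a cluster id per transaction) and both function names are assumptions made for illustration. "Add up the RMS error" is read here as a per-resource RMS over all transactions.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def generate_mix(centroids: Dict[str, Sequence[float]], sizes: Dict[str, int],
                 n: int, seed: Optional[int] = None) -> List[Sequence[float]]:
    """Draw n synthetic transactions: each is a centroid picked with
    probability proportional to the size of its cluster."""
    rng = random.Random(seed)
    ids = list(centroids)
    picks = rng.choices(ids, weights=[sizes[c] for c in ids], k=n)
    return [centroids[c] for c in picks]


def rms_error_per_resource(transactions: Sequence[Sequence[float]],
                           assignment: Sequence[str],
                           centroids: Dict[str, Sequence[float]]) -> List[float]:
    """RMS error per resource when every transaction is replaced by the
    centroid of the cluster it was assigned to."""
    n_resources = len(transactions[0])
    squared = [0.0] * n_resources
    for vec, cluster_id in zip(transactions, assignment):
        model = centroids[cluster_id]
        for k in range(n_resources):
            squared[k] += (vec[k] - model[k]) ** 2
    return [math.sqrt(s / len(transactions)) for s in squared]
```

To evaluate "with and without the outlier clusters", the same function can be called twice, once on the full trace and once with the outlier transactions filtered out of `transactions` and `assignment`.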

Page 33: Magpie :  Distributed Profiling for Performance Analysis

Evaluation of Magpie cluster-based model

• Results: per-resource RMS errors (across all transactions):

Method      Web CPU   Web DISK  WAN Rx  WAN Tx  LAN Rx  LAN Tx  SQL CPU   SQL DISK
URLs        48546337  1358      266     1958    127     339     12331844  206902
MAGPIE      7106121   1165      131     2053    1.3     37.7    5040474   11214
MAGPIE-O/L  7218741   1183      132     2080    1.32    38.34   5120613   11392

• RMS error improvement over just using URLs:

[Chart: bars for URL, MAGPIE and MAGPIE-O/L for each resource (Web CPU, Web DISK, WAN Rx, WAN Tx, LAN Rx, LAN Tx, SQL CPU, SQL DISK); axis from 0% to 120%.]
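The chart itself did not survive in this transcript, but the improvement can be recomputed from the RMS table on this slide. The sketch below expresses each method's error as a percentage of the URL-based error. That the original chart plotted exactly this ratio is an assumption, suggested only by its 0% to 120% axis. The variable and function names are hypothetical.

```python
RESOURCES = ["Web CPU", "Web DISK", "WAN Rx", "WAN Tx",
             "LAN Rx", "LAN Tx", "SQL CPU", "SQL DISK"]

# RMS errors copied from the table on this slide.
RMS = {
    "URLs":       [48546337, 1358, 266, 1958, 127, 339, 12331844, 206902],
    "MAGPIE":     [7106121, 1165, 131, 2053, 1.3, 37.7, 5040474, 11214],
    "MAGPIE-O/L": [7218741, 1183, 132, 2080, 1.32, 38.34, 5120613, 11392],
}


def relative_to_urls(method: str) -> dict:
    """Error of `method` as a percentage of the URL-based error, per resource."""
    return {name: 100.0 * err / base
            for name, err, base in zip(RESOURCES, RMS[method], RMS["URLs"])}


if __name__ == "__main__":
    for method in ("MAGPIE", "MAGPIE-O/L"):
        for name, pct in relative_to_urls(method).items():
            print(f"{method:11s} {name:9s} {pct:6.1f}%")
```

On these figures the cluster-based model has a lower error than the URL averages for every resource except WAN Tx, where it is slightly higher (2053 against 1958).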

Page 34: Magpie :  Distributed Profiling for Performance Analysis

Better Models using Bayesian Learning?

[Figure: timeline of a request across the Web and SQL machines (time axis), with per-machine state machines (states S1 to S4) and events: Receive Pkt, Send Pkt, Process Req, Compute, Disk IO, Waiting, Blocked. Annotation: “ignore fictitious details”.]

• Ongoing discussions with Michael Isard, Mike Tipping and Chris Bishop

• Learn probabilistic models of resource usage by different request types

• Construct the per-machine FSMs

• Possibly apply coupled hidden Markov models (CHMMs)?

Page 35: Magpie :  Distributed Profiling for Performance Analysis

Current Status…

• Using Matlab & Bayes Net Toolbox (BNT)

• Start by trying to fit simple HMM to just static request cluster
– One discrete hidden state, 8 continuous observed variables with assumed Gaussian distributions
– “Priors” computed from mean / var of observations

… unfortunately, none of the supplied learning algorithms converge, claiming our data is “infinitely improbable”!
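For illustration only, the "priors from mean / var" step and a Gaussian likelihood can be sketched in plain Python. This is not the Matlab/BNT code from the talk. It treats the model as a single diagonal Gaussian over the observed variables, which is what an HMM with one hidden state value reduces to; if the slide means one hidden variable with several values, this is only the per-state emission term. The variance floor and the use of log space are assumptions of this sketch. They guard against zero variance and numeric underflow, and whether either relates to the failure reported above is not known.

```python
import math
from typing import List, Sequence, Tuple


def fit_priors(observations: Sequence[Sequence[float]],
               var_floor: float = 1e-9) -> Tuple[List[float], List[float]]:
    """Per-variable mean and (population) variance of the observations.
    Variances are floored so that a constant variable stays usable."""
    n = len(observations)
    dims = len(observations[0])
    means = [sum(o[k] for o in observations) / n for k in range(dims)]
    variances = [
        max(sum((o[k] - means[k]) ** 2 for o in observations) / n, var_floor)
        for k in range(dims)
    ]
    return means, variances


def log_likelihood(observations: Sequence[Sequence[float]],
                   means: Sequence[float],
                   variances: Sequence[float]) -> float:
    """Sum of diagonal-Gaussian log densities over all observations.
    Summing logs avoids multiplying many very small densities together."""
    total = 0.0
    for o in observations:
        for x, m, v in zip(o, means, variances):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total
```

With eight resource variables on very different scales (compare the CPU and LAN columns in the RMS table), standardising each variable before fitting may also be worth trying; that step is not shown here.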

Page 36: Magpie :  Distributed Profiling for Performance Analysis

Ongoing Work

• Investigating better ways of extracting models from the data, esp. machine learning

• Transfer Magpie ideas/tools to the Indy team

• Use Magpie to learn parameters in the “live” system in order to calibrate processor, memory system and cache models (more speculative)

• Exploring other types of distributed system, e.g. GXA / Web Services v2 async messaging