multiprocessor kernel performance profiling

41
March 12, 2001 Kperfmon-MP Multiprocessor Kernel Performance Profiling Alex Mirgorodskii [email protected] Computer Sciences Department University of Wisconsin 1210 W. Dayton Street Madison, WI 53706-1685 USA

Upload: avedis

Post on 07-Feb-2016

34 views

Category:

Documents


0 download

DESCRIPTION

Multiprocessor Kernel Performance Profiling. Alex Mirgorodskii [email protected] Computer Sciences Department University of Wisconsin 1210 W. Dayton Street Madison, WI 53706-1685 USA. Kperfmon: Overview. Specify a resource Almost any function or basic block in the kernel - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Multiprocessor Kernel Performance Profiling

March 12, 2001Kperfmon-MP

Multiprocessor Kernel Performance Profiling

Alex [email protected]

Computer Sciences Department

University of Wisconsin

1210 W. Dayton Street

Madison, WI 53706-1685

USA

Page 2: Multiprocessor Kernel Performance Profiling

– 2 –Kperfmon-MP March 12, 2001

Kperfmon: Overview

• Specify a resource – Almost any function or basic block in the kernel

• Apply a metric to the resource:– Number of entries to a function or basic block– Wall clock time, CPU time (virtual time)– All Sparc Hardware Counters: cache misses,

branch mispredictions, instructions per cycle, ...

• Visualize the metric data in real time

Page 3: Multiprocessor Kernel Performance Profiling

– 3 –Kperfmon-MP March 12, 2001

Kperfmon-MP: Goals

Modify uniprocessor Kperfmon to provide:

• Safe operation on SMP machines– Thread safety– Migration safety

• New feature: Per-CPU performance data– More detailed performance data– Reduce cache coherence traffic caused by the tool

Page 4: Multiprocessor Kernel Performance Profiling

– 4 –Kperfmon-MP March 12, 2001

Kperfmon: Technology

• No need for kernel recompilation– Works with stock SPARC Solaris 7 kernels– Supports both 32-bit and 64-bit kernels

• No need for rebooting– Important for 24 x 7 systems

• Use the KernInst framework to:– Insert measurement code in the kernel at run time– Sample accumulated metric values from the user

space periodically

Page 5: Multiprocessor Kernel Performance Profiling

– 5 –Kperfmon-MP March 12, 2001

Patch HeapPatch Heap Data HeapData Heap

Kernel SpaceKernel Space

InstrumentationrequestInstrumentationrequest

ioctl()ioctl()

/dev/kerninst/dev/kerninst

KperfmonKperfmonKperfmonKperfmon

Kperfmon System

KerninstdKerninstdKerninstdKerninstd

SamplingrequestSamplingrequest

VisisVisisVisisVisis

Page 6: Multiprocessor Kernel Performance Profiling

– 6 –Kperfmon-MP March 12, 2001

Kperfmon instrumentation

• Counter primitive– Number of entries to a function or a basic block

• Wall clock timer primitive– Real time spent in a function

• CPU timer primitive– Excludes time while the thread was switched-out– Can count more than just timer ticks

• All HW-counter metrics use this mechanism

Page 7: Multiprocessor Kernel Performance Profiling

– 7 –Kperfmon-MP March 12, 2001

tcp_lookup()

cnt

sethi hi(&cnt), r0 or r0, lo(&cnt), r0 ldx [r0], r1retry: add r1, 1, r2 casx [r0], r1, r2 cmp r1, r2 bne retry mov r2, r1 nop ba,a tcp_lookup+4

Code Patch AreaCode Patch Area

Data AreaData Area(entry)

Non-MP Counter primitive

• Atomic, thread-safe update

• Lightweight

• No register save/restore required

Relocated instruction

Page 8: Multiprocessor Kernel Performance Profiling

– 8 –Kperfmon-MP March 12, 2001

tcp_lookup()

stop timer

start timer

Code Patch AreaCode Patch Area

Data AreaData Area(entry)

(exit)

Non-MP Wall clock timer primitive

• Inclusive (includes time in callees)

• Keeps accumulating if switched-out

timer

Page 9: Multiprocessor Kernel Performance Profiling

– 9 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

stop timer

start timer

pause timer

restart timer

context switchcontext switch List of paused timershead

free free

Page 10: Multiprocessor Kernel Performance Profiling

– 10 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switch List of paused timershead

free free

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 11: Multiprocessor Kernel Performance Profiling

– 11 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switch List of paused timershead

free free

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 12: Multiprocessor Kernel Performance Profiling

– 12 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switch List of paused timershead

freetmr

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 13: Multiprocessor Kernel Performance Profiling

– 13 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switchhead

free

List of paused timers

tmr

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 14: Multiprocessor Kernel Performance Profiling

– 14 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switchhead

free free

List of paused timers

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 15: Multiprocessor Kernel Performance Profiling

– 15 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switchhead

free free

List of paused timers

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 16: Multiprocessor Kernel Performance Profiling

– 16 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

pause timer

restart timer

context switchcontext switchhead

free free

List of paused timers

Non-MP CPU timer primitive

• Exclude the time spent while switched out

– Instrument context switch routines

• HW counter metrics are based on this mechanism

Page 17: Multiprocessor Kernel Performance Profiling

– 17 –Kperfmon-MP March 12, 2001

Kperfmon-MP: Goals

Modify uniprocessor Kperfmon to provide:

• Safe operation on SMP machines– Thread safety– Migration safety

• New feature: Per-CPU performance data– More detailed performance data– Reduce cache coherence traffic caused by the tool

Page 18: Multiprocessor Kernel Performance Profiling

– 18 –Kperfmon-MP March 12, 2001

ld [head], R1

add R1, 4, R1

st R1, [head]

Non-MP timer allocation routine

Thread Safety

• Used on switch-out to save the paused timers

• Context switch is serial on uniprocessors– No thread safety problems there

• Context switches may be concurrent on SMPs!– Multiple threads are being scheduled simultaneously– The allocation code is no longer safe

head

freetmr free free

Page 19: Multiprocessor Kernel Performance Profiling

– 19 –Kperfmon-MP March 12, 2001

MP timer allocation routine

Thread Safety

• Context switches may be concurrent on SMPs

• Use the atomic cas instruction to ensure safety

alloc:

ld [head], R1

add R1, 4, R2

cas [head], R1, R2

cmp R1, R2

bne alloc

head

freetmr free free

Page 20: Multiprocessor Kernel Performance Profiling

– 20 –Kperfmon-MP March 12, 2001

tcp_lookup()tcp_lookup()cnt-cpu0…

rd cpu#, r0ldx cnt[r0], r1add r1, 1, r2casx r2, cnt[r0]…

Code Patch AreaCode Patch Area Data AreaData Area

(entry)

Per-CPU performance data

• Instrumentation code is shared by all CPUs

• Per-CPU copies of the primitive’s data– Two copies are never placed in the same cache line

cnt-cpu1

cnt-cpu31

Page 21: Multiprocessor Kernel Performance Profiling

– 21 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

Page 22: Multiprocessor Kernel Performance Profiling

– 22 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

Page 23: Multiprocessor Kernel Performance Profiling

– 23 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

Page 24: Multiprocessor Kernel Performance Profiling

– 24 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

CPU1

Page 25: Multiprocessor Kernel Performance Profiling

– 25 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

CPU1

Page 26: Multiprocessor Kernel Performance Profiling

– 26 –Kperfmon-MP March 12, 2001

timer-cpu0

Data AreaData Area

timer-cpu1

Migration Between Primitives

• Wall timer started on CPU0, stopped on CPU1

• Counters and CPU timers are not affected

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

CPU1

Page 27: Multiprocessor Kernel Performance Profiling

– 27 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

tcp_lookup()tcp_lookup()

(entry)

(exit)

start timerCPU0

timer-cpu0

timer-cpu1

Page 28: Multiprocessor Kernel Performance Profiling

– 28 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

tcp_lookup()tcp_lookup()

(entry)

(exit)

start timerCPU0

timer-cpu0

timer-cpu1

Page 29: Multiprocessor Kernel Performance Profiling

– 29 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

tcp_lookup()tcp_lookup()

switch-out

(entry)

(exit)

start timer

context switchcontext switch

CPU0

timer-cpu0

timer-cpu1

pause timerrecord curr.

time

Page 30: Multiprocessor Kernel Performance Profiling

– 30 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

timer-cpu1

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

start timer

context switchcontext switch

CPU0

CPU1

timer-cpu0

pause timerrecord curr.

time

add timeswitched-outrestart timer

Page 31: Multiprocessor Kernel Performance Profiling

– 31 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

timer-cpu1

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

CPU1

timer-cpu0

add timeswitched-outrestart timer

pause timerrecord curr.

time

Page 32: Multiprocessor Kernel Performance Profiling

– 32 –Kperfmon-MP March 12, 2001

Solution: virtualization

• Implement wall timers on top of CPU timers!

Data AreaData Area

timer-cpu1

tcp_lookup()tcp_lookup()

switch-out

switch-in

(entry)

(exit)

stop timer

start timer

context switchcontext switch

CPU0

CPU1

timer-cpu0

add timeswitched-outrestart timer

pause timerrecord curr.

time

Page 33: Multiprocessor Kernel Performance Profiling
Page 34: Multiprocessor Kernel Performance Profiling
Page 35: Multiprocessor Kernel Performance Profiling
Page 36: Multiprocessor Kernel Performance Profiling
Page 37: Multiprocessor Kernel Performance Profiling
Page 38: Multiprocessor Kernel Performance Profiling

– 38 –Kperfmon-MP March 12, 2001

Conclusion

• Techniques for correct MP profiling:– Atomic memory updates to ensure thread safety– Virtualized timers to handle thread migration

• Per-CPU data collection is important– Provides detailed performance information– Introduces fewer coherence cache misses

Page 39: Multiprocessor Kernel Performance Profiling

– 39 –Kperfmon-MP March 12, 2001

Future Work

• New metrics– Locality of CPU assignments– Per-thread performance data

• Formal verification of instrumentation code for migration/preemption problems

• Ports to other architectures and OS’es

Page 40: Multiprocessor Kernel Performance Profiling

– 40 –Kperfmon-MP March 12, 2001

http://www.cs.wisc.edu/paradynhttp://www.cs.wisc.edu/paradynhttp://www.cs.wisc.edu/paradynhttp://www.cs.wisc.edu/paradyn

The Big Picture

Page 41: Multiprocessor Kernel Performance Profiling

– 41 –Kperfmon-MP March 12, 2001

The Big Picture

• Demo: Wednesday, Room 6372Demo: Wednesday, Room 6372

• Available for download on requestAvailable for download on request– mailto: [email protected]: [email protected]

– Public release in AprilPublic release in April