MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu


Page 1: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

1

MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Page 2: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

2

Main Memory Interference is a Problem

[Diagram: four cores sharing a single main memory]

Page 3: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

3

Unpredictable Application Slowdowns

[Charts: slowdowns of leslie3d (core 0) and gcc (core 1) running together, and of leslie3d (core 0) and mcf (core 1) running together]

An application's performance depends on which application it is running with.

Page 4: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

4

Need for Predictable Performance

There is a need for predictable performance
- When multiple applications share resources
- Especially if some applications require performance guarantees

Example 1: In mobile systems
- Interactive applications run with non-interactive applications
- Need to guarantee performance for interactive applications

Example 2: In server systems
- Different users' jobs are consolidated onto the same server
- Need to provide bounded slowdowns to critical jobs

Our Goal: Predictable performance in the presence of memory interference

Page 5: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

5

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown

Page 6: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

6

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 7: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

7

Slowdown: Definition

Slowdown = Performance_Alone / Performance_Shared

Page 8: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

8

Key Observation 1

For a memory-bound application, Performance ∝ Memory request service rate

[Plot: normalized performance vs. normalized request service rate for omnetpp, mcf, and astar; measured on an Intel Core i7 with 4 cores and 8.5 GB/s memory bandwidth]

Slowdown = Performance_Alone / Performance_Shared   (harder to measure directly)
Slowdown = RSR_Alone / RSR_Shared   (easier to measure)

Page 9: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

9

Key Observation 2

Request Service Rate Alone (RSR_Alone) of an application can be estimated by giving the application the highest priority in accessing memory.

Highest priority: little interference (almost as if the application were run alone)

Page 10: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

10

Key Observation 2

[Illustration: request buffer state, service order, and time units for three cases: 1. the application run alone, 2. run with another application, 3. run with another application but given highest priority]

Page 11: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

11

Memory Interference-induced Slowdown Estimation (MISE) model for memory-bound applications:

Slowdown = RSR_Alone / RSR_Shared
(RSR = Request Service Rate)

Page 12: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

12

Key Observation 3

Memory-bound application

[Timelines: compute phase and memory phase (requests), with no interference and with interference]

Memory phase slowdown dominates overall slowdown.

Page 13: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

13

Key Observation 3

Non-memory-bound application

[Timelines: compute phase and memory phase, with no interference and with interference]

Only the memory fraction (α) slows down with interference.

Memory Interference-induced Slowdown Estimation (MISE) model for non-memory-bound applications:

Slowdown = (1 - α) + α × (RSR_Alone / RSR_Shared)
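As a quick illustration (not part of the original slides), the MISE formula can be written as a small Python function; the names are hypothetical, and α = 1 recovers the memory-bound case:

```python
def mise_slowdown(rsr_alone, rsr_shared, alpha):
    """MISE slowdown estimate from request service rates.

    rsr_alone  -- estimated request service rate when run alone
    rsr_shared -- measured request service rate when sharing memory
    alpha      -- fraction of execution spent in the memory phase (0..1)
    """
    # Only the memory fraction slows down; with alpha = 1 this reduces to
    # the memory-bound case, Slowdown = RSR_Alone / RSR_Shared.
    return (1 - alpha) + alpha * (rsr_alone / rsr_shared)

# Example: 70% memory phase, request service rate halved by interference.
print(mise_slowdown(rsr_alone=1.0, rsr_shared=0.5, alpha=0.7))  # 1.7
```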

Page 14: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

14

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 15: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

15

Interval Based Operation

[Timeline: execution is divided into intervals; during each interval, measure RSR_Shared and estimate RSR_Alone; at the end of each interval, estimate slowdown]

Page 16: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

16

Measuring RSR_Shared and α

Request Service Rate Shared (RSR_Shared)
- Per-core counter to track the number of requests serviced
- At the end of each interval, measure:
  RSR_Shared = Number of Requests Serviced / Interval Length

Memory Phase Fraction (α)
- Count the number of stall cycles at the core
- Compute the fraction of cycles stalled for memory
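A minimal sketch of this per-core bookkeeping (Python, hypothetical names); a real implementation would use hardware counters in the core and memory controller:

```python
class CoreCounters:
    """Per-core counters for one interval (hypothetical sketch)."""

    def __init__(self):
        self.requests_serviced = 0  # requests completed for this core
        self.stall_cycles = 0       # cycles the core stalled waiting for memory
        self.total_cycles = 0       # interval length so far

    def tick(self, stalled_for_memory, request_completed):
        # Invoked once per cycle in this sketch.
        self.total_cycles += 1
        if stalled_for_memory:
            self.stall_cycles += 1
        if request_completed:
            self.requests_serviced += 1

    def end_of_interval(self):
        if self.total_cycles == 0:
            return 0.0, 0.0
        rsr_shared = self.requests_serviced / self.total_cycles  # RSR_Shared
        alpha = self.stall_cycles / self.total_cycles            # memory phase fraction
        self.__init__()  # reset for the next interval
        return rsr_shared, alpha
```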

Page 17: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

17

Estimating Request Service Rate Alone (RSR_Alone)

Goal: Estimate RSR_Alone
How: Periodically give each application the highest priority in accessing memory

- Divide each interval into shorter epochs
- At the beginning of each epoch, the memory controller randomly picks an application as the highest priority application
- At the end of an interval, for each application, estimate:

RSR_Alone = Number of Requests During High Priority Epochs / Number of Cycles Application Given High Priority
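The epoch-based estimation could look roughly like the following sketch; `memory_controller` and its methods are hypothetical stand-ins, and the interference-cycle correction discussed on the next slides is omitted here:

```python
import random

def run_interval(applications, epochs_per_interval, epoch_cycles, memory_controller):
    """Estimate RSR_Alone per application over one interval (sketch)."""
    hp_requests = {app: 0 for app in applications}  # requests served while highest priority
    hp_cycles = {app: 0 for app in applications}    # cycles spent as highest priority

    for _ in range(epochs_per_interval):
        # The memory controller randomly picks the highest-priority application.
        hp_app = random.choice(applications)
        memory_controller.set_highest_priority(hp_app)       # hypothetical call
        served = memory_controller.run_cycles(epoch_cycles)  # hypothetical: requests served per app
        hp_requests[hp_app] += served[hp_app]
        hp_cycles[hp_app] += epoch_cycles

    # End of interval: RSR_Alone ~ high-priority requests / high-priority cycles.
    return {app: hp_requests[app] / hp_cycles[app]
            for app in applications if hp_cycles[app] > 0}
```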

Page 18: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

18

Inaccuracy in Estimating RSR_Alone

When an application has the highest priority, it still experiences some interference.

[Illustration: request buffer state, service order, and time units when the application has the highest priority, showing the resulting interference cycles]

Page 19: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

19

Accounting for Interference in RSR_Alone Estimation

Solution: determine interference cycles and remove them from the RSR_Alone calculation.

A cycle is an interference cycle if a request from the highest priority application is waiting in the request buffer and another application's request was issued previously.

RSR_Alone = Number of Requests During High Priority Epochs / (Number of Cycles Application Given High Priority - Interference Cycles)
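A small sketch of the corrected estimate and the interference-cycle condition above (Python, hypothetical names):

```python
def rsr_alone_estimate(hp_requests, hp_cycles, interference_cycles):
    """RSR_Alone with the interference-cycle correction applied."""
    effective_cycles = hp_cycles - interference_cycles
    return hp_requests / effective_cycles if effective_cycles > 0 else 0.0

def is_interference_cycle(hp_request_waiting, other_request_in_service):
    # A cycle counts as interference if the highest-priority application's
    # request waits in the request buffer while a previously issued request
    # from another application is still being serviced.
    return hp_request_waiting and other_request_in_service
```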

Page 20: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

20

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 21: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

21

MISE Model: Putting it All Together

[Timeline: execution divided into intervals; during each interval, measure RSR_Shared and estimate RSR_Alone; at the end of each interval, estimate slowdown]
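Putting the pieces together, one interval of the estimation loop might look like this sketch, reusing the hypothetical helpers from the earlier sketches and assuming the per-core counters tick while the epochs run:

```python
def estimate_slowdowns_for_interval(applications, counters, memory_controller,
                                    epochs_per_interval, epoch_cycles):
    """One MISE interval: run the epochs, then apply the model per application."""
    # Runs the interval epoch by epoch; assumes each CoreCounters object in
    # `counters` is ticked every cycle while the epochs execute.
    rsr_alone = run_interval(applications, epochs_per_interval,
                             epoch_cycles, memory_controller)
    slowdowns = {}
    for app in applications:
        rsr_shared, alpha = counters[app].end_of_interval()
        if app in rsr_alone and rsr_shared > 0:
            slowdowns[app] = mise_slowdown(rsr_alone[app], rsr_shared, alpha)
    return slowdowns
```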

Page 22: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

22

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 23: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

23

Previous Work on Slowdown Estimation

- STFM (Stall Time Fair Memory) Scheduling [Mutlu+, MICRO '07]
- FST (Fairness via Source Throttling) [Ebrahimi+, ASPLOS '10]
- Per-thread Cycle Accounting [Du Bois+, HiPEAC '13]

Basic Idea:

Slowdown = Stall Time_Shared / Stall Time_Alone

Measuring Stall Time_Shared is easy; estimating Stall Time_Alone is hard: count the number of cycles the application receives interference.

Page 24: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

24

Two Major Advantages of MISE Over STFM

Advantage 1:
- STFM estimates alone performance while an application is receiving interference (hard)
- MISE estimates alone performance while giving an application the highest priority (easier)

Advantage 2:
- STFM does not take the compute phase into account for non-memory-bound applications
- MISE accounts for the compute phase (better accuracy)

Page 25: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

25

Methodology

Configuration of our simulated system
- 4 cores
- 1 channel, 8 banks/channel
- DDR3-1066 DRAM
- 512 KB private cache per core

Workloads
- SPEC CPU2006
- 300 multiprogrammed workloads

Page 26: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

26

Quantitative Comparison

[Plot: slowdown of the SPEC CPU2006 application leslie3d over time (million cycles); actual slowdown vs. the STFM and MISE estimates]

Page 27: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

27

Comparison to STFM

[Plots: actual slowdown vs. the STFM and MISE estimates over time for cactusADM, GemsFDTD, soplex, wrf, calculix, and povray]

Average error of MISE: 8.2%
Average error of STFM: 29.4%
(across 300 workloads)

Page 28: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

28

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 29: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

29

Providing “Soft” Slowdown Guarantees

Goal
1. Ensure QoS-critical applications meet a prescribed slowdown bound
2. Maximize system performance for other applications

Basic Idea
- Allocate just enough bandwidth to the QoS-critical application
- Assign the remaining bandwidth to the other applications

Page 30: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

30

MISE-QoS: Mechanism to Provide Soft QoS

- Assign an initial bandwidth allocation to the QoS-critical application
- Estimate the slowdown of the QoS-critical application using the MISE model
- After every N intervals:
  - If slowdown > bound B ± ε, increase the bandwidth allocation
  - If slowdown < bound B ± ε, decrease the bandwidth allocation
- When the slowdown bound is not met for N intervals, notify the OS so it can migrate/de-schedule jobs
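A sketch of this control loop (Python, hypothetical names), under the reading that ε defines a tolerance band around the bound B and that the allocation step size is a tunable parameter:

```python
def mise_qos_step(slowdown, bound_b, epsilon, allocation, step,
                  intervals_unmet, n_intervals, notify_os):
    """One adjustment of the QoS-critical application's bandwidth allocation,
    performed after every N intervals (hypothetical sketch)."""
    if slowdown > bound_b + epsilon:
        allocation = min(1.0, allocation + step)  # bound missed: give more bandwidth
        intervals_unmet += 1
    else:
        intervals_unmet = 0
        if slowdown < bound_b - epsilon:
            allocation = max(0.0, allocation - step)  # comfortably met: reclaim bandwidth

    # If the bound keeps being missed, escalate to the OS so it can
    # migrate or de-schedule jobs.
    if intervals_unmet >= n_intervals:
        notify_os()

    return allocation, intervals_unmet
```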

Page 31: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

31

Methodology

- Each application (25 applications in total) considered as the QoS-critical application
- Run with 12 sets of co-runners of different memory intensities
- Total of 300 multiprogrammed workloads
- Each workload run with 10 slowdown bound values
- Baseline memory scheduling mechanism:
  - Always prioritize the QoS-critical application [Iyer+, SIGMETRICS 2007]
  - Other applications' requests scheduled in FR-FCFS order [Zuravleff+, US Patent 1997; Rixner+, ISCA 2000]

Page 32: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

32

A Look at One Workload

[Chart: slowdowns of leslie3d, hmmer, lbm, and omnetpp (QoS-critical and non-QoS-critical) under AlwaysPrioritize, MISE-QoS-10/1, and MISE-QoS-10/3; slowdown bounds of 10, 3.33, and 2]

MISE is effective in
1. meeting the slowdown bound for the QoS-critical application
2. improving performance of the non-QoS-critical applications

Page 33: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

33

Effectiveness of MISE in Enforcing QoS

Across 3000 data points:

                      | Predicted Met | Predicted Not Met
QoS Bound Met         | 78.8%         | 2.1%
QoS Bound Not Met     | 2.2%          | 16.9%

MISE-QoS meets the bound for 80.9% of workloads.
AlwaysPrioritize meets the bound for 83% of workloads.
MISE-QoS correctly predicts whether or not the bound is met for 95.7% of workloads.

Page 34: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

34

Performance of Non-QoS-Critical Applications

[Chart: harmonic speedup of the non-QoS-critical applications vs. number of memory-intensive applications (0, 1, 2, 3, Avg) for AlwaysPrioritize and MISE-QoS-10/1, 10/3, 10/5, 10/7, 10/9]

Higher performance when the bound is loose. When the slowdown bound is 10/3, MISE-QoS improves system performance by 10%.

Page 35: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

35

Outline

1. Estimate Slowdown
   - Key Observations
   - Implementation
   - MISE Model: Putting it All Together
   - Evaluating the Model
2. Control Slowdown
   - Providing Soft Slowdown Guarantees
   - Minimizing Maximum Slowdown

Page 36: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

36

Other Results in the Paper

- Sensitivity to model parameters
  - Robust across different values of the model parameters
- Comparison of the STFM and MISE models in enforcing soft slowdown guarantees
  - MISE is significantly more effective in enforcing guarantees
- Minimizing maximum slowdown
  - MISE improves fairness across several system configurations

Page 37: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

37

Summary

- Uncontrolled memory interference slows down applications unpredictably
- Goal: estimate and control slowdowns
- Key contribution
  - MISE: an accurate slowdown estimation model
  - Average error of MISE: 8.2%
- Key idea
  - Request service rate is a proxy for performance
  - Request Service Rate Alone is estimated by giving an application the highest priority in accessing memory
- Leverage slowdown estimates to control slowdowns
  - Providing soft slowdown guarantees
  - Minimizing maximum slowdown

Page 38: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

38

Thank You

Page 39: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

39

MISE: Providing Performance Predictability in Shared Main Memory Systems
Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu

Page 40: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

40

Backup Slides

Page 41: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Case Study with Two QoS-Critical Applications

Two comparison points:
- Always prioritize both applications
- Prioritize each application 50% of the time

41

[Chart: slowdowns for the application pairs (astar, mcf) and (leslie3d, mcf) under AlwaysPrioritize, EqualBandwidth, and MISE-QoS-10/1 through MISE-QoS-10/5]

MISE-QoS can achieve a lower slowdown bound for both applications.

MISE-QoS provides much lower slowdowns for the non-QoS-critical applications.

Page 42: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Minimizing Maximum Slowdown

Goal
- Minimize the maximum slowdown experienced by any application

Basic Idea
- Assign more memory bandwidth to the more slowed-down application

Page 43: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Mechanism

Memory controller tracks
- Slowdown bound B
- Bandwidth allocation of all applications

Different components of the mechanism
- Bandwidth redistribution policy
- Modifying the target bound
- Communicating the target bound to the OS periodically

Page 44: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Bandwidth Redistribution

At the end of each interval:
- Group applications into two clusters
  - Cluster 1: applications that meet the bound
  - Cluster 2: applications that do not meet the bound
- Steal a small amount of bandwidth from each application in cluster 1 and allocate it to the applications in cluster 2
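A sketch of the redistribution step (Python, hypothetical names; the amount stolen per application is a tunable parameter not specified on the slide):

```python
def redistribute_bandwidth(allocations, slowdowns, bound_b, steal_fraction=0.01):
    """Move a small slice of bandwidth from applications meeting the bound
    (cluster 1) to applications missing it (cluster 2)."""
    cluster1 = [app for app in allocations if slowdowns[app] <= bound_b]
    cluster2 = [app for app in allocations if slowdowns[app] > bound_b]
    if not cluster1 or not cluster2:
        return allocations

    stolen = 0.0
    for app in cluster1:
        portion = allocations[app] * steal_fraction
        allocations[app] -= portion
        stolen += portion

    # Spread the reclaimed bandwidth evenly over the applications missing the bound.
    share = stolen / len(cluster2)
    for app in cluster2:
        allocations[app] += share
    return allocations
```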

Page 45: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Modifying Target Bound

If bound B is met for the past N intervals
- The bound can be made more aggressive
- Set the bound higher than the slowdown of the most slowed-down application

If bound B is not met for the past N intervals by more than half the applications
- The bound should be more relaxed
- Set the bound to the slowdown of the most slowed-down application
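A sketch of this bound-adjustment policy (Python, hypothetical names; the margin used when tightening the bound is an assumption):

```python
def adjust_bound(bound_b, slowdowns, bound_met_last_n, majority_unmet_last_n,
                 margin=1.05):
    """Tighten or relax the target slowdown bound (hypothetical sketch).

    bound_met_last_n      -- True if bound B was met for the past N intervals
    majority_unmet_last_n -- True if more than half the applications missed
                             the bound for the past N intervals
    """
    max_slowdown = max(slowdowns.values())
    if bound_met_last_n:
        # More aggressive: set the bound just above the worst current slowdown.
        return max_slowdown * margin
    if majority_unmet_last_n:
        # More relaxed: set the bound to the worst current slowdown.
        return max_slowdown
    return bound_b
```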

Page 46: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

46

Results: Harmonic Speedup

[Chart: harmonic speedup for 4, 8, and 16 cores under FRFCFS, ATLAS, TCM, STFM, and MISE-Fair]

Page 47: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

47

Results: Maximum Slowdown

[Chart: maximum slowdown vs. core count (4, 8, 16) under FRFCFS, ATLAS, TCM, STFM, and MISE-Fair]

Page 48: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

Sensitivity to Memory Intensity

[Chart: maximum slowdown vs. workload memory intensity (0, 25, 50, 75, 100, Avg) under FRFCFS, ATLAS, TCM, STFM, and MISE-Fair]

Page 49: MISE:  Providing Performance Predictability  in Shared Main Memory Systems

49

MISE's Implementation Cost

1. Per-core counters worth 20 bytes
   - Request Service Rate Shared
   - Request Service Rate Alone
     - 1 counter for the number of high priority epoch requests
     - 1 counter for the number of high priority epoch cycles
     - 1 counter for interference cycles
   - Memory phase fraction (α)
2. Register for the current bandwidth allocation: 4 bytes
3. Logic for prioritizing an application in each epoch
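For illustration only, the per-core state could be grouped as below (Python sketch); the five-counter split at 4 bytes each is an assumption consistent with the 20-byte figure above:

```python
from dataclasses import dataclass

@dataclass
class PerCoreMiseCounters:
    """Per-core MISE state; five 4-byte counters would give the 20-byte
    budget quoted above (the exact widths are an assumption)."""
    requests_serviced: int = 0    # for RSR_Shared
    hp_epoch_requests: int = 0    # requests served while highest priority
    hp_epoch_cycles: int = 0      # cycles spent as highest priority
    interference_cycles: int = 0  # subtracted in the RSR_Alone estimate
    stall_cycles: int = 0         # for the memory phase fraction (alpha)
```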