Two Ways to Exploit Multi-Megabyte Caches

AENAO Research Group @ Toronto

Kaveh Aasaraai

Ioana Burcea

Myrto Papadopoulou

Elham Safi

Jason Zebchuk

Andreas Moshovos

{aasaraai, ioana, myrto, elham, zebchuk, moshovos}@eecg.toronto.edu


EPFL, Jan. 2008, Aenao Group/Toronto

Future Caches: Just Larger?

1. “Big Picture” Management

2. Store Metadata

[Figure: three CPUs, each with an I$ and a D$, connected through an interconnect to main memory; cache capacity of 10s – 100s of MB]


Conventional Block Centric Cache

“Small” blocks optimize bandwidth and performance

Large L2/L3 caches especially

Fine-Grain View of Memory

L2 Cache

Big Picture Lost


“Big Picture” View

Region: 2^n-sized, aligned area of memory

Patterns and behavior exposed

Spatial locality

Exploit for performance/area/power

Coarse-Grain View of Memory

L2 Cache
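Because a region is a power-of-two-sized, aligned area, the region that holds an address can be found with plain bit arithmetic. The sketch below is illustrative only: the 1KB region and 64-byte block sizes are borrowed from the example used later in the talk, and the function names are hypothetical.

```python
REGION_BYTES = 1024   # assumed: 1KB regions, as in the talk's later example
BLOCK_BYTES = 64      # assumed: 64-byte cache blocks


def region_base(addr: int) -> int:
    """Base address of the aligned, 2^n-sized region that contains addr."""
    return addr & ~(REGION_BYTES - 1)


def block_in_region(addr: int) -> int:
    """Index of addr's block inside its region (0..15 for these sizes)."""
    return (addr % REGION_BYTES) // BLOCK_BYTES


def same_region(a: int, b: int) -> bool:
    """True when both addresses fall in the same region."""
    return region_base(a) == region_base(b)
```

Two addresses that differ only in their low 10 bits share a region, which is what lets a single region entry summarize up to 16 blocks.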


Exploiting Coarse-Grain Patterns

Many existing coarse-grain optimizations:

Stealth Prefetching

Run-time Adaptive Cache Hierarchy Management via Reference Analysis

Destination-Set Prediction

Spatial Memory Streaming

Coarse-Grain Coherence Tracking

RegionScout

Circuit-Switched Coherence

Add new structures to track coarse-grain information

[Figure: CPU and L2 cache surrounded by the optimizations listed above]

Hard to justify for a commercial design

Coarse-Grain Framework

Embed coarse-grain information in tag array

Support many different optimizations with less area overhead

Adaptable optimization FRAMEWORK


RegionTracker Solution

Manage blocks, but also track and manage regions

[Figure: L2 cache with a tag array and a data array, serving four L1 caches. Labels: Block Requests, Data Blocks, RegionTracker, Region Probes, Region Responses]


RegionTracker Summary

Replace conventional tag array:

4-core CMP with 8MB shared L2 cache

Within 1% of original performance

Up to 20% less tag area

Average 33% less energy consumption

Optimization Framework:

Stealth Prefetching: same performance, 36% less area

RegionScout: 2x more snoops avoided, no area overhead


Road Map

Introduction

Goals

Coarse-Grain Cache Designs

RegionTracker: A Tag Array Replacement

RegionTracker: An Optimization Framework

Conclusion


Goals

1. Conventional Tag Array Functionality

Identify data block location and state

Leave data array unchanged

2. Optimization Framework Functionality

Is Region X cached?

Which blocks of Region X are cached? Where?

Evict or migrate Region X

Easy to assign properties to each Region


Coarse-Grain Cache Designs

Large Block Size

Increased BW, decreased hit-rates

[Figure: Region X in the tag array and data array]


Sector Cache

Decreased hit-rates

[Figure: Region X in the tag array and data array]


Sector Pool Cache

High Associativity (2 - 4 times)

[Figure: Region X in the tag array and data array]


Decoupled Sector Cache

Region information not exposed

Region replacement requires scanning multiple entries

[Figure: Region X in the tag array, status table, and data array]


Design Requirements

Small block size (64B)

Miss-rate does not increase

Lookup associativity does not increase

No additional access latency (i.e., no scanning, no multiple block evictions)

Does not increase latency, area, or energy

Allows banking and interleaving

Fit in conventional tag array “envelope”


RegionTracker: A Tag Array Replacement

3 SRAM arrays, combined smaller than tag array:

Region Vector Array

Block Status Table

Evicted Region Buffer

[Figure: four L1 caches and the L2 data array, with the three arrays in place of the tag array]


Basic Structures

Region Vector Array (RVA) entry: Region Tag, then for each of block0 … block15 a V bit (1 bit) and a way (4 bits)

Block Status Table (BST) entry: status (3 bits) and a 2-bit field

Address: specific RVA set and BST set

RVA entry: multiple, consecutive BST sets

BST entry: one of four RVA sets

Ex: 8MB, 16-way set-associative cache, 64-byte blocks, 1KB region
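The example configuration fixes most of the field widths by simple arithmetic. The sketch below works them out; it assumes the BST has one set per data-array set (a conventional organization that the slide does not state explicitly), and the function and key names are hypothetical.

```python
CACHE_BYTES = 8 * 1024 * 1024   # 8MB cache
WAYS = 16                       # 16-way set-associative
BLOCK_BYTES = 64                # 64-byte blocks
REGION_BYTES = 1024             # 1KB regions


def log2_exact(n: int) -> int:
    """log2 of a power of two; fails loudly otherwise."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def geometry() -> dict:
    blocks = CACHE_BYTES // BLOCK_BYTES
    bst_sets = blocks // WAYS            # assumption: one BST set per data-array set
    blocks_per_region = REGION_BYTES // BLOCK_BYTES
    return {
        "blocks": blocks,
        "bst_sets": bst_sets,
        "bst_index_bits": log2_exact(bst_sets),
        "block_offset_bits": log2_exact(BLOCK_BYTES),
        "blocks_per_region": blocks_per_region,
        "region_offset_bits": log2_exact(blocks_per_region),
        "way_bits": log2_exact(WAYS),
    }
```

Under these assumptions a region holds 16 blocks (block0 to block15) and a way needs 4 bits, which agrees with the per-block fields listed for the RVA entry.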


Common Case: Hit

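Address (bit boundaries 49, 21, 10, 6, 0): Region Tag, RVA Index, Region Offset, Block Offset

Same address as seen by the data array (bit boundaries 19, 6, 0): Data Array + BST Index, Block Offset

[Figure: the RVA Index selects a Region Vector Array (RVA) set; the entry holds a Region Tag and, for each of block0 … block15, a V bit (1 bit) and a way (4 bits). The way and the Data Array + BST Index select a Block Status Table (BST) entry with a 3-bit status and a 2-bit field, and go to the data array]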

Ex: 8MB, 16-way set-associative cache, 64-byte blocks, 1KB region
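The hit path can be mimicked in software to make the two-step lookup concrete. This is a minimal sketch, not the hardware design: field positions are read off the bit boundaries on the slide (21, 10, 6 for the RVA view and 19, 6 for the BST view), dictionaries stand in for the SRAM arrays, the 2-bit BST field is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass, field


def split(addr: int):
    """Split an address into the fields used by the RVA and the BST."""
    block_off = addr & 0x3F             # bits 5..0
    region_off = (addr >> 6) & 0xF      # bits 9..6: block within the region
    rva_index = (addr >> 10) & 0x7FF    # bits 20..10
    region_tag = addr >> 21             # bits 21 and up
    bst_index = (addr >> 6) & 0x1FFF    # bits 18..6: also the data-array set
    return region_tag, rva_index, region_off, bst_index, block_off


@dataclass
class RvaEntry:
    region_tag: int
    valid: list = field(default_factory=lambda: [False] * 16)  # V bit per block
    way: list = field(default_factory=lambda: [0] * 16)        # 4-bit way per block


def lookup(rva: dict, bst: dict, addr: int):
    """Return (bst_index, way, status) on a hit, or None on a miss.

    rva maps an RVA set index to a list of RvaEntry objects;
    bst maps (bst_index, way) to a block status.
    """
    tag, rva_idx, region_off, bst_idx, _ = split(addr)
    for entry in rva.get(rva_idx, []):
        if entry.region_tag == tag and entry.valid[region_off]:
            way = entry.way[region_off]
            return bst_idx, way, bst[(bst_idx, way)]
    return None
```

Only one RVA set and one BST entry are read per access, so the lookup does no scanning.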


Worst Case (Rare): Region Miss

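Address (bit boundaries 49, 21, 10, 6, 0): Region Tag, RVA Index, Region Offset, Block Offset

Same address as seen by the data array (bit boundaries 19, 6, 0): Data Array + BST Index, Block Offset

[Figure: same lookup path as the hit case. Labels: Region Vector Array (RVA) with Region Tag and per-block V and way fields for block0 … block15; Block Status Table (BST) with status (3 bits) and Ptr (2 bits); Evicted Region Buffer (ERB); No Match!; Ptr]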


Methodology

Flexus simulator from CMU SimFlex group

Based on Simics full-system simulator

4-core CMP modeled after Piranha

Private 32KB, 4-way set-associative L1 caches

Shared 8MB, 16-way set-associative L2 cache

64-byte blocks

Miss-rates: Functional simulation of 2 billion instructions per core

Performance and Energy: Timing simulation using SMARTS sampling methodology

Area and Power: Full custom implementation on 130nm commercial technology

9 commercial workloads:

WEB: SpecWEB on Apache and Zeus

OLTP: TPC-C on DB2 and Oracle

DSS: 5 TPC-H queries on DB2

[Figure: four processors (P), each with a D$ and an I$, connected through an interconnect to the shared L2]


Miss-Rates vs. Area

Sector Cache: 512KB sectors, SPC and RT: 1KB regions

Trade-offs comparable to conventional cache

[Figure: relative miss-rate (y-axis, 0.99 to 1.05) vs. relative tag array area (x-axis, 0.5 to 1.2) for Sector Pool Cache, RegionTracker, and Conventional Tags. Point labels: 14-way, 15-way, 52-way, 48-way. Sector Cache lies off the chart at (0.25, 1.26)]


Performance & Energy

12-way set-associative RegionTracker: 20% less area

Error bars: 95% confidence interval

Performance within 1%, with 33% tag energy reduction

[Figure, two panels over WEB, OLTP, and DSS. Performance: normalized execution time (y-axis, 0.97 to 1.03). Energy: reduction in tag energy (y-axis, 0% to 50%)]


Road Map

Introduction

Goals

Coarse-Grain Cache Designs

RegionTracker: A Tag Array Replacement

RegionTracker: An Optimization Framework

Conclusion


RegionTracker: An Optimization Framework

[Figure: four L1 caches and an L2 made of the RVA, ERB, BST, and data array]

Stealth Prefetching: average 20% performance improvement

Drop-in RegionTracker for 36% less area overhead

RegionScout: in-depth analysis


Snoop Coherence: Common Case

[Figure: three CPUs and main memory. One CPU issues Read x, Read x+1, Read x+2, … Read x+n; the snoops in the other CPUs miss]

Many snoops are to non-shared regions


RegionScout

Eliminate broadcasts for non-shared regions

[Figure: three CPUs and main memory. Labels: Read x, Miss, Region Miss, Global Region Miss, Non-Shared Regions, Locally Cached Regions]
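The filtering idea can be sketched as a small software model. It is a simplification under stated assumptions: each node keeps a count of locally cached blocks per region and a set of regions it believes are non-shared, evictions and coherence states are not modeled, every miss is to a block not yet cached locally, and the class and function names are hypothetical.

```python
REGION_BYTES = 1024   # assumed: 1KB regions


def region_of(addr: int) -> int:
    return addr // REGION_BYTES


class Node:
    def __init__(self, name: str):
        self.name = name
        self.cached_regions = {}   # region -> number of locally cached blocks
        self.non_shared = set()    # regions believed to be cached nowhere else

    def snoop(self, region: int) -> bool:
        """A remote request was observed; report whether we cache the region."""
        self.non_shared.discard(region)   # another node now uses this region
        return self.cached_regions.get(region, 0) > 0


def read_miss(requester: Node, others: list, addr: int) -> bool:
    """Handle a local miss and return True if a broadcast was needed."""
    region = region_of(addr)
    broadcast = region not in requester.non_shared
    if broadcast:
        # Build a list so that every node is snooped (no short-circuit).
        shared = [node.snoop(region) for node in others]
        if not any(shared):
            requester.non_shared.add(region)   # global region miss
    requester.cached_regions[region] = requester.cached_regions.get(region, 0) + 1
    return broadcast
```

The first miss to a region pays for a broadcast; if every other node reports a region miss, later misses to the same region go straight to memory until another node touches the region.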


RegionTracker Implementation

Minimal overhead to support RegionScout optimization

Still uses less area than conventional tag array

Non-Shared Regions: add 1 bit to each RVA entry

Locally Cached Regions: already provided by RVA


RegionTracker + RegionScout

4 processors, 512KB L2 Caches

1KB regions

Avoid 41% of snoop broadcasts, no area overhead compared to conventional tag array

[Figure: reduction in snoop broadcasts (y-axis, 0% to 60%) for BlockScout (4KB), RS 7KB, RS 12KB, RS 22KB, and RSRT]


Result Summary

Replace Conventional Tag Array:

20% less tag area

33% less tag energy

Within 1% of original performance

Coarse-Grain Optimization Framework:

36% reduction in area overhead for Stealth Prefetching

Filter 41% of snoop broadcasts with no area overhead compared to conventional cache


Predictor Virtualization

Ioana Burcea

Joint work with

Stephen Somogyi

Babak Falsafi


Predictor Virtualization

Optimization Engines: Predictors

[Figure: multiple CPUs, each with an L1-D and an L1-I, connected through an interconnect to the L2 and main memory]


Motivating Trends

Dedicating resources to predictors hard to justify:

Chip multiprocessors: space dedicated to predictors × #processors

Larger predictor tables: increased performance

Memory hierarchies offer the opportunity:

Increased capacity

How many apps really use the space?

Use conventional memory hierarchies to store predictor information


PV Architecture contd.

[Figure: the Optimization Engine receives a request, sends a request to the Predictor Table, and gets back a prediction]


PV Architecture contd.

[Figure: the same request and prediction interface, with Predictor Virtualization in place of the Predictor Table]


PV Architecture contd.

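[Figure: the Optimization Engine exchanges requests and predictions with the PVProxy. The PVProxy holds the PVCache, an MSHR, and a PVStart register that is added (+) to the index. It is backed by the PVTable, which lives in the L2 and main memory]

On the backside of the L1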
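The proxy arrangement can be sketched as a small cache of predictor-table sets in front of a table kept in ordinary memory. This is a toy model under stated assumptions: the PVCache is modeled as fully associative with LRU replacement, a dictionary stands in for the L2 and main memory, the MSHR is not modeled, each table set occupies one 64-byte block, and all names are hypothetical.

```python
from collections import OrderedDict

BLOCK_BYTES = 64   # assumed: one predictor-table set per 64-byte block


class PVProxy:
    def __init__(self, pv_start: int, memory: dict, pvcache_sets: int = 8):
        self.pv_start = pv_start       # base address of the in-memory table
        self.memory = memory           # stand-in for L2 / main memory
        self.capacity = pvcache_sets   # number of table sets held on chip
        self.pvcache = OrderedDict()   # set index -> contents, in LRU order
        self.hits = 0
        self.misses = 0

    def _address(self, set_index: int) -> int:
        return self.pv_start + set_index * BLOCK_BYTES

    def read(self, set_index: int) -> dict:
        """Return the contents of one table set, fetching it on a miss."""
        if set_index in self.pvcache:
            self.hits += 1
            self.pvcache.move_to_end(set_index)
            return self.pvcache[set_index]
        self.misses += 1
        contents = self.memory.get(self._address(set_index), {})
        if len(self.pvcache) >= self.capacity:
            victim, old = self.pvcache.popitem(last=False)   # evict LRU set
            self.memory[self._address(victim)] = old         # write it back
        self.pvcache[set_index] = contents
        return contents
```

The optimization engine sees the same request and prediction interface either way; only a miss in the small cache turns into a memory request.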


To Virtualize Or Not to Virtualize?

1. Re-Use

2. Predictor Info Prefetching

[Figure: CPU with I$ and D$, interconnect, L2/L3, and main memory. Labels: Common Case, Infrequent]


To Virtualize or Not?

Challenge: hit in the PVCache most of the time

Will not work for all predictors out of the box

Reuse is necessary

Intrinsic: easy to virtualize

Non-intrinsic: must be engineered, more so if the predictor needs to be fast to start with


Will There Be Reuse?

Intrinsic: multiple predictions per entry

We’ll see an example

Can be engineered

Group temporally correlated entries together: cache block

[Figure: CPU with I$ and D$, interconnect, L2/L3, and main memory]


Spatial Memory Streaming

Footprint: Blocks accessed per memory region

Predict next time the footprint will be the same

Handle: PC + offset within region
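A footprint is naturally a bit vector with one bit per block in the region. The sketch below records a footprint and replays it for a later access with the same handle. It is a simplified illustration, not the published prefetcher: the 32-block region matches the spatial group size used later in the talk, generation tracking is left out, and the function names are hypothetical.

```python
BLOCK_BYTES = 64
BLOCKS_PER_REGION = 32     # assumed: matches the 32-block spatial group used later
REGION_BYTES = BLOCK_BYTES * BLOCKS_PER_REGION


def block_offset(addr: int) -> int:
    """Offset, in blocks, of addr within its region."""
    return (addr % REGION_BYTES) // BLOCK_BYTES


def handle(pc: int, addr: int):
    """Prediction handle: PC plus the offset within the region."""
    return (pc, block_offset(addr))


def record_footprint(accesses) -> int:
    """Bit vector of the blocks touched in one region."""
    footprint = 0
    for addr in accesses:
        footprint |= 1 << block_offset(addr)
    return footprint


def predict(history: dict, pc: int, addr: int) -> list:
    """Block addresses to prefetch, or [] if the handle has not been seen."""
    footprint = history.get(handle(pc, addr), 0)
    base = addr - addr % REGION_BYTES
    trigger = block_offset(addr)
    return [base + i * BLOCK_BYTES
            for i in range(BLOCKS_PER_REGION)
            if ((footprint >> i) & 1) and i != trigger]
```

Because the handle does not include the region address, a footprint learned in one region can be applied to a different region touched by the same code.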


Spatial Generations


Virtualizing SMS

[Figure: Detector and Predictor blocks. Labels: trigger access, patterns, prefetches, Virtualize]


Virtualizing SMS

Virtual Table: 1K sets, 11 ways

PVCache: 8 sets, 11 ways

[Figure: cache block layout of packed (tag, pattern) entries; bit offsets 0, 11, 43, 54, 85, …; remaining bits unused]


Packing Entries in One Cache Block

Index: PC + offset within spatial group

PC → 16 bits

32 blocks in a spatial group → 5 bit offset → 32 bit spatial pattern

21 bit index

Pattern table: 1K sets

10 bits to index the table → 11 bit tag

Cache block: 64 bytes

11 entries per cache block → pattern table: 1K sets, 11-way set associative

[Figure: cache block layout of packed (tag, pattern) entries; bit offsets 0, 11, 43, 54, 85, …; remaining bits unused]
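The packing arithmetic can be checked directly: an 11-bit tag plus a 32-bit pattern is 43 bits, and eleven such entries take 473 of the 512 bits in a 64-byte block. The sketch below packs and unpacks a block using the field order from the layout figure (tag at bit 0, pattern at bit 11, next tag at bit 43). Treating bit 0 as the least significant bit of a little-endian integer is an assumption, and the function names are hypothetical.

```python
TAG_BITS = 11
PATTERN_BITS = 32
ENTRY_BITS = TAG_BITS + PATTERN_BITS            # 43 bits per entry
BLOCK_BITS = 64 * 8                             # 512 bits per cache block
ENTRIES_PER_BLOCK = BLOCK_BITS // ENTRY_BITS    # 11 entries fit
UNUSED_BITS = BLOCK_BITS - ENTRIES_PER_BLOCK * ENTRY_BITS


def pack(entries) -> bytes:
    """Pack up to 11 (tag, pattern) pairs into one 64-byte block."""
    assert len(entries) <= ENTRIES_PER_BLOCK
    word = 0
    for i, (tag, pattern) in enumerate(entries):
        assert 0 <= tag < (1 << TAG_BITS)
        assert 0 <= pattern < (1 << PATTERN_BITS)
        word |= (tag | (pattern << TAG_BITS)) << (i * ENTRY_BITS)
    return word.to_bytes(BLOCK_BITS // 8, "little")


def unpack(block: bytes) -> list:
    """Recover the 11 (tag, pattern) pairs stored in a block."""
    word = int.from_bytes(block, "little")
    entries = []
    for i in range(ENTRIES_PER_BLOCK):
        entry = (word >> (i * ENTRY_BITS)) & ((1 << ENTRY_BITS) - 1)
        entries.append((entry & ((1 << TAG_BITS) - 1), entry >> TAG_BITS))
    return entries
```

One memory block therefore holds a whole 11-way set, so a single fetch brings in every candidate entry for an index.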


Memory Address Calculation

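[Figure: the index is formed from the PC (16 bits) and the block offset (5 bits). 10 bits of the index, followed by six zero bits (000000), are added (+) to the PV Start Address to give the Memory Address]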
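The address calculation amounts to a shift and an add. The sketch below follows the bit widths on the slide; which 16 PC bits are used and which 10 index bits select the set are not stated, so the choice of the low-order bits in both cases is an assumption, as are the function names.

```python
PC_BITS = 16
OFFSET_BITS = 5
SET_BITS = 10
BLOCK_SHIFT = 6    # 64-byte blocks: six zero bits appended to the set index


def make_index(pc: int, block_offset: int) -> int:
    """21-bit index: 16 PC bits followed by the 5-bit block offset."""
    pc_part = pc & ((1 << PC_BITS) - 1)                # assumed: low 16 PC bits
    off_part = block_offset & ((1 << OFFSET_BITS) - 1)
    return (pc_part << OFFSET_BITS) | off_part


def pv_address(pv_start: int, pc: int, block_offset: int):
    """Return (memory address of the table set, 11-bit tag)."""
    idx = make_index(pc, block_offset)
    set_index = idx & ((1 << SET_BITS) - 1)            # assumed: low 10 bits pick the set
    tag = idx >> SET_BITS                              # remaining 11 bits
    return pv_start + (set_index << BLOCK_SHIFT), tag
```

With 1K sets of 64 bytes each, the virtualized table spans 64KB of physical address space starting at the PV Start Address.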


Simulation Infrastructure

SimFlex: CMU Impetus

Full-system simulator based on Simics

Base processor configuration

8-wide OoO

256-entry ROB / 64-entry LSQ

L1D/L1I 64KB 4-way set-associative

UL2 8MB 16-way set-associative

Commercial workloads

TPC-C: DB2 and Oracle

TPC-H: Query 1, Query 2, Query 16, Query 17

Web: Apache and Zeus


SMS – Performance Potential

[Figure: percentage of L1 read misses (y-axis, 0 to 140%), split into Covered, Uncovered, and Overpredictions, for Apache, Oracle, and Qry 17. Predictor table configurations on the x-axis: Infinite, 1K-16a, 1K-11a, 512-11a, 256-11a, 128-11a, 64-11a, 32-11a, 16-11a, 8-11a]


Virtualized Spatial Memory Streaming

[Figure: percentage speedup (y-axis, -10 to 80) for Apache, Zeus, DB2, Oracle, Qry 1, Qry 2, Qry 16, and Qry 17. Series: SMS - 1K sets, SMS - 8 sets, SMS - PVCache 8 sets]

Original Prefetcher: Cost: 60KB

Virtualized Prefetcher: Cost: <1KB

Nearly Identical Performance
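A back-of-the-envelope estimate gives storage figures of the same order as the two costs quoted. The sketch counts only tag and pattern bits for an 11-way table with 43-bit entries; the talk does not say how its figures were derived, so the agreement is suggestive rather than a confirmed derivation, and the function name is hypothetical.

```python
ENTRY_BITS = 11 + 32    # 11-bit tag + 32-bit spatial pattern
WAYS = 11               # entries per set


def table_bytes(sets: int) -> int:
    """Storage, in bytes, for tag and pattern bits only."""
    return sets * WAYS * ENTRY_BITS // 8


dedicated_table = table_bytes(1024)   # dedicated 1K-set table
pvcache = table_bytes(8)              # 8-set PVCache
```

Under these assumptions the dedicated table needs about 60,000 bytes, while an 8-set PVCache needs under 500 bytes.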


Impact of Virtualization on L2 Misses

[Figure: percentage increase in L2 misses (y-axis, 0 to 2.5) for Apache, Oracle, and Qry 17. Series: PV-8, PV-16, PV-32]


Impact of Virtualization on L2 Requests

[Figure: percentage increase in L2 requests (y-axis, 0 to 50) for Apache, Oracle, and Qry 17. Series: PV-8, PV-16, PV-32]


Coarse-Grain Tracking

Jason Zebchuk