-
Accelerating Dependent Cache Misses with an Enhanced Memory Controller
Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, Yale N. Patt
Tuesday June 21: Session 7A, 3:30pm
-
Memory Access Latency
• The latency of accessing main memory is made up of two parts:
• DRAM access latency
• On-chip latency
[Diagram: Multiprocessor connected to DRAM]
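The two-part split above can be written down as a small sketch. The class name, field names, and cycle counts below are hypothetical and chosen only for illustration; they are not measurements from the talk.

```python
from dataclasses import dataclass


@dataclass
class MissLatency:
    """Latency of one main-memory access, split as on the slide."""
    dram_access_cycles: int  # time spent in the DRAM access itself
    on_chip_cycles: int      # time spent on chip (hypothetical value below)

    @property
    def total_cycles(self) -> int:
        # Total miss latency is the sum of the two parts.
        return self.dram_access_cycles + self.on_chip_cycles

    @property
    def on_chip_fraction(self) -> float:
        # Share of the total miss latency that is on-chip delay.
        return self.on_chip_cycles / self.total_cycles


# Illustrative numbers only (assumed, not taken from the slides).
miss = MissLatency(dram_access_cycles=150, on_chip_cycles=100)
print(miss.total_cycles)      # 250
print(miss.on_chip_fraction)  # 0.4
```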
-
On-Chip Delay
[Figure: stacked bar chart. Y-axis: Total Miss Cycles, 0% to 100%, split into DRAM Access and On-Chip Delay. X-axis: four-copy (4x) workloads: calculix, povray, namd, gamess, perlbench, tonto, gromacs, gobmk, dealII, sjeng, gcc, hmmer, h264ref, bzip2, astar, Xalancbmk, zeusmp, cactus, wrf, GemsFDTD, leslie, omnetpp, milc, soplex, sphinx, bwaves, libquantum, lbm, mcf]
-
Dependent Cache Misses
LD [R3] -> R5 (Cache Miss)
ADD R4, R5 -> R9
ADD R9, R1 -> R6
LD [R6] -> R8 (Cache Miss)
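As a rough illustration of why the second load is a dependent cache miss, the sketch below walks the four micro-ops from the slide and collects everything that depends, directly or transitively, on the first load's destination register. The `Uop` type and the `dependence_chain` helper are hypothetical names introduced here; this is not the hardware mechanism from the talk.

```python
from typing import NamedTuple, Tuple


class Uop(NamedTuple):
    text: str               # the micro-op as written on the slide
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


# The four micro-ops from the slide, in program order.
PROGRAM = [
    Uop("LD [R3] -> R5", "R5", ("R3",)),
    Uop("ADD R4, R5 -> R9", "R9", ("R4", "R5")),
    Uop("ADD R9, R1 -> R6", "R6", ("R9", "R1")),
    Uop("LD [R6] -> R8", "R8", ("R6",)),
]


def dependence_chain(program, source_index):
    """Return the uops that depend on program[source_index]'s result."""
    tainted = {program[source_index].dest}
    chain = []
    for uop in program[source_index + 1:]:
        if any(src in tainted for src in uop.srcs):
            chain.append(uop)
            tainted.add(uop.dest)
        else:
            # An independent write overwrites any tainted value.
            tainted.discard(uop.dest)
    return chain


chain = dependence_chain(PROGRAM, source_index=0)
for uop in chain:
    print(uop.text)
```

Here the final load's address (R6) cannot be computed until the first load's data (R5) returns, so the two misses are serialized.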
-
Compute Capable Memory Controller
[Block diagram. Components: Physical Register File, Live In Vector, Uop Buffer, Reservation Station, ALU 0, ALU 1, EMC Data Cache, Load Store Queue, Result Data, Tag Broadcast. Inputs: decoded micro-ops from core, live-in registers from core. Outputs: live-out registers to core, dirty cache lines to core]
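The diagram's inputs and outputs suggest the following flow, sketched here as a toy functional model: the controller receives decoded micro-ops and live-in register values from the core, executes the chain once the first load's data arrives, and hands the results back as live-outs. The function name, the tuple encoding of micro-ops, and all register values and addresses are assumptions made for illustration; the sketch ignores timing, the reservation station, and the data cache.

```python
def run_chain_at_emc(chain, live_ins, source_dest, source_value, memory):
    """Execute a dependence chain on a toy register file.

    chain:        list of (op, dest, srcs) tuples, op is "ADD" or "LD"
    live_ins:     register values sent from the core (assumed values)
    source_dest:  register written by the original cache-miss load
    source_value: data returned from DRAM for that load
    memory:       dict mapping address -> value, standing in for DRAM
    """
    regs = dict(live_ins)
    regs[source_dest] = source_value
    for op, dest, srcs in chain:
        if op == "ADD":
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "LD":
            # The dependent load is issued as soon as its address is known.
            regs[dest] = memory[regs[srcs[0]]]
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs


# The chain from the Dependent Cache Misses slide, after LD [R3] -> R5.
CHAIN = [
    ("ADD", "R9", ("R4", "R5")),
    ("ADD", "R6", ("R9", "R1")),
    ("LD", "R8", ("R6",)),
]

regs = run_chain_at_emc(
    CHAIN,
    live_ins={"R4": 8, "R1": 16},   # hypothetical live-in values
    source_dest="R5",
    source_value=0x1000,            # hypothetical data for LD [R3]
    memory={0x1018: 42},            # hypothetical DRAM contents
)
print(hex(regs["R6"]), regs["R8"])  # 0x1018 42
```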
-
Effective Memory Access Latency Reduction
[Figure: bar chart. Y-axis: Effective Memory Access Latency, 0 to 450. X-axis: workloads H1 to H10 and Mean. Series: EMC Access, Core Access]