optimizing dram timing for the common-case donghyuk lee yoongu kim, gennady pekhimenko, samira khan,...
TRANSCRIPT
![Page 1: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/1.jpg)
Optimizing DRAM Timing
for the Common-Case
Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu
Adaptive-Latency DRAM
![Page 2: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/2.jpg)
2
x86 CPU, MemCtrl, DRAM Module
DDR3 1600MT/s (11-11-28)
Timing Parameters: (11 – 11 – 28) reduced to (8 – 8 – 19)
SPEC mcf
Runtime: 527min
Runtime: 477min, -10.5% (no error)
Parsec, GUPS, Memcached, Apache
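As a rough worked example of the numbers on this slide, the sketch below converts the two timing triples from clock cycles to nanoseconds and recomputes the runtime change. It assumes the triples are (tRCD, tRP, tRAS) in clock cycles, which the slide does not state, and uses the 800MHz clock that the evaluation setup lists for DDR3-1600. All function and variable names are illustrative.

```python
# Assumption: the triples are (tRCD, tRP, tRAS) in clock cycles.
# DDR3-1600 uses an 800 MHz clock, i.e. 1.25 ns per cycle.
CLOCK_MHZ = 800.0
CYCLE_NS = 1000.0 / CLOCK_MHZ


def cycles_to_ns(cycles):
    """Convert a tuple of cycle counts to nanoseconds."""
    return tuple(c * CYCLE_NS for c in cycles)


def reduction_pct(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


standard_ns = cycles_to_ns((11, 11, 28))
reduced_ns = cycles_to_ns((8, 8, 19))

for name, s, r in zip(("tRCD", "tRP", "tRAS"), standard_ns, reduced_ns):
    print(f"{name}: {s:.2f} ns -> {r:.2f} ns ({reduction_pct(s, r):.1f}% lower)")

# Runtimes reported on the slide, in minutes.
runtime_before_min, runtime_after_min = 527, 477
speedup_pct = 100.0 * (runtime_before_min / runtime_after_min - 1.0)
time_saved_pct = reduction_pct(runtime_before_min, runtime_after_min)
print(f"speedup: {speedup_pct:.1f}%, time saved: {time_saved_pct:.1f}%")
```

Under these assumptions the 10.5% figure corresponds to the speedup ratio 527/477; measured against the 527min baseline, the time saved is about 9.5%.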
![Page 3: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/3.jpg)
3
Reducing DRAM Timing
Why can we reduce DRAM timing parameters without any errors?
![Page 4: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/4.jpg)
4
Executive Summary
• Observations
– DRAM timing parameters are dictated by the worst-case cell (smallest cell across all products at highest temperature)
– DRAM operates at lower temperature than the worst case
• Idea: Adaptive-Latency DRAM
– Optimizes DRAM timing parameters for the common case (typical DIMM operating at low temperatures)
• Analysis: Characterization of 115 DIMMs
– Great potential to lower DRAM timing parameters (17 – 54%) without any errors
• Real System Performance Evaluation
– Significant performance improvement (14% for memory-intensive workloads) without errors (33 days)
![Page 5: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/5.jpg)
5
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 6: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/6.jpg)
6
DRAM Stores Data as Charge
DRAM Cell
Sense-Amplifier
Three steps of charge movement
1. Sensing
2. Restore
3. Precharge
![Page 7: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/7.jpg)
7
DRAM Charge over Time
[Figure: charge (y-axis) over time (x-axis) for the Cell and the Sense-Amplifier, for Data 0 and Data 1, through Sensing and Restore. Timing Parameters: in theory vs. in practice (with margin).]
Why does DRAM need the extra timing margin?
![Page 8: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/8.jpg)
8
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 9: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/9.jpg)
9
Two Reasons for Timing Margin
1. Process Variation
– DRAM cells are not equal
– Leads to extra timing margin for a cell that can store a large amount of charge
2. Temperature Dependence
– DRAM leaks more charge at higher temperature
– Leads to extra timing margin when operating at low temperature
![Page 10: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/10.jpg)
10
DRAM Cells are Not Equal
Ideal: Same Size → Same Charge → Same Latency
Real: Different Size (Largest Cell, Smallest Cell) → Different Charge → Different Latency
Large variation in cell size → Large variation in charge → Large variation in access latency
![Page 11: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/11.jpg)
11
Process Variation
[Figure: DRAM Cell with Capacitor, Contact, Access Transistor (ACCESS), and Bitline]
❶ Cell Capacitance
❷ Contact Resistance
❸ Transistor Performance
Small cell can store small charge
• Small cell capacitance
• High contact resistance
• Slow access transistor
High access latency
![Page 12: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/12.jpg)
12
Two Reasons for Timing Margin
1. Process Variation
– DRAM cells are not equal
– Leads to extra timing margin for a cell that can store a large amount of charge
2. Temperature Dependence
– DRAM leaks more charge at higher temperature
– Leads to extra timing margin for cells that operate at low temperature
![Page 13: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/13.jpg)
13
Charge Leakage ∝ Temperature
Room Temp.: Small Leakage
Hot Temp. (85°C): Large Leakage
Cells store small charge at high temperature and large charge at low temperature
Large variation in access latency
![Page 14: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/14.jpg)
14
DRAM Timing Parameters
• DRAM timing parameters are dictated by the worst-case
– The smallest cell with the smallest charge in all DRAM products
– Operating at the highest temperature
• Large timing margin for the common-case
![Page 15: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/15.jpg)
15
Our Approach
• We optimize DRAM timing parameters for the common-case
– The smallest cell with the smallest charge in a DRAM module
– Operating at the current temperature
• Common-case cell has more charge than the worst-case cell
Can lower latency for the common-case
![Page 16: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/16.jpg)
16
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 17: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/17.jpg)
17
Key Observations
1. Sensing: Sense cells with extra charge faster → Lower sensing latency
2. Restore: No need to fully restore cells with extra charge → Lower restore latency
3. Precharge: No need to fully precharge bitlines for cells with extra charge → Lower precharge latency
![Page 18: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/18.jpg)
18
Observation 1. Faster Sensing
Typical DIMM at Low Temperature
More Charge → Strong Charge Flow → Faster Sensing
115 DIMM Characterization
Timing (tRCD): 17% ↓, No Errors
Typical DIMM at Low Temperature → More charge → Faster sensing
![Page 19: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/19.jpg)
19
Observation 2. Reducing Restore Time
Typical DIMM at Low Temperature
Larger Cell & Less Leakage → Extra Charge → No Need to Fully Restore Charge
115 DIMM Characterization
Read (tRAS): 37% ↓
Write (tWR): 54% ↓
No Errors
Typical DIMM at lower temperature → More charge → Restore time reduction
![Page 20: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/20.jpg)
20
Observation 3. Reducing Precharge Time
Typical DIMM at Lower Temperature
[Figure: Bitline and Sense-Amplifier; bitline levels Empty (0V), Half, Full (Vdd); Sensing followed by Precharge]
Precharge?
– Setting bitline to half-full charge
![Page 21: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/21.jpg)
21
Observation 3. Reducing Precharge Time
[Figure: bitline levels Empty (0V), Half, Full (Vdd); bitline Not Fully Precharged; Access Empty Cell, Access Full Cell]
More Charge → Strong Sensing
115 DIMM Characterization
Timing (tRP): 35% ↓, No Errors
Typical DIMM at Lower Temperature → More charge → Precharge time reduction
![Page 22: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/22.jpg)
22
Key Observations
1. Sensing: Sense cells with extra charge faster → Lower sensing latency
2. Restore: No need to fully restore cells with extra charge → Lower restore latency
3. Precharge: No need to fully precharge bitlines for cells with extra charge → Lower precharge latency
![Page 23: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/23.jpg)
23
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 24: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/24.jpg)
24
Adaptive-Latency DRAM
• Key idea
– Optimize DRAM timing parameters online
• Two components
– DRAM manufacturer profiles multiple sets of reliable DRAM timing parameters at different temperatures for each DIMM
– System monitors DRAM temperature & uses appropriate DRAM timing parameters
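The two components above can be sketched as a per-DIMM lookup table plus a selection step. This is only a minimal illustration, not the authors' implementation: the `TimingSet` type, the `PROFILE` table, its temperature limits, and every nanosecond value in it are hypothetical placeholders rather than measured data.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingSet:
    """One set of timing parameters, in nanoseconds (hypothetical values)."""
    tRCD_ns: float
    tRAS_ns: float
    tWR_ns: float
    tRP_ns: float


# Hypothetical manufacturer profile for one DIMM: each entry is
# (maximum temperature in °C for which the set is reliable, timing set).
# Entries are sorted by temperature; the last one is the most conservative.
PROFILE = [
    (55, TimingSet(10.0, 23.75, 10.0, 10.0)),
    (70, TimingSet(11.25, 27.5, 12.5, 11.25)),
    (85, TimingSet(13.75, 35.0, 15.0, 13.75)),
]


def select_timing(profile, temp_c):
    """Pick the fastest profiled set whose temperature limit covers temp_c."""
    limits = [limit for limit, _ in profile]
    idx = bisect_left(limits, temp_c)
    # Above the hottest profiled point, fall back to the most conservative set.
    idx = min(idx, len(profile) - 1)
    return profile[idx][1]


# The system would call this whenever the monitored temperature changes.
print(select_timing(PROFILE, 34))
print(select_timing(PROFILE, 60))
```

Keeping the table sorted by temperature limit makes the selection a single binary search, and clamping the index means an out-of-range reading never selects a faster set than the profile allows.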
![Page 25: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/25.jpg)
25
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 26: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/26.jpg)
26
DRAM Temperature
• DRAM temperature measurement
– Server cluster: Operates at under 34°C
– Desktop: Operates at under 50°C
– DRAM standard optimized for 85°C
• Previous works – DRAM temperature is low
– El-Sayed+ SIGMETRICS 2012
– Liu+ ISCA 2007
• Previous works – Maintain DRAM temperature low
– David+ ICAC 2011
– Liu+ ISCA 2007
– Zhu+ ITHERM 2008
DRAM operates at low temperatures in the common-case
![Page 27: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/27.jpg)
27
DRAM Testing Infrastructure
[Photo: Temperature Controller, PC, Heater, FPGAs]
![Page 28: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/28.jpg)
28
Test Pattern
• Single cache line test (Read/Write)
[Timeline: Write, Access, Verify over time; Refresh Interval: 64–512ms]
• Overlapping multiple single cache line tests to simulate power noise and coupling
[Timeline: multiple Write, Access, Verify sequences overlapped in time; Refresh Interval: 64–512ms]
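The single cache line test can be sketched as a write, access, verify loop. The sketch below is a toy software model under stated assumptions, not the FPGA-based infrastructure: `ToyDimm`, its `min_safe_trcd_ns` threshold, and the bit-flip failure rule are all hypothetical, and the wait for the refresh interval is only marked by a comment.

```python
class ToyDimm:
    """Hypothetical stand-in for a DIMM under test; not a real device model."""

    def __init__(self, min_safe_trcd_ns):
        # Assumed threshold: sensing shorter than this corrupts the data.
        self.min_safe_trcd_ns = min_safe_trcd_ns
        self.cells = {}

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr, trcd_ns):
        data = self.cells[addr]
        # Toy failure rule: too-short sensing flips the lowest bit.
        return data if trcd_ns >= self.min_safe_trcd_ns else data ^ 1


def single_line_test(dimm, addr, pattern, trcd_ns):
    """Write, access with the timing under test, then verify one line."""
    dimm.write(addr, pattern)
    # On real hardware the test would wait one refresh interval here
    # (64-512 ms) so that the cell leaks before it is accessed.
    observed = dimm.read(addr, trcd_ns)
    return observed == pattern


def count_errors(dimm, addrs, pattern, trcd_ns):
    """Number of lines whose read-back value differs from the pattern."""
    return sum(not single_line_test(dimm, a, pattern, trcd_ns) for a in addrs)


dimm = ToyDimm(min_safe_trcd_ns=10.0)
for trcd in (15.0, 12.5, 10.0, 7.5):
    print(trcd, count_errors(dimm, range(8), 0xAA, trcd))
```

Sweeping the timing value downward and counting errors at each step mirrors how the characterization plots report errors per timing setting.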
![Page 29: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/29.jpg)
29
Control Factors
• Timing parameters
– Sensing: tRCD
– Restore: tRAS (read), tWR (write)
– Precharge: tRP
• Temperature: 55 – 85°C
• Refresh interval: 64 – 512ms
– Longer refresh interval leads to smaller charge
– Standard refresh interval: 64ms
![Page 30: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/30.jpg)
30
1. Timings ↔ Charge
Temperature: 85°C/Refresh Interval: 64, 128, 256, 512ms
[Figure: Errors (y-axis, 0 to 10^5) vs. timing parameter value in four panels: Sensing, Restore (Read), Precharge, Restore (Write). X-axis tick ranges in the extracted text: 15.0ns to 7.5ns, 35.0ns to 20.0ns, 15.0ns to 7.5ns, 15.0ns to 5.0ns, in 2.5ns steps.]
More charge enables more timing parameter reduction
![Page 31: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/31.jpg)
31
2. Timings ↔ Temperature
Temperature: 55, 65, 75, 85°C/Refresh Interval: 512ms
[Figure: Errors (y-axis, 0 to 10^5) vs. timing parameter value in four panels: Sensing, Restore (Read), Precharge, Restore (Write). X-axis tick ranges in the extracted text: 15.0ns to 7.5ns, 35.0ns to 20.0ns, 15.0ns to 7.5ns, 15.0ns to 5.0ns, in 2.5ns steps.]
Lower temperature enables more timing parameter reduction
![Page 32: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/32.jpg)
32
3. Summary of 115 DIMMs
• Latency reduction for read & write (55°C)
– Read Latency: 32.7%
– Write Latency: 55.1%
• Latency reduction for each timing parameter (55°C)
– Sensing: 17.3%
– Restore: 37.3% (read), 54.8% (write)
– Precharge: 35.2%
![Page 33: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/33.jpg)
33
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
![Page 34: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/34.jpg)
34
Real System Evaluation Method
• System
– CPU: AMD 4386 (8 Cores, 3.1GHz, 8MB LLC)
– DRAM: 4GByte DDR3-1600 (800MHz Clock)
– OS: Linux
– Storage: 128GByte SSD
• Workload
– 35 applications from SPEC, STREAM, Parsec, Memcached, Apache, GUPS
![Page 35: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/35.jpg)
35
Single-Core Evaluation
[Figure: Performance Improvement (y-axis, 0% to 25%) for Single Core and Multi Core across soplex, mcf, milc, libq, lbm, gems, copy, s.cluster, gups, non-intensive, intensive, all-workloads]
Average Improvement: 1.4% (non-intensive), 6.7% (intensive), 5.0% (all-35-workload)
AL-DRAM improves performance on a real system
![Page 36: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/36.jpg)
36
Multi-Core Evaluation
[Figure: Performance Improvement (y-axis, 0% to 25%) for Single Core and Multi Core across soplex, mcf, milc, libq, lbm, gems, copy, s.cluster, gups, non-intensive, intensive, all-workloads]
Average Improvement: 2.9% (non-intensive), 14.0% (intensive), 10.4% (all-35-workload)
AL-DRAM provides higher performance for multi-programmed & multi-threaded workloads
![Page 37: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/37.jpg)
37
Conclusion
• Observations
– DRAM timing parameters are dictated by the worst-case cell (smallest cell across all products at highest temperature)
– DRAM operates at lower temperature than the worst case
• Idea: Adaptive-Latency DRAM
– Optimizes DRAM timing parameters for the common case (typical DIMM operating at low temperatures)
• Analysis: Characterization of 115 DIMMs
– Great potential to lower DRAM timing parameters (17 – 54%) without any errors
• Real System Performance Evaluation
– Significant performance improvement (14% for memory-intensive workloads) without errors (33 days)
![Page 38: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/38.jpg)
Optimizing DRAM Timing
for the Common-Case
Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu
Adaptive-Latency DRAM
![Page 39: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/39.jpg)
39
Backup Slides
![Page 40: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/40.jpg)
40
Overhead
• DRAM Manufacturer
– Additional tests: can be integrated into existing test process (e.g., TCSR test)
• DRAM (DIMM)
– Already have in-DRAM temperature sensor (e.g., Low Power DDR)
– Multiple sets of timing parameters can be stored in SPD (Serial Presence Detect)
• System Support for AL-DRAM
– Already have ability to change DRAM timing online
![Page 41: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/41.jpg)
41
Multiple Timing Parameters
[Figure: Errors (y-axis, 0 to 10) vs. tRAS (35.0ns, 32.5ns, 30.0ns, 27.5ns, 25.0ns, 22.5ns, 20.0ns) for configurations A, B, C]
A: tRCD 10.0ns, tRP 12.5ns, Ref. Interval 200ms
B: tRCD 12.5ns, tRP 10.0ns, Ref. Interval 200ms
C: tRCD 10.0ns, tRP 10.0ns, Ref. Interval 200ms
Reducing a timing parameter reduces potential reduction of other parameters
![Page 42: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/42.jpg)
42
Temperature ↔ Refresh Interval
[Figure: Maximum error-free refresh interval (ms) (y-axis, 0 to 700) vs. Temperature (°C) (x-axis: 55°C, 65°C, 75°C, 85°C), with the 64ms SPEC marked and regions labeled Safety-margin and Safe refresh interval]
More charge than required
Need for reliable operation from other fail mechanisms (e.g., VRT)
Extra charge that can be used for latency reduction
![Page 43: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/43.jpg)
43
DRAM Cell Organization
[Figure: Cell capacitor, Access transistor, Bitline, Bitline capacitor, Sense-amplifier]
![Page 44: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/44.jpg)
44
DRAM Cell Operation
[Figure: Cell capacitor, Access transistor, Bitline, Bitline capacitor, Sense-amplifier]
Phases: Charge-sharing, Sense, Amplify, Precharge, Leakage
1. Turn-on access transistor
2. Ready to access data
3. Fully charged
4. Precharged to Vdd/2
![Page 45: Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency](https://reader030.vdocuments.mx/reader030/viewer/2022032414/56649ef35503460f94c05b52/html5/thumbnails/45.jpg)
45
DRAM Cell Charge Variations
[Figure: cell charge from Largest charge to Smallest charge, for Typical cell vs. Worst cell and Typical temp. vs. Worst temp.; labels: Fast restore, Slow restore, Slowly leak, Fast leak]