Post on 09-Jan-2016
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
Jeffrey Stuecheli1,2, Dimitris Kaseridis1, Hillery C. Hunter3 & Lizy K. John1
1ECE Department, The University of Texas at Austin
2IBM Corp., Austin
3IBM Thomas J. Watson Research Center
Laboratory for Computer Architecture 12/7/2010
MICRO-43
Overview/Summary
Refresh overhead is increasing with device density
Due to the nature of this increase, performance is suffering
Current refresh scheduling methods are ineffective at hiding these delays
We propose more sophisticated mitigation methods
– Elastic Refresh Scheduling
Basic DRAM/Refresh Info
Each bit stored on a capacitor
Single read transistor to hold charge
Leakage: the cell loses charge over time
Refresh: Rewrite cell on periodic basis
DDR3
– Refresh requirement is temperature dependent: 64 ms at 85°C, 32 ms at 95°C
– DRAM device contains an internal address counter
– JEDEC simply specifies the time interval (tREFI, time REFresh Interval)
– tREFI = 64 ms / 8192 = 7.8 µs (3.9 µs at 95°C)
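The tREFI arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming 8192 (8k) refresh commands per retention window; the function name `trefi_us` is made up for illustration.

```python
# Assumption: 8192 (8k) refresh commands are issued per retention window.
REFRESH_COMMANDS_PER_WINDOW = 8192


def trefi_us(retention_ms: float) -> float:
    """Average spacing between refresh commands, in microseconds."""
    return retention_ms * 1000.0 / REFRESH_COMMANDS_PER_WINDOW


if __name__ == "__main__":
    # 64 ms retention at 85 C, 32 ms at 95 C (values from the slide).
    for temp_c, retention_ms in ((85, 64.0), (95, 32.0)):
        print(f"{temp_c} C: tREFI = {trefi_us(retention_ms):.2f} us")
```

The two results round to the 7.8 µs and 3.9 µs figures quoted for DDR3.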
Background
Transition to denser devices
7.8 us based on 8k Rows per bank
DRAM device density doubles roughly every 2 years
With one refresh per row, tREFI would halve each generation
Instead, multiple rows are refreshed with each command
Current delivery constraints force an increase in tRFC with denser devices
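To make the row-stacking point concrete, here is a small sketch of how many rows each refresh command has to cover once the row count outgrows the fixed command budget. It assumes 8192 refresh commands per retention window; `rows_per_refresh_command` is a hypothetical helper, not a JEDEC term.

```python
# Assumption: the number of refresh commands per retention window stays at 8192,
# so tREFI stays constant while the rows covered per command grow.
REFRESH_COMMANDS_PER_WINDOW = 8192


def rows_per_refresh_command(rows_per_bank: int) -> int:
    """Rows (per bank) that one refresh command must cover, rounded up."""
    return -(-rows_per_bank // REFRESH_COMMANDS_PER_WINDOW)


if __name__ == "__main__":
    for rows in (8192, 16384, 32768, 65536):
        print(f"{rows} rows/bank: {rows_per_refresh_command(rows)} row(s) per command")
```

Keeping tREFI fixed therefore moves the cost into each command: more rows per command means more charge delivered per command, which is what stretches tRFC.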
[Figure: DRAM devices compared, 95 nm 512 Mbit and 42 nm 2 Gbit]
Background
“Stacked” Refresh Operations in a Single Command Example
Source: TN-47-16 Designing for High-Density DDR2 Memory Introduction by MICRON
Background
tRFC Growth with DRAM Density
DRAM type    Refresh Completion Time (tRFC)
512 Mbit     90 ns
1 Gbit       110 ns
2 Gbit       160 ns
4 Gbit       300 ns
8 Gbit       350 ns
In the most basic terms, tRFC should scale linearly with density
– Based strictly on current to charge capacitance
~Fixed charge per bit
This has been reflected in the DDR3 spec, with the exception of 8 GBit
Net, even if DRAM vendors can slow the growth, the delay is large today
Background
Slowdown Effects Observed in Simulation
Simics/Gems
4 cores, 2 1333MHz channels, 2 DDR3 Ranks/channel
Motivation
Why it is so bad
[Figure: DRAM read (26 ns) vs. worst-case refresh hit (326 ns)]
DRAM capacity   tRFC     Bandwidth overhead (95°C, per rank)   Latency overhead (95°C)
512 Mb          90 ns    2.7%                                  1.4 ns
1 Gb            110 ns   3.3%                                  2.1 ns
2 Gb            160 ns   5.0%                                  4.9 ns
4 Gb            300 ns   7.7%                                  11.5 ns
8 Gb            350 ns   9.0%                                  15.7 ns
[Figure: timeline of refreshes and reads, marking tRFC and tREFI]
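A back-of-the-envelope model helps show where overheads of this size come from. The sketch below is an assumption, not the authors' stated method: it takes the bandwidth overhead as tRFC / tREFI and the average added read latency as that probability times half of tRFC, with reads arriving uniformly at random. With tREFI = 3.9 µs it reproduces the 4 Gb and 8 Gb entries; the smaller densities in the table come out somewhat higher than this model, so the slide's figures presumably include effects not captured here.

```python
# Assumption: tREFI at 95 C is 3.9 us, and reads arrive uniformly at random
# relative to refresh activity. Function names are illustrative only.
TREFI_95C_NS = 3900.0


def bandwidth_overhead(trfc_ns: float, trefi_ns: float = TREFI_95C_NS) -> float:
    """Fraction of time a rank is busy refreshing."""
    return trfc_ns / trefi_ns


def avg_latency_overhead_ns(trfc_ns: float, trefi_ns: float = TREFI_95C_NS) -> float:
    """Probability a read lands in a refresh times the mean remaining wait."""
    return bandwidth_overhead(trfc_ns, trefi_ns) * (trfc_ns / 2.0)


if __name__ == "__main__":
    for capacity, trfc_ns in (("4Gb", 300.0), ("8Gb", 350.0)):
        bw = 100.0 * bandwidth_overhead(trfc_ns)
        lat = avg_latency_overhead_ns(trfc_ns)
        print(f"{capacity}: {bw:.1f}% bandwidth, {lat:.1f} ns average latency")
```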
Motivation
Postponing Refresh Operations
Each cell needs to be refreshed every 64 ms, but refresh command spacing is based on an average rate.
As such, cell failure will not occur if no refresh is sent the moment tREFI expires.
Current DDR3 spec allows the controller to fall eight tREFI intervals behind (backlog count)
– Cell refresh interval is lengthened by about 0.1% (8 in 8k)
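The "8 in 8k" figure follows from simple arithmetic. The sketch below assumes a 64 ms retention window split into 8192 refresh commands and a maximum backlog of eight postponed commands; the function names are made up for illustration.

```python
def worst_case_refresh_interval_ms(retention_ms: float = 64.0,
                                   commands: int = 8192,
                                   max_backlog: int = 8) -> float:
    """Longest gap between refreshes of a cell when the backlog is fully used."""
    trefi_ms = retention_ms / commands
    return retention_ms + max_backlog * trefi_ms


def stretch_percent(commands: int = 8192, max_backlog: int = 8) -> float:
    """Relative lengthening of the refresh interval, in percent."""
    return 100.0 * max_backlog / commands


if __name__ == "__main__":
    print(f"worst-case interval: {worst_case_refresh_interval_ms():.4f} ms")
    print(f"stretch: {stretch_percent():.3f} %")
```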
Motivation
Current Approaches
Demand Refresh (DR)
– Most basic policy: sends refresh operations at high priority every tREFI period
Defer Until Empty (DUE)
– Uses the DRAM's ability to postpone refreshes
– Refresh operations are postponed until no reads are queued, or the max backlog count has been reached
Why these policies are ineffective
– DR: does nothing to hide refreshes
– DUE: too aggressive in sending refresh operations; in many cases it does not take advantage of the backlog
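The two baseline policies can be sketched as decision functions evaluated by the memory controller each cycle. This is an illustrative sketch rather than the simulated implementation; `backlog` (tREFI intervals that have expired without a refresh being sent) and `reads_queued` are hypothetical names, and the limit of eight comes from the DDR3 postponement rule.

```python
MAX_BACKLOG = 8  # DDR3 allows falling eight tREFI intervals behind


def demand_refresh_should_send(backlog: int, reads_queued: int) -> bool:
    """DR: send as soon as a refresh is owed, regardless of pending reads."""
    return backlog > 0


def defer_until_empty_should_send(backlog: int, reads_queued: int) -> bool:
    """DUE: wait for an empty read queue, unless the backlog limit is reached."""
    if backlog >= MAX_BACKLOG:
        return True
    return backlog > 0 and reads_queued == 0


if __name__ == "__main__":
    # One refresh owed while reads are pending: DR sends, DUE waits.
    print(demand_refresh_should_send(1, 4), defer_until_empty_should_send(1, 4))
    # The queue drains for a single cycle: DUE sends immediately.
    print(defer_until_empty_should_send(1, 0))
```

The second case shows the weakness noted above: DUE fires on the first empty cycle even when the backlog still leaves room to wait.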
Motivation
Elastic Refresh
Exploit
– Non-uniform request distribution
– Refresh overhead just has to fit in free cycles
Initially not aggressive, converges with DUE as refresh backlog grows
Latency sensitive workloads are often lower bandwidth
Decrease the probability of reads conflicting with refreshes
Idle Delay Function
[Figure: idle delay threshold vs. refresh backlog (1 to 8), showing the Constant, Proportional, and High Priority regions]
Introduce a refresh-backlog-dependent idle threshold
With a low backlog, there is no reason to send a refresh command
With a bursty request stream, the probability of a future request decreases with time
As backlog grows, decrease this delay threshold
Elastic Refresh
Tuning the Idle Delay Function
Parameter             Units                              Description
Max Delay             Memory Clocks                      Sets the delay in the constant region
Proportional Slope    Memory Clocks per Postponed Step   Sets slope of the proportional region
High Priority Pivot   Postponed Step                     Point where the idle delay goes to zero
The optimal shape of the IDF is workload dependent
IDF can be controlled with the listed parameters
Our system contains hardware to determine “good” parameters
– Max Delay and Proportional Slope
Elastic Refresh
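One way to picture the idle delay function (IDF) is as a capped linear ramp. The sketch below is an assumed formulation, since the slides give only the three regions and their parameters: the threshold is zero at or beyond the High Priority Pivot, falls linearly toward the pivot with the Proportional Slope, and is capped at Max Delay for small backlogs. The function and argument names are illustrative.

```python
def idle_delay_threshold(backlog: int, max_delay: int,
                         prop_slope: int, high_priority_pivot: int) -> int:
    """Idle cycles a rank must stay quiet before a refresh is sent (assumed form)."""
    if backlog >= high_priority_pivot:
        return 0  # high priority region: no waiting
    # Proportional region, capped by the constant region.
    return min(max_delay, prop_slope * (high_priority_pivot - backlog))


def should_send_refresh(backlog: int, idle_cycles: int, max_delay: int,
                        prop_slope: int, high_priority_pivot: int) -> bool:
    """Send a postponed refresh once the rank has been idle long enough."""
    if backlog == 0:
        return False  # nothing is owed
    threshold = idle_delay_threshold(backlog, max_delay, prop_slope,
                                     high_priority_pivot)
    return idle_cycles >= threshold


if __name__ == "__main__":
    for b in range(1, 9):
        print(b, idle_delay_threshold(b, max_delay=20, prop_slope=5,
                                      high_priority_pivot=7))
```

With these example parameters the threshold stays at 20 clocks for backlogs 1 to 3, steps down by 5 per postponed refresh, and reaches zero at a backlog of 7.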
Max Delay Circuit
[Figure: Max Delay circuit. Labels: Current Idle Count (14), Delay Accumulator (20), Operation Count (10), Max Delay (10), DRAM Read Sent, carry, cat, To Idle Delay Function]
Circuit used to collect the average rank idle period
Conceptually, given an exponential-type distribution, the average can be used to find the tail
Calculated average is used as Max Delay
Circuit function
– Accumulate idle delay over 1024 events
– Average calculated with concatenation of accumulator
Elastic Refresh
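The averaging step can be modeled in software as follows. This is a sketch under assumptions: the average over 1024 events is taken by dropping the low 10 bits of the accumulated idle count (a shift instead of a divide), and the result is clamped to a 10-bit value. The class name and the clamping behavior are illustrative, not taken from the slides.

```python
class MaxDelayEstimator:
    """Software model of an average-idle-period tracker (assumed behavior)."""

    EVENTS_LOG2 = 10                 # 2**10 = 1024 events per averaging window
    MAX_DELAY_LIMIT = (1 << 10) - 1  # assumption: Max Delay is a 10-bit value

    def __init__(self, initial_max_delay: int = 0) -> None:
        self.accumulator = 0
        self.op_count = 0
        self.max_delay = initial_max_delay

    def on_read_sent(self, idle_cycles: int) -> int:
        """Record the idle period that preceded a DRAM read; return Max Delay."""
        self.accumulator += idle_cycles
        self.op_count += 1
        if self.op_count == 1 << self.EVENTS_LOG2:
            # Dividing by 1024 is a right shift by 10 bits.
            average = self.accumulator >> self.EVENTS_LOG2
            self.max_delay = min(average, self.MAX_DELAY_LIMIT)
            self.accumulator = 0
            self.op_count = 0
        return self.max_delay


if __name__ == "__main__":
    est = MaxDelayEstimator()
    for i in range(1024):
        est.on_read_sent(4 if i % 2 == 0 else 12)
    print(est.max_delay)
```

Max Delay only changes once per 1024 reads, which fits the slides' later remark that updates are infrequent.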
Proportional Slope Circuit
[Figure: Proportional Slope circuit. Labels: Low and High counters, Divide By 2 (twice), Postponed < Threshold, carry, Integral, w(p), w(i), Prop Slope, To Idle Delay Function]
Conceptually, the proportional region acts to transition gracefully to high priority while utilizing the full postponed range
Circuit works to balance the utilization across the postponed range (High/Low counts)
PI-type controller adjusts the slope to balance the High/Low counts
Elastic Refresh
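A PI (proportional-integral) controller of this kind might look like the sketch below. Much of it is assumed rather than taken from the slides: the sign convention (more refreshes sent at low backlog raises the slope, lengthening delays), the halving of both counters after each update to age old history, the clamp at zero, and the example weights. All names are hypothetical.

```python
class SlopeController:
    """Illustrative PI controller that balances Low/High refresh counts."""

    def __init__(self, w_p: float, w_i: float, threshold: int) -> None:
        self.w_p = w_p              # proportional weight, w(p)
        self.w_i = w_i              # integral weight, w(i)
        self.threshold = threshold  # postponed count splitting Low from High
        self.low = 0
        self.high = 0
        self.integral = 0

    def record_refresh(self, postponed: int) -> None:
        """Count a sent refresh as Low or High by its backlog at send time."""
        if postponed < self.threshold:
            self.low += 1
        else:
            self.high += 1

    def update(self) -> float:
        """Return a new proportional slope from the Low/High imbalance."""
        error = self.low - self.high
        self.integral += error
        slope = self.w_p * error + self.w_i * self.integral
        # Assumption: halve both counters so older history fades out.
        self.low //= 2
        self.high //= 2
        return max(0.0, slope)


if __name__ == "__main__":
    ctl = SlopeController(w_p=0.5, w_i=0.25, threshold=4)
    for _ in range(4):
        ctl.record_refresh(postponed=1)
    print(ctl.update())
```

When the Low and High counts match, the error term is zero and only the accumulated integral sets the slope, which is the balance point the controller aims for.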
Hardware Cost
Trivial integration into DUE-based policies
– Structure replaces “empty” indication of DUE
Logic size
– ~100 latch bits for static policy
– ~80 additional latch bits for dynamic policy
Logic cycle time
– Low frequency compared to ALU functions in processor core.
– Infrequent updates could enable pipelined control.
Elastic Refresh
[Figure: memory controller block diagram. Labels: Request Input Interface, Input Queue, Bank Queues x 8, Rank Queues x N, Refresh Queue, Refresh Scheduler, tREFI Counter, Output to DRAM I/O Drivers]
Simulation Methodology
Simics extended with GEMS model
– 1, 4 & 8 cores CMP
– First-Ready, First-Come-First-Served memory controller policy
– DDR3 1333MHz 8-8-8 memory, 2 MC, 2 Ranks/MC
– tRFC = 550 ns, tREFI = 3.9 µs at 95°C (estimate for 16 Gbit)
– Refresh policies:
• Demand Refresh (DR)
• Defer Until Empty (DUE)
• Elastic Refresh policies
SPEC CPU2006 workloads
Results
[Figure: results plot for integer workloads, 8 cores]
Related Work
B. Bhat and F. Mueller, “Making DRAM refresh predictable,” Real-Time Systems, Euromicro Conference 2010
M. Ghosh and H. S. Lee, “Smart Refresh: An enhanced memory controller design for reducing energy in conventional and 3D die-stacked DRAMs,” in MICRO-40
K. Toshiaki, P. Paul, H. David, K. Hoki, J. Golz, F. Gregory, R. Raj, G. John, R. Norman, C. Alberto, W. Matt, and I. Subramanian, “An 800 MHz embedded DRAM with a concurrent refresh mode,” in IEEE ISSCC Digest of Technical Papers, Feb. 2004
Conclusions
The significant performance degradation caused by refresh can be mitigated with low-overhead mechanisms
Commodity DRAM is cost driven
– Elastic refresh requires no DRAM changes
Future work:
– Coordinate refresh with other structures on the CMP
– Investigate refresh for future DRAM devices (DDR4)
• Example: dynamically select how many rows to refresh
Thank You. Questions?
Laboratory for Computer Architecture, University of Texas at Austin
IBM Austin
IBM T. J. Watson Lab