Multiple Sleep Mode Leakage Control for Cache Peripheral Circuits in Embedded Processors
Houman Homayoun, Avesta Makhzan, Alex Veidenbaum
Dept. of Computer Science, UC Irvine
On-chip Caches and Power
On-chip caches in high-performance processors are large: more than 60% of chip budget
They dissipate a significant portion of power via leakage
Much of it was in the SRAM cells
Many architectural techniques have been proposed to remedy this
Today, there is also significant leakage in the peripheral circuits of an SRAM (cache)
In part because cell design has been optimized
[Figure: Pentium M processor die photo. Courtesy of intel.com]
[Figure: leakage power (pW, log scale from 1 to 100000) of a memory cell compared with inverters INVX, INV2X, INV3X, INV4X, INV5X, INV6X, INV8X, INV12X, INV16X, INV20X, INV24X and INV32X; annotated ratios: 200X and 6300X]
Cells use minimum-sized transistors for area reasons, while peripherals use larger, faster, and accordingly leakier transistors to satisfy timing requirements.
Cells use high-Vt transistors, while peripherals use typical-threshold-voltage transistors.
Leakage Power Components for Different Cache Sizes
SRAM peripheral circuits dissipate more than 80% of the total leakage power
[Figure: breakdown of leakage power (0% to 100%) for 2KB, 4KB, 8KB and 16KB caches; components: data output driver, row pre/decoder and driver, data input driver, address input driver, others (sense amp, memory cell, etc.)]
A Zig-Zag Circuit
Rpeq for the first and third inverters and Rneq for the second and fourth inverters do not change.
The fall time of the circuit does not change.
[Figure: zig-zag circuit, a chain of four inverters (P1/N1 through P4/N4) between vdd and vss, with sleep transistors slpN1, slpP2, slpN3, slpP4 and slpN5 driven by the sleep signal; the transistor W/L sizing labels are not recoverable from the transcript]
A Zig-Zag Share Circuit
To improve the leakage reduction and area efficiency of the zig-zag scheme, one set of sleep transistors is shared between multiple stages of inverters (ICCD'08)
Zig-Zag Horizontal Sharing
Minimize impact on rise time
Minimize area overhead
Zig-Zag Horizontal and Vertical Sharing
Maximize leakage power saving
Minimize the area overhead
[Figure: normalized leakage power (1.0 to 5.0) and normalized wake-up delay (0.2 to 1) versus (footer, header) gate bias voltage pair]
Increasing the bias voltage increases the leakage power but decreases the wake-up delay overhead
Multiple Sleep Modes
power mode    wakeup delay (cycles)    leakage reduction (%)
basic-lp      1                        42
lp            2                        75
aggr-lp       3                        81
ultra-lp      4                        90
Power overhead of waking up peripheral circuits
Almost equivalent to the switching power of the sleep transistors
Sharing a set of sleep transistors horizontally and vertically for multiple stages of a (wordline) driver makes the power overhead even smaller
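The trade-off between wake-up delay and leakage reduction can be sketched as a small lookup. The following is an illustrative sketch only: the numbers come from the Multiple Sleep Modes table, while the `SleepMode` class and the `deepest_mode_within` helper are hypothetical names introduced here, not part of the presented design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SleepMode:
    name: str
    wakeup_cycles: int        # wakeup delay in cycles (from the table)
    leakage_reduction: float  # fraction of peripheral leakage removed (from the table)


# Values copied from the "Multiple Sleep Modes" table.
SLEEP_MODES = [
    SleepMode("basic-lp", 1, 0.42),
    SleepMode("lp", 2, 0.75),
    SleepMode("aggr-lp", 3, 0.81),
    SleepMode("ultra-lp", 4, 0.90),
]


def deepest_mode_within(max_wakeup_cycles: int) -> Optional[SleepMode]:
    """Hypothetical helper: return the mode with the largest leakage
    reduction whose wakeup delay fits the given cycle budget, or None
    if even the shallowest mode wakes up too slowly."""
    candidates = [m for m in SLEEP_MODES if m.wakeup_cycles <= max_wakeup_cycles]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.leakage_reduction)


if __name__ == "__main__":
    for budget in (0, 1, 2, 4):
        mode = deepest_mode_within(budget)
        print(budget, mode.name if mode else None)
```

A budget of 2 cycles selects lp in this sketch, since aggr-lp and ultra-lp need 3 and 4 cycles to wake up.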
Low-end Architecture
Given the miss service time of 30 cycles, it is likely that the processor stalls during the miss service period
The occurrence of additional cache misses while one DL1 cache miss is already pending further increases the chance of a pipeline stall
[Figure: state diagram of transitions among the basic-lp, lp, aggr-lp and ultra-lp modes; transition labels: DL1 miss, processor stall, DL1 miss++, pending DL1 miss, pending DL1 miss/es, DL1 miss serviced, processor continue]
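The mode transitions can be sketched as a small state machine. The arrows of the state diagram did not survive in this transcript, so the transition rules below are an assumption based only on the surviving labels: the first DL1 miss moves the peripherals to lp, further misses while one is pending move them to aggr-lp, a processor stall with a pending miss moves them to ultra-lp, and servicing the last pending miss returns them to basic-lp. The names `Mode`, `Event` and `next_mode` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    BASIC_LP = "basic-lp"
    LP = "lp"
    AGGR_LP = "aggr-lp"
    ULTRA_LP = "ultra-lp"


class Event(Enum):
    DL1_MISS = "DL1 miss"
    PROCESSOR_STALL = "processor stall"
    MISS_SERVICED = "DL1 miss serviced"


def next_mode(mode, event, pending_misses):
    """Return (new_mode, new_pending_misses).

    ASSUMED rules, not confirmed by the slides:
      - first miss -> lp; additional misses -> aggr-lp
      - stall while a miss is pending -> ultra-lp
      - last pending miss serviced -> basic-lp
    """
    if event is Event.DL1_MISS:
        pending = pending_misses + 1
        if mode is Mode.ULTRA_LP:
            # Already in the deepest mode; only the miss count changes.
            return mode, pending
        return (Mode.LP if pending == 1 else Mode.AGGR_LP), pending
    if event is Event.PROCESSOR_STALL:
        if pending_misses > 0:
            return Mode.ULTRA_LP, pending_misses
        return mode, pending_misses
    if event is Event.MISS_SERVICED:
        pending = max(pending_misses - 1, 0)
        if pending == 0:
            return Mode.BASIC_LP, 0
        return mode, pending
    raise ValueError(f"unknown event: {event!r}")


if __name__ == "__main__":
    mode, pending = Mode.BASIC_LP, 0
    for ev in (Event.DL1_MISS, Event.DL1_MISS, Event.PROCESSOR_STALL,
               Event.MISS_SERVICED, Event.MISS_SERVICED):
        mode, pending = next_mode(mode, ev, pending)
        print(f"{ev.value:18s} -> {mode.value} (pending={pending})")
```

Keeping the pending-miss count outside the mode enum keeps the sketch small; a real controller would also have to account for the wake-up delay of each mode before the cache can be accessed again.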
Low Power Modes in a 2KB DL1 Cache
Fraction of total execution time the DL1 cache spends in each of the power modes
85% of the time, the DL1 peripherals are put into low power modes
Most of the time is spent in the basic-lp mode (58% of total execution time)
[Figure: per-benchmark breakdown of execution time (0% to 100%) across the hp, trivial-lp, lp, aggr-lp and ultra-lp modes; benchmarks: basicmath, bc, crc, dijkstra, djpeg, fft, gs, gsm, lame, mad, patricia, pgp, qsort, rijndael, search, sha, susan_corners, susan_edges, tiff2bw, average]
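The time fractions above can be combined with the per-mode leakage reductions to bound the average peripheral leakage saving. This is a rough sketch under stated assumptions: the 85% low-power time includes the 58% spent in basic-lp, the remaining 27% is spent in lp or deeper modes (the exact split is not given in the text), and wake-up power overhead is ignored. The function name `average_reduction` is hypothetical.

```python
# Leakage reduction per mode, from the "Multiple Sleep Modes" table.
REDUCTION = {"basic-lp": 0.42, "lp": 0.75, "aggr-lp": 0.81, "ultra-lp": 0.90}


def average_reduction(time_fraction):
    """Time-weighted leakage reduction.

    time_fraction maps a mode name to the fraction of execution time
    spent in it; any time not listed counts as high-power (no reduction).
    """
    if sum(time_fraction.values()) > 1.0 + 1e-9:
        raise ValueError("time fractions exceed 100%")
    return sum(REDUCTION[mode] * frac for mode, frac in time_fraction.items())


BASIC_LP_TIME = 0.58                 # stated on the slide
DEEPER_TIME = 0.85 - BASIC_LP_TIME   # assumed: the rest of the low-power time

# Lower bound: all deeper time in lp. Upper bound: all of it in ultra-lp.
lower = average_reduction({"basic-lp": BASIC_LP_TIME, "lp": DEEPER_TIME})
upper = average_reduction({"basic-lp": BASIC_LP_TIME, "ultra-lp": DEEPER_TIME})

if __name__ == "__main__":
    print(f"average peripheral leakage reduction: {lower:.1%} to {upper:.1%}")
```

Under these assumptions the average reduction for the 2KB DL1 peripherals falls between roughly 44.6% and 48.7%; the per-benchmark split in the figure would be needed for an exact number.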