kyushu university esa’07 @ las vegas, june 2007 the effect of nanometer-scale technologies on the...
TRANSCRIPT
Kyushu University ESA’07 @ Las Vegas, June 2007
The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems
Hamid Noori, Maziar Goudarzi, Koji Inoue, and Kazuaki Murakami
Kyushu University
Outline
Motivations and Observations
Energy Evaluation
Problem Definition
Experimental Results
Conclusion
Outline
Motivations and Observations
Problem Formulation
Energy Evaluation Model
Experimental Results
Conclusion
Motivations and Observations (1/2)
Caches contribute a large portion of energy consumption in embedded systems
Leakage power is increasing in new nanometer-scale technologies
Motivations and Observations (2/2)
[Figure: dynamic energy (nJ) versus technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K]
[Figure: leakage power (mW) versus technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K]
4-way set-associative cache with 16-byte block size
Dynamic: 180nm ~ 4x 100nm & 9x 70nm (CACTI 4.1)
Static: 70nm ~ 400x 180nm & 5x 100nm (CACTI 4.1)
Goal
The effect of different nanometer-scale technologies on cache configuration selection in low-energy embedded systems
Outline
Energy Evaluation
Energy Evaluation (1/3)
Static
Dynamic
energy_memory(Config, Tech) =
    energy_dynamic(Config, Tech) + energy_static(Config, Tech)
Energy Evaluation (2/3)
energy_dynamic(Config, Tech) =
    cache_accesses(Config) * energy_cache_access(Config, Tech) +
    cache_misses(Config) * energy_miss(Config, Tech)
energy_miss(Config, Tech) =
    energy_off_chip_access + energy_cache_block_refill(Config, Tech)
energy_static(Config, Tech) =
    executed_clock_cycles(Config) * clock_period * leakage_power(Config, Tech)
Energy Evaluation (3/3)
SimpleScalar
– cache_accesses
– cache_misses
– executed_clock_cycles
CACTI 4.1
– energy_cache_access
– energy_cache_block_refill
– leakage_power
energy_off_chip_access = 20 nJ
Clock freq = 200 MHz
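The three Energy Evaluation slides can be read as one small calculation. The sketch below is only an illustration of those formulas: the 20 nJ off-chip access energy and the 200 MHz clock come from the slide, while the `CacheProfile` container, the function names, and the unit choices (nJ for energy, mW for leakage power) are assumptions made here for readability.

```python
from dataclasses import dataclass

# Constants stated on the slide.
ENERGY_OFF_CHIP_ACCESS_NJ = 20.0
CLOCK_FREQ_HZ = 200e6


@dataclass
class CacheProfile:
    """Inputs for one (Config, Tech) pair; a hypothetical container, not from the talk."""
    cache_accesses: int                  # simulator output
    cache_misses: int                    # simulator output
    executed_clock_cycles: int           # simulator output
    energy_cache_access_nj: float        # cache model output, assumed in nJ
    energy_cache_block_refill_nj: float  # cache model output, assumed in nJ
    leakage_power_mw: float              # cache model output, assumed in mW


def energy_miss_nj(p: CacheProfile) -> float:
    # energy_miss = energy_off_chip_access + energy_cache_block_refill
    return ENERGY_OFF_CHIP_ACCESS_NJ + p.energy_cache_block_refill_nj


def energy_dynamic_nj(p: CacheProfile) -> float:
    # accesses * per-access energy + misses * per-miss energy
    return (p.cache_accesses * p.energy_cache_access_nj
            + p.cache_misses * energy_miss_nj(p))


def energy_static_nj(p: CacheProfile) -> float:
    # cycles * clock_period * leakage_power, converted from joules to nJ
    clock_period_s = 1.0 / CLOCK_FREQ_HZ
    leakage_power_w = p.leakage_power_mw * 1e-3
    return p.executed_clock_cycles * clock_period_s * leakage_power_w * 1e9


def energy_memory_nj(p: CacheProfile) -> float:
    # energy_memory = energy_dynamic + energy_static
    return energy_dynamic_nj(p) + energy_static_nj(p)
```

Keeping the static term separate makes it easy to see how a larger leakage_power value shifts the balance between the two contributions.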
Outline
Problem Definition
Problem Definition
“For a given application, processor architecture, technology, and instruction- and data-cache organization (i.e. the cache associativity and line-size), find the cache size that results in minimum energy consumption (i.e. minimizes Equation 1 for a given technology) over the entire application run.”
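Since only the cache size varies, the problem statement amounts to an exhaustive search over a handful of candidate sizes. A minimal sketch, assuming the total energy of each candidate has already been computed; the function name and the example numbers are placeholders and not results from the talk:

```python
def min_energy_cache_size(energy_by_size):
    """Return the cache size (key) whose total energy (value) is smallest."""
    if not energy_by_size:
        raise ValueError("no candidate cache sizes given")
    return min(energy_by_size, key=energy_by_size.get)


# Placeholder energies in mJ for one technology, NOT measured data.
example = {"64K": 900.0, "32K": 700.0, "16K": 650.0, "8K": 800.0}
print(min_energy_cache_size(example))
```

Running the same search once per technology gives one minimum-energy size per technology, which is the comparison the following slides report.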
Outline
Experimental Results
Experimental Results
Applications from MiBench
SimpleScalar
CACTI 4.1
– Three technologies: 180nm, 100nm, and 70nm
Instruction Cache
[Figure: clock cycles (M) versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K)]
Energy Evaluation for three different technologies - qsort
[Figure: three panels (180nm, 100nm, 70nm) of total energy (mJ), split into static and dynamic, versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K)]
Energy Saving
There are two different minimum-energy cache sizes: 64K (180nm) and 16K (100nm and 70nm).
Total energy is reduced by 38% and 55% in the 100nm and 70nm processes, respectively, when selecting a 16KB instruction cache instead of 64KB.
In this application (qsort), this saving comes at a performance penalty of 37%.
We also note that energy is reduced by 50% in the 180nm process when employing a 64KB cache instead of 16KB; i.e., a bigger cache used to result in less energy. But as shown above, this trend is reversed in nanometer technologies.
Other Applications
              Cache size               100nm                            70nm
Application   180nm   100nm   70nm     Energy saving   Perf. penalty    Energy saving   Perf. penalty
basicmath     32K     32K     32K      0.0             0.0              0.0             0.0
bitcounts     2K      2K      2K       0.0             0.0              0.0             0.0
cjpeg         16K     16K     4K       0.0             0.0              3.38            123.88
djpeg         16K     16K     4K       0.0             0.0              28.12           79.27
lame          32K     8K      8K       30.02           36.39            55.54           36.39
dijkstra      16K     16K     1K       0.0             0.0              14.41           211.07
patricia      32K     32K     32K      0.0             0.0              0.0             0.0
blowfish      32K     32K     8K       0.0             0.0              40.70           80.40
rijndael      32K     32K     16K      0.0             0.0              8.62            61.02
average                                3.33            4.04             16.75           65.78
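As a quick consistency check, the average row of the instruction-cache table can be recomputed from the nine per-application rows. The sketch below copies the four numeric columns from the table; the tuple layout and the function name are choices made here, and the recomputed means agree with the reported averages to within 0.01.

```python
# (application, 100nm energy saving, 100nm performance penalty,
#  70nm energy saving, 70nm performance penalty), values copied from the table.
ROWS = [
    ("basicmath", 0.0, 0.0, 0.0, 0.0),
    ("bitcounts", 0.0, 0.0, 0.0, 0.0),
    ("cjpeg", 0.0, 0.0, 3.38, 123.88),
    ("djpeg", 0.0, 0.0, 28.12, 79.27),
    ("lame", 30.02, 36.39, 55.54, 36.39),
    ("dijkstra", 0.0, 0.0, 14.41, 211.07),
    ("patricia", 0.0, 0.0, 0.0, 0.0),
    ("blowfish", 0.0, 0.0, 40.70, 80.40),
    ("rijndael", 0.0, 0.0, 8.62, 61.02),
]

# The "average" row as reported on the slide.
REPORTED_AVERAGE = (3.33, 4.04, 16.75, 65.78)


def column_means(rows):
    """Arithmetic mean of each of the four numeric columns."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(1, 5))


for mean, reported in zip(column_means(ROWS), REPORTED_AVERAGE):
    print(f"recomputed {mean:.4f}  reported {reported}")
```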
Data Cache
[Figure: clock cycles (K) versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K)]
Energy Evaluation for three different technologies - qsort
[Figure: three panels (180nm, 100nm, 70nm) of total energy (mJ), split into static and dynamic, versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K)]
Energy Saving
According to the results, 32K, 2K, and 1K are the minimum-energy data cache sizes for 180nm, 100nm, and 70nm, respectively.
The minimum-energy caches for the 100nm (2KB) and 70nm (1KB) technologies respectively consume 88% and 56% less energy compared to the minimum-energy cache of the 180nm process (i.e. 32KB).
The corresponding performance penalty is only 9% and 14%, respectively.
In 180nm technology, the optimal cache size (32KB) consumes 28% and 40% less energy than the 2KB and 1KB caches, but this relation is reversed, with increasing significance, in the 100nm and 70nm technologies.
Other Applications
              Cache size               100nm                            70nm
Application   180nm   100nm   70nm     Energy saving   Perf. penalty    Energy saving   Perf. penalty
basicmath     4K      2K      2K       28.15           2.73             43.02           2.73
susan         8K      2K      2K       34.84           10.08            62.20           10.08
cjpeg         32K     8K      8K       48.13           12.21            66.22           12.21
djpeg         32K     8K      8K       25.46           25.96            58.71           25.96
lame          32K     16K     8K       21.93           12.97            47.52           53.85
dijkstra      32K     8K      8K       34.44           35.87            58.77           35.87
patricia      32K     8K      8K       57.04           9.85             77.69           24.79
blowfish      32K     8K      4K       57.91           11.43            69.28           52.10
rijndael      32K     16K     8K       36.61           9.00             59.98           33.89
sha           32K     1K      1K       74.53           13.7             91.34           13.72
average                                41.09           14.38            63.47           26.52
The effect of miss rate on optimal cache size for different technologies
[Figure: number of misses (K) versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K) for 1-way and 2-way caches]
Energy Evaluation
[Figure: two panels of total energy (mJ) versus cache size (64K, 32K, 16K, 8K, 4K, 2K, 1K) for 180nm, 100nm, and 70nm]
Results
For the direct-mapped cache, the minimum-energy cache size for all three technologies is 32K.
For the 2-way cache, 32K, 16K, and 16K are the minimum-energy candidates for 180nm, 100nm, and 70nm.
When the slope of the miss rate is very sharp, dynamic energy becomes dominant compared to static energy, and therefore we reach the same cache size for any technology.
However, when a 2-way set-associative cache is used, the miss-rate curve flattens and the static energy again becomes more important. That is why in 100nm and 70nm we have a different optimal point compared to 180nm in the 2-way cache.
Thus, as the miss-ratio variations become softer, the optimal cache sizes for different technologies move farther apart.
For the instruction cache, where execution clock cycles change from 800 M to 17000 M (~21 times more), the optimal cache sizes are 64K, 16K, and 16K, whereas for the data cache with softer variation, from 800 M to 1000 M (only 1.2 times more), the minimum-energy cache sizes are 32K, 2K, and 1K.
In the case of the 2-way cache, the optimal cache size for the 100nm and 70nm processes (16KB in both) respectively consumes 9% and 29% less energy compared to the 180nm optimal cache (32KB), with a 25% performance loss.
Conclusions
The results show that when re-implementing low-energy embedded systems in a new technology, the cache may need to be re-selected.
Our study showed that the sharper the slope of the miss rate across cache sizes, the less the optimal cache size varies across technologies.
The experiments showed that in all cases, the optimal cache size decreases in finer technologies despite the increase in misses and dynamic energy. This is due to the high impact of static energy in future technologies and confirms that, unlike in micrometer-scale technologies, simply adding more cache does not reduce total system energy in the future; cache size must be reduced to minimize total system energy in future nanometer technologies.
In the data cache, due to fewer cache accesses (less dynamic energy) compared to the instruction cache, this effect is magnified.
Since smaller caches are more suitable for low-energy systems in finer technologies, finding an optimal cache configuration that simultaneously optimizes performance and energy will become increasingly difficult.
Thank you for your attention
Energy Saving & Performance Penalty
Energy Saving =
    (energy_cache180_NTech – energy_cacheNTech) / energy_cache180_NTech
Performance Penalty =
    (exec_time_cacheNTech – exec_time_cache180) / exec_time_cache180
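The two metrics are plain relative differences and can be written as two small functions. In this sketch the parameter names follow the slide; reading energy_cache180_NTech as the energy of the 180nm-optimal cache size when built in the new technology, and energy_cacheNTech as the energy of the new technology's own optimal size, is an interpretation and not something the slide states. The functions return fractions, so multiply by 100 for percentages.

```python
def energy_saving(energy_cache180_ntech, energy_cache_ntech):
    """Fraction of energy saved by the new technology's optimal cache size."""
    return (energy_cache180_ntech - energy_cache_ntech) / energy_cache180_ntech


def performance_penalty(exec_time_cache_ntech, exec_time_cache180):
    """Fractional increase in execution time relative to the 180nm-optimal cache size."""
    return (exec_time_cache_ntech - exec_time_cache180) / exec_time_cache180


# Placeholder inputs for illustration only, not data from the talk.
print(energy_saving(200.0, 100.0))
print(performance_penalty(137.0, 100.0))
```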
Instruction Cache – Energy Saving
[Figure: energy saving (%) at 20°C, 60°C, and 100°C, one panel for 100nm and one for 70nm; recovered x-axis labels: basicmath, bitcounts, qsort, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, average]
100nm: 8%, 27%, and 41% for 20°C, 60°C, 100°C (max: 65%)
70nm: 1%, 6%, and 16% for 20°C, 60°C, 100°C (max: 45%)
Instruction Cache – Performance Penalty
[Figure: performance penalty (%) at 20°C, 60°C, and 100°C, one panel for 100nm and one for 70nm; x-axis labels: basicmath, bitcounts, qsort, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, average]
100nm: 1%, 1.2%, and 2.2% for 20°C, 60°C, 100°C
70nm: 0.6%, 2.3%, and 16% for 20°C, 60°C, 100°C
Data Cache – Energy Saving
[Figure: energy saving (%) at 20°C, 60°C, and 100°C, one panel for 100nm and one for 70nm; recovered x-axis labels: basicmath, susan, qsort, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, average]
100nm: 3.3%, 25%, and 47% for 20°C, 60°C, 100°C (max: 75%)
70nm: 7%, 22%, and 33% for 20°C, 60°C, 100°C (max: 65%)
Data Cache – Performance Penalty
[Figure: performance penalty (%) at 20°C, 60°C, and 100°C, one panel for 100nm and one for 70nm]
100nm: 0.8%, 5.3%, and 8% for 20°C, 60°C, 100°C
70nm: 3.6%, 10%, and 20% for 20°C, 60°C, 100°C
Architecture and Reconfiguration Flow for a Temperature-Aware Configurable Cache
Configurable Cache +
– Hardware
  Thermal sensor
  Accessible read port
– Software
  A table in the Operating System (OS) for recording temperature ranges and their suitable cache configuration
Flow of configuring Temperature-Aware Configurable Cache
Evaluation phase (offline)
– Input: static and dynamic power for different cache configurations and temperatures for the target technology
– Input: execution time and number of hits and misses for different cache configurations, obtained by running the application on an ISS
– Determine the lowest-energy cache configuration for different target temperatures
– Fill the lookup table of the configurable cache with the proper configuration for each temperature
Reconfiguration phase (online)
– Detect the current temperature
– Use the lookup table and load the proper configuration for the current temperature
– Execute the application
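The two phases of this flow can be sketched as a table build followed by a table lookup. Everything in the sketch is an assumption made for illustration: the function names, the dictionary layout, the rule of using the entry for the highest characterized temperature not above the current reading, and the placeholder energy values. The slides do not specify how the OS table is organized.

```python
import bisect


def build_lookup_table(energy_mj):
    """Evaluation phase (offline).

    energy_mj maps temperature (°C) to {cache_size_kb: total_energy_mj}.
    Returns a list of (temperature, lowest-energy cache size) sorted by temperature.
    """
    return sorted((temp, min(by_size, key=by_size.get))
                  for temp, by_size in energy_mj.items())


def lookup_config(table, current_temp_c):
    """Reconfiguration phase (online).

    Picks the entry for the highest characterized temperature that does not
    exceed the current reading, clamping to the first entry for colder readings.
    """
    temps = [temp for temp, _ in table]
    index = bisect.bisect_right(temps, current_temp_c) - 1
    return table[max(index, 0)][1]


# Placeholder characterization data (mJ), NOT measurements from the talk.
energy = {
    0: {64: 10.0, 32: 12.0},
    80: {64: 20.0, 32: 19.0},
    100: {64: 30.0, 32: 25.0},
}
table = build_lookup_table(energy)
print(lookup_config(table, 50))
print(lookup_config(table, 85))
```

A sorted list with binary search keeps the online step cheap, which matters because it runs before each execution of the application.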
Temperature measurement accuracy (1/2)
Tj = Ta + θJA · P
– Tj: Junction Temperature
– Ta: Ambient Temperature
– P: Power
– θJA: Junction-to-Ambient Thermal Resistance
Temperature measurement accuracy (2/2)
Technology   Metric              ARM7TDMI    ARM966E-S
180nm        Power consumption   24.15 mW    140 mW
             Frequency           115 MHz     200 MHz
130nm        Power consumption   7.98 mW     62.5 mW
             Frequency           133 MHz     250 MHz
90nm         Power consumption   7.08 mW     51.7 mW
             Frequency           236 MHz     470 MHz
θJA: 7 °C/W ~ 35 °C/W
ΔT = (Tj - Ta) ~ 5 °C
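The ΔT estimate follows directly from Tj = Ta + θJA · P. As a worked example, taking the upper θJA value of 35 °C/W together with the 140 mW figure listed for the ARM966E-S at 180nm gives a rise of about 4.9 °C, in line with the ~5 °C quoted on the slide. Pairing those two particular values is an assumption made here; the slide does not say which combination produced its estimate, and the function names are illustrative.

```python
def junction_temperature_rise_c(theta_ja_c_per_w, power_w):
    """Temperature rise Tj - Ta in °C for a given thermal resistance and power."""
    return theta_ja_c_per_w * power_w


def junction_temperature_c(ambient_c, theta_ja_c_per_w, power_w):
    """Junction temperature Tj = Ta + θJA · P."""
    return ambient_c + junction_temperature_rise_c(theta_ja_c_per_w, power_w)


# 35 °C/W and 140 mW are taken from the slide; combining them is an assumption.
rise = junction_temperature_rise_c(35.0, 0.140)
print(f"Tj - Ta is about {rise:.1f} °C")
```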
Conclusions
Our results show that up to 66% and 45% of energy consumption can be saved for 100nm and 70nm, respectively, for the instruction cache when the temperature changes from 0°C to 100°C.
Due to the increase of leakage in finer technologies and at higher temperatures, smaller caches will be more energy efficient for future low-energy systems.
Since smaller caches are more suitable for low-energy systems in finer technologies and at higher temperatures, finding an optimal cache configuration that simultaneously optimizes performance and energy will become increasingly difficult, especially at high temperatures.
Since there are fewer accesses to the data cache than to the instruction cache, the data cache is more easily affected by temperature and technology than the instruction cache. By using a configurable data cache, up to 74% and 64% of energy can be saved for 100nm and 70nm, respectively.
Thank you for your attention
Questions?
Motivations and Observations (3/4)
BSIM3 equation for subthreshold leakage
Experimental Results (1/)
Applications from MiBench
SimpleScalar
CACTI 4.1
– Three technologies: 180nm, 100nm, and 70nm
– Six temperatures: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C
Configurable Cache
– Size: 64KB~1KB
Qsort-Instruction Cache
[Figure: number of execution clock cycles (K) versus instruction cache size (128K to 1K), qsort]
[Figure: dynamic energy (mJ), 100nm, versus instruction cache size (128K to 1K), qsort]
Qsort-Instruction Cache
[Figure: static energy (mJ), 100nm, versus instruction cache size (128K to 1K), qsort, at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C]
[Figure: total energy (mJ), 100nm, versus instruction cache size (128K to 1K), qsort, at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C]
• {0°C ~ 80°C} 64KB , {80°C ~ 100°C} 32KB
• 17% energy saving and 19.6% performance penalty
Qsort-Data Cache
[Figure: number of execution clock cycles (K) versus data cache size (512K to 1K), qsort]
[Figure: dynamic energy (mJ), 100nm, versus data cache size (512K to 1K), qsort]
2-way set-associative, 16 bytes line size, 100nm.
Qsort-Data Cache
Fig. 12. Static energy for different data cache sizes (100nm).
[Figure: static energy (mJ), 100nm, versus data cache size (512K to 1K), qsort, at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C]
[Figure: total energy (mJ), 100nm, versus data cache size (128K to 1K), qsort, at 0°C, 20°C, 40°C, 60°C, 80°C, and 100°C]