Instruction Level Power Analysis
Outline
Introduction
Components of Power Consumption
Power Characterization
Instruction Level Power Analysis for RISC processors
Extensions for VLIW/EPIC processors
Register Files
Caches
Introduction
Why has the power consumption of nano-electronics become so important? Because Moore's law still holds true, driving ever more complex applications:
Mobile systems: the battery is the "bottleneck"
High-performance computation: heat extraction
Operating cost and reliability
Example: an ISP data warehouse with 8000 servers needs 2 MW
Introduction
Power or energy? Don't they go hand in hand? Not quite: power varies significantly with time, while a given battery holds a fixed amount of energy.
Average power consumption = Energy / Execution time
Decides the average chip and junction temperature
Decides battery life (if peak current < rated current)
Peak power and current
Decide voltage drops, hot spots, and the rate of battery discharge
Power-efficient, energy-efficient, and battery-efficient design paradigms do exist!
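A minimal sketch of the distinction, assuming a power trace sampled at a fixed interval (the sample values and the ideal-battery formula are hypothetical): two traces can consume the same energy, and therefore have the same average power, while their peak power differs by a factor of four.

```python
def trace_metrics(power_w, dt_s):
    """Energy, execution time, average and peak power of a uniformly sampled power trace."""
    if not power_w or dt_s <= 0:
        raise ValueError("need at least one sample and a positive sampling interval")
    energy_j = sum(p * dt_s for p in power_w)  # rectangle-rule integral of P(t)
    time_s = len(power_w) * dt_s
    return {
        "energy_J": energy_j,
        "time_s": time_s,
        "avg_power_W": energy_j / time_s,      # average power = energy / execution time
        "peak_power_W": max(power_w),          # drives voltage drops and hot spots
    }


def battery_life_h(capacity_wh, avg_power_w):
    """Ideal-battery estimate; only meaningful if peak current stays below the rated current."""
    return capacity_wh / avg_power_w


flat = trace_metrics([1.0, 1.0, 1.0, 1.0], dt_s=0.001)
bursty = trace_metrics([0.0, 0.0, 0.0, 4.0], dt_s=0.001)
print(flat["avg_power_W"], bursty["avg_power_W"])    # both about 1 W
print(flat["peak_power_W"], bursty["peak_power_W"])  # 1.0 vs 4.0 W peak
```

An average-power (or energy) model alone cannot tell these two traces apart, which is why peak power is listed as a separate concern.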
Components of Power Consumption
System = hardware platform + software (system and application); software impacts hardware power consumption
Static power
Sub-threshold leakage and reverse-biased junction leakage
Quiescent biasing power (in the case of non-CMOS circuits)
Dynamic power
Charging and discharging of capacitance (switching activity)
Short-circuit power during transitions (depends on rate of change, delay)
Alternative grouping (used at component/cell level)
Switching power at the boundaries of cells
Internal cell power
Short-circuit power
Switching power at internal nodes
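The slide lists the mechanisms without formulas. As a hedged first-order sketch, switching power is commonly written as α·C·Vdd²·f (α = average number of 0→1 transitions per clock cycle, C = switched capacitance) and static power as Vdd·I_leak. Short-circuit power is left out, and all numbers are hypothetical.

```python
def switching_power(alpha, c_farad, vdd, f_hz):
    """First-order dynamic (switching) power: alpha * C * Vdd^2 * f.
    alpha is the average number of 0->1 transitions per cycle of the switched node(s)."""
    return alpha * c_farad * vdd ** 2 * f_hz


def static_power(vdd, i_leak_amp):
    """Static power from a lumped leakage current (sub-threshold + junction leakage)."""
    return vdd * i_leak_amp


def total_power(alpha, c_farad, vdd, f_hz, i_leak_amp):
    # Short-circuit power is deliberately omitted in this sketch.
    return switching_power(alpha, c_farad, vdd, f_hz) + static_power(vdd, i_leak_amp)


# Hypothetical block: 1 nF switched capacitance, 10% activity, 1.0 V, 1 GHz, 10 mA leakage
print(total_power(0.1, 1e-9, 1.0, 1e9, 0.01))  # about 0.11 W
```

The quadratic Vdd term is why supply-voltage scaling is so effective against dynamic power, while the switching-activity factor α is where software and data enter the picture.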
System Abstractions - Power
Functional Specifications and Constraints
System Level Netlist
Register Transfer Level (RTL) Netlist
Component/Cell Level Netlist
Layout or Configuration-bits
Chip
[Figure: arrows alongside the abstraction levels, labeled "Time complexity", "Accuracy of power characterization", and "Opportunities for optimization"]
Power Characterization
Measurement (chip/board level)
Most accurate
Perhaps the fastest, if the setup and tools exist
Too late to change hardware details
Software/load control is still possible
Typically used for software optimizations
Power Characterization (cont…)
Transistor level (estimation)
SPICE simulation of the transistor-level netlist
Most accurate in the simulation world
Requires complete implementation details
Unmanageable time complexity even for simpler designs
Typically used for cell/component characterization
Synopsys PowerMill (said to provide SPICE-like accuracy)
Power Characterization (cont…)
Cell level (estimation)
After logic synthesis; requires an RTL implementation
Simulation to capture switching activity
Requires delay simulation if glitches need to be accounted for
Characterized cells: empirical formulas or table look-up
Interconnect power is either
unaccounted, or
estimated using wire-load models (typically based on experience), or
extracted from the layout (if done after physical synthesis)
Still unmanageable time complexity, especially for use in design space exploration
Synopsys PrimePower (inputs: netlist, interconnect capacitance, VCD traces, cell power library)
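Table look-up is usually a 2-D table per cell, indexed by input transition time and output load and interpolated between grid points. The sketch below assumes exactly that layout and uses bilinear interpolation; the axes, units and energy values are hypothetical and not taken from any real cell library.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the interval axis[i]..axis[i+1] used for x (edge interval outside the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_energy(slew_axis, load_axis, table, slew, load):
    """Bilinear interpolation in a per-cell energy table indexed as table[slew][load].
    Outside the table this linearly extrapolates from the edge interval."""
    i = _bracket(slew_axis, slew)
    j = _bracket(load_axis, load)
    tx = (slew - slew_axis[i]) / (slew_axis[i + 1] - slew_axis[i])
    ty = (load - load_axis[j]) / (load_axis[j + 1] - load_axis[j])
    e00, e01 = table[i][j], table[i][j + 1]
    e10, e11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - tx) * (1 - ty) * e00 + (1 - tx) * ty * e01
            + tx * (1 - ty) * e10 + tx * ty * e11)


slews = [0.0, 1.0]        # input transition time in ns (hypothetical)
loads = [0.0, 1.0]        # output load in pF (hypothetical)
energy = [[1.0, 2.0],     # energy per transition in pJ; rows = slew, columns = load
          [3.0, 4.0]]
print(lookup_energy(slews, loads, energy, 0.5, 0.5))  # 2.5
```

A cell-level tool would sum such per-transition energies over the toggles recorded in the simulation traces.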
Power Characterization (cont…)
Register transfer level (estimation)
Requires a conceptual RTL description (detailed micro-architecture)
Data-path is modeled as a netlist of macro cells, which are characterized offline
Control path and glue logic: either unaccounted or estimated based on I/O
Simulation to capture switching activity
Typically glitches are not considered, but methods do exist
Interconnect power: typically unaccounted, but possible to estimate through floor-planning
Typically used in design space exploration (DSE), mostly with in-house tools
System Level Power Estimation
For design space exploration
Least accurate, but the uncertainty of exploration results can be reduced if the models have good fidelity
Purpose, target architecture and available system details govern the system-level estimation models:
Selecting an algorithm, or designing hardware for a given algorithm?
ASIC based or processor based?
Is the ISA fixed or extensible?
Typically, system-level power estimation models are specific to a macro-architecture template
Major constituents of power consumption: computation, communication, storage units and peripherals
Power Estimation Models
Activity Based Models
Instruction Level Energy Models
Activity Based Models
Fixed Activity Model
N-Transition Model
Dual Bit Model
Fixed Activity Model
$P = \sum_i k_i G_i f_i$
where:
$k_i$ = PFA proportionality constant, extracted empirically from past designs
$G_i$ = measure of hardware complexity
$f_i$ = activation frequency
Disadvantage: does not model the influence of data activity on power consumption
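A minimal sketch of the fixed-activity (PFA) sum. The module names, constants and frequencies are hypothetical; the point is that the estimate depends only on complexity and activation rate, never on the data being processed.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    k: float  # PFA constant: energy per unit of complexity per activation (from past designs)
    G: float  # hardware complexity measure, e.g. gate count
    f: float  # activation frequency in Hz


def pfa_power(modules):
    """Fixed-activity (PFA) estimate P = sum_i k_i * G_i * f_i; data values never enter."""
    return sum(m.k * m.G * m.f for m in modules)


design = [Module("multiplier", k=1e-12, G=1000, f=1e8),
          Module("adder", k=1e-12, G=100, f=1e8)]
print(pfa_power(design))  # about 0.11 W, whatever data the modules process
```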
N-Transition Model
$P = P_{const} + n \cdot P_{change}$, where $n$ is the number of input transitions
Disadvantage:
It does not differentiate between transitions on different inputs.
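A sketch of the N-transition model, assuming $n$ is the Hamming distance between consecutive input words (the constants are hypothetical). The example also shows the stated disadvantage: a toggle on bit 0 and a toggle on bit 3 cost exactly the same.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def n_transition_power(inputs, p_const, p_change):
    """Per-cycle estimate P = P_const + n * P_change, n = number of input bits that toggled."""
    return [p_const + hamming(prev, cur) * p_change
            for prev, cur in zip(inputs, inputs[1:])]


print(n_transition_power([0b0000, 0b0011, 0b0001, 0b1000], 1.0, 0.5))  # [2.0, 1.5, 2.0]
```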
Dual Bit Type Model
Drawback of the previous approaches: less accurate, because they characterize the module on the basis of uniform white noise (UWN) inputs
This leads to high error if the input dynamic range does not fully occupy the word length
Dual Bit Type Model: The Approach
Combines the reduced complexity of the architecture level with the accuracy of the gate and circuit levels
Black-box model of the capacitance switched in each module for various types of inputs
Capacitance models are easy to parameterize to take into account size, etc.
Dual Bit Type Model: Modeling Complexity
Power consumed by a module is a function of its complexity, as large modules contain more circuitry
Example: capacitance of an N-bit ripple-carry subtracter: $C_T = C_{eff} \cdot N$
Not restricted to linear models; can also be used to specify more complex models
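A sketch of how such a complexity model might be parameterized: fit $C_{eff}$ from a few characterization runs by least squares through the origin, then predict other word lengths. The characterization data are hypothetical, and the N² array-multiplier model is an illustrative assumption of a non-linear model, not taken from the slide.

```python
def ripple_subtracter_cap(c_eff, n_bits):
    """Linear complexity model from the slide: C_T = C_eff * N."""
    return c_eff * n_bits


def fit_c_eff(widths, caps):
    """Least-squares fit of C_T = C_eff * N through the origin: C_eff = sum(N*C) / sum(N^2)."""
    num = sum(n * c for n, c in zip(widths, caps))
    den = sum(n * n for n in widths)
    return num / den


def array_multiplier_cap(c_eff, n_bits):
    """Hypothetical non-linear model: an N x N array multiplier has about N^2 cells."""
    return c_eff * n_bits ** 2


# Hypothetical characterization runs (capacitance in pF) at three word lengths
c_eff = fit_c_eff([8, 16, 32], [4.0, 8.0, 16.0])
print(c_eff, ripple_subtracter_cap(c_eff, 24))  # 0.5 12.0
```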
Dual Bit Type Model: Capacitive Data Coefficients
Describe the average amount of capacitance switched within a module during an input transition
The LSB region undergoes random transitions and hence can be characterized by a single capacitive coefficient $C_{UU}$
The MSB region experiences sign transitions and so is characterized by capacitive sign coefficients $C_{+-}$, $C_{++}$, etc.
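A simplified sketch of the dual bit type idea for a stream of two's-complement samples. The breakpoint rule (bits needed for the largest magnitude count as random "U" bits, the rest as sign bits) and all coefficient values are assumptions made for illustration; they are not the characterization procedure of the original model.

```python
def dbt_avg_capacitance(samples, word_len, c_uu, c_sign):
    """Average capacitance switched per input transition (simplified dual bit type model).

    samples : signed integers, interpreted as two's-complement words of width word_len
    c_uu    : capacitance per LSB-region bit per transition (random 'U' bits)
    c_sign  : capacitance per MSB-region (sign) bit, keyed by sign transition '++', '+-', ...
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a transition")
    # Hypothetical breakpoint: bits needed for the largest magnitude behave randomly,
    # the remaining high-order bits only replicate the sign.
    n_u = min(word_len, max(abs(x) for x in samples).bit_length())
    n_s = word_len - n_u

    def sign(x):
        return "+" if x >= 0 else "-"

    caps = [n_u * c_uu + n_s * c_sign[sign(a) + sign(b)]
            for a, b in zip(samples, samples[1:])]
    return sum(caps) / len(caps)


# Hypothetical coefficients: sign bits only switch when the sign changes
c_sign = {"++": 0.0, "--": 0.0, "+-": 2.0, "-+": 2.0}
print(dbt_avg_capacitance([3, -2, 5, 6], 16, 1.0, c_sign))  # about 20.33
```

A UWN-characterized model would instead charge all 16 bits as random bits on every transition, which is the source of the error for data whose dynamic range is narrower than the word length.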
Instruction Level Power Estimation
First introduced to characterize processor power consumption in order to drive software optimizations
Each instruction is associated with some current
Inter-instruction effects are included for better accuracy
Instruction Level Power Estimation
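$E = \sum_i B_i N_i + \sum_{i,j} O_{i,j} N_{i,j} + \sum_k E_k$
$B_i$: base energy cost of instruction $i$; $N_i$: number of times $i$ is executed
$O_{i,j}$: inter-instruction effect energy cost of the pair $(i, j)$; $N_{i,j}$: number of times $i$ is followed by $j$
$E_k$: additional energy penalties due to resource constraints (e.g. stalls, cache misses)
Requires a cost for every pair of instructions: $O(N^2)$, where N = number of instructions in the ISA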
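A sketch of evaluating this model over an executed instruction trace. The cost tables are hypothetical, and treating $O_{i,j}$ as symmetric (one entry serves both orders, missing pairs cost 0) is an assumption made to keep the table small.

```python
from collections import Counter


def pair_cost(overhead, a, b):
    """O(a, b), assumed symmetric here; pairs with no entry cost 0."""
    return overhead.get((a, b), overhead.get((b, a), 0.0))


def ilpa_energy(trace, base, overhead, penalties=()):
    """E = sum_i B_i N_i + sum_{i,j} O_ij N_ij + sum_k E_k for an executed instruction trace."""
    n_i = Counter(trace)                   # N_i: executions of each instruction
    n_ij = Counter(zip(trace, trace[1:]))  # N_ij: executions of each adjacent pair
    e_base = sum(base[i] * n for i, n in n_i.items())
    e_pair = sum(pair_cost(overhead, i, j) * n for (i, j), n in n_ij.items())
    return e_base + e_pair + sum(penalties)


trace = ["add", "mul", "add", "add"]
base = {"add": 1.0, "mul": 2.0}        # hypothetical B_i, e.g. in nJ
overhead = {("add", "mul"): 0.5}       # hypothetical O_ij
print(ilpa_energy(trace, base, overhead, penalties=[0.25]))  # 6.25
```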
JouleTrack
Experiments on the StrongARM by Amit Sinha and A. P. Chandrakasan
Current per instruction ≈ 0.2 A (averaged over all instructions)
Min-max variation of 38% of the average current
Addressing-mode and data-dependent variation is smaller
But the maximum current variation across benchmarks is < 8%!
Concluded that a first-order energy model of a given processor is $E = V \cdot I(V, f) \cdot T$, where T is the execution time
Second-order effects can be significant for data-path-dominated processors such as DSPs and VLIWs
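A sketch of the first-order estimate, taking T = cycles / f and looking up the average current from a table of measured operating points. The voltages, frequencies and currents in the table are hypothetical placeholders, not JouleTrack's published data.

```python
def first_order_energy(vdd, f_hz, cycles, current_model):
    """First-order estimate E = V * I(V, f) * T, with execution time T = cycles / f.
    current_model(vdd, f_hz) returns the average current at that operating point."""
    t_s = cycles / f_hz
    return vdd * current_model(vdd, f_hz) * t_s


# Hypothetical measured operating points: (Vdd [V], f [Hz]) -> average current [A]
measured = {(1.5, 206e6): 0.2, (1.2, 103e6): 0.08}


def current(v, f):
    return measured[(v, f)]  # table look-up; a fitted I(V, f) model would also work


print(first_order_energy(1.5, 206e6, 206e6, current))  # about 0.3 J
print(first_order_energy(1.2, 103e6, 206e6, current))  # about 0.192 J
```

Because the current barely depends on which instructions run, the same workload is characterized mostly by its cycle count and the operating point.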
Instruction Level Power Estimation
Impractical for CISC processors with a very large instruction set; however, CISC processors show
Higher average instruction energy
Low energy-per-instruction variance
Hence: do not consider inter-instruction effects, and cluster similar instructions into a single class
Exponential storage problem for VLIW architectures
With N operations and a K-wide VLIW there are $N^K$ possible long instructions, so the number of long-instruction pairs to characterize is $N^{2K}$
Modified Energy Model for VLIW
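Assume independent energy dissipation for the different execution slots (lanes)
Consider nop as the base energy
$E(W) = \sum_n U(w_n \mid w_{n-1}) + m \cdot p \cdot S + l \cdot q \cdot M$
$U(w_n \mid w_{n-1}) = U(0 \mid 0) + \sum_k v(w_n^k \mid w_{n-1}^k)$
$w_n^k$ = operation issued on lane k by instruction $w_n$; $U(0 \mid 0)$ = base energy of an all-nop instruction following an all-nop instruction
Example:
$w_n$ = [ALU NOP NOP NOP], $w_{n-1}$ = [LS NOP ALU NOP]
$U(w_n \mid w_{n-1}) = U(0 \mid 0) + v(\mathrm{ALU} \mid \mathrm{LS}) + v(\mathrm{NOP} \mid \mathrm{ALU})$ (lanes where both operations are NOP add nothing)
Memory requirement: $O(K \cdot N^2)$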
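A sketch of the per-lane model, reproducing the slide's example, plus a comparison of table sizes. The lane costs, $U(0|0)$, and the assumption that execution starts after an all-NOP bundle are hypothetical; the penalty terms $m \cdot p \cdot S + l \cdot q \cdot M$ are passed in as one precomputed number rather than modeled.

```python
NOP = "NOP"


def bundle_energy(w_n, w_prev, u00, v):
    """U(w_n | w_{n-1}) = U(0|0) + sum_k v(w_n^k | w_{n-1}^k), lanes treated independently."""
    total = u00
    for cur, prev in zip(w_n, w_prev):
        if cur == NOP and prev == NOP:
            continue              # already covered by the all-NOP base U(0|0)
        total += v[(cur, prev)]   # v keyed by (operation now, operation before) on that lane
    return total


def program_energy(bundles, u00, v, stall_terms=0.0):
    """E(W) = sum_n U(w_n | w_{n-1}) + stall_terms; stall_terms stands in for m*p*S + l*q*M."""
    prev = [NOP] * len(bundles[0])  # assumption: execution starts after an all-NOP bundle
    total = 0.0
    for w in bundles:
        total += bundle_energy(w, prev, u00, v)
        prev = w
    return total + stall_terms


def table_entries(n_ops, k_lanes):
    """Characterization cost: full long-instruction pair table vs per-lane pair tables."""
    return n_ops ** (2 * k_lanes), k_lanes * n_ops ** 2


v = {("ALU", "LS"): 3.0, ("NOP", "ALU"): 1.0}   # hypothetical lane costs
print(bundle_energy(["ALU", NOP, NOP, NOP], ["LS", NOP, "ALU", NOP], 10.0, v))  # 14.0
print(table_entries(10, 4))  # (100000000, 400)
```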
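Modified Energy Model for VLIW: Cluster Similar Instructions Based on Cost
$\Theta = \{e_1, e_2, \ldots, e_t\}$, where $e_t$ = energy consumption of instruction t
Partition $\Theta$ into K clusters $C_1, C_2, \ldots, C_K$ such that $\sum_{j=1}^{K} \sum_{i} (x_{i,j} - c_j)^2$ is minimal, where $x_{i,j}$ is the i-th energy value in cluster $C_j$ and $c_j$ is the centroid of $C_j$
Large number of clusters: good accuracy, but a huge number of experiments
Small number of clusters: small number of experiments, but high variance within clusters and reduced accuracy
Memory requirement: $O(C \cdot N^2)$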
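The minimization above is the k-means objective. A sketch using Lloyd's algorithm in one dimension (a heuristic that converges to a local minimum, not a guaranteed optimum); the quantile initialization and the energy values are assumptions for illustration.

```python
def kmeans_1d(energies, k, iters=100):
    """Lloyd's k-means in 1-D; a heuristic for minimizing sum_j sum_i (x_ij - c_j)^2."""
    xs = sorted(energies)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of energy values")
    # Quantile initialization: spread the initial centroids over the sorted values
    centroids = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda idx: abs(x - centroids[idx]))
            clusters[nearest].append(x)
        # Empty clusters keep their previous centroid
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum((x - centroids[j]) ** 2 for j, c in enumerate(clusters) for x in c)
    return centroids, clusters, sse


# Hypothetical per-instruction energies (e.g. nJ): two obvious classes
centroids, clusters, sse = kmeans_1d([1.0, 1.1, 0.9, 5.0, 5.2, 4.8], 2)
print([round(c, 3) for c in centroids])  # [1.0, 5.0]
```

Each cluster's centroid then stands in for all of its members, so pairwise costs are characterized per cluster rather than per instruction; the returned `sse` shows the accuracy traded away as k shrinks.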
Limitations of ILPA
Does not provide any insight into the causes of power consumption within the processor core
Does not account for the power consumed in the memory system, which is often dominant
To address the second limitation, power estimation frameworks which integrate processor and memory models are built around instruction set simulators
Microarchitecture ILPA
Pipeline-aware instruction-level energy model
Divide the design into smaller architectural blocks, usually the processor's pipeline stages: Fetch, Decode, RF, Execute, WB
$E(w_n \mid w_{n-1}) = \sum_s A_s(w_n \mid w_{n-1}) + I(w_n \mid w_{n-1})$
$A_s$ = energy consumed by stage s when executing $w_n$ after $w_{n-1}$
$I(w_n \mid w_{n-1})$ = energy of the inter-stage connections (pipeline registers + buses)
Provides better insight into power bottlenecks
Smoother energy behaviour than the black-box model
Requires a pipeline-structure-aware instruction set simulator (ISS)
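A sketch of evaluating the pipeline-aware model for one instruction pair, keeping the per-stage breakdown that gives the extra insight. The table layout (`stage_tables[stage][(cur, prev)]`) and all energy values are hypothetical.

```python
STAGES = ("Fetch", "Decode", "RF", "Execute", "WB")


def pair_energy(cur, prev, stage_tables, interstage):
    """E(w_n | w_{n-1}) = sum_s A_s(w_n | w_{n-1}) + I(w_n | w_{n-1}).

    stage_tables[s][(cur, prev)] : A_s, energy of stage s for cur executed after prev
    interstage[(cur, prev)]      : I, pipeline registers + buses
    Returns the total and the per-stage breakdown.
    """
    per_stage = {s: stage_tables[s][(cur, prev)] for s in STAGES}
    return sum(per_stage.values()) + interstage[(cur, prev)], per_stage


# Hypothetical characterization data for the pair "add after mul"
stage_tables = {s: {("add", "mul"): 1.0} for s in STAGES}
stage_tables["Execute"][("add", "mul")] = 3.0
interstage = {("add", "mul"): 0.5}

total, per_stage = pair_energy("add", "mul", stage_tables, interstage)
print(total)                              # 7.5
print(max(per_stage, key=per_stage.get))  # Execute
```

A black-box model would return only the 7.5; the breakdown points at the Execute stage as the first optimization target.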
Energy Models for Register File
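Assume linear power behaviour for accesses across different ports
$P_{RF} = P_i + \frac{1}{T} \sum_n (E_{r,n} + E_{w,n})$
$E_{r,n} = \sum_i H(RR_{i,n}, RR_{i,n-1}) \cdot E_{rb}$
$E_{w,n} = \sum_i H(RW_{i,n}, old_{i,n}) \cdot E_{wb}$
where H is the Hamming distance, $RR_{i,n}$ and $RW_{i,n}$ are the values on read and write port i in cycle n, and $old_{i,n}$ is the previous content of the register written through port i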
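A sketch of the register-file model over a short trace of port values. The trace format, the all-zero initial read-port values, and the energies (pJ) and time (ns, giving mW) are hypothetical choices for illustration.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def regfile_power(reads, writes, p_i, e_rb, e_wb, t_total):
    """P_RF = P_i + (1/T) * sum_n (E_r,n + E_w,n)

    reads[n]  : values on each read port in cycle n
    writes[n] : (new_value, old_value) pairs, one per active write port in cycle n
    e_rb, e_wb: energy per toggled bit on a read port / in a written register
    """
    prev_reads = [0] * len(reads[0])  # assumption: read ports start at all zeros
    energy = 0.0
    for cur_reads, cur_writes in zip(reads, writes):
        e_r = sum(hamming(c, p) for c, p in zip(cur_reads, prev_reads)) * e_rb
        e_w = sum(hamming(new, old) for new, old in cur_writes) * e_wb
        energy += e_r + e_w
        prev_reads = cur_reads
    return p_i + energy / t_total


reads = [[0b1010, 0b0001], [0b1000, 0b0001]]  # two read ports, two cycles
writes = [[(0b1111, 0b0000)], []]             # one write in cycle 0, none in cycle 1
print(regfile_power(reads, writes, p_i=0.5, e_rb=1.0, e_wb=2.0, t_total=4.0))  # 3.5
```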
Energy Model for Caches
Power consumption depends on the mode of operation (read, write, idle)
The energy consumed in a given clock cycle is a function of the node transitions between the previous and the current cycle
Characterize energy as a function of state transitions (read-read, read-write, etc.)
For a given state transition, energy further depends on the transitions on the address lines
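A sketch of such a cache model, assuming a base energy per mode transition plus a term linear in the number of toggled address lines. The linear address term, the idle cycle keeping its previous address, and all energy values are assumptions; the slide only states that the dependence exists.

```python
def cache_energy(accesses, base, e_addr_bit):
    """Energy of a cache access sequence.

    accesses   : list of (mode, address) per clock cycle, mode in {"read", "write", "idle"}
    base       : energy for each mode transition, e.g. base[("read", "write")]
    e_addr_bit : hypothetical energy per toggled address line (linear model)
    The first cycle has no predecessor and is not charged in this sketch.
    """
    total = 0.0
    prev_mode, prev_addr = accesses[0]
    for mode, addr in accesses[1:]:
        toggles = bin(prev_addr ^ addr).count("1")
        total += base[(prev_mode, mode)] + e_addr_bit * toggles
        prev_mode, prev_addr = mode, addr
    return total


base = {("read", "read"): 5.0, ("read", "write"): 7.0, ("write", "idle"): 1.0}  # hypothetical pJ
accesses = [("read", 0x00), ("read", 0x01), ("write", 0x03), ("idle", 0x03)]
print(cache_energy(accesses, base, e_addr_bit=0.5))  # 14.0
```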
Thank You