Architectural-level Design Exploration for Power Aware System
Dexin Li
October 2000
Background
• Component-level low power design cannot meet system-level design goals
• System needs not only low power designs, but also power aware features
Motivation
• System architecture is important for power aware system designs
– Our micro-rover example shows that the bus and bus interfaces consume 25.65% of total system power
• As more low-power design techniques and low-power components are adopted, architectural optimization becomes more important.
Application Example
• Microrover: a robot exploring Mars
– Solar power: 15W @ noon
– Electronics system:
• Processor, microcontroller
• Camera
• Radio frequency modem
• Non-volatile memory/hard drive
• Scientific equipment: APXS & ASI/MET
• Bus drivers
– System tasks:
• Steering and driving
• Capture pictures and send compressed data
• Perform scientific experiments, store data on media, and send data
Previous Work
• Many low power design techniques
– Voltage scaling, frequency scaling, clock gating
– Bus encoding, bus segmentation
– Algorithm transformation, imprecise arithmetic
• Other power aware methodologies
– PACT: on-demand control of power consumption and performance
– µAMPS: adaptive energy-aware distributed microsensors
IMPACCT methodology
• A framework to enable power aware design
– Behavioral level optimization
• Scheduling, partitioning, migration
– Architectural level design exploration
• Constraint-driven design space exploration
• Meet power and performance constraints
• Different views of system behavior lead to different solutions
– Static: system behaviors are known prior to architecture exploration
– Mixed (hybrid): prepare solutions for a few scenarios and pick one at run time
– Dynamic: determine the system behaviors and explore the design space at run time
Assumptions for the problem
• Use COTS components to construct the system
• Communication
– Un-directional
– Components’ stand-alone time is absorbed into communication time (at coarse granularity)
• Static view of the system behavior
Problem statement
• Design a tool (algorithm) that produces an architectural topology and a power management scheme satisfying system-level power, workload, and schedule constraints.
• Input:
– Component property
– Workload graph
– Behavioral schedule
– System constraints
• Output:
– A feasible architecture
– power management scheme
Component property
• Component name
• Power modes
• Communication bandwidth
• Mapping table
– Performance / power
– Clock frequency / power
– Supply voltage / power
• Bus interface
– Maximum fanout
– Root node eligibility (can be root node or not)
Workload graph
• A representation of communication
– Vertices: components
– Edges: workload (data transfer rate)
– Weight: required communication bandwidth
[Figure: example workload graph; vertices are components such as rf, hd, cpu1, cpu2, mc1, sc1, cam, and edge weights are required bandwidths]
Behavioral schedule
• Mission-level schedules
• From behavioral scheduling or system specification
• Communication-active and stand-alone-active
– Granularity related
– Here they are assumed to be the same
[Figure: schedule chart with one row each for CPU1, MC1, MC2, CPU2, CAM, RF, HD, SC1, SC2; time axis in minutes, 10 to 50]
System constraints
• Power
– Maximum power, constant
– Maximum power, function of time
– Power range, constant, function of time
• Protocol
– Topology: e.g. tree for 1394 bus
– Communication bandwidth
• 100, 200, 400Mbps for different 1394 bus components
• Up to 80% of bandwidth for isochronous transfers
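The three power-constraint forms listed above (constant maximum, time-varying maximum, and range) could be checked uniformly by treating every bound as a function of time. The following is a minimal sketch under that assumption; `PowerConstraint`, `constant`, and the time-varying and range numbers are hypothetical and not part of the original tool. Only the 15 W figure comes from the microrover's quoted solar budget.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Bound = Callable[[float], float]  # maps time (minutes) to watts


def constant(watts: float) -> Bound:
    """Wrap a constant bound so it has the same shape as a time-varying one."""
    return lambda t: watts


@dataclass
class PowerConstraint:
    max_power: Bound
    min_power: Optional[Bound] = None  # only set when the constraint is a range

    def satisfied(self, t: float, power: float) -> bool:
        """True if `power` (watts) is within the bounds at time `t`."""
        if power > self.max_power(t):
            return False
        if self.min_power is not None and power < self.min_power(t):
            return False
        return True


# Constant maximum (15 W is the solar budget quoted for the microrover).
const_max = PowerConstraint(max_power=constant(15.0))

# Hypothetical time-varying maximum: the budget drops after minute 30.
varying = PowerConstraint(max_power=lambda t: 15.0 if t < 30 else 12.0)

# Hypothetical range: consume between 8 W and 15 W.
ranged = PowerConstraint(max_power=constant(15.0), min_power=constant(8.0))
```

Representing a constant as a function keeps the checking code identical for all three forms, at the cost of a small wrapper.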
Output topology
• A feasible topology meets all system constraints, if any
[Figure: example output tree topology connecting rf, sc1, sc2, hd, cpu1, cpu2, mc1, mc2, cam]
Output-PM scheme
• Power management scheme
– Works together with the output topology
– Indicates results for each component at each schedule interval
• power mode
• power consumption number
• required bandwidth
– Used as feedback to behavioral scheduling or software development
Problem Formulation
• Tool elements:
– Component library(CL)
– Topology generator(TG)
– Power management inspector(PMI)
– Power calculator(PC)
• Given the workload graph, TG first generates a graph from which different topologies are extracted. PMI assigns a working mode to each component and checks whether the combination is legal. PC computes the power of the entire system and checks whether it meets the power constraints. If it does, the problem is solved; if not, different working modes or a different topology are tried and checked again.
Component Model
• Component composition:
– Functional unit (FU)
– Bus interface (BI)
• Power management model:
– Layered power modes
– Modes correspondence between FU and BI
[Figure: component model with the FU between the application and the BI, and the BI connected to the bus media; modes shown: Full-on, sleep, deep sleep, LNK, PHY, SUS]
Assumption: when the FU is working, it communicates with other components.
Bus Model
• Sender and receiver
• Service layers
• Transfer property(modes, speed, bandwidth)
• Configuration process
[Figure: sender and receiver stacks, each with application, TRS, LNK, and PHY layers, connected through the bus media]
Configuration Management I
• Power modes constraints:
– Intra-component constraints
– Inter-component constraints
[Figure: mode correspondence between FU and BI (Full-on, sleep, deep sleep, LNK, PHY, SUS) marked yes/no, and a three-node chain A, B, C]
Example: data is to be transferred from node A to node C, so node B can’t be put in SUS mode.
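The inter-component rule in the A, B, C example can be sketched as a legality check over a tree topology. This is an illustration only: the function names are hypothetical, and the mode ordering SUS < PHY < LNK is an assumption based on the layered modes named in the slides. Endpoints are required to be at link-layer-on, and every node along the path must be at least PHY.

```python
def tree_path(adj, src, dst):
    """Return the path from src to dst in a tree given as an adjacency dict."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None


# Assumed ordering of bus-interface modes, lowest power first.
MODE_RANK = {"SUS": 0, "PHY": 1, "LNK": 2}


def modes_legal(adj, modes, transfers):
    """Check that every transfer has live endpoints and a live path between them."""
    for src, dst in transfers:
        path = tree_path(adj, src, dst)
        if path is None:
            return False
        # Sender and receiver need the link layer on.
        if any(MODE_RANK[modes[n]] < MODE_RANK["LNK"] for n in (src, dst)):
            return False
        # Nodes in between only pass data along, but cannot be suspended.
        if any(MODE_RANK[modes[n]] < MODE_RANK["PHY"] for n in path[1:-1]):
            return False
    return True


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(modes_legal(chain, {"A": "LNK", "B": "PHY", "C": "LNK"}, [("A", "C")]))
print(modes_legal(chain, {"A": "LNK", "B": "SUS", "C": "LNK"}, [("A", "C")]))
```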
Configuration Management II
• Bandwidth constraints:
Chain A B C:
– Data to be transferred from node A to C @ 400Mbps
– Node B’s transfer speed should be 400Mbps, too
Chain A B C D:
– Data transfer rates: A to D: 150Mbps; B to D: 80Mbps
– Bandwidth for C: no less than 230Mbps
– For FireWire bus: 400Mbps
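The arithmetic in the A, B, C, D example can be reproduced with a short sketch. It assumes a linear chain only (a general tree would need path finding), and the names `node_loads` and `speed_level` are hypothetical. The speed levels are the 100, 200, and 400 Mbps values quoted for the 1394 bus.

```python
from bisect import bisect_left

SPEED_LEVELS = [100, 200, 400]  # Mbps, as listed for 1394 bus components


def node_loads(chain, transfers):
    """Sum the rates of all transfers that touch or pass through each node."""
    pos = {name: i for i, name in enumerate(chain)}
    loads = {name: 0 for name in chain}
    for (src, dst), rate in transfers.items():
        lo, hi = sorted((pos[src], pos[dst]))
        for name in chain[lo:hi + 1]:
            loads[name] += rate
    return loads


def speed_level(load):
    """Smallest bus speed level that can carry `load` Mbps."""
    i = bisect_left(SPEED_LEVELS, load)
    if i == len(SPEED_LEVELS):
        raise ValueError(f"{load} Mbps exceeds the fastest bus level")
    return SPEED_LEVELS[i]


loads = node_loads(["A", "B", "C", "D"], {("A", "D"): 150, ("B", "D"): 80})
print(loads["C"], speed_level(loads["C"]))  # 230 400
```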
Low power design techniques I
• Bus segmentation
– Improve communication bandwidth
– Power reduction by disabling unused components or clusters
– Enabling other low power design techniques
[Figure: bus segmentation]
Low power design techniques II
• Clock scaling and voltage scaling
– Trade off between performance and power
– Two or multiple levels of frequencies or voltages to select from
– Extra hardware needed to implement the techniques
Using low power design techniques
• Bus segmentation with clock scaling
– With a clustered bus, we can keep the same bandwidth by lowering the clock frequency for the communication
[Figure: a 400Mbps bus segmented into a 200Mbps cluster, a 100Mbps cluster, and a suspended cluster]
Algorithm I
• Creating Communication-Scheduling Table
– Obtain combined information of both schedule and communication
– Used for finding out constraint set for each component
– Format: • CST : (tuple1, tuple2, ...)
• Tuple1:(workload_path, interval, required_bandwidth)
(('cpu2','mc1'),((20,30), 10)), (('cpu2','mc2'),((0,15), 20)), (('cpu1','cam'),((10,20), 20)),...
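A Communication-Scheduling Table in the nested form of the example tuples above could be assembled by joining the workload graph with the schedule. This is a minimal sketch: the input layouts (`workload` keyed by path, `schedule` mapping each path to its active intervals) and the function names are assumptions, and the values reuse the three example entries.

```python
def build_cst(workload, schedule):
    """Join bandwidth per path with the intervals during which the path is active.

    Each entry has the form (workload_path, (interval, required_bandwidth)).
    """
    cst = []
    for path, intervals in schedule.items():
        bandwidth = workload[path]
        for interval in intervals:
            cst.append((path, (interval, bandwidth)))
    # Sort by interval so entries read in schedule order.
    return tuple(sorted(cst, key=lambda entry: entry[1][0]))


def active_at(cst, t):
    """Paths communicating at time t (interval start inclusive, end exclusive)."""
    return [path for path, ((start, end), _) in cst if start <= t < end]


workload = {("cpu2", "mc1"): 10, ("cpu2", "mc2"): 20, ("cpu1", "cam"): 20}
schedule = {
    ("cpu2", "mc1"): [(20, 30)],
    ("cpu2", "mc2"): [(0, 15)],
    ("cpu1", "cam"): [(10, 20)],
}
cst = build_cst(workload, schedule)
print(active_at(cst, 12))  # [('cpu2', 'mc2'), ('cpu1', 'cam')]
```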
Algorithm II
• Building Constraint Set
– Find legal modes
• Working mode
• Power mode
• Bandwidth level
– Constrained by
• Topology
• System schedule
• Communication
Example: Cam: ON: LNK; Cam: WL: 120
The camera must work in at least link-layer-on mode. The required bandwidth is 120Mbps, so the bus driver should work at 200Mbps or higher.
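The camera example can be derived mechanically from a Communication-Scheduling Table. The sketch below is one possible reading, not the tool's actual code: `constraint_for` and the dictionary keys are hypothetical, and the single CST entry is an illustrative 120 Mbps camera transfer. A component with no communication in the interval gets no mode constraint here, although the topology may still require it to stay at PHY to pass data.

```python
SPEED_LEVELS = (100, 200, 400)  # Mbps, the 1394 bus levels


def constraint_for(component, cst, interval):
    """Minimum mode and bus speed for one component during one schedule interval."""
    start, end = interval
    # Add up the bandwidth of every CST entry that involves this component
    # and overlaps the interval.
    load = sum(
        bw
        for path, ((s, e), bw) in cst
        if component in path and s < end and start < e
    )
    if load == 0:
        return {"min_mode": None, "WL": 0, "min_speed": None}
    min_speed = next((lvl for lvl in SPEED_LEVELS if lvl >= load), None)
    return {"min_mode": "LNK", "WL": load, "min_speed": min_speed}


cst = ((("cpu1", "cam"), ((10, 20), 120)),)
print(constraint_for("cam", cst, (10, 20)))
# {'min_mode': 'LNK', 'WL': 120, 'min_speed': 200}
```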
Algorithm III
• Enumerating topology
• Complexity
– Pick |Et| edges from |Eg|, where |Et| is the number of edges in the tree and |Eg| is the number of edges in the graph
1. Start from workload graph G;
2. Add some redundant edges to G to get G’;
3. Extract a valid topology T from G’;
4. Append T to topology library TL
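One brute-force way to realize this enumeration is to try every choice of |V| - 1 edges from the augmented graph and keep those that form a tree. The sketch below does that with a union-find cycle check. It is an illustration under assumptions: the function names are hypothetical, and the maximum-fanout limit is interpreted here as a cap on node degree.

```python
from itertools import combinations


def is_spanning_tree(nodes, edges):
    """True if `edges` (exactly len(nodes) - 1 of them) contain no cycle."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
    # |V| - 1 acyclic edges necessarily connect all |V| nodes.
    return True


def enumerate_topologies(nodes, edges, max_fanout=None):
    """Build the topology library: all spanning trees within the fanout cap."""
    library = []
    for candidate in combinations(edges, len(nodes) - 1):
        if not is_spanning_tree(nodes, candidate):
            continue
        if max_fanout is not None:
            degree = {n: 0 for n in nodes}
            for a, b in candidate:
                degree[a] += 1
                degree[b] += 1
            if max(degree.values()) > max_fanout:
                continue
        library.append(candidate)
    return library
```

The number of candidates grows as the binomial coefficient C(|Eg|, |Et|), which is why adding redundant edges to the workload graph has to be done sparingly.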
Algorithm IV
• Traversing power management schemes
– Grouping nodes into three classes:
• Transferring (C1)
• Passing (C2)
• Idle (C3)
– Traverse different combinations
– Try bus segmentation and clock scaling techniques

Scheme  C1                 C2             C3
1       Full-on            Full-on        Full-on
2       Full-on            Full-on        PHY
3       Full-on            PHY            PHY
4       Full-on            PHY            SUS
5       Clustered Full-on  Clustered PHY  SUS
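The traversal over the scheme table can be sketched as a loop that tries the rows in order and stops at the first one within the power budget. The per-mode power numbers below are invented placeholders, the names are hypothetical, and the clustered row (scheme 5) is left out because it depends on how the bus is segmented.

```python
# Rows 1 to 4 of the scheme table: the mode given to each node class.
SCHEMES = [
    {"C1": "Full-on", "C2": "Full-on", "C3": "Full-on"},
    {"C1": "Full-on", "C2": "Full-on", "C3": "PHY"},
    {"C1": "Full-on", "C2": "PHY", "C3": "PHY"},
    {"C1": "Full-on", "C2": "PHY", "C3": "SUS"},
]

# Hypothetical per-mode power draw in watts (not from the slides).
MODE_POWER = {"Full-on": 2.0, "PHY": 0.5, "SUS": 0.1}


def first_feasible_scheme(node_class, max_power):
    """Try schemes in table order; return (scheme number, modes, total power)."""
    for number, scheme in enumerate(SCHEMES, start=1):
        modes = {node: scheme[cls] for node, cls in node_class.items()}
        total = sum(MODE_POWER[mode] for mode in modes.values())
        if total <= max_power:
            return number, modes, total
    return None


# Two transferring nodes, one passing node, one idle node.
classes = {"A": "C1", "B": "C2", "C": "C1", "D": "C3"}
result = first_feasible_scheme(classes, max_power=5.0)
print(result[0])  # 3
```

Trying the rows from highest to lowest power keeps as many components responsive as the budget allows.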
Algorithm: top level
1. Read in component property, workload graph, system schedule, and system constraints
2. Create Communication-Scheduling Table
3. Build Constraint Set
4. Enumerate topologies, building topology library TL
5. For Ti in TL:
6.   For interval in schedule:
7.     Traverse power management schemes PMSi;
8.     Run power_calculator to find power number P for PMSi
9.     If P satisfies power_constraint:
10.      Print “find a feasible solution”, Ti, PMSi
11.      Stop
12. Print “can’t find a feasible solution”
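The nested loops of the top-level algorithm can be sketched with the scheme generator and power calculator passed in as functions. This is one interpretation rather than the author's implementation: it accepts a topology only when every schedule interval has a scheme within budget, and `explore`, `schemes_for`, and `power_of` are hypothetical names.

```python
def explore(topologies, intervals, schemes_for, power_of, max_power):
    """Return (topology, {interval: scheme}) for the first feasible topology.

    schemes_for(topology, interval) yields candidate power management schemes;
    power_of(topology, interval, scheme) returns system power in watts.
    """
    for topology in topologies:
        chosen = {}
        for interval in intervals:
            for scheme in schemes_for(topology, interval):
                if power_of(topology, interval, scheme) <= max_power:
                    chosen[interval] = scheme
                    break  # this interval is satisfied
            else:
                break  # no scheme fits this interval: give up on this topology
        else:
            return topology, chosen  # every interval was satisfied
    return None  # can't find a feasible solution
```

A toy call, with made-up power numbers, shows the fallback from one topology to the next:

```python
power = {
    ("T1", (0, 10), "hi"): 16, ("T1", (0, 10), "lo"): 14,
    ("T1", (10, 20), "hi"): 17, ("T1", (10, 20), "lo"): 16,
    ("T2", (0, 10), "hi"): 14, ("T2", (0, 10), "lo"): 12,
    ("T2", (10, 20), "hi"): 16, ("T2", (10, 20), "lo"): 13,
}
print(explore(["T1", "T2"], [(0, 10), (10, 20)],
              lambda t, i: ["hi", "lo"],
              lambda t, i, s: power[(t, i, s)],
              max_power=15))
```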
Example
• FireWire 1394 bus architecture
– Tree topology
– Transfer speed 100, 200, 400Mbps
• Application: micro rover
– 9 nodes
– System schedule: walking, taking picture, walking and collecting scientific data
– Workload graph
– Power constraints:
• Constant value
• Function of time
• A range with max and min value or function
[Figure: workload graph for the nine nodes RF, CAM, NVM/HD, SC1, SC2, CPU1, CPU2, MC1, MC2 with edge bandwidths, and the system schedule chart with a time axis in minutes, 10 to 50]
Experimental methodology
• Constraint-driven design space exploration
• Pre-given schedule from behavioral level to break the iteration loop
• Enlarge the exploration space by adding some edges to the original graph
• Use both scheduling and communication information as knowledge to build the constraint set
[Figure: tool flow with blocks for schedule, workload, constraint set, topology, topology iterator, power modes traversor, component library, power calculator, and solution]
Experiments
• Experiment 1:
[Figure: workload graph with edge bandwidths and the resulting topology for CAM, CPU, RF, HD, MC, SC]
MAX_POWER constraint = 15.0W
Actual MAX_POWER = 14.9W
Experiments
[Figure: resulting topology for MC, SC, HD, CAM, CPU, RF, and the schedule chart for CAM, CPU, MC, SC, RF, HD with a time axis in minutes, 10 to 50]
MAX_POWER constraint = 14.0W
Actual MAX_POWER = 13.94W
Experiments
Component / Interval  0-10    10-30   30-40   40-50   50-60
CAM                   LNK200  PHY     PHY     PHY     PHY
CPU                   LNK200  LNK200  PHY     LNK200  LNK100
MC                    LNK200  LNK200  LNK100  PHY     PHY
SC                    LNK200  LNK200  PHY     PHY     PHY
RF                    PHY     LNK200  PHY     PHY     LNK100
HD                    PHY     PHY     LNK100  LNK200  PHY
Experiments
Component / Interval  0-10    10-30   30-40   40-50   50-60
CAM                   LNK100  SUS     SUS     SUS     SUS
CPU                   LNK100  LNK100  SUS     LNK200  LNK100
MC                    LNK100  LNK100  LNK100  SUS     SUS
SC                    LNK100  LNK100  SUS     SUS     SUS
RF                    SUS     LNK100  SUS     SUS     LNK100
HD                    SUS     SUS     LNK100  LNK200  SUS
Experimental Results
[Figure: system power (W) versus time (min) plotted against the power constraints, with the resulting topologies over rf, sc1, sc2, hd, cpu1, cpu2, mc1, mc2, cam]
Summary and future work
• A tool to explore design space for power aware architecture
• Meets different kinds of power constraints
• Incorporates low power design techniques
• Interaction with behavioral scheduling to refine solution
• Future work: hybrid and dynamic exploration
Algorithm
1. Read in component property, communication graph, system schedule, and system constraints;
2. Construct searching graph (SG); if |SG| > Max_SG then stop;
3. Construct schedule intervals Si;
4. Enumerate all the topologies Ti from searching graph SG;
5. For each Ti do
6. { if Ti is topologically illegal then next Ti;
7.   Build configuration constraints set (CCS) for each component;
8.   Initialize first schedule interval S1, all components in Full-on modes;
9.   For each Si do
10.  { if (Si != S1) copy power modes sets (PMSi) from previous interval;
11.    While (PMSi not exhausted)
12.    { If PMSi is legal then run power_calculator
13.      { if system power satisfies power constraints then next Si;
14.        Else next Ti;
15.      } else
16.      { find next PMSi; }
17.    }
18.    Next Ti;
19.  } print “find a solution:”; output Ti, PMS; stop
20. }
21. Go to step 2;