Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)
Formal Engineering Research with Models Abstractions and Transformations
FERMAT
Low Power Hardware Synthesis from Concurrent
Action Oriented Specifications (CAOS)
Sandeep K. Shukla
Gaurav Singh
FERMAT Lab, Virginia Tech.
FERMAT / Virginia Tech 2
Outline
• CAOS Scheduling Problem
  – Complexity Analysis
• Peak Power Problem
  – Complexity Analysis
  – Technique – Rescheduling (suppressing actions)
• Dynamic Power Problem
  – Complexity Analysis
  – Techniques – Rescheduling, Operand Isolation, Clock Gating, Gated Guards
CAOS Scheduling Problem
( Complexity Analysis )
SCHEDULING PROBLEMS WITHOUT A PEAK POWER CONSTRAINT
• Maximum Non-conflicting Subset of actions (MNS)
– Choosing actions which can execute in a clock cycle.
• Minimum Length Schedule Construction (MLS)
– Distributing actions over multiple clock cycles.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
Instance - Set A = {a1, a2, …, an} of enabled actions; a collection
C of pairs of actions, where {ai, aj} ∈ C means that actions ai and
aj conflict; an integer K ≤ n.
Question - Is there a subset A’ ⊆ A such that |A’| > K and no pair of
actions in A’ conflict?
• MNS problem is NP-Complete.
• Corresponds to Maximum Independent Set (MIS) Problem.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
NOTE - For any ρ ≥ 1, a ρ-approximation algorithm for a
combinatorial optimization problem is a heuristic that produces a
solution which is within a factor ρ of the optimal solution value.
• It is known that for any ε > 0, there is no polynomial-time O(n^(1-ε))-approximation
algorithm for the MIS problem, unless P = NP.
• The same holds for the MNS problem.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
SOLUTION - Heuristics with good performance guarantees can be
devised by exploiting the relationship between MNS and MIS
problems.
• SPECIAL CASES –
  – Each action conflicts with at most Δ other actions, for some constant Δ
    • An approximation algorithm exists that provides a performance guarantee of Δ+1.
  – Planar graphs, near-planar graphs and unit disk graphs
    • Efficient approximation algorithms are known for such classes of graphs.
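The bounded-conflict special case can be illustrated with a minimal Python sketch. It assumes the conflict relation is given as a list of pairs; the function name and the example actions are hypothetical. Any maximal non-conflicting set contains at least n/(Δ+1) actions, because each chosen action blocks at most itself and Δ others, which is where the Δ+1 guarantee comes from.

```python
def greedy_non_conflicting(actions, conflicts):
    """Return a maximal set of pairwise non-conflicting actions (illustrative sketch)."""
    # Build the adjacency sets of the conflict graph.
    neighbours = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    chosen = []
    blocked = set()
    for a in actions:                 # any fixed order gives the Δ+1 bound
        if a not in blocked:
            chosen.append(a)
            blocked.add(a)
            blocked |= neighbours[a]  # conflicting actions can no longer be picked
    return chosen


# Hypothetical example: a chain of conflicts a1-a2-a3-a4-a5 (Δ = 2).
actions = ["a1", "a2", "a3", "a4", "a5"]
conflicts = [("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a4", "a5")]
selected = greedy_non_conflicting(actions, conflicts)
print(selected)
```

This is a heuristic with a worst-case guarantee only; it is not claimed to be the scheduler used by any particular compiler.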
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
Instance - Set A = {a1, a2,…,an} of actions; a collection C of
pairs of actions, where {ai, aj} ∈ C means that actions ai and aj
conflict; an integer t ≤ n.
Question - Is there a partition of A into r subsets A1, A2,...,Ar for
some r ≤ t such that for each i, 1 ≤ i ≤ r, the actions in Ai are
pair-wise non-conflicting?
• MLS problem is NP-Complete.
• Corresponds to Minimum K-coloring (MINCOLOR) Problem.
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
• It is known that for any ε > 0, there is no polynomial-time O(n^(1-ε))-approximation
algorithm for the MINCOLOR problem, unless P = NP.
• The same holds for the MLS problem.
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
SOLUTION – Heuristics for graph coloring can be used in
constructing schedules of near-minimum length.
• SPECIAL CASES –
  – Upper bound on the length of the schedule is two
    • Corresponds to the problem of determining whether a graph is 2-colorable.
    • Efficient algorithms are known.
  – Each action conflicts with at most Δ other actions
    • For such instances, a schedule of length at most Δ + 1 can be constructed in polynomial time.
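A schedule of length at most Δ + 1 can be obtained with greedy graph coloring, sketched below in Python. The representation (actions as strings, conflicts as pairs) and the function name are assumptions made for illustration. Each action is placed in the earliest time slot not already used by one of its conflicting actions; since at most Δ slots can be taken by its neighbours, slot Δ is always available.

```python
def greedy_schedule(actions, conflicts):
    """Assign each action to the earliest slot free of conflicts (illustrative sketch)."""
    neighbours = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    slot_of = {}
    for a in actions:
        # Slots already taken by actions that conflict with a.
        used = {slot_of[b] for b in neighbours[a] if b in slot_of}
        slot = 0
        while slot in used:
            slot += 1
        slot_of[a] = slot

    length = max(slot_of.values()) + 1 if slot_of else 0
    return [[a for a in actions if slot_of[a] == s] for s in range(length)]


# Hypothetical example: a1, a2, a3 conflict pairwise; a4 conflicts with a3 (Δ = 3).
actions = ["a1", "a2", "a3", "a4"]
conflicts = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a3", "a4")]
schedule = greedy_schedule(actions, conflicts)
print(schedule)
```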
PEAK POWER PROBLEM
( Complexity Analysis )
SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
Single Clock Cycle –
– Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).
– Maximizing Utility Subject to Peak Power Constraint (MU-PP).
Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).
Instance –
– a set A = {a1, a2,…, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power constraint.
Requirement - Find a subset A’ ⊆ A such that -
– the total power needed to execute the actions in A’ is at most P, and
– |A’| is a maximum over all subsets of A that satisfy the peak power constraint.
Optimal Solution -
– Sort the actions in A into non-decreasing order of power.
– Keep adding actions in that order as long as the peak power constraint is satisfied.
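The optimal greedy procedure for MNA-PP can be sketched in a few lines of Python. The dictionary of per-action powers and the function name are hypothetical; the logic follows the two steps stated on the slide.

```python
def max_actions_under_peak(powers, peak):
    """Pick as many actions as possible without exceeding the peak power (sketch)."""
    order = sorted(powers, key=powers.get)   # non-decreasing power
    chosen, total = [], 0
    for a in order:
        if total + powers[a] > peak:
            break                            # every remaining action is at least as costly
        chosen.append(a)
        total += powers[a]
    return chosen, total


# Hypothetical power figures in arbitrary units.
powers = {"a1": 5, "a2": 2, "a3": 4, "a4": 1}
chosen, total = max_actions_under_peak(powers, peak=8)
print(chosen, total)
```

Stopping at the first action that does not fit is safe because the actions are visited in non-decreasing order of power.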
Maximizing Utility Subject to Peak Power Constraint (MU-PP)
Instance –
– a set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi it consumes and its utility ui,
– a positive number P representing the peak power,
– a positive number Γ representing the required utility.
Question - Is there a subset A’ ⊆ A such that the total power needed to execute all the actions in A’ is at most P and the utility of A’ is at least Γ?
• MU-PP problem is NP-Complete.
• Corresponds to KNAPSACK Problem.
Maximizing Utility Subject to Peak Power Constraint (MU-PP)
• Any approximation algorithm for the KNAPSACK problem can be used as
an approximation algorithm with the same performance guarantee for the
optimization version of MU-PP.
• When the weights and profits are integers, there is a polynomial-time
approximation scheme (PTAS) for the KNAPSACK problem.
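To make the KNAPSACK correspondence concrete, the sketch below solves small MU-PP instances exactly with the textbook pseudo-polynomial dynamic program, assuming integer power values. This is not the PTAS mentioned above, only an illustration of the mapping (power plays the role of weight, utility the role of profit). The function names and numbers are hypothetical.

```python
def best_utility(actions, peak):
    """Maximum total utility with total power <= peak (0/1 knapsack DP, sketch).

    actions: list of (power, utility) pairs with non-negative integer power.
    """
    best = [0] * (peak + 1)                  # best[b] = max utility within budget b
    for power, utility in actions:
        # Iterate budgets downwards so each action is used at most once.
        for budget in range(peak, power - 1, -1):
            best[budget] = max(best[budget], best[budget - power] + utility)
    return best[peak]


def mu_pp_decision(actions, peak, gamma):
    """Decision version: is a utility of at least gamma reachable within the peak?"""
    return best_utility(actions, peak) >= gamma


# Hypothetical (power, utility) pairs.
actions = [(3, 4), (4, 5), (2, 3)]
print(best_utility(actions, peak=5))
print(mu_pp_decision(actions, peak=5, gamma=8))
```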
SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
Multiple Clock Cycles –
– Minimizing Makespan Subject to Peak Power Constraint (MM-PP).
– Minimizing Peak Power Subject to Makespan Constraint (MPP-M).
– Minimizing Makespan and Peak Power – Decision Version (MPP-DECISION).
Minimizing Makespan Subject to Peak Power Constraint (MM-PP)
Instance –
– a set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power.
Requirement –
Find a schedule of minimum length for the actions in A such that the total power needed to execute the actions in each time slot is at most P.
Minimizing Peak Power Subject to a Makespan Constraint (MPP-M)
Instance –
– a set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number L representing the makespan (the number of slots used by a schedule).
Requirement –
Find a schedule of length at most L for the actions in A such that the maximum total power used in any time slot is a minimum over all schedules of length at most L.
NOTE - MPP-M is the dual of MM-PP.
Minimizing Makespan and Peak Power (MPP-DECISION)
– Decision version of MM-PP and MPP-M.
Instance –
– a set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power,
– a positive number L representing the makespan.
Question – Is there a schedule of length at most L for the actions in A such that the total power used in any time slot is at most P?
• The MPP-DECISION problem is strongly NP-Complete.
• Corresponds to the 3-PARTITION problem.
• There is no pseudo-polynomial algorithm for the MPP-DECISION problem, unless P = NP.
Approximation Algorithms for MM-PP
• Efficient approximation algorithms are possible by reducing the
problem to the well-known BIN PACKING problem (actions are items, time slots are bins of capacity P).
• Example - A simple algorithm called First Fit Decreasing (FFD)
provides an asymptotic performance guarantee of 11/9.
  – Sort the items in non-increasing order of their sizes and then assign each item to the first bin in which it will fit.
  – A sophisticated implementation reduces the running time to O(n log n).
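First Fit Decreasing applied to MM-PP can be sketched as follows in Python. This is the straightforward version that scans the open slots linearly, not the O(n log n) implementation mentioned above; the function name and the power values are hypothetical.

```python
def first_fit_decreasing(powers, peak):
    """Pack actions into time slots of capacity `peak` using FFD (sketch)."""
    if any(p > peak for p in powers.values()):
        raise ValueError("an action exceeds the peak power on its own")

    slots = []                                # each entry: [remaining budget, [actions]]
    for a in sorted(powers, key=powers.get, reverse=True):
        for slot in slots:
            if powers[a] <= slot[0]:          # first slot in which the action fits
                slot[0] -= powers[a]
                slot[1].append(a)
                break
        else:
            slots.append([peak - powers[a], [a]])   # open a new time slot
    return [members for _, members in slots]


# Hypothetical power figures in arbitrary units.
powers = {"a1": 7, "a2": 5, "a3": 4, "a4": 3, "a5": 1}
ffd_schedule = first_fit_decreasing(powers, peak=10)
print(ffd_schedule)
```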
Approximation Algorithms for MPP-M
• Efficient approximation algorithms are possible by reducing the
problem to the classical multiprocessor scheduling problem.
• Example –
  – 4/3-approximation algorithm
    • Sort the actions in non-increasing order of their power requirements.
    • Assign each action to the time slot whose total power is the smallest at that time.
  – Can be implemented to run in O(n log n) time.
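The two steps above match the classical longest-processing-time-first rule for multiprocessor scheduling. A minimal Python sketch using a binary heap is given below, with time slots playing the role of processors; the function name and the sample powers are assumptions for illustration.

```python
import heapq


def lpt_schedule(powers, num_slots):
    """Spread actions over `num_slots` time slots, largest power first (sketch)."""
    heap = [(0, s) for s in range(num_slots)]   # (current total power, slot index)
    heapq.heapify(heap)
    slots = [[] for _ in range(num_slots)]

    for a in sorted(powers, key=powers.get, reverse=True):
        load, s = heapq.heappop(heap)           # slot with the smallest total so far
        slots[s].append(a)
        heapq.heappush(heap, (load + powers[a], s))

    peak = max(load for load, _ in heap)        # resulting peak power
    return slots, peak


# Hypothetical power figures in arbitrary units.
powers = {"a1": 7, "a2": 5, "a3": 4, "a4": 3, "a5": 1}
slots, peak = lpt_schedule(powers, num_slots=2)
print(slots, peak)
```

The sort and the n heap operations together give the O(n log n) running time.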
LOW PEAK POWER TECHNIQUE
Re-scheduling – Suppress some actions in each cycle to reduce peak power of the design.
Possible Ways –
– Conflict-based
  • Add extra conflicts for the sake of peak power.
– Memory-based
  • Use a memory to select how many actions to execute in each cycle.
MEMORY-BASED LOW PEAK POWER TECHNIQUE
ALGORITHM -
– Arrange the actions based on their TRS ordering.
– Find the possible combinations of non-conflicting actions which can violate the peak power constraint when executed concurrently.
– For each violating combination -
  • find a satisfying combination by suppressing some actions.
  • give priority to actions which come earlier in the TRS ordering.
  • store the satisfying combinations in a memory.
– In hardware, the memory is used to execute the appropriate actions in each clock cycle in order to satisfy the peak power constraint.
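The table-building step can be illustrated with a small Python sketch. It is only one possible reading of the algorithm: it assumes every combination of actions is non-conflicting, that each action has a fixed power estimate, and that suppression keeps earlier actions in TRS order whenever they still fit. The function name and the data layout are hypothetical and do not reflect the Bluespec Compiler implementation.

```python
from itertools import combinations


def build_suppression_table(powers, peak):
    """Map each violating combination to a satisfying sub-combination (sketch).

    powers: per-action power, listed in TRS order (index 0 has highest priority).
    """
    n = len(powers)
    table = {}
    for size in range(1, n + 1):
        for combo in combinations(range(n), size):
            if sum(powers[i] for i in combo) <= peak:
                continue                      # no violation, nothing to store
            kept, total = [], 0
            for i in combo:                   # combo is already in TRS order
                if total + powers[i] <= peak:
                    kept.append(i)            # earlier actions get priority
                    total += powers[i]
            table[combo] = tuple(kept)
    return table


# Hypothetical example: three actions with powers 4, 3, 2 and a peak of 6.
table = build_suppression_table([4, 3, 2], peak=6)
print(table)
```

The number of stored entries can grow exponentially with the number of actions, which is consistent with the memory-size limitation noted for this technique.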
MEMORY-BASED LOW PEAK POWER TECHNIQUE
Implemented in Bluespec Compiler –
– Around 10% peak-power savings achieved for small designs like Vending Machine.
– Larger power savings may be possible for larger designs.
  • Experiments ongoing.
MEMORY-BASED LOW PEAK POWER TECHNIQUE
LIMITATIONS -
– Some designs written under the assumption that the maximum number of actions will execute in each clock cycle might not be able to use this technique.
– It increases latency, so it is applicable mostly to latency-insensitive designs.
– Designs with a large number of actions may require a large memory.
DYNAMIC POWER PROBLEM
( Complexity Analysis )
DYNAMIC POWER PROBLEM (DPP)
Instance –
- set A = {a1, a2,…,an} of actions.
- a positive integer P representing dynamic power consumed.
Requirement -
Select the ordering of execution of actions in A such that P is minimized.
• DPP is NP-Complete.
• Corresponds to the Traveling Salesman Problem, which is a sub-problem of DPP.
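The link to the Traveling Salesman Problem can be illustrated under a simplified cost model, which is an assumption and not part of the slides: each action drives one operand value onto a shared functional unit, and dynamic power is approximated by the number of bit toggles (Hamming distance) between consecutive operands. Finding the cheapest ordering is then a shortest-path tour over the actions. The brute-force search below is only usable for very small examples.

```python
from itertools import permutations


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def switching_cost(order, operand):
    """Total bit toggles when actions execute in the given order (assumed model)."""
    return sum(hamming(operand[p], operand[q]) for p, q in zip(order, order[1:]))


def best_order(operand):
    """Exhaustive search over all orderings; exponential, for illustration only."""
    return min(permutations(operand), key=lambda o: switching_cost(o, operand))


# Hypothetical 4-bit operand values driven by each action.
operand = {"a1": 0b0000, "a2": 0b1111, "a3": 0b0001, "a4": 0b1110}
naive = switching_cost(["a1", "a2", "a3", "a4"], operand)
best = best_order(operand)
print(naive, best, switching_cost(best, operand))
```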
LOW DYNAMIC POWER TECHNIQUES
• Re-scheduling.
• Operand Isolation.
• Clock Gating.
• Gated Guards.
RE-SCHEDULING
• Actions can be re-scheduled such that switching at the inputs of the functional units is minimized.
• Resource sharing - Conflicts can be created so that the same functional unit is shared among actions that perform the same operations on the same operands.
OPERAND ISOLATION
• Operand Isolation –
  – Computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
  – Involves -
    • Insertion of gates at the appropriate points without affecting guards.
    • Selection of the activation signal.
  – Guards of actions are used as gating signals.
  – The algorithm implemented in the Bluespec Compiler saved up to 25% dynamic power.
OPERAND ISOLATION – SINGLE ACTION
[Figure: operand isolation for a single action foo. Labels: current state (x, y, z), cond logic, body logic, enable signals, Φ2, next-state values (x’, y’, z’), state register (D, Q, EN).]
Computations stay quiescent except when the action executes, i.e. when its guard is True.
action foo (… cond … (x < y) …);
    x <= x + z …
endrule
OPERAND ISOLATION – MULTIPLE ACTIONS
Isolating multiple actions of a design.
[Figure: operand isolation for multiple actions. Labels: Scheduler, Rule Control, Rule1 (Cond1, Action1) … RuleN (CondN, ActionN), Data Select, State register (D, Q, Enable), Φ2 … ΦN.]
REGISTER CLOCK GATING
• Register Clock-gating -
  – Registers having a common ENABLE signal can be provided the same gated clock.
  – CAOS - Registers being updated by the same set of actions can be passed the same gated clock.
• The algorithm implemented in the Bluespec Compiler saved up to 45% dynamic power.
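The grouping rule for CAOS (registers updated by the same set of actions share a gated clock) can be sketched as follows. The input format, a mapping from each action to the registers it writes, and all names are hypothetical; this is not the Bluespec Compiler's data model.

```python
from collections import defaultdict


def group_registers_by_writers(writes):
    """Group registers that are updated by exactly the same set of actions (sketch).

    writes: dict mapping an action name to the registers it updates.
    Returns a dict mapping each writer set to the registers sharing one gated clock.
    """
    writers = defaultdict(set)
    for action, registers in writes.items():
        for reg in registers:
            writers[reg].add(action)

    groups = defaultdict(list)
    for reg in sorted(writers):
        groups[frozenset(writers[reg])].append(reg)
    return dict(groups)


# Hypothetical design: r1 and r2 both update a and b; r3 updates c.
writes = {"r1": ["a", "b"], "r2": ["a", "b"], "r3": ["c"]}
groups = group_registers_by_writers(writes)
print(groups)
```

Under this reading, the enable of each gated clock would be derived from the guards of the actions in the corresponding writer set.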
REGISTER CLOCK GATING
In CAOS, guards of the actions provide the control for gating the clocks of the registers.
[Figure: clock-gated register. Labels: Register (DIN, QOUT), EN, CLK, GATED_CLK.]
GATED GUARDS
• In hardware, only the required guards should be computed in each clock cycle, to save power.
• Static analysis can be done to figure out which guards should be
computed.
Gated Guards
• Rule 1: (x > y) && (y != 0) --> (x = y; y = x;)
• Rule 2: (x <= y) && (y != 0) --> (y = y - x;)
• Rule 3: (y == 0) --> (result = x;)
Let P = (x > y); Q = (y == 0);
Then g1: P && !Q
     g2: !P && !Q
     g3: Q
------------------------------------------
g1 && g2 = false;
g1 && g3 = false;
g2 && g3 = false;
Gated Guards
What else can we infer?
(x > y), (y != 0), (x’ == y), (y’ == x)
------------------------------------------------------
(x’ <= y’) && (y’ != 0) OR (y’ == 0)
So after Rule 1 executes, we know for sure that G1 cannot be true, but G2 or G3 may be true; hence G1 need not be evaluated. Also, prioritize G3.
Gated Guard
• Gcd (70, 42)
• x = 70, y = 42 --> Rule 1
• x = 42, y = 70 --> Rule 2
• x = 42, y = 28 --> Rule 1
• x = 28, y = 42 --> Rule 2
• x = 28, y = 14 --> Rule 1
• x = 14, y = 28 --> Rule 2
• x = 14, y = 14 --> Rule 2
• x = 14, y = 0 --> Rule 3
• result = 14
Gated Guard
• Use a flip-flop (F/F) that gets the value 1 when Rule 1 is fired, and becomes 0 when other rules are fired.
• If this F/F holds a value 1, evaluate only G3 and then G2.
• Unless Rule 1 is fired, this F/F stays at 0, and hence can be clock gated most of the time.
• This example may not be very useful, as the guards are simple to evaluate, but guard calculus on complex guards can lead to savings.
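The effect of the flip-flop can be checked with a small software model of the three GCD rules. This is a behavioural sketch only, assuming positive integer inputs and counting one unit per guard evaluated in a cycle; the function name and the counting scheme are hypothetical.

```python
def run_gcd(x, y):
    """Simulate the three GCD rules with a gated-guard flag (behavioural sketch)."""
    guards = {
        1: lambda x, y: x > y and y != 0,
        2: lambda x, y: x <= y and y != 0,
        3: lambda x, y: y == 0,
    }
    fired, evaluated = [], 0
    flag = False                      # models the F/F: True right after Rule 1 fires
    while True:
        # After Rule 1, G1 is known to be false: evaluate only G3, then G2.
        to_check = (3, 2) if flag else (1, 2, 3)
        evaluated += len(to_check)
        rule = next(r for r in to_check if guards[r](x, y))
        fired.append(rule)
        if rule == 1:
            x, y = y, x               # swap
        elif rule == 2:
            y = y - x
        else:
            return x, fired, evaluated
        flag = (rule == 1)


result, fired, evaluated = run_gcd(70, 42)
print(result, fired, evaluated)
```

For Gcd (70, 42) the model fires the same rule sequence as the trace above and evaluates 21 guards instead of the 24 needed when all three guards are computed in each of the 8 cycles.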
GATED GUARDS
• Theorem proving techniques can be used for deductions.
• Such analysis can be done for more complicated designs.
• A memory in hardware can be used to store the information about which guards need not be computed in the present clock cycle.
Thank You !!