optimizing parallel prefix adders using integer linear...

Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders
Chung-Kuan Cheng
Computer Science and Engineering Department, University of California, San Diego

Upload: doduong

Post on 29-Aug-2018



Outline
• Prefix Adder Problem
  • Background & Previous Work
  • Extensions: High-radix, Ling
• Our Work
  • Area/Timing/Power Models
  • Mixed-Radix (2,3,4) Adders
  • ILP Formulation
• Experimental Results
• Future Work

Prefix Adder – Challenges
New Design Scope: increasing impact of physical design and concern for power.
• Area: logical levels, wire tracks, fanouts, physical placement, detail routing
• Timing: gate cap, wire cap, gate sizing, buffer insertion, signal slope, input arrival time, output required time
• Power: static power, dynamic power, power gating, activity probability

Binary Addition
• Input: two n-bit binary numbers $a_{n-1} \dots a_1 a_0$ and $b_{n-1} \dots b_1 b_0$, and a one-bit carry-in $c_0$
• Output: n-bit sum $s_{n-1} \dots s_1 s_0$ and a one-bit carry-out $c_n$
• Prefix Addition: carry generation & propagation
  • Generate: $g_i = a_i b_i$
  • Propagate: $p_i = a_i \oplus b_i$
  • $c_{i+1} = g_i + p_i c_i$
  • $s_i = a_i \oplus b_i \oplus c_i$

Prefix Addition – Formulation
• Pre-processing: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
• Prefix Computation:
  $G_{[i:k]} = G_{[i:j]} + P_{[i:j]} G_{[j-1:k]}$
  $P_{[i:k]} = P_{[i:j]} P_{[j-1:k]}$
• Post-processing:
  $c_{i+1} = G_{[i:0]} + P_{[i:0]} c_0$
  $s_i = p_i \oplus c_i$
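The three stages of the formulation can be sketched in a few lines of Python. This is only an illustration: the function names `gp_combine` and `prefix_add` are made up for this sketch, and the doubling-span scan is a Kogge-Stone style schedule chosen for brevity, not a structure prescribed by the slides.

```python
def gp_combine(left, right):
    """Merge (G, P) of [i:j] (left) with (G, P) of [j-1:k] (right) into [i:k]."""
    g_left, p_left = left
    g_right, p_right = right
    return (g_left | (p_left & g_right), p_left & p_right)


def prefix_add(a, b, n, c0=0):
    """Add two n-bit integers with carry-in c0 using a parallel-prefix scan."""
    # Pre-processing: g_i = a_i AND b_i, p_i = a_i XOR b_i.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix computation: after the loop, gp[i] == (G[i:0], P[i:0]).
    # Each pass builds a new list from the previous one (doubling span).
    gp = list(zip(g, p))
    span = 1
    while span < n:
        gp = [gp_combine(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(n)]
        span *= 2

    # Post-processing: c_{i+1} = G[i:0] + P[i:0] c_0 and s_i = p_i XOR c_i.
    carries = [c0] + [big_g | (big_p & c0) for big_g, big_p in gp]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total | (carries[n] << n)
```

Calling `prefix_add(a, b, n, c0)` returns the (n+1)-bit result, with the carry-out in bit n.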

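Prefix Adder – Prefix Structure Graph
[Figure: 4-bit prefix structure graph over bit positions 4, 3, 2, 1 with prefix outputs 4:1, 3:1, 2:1, 1. Legend: gp generator (inputs $a_i$, $b_i$; outputs $g_i$, $p_i$), GP cell (combines GP[i, j] and GP[j-1, k] into GP[i, k]), sum generator (labels $p_i$, G[i:0], $s_i$). Stages: Pre-processing, Prefix Computation, Post-processing.]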

Previous Works – Classical prefix adders
[Figures: 8-bit prefix graphs over bit positions 8..1 with outputs 8:1, 7:1, 6:1, 5:1, 4:1, 3:1, 2:1, 1.]
• Brent-Kung: logical levels $2\log_2 n - 1$; max fanouts 2; wire tracks 1
• Kogge-Stone: logical levels $\log_2 n$; max fanouts 2; wire tracks $n/2$
• Sklansky: logical levels $\log_2 n$; max fanouts $n/2$; wire tracks 1

High-Radix Adders
• Each cell has more than two fan-ins
• Pros: fewer logic levels
  • 6 levels (radix-2) vs. 3 levels (radix-4) for 64-bit addition
• Cons: larger delay and power in each cell
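The level counts quoted above follow from the fact that a radix-r cell merges r intervals, so L levels can span at most r^L bits. A minimal sketch (the helper name `prefix_levels` is hypothetical, and it assumes a uniform radix at every level):

```python
def prefix_levels(n_bits: int, radix: int) -> int:
    """Smallest number of levels L such that radix**L >= n_bits."""
    levels, reach = 0, 1
    while reach < n_bits:
        reach *= radix   # one more level multiplies the reachable span by the radix
        levels += 1
    return levels


for radix in (2, 3, 4):
    print(f"radix-{radix}: {prefix_levels(64, radix)} levels for 64 bits")
```

Integer multiplication is used instead of a floating-point logarithm to avoid rounding problems at exact powers of the radix.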

Radix-3 Sklansky & Kogge-Stone Adder
[Figure: radix-3 Sklansky and Kogge-Stone prefix graphs.]
David Harris, “Logical Effort of Higher Valency Adders”

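Ling Adders
Prefix
• Pre-processing: $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
• Prefix Computation:
  $G_{[i:k]} = G_{[i:j]} + P_{[i:j]} G_{[j-1:k]}$
  $P_{[i:k]} = P_{[i:j]} P_{[j-1:k]}$
• Post-processing:
  $c_i = G_{[i-1:0]} + P_{[i-1:0]} c_0$
  $s_i = p_i \oplus c_i$
Ling (here $p_i$ is the OR of the inputs and $t_i$ their XOR)
• Pre-processing: $g_i = a_i b_i$, $p_i = a_i + b_i$, $t_i = a_i \oplus b_i$
• Prefix Computation:
  $G^*_{[i:i-1]} = g_i + g_{i-1}$, $P^*_{[i:i-1]} = p_i p_{i-1}$
  $G^*_{[i:k]} = G^*_{[i:j]} + P^*_{[i-1:j-1]} G^*_{[j-1:k]}$
  $P^*_{[i:k]} = P^*_{[i:j]} P^*_{[j-1:k]}$
• Post-processing:
  $c_i = p_{i-1} G^*_{[i-1:0]}$
  $s_i = \overline{G^*_{[i-1:0]}}\, t_i + G^*_{[i-1:0]} (t_i \oplus p_{i-1})$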
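The Ling post-processing can be checked against ordinary integer addition with a short script. The sketch below makes several assumptions that are not spelled out on the slide: the carry-in is 0, $p_i$ is the OR and $t_i$ the XOR of the input bits, and the pseudo-carry $H_i = G^*_{[i:0]}$ is computed with the serial recurrence $H_i = g_i + p_{i-1} H_{i-1}$ instead of a prefix tree. The function name `ling_add` is hypothetical.

```python
def ling_add(a: int, b: int, n: int) -> int:
    """Add two n-bit integers (carry-in 0) using Ling's pseudo-carry H."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]     # g_i = a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # p_i = a_i OR b_i
    t = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # t_i = a_i XOR b_i

    h_prev, p_prev = 0, 0          # H_{-1} and p_{-1}: nothing below bit 0
    result = 0
    for i in range(n):
        # s_i = not(H_{i-1}) t_i + H_{i-1} (t_i XOR p_{i-1})
        s_i = ((1 - h_prev) & t[i]) | (h_prev & (t[i] ^ p_prev))
        result |= s_i << i
        h_prev = g[i] | (p_prev & h_prev)   # H_i = g_i + p_{i-1} H_{i-1}
        p_prev = p[i]
    carry_out = p_prev & h_prev             # c_n = p_{n-1} H_{n-1}
    return result | (carry_out << n)
```

The point of the exercise is that the real carry $c_i = p_{i-1} H_{i-1}$ is never formed on the critical path; it is folded into the sum selection.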

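An 8-bit Ling Adder
[Figure: 8-bit Ling adder prefix graph over bit positions 8..1 with outputs H8..H1. Legend: first-level cell with inputs $g_i$, $g_{i-1}$, $p_{i-1}$, $p_{i-2}$ and outputs $G^*_i$, $P^*_{i-1}$; sum cell with labels $a_i$, $b_i$, $p_{i-1}$, $H_i$, multiplexer inputs 0 and 1, and output $s_i$.]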

Area Model
• Distinguish physical placement from logical structure, but keep the bit-slice structure.
[Figure: logical view (bit position vs. logical level) and physical view (bit position vs. physical level) of an 8-bit prefix graph, related by compact placement.]

Timing Model
Cell delay calculation: $d = f + p$ (effort delay $f$ plus intrinsic delay $p$)
• $f = g \cdot h$
• $g$: logical effort
• $h$: electrical effort $= C_{out}/C_{in} =$ (fanouts + wirelength) / size
• $g$ and $p$ are intrinsic properties of the cell
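As a worked instance of the delay model, the following sketch evaluates the delay of one cell. The function name and the numeric values in the usage line are illustrative assumptions, not figures from the talk; fanouts, wirelength and size are taken to be in the same capacitance units.

```python
def cell_delay(logical_effort, intrinsic_delay, fanouts, wirelength, size):
    """Delay of one cell under the logical-effort model d = g*h + p."""
    electrical_effort = (fanouts + wirelength) / size    # h = Cout / Cin
    effort_delay = logical_effort * electrical_effort    # f = g * h
    return effort_delay + intrinsic_delay                # d = f + p


# Hypothetical cell: g = 1.5, p = 2.0, driving 2 fanouts plus 1 unit of wire.
print(cell_delay(1.5, 2.0, fanouts=2, wirelength=1, size=1.0))
```

Doubling `size` halves the electrical effort, which is the gate-sizing lever the model exposes.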

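Power Model
Total power consumption: dynamic power + static power
• Static power: leakage current of the devices
  $P_{sta} = k \cdot \#\text{cells}$, with $k$ the leakage power per cell
• Dynamic power: current switching capacitance
  $P_{dyn} = \rho \cdot C_{load}$, where $\rho$ is the switching probability, $\rho = j$ ($j$ is the logical level*)
• $P_{total} = P_{dyn} + P_{sta} = \sum_{\text{cells}} j \cdot C_{load} + k \cdot \#\text{cells}$
* Vanichayobon S., et al., “Power-speed Trade-off in Parallel Prefix Circuits”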
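A minimal sketch of how this power estimate could be evaluated for a candidate adder, assuming that the dynamic term is summed over all cells with the logical level $j$ as the activity weight, and that static power is a fixed leakage value per cell. The function name, the parameter `leakage_per_cell` and the sample numbers are hypothetical.

```python
def total_power(cells, leakage_per_cell):
    """Estimate P_total = sum(j * C_load) + leakage_per_cell * #cells.

    cells: iterable of (logical_level, load_capacitance) pairs.
    """
    cells = list(cells)
    dynamic = sum(level * c_load for level, c_load in cells)  # activity grows with level
    static = leakage_per_cell * len(cells)                    # leakage scales with cell count
    return dynamic + static


# Three hypothetical cells at levels 1, 2 and 3.
print(total_power([(1, 2.0), (2, 1.5), (3, 1.0)], leakage_per_cell=0.5))
```

Under this model a heavily loaded cell is cheaper when it sits at a low logical level, which is one reason the optimizer trades levels against fanout.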

ILP Formulation Overview
Structure variables:
• GP cells
• Connections (wires)
• Physical positions
Capacitance variables:
• Gate cap
• Vertical wire cap
• Horizontal wire cap
Timing variables:
• Input arrival time
• Output arrival time
Power Objective
Flow: ILP → ILOG CPLEX → Optimal Solution

Integer Linear Programming (ILP)
• ILP: Linear Programming with integer variables.
• Difficulties and techniques:
  • Constraints are not linear: linearize using pseudo-linear constraints
  • Search space is too large: reduce the search space
  • Search is slow: add redundant constraints to speed it up

ILP – Integer Linear Programming
• Linear Programming: linear constraints, linear objective, fractional variables.
• Integer Linear Programming: Linear Programming with integer variables.
[Figure: feasible region bounded by the constraints, marking the LP optimum and the ILP optimum.]

ILP – Pseudo-Linear Constraint
Problem:
  Minimize: $x_3$
  Subject to: $x_1 \ge 300$, $x_2 \ge 500$, $x_3 = \min(x_1, x_2)$
ILP formulation:
  Minimize: $x_3$
  Subject to: $x_1 \ge 300$, $x_2 \ge 500$, $x_3 \le x_1$, $x_3 \le x_2$,
    $x_3 \ge x_1 - 1000\, b_1$ (1)
    $x_3 \ge x_2 - 1000\, (1 - b_1)$ (2)
    $b_1$ is binary
LP objective: 0
ILP objective: 300
• A constraint is called pseudo-linear if it is not effective until some integer variables are fixed.
• Pseudo-linear constraints mostly arise from IF/ELSE scenarios.
• Binary decision variables are introduced to indicate true or false.
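The gap between the LP and ILP objectives in this example can be reproduced without a solver. The sketch below assumes $x_3 \ge 0$ (which the stated LP objective of 0 implies) and uses the fact that, once $b_1$ is fixed, the smallest feasible $x_3$ is the largest of its lower bounds; the names `min_x3`, `BIG_M`, `X1_MIN` and `X2_MIN` are introduced only for this illustration.

```python
BIG_M = 1000
X1_MIN, X2_MIN = 300, 500


def min_x3(b1):
    """Smallest feasible x3 for a fixed b1 in [0, 1].

    With x1 = max(X1_MIN, x3) and x2 = max(X2_MIN, x3) the upper bounds
    x3 <= x1 and x3 <= x2 hold, so only the lower bounds matter:
    x3 >= 0 (assumed), x3 >= x1 - M*b1 and x3 >= x2 - M*(1 - b1).
    """
    return max(0, X1_MIN - BIG_M * b1, X2_MIN - BIG_M * (1 - b1))


# ILP: b1 must be 0 or 1, so one of constraints (1), (2) is fully active.
ilp_objective = min(min_x3(0), min_x3(1))

# LP relaxation: b1 may be fractional, which disables both constraints at once.
lp_objective = min(min_x3(k / 100) for k in range(101))

print("ILP objective:", ilp_objective)
print("LP objective:", lp_objective)
```

With a fractional $b_1$ such as 0.5, both big-M terms are large enough to switch off constraints (1) and (2), which is exactly what "not effective until some integer variables are fixed" means.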

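ILP Solver Search Procedure
Minimize F(b1, b2, b3, b4, f1, …), where each bi is binary.
[Figure: branch-and-bound search tree. The root (all variables fractional) has objective 0; the solver branches on b1, b2, b3, b4 along ‘0’ and ‘1’ edges. Nodes are labeled with their objective values (2, 3, 4, 5) or marked infeasible. Annotations: feasible (current best) at a node with value 3, Cut on branches that cannot improve on it, and Bound (smallest candidate) at value 2.]
It is VERY helpful if the ILP objective is close to the LP objective.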

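Interval Adjacency Constraint
[Figure: 8-bit Ling prefix graph over bit positions 8..1 with outputs H8..H1. Cells are labeled (column id, logic level).]
• (7,3): interval [7,1]
• (7,2): interval [7,4]
• (3,2): interval [3,1]
The two input intervals of cell (7,3) must be adjacent, i.e. 4 = 3 + 1.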
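The adjacency rule can be stated as a small check on interval pairs. This is a sketch with a hypothetical helper name, `merge_intervals`; intervals are written as (high bit, low bit) tuples, matching the [7,4] notation above.

```python
def merge_intervals(left, right):
    """Merge left = (hi, mid) with right = (mid - 1, lo) into (hi, lo).

    Raises ValueError when the two intervals are not adjacent.
    """
    left_hi, left_lo = left
    right_hi, right_lo = right
    if left_lo != right_hi + 1:
        raise ValueError(f"intervals {left} and {right} are not adjacent")
    return (left_hi, right_lo)


# Cell (7,3) combines the outputs of cells (7,2) and (3,2).
print(merge_intervals((7, 4), (3, 1)))
```

In the ILP the same condition has to be expressed with linear inequalities, because which cell feeds which is itself a decision variable.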

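Linearization for Interval Adjacency Constraint
[Figure: cell (i, j) with left input (i, h) through wire wl and candidate right inputs (k1, l1), (k2, l2) through wires wr1, wr2. Each cell carries an interval: $[y^L_{(i,h)}, y^R_{(i,h)}]$, $[y^L_{(k1,l1)}, y^R_{(k1,l1)}]$, $[y^L_{(k2,l2)}, y^R_{(k2,l2)}]$, and the output $[y^L_{(i,j)}, y^R_{(i,j)}]$.]
• Left interval bound equal to column index: $y^L_{(i,j)} = i$
• Pseudo Linear:
  $y^R_{(i,h)} = y^L_{(k,l)} + 1$ if $wl(i,j,h) = 1$ and $wr(i,j,k,l) = 1$
  $y^R_{(i,h)} = \sum_{(k,l)} k \cdot wr(i,j,k,l) + 1$ if $wl(i,j,h) = 1$
• Linearize:
  $y^R_{(i,h)} \le \sum_{(k,l)} k \cdot wr(i,j,k,l) + 1 + n\,(1 - wl(i,j,h))$
  $y^R_{(i,h)} \ge \sum_{(k,l)} k \cdot wr(i,j,k,l) + 1 - n\,(1 - wl(i,j,h))$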

Search Space Reduction
• Ling’s adder: separate odd and even bits
• Doubles the bit-width we are able to search
[Figure: 8-bit Ling prefix graph over bit positions 8..1 with outputs H8..H1.]

Redundant Constraints
• Cell (i,j) is known to have logic level j before wire connection. Assume the load is MinLoad (fanout = 1 with minimum wire length):
  $P_{(i,j)} \ge j \cdot MinLoad$
• Cell (i,j) has a path of length j-1. Assume each cell along the path has MinLoad:
  $T_{(i,j)} \ge j \cdot (PD + LE \cdot MinLoad)$

Experiments – 16-bit Uniform Timing

Min-Power Radix-2 Adder (delay = 22, power = 45.5 FO4)
[Figure: 16-bit prefix graph over bit positions 1–16.]

Min-Power Radix-2&4 Adder (delay = 18, power = 29.75 FO4)
[Figure: 16-bit prefix graph over bit positions 1–16. Legend: Radix-2 Cell, Radix-4 Cell.]

Min-Power Mixed-Radix Adder (delay = 20, power = 28.0 FO4)
[Figure: 16-bit prefix graph over bit positions 1–16. Legend: Radix-2 Cell, Radix-3 Cell, Radix-4 Cell.]

Experiments – 16-bit Non-uniform Time (Mixed Radix)
• ILP is able to handle non-uniform timings
• Ling adders are most advantageous under increasing arrival times, because of their faster carries
[Figures: one min-power mixed-radix adder per timing profile.]
• Increasing Arrival Time (delay = 35.5, power = 27.0 FO4)
• Decreasing Arrival Time (delay = 34.5, power = 30.5 FO4)
• Convex Arrival Time (delay = 35.9, power = 32.4 FO4)
• Increasing Required Time (delay = 34.5, power = 30.5 FO4)
• Decreasing Required Time (delay = 36.5, power = 32.5 FO4)
• Convex Required Time (delay = 36.5, power = 32.5 FO4)

Experiments – 64-bit Hierarchical Structure (Mixed-Radix)
• Handle high bit-width applications
• 16x4 and 8x8
[Figure: two-level hierarchy for a 64-bit adder. Level 1: four ILP blocks over inputs a1 b1 … a16 b16, a17 b17 … a32 b32, a33 b33 … a48 b48, a49 b49 … a64 b64, producing GP*[16:2], GP*[32:18], GP*[48:34], GP*[64:50] together with GP*[1:1], GP*[17:17], GP*[33:33], GP*[49:49]. Level 2: one ILP block producing H64 … H49, H48 … H33, H32 … H17, H16 … H1.]

Experiments – 64-bit Hierarchical Structure
TSL: a 64-bit high-radix three-stage Ling adder.
V. Oklobdzija and B. Zeydel, “Energy-Delay Characteristics of CMOS Adders”, in High-Performance Energy-Efficient Microprocessor Design, pp. 147-170, 2006

ASIC Implementation - Results
• 64-bit hierarchical design (mixed-radix) by ILP vs. fast carry look-ahead adder by Synopsys Design Compiler
• TSMC 90nm standard cell library was used

Future Work
• ILP formulation improvement
  • Expected to handle 32- or 64-bit applications without a hierarchical scheme
• Optimizing other computer arithmetic modules
  • Comparator, Multiplier

Q & A

Thank You!