Multiplication and Sum-of-Products Circuits: Giving Up Simplicity To Gain Speed Steve Nuchia

Upload: darin

Post on 12-Jan-2016


Page 1: Multiplication and Sum-of-Products Circuits:

Multiplication and Sum-of-Products Circuits:

Giving Up Simplicity To Gain Speed

Steve Nuchia

Page 2: Multiplication and Sum-of-Products Circuits:

In The Beginning

[Figure: A × B = ???]

Page 3: Multiplication and Sum-of-Products Circuits:

With Log Table

[Figure: the product A × B, computed with a log table]

Page 4: Multiplication and Sum-of-Products Circuits:

Strength In Numbers

$$A \times B = \sum_{i=0}^{q-1} \sum_{j=0}^{p-1} a_i \, b_j \, R^{\,i+j}$$
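The idea on this slide, writing the product as a sum of digit-by-digit products weighted by powers of the radix R, can be checked directly. A minimal sketch, assuming R = 10 by default and least-significant-first digit lists; the function names are illustrative and do not come from the slides:

```python
def digits(n, radix=10):
    """Return the digits of a non-negative integer, least significant first."""
    out = []
    while True:
        n, d = divmod(n, radix)
        out.append(d)
        if n == 0:
            return out


def product_by_digit_sum(a, b, radix=10):
    """Sum a_i * b_j * radix**(i + j) over every pair of digits."""
    return sum(
        ai * bj * radix ** (i + j)
        for i, ai in enumerate(digits(a, radix))
        for j, bj in enumerate(digits(b, radix))
    )
```

Every term of the sum is independent of the others, which is what lets hardware form all of the digit products at once.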

Page 5: Multiplication and Sum-of-Products Circuits:

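Partial Products

       4567
     ×  123
    -------
      11120    tens digits of 3 × 4567
       2581    units digits of 3 × 4567
     011100    tens digits of 2 × 4567
      80240    units digits of 2 × 4567
    0000000    tens digits of 1 × 4567
     456700    units digits of 1 × 4567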
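In decimal, each digit-by-digit product can have two digits, so every multiplier digit contributes two rows: one holding the units digits and one holding the tens digits, shifted one place left. A minimal sketch of that split for non-negative integers; `partial_product_rows` is a hypothetical helper name:

```python
def partial_product_rows(multiplicand, multiplier):
    """Return one (units_row, tens_row) pair per multiplier digit,
    least significant multiplier digit first. Rows are already weighted
    by their column position, so their sum is the full product."""
    m_digits = [int(ch) for ch in reversed(str(multiplicand))]
    rows = []
    for j, mj in enumerate(int(ch) for ch in reversed(str(multiplier))):
        units = sum(((mj * d) % 10) * 10 ** (i + j)
                    for i, d in enumerate(m_digits))
        tens = sum(((mj * d) // 10) * 10 ** (i + j + 1)
                   for i, d in enumerate(m_digits))
        rows.append((units, tens))
    return rows
```

For 4567 × 123 this gives the pairs (2581, 11120), (80240, 11100) and (456700, 0), whose total is 561741.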


Page 8: Multiplication and Sum-of-Products Circuits:

Accumulation

    4567 × 123, adding one partial-product row at a time:

        row      running sum
      11120         11120
       2581         13701
     011100        024801
      80240        105041
    0000000       0105041
     456700       0561741

Page 9: Multiplication and Sum-of-Products Circuits:

Pairwise Summation

    4567 × 123, adding the partial-product rows in pairs:

    11120 + 2581 = 13701
    011100 + 80240 = 091340
    0000000 + 456700 = 0456700
    13701 + 091340 = 105041
    105041 + 0456700 = 0561741
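Pairwise summation replaces one long chain of additions with a tree whose depth grows with the logarithm of the row count. A minimal sketch that records every level of the tree; the function name is illustrative:

```python
def pairwise_sum(values):
    """Add neighbours two at a time until one value is left.
    Returns the list of levels, starting with the input rows."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = [prev[i] + prev[i + 1] for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2:
            nxt.append(prev[-1])  # an odd row passes through unchanged
        levels.append(nxt)
    return levels
```

With the six rows of 4567 × 123 the levels are [13701, 91340, 456700], then [105041, 456700], then [561741].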

Page 10: Multiplication and Sum-of-Products Circuits:

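Column Counting

    4567 × 123, with the digits in each column added independently:

            1  1  1  2  0
               2  5  8  1
         0  1  1  1  0  0
            8  0  2  4  0
      0  0  0  0  0  0  0
         4  5  6  7  0  0
     --------------------
     00 04 15 10 16 14 01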
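Column counting adds the digits of each column with no carries between columns, then resolves the carries in a single final pass. A minimal decimal sketch; both function names are hypothetical:

```python
def column_counts(rows, width):
    """Sum the decimal digits in each column (index 0 = units), no carries."""
    counts = [0] * width
    for row in rows:
        for k in range(width):
            counts[k] += (row // 10 ** k) % 10
    return counts


def resolve(counts):
    """Propagate carries through the column counts; return the integer value."""
    total, carry = 0, 0
    for k, count in enumerate(counts):
        carry, digit = divmod(count + carry, 10)
        total += digit * 10 ** k
    return total + carry * 10 ** len(counts)
```

For the rows of 4567 × 123 the counts, units column first, are 1, 14, 16, 10, 15, 4, 0, and resolving them gives 561741.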

Page 11: Multiplication and Sum-of-Products Circuits:

Binary Multiplication

• The multiplication table is trivial (AND gate).

• No multi-digit entries in the table, so the partial products are well-formed numbers.

• Addition of binary numbers is hard:

– O(1 + n/10) with linear hardware

– O(log n) with O(n² log n) hardware.

• Column counting is the accepted solution.

– Wallace Trees circa 1964.
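These points can be illustrated in software by treating Python integers as bit vectors. The sketch below forms AND-gate partial products and then applies 3:2 carry-save compression in the general style of a Wallace tree. It is an illustration under those assumptions, not the circuit from the slides, and all names are hypothetical:

```python
def and_partial_products(a, b, n):
    """Row j is a AND-ed with bit j of b, shifted left j places."""
    return [(a if (b >> j) & 1 else 0) << j for j in range(n)]


def carry_save_add(x, y, z):
    """3:2 compression: one full adder per bit column, no carry propagation."""
    sum_bits = x ^ y ^ z
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1
    return sum_bits, carry_bits


def reduce_to_two(rows):
    """Compress groups of three rows into two until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = nxt
    return rows
```

Only the last two rows need a carry-propagating adder, for example `sum(reduce_to_two(and_partial_products(11, 13, 4)))` for 11 × 13.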

Page 12: Multiplication and Sum-of-Products Circuits:

Continuing research in the area has led to steady improvement in the designs for Partial Product Reduction Trees (PPRTs) for parallel multipliers designs, as evidenced in the progression of work in [18], [2], [12], [10], [11], [6]. However, almost all of this prior work focused on finding good basic building blocks (compressors) that could be connected in a regular pattern to build a PPRT. ...

Oklobdzija & Stelling 1998

Page 13: Multiplication and Sum-of-Products Circuits:

A compressor operates in a single column of the PPRT […] These compressors are made up of full adders that are interconnected in a way to minimize the compressor’s delay. In contrast, our approach is to design a faster PPRT by finding a globally optimal way of interconnecting the low-level components (adders).

Page 14: Multiplication and Sum-of-Products Circuits:

The DSP Filter Setting

• Infinite series of data values arriving at a fixed rate.

• Compute the convolution with a specified vector, fast enough to keep up.

• Economic considerations often favor an FPGA (field-programmable gate array) solution.

• Linear algebra sum-of-products problems are more likely to a) be floating point and b) favor a software-intensive solution.
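The filter setting can be sketched in software as a generator that emits one sum-of-products per arriving sample. This is only a behavioural model, assuming integer samples and a zero initial history; `fir_stream` is a hypothetical name:

```python
from collections import deque


def fir_stream(samples, coeffs):
    """Yield one convolution output per input sample (zero initial history)."""
    history = deque([0] * len(coeffs), maxlen=len(coeffs))
    for x in samples:
        history.appendleft(x)  # newest sample first; the oldest falls off the end
        yield sum(c * a for c, a in zip(coeffs, history))
```

Because it is a generator, it accepts an unbounded input stream; the hardware problem is doing the same work within one sample period.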

Page 15: Multiplication and Sum-of-Products Circuits:

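[Figure: serial bit stream 1100011010110010111001101 carrying successive samples A(t+2), A(t+1), A(t), A(t−1), A(t−2), A(t−3), with filter coefficients C(−1), C(0), C(1). Equations: a sum S_k formed from the terms C_i · A_{t−i}[k], and a function f(k) built from S_k, with f(k) = 0 when k < 0; f(14) is the sum-of-products.]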

Page 16: Multiplication and Sum-of-Products Circuits:

Improving the Standard Circuit

• The final accumulator has to be fast enough. What if it isn’t?

• Idea: distribute the feedback through the PPRT.

• OK, How?

• Opportunistic feedback: whenever a full adder has fewer than three inputs, give it feedback.

• Problem: The Supermarket Separator.

• Solution starts with the generalized full adder.

Page 17: Multiplication and Sum-of-Products Circuits:

Generalized Full Adder

• Inputs represent data and control information.

• Outputs represent the number of “effective” one bits among the inputs.

• Maps directly into a Xilinx FPGA logic cell (with a maximum of four inputs).

[Figure: generalized full adder cell with inputs a, b, c, d and outputs C, S.]

Mixed logical and arithmetic equation, e.g.:

$$2C + S = (a \wedge b) + (c \wedge d)$$
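A minimal behavioural sketch of one such cell. It assumes the example equation has the form 2C + S = (a AND b) + (c AND d); the logical operators are an assumption here, chosen because gating a data bit with a control bit is an AND. The function name is hypothetical:

```python
def generalized_full_adder(a, b, c, d):
    """Return (C, S) such that 2*C + S == (a & b) + (c & d).
    All inputs are single bits (0 or 1)."""
    total = (a & b) + (c & d)
    return total >> 1, total & 1
```

Each output depends on at most four single-bit inputs, which is what lets it fit a four-input lookup table.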

Page 18: Multiplication and Sum-of-Products Circuits:

Supermarket Separator Problem

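[Figure: generalized full adder (inputs a, b, c, d; outputs C, S) fed by columns of example bits, with clock periods labelled k = 1; k = 0, t = 1; k = q−2; and k = q−1, t = 0.]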

Page 19: Multiplication and Sum-of-Products Circuits:

Time Signatures

• To allow for feedback, need to be able to do the bookkeeping.

• Zeros may appear on some wires as columns are reduced. To exploit this sparseness, we need to detect and manipulate it.

• Time signature algebra: associate a vector with each wire (or bus) giving the maximum arithmetic value carried on the wire in each clock period.

Page 20: Multiplication and Sum-of-Products Circuits:

Time Signature Constraints

• The arithmetic contribution of a signal must be conserved.

• No non-zero contribution can cross over a supermarket barrier.

• Remark: Delaying a signal by one clock should be an identity in the algebra.

Page 21: Multiplication and Sum-of-Products Circuits:

Signal Origination

[Figure: generalized full adder (inputs a, b, c, d; outputs C, S) with example input bits; labels: "Control or N/C" and "3,2,1,0".]

At the top of the tree, the input data are assumed to have time signature 1111.

Page 22: Multiplication and Sum-of-Products Circuits:

Pair Splitting

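[Figure: generalized full adder (inputs a, b, c, d; outputs C, S). A bus with time signature 3,2,1,0 is split into a sum wire with 1,1,1,0 and a carry wire with 1,1,0,0.]

A wire can carry no more than a contribution of 1. The sum bit may be a one if the bus carries more than zero. The carry bit may be one if the bus carries more than one.

Note: the carry bit belongs to the next higher file.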
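The pair-splitting rule can be written as a small function on time signatures. A minimal sketch, assuming a time signature is a list of per-clock-period maxima; `split_pair` is a hypothetical name:

```python
def split_pair(bus_ts):
    """Split a bus time signature into (sum_ts, carry_ts).
    The sum bit may be 1 wherever the bus can exceed 0;
    the carry bit may be 1 wherever the bus can exceed 1."""
    sum_ts = [1 if value > 0 else 0 for value in bus_ts]
    carry_ts = [1 if value > 1 else 0 for value in bus_ts]
    return sum_ts, carry_ts
```

For the bus 3,2,1,0 this returns 1,1,1,0 for the sum wire and 1,1,0,0 for the carry wire, matching the slide.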

Page 23: Multiplication and Sum-of-Products Circuits:

Wire Splitting

1,0,1,1 → 1,0,G,1 and G,0,1,G

G is for Garbage. The information content (contribution) of the wire is split but the electrical signal is not altered.
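A minimal sketch of wire splitting on time signatures. It assumes that each non-zero time slot is assigned to exactly one of the two branches (so the contribution is conserved), that the other branch sees garbage in that slot, and that zero slots stay zero on both branches as in the slide's example. The names `G` and `split_wire` are illustrative:

```python
G = "G"  # marker for a garbage time slot


def split_wire(ts, keep_first):
    """Split one wire's time signature into two branch signatures.
    keep_first[i] is True when slot i is credited to the first branch."""
    first, second = [], []
    for value, keep in zip(ts, keep_first):
        if value == 0:
            first.append(0)
            second.append(0)
        elif keep:
            first.append(value)
            second.append(G)
        else:
            first.append(G)
            second.append(value)
    return first, second
```

Calling `split_wire([1, 0, 1, 1], [True, True, False, True])` reproduces the split shown on the slide.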

Page 24: Multiplication and Sum-of-Products Circuits:

Right-Shift Rule

A signal may be re-assigned to the next-lower file if it is doubled.

This is occasionally useful when a cell would otherwise be underutilized.

Page 25: Multiplication and Sum-of-Products Circuits:

Diagonal Shift Rule

Effective weight: $f + k$, where $k \equiv t - r \pmod{q}$

$$f + t - r = (f - s) + t - (r - s)$$

As long as no contribution slides across a mod q barrier, signals can be reassigned to neighbors on the positive-slope diagonal. The TS is given relative to the rank r, so the TS vector must be “rotated” by the shift length s.

(t0, t1, t2, t3) → (t3, t0, t1, t2), if t3 = G or 0.
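The rotation and its side condition can be sketched as follows. This assumes the shift rotates the time signature right by s slots and that every slot wrapping around the end must be 0 or garbage, which generalizes the single-slot example on the slide; the names are hypothetical:

```python
G = "G"  # marker for a garbage time slot


def diagonal_shift(ts, s):
    """Rotate a time signature right by s slots (0 <= s < len(ts)).
    Raises ValueError if a non-zero contribution would wrap around,
    i.e. slide across the mod q barrier."""
    q = len(ts)
    if not 0 <= s < q:
        raise ValueError("shift length must satisfy 0 <= s < q")
    wrapped = list(ts[q - s:]) if s else []
    if any(value not in (0, G) for value in wrapped):
        raise ValueError("a contribution would cross the mod q barrier")
    return wrapped + list(ts[:q - s])
```

A shift of 1 on t0, t1, t2, t3 is accepted only when t3 is 0 or G, as the slide requires.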

Page 26: Multiplication and Sum-of-Products Circuits:

Sink Rule

• When a signal contains only one active timeslot and that timeslot contains the sole representative of the lowest remaining column, that signal is sunk and is removed from consideration.

• Sunk signals may be stored for parallel output or may be consumed as soon as they are produced, depending on the application.

Page 27: Multiplication and Sum-of-Products Circuits:

Gate Rule

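[Figure: generalized full adder (inputs a, b, c, d; outputs C, S) with inputs marked 1, 0, or G, clock-period indicators k = 1 and k = 2, and time signatures 1,0,1,1 and 0,0,0,0.]

$$2C + S = (a \wedge b) + (c \wedge d)$$

Clock-period indicator signals are used to gate out garbage in the generalized full adder.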

Page 28: Multiplication and Sum-of-Products Circuits:

Design Generation

• Currently, I have a Prolog program with constraint propagation extensions that knows the algebra. It does not yet successfully generate designs.

• The general strategy is to generate designs rank by rank, under iterative deepening, until a successful (valid and complete) design is found.
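Iterative deepening itself is a standard search technique: run a depth-limited search with a growing limit until a goal is found. A generic sketch in Python rather than the Prolog of the actual program, which is not shown in the slides; `expand` and `is_complete` are hypothetical callbacks standing in for "add one rank" and "valid and complete design":

```python
def iterative_deepening(start, expand, is_complete, max_depth):
    """Return (state, depth_limit) for the first complete state found,
    or None if nothing is found within max_depth."""

    def depth_limited(state, depth):
        if is_complete(state):
            return state
        if depth == 0:
            return None
        for child in expand(state):
            found = depth_limited(child, depth - 1)
            if found is not None:
                return found
        return None

    for limit in range(max_depth + 1):
        found = depth_limited(start, limit)
        if found is not None:
            return found, limit
    return None
```

Because the limit grows one step at a time, the first solution found uses the fewest expansion steps, which fits the idea of taking its cost as an initial upper bound.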

Page 29: Multiplication and Sum-of-Products Circuits:

Generation, Continued

• Once a valid design is found, its cost will be used as an upper bound for an exhaustive search for better designs.

• Efficiently generating candidate designs with feedback is a chicken-and-egg problem. I am using a “suspense list” of inputs not yet connected to outputs to handle this problem.

Page 30: Multiplication and Sum-of-Products Circuits:

Generation, Continued

• The routines that implement the TS algebra have to “wire up” the TS rules without knowing the TS of the feedback inputs. Tricky coding problem, but under control.

• The end game is not yet well understood. That needs more study.

• I hope to be generating real designs soon, and to have some idea what an optimal design might look like in January.

Page 31: Multiplication and Sum-of-Products Circuits:

Sign Handling

• We haven’t talked about signed numbers. Signed data can be handled rather easily by this circuit, but signed coefficients require some thought.

• The standard circuit sign-extends the partial product terms in the feedback path. To do that, you have to know the sign bit’s value!

• I have a solution: next seminar.

Page 32: Multiplication and Sum-of-Products Circuits:

Conclusions

• Inventing an appropriate algebra helped me to formulate the optimization problem for software solution and gives me confidence that the resulting designs are correct.

• Optimality, of course, is a different problem.

• The range of applicability of this circuit is not very broad: it is best suited for FPGA realization near the maximum clock speed of the logic family.