Special Purpose Machines - IIT Bombay (Ranade/606/Vlsilb.pdf)


Post on 25-Apr-2020


Special Purpose Machines

What is the best way to build a VLSI chip to solve problem X?

Best = Minimum area, Minimum time, ...

X = Matrix multiplication, sorting, FFT, ...

Upper bounds:

• Solve the problem using Mesh/Butterfly/... and lay out the network in VLSI.

• Devise new networks better suited for VLSI.

Lower bounds:

Prove: “No matter what network is used, area/time/... have to be at least ...”


Lower Bounds

Memory based bounds: A ≥ f

“At some point while solving the problem, it is necessary to remember f words.”

“Chip must have area at least f to remember f words.”

I/O based bounds: AT ≥ f

“Input/Output consists of f words.”

“Every chip with area A can read at most O(A) words per time step.”

Bisection based bounds: $AT^2 \ge f^2$

“At most $\sqrt{A}$ words can flow from one half of the chip to the other half in one step.”

“Problem requires at least f words to flow.”


Model

Processors:

• 1 word of memory (w bits).

• input/output: 1 word per step.

• communication: 1 word per link per step.

Input-output schedule:

• Each input word is read just once.

• Input/Output is where/when oblivious.

INPUT: $X: x_1, x_2, \ldots, x_m$

OUTPUT: $Y: y_1, \ldots, y_n$

Where Oblivious: Which processor reads $x_i$ (generates $y_j$) is fixed before the beginning of execution.

When Oblivious: When $x_i$ is read ($y_j$ generated) is fixed before the beginning of execution.


Terminology

Execution: Sequence of events: power up, read inputs, compute, generate outputs, read more inputs, . . . , stop.

An m input chip can have $2^{mw}$ different executions.

State: Values stored in the processors.

A P processor chip can have $2^{Pw}$ states.

Behaviour: Values generated as outputs.

An n output chip can have $\le 2^{nw}$ behaviours.

Behaviour after time t: Values output after time t.

Important Fact: Chip Behaviour after time t

= f(chip state at time t, chip inputs after t)


Lower bounds on chip area

Basic Observation: Chip produces different behaviour in two executions given the same input after step t

⇒ Chip state was different at time t.

Typical Argument: Prove that the given chip produces different behaviours after time t, given the same input after time t, in executions $E_1, E_2, \ldots, E_N$.

⇒ Chip state at time t distinct in all executions.

⇒ $2^{Pw} \ge N$

⇒ $P \ge \frac{\log N}{w}$

⇒ Chip Area $\ge \frac{\log N}{w}$


Cyclic Shift Lower bound

INPUT 1: $x_{n-1}, x_{n-2}, \ldots, x_0$ “Data”

INPUT 2: s “shift amount”

Assume w ≥ log n

OUTPUT: $y_{n-1}, y_{n-2}, \ldots, y_0$

where $y_i = x_{(i+s) \bmod n}$
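As a concrete reference for the specification above, here is a minimal Python sketch of cyclic shift. The function name `cyclic_shift` and the list encoding (`x[i]` holds $x_i$) are illustrative choices, not part of the notes. The loop at the end checks the fact used in the proof of the Claim: the shift $s = (i - j) \bmod n$ makes output $y_j$ equal to $x_i$.

```python
def cyclic_shift(x, s):
    """Return y with y[i] = x[(i + s) % n], i.e. y_i = x_{(i+s) mod n}."""
    n = len(x)
    return [x[(i + s) % n] for i in range(n)]


# Illustrative encoding: x[i] holds data word x_i, here n = 5.
x = [10, 11, 12, 13, 14]
print(cyclic_shift(x, 2))   # [12, 13, 14, 10, 11]

# For every output index j and data index i, the shift s = (i - j) mod n
# routes x_i to y_j.
n = len(x)
for i in range(n):
    for j in range(n):
        assert cyclic_shift(x, (i - j) % n)[j] == x[i]
```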

Lemma: Cyclic Shift requires A = Ω(n)

Claim: All data words must be read before any output word is generated. (Next)

Proof of Lemma: Let t = a time instant when all data words have been read but no output word has been generated.

Consider executions with shift amount s = 0.

Inputs read after t are the same in all executions.

Number of possible behaviours of the chip after time t = $2^{nw}$ (with s = 0 the output equals the data).

∃ $2^{nw}$ distinct executions in which the same input is read after time t, and in which distinct behaviour is produced after time t.

⇒ $A = \Omega\left(\frac{\log 2^{nw}}{w}\right) = \Omega(n)$


Proof of Claim

Suppose $y_j$ is generated before $x_i$ is read in some execution.

Obliviousness ⇒ the above is true for all executions.

Set $s = (i - j) \bmod n$

⇒ Chip must output $y_j = x_{(j+s) \bmod n} = x_i$

We can feed a different value for $x_i$ after seeing what the chip outputs as $y_j$.

⇒ The chip is forced to make a mistake. Contradiction.

Informal Interpretation: There exists a time instant t at which the chip must have all n data words in memory. So n processors are necessary, and hence area $\Omega(n)$.


Dependence

INPUTS: $X: x_0, x_1, x_2, \ldots, x_{n-1}$

OUTPUTS: $Y: y_0, y_1, y_2, \ldots, y_{m-1}$

$y_i$ depends upon $x_j$ under control c iff there exist two executions in which

1. Distinct values are assigned to input $x_j$.

2. The inputs in $X - \{x_j\}$ are assigned the value c, identical in both executions.

3. $y_i$ takes on a different value in the two executions.

$y_i$ depends upon $x_j$ ≡ ∃c such that $y_i$ depends upon $x_j$ for control c.

Example: In cyclic shift every output word depends upon every data word.

Implication of dependence: If $y_i$ depends upon $x_j$, then $x_j$ must be read before $y_i$ is generated.
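To make the definition concrete, the following brute-force sketch tests whether $y_i$ depends upon $x_j$ for cyclic shift on a tiny instance, by searching over every control (all other data words and the shift amount) and every value of $x_j$. The helper names `cyclic_shift` and `depends` are hypothetical; the search is exponential and only meant for very small n and w.

```python
from itertools import product


def cyclic_shift(x, s):
    """y_i = x_{(i+s) mod n}."""
    n = len(x)
    return [x[(i + s) % n] for i in range(n)]


def depends(i, j, n, w):
    """Brute-force test of 'y_i depends upon x_j' for cyclic shift.

    A control fixes every input other than x_j: the other n - 1 data
    words and the shift amount s. y_i depends upon x_j if, for some
    control, two values of x_j give two different values of y_i.
    """
    values = range(2 ** w)                        # all w-bit words
    for others in product(values, repeat=n - 1):
        for s in range(n):                        # shift amount is part of the control
            seen = set()
            for v in values:                      # vary x_j only
                x = list(others[:j]) + [v] + list(others[j:])
                seen.add(cyclic_shift(x, s)[i])
            if len(seen) > 1:
                return True
    return False


n, w = 4, 2                                       # w >= log n, so s fits in a word
print(all(depends(i, j, n, w) for i in range(n) for j in range(n)))   # True
```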


Dependence between Sets

INPUTS: $X$, Subset $X'$ : $x_0, x_1, x_2, \ldots, x_{n-1}$

OUTPUTS: $Y$, Subset $Y'$ : $y_0, y_1, y_2, \ldots, y_{m-1}$

$Y'$ depends upon $X'$ iff $y_i$ depends upon $x_j$ for all $y_i \in Y'$ and $x_j \in X'$.

Example: In cyclic shift the set of all output words depends upon the set of all data words.


Flow

INPUTS: $X$, Subset $X'$ : $x_0, x_1, x_2, \ldots, x_{n-1}$

OUTPUTS: $Y$, Subset $Y'$ : $y_0, y_1, y_2, \ldots, y_{n-1}$

$X'$ flows to $Y'$ under control c iff there exist $2^{nw}$ executions in which:

1. The value c is given to the inputs in $X - X'$.

2. The inputs $X'$ take distinct values in different executions.

3. The outputs $Y'$ take on exactly the same values as the inputs $X'$.

$X'$ flows to $Y'$ ≡ ∃c such that $X'$ flows to $Y'$ under control c.

Example: In cyclic shift, the data inputs flow to the outputs. The control c needed for this is 0, i.e. the shift amount is set to 0.


Main Theorem

If $Y'$ depends upon $X'$ and $X'$ flows to $Y'$ under some control c, then Area $= \Omega(n)$, where n denotes the number of words in $X'$.

Proof: Because of dependence, all of $X'$ must be read before any word of $Y'$ is generated. Let t = a time after reading $X'$ but before generating $Y'$.

Consider the $2^{nw}$ executions in which $X'$ takes different values. In all these executions we set the inputs $X - X'$ to c. Thus in each, $Y'$ takes the same value as $X'$, because of the flow condition.

Further, in all executions the chip reads the same input (part of c) after step t. But in each it must produce a different output.

Thus at step t the chip must have $2^{nw}$ distinct states. Thus it must be capable of storing nw bits, i.e. have n processors, i.e. area $\Omega(n)$.


Remark

Area required may be small if you have only dependence or only flow.

Assume w = 1 in these examples.

Only flow: Problem: Addition modulo $2^n$

Input: $A = a_0, a_1, \ldots, a_{n-1}$ and $B = b_0, b_1, \ldots, b_{n-1}$

Output: $C = c_0, \ldots, c_{n-1}$

Clearly A flows to C under control B = 0. But addition can be done in O(1) area by processing the numbers from lsb to msb. (There is no full dependence: $c_i$ depends only on bits 0 to i of A and B.)

Only dependence: Artificial Problem

Input: $x_0, \ldots, x_{n-1}$

Output: $y_0, \ldots, y_{n-1}$

Each $y_i = \left(\sum_j x_j\right) \bmod 2$

Clearly every $y_i$ depends upon every $x_j$. But again the $y_i$ can be computed in O(1) area by reading all inputs and keeping track of the sum (mod 2) using just O(1) processors. After this the sum is output n times. (There is no large flow: all n outputs are equal.)
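Both O(1)-area claims in this Remark can be illustrated with a small Python sketch, assuming bits arrive one per step, lsb first. The names `serial_add_mod_2n`, `parity_outputs`, `to_bits` and `from_bits` are illustrative, not from the notes. The point is that each routine carries only a constant number of bits of state from one step to the next.

```python
def to_bits(v, n):
    """n-bit list of v, lsb first (index i holds bit i)."""
    return [(v >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def serial_add_mod_2n(a_bits, b_bits):
    """Add mod 2**n, reading a_i and b_i lsb first.

    The only state carried between steps is one carry bit. c_i is
    emitted as soon as a_i and b_i are read; the list below just
    collects the emitted outputs and is not part of the state.
    """
    carry = 0
    emitted = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        emitted.append(total & 1)     # output c_i
        carry = total >> 1
    return emitted                    # final carry dropped: result is mod 2**n


def parity_outputs(x_bits):
    """Artificial problem: y_i = (sum of all x_j) mod 2 for every i.

    One bit of state: read all inputs, then emit the parity n times.
    """
    parity = 0
    for x in x_bits:
        parity ^= x
    return [parity] * len(x_bits)


print(from_bits(serial_add_mod_2n(to_bits(13, 4), to_bits(6, 4))))  # 3, i.e. 19 mod 16
print(parity_outputs([1, 0, 1, 1]))                                    # [1, 1, 1, 1]
```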


f Word Flow

INPUTS: $X$, Subset $X'$ : $x_0, x_1, x_2, \ldots, x_{n-1}$

OUTPUTS: $Y$, Subset $Y'$ : $y_0, y_1, y_2, \ldots, y_{n-1}$

$X'$ has an f word flow to $Y'$ under control c iff there exist $2^{fw}$ executions in which:

1. The value c is given to the inputs in $X - X'$.

2. The inputs $X'$ take distinct values in different executions.

3. The outputs $Y'$ take on exactly the same values as the inputs $X'$.

If f = n, this is the old definition, which we will call full flow.

Example: Consider the problem of sorting n 1-bit numbers. Clearly, the inputs do not fully flow to the outputs. However, every input that is already sorted appears unchanged at the outputs. There are n + 1 such inputs (i zeros followed by n − i ones, for i = 0, 1, ..., n), so there are n + 1 executions in which the above conditions are satisfied. Thus $2^{fw} = n + 1$, i.e. $f = \log(n+1) \approx \log n$, since we have w = 1.
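The count in the sorting example can be checked by enumeration for small n. This is a sketch with illustrative names (`unchanged_by_sorting`, `flow_words`), assuming w = 1 as in the example: it counts the n-bit inputs that sorting leaves unchanged and converts that count into f via $2^{fw} = $ count.

```python
from itertools import product
from math import log2


def unchanged_by_sorting(n):
    """Count the length-n inputs of 1-bit numbers that sorting leaves unchanged."""
    return sum(1 for x in product((0, 1), repeat=n) if list(x) == sorted(x))


def flow_words(n, w=1):
    """f from 2**(f*w) = number of executions meeting the flow conditions."""
    return log2(unchanged_by_sorting(n)) / w


print(unchanged_by_sorting(7), flow_words(7))   # 8 3.0
```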


Main Theorem Generalized:

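If $Y'$ depends upon $X'$ and $X'$ has an f word flow to $Y'$ (under some control c), then Area $= \Omega(f)$.

Proof: Similar. At a time t after $X'$ is read and before $Y'$ is generated, the $2^{fw}$ executions must leave the chip in distinct states, so it must store fw bits, i.e. have $\Omega(f)$ processors.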


Integer Multiplication

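INPUTS: $a_{n-1} \ldots a_1 a_0$, $b_{n-1} \ldots b_1 b_0$

OUTPUT: $c_{2n-1} \ldots c_1 c_0$

($a_i$, $b_j$, $c_k$ are all bits)

Theorem: $A = \Omega(n)$ (bit model)

Proof:

$X'$: $b_{n-1} \ldots b_{n/2}$

$Y'$: $c_{3n/2-1} \ldots c_n$

Every bit in $Y'$ depends upon every bit in $X'$.

This is shown using a = a suitable power of 2: for $b_j \in X'$ and $c_k \in Y'$, set $a = 2^{k-j}$ (note $1 \le k - j \le n - 1$) and the other bits of b to 0; then $c_k = b_j$.

$X'$ flows to $Y'$ under control $a = 2^{n/2}$, rest of the inputs 0.

By the Main Theorem, $A = \Omega(n/2) = \Omega(n)$.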
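A brute-force sanity check of both halves of the multiplication argument for small even n, written as a sketch with hypothetical helper names. Integers stand in for bit vectors: bit i of an integer is the i-th input or output bit. `x_flows_to_y` checks the flow claim under the control $a = 2^{n/2}$, and `y_depends_on_x` checks the dependence claim with $a = 2^{k-j}$.

```python
def bit(v, i):
    """Bit i of integer v."""
    return (v >> i) & 1


def x_flows_to_y(n):
    """With a = 2**(n/2) and b_{n/2-1..0} = 0 (the control), the bits
    b_{n-1..n/2} (X') must reappear as c_{3n/2-1..n} (Y') for every value."""
    h = n // 2
    a = 1 << h
    for high in range(1 << h):           # every value of X'
        b = high << h                    # lower half of b fixed to 0
        c = a * b
        if any(bit(c, n + t) != bit(b, h + t) for t in range(h)):
            return False
    return True


def y_depends_on_x(n):
    """For each b_j in X' and c_k in Y', a = 2**(k-j) is a valid n-bit
    control under which flipping b_j (other b bits 0) flips c_k."""
    h = n // 2
    for j in range(h, n):
        for k in range(n, n + h):
            if not 1 <= k - j <= n - 1:   # a must fit in n bits
                return False
            a = 1 << (k - j)
            c_off, c_on = 0, a << j       # products with b_j = 0 and b_j = 1
            if bit(c_off, k) == bit(c_on, k):
                return False
    return True


print([x_flows_to_y(n) and y_depends_on_x(n) for n in (2, 4, 6, 8)])  # [True, True, True, True]
```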


Integer Addition

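Lemma: Addition modulo $2^n$ requires area $\Theta(1)$.

“2's complement addition”

Proof: Process the numbers from lsb to msb, keeping only the carry (as in the Remark on addition above); this uses O(1) area, and Ω(1) is trivial.

Lemma: Addition modulo $2^n - 1$ requires area $\Omega(n)$.

“1's complement addition”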

Proof: Homework


Summary of Area Bounds

Word model: $w = \Theta(\log n)$

Cyclic shift Ω(n)

Sorting Ω(n)

Bit model:

Integer Multiplication Ω(n)

Addition mod $2^n$: $\Theta(1)$

Addition mod $2^n - 1$: $\Omega(n)$.

Convolution Ω(n)

Sorting n integers each 1 + log n bits wide Ω(n)

Sorting n integers each 2 log n bits wide Ω(n log n)

Exercise: Find matching upper bounds.


$AT^2$ Bounds

Let a cut C divide the chip into two parts, L and R.

Behaviour of L: Values output by L.

(Communication) Transcript of C: Concatenation of all the words transmitted across C over time.

Behaviour of L = f(Values input inside L, Transcript(C))

Basic idea: L has different behaviour in two executions, though the same values are input inside L

⇒ Transcript(C) is different in the two executions.

Typical Argument: Prove that L has different behaviours in executions $E_1, E_2, \ldots, E_f$ although the same external values are input in L.

⇒ Transcript(C) is different in each execution.

Number of different transcripts possible:

T = execution time

|C|: number of wires crossing C.

Number $\le 2^{w|C|T}$


Transcript Theorem

Suppose a small cut (of size $O(\sqrt{A})$) divides the chip into parts L and R. Suppose L reads inputs $X'$ and R generates outputs $Y'$. Suppose $X'$ has an f word flow to $Y'$. Then $AT^2 = \Omega(f^2)$. (For full flow, f is the number of words in $X'$.)

Proof: Let c denote the control for which $X'$ flows to $Y'$. Consider executions in which all inputs other than $X'$ are set to c, while the values given to $X'$ are distinct. Clearly, there are $2^{fw}$ such executions.

But in all of these, the values read by part R from the external world are the same (part of the control c). Yet R shows different behaviour in each execution.

Thus there must be $2^{fw}$ distinct transcripts.

Bits in a transcript = time × cut size × w = $O(Tw\sqrt{A})$.

Number of distinct transcripts possible $\le 2^{O(Tw\sqrt{A})}$.

Thus $2^{fw} \le 2^{O(Tw\sqrt{A})}$, i.e. $fw = O(Tw\sqrt{A})$.

Thus $f = O(T\sqrt{A})$, i.e. $AT^2 = \Omega(f^2)$.


General idea of proof:

1. Find the small cut.

2. Find $X'$ read on one side and $Y'$ generated on the other side, such that $X'$ has a large flow to $Y'$.

Note that dependence is not needed.


Cyclic Shift

INPUT 1: $x_{n-1}, x_{n-2}, \ldots, x_0$ “Data”

INPUT 2: s “shift amount”

Assume w ≥ log n

OUTPUT: $y_{n-1}, y_{n-2}, \ldots, y_0$

where $y_i = x_{(i+\mathrm{value}(s)) \bmod n}$

Lemma: Cyclic Shift requires $AT^2 = \Omega(n^2)$

Upper Bound Comparison:

$\sqrt{n} \times \sqrt{n}$ mesh: $T = O(\sqrt{n})$, $A = O(n)$

⇒ $AT^2 = O(n^2)$: Optimal. Area cannot be reduced without increasing time, and vice versa.

n node hypercube: $T = O(\log n)$, $A = O(n^2)$

⇒ $AT^2 = O(n^2 \log^2 n)$: Suboptimal. Improvement is possible in time or area without increasing the other.

n node Butterfly: $T = O(\log n)$, $A = O(n^2/\log^2 n)$

⇒ $AT^2 = O(n^2)$: Optimal.
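To see the comparison numerically, the sketch below evaluates $AT^2/n^2$ for the three networks using the asymptotic expressions above with every hidden constant set to 1, which is an assumption made only for illustration. A ratio that stays flat as n grows signals an optimal design; the hypercube's ratio grows like $\log^2 n$.

```python
import math


def at2_over_n2(network, n):
    """A*T^2 / n^2 for the layouts above.

    Assumption for illustration only: every hidden constant is set to 1.
    """
    lg = math.log2(n)
    if network == "mesh":
        area, time = n, math.sqrt(n)
    elif network == "hypercube":
        area, time = n ** 2, lg
    elif network == "butterfly":
        area, time = n ** 2 / lg ** 2, lg
    else:
        raise ValueError(f"unknown network: {network}")
    return area * time ** 2 / n ** 2


for n in (2 ** 10, 2 ** 20):
    print(n, [round(at2_over_n2(net, n), 3) for net in ("mesh", "hypercube", "butterfly")])
```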


Proof Outline

Proc P is output-heavy: P outputs ≥ n/3 values.

Output-heavy proc ⇒ T ≥ n/3 (it outputs at most one word per step) ⇒ $AT^2 = \Omega(n^2)$

Partition Lemma: Assuming no processor is output-heavy, there exist a small cut C and a constant α such that $X' = x_{i_1}, x_{i_2}, \ldots, x_{i_{n/6}}$ are input on one side and $Y' = y_{i_1-\alpha}, y_{i_2-\alpha}, \ldots, y_{i_{n/6}-\alpha}$ (indices mod n) are output on the other side.

(Proved next)

WLOG $X'$ is input in R, $Y'$ is output in L.

Under control c = α (shift amount s = α), $X'$ flows to $Y'$.

$AT^2 = \Omega\left((n/6)^2\right) = \Omega(n^2)$

Small cut = cut of length $O(\sqrt{A})$.


Proof of Partition Lemma

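Claim: “Output Partition” There exists a small cut C such that at least n/3 outputs are generated on each side.

Proved next. Sliding argument.

WLOG assume side L reads at least n/2 inputs.

Inputs on side L: $x_{i_1}, x_{i_2}, \ldots, x_{i_{n/2}}$

Outputs on side R: $y_{j_1}, y_{j_2}, \ldots, y_{j_{n/3}}$

Put a complete bipartite graph on the above inputs/outputs.

Assign colours to edges: colour$(x_p, y_q) = (p - q) \bmod n$

0 ≤ colour < n.

$(n/2)(n/3) = n^2/6$ edges and n colours ⇒ at least n/6 edges of the same colour.

colour$(x_p, y_q)$ = shift count which will force $y_q$ to take value $x_p$.

$X'$, $Y'$ = endpoints of the edges with the most popular colour. For a fixed colour α, each $x_p$ is joined only to $y_{(p-\alpha) \bmod n}$, so these edges form a matching: $X'$ and $Y'$ each contain at least n/6 distinct words.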
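The colouring and averaging step can be sketched as follows; `popular_colour_matching` is a hypothetical helper, and inputs and outputs are given by their indices p and q. It returns the most popular colour α and its edges; by pigeonhole there are at least $(n/2)(n/3)/n = n/6$ of them when n/2 inputs and n/3 outputs are supplied.

```python
from collections import defaultdict


def popular_colour_matching(inputs, outputs, n):
    """Colour each edge (x_p, y_q) by (p - q) mod n; return the most popular
    colour alpha and its edges.

    inputs / outputs are index lists of the x's read on one side and the
    y's generated on the other. For a fixed alpha, x_p can only be joined
    to y_{(p - alpha) mod n}, so the returned edges form a matching.
    """
    by_colour = defaultdict(list)
    for p in inputs:
        for q in outputs:
            by_colour[(p - q) % n].append((p, q))
    alpha = max(by_colour, key=lambda colour: len(by_colour[colour]))
    return alpha, by_colour[alpha]


n = 12
alpha, edges = popular_colour_matching(range(6), [6, 7, 8, 9], n)  # n/2 inputs, n/3 outputs
print(alpha, edges)
# With shift amount s = alpha, y_q = x_{(q + alpha) mod n} = x_p on every edge.
assert all((q + alpha) % n == p for p, q in edges)
```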


Output Partition Lemma:

There exists a small cut with at least n/3 outputs on each side, assuming no processor is output-heavy.

Proof: Let chip height = h and length = l, with h ≤ l.

Number the intersections in the chip from 1 to lh in column-major order.

O(i) = number of outputs in the first i intersections.

O(0) = 0, O(lh) = n.

Consider the smallest i with O(i) ≥ n/3. Clearly, O(i − 1) < n/3.

O(i) cannot increase by n/3 or more as i increases by 1, because no processor is output-heavy. Thus O(i) < 2n/3, so the remaining intersections also generate more than n/3 outputs.

But there is a cut of size h + 1 separating the first i locations from the rest.

$h + 1 = O(\sqrt{A})$, since $h \le \sqrt{lh} = \sqrt{A}$.
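The sliding argument is essentially a prefix-sum scan. A minimal sketch, assuming the per-intersection output counts are given as a list in column-major order; `output_partition` is an illustrative name, and it rejects inputs where some processor is output-heavy, since the lemma assumes none is.

```python
def output_partition(outputs_per_point):
    """Sliding argument for the Output Partition Lemma.

    outputs_per_point[k] = number of outputs generated at the (k+1)-th
    intersection in column-major order. Returns (i, O(i)) for the smallest
    i with O(i) >= n/3, where O(i) counts outputs in the first i points.
    """
    n = sum(outputs_per_point)
    if n == 0:
        raise ValueError("chip generates no outputs")
    if any(3 * count >= n for count in outputs_per_point):
        raise ValueError("some processor is output-heavy")
    prefix = 0
    for i, count in enumerate(outputs_per_point, start=1):
        prefix += count
        if 3 * prefix >= n:      # first time O(i) >= n/3; then O(i) < 2n/3
            return i, prefix


# h = 3, l = 4 grid with n = 12 outputs, none at a heavy point.
print(output_partition([0, 3, 0, 2, 3, 0, 1, 3, 0, 0, 0, 0]))   # (4, 5)
```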


Summary

Idea 1: Use of the transcript theorem. Show large flow.

Idea 2: Assume no processor is output-heavy.

Idea 3: Output partition: Assuming no processor is output-heavy, it is possible to find a small cut such that each side of the cut generates at least n/3 outputs, where n = number of outputs.

Idea 4: So one side reads at least n/2 inputs, which are separated by the cut from at least n/3 outputs on the other side. This by itself doesn't produce flow.

Idea 5: Averaging idea to show the existence of a colour with at least n/6 edges. Hence flow.

Ideas 1-4 are applicable in general. Idea 5 is also useful.


Boolean Matrix Multiplication

INPUT: n×n matrices A,B OUTPUT: C = AB.

Theorem: $AT^2 = \Omega(n^4)$.

Proof idea: A more sophisticated version of the cyclic shift argument.

Main ideas: (1) Choosing A = a shift matrix causes the rows of B to be circularly shifted into C; choosing B = a shift matrix causes the columns of A to be circularly shifted into C. (2) Show that there exists a partition of the chip such that either rows of B or columns of A must cross a cut.


Shift Matrices

$S_i$ = Matrix obtained by circularly rotating all rows of the identity matrix by i positions to the right.

Example: n = 5, i = 1

$$
S_1 = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0
\end{pmatrix}
$$

$S_1 B$: result of circularly rotating the rows of B one step upwards.

$A S_1$: result of circularly rotating the columns of A one step right.
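The two properties of $S_1$ can be checked directly. This sketch uses plain nested lists rather than a matrix library; `shift_matrix` and `matmul` are illustrative helpers. Since a shift matrix has exactly one 1 per row and column, each entry of the product is a single term, so the integer product coincides with the Boolean product on 0/1 matrices.

```python
def shift_matrix(n, i):
    """S_i: identity with every row rotated right by i, so S_i[r][(r + i) % n] = 1."""
    return [[1 if c == (r + i) % n else 0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Plain integer matrix product of two n x n lists of lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


S1 = shift_matrix(5, 1)
for row in S1:
    print(row)

B = [[10 * r + c for c in range(5)] for r in range(5)]      # entry tags row r, column c
assert matmul(S1, B) == B[1:] + B[:1]                        # rows move one step up
assert matmul(B, S1) == [row[-1:] + row[:-1] for row in B]   # columns move one step right
```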


Proof Outline:

1. Assume no processor generates ≥ $n^2/3$ outputs. Else $T = \Omega(n^2)$, i.e. $AT^2 = \Omega(n^4)$.

2. ∃ cut π such that ≥ $n^2/3$ bits of C are generated on each side.

3. One of the following is true:

(a) $\Omega(n^2)$ bits of A are read on one side of π and $\Omega(n^2)$ bits of C are generated on the other, such that there is flow. Control: set B to a suitable shift matrix.

(b) $\Omega(n^2)$ bits of B are read on one side of π and $\Omega(n^2)$ bits of C are generated on the other, such that there is flow. Control: set A to a suitable shift matrix.


Graph G

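VERTICES: one vertex for each element of A, B, C.

EDGES (for $0 \le k < n$):

$(a_{ij}, c_{i,(j+k) \bmod n})$ of colour k

$(b_{ij}, c_{(i+k) \bmod n, j})$ of colour n + k

Lemma: Any partition of G with at least $n^2/3$ vertices of C on each side has $\Omega(n^3)$ edges crossing the partition.

⇒ Applying this to the partition induced by π (each vertex placed on the side where its element is read or generated), $\Omega(n^2)$ edges of a single colour cross the partition, since there are only 2n colours.

⇒ If colour < n, use a shift matrix for B; else use a shift matrix for A.

⇒ $\Omega(n^2)$ bits flow from either A or B across π (the edges of one colour form a matching).

⇒ Main theorem proved!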


Proof of Lemma

Embed the complete directed graph $K_{n^2}$ on the vertices of C into G

(Edges of $K_{n^2}$ → paths of G)

α = congestion of the embedding.

β = number of edges of G crossing the partition.

Key observation:

$\beta\alpha \ge$ (number of paths crossing the partition) $\ge \frac{n^2}{3} \cdot \frac{n^2}{3}$

Next we prove: α = n

⇒ $\beta \ge n^3/9$


Estimating congestion α

EMBEDDING:

Path $(c_{ij}, c_{kl})$ → $(c_{ij}, a_{ik}), (a_{ik}, c_{il}), (c_{il}, b_{jl}), (b_{jl}, c_{kl})$

Congestion of $(c_{ij}, a_{ik})$ (directed):

paths carried: $(c_{ij}, c_{k*})$ ⇒ Congestion = n

Congestion of $(a_{ik}, c_{il})$ (directed):

paths carried: $(c_{i*}, c_{kl})$ ⇒ Congestion = n

Congestion of $(c_{il}, b_{jl})$ (directed):

paths carried: $(c_{ij}, c_{*l})$ ⇒ Congestion = n

Congestion of $(b_{jl}, c_{kl})$ (directed):

paths carried: $(c_{*j}, c_{kl})$ ⇒ Congestion = n
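The congestion bound can be confirmed by routing every pair explicitly for small n. In this sketch (hypothetical helper names), vertices of G are tagged tuples such as `("a", i, k)`; `in_G` encodes that $a_{ik}$ is joined to every entry of row i of C and $b_{jl}$ to every entry of column l (the n colour classes of each kind together cover exactly these pairs), and the function returns the maximum load over directed edges, which should equal n.

```python
from collections import Counter
from itertools import product


def in_G(u, v):
    """True if u, v are joined in G: a_ik to all of row i of C, b_jl to all of column l."""
    x, y = (u, v) if u[0] != "c" else (v, u)   # x should be the a- or b-vertex
    if x[0] == "c" or y[0] != "c":
        return False
    return y[1] == x[1] if x[0] == "a" else y[2] == x[2]


def embedding_congestion(n):
    """Route each ordered pair of distinct C-vertices (c_ij, c_kl) along
    c_ij -> a_ik -> c_il -> b_jl -> c_kl and return the maximum number of
    paths that use any one directed edge."""
    load = Counter()
    for i, j, k, l in product(range(n), repeat=4):
        if (i, j) == (k, l):
            continue                                  # K_{n^2} has no self-loops
        path = [("c", i, j), ("a", i, k), ("c", i, l), ("b", j, l), ("c", k, l)]
        for u, v in zip(path, path[1:]):
            assert in_G(u, v)                         # every hop is an edge of G
            load[(u, v)] += 1                         # directed: (u, v) differs from (v, u)
    return max(load.values())


print([embedding_congestion(n) for n in (2, 3, 4, 5)])   # [2, 3, 4, 5]
```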


Upper bounds for matrix multiplication

n × n Mesh:

$A = O(n^2)$

$T = O(n)$

$AT^2 = O(n^4)$ Optimal!

$n^3$ processor Hypercube:

$A = O(n^6)$

$T = O(\log n)$

$AT^2 = O(n^6 \log^2 n)$ Very suboptimal.

Using specialized networks it is possible to get optimal $AT^2$ even for small T.


Concluding remarks

• $AT^2 = \Omega(n^2)$, with n the problem size, is equivalent to $hT = \Omega(n)$ (for a square chip of side $h = \sqrt{A}$), which essentially says that (i) most outputs depend on most inputs in a very detailed manner, and (ii) the problem is communication intensive: nearly all the inputs read on the left have to be brought to the right. This also means that communication arguments of this kind cannot give a lower bound better than $AT^2 = \Omega(n^2)$ for input size n.

• For certain problems it is useful to consider communication in and out of small tiles in the chip. We may not be able to show that the left and right need much communication, but some tile does, which gives a lower bound on the area.

• Consider the problem of determining whether one given bitstring is a rotation of another given bitstring. In this case there is only a 1 bit answer, so our technique does not apply. But it is nevertheless possible to show that $AT^2 = \Omega(n^2)$.


Exercises

1. Show that multiplication of n bit integers has $AT^2 = \Omega(n^2)$, assuming word size = 1.

Give multiplier designs and estimate whether they are optimal for $AT^2$.

2. Consider the problem of sorting n integers, each $w = 2\log n$ bits wide. Show that this has $AT^2 = \Omega(n^2)$. (Hint: the flow from inputs to outputs is not full, but still quite large.)

3. The list ranking problem has as input an array NEXT[1..n] which represents a list. The output is an array RANK[1..n] which represents the ranks, i.e. RANK[i] is the distance of the element stored at position i of the array from the end of the list.

Assume that the word size $w = \Theta(\log n)$. Show that the list ranking problem has $AT^2 = \Omega(n^2)$.

(Hint: Let n = 10. Suppose you are told that NEXT[1..3] is read on the left and RANK[8..10] is generated on the right of a small cut of the chip. What can you say about the flow from NEXT[1..3] to RANK[8..10]? You have to determine the control under which the flow is maximized. Note that you don't have complete freedom in setting NEXT: it should form a list.)
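To pin down the input/output convention of the list ranking exercise, here is a sequential reference implementation in Python. It is only a specification, not a VLSI design, and it assumes a convention the exercise leaves open: NEXT[i] = 0 marks the last element. `list_rank` and the 1-indexed list with an unused slot 0 are illustrative choices.

```python
def list_rank(next_):
    """Sequential reference for the list ranking problem (not a chip design).

    next_ is 1-indexed (next_[0] is unused); next_[i] is the position of the
    element that follows position i. Assumed convention, not fixed by the
    exercise: 0 marks the end of the list.
    Returns rank with rank[i] = distance from position i to the end.
    """
    n = len(next_) - 1
    rank = [None] * (n + 1)
    for i in range(1, n + 1):
        steps, j = 0, next_[i]
        while j != 0:                  # walk to the end of the list
            steps += 1
            j = next_[j]
        rank[i] = steps
    return rank


# List stored as 3 -> 1 -> 2 (position 2 holds the last element).
print(list_rank([None, 2, 0, 1]))      # [None, 1, 0, 2]
```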
