pipelining high-radix srt division algorithmssquire/images/srt1.pdf · • pipelining techniques...

33
Pipelining High-Radix SRT Division Algorithms Saurabh Upadhyay Illinois Institute of Technology Pipelining High-Radix SRT Division Algorithms – p. 1/20

Upload: others

Post on 11-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining High-Radix SRT Division Algorithms

Saurabh Upadhyay

Illinois Institute of Technology

Pipelining High-Radix SRT Division Algorithms – p. 1/20

Page 2: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Outline

• Introduction• Division by digit recurrence and pipelining• Use of different memory elements to pipeline the algorithm in

static CMOS• Use of pass-transistor logic• Domino clocking techniques for the algorithm• comparison and conclusion

Pipelining High-Radix SRT Division Algorithms – p. 2/20

Page 3: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Introduction

• SRT Division is very popular division algorithm used in currentmicroprocessors.

• It is named for D. Sweeny, J. E. Robertson and K. D. Tocher.• This dissertation shows different ways to pipeline SRT division

algorithm.• Simulation results are compared to find out the fastest possible

implementation.

Pipelining High-Radix SRT Division Algorithms – p. 3/20

Page 4: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

What current microprocessors use?

Processor Division Algorithm Connectivity

DEC 21164 Alpha AXP SRT Adder-CoupledHal Sparc64 SRT IndependentHP PA7200 SRT IndependentHP PA8000 SRT Multiplier-accumulate-c/pIBM RS/6000 Power2 Newton-Raphson IntegratedIntel Pentium SRT Adder-coupledIntel Pentium Pro SRT IndependentMips R8000 Multiplicative IntegratedMips R10000 SRT Multiplier-coupledPowerPc 604 SRT IntegratedPowerPc 620 SRT IntegratedSun SuperSparc Goldschmidt Multiplier-integratedSun UltraSparc SRT Independent

Pipelining High-Radix SRT Division Algorithms – p. 4/20

Page 5: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence

• x = q · d + rem

Pipelining High-Radix SRT Division Algorithms – p. 5/20

Page 6: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence

• x = q · d + rem

• Divisor is normalized, i.e. 1

2≤ d < 1 for fractional division.

Pipelining High-Radix SRT Division Algorithms – p. 5/20

Page 7: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence

• x = q · d + rem

• Divisor is normalized, i.e. 1

2≤ d < 1 for fractional division.

• For normalized fractional divisor the quotient is in the range0 < q < 2

Pipelining High-Radix SRT Division Algorithms – p. 5/20

Page 8: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence

• x = q · d + rem

• Divisor is normalized, i.e. 1

2≤ d < 1 for fractional division.

• For normalized fractional divisor the quotient is in the range0 < q < 2

• wi+1 = r · wi − qi+1 · d

where w0 = x

Pipelining High-Radix SRT Division Algorithms – p. 5/20

Page 9: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence

• x = q · d + rem

• Divisor is normalized, i.e. 1

2≤ d < 1 for fractional division.

• For normalized fractional divisor the quotient is in the range0 < q < 2

• wi+1 = r · wi − qi+1 · d

where w0 = x

• Consists of n iterations, each iteration produces one digit of thequotient.

Pipelining High-Radix SRT Division Algorithms – p. 5/20

Page 10: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence (continued)

• qi+1 is selected such that wi+1 is bounded.

Pipelining High-Radix SRT Division Algorithms – p. 6/20

Page 11: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence (continued)

• qi+1 is selected such that wi+1 is bounded.• qi+1 = QST (r · wi, d) . . . Quotient Selection Table

Pipelining High-Radix SRT Division Algorithms – p. 6/20

Page 12: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence (continued)

• qi+1 is selected such that wi+1 is bounded.• qi+1 = QST (r · wi, d) . . . Quotient Selection Table

• q = q[n] = q[0] +∑n

i=1qir

−i

Pipelining High-Radix SRT Division Algorithms – p. 6/20

Page 13: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence (continued)

• qi+1 is selected such that wi+1 is bounded.• qi+1 = QST (r · wi, d) . . . Quotient Selection Table

• q = q[n] = q[0] +∑n

i=1qir

−i

• Quotient is formed by concatenation of quotient bits.

Pipelining High-Radix SRT Division Algorithms – p. 6/20

Page 14: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Division by Digit Recurrence (continued)

• qi+1 is selected such that wi+1 is bounded.• qi+1 = QST (r · wi, d) . . . Quotient Selection Table

• q = q[n] = q[0] +∑n

i=1qir

−i

• Quotient is formed by concatenation of quotient bits.• On-the-fly conversion is used for redundant quotient digit set.

2−1 MUX LOA

D A

ND

SH

IFT

−IN

CO

NT

RO

L

selectLoad

selectLoad

QM

Q QM

Q

QM

QM

Qin

in

qj+1q . .

.

Q REG

2−1 MUX

QM REG

Pipelining High-Radix SRT Division Algorithms – p. 6/20

Page 15: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Basic SRT Division Algorithm

{carry[8:0],2’b00}{sum[8:0],2’b00}

11

CSA

2d

i+1q state0

0 {3’b000,Dividend}

mux41/mux51 mux21

11

d= {3’b000, Divisor}__2d

__d

q={q2+,q+,q−,q2−,0}i+1

ulp

mux21

11

11

88

3

d

QST

11

wi+1 = r · wi − qi+1 · d

Pipelining High-Radix SRT Division Algorithms – p. 7/20

Page 16: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining

• Pipelining is a technique used to create clock cycle times forarithmetic data-paths.

• Usually, memory elements like flip-flops, transparent latches andpulsed latches are used to impose sequencing.

• Pipelining techniques also vary with different logic gate family.• Intelligent pipelining technique can improve performance greatly,

on the other hand poor pipelining introduces large sequencingoverheads.

Pipelining High-Radix SRT Division Algorithms – p. 8/20

Page 17: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Flip-Flops

11

CSA

clk FFFF

11

11

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21mux21

clk

11

11

i+1q state0

{q2+,q+,q−,q2−,0}

{sum[8:0],2’b00}{carry[8:0],2’b00}

____2ddd2d

ulp

Tc = ∆logic + ∆CQ + ∆DC + tskew

Pipelining High-Radix SRT Division Algorithms – p. 9/20

Page 18: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Flip-Flops

11

CSA

clk FFFF

11

11

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21mux21

clk

11

11

i+1q state0

{q2+,q+,q−,q2−,0}

{sum[8:0],2’b00}{carry[8:0],2’b00}

____2ddd2d

ulp

Tc = ∆logic + ∆CQ + ∆DC + tskew = 2ns

Pipelining High-Radix SRT Division Algorithms – p. 9/20

Page 19: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Latches

CSA

1111

{q2+,q+,q−,q2−,0}

2d d

i+1q state0

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21 mux21

latch

{carry[8:0],2’b00}{sum[8:0],2’b00}

clk_bar clk_bar

111111

11 1111

__ __2dd

ulp

clkclkclk

latchlatch

latch latch

Tc = ∆logic + 2 · ∆DQ

Pipelining High-Radix SRT Division Algorithms – p. 10/20

Page 20: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Latches

CSA

1111

{q2+,q+,q−,q2−,0}

2d d

i+1q state0

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21 mux21

latch

{carry[8:0],2’b00}{sum[8:0],2’b00}

clk_bar clk_bar

111111

11 1111

__ __2dd

ulp

clkclkclk

latchlatch

latch latch

Tc = ∆logic + 2 · ∆DQ = 1.8ns

Pipelining High-Radix SRT Division Algorithms – p. 10/20

Page 21: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Pulsed Latches

{q2+,q+,q−,q2−,0}

2d

i+1q state0

11

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21

ulp

11

{carry[8:0],2’b00}{sum[8:0],2’b00}

11

____d d 2d

clk

mux21

latchlatch

11

CSA

clk

11

Tc = ∆logic + ∆DQ + max(0, ∆DC + tskew − tpw)

Pipelining High-Radix SRT Division Algorithms – p. 11/20

Page 22: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Pipelining with Pulsed Latches

{q2+,q+,q−,q2−,0}

2d

i+1q state0

11

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux41/mux51 mux21

ulp

11

{carry[8:0],2’b00}{sum[8:0],2’b00}

11

____d d 2d

clk

mux21

latchlatch

11

CSA

clk

11

Tc = ∆logic + ∆DQ + max(0, ∆DC + tskew − tpw) = 1.6ns

Pipelining High-Radix SRT Division Algorithms – p. 11/20

Page 23: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Use of Pass Transistor Logic

• Static multiplexors are replaced with transmission gatemultiplexors, a well-known application of pass transistor logic.

• The algorithm now needs a 5-1 multiplexor instead of a 4-1multiplexor.

• The time periods are now,1.6ns for sequencing using Flip-Flops,1.7ns for sequencing using Latches and1.4ns for sequencing using Pulsed Latches.

• The implementation with flip-flops is faster than the one withlatches in this case.

Pipelining High-Radix SRT Division Algorithms – p. 12/20

Page 24: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Using Domino Logic

• Designers prefer domino logic for high-performance systems.• Domino logic is considered twice as fast as the static CMOS logic,

but this is often not the case.• Requirement of dual-rail domino gates, remedies for

charge-sharing problem and clocking techniques used limit thespeed.

• Traditional domino clocking has large sequencing overhead.• Overlapping clocks can be used to hide sequencing overheads,

as shown ahead.

Pipelining High-Radix SRT Division Algorithms – p. 13/20

Page 25: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Traditional Domino Clocking

ulp

{carry[8:0],2’b00}

11x2

clk

latchlatch

latch latchlatch

CSA

{sum[8:0],2’b00}

{q2+,q+,q−,q2−,0}

state0

{3’b000,Dividend}d= {3’b000, Divisor}

mux21 mux21

11x2

clk_bar clk_bar

clk clk

____2d d d 2d 0

clkmux51

clk_bar

clkclk

qi+1

11x2 11x2

11x211x2

11x211x2

Tc = ∆logic + 2 · ∆DC + 2 · tskew

Pipelining High-Radix SRT Division Algorithms – p. 14/20

Page 26: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Traditional Domino Clocking

ulp

{carry[8:0],2’b00}

11x2

clk

latchlatch

latch latchlatch

CSA

{sum[8:0],2’b00}

{q2+,q+,q−,q2−,0}

state0

{3’b000,Dividend}d= {3’b000, Divisor}

mux21 mux21

11x2

clk_bar clk_bar

clk clk

____2d d d 2d 0

clkmux51

clk_bar

clkclk

qi+1

11x2 11x2

11x211x2

11x211x2

Tc = ∆logic + 2 · ∆DC + 2 · tskew = 1.6ns

Pipelining High-Radix SRT Division Algorithms – p. 14/20

Page 27: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Skew Tolerant Domino Clocking

Static

Dynam

ic

Static

Dynam

ic

Static

Dynam

ic

overlap

yx Phase 2

2

1

222111

Phase 1

T c

Dynam

ic

Dynam

ic

Static

Static

Dynam

ic

Static

Pipelining High-Radix SRT Division Algorithms – p. 15/20

Page 28: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Constraints

• te =Tc

N+ thold + tskew

• tp = tprech + tskew

• Tc = te + tp

• toverlap =N−1

NTc − tprech − tskew1 =

thold + tskew2 + tborrow

Pipelining High-Radix SRT Division Algorithms – p. 16/20

Page 29: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Skew Tolerant Domino Implementation

{sum[8:0],2’b00}

CSA

2d d

state0

11x2

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux21 mux21

ulp

{carry[8:0],2’b00}

____2dd

11x2

11x2

mux51

phi2

phi1phi1 phi1

{q2+,q+,q−,q2−,0}

qi+1

Tc = ∆logic

Pipelining High-Radix SRT Division Algorithms – p. 17/20

Page 30: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Skew Tolerant Domino Implementation

{sum[8:0],2’b00}

CSA

2d d

state0

11x2

0 {3’b000,Dividend}d= {3’b000, Divisor}

mux21 mux21

ulp

{carry[8:0],2’b00}

____2dd

11x2

11x2

mux51

phi2

phi1phi1 phi1

{q2+,q+,q−,q2−,0}

qi+1

Tc = ∆logic = 1.1ns

Pipelining High-Radix SRT Division Algorithms – p. 17/20

Page 31: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Comparison

Sequencing method Time Period, Number of devicesTcns(FO4) (p-mos,n-mos)

Skew-tolerant domino 1.1 (6.2) 1052 (264, 788)Pulsed Latches (xmux) 1.4 (7.9) 1096 (548, 548)Pulsed Latches 1.6 (9) 964 (482, 482)Flip-Flops (xmux) 1.6 (9) 1140 (570, 570)Traditional domino 1.6 (9) 1932 (704, 1228)Transparent Latches (xmux) 1.7 (9.6) 1360 (680, 680)Transparent Latches 1.8 (10.1) 1228 (614, 614)Flip-Flops 2 (11.2) 1008 (504, 504)

Pipelining High-Radix SRT Division Algorithms – p. 18/20

Page 32: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Conclusions

• Speed improves significantly with a better pipelining approach.• Skew tolerant domino is the fastest implementation, yet has very

few devices used.• Alternatively, a pulsed latch implementation with transmission

gates offers high clock frequency.• Design complexity may also be an important factor while choosing

a sequencing method.

Pipelining High-Radix SRT Division Algorithms – p. 19/20

Page 33: Pipelining High-Radix SRT Division Algorithmssquire/images/srt1.pdf · • Pipelining techniques also vary with different logic gate family. • Intelligent pipelining technique can

Acknowledgements

• My deepest gratitude goes to my advisor Dr. James Stine.• Inspiring working conditions at the Illinois Institute of Technology

greatly facilitated my research.• Thank you!

Pipelining High-Radix SRT Division Algorithms – p. 20/20