
Page 1: A 240ps 64b Carry-Lookahead Adder in 90nm CMOS Faezeh Montazeri fmontazeri@ece.ut.ac.ir Advanced VLSI Course Presentation University of Tehran December

A 240ps 64b Carry-Lookahead Adder in 90nm CMOS

Faezeh Montazeri
fmontazeri@ece.ut.ac.ir

Advanced VLSI Course Presentation
University of Tehran

December 2006

Based on:
A 240ps 64b Carry-Lookahead Adder in 90nm CMOS

Sean Kao, Radu Zlatanovici, Borivoje Nikolić
University of California, Berkeley

Page 2

What Is an Optimal Adder?

Optimal adder:
• Minimum delay for given energy
• Minimum energy for given delay

[Figure: 64-bit adders on IEEE Xplore 1995-2005. x-axis: Normalized Delay [90nm 1V FO4], 0 to 60; y-axis: Normalized Energy [r.u.], 0 to 30; series by technology node: 500 nm, 350 nm, 250 nm, 180 nm, 130 nm, 90 nm]

[1]
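The two criteria above pick out the same set of designs: the points on the energy-delay Pareto frontier. A minimal Python sketch of that filter follows. The `designs` list is made-up illustration data, not values read from the figure, and `pareto_front` is a hypothetical helper name.

```python
def pareto_front(points):
    """Return the (delay, energy) points not dominated by any other point.

    A point is dominated if another point is no worse in both delay and
    energy and strictly better in at least one of them.
    """
    front = []
    best_energy = float("inf")
    # Sort by delay, then energy; walk left to right and keep only the
    # points that lower the best energy seen so far.
    for delay, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((delay, energy))
            best_energy = energy
    return front


# Hypothetical designs: (delay in FO4, energy in relative units).
designs = [(8, 30), (9, 22), (10, 25), (12, 12), (12, 15), (14, 13)]
print(pareto_front(designs))
```

Every point that survives is both "minimum delay for its energy" and "minimum energy for its delay" within the candidate set.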

Page 3

This Work

Multi-issue 64-bit microprocessor environment:

• Optimize a set of representative 64-bit adders in the energy-delay space

• Analyze the design tradeoffs

• Implement the optimal adder in 1.0V 90nm GP CMOS

Page 4

Outline

• Energy – delay optimization

• Design tradeoffs for 64-bit adders

• Test chip implementation

• Measured results

• Summary

Page 5

Energy-Delay Optimization

• Goal: obtain the energy-delay optimal adder
• CAD tool: optimize custom digital circuits in the energy-delay space [3]

[Figure: energy vs. delay curves for the domino CLA adder and the static CLA adder]

[1]

Page 6

Circuit Optimization Framework

[Figure: block diagram of the optimization framework. Inputs: Models, Netlist, Optimization Goal, Design Variables. Optimization Core: Optimizer (Matlab) and Static timer (C++), exchanging Design Variables and Delay, Energy. Output: Optimal Design]

[1]

Page 7

Adder Optimization Setup

Minimize DELAY subject to maximum ENERGY

[Figure: adder block diagram. Labels: A, B, Cin; Generate subtree; Propagate subtree; G64; Carry; Sum precompute; S0; S1; MUX; SUM. Legend: critical path, non-critical path]

• CL = 27 fF
• CIN ≤ 27 fF
• tSLOPE ≤ 100 ps

[1]
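To make "minimize delay subject to maximum energy" concrete, here is a toy Python sketch. It is not the authors' tool, which pairs a Matlab optimizer with a C++ static timer. It sizes a three-inverter chain by grid search under a logical-effort delay model, with total switched capacitance as the energy proxy. The unit input capacitance, the load of 64 units, and all function names are assumptions chosen for illustration; the 27 fF values from the slide are not used.

```python
def chain_delay(w2, w3, c_in=1.0, c_load=64.0):
    """Logical-effort delay of a 3-inverter chain, in units of tau.

    Each stage contributes (fanout + 1); the +1 is the parasitic term.
    """
    return (w2 / c_in + 1) + (w3 / w2 + 1) + (c_load / w3 + 1)


def chain_energy(w2, w3, c_in=1.0, c_load=64.0):
    """Toy energy proxy: total switched capacitance, in unit-inverter inputs."""
    return c_in + w2 + w3 + c_load


def min_delay_under_energy(e_max, sizes=range(1, 65)):
    """Grid search: smallest delay among sizings whose energy <= e_max."""
    best = None
    for w2 in sizes:
        for w3 in sizes:
            if chain_energy(w2, w3) > e_max:
                continue  # violates the energy constraint
            d = chain_delay(w2, w3)
            if best is None or d < best[0]:
                best = (d, w2, w3)
    return best  # (delay, w2, w3), or None if no sizing fits the budget


# Sweeping the energy budget traces out an energy-delay tradeoff curve.
for e_max in (70, 75, 85, 200):
    print(e_max, min_delay_under_energy(e_max))
```

With a loose budget the search lands on the familiar geometric sizing (fanout of 4 per stage). Tightening the budget forces smaller gates and a longer delay, which is the tradeoff the plots on the following pages show.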

Page 8

CLA: Full Tree Comparison

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 CLA and R4 CLA]

Radix-2:
• 6 stages
• Moderate branching

Radix-4:
• 3 stages
• Larger branching

Radix-4 closer to optimum number of stages

[1]
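The stage counts follow from the tree radix: each level merges `radix` groups, so a 64-bit tree needs the smallest number of levels whose span reaches 64. A short Python check (`tree_stages` is a hypothetical helper name):

```python
def tree_stages(width, radix):
    """Number of radix-`radix` merge levels needed to span `width` bits."""
    stages, span = 0, 1
    while span < width:
        span *= radix
        stages += 1
    return stages


for radix in (2, 4):
    print(f"radix-{radix}: {tree_stages(64, radix)} stages")
# radix-2: 6 stages
# radix-4: 3 stages
```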

Page 9

CLA vs. Ling

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 Ling, R2 CLA, R4 Ling, R4 CLA]

Conventional CLA
$G[3{:}0] = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
$S_i = a_i \oplus b_i \oplus G_{i-1}$
• Higher stack in first stage
• Simple sum precompute

Ling CLA
$H[3{:}0] = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
$S_i = \overline{H_{i-1}}\,(a_i \oplus b_i) + H_{i-1}\,\big(a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})\big)$
• Lower stack in first stage
• Complex sum precompute
• Higher speed

[1]

[2]
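The Ling pseudo-carry is related to the conventional group carry by G[3:0] = t3 · H[3:0], where g_i = a_i b_i, p_i = a_i ⊕ b_i and t_i = a_i + b_i (standard definitions, assumed here). This is why the first-stage gate is simpler while the sum logic has to absorb the extra t term. The Python sketch below checks this exhaustively for one 4-bit group with zero carry-in, and also checks the Ling sum equation bit by bit. Function names are hypothetical.

```python
from itertools import product


def bits(x, n=4):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def group_carries(a, b):
    """Return (G, H, t3) for a 4-bit group with carry-in 0."""
    g = [x & y for x, y in zip(bits(a), bits(b))]   # generate
    p = [x ^ y for x, y in zip(bits(a), bits(b))]   # propagate (XOR)
    t = [x | y for x, y in zip(bits(a), bits(b))]   # transmit (OR)
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    H = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
    return G, H, t[3]


def ling_sum_bits(a, b, n=4):
    """Sum bits 1..n-1 computed with the Ling sum equation (carry-in 0)."""
    A, B = bits(a, n), bits(b, n)
    carry = 0                      # real carry into bit i-1
    out = []
    for i in range(1, n):
        g, t = A[i - 1] & B[i - 1], A[i - 1] | B[i - 1]
        h = g | carry              # Ling pseudo-carry H_{i-1}
        p = A[i] ^ B[i]
        out.append((p ^ t) if h else p)
        carry = t & h              # real carry out of bit i-1
    return out


for a, b in product(range(16), repeat=2):
    G, H, t3 = group_carries(a, b)
    assert G == (a + b) >> 4                        # G[3:0] is the true carry out
    assert G == (t3 & H)                            # real carry recovered from H
    assert ling_sum_bits(a, b) == bits(a + b)[1:]   # Ling sum matches a + b
print("Ling identities hold for all 256 input pairs")
```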

Page 10

Full vs. Sparse Comparison

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 FULL and R4 FULL. Diagram labels: Ling CLA, FULL, SP2]

[1]

Page 11

Full vs. Sparse Comparison

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 FULL, R2 SP2, R4 FULL, R4 SP2. Diagram labels: Ling CLA, FULL, SP2]

      SP2
R2    +
R4    +

[1]

Page 12

Full vs. Sparse Comparison

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 FULL, R2 SP2, R2 SP4, R4 FULL, R4 SP2, R4 SP4. Diagram labels: Ling CLA, FULL, SP4]

Sparseness benefits adders with large carry trees

      SP2   SP4
R2    +     +
R4    +     -

[1]

Page 13

Optimal Adder

[Figure: Energy [pJ] (0 to 50) vs. Delay [FO4] (6 to 14) for R2 FULL, R2 SP2, R2 SP4, R4 FULL, R4 SP2, R4 SP4]

• Ling's equations

• Radix-4 sparse-2

• Domino carry tree

• Static sum-precompute

• Delay of fastest adder: 7.3 FO4

[1]

Page 14

Radix-4 Sparse-2 Carry Tree

• Computes every other Ling pseudo-carry: H0, H2, H4 …
• Each output selects two sums

[Figure: carry tree diagram. Labels: Cin, (A0, B0) to (A63, B63), G/T, H4/I4, H16/I16, H64, SUMSEL, s0 to s63, Cout. Legend: G/T gates, H gate, H/I gates, SUMSEL MUX]

[1]
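The sparse-2 organization can be illustrated at the functional level: only the carries into even bit positions are needed, and each one picks between two precomputed 2-bit sums. The Python sketch below is a behavioral model under simplifying assumptions. It ripples the carry from pair to pair instead of building the radix-4 tree, and it uses real carries rather than Ling pseudo-carries, so it shows the select structure only, not the circuit's timing. `sparse2_add` is a hypothetical name.

```python
def sparse2_add(a, b, cin=0, width=64):
    """Add two integers using a sparse-2 carry-select organization.

    Only the carries into even bit positions are computed; each one selects
    between two precomputed 2-bit sums (carry-in 0 and carry-in 1).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result = 0
    carry = cin
    for lo in range(0, width, 2):
        a2 = (a >> lo) & 0b11
        b2 = (b >> lo) & 0b11
        s0 = a2 + b2          # precomputed sum assuming carry-in 0
        s1 = a2 + b2 + 1      # precomputed sum assuming carry-in 1
        chosen = s1 if carry else s0        # the sum-select mux
        result |= (chosen & 0b11) << lo     # two sum bits per carry
        carry = chosen >> 2                 # carry into the next even position
    return result, carry


print(sparse2_add(5, 7))              # (12, 0)
print(sparse2_add(2**64 - 1, 1))      # (0, 1)
```

Because the two candidate sums depend only on the local operand bits, they can be computed off the critical path while the carries are being resolved.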

Page 15

Adder Core Block Diagram

• Critical paths implemented in clock-delayed domino
• Non-critical paths implemented in static
• At-speed BIST

[Figure: adder core block diagram. Labels: inputs, TG, H4/I4, H16/I16, H64, H64', Sum precompute, S0, S1, Precomputed sums, Sum select MUX, sum, Buffer, MUX, Out FF, Comparator, Out, Scan chain, scan_in, Clock Generator (pc1, pc2, pc3, pc4, psel). Legend: footed domino, footless domino, static CMOS, hard edge]

[1]

Page 16

Timing Diagram

• 20 ps margin on all edges; adjustable hard edges
• Delay spread places precharge in critical path

[Figure: timing diagram over TCYCLE. Signals: pc1, pc2, pc3, pc4, psel, H64, H64', with the hard edge marked. Duty cycle column: 24%, 43%, 53%, 53%, 45%]

[1]

Page 17

Layout Floorplan

• Bitslice height: 24 metal tracks
• Aligned clock lines
• Sum precompute occupies space freed by sparse carry tree

[Figure: layout floorplan, 24 tracks per bitslice. Block labels: TG, H4, I4, H16, I16, H64, J0, J1, K1, XOR2, SUM SELECT. Legend: every bitslice, sparse-2 carry tree, sparse-2 sum precomp. Clock lines: pc1, pc2, pc3, pc4, psel]

[1]

Page 18

90 nm Test Chip

• 90 nm GP 7M 1P
• SVT transistors
• VDD = 1V
• 8 adder cores + test circuitry
• Core 1: this work
• Cores 2-8: supply noise measurements and supply grid experiments [4]
• Adder core size: 417 x 75 µm2

[Figure: die photo, 1.7 mm x 1.6 mm. Labels: ADDER CORE 1, CORE 2 through CORE 8, TEST IN, TEST OUT, CK GEN]

[1]


Page 20

Chip Packaging

Chip-on-board:
• Bond wires 60% shorter
• Cleaner supply: 10 ps shorter delays

[Figure labels: Advance Program, Digest]

[1]

Page 21

Measured Results: Delay

CHIP-ON-BOARD:

• VDD = 1 V

– Average: 240 ps

– Fastest: 226 ps

• VDD = 1.3 V

– Average: 180 ps

Davg = 7.5 FO4

[1]
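The two delay figures on this page imply an FO4 delay for the process. This is a back-of-the-envelope Python check, assuming that Davg = 7.5 FO4 refers to the 240 ps average at VDD = 1 V. The slide does not say whether the FO4 reference was measured or simulated.

```python
avg_delay_ps = 240.0   # measured average at VDD = 1 V (from the slide)
avg_delay_fo4 = 7.5    # the same delay expressed in FO4 (from the slide)

fo4_ps = avg_delay_ps / avg_delay_fo4
print(f"implied FO4 delay: {fo4_ps:.0f} ps")   # implied FO4 delay: 32 ps

fastest_ps = 226.0     # fastest measured part at VDD = 1 V (from the slide)
print(f"fastest part: {fastest_ps / fo4_ps:.1f} FO4")
```

Under that assumption the fastest measured part comes out at about 7.1 FO4, slightly below the 7.3 FO4 quoted on the Optimal Adder page as the delay of the fastest adder.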

Page 22

Measured Results: Power

VDD = 1V: Pmax = 260 mW

VDD = 1.3V: Pmax = 606 mW

[Figure: power breakdown. Labels: Adder core, Clk gen, BIST, Leakage]

[1]
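As a rough sanity check, the two power figures can be compared against the first-order dynamic power model P ∝ C·V²·f. The Python sketch below assumes that each Pmax was measured at the maximum operating frequency for its supply, that this frequency scales as 1/delay using the average delays from the previous page, and that leakage can be ignored. None of these assumptions is stated on the slide.

```python
v1, p1_mw, t1_ps = 1.0, 260.0, 240.0   # slide values at VDD = 1 V
v2, p2_mw, t2_ps = 1.3, 606.0, 180.0   # slide values at VDD = 1.3 V

measured_ratio = p2_mw / p1_mw
# First-order dynamic power model: P ~ C * V^2 * f, with f ~ 1 / delay.
predicted_ratio = (v2 / v1) ** 2 * (t1_ps / t2_ps)

print(f"measured  P(1.3V)/P(1V): {measured_ratio:.2f}")
print(f"predicted by V^2 * f:    {predicted_ratio:.2f}")
```

The two ratios come out close (about 2.33 measured versus about 2.25 predicted), which is consistent with power being dominated by switching rather than leakage under these assumptions.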

Page 23

Conclusion

• 90 nm GP 7M 1P

• SVT transistors

• VDD = 1V

• 8 adder cores + test circuitry

• Adder core size: 417 x 75 µm2

Page 24

Summary

• Ling radix-4 sparse-2 domino carry tree

• 90nm GP CMOS: 240ps, 260mW @1V

[Figure: 64-bit adders on IEEE Xplore 1995-2005. x-axis: Normalized Delay [90nm 1V FO4], 0 to 60; y-axis: Normalized Energy [r.u.], 0 to 30; series: 500 nm, 350 nm, 250 nm, 180 nm, 130 nm, 90 nm, This work]

[1]

Page 25

References

• [1] S. Kao, R. Zlatanovici, B. Nikolić, "A 240ps 64-bit Carry-Lookahead Adder in 90nm CMOS," ISSCC 2006, Feb. 2006.

• [2] H. Ling, "High Speed Binary Adder," IBM J. R&D, vol. 25, no. 3, pp. 156-166, May 1981.

• [3] R. Zlatanovici, B. Nikolić, "Power-Performance Optimization for Custom Digital Circuits," Proc. PATMOS, pp. 404-414, Sept. 2005.

• [4] V. Abramzon, E. Alon, M. Horowitz, Stanford University
