asynchronous datapath design


Upload: lowri

Post on 31-Jan-2016


TRANSCRIPT

Page 1: Asynchronous Datapath Design

Asynchronous Datapath Design
• Adders
• Comparators
• Multipliers
• Registers
• Completion Detection
• Bus
• Pipeline
• …..

Page 2: Asynchronous Datapath Design

Asynchronous Adder Design

• Motivation
• Background: Sync and Async adders
• Delay-insensitive carry-lookahead adders
• Complexity Analysis
• Conclusions

Page 3: Asynchronous Datapath Design

Motivation

• Integer addition is one of the most important operations in digital computer systems

• Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath.

• In ARM processors it even reaches 80%.

• The performance of processors is significantly influenced by the speed of their adders.

Page 4: Asynchronous Datapath Design

Background

• Adders: synchronous or asynchronous
synchronous adders: worst-case performance
asynchronous adders: average-case performance

• For example:

Ripple-Carry Adders (synchronous): O(n), worst case
Carry-Completion Sensing Adders (asynchronous): O(log n), average case

Page 5: Asynchronous Datapath Design

Background: Binary Addition

• Worst case

      00000001
    + 11111111
    ----------
    S 00000000
    C 11111111
    ----------
     100000000

• Best case

      00000000
    + 00000000
    ----------
    S 00000000
    C 00000000
    ----------
     000000000

• Asynchronous adders can exploit average-case behavior
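The gap between the worst and best cases can be measured as the length of the longest carry chain. The sketch below is an illustration only: the function names and the definition of a chain (it starts at a stage where both operand bits are 1 and extends through stages whose operand bits differ) are assumptions made here, not taken from the slides.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of consecutive stages that one carry travels through."""
    longest = 0
    run = 0  # length of the chain currently being propagated (0 = no carry)
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:          # generate: a new carry starts at this stage
            run = 1
        elif (x ^ y) and run:  # propagate: an incoming carry moves one stage on
            run += 1
        else:              # kill, or propagate with no incoming carry
            run = 0
        longest = max(longest, run)
    return longest


def average_longest_chain(n: int, trials: int, seed: int = 0) -> float:
    """Average longest chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


if __name__ == "__main__":
    print(longest_carry_chain(0b00000001, 0b11111111, 8))  # worst case above
    print(longest_carry_chain(0b00000000, 0b00000000, 8))  # best case above
    print(average_longest_chain(32, 10000))
```

For random operands the average printed by the last line is far below n, which is the behavior an asynchronous adder can exploit.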

Page 6: Asynchronous Datapath Design

Background

• Ripple-Carry Adders:

• One-stage full adder:
• Logic complexity: O(n)
• Time complexity: O(n)
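A bit-level model of a ripple-carry adder built from one-stage full adders might look as follows. This is a minimal sketch; the names `full_adder` and `ripple_carry_add` are chosen here for illustration.

```python
def full_adder(a: int, b: int, cin: int):
    """One-stage full adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, n: int):
    """Chain n full adders; returns (sum bits, carry-out vector, final carry)."""
    carry = 0
    sum_bits = 0
    carries = 0
    for i in range(n):  # the carry must pass through every stage in turn
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        sum_bits |= s << i
        carries |= carry << i
    return sum_bits, carries, carry


if __name__ == "__main__":
    s, c, cout = ripple_carry_add(0b00000001, 0b11111111, 8)
    print(f"S {s:08b}  C {c:08b}  carry out {cout}")
```

The loop runs n times regardless of the operands, which corresponds to the O(n) time a synchronous design has to budget for.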

Page 7: Asynchronous Datapath Design

Background

• Carry-Sensing Completion Detection Adders: (asynchronous version of RCA)

Page 8: Asynchronous Datapath Design

Background

• One-stage CSCD Adder:

• Carry-Sensing Completion Detection Adders:

Logic complexity: O(n)
Time complexity: O(log n)
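The data-dependent delay of a completion-sensing adder can be illustrated with a unit-delay simulation. The model below is an assumption made for illustration, not the circuit from the slides: each stage takes one step to produce its carry-out, a stage whose operand bits are equal knows its carry-out immediately, and any other stage waits for its carry-in.

```python
def completion_sensing_add(a: int, b: int, n: int, cin: int = 0):
    """Returns (steps until every carry is known, result including carry out)."""
    carry_in = [None] * (n + 1)  # carry_in[i] is the carry into stage i
    carry_in[0] = cin
    steps = 0
    while any(c is None for c in carry_in):
        nxt = list(carry_in)
        for i in range(n):
            x = (a >> i) & 1
            y = (b >> i) & 1
            if x == y:
                nxt[i + 1] = x            # kill (0) or generate (1): no waiting
            elif carry_in[i] is not None:
                nxt[i + 1] = carry_in[i]  # propagate a carry that has arrived
        carry_in = nxt
        steps += 1
    result = carry_in[n] << n
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry_in[i]) << i
    return steps, result


if __name__ == "__main__":
    print(completion_sensing_add(0b00000001, 0b11111111, 8))
    print(completion_sensing_add(0b00000000, 0b00000000, 8))
```

In this model the step count follows the longest carry chain of the actual operands rather than the word length.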

Page 9: Asynchronous Datapath Design

Background

• Delay-Insensitive Ripple-Carry Adders: (DI version of RCA):

Page 10: Asynchronous Datapath Design

Background

• One-stage DIRCA:

• DIRCA Adders:

Logic complexity: O(n)
Time complexity: O(log n)
• One of the most robust adders

Page 11: Asynchronous Datapath Design

Background

• Completion detection for asynchronous adders:

Page 12: Asynchronous Datapath Design

Background

• DI adder VS Bundling Constraint adder:

Page 13: Asynchronous Datapath Design

Carry-Lookahead Adders

• RCA requires n stage-propagation delays.
• For high-speed processors, this scheme is undesirable.
• One way to improve adder performance is to use parallel processing in computing the carries.
• That is why Carry-Lookahead Adders (CLA) are introduced.

• CLAs:

Logic complexity: O(n)
Time complexity: O(log n)

Page 14: Asynchronous Datapath Design

Carry-Lookahead Adders

Page 15: Asynchronous Datapath Design

Carry-Lookahead Adders

• A module:

• B module:

Page 16: Asynchronous Datapath Design

DI Carry-Lookahead Adders

• Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented by using delay-insensitive code.

1. Dual-rail signaling: inputs, sums, and carry bits
   a. A1=0, A0=0: no data
   b. A1=0, A0=1: valid 0
   c. A1=1, A0=0: valid 1
   d. A1=1, A0=1: illegal

2. One-hot code: internal signals
   a. 000: no data
   b. 001
   c. 010
   d. 100
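The dual-rail code above can be modelled directly. The helper names below (`encode`, `decode`, `word_complete`) are hypothetical; the sketch only shows how a receiver could tell "no data" from a valid bit and detect that a whole word has arrived.

```python
from typing import List, Optional, Tuple

Rails = Tuple[int, int]  # (A1, A0)

NO_DATA: Rails = (0, 0)


def encode(bit: int) -> Rails:
    """Valid 0 is (A1, A0) = (0, 1); valid 1 is (1, 0)."""
    return (1, 0) if bit else (0, 1)


def decode(rails: Rails) -> Optional[int]:
    """Returns the bit, or None while no data is present."""
    if rails == (0, 0):
        return None
    if rails == (0, 1):
        return 0
    if rails == (1, 0):
        return 1
    raise ValueError("illegal dual-rail code (1, 1)")


def word_complete(word: List[Rails]) -> bool:
    """Completion detection: every bit has exactly one rail asserted."""
    return all(a1 ^ a0 for a1, a0 in word)


if __name__ == "__main__":
    word = [encode(1), NO_DATA, encode(0)]
    print(word_complete(word))   # one bit has not arrived yet
    word[1] = encode(1)
    print(word_complete(word), [decode(r) for r in word])
```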

Page 17: Asynchronous Datapath Design

QDI Carry-Lookahead Adders

• DI C module:
1. Internal signals: one-hot code, k, g, p
2. Input and sum bits: dual-rail signals

CLA A module

Page 18: Asynchronous Datapath Design

QDI Carry-Lookahead Adders

• DI D module:
1. Internal signals: one-hot code, K, G, P
2. Carry bits: dual-rail signals

CLA B module

Page 19: Asynchronous Datapath Design

DI Carry-Lookahead Adders

Page 20: Asynchronous Datapath Design

DI Carry-Lookahead Adders

If A3 = B3, then C3 is a carry-kill or a carry-generate

k3,g3

Page 21: Asynchronous Datapath Design

DI Carry-Lookahead Adders

G3,2 and K3,2 can be used to speed up the carry computation too.

k3, g3

K3,2, G3,2
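The per-bit and group signals can be sketched in a few lines. The code assumes the usual carry-lookahead meaning of the signals (kill when both operand bits are 0, generate when both are 1, propagate otherwise) and reads K3,2 / G3,2 as the status of the group made of bits 3 and 2; the function names are chosen here for illustration.

```python
def bit_status(a_bit: int, b_bit: int) -> str:
    """One-hot status of a single stage: 'k', 'g' or 'p'."""
    if a_bit == b_bit:
        return "g" if a_bit else "k"
    return "p"


def combine(hi: str, lo: str) -> str:
    """Status of a group: the upper part decides unless it only propagates."""
    return lo if hi == "p" else hi


def group_status(a: int, b: int, hi: int, lo: int) -> str:
    """Status of bits lo..hi, folded from the least significant bit upward."""
    status = bit_status((a >> lo) & 1, (b >> lo) & 1)
    for i in range(lo + 1, hi + 1):
        status = combine(bit_status((a >> i) & 1, (b >> i) & 1), status)
    return status


def group_carry_out(a: int, b: int, hi: int, lo: int, carry_in: int) -> int:
    """Reference value: the carry out of bits lo..hi by ordinary addition."""
    width = hi - lo + 1
    mask = (1 << width) - 1
    return (((a >> lo) & mask) + ((b >> lo) & mask) + carry_in) >> width


if __name__ == "__main__":
    a, b = 0b1100, 0b0100
    print(bit_status((a >> 3) & 1, (b >> 3) & 1), group_status(a, b, 3, 2))
```

A group status of 'k' or 'g' fixes the carry out of the group without waiting for the carry into it, which is why the group signals can shorten the carry computation.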

Page 22: Asynchronous Datapath Design

Speeding Up DICLA

• Idea: Send the carry-generates and carry-kills to every stage that needs this information, so that carries can be computed immediately.
• D module with speed-up circuitry

Page 23: Asynchronous Datapath Design

Speeding Up DICLA

• General form:
• D module with speed-up circuitry

for carry-kill

for carry-generate: $g_{j-1} + g_{j-2}\,p_{j-1} + \dots + g_0\,p_1 p_2 \cdots p_{j-1}$

This is in fact the full carry-lookahead scheme.
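The carry-generate expression can be checked numerically. The sketch assumes it gives the carry into stage j when there is no external carry-in, with g_i = a_i AND b_i and p_i = a_i XOR b_i; the function name is hypothetical.

```python
def lookahead_carry(a: int, b: int, j: int) -> int:
    """Carry into stage j as g_{j-1} + g_{j-2} p_{j-1} + ... + g_0 p_1 ... p_{j-1}."""
    g = [(a >> i) & (b >> i) & 1 for i in range(j)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(j)]
    carry = 0
    for i in range(j):
        term = g[i]                # a carry generated at stage i ...
        for m in range(i + 1, j):
            term &= p[m]           # ... must be propagated by stages i+1 .. j-1
        carry |= term
    return carry


if __name__ == "__main__":
    a, b = 0b00000001, 0b11111111
    print([lookahead_carry(a, b, j) for j in range(1, 9)])
```

Every carry is computed directly from the operand bits, at the price of product terms that grow with j.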

Page 24: Asynchronous Datapath Design

Speeding Up DICLA

• Problems of the full carry-lookahead scheme:
• practical limitations on fan-in and fan-out, an irregular structure, and many long wires
• logic complexity increases more than linearly
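One rough way to see the more-than-linear growth is to count the literals in the two-level expressions for all carries. Literal count is only one possible cost measure, chosen here as an assumption; it ignores gate sharing and the limits on fan-in that a real design would impose.

```python
def literal_count(n: int) -> int:
    """Literals in the sum-of-products forms of the carries c_1 .. c_n."""
    total = 0
    for j in range(1, n + 1):   # carry into stage j
        for i in range(j):      # term g_i p_{i+1} ... p_{j-1}
            total += j - i      # one g literal plus (j - 1 - i) p literals
    return total


def closed_form(n: int) -> int:
    """The same count in closed form: n (n + 1) (n + 2) / 6."""
    return n * (n + 1) * (n + 2) // 6


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, literal_count(n), closed_form(n))
```

Under this measure the cost grows cubically with the word length, and the widest product term needs n inputs.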

• Solution: use the properties of a tree-like structure
• New speed-up circuitry:

Page 25: Asynchronous Datapath Design

• SP focuses on the root node of a subtree.
• All leftmost root node of its right subtree

Page 26: Asynchronous Datapath Design

Power of Speed-up Circuitry

x: carry chain
x’ in the right subtree
x − x’ in the left subtree

Page 27: Asynchronous Datapath Design

Power of Speed-up Circuitry

Without Speed-up circuitry

Page 28: Asynchronous Datapath Design

Power of Speed-up Circuitry

With Speed-up circuitry

Page 29: Asynchronous Datapath Design

Optimization:

• Simplified D module
• Simplified D’ module
• Better logic complexity
• Delay-Insensitive again

Page 30: Asynchronous Datapath Design
Page 31: Asynchronous Datapath Design

Complexity Analysis

• DICLASP

• Logic Complexity: Θ(n)
• Time Complexity: Θ(log log n)
• Best area-time efficiency: Θ(n log log n)

Page 32: Asynchronous Datapath Design

Complexity Analysis

Page 33: Asynchronous Datapath Design

CMOS: C module

Page 34: Asynchronous Datapath Design

CMOS: SD module

Page 35: Asynchronous Datapath Design

CMOS: SD’ module

Page 36: Asynchronous Datapath Design

SPICE Simulation:

The SPICE simulation contains two parts:
• Random number inputs: 10,000 randomly generated input pairs
• Statistical data: running examples on a 32-bit ARM emulator

Page 37: Asynchronous Datapath Design

SPICE Simulation:

• Random number input distribution

Page 38: Asynchronous Datapath Design

SPICE Simulation:

• SPICE simulation results: random number inputs

• Speedup:
DIRCA vs RCA: 6.39
DICLASP vs CLA: 2.64

Page 39: Asynchronous Datapath Design

SPICE Simulation:

• Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator

Page 40: Asynchronous Datapath Design

SPICE Simulation: dynamic traces

Page 41: Asynchronous Datapath Design

SPICE Simulation:

• dynamic traces
• 83.92% of instructions: |carry chain| < 17

Page 42: Asynchronous Datapath Design

SPICE Simulation:

• SPICE simulation results: dynamic traces
• Average computation time:
DIRCA: 9.61 ns
DICLASP: 5.25 ns
• Speedup:
DIRCA vs RCA: 4.1
DICLASP vs CLA: 2.2

Page 43: Asynchronous Datapath Design

Conclusion

• DICLASP
Best area-time efficiency: Θ(n log log n)
Correctness: No adder is more robust than DICLASP.
Cost (Logic Complexity): No parallel adder is cheaper than DICLASP (Θ(n)).
Speed (Time Complexity): No adder is better than DICLASP (Θ(log log n)).
Suitable for VLSI implementation.