Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]

Tiberiu Chelcea, Steven M. Nowick
Department of Computer Science, Columbia University
{tibi,nowick}@cs.columbia.edu

Post on 20-Dec-2015



Introduction

Key trend in VLSI systems: systems-on-a-chip (SoC)

Two fundamental challenges:
• mixed-timing domains
• long interconnect delays

Our goal: design of efficient interface circuits

Desirable features:
• arbitrarily robust
• low latency, high throughput
• modularity, scalability

Few satisfactory solutions to date.

Timing Issues in SoC Design

[Figure: (a) single-clock system with a long interconnect; (b) mixed-timing domains: Domain #1 (sync or async) and Domain #2 (sync or async), also connected by a long interconnect]

Timing Issues in SoC Design (cont.)

Solution: provide interface circuits

[Figure: (a) single-clock system with a long interconnect: “relay stations” (Carloni et al.); (b) mixed-timing domains, Domain #1 and Domain #2 (each sync or async): NEW “mixed-timing FIFOs”; mixed-timing domains with a long interconnect: NEW “mixed-timing relay stations”]

Contributions

Complete set of mixed-timing interface circuits: sync-sync, async-sync, sync-async, async-async

Features:
• Arbitrary robustness: with respect to synchronization failures
• High throughput: in steady-state operation, no synchronization overhead
• Low latency (“fast restart”): in an empty FIFO, only synchronization overhead
• Reusability: each interface partitioned into reusable sub-components

Two contributions:
• Mixed-timing FIFOs
• Mixed-timing relay stations

Contribution #1: Mixed-Timing FIFOs

Addresses issue of interfacing mixed-timing domains

Features: token ring architecture
• circular array of identical cells
• shared buses: data + control
• data: “immobile” once enqueued
• distributed control: allows concurrent put/get operations

2 circulating tokens: define tail & head of queue

Potential benefits:
• low latency
• low power
• scalability
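The token-ring organization above can be illustrated with a small behavioral sketch. It is not the circuit from the talk: the class name `TokenRingFifo` and its methods are hypothetical, and clocks, shared buses, and synchronizers are not modeled. It only shows that data stays in the cell where it was enqueued while two tokens (tail and head) move around the ring.

```python
class TokenRingFifo:
    """Behavioral model: a circular array of cells with two circulating tokens."""

    def __init__(self, size):
        self.data = [None] * size    # one register per cell; data never moves between cells
        self.full = [False] * size   # per-cell status bit (full); "empty" is its inverse
        self.put_token = 0           # index of the cell holding the put token (tail)
        self.get_token = 0           # index of the cell holding the get token (head)

    def put(self, item):
        """Enqueue into the tail cell; return False if that cell is still full."""
        i = self.put_token
        if self.full[i]:
            return False
        self.data[i] = item
        self.full[i] = True
        self.put_token = (i + 1) % len(self.data)  # pass the put token to the next cell
        return True

    def get(self):
        """Dequeue from the head cell; return None if that cell is empty."""
        i = self.get_token
        if not self.full[i]:
            return None
        self.full[i] = False
        self.get_token = (i + 1) % len(self.data)  # pass the get token to the next cell
        return self.data[i]
```

Because only the tokens advance, a put and a get touch different cells whenever the FIFO is neither full nor empty, which is what allows the two interfaces to operate concurrently.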

Contribution #2: Mixed-Timing Relay Stations

Addresses issue of long interconnect delays

“Latency-insensitive protocols”: safely tolerate long interconnect delays between systems

Prior contribution: introduce “relay stations” for single-clock domains (Carloni et al., ICCAD-99)

Our contribution: introduce “mixed-timing relay stations”
• mixed-clock (sync-sync)
• async-sync

First proposed solutions to date.

Related Work

Single-clock domains: handling clock discrepancies
• clock skew and jitter (Kol98, Greenstreet95)
• long interconnect delays (Carloni99)

Mixed-timing domains: 3 common approaches
• Use “wrapper logic”: add logic layer to synchronize data/control (Seitz80, Seizovic94)
  - drawback: long latencies in communication
• Modify receiver’s clock: stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
  - drawback: penalties in restarting clock

Related Work: Closer Approaches

Mixed-timing domains (cont.):
• Interface circuits: mixed-clock FIFOs (Intel, Jex et al. 1997)
  - drawback: significant area overhead = synchronizer for each cell

Our approach: mixed-clock FIFOs with only 2 synchronizers for the entire FIFO

Outline

• Mixed-Clock Interfaces: FIFO, Relay Station
• Async-Sync Interfaces: FIFO, Relay Station
• Results
• Conclusions

Mixed-Clock FIFO: Block Level

[Figure: Mixed-Clock FIFO block with a synchronous put interface and a synchronous get interface]

Synchronous put interface:
• req_put: initiates put operations
• data_put: bus for data items
• full: indicates when FIFO full
• CLK_put: controls put operations

Synchronous get interface:
• req_get: initiates get operations
• data_get: bus for data items
• valid_get: indicates data items validity (always 1 in this design)
• empty: indicates when FIFO empty
• CLK_get: controls get operations

Mixed-Clock FIFO: Steady-State Simulation

[Figure: animation frames over the FIFO architecture: a ring of cells with Put Controller, Full Detector, Get Controller, and Empty Detector; put-side signals full, req_put, data_put, CLK_put; get-side signals empty, req_get, valid_get, data_get, CLK_get; TAIL and HEAD mark the token positions. The frames show a put operation followed by a get operation.]

Steady state: FIFO neither full nor empty

Put operation:
• Sender starts a put operation
• Put Controller enables a put operation
• At the end of the clock cycle, the cell enqueues data
• TAIL passes the put token

Steady-state operation: puts and gets “reasonably spaced”
• Zero probability of synchronization failure
• Zero synchronization overhead

Mixed-Clock FIFO: Full Scenario

[Figure: animation frames over the same architecture (Put Controller, Full Detector, Get Controller, Empty Detector; signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty), with TAIL and HEAD marking the token positions]

• FIFO FULL: put interface stalled
• FIFO NOT FULL

Mixed-Clock FIFO: Cell Implementation

[Figure: FIFO cell: data register (REG) with enable inputs (En), an SR latch, and put/get token lines]

Synchronous Put Part (reusable):
• signals: CLK_put, en_put, req_put, data_put, ptok_in, ptok_out
• en_put: enables a put operation
• data_put: data item in
• req_put: validity bit in

Synchronous Get Part (reusable):
• signals: CLK_get, en_get, valid, data_get, gtok_in, gtok_out
• en_get: enables a get operation
• data_get: data item out
• valid: validity bit out

Data Validity Controller:
• status bits: f_i (cell FULL), e_i (cell EMPTY)

Mixed-Clock FIFO: Architecture

[Figure: ring of cells with Put Controller, Full Detector, Get Controller, and Empty Detector; put-side signals full, req_put, data_put, CLK_put; get-side signals empty, req_get, valid_get, data_get, CLK_get]

Synchronization Issues

Challenge: interfaces are highly concurrent
• Global “FIFO state”: controlled by 2 different clocks

Problem #1: Metastability
• Each FIFO interface needs clean state signals

Solution: synchronize “full” & “empty” signals
• “full” with CLK_put
• “empty” with CLK_get
• Add 2 (or more) synchronizing latches to each signal

Observable “full”/“empty” safely approximate true FIFO state
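As a rough illustration of why the observable signals only approximate the true state, the sketch below models a chain of synchronizing latches as a shift register. The class name `Synchronizer` is hypothetical, and metastability resolution itself is not modeled: the sketch only shows that the synchronized output trails the raw detector output.

```python
class Synchronizer:
    """Chain of `stages` latches, all clocked by the receiving interface's clock."""

    def __init__(self, stages=2, initial=False):
        self.latches = [initial] * stages

    def clock(self, raw):
        """One clock edge: shift the raw signal in and return the last latch's value."""
        self.latches = [raw] + self.latches[:-1]
        return self.latches[-1]
```

With two stages, a change in the raw signal becomes visible at the output one clock edge after the edge that first sampled it, so the receiving interface acts on slightly stale information.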

Synchronization Issues (cont.)

Problem #2: FIFO now may underflow/overflow!
• synchronizing latches add extra latency

Solution: modify definitions of “full” and “empty”
• New FULL: 0 or 1 empty cells left
• New EMPTY: 0 or 1 full cells left

[Figure: New Full Detector: the cell status bits e_0..e_3 are combined in adjacent pairs to detect two consecutive empty cells (FIFO not full); the result passes through synchronizing latches clocked by CLK_put to produce full. full = NO two consecutive empty cells.]
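The modified “full” condition can be written as a short combinational sketch. The function name `new_full` is hypothetical, the list `e` stands in for the cell status bits e_0..e_3, and the synchronizing latches that follow the detector are not modeled. The wrap-around pair (last cell with first cell) is assumed from the ring structure.

```python
def new_full(e):
    """e[i] is True when cell i is empty.

    The FIFO is reported full unless some pair of adjacent cells
    (including the wrap-around pair) is empty.
    """
    n = len(e)
    two_adjacent_empty = any(e[i] and e[(i + 1) % n] for i in range(n))
    return not two_adjacent_empty
```

Since the empty cells of a ring FIFO are contiguous, “no two consecutive empty cells” coincides with “0 or 1 empty cells left”, so the put interface stalls one cell early and the latency of the synchronizing latches cannot cause an overflow.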

Synchronization Issues (cont.)

Problem #3: Potential for deadlock

Scenario: suppose only 1 data item in quiescent FIFO
• FIFO still considered “empty” (new definition)
• Get interface: cannot dequeue data item!

Solution: bi-modal “empty detector”, combines:
• “New empty” detector (0 or 1 data items)
• “True empty” detector (0 data items)

Two results folded into single global “empty” signal

Synchronization Issues: Avoiding Deadlock

[Figure: bi-modal empty detector: the cell status bits f_0..f_3 feed two detectors, each followed by synchronizing latches clocked by CLK_get; their outputs ne and oe are combined into the global empty signal; req_get and en_get also appear in the figure]

• ne: detects “new empty” (0 or 1 full cells)
• oe: detects “true empty” (0 full cells)
• ne and oe are combined into global “empty”

Bi-modal empty detection: select either ne or oe
• Reconfigure whenever the get interface is active
• When reconfigured, use “ne”: FIFO active, avoids underflow
• When NOT reconfigured, use “oe”: FIFO quiescent, avoids deadlock
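The bi-modal selection can be sketched as follows. This is a simplification under stated assumptions: the function names are hypothetical, the “new empty” test is written by symmetry with the new full detector (no two adjacent full cells), the mode is passed in as a plain boolean `get_active`, and the synchronizing latches are omitted.

```python
def new_empty(f):
    """f[i] is True when cell i is full. True when no two adjacent cells are full."""
    n = len(f)
    return not any(f[i] and f[(i + 1) % n] for i in range(n))


def true_empty(f):
    """True only when no cell holds data."""
    return not any(f)


def global_empty(f, get_active):
    """Use ne while the get interface is active, oe while it is quiescent."""
    return new_empty(f) if get_active else true_empty(f)
```

With a single data item left, an active get interface sees “empty” (guarding against underflow), while a quiescent one sees “not empty”, so the last item can still be dequeued.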


Put/Get Controllers

Put Controller:
• enables put operation
• disabled when FIFO full

Get Controller:
• enables get operation
• indicates when data valid
• disabled when FIFO empty

[Figure: Put Controller with signals full, req_put, en_put; Get Controller with signals empty, valid, req_get, en_get, valid_get]
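One way to read the two controllers is as simple gating functions. The equations below are an assumption consistent with the bullets above (enable on request, disabled when full or empty, validity reported only for an enabled get); they are not taken from the slide's gate-level figure, and the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Return en_put: a put is enabled when requested and the FIFO is not full."""
    return req_put and not full


def get_controller(req_get, empty, valid):
    """Return (en_get, valid_get) for the get interface."""
    en_get = req_get and not empty   # disabled when the FIFO is empty
    valid_get = en_get and valid     # validity is only meaningful for an enabled get
    return en_get, valid_get
```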


Relay Stations: Overview

Proposed by Carloni et al. (ICCAD’99)

[Figure: System 1 connected to System 2 through a chain of relay stations (RS), all driven by a single clock CLK]

• Delay = 1 cycle: system 1 sends “data items” to system 2
• Delay > 1 cycle: system 1 now sends “data packets” to system 2
• Data packet = data item + validity bit
• “stop” control = stopIn + stopOut
  - apply counter-pressure
  - result: stall communication
• Steady state: pass data on every cycle (either valid or invalid)

Problem: works only for single-clock systems!

Relay Stations: Implementation

• In normal operation: packetIn copied to MR and forwarded on packetOut
• When stopped (stopIn=1):
  - stopOut raised on the next clock edge
  - extra packet copied to AR

[Figure: relay station: registers MR and AR feeding a switch mux, plus a Control block; ports packetIn, packetOut, stopIn, stopOut]
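The normal and stopped behavior described above can be approximated with a cycle-level sketch. Several things here are assumptions rather than details from the slide: MR and AR are treated as a main and an auxiliary register, invalid packets are modeled as `None` and are not buffered, and the names `RelayStation`, `clock`, and `simulate` are hypothetical.

```python
class RelayStation:
    def __init__(self):
        self.mr = None         # MR: drives packetOut
        self.ar = None         # AR: holds the one extra packet while stopped
        self.stop_out = False  # back-pressure toward the upstream sender

    def clock(self, packet_in, stop_in):
        """One clock edge; returns (packetOut, stopOut) as seen after the edge."""
        # packetIn is only taken if stopOut was low during the cycle that just ended.
        incoming = None if self.stop_out else packet_in
        if not stop_in:
            self.mr = None     # downstream accepted whatever MR was driving
        if self.mr is None:
            if self.ar is not None:
                self.mr, self.ar = self.ar, None  # drain AR first to keep packet order
            else:
                self.mr = incoming
        elif incoming is not None:
            self.ar = incoming  # MR is blocked: park the extra packet in AR
        self.stop_out = self.ar is not None
        return self.mr, self.stop_out


def simulate(packets, stop_pattern):
    """Feed packets through one relay station; return what the receiver accepts."""
    rs, pending, received = RelayStation(), list(packets), []
    for stop_in in stop_pattern:
        if not stop_in and rs.mr is not None:
            received.append(rs.mr)        # receiver takes the packet on packetOut
        offered = pending[0] if pending else None
        accepted = not rs.stop_out        # sender advances only if stopOut was low
        rs.clock(offered, stop_in)
        if accepted and pending:
            pending.pop(0)
    return received
```

The auxiliary register is what makes the one-cycle delay of stopOut safe: the packet already in flight when the stop arrives has somewhere to go.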

Relay Station vs. Mixed-Clock FIFO

Relay Station:
• Steady state: always pass data
• Data items: both valid & invalid
• Stopping mechanism: stopIn & stopOut
• Signals: validIn, dataIn, stopOut; validOut, dataOut, stopIn; CLK

Mixed-Clock FIFO:
• Steady state: only pass data when requested
• Data items: only valid data
• Stopping mechanism: none (only full/empty)
• Signals: full, req_put, data_put, CLK_put; empty, req_get, valid_get, data_get, CLK_get

Mixed-Clock Relay Stations (MCRS)

[Figure: System 1 (CLK1) connected to System 2 (CLK2) through relay stations (RS), with a Mixed-Clock Relay Station (MCRS) between the CLK1 and CLK2 domains]

Mixed-Clock Relay Station: derived from the Mixed-Clock FIFO
• NEW: change ONLY Put and Get Controllers
• Put side (CLK1): packetIn = valid_put + data_put; stopOut
• Get side (CLK2): packetOut = valid_get + data_get; stopIn

Mixed-Clock Relay Station: Implementation

Mixed-Clock Relay Station vs. Mixed-Clock FIFO

Identical:
- FIFO cells
- Full/Empty detectors (...or can simplify)

Only modify: Put & Get Controllers
• Always enqueue data (unless full)

[Figure: Put Controller with signals validIn, full, en_put, and a validity output to the cells; Get Controller with signals stopIn, empty, valid, en_get, validOut]
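A possible reading of the modified controllers is sketched below. Only “always enqueue data (unless full)” comes from the slide; driving stopOut from full, passing validIn through to the cells, and gating validOut with en_get are assumptions made for illustration, and the function names are hypothetical.

```python
def rs_put_controller(valid_in, full):
    """Return (en_put, stop_out, valid_to_cells) for the relay-station put side."""
    en_put = not full          # always enqueue, valid or invalid, unless full
    stop_out = full            # assumed: back-pressure upstream while full
    valid_to_cells = valid_in  # the validity bit is stored with the data item
    return en_put, stop_out, valid_to_cells


def rs_get_controller(stop_in, empty, valid):
    """Return (en_get, valid_out) for the relay-station get side."""
    en_get = (not stop_in) and (not empty)  # dequeue unless stopped or empty
    valid_out = en_get and valid            # otherwise an invalid packet is sent
    return en_get, valid_out
```

Compared with the FIFO controllers, the request inputs disappear: the put side no longer waits for req_put and the get side no longer waits for req_get, which matches a relay station passing a packet on every cycle.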


Async-Sync FIFO: Block Level

Asynchronous put interface: uses handshaking communication
• put_req: request operation
• put_ack: acknowledge completion
• no “full” signal

Synchronous get interface: no change

[Figure: the Mixed-Clock FIFO block (full, req_put, data_put, CLK_put; req_get, valid_get, empty, data_get, CLK_get) shown next to the Async-Sync FIFO block, which sits between the Async Domain and the Sync Domain: put_req, put_ack, put_data on the asynchronous side; req_get, valid_get, empty, data_get, CLK_get on the synchronous side]

Async-Sync FIFO: Architecture

[Figure: ring of cells with Get Controller and Empty Detector; asynchronous put side: put_req, put_ack, put_data; synchronous get side: CLK_get, data_get, req_get, valid_get, empty]

Get interface: exactly as in Mixed-Clock FIFO

Asynchronous put interface:
• No Full Detector or Put Controller
• When FIFO full, acknowledgement withheld until safe to perform the put operation
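The idea of withholding the acknowledgement instead of signalling “full” can be sketched behaviorally. This is an event-level illustration only: the class `AsyncPutFifo` and its members are hypothetical, the get side is reduced to a plain method call rather than a clocked interface, and the handshake is counted rather than modeled as signal transitions.

```python
class AsyncPutFifo:
    def __init__(self, size):
        self.cells = [None] * size
        self.tail = 0        # cell that will receive the next put
        self.head = 0        # cell that will serve the next get
        self.pending = None  # request whose put_ack is currently withheld
        self.acks = 0        # number of put_ack events issued so far

    def put_req(self, item):
        """Sender raises put_req; the ack may be issued now or later."""
        assert self.pending is None, "one outstanding request per handshake"
        self.pending = item
        self._try_complete()

    def _try_complete(self):
        # The put completes (and is acknowledged) only when the tail cell is free.
        if self.pending is not None and self.cells[self.tail] is None:
            self.cells[self.tail] = self.pending
            self.tail = (self.tail + 1) % len(self.cells)
            self.pending = None
            self.acks += 1

    def get(self):
        """Dequeue from the head cell; freeing it may release a withheld ack."""
        item = self.cells[self.head]
        if item is not None:
            self.cells[self.head] = None
            self.head = (self.head + 1) % len(self.cells)
            self._try_complete()
        return item
```

The sender needs no knowledge of the FIFO state: a full FIFO simply shows up as a slow handshake.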

Async-Sync FIFO: Cell Implementation

[Figure: FIFO cell: data register (REG) with enable (En); blocks labeled C+, OPT, and DV; write-enable signals we and we1]

Asynchronous Put Part: reusable, from async FIFO (Async00)
• signals: put_req, put_data, put_ack, we, we1

Synchronous Get Part: reusable (from mixed-clock FIFO)
• signals: CLK_get, en_get, get_data, gtok_in, gtok_out

Data Validity Controller (DV): new
• status bits: f_i, e_i

Async-Sync Relay Stations (ASRS)

[Figure: System 1 (async) connected to System 2 (sync, CLK2) through ARS stages (micropipeline, optional), an Async-Sync Relay Station (ASRS), and a relay station (RS)]


Results

Each circuit implemented using both academic and industry tools
• MINIMALIST: Burst-Mode controllers [Nowick et al. ‘99]
• PETRIFY: Petri-Net controllers [Cortadella et al. ‘97]

Pre-layout simulations: 0.6 μm HP CMOS technology

Experiments:
• various FIFO capacities (4/8/16 cells)
• various data widths (8/16 bits)

Results: Latency

Latency = time from enqueuing a data item into an empty FIFO until it is dequeued

For each design, latency not uniquely defined: Min/Max

Experimental setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)

Design            4-place        8-place        16-place
                  Min    Max     Min    Max     Min    Max
Mixed-Clock       5.43   6.34    5.79   6.64    6.14   7.17
Async-Sync        5.53   6.45    6.13   7.17    6.47   7.51
Mixed-Clock RS    5.48   6.41    6.05   7.02    6.23   7.28
Async-Sync RS     5.61   6.35    6.18   7.13    6.57   7.62

Results: Maximum Operating Rate

Units: synchronous interfaces in MegaHertz; asynchronous interfaces in MegaOps/sec

Design            4-place        8-place        16-place
                  Put    Get     Put    Get     Put    Get
Mixed-Clock       565    549     544    523     505    484
Async-Sync        421    549     379    523     357    484
Mixed-Clock RS    580    539     550    517     509    475
Async-Sync RS     421    539     379    517     357    475

Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get
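The two observations about put and get rates can be checked directly against the table. The snippet below only re-encodes the slide's numbers; the names `RATES` and `SYNC_PUT` are introduced for illustration, and the comparison is purely numeric, as on the slide, even though the asynchronous put rates are in MegaOps/sec rather than MegaHertz.

```python
# (put, get) maximum operating rates per FIFO capacity, copied from the table.
RATES = {
    "Mixed-Clock":    {4: (565, 549), 8: (544, 523), 16: (505, 484)},
    "Async-Sync":     {4: (421, 549), 8: (379, 523), 16: (357, 484)},
    "Mixed-Clock RS": {4: (580, 539), 8: (550, 517), 16: (509, 475)},
    "Async-Sync RS":  {4: (421, 539), 8: (379, 517), 16: (357, 475)},
}
SYNC_PUT = {"Mixed-Clock", "Mixed-Clock RS"}  # designs with a synchronous put side

# Synchronous put faster than synchronous get, for every capacity.
sync_put_faster = all(
    put > get for d in SYNC_PUT for put, get in RATES[d].values()
)
# Asynchronous put slower than synchronous get, for every capacity.
async_put_slower = all(
    put < get for d in RATES if d not in SYNC_PUT for put, get in RATES[d].values()
)
print(sync_put_faster, async_put_slower)
```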

Conclusions

Introduced several new low-latency interface circuits

Address 2 major issues in SoC design:
• Mixed-timing domains
  - mixed-clock FIFO
  - async-sync FIFO
• Long interconnect delays
  - mixed-clock relay station
  - async-sync relay station

Other designs implemented and simulated:
• Sync-Async FIFO + Relay Station
• Async-Async FIFO + Relay Station

Reusable components: mix & match to build circuits

Provide useful set of interface circuits for SoC design