Advanced Digital Design
Practical Example: DARTS
by A. Steininger and M. Delvai, Vienna University of Technology

Post on 19-Dec-2015



Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna 2

Outline

- The Clock Distribution Problem
- DARTS Idea & Project Outline
- DARTS Implementation
  - concept & modules
  - complexity issues
  - performance results
  - test concept
  - timing assumptions
- Fundamental Problem of FT Asynchronous Logic


The Synchronous Paradigm

concept: precise global notion of time for the entire (system on) chip

method:
- discrete, evenly spaced time slices
- global, "phase accurate" clock tree
- single crystal oscillator

costs:
- cumbersome clock tree design
- considerable waste of power
- single point of failure

[Figure: SoC with modules DSP, WLAN, Camera, GPRS, GPS]


Phase Accurate Clocking

Low-skew clock distribution has become a substantial problem:
- non-negligible signal propagation time
- clock network is widely distributed
- high fan-out of clock network
- enormous power dissipation in clock network
- sophisticated techniques for clock routing with little tool support
- …


Current Solutions

- symmetric routing (H-tree, X-tree)
- configurable buffers
- deskewing circuits
- gated clock tree
- half swing clock
- …

Can we go on like this?


Fault-Tolerant Clocking

- crystal oscillators not very robust
- increasing extent of clock net
- higher clock rates
- smaller voltage swing
- shrinking feature size and critical charge
- more demanding applications

Can we admit clock source & network as single points of failure?


Current Solutions

- need to use independent clock sources
- sacrifice global synchrony: Globally Asynchronous Locally Synchronous (GALS) systems
- perform synchronization
  - of clock sources (on microtick level)
  - of local time bases (on macrotick level)

[Figure: SoC with modules DSP, WLAN, Camera, GPRS, GPS]


The GALS Concept

- partition system into functional units (FUs)
- apply synchronous paradigm within FUs
- apply asynchronous paradigm (handshake) for communication among FUs

most difficult clock routing problems eliminated
potential metastability problems at syn/asyn boundaries

[Figure: FU1 … FUn communicating via REQ/ACK handshake]


Synchronization: Principle

[Figure: clock time versus real time t for two clocks C1 and C2, shown with the ref. clock and the clock drift; the "precision" Π bounds their distance: |C1(t) - C2(t)| ≤ Π]


Why Synchronize?

- For unsynchronized clocks the distance between corresponding edges ("n-th edge") becomes arbitrarily large.
- Hence the relative timing between the two FUs is completely undefined.
- Therefore consistent temporal ordering of global events is impossible.
  - This may, e.g., cause redundant modules to deliver differing results even in the fault-free case!
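The claim about the n-th edge can be illustrated with a small calculation. This is a minimal sketch, assuming two free-running oscillators whose periods deviate from a nominal value by a constant relative drift; the function names and the ±50 ppm and 1 GHz figures are illustrative assumptions, not values from the lecture.

```python
def edge_time(n: int, nominal_period: float, drift_ppm: float) -> float:
    """Real time of the n-th edge of a free-running clock.

    drift_ppm is a constant relative deviation of the period (assumed model).
    """
    return n * nominal_period * (1.0 + drift_ppm * 1e-6)


def edge_distance(n: int, nominal_period: float, ppm_a: float, ppm_b: float) -> float:
    """Distance between the n-th edges of two unsynchronized clocks."""
    return abs(edge_time(n, nominal_period, ppm_a) - edge_time(n, nominal_period, ppm_b))


if __name__ == "__main__":
    period = 1e-9  # 1 GHz nominal clock (illustrative value)
    for n in (10**3, 10**6, 10**9):
        d = edge_distance(n, period, +50.0, -50.0)
        print(f"n = {n:>10}: distance = {d:.3e} s")
```

Under this model the distance grows linearly with n, so no fixed bound Π can hold without resynchronization.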


HW Clock Synchronization

At every node do:
(1) derive reference clock and phase by voting over all local clocks
(2) adjust local clock phase by means of PLL

[Figure: nodes A, B, C, D; inside a node, the clock inputs feed a voter, the voted ref. clock drives a PLL, and the PLL produces the local clock (clock output)]


PLL (Phase Locked Loop)

- a closed-loop control circuit
- different implementation styles (analog, digital, fully digital)
- potential stability problems
- can sync local clock only for small deviations from ref

[Figure: ref in, phase detector, loop filter, voltage controlled oscillator, out]


Voter

- One voter is required on every node.
- For a global agreement all voters must derive the same reference clock, otherwise cliques may evolve.

BUT…

Obviously the local voting result is influenced by skew!


SW Clock Synchronization

- independent local clock sources: "microtick"
- identical instances of a distributed algorithm executed on every node
- global time established by "macroticks" (i.e. defined number M of local microticks)
- message exchange between nodes provides knowledge on other nodes' local time
- algorithm continuously derives correction for M to keep macrotick in synchrony with others
- set of N nodes can tolerate f Byzantine faulty nodes, if N ≥ 3f+1
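The relation between microticks, macroticks and the fault bound can be sketched in a few lines. This is a simplified illustration, not the lecture's algorithm: it assumes each node's microtick period is known directly, whereas a real algorithm derives the correction for M from message exchange. All function names are hypothetical.

```python
def microticks_per_macrotick(macrotick_s: float, microtick_period_s: float) -> int:
    """Choose M so that M local microticks approximate one macrotick."""
    return max(1, round(macrotick_s / microtick_period_s))


def macrotick_error(macrotick_s: float, microtick_period_s: float) -> float:
    """Residual error of the local macrotick after choosing M (in seconds)."""
    m = microticks_per_macrotick(macrotick_s, microtick_period_s)
    return m * microtick_period_s - macrotick_s


def max_byzantine_faults(n_nodes: int) -> int:
    """Largest f that satisfies N >= 3f + 1 for a given number of nodes N."""
    return (n_nodes - 1) // 3


if __name__ == "__main__":
    macrotick = 1e-3                        # illustrative 1 ms macrotick
    for period in (1.0e-7, 1.0001e-7):      # nominal and slightly slow oscillator
        m = microticks_per_macrotick(macrotick, period)
        print(period, m, macrotick_error(macrotick, period))
    print(max_byzantine_faults(4), max_byzantine_faults(7))
```

With M chosen this way the residual macrotick error stays within half a microtick, which is why the native drift of the microtick source appears among the parameters that limit precision.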


Rate Correction

[Figure: microtick and macrotick timelines over t; the microticks within a macrotick are numbered 1, 2, 3, …, u-1, u on one timeline and 1, 2, 3, …, v-1, v, v+1 on the other]


Critical Parameters

- message jitter
- algorithm execution time jitter
- native drift of clock sources (microtick)
- resynchronization period
- network topology (fully connected, …)
- correction function (algorithm)

precision of 100ns achievable


Consistency Based Algs.

- all non-faulty nodes agree on same view of every (even faulty) node's local time, can calculate the global time consistently
- agreement needs several rounds of communication per value
- creates high network traffic
- requires bounded transmission delay
- does not require fully connected network
- good performance wrt. skew
- agreed value not necessarily correct (only in case of non-faulty sender)


An Example Algorithm (Srikanth & Toueg, 87)

Rule 1: "Relay"
when received (tick[k]) from f+1
then send (tick[k]) to all {once}

Rule 2: "Increment"
when received (tick[k]) from 2f+1
then send (tick[k+1]) to all {once}
     local_tick := k+1
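The two rules can be exercised in a small event-driven simulation. This is a sketch under stated assumptions rather than the lecture's implementation: n = 3f+1 nodes, the f faulty nodes simply stay silent (a benign special case of a Byzantine fault), message delays are drawn uniformly from [1, 2] time units, every node also receives its own messages, and all correct nodes bootstrap by sending tick 0. The function and variable names are hypothetical.

```python
import heapq
import random
from collections import defaultdict


def simulate(f: int = 1, max_tick: int = 5, seed: int = 0):
    """Run the relay/increment rules; return {node: [(tick, time), ...]}."""
    rng = random.Random(seed)
    n = 3 * f + 1
    correct = range(n - f)                 # the last f nodes are faulty (silent)
    received = {p: defaultdict(set) for p in correct}  # received[p][k] = senders of tick k
    sent = {p: set() for p in correct}     # ticks already sent, enforces {once}
    local_tick = {p: 0 for p in correct}
    history = {p: [] for p in correct}
    queue = []                             # (delivery time, seq, sender, receiver, k)
    seq = 0

    def send_to_all(now: float, sender: int, k: int) -> None:
        nonlocal seq
        sent[sender].add(k)
        for receiver in correct:           # includes the sender itself
            delay = rng.uniform(1.0, 2.0)  # assumed delay bounds
            heapq.heappush(queue, (now + delay, seq, sender, receiver, k))
            seq += 1

    for p in correct:                      # assumed bootstrap
        send_to_all(0.0, p, 0)

    while queue:
        now, _, sender, p, k = heapq.heappop(queue)
        received[p][k].add(sender)
        count = len(received[p][k])
        # Rule 1 (relay): f+1 senders guarantee at least one correct sender.
        if count >= f + 1 and k not in sent[p]:
            send_to_all(now, p, k)
        # Rule 2 (increment): 2f+1 senders, advance the local tick.
        if count >= 2 * f + 1 and local_tick[p] < k + 1 <= max_tick:
            if k + 1 not in sent[p]:
                send_to_all(now, p, k + 1)
            local_tick[p] = k + 1
            history[p].append((k + 1, now))
    return history


if __name__ == "__main__":
    for node, ticks in simulate().items():
        print(node, [(k, round(t, 2)) for k, t in ticks])
```

Comparing the times at which different nodes reach the same tick gives a feel for the precision; in this setup it depends only on the spread of the message delays, not on any absolute time base.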


FT Clocking: Summary

- GALS
  - sacrifices global time base
  - potential metastability problems
- HW clock sync (PCB level)
  - skew can cause inconsistent voting
- SW clock sync (distributed systems)
  - precision of better than 100ns not realistic


Distributed System on Chip?

classical VLSI Systems:
- complex structure
- communication delay negligible
- fault tolerance rarely needed

Distributed Systems:
- explicit computing nodes
- pronounced communication delay
- need for fault tolerance
- wealth of existing research

System on Chip:
- modular structure
- communication delay dominates
- fault tolerance definitely desired
- new problems


"Tick Synchronization"

- 1µs is excellent precision for a distributed clock
- at 1GHz this means 360,000° phase shift

phase synchronisation / tick synchronisation / clock synchronisation

keep same frequency for all modules, AND deterministically accommodate significant skew
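The phase-shift figure follows from simple arithmetic. The sketch below assumes the precision meant here is 1 µs, which is the value consistent with 360,000° at 1 GHz (1000 clock periods of 360° each); the function name is hypothetical.

```python
def phase_shift_degrees(time_offset_s: float, clock_hz: float) -> float:
    """Phase shift, in degrees, that a time offset corresponds to at a clock rate."""
    cycles = time_offset_s * clock_hz   # number of clock periods covered by the offset
    return cycles * 360.0


if __name__ == "__main__":
    print(phase_shift_degrees(1e-6, 1e9))
```

A precision that is excellent for a distributed system therefore spans many whole clock periods on chip, which is why synchronizing ticks rather than phase is the more realistic goal.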


New Synchrony on Chip

[Figure: two views of an SoC with modules DSP, WLAN, Camera, GPRS, GPS]


The DARTS Approach

- adopt a distributed synchronization algorithm for SoC clocking
- inherit all fault-tolerance properties
- implement algorithm in hardware
- thus achieve much better precision (a few clock cycles) and clock rate


The DARTS Architecture

- modules FUi augmented with simple local clock unit (TG alg)
- TG algs implemented in asynchronous logic style
- TG algs communicate over dedicated bus (TG network) to generate local clocks
- need 3f+1 modules to tolerate f arbitrary faults

[Figure: synchronous solution (FU1, FU2, FU3 on a data bus, driven by a clock tree) compared with the distributed clock (TG algs connected by the TG network)]


Conceptual Advantages (1)

best possible synchrony
- locally: still (phase) synchronous
  - remain with traditional synchronous paradigm
- globally: frequency synchronous
  - global precision p is known and bounded: delay_max / delay_min (relative!)
  - completely avoid metastability issues


Conceptual Advantages (2)

fault-tolerant clock generation
- algorithm generates clock
  - no crystal oscillator required
- distributed algorithm
  - no single point of failure
- scalable fault tolerance
  - use n ≥ 3f+1 nodes to tolerate f Byzantine faults


Conceptual Advantages (3)

weaker timing assumptions
- TG-net instead of costly clock tree
  - large skew uncritical for operation
- closed-loop timing => frequency adapts to
  - variation of operating conditions
  - type variations from fab


An Example Algorithm (recall)

(Srikanth & Toueg, 87): our choice for DARTS

Rule 1: "Relay"
when received (tick[k]) from f+1
then send (tick[k]) to all {once}

Rule 2: "Increment"
when received (tick[k]) from 2f+1
then send (tick[k+1]) to all {once}
     local_tick := k+1


Algorithm Properties

- formal proof for precision exists
- precision depends on relative timing only
- simple
- scalable fault tolerance
- can handle Byzantine failures
- formal proof for booting exists
- time free: ideal assumption coverage

to write down


The Hardware Perspective

Rule 1: "Relay"
when received (tick[k]) from f+1
then send (tick[k]) to all {once}

Rule 2: "Increment"
when received (tick[k]) from 2f+1
then send (tick[k+1]) to all {once}
     local_tick := k+1

Callouts on the rules:
- message number k of unbounded size
- atomicity of actions
- integer comparison


Further Properties

- fully connected net
- message based
- message book-keeping
- integer comparison
- majority voting on comparison result

high-level proofs imply many properties: are they met by the implementation?


From Algorithm to HW (1)

challenge: "unbounded" size of tick numbers

options:
- serialize transmission of tick numbers: too slow for clock generation
- multiple parallel rails per clock signal: hardware/wiring effort too high

(1) adapt algorithm for single-rail, zero-bit messages
(2) maintain only the difference of local and remote ticks
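Point (2) can be sketched as one small counter per link. The following is an illustration under assumptions, not the DARTS circuit: each zero-bit message is a bare clock transition, so a link only tracks how many remote ticks have arrived minus how many local ticks were generated. The mapping of the relay and increment rules onto the "greater" (GR) and "greater or equal" (GEQ) conditions is an assumed reading of the rules, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TickDifference:
    """Per-link counter: remote ticks seen minus local ticks generated."""
    diff: int = 0

    def remote_tick(self) -> None:
        self.diff += 1

    def local_tick(self) -> None:
        self.diff -= 1

    @property
    def geq(self) -> bool:
        """Remote node has produced at least as many ticks as the local node."""
        return self.diff >= 0

    @property
    def gr(self) -> bool:
        """Remote node is strictly ahead of the local node."""
        return self.diff > 0


def should_generate_tick(links: list[TickDifference], f: int) -> bool:
    """Assumed mapping: relay if f+1 links are GR, increment if 2f+1 are GEQ."""
    n_gr = sum(link.gr for link in links)
    n_geq = sum(link.geq for link in links)
    return n_gr >= f + 1 or n_geq >= 2 * f + 1


if __name__ == "__main__":
    links = [TickDifference() for _ in range(3)]
    links[0].remote_tick()
    links[1].remote_tick()
    print(should_generate_tick(links, f=1))
```

Because only differences are stored, the counters stay small as long as the nodes remain synchronized, which removes the need to transmit or store an unbounded tick number k.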


From Algorithm to HW (2)

challenge: atomicity of actions not implicitly guaranteed by asynchronous HW

options:
- strict serialization of operations: alternating processing of remote & local tick
- interlocking of the algorithm's rules: separate processing for rising and falling edge


DARTS Project Aims

- algorithm redesign for "zero-bit" messages: formally proved
- appropriate HW design
- ASIC implementation
- demo application
- experim. evaluation


Formal Proofs

asyn HW behav. assumptions:
- 0-bit messages

SW behavior assumptions:
- unbounded message length
- unbounded local memory
- strong local atomicity, local lock-step
- high comput. performance, algorithm can contain complex computations

Results: precision, accuracy, …


DARTS Project Aims

- algorithm redesign
- appropriate HW design: circuit design ready
- ASIC implementation
- demo application
- experim. evaluation

[Figure: TG alg block diagram for node p: pipelines 1 to 3f+1, each with a Remote (External) Pipe, a Local Pipe, a Diff-Gate and a Pipe Compare Signal Generator, all built from C-elements; their outputs GEQe, GRe, GEQo, GRo feed the Threshold Logic (thresholds 2f+1 and f+1), which produces clk_out]


DARTS Project Aims

- algorithm redesign
- appropriate HW design
- ASIC implementation: FPGA prototype running
- demo application
- experim. evaluation


DARTS Project Aims

- algorithm redesign
- appropriate HW design
- ASIC implementation
- demo application: postponed to follow-up
- experim. evaluation


DARTS Project Aims

- algorithm redesign
- appropriate HW design
- ASIC implementation
- demo application
- experim. evaluation: ongoing…


TG Alg Circuit Principle

[Figure: clocks from other TG algs feed one counter block each; the counter outputs GR and GEQ drive two threshold blocks (≥ 2f+1 and ≥ f+1), whose results are combined by an "OR" to produce the local clock]


TG Alg Block Diagram

[Figure: the block diagram for node p with pipelines 1 to 3f+1 (Remote Pipe, Local Pipe, Diff-Gate, Pipe Compare Signal Generator) and the Threshold Logic producing clk_out; labelled regions: clock inputs, counter modules, threshold function and tick generation, clock output]


TG Alg Modules

[Figure: pipeline 3f+1 of 3f+1, divided into three modules: the Difference Module (EP + DG: Remote Pipe and Local Pipe built from C-elements with Reset, inputs R_remote,in and R_local,in, plus the Diff-Gate), the PCSG Module (Pipe Compare Signal Gen. with gates NOR1, NOR2, NAND2 to NAND5 and outputs GEQe, GRe, GEQo, GRo), and the Threshold Module (threshold gates ≥2f+1 and ≥f+1 over the 3f+1 pipelines, with C-elements C_top and C_bottom producing clk_out); the circuit nodes are numbered 00 to 58, with signals s0, s1 and i0 to i9]


Elastic Pipeline (EP)

- buffers incoming clock edges
- proof shows that 4 stages are sufficient here
- ack_out is ignored!
- minimum distance between successive input transitions is t_pipe

[Figure: four-stage pipeline of C-elements with data_in, data_out, ack_in and ack_out; delay t_pipe]


Difference Gate (DG)

- asynchronous state machine (see homework)
- removes matching transitions from tick buffers
- alternating progress on remote and local buffers to serialize processing of ticks

[Figure: DG built from two C-elements with Reset, placed between the remote ticks (R_remote,out, A_remote,out) and the local ticks (R_local,out, A_local,out)]


Muller-C-Element

- basic building block for EP and DM
- internal storage loop!
- minimum distance between successive input transitions is t_loop

Truth table:
a b | y
0 0 | 0
0 1 | y_old
1 0 | y_old
1 1 | 1

[Figure: C-element symbol with inputs a, b and output y, an equivalent circuit with feedback of y, and the delays t_loop and t_prop]
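The behavior of the Muller C-element can be captured in a few lines. This is a purely behavioral sketch with a hypothetical class name; it ignores the delays t_loop and t_prop and only reproduces the logic function: the output follows the inputs when they agree and holds its old value otherwise.

```python
class MullerC:
    """Behavioral model of a two-input Muller C-element (no timing)."""

    def __init__(self, initial: int = 0) -> None:
        self.y = initial          # state held by the internal storage loop

    def update(self, a: int, b: int) -> int:
        if a == b:                # both inputs agree: output follows them
            self.y = a
        return self.y             # inputs differ: keep the old output


if __name__ == "__main__":
    c = MullerC()
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, c.update(a, b))
```

The hold case is what makes the element sequential, a point that matters again when testability is discussed.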


The PCSG: Pipe Compare Signal Generator

- compares the fill levels of remote and local pipe
- must not react to dynamic effects during subtraction (glitches!)
- purely combinational logic


Threshold Modules (THM)

- generates new tick when rule fires (= threshold reached)
- separate rising and falling transitions for interlocking
- back-transition from state logic to event logic problematic wrt. glitches!!

[Figure: buffer fill-levels (GEQe, GRe, GEQo, GRo) feed the threshold modules (≥2f+1 and ≥f+1); the tick-gen. module (C-element) produces clk_out, the local clock output]


Threshold Gates

- building blocks for THM
- activate output if more than k of their n inputs are active
- very inconvenient logic function
- several different implementation options (ROM, sum-of-products, custom cell, non-CMOS, …)
- must be free of hazards!
- sum-of-products turned out best
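A sum-of-products threshold function is easy to write down and shows why the function is inconvenient in hardware. The sketch below is a software illustration, not the gate-level design: it computes "at least k of n inputs active" as an OR over all k-input AND terms (so "more than k" corresponds to calling it with k+1), and counts how many product terms that requires. The function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def threshold_sop(inputs: Sequence[bool], k: int) -> bool:
    """True if at least k inputs are active: OR over every AND of k inputs."""
    return any(
        all(inputs[i] for i in term)
        for term in combinations(range(len(inputs)), k)
    )


def product_terms(n: int, k: int) -> int:
    """Number of AND terms in the sum-of-products form."""
    return comb(n, k)


if __name__ == "__main__":
    f = 1
    n = 3 * f + 1
    print(threshold_sop([True, False, True, False], f + 1))
    print(product_terms(n, f + 1), product_terms(n, 2 * f + 1))
```

The number of product terms is a binomial coefficient, so it grows quickly with the number of nodes.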


Implementation Results

- FPGA prototype: 24MHz with 4ns skew
- non-optimized ASIC simulation: >200MHz with 650ps skew (RadHard UMC018 library)
- a custom cell for the Muller-C element, once available, will further increase performance


Fully Connected…

A fully connected system of n nodes comprises
  2n(n-1) 4-stage EPs
  n(n-1) DGs and PCSGs
  4n THGs with n-1 inputs
  n² interconnect lines in the TG net

A reduction to a sparsely connected system would yield substantial savings, but it is impossible to handle Byzantine failures then.

Does hardware fail “Byzantine”?
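To get a feeling for how quickly these counts grow, the formulas above can be tabulated. This is a minimal sketch with two assumptions that are not stated on the slide: that n(n-1) applies to DGs and PCSGs each, and that the system is sized as n = 3f+1 nodes for f tolerated faults. The function name is hypothetical.

```python
def darts_component_counts(n):
    """Component counts for a fully connected system of n nodes.

    The formulas are taken from the slide; reading 'n(n-1) DGs and PCSGs'
    as n(n-1) of each is an assumption.
    """
    return {
        "4-stage EPs": 2 * n * (n - 1),
        "DGs": n * (n - 1),
        "PCSGs": n * (n - 1),
        "THGs": 4 * n,
        "inputs per THG": n - 1,
        "TG-net interconnect lines": n * n,
    }


for f in (1, 2, 3):
    n = 3 * f + 1  # assumed sizing: n = 3f + 1 nodes for f faults
    print(f"f={f}, n={n}: {darts_component_counts(n)}")
```

All pipeline-related counts grow quadratically in n, which is the cost a sparsely connected system would reduce.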


Failure Models

Distributed Systems people talk about
  omission failures
  clean/unclean crashes
  Byzantine failures,…

Hardware people care about
  stuck-at faults
  bit-flips
  metastability,
  opens and shorts,…

We are currently investigating this issue


Test Problems

Fault tolerance: implies fault masking
Self-timed behavior: inconvenient for tester
Delay faults: different effects in a DI circuit
Sequential behavior: due to Muller-C Gates
Scan chain: does not naturally exist
Sequential asyn ATPG tool: not available


Temporal Control

Dynamic states difficult / impossible to control for tester

Resulting necessities:
  Breaking feedback loops
  Halting the circuit in a dynamic state

Alternative: scan chain
  Lock internal state
  Controlled by tester clock


The “Freeze” Latches

Halting FSM of DG inhibits further operation of TG Alg

[Figure: (a) state graph with states S0 to S3 and Sn, labelled with the values 11, 01, 00, 10 of the signal pairs Rr Rl and Ar Al; (b) circuit with two latches (D, EN, Q) controlled by freeze_r and freeze_l, a C-element, and the signals Rl, Rr, Al, Ar.]


Partitioning

[Figure: partitioning of a DARTS node. Labelled blocks: Remote Pipe and Local Pipe (C-element chains with Reset and the inputs R_remote,in and R_local,in), Diff-Gate, Pipe Compare Signal Gen. (gates NOR1, NOR2, NAND2 to NAND5 with outputs GEQe, GRe, GEQo, GRo), pipelines up to 3f+1 of 3f+1, Threshold Gates (≥2f+1, ≥f+1), and the C-elements C_top and C_bottom driving clk_out. Circuit nodes are numbered 00 to 58; further signals are s0, s1 and i0 to i9.]


Self-Checking Property

Any SAF in the EP will inhibit further transitions at the output

If we observe a correct response for the input sequence “1010” then the EP is free of any SAF

[Figure: EP built from four C-elements with the signals data_in, data_out, ack_in, ack_out and the stage delay t_pipe.]
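The self-checking property can be reproduced in a simplified behavioural model. The sketch below assumes a 4-stage Muller pipeline in which each stage is one C-element, a consumer that acknowledges immediately (ack_in follows data_out), and stuck-at faults on stage outputs only; gate-internal nodes are not modelled. All function names are hypothetical.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def run_pipeline(data_in_sequence, stages=4, stuck=None):
    """Return data_out after each input value has settled.

    stuck is None or a (stage_index, value) pair that forces one stage
    output to a constant, i.e. a stuck-at fault on that node.
    """
    c = [0] * stages                      # all stages reset to 0
    if stuck is not None:
        c[stuck[0]] = stuck[1]
    outputs = []
    for d in data_in_sequence:
        for _ in range(2 * stages):       # sweep until nothing changes
            changed = False
            for i in range(stages):
                left = d if i == 0 else c[i - 1]
                # The consumer acknowledges at once: ack_in equals data_out.
                right = c[i + 1] if i < stages - 1 else c[i]
                new = c_element(left, 1 - right, c[i])
                if stuck is not None and stuck[0] == i:
                    new = stuck[1]        # the faulty node cannot change
                if new != c[i]:
                    c[i], changed = new, True
            if not changed:
                break
        outputs.append(c[-1])
    return outputs


print("fault-free:", run_pipeline([1, 0, 1, 0]))
for stage in range(4):
    for value in (0, 1):
        print(f"stage {stage} stuck-at-{value}:",
              run_pipeline([1, 0, 1, 0], stuck=(stage, value)))
```

In this model every stuck-at fault freezes data_out at a constant value, so only the fault-free pipeline reproduces the input sequence at its output.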


Testing PCSG via the EP

use EP outputs as stimuli for PCSG
many dynamic states needed => freezing mandatory
Coverage still 98% only, due to redundant input

Test Vector
Stable  Freeze
1010    010101
1010    011001
1010    100101
1010    101010
1001    010101
1001    011001
1001    101010
0110    010101
0110    011001
0101    010101
0101    011010
0101    100110
xx10    010110
xx01    101001


Threshold Modules Testing

purely combinational logic
implementation may vary => black box test desirable
exhaustive test tractable (11 inputs => 2^11 vectors in our case)
can test all 4 THGs in parallel
need direct access to outputs before combination into Muller C-element
can use counter or LFSR as test pattern generator
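A black-box exhaustive test with the two pattern generators mentioned above might look as follows. This is a sketch under assumptions: the threshold value 4, the faulty device model and all function names are invented for illustration, and the LFSR uses the feedback taps (11, 9), a commonly tabulated maximal-length choice for 11 bits.

```python
WIDTH = 11  # number of THG inputs stated on the slide


def counter_patterns(width=WIDTH):
    """All 2**width input vectors in counting order."""
    return range(1 << width)


def lfsr_patterns(width=WIDTH, taps=(11, 9), seed=1):
    """Fibonacci LFSR patterns, one full period.

    An LFSR never reaches the all-zero state, so that vector is emitted
    separately before the sequence starts.
    """
    yield 0
    state = seed
    while True:
        yield state
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | bit) & ((1 << width) - 1)
        if state == seed:
            return


def reference(vector, threshold=4):
    """Black-box specification: at least 'threshold' active inputs (assumed value)."""
    return int(bin(vector).count("1") >= threshold)


def faulty_dut(vector, threshold=4):
    """Hypothetical device under test whose input 0 is stuck at 0."""
    return reference(vector & ~1, threshold)


def black_box_test(dut, patterns):
    """Return every pattern on which the device disagrees with the reference."""
    return [v for v in patterns if dut(v) != reference(v)]


print(len(black_box_test(reference, counter_patterns())), "mismatches, good device")
print(len(black_box_test(faulty_dut, lfsr_patterns())), "mismatches, faulty device")
```

Because the test compares outputs only, it stays valid whichever implementation option (ROM, sum-of-products, custom cell) sits behind the threshold gate.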


A Special Scan Cell

[Figure: standard scan cell compared with the special scan cell; signals DATA, SCAN_DATA, SCAN_ENABLE, CLK, D, G, DATA_OUT and SCAN_OUT.]


Why Constraints?

Our event-logic based DARTS circuit is not fully delay insensitive:
  residual delay sensitivity in gate-internal storage loops (MCG)
  open ack path for clock ticks
  mixture of event logic (EP, DG) and state logic (PCSG, THM)
  modular fault-tolerance is contradictory to the DI principle


Fault-tolerant Asyn Logic

Asynchronous Logic is based on the handshake principle.

Before generating its next output a DI gate with multiple inputs waits for the last one to become valid (“non-eager style”).

Problem: In case of a single input failure (stuck-at) such a gate will wait forever.

This voids all redundancy concepts (duplication of units, TMR, …)

This is not an implementation issue but a fundamental dilemma, also in DARTS
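The dilemma can be made concrete with a behavioural model of a non-eager join. The sketch assumes a C-element that merges three replicated inputs (as in TMR) and an optional stuck-at fault on one replica; the class and function names are hypothetical.

```python
class CElement:
    """Non-eager join: the output changes only when all inputs agree."""

    def __init__(self, init=0):
        self.out = init

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out  # otherwise the gate holds its value and waits


def join_trace(stuck=None, cycles=3):
    """Apply 'cycles' rising/falling phases to three replicated inputs.

    stuck is None or a (replica_index, value) pair modelling a stuck-at input.
    """
    join = CElement()
    trace = []
    for phase in range(2 * cycles):
        value = (phase + 1) % 2           # 1, 0, 1, 0, ...
        inputs = [value, value, value]
        if stuck is not None:
            inputs[stuck[0]] = stuck[1]
        trace.append(join.update(*inputs))
    return trace


print("fault-free:      ", join_trace())
print("replica 1 stuck-0:", join_trace(stuck=(1, 0)))
print("replica 1 stuck-1:", join_trace(stuck=(1, 1)))
```

With one replica stuck, the output changes at most once and then never again, although two of the three replicas still behave correctly. The redundancy is present but the waiting gate cannot make use of it.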


Solving the Dilemma

We generate a new tick before all pending ticks have arrived. Sync loops remain open!

Now we must prevent old ticks we have missed from being mixed up with responses to the new tick (“de-synchronization”).

With anonymous ticks this can only be solved by timing constraints.

We can weaken the constraints by treating rising and falling edges separately.

For DARTS this yields the constraint that the slowest path must have no more than twice the round trip delay of the fastest one
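Read as an inequality, the final constraint says that the slowest round-trip delay may be at most twice the fastest. A minimal check is sketched below; the function name and the delay values are made up for illustration and are not measurements from DARTS.

```python
def meets_darts_constraint(round_trip_delays):
    """True if the slowest round-trip delay is at most twice the fastest one."""
    fastest = min(round_trip_delays)
    slowest = max(round_trip_delays)
    return slowest <= 2 * fastest


# Hypothetical round-trip delays in nanoseconds, one per path.
print(meets_darts_constraint([4.0, 5.5, 7.9]))   # slowest is within 2x fastest
print(meets_darts_constraint([4.0, 5.5, 8.1]))   # slowest exceeds 2x fastest
```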


Optimal Constraints

DARTS has been proven on the algorithm level

Does the HW implementation meet all proof assumptions without restrictions?
Definitely not: we have already identified constraints

How can we derive the minimum set of constraints for a given implementation?

This is a non-trivial “constraint satisfaction” problem!

We are currently planning to further pursue this issue.


Summary

FT clocking is becoming an issue, but current solutions are insufficient

DARTS adopts a distributed algorithm for SoC clocking

The TG alg implementation points out substantial differences between HW and SW

It also exemplifies practical problems and benefits of asynchronous circuits

Timing constraints are unavoidable for several reasons but difficult to derive, let alone minimize.

There is still much work ahead…