
CAPES / DFG Project Universidade do Brasilia

Universitaet Kaiserslautern, Universitaet Karlsruhe

Reiner Hartenstein*

University of Kaiserslautern

November 14, 2003, Brasilia, Brazil

Present and Future of Reconfigurable Systems

*) IEEE fellow

University of Kaiserslautern

Xputer Lab

Literature (also downloads)

http://hartenstein.de

Also click "recent talks". This page also links to available Ph.D. theses:

Becker, Herz, Kress, Nageldinger,

Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

The opportunity to introduce the structural domain to programmers ...

The structural domain has become RAM-based

... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm

IT ages

mainframe age

computer age (PC age)

data streams ...

morphware age

von Neumann does not support morphware

flowware

>> outline <<

• Fine grain reconfigurable

• Placement and routing

• Coarse grain reconfigurable

• Flowware

• Datastream-based Computing

• The Anti Machine Paradigm

• Final Remarks

http://www.uni-kl.de

fine grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic

just logic design on a strange platform ?

speed-up of up to 3 orders of magnitude

NRE and mask cost

[Figure: cost of 1 mask set (million $) and number of masks versus feature size; sources: eASIC, Dataquest]

no. of masks: 12, 12, 16, 20, 26, 28, 30, >30

feature size (µm): 0.8, 0.6, 0.35, 0.25, 0.18, 0.15, 0.13, 0.1, 0.07

PC: 25%

communication: 22%

consumer: 16%

automotive: 6%

others: 31%

Top 4 PLD manufacturers 2000 (total: $3.7 billion):

Xilinx: 42%

Altera: 37%

Lattice: 15%

Actel: 6%

• [Dataquest] > $7 billion by 2003.

• FPGAs going into every type of application – also SoC

• fastest growing segment of semiconductor market

You don't need specific silicon!

rGA with island architecture (detail)

[Figure: rGA with switch boxes and connect boxes]

switch box

[Figure: switch box]

connect box

[Figure: connect box; connect point: part of configuration memory]

connect point (enlarged)

[Figure: reconfigurable logic box, illustration]

connect point activated

connection activated (the wiring for the function selection of the rLB is not shown)

switch points activated

3 switch points activated, then the 4th switch point, then the 5th switch point

Routing continued

Routing

Placement and routing software has been known for 25 years.

Such network problems can be handled manually or with the help of graph theory.

1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P

20 Transistors + 20 Flipflops

Routing completed for 1 net

Routing: long distance nets

Passing through: long distance wiring from rLBs outside this region

A path can be used only once at a time .....

Routing congestion

A bridge can be passed only once (bridges of Königsberg).

C and D are not reachable: C cannot be connected with D.
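The rule that a path can be used only once can be illustrated with a small routing sketch. This is not the router behind the slides: it assumes a hypothetical 3 x 3 grid of routing cells, a breadth-first (Lee-style) search, and two hypothetical nets named "AB" and "CD". Once a net is routed, its cells are no longer available to later nets.

```python
from collections import deque

def route(grid_w, grid_h, blocked, src, dst):
    """Breadth-first search for a shortest path from src to dst over free cells."""
    if src in blocked or dst in blocked:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:          # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < grid_w and 0 <= ny < grid_h
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None                               # no free path: congestion

def route_nets(grid_w, grid_h, nets):
    """Route nets in the given order; every routed path claims its cells exclusively."""
    used = set()
    result = {}
    for name, (src, dst) in nets.items():
        path = route(grid_w, grid_h, used, src, dst)
        result[name] = path
        if path is not None:
            used.update(path)
    return result

# Hypothetical example: net "AB" occupies the whole middle row,
# so net "CD" can no longer cross from the top row to the bottom row.
nets = {"AB": ((0, 1), (2, 1)), "CD": ((1, 0), (1, 2))}
routed = route_nets(3, 3, nets)
```

In this sketch the routing order matters: a real placement and routing tool would also try other orders or rip up and reroute nets, which is left out here.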

Leonhard Euler

Euler's problem of the bridges of Königsberg (1736) is such a network problem:

Find a route that crosses each bridge exactly once .....

... also an optimization: none of the bridges remains unused.

L. Euler: Solutio problematis ad geometriam situs pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140

Left Bank

Right Bank

Kneiphof Island

Other Island
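Euler's argument can be checked with a few lines of code. The sketch below models the seven bridges as edges of a multigraph over the four land masses named above and applies the degree criterion: a connected multigraph has a walk using every edge exactly once only if zero or two vertices have odd degree. The function names are illustrative, and connectivity is assumed rather than tested.

```python
from collections import Counter

# The seven bridges of Königsberg as pairs of land masses.
BRIDGES = [
    ("Kneiphof Island", "Left Bank"),
    ("Kneiphof Island", "Left Bank"),
    ("Kneiphof Island", "Right Bank"),
    ("Kneiphof Island", "Right Bank"),
    ("Kneiphof Island", "Other Island"),
    ("Other Island", "Left Bank"),
    ("Other Island", "Right Bank"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def has_euler_walk(edges):
    """Degree criterion only; assumes the multigraph is connected."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

königsberg_degrees = degrees(BRIDGES)
solvable = has_euler_walk(BRIDGES)
```

All four land masses have odd degree, so no such walk exists for the original bridge layout.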

Data structures for Graphs

[Figure: a directed graph and an undirected graph on the vertices 1 to 4, each shown as a graph, as an adjacency list and as an adjacency matrix]

J. E. Hopcroft, R. E. Tarjan: Efficient algorithms for graph manipulation; Comm. ACM, 1973
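The two data structures named on this slide can be sketched as follows. The edge set of the slide's example is not recoverable from the text, so the four-vertex edge list here is a hypothetical stand-in; the helper names are illustrative as well.

```python
def adjacency_list(n, edges, directed=True):
    """Map each vertex 1..n to the list of its successors (or neighbours)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def adjacency_matrix(n, edges, directed=True):
    """n x n matrix with a 1 in row u, column v for every edge u -> v."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u - 1][v - 1] = 1
        if not directed:
            m[v - 1][u - 1] = 1
    return m

# Hypothetical example graph on the vertices 1 to 4.
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4)]

directed_list = adjacency_list(4, EDGES)
undirected_list = adjacency_list(4, EDGES, directed=False)
undirected_matrix = adjacency_matrix(4, EDGES, directed=False)
```

The list needs memory proportional to the number of edges, the matrix always needs n x n entries. For sparse netlists the list form is usually the better fit.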

Large Scale Routing

ENIAC, completed 1945

Partitioning over racks in the hall

Partitioning over card cages in the rack

Partitioning over boards (cards) in card cages

Partitioning over chips etc. on the card (e. g. SBC)

Partitioning over blocks on the chip (e. g. microprocessor)

PCBs (printed circuit boards)

for 40 years

MULTEC at Böblingen has produced printed circuit boards since 1963

planar "wiring"

no. of pins is limited

Integrated Circuit (Chip)

limited number of pins

"wiring" on a planar surface

hierarchy

card cage

card

chip

macro cell

basic cell

more levels

[Figure labels: Kaiserslautern; KL2, KL3, KL4; IMS]

wiring hierarchy

cables in the rack connect the card cages

card cage wiring connects the cards

card wiring connects the chips

macro cell

on-chip wiring connects the cells

*) 1930s: telephone switching (without chips; crossbar / two-motion selectors instead of cards); 1940s: first computers (without chips)

An obsolete Application Area

before fabrication ?

after fabrication ?

Emulators

Quickturn

Celaro Pro (Mentor)

Dini Group

PCI bus extender (Dini Group)

Crossbar

324 x 4

crossbar chips in full crossbar: no. of crossbar chips n x n/2 (n = 100: 5000)

crossbar chips in partial crossbar: no. of crossbar chips n (n = 100: 100)

Routing

14 Logic Chips (Lchip) with 128 pins (occasionally for rout-through)

32 Crossbar Chips (Xchip) with 72 I/O pins (for rout-through only)

each Xchip: 4 pins connected to each Lchip

logic card

8 Logic cards per card cage

card cage

8 Ychip cards per card cage

8 card cages per rack

rack

Backplane: 8 Zboard cards per rack
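The pin numbers on this slide are mutually consistent, which a short calculation makes visible. The sketch below only uses the figures given above (14 Lchips with 128 pins, 32 Xchips with 72 I/O pins, 4 pins per Lchip/Xchip pair). The function name and the dictionary keys are illustrative, and what the remaining Xchip pins are used for is not stated in the source.

```python
def partial_crossbar_budget(lchips=14, lchip_pins=128,
                            xchips=32, xchip_io_pins=72,
                            pins_per_pair=4):
    """Pin budget of one logic card in a partial crossbar."""
    lchip_pins_used = xchips * pins_per_pair   # every Lchip reaches every Xchip
    xchip_pins_used = lchips * pins_per_pair   # every Xchip reaches every Lchip
    return {
        "lchip_pins_used": lchip_pins_used,
        "lchip_pins_spare": lchip_pins - lchip_pins_used,
        "xchip_pins_used": xchip_pins_used,
        "xchip_pins_spare": xchip_io_pins - xchip_pins_used,
        "lchip_xchip_wires": lchips * xchips * pins_per_pair,
    }

budget = partial_crossbar_budget()
```

With these numbers every Lchip pin is taken by an Xchip connection (32 x 4 = 128), while each Xchip uses 56 of its 72 I/O pins for the Lchips.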

Crossbar ?

1913 J. N. Reynolds' crossbar switch

1915 patent granted

1919 Betulander's crossbar switch

1926 first public telephone switching application in Sweden

1964 NASA telemetrics crossbar array

RWC Real World Computing, Japan, 40 TFLOPS

Crossbar weight: 220 tons, 3000 km cable, 5120 processors with 5000 pins each

Routing Congestion

Example

rGA rGA rGA rGA

direct connection impossible

rout-through: detour connection

Routing-only configuration (2 examples)

Identity function configured

Graphs, Partitioning, Algorithms

T. Uehara, W. M. van Cleemput: Optimal Layout of CMOS Functional Arrays; IEEE Trans. C-30, pp. 305-312, May 1981

B. Kernighan, S. Lin: An Efficient Heuristic Procedure for Partitioning Graphs; BSTJ 49, 1970

C. Alpert, A. Kahng: Recent Directions in Netlist Partitioning: A Survey; Integration, vol 19 (1-2), pp. 1-81, 1995

T. Cormen, et al.: Introduction to Algorithms; MIT Press / McGraw-Hill, 1991

why emulators are obsolete

[Figure: system gates per rGA chip (10 000 to 10 000 000) versus year (1984 to 2004): XC 4085XL, XC 40250XV, Virtex, Virtex II, planned] [Xilinx Data]

ASICs lost the battle. rGAs are the winners

why declining ASIC business?

[Figure: number of design starts (10,000 to 50,000), 2001 to 2004, rGA-based] [N. Tredennick, Gilder Technology Report, 2003]

More and more, the prototyping platform of rGA-based systems will be delivered directly to the customer as the product: fully configured

ASIC emulators have been a transient solution: now with declining commercial significance.

You don't need specific silicon!

Xilinx: full hierarchy on chip

from rack to chip

• Xilinx Virtex-II Pro FPGA Architecture

• FPGA Fabric based on Virtex-II Architecture

• PowerPC 405 RISC CPU (PPC405) cores

[Figure labels: On Chip Memory Controller, PowerPC Core, Embedded RAM, RocketIO]

Source: Ivo Bolsens, Xilinx

focusing on coarse grain

• Fine Grain morphware platforms

already mainstream: reconfigurable logic

just logic design on a strange platform

• Coarse Grain platforms:

Reconfigurable Computing: not that new – but shaking the fundamentals of CS curricula

an order of magnitude more MIPS/mW than fine grain

why coarse grain

[Figure: MOPS / mW versus feature size (µ) for hardwired, rDPAs (reconfigurable computing)*, FPGAs (reconfigurable logic), instruction set processors, standard microprocessor]

T. Claasen et al.: ISSCC 1999

*) R. Hartenstein: ISIS 1997

[Figure: throughput versus flexibility; hard-wired, coarse grain, von Neumann]

coarse grain goes far beyond bridging the gap

rDPA (Reconfigurable Datapath Array)

rDPU rDPU rDPU rDPU

Reconfigurable Interconnect Fabric (RIF): separate routing area

RIF laid out over rDPUs: rDPA wired by abutment

CMOS interconnect resources

Foundries offer up to 9 metal layers and up to 3 poly layers

reconfigurable interconnect fabric laid out over the rDPU cell

Commercial rDPAs

XPU family (IP cores): PACT Corp., Munich

XPU128

mapping algorithms efficiently onto rDPA

SNN filter on KressArray

array size: 10 x 16 = 160 rDPUs

Legend: rDPU not used; rDPU used for routing only (rout-through only); operator and routing; port location marker; backbus connect

by the way: example of scalability / relocatability by EDA support

"Structured Configware Design" [R. H.]

Routing: badly scalable

Hundreds of rGAs or very large rGAs

Routing congestion growing exponentially

Communication Resource Requirements

... often Functional Resources are not the Throughput Bottleneck

In some application areas, e.g. wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources

The Solution: Generators for Domain-specific RA Platforms

KressArray Family generic Fabrics: a few examples

http://kressarray.de

Select Function Repertory

Select mode, number, width of NNports

select Nearest Neighbour (NN) Interconnect: an example

[Figure labels: 16, 32, 8, 24; 2 rDPU]

more NNports: rich Rout Resources

rout-through and function

rout-through only

Examples of 2nd Level Interconnect: laid out over rDPU cell, no separate routing areas!

Super Pipe Networks

The key is mapping, rather than architecture

array: systolic array

applications: regular data dependencies

pipeline properties: shape linear only; resources uniform only

mapping and scheduling (data stream formation): linear projection or algebraic synthesis

array: super-systolic RA

applications and pipeline properties: no restrictions

mapping: simulated annealing or P&R algorithm

scheduling (data stream formation): (e.g. force-directed) scheduling algorithm

**) KressArray [ASP-DAC-1995]
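Mapping by simulated annealing, as named on this slide, can be sketched in a few lines. This is a generic textbook annealer, not the KressArray mapper: the cost function (total Manhattan wire length of two-point nets), the linear cooling schedule, the parameter values and the six-operator pipeline are all assumptions made for illustration.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan distance over all two-point nets."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal(ops, nets, rows, cols, steps=2000, t0=2.0, seed=0):
    """Place operators on a rows x cols array by swapping cells at random."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occupant = list(ops) + [None] * (len(cells) - len(ops))  # None = unused cell
    rng.shuffle(occupant)

    def placement_of(occ):
        return {op: cell for op, cell in zip(occ, cells) if op is not None}

    cost = wirelength(placement_of(occupant), nets)
    initial_cost = cost
    best, best_cost = list(occupant), cost
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9       # linear cooling
        i, j = rng.sample(range(len(cells)), 2)
        occupant[i], occupant[j] = occupant[j], occupant[i]  # trial move
        new_cost = wirelength(placement_of(occupant), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost                                  # accept the move
            if cost < best_cost:
                best, best_cost = list(occupant), cost
        else:
            occupant[i], occupant[j] = occupant[j], occupant[i]  # undo the move
    return placement_of(best), best_cost, initial_cost

# Hypothetical six-stage pipeline a -> b -> ... -> f on a 2 x 3 array.
ops = list("abcdef")
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
placement, best_cost, initial_cost = anneal(ops, nets, rows=2, cols=3)
```

Because uphill moves are accepted with a probability that shrinks as the temperature falls, the search is not restricted to linear or uniform array shapes, which is the point of the comparison above.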

Morphware machines vs. hardwired machines

A clear terminology helps a lot

platform: program source running on it

hardware: (not programmable)

morphware (fine grain: rGA (FPGA); coarse grain: rDPU, rDPA): configware

machine: program source running on it

hardwired data stream processor: flowware

reconfigurable data stream processor: flowware & configware

instruction stream processor (v. N.): software

Flowware defines:

... which data item at which time at which port

[Figure: input data streams and output data streams, shown as port # over time]
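One way to picture this definition is a plain table of (time, port, data item) triples. The sketch below is an illustration only, not a flowware language from the talk: the function names and the skewed entry times (item k of port p enters at time k + skew * p, as is common for systolic arrays) are assumptions.

```python
def flowware_schedule(streams, skew=1):
    """Return sorted (time, port, item) triples for the given input streams.

    streams maps a port number to the list of data items entering there.
    Item k of port p is scheduled at time k + skew * p (assumed skew rule).
    """
    events = []
    for port, items in sorted(streams.items()):
        for k, item in enumerate(items):
            events.append((k + skew * port, port, item))
    return sorted(events)

def items_at(events, time):
    """Which data item is present at which port at the given time step."""
    return {port: item for t, port, item in events if t == time}

# Hypothetical input streams at two ports.
streams = {0: ["x0", "x1"], 1: ["y0", "y1"]}
events = flowware_schedule(streams)
step1 = items_at(events, 1)
```

The schedule is fixed before runtime, so no instruction has to be fetched to decide where a data item goes.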

Paradigm Shifts: Nick Tredennick's view

instruction-stream-based computing:

resources fixed

algorithms variable

data-stream-based reconfigurable computing:

resources variable

algorithms variable

why 2 program sources ?

programmable:

Configware: resources variable

Flowware: data-stream

Software: instruction-stream

Flowware heading toward mainstream

• Data-stream-based Computing is heading for mainstream

– 1997 SCCC (LANL) Streams-C Configurable Computing

– SCORE (UCB) Stream Computations Organized for Reconfigurable Execution

– ASPRC (UCB) Adapting Software Pipelining for Reconfigurable Computing

– 2000 Bee (UCB), ...

– Most stream-based multimedia systems, etc.

– Many other areas ....

Flowware ..... mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation

Flowware: managing data streams

Software: managing instruction streams

control-procedural vs. data-procedural

The structural domain is primarily data-stream-based:

Flowware provides a (data-)procedural abstraction of the (data-stream-based) structural domain

Flowware converts "procedural vs. structural" into "control-procedural vs. data-procedural" ...

... a Trojan horse to introduce the structural domain to the procedural mind set of programmers

Configware / Flowware Compilation

[Figure: Configware / Flowware compilation. Labels: high level source; wrapper; rDPA intermediate; mapper, configware; scheduler, flowware; r. DataPath; data streams; M M M M (distributed memory architecture); data sequencer; address generator; "instruction" fetch before runtime]

>>> extremely high efficiency: flowware-based computing

1. avoiding address computation memory cycle overhead

2. avoiding instruction fetch and interpretation overhead

3. high parallelism, massively multiple deep pipelines

4. much less configuration memory

5. interconnect laid out over the cell: no extra routing areas

6. methodologies readily available
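Point 1 can be pictured with a software model of a data sequencer. In the talk this is a hardware address generator; the sketch below only imitates its behaviour, and the function names, the row-major scan pattern and the example memory are assumptions. The address sequence depends only on a few parameters set before runtime, so the datapath itself never computes an address.

```python
def scan_addresses(base, row_stride, rows, cols, col_stride=1):
    """Yield the addresses of a row-major scan over a rows x cols window."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride

def data_stream(memory, addresses):
    """Turn an address sequence into the data stream fed to the datapath."""
    for addr in addresses:
        yield memory[addr]

# Hypothetical 4 x 4 image stored row by row; scan a 2 x 2 window at row 1, column 1.
memory = list(range(100, 116))
window_addresses = list(scan_addresses(base=5, row_stride=4, rows=2, cols=2))
window_data = list(data_stream(memory, window_addresses))
```

Other scan patterns (column-major, zigzag, overlapping windows) would only change the generator, not the code that consumes the stream.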

Programming Language Paradigms

language category: Software Languages; Languages f. Anti Machine

both: deterministic procedural sequencing: traceable, checkpointable

operation sequence driven by:

read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting

no parallel loops, escapes, instruction stream branching
