An Introduction to Reconfigurable Computing

Mitch Sukalski and Craig Ulmer

Dean R&D Seminar

11 December 2003



Reconfigurable Computing…

is computation on a platform with reconfigurable (i.e., modifiable at run-time) hardware capable of implementing application-specific algorithms and functionality on demand.

Computing Spectrum

Software

General-Purpose CPU

[Figure: CPU pipeline with Fetch, Decode, Registers, Execute (+, x, /, xor), Memory, and Writeback stages]

• Easily reprogrammed
• Low cost
• Fundamental bottlenecks

Hardware

Application-Specific Integrated Circuit (ASIC)

[Figure: fixed datapath built from +, x, xor, and z-1 (delay) blocks, with inputs A, B, C, D, and π feeding a single result]

• Not modifiable
• High cost
• Extremely fast

Soft-Hardware

Field Programmable Gate Arrays (FPGAs)

• Reconfigurable hardware
• Medium cost
• Speedup potential

History

1945: Eckert, Mauchly, von Neumann: ENIAC

1945: “von Neumann architecture”

1960: Estrin: Fixed+Variable Structure Computer

1970s: Simple PLDs

1985: Xilinx introduces first FPGA

1990s: Custom Computing Machines (CCMs)

1999: FPGAs exceed one million logic gates

2002: FPGAs include complex cores

Figures:

• ENIAC: connecting computational blocks for an algorithm
• Fixed+Variable CPU: users can attach new computational circuits to a fixed ALU
• The Teramac CCM: multi-chip module of FPGAs
• Xilinx Virtex FPGA
• Xilinx Virtex-II Pro (image courtesy of rapidio.org)

Reconfigurable Computing in Modern HPC

• Stand-alone platforms
  – OctigaBay 12K
  – SRC-6
  – Starbridge Hypercomputer

• Accelerator cards
  – Timelogic’s DeCypher
  – Nallatech’s BenNUEY
  – Annapolis Micro Systems WILDSTAR II

Example: Computational Fluid Dynamics

William Smith & Austars Schnore at GE Global Research

From: “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” ERSA 2003

And now for some details…

• Field Programmable Gate Arrays (FPGAs)
• Common RC design techniques
• Reported examples

Field-Programmable Gate Arrays (FPGAs)

• FPGAs emulate digital logic circuitry
  – Large array of configurable logic blocks
  – Internal routing through programmable interconnection network

• FPGAs hold hardware configuration in SRAM
  – Change the digital circuitry by loading new configuration

• Design approach:
  – User designs in hardware description language
  – Synthesis tools translate to logic gates
  – Mapping tools target specific FPGA

Simplified Logic Block

[Figure: logic block containing two LUTs, each paired with a register]

• Emulates logic function
  – Thousands per chip

• Lookup Table (LUT)
  – Holds truth table
  – Inputs produce outputs

• 1-bit registers
  – Hold data between cycles

• Note: Greatly simplified

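LUT Example: 1-bit Adder

Truth Table

A  B  Cin | Cout  Sum
0  0  0   | 0     0
0  0  1   | 0     1
0  1  0   | 0     1
0  1  1   | 1     0
1  0  0   | 0     1
1  0  1   | 1     0
1  1  0   | 1     0
1  1  1   | 1     1

[Figure: logic block with two LUTs and two registers; both LUTs take inputs A, B, C, and 0; one LUT drives Cout and the other drives Sum]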
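The adder example can be mimicked in software. The following is a minimal Python sketch, not a description of any real FPGA: the names `make_lut`, `lut_read`, and `LogicBlock` are hypothetical, and the block is assumed to hold two 3-input LUTs that each feed a 1-bit register.

```python
# Toy model of one logic block configured as a 1-bit full adder.
# A LUT is just a truth table indexed by its input bits.

def make_lut(func, n_inputs=3):
    """Precompute a truth table: entry i holds func applied to the bits of i."""
    table = []
    for i in range(2 ** n_inputs):
        # First argument is the most significant bit of the index.
        bits = [(i >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        table.append(func(*bits))
    return table

def lut_read(table, *bits):
    """Return the stored output for the given input bits."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return table[index]

# "Configuring" the block means filling in the two truth tables.
SUM_LUT = make_lut(lambda a, b, cin: a ^ b ^ cin)
COUT_LUT = make_lut(lambda a, b, cin: (a & b) | (a & cin) | (b & cin))

class LogicBlock:
    """Two LUTs, each followed by a 1-bit register (greatly simplified)."""

    def __init__(self, sum_lut, cout_lut):
        self.sum_lut = sum_lut
        self.cout_lut = cout_lut
        self.sum_reg = 0
        self.cout_reg = 0

    def clock(self, a, b, cin):
        """One clock edge: latch both LUT outputs into the registers."""
        self.sum_reg = lut_read(self.sum_lut, a, b, cin)
        self.cout_reg = lut_read(self.cout_lut, a, b, cin)
        return self.cout_reg, self.sum_reg

block = LogicBlock(SUM_LUT, COUT_LUT)
cout, total = block.clock(1, 1, 0)  # 1 + 1 + 0
```

Loading different tables into the same `LogicBlock` would turn it into a different circuit, which is the sense in which the hardware is reconfigurable.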

Routing Data between Logic Blocks

[Figure: grid of logic blocks (LB) joined by wires that cross at switchboxes (X)]

• Need to connect logic blocks

• Wires and Switchboxes
  – LBs connect to local wires
  – Switchboxes route long connections

• Routing set at compile time
  – Performed by tools

Reconfiguration

• Modern FPGAs are SRAM based
  – Can be loaded with new circuitry

• Full reconfiguration
  – Few megabytes of configuration
  – Milliseconds

• Partial reconfiguration
  – Reprogram only a portion of chip
  – Reduces configuration time
  – Non-trivial, poorly supported

[Figure: an FPGA loaded either with a full configuration image or with a smaller partial configuration image]
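A rough feel for these timescales comes from dividing the image size by the speed of the configuration port. The sketch below is back-of-the-envelope arithmetic only: the image size, port width, and clock rate are hypothetical values chosen for illustration, not figures for any particular device, and it assumes partial reconfiguration time scales linearly with the size of the partial image.

```python
def config_time_ms(image_bytes, port_bytes_per_cycle, clock_hz):
    """Time to shift a configuration image through the port, in milliseconds."""
    cycles = image_bytes / port_bytes_per_cycle
    return 1000.0 * cycles / clock_hz

# Hypothetical numbers, for illustration only.
FULL_IMAGE_BYTES = 1_000_000      # a 1 MB full configuration image
PORT_BYTES_PER_CYCLE = 1          # byte-wide configuration port
CLOCK_HZ = 50_000_000             # 50 MHz configuration clock

full_ms = config_time_ms(FULL_IMAGE_BYTES, PORT_BYTES_PER_CYCLE, CLOCK_HZ)

# Partial image covering one tenth of the chip (assumed linear scaling).
partial_ms = config_time_ms(FULL_IMAGE_BYTES // 10, PORT_BYTES_PER_CYCLE, CLOCK_HZ)

print(f"full: {full_ms:.1f} ms, partial: {partial_ms:.1f} ms")
```

Under these assumptions the full load lands in the tens of milliseconds and the partial load is proportionally shorter, which is the motivation for partial reconfiguration despite its tooling difficulties.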

Design Techniques

Digital logic design techniques for exploiting FPGAs

FPGAs as Computational Accelerators

• Use FPGAs as soft-hardware
  – Port algorithm to hardware
  – Run inside FPGA
  – Reuse hardware

• Techniques
  – Concurrency, memory, partial evaluation

1. Concurrency

• Load FPGA with multiple computational circuits
  – Hardware state machines are like threads, but all tasks are always running

• Raw parallelism
  – Units run in parallel
  – Example: Key breaking

• Pipelining
  – Chain units together in series
  – Example: Streaming computations, data-flow
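Pipelining can be illustrated with a cycle-by-cycle software model. This is a minimal sketch under stated assumptions: `run_pipeline` and the three example stages are hypothetical, each stage is assumed to take exactly one clock cycle, and `None` marks an empty pipeline register.

```python
def run_pipeline(stages, inputs):
    """Clock a chain of stages, each with one register on its output.

    Returns (outputs, cycles). Input values must not be None, because
    None is used to mark an empty register.
    """
    regs = [None] * len(stages)
    outputs = []
    cycles = 0
    pos = 0
    # Run until every input has been fed and the pipeline has drained.
    while pos < len(inputs) or any(r is not None for r in regs[:-1]):
        src = None
        if pos < len(inputs):
            src = inputs[pos]
            pos += 1
        # Every stage works in the same cycle, each on the value its
        # upstream register held before this clock edge.
        new_regs = [None] * len(stages)
        prev = src
        for i, stage in enumerate(stages):
            new_regs[i] = stage(prev) if prev is not None else None
            prev = regs[i]
        regs = new_regs
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles

# Three hypothetical single-cycle units chained in series.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
results, cycles = run_pipeline(stages, [1, 2, 3, 4])
```

Once the pipeline is full, one result emerges per cycle: four inputs through three stages take 4 + 3 - 1 = 6 cycles here, where running each input through all three units back to back would take 12.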

2. Custom Memory Interactions

• Most FPGA cards have multiple memory banks
  – Fetch/store multiple data values at same time
  – Predictable performance (as opposed to caches)
  – Hide address generation

[Figure: an FPGA attached to five SRAM banks (Bank 0 through Bank 4)]
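The benefit of separate banks shows up in a simple cycle count. The sketch below is a toy model, not a measurement: the computation `out[i] = a[i]*b[i] + c[i]*d[i]` and the function names are hypothetical, every bank is assumed to serve one access per clock cycle, and arithmetic latency is ignored.

```python
# Toy cycle count for out[i] = a[i]*b[i] + c[i]*d[i].
# Assumption: each bank has one port and serves one access per cycle.

def single_bank(a, b, c, d):
    """All arrays share one memory port: 4 reads + 1 write per element."""
    out, cycles = [], 0
    for i in range(len(a)):
        out.append(a[i] * b[i] + c[i] * d[i])
        cycles += 5
    return out, cycles

def five_banks(a, b, c, d):
    """Each array sits in its own bank, so the four reads and the
    write for one element all happen in the same cycle."""
    out, cycles = [], 0
    for addr in range(len(a)):  # one address counter drives every bank
        out.append(a[addr] * b[addr] + c[addr] * d[addr])
        cycles += 1
    return out, cycles

a, b, c, d = [1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]
slow_out, slow_cycles = single_bank(a, b, c, d)
fast_out, fast_cycles = five_banks(a, b, c, d)
```

Both versions compute the same values; only the cycle count differs. Because the access pattern is a fixed counter rather than a cache lookup, the cycle count is known in advance, which is the predictability the slide refers to.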

3. Partial Evaluation

• Know data constants at design time
  – Apply to circuits and reduce hardware
  – Synthesis tools perform automatically

Note: FPGAs are unique because we can easily generate new, optimized hardware configurations for each set of constants.

Example: 4-bit Ripple-Carry Adder
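One way to see the effect on a 4-bit ripple-carry adder is to fix one operand as a constant and fold its bits into each stage. The following Python sketch is illustrative only and is not the output of any synthesis tool: `specialize_adder` and `run_specialized` are hypothetical names, and the gate counts assume a textbook full adder of 5 two-input gates per bit (2 XOR, 2 AND, 1 OR).

```python
GENERIC_GATES = 5 * 4  # textbook 4-bit ripple-carry adder, 5 gates per bit

def specialize_adder(constant, width=4):
    """Plan a circuit for x + constant with the constant folded in.

    Returns (plan, gate_count). Each plan entry is (k, carry_in), where k
    is the constant's bit and carry_in is 0 or 1 when the carry is known
    at design time, or None when it is a live signal.
    """
    plan, gates = [], 0
    carry = 0  # the carry into bit 0 is the constant 0
    for i in range(width):
        k = (constant >> i) & 1
        plan.append((k, carry))
        if carry is None:
            gates += 2      # XOR + AND (k = 0) or XNOR + OR (k = 1)
        elif k ^ carry:
            gates += 1      # sum = NOT a; carry-out is just the wire a
            carry = None
        else:
            carry = k       # sum is just the wire a; carry-out is constant
    return plan, gates

def run_specialized(plan, x):
    """Evaluate the specialized circuit on input x, bit by bit."""
    result, carry = 0, 0
    for i, (k, known_carry) in enumerate(plan):
        a = (x >> i) & 1
        if known_carry is None:
            if k == 0:
                s, carry = a ^ carry, a & carry
            else:
                s, carry = 1 - (a ^ carry), a | carry
        elif k ^ known_carry:
            s, carry = 1 - a, a
        else:
            s, carry = a, k
        result |= s << i
    return result

plan, gates = specialize_adder(8)   # x + 8 only needs to invert the top bit
print(gates, "gates instead of", GENERIC_GATES)
```

Under these assumptions, adding the constant 8 needs a single inverter and adding 1 needs 7 gates, compared with 20 for the general adder. A different constant yields a different, equally small circuit, which on an FPGA is simply a new configuration.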

RC Performance Examples

• CFD: 23 GFLOPS sustained
  – “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” Smith & Schnore, 2003

• Adaptive beamforming: 20 GFLOPS
  – Parallel systolic array architecture
  – “20 GFLOPS QR processor on a Xilinx Virtex-E FPGA,” Walke et al., 2000

• Real-time holographic video display at 30 fps
  – “Using field programmable gate arrays to scale up the speed of holographic video computation,” Nwodoh

In Summary

• Reconfigurable computing uses FPGAs to emulate application-specific hardware
  – Achieve performance gains with dedicated hardware

• It is possible to implement just about any kind of digital hardware in the FPGA
  – Limited by capacity and effort
  – Resurrect application-specific hardware architectures
  – SIMD, MIMD, Systolic Processor Arrays, Data-Flow…