Making FPGAs a Cost-Effective Computing Architecture

Tom VanCourt, Yongfeng Gu, Martin Herbordt
Boston University

Page 1: Making FPGAs a  Cost-Effective Computing Architecture

Making FPGAs a Cost-Effective Computing Architecture

Tom VanCourt, Yongfeng Gu, Martin Herbordt

Boston University

Page 2: Making FPGAs a  Cost-Effective Computing Architecture

21 Jan 2005 · FPGAs for Computing

FPGAs as Compute Engines

- Proven successful
  - 1000x speedup in bioinformatics, computational chemistry, etc.
  - Accelerator board fits standard PC backplane
  - Off-the-shelf availability, modest HW cost
- What makes them work so well
  - >400 memory buses, 3 Tbit/sec* total bandwidth
  - Massive parallelism, fast on-chip communication
- So why doesn’t everybody use them?

*Xilinx XC2VP100

Page 3: Making FPGAs a  Cost-Effective Computing Architecture

Field Programmable Gate Arrays

- What is an FPGA?
  - A bag of uncommitted computer parts
  - No defined function – can be whatever you want
- Why is programming a problem?
  - CPU runs the application, FPGA is the application
  - All of what’s hard in programming, only more so
- How are we changing the model?
  - Separating circuit design from application logic
  - Addressing specific application areas

Page 4: Making FPGAs a  Cost-Effective Computing Architecture

CPU vs. FPGA Hardware

| | CPU has … | FPGA has … |
|---|---|---|
| Arithmetic & logic | 1-10 pipelines; fixed data width | >400 HW multipliers; ~100K function cells |
| Registers & memory | Fixed reg. array; 0-4 caches | ~200K reg. bits; >400 cache RAMs |
| Connectivity & communication | Fixed datapath; 1 ext. data bus; 1-32 dedicated I/O | Arbitrary data path; >1000 data I/O pins; 20 links, 3-10 Gbit |
| Process technology | Incremental growth; process limited | Exponential growth; process driver |

Page 5: Making FPGAs a  Cost-Effective Computing Architecture

Programming Skills vs. FPGAs

| CPU model | FPGA model |
|---|---|
| Single-threading: no synchronization; for/if/switch control | Massive parallelism: visible timing relations; state machine/hardwired |
| Incremental execution: one instruction at a time; results are immediate | Pipelined execution: all operations active; visible dependencies |
| Common parallelization: large units of work; costly communication | Parallelism model: fine grain – one ALU op; cheap on-chip comm. |

Page 6: Making FPGAs a  Cost-Effective Computing Architecture

Attempts to Date

- Hardware description languages
  - Unfamiliar control & resource models
- Graphical design entry - tedious
  - Example: X = 3*Y + 5*Z
- Standard programming languages
  - Good SW structure ≠ good HW structure
- Semantic gap works both ways
  - Good HW designers aren’t application experts

Page 7: Making FPGAs a  Cost-Effective Computing Architecture

Requirements for a Solution

- Acknowledge SW and HW skills separately
  - HW expertise: system interface, memory structure, synchronization, computation arrays
  - SW expertise: problem origination, data manipulation, algorithm variations, exploration
  - Allow ‘normal’ representations to both
  - Eliminate dependencies between HW and SW
- Balance generality vs. domain specifics
  - Create multiple levels of generality

Page 8: Making FPGAs a  Cost-Effective Computing Architecture

Behavior as a Parameter

- Reusable structure
  - Standard HW reuse: prefab leaf components + custom connectivity
  - Required HW reuse: prefab connectivity + custom leaves
- Go beyond VHDL parameterization
  - Not just data values as parameters
  - Behavior as parameter
  - Like C library’s qsort(data[], compare())
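
A rough software analogy for "prefab connectivity + custom leaves", in the spirit of the qsort comparison: the wiring (here a pairwise reduction tree) is fixed, and the behavior of each node is passed in as a parameter. This is only a Python sketch, not the authors' HDL; `tree_reduce` and `node_op` are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tree_reduce(inputs: Sequence[T], node_op: Callable[[T, T], T]) -> T:
    """Fixed structure: a pairwise reduction tree.

    The connectivity never changes; only node_op (the supplied
    behavior) differs between uses, much like qsort's compare().
    """
    if not inputs:
        raise ValueError("need at least one input")
    level = list(inputs)
    while len(level) > 1:
        # Combine neighbors pairwise to form the next tree level.
        nxt = [node_op(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired element passes through to the next level.
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Same structure, two different behaviors.
largest = tree_reduce([3, 1, 4, 1, 5], max)
total = tree_reduce([1, 2, 3, 4], lambda a, b: a + b)
```

In hardware terms, every node of one tree level would operate at once; the Python loop only mimics the structure, not the timing.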


Page 9: Making FPGAs a  Cost-Effective Computing Architecture

Reusing Control, not Function

- Example: Iterative Optimization
- Logic designer provides:
  - Parameterized logic model
- End user provides:
  - X0 – Initial candidate
  - Fj(X) – Score next candidate solution j
  - Best(S0, S1, …) – Select solution[s] based on score[s]
- Fill-ins define the search algorithm
  - Hill climbing, Gibbs sampling, simulated annealing, …

[Figure: starting from X0, the current candidate Xi feeds F0(X), F1(X), …; the next candidate is Xi+1 = Best(S0, …).]
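
The division of labor on this slide can be sketched in software. The loop below stands in for the logic designer's parameterized model; the three fill-ins (X0, the Fj, and Best) come from the end user. It is a minimal Python sketch under assumptions not stated in the slides: each Fj is taken to return a (candidate, score) pair, Best receives the list of pairs, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Scored = Tuple[int, float]  # (candidate, score); an assumed representation


def iterative_optimizer(x0: int,
                        scorers: List[Callable[[int], Scored]],
                        best: Callable[[List[Scored]], int],
                        steps: int) -> int:
    """Reusable control: owns the iteration, knows nothing about the problem."""
    x = x0
    for _ in range(steps):
        scored = [f(x) for f in scorers]  # F0(X), F1(X), ...
        x = best(scored)                  # X_{i+1} = Best(S0, S1, ...)
    return x


# End-user fill-ins for simple hill climbing on a toy objective.
def objective(x: int) -> float:
    return -(x - 7) ** 2  # maximized at x = 7


def make_scorer(delta: int) -> Callable[[int], Scored]:
    def score(x: int) -> Scored:
        candidate = x + delta
        return candidate, objective(candidate)
    return score


def pick_highest(scored: List[Scored]) -> int:
    return max(scored, key=lambda s: s[1])[0]


result = iterative_optimizer(
    x0=0,
    scorers=[make_scorer(-1), make_scorer(0), make_scorer(1)],
    best=pick_highest,
    steps=20,
)
```

Swapping `pick_highest` for a rule that sometimes accepts a worse candidate would turn the same loop into a simulated-annealing-style search, which is the sense in which the fill-ins define the algorithm.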

Page 10: Making FPGAs a  Cost-Effective Computing Architecture

Familiar SW Development Style

- Standard design pattern*
  - Template Method or Strategy
- Event-driven ‘inverted’ flow of control
  - Sequence and synchronization outside of app.
  - System calls app-specific logic when needed
  - Widely used for GUI, web applications
- Good match to object oriented design style
  - System refers to abstract application interface
  - Application provides concrete logic

*Gamma et al., ‘Design Patterns’
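
For readers less familiar with the pattern, here is a small Python sketch of the inverted flow of control described above. The "system" owns sequencing and refers only to an abstract interface; the application supplies concrete logic. The class and function names (`Application`, `run_system`, `CountAboveThreshold`) are illustrative assumptions, not part of the authors' tools.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Application(ABC):
    """Abstract application interface that the system refers to."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def on_item(self, item: int) -> None: ...

    @abstractmethod
    def finish(self) -> int: ...


def run_system(app: Application, stream: Iterable[int]) -> int:
    """System side: sequence and synchronization live here, outside the app."""
    app.start()
    for item in stream:
        app.on_item(item)  # system calls app-specific logic when needed
    return app.finish()


class CountAboveThreshold(Application):
    """Concrete application logic; contains no control flow of its own."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.count = 0

    def start(self) -> None:
        self.count = 0

    def on_item(self, item: int) -> None:
        if item > self.threshold:
            self.count += 1

    def finish(self) -> int:
        return self.count


hits = run_system(CountAboveThreshold(10), [5, 12, 30, 7])
```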

Page 11: Making FPGAs a  Cost-Effective Computing Architecture

Preliminary Results

- Computational chemistry: 3D correlation
  - Systolic array for direct correlation
  - Speedup: 400x – 1000x relative to PC (using FFT)
- Microarray analysis
  - Regression analysis of disease vs. healthy state
  - Speedup: ~1000× relative to PC
- Approximate string matching
  - Dynamic programming – Smith-Waterman
  - 2.23 – 9.68 × 10^9 character comparisons/sec
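
As a point of reference for the string-matching result, the Smith-Waterman recurrence can be written in a few lines of software. This is a plain sequential Python sketch, not the FPGA design; the scoring values (match, mismatch, linear gap penalty) are assumptions for illustration, since the slides do not give the ones used. Each inner-loop iteration corresponds to one character comparison.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Return the best local-alignment score between a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, since each cell depends only on its
    upper, left, and upper-left neighbors.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
one_gap = smith_waterman_score("ACGT", "AGT")
```

Because a cell needs only its three neighbors, all cells on one anti-diagonal are independent of each other, which is what makes the algorithm a natural fit for a hardware array.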

Page 12: Making FPGAs a  Cost-Effective Computing Architecture

Work in Progress

- XML representation for models
  - Define abstract application interface
  - Define HW in terms of abstract interface
  - Define abstract FPGA resources & model constraints
- Concrete representation of application logic
  - Create concrete application logic
  - Create concrete FPGA resource description
  - Bind concretions to abstract model
- Create synthesizable output
  - Repeatable elements scaled to actual FPGA resource limits
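
One way to picture "scaled to actual FPGA resource limits" is as a simple budget calculation: replicate an element until the scarcest resource runs out. The Python sketch below is only an assumption about how such scaling might be computed; the function name, the resource names, and all per-element costs are hypothetical, and the device totals are rounded from the CPU vs. FPGA slide.

```python
from typing import Dict


def max_replicas(available: Dict[str, int],
                 fixed_cost: Dict[str, int],
                 per_element_cost: Dict[str, int]) -> int:
    """Largest number of identical elements that fits on the device.

    available:        total units of each resource on the FPGA
    fixed_cost:       units consumed once by non-repeated logic
    per_element_cost: units consumed by each repeated element
    """
    limits = []
    for resource, per_element in per_element_cost.items():
        if per_element <= 0:
            continue  # this resource does not constrain replication
        free = available[resource] - fixed_cost.get(resource, 0)
        limits.append(max(0, free // per_element))
    if not limits:
        raise ValueError("no resource constrains the element count")
    return min(limits)


# Device totals rounded from the slides; all costs are made-up examples.
device = {"multipliers": 400, "cells": 100_000, "rams": 400}
overhead = {"cells": 10_000, "rams": 16}
element = {"multipliers": 3, "cells": 500, "rams": 2}

replicas = max_replicas(device, overhead, element)
```

With these illustrative numbers the multipliers are the binding constraint, so the same model bound to a device with more multipliers would scale to a larger array without any change to the application logic.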