![Page 1: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/1.jpg)
Making FPGAs a Cost-Effective Computing Architecture
Tom VanCourt, Yongfeng Gu, Martin Herbordt
Boston University
![Page 2: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/2.jpg)
21 Jan 2005 FPGAs for Computing
FPGAs as Compute Engines
- Proven successful
  - 1000x speedup in bioinformatics, computational chemistry, etc.
  - Accelerator board fits standard PC backplane
  - Off-the-shelf availability, modest HW cost
- What makes them work so well
  - >400 memory busses, 3 Tbit/sec* total bandwidth
  - Massive parallelism, fast on-chip communication
- So why doesn’t everybody use them?

*Xilinx XC2VP100
![Page 3: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/3.jpg)
Field Programmable Gate Arrays
- What is an FPGA?
  - A bag of uncommitted computer parts
  - No defined function: can be whatever you want
- Why is programming a problem?
  - CPU runs the application, FPGA is the application
  - All of what’s hard in programming, only more so
- How are we changing the model?
  - Separating circuit design from application logic
  - Addressing specific application areas
![Page 4: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/4.jpg)
CPU vs. FPGA Hardware

| | CPU has … | FPGA has … |
|---|---|---|
| Arithmetic & logic | 1-10 pipelines; fixed data width | > 400 HW multipliers; ~100K function cells |
| Registers & memory | Fixed reg. array; 0-4 caches | ~200K reg. bits; > 400 cache RAMs |
| Connectivity & communication | Fixed datapath; 1 ext. data bus; 1-32 dedicated I/O | Arbitrary data path; >1000 data I/O pins; 20 links, 3-10 Gbit |
| Process technology | Incremental growth; process limited | Exponential growth; process driver |
![Page 5: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/5.jpg)
Programming Skills vs. FPGAs

| CPU model | FPGA model |
|---|---|
| Single-threading: no synchronization; for/if/switch control | Massive parallelism: visible timing relations; state machine/hardwired |
| Incremental execution: one instruction at a time; results are immediate | Pipelined execution: all operations active; visible dependencies |
| Common parallelization: large units of work; costly communication | Parallelism model: fine grain (one ALU op); cheap on-chip comm. |
![Page 6: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/6.jpg)
Attempts to Date
- Hardware description languages
  - Unfamiliar control & resource models
- Graphical design entry: tedious
  - Example: X = 3*Y + 5*Z
- Standard programming languages
  - Good SW structure ≠ good HW structure
- Semantic gap works both ways
  - Good HW designers aren’t application experts
![Page 7: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/7.jpg)
Requirements for a Solution
- Acknowledge SW and HW skills separately
  - HW expertise: system interface, memory structure, synchronization, computation arrays
  - SW expertise: problem origination, data manipulation, algorithm variations, exploration
  - Allow ‘normal’ representations to both
- Eliminate dependencies between HW and SW
- Balance generality vs. domain specifics
  - Create multiple levels of generality
![Page 8: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/8.jpg)
Behavior as a Parameter
- Reusable structure
  - Standard HW reuse: prefab leaf components + custom connectivity
  - Required HW reuse: prefab connectivity + custom leaves
- Go beyond VHDL parameterization
  - Not just data values as parameters
  - Behavior as parameter
  - Like C library’s qsort(data[], compare())
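
The qsort analogy can be illustrated in software. The following is a minimal sketch in Python rather than C, assuming `functools.cmp_to_key` as a stand-in for qsort's compare() argument; `sort_with`, `by_length`, and `descending` are hypothetical names chosen for illustration and do not come from the slides.

```python
from functools import cmp_to_key

def sort_with(data, compare):
    """Fixed, reusable structure: the sort itself never changes.
    The caller-supplied `compare` is the behavior parameter."""
    return sorted(data, key=cmp_to_key(compare))

def by_length(a, b):
    # Negative if a sorts first, positive if b sorts first, zero if equal.
    return len(a) - len(b)

def descending(a, b):
    # Larger values sort first.
    return (b > a) - (b < a)

print(sort_with(["gate", "fpga", "array", "io"], by_length))
print(sort_with([3, 1, 2], descending))
```

The same `sort_with` structure serves both calls; only the supplied behavior differs, which is the kind of reuse the slide asks of hardware.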
![Page 9: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/9.jpg)
Reusing Control, not Function
- Example: Iterative Optimization
  - Logic designer provides:
    - Parameterized logic model
  - End user provides:
    - X0: initial candidate
    - Fj(X): score next candidate solution j
    - Best(S0, S1, …): select solution[s] based on score[s]
- Fill-ins define the search algorithm
  - Hill climbing, Gibbs sampling, simulated annealing, …
[Figure: iteration loop. X0 initializes Xi; F0(X), F1(X), … score candidates; Xi+1 = Best(S0, …)]
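
A software analogue of this split might look like the sketch below. It assumes each Fj returns a (candidate, score) pair and that Best receives the list of such pairs; the slides do not specify these signatures, and `iterate`, `objective`, and the three proposal functions are hypothetical names. The fill-ins shown implement simple hill climbing on a toy integer objective.

```python
from typing import Callable, List, Tuple

Scored = Tuple[int, float]  # (candidate, score); an assumed representation

def iterate(x0: int,
            score_fns: List[Callable[[int], Scored]],
            best: Callable[[List[Scored]], int],
            steps: int) -> int:
    """Reusable control: owns the loop, knows nothing about the problem."""
    x = x0
    for _ in range(steps):
        scored = [f(x) for f in score_fns]  # F0(X), F1(X), ...
        x = best(scored)                    # X(i+1) = Best(S0, S1, ...)
    return x

# End-user fill-ins: hill climbing toward the maximum of a toy objective.
def objective(x: int) -> float:
    return -(x - 7) ** 2

def stay(x: int) -> Scored:
    return x, objective(x)

def left(x: int) -> Scored:
    return x - 1, objective(x - 1)

def right(x: int) -> Scored:
    return x + 1, objective(x + 1)

def pick_highest(scored: List[Scored]) -> int:
    return max(scored, key=lambda cs: cs[1])[0]

result = iterate(0, [stay, left, right], pick_highest, steps=20)
print(result)
```

Swapping `pick_highest` for a randomized selection rule would turn the same loop into a different search method without touching `iterate`.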
![Page 10: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/10.jpg)
Familiar SW Development Style
- Standard design pattern*
  - Template Method or Strategy
- Event driven ‘inverted’ flow of control
  - Sequence and synchronization outside of app.
  - System calls app-specific logic when needed
  - Widely used for GUI, web applications
- Good match to object oriented design style
  - System refers to abstract application interface
  - Application provides concrete logic

*Gamma et al., ‘Design Patterns’
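
The inverted flow of control can be sketched in a few lines. This is a Strategy-style illustration under assumptions, not the authors' framework: the `Application` interface, its three methods, the `run` driver, and `CountMatches` are all hypothetical names.

```python
from abc import ABC, abstractmethod

class Application(ABC):
    """Abstract application interface that the system refers to."""

    @abstractmethod
    def start(self):
        """Return the initial state."""

    @abstractmethod
    def on_item(self, state, item):
        """Return the new state after consuming one item."""

    @abstractmethod
    def finish(self, state):
        """Return the final result."""

def run(app: Application, stream):
    """System side: owns sequencing and calls app logic when needed."""
    state = app.start()
    for item in stream:
        state = app.on_item(state, item)
    return app.finish(state)

class CountMatches(Application):
    """Concrete application logic: count occurrences of one symbol."""

    def __init__(self, target):
        self.target = target

    def start(self):
        return 0

    def on_item(self, state, item):
        return state + (1 if item == self.target else 0)

    def finish(self, state):
        return state

print(run(CountMatches("A"), "GATTACA"))
```

The application never decides when it runs; `run` does, which mirrors how the slide places sequence and synchronization outside the application.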
![Page 11: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/11.jpg)
Preliminary Results
- Computational chemistry: 3D correlation
  - Systolic array for direct correlation
  - Speedup: 400x to 1000x relative to PC (using FFT)
- Microarray analysis
  - Regression analysis of disease vs. healthy state
  - Speedup: ~1000x relative to PC
- Approximate string matching
  - Dynamic programming: Smith-Waterman
  - 2.23 to 9.68 × 10^9 character comparisons/sec
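
For readers unfamiliar with the string-matching kernel, the sketch below shows the Smith-Waterman recurrence in plain software form. It assumes a linear gap penalty and illustrative scoring values (match 2, mismatch -1, gap 1); the slides do not state the scoring scheme used, and this is not the FPGA implementation. Each inner-loop iteration is one character comparison.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = 1) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Keeps only two rows of the dynamic programming table, since each
    cell depends on its upper, left, and upper-left neighbors.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                 # start a new local alignment
                          prev[j - 1] + s,   # align a[i-1] with b[j-1]
                          prev[j] - gap,     # gap in b
                          curr[j - 1] - gap) # gap in a
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("XXGATTACAXX", "GATTACA"))
print(smith_waterman_score("ACGT", "ACT"))
```

Cells on the same anti-diagonal of the table do not depend on one another, which is what makes this recurrence a natural fit for fine-grained parallel hardware.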
![Page 12: Making FPGAs a Cost-Effective Computing Architecture](https://reader035.vdocuments.mx/reader035/viewer/2022072015/5681301b550346895d9596ec/html5/thumbnails/12.jpg)
Work in Progress
- XML representation for models
  - Define abstract application interface
  - Define HW in terms of abstract interface
  - Define abstract FPGA resources & model constraints
- Concrete representation of application logic
  - Create concrete application logic
  - Create concrete FPGA resource description
  - Bind concretions to abstract model
- Create synthesizable output
  - Repeatable elements scaled to actual FPGA resource limits
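
One way to picture the scaling step is as a minimum over resource budgets. The sketch below is an assumption about how such a calculation could work, not a description of the authors' tool; `replication_count`, the resource names, and every number are hypothetical and chosen only for illustration.

```python
def replication_count(available: dict, per_element: dict) -> int:
    """How many copies of a repeatable element fit on the device.

    The scarcest resource sets the limit. Resources that the element
    does not consume (cost 0) are ignored.
    """
    return min(available[name] // cost
               for name, cost in per_element.items()
               if cost > 0)

# Hypothetical device budget and per-element cost (illustrative only).
device = {"multipliers": 400, "block_rams": 400, "cells": 100_000}
element = {"multipliers": 2, "block_rams": 1, "cells": 900}

print(replication_count(device, element))
```

In this made-up case the logic cells run out first, so the array would be sized by cell count rather than by multipliers or RAMs.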