Commercial In Confidence. Copyright ©2005, Nallatech.1
Is it time for Von Neumann and Harvard to Retire?
Presented by: Allan Cantle – CEO
www.nallatech.com

Agenda
» History & Commercial Realities of FPGA Computing
» Thoughts on the future possibilities for FPGA Computing
  » FPGA Coprocessor vs FPGA Main Processor
  » Optimizing Spatial & Temporal Demands of Computing Problems
  » Homogeneous vs Heterogeneous vs Polymorphic Computing
  » Coarse Grain vs Fine Grain Architectures
  » Distributed Parallel Processing vs Clustered Parallel Processing
» Should Von Neumann and Harvard Architectures be retired?
» Summary

Introduction
» FPGAs … a 20 year History!
» From “Glue Logic” beginnings to complete “Systems on a Chip” today
» Mathematical functions in FPGAs since 1993
» FPGAs pervasive in virtually all electronics equipment
» Many different perspectives on FPGA capability
» This leads to confusion in the market place and mixed messages

FPGA Perceptions
[Diagram: Microprocessor, DSP, FPGA and ASIC, with axis labels “Easy to Program, Flexible”, “Difficult to Program, Fixed Function”, “Low Performance” and “High Performance”]
When an FPGA is viewed as an ASIC, the observer historically saw:
» Low performance
» Wasted Transistors
» Higher Power Consumption
» Good for prototyping
» Never Use in Production
» High Cost

FPGA Perceptions
These were the original views of FPGAs and they have STUCK in many people's minds.

FPGA Perceptions
When an FPGA is viewed as a DSP, today the observer sees:
» Co-Processor
» High Performance
» I/O Interface
» Niche

FPGA Perceptions
When an FPGA is viewed as a processor, Nallatech sees:
» Main Processor
» High Performance
» Floating Point
» Lower Power Consumption
» Increased Performance Density
» Immature Tools

Have It Your Way!
[Diagram: Microprocessor, FPGA and ASIC, with axis labels “Easy to Program, Flexible”, “Difficult to Program, Fixed Function”, “Low Performance” and “High Performance”]

HPEC Vs HPC
» Earlier FPGA adoption within HPEC Community
» FPGA based HPEC is a volume based commercial reality today
» High Performance Embedded Computing (HPEC)
  » Users have an appreciation of underlying hardware technology
  » Low level of programming abstraction
  » Applications are often severely SWAP (size, weight and power) restricted
» High Performance Computing (HPC)
  » Users are far more software centric
  » Programming is achieved using high level software languages
  » Exclusive use of Floating Point arithmetic
  » Focus on ease of implementation of complex algorithms

1993 – Simple Maths Functions within FPGA
» Simple Mathematical Functions
  » 2 bit arithmetic function per Logic Slice
  » Effective for 1D Pipelined data paths
  » Very Basic Functions: highly repetitive and Data Intensive
  » Schematic based hardware design
» XC4006 Series FPGAs
  » 256 Logic Slices
  » 128 max user I/O

Nallatech’s Adoption of FPGAs for HPEC
» 1990, Pre Nallatech: Microprocessor only
» 1993, Professional Consulting Services: Microprocessor based, FPGA coprocessor

Nallatech’s Early FPGA HPEC Computing Experience - 1993
» Real time, ultra low latency, Imaging Simulator
» Embedded Distributed Processing System
» Floating point, matrix transformations, convolution, sensor interfacing etc
[Diagram: distributed processing system built from FPGA, T9, I860 and C80 devices, with stages labelled Sensor Interface, Gain + Offset Control, Convolution, Image Composition, Target Image Generation and Background Image Processing]

1998 – Virtex FPGA Family
» Revolutionary Xilinx Virtex FPGA family introduced
  » 32 x 4Kbit Block RAMs + other mathematical features introduced
  » Allowed 2D mathematical algorithms to be implemented
  » Excellent for Image Processing Algorithms
» Significant DSP capability
» Virtex
  » 12,000 Logic Slices
  » 804 max user I/O
  » 32 4Kbit Block RAMs

Nallatech’s Adoption of FPGAs for HPEC
» 1990, Pre Nallatech: Microprocessor only
» 1993, Professional Consulting Services: Microprocessor based, FPGA coprocessor
» 1998, DIME Product Family, FPGA Centric Computing Architecture: FPGA based, Microprocessor coprocessor

2001 – Virtex-II FPGA Family
» Virtex-II FPGA introduced, followed by Virtex-II Pro in 2003
» 444 18x18 Multipliers & 18Kbit Block RAMs introduced
» Gbit Serial I/O Communications & PowerPC Processors introduced
» Complex Floating Point Algorithm Implementation now possible
» Virtex-II / Pro
  » 44,000 Logic Slices
  » 444 18Kbit BRAMs
  » 444 18x18 Multipliers
  » 2 PowerPC Processors
  » 20 Gbit I/O
  » 1164 Max User I/O

What This Means for HPC
Parameter | Microprocessor (Itanium 2) | FPGA (Virtex 2VP100)
Technology | 0.13 Micron | 0.13 Micron
Clock Speed | 1.6GHz | 180MHz
Internal Memory Bandwidth | 102 GBytes per Sec | 7.5 TBytes per Sec
# Processing Units | 5 FPU (2 MACs + 1 FPU) + 6 MMU + 6 Integer Units | 212 FPU or 300+ Integer Units or ...
Power Consumption | 130 Watts | 15 Watts
Peak Performance | 8 GFLOPS | 38 GFLOPS
Sustained Performance | ~2 GFLOPS | ~19 GFLOPS
I/O / External Memory Bandwidth | 6.4 GBytes/sec | 67 GBytes/sec
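The power and performance rows of this comparison can be combined into a performance-per-watt figure. The short Python sketch below does only that arithmetic; the numbers are copied from the slide and have not been independently checked, and the dictionary and function names are illustrative.

```python
# Figures copied from the slide's comparison table (not independently verified).
chips = {
    "Itanium 2": {"power_w": 130.0, "peak_gflops": 8.0, "sustained_gflops": 2.0},
    "Virtex 2VP100": {"power_w": 15.0, "peak_gflops": 38.0, "sustained_gflops": 19.0},
}


def gflops_per_watt(chip: dict, key: str) -> float:
    """Return GFLOPS per watt for the chosen performance figure."""
    return chip[key] / chip["power_w"]


def fpga_advantage(key: str) -> float:
    """Ratio of FPGA to microprocessor GFLOPS per watt for the chosen figure."""
    cpu = gflops_per_watt(chips["Itanium 2"], key)
    fpga = gflops_per_watt(chips["Virtex 2VP100"], key)
    return fpga / cpu


for key in ("peak_gflops", "sustained_gflops"):
    for name, chip in chips.items():
        print(f"{name:>14} {key:>17}: {gflops_per_watt(chip, key):.3f} GFLOPS/W")
    print(f"FPGA advantage ({key}): {fpga_advantage(key):.1f}x")
```

On the slide's own figures this works out to roughly a 41x advantage in peak GFLOPS per watt and roughly 82x in sustained GFLOPS per watt.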

Nallatech’s Adoption of FPGAs for HPEC
» 1990, Pre Nallatech: Microprocessor only
» 1993, Professional Consulting Services: Microprocessor based, FPGA coprocessor
» 1998, DIME Product Family, FPGA Centric Computing Architecture: FPGA based, Microprocessor coprocessor
» 2001, DIME-II Product Family, FPGA Centric Computing Architecture: FPGA, Microprocessor embedded

FPGA Computing: The Whole Solution
[Diagram: Hardware Platform, System Software, Systems Communications]
» Inter-FPGA Communication
» Abstracts Hardware Architecture
» System Management and control
» APIs
» COTS Hardware
» Modular
» Multiple-FPGA Systems

FPGA Communications and Tool Support
[Diagram: a PCI Host connected over a Physical Link to FPGAs; network elements labelled N1 to N6, R, E and B connect VHDL, MATLAB, Viva and C blocks, Memory and Processors. Captions: Block Flows, C Flows, Open 3rd Party Component Support]

Accelerated Hardware Implementation
[Diagram: a PCI Host connected over a Physical Link to FPGAs; network elements labelled N1 to N6, R, E and B connect VHDL, MATLAB, Viva and C blocks, Memory and Processors. Captions: Block Flows, C Flows, 3rd Party Component Support]

Commercial Realities for HPC
» Not a Panacea – As with all parallelisation
» Translation of Legacy Code
[Diagram: a Legacy Code C Program and its FPGA Translated version. Execution Time bars on a 0% to 100% scale compare the code run On Processor with a uP / FPGA Partition split into uP Executed and FPGA Executed parts, subject to Bandwidth & Latency Considerations]
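The "not a panacea" point can be made concrete with Amdahl's law. The Python sketch below is an illustration rather than anything taken from the slides: it assumes the legacy code's runtime splits into a part that stays on the microprocessor and a part moved to the FPGA, and it models the bandwidth and latency cost as a fixed extra fraction of the original runtime. The function name and all parameters are hypothetical.

```python
def partitioned_speedup(fpga_fraction: float,
                        kernel_speedup: float,
                        transfer_fraction: float = 0.0) -> float:
    """Overall speedup of a uP / FPGA partition (Amdahl's law plus overhead).

    fpga_fraction     -- share of the original runtime moved to the FPGA (0..1)
    kernel_speedup    -- how much faster that share runs on the FPGA
    transfer_fraction -- assumed data-movement cost, as a share of the
                         original runtime (a simplification for illustration)
    """
    if not 0.0 <= fpga_fraction <= 1.0:
        raise ValueError("fpga_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_fraction < 0.0:
        raise ValueError("transfer_fraction must not be negative")

    # New runtime, normalised so that the original runtime equals 1.0.
    new_time = (1.0 - fpga_fraction) + fpga_fraction / kernel_speedup + transfer_fraction
    return 1.0 / new_time


# A 100x kernel only helps as far as the code left on the processor allows.
print(f"{partitioned_speedup(0.5, 100.0):.2f}x")        # half the runtime offloaded
print(f"{partitioned_speedup(0.9, 100.0):.2f}x")        # 90% offloaded
print(f"{partitioned_speedup(0.9, 100.0, 0.05):.2f}x")  # 90% offloaded, 5% transfer cost
```

Under these assumptions, even a 100x faster kernel gives less than 2x overall when only half the runtime is offloaded, which is why the partition point and the transfer cost matter as much as raw FPGA speed.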

Commercial Realities for HPC
» Maturity of High Level Languages
  » Good progress has been made in FPGA compilers
  » Often a trade off between performance and ease of use
» Parallelising of code
  » Fine Grain parallelism is critical
  » Still not automatic
  » Taking implied parallelism from serial code is NOT good enough
  » HPC Software engineers are well qualified to deal with this
» Development & Debug Time
  » Comparable to programming in assembler vs C
  » Biggest Hurdle is the Synthesis times: hours to days!
  » Where tools can make a significant impact, if they have no bugs!

Commercial Realities for HPC
» No Real Industry Standardisation
  » Requires expertise to “brew your own solutions”
  » Difficulty for Beginners
» Bang for Buck – for Floating Point Implementations
  » NRE today WILL be more expensive, ~5-10 times
  » Can approach 100 times performance for 1/2th the Cost
  » Significantly reduced SWAP, >200 times
  » Resulting in significant Cost of Ownership savings

Nallatech’s Adoption of FPGAs for HPEC
» 1990, Pre Nallatech: Microprocessor only
» 1993, Professional Consulting Services: Microprocessor based, FPGA coprocessor
» 1998, DIME Product Family, FPGA Centric Computing Architecture: FPGA based, Microprocessor coprocessor
» 2001, DIME-II Product Family, FPGA Centric Computing Architecture: FPGA, Microprocessor embedded
» 2003, FPGA Based HPC Solutions: FPGA

Algorithm Acceleration
Seismic Processing - Kirchhoff algorithm
- Single Precision Floating Point
- 64 times faster than a 2GHz Pentium 4
- 200 times less power consumption
Smith-Waterman
- Dynamic Programming Algorithm used in Biological Sequencing
- 155 times faster than SunFIRE 280R processing unit
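For readers unfamiliar with Smith-Waterman, the Python sketch below shows the textbook local-alignment recurrence with a linear gap penalty. It is a plain software reference and not the FPGA implementation behind the figure quoted above; the scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (textbook form, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] is the best score of an alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay at zero.
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                                # start a new local alignment
                h[i - 1][j - 1] + substitution,   # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,                # gap in b
                h[i][j - 1] + gap,                # gap in a
            )
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper and upper-left neighbours, so every cell on the same anti-diagonal can be computed at the same time. That regular, fine-grain parallelism is the property that makes the algorithm a natural fit for hardware pipelines.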

Algorithm Acceleration
Real Time Video Processing
- Single Precision Floating Point calculations
- 36 GFLOPS + 40 GOPS sustained performance on a single PCI card
- >200 times power reduction over Xeon
Gravity Simulation
- N-Body computation
- Single Precision Floating Point
- 20 GFLOPS sustained performance
- 100 times faster than 2.4GHz Pentium 4 CPU

A Young Solar System

Colliding Galaxies

FPGA Coprocessor Vs FPGA Main Processor

FPGAs as Coprocessors
» Accelerator / Offload Engine for Microprocessor based solutions
» Advantages
  » Easy to conceptualise
  » Pragmatic Approach
  » Possible large performance improvement for least effort
  » Port small, compute intensive functions of Legacy Code
  » Rest of code remains on Existing Host
  » Benefit from Existing Host interfaces
» Disadvantages
  » Only Applicable to certain functions
  » Need to consider bandwidth / latency requirements

FPGAs as Main Processor
» The FPGA takes on the complete Compute Function
» Advantages
  » Build the Computing Architecture around the Algorithmic Problem
  » Can provide another order of magnitude increase in performance
  » Can go back to First principles
  » Don’t have to port Optimised processor code to FPGA
» Disadvantages
  » Rarely Start with a clean sheet of paper
  » Tool Maturity
  » Only practical for relatively straightforward algorithms

Main Vs Co - Recommendations
» Co-processor approach is most applicable Today:
  » For HPC
  » Whenever a Man Machine interface is required
  » Whenever low performance industry standard interfaces are required
  » If you need to work with legacy code
  » Quick wins
» A Main Processor Approach is recommended Today:
  » For Stand alone Embedded applications (HPEC)
  » When starting with a clean sheet of paper
  » If ultimate performance is a pre-requisite
  » The best power/performance ratio is required from your system
  » Relaxed development times

Optimizing Spatial & Temporal Demands of Computing Problems

Spatial & Temporal Definitions
» There are several perspectives on the meaning of spatial and temporal
1. Cluster of Microprocessors
  » Temporal = Function runs within one processor
  » Spatial = Function spread across many microprocessor nodes
2. Traditional Embedded Computing Hardware
  » Temporal = using Microprocessor
  » Spatial = Implementing dedicated ASIC accelerators
3. FPGAs
  » Temporal = using same logic resources for multiple Functions
  » Spatial = Paralleling and pipelining a function across the FPGA Fabric
» Ultimate Aim is to ensure that no processor goes idle
» And you utilise all the available resources
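The FPGA definition above can be illustrated with a simple cycle count. The Python sketch below is a deliberately idealised model and not something taken from the slides: it assumes every stage of a function takes exactly one clock cycle, that a temporal design reuses a single unit for every stage, and that a spatial design gives each stage its own unit in a pipeline with no stalls. The function names are hypothetical.

```python
def temporal_cycles(items: int, stages: int) -> int:
    """One shared unit executes every stage of every item, one after another."""
    return items * stages


def spatial_cycles(items: int, stages: int) -> int:
    """One dedicated unit per stage, pipelined.

    The first result appears after `stages` cycles (pipeline fill), then one
    further result completes on every following cycle.
    """
    if items == 0:
        return 0
    return stages + items - 1


items, stages = 1000, 8
print("temporal:", temporal_cycles(items, stages), "cycles using 1 unit")
print("spatial: ", spatial_cycles(items, stages), "cycles using", stages, "units")
```

In this model the spatial design finishes about `stages` times sooner but occupies `stages` times the logic, which is the trade-off the slide describes: the aim is to pick, per function, the mix that keeps all available resources busy.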

Complexity and Speed
» Any Application will be constructed from several functions
» Each Function will have varying degrees of complexity
» Each Function will also have varying demands on its execution time
[Diagram: functions placed on two axes, Simple to Complex and Can Execute Slowly to Must Execute Quickly, in regions labelled Low Compute, Medium Compute and Compute Intensive]
Implementation | Traditional | FPGA
Fully spatial implementation | ASIC | All parallel / pipelined
Balanced spatial and temporal implementation | Vector | Partially parallel, partial reuse
Fully temporal implementation | Microprocessor | Soft Microprocessor

Homogeneous Computing Versus Heterogeneous Computing Versus Polymorphic Computing

Direction of Computing
» Cell Processor – IBM, Sony, Toshiba: Heterogeneous Computing on Silicon (PPC plus Cell 1 to Cell 8)
» SGI – Heterogeneous Architecture: Global Shared Memory connecting uP, GP, FPGA and ASSP devices
» Intel – 16 processors per die: Homogeneous on Silicon
» Polymorphic Computing

Polymorphic Computing
[Chart: SWEPT Efficiency (Size, Weight, Energy, Performance, Time) against Application Data Types (Symbolic, Vector/Streaming, Bit Level), with curves labelled FPGA, Processor and Polymorphic]

Polymorphic Computing
[Chart: SWEPT Efficiency (Size, Weight, Energy, Performance, Time) against Application Data Types (Symbolic, Vector/Streaming, Bit Level), with curves labelled FPGA, Processor, Cell Processor and RP (e.g. Clearspeed, Elixent, Picochip); annotations: “Support for FPU?” and “Support for DSP”]

What is a Polymorphic Processor?
[Diagram: Microprocessor, DSP Unit, Floating Point Operator, Integer Operator, Logical Operator]

A Polymorphic FPGA
» FPGA is the closest concept to Polymorphic computing
  » It can morph into the different operators
  » However it cannot perform them all with equal efficiency
» Is a polymorphic coarse grained FPGA possible?
[Diagram legend]
» Polymorphic processing element, which can morph into: Microprocessor, Integer Operator, DP/SP Floating Point Operator, Logical Operator, Text Operator
» Traditional FPGA Fabric

Coarse Grain Architectures Versus Fine Grain Architectures

Coarse vs Fine Grain
» Cluster = Ultimate in coarse grain parallelism
» ASIC = Ultimate in fine grain parallelism
» FPGA = Programmable fine grain parallelism
» The finer the grain, the more you can make the architecture exactly fit the problem
» However, Fine Grain Programmable FPGAs are a sub-optimal solution as they suffer from
  » An inefficient transistor utilisation on coarser grain operations
  » A slower clock frequency that could be improved with coarser granularity

Distributed Parallel Processing (DPP) Vs Cluster Parallel Processing (CPP)

DPP & CPP Definitions
Distributed Parallel Processing
» Processing Power is distributed to where it is needed
» Direct Communications built as needed
» Computing Architecture designed to fit the Application
[Diagram: the 1993 imaging simulator, built from FPGA, T9, I860 and C80 devices, with stages labelled Sensor Interface, Gain + Offset Control, Convolution, Image Composition, Target Image Generation and Background Image Processing]
Cluster Parallel Processing
» Regular processor Architecture
» Regular communications Infrastructure
» Application must be designed to fit the computer architecture
[Diagram: Server nodes attached to a common Communications Infrastructure]

Application Implementation
» Example application consisting of 8 algorithms
» Need to map onto hardware for real-time implementation
» Algorithms each have different characteristics
[Diagram: Algorithm A to Algorithm H]

Fitting Application to Cluster
[Diagram: the Application (Algorithm A to Algorithm H) alongside a Cluster whose nodes share a Communications Infrastructure]

Fitting Application to Distributed FPGA Computer
» VME Blade form-factor
» Five high-density platform FPGAs
» High-speed external analog interfaces
» High-speed synchronous SRAM memory
» Gigabit Ethernet interface

Same Application Implemented on FPGAs
[Diagram: Algorithm A to Algorithm H distributed across five FPGAs with VME and Gbit Ethernet interfaces, each algorithm written in one of: C (PicoBlaze uP), C (MicroBlaze uP), VHDL, Verilog, MATLAB or Simulink]

Communications network to connect algorithms
[Diagram: the five FPGAs with VME and Gbit Ethernet interfaces, joined by a communications network of elements labelled N, R, B and E]

So, should Von Neumann and Harvard Architectures be retired?

Should they Retire?
» Von Neumann and Harvard provide highly efficient use of silicon real estate whilst still being capable of executing any computational function
» Therefore perhaps they should still live on
  » However this will be less and less in a hard chip implementation
  » The Intelligent compiler will instantiate a Von Neumann or Harvard like architecture when it is the most efficient way to execute an algorithm
Von Neumann and Harvard will live on as part of the intelligence within tomorrow’s Compilers.

Summary
» Using FPGAs for computing is not new
  » 12 Years accelerating maths functions
» Floating Point & Tools make FPGAs viable for HPC Community
» No coherent Industry Standardisation
» Code development WILL take longer
» Significant potential savings
  » Price/Performance
  » SWAP
  » Cost of Ownership

And Finally…
SGI & Nallatech have formally agreed on a Strategic Collaborative Arrangement.
This brings together 12 years of expertise in delivering Real FPGA computing solutions from Nallatech with the Global Shared Memory MPP computing from SGI.
Customers now have a path to scale from a commodity cluster with FPGAs all the way up to a massive HPC system with thousands of Processors and thousands of FPGAs.

Thank You for your attention
www.nallatech.com
Copyright © 2005 Nallatech Limited. All rights reserved. Nallatech, the Nallatech logo, the triangles device and “The High Performance FPGA Solutions Company” are trademarks of Nallatech Limited. All other trademarks acknowledged.