FPGA Boards and FPGA-based Supercomputers
ECE 448 FPGA and ASIC Design with VHDL
Resources
PCI
http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect
PCI-X
http://en.wikipedia.org/wiki/PCI-X
Reconfigurable Supercomputing
T. El-Ghazawi, K. Gaj, D. Buell, D. Pointer
Tutorial at the Supercomputing 2005 conference
http://hpcl.seas.gwu.edu/openfpga/tutorial_html/index.html
FPGA Device Capacity Trends
[Figure: Xilinx device complexity over time. Year axis: 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006]
Device          Clock     Capacity
XC2000          50 MHz    1K gates
XC3000          85 MHz    7.5K gates
XC4000          100 MHz   250K gates
XC5200          50 MHz    23K gates
Spartan         80 MHz    40K gates
Spartan-II      200 MHz   200K gates
Spartan-3       326 MHz   5M gates
Virtex          200 MHz   1M gates
Virtex-E        240 MHz   4M gates
Virtex-II       450 MHz   8M gates
Virtex-II Pro   450 MHz   8M gates*
Virtex-4        500 MHz   16M gates*
Virtex-5        550 MHz   24M gates*
Source: http://class.ece.iastate.edu/cpre583/lectures/Lect-01.ppt
FPGA Boards
General Architecture of an FPGA-Based Board
[Block diagram: processing elements PE#0, PE#1, ..., PE#N-1, each with its own local memory, attached to a common memory / interconnect network; a bus interface controller connects the board to the bus; the diagram also shows a clock (CLK) and an I/O card]
Reconfigurable Computing Boards (Accelerators)
Boards may have one or several interconnected FPGA chips
Support different bus standards, e.g. PCI, PCI-X, VME
May have direct real-time data I/O through a daughter board
Boards may have local onboard memory (OBM) to handle large data while avoiding the system bus (e.g. PCI) bottleneck
Reconfigurable Computing Boards (Accelerators) (cont.)
Many boards per node can be supported
A host program (e.g. in C) interfaces the user (and the μP) with the board via a board API
Driver API functions may include functionality such as Reset, Open, Close, Set Clocks, DMA, Read, Write, Download Configurations, Interrupt, Readback
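A typical host-side call sequence against such a driver API might look like the sketch below. Everything in it is hypothetical: `MockBoard`, its method names, and the clock value only mirror the function categories named on this slide and do not correspond to any vendor's actual board API. The "board" is simulated in memory so that the example runs anywhere.

```python
class MockBoard:
    """Hypothetical in-memory stand-in for an FPGA board driver API."""

    def __init__(self):
        self.is_open = False
        self.configured = False
        self.clock_mhz = None
        self.memory = {}  # simulated onboard memory: address -> word

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def reset(self):
        self.configured = False
        self.memory.clear()

    def set_clock(self, mhz):
        self.clock_mhz = mhz

    def download_configuration(self, bitstream):
        # A real driver would program the FPGA; here we only record that it happened.
        self.configured = len(bitstream) > 0

    def write(self, address, words):
        self._check_ready()
        for offset, word in enumerate(words):
            self.memory[address + offset] = word

    def read(self, address, count):
        self._check_ready()
        return [self.memory.get(address + i, 0) for i in range(count)]

    def _check_ready(self):
        if not (self.is_open and self.configured):
            raise RuntimeError("board must be opened and configured first")


def run_host_program(board, bitstream, data):
    """Host sequence: open, reset, set clock, configure, transfer data, close."""
    board.open()
    try:
        board.reset()
        board.set_clock(100)  # assumed clock value, for illustration only
        board.download_configuration(bitstream)
        board.write(0, data)
        return board.read(0, len(data))
    finally:
        board.close()


board = MockBoard()
result = run_host_program(board, b"\x01\x02", [10, 20, 30])
print(result)
```

The `try`/`finally` in `run_host_program` reflects a common design choice for host programs: the board is closed even if configuration or a transfer fails.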
Common Interface - PCI
PCI = Peripheral Component Interconnect
32-bit bus 64-bit bus
PCI - Conventional hardware specifications
32-bit or 64-bit bus width
33.33 MHz clock with synchronous transfers
peak transfer rate of 133 MB per second for 32-bit bus width (33.33 MHz × 32 bits × (1 byte ÷ 8 bits) = 133 MB/s)
peak transfer rate of 266 MB/s for 64-bit bus width
32-bit address space (4 gigabytes)
32-bit port space (now deprecated)
5-volt signaling
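The peak-rate arithmetic above generalizes to any synchronous parallel bus: clock rate times bus width, divided by 8 bits per byte. The short sketch below reproduces the two figures; the function name is only an illustrative choice.

```python
def peak_transfer_rate_mb_s(clock_mhz, bus_width_bits):
    """Peak rate in MB/s for a bus that moves one word per clock cycle."""
    return clock_mhz * bus_width_bits / 8  # 8 bits per byte


pci_32 = peak_transfer_rate_mb_s(33.33, 32)
pci_64 = peak_transfer_rate_mb_s(33.33, 64)
print(f"32-bit PCI: {pci_32:.1f} MB/s")
print(f"64-bit PCI: {pci_64:.1f} MB/s")
```

These are peak figures only. Sustained throughput on a shared bus is lower because of arbitration and protocol overhead.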
PCI-X (PCI eXtended)
PCI-X doubles the width to 64-bit, revises the protocol, and increases the maximum signaling frequency to 133 MHz (peak transfer rate of about 1064 MB/s, i.e. 1014 MiB/s)
PCI-X 2.0 permits a 266 MHz rate (peak transfer rate of about 2133 MB/s, i.e. 2035 MiB/s) and also a 533 MHz rate, expands the configuration space to 4096 bytes, adds a 16-bit bus variant and allows for 1.5 volt signaling
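The 1014 and 2035 figures for PCI-X only work out if a megabyte is taken as 2^20 bytes (MiB). In decimal megabytes the same 64-bit bus gives roughly 1064 and 2133 MB/s. The sketch below shows the conversion; it assumes the 266 MHz rate is nominally 266.67 MHz, which is what reproduces the 2035 figure.

```python
MIB = 2 ** 20  # bytes in one binary megabyte


def peak_rate_bytes_per_s(clock_hz, bus_width_bits):
    """Peak rate in bytes/s for a bus that moves one word per clock cycle."""
    return clock_hz * bus_width_bits / 8


pcix_133 = peak_rate_bytes_per_s(133e6, 64)
pcix_266 = peak_rate_bytes_per_s(266.67e6, 64)  # assumed nominal clock

for label, rate in [("PCI-X 133 MHz", pcix_133), ("PCI-X 266 MHz", pcix_266)]:
    print(f"{label}: {rate / 1e6:.0f} MB/s = {rate / MIB:.1f} MiB/s")
```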
Some Reconfigurable Board Vendors
ANNAPOLIS MICRO SYSTEMS, INC. (www.annapmicro.com)
University of Southern California - USC/ISI (http://www.east.isi.edu)
AMONTEC (www.amontec.com/chameleon.shtml)
XESS Corporation (www.xess.com)
CELOXICA (www.celoxica.com)
CESYS (www.cesys.com)
TRAQUAIR (www.traquair.com)
SILICON SOFTWARE (www.silicon-software.com)
COMPAQ (www.research.compaq.com/SRC/pamette/)
ALPHA DATA (www.alpha-data.com)
Associated Professional Systems (www.associatedpro.com)
NALLATECH (www.nallatech.com)
Representative Example Boards
From Annapolis Micro Systems (AMI), http://www.annapmicro.com
and Nallatech, http://www.nallatech.com
Source: [AMS02]
WILDSTAR™ II for VME
Copyright Annapolis Micro Systems, Inc. 2002
[Block diagram: three processing elements (PE 0, PE 1, PE 2), each a Virtex™-II XC2V 6000 or 8000 with six DDR2 SRAM banks (2 or 4 MB each, 36-bit connections), one 64 MB DDR SDRAM (32-bit connection), and a programmable oscillator. PE 0 and PE 1 connect to I/O #0 and I/O #1; PE 2 connects to backplane I/O P0 and P2. The PEs attach to a PCI controller (32/64 bits, 33/66 MHz) and, through a PCI to VME bridge, to the VME bus. The board also carries flash memories and a master clock generator (PCLK, MCLK, ICLK). Links are marked as differential or single ended.]
Source: [AMS02]
WILDSTAR™ II Pro
Reproduced and displayed with permission
Nallatech's BenNUEY-PCI-4E
Reconfigurable Supercomputers
Scalable Reconfigurable Systems
Large numbers of reconfigurable processors and microprocessors
Everything can be configured: functional units, interconnects, interfaces
High-level of scalability
Suitable for a wide range of applications
Everything can be reconfigured over and over at run time (Run-Time Reconfiguration) to suit the underlying applications
Can be programmed by application scientists at least as easily as conventional parallel computers
Early Reconfigurable Architecture
[Diagram: a microprocessor system (several μPs, each with its own μP memory, sharing an interface and I/O) coupled to a separate reconfigurable system (several FPGAs, each with its own FPGA memory, sharing an interface and I/O)]
Current Reconfigurable Architecture
[Diagram: each μP with its μP memory is paired with an FPGA and its FPGA memory; the pairs are connected through shared memory and/or a NIC]
Possible Classes of Reconfigurable Supercomputers
Independent board design: a μP board (μP 1 … μP N) and a separate RP board (RP 1 … RP N)
Joint board design: a joint μP/RP board carrying both μP 1 … μP N and RP 1 … RP N
[Arrow: tighter integration]
Possible Classes of Reconfigurable Supercomputers – cont.
[Arrow: tighter integration]
μP inside of RP design: joint μP/RP board with μP 1 … μP N placed inside RP 1 … RP N
RP inside of μP design: joint μP/RP board with RP 1 … RP N placed inside μP 1 … μP N
FPGA-based supercomputers
Machine                            Released
SRC 6 from SRC Computers           2002
Cray XD1 from Cray                 2005
SGI Altix from SGI                 2005
SRC 7 from SRC Computers, Inc.     2006
How to choose the system that best suits your needs?
Typical users’ criteria:
1. Clock speed
2. Amount of memory
3. Cost of Ownership
How to choose the system that best suits your needs?
Recommended users’ criteria:
1. Tools
- right level of abstraction
- ease of development & verification
- progress & backward compatibility
2. Libraries
- basic operations
- examples of full applications
3. Technical support
How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
4. Data Bandwidth
[Diagram: data paths between the reconfigurable processor system, the μP system, and external I/O devices]
How to choose the system that best suits your needs?
Recommended users’ criteria (cont.):
5. Scalability
- variable power and price
- efficient communication among the modules
Recommended users’ criteria (cont.):
6. Transfer of control overhead
[Timing diagram, time axis with μP and FPGA lanes: theoretical behavior compared with actual behavior, where the actual behavior includes control transfer overhead]
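The effect of control transfer overhead can be illustrated with a toy timing model. It is only a sketch under stated assumptions: the segment durations and the overhead value are made-up numbers, and the model charges a fixed overhead each time execution passes between the μP and the FPGA.

```python
def total_time(segments, transfer_overhead):
    """Total run time of (device, duration) segments executed one after another.

    A fixed transfer_overhead is added whenever two consecutive segments
    run on different devices (assumed model, for illustration only).
    """
    total = 0.0
    previous_device = None
    for device, duration in segments:
        if previous_device is not None and device != previous_device:
            total += transfer_overhead
        total += duration
        previous_device = device
    return total


# Hypothetical workload alternating between the microprocessor and the FPGA.
segments = [("uP", 2.0), ("FPGA", 1.0), ("uP", 2.0), ("FPGA", 1.0)]

theoretical = total_time(segments, transfer_overhead=0.0)
actual = total_time(segments, transfer_overhead=0.5)
print(theoretical, actual)
```

With three hand-offs in this workload, the overhead adds three times 0.5 time units. The finer the partitioning between μP and FPGA, the more this term dominates.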
7. Reconfiguration overhead
[Timing diagrams with μP and FPGA lanes. First case: Reconf A, then Task A executed three times. Second and third cases: Tasks A, B and C, each preceded by its own reconfiguration (Reconf A, Reconf B, Reconf C)]
7. Reconfiguration overhead (cont.)
[Timing diagram with μP, FPGA 1 and FPGA 2 lanes. FPGA 1: Reconf A, Task A, Reconf C, Task C. FPGA 2: Reconf B, Task B]
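One way to read the two-FPGA case is that an idle FPGA can be reconfigured while the other one is still computing, hiding most of the reconfiguration time. The sketch below compares the two cases under that assumption: tasks run strictly in order, they alternate between two FPGAs, and the reconfiguration and task times are made-up values.

```python
def single_fpga_time(tasks):
    """One FPGA: every task waits for its own reconfiguration."""
    return sum(reconf + run for reconf, run in tasks)


def two_fpga_time(tasks):
    """Two FPGAs used alternately; tasks still execute strictly in order.

    Each FPGA starts reconfiguring for its next task as soon as it has
    finished its previous one, so reconfiguration overlaps with the task
    running on the other FPGA (assumed model, for illustration only).
    """
    fpga_free = [0.0, 0.0]      # when each FPGA finished its previous task
    previous_task_end = 0.0
    for index, (reconf, run) in enumerate(tasks):
        fpga = index % 2
        ready = fpga_free[fpga] + reconf          # configuration loaded
        start = max(ready, previous_task_end)     # wait for the preceding task
        previous_task_end = start + run
        fpga_free[fpga] = previous_task_end
    return previous_task_end


# Hypothetical (reconfiguration time, task time) pairs for tasks A, B, C.
tasks = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]

one_fpga = single_fpga_time(tasks)
two_fpgas = two_fpga_time(tasks)
print(one_fpga, two_fpgas)
```

In this example only the first reconfiguration stays on the critical path, because each later reconfiguration is shorter than the task it overlaps with.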
Recommended users’ criteria (cont.):
8. Number of FPGAs & number of microprocessors
9. Clock speed
- maximum
- variable vs. fixed
10. Amount of memory
Programming Reconfigurable Computers
SRC Programming Model
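[Diagram: on the microprocessor, main.c (ANSI C) calls function_1() and function_2(). On the FPGA side, function_1 and function_2 are written in MAP C (a subset of ANSI C) and invoke macros such as macro_1(a, b, c), macro_2(b, d), macro_2(c, e), macro_3(s, t), macro_1(n, b) and macro_4(t, k). The macros (macro_1, macro_2, macro_3, macro_4, ...) come from libraries of macros written in VHDL. The FPGA is shown holding one Macro_1 instance and two Macro_2 instances connected by signals a, b, c, d, e, with I/O ports.]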
SRC Program Partitioning
μP system: C function for μP (HLL)
FPGA system: C function for MAP (HLL), VHDL macro (HDL)
SRC Compilation Process
[Flow diagram:
Application sources (.c or .f files) -> μP Compiler -> object files (.o files)
Application sources (.mc or .mf files) -> MAP Compiler -> .o files and HDL sources (.v files) -> Logic synthesis -> netlists (.ngo files) -> Place & Route -> configuration bitstreams (.bin files)
Macro sources (.vhd or .v files) -> Logic synthesis -> netlists (.ngo files)
Linker: combines the object files and the configuration bitstreams into the application executable]
Star Bridge Programming Environment - Viva
[Screenshot with labeled areas: Library, Object, Sheets]
Star Bridge Compilation Process
[Flow diagram: User input -> VIVA Graphical User Interface -> Netlists (.ngo files) -> Place & Route (Xilinx) -> Configuration bitstreams (.bin files) -> Application executable]
Cray XD1 Programming Flows
Source: [Cray, MAPLD05]
[Flow diagram:
Standard flow: VHDL, Verilog -> VHDL/Verilog Synthesis (Mentor Graphics, Synopsys, Synplicity) -> Place & Route (Xilinx) -> bitstream (01001011010101010101101010010100010101101010100101010101)
High-level flow: Mitrion-C -> Mitrion
High-level flow: MATLAB/Simulink (The MathWorks) -> System Generator (Xilinx)
Intermediate formats shown: VHDL or Verilog, gate-level EDIF]
Example: a mask operation with inputs a and m and output z
In VHDL:
    process (a, m) is
    begin
        z <= a and m;
    end process;
In C:
    int mask(int a, int m)
    {
        return (a & m);
    }
Xtreme DSP Design Flow
HDL-based SGI Altix Programming Flow
[Flow diagram, with design iterations.
On an IA-32 Linux machine:
Design Entry (Verilog, VHDL) -> .v, .vhd
Design Synthesis (Synplify Pro, Amplify) -> .edf
Design Implementation (ISE) -> .ncd, .pcf -> .bin
Design Verification: Behavioral Simulation (VCS, Modelsim), Static Timing Analysis (ISE Timing Analyzer)
Metadata Processing (Python): .v, .vhd -> .cfg
On the Altix:
Altix Device Programming (RASC Abstraction Layer, Device Manager, Device Driver), with .c application code
Real-time Verification (gdb)]
HLL-based SGI Altix Programming Flow
[Flow diagram.
On an IA-32 Linux machine:
HLL Design Entry (Handel-C, Mitrion C, Viva)
RTL Generation and Integration with Core Services -> .v, .vhd
Design Synthesis (Synplify Pro, Amplify) -> .edf
Design Implementation (ISE) -> .ncd, .pcf -> .bin
Design Verification: Behavioral Simulation (VCS, Modelsim), Static Timing Analysis (ISE Timing Analyzer)
Metadata Processing (Python): .v, .vhd -> .cfg
On the Altix:
Altix Device Programming (RASC Abstraction Layer, Device Manager, Device Driver), with .c application code
Real-time Verification (gdb)]
Mitrion-C Programming Model for Cray & SGI
[Diagram: on the microprocessor, main.c (ANSI C, based on the Mitrion API) makes the calls function_1(in1), start_fpga(), function_1(in2), start_fpga(). The FPGA application code is written in Mitrion-C (platform independent). The Mitrion Compiler & Configurator places the application on the distributed processor, the Mitrion Distributed Processor Architecture (platform dependent, VHDL). The FPGA is shown with I/O and RAM for input and output.]
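Program Entry for FPGA Accelerator Boards
[Diagram: program entry methods (HDL, HLL, graphical data flow diagram) placed between hardware and software, with arrows for increased productivity and increased capability to describe parallel execution. Two cases are shown, each with its hardware and software parts: Traditional and Extended (e.g. CoreFire)]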
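Program Entry for Reconfigurable Computers
[Diagram: program entry methods (HDL, HLL, graphical data flow diagram) placed between hardware and software, with arrows for increased productivity and increased capability to describe parallel execution. Two systems are shown, each with its hardware and software parts: Star Bridge (porting EDIF, COM objects) and SRC (HDL macros)]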
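Program Entry for Reconfigurable Computers
[Diagram: program entry methods (HDL, HLL, graphical data flow diagram) placed between hardware and software, with arrows for increased productivity and increased capability to describe parallel execution. Two systems are shown, each with its hardware and software parts: Cray XD1 with Simulink (Xilinx System Generator, Simulink) and SGI or Cray with Mitrion (Mitrion Processor, Mitrion-C)]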