BittWare Overview, March 2007
Agenda
• Corporate Overview
• Hardware Technical Overview
• Software Technology Overview
• Demo
Who is BittWare?
A leading COTS signal processing vendor, focused on providing:
the "Essential building blocks" (DSPs, FPGAs, I/O, SW Tools, IP, & Integration) that our customers can use to build "Innovative solutions"
BittWare Corporate Overview
• Private company founded in 1989 by Jim Bittman (hence the spelling)
• Essential Building Blocks for innovative Signal Processing Solutions: focused on doing one thing extremely well; #2 in recognition for DSP boards (source: VDC Merchant Board Survey 2004)
• Committed to providing leading-edge, deployable products, produced with timely & consistently high quality: tens of thousands of boards shipped; hundreds of active customers
• Financially strong: profitable & growing
• Headquartered in Concord, New Hampshire, USA; Engineering/Sales offices in Belfast, Northern Ireland (UK) (formerly EZ-DSP, acquired Sept. 2004), Leesburg, Virginia (Washington DC), and Phoenix, Arizona
• 15 international distributors representing 38 countries
BittWare’s Building Blocks
• High-end Signal Processing Hardware (HW): Altera FPGAs & TigerSHARC DSPs; high-speed I/O; board formats: CompactPCI (cPCI), PMC, PCI, VME, and Advanced Mezzanine Card (AMC or AdvancedMC)
• Silicon & IP Framework: SharcFIN, ATLANTiS
• Development Tools: BittWorks, Trident
• Systems & Services
BittWare Business Model & Markets
Application-specific Products:
• System Integration
• Custom FPGA Design (Interfacing, Processing)
• Tailored Signal Processing Boards
• Specialized/Custom I/O
• Application Software integration/implementation
• Technology & Intellectual Property Licensing

COTS Products:
• Signal Processing HW: Altera FPGAs, TigerSHARC DSPs, High Performance I/O
• Development/Deployment: PCI, PMC, cPCI, VME, AdvancedMC (AMC)
• Silicon & IP Frameworks: SharcFIN, ATLANTiS
• Development Tools: BittWorks Tools, Function Libraries, Trident MP-RTOE
Markets:
• Defense/Aerospace
• Communications
• High-End Instrumentation
• Life Sciences
BittWare provides essential building blocks for innovative signal processing solutions at every stage of the OEM Life-cycle
• Hybrid Signal Processing
• T2 Family: SharcFIN, ATLANTiS, T2 Boards (PCI, PMC, cPCI, VME)
• GT and GX Family: FINe, GT Boards (cPCI, VME), GX Boards (AMC, VME)
Hardware Technology Overview
Hybrid Signal Processing Concept
[Diagram: input flows through an I/O interface into the FPGA (pre-processing), across Inter-Processor Communications (IPC) to the programmable DSP(s) (co-processing), and back through the FPGA and an I/O interface (post-processing) to the output.]
Hybrid Signal Processing Architecture
[Diagram: four DSPs (#0-#3) and an FPGA share Inter-Processor Communications (IPC); the FPGA connects to LVDS pairs or single-ended DIO, SerDes, and a memory module; a Host/Control Bridge provides the PCI bus, GigE, RS232/422 interface, flash (64 MB), and a command & control bus (the control plane of hybrid signal processing I/O interfacing).]
• Clusters of 4 ADSP-TS201S DSPs @ up to 600 MHz: 14,400 MFLOPS per cluster
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS™ Architecture: up to 8.0 GB/sec I/O
• 2 links per DSP off-board @ 125 MB/sec each, routed via FPGA to DIO/RocketIO SerDes
• Ring of link ports interconnected within cluster
• SharcFIN ASIC (SFIN-201) providing: 64-bit, 66 MHz PCI bus; 8 MB boot flash; FPGA control interface
• PMC+ expansion site(s)
• Large shared SDRAM memory (up to 512 MB)
TigerSHARC multiprocessing boards for ultra-high-performance applications, using a common architecture across multiple platforms and formats
BittWare’s T2 Board Family
T2 Architecture Block Diagram
[Block diagram: four TS201 DSPs share a 64-bit, 83.3 MHz cluster bus with SDRAM (SO-DIMM up to 512 MB) and the SharcFIN SF201; the SharcFIN bridges to the 64-bit, 66 MHz PCI local bus, boot flash on an 8-bit bus, and 8 interrupts & 8 flags; link ports (4 x L0, 4 x L1) run to the ATLANTiS FPGA, which drives the SerDes/RocketIO (8 channels) and DIO (64-192 pins).]
• Up to 3 separate 64-pin DIO (Digital I/O) ports can be used to implement link ports, parallel buses, and/or other interconnects
• ATLANTiS FPGA implements link routing: configured & controlled via SharcFIN; accessed via TigerSHARCs and Host; can also be used for pre/post-processing
• SharcFIN-201 bridge provides a powerful, easy-to-use PCI/Host command & control interface
• 8 channels of RocketIO SerDes @ 2.5 GHz each: each channel provides ~250 MB/sec both ways; total I/O bandwidth is 4.0 GB/sec; connected via two 4x Infiniband-type connectors, or backplane
• 8 full-duplex link ports from DSPs to FPGA (2 from each DSP): each link provides 125 MB/sec transmit and 125 MB/sec receive; total I/O bandwidth = 2.0 GB/sec
• Basic architecture is the same as before (HH & TS) except the two I/O links per DSP are routed (transferred) via the ATLANTiS FPGA
SharcFIN-201 (SFIN-201) Features
• 64/66 MHz PCI bus master interface (rev. 2.2): 528 MB/sec burst; 460 MB/sec sustained writes (SF to SF); 400 MB/sec sustained reads (SF to SF)
• Cluster bus interface to ADSP-TS201s @ 83.3 MHz
• Access DSP internal memory & SDRAM from PCI
• 2 independent PCI bus mastering DMA engines
• 6 independent FIFOs (2.4 KB total): 2 for PCI target to/from DSP DMA (fly-by to SDRAM); 2 for PCI target to/from DSP internal memory; 2 for PCI bus mastering DMA to/from DSP DMA
• General-purpose peripheral bus: 8 bits wide, 22 address bits, 16 MB/sec; reduces cluster bus loading, increasing cluster bus speed; accessible from DSP cluster bus & PCI bus; flash interface for DSP boot & non-volatile storage
• I2O V1.5 compliant
• I2S serial controller
• Programmable interrupt & flag multiplexer: 10 inputs, 7 outputs; 1 input/1 output dedicated to PCI
• Extensive SW support via BittWorks HIL & DSP21K
SharcFIN-201 Block Diagram
What is ATLANTiS?
A Generic FPGA Framework for I/O, Routing & Processing
• An I/O routing device in which every I/O can be dynamically connected to any other I/O: like a software-programmable 'cable', but better! ATLANTiS provides communication between the TigerSHARC link ports and all other I/Os connected to the FPGA/board; off-board I/O is defined by the board architecture. Communication can be point-to-point, or broadcast to various outputs; devices can be connected or disconnected as requirements dictate without recompiling or changing cables
• A configurable FPGA pre/post/co-processing engine: standard IP blocks; customer/custom-developed blocks
T2 ATLANTiS Detail Diagram (external I/O & connectors dependent on specific board implementation)
[Diagram: the ATLANTiS FPGA sits between the four TigerSHARC TS-201 DSPs (link ports L0/L1 from each), the SharcFIN (peripheral control bus, TigerSHARC cluster bus, PCI bus, SDRAM), three 64-bit DIO ports (two optional), two RocketIO SerDes blocks, the PMC+ site, off-board I/O, and cluster-to-cluster connections.]
8 x 8 ATLANTiS Switch Diagram
[Diagram: an 8 x 8 switch with 128-bit inputs IN0-IN7 and 128-bit outputs OUT0-OUT7, controlled by configuration registers on the control bus.]
Other Major ATLANTiS Components
[Diagram: receive-side components (SerDes Receiver with packet protocol and input FIFO buffer; TS201 LinkPort Receiver with link port interface circuit and input FIFO buffer; Null Receiver with null interface circuit and input FIFO buffer); processing components (pre-processing, co-processing, and post-processing blocks with input/output FIFO buffers, plus a through-routing block, which must be used as a pair to the same endpoint); transmit-side components (Null Transmitter, TS201 LinkPort Transmitter, and SerDes Transmitter, each with an output FIFO buffer and the matching interface circuit or packet protocol).]
[Diagram: same T2 layout as above, but links, DIO, & SerDes are now routed by the 8 x 8 switch (IN0-IN7 to OUT0-OUT7, with link/CDR blocks) inside the ATLANTiS FPGA, configured via the configuration registers.]
ATLANTiS Put Together
How is ATLANTiS Used?
FPGA Configuration:
1) BittWare standard implementations (loads): works out of the box (doesn't require any FPGA design capabilities); fixed interfaces & connections define switch I/Os; a variety of I/O configuration options are available with boards
2) Developer's kit: fully customizable (by BittWare and/or end user); all component cores in kit; requires FPGA development tools & design capabilities
Run-Time Set-up and Control:
1) Powerful, easy-to-use GUI (Navigator): set up for any and all possible routings
2) Use DSP or Host to program control registers: initial configuration; change routing at any time by re-programming control registers
ATLANTiS Configurator
T2 Board Family
• T2PC: Quad PCI TigerSHARC Board
• T2PM: Quad PMC TigerSHARC Board
• T26U: Octal 6U cPCI TigerSHARC Board
• T2V6: Octal 6U VME TigerSHARC Board
• One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at 600 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS Architecture: up to 4.0 GB/sec I/O
  - Eight external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes Xcvrs, PMC+, DIO headers
• Two link ports per DSP dedicated for interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC site with PMC+ extensions for BittWare's PMC+ I/O modules
• 64 MB-512 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation
T2-PCI Features
T2PC Block Diagram
[Block diagram: four TS201 DSPs with SDRAM (SO-DIMM up to 512 MB) and SharcFIN SF201 on the 64-bit, 83.3 MHz cluster bus; the SharcFIN bridges to the 64-bit, 66 MHz PCI local bus (through a PCI-PCI bridge to the PCI connector), boot flash on an 8-bit bus, and 8 interrupts & 8 flags; link ports (4 x L0, 4 x L1) run to the Virtex-II Pro, which drives the SerDes/Rocket I/O (8 channels), the PMC+ site (J4), a 64-signal DIO header, two 20-signal DIO headers, a JTAG header, and external power.]
• One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at up to 600 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power
• Xilinx Virtex-II Pro FPGA interface/coprocessor
• ATLANTiS Architecture: up to 4.0 GB/sec I/O
  - Eight external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes Xcvrs, PMC+, DIO header
• Two link ports per DSP dedicated for interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• PMC format with BittWare's PMC+ extensions
• 64 MB-256 MB SDRAM
• 8 MB FLASH memory (boots DSPs & FPGA)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation
T2PM Features
T2PM Block Diagram
[Block diagram: four TS201 DSPs with SDRAM (up to 256 MB) and SharcFIN SF201 on the 64-bit, 83.3 MHz cluster bus; the SharcFIN bridges to the 64-bit, 66 MHz PCI local bus (PMC connectors J1-3), boot flash on an 8-bit bus, and 8 interrupts & 8 flags; link ports (4 x L0, 4 x L1) run to the Virtex-II Pro FPGA, which drives the SerDes/Rocket I/O (8 channels), the PMC+ connector (J4), front panel, and an optional JTAG header.]
• Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS Architecture: up to 6.0 GB/sec I/O
  - Sixteen external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes Xcvrs, PMC+, DIO (cross-cluster)
• Two link ports per DSP dedicated for interprocessor communications
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Two PMC sites with PMC+ extensions for BittWare's PMC+ I/O modules
• 128 MB-512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation
T26U cPCI Features
T26U Block Diagram
[Block diagram: two identical clusters (A and B), each with four TS201 DSPs, SDRAM (up to 256 MB), SharcFIN SF201, boot flash (8-bit bus), and an FPGA with Rocket I/O (4 channels) and high-speed SerDes on the 64-bit, 83.3 MHz cluster bus; each cluster has a PMC+ site (A/B), rear-panel DIO (64 signals), and J4; each SharcFIN bridges its 64-bit, 66 MHz PCI local bus through a PCI-PCI bridge to the cPCI 64/66 backplane; JTAG header.]
• Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each
  - 24 Mbits of on-chip SRAM per DSP
  - Static superscalar architecture
  - Fixed- or floating-point operations
• 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power
• Two Xilinx Virtex-II Pro FPGA interface/coprocessors
• ATLANTiS Architecture: up to 8.0 GB/sec I/O
  - Sixteen external link ports @ 250 MB/sec each
  - Routed via Virtex-II Pro
  - RocketIO SerDes Xcvrs, PMC+, DIO (cross-cluster)
• Two link ports per DSP for interprocessor ring
• Sharc®FIN (SFIN201) 64/66 PCI interface
• Tundra TSI-148 PCI-VME bridge with 2eSST support
• VITA-41 VXS switched-fabric interface
• PMC site with PMC+ extensions for BittWare's PMC+ I/O modules
• 128 MB-512 MB SDRAM
• 16 MB FLASH memory (boots DSPs & FPGAs)
• Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries
• Standalone operation
T2 6U VME/VXS Features
T2V6 Block Diagram
[Block diagram: two clusters (A and B), each with four TS201 DSPs (#0-#3), SDRAM (up to 256 MB), SharcFIN SF201, boot flash, and an FPGA with RocketIO (4 channels) and high-speed SerDes on a 64-bit, 83.3 MHz cluster bus; one cluster carries the PMC+ site and JTAG header; the SharcFINs share the 64-bit, 66 MHz PCI local bus to a VME-PCI bridge (VME64/2eSST); VXS/P0 provides 8 high-speed serial channels; P2 user pins and factory options are also brought out.]
T2V6 Heat Frame - Transparent
T2V6 Heat Frame
T2V6 Thermal Model
BittWare Levels of Ruggedization
• Commercial (air-cooled): operating 0C to 50C with 300 lin.ft/min airflow; storage -55C to 100C; vibration random 0.01 g²/Hz, 15 Hz to 2 kHz; shock 20 g peak sawtooth, 11 ms duration; no conformal coating; humidity 0 to 95% non-condensing
• Level 1 (air-cooled): operating -40C to 75C with 300 lin.ft/min airflow; storage -55C to 100C; vibration random 0.04 g²/Hz, 15 Hz to 2 kHz (per MIL-STD-810E); shock 20 g peak sawtooth, 11 ms duration; no conformal coating; humidity 0 to 95% non-condensing
• Level 1c (air-cooled): operating -40C to 75C with 300 lin.ft/min airflow; storage -55C to 100C; vibration random 0.04 g²/Hz, 15 Hz to 2 kHz (per MIL-STD-810E); shock 20 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100% condensing
• Level 2c (conduction-cooled): operating -40C to 75C at thermal interface; storage -55C to 100C; vibration random 0.1 g²/Hz, 15 Hz to 2 kHz (per MIL-STD-810E); shock 40 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100% condensing
• Level 3c (conduction-cooled): operating -40C to 85C at thermal interface; storage -55C to 100C; vibration random 0.1 g²/Hz, 15 Hz to 2 kHz (per MIL-STD-810E); shock 40 g peak sawtooth, 11 ms duration; conformal coating; humidity 0 to 100% condensing
• PMC+ Extensions
• Barracuda High-Speed 2-ch ADC
• Tetra High-Speed 4-ch ADC
Hardware Technology Overview
BittWare PMC+ Extensions
• BittWare's PMC+ boards are an extension of the standard PMC specification (user-defined J4 connector)
• Provides tightly coupled I/O and processing to BittWare's DSP boards:
  Hammerhead Family: 4 links, serial TDM, flags, irqs, reset, I2C
  TS Family: 4 links, flags, irqs, reset, I2C
  T2 Family: 64 signals, routed as 32 diff pairs to ATLANTiS; standard use is 4 links, plus flags and irqs
• Can be customized for 3rd-party PMCs
• 2-channel 14-bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system performance)
• AC (transformer) or DC (op-amp) coupled options
• 64-bit, 66 MHz bus mastering PCI interface via SharcFIN
• 64 MB-512 MB SDRAM for large snapshot acquisitions
• Virtex-II 1000 FPGA reconfigurable over PCI
  Used for A/D control and data distribution
  Configurable preprocessing of high-speed A/D data, such as digital filtering, decimation, digital down conversion, etc.
  Developer's kit available with VHDL source code
  Optional IP cores and integration from 3rd parties for DDR/DDC/SDR/comms applications
  Plethora of other IP cores available
• PMC+ links (4) in FPGA configurable for use with Hammerhead or Tiger PMC+ carrier boards
• Internal/external clock and triggering
• Optional oven-controlled oscillator/high-stability clock
• Onboard programmable clock divider & decimator
• Large snapshot acquisition to SDRAM (4K-256M samples): 1 ch @ 105 MHz; 2 ch @ 75 MHz
• Continuous acquisition: 2 ch @ 105 MHz to TigerSHARC links; 1 ch @ 105 MHz or 2 ch @ 52.5 MHz to PCI (system dependent)
Barracuda PMC+ Features
Barracuda PMC+ Block Diagram
Tetra PMC+ Features
• 4-channel 14-bit A/D, 105 MHz (AD6645)
• 78 dB SFDR; 67 dB SNR (real-world in-system performance)
• DC (op-amp) coupled
• 32-bit, 66 MHz bus mastering PCI interface via SharcFIN
• Cyclone-II 20/35/50 FPGA reconfigurable over PCI
  Used for A/D control and data distribution
  Configurable preprocessing of high-speed A/D data, such as digital filtering, decimation, digital down conversion, etc.
  Developer's kit available with VHDL source code
  Optional IP cores and integration from 3rd parties, including DDC
• PMC+ links (4) in FPGA configurable for use with TigerSHARC/ATLANTiS
• Internal/external clock and triggering; can source clock for chaining
• Onboard programmable clock divider & decimator
Tetra PMC+ (TRPM) Block Diagram
[Block diagram: four 105 MHz, 14-bit ADCs (channels 0-3) feed the Cyclone II FPGA (EP2C20/35), which connects to the PMC interface (P1-P3 connectors) over the 32-bit, 66 MHz PCI bus and to the user-defined pins (P4 connector) as links 0-3 plus flags/ints; a clock driver provides clock in, trigger in/clock out, and an XO (factory option).]
• New FINe
• New ATLANTiS
Hardware Technology Overview
FINe Host Interface Bridge
[Block diagram: the FINe host interface bridge is a Cyclone™ II FPGA containing a NIOS II processor on an Avalon bus with boot flash, SDRAM, two UARTs (to RS232/422 PHYs), a 1x PCI Express bridge, a GigE/10/100 network interface, a cluster bus I/F (64-bit, 83 MHz cluster bus to DSPs & ATLANTiS), a PCI bus I/F (32-bit, 66 MHz PCI bus), and a peripheral I/F carrying 8 flags and 8 interrupts to/from the DSPs (2 per DSP); the host/control side is the control plane, the signal processing side the data plane.]
New ATLANTiS - Putting it all Together
[Block diagram: the ATLANTiS FPGA's 8 x 8 switch (IN0-IN7 to OUT0-OUT7, with link/CDR blocks) routes between the four TigerSHARC TS-201 link ports (L0/L1 each), a 64-bit DIO port, two SerDes blocks, a co-processing block, and a DDR controller with DMA to a memory module; the SharcFINe cluster bus interface connects to the FINe bridge, which provides the PCI bus and TigerSHARC cluster bus; configuration registers control the switch.]
New Product Families
• B2 Family: B2AM
• GT Family: GT3U-cPCI, GTV6-Vita41/VXS
• GX Family: GXAM
• Full-height, single-wide AMC (Advanced Mezzanine Card)
• ATLANTiS/ADSP-TS201 hybrid signal processing cluster
• Altera Stratix II FPGA for I/O routing and processing
• 4 ADSP-TS201S TigerSHARC® DSP processors up to 600 MHz
  - 57.5 GOPS (16-bit) or 14.4 GFLOPS (floating point) of DSP processing power
• Fat Pipes & Common Options interface for data & control
• Module management control implementing IPMI
  - Monitors temperature and power usage of major devices
  - Supports hot swapping
• SharcFINe bridge providing GigE and PCI Express
• ATLANTiS provides Fat Pipes switch fabric interfaces:
  - Serial RapidIO™
  - PCI Express
  - GigE, XAUI™ (10 GigE)
• System synchronization via AMC system clocks
• Front panel I/O
  - 10/100 Ethernet
  - LVDS & general-purpose digital I/O
  - JTAG port for debug support
  - Fiber-optic transceiver @ 2.5 GHz (optional)
• Booting of DSPs and FPGA via flash non-volatile memory
B2AM Features
B2-AMC Block Diagram
[Block diagram: four TigerSHARC ADSP-TS201S DSPs (#0-#3) connect via TigerSHARC link ports to the ATLANTiS FPGA (Stratix II EP2S60, 90, or 130), which drives the AMC edge connector's (B+) Fat Pipes/Common Options ports (sRIO, PCIe, ASI, GigE, XAUI) through a SerDes quad PHY (PM8358), plus 24-bit GP-DIO, 11 LVDS (5 Rx, 5 Tx, 1 Clk), and an optional front- or back-facing fiber transceiver; the SharcFINe bridge provides GigE (port Bx), 1x PCIe, FLASH, 10/100 Ethernet, RS-232, and a JTAG header; an MMC (ATmega16) handles IPMI, temperature monitoring, AMC system clocks, and front-panel LEDs/switch.]
GT Cluster Architecture
[Block diagram: four TigerSHARC TS-201 DSPs (#0-#3) connect via 12 TigerSHARC link ports to the ATLANTiS Stratix II GX (2SGX90/130), which drives 32 LVDS pairs or 64 single-ended DIO, SerDes, and a 64-bit, 100 MHz interface to the memory module (1 GB of DDR2 or 64 MB of QDR); the SharcFINe bridge provides local PCI (32-bit, 66 MHz), GigE, an RS232/422 interface (2 Xcvrs), and flash (64 MB).]
BittWare Memory Module (BMM)
[Photos: top and back side, with 240-pin connector to carrier]
• Convection- or conduction-cooled
• 67 mm x 40 mm
• 240-pin connector
• 160 usable signals (plus 80 power/ground)
  - Capability to address TBytes
• Can be implemented today as:
  - 1 bank of SDRAM up to 1 GB (x64)
  - 2 banks of SDRAM up to 512 MB each (x32)
  - 1 bank of SRAM up to 64 MB (x64)
  - 1 bank of SDRAM up to 512 MB (x32) and 1 bank of SRAM up to 32 MB (x32)
GT3U (GT 3U cPCI) Features
• Altera® Stratix® II GX FPGA for I/O, routing, and processing
• One cluster of four ADSP-TS201S TigerSHARC® DSPs
  - 57.5 GOPS 16-bit fixed point, 14.4 GFLOPS floating point processing power
  - Four link ports per DSP: two routed to the ATLANTiS FPGA, two routed for interprocessor communications
  - 24 Mbits of on-chip RAM per DSP; static superscalar architecture
• ATLANTiS architecture
  - 4 GB/s of simultaneous external input and output
  - Eight link ports @ up to 500 MB/s routed from the on-board DSPs
  - 36 LVDS pairs (72 pins) comprised of 16 inputs and 20 outputs
  - Four channels of high-speed SerDes transceivers
• BittWare Memory Module: up to 1 GB of on-board DDR2 SDRAM or 64 MB of QDR SDRAM
• BittWare's SharcFINe PCI bridge: 32-bit/66 MHz PCI; 10/100 Ethernet; two UARTs, software configurable as RS232 or RS422; one link port routed to ATLANTiS
• 64 MB of flash memory for booting of DSPs and FPGA
• 3U CompactPCI form factor, air-cooled or conduction-cooled
• Complete software support
GT3U Block Diagram
[Block diagram: four TigerSHARC TS-201 DSPs (#0-#3) connect via four TigerSHARC link ports to the ATLANTiS Stratix II GX (2SGX90/130), which drives 36 LVDS pairs (72 pins: 16 in, 20 out) on the user-defined pins (J2 connector), a 4x SerDes connector (Infiniband-type), and a 64-bit, 100 MHz interface to the memory module (1 GB of DDR2 or 64 MB of QDR); the SharcFINe bridge provides local PCI (32-bit, 66 MHz) to the CompactPCI 32-bit, 66 MHz bus (J1 connector), 10/100 Ethernet (4 pins), an RS232/422 interface with 2 Xcvrs (8 pins), and flash (64 MB).]
GTV6 Block Diagram
[Block diagram: two clusters (A and B), each with four TigerSHARC TS-201 DSPs, an ATLANTiS Stratix-II GX (2SGX90), a SharcFINe bridge, a memory module (1 GB of DDR2 or 64 MB of QDR, on a 64-bit, 83.3 MHz bus), and flash (64 MB); cluster A's ATLANTiS drives the PMC+ site (J4; PMC front-panel I/O only available on air-cooled versions) and a 4x SerDes connector (Infiniband-type); both clusters drive high-speed serial ports (SerDes) to the VXS/VITA41 P0 (8 channels) and Ethernet (GigE); the SharcFINe bridges share local PCI (64-bit/66 MHz) to a Tsi148 VME-PCI bridge (VME64 with 2eSST on P1 & P2); P2 user pins and factory configuration options are also brought out.]
Available Q2 2007
GT3U/GTV6 BittWare Levels of Ruggedization
(Same ruggedization levels as shown earlier.)
• Mid-size, single-wide AMC (Advanced Mezzanine Card)
  Common Options region: Port 0 GigE; ports 1, 2 & 3 connect to BittWare's ATLANTiS framework
  Fat Pipes region has eight ports: ports 4-11 configurable to support Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
  Rear panel I/O has eight ports (8 LVDS in, 8 LVDS out)
  System synchronization via AMC system clocks (all connected)
• High-density Altera Stratix II GX FPGA (2S90/130) with BittWare's ATLANTiS framework for control of I/O, routing, and processing
• BittWare's FINe bridge provides control plane processing and interface: GigE, 10/100 Ethernet, and RS-232
• Over 1 GByte of bulk memory: two banks of DDR2 SDRAM (up to 512 MBytes each); one bank of QDR2 SRAM (up to 9 MBytes)
• Front panel I/O: 10/100 Ethernet, RS-232, JTAG port for debug support, 4x SERDES providing Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE)
• BittWare I/O Module: 72 LVDS pairs, 4x SerDes, clocks, I2C, JTAG, reset
• Booting of FINe and FPGA via flash
GXAM Features
Available Q2 2007
GXAM Block Diagram
[Block diagram (preliminary): the Stratix II GX FPGA (EP2SGX90/130), supported by the ATLANTiS framework, connects to the AMC edge connector (B+): port 0 GigE via the FINe bridge, ports 1-3 Common Options via SerDes, and ports 4-11 Fat Pipes (sRIO, PCIexp, GigE, XAUI, ...) via SerDes (the second Fat Pipes SerDes on the 2SGX130 only); rear-panel I/O provides 16 LVDS pairs (8 in & 8 out) plus system clocks; memory comprises two banks of DDR2 SDRAM (up to 512 MB each) and 36-bit QDR2 SRAM (up to 9 MB); an optional front-panel I/O module (can be the whole width of the AMC front panel) brings out 76 LVDS pairs (38 in & 38 out), clocks, I2C, JTAG, reset, and an optional 4X Infiniband-type connector; the FINe bridge (32-bit control bus) also provides FLASH, 10/100 Ethernet, RS-232, and a JTAG header; an MMC (ATmega16) handles IPMI, temperature monitoring, and front-panel LEDs/switch.]
Available Q2 2007
IFFM Features - Preliminary
• 2 channels of high-speed (HS) ADCs (AD9640: 14-bit, 150 MHz) with good SFDR specs (target is 80 dB)
  Dual package to better sync channels
  Fast detect (non-pipelined upper 4 bits) helps for AGC control
• 2 channels of HS DACs (AD9777: 16-bit, 400 MHz)
  Dual package to better sync channels
  Built-in up-conversion interpolation of 1x, 2x, 4x, and 8x
• High-performance clock generation via PLL/VCO (AD9516)
  Inputs reference clock (e.g. 10 MHz) from front panel or baseboard
  Generates programmable clocks for HS-ADCs and HS-DACs
  Sources reference clock to baseboard (for system distribution)
• General-purpose (GP) 12-bit ADCs & DACs
  GP-ADCs can be used for driving AGC on RF front-end
  GP-DACs can be used for other utility signals such as GPS, positions, ...
The IFFM is an IF transceiver in a Front-panel Module (FM) format. Combined with a GXAM, this forms an integrated IF/FPGA interface & processing AMC board.
Available Q3 2007
IFFM Block Diagram - Preliminary
[Block diagram (preliminary): front-panel inputs Rx 1/Rx 2 feed the dual HS-ADC (14-bit, 150 MHz, AD9640), whose 14-bit output bus, SPI command/status, and fast detect for AGC go to the baseboard connector; a 16-bit input bus and SPI command/status drive the dual HS-DAC (AD9777) to front-panel outputs Tx 1/Tx 2; the clock-gen PLL/VCO takes a reference clock input from the front panel or baseboard and provides a reference clock output; the GP ADC and GP DAC also connect to the baseboard connector.]
Available Q3 2007
• BittWorks
• TS Libs
• Trident MPOE
• GEDAE
Software Technology Overview
• Analog Devices Family Development Tools: VisualDSP (C++, C, assembler, linker, debugger, simulator, VDK kernel); JTAG emulators (ADI/White Mountain)
• BittWorks: DSP21k Toolkit (DOS, Windows, Linux & VxWorks); VDSP Target; Remote VDSP Target & DSP21k Toolkit via Ethernet (combined in 8.0 Toolkit); board support packages/libraries & I/O GUIs; SpeedDSP (ADSP-21xxx only, no TS); FPGA developer's kits; porting kit
• Function Libraries: TS-Lib Float; TS-Lib Fixed; algorithmic design, implementation, & integration
• Real-Time Operating Systems: BittWare's Trident; Enea's OSEck
• Graphical Development Tools: GEDAE; MATLAB/Simulink/RTW
Software Products
Software Products Diagram
DSP21k-SF Toolkit
• Host Interface Library (HIL)
  Provides C-callable interface to BittWare boards from host system
  Download, upload, board and processor control, interrupts
  Symbol-table aware, converts DSP-based addresses
  Full-featured, mature application programming interface (API)
  Supports all BittWare boards, including FPGA and I/O
• Configuration Manager (BwConfig): find, track, and manage all BittWare devices in your system
• Diag21k, a command-line diagnostic utility
  All the power of the HIL at a command prompt
  Built-in scripting language with conditionals and looping
  Assembly-level debug with breakpoints
  stdio support (printf, etc.)
• BitLoader: dynamically load FPGAs via PCI bus (or Ethernet); reprogram FPGA configuration EEPROM
• DspBAD/DspTest: automated diagnostic tests for PCI, onboard memory, DSP memory & execution
• DspGraph: graphing utility for exploring board memory (Windows only)
BittWare Target
• Software debug target for VisualDSP++: source-level debugging via PCI bus; supports most features of the debugger
• Only software target for COTS SHARC boards: other board vendors require a JTAG emulator for VisualDSP debug
• Multiprocessor debug sessions on all DSPs in a system: any processor in the system can be included in a debug session; not limited to the board-level JTAG chain
• Virtually transparent to application: no special code, instrumentation, or build required; only uses a maximum of 8 words of program memory (user-selectable location)
• Some restrictions compared to JTAG debug: for very low-level debugging (e.g. interrupt service routines), an ICE is still nice
Remote Toolkit & Target
Allows Remote Code Development, Debug, & Control
• Client-server using RPC (remote procedure calls): server on system with BittWare hardware in it (Windows, Linux, VxWorks); client on Windows machine connected via TCP/IP to server
• Run all BittWare tools on remote PC via Ethernet: Diag21k, configuration manager, DspGraph, DspBad, Target; great for remote technical support
• Run all user applications on remote PC: just rebuild user app with Remote HIL instead of regular HIL
• Run VisualDSP++ debug session on remote PC: no need to plug in a JTAG emulator; don't need Windows on target platform
• Toolkit 8.0 combines Remote and standard DSP21k-SF: allows you to access boards in local machine and remote machine; no need to rebuild application to use remote board
Board Support Libraries & Examples
• All boards ship with board support libraries & examples: actual contents specific to each board; provides interface to standard hardware; examples of how to use key features of the board; same code as used by BittWare for validation & production test; examples include PCI, links, SDRAM, FLASH, UART, utilities, ...; royalty-free on BittWare hardware
• Source provided for user customization: users may tailor to their specific needs; hard to create a "generic" optimal library as requirements vary greatly
• PCI library for all DSP boards: bus mastering DMA read/write; single-access read/write
• Windows GUIs for all I/O boards: allow user to learn board control and operation; IOBarracuda, AdcPerf
FPGA Developer’s Kits
• For Users Customizing FPGAs on BittWare Boards Source for standard FPGA loads or examples Royalty free on BittWare hardware Mainly VHDL with some schematic (usually top level) Uses standard Xilinx (ISE Foundation) and Altera (Quartus) tools
• B2/T2 ATLANTiS FPGA Developer’s Kit TS-201 link transmit and receive ATLANTiS Switches Control registers on peripheral bus (TigerSharc and PCI accessible) Digital I/O SerDes I/O (Aurora, SerialLite, Serial Rapid IO in works) Pre/Post/Co-Processing shells
TS-Libs
Floating Point Library Over 450 optimised 32-bit floating point signal processing routines With over 200 extra striding versions
Integer Library Over 100 optimised 32-bit integer routines With over 80 extra striding versions
Fixed point (16-bit) Library Over 120 optimised 16-bit fixed point signal processing routines
• Fastest, most optimised library for TS (up to 10x faster than C)
• Uses latest algorithm theory
• Well documented, easy to use, and proven over a wide user base
• Allows customers to focus on application (not implementation)
• Supported & maintained by highly experienced TS programmers
• Additional routines & functions available upon request
Hand optimised, C-callable TigerSHARC Function Libraries
TS-Libs Function Coverage
• FFTs & DCTs 1- & 2-dimension, real/complex
• Filters Convolution, correlation, IIR, FIR
• Trigonometric
• Vector Mathematics
• Matrix Mathematics
• Logic-Test-Sort Operations
• Statistics
• Windowing functions
• Compander
• Distribution and Pseudo-Random Number Generation
• Scalar/vector log/cubes, etc.
• Memory Move Matrix/Vector
• Other Routines Doppler, signal-to-noise density, Cholesky decomposition
Routine                        Input Length    VDSP Run-time   TS-Lib      % Faster
Real Vector and Vector Add.    1,000           1,273           776         64.0
Real Vector and Vector Mult.   1,000           1,273           776         64.0
Complex Vector Addition        1,000           2,766           1,526       81.3
Complex Vector Mult.           1,000           3,012           2,526       19.2
Complex Vector Dot Product     1,000           3,022           2,039       48.2
Complex Matrix Addition        (100,100)       25,030          12,713      96.9
Real Vector Mean               1,000           1,431           1,045       36.9
FIR                            20 & 10,000     202,534         104,420     94.0
Real Cross Correlation         1,000 & 1,000   1,145,056       260,821     339.0
Real Convolution               1,000 & 1,000   2,513,531       874,567     187.4
Trident Multi Processor Operating Environment
Software Technology Overview
BittWare’s Trident - MPOE
Multi-Processor Operating Environment
• Designed specifically for BittWare's TigerSHARC boards
• Built on top of Analog Devices' VDK
• Provides easy-to-use 'Virtual Single Processor' programming model
• Optimized for determinism, low latency, & high throughput
• Trident's 3 Prongs:
Multi-Tasking multiple threads of execution on a single processor
Multi-Processor Transparent coordination of multiple threads on multiple processors in a system
Data Flow Management managing high-throughput, low-latency data transfer throughout the system
Why is Trident Needed?
Ease of Programming
• Multiprocessor DSP programming is complicated
• Many customers don't have this background/experience
Higher-level Tool Integration
• Need underlying support for higher-level software concepts (CORBA, MPI, etc.)
Lack of Alternatives
• Most RTOSs focus on control and single processors, not data flow and multiprocessing
• VDK is multiprocessor limited: multiprocessor messaging but limited to 32 DSPs, no multiprocessor synchronization, limited data flow management
Transparent Multiprocessing
• The key feature Trident provides is Transparent Multiprocessing
• Allows programmer to concentrate on developing threads of sequential execution (more traditional programming style)
• Provides for messaging between threads and synchronization of threads over processor boundaries transparently
• Programmer does not need to know where a thread is located in the system when coding
• Tools allow for defining system configuration and partitioning threads onto the available processors (at build time)
• Similar to “Virtual Single Processor” model of Virtuoso/VspWorks
Trident Threads
• Multiple threads spread over single or multiple processors Allows the user to split the application into logical units of operation Provides a more familiar linear programming style, i.e. one thread deals with one aspect of the system Threads are located at build time on appropriate processors
• Priority-based preemptive scheduler (per processor) Multiple levels of priority for threads Round robin (time slice) or run to completion within a level Preemption between levels based on a system event (e.g. an interrupt)
• Synchronization & control of threads Messaging between threads within a processor or spanning multiple processors Semaphores for resource control, available for access anywhere in the system
Trident Runtime
• Device drivers for underlying board components
• Framework: message passing core responsible for addressing, topology and boot-time synchronization to support up to 65k processors
• Initial Modules CDF, MPSync, MPMQ
• Optional Modules Future functionality User expansion
• User API
• Continuous Data Flow module provides raw link port support
• Suitable for device I/O at the system/processing edge, e.g. ADC
• Simple-to-use interface for reading and writing data blocks across link ports
• Supports Single data block transfer Vector data block transfer Continuous data block transfers
• User-supplied call-back
• Mix-and-match approach
Trident Modules - CDF
Continuous Data Flows API
Trident_RegisterCallbackFunction
Trident_UnregisterCallbackFunction
Trident_Write
Trident_Read
Trident_WriteV
Trident_ReadV
Trident_WriteC
Trident_ReadC
Trident Modules - MPSync
• Multiprocessor Synchronization
• Synchronization methods are essential in any distributed system to protect shared resources or coordinate activities
• Allows threads to synchronize across processor boundaries
• Semaphores: counting and binary
• Barriers: a simple group synchronization method
Trident Modules - MPMQ
• Multiprocessor Message Queues
• Provides for messaging between threads anywhere in the system transparently
• Extends the native VDK channel-based messaging into multiprocessor space
• Provides point-to-point and broadcast capabilities
VDSP++ IDE Integration
• Trident Plugin fully integrated within VDSP++
• Configures: The boards and their interconnections The VDK projects Any Trident objects
• Builds the configuration files
• Configures the VDK kernel to support the Trident runtime
Trident – to Market
• Beta released Summer 2006
• First full release November 7
• Pricing ~$10k per project (max 3 developers) when purchased with BittWare hardware Royalty free on BittWare hardware
• 30-day trials available
Trident – Future Directions
• Extend debug and config tools
• Add support for buses (cluster, PCI)
• Add support for switch fabrics (RapidIO, ?)
• Incorporate FPGAs as processing elements "Threads" located in FPGAs as sources/sinks for messaging
• Port to other processors Trident is designed to use only basic features of a kernel, so it could be ported to other platforms and kernels
• BittWare’s Gedae BSP for TigerSHARC
What Gedae says Gedae is
What is Gedae?
• Graphical Entry Distributed Application Environment
– Originally developed by Martin Marietta (now Lockheed Martin) under DARPA’s RASSP initiative to ‘abstract’ HW-level implementation
• A graphical software development tool for signal processing algorithm design and implementation on real-time embedded multiprocessor systems
• A tool designed to reduce software development costs and build reusable designs
• A tool that can help analyze the performance of the embedded implementation and optimize to the hardware
System Development in Gedae
1) Develop Algorithm that runs on the workstation
- A tool for algorithm development
- Design hardware independent systems
- Design reusable components
2) Implement systems on the embedded hardware
- Port designs to any supported hardware
- Re-port to new hardware
Designing Data Flow Graphs (DFG)
• Basic Gedae interface: design systems from standard function units in the hardware-optimized embeddable library
• Function blocks represent the function units (FFT, sin, FIR, etc.)
• Optimized routines/blocks form the Gedae "e_" library
• 200 routines taken from TS-Libs for the BittWare BSP
• The underlying code that each function block calls for execution is called a Primitive (written similarly to C)
Designing Data Flow Graphs (DFG)
• Create sub-blocks to define your own function units (add to e_ library for component reuse)
• Connecting lines represent the token streams. The underlying communications are handled by the hardware BSP
Gedae Data Communications
• Uses data flow by token streams
• Communication is handled transparently when transfers cross hardware boundaries
Scalar values (or structures)
Vectors
Matrices
Run-time Schedules
Static Scheduling
• The execution sequence and memory layout are specified by the DFG
• Functions require a defined number of tokens to run
• Gedae produces one static schedule for each part of the graph separated by a queue
Dynamic (Runtime) Scheduling
• A schedule boundary is forced by dynamic queues
• Static schedule boundaries are forced when variable token streams can only be determined at runtime (a branch, valve, merge, or switch affects the token flow)
• Queues are used to separate two static schedules when this occurs
This black square indicates a queue
Run-time Schedules – Memory Usage
• One of the primary resources available on a DSP is memory
• Memory scheduling dramatically reduces the amount of memory used by a static schedule
• Gedae memory packer modes: No packer: Gedae uses different memory for each output (wasteful) Packing: when a function has finished, its memory is reused Other packers trade off the time to pack against the optimality of packing
• In the schedule view: vertically – static schedule; horizontally – memory used
Create parallelism in DFG
• A simple flow graph's function blocks can be distributed across multiple processors
• A "family" of function blocks can be distributed across multiple processors
• Families create multiple instances of a function block, which can express parallelism
• Gedae treats family members as separate function blocks (referenced with a vector index)
Partitioning a Graph to multiple processors
• To run the function blocks on separate processors, partition a DFG into parts
• A separate executable is created for each part
• Partitions are independent of schedules
• Gedae creates a static schedule for each partition
• Extensive Group Controls facilitate management of partitions
Partitioning a Graph
Gedae has powerful visualization tools to view the timings of the processor schedules
[Trace legend: Blocked, Receive, Operation, Send]
Visualization Tools: Trace table
Trace table – Function Details
Gedae has powerful visualization tools to view the trace details of a given function
Parallel DSP Operation
Trace table - Parallel Operation
• Optimized routines for the Gedae embeddable "e_" library: 200 TS-Libs functions – more can be ported if needed
• Memory Handler
• Communication Methods Support for HW capabilities: link & cluster bus
• Multi-DSP Board Capability Up to 128 clusters
• Networking Support Development and control of a distributed network of BittWare boards, with remote debug capabilities
• BSP support backed by over 12 man-years of TigerSHARC expertise
BittWare’s Gedae BSP for TigerSHARC
What does the BittWare Gedae BSP Provide?
- SHARED_WORD (cluster bus word-sync transfers)
- DSA_LINK (DMA over the link ports)
- DSA_SHMBUF (DSA DMA over the cluster bus)
- SHMBUF (cluster bus buffered transfers)
- LINK (link port transfers)
BSP Data Transfer Methods
• Hardware Max rate 666.4 Mbytes / second
• For 1k data packets 450 Mbytes / second (on-board)
[Chart: dsa_shmbuf data transfer rate (best_send_ready) – MBytes/sec (0–700) vs. data transfer size (32–8192)]
Data Transfer Rates – Shared Memory
[Chart: dsa_link data transfer rate (best_send_ready) – MBytes/sec (50–250) vs. data transfer size]
• Hardware Max rate 250 Mbytes / second
• For 1k data packets 230 Mbytes / second
Data Transfer Rates – Link Ports
Gedae/BSP Summary
• Gedae provides portable designs for embedded multi-DSP Scheduling, communication, and memory handling are provided Optimized functions are provided for each supported board
• BittWare’s Gedae BSP for TigerSHARC:
Allows Gedae to target BittWare’s TigerSHARC boards Compiles onto multiple DSPs (up to 8 per board) Compiles to multiple boards (currently up to 128 boards) Optimized TigerSHARC library of functions Multiple communication methods (with efficient, high data rates) Removes the need for TigerSHARC specialist engineering
Additional Slides/Info
Demo Description
• Dual B2-AMC hybrid signal processing boards 2S90 Stratix II FPGA Quad TigerSHARC DSPs
• FINe control interface via GigE• ATLANTiS framework
Reconfigurable data routing ‘Patch-able’ processing
• 4x Serial RapidIO endpoint implemented in FPGA 12.5 Gb/s inter-board transfer rate; 10 Gb/s max payload rate; ~90% efficiency
• MicroTCA-like “Pico Box”
Demo Hardware
BittWare’s B2-AMC
CorEdge’s PicoTCA
Demo System Architecture
[Diagram: two BittWare B2-AMC boards – each with four TS201 DSPs, a FINe (2S60), an FPGA (2S90), a SerDes quad PHY, and GigE – linked by 4x Serial RapidIO inside the CorEdge Pico Box, with the CorEdge power, IPMI, & Ethernet module and an RJ45 Ethernet hub connecting to a PC/laptop]
[Block diagram: ATLANTiS FPGA (Stratix II 90/130) connecting four TS-201 TigerSHARC DSPs (link ports L0/L1), FINe (2S60) with GigE and GMII, cluster bus, SerDes quad PHY (PMC Sierra), and front-panel I/O]
ATLANTiS – B2
ATLANTiS – SRIO
[Block diagram as before, with Switch 1 in the ATLANTiS FPGA routing the four TS-201 DSP link ports to the SerDes quad PHY (Serial RapidIO) and front-panel I/O]
ATLANTiS – Connecting to FPGA Filters
[Block diagram as before, with Switch 2 in the ATLANTiS FPGA routing data through FPGA filter blocks between the DSP link ports and the SerDes quad PHY and front-panel I/O]