1
Implementing Complex Algorithms in FPGAs
Workshop, Nov02
Dr Steve Chappell, Director, Apps Engineering
2
Workshop Materials
> For the Labs: Course Workbook, Tutorials and Application Notes
DK integrated help system
> On your Workstations: DK, PDK
> Target Platforms: RC100, RC1000
3
Contents
> Introductions: About Celoxica
> The Basics: Opportunities with a HW Coprocessor
Target Boards
Design Flows – DK and Handel-C in brief
> Handel-C Language (Lab#1)
> Tool Connectivity (Lab#2)
> Platform Developers Kit: Platform Abstraction, Codesign (Labs#3,4)
> Appendices: Technology, Applications, CUP
4
About Celoxica
> System EDA company: Design Tools, FPGA Boards, Consultancy and Services
Incorporated on 25 September 2000 (formerly ESL)
Market leader in complete solutions for software-compiled system design
Core technology is DK, incorporating the Handel-C programming language
Senior management with a wealth of EDA and electronics industry experience
> Industry leading partners:
> Strong links with Research & Development: Technology and expertise based upon decades of research into the state of the art at the University of Oxford
Chief Science Officer Ian Page, visiting Professor at the Imperial College of Science, Technology & Medicine, London
Established and active University Program (700 institutions world-wide)
> Investors: Premier league investors including Intel
5
Supporting Argonne
> Augmented Cluster supplied by Linux Networks: Incorporating Tarari CPP cards and Software drivers
Celoxica Development Kit for FPGA content
> Ensuring successful deployment and evaluation: Cluster support by Linux Networks
Augmented Application and CPP card support by Celoxica
6
The Basics
Opportunities and Challenges
Essence of an FPGA
Design Flows
7
Opportunities with a HW Co-processor
> Algorithm Acceleration: Exploit the parallelism in algorithms to increase performance with implementation in custom (parallel) hardware
> Algorithm Offload: Exploit the coprocessor to free CPU resource
e.g., in an SSL proxy, the CPU can always handle more TCP traffic if algorithms such as RSA and 3DES are moved to a coprocessor
> For PCI-based coprocessor cards, candidate algorithms include ones where CPU execution time far exceeds data transfer time over PCI
Full analysis needs to consider:
Time required to perform the algorithm in the coprocessor
System application performance improvement – Amdahl’s Law
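The two checks named on this slide can be sketched in plain C. This is an illustrative model only: the function names, the parameter names and the single shared time unit are assumptions, not part of the workshop material.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction f of the run time
 * (0 <= f <= 1) is accelerated by a factor s (s > 0). */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Speedup of one call of an offloaded algorithm (hypothetical helper).
 *   t_cpu:    time for the algorithm on the host CPU
 *   t_xfer:   time to move the data over PCI, both directions
 *   t_coproc: time for the algorithm in the coprocessor
 * A result above 1.0 means the coprocessor path is faster for this call. */
double offload_speedup(double t_cpu, double t_xfer, double t_coproc)
{
    assert(t_cpu > 0.0 && t_xfer >= 0.0 && t_coproc >= 0.0);
    assert(t_xfer + t_coproc > 0.0);
    return t_cpu / (t_xfer + t_coproc);
}
```

As an example, if an algorithm accounts for 90% of the run time and the coprocessor path (transfer included) is 10 times faster, `amdahl_speedup(0.9, 10.0)` gives an application-level speedup of roughly 5.3, not 10.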
8
Opportunities with FPGAs
> FPGA architecture
> What it means for applications: “Soft” Hardware
Reconfigurability/Programmability
Integer processors (FP is “resource expensive”)
Wide data paths
Parallel Computation
> Challenges to deployment in enterprise computing: Development complexity
IP deployment and integration
Design Framework and methods
Data Bandwidth to/from coprocessor
Choosing the right applications
9
Essence of an FPGA
> SRAM Field Programmable Gate Array: CLBs + IOBs + Interconnect Matrix
[Figure labels: CLB, Block RAM, Multipliers, Processor, Soft Cores, Application]
10
Target Boards
RC100, RC1000, RC2000, Tarari CPP
11
RC-100
> Xilinx Spartan2-200 FPGA
> 2MB ZBT SRAM, in 2 36-bit banks. 8MB Flash RAM
> 50 pin expansion header, PS/2 mouse/keyboard, parallel port
> Video input decoder, VGA output DAC
> Two 7-segment LED displays
> 80MHz maximum clock
12
RC-1000
> PCI card, DMA transfers > 110 MB/sec sustained
> Xilinx Virtex-2000 FPGA
> 8MB SRAM, in 4 32-bit banks
> 2 PMC slots
> 50 auxiliary I/O pins
> Programmable clock
13
RC-1000
14
RC-2000
> Virtex II 2V3000-4, 2V6000-4 and 2V6000-6 FPGAs
> 64bit 66MHz PCI bus
> 6 banks of ZBT SRAM offering a total of either 12Mb or 24Mb
> Front-panel I/O up to 146 lines, dependent on options
> 64 I/O lines via PMC connector
> 16Mb Flash for configuration storage
> 2 Programmable clocks
> Options include: 16Mb additional ZBT SRAM in 2 banks
128Mb DDR Ram
15
RC-2000
16
CPP – Basic Board Architecture
> Two CPEs (Content Processing Engines), each with:
Virtex-II 1000 FPGA
Eight LEDs
2x 1MB SRAM
Connection to CPC
> CPC (Content Processing Controller):
256MB DDR SDRAM
PCI Bus to Host
[Figure: Basic CPP architecture. The CPC connects to the host PCI bus, to 256MB DDR SDRAM and to both CPEs; each CPE has two 1MB SRAMs and LEDs.]
17
Design Flows
DK and Handel-C
18
Designing acceleration IP
> Traditional Options – HDL based design: Purchase FPGA (HW) development tools
Hire/use HW engineers
Pay 3rd party development fees
> The Alternative – “Software Compiled System Design”: Use Celoxica Content Processing Development Kit
Development framework with Example Acceleration IP
Comprehensive Hardware-Software Co-simulation environment
Tool and Language Connectivity
Enable SW engineers and/or increase HW engineer productivity
19
Why a Software Language Based Approach for System Design?
> Some problems are better expressed as a software algorithm
> Software Reference designs can be utilized
> Designs are often specified by a C/C++ executable
> Simplifies and delays hardware-software partitioning
> Software development techniques can be used
> Brings hardware and software teams closer together
> New Possibilities …
20
RC100
> RC100 prototyping board: $10 FPGA
Commodity memory chips
Video Input and Output
22
CPDK for developing acceleration IP
> The Content Processing Development Kit includes Celoxica “DK” and supporting libraries
> Consisting of: “Software Compiled System Design” environment
Simple design flow with integrated simulation and direct implementation
Similar SW/HW design methods simplify design exploration and optimal allocation of functionality between SW and HW
Verification and debug using a symbolic debugger
Connectivity and co-simulation with SW and HDL cores
APIs to hide complexity
> Enabling your software and hardware developers to rapidly develop acceleration IP
23
Celoxica DK1 – Rapid Design
> Handel-C direct to FPGA, Minimum Tool Chain
> Easy-to-learn language – ISO-C (ANSI-C)
> Design of hardware and software in parallel with co-simulation
[Figure: Design Flow. Handel-C → Simulate → Compile → Netlist → FPGA Vendor’s Tools (Place & Route) → Configure → Final Hardware]
24
Supported FPGA/PLD Devices
25
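Development Flow
> Minimal Tool Chain
> Similar Languages
> Standardised APIs
> Platform Abstraction
> External IP (optional)
[Figure: Development flow. Stages: Algorithm Definition, Partition, Develop, HLL Co-Verification, Implementation. HW side: specification in Handel-C, DK with BSP, compiled to EDIF for the CPP. SW side: specification in C, SW tool with BSP, OS and LIBS, compiled to OBJ for the host CPU. External IP enters as EDIF, HDL, LIB or C.]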
26
APIs Enable Rapid Co-verification
> “Virtual Platform” for Co-simulation and Co-design
> Cycle-accurate HLL simulator for Acceleration IP modelling
> Extendable Co-Sim to: C/C++, HDL, System-C, ISS
[Figure: Virtual Platform. HW (Handel-C) and SW (C) specifications are co-verified in DK through Nexus and the BSPs, with links to an HDL simulator and to SW and/or an ISS, ahead of HW and SW implementation.]
27
DK User Interface
File view
Symbol view
Syntax highlighting
Break-points
Multithreaded Debug
Watch variables
Simulate Build
Clock Cycles
Info
28
Handel-C in Brief
> Handel-C is based on ANSI C
> Well-defined semantics similar to OCCAM/CSP
> Additions: support for parallelism
channels for communications between parallel processes
operators for detailed control of hardware
constructs for RAM, ROM, interfacing, etc.
29
HW-SW Co-Design
30
Handel-C Language
31
Core Language Features
> Standard C (if, while, switch etc) including
Functions
Structures
Pointers
> par {…} construct for parallelism
> Simple model of timing
each assignment is one clock cycle
> Arbitrary widths on variables
> Enhanced bit manipulation operators
> Sharing/Copying expressions
> Support for hardware constructs
Multiple clock domains, RAM, ROM, external interfaces
32
Handel-C describes Hardware!
> No side effects in expressions: statements like a = b*c++; are not supported
> No floating point: floating point is not directly supported by Handel-C
Library support provided for fixed and floating point arithmetic
> No run-time recursion: due to the absence of any kind of ‘call stack’ in hardware
> Limited standard library (i.e. no printf, fopen etc.): however, DK1.1 allows direct calls to external functions written in C/C++, and these could incorporate file I/O, user interaction, recursion, etc.
33
Variables
> Handel-C has one basic type - integer
> May be signed or unsigned
> Can be any width, not limited to 8, 16, 32 etc.
Variables are mapped to hardware registers.
void main(void){
    unsigned 6 a;
    a = 45;
}
a = 1 0 1 1 0 1 = 0x2d   (MSB on the left, LSB on the right)
34
Bit Manipulation Operators
> Extra operators have been added to allow more ‘hardware like’ bit manipulation:
<<     Shift Left                       b = a<<2;
>>     Shift Right                      b = a>>1;
<-     Take least significant bits      b = a<-5;
\\     Drop least significant bits      b = a\\5;
@      Concatenate bits                 b = a@c;
[ ]    Bit Selection                    b = a[4:1];
35
Example Bit Manipulation
[MSB:LSB] - bit selection (range of bits)
a = 1 0 1 1 0 1 = 0x2d
b = a[4:1]
b = 0 1 1 0 = 0x6
36
Bit Manipulation 2
> Other bit manipulation examples:
signed int 4 a;
signed b,c,d;
a = 0b1100;
b = a<<1;    // b = 0b1000
b = a>>1;    // b = 0b1110
c = a[2:1];  // c = 0b10
c = a<-2;    // c = 0b00
c = a\\2;    // c = 0b11
d = a @ a;   // d = 0b11001100
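The results in the comments above can be reproduced with a small plain-C model of the operators. This is a sketch for checking values on a workstation, not Handel-C: the function names are made up for illustration, widths are passed explicitly because C has no arbitrary-width integers, and values are held as unsigned bit patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 32). */
uint32_t mask(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* a <- n : take the n least significant bits. */
uint32_t take_lsb(uint32_t a, unsigned n) { return a & mask(n); }

/* a \\ n : drop the n least significant bits. */
uint32_t drop_lsb(uint32_t a, unsigned n)
{
    assert(n < 32);
    return a >> n;
}

/* a @ b : concatenate, a in the upper bits; b_width is the width of b. */
uint32_t concat(uint32_t a, uint32_t b, unsigned b_width)
{
    assert(b_width >= 1 && b_width < 32);
    return (a << b_width) | (b & mask(b_width));
}

/* a[msb:lsb] : select a range of bits. */
uint32_t select_bits(uint32_t a, unsigned msb, unsigned lsb)
{
    assert(msb < 32 && msb >= lsb);
    return (a >> lsb) & mask(msb - lsb + 1);
}

/* a << n inside a fixed width: bits shifted past the MSB are lost. */
uint32_t shl_w(uint32_t a, unsigned width, unsigned n)
{
    assert(n < 32);
    return (a << n) & mask(width);
}

/* a >> n for a signed value of the given width: the sign bit is copied in. */
uint32_t asr_w(uint32_t a, unsigned width, unsigned n)
{
    uint32_t sign, r;
    assert(n < width);
    sign = (a >> (width - 1)) & 1u;
    r = (a & mask(width)) >> n;
    if (sign)
        r |= mask(width) & ~mask(width - n);
    return r;
}
```

With a = 0b1100 in 4 bits, `shl_w(0xC, 4, 1)` is 0b1000 and `asr_w(0xC, 4, 1)` is 0b1110, matching the slide; `select_bits(45, 4, 1)` gives 0x6, matching the a[4:1] example on the previous slide.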
37
Timing model
> Assignments and delay statements take 1 clock cycle
> Combinatorial expressions are computed between clock edges
Most complex expression determines clock period
index = 0;                      // 1 Cycle
while (index < length){
    if (table[index] == key)
        found = index;          // 1 Cycle
    else
        index = index+1;        // 1 Cycle
}
Example: takes 1+n cycles (n is number of iterations)
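The 1+n figure can be checked with a plain-C cycle counter for the search example. This is a sketch under two assumptions that are not on the slide: the function name `search_cycles` is invented, and the model leaves the loop once the key is found (the slide's loop shows no exit on a match).

```c
#include <assert.h>
#include <stddef.h>

/* Count clock cycles for the linear search under the slide's timing model:
 * every assignment costs one cycle, tests and expressions cost none.
 * Writes the matching index to *found when the key is present. */
unsigned search_cycles(const unsigned *table, size_t length,
                       unsigned key, size_t *found)
{
    unsigned cycles = 0;
    size_t index = 0;
    int done = 0;   /* model bookkeeping only, not counted as a cycle */

    assert(table != NULL && found != NULL);

    cycles++;                              /* index = 0;          1 cycle */
    while (!done && index < length) {      /* test is combinational       */
        if (table[index] == key) {
            *found = index;                /* found = index;      1 cycle */
            cycles++;
            done = 1;
        } else {
            index = index + 1;             /* index = index + 1;  1 cycle */
            cycles++;
        }
    }
    return cycles;
}
```

For a four-entry table {3, 5, 7, 9}, searching for 7 takes three iterations and so 1+3 = 4 cycles; searching for an absent key takes 1+4 = 5 cycles.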
38
Parallelism
> Handel-C blocks are by default sequential
> par{…} executes statements in parallel
> par block completes when all statements complete
Time for block is time for longest statement
Can nest sequential blocks in par blocks
Sequential Block
// 3 Clock Cycles
{
    a=1;
    b=2;
    c=3;
}
Parallel Block
// 1 Clock Cycle
par{
    a=1;
    b=2;
    c=3;
}
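The two timing rules (a sequential block costs the sum of its statements, a par block costs its longest branch) can be written as a tiny plain-C cost model. The function names are illustrative assumptions; each array entry is the cycle count of one statement or branch.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential block: statements run one after another, so cycles add up. */
unsigned seq_cycles(const unsigned *stmt_cycles, size_t n)
{
    unsigned total = 0;
    size_t i;
    assert(stmt_cycles != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += stmt_cycles[i];
    return total;
}

/* par block: all branches start together and the block completes
 * when the slowest branch completes. */
unsigned par_cycles(const unsigned *branch_cycles, size_t n)
{
    unsigned longest = 0;
    size_t i;
    assert(branch_cycles != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (branch_cycles[i] > longest)
            longest = branch_cycles[i];
    return longest;
}
```

Three single-cycle assignments cost 3 cycles in sequence and 1 cycle in a par block. Nesting works by feeding one result into the other: a par block whose branches are a 3-statement sequential block and a single assignment costs 3 cycles.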
39
More Parallelism
> Example – array initialisation
> Sequential version takes 20 clock cycles: the for() loop has 1 cycle overhead per iteration for the increment
> Parallel version takes 1 clock cycle: replicated par() builds hardware to execute all 10 iterations in a single cycle
Allows trade-off between hardware size and performance
Sequential code
for(i=0;i<10;i++){
    array[i]=0;
}
Parallel code
par(i=0;i<10;i++){
    array[i]=0;
}
40
Channels
> Allow communication and synchronisation between two parallel branches
Semantics based on CSP: unbuffered (synchronous) send and receive
> Declaration: specifies data type to be communicated
chan unsigned 6 c;
{ … c!a+1; //write a+1 to c … }
{ … c?b; //read c to b … }
[Figure: the value a+1 passes through channel c into b]
41
Sharing Hardware for Expressions
> Functions provide a means of sharing hardware for expressions
> By default, compiler generates separate hardware for each expression
Hardware is idle when control flow is elsewhere in the program
Hardware function body is shared among call sites
Separate hardware for each expression:
{
    …
    x = x*a + b;
    y = y*c + d;
}
Shared hardware through a function:
int mult_add(int z, int c1, int c2){
    return z*c1 + c2;
}
{
    …
    x = mult_add(x,a,b);
    y = mult_add(y,c,d);
}
42
Replicating Hardware for Expressions
> Inline functions are expanded at the call site
Provide for functional abstraction of complex hardware
inline complex mult_complex(complex x, complex y){
    complex z;
    par{
        z.re = x.re*y.re - x.im*y.im;
        z.im = x.re*y.im + x.im*y.re;
    }
    return z;
}
complex x1,y1,x2,y2,z1,z2;
…
par{
    z1 = mult_complex(x1,y1);
    z2 = mult_complex(x2,y2);
}
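A plain-C reference version of the same function is handy for checking the hardware result against known values, in the spirit of specifying a design by a C executable. This is a sketch: the struct name `complex_int` and the use of plain `int` fields are assumptions, since the slide does not show how `complex` is declared.

```c
#include <assert.h>

/* Hypothetical complex type with integer parts; the slide's own
 * declaration of `complex` is not shown. */
typedef struct {
    int re;
    int im;
} complex_int;

/* Same arithmetic as the Handel-C mult_complex. In C the two lines run one
 * after the other; the result is identical because neither line reads z. */
complex_int mult_complex(complex_int x, complex_int y)
{
    complex_int z;
    z.re = x.re * y.re - x.im * y.im;
    z.im = x.re * y.im + x.im * y.re;
    return z;
}

/* Self-check with a hand-computed product: (1 + 2i)(3 + 4i) = -5 + 10i. */
void check_mult_complex(void)
{
    complex_int x = {1, 2};
    complex_int y = {3, 4};
    complex_int z = mult_complex(x, y);
    assert(z.re == -5);
    assert(z.im == 10);
}
```

The two assignments are independent of each other, which is why the Handel-C version can place them in one par block and finish in a single cycle.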
43
Macro procedures
> macro proc is similar to an inline function, but is expanded at compile time.
They also allow for arbitrary bit width calculations
> The following generates a reusable timer:
#define TENTH_SEC (CLOCK_RATE/10)
macro proc usleep(ms){
    unsigned (log2ceil(TENTH_SEC)) Counter;
    Counter = TENTH_SEC * (0@ms);
    while (Counter) Counter--;
}
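The width of Counter comes from log2ceil applied to a compile-time constant. A plain-C stand-in shows what width that produces. This is a sketch: it assumes log2ceil rounds log2 of its argument up to the next integer, and the 80 MHz clock used in the example is borrowed from the RC-100 slide, not from this one.

```c
#include <assert.h>
#include <limits.h>

/* Smallest w such that 2^w >= n, i.e. log2(n) rounded up, for n >= 1.
 * The upper bound on n keeps the doubling below from overflowing. */
unsigned log2ceil(unsigned long n)
{
    unsigned w = 0;
    unsigned long p = 1;
    assert(n >= 1 && n <= ULONG_MAX / 2 + 1);
    while (p < n) {
        p <<= 1;
        w++;
    }
    return w;
}
```

With a hypothetical CLOCK_RATE of 80,000,000, TENTH_SEC is 8,000,000 and `log2ceil(8000000)` is 23, so Counter would be 23 bits wide. A 23-bit register holds values up to 8,388,607, which leaves little headroom once TENTH_SEC is multiplied by the argument, so a wider counter may be needed for arguments above 1.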
44
Signals
> A signal behaves like a wire - takes the value assigned to it but only for that clock cycle.
The value can be read back during the same clock cycle.
The signal can also be given a default value.
// Breaking up complex expressions
int 15 a, b;
signal <int> sig1;
static signal <int> sig2=0; // default value of 0
a = 7;
par{
    sig1 = (a+34)*17;
    sig2 = (a<<2)+2;
    b = sig1 + sig2;
}
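Because a signal is read back in the same clock cycle it is written, the par block above behaves like one combinational expression feeding b. A plain-C model makes the values concrete. This is a sketch: the struct and function names are invented, and the input range is restricted so the result stays inside a 15-bit signed value (the model does not reproduce 15-bit wrap-around).

```c
#include <assert.h>

/* Values present during the single clock cycle of the par block. */
typedef struct {
    int sig1;   /* wire: (a + 34) * 17       */
    int sig2;   /* wire: (a << 2) + 2        */
    int b;      /* register loaded at the end of the cycle */
} cycle_result;

/* Model of one cycle: both signals are evaluated and consumed in the same
 * cycle, so b sees the new values, not values from an earlier cycle. */
cycle_result signal_cycle(int a)
{
    cycle_result r;
    assert(a >= 0 && a <= 752);   /* keeps b within 15-bit signed range */
    r.sig1 = (a + 34) * 17;
    r.sig2 = (a << 2) + 2;
    r.b = r.sig1 + r.sig2;
    return r;
}
```

For a = 7 the model gives sig1 = 697, sig2 = 30 and b = 727, all within one cycle.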
45
Interfaces - Introduction
> Interfaces allow Handel-C designs to connect to external hardware and logic.
> Three types of interfaces
Buses – used for connecting to external pins
Ports – used for creating connection points for external logic.
e.g. Creating the ports for a VHDL entity
User Defined – used for including external logic blocks inside a Handel-C design.
e.g. Including an EDIF black box inside a design.
46
Interfaces – Buses
> Makes connections to pins on the FPGA
Bus types:
Output
Input – direct, clocked and latched input
Tri-state – direct, clocked and latched tri-state
interface bus_in(int 4) Address() with {data={P1,P2,P3,P4}};
x=Address.in;
[Figure: pins P1 to P4 feed the bus Address, which is read into x]
47
Interfaces – Ports
> Allows connection points for external logic to be specified
e.g. Defining the ports for a ‘black box’ VHDL entity
Port types: Input, Output
//Declare Ports
interface port_in(int 4 Input1) InputPort1();
interface port_in(int 4 Input2) InputPort2();
interface port_out() OutputPort(int 4 Output = OutReg);
[Figure: Handel-C black box with ports Input1, Input2 and Output]
48
Interfaces – User Defined
> Allows external logic blocks to be used inside a Handel-C design
e.g. Using an EDIF core
//Instantiate connections to core
interface pipe_mult(int 4 Result) Multiplier(int 4 A, int 4 B);
[Figure: Handel-C design connected to the EDIF module pipe_mult.edf through A, B and Result]
49
Multiple Clock Domains - example
Domain1.c
chan unsigned 8 ComChan;
set clock = external "C1";
void main(void){
    unsigned 8 x;
    do{
        x++;
        ComChan ! x;
    }while(1);
}
Domain2.c
extern chan unsigned 8 ComChan;
set clock = external "C2";
void main(void){
    unsigned 8 y;
    do{
        ComChan ? y;
    }while(1);
}
50
Handel-C Summary
> Handel-C is based on ANSI C
> Well-defined semantics similar to OCCAM/CSP
> Additions: support for parallelism
channels for communications between parallel processes
operators for detailed control of hardware
constructs for RAM, ROM, interfacing, etc.
51
Lab #1
Quick Start DK1, Handel-C and the RC100
52
Tool Connectivity
The Whole Y-Chart
53
Tool Connectivity
54
Black Boxes - Xilinx CoreGen
55
Co-Simulation with HDL
56
Co-Simulation with ISS
57
HW-SW Co-Simulation & Virtual Platforms
58
MatLab Simulink
Filter.hcc
Sfunc.cpp
dll
59
Co-Simulation with System-C
60
Lab #2
Advanced Features
61
PDK – Platform Dev Kit
PDK, PAL and DSM
62
Introduction to PDK
> PDK – Platform Developer’s Kit
> Goal: to provide an integrated package of tools, support libraries and implementations to simplify application development and verification using DK1
Insulate developer from hardware details
Improve portability and maintainability
Provide key pre-packaged value-adding functionality
Allow simulation of the complete environment from modelling through to hardware implementation
> Benefits: Reduce development time
Allow development focus on application added value
Introduction to PDK
> PDK – Three major components
> DSM: Integration between processors and FPGA/PLD
> PAL: A consistent API for portable board-level Handel-C implementations
> PSL: Provides board, hardware or development tool specific support for DK1 and Handel-C
64
Introduction to PDK
> Each PDK component provides four functional areas:
> Simulation: Provides hardware independent simulation of DSM and PAL APIs and co-simulation with external tools and simulators
> Kit: Provides key components and/or templates to allow development of new, platform specific, implementations
> Platform: Platform specific implementations of DSM, PAL and PSL components
> Cores: Implementations of added-value functionality, demos or examples
65
Platform Abstraction Layer (PAL)
[Figure: Handel-C Application; Platform Abstraction Layer Application Programming Interface; PAL-Core; Platform Support Library (PSL); Board with Peripheral 1 and Peripheral 2]
66
DSM – Data Stream Manager
[Figure 4: DSM. Labels: Processor, Software DSM Library, Software Bus Controller; FPGA, Hardware DSM Library, Hardware Bus Controller; Application and Handel-C program on each side]
67
Labs #3 and #4
PDK:
PAL and DSM
68
Summary
> High performance gains with HW acceleration cards, for appropriate algorithms
> Development kit enabling rapid design using a software-like development framework: Celoxica DK and Handel-C
> Consultancy and Services
> For more: www.celoxica.com
72
Appendices
Technology Behind DK
Consultancy, Services, Projects
Case studies
University Programme
73
The Technology Behind DK
74
The Technology Behind DK
> Simple Hardware constructs
> Compilation Flow
> Optimisations
75
The Hardware Description
> Data Path: Circuitry to Move/Manipulate/Store Data
> Control Path: Circuitry to schedule operations
76
Control and Assignment
> Variables are mapped to hardware registers
> The control start signal forms the clock enable signal for the destination register of the assignment.
void main(void){
    …
    R=Exp;
    …
}
Figure. Implementation of Assignment (labels: Exp, register R with D, Q, CLK and CE, Start, Finish)
77
The IF Construct
void main(void){
    …
    if (BE)
        S1;
    else
        S2;
    …
}
[Figure: Start is steered by BE to S1 or S2; either branch leads to Finish]
78
Sequential Composition
void main(void){
    …
    S1;
    S2;
    S3;
    …
}
[Figure: Start → S1 → S2 → S3 → Finish]
79
Parallel Composition
void main(void){
    …
    par{
        S1;
        S2;
        S3;
    }
    …
}
[Figure: Start drives S1, S2 and S3 together; three D/Q flip-flops sit between the branches and Finish]
80
Compilation Flow - Optimisations
> Generate AST from Source code
Macro Expansion
Width Inferencing
Design Checking
> Compilation to High Level Netlist
> Expansion to technology specific netlist
Figure. Compilation flow after parsing: Abstract Syntax Tree → Compilation → High Level Netlist → High Level Optimisation → Expansion → Gate Level Netlist → Low Level Optimisation. Stages up to the High Level Netlist are technology independent; stages after Expansion are technology specific.
81
Re-Writing
> Logical equivalence
(a) Constant 1 input to AND Gate removed
(b) Gate removed with unused output
(c) Block removed with unused output
Figure. Some re-writing optimizations.
82
Conditional Re-Writing
> Logical equivalence by testing for impossible conditions
Gates removed for circuit with output independent of y
Figure. Conditional re-writing (inputs x, y).
83
Common Sub-Expression Elimination
> Test for common logic
Duplicate AND gate removed
Figure. Common sub-expression elimination (inputs x, y).
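Each of these rewrites is safe only if the circuit before and after computes the same outputs. For small circuits that can be checked exhaustively in plain C. This is a sketch: the example circuits are hypothetical stand-ins, since the circuits drawn in the original figures are not recoverable from the transcript, and it does not describe how DK itself performs the checks.

```c
#include <assert.h>

/* A 2-input circuit on 1-bit inputs; the result may pack several outputs. */
typedef unsigned (*circuit2)(unsigned x, unsigned y);

/* Return 1 if f and g agree on all four input combinations, else 0. */
int equivalent2(circuit2 f, circuit2 g)
{
    unsigned x, y;
    assert(f != 0 && g != 0);
    for (x = 0; x <= 1; x++)
        for (y = 0; y <= 1; y++)
            if (f(x, y) != g(x, y))
                return 0;
    return 1;
}

/* Re-writing (a): AND with a constant 1 input reduces to a wire. */
unsigned and_with_one(unsigned x, unsigned y) { (void)y; return x & 1u; }
unsigned wire_x(unsigned x, unsigned y)       { (void)y; return x; }

/* Conditional re-writing (hypothetical circuit): the output of
 * (x AND y) OR (x AND NOT y) never depends on y, so it reduces to x. */
unsigned independent_of_y(unsigned x, unsigned y)
{
    return (x & y) | (x & (~y & 1u));
}

/* Common sub-expression elimination: two outputs built from two separate
 * AND gates versus one AND gate fanned out to both outputs. */
unsigned dup_and(unsigned x, unsigned y)
{
    unsigned t1 = x & y;
    unsigned t2 = x & y;        /* duplicate gate */
    return (t1 << 1) | t2;
}
unsigned shared_and(unsigned x, unsigned y)
{
    unsigned t = x & y;         /* single gate */
    return (t << 1) | t;
}
```

`equivalent2(and_with_one, wire_x)`, `equivalent2(independent_of_y, wire_x)` and `equivalent2(dup_and, shared_and)` all return 1, while `equivalent2(wire_x, shared_and)` returns 0 because the two circuits differ for x = 1, y = 0.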
84
DK1 Optimisation Settings
85
Customer Highlights
Consultancy, Services, Projects
Case studies
86
Celoxica Expertise
> Technical Strengths: Design Methodologies and Hardware Compiler technology
FPGA board design and prototyping
Image, Data processing and Multimedia: Encryption
Compression/Decompression
Video Processing
Telecommunications: Routers/Switches
Protocol stacks – IPv6, VoIP (H323, SIP), ATM
Software defined radio – UMTS, 3G, DAB
> Business Consultancy: Analysis, Marketing and Strategy
Venture capital
Services and Support
87
Marconi Celoxica Technology Demonstrator
> Internet Reconfigurable Hardware from Software: FPGA based, no microprocessor or operating system
Different applications from the same hardware
Can be reconfigured over internet to new applications
> MMT 2000: IP Phone
MP3 player
Games console
Graphic display
88
High Speed Video Prototyping System
> Customer: Requirement to shorten the evaluation time of video filter algorithms as candidates for use in DTVB
> Solution: FPGA-based system comprising:
Wealth of analogue and digital video I/O
COTS boards and custom
Development kit: DK and Video framework libraries (SW/HW)
> Outcome: Real-time evaluation system rather than slow software models
Algorithm evaluation times reduced from 12 to 3-6 months
Prototypes for ASIC process rather than software models
[Figure: FPGA host card with DIME expansion site, Component Analogue Video Interface Modules and (HD) SDI Interface Module; connections: SDI Input, SDI Output #1 and #2, Analogue Input #1 and #2, Analogue Output #1 and #2]
89
EuroSkyWay Multimedia Satellite: Ground Traffic Simulator
• Services: 512, 2048 kb/s, 8...32 Mb/s (provider)
• fixed and mobile users (aircraft, buses, vessels)
• service launch in 2004
[Figure: GTS block diagram (labels: GTS Controller, EVA LAN, GTS Parameter Database, Access Managers, ECS, EPS, M&C, External traffic G/A, IP/ATM external traffic)]
EuroSkyWay PHY Board
• for system verification and end-to-end perf. testing
• generation of total network traffic (ATM, IP)
• full implementation of ESW protocols (layer 1/2/3)
• digital baseband transmission
[Figure: EuroSkyWay network diagram (labels: SaT-A, SaT-B, SaT-C, PTN GTW, PrT-A/-B, Service Provider Center, Network Operation Center, ISL, InSS, MCS, Cluster 1, Supported Networks; link rates from 160 Kbps to 32.768 Mbps)]
90
JPEG2000 MQ encoder implementation
                                    SCSD version    Traditional HDL implementation
Slices                              1,999           620
Device utilization                  18%             6%
Speed (MHz)                         115.5           76
Lines of code                       330             800
Design time (days)                  10 + 2          30+ *
Av cycles per code block (000’s)    108             67.5
Processing time (ms)                0.939           0.888
Simulation time for Lena jpeg       5 minutes       XXX
IBM Power PC
Wind River SBC405 GP
Xilinx Virtex
Proteus FPGA daughter card
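The processing time figures in the comparison are consistent with average cycles per code block divided by clock speed. A plain-C helper makes the arithmetic explicit; the function name is an assumption made for illustration.

```c
#include <assert.h>

/* Processing time in milliseconds for one code block.
 *   kcycles:   average cycles per code block, in thousands
 *   clock_mhz: clock speed in MHz
 * (kcycles * 1000) cycles / (clock_mhz * 1e6) Hz = kcycles / clock_mhz ms */
double processing_time_ms(double kcycles, double clock_mhz)
{
    assert(kcycles >= 0.0 && clock_mhz > 0.0);
    return kcycles / clock_mhz;
}
```

For the HDL implementation, `processing_time_ms(67.5, 76.0)` is about 0.888 ms, as quoted. For the SCSD version, `processing_time_ms(108.0, 115.5)` is about 0.935 ms, close to the quoted 0.939 ms; the small gap is what one would expect if the cycle count on the slide is rounded to the nearest thousand.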
91
Customer highlights
"The DK1 suite enables us to work at a high level, quickly optimise C code for hardware implementation, prototype using FPGAs and will ultimately provide the HDL output for our ASIC design."
Shigeru Kawada, General Manager, NEC Electronics Singapore's technology centre
"I visited Celoxica's headquarters. While there, I re-implemented our existing VHDL solution using the DK1 suite in just one day. I was hooked."
Jan Mennekens, chief technical officer, M-TEC WIRELESS
A new joint development team to create powerful, flexible and scaleable application specific servers was announced today. Celoxica Ltd, Motorola and StrongBow Technologies are working together to create servers that embed applications, such as transaction processing for credit cards, directly in hardware.
"Without Celoxica’s tools, this would not have been possible,"
Alan Prouse, CEO and founder of StrongBow Technologies
"The real value of the Celoxica tools is the quick re-engineering capability and smooth transfer to a production platform. The DK1 methodology allows us to accomplish tasks in a time frame that conventional design methods cannot handle."
Dennis Hazel, Director of Engineering, Foxboro
"Our evaluation of DK1 clearly demonstrated that the flow increases our engineering throughput, and allows us to make better use of our scarce hardware engineering resources. Using Celoxica's Handel-C to hardware flow, our software engineers can take a software solution through to hardware allowing the hardware designer to focus on system integration and optimisation."
Andy Davey, senior engineer at Cogent Defence Systems
"Our original project plan was slated for 12-18 months using the traditional HDL design methodology. By adopting the Handel-C high level design language methodology, we were able to finish the project in 6 months with DK1 design suite and Xilinx ISE software targeting Xilinx VII-6000 FPGAs. We put in minimum development resource, but still met the design specification, timing far ahead of the schedule. Anyone can use the DK1 design suite to design efficient hardware."
Gary Mallaley, Manager of Strategy Development at Northrop Grumman
92
CUP: Celoxica University Programme
Recent Highlights
93
Introduction to CUP
> CUP has been active since the company was formed
> 700 universities worldwide registered with a multi-disciplinary user base
> Strategic relationship with XUP
> University specific products and services: heavily discounted
> Focused upon supporting innovative teaching and research
> Comprehensive Website www.celoxica.com/programs/university/index
> Register Now!
94
Benefits to Universities
> Rapid Design Exploration: Fit more interest into time dependent project work through rapid prototyping and productivity improvements
Port prototyped C designs to Handel-C for implementation in FPGAs
> For Computer Science disciplines: Familiar software environment
Parallel programming environment
Computer architecture exploration – build your own instruction sets
Exploring hardware accelerated systems
> For EE disciplines: Cycle accurate interactive simulation
SW/HW co-design, system design and SOC
Integration with HDLs
> Creates a bridge for increased collaboration between different disciplines
95
Update on University activity
> Research Articles: Customising Floating-Point Designs, Imperial College, Xilinx
Accelerating Radiosity Calculations using Reconfigurable Platforms, Altaf Abdul Gaffar and Wayne Luk, Imperial College
A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C, Peter Martin, University of Birmingham
> Teaching Programmes: VDEC Japan now support DK1/Handel-C
Hardware/Software Co-Design: A Short Course for Unbelievers, A. Downton et al, University of Essex
96
IGOL Framework
What is it?
> COM based Framework for Development and Distribution of Hardware Acceleration
Testing and debugging for development
Runtime services and packaging for deployment
> Application Examples: Premiere, Photoshop, WinAmp, VirtualDub, DirectShow
Demonstrates
> Ease of Development and Deployment of Hardware Acceleration
> Separation of concerns: Hardware developers only develop hardware
Application developers only develop software
> Re-use of hardware and software components
Simple updating and patching
Automatic application support for new components