1 Nov02 Implementing Complex Algorithms in FPGAs Workshop Dr Steve Chappell Director Apps Engineering

Nov02

Implementing Complex Algorithms in FPGAs

Workshop

Dr Steve Chappell Director Apps Engineering

Workshop Materials

> For the Labs

Course Workbook, Tutorials and Application Notes

DK integrated help system

> On your Workstations

DK, PDK

> Target Platforms

RC100, RC1000

Contents

> Introductions

About Celoxica

> The Basics

Opportunities with a HW Coprocessor

Target Boards

Design Flows – DK and Handel-C in brief

> Handel-C Language (Lab #1)

> Tool Connectivity (Lab #2)

> Platform Developers Kit (Labs #3, 4)

Platform Abstraction

Codesign

> Appendices

Technology, Applications, CUP

About Celoxica

> System EDA company

Design Tools, FPGA Boards, Consultancy and Services

Incorporated on the 25th September 2000 (Formerly ESL)

Market leader in complete solutions for software-compiled system design

Core Technology is DK incorporating the Handel-C programming language

Senior management with a wealth of EDA and electronics industry experience

> Industry leading partners:

> Strong Links with Research & Development

Technology and expertise based upon decades of state-of-the-art research at The University of Oxford

Chief Science Officer Ian Page, visiting Professor at the Imperial College of Science, Technology & Medicine, London

Established and active University Program (700 institutions world-wide)

> Investors

Premier league investors including Intel

Supporting Argonne

> Augmented Cluster supplied by Linux Networks

Incorporating Tarari CPP cards and Software drivers

Celoxica Development Kit for FPGA content

> Ensuring successful deployment and evaluation

Cluster support by Linux Networks

Augmented Application and CPP card support by Celoxica

The Basics

Opportunities and Challenges

Essence of an FPGA

Design Flows

Opportunities with a HW Co-processor

> Algorithm Acceleration

Exploit the parallelism in algorithms to increase performance with implementation in custom (parallel) hardware

> Algorithm Offload

Exploit the coprocessor to free CPU resource

e.g., in an SSL proxy, the CPU can handle more TCP traffic if algorithms such as RSA and 3DES are moved to a coprocessor

> For PCI-based coprocessor cards

Candidate algorithms include ones where CPU execution time far exceeds data transfer time over PCI

Full analysis needs to consider:

Time required to perform the algorithm in the Co-processor

System application performance improvement – Amdahl’s Law
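The analysis listed on this slide can be sketched numerically. The block below is a minimal illustration in plain C, not part of the workshop material: `amdahl_speedup` is the textbook form of Amdahl's Law, and `offload_speedup` is a hypothetical helper that also charges the PCI transfer time to the offloaded kernel. All names and parameters are assumptions made for the example.

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction p (0..1) of the original
 * run time is accelerated by a factor s (> 0). */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Hypothetical refinement: the kernel moved to the coprocessor costs its
 * hardware execution time plus the time to move data over PCI.
 * All times are in seconds; t_total is the original software run time. */
double offload_speedup(double t_total, double t_kernel_sw,
                       double t_kernel_hw, double t_transfer)
{
    assert(t_total > 0.0 && t_kernel_sw >= 0.0 && t_kernel_sw <= t_total);
    double t_new = (t_total - t_kernel_sw) + t_kernel_hw + t_transfer;
    assert(t_new > 0.0);
    return t_total / t_new;
}
```

For instance, `amdahl_speedup(0.9, 10.0)` is roughly 5.3: even a 10x faster kernel covering 90% of the run time gives only about a 5x system gain, and any transfer time reduces it further.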

Opportunities with FPGAs

> FPGA architecture

> What it means for applications

“Soft” Hardware

Reconfigurability/Programmability

Integer processors (FP is “resource expensive”)

Wide data paths

Parallel Computation

> Challenges to deployment in enterprise computing

Development complexity

IP deployment and integration

Design Framework and methods

Data Bandwidth to/from coprocessor

Choosing the right applications

Essence of an FPGA

> Soft Cores

Processor

> Block RAM

> Processor

> Multipliers

> Application

> SRAM Field Programmable Gate Array: CLBs + IOBs + Interconnect Matrix

> CLB

Target Boards

RC100, RC1000, RC2000 Tarari CPP

RC-100

> Xilinx Spartan2-200 FPGA

> 2MB ZBT SRAM, in 2 36-bit banks. 8MB Flash RAM

> 50 pin expansion header, PS/2 mouse/keyboard, parallel port

> Video input decoder, VGA output DAC

> Two 7-segment LED displays

> 80MHz maximum clock

RC-1000

> PCI card, DMA transfers > 110 MB/sec sustained

> Xilinx Virtex-2000 FPGA

> 8MB SRAM, in 4 32-bit banks

> 2 PMC slots

> 50 auxiliary I/O pins

> Programmable clock

[Figure: RC-1000]

RC-2000

> Virtex II 2V3000-4, 2V6000-4 and 2V6000-6 FPGAs

> 64bit 66MHz PCI bus

> 6 banks of ZBT SRAM offering a total of either 12Mb or 24Mb

> Front-panel I/O up to 146 lines, dependent on options

> 64 I/O lines via PMC connector

> 16Mb Flash for configuration storage

> 2 Programmable clocks

> Options include:

16Mb additional ZBT SRAM in 2 banks

128Mb DDR RAM

[Figure: RC-2000]

CPP – Basic Board Architecture

> Two CPEs – Content Processing Engines

Virtex-II 1000 FPGA:

Eight LEDs

2x 1MB SRAM

Connection to CPC

> CPC – Content Processing Controller

256MB DDR SDRAM

PCI Bus to Host

[Figure: Basic CPP architecture. Labels: PCI bus, CPC, 256MB DDR SDRAM, two CPEs, four 1MB SRAMs, LEDs.]

Design Flows

DK and Handel-C

Designing acceleration IP

> Traditional Options – HDL based design

Purchase FPGA (HW) development tools

Hire/use HW engineers

Pay 3rd Party development fees

> The Alternative – “Software Compiled System Design”

Use Celoxica Content Processing Development Kit

Development framework with Example Acceleration IP

Comprehensive Hardware-Software Co-simulation environment

Tool and Language Connectivity

Enable SW engineers and/or increase HW engineer productivity

Why a Software Language Based Approach for System Design?

> Some problems are better expressed as a software algorithm

> Software Reference designs can be utilized

> Designs are often specified by a C/C++ executable

> Simplifies and delays hardware-software partitioning

> Software development techniques can be used

> Brings hardware and software teams closer together

> New Possibilities …

RC100

> RC100 prototyping board

$10 FPGA

Commodity memory chips

Video Input and Output

CPDK for developing acceleration IP

> The Content Processing Development Kit includes

Celoxica “DK” and supporting libraries

> Consisting of

“Software Compiled System Design” environment

Simple design flow with integrated Simulation and direct implementation

Similar SW/HW design methods simplify design exploration and optimal allocation of functionality between SW and HW

Verification and Debug using a Symbolic Debugger

Connectivity and co-simulation with SW and HDL cores

API’s to hide complexity

> Enabling your software and hardware developers

To rapidly develop acceleration IP

Celoxica DK1 – Rapid Design

> Handel-C direct to FPGA, Minimum Tool Chain

> Easy-to-learn language – ISO-C (ANSI-C)

> Design of hardware and software in parallel with co-simulation

[Figure: Design flow. Labels: Handel-C, Simulate, Compile, Netlist, FPGA Vendor’s Tools (Place & Route), Configure, Final Hardware; inset schematic of a register.]

Supported FPGA/PLD Devices

Development Flow

Minimal Tool Chain

Similar Languages

Standardised API’s

Platform Abstraction

External IP (optional)

[Figure: Development flow. Stage labels: Algorithm Definition, Specification, Partition, Develop, HLL Co-Verification, Implementation. HW labels: Handel-C, DK, BSP, Compile, EDIF, CPP. SW labels: C, SW Tool, BSP, OS, LIBS, Compile, OBJ, Host CPU. External IP labels: EDIF, HDL, LIB, C.]

API’s Enable Rapid Co-verification

> “Virtual Platform” for Co-simulation and Co-design

> Cycle-accurate HLL simulator for Acceleration IP modelling

> Extendable Co-Sim to: C/C++, HDL, System-C, ISS

[Figure: Virtual Platform. Labels: Specification (HW, SW), Handel-C, C, DK, Nexus, BSP, Implementation (HW, SW), HDL-Simulator, SW and/or ISS.]

DK User Interface

File view

Symbol view

Syntax highlighting

Break-points

Multithreaded Debug

Watch variables

Simulate Build

Clock Cycles

Info

Handel-C in Brief

> Handel-C is based on ANSI C

> Well-defined semantics similar to OCCAM/CSP

> Additions:

support for parallelism

channels for communications between parallel processes

operators for detailed control of hardware

constructs for RAM, ROM, interfacing, etc.

HW-SW Co-Design

Handel-C Language

Core Language Features

> Standard C (if, while, switch etc) including

Functions

Structures

Pointers

> par {…} construct for parallelism

> Simple model of timing

each assignment is one clock cycle

> Arbitrary widths on variables

> Enhanced bit manipulation operators

> Sharing/Copying expressions

> Support for hardware constructs

Multiple clock domains, RAM, ROM, external interfaces

Handel-C describes Hardware!

> No side effects in expressions

i.e. statements like a = b*c++; are not supported

> No floating point

Floating point not directly supported by Handel-C.

Library support provided for fixed and floating point arithmetic

> No run-time recursion

Due to the absence of any kind of ‘call stack’ in hardware.

> Limited standard library (i.e. no printf, fopen etc.)

However, DK1.1 allows direct calls to external functions written in C/C++, and these could incorporate file I/O, user interaction, recursion, etc.

Variables

> Handel-C has one basic type - integer

> May be signed or unsigned

> Can be any width, not limited to 8, 16, 32 etc.

Variables are mapped to hardware registers.

void main(void)
{
    unsigned 6 a;
    a = 45;
}

a = 1 0 1 1 0 1 = 0x2d (MSB on the left, LSB on the right)
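To make the effect of an arbitrary-width variable concrete, here is a small plain-C sketch. It assumes that a value stored in a `width`-bit unsigned register keeps only its low `width` bits, which is the usual behaviour of a hardware register; `wrap_unsigned` is a hypothetical helper written for this note and is not part of Handel-C or DK.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low `width` bits of `value`, as a `width`-bit unsigned
 * register would (assumption: wrap-around modulo 2^width). */
uint32_t wrap_unsigned(uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    if (width == 32)
        return value;
    return value & ((UINT32_C(1) << width) - 1u);
}
```

With `width` set to 6, `wrap_unsigned(45, 6)` is 45 (0x2d, the value on the slide), while `wrap_unsigned(64, 6)` is 0 because 64 needs a seventh bit.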

Bit Manipulation Operators

> Extra operators have been added to allow more ‘hardware like’ bit manipulation:

<< Shift Left b = a<<2;

>> Shift Right b = a>>1;

<- Take least significant bits b = a<-5;

\\ Drop least significant bits b = a\\5;

@ Concatenate bits b = a@c;

[ ] Bit Selection b = a[4:1];

Example Bit Manipulation

[MSB:LSB] - bit selection (range of bits)

a = 1 0 1 1 0 1 = 0x2d

b = a[4:1]

b = 0 1 1 0 = 0x6

Bit Manipulation 2

> Other bit manipulation examples:

signed int 4 a;
signed b,c,d;

a = 0b1100;

b = a<<1;    // b = 0b1000
b = a>>1;    // b = 0b1110
c = a[2:1];  // c = 0b10
c = a<-2;    // c = 0b00
c = a\\2;    // c = 0b11
d = a @ a;   // d = 0b11001100
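The take, drop, concatenate and bit-selection operators have no direct C equivalent, but their effect on a bit pattern can be sketched with shifts and masks. The helpers below are hypothetical names chosen for this note; they work on raw unsigned bit patterns and do not model Handel-C's width checking or signed types.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (width 0..32). */
static uint32_t low_mask(unsigned width)
{
    assert(width <= 32);
    return width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1u;
}

/* Models "a <- n": take the n least significant bits. */
uint32_t take_lsb(uint32_t a, unsigned n)
{
    return a & low_mask(n);
}

/* Models the drop operator: discard the n least significant bits. */
uint32_t drop_lsb(uint32_t a, unsigned n)
{
    assert(n < 32);
    return a >> n;
}

/* Models "hi @ lo": concatenate, where lo is lo_width bits wide. */
uint32_t concat_bits(uint32_t hi, uint32_t lo, unsigned lo_width)
{
    assert(lo_width < 32);
    return (hi << lo_width) | (lo & low_mask(lo_width));
}

/* Models "a[msb:lsb]": select an inclusive range of bits. */
uint32_t select_bits(uint32_t a, unsigned msb, unsigned lsb)
{
    assert(msb >= lsb && msb < 32);
    return (a >> lsb) & low_mask(msb - lsb + 1);
}
```

Applied to the pattern 0b1100 from the slide, `select_bits(0xC, 2, 1)` gives 0b10, `take_lsb(0xC, 2)` gives 0b00, `drop_lsb(0xC, 2)` gives 0b11 and `concat_bits(0xC, 0xC, 4)` gives 0b11001100.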

Timing model

> Assignments and delay statements take 1 clock cycle

> Combinatorial Expressions computed between clock edges

Most complex expression determines clock period

Example: takes 1+n cycles (n is number of iterations)

index = 0;                      // 1 Cycle
while (index < length)
{
    if (table[index] == key)
        found = index;          // 1 Cycle
    else
        index = index+1;        // 1 Cycle
}
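The 1+n cycle count can be checked with a plain-C model that charges one cycle per assignment and nothing for the comparisons, following the timing rules on this slide. Two things here are assumptions added for the sketch and are not on the slide: the search stops once the key is found, and `found` starts at -1 to mean "not found". `search_cycles` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of clock cycles the table search would take under
 * the "one cycle per assignment" rule. */
unsigned search_cycles(const int *table, size_t length, int key, int *found)
{
    assert(table != NULL && found != NULL);
    unsigned cycles = 0;
    size_t index = 0;
    cycles++;                     /* index = 0;          1 cycle */
    *found = -1;                  /* assumed initial value, not counted */

    while (index < length) {      /* condition is combinational: 0 cycles */
        if (table[index] == key) {
            *found = (int)index;  /* found = index;      1 cycle */
            cycles++;
            break;                /* assumption: stop after a match */
        } else {
            index = index + 1;    /* index = index + 1;  1 cycle */
            cycles++;
        }
    }
    return cycles;
}
```

Searching a three-entry table for its last element takes three iterations and therefore four cycles, which matches 1+n with n = 3.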

Parallelism

> Handel-C blocks are by default sequential

> par{…} executes statements in parallel

> par block completes when all statements complete

Time for block is time for longest statement

Can nest sequential blocks in par blocks

Sequential Block

// 3 Clock Cycles
{
    a=1;
    b=2;
    c=3;
}

Parallel Block

// 1 Clock Cycle
par
{
    a=1;
    b=2;
    c=3;
}
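One consequence of the par construct is that every statement in the block reads register values as they were before the clock edge. The plain-C sketch below models that for a hypothetical block `par { a = b; b = a; c = a + b; }` and contrasts it with the same three statements run sequentially. The struct and function names are illustrative assumptions; this is a software model, not Handel-C.

```c
#include <assert.h>
#include <stddef.h>

struct regs { int a, b, c; };

/* One clock cycle of  par { a = b; b = a; c = a + b; }
 * All right-hand sides use the values held before the clock edge. */
void par_cycle(struct regs *r)
{
    assert(r != NULL);
    const struct regs old = *r;   /* snapshot of the registers */
    r->a = old.b;
    r->b = old.a;
    r->c = old.a + old.b;
}

/* Three clock cycles of the sequential block  { a = b; b = a; c = a + b; }
 * Each statement sees the result of the previous one. */
void seq_cycles(struct regs *r)
{
    assert(r != NULL);
    r->a = r->b;
    r->b = r->a;
    r->c = r->a + r->b;
}
```

Starting from a = 1, b = 2, the parallel version swaps the two registers in one cycle (a = 2, b = 1, c = 3), whereas the sequential version ends with a = 2, b = 2, c = 4 after three cycles.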

More Parallelism

> Example – array initialisation

> Sequential version takes 20 clock cycles

for() loop has 1 cycle overhead for increment

> Parallel version takes 1 clock cycle

Replicated par() builds hardware to execute all 10 iterations in a single cycle

Allows trade-off between hardware size and performance

Sequential code

for(i=0;i<10;i++)
{
    array[i]=0;
}

Parallel code

par(i=0;i<10;i++)
{
    array[i]=0;
}

Channels

> Allow communication and synchronisation between two parallel branches

Semantics based on CSP: unbuffered (synchronous) send and receive

> Declaration

Specifies data type to be communicated

chan unsigned 6 c;

{
    …
    c ! a+1;    // write a+1 to c
    …
}

{
    …
    c ? b;      // read c to b
    …
}

[Figure: the value of a passes through channel c into b]
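Because the channel is unbuffered, neither side can proceed until both have reached their channel statement. A rough plain-C model of that rendezvous is sketched below. It assumes each branch reaches its send or receive at a known cycle and that the value is truncated to the channel width; `chan_rendezvous` and `struct transfer` are hypothetical names, and the model ignores any extra cycles a real implementation may add.

```c
#include <assert.h>
#include <stdint.h>

struct transfer {
    unsigned cycle;   /* cycle in which both sides are ready */
    uint32_t value;   /* value delivered to the receiver */
};

/* Unbuffered (synchronous) channel: the transfer happens in the first
 * cycle in which sender and receiver are both waiting on the channel. */
struct transfer chan_rendezvous(unsigned send_ready, unsigned recv_ready,
                                uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    struct transfer t;
    t.cycle = send_ready > recv_ready ? send_ready : recv_ready;
    t.value = width == 32 ? value
                          : value & ((UINT32_C(1) << width) - 1u);
    return t;
}
```

If the sender is ready in cycle 2 and the receiver only in cycle 5, `chan_rendezvous(2, 5, 46, 6)` reports a transfer of 46 in cycle 5; the sender simply waits in between.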

Sharing Hardware for Expressions

> Functions provide a means of sharing hardware for expressions

> By default, compiler generates separate hardware for each expression

Hardware is idle when control flow is elsewhere in the program

Hardware function body is shared among call sites

Separate hardware for each expression:

{
    …
    x = x*a + b;
    y = y*c + d;
}

Shared hardware through a function:

int mult_add(int z, int c1, int c2)
{
    return z*c1 + c2;
}

{
    …
    x = mult_add(x,a,b);
    y = mult_add(y,c,d);
}

Replicating Hardware for Expressions

> Inline Functions are expanded at the call site

Provide for functional abstraction of complex hardware

inline complex mult_complex(complex x, complex y)
{
    complex z;
    par
    {
        z.re = x.re*y.re - x.im*y.im;
        z.im = x.re*y.im + x.im*y.re;
    }
    return z;
}

complex x1,y1,x2,y2,z1,z2;
…
par
{
    z1 = mult_complex(x1,y1);
    z2 = mult_complex(x2,y2);
}

Macro procedures

> macro proc is similar to an inline function, but is expanded at compile time.

They also allow for arbitrary bit width calculations

> The following generates a reusable timer:

#define TENTH_SEC (CLOCK_RATE/10)

macro proc usleep(ms)
{
    // Counter is just wide enough for the compile-time constant TENTH_SEC
    unsigned (log2ceil(TENTH_SEC)) Counter;

    Counter = TENTH_SEC * (0@ms);
    while (Counter)
        Counter--;
}
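The width calculation in the timer can be reproduced in plain C. The sketch below assumes that `log2ceil(n)` means the smallest w with 2^w >= n, and that the delay is a whole number of tenths of a second as the TENTH_SEC constant suggests. `log2ceil_u` and `delay_cycles` are hypothetical helpers for this note, not DK library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest w such that 2^w >= n (assumed meaning of log2ceil). */
unsigned log2ceil_u(uint64_t n)
{
    assert(n >= 1);
    unsigned w = 0;
    while (w < 64 && (UINT64_C(1) << w) < n)
        w++;
    return w;
}

/* Number of clock cycles to count for a delay of `tenths` tenths of a
 * second at a clock of `clock_hz` hertz. */
uint64_t delay_cycles(uint64_t clock_hz, uint64_t tenths)
{
    assert(clock_hz >= 10);
    return (clock_hz / 10u) * tenths;
}
```

At the RC100's 80 MHz maximum clock, a tenth of a second is 8,000,000 cycles and `log2ceil_u(8000000)` is 23. A counter of that width cannot hold several tenths at once, so a wider counter would be needed for longer delays.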

Signals

> A signal behaves like a wire - takes the value assigned to it but only for that clock cycle.

The value can be read back during the same clock cycle.

The signal can also be given a default value.

// Breaking up complex expressions
int 15 a, b;
signal <int> sig1;
static signal <int> sig2=0;   // default value of 0
a = 7;
par
{
    sig1 = (a+34)*17;
    sig2 = (a<<2)+2;
    b = sig1 + sig2;
}
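The signal example can be followed through with ordinary C arithmetic. In the sketch below the two signals are modelled as local values that exist only while one cycle is being evaluated, and `b` is the value the register would hold after the clock edge. `signal_cycle` is a hypothetical name, and the range check for a signed 15-bit register is an assumption added for the example.

```c
#include <assert.h>

/* Evaluates one clock cycle of the par block for a given register value a
 * and returns the value written to b. */
int signal_cycle(int a)
{
    assert(a >= 0 && a <= 16383);        /* keep the C shifts well defined */

    /* Signals: wires that carry a value only within this cycle. */
    int sig1 = (a + 34) * 17;
    int sig2 = (a << 2) + 2;

    /* Register: latched at the end of the cycle. */
    int b = sig1 + sig2;
    assert(b >= -16384 && b <= 16383);   /* must fit a signed 15-bit register */
    return b;
}
```

For a = 7, sig1 is 697, sig2 is 30 and b becomes 727, all within a single clock cycle.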

Interfaces - Introduction

> Interfaces allow Handel-C designs to connect to external hardware and logic.

> Three types of interfaces

Buses – used for connecting to external pins

Ports – used for creating connection points for external logic.

e.g. Creating the ports for a VHDL entity

User Defined – used for including external logic blocks inside a Handel-C design.

e.g. Including an EDIF black box inside a design.

Interfaces – Buses

> Makes connections to pins on the FPGA.

Bus types

Output

Input – direct, clocked and latched input

Tri-state – direct, clocked and latched tri-state

interface bus_in(int 4) Address() with {data={P1,P2,P3,P4}};
x = Address.in;

[Figure: pins P1 to P4 feed the Address bus, which is read into x]

Interfaces – Ports

> Allows connection points for external logic to be specified.

e.g. Defining the ports for a ‘black box’ VHDL entity

Port types: Input, Output

// Declare Ports
interface port_in(int 4 Input1) InputPort1();
interface port_in(int 4 Input2) InputPort2();
interface port_out() OutputPort(int 4 Output = OutReg);

[Figure: Handel-C black box with inputs Input1 and Input2 and output Output]

Interfaces – User Defined

> Allows external logic blocks to be used inside a Handel-C design.

e.g. Using an EDIF core.

// Instantiate connections to core
interface pipe_mult(int 4 Result)
    Multiplier(int 4 A, int 4 B);

[Figure: Handel-C design containing the EDIF module pipe_mult.edf, with inputs A and B and output Result]

Multiple Clock Domains - example

Domain1.c

chan unsigned 8 ComChan;

set clock = external "C1";

void main(void)
{
    unsigned 8 x;

    do
    {
        x++;
        ComChan ! x;
    } while(1);
}

Domain2.c

extern chan unsigned 8 ComChan;

set clock = external "C2";

void main(void)
{
    unsigned 8 y;

    do
    {
        ComChan ? y;
    } while(1);
}

Handel-C Summary

> Handel-C is based on ANSI C

> Well-defined semantics similar to OCCAM/CSP

> Additions:

support for parallelism

channels for communications between parallel processes

operators for detailed control of hardware

constructs for RAM, ROM, interfacing, etc.

Lab #1

Quick Start DK1, Handel-C and the RC100

Tool Connectivity

The Whole Y-Chart

Tool Connectivity

Black Boxes - Xilinx CoreGen

Co-Simulation with HDL

Co-Simulation with ISS

HW-SW Co-Simulation & Virtual Platforms

MatLab Simulink

Filter.hcc

Sfunc.cpp

dll

Co-Simulation with System-C

Lab #2

Advanced Features

PDK – Platform Dev Kit

PDK, PAL and DSM

Introduction to PDK

> PDK – Platform Developer’s Kit

> Goal – to provide an integrated package of tools, support libraries and implementations to simplify application development and verification using DK1

Insulate developer from hardware details

Improve portability and maintainability

Provide key pre-packaged value-adding functionality

Allow simulation of the complete environment from modelling through to hardware implementation

> Benefits

Reduce development time

Allow development focus on application added value

Introduction to PDK

> PDK – Three major components

> DSM

Integration between processors and FPGA/PLD

> PAL

A consistent API for portable board-level Handel-C implementations

> PSL

Provides board, hardware or development tool specific support for DK1 and Handel-C

Introduction to PDK

> Each PDK component provides four functional areas:

> Simulation

Provides hardware independent simulation of DSM and PAL APIs and co-simulation with external tools and simulators

> Kit

Provides key components and/or templates to allow development of new, platform specific, implementations

> Platform

Platform specific implementations of DSM, PAL and PSL components

> Cores

Implementations of added-value functionality, demos or examples

Platform Abstraction Layer (PAL)

[Figure: PAL layering. Labels: Handel-C Application, Platform Abstraction Layer Application Programming Interface, PAL-Core, Platform Support Library (PSL), Board, Peripheral 1, Peripheral 2.]

DSM – Data Stream Manager

[Figure 4: Data Stream Manager. Labels: Processor, Application, Software DSM Library, Software Bus Controller, FPGA, Application, Handel-C program, Hardware DSM Library, Hardware Bus Controller.]

Labs #3 and #4

PDK:

PAL and DSM

Summary

> High performance gains with HW acceleration cards

For appropriate algorithms

> Development kit enabling rapid design using a software-like development framework

Celoxica DK and Handel-C

> Consultancy and Services

> For More

www.celoxica.com

Appendices

Technology Behind DK

Consultancy, Services, Projects

Case studies

University Programme

The Technology Behind DK

The Technology Behind DK

> Simple Hardware constructs

> Compilation Flow

> Optimisations

The Hardware Description

> Data Path

Circuitry to Move/Manipulate/Store Data

> Control Path

Circuitry to schedule operations

Control and Assignment

> Variables are mapped to hardware registers

> The control start signal forms the clock enable signal for the destination register of the assignment.

void main(void)
{
    …
    R = Exp;
    …
}

[Figure: Implementation of assignment. Labels: register R with D, Q, CLK and CE; Exp; Start; Finish.]
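The assignment circuit can be imitated in plain C as a register with a clock enable. The sketch below follows the slide in using Start as the clock enable of the destination register. The Finish behaviour (Start delayed by one clock) is an assumption made for this model and is not stated on the slide; `struct assign_hw` and `clock_edge` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct assign_hw {
    int  r;        /* destination register R */
    bool finish;   /* control output (assumed: Start delayed one cycle) */
};

/* Applies one rising clock edge to the circuit for "R = Exp;". */
void clock_edge(struct assign_hw *hw, bool start, int exp)
{
    assert(hw != NULL);
    if (start)
        hw->r = exp;       /* Start acts as the register's clock enable */
    hw->finish = start;    /* assumption: Finish follows Start one cycle later */
}
```

While Start is low the register holds its value regardless of Exp; in the cycle where Start is high the register captures Exp, and under the stated assumption Finish is raised to hand control to the next statement.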

The IF Construct

void main(void)
{
    …
    if (BE)
        S1;
    else
        S2;
    …
}

[Figure: Implementation of the if construct. Labels: Start, BE, S1, S2, Finish.]

Sequential Composition

void main(void)
{
    …
    S1;
    S2;
    S3;
    …
}

[Figure: Implementation of sequential composition. Labels: Start, S1, S2, S3, Finish.]

Parallel Composition

void main(void)
{
    …
    par
    {
        S1;
        S2;
        S3;
    }
    …
}

[Figure: Implementation of parallel composition. Labels: Start, S1, S2, S3, three D/Q flip-flops, Finish.]

Compilation Flow - Optimisations

> Generate AST from Source code

Macro Expansion

Width Inferencing

Design Checking

> Compilation to High Level Netlist

> Expansion to technology specific netlist

[Figure: Compilation flow after parsing. Stages: Abstract Syntax Tree, High Level Netlist, Gate Level Netlist. Steps: Compilation, High Level Optimisation, Expansion, Low Level Optimisation. Regions: Technology Independent, Technology Specific.]

Re-Writing

> Logical equivalence

(a) Constant 1 input to AND Gate removed

(b) Gate removed with unused output

(c) Block removed with unused output

[Figure: Some re-writing optimizations, panels (a), (b) and (c), with inputs x and y.]

Conditional Re-Writing

> Logical equivalence by testing for impossible Conditions

Gates removed for circuit with output independent of y

[Figure: Conditional re-writing, with inputs x and y.]

Common Sub-Expression Elimination

> Test for common logic

Duplicate AND gate removed

[Figure: Common sub-expression elimination, with inputs x and y.]
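The optimisations on these slides rely on the rewritten circuit being logically equivalent to the original. For two-input logic this can be checked exhaustively, as in the plain-C sketch below. The gate functions are illustrative stand-ins chosen for this note (an AND with a constant 1 input, and a circuit with a duplicated AND gate); they are not taken from the DK compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*gate2)(bool x, bool y);

/* True if f and g agree on all four input combinations. */
bool equivalent(gate2 f, gate2 g)
{
    assert(f != NULL && g != NULL);
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            if (f(x != 0, y != 0) != g(x != 0, y != 0))
                return false;
    return true;
}

/* Re-writing (a): an AND gate with a constant 1 input is just a wire. */
bool and_const1(bool x, bool y) { (void)y; return x && true; }
bool wire_x(bool x, bool y)     { (void)y; return x; }

/* Common sub-expression: two identical AND gates feeding an OR
 * compute the same function as a single AND gate. */
bool dup_and(bool x, bool y)
{
    bool t1 = x && y;
    bool t2 = x && y;   /* duplicate gate */
    return t1 || t2;
}
bool single_and(bool x, bool y) { return x && y; }
```

`equivalent(and_const1, wire_x)` and `equivalent(dup_and, single_and)` both hold, while `equivalent(single_and, wire_x)` does not, so only the first two rewrites would be safe.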

DK1 Optimisation Settings

Customer Highlights

Consultancy, Services, Projects

Case studies

Celoxica Expertise

> Technical Strengths

Design Methodologies and Hardware Compiler technology

FPGA board design and prototyping

Image, Data processing and Multimedia

Encryption

Compression/Decompression

Video Processing

Telecommunications

Routers/Switches

Protocol stacks – IPv6, VoIP (H323, SIP), ATM

Software defined radio – UMTS, 3G, DAB

> Business Consultancy

Analysis, Marketing and Strategy

Venture capital

Services and Support

Marconi Celoxica Technology Demonstrator

> Internet Reconfigurable Hardware from Software

FPGA based, no microprocessor or operating system

Different applications from the same hardware

Can be reconfigured over internet to new applications

> MMT 2000

IP Phone

MP3 player

Games console

Graphic display

High Speed Video Prototyping System

> Customer Requirement

To shorten the evaluation time of video filter algorithms as candidates for use in DTVB

> Solution

FPGA-based system comprising:

Wealth of analogue and digital video I/O

COTS boards and custom

Development kit: DK and Video framework libraries (SW/HW)

> Outcome

Real-time evaluation system rather than slow software models

Algorithm Evaluation times reduced from 12 to 3-6 months

Prototypes for ASIC process rather than software models

[Figure: System block diagram. Labels: FPGA Host Card, DIME expansion site, Component Analogue Video Interface Module (two), (HD) SDI Interface Module, SDI Input, SDI Output #1, SDI Output #2, Analogue Input #1, Analogue Input #2, Analogue Output #1, Analogue Output #2.]

EuroSkyWay Multimedia Satellite: Ground Traffic Simulator

• Services: 512, 2048 kb/s, 8...32 Mb/s (provider)

• fixed and mobile users (aircraft, buses, vessels)

• service launch in 2004

[Figure: Ground Traffic Simulator block diagram. Labels: GTS Controller, EVA LAN, GTS Parameter Database, Access Manager (five), ECS, EPS, External traffic G/A, M&C, IP/ATM external traffic.]

EuroSkyWay PHY Board

• for system verification and end-to-end perf. testing

• generation of total network traffic (ATM, IP)

• full implementation of ESW protocols (layer 1/2/3)

• digital baseband transmission

[Figure: EuroSkyWay network overview. Labels: SaT-A, SaT-B, SaT-C, PTN GTW, PrT-A, PrT-B, Service Provider Center, Collective Use, Individual Use, ISL, InSS, MCS, Network Operation Center, Cluster 1 (1A, 1B), To/From Supported Networks. Link rates: 160 Kbps, 512 Kbps, 2048 Kbps, 512/2048 Kbps, 32.768 Mbps, 6.144 / 32.768 Mbps, 2 x 32.768 Mbps, 8 x 32.768 Mbps, n x 32.768 Mbps.]

JPEG2000 MQ encoder implementation

                                   SCSD version   Traditional HDL implementation
Slices                             1,999          620
Device utilization                 18%            6%
Speed (MHz)                        115.5          76
Lines of code                      330            800
Design time (days)                 10 +2          * 30+
Av cycles per code block (000's)   108            67.5
Processing time (ms)               0.939          0.888
Simulation time for Lena jpeg      5 minutes      XXX

IBM Power PC

Wind River SBC405 GP

Xilinx Virtex

Proteus FPGA daughter card
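
As a rough cross-check of the MQ encoder figures quoted above, processing time per code block should be approximately the average cycle count divided by the clock speed. The Python sketch below is an illustration only: the function name `processing_time_ms` is hypothetical and is not part of DK or of the original slides. It reproduces the HDL figure (0.888 ms) and gives about 0.935 ms for the SCSD version, close to the quoted 0.939 ms; the small gap is presumably due to rounding in the quoted cycle count or clock speed.

```python
def processing_time_ms(cycles_thousands: float, clock_mhz: float) -> float:
    """Time to process one code block, in milliseconds.

    cycles_thousands: average cycles per code block, in thousands (as on the slide)
    clock_mhz: clock speed in MHz (as on the slide)
    """
    cycles = cycles_thousands * 1_000
    clock_hz = clock_mhz * 1_000_000
    seconds = cycles / clock_hz
    return seconds * 1_000


# Figures taken from the comparison above.
scsd_ms = processing_time_ms(108, 115.5)
hdl_ms = processing_time_ms(67.5, 76)

print(f"SCSD: {scsd_ms:.3f} ms")  # prints "SCSD: 0.935 ms"
print(f"HDL:  {hdl_ms:.3f} ms")   # prints "HDL:  0.888 ms"
```

The point of the comparison is that the SCSD version needs more cycles per code block but runs at a higher clock speed, so the two processing times end up within about 6% of each other.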


Customer highlights

"The DK1 suite enables us to work at a high level, quickly optimise C code for hardware implementation, prototype using FPGAs and will ultimately provide the HDL output for our ASIC design."

Shigeru Kawada, General Manager, NEC Electronics Singapore's technology centre

"I visited Celoxica's headquarters. While there, I re-implemented our existing VHDL solution using the DK1 suite in just one day. I was hooked."

Jan Mennekens, Chief Technical Officer, M-TEC WIRELESS

A new joint development team to create powerful, flexible and scalable application-specific servers was announced today. Celoxica Ltd, Motorola and StrongBow Technologies are working together to create servers that embed applications, such as transaction processing for credit cards, directly in hardware.

“Without Celoxica’s tools, this would not have been possible,”

Alan Prouse, CEO and founder of StrongBow Technologies.

"The real value of the Celoxica tools is the quick re-engineering capability and smooth transfer to a production platform. The DK1 methodology allows us to accomplish tasks in a time frame that conventional design methods cannot handle."

Dennis Hazel, Director of Engineering, Foxboro

"Our evaluation of DK1 clearly demonstrated that the flow increases our engineering throughput, and allows us to make better use of our scarce hardware engineering resources. Using Celoxica's Handel-C to hardware flow, our software engineers can take a software solution through to hardware allowing the hardware designer to focus on system integration and optimisation."

Andy Davey, senior engineer at Cogent Defence Systems

“Our original project plan was slated for 12-18 months using the traditional HDL design methodology. By adopting the Handel-C high level design language methodology, we were able to finish the project in 6 months with DK1 design suite and Xilinx ISE software targeting Xilinx VII-6000 FPGAs. We put in minimum development resource, but still met the design specification, timing far ahead of the schedule. Anyone can use the DK1 design suite to design efficient hardware.”

Gary Mallaley, Manager of Strategy Development at Northrop Grumman.


CUP: Celoxica University Programme

Recent Highlights


Introduction to CUP

> CUP has been active since the company was formed

> 700 universities worldwide registered with a multi-disciplinary user base

> Strategic relationship with XUP

> University-specific products and services

Heavily discounted

> Focused upon supporting innovative teaching and research

> Comprehensive Website www.celoxica.com/programs/university/index

> Register Now!


Benefits to Universities

> Rapid Design Exploration

Fit more interest into time-dependent project work through rapid prototyping and productivity improvements

Port prototyped C designs to Handel-C for implementation in FPGAs

> For Computer Science disciplines

Familiar software environment

Parallel programming environment

Computer architecture exploration – build your own instruction sets

Exploring hardware accelerated systems

> For EE disciplines

Cycle-accurate interactive simulation

SW/HW co-design, system design and SOC

Integration with HDLs

> Creates a bridge for increased collaboration between different disciplines


Update on University activity

> Research Articles:

Customising Floating-Point Designs, Imperial College, Xilinx

Accelerating Radiosity Calculations using Reconfigurable Platforms, Altaf Abdul Gaffar and Wayne Luk, Imperial College

A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C, Peter Martin, University of Birmingham

> Teaching Programmes

VDEC Japan now support DK1/Handel-C

Hardware/Software Co-Design: A Short Course for Unbelievers, A. Downton et al, University of Essex


IGOL Framework

What is it?

> COM based Framework for Development and Distribution of Hardware Acceleration

Testing and debugging for development

Runtime services and packaging for deployment

> Application Examples

Premiere, Photoshop, WinAmp, VirtualDub, DirectShow

Demonstrates

> Ease of Development and Deployment of Hardware Acceleration

> Separation of concerns

Hardware developers only develop hardware

Application developers only develop software

> Re-use of hardware and software components

Simply updating and patching

Automatic application support for new components
