

Reconfigurable Computing

Sherif Abou Zied Mohammad


OUTLINE

• INTRODUCTION

• RECONFIGURABLE COMPUTING ARCHITECTURES

• RECONFIGURATION MANAGEMENT

• PROGRAMMING RECONFIGURABLE SYSTEMS

• COMPILING C FOR SPATIAL COMPUTING

• HW/SW Partitioning

• BEE2: A High-End Reconfigurable Computing System

• REFERENCES


INTRODUCTION Conventional Computing

• Software-programmed microprocessors

– Processors execute a set of instructions.

– Performance can suffer, if not in clock speed then in work rate.

– Lower performance than ASICs.


• Hardwired (ASICs)

– Special purpose.

– Very fast and efficient.

– The circuit cannot be altered after fabrication; any change requires a redesign.

INTRODUCTION Reconfigurable Computing

• Fills the gap between hardware and software.

– Much higher performance than software.

– Higher level of flexibility than hardware.

• Uses FPGAs or other programmable hardware for compute-intensive calculations.

• Usually coupled with a general-purpose microprocessor that is responsible for:

– Controlling the reconfigurable logic.

– Executing program code that cannot be efficiently accelerated.


INTRODUCTION Reconfigurable devices

• Contain an array of computational elements.

• Functionality is determined through configuration bits.


• Most current FPGAs and reconfigurable devices are SRAM-programmable. The SRAM configuration bits:

– Control routing.

– Control multiplexers, LUT contents, and similar elements.

– Provide control signals for the computational units.

[Figure: 3-input LUT; D flip-flop with optional bypass]

• Reconfigurable Processing Fabric (RPF)

– Fine-grained

– Coarse-grained


• Fine-grained RPF

– Well suited to bit-manipulation tasks.

– Complex calculations require numerous fine-grained processing elements (PEs), which results in slower clock rates.


• Coarse-grained RPF

– Uses bus interconnect and PEs.

– The PEs perform more than just bitwise operations; examples are ALUs and multipliers.


RECONFIGURABLE COMPUTING ARCHITECTURES

[Figure: RPF integration. Source: S. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, R. Laufer. PipeRench: A coprocessor for streaming multimedia acceleration.]

• RPF integration

– Separate processor (coprocessor)

• Data communication takes place through main memory

• Limited bandwidth between CPU and RPF

– Loosely coupled RPF and processor architecture

• RPF with the host processor on the same chip

• Direct interaction between RPF and processor

• RPF with direct memory access

[Figure: Chameleon’s architecture]

– Tightly coupled RPF and processor

• The RPF is integrated as a reconfigurable functional unit (RFU), alongside fixed units such as ALUs and multipliers.

• The RFU accesses its input data through the register file.

[Figure: The datapath of the processor + RFU architecture]

• Virtual Instruction Configurations (VICs) in the RFU typically run during the execute stage (and possibly the memory stage) of the pipeline.

[Figure: An example of a pipeline of a processor with an RFU]

RECONFIGURATION MANAGEMENT Problem Definition

• Reconfigurability allows hardware to perform different tasks at different times.

• An application’s configurations can be swapped in and out of the hardware.

• Reconfiguring the hardware at runtime is called Runtime Reconfiguration (RTR).


• RTR

– Run-time reconfiguration is based upon the concept of virtual hardware, which is similar to virtual memory.

• The physical hardware is much smaller than the sum of the resources required.

• Configurations are swapped in and out of the actual hardware as needed.


– Increases hardware utilization

– Introduces significant reconfiguration overhead

– Time consuming: reconfiguration can require hundreds of milliseconds.


• Computation and reconfiguration are mutually exclusive

– Time spent reconfiguring is time lost in terms of application acceleration.

• In reported systems, reconfiguration can occupy approximately 25 to 98 percent of total execution time.


RECONFIGURATION MANAGEMENT Configuration Architectures

• What is a configuration architecture?

• Architectures

– Single-context

– Multi-context

– Partially Reconfigurable

– Others


Single-context

• Configurations are grouped into contexts, and each full context is swapped in and out of the FPGA as needed.


• Configuration information is loaded into the programmable array through a serial shift chain

• A serial chain requires few pins for configuration, potentially simplifying board-level design.

• Entire chip must be reprogrammed for any change to the configuration data because the data cannot be selectively “reused” on the chip.


• Configuration cycles can be reduced by widening the configuration path

– Virtex-5 devices allow a configuration data bus up to 32 bits wide.


Multi-context

• Provides storage for multiple configurations

– Facilitates configuration prefetching and fast reconfiguration.

– Contains multiple planes (contexts) of configuration data.


• Multiplexer chooses between the context planes

Multi-context advantages

• Background loading of configuration data

• Fast switching between stored configurations

– Some devices switch in a single clock cycle.

• Overlapping computations with configuration


Multi-context drawbacks

• Area overhead

– Additional configuration data

– Multiplexing

• Single-cycle context switching

– Dynamic power: switching many configuration bits in the same cycle may cause a spike in power consumption.


Partially Reconfigurable

• Not all configurations require the entire chip area

• Reconfigure utilized resources only

• Use addressable configuration memory

– Decreases reconfiguration time.

– Decreases the amount of configuration data.

– A configuration that occupies a large area still takes a long time to load.

– Open issue: independent configurations may be placed on overlapping hardware.


PROGRAMMING RECONFIGURABLE SYSTEMS

• Application programmers are likely to ignore reconfigurable hardware unless they can easily incorporate it into their systems.

• A software design environment that aids in the creation of configurations for the reconfigurable hardware is therefore required.


• Software design environment

– Manual

• Powerful method for the creation of high-quality circuit designs.

• Requires a great deal of background knowledge of the particular reconfigurable system employed.

• Takes a significant amount of design time.

– Fully automatic

• Quick and easy.

• Makes the use of reconfigurable hardware more accessible to general application programmers.

• Quality may suffer.

Compiling C for spatial computing Why C?

• There are many more C programmers than hardware designers.

• Writing an algorithm in C is typically faster than in an HDL.

• Large existing code base.

• Allows both hardware (HW) and software (SW) versions to be created

– The operating system can choose at runtime which version is better.

• Makes it easy for the designer or compiler to quickly explore the tradeoffs between different hardware/software partitionings.

• The code can be easily tested on a conventional microprocessor.


Compiling C for spatial computing How C runs on spatial hardware (overview)

• In a C program, the statements execute in order.

• With spatial computation, each operation is implemented as its own functional unit.


Memory loads and stores

• Memory access operations must be scheduled to:

– Allow memory ports to be shared among memory operations.

– Preserve sequential C semantics.

If-then-else using multiplexers

More than just simple if-then-else control flow

– Use sub-circuits.

Optimizing the common path

• What about:

– Parallelism?

– Pipelining?

– Memory dependencies?

– Operator size?

Compiling C for spatial computing Automatic Compilation

Overall compiler flow

[Figure: C source code → Control Flow Graph → Hyperblocks → Data Flow Graph → Circuit Generation]

Control Flow Graph (CFG)

• The code is broken into basic blocks of simple instructions.

• Blocks are connected by control edges indicating a possible branch.

• All instructions inside a given block execute once the block is entered.

Hyperblocks

• CFG basic blocks are quite small, which limits the opportunities for parallelism.

• The compiler combines blocks along commonly taken paths into hyperblocks.

• Hyperblocks have a single entry point at the top and one or more exits.


Data Flow Graph (DFG)

• The DFG is composed of nodes and edges.

• Nodes

– Inputs, constants, operations, memory access and exit nodes

• Edges

– Data transfer edges, ordering edges, and exit edges


DFG optimizations

• Strength reduction

– Replaces one operator with one or more other operators that have less overall latency or area.

• Replace x*2 with x+x or x<<1.

• x*7 can be expressed as (x<<2)+(x<<1)+x, but even better as (x<<3)-x.

• Boolean value identification

– ISO C90 does not contain a Boolean data type (C99 later added _Bool).

– Although the result of a comparison is defined to be either 0 or 1, the type of the result is int, a signed integer that is typically 32 bits wide.

– Values identified as Boolean need only one bit in hardware.

• Type-based operator size reduction

– ISO C semantics dictate that arithmetic and logical operations involving type char and/or short operands must be performed at the precision of type int.

– When the result is stored back into a 16-bit variable, only the low 16 bits survive. Thus, a 16-bit adder will give the same result as a 32-bit adder.

– Analyze the number of bits actually required by variables and operators.

• Example: the integer i in the loop below takes values from 0 to 100, so 7 bits are enough.

for (i = 0; i < 100; i++)

DFG to Reconfigurable Fabric

• Mapping DFG nodes to modules.

• Scheduling each module to a specific timestep.

• Finally, connections are made between the modules of the different hyperblock sub-circuits to complete the overall circuit.


HW/SW Partitioning

• For systems that include both reconfigurable hardware and a traditional microprocessor, the program must first be partitioned into:

– Sections to be executed on the reconfigurable hardware

• e.g., fixed datapath operations

– Sections to be executed in software on the microprocessor

• e.g., complex control sequences such as variable-length loops


• Partitioning

– Manually

• The resulting program ends up tuned to a specific machine.

• An alternative is to use compiler directives.

The NAPA C language [Gokhale and Stone 1998] provides pragma statements that let a programmer specify whether a section of code is to be executed in software on the Fixed Instruction Processor (FIP) or in hardware on the Adaptive Logic Processor (ALP).

– Automatically

• The compiler and runtime system take full responsibility for determining the right code and granularity to move to the reconfigurable fabric.

• The reconfigurable hardware is transparent to the designer.

• Cost functions based upon the acceleration gained determine whether the cost of configuration is overcome by the benefits of hardware execution.

BEE2: A High-End Reconfigurable Computing System

• BEE: Berkeley Emulation Engine

– BEE2 can provide over 10 times more computing throughput than a DSP-based system with similar power consumption and cost.

– It can provide over 100 times the throughput of a microprocessor-based system.


– Applications

• Emulation and design of novel wireless communications systems.

• High-performance real-time digital signal processing.

• Real-time scientific computation and simulation.

• The acceleration of CAD tools.


– BEE2 system uses Xilinx Virtex-2 Pro FPGAs

– Virtex-2 Pro embeds PowerPC 405 processor cores into the reconfigurable fabric.

– BEE2 has no hardware-managed caches, hence all data transfers within the system have tightly bounded latency.

• BEE2 is therefore well suited for real-time applications


– Programming environment

• High-level block diagram design environment based on Mathworks Simulink and the Xilinx System Generator library.

• Uses automatic compilation tools

• Compute modules:

– Each compute module consists of five “Xilinx Virtex 2 Pro 70” FPGA chips, each directly connected to four 240-pin Double Data Rate 2 (DDR2) DRAM DIMMs, with a maximum capacity of 4 Gbytes per FPGA.

– Four of the FPGAs are used for computation and one for control. The local mesh connects the four compute FPGAs on a 2D grid.

– Each link between adjacent FPGAs on the grid provides over 40 Gbps of data throughput.

– The four down links from the control FPGA to each of the computing FPGAs provide up to 20 Gbps per link.

REFERENCES

• Scott Hauck and André DeHon, “Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation”

• Katherine Compton, “Reconfigurable Computing: A Survey of Systems and Software”, Northwestern University.

• Chen Chang, John Wawrzynek, and Robert W. Brodersen, “Berkeley BEE2: A High-End Reconfigurable Computing System”, University of California.


Thank You